GLSL Flow Control

Filed under:GPGPU, Rendering, General — posted by Brian on February 11, 2008 @ 1:00 am

The nVidia GeForce 8 Documentation explicitly says that loops should now support run-time condition testing, which to my mind means that non-const variables should be permissible in the condition test. Either way, I should be able to create an infinite loop and have a statement inside which breaks out of the loop once a counter variable reaches a certain, dynamically calculated number. And barring that, I should be able to create a for-loop that executes some max number of iterations, but also has a statement inside which breaks out once the counter variable reaches a certain number.

Of course, none of the above works on my 8800 GTX. I speculated that there was some loop unrolling going on that might be throwing the system off, but a quick look at the generated assembly (using NVemulate, handy little program that it is) indicated otherwise. The last example I gave above is especially weird: I can create a for-loop that executes a max number of iterations and put a break condition inside, but the number I put in as the max number of iterations matters and is very, very touchy, despite that the underlying assembly doesn't change at all. Sometimes I can use 70 just fine, sometimes I can only go up to 10, which is highly restrictive.

This isn't an app-killer; I've got my system basically working and working well, but for some of my requirements the restrictions might get in my way.

OpenGL Debugging

Filed under:GPGPU, Rendering, General — posted by Brian on February 9, 2008 @ 9:58 pm

Alas, the majority of nVidia's profiling/debugging tools only work with DirectX, so I am either forced to insert code myself or find different tools.

GLIntercept looked promising, but it hasn't been updated in a little over two years. There's no logging of geometry shader info in there, which I could really use. It also appears to be mostly non-interactive, preferring to output results to log files (though it does have some very interesting plugins that allow you to dynamically edit shaders or fly through a scene).

glslDevil is much cooler, though very temperamental. It gives dynamic information, allows stepping through GL calls, allows stepping through shaders (very cool), allows putting watches on shader variables (cooler), can show/save the framebuffer, and do a whole lot more. It also gives geometry shader information. It does, however, crash a lot; it crashes when my research program compiles a shader, which is, uh, unfortunate.

GPU Random Number Generation

Filed under:GPGPU, Procedural Content Generation, General — posted by admin on February 8, 2008 @ 12:34 pm

It confuses me a bit why the GPU doesn't have a decent random number generator. Procedural textures have been around for quite some time, and moving their generation to a shader makes sense. There is a noise function in GLSL which doesn't work reliably (everything I've read indicates it always returns a 0 on nVidia cards). I needed some kind of random number generation for my building generation, so I had to write my own.

The Mersenne Twister was the first to catch my attention, but after looking at a few implementations, it doesn't look like a natural fit for a GPU. It's fast, which is good, but it has a few bits of the algorithm (using an array for a state vector, really large numbers that would cause an error on the GPU) that made it prohibitive. If I knew the algorithm well enough, perhaps I could overcome the limitations, but I really don't have any experience in this area.

Someone on GameDev.net recommended Simplex Noise, which resembles Perlin noise. This algorithm seemed fairly involved though, and I don't need Perlin noise - random noise works fine for my purposes. Were I randomly generating textures, this would be something to look into.

I landed on a Linear Congruential Generator, which seems to be the simplest type of generator. It's fast, in my case only requiring a single multiplication and modulus operator. The distribution it generates is reasonable; it could be better, but I'm not doing cryptography here. I had to search a bit on the internet for numbers that worked better (the numbers in the Wikipedia article don't play nice with the GPU), but now I'm pretty much set.

Seeding is slightly awkward on the GPU, since I'm doing some ping-ponging and there's no global memory to speak of. Basically I send in a seed with my first bit of data. Then I generate a few numbers and output a new seed with a building component by adding some component-specific attributes to the current seed. Since those are floats, the inaccuracy may be costing me a degree of determinism. Alternatively, it would probably be more random and more deterministic (odd little combination there) if the new seed I output were just the sum of a few random integers, but that would be slower and I'm not sure that it's necessary for my purposes.

GLSL Lessons Learned

Filed under:GPGPU, Rendering, Procedural Content Generation, General — posted by Brian on February 5, 2008 @ 7:08 pm

So apparently GLSL is very touchy, and instead of giving reasonable error messages or even reporting linking errors, it will choose to simply not work. Sometimes this will be a cryptic failure to link with a meaningless OpenGL error; sometimes the shader will just hang indefinitely. Here are some things I've learned:

Just because you have some varyings and they're used all throughout your code does not mean they're active when you link the shader program. This is problematic when you have to get the location of those varyings so that you can bind them to transform feedback. nVidia is kind enough to provide an extension to force varying variables to be active upon linking. Be sure to use that.

If you attempt to access an invalid array index with a constant, it's a linking error, not a runtime error.

Using variables in loop-terminating conditions is, at best, problematic. If you're very, very lucky, you won't have any problems. What's more likely to happen is that your shader will hang indefinitely. For that scenario, a work-around exists where you use a large (not too large, mind) constant instead of a variable, and then break out of the loop when the loop counter reaches the variable.

It's nice that logs exist, but in my experience, they only report hard syntax errors or blatantly invalid statements (invalid implicit casts, for instance). Subtle nuances like the one above are unreported.

These aren't the only problems, but they're the most prevalent ones currently.

Ping Pong

Filed under:GPGPU, Procedural Content Generation, General — posted by Brian on January 27, 2008 @ 10:11 pm

For complex GPGPU stuff and even with normal graphics rendering, it's sometimes required to "ping pong" input and output repeatedly to achieve a desired effect. The output in one pass becomes the input in the next, and the output of that next pass becomes the input to the next, and so on. I just got this process working using vertex buffers and transform feedback, which will lead naturally into my building generation:

Fill a vertex buffer object (we'll call it "send") with some initial information that you want expanded upon. In the case of building generation, this initial information will be the initial lot.

Turn on transform feedback and bind it to a different vertex buffer ("receive"). Transform feedback is used to take geometry data after it's gone through the vertex and geometry processing and place that data back in a client-side buffer. For our purposes, the vertex shader becomes a simple pass-through shader (it doesn't transform the vertices), and the geometry shader is where the spawning of new information (building components) takes place.

Draw the send buffer such that the results of the geometry shader end up in the receive buffer.

Repeat the above steps N times, where N is the total number of passes you need over the data. When dealing with a split grammar, N might just be the number of rules if no rules go 'backward' in the grammar. However, there's no real guarantee of this - I've written grammars myself that go backward. For the time being, I'm going to fix N at 2 * numRules.

Here are some considerations:

Why am I fixing N? Because I haven't thought of a better way to tell if processing of the entire building has finished. I have a hunch I might be able to pull something off here by doing some trickery with the fragment shader, but I'm not certain yet.

Is this going to be faster than doing everything on the CPU? I have no clue. Geometry shaders aren't fully mature yet. When they do mature, this *should* be faster, but I'm not sure by how much yet.

Should I only process a single component in one shader pass, or should I run through multiple steps? Here I run into the risk of exceeding the maximum outputs of a geometry shader, but I think I can safely run through two or three steps in one pass without that being a problem. Whether this will help or hurt speed, I don't know yet.

The next step is writing the code-generator that takes a grammar and turns it into GLSL code. I'm convinced I can do this in a relatively straight-forward fashion, with only a few awkward places (basically anything involving strings). I do, however, need to find a good (and fast) noise function to generate random numbers. I've heard simplex noise is good for this, but I haven't found a lot of information on it yet.

From Grammars to Shaders

Filed under:GPGPU, Rendering, Procedural Content Generation, General — posted by Brian on January 24, 2008 @ 8:43 am

So here's the current problem I'm working on:

There's no good way to send the rules across into the shader dynamically, so I'm nixing that. Instead, I'm writing a translator that takes the rules (as already parsed by Lex/Yacc) and converts them directly into shader code. The rules fit mostly neatly into a series of if-then statements, with little problems when variables or functions are encountered (these problems don't seem heavy, but enough to make the translator a little awkward in places).

After that, it becomes a problem of ping-ponging the generated building components back and forth in such a way that I don't waste time processing components that have already reached terminal status. I thought using multiple transform feedback targets would work - sending the finished components to a different buffer than the unfinished - but there's a limit on the amount of information I can feedback in one shot such that I can't get out all the information needed for this. The nasty alternative seems to be searching through the feedback buffer for terminals and shifting them out, but I can't imagine that being in any way fast. If only fragment shaders allowed a greater number of outputs (since processing any component might result in 10+ other components, a fragment shader won't be able to work here). Fragment shaders are more mature than geometry shaders, and I'm afraid that the geometry shader might not provide a significant speed boost even if I get my current problems worked out.

GPGPU

Filed under:GPGPU, Rendering, Procedural Content Generation — posted by Brian on January 17, 2008 @ 8:47 pm

So here are my current gripes with GPGPU programming:

First, CUDA (a GPGPU library from nVidia) isn't supported under Vista. I'm hesitant to move away from Vista, because if I need to tap into DX10 at some point it won't be feasible on XP. At any rate, CUDA programming seems to be, quite frankly, a little icky.

No static global memory exposed by GLSL! It'd be nice if I could do some computations once, store them in some internal structures, and then use those structures the next time a shader is invoked. But alas, those structures are forced to be local to the current invocation of the shader, meaning any precomputation is thrown out.

No serialization/deserialization of textures. Since any real data has to be sent through a texture, it would be awfully nice if I could fill a texture with my structure data, send it across, and easily reassemble the structures on the other side without a huge speed penalty. Not really possible. Instead, I'm going to have to arrange my texture in such a way that it can act as a structure on its own and pray (oh shall I pray!) that texture reads aren't too slow.

Fragment shaders can only produce one output (barring the existence of multiple render targets, which might allow more). This is by design, I know, but it means that I'm *probably* going to be forced to use geometry shaders, which aren't very mature right now. Hopefully transform feedback isn't horribly slow.

I've read in forums (though I haven't benchmarked this personally) that flow control on the GPU is pretty slow. Let's hope that's false, because my shader is going to rely pretty heavily on it.

So, yea, this ought to be interesting.



image: detail of installation by Bronwyn Lace