Unfortunately, AMD doesn't provide new OpenGL 4.2 drivers this month, as the Catalyst 11.11 drivers are OpenGL 4.1 drivers. I am really hoping we will get something fresh next month!
The changes recorded this month result from the newly released OpenGL Samples Pack 4.2.2.0, which has been tested with last month's and this month's drivers. Besides a new bug highlighted by the new 420-image-store sample on NVIDIA, the main thing to notice is that the 420-buffer-uniform sample now passes successfully on AMD. However, if I am still using the same AMD drivers as last month, how is this possible? Well, I changed the sample. It now indexes the block array with a "dynamically uniform expression" instead of a "general integer expression".
A fragment-shader expression is dynamically uniform if all fragments evaluating it get the same resulting value. When loops are involved, this refers to the expression's value for the same loop iteration. When functions are involved, this refers to calls from the same call point. This is similarly defined for other shader stages, based on the per-instance data they process. Note that constant expressions are trivially dynamically uniform. It follows that typical loop counters based on these are also dynamically uniform.
This definition raises a question: do dynamically uniform expressions only apply to the fragment shader stage? This doesn't really make much sense to me because the concept deals with the coherence between different fragment shader executions, but this coherence could apply to other stages as well.
The other question is obviously: where do dynamically uniform expressions apply? A little more investigation tells us that they apply to sampler arrays, which led me to include the sample 400-sampler-array-gtc as a feature request more than a year ago. The specification also clearly specifies the indexing of atomic counter arrays, as quoted below.
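To make the quoted definition concrete, here is a minimal CPU-side sketch of how I read it. This is only a model, not an OpenGL API: the helper names (`isDynamicallyUniform`, `evaluatePerFragment`, `uniformIndex`, `perFragmentIndex`) are hypothetical, and the fragment number simply stands in for any per-fragment input.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical CPU-side model, not an OpenGL API: 'values' holds the result of
// one shader expression as evaluated by every fragment of a group.
bool isDynamicallyUniform(const std::vector<int>& values)
{
    assert(!values.empty());
    // Dynamically uniform: every fragment obtained the same value.
    return std::all_of(values.begin(), values.end(),
        [&values](int value) { return value == values.front(); });
}

// Evaluates 'expression' once per fragment; the fragment number stands in for
// per-fragment inputs such as interpolated varyings.
template <typename Expression>
std::vector<int> evaluatePerFragment(int fragmentCount, Expression expression)
{
    std::vector<int> results;
    for (int fragment = 0; fragment < fragmentCount; ++fragment)
        results.push_back(expression(fragment));
    return results;
}

// An index read from a uniform variable: identical for all fragments.
inline int uniformIndex(int /*fragment*/) { return 2; }

// An index derived from per-fragment data: varies between fragments.
inline int perFragmentIndex(int fragment) { return fragment % 4; }
```

With these helpers, `evaluatePerFragment(8, uniformIndex)` is reported as dynamically uniform while `evaluatePerFragment(8, perFragmentIndex)` is not, which is the distinction the array indexing rules below rely on.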
When aggregated into arrays within a shader, samplers can only be indexed with a dynamically uniform integral expression, otherwise results are undefined.
When aggregated into arrays within a shader, atomic counters can only be indexed with a dynamically uniform integral expression, otherwise results are undefined.
However, in GLSL there is another opaque type that can be declared as an array: image. Interestingly, the behaviour is not consistent, yet it seems really hard to imagine that samplers and images would follow radically different memory access coherency requirements.
When aggregated into arrays within a shader, images can be indexed with general integer expressions.
Following this path, we reach another type which involves memory access and can be declared as an array: uniform blocks.
Any integral expression can be used to index a uniform block array, as per section 4.1.9 "Arrays".
This leads us to the explanation of why the AMD implementation has never passed the uniform buffer array samples of the OpenGL Samples Pack: it is likely a hardware limitation that the OpenGL specification hasn't addressed correctly and uniformly.
This analysis concludes that NVIDIA OpenGL 4 hardware supports "general integer expression" indexing of uniform block, sampler, image and atomic counter arrays, but AMD OpenGL 4 hardware to date doesn't. As a feature request, I am not really interested in "general integer expression" indexing of such arrays, but it could be good enough to limit the coherence constraint to the gl_Instance rate... maybe on future hardware, but I am afraid that geometry shader instancing prevents such an intermediate strategy.
As a general recommendation for OpenGL programmers, using "dynamically uniform expressions" to index uniform block, sampler, image and atomic counter arrays will keep us safe from running into cross-platform problems.
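As an illustration of this recommendation, here is a minimal sketch of a GLSL 4.20 vertex shader, embedded in C++ as a raw string, that indexes a uniform block array with a value read from a uniform variable. This is not the actual source of the 420-buffer-uniform sample: the names `VertexShaderSource`, `transform`, `Transform`, `TransformIndex` and `Position` are assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Illustrative GLSL 4.20 vertex shader; every name in it is hypothetical.
const std::string VertexShaderSource = R"GLSL(
#version 420 core

// Array of two uniform block instances, bound to binding points 0 and 1.
layout(binding = 0) uniform transform
{
    mat4 MVP;
} Transform[2];

// Set by the application before the draw call, so every invocation of that
// draw reads the same value.
uniform int TransformIndex;

layout(location = 0) in vec4 Position;

void main()
{
    // Dynamically uniform index: the case expected to be portable.
    gl_Position = Transform[TransformIndex].MVP * Position;

    // An index built from per-vertex or per-instance data, for example
    // Transform[gl_InstanceID % 2], is a general integer expression and,
    // as I read the specification, is not guaranteed to be dynamically
    // uniform: this is the case to avoid.
}
)GLSL";
```

The trade-off is that the index can only change between draw calls, so per-instance selection has to be expressed some other way, for instance by issuing one draw call per block.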
These tests were done on Windows 7 64-bit with the OpenGL Samples Pack 4.2.2.0 on a GeForce GTX 470 and a Radeon HD 5850.

OpenGL Samples Pack 4.2.2.0, OpenGL specification tests | AMD Catalyst 11.10 preview 3 (16/10/2011) | NVIDIA Forceware 285.62 (25/10/2011) | NVIDIA Forceware 290.36 (28/11/2011) | |
---|---|---|---|---|
420-transform-feedback-instanced | Can't readback built-in variables. max_vertices affects the alignment in the transform feedback buffer | |||
420-texture-storage | ||||
420-texture-pixel-store | ||||
420-texture-compressed | Texture storage with BPTC generates invalid operation errors | |||
420-test-depth-conservative | ||||
420-sampler-fetch | ||||
420-memory-barrier | ||||
420-image-unpack | Unpack isn't correct? | |||
420-image-store | Scissor test dysfunctional? | Scissor test dysfunctional? | ||
420-image-load | ||||
420-draw-base-instance | ||||
420-direct-state-access-ext | Unsupported DSA storage functions | |||
420-buffer-uniform | ||||
420-atomic-counter | glMapBufferRange on atomic counter fails | |||
410-program-separate-dsa-ext | ||||
410-program-binary | ||||
410-program-64 | ||||
410-primitive-tessellation-5 | Bug on the shader interface matching: Block member not active with linked separated program | |||
410-primitive-tessellation-2 | ||||
410-primitive-instanced | ||||
410-fbo-layered | ||||
400-transform-feedback-stream | max_vertices affects the alignment in the transform feedback buffer | |||
400-transform-feedback-object | ||||
400-texture-buffer-rgb | ||||
400-sampler-gather | ||||
400-sampler-fetch | ||||
400-sampler-array | ||||
400-program-varying-structs | ||||
400-program-varying-blocks | ||||
400-program-subroutine | ||||
400-program-64 | ||||
400-primitive-tessellation | ||||
400-primitive-smooth-shading | ||||
400-primitive-instanced | ||||
400-fbo-rtt-texture-array | ||||
400-fbo-rtt | ||||
400-fbo-multisample | ||||
400-fbo-layered | ||||
400-draw-indirect | ||||
400-blend-rtt | ||||
330-texture-pixel-store | ||||
330-transform-feedback-separated | ||||
330-transform-feedback-interleaved | ||||
330-primitive-point-sprite | Pop free clipping | Pop free clipping | ||
330-fbo-srgb | ||||
330-error-sampler-offset | ||||
330-draw-without-vertex-array | ||||
330-buffer-type | i32 vertex input data not supported |

OpenGL Samples Pack 4.2.2.0, proprietary features | AMD Catalyst 11.10 preview 3 (16/10/2011) | NVIDIA Forceware 285.62 (25/10/2011) | NVIDIA Forceware 290.36 (28/11/2011) |
---|---|---|---|
420-texture-copy-nv | NV_copy_image not supported | ||
420-primitive-bindless-nv | NV_shader_buffer_load not supported | ||
420-fbo-multisample-position-amd | AMD_sample_positions not supported | AMD_sample_positions not supported | |
420-fbo-multisample-dsa-nv | NV_texture_multisample not supported | ||
420-draw-indirect-amd | AMD_multi_draw_indirect not supported | AMD_multi_draw_indirect not supported | |
420-test-depth-clamp-separate-amd | AMD_depth_clamp_separate not supported | AMD_depth_clamp_separate not supported |

OpenGL Samples Pack 4.2.2.0, specification bugs workaround | AMD Catalyst 11.10 preview 3 (16/10/2011) | NVIDIA Forceware 285.62 (25/10/2011) | NVIDIA Forceware 290.36 (28/11/2011) |
---|---|---|---|
420-glsl-interface-matching-array-gtc | Can write a valid vertex shader output with no valid geometry shader input possible | Can write a valid vertex shader output with no valid geometry shader input possible | Can write a valid vertex shader output with no valid geometry shader input possible |
400-sampler-array-gtc | No workaround for this specification bug | Allows dynamic indexing of the sampler array | Allows dynamic indexing of the sampler array |
330-draw-instanced-array-dsa-gtc | No workaround for this specification bug | No workaround for this specification bug | No workaround for this specification bug |