After my post dedicated to the OpenGL 4.1 drivers status, I received quite a lot of feedback from AMD. My tests are based on my OpenGL Samples Pack 4.1, developed against nVidia's OpenGL 4.1 drivers, which have been available since the OpenGL BOF at the end of July. As a consequence, my OpenGL 4.1 samples are built upon nVidia's implementation, which led to some fairly bad results when running on AMD because of implementation philosophy differences.
Obviously, before publishing my post, I had a look at the samples to try to figure out what went wrong, but when you are facing "unexpected error" messages, it's pretty hard to make progress. This is how it goes with early drivers, whether from AMD, nVidia or probably anyone else. Hence, Graham Sellers from AMD pointed me toward an understanding of AMD's implementation through specification quotes, so that I could make my samples work on AMD... and this is where the separate programs drama began.
This is something I have figured out over the years: I believe that AMD and nVidia have two different approaches regarding OpenGL. AMD tries to follow the specification to the letter, in a quite pedantic manner, even when the specification doesn't make sense. For nVidia, the approach is quite different; some developers speak about "nVidia's OpenGL" when referring to nVidia's implementation. nVidia's approach is less strict and more pragmatic, with an implementation that doesn't hesitate to relax some restrictions and even to provide extra features, and not only through extensions. Explicit varying locations have been implemented since nVidia's OpenGL 3.3 beta drivers, for example.
Regarding GL_ARB_separate_shader_objects, I assumed some specification details that are actually not valid according to the specification. These assumptions came from common sense, OpenGL usage, but also a long-standing interest in nVidia's separate programs.
GL_ARB_separate_shader_objects is the extension promoted to core from GL_EXT_separate_shader_objects, a pretty badly designed extension relying on deprecated mechanisms and fixed-function legacy. It became quite interesting once promoted to ARB, despite a name which is total nonsense according to the OpenGL token dictionary. "Separate shader objects"? What does that mean? Shader objects have been per-stage since the beginning... GL_ARB_separate_program_objects or GL_ARB_program_pipeline_object would have been better names to me, but well.
GL_ARB_separate_shader_objects allows using multiple different program objects to set up all the GPU stages.
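As a minimal sketch (assuming a GL 4.1 context with loaded entry points, and assuming VertSource and FragSource are GLSL 4.10 sources declared elsewhere), a pipeline mixing two single-stage programs looks like this:

```cpp
// Two separable programs, one per stage, built with the new GL 4.1 shortcut.
GLuint VertProgram = glCreateShaderProgramv(GL_VERTEX_SHADER, 1, &VertSource);
GLuint FragProgram = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &FragSource);

GLuint Pipeline = 0;
glGenProgramPipelines(1, &Pipeline);
glBindProgramPipeline(Pipeline);

// Each stage comes from a different program object and can be swapped
// independently, without relinking anything.
glUseProgramStages(Pipeline, GL_VERTEX_SHADER_BIT, VertProgram);
glUseProgramStages(Pipeline, GL_FRAGMENT_SHADER_BIT, FragProgram);
```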
So far with OpenGL, the GLSL linker has ensured that communication between stages goes well, and it even performs some interesting optimizations, for example removing varying variables that are unused across stages.
With separate programs, the compiler has to make some assumptions about the inputs provided by the previous stage, whatever that stage actually is. For this purpose, a new section called "shader interface matching" has been written into the specification. Unfortunately, following this section to the letter implies different shader matching rules for separate and non-separate programs regarding explicit varying locations, which can force OpenGL programmers to write different shaders for the two program types... for no good technical reason. Let's take a problematic example:
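The following hypothetical pair of GLSL 4.10 shaders (the names are mine, not from the samples pack) illustrates it: the output and input names differ on purpose, and only the locations match.

```cpp
const char* VertSource = R"(
    #version 410 core

    layout(location = 0) in vec4 Position;
    layout(location = 0) out vec4 VertColor; // explicit varying location

    out gl_PerVertex // redeclaration required for separate programs, see below
    {
        vec4 gl_Position;
    };

    void main()
    {
        VertColor = vec4(1.0);
        gl_Position = Position;
    }
)";

const char* FragSource = R"(
    #version 410 core

    layout(location = 0) in vec4 PerVertexColor; // same location, different name
    layout(location = 0) out vec4 Color;

    void main()
    {
        Color = PerVertexColor;
    }
)";
```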
With separate programs, the location is going to be used for shader interface matching. However, with non-separate programs, matching is performed by name, which implies that the location qualifier is ignored. It doesn't make any sense to do it this way, but this is what the specification says...
Concretely, explicit varying locations override name matching with separate programs but are silently ignored with non-separable programs.
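To make the difference concrete, here is how the two kinds of programs are built (a sketch, assuming VertShader and FragShader are shader objects compiled from the two sources above):

```cpp
// Non-separate: both stages linked into a single program. The linker matches
// the interfaces by name, so VertColor and PerVertexColor don't rendezvous
// and the explicit locations are silently ignored.
GLuint Monolithic = glCreateProgram();
glAttachShader(Monolithic, VertShader);
glAttachShader(Monolithic, FragShader);
glLinkProgram(Monolithic);

// Separate: each program is flagged as separable and linked on its own.
// Interface matching is deferred to the pipeline and done by location.
GLuint SeparateVert = glCreateProgram();
glProgramParameteri(SeparateVert, GL_PROGRAM_SEPARABLE, GL_TRUE);
glAttachShader(SeparateVert, VertShader);
glLinkProgram(SeparateVert);
```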
Finally, separate programs require redeclaring the gl_PerVertex block... hum... why?
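For reference, this is the kind of boilerplate the specification asks for in a separate vertex shader (a minimal sketch, listing only the built-in member actually used):

```cpp
const char* SeparateVertSource = R"(
    #version 410 core

    // Without this redeclaration, writing gl_Position in a
    // separate program is not allowed by the specification.
    out gl_PerVertex
    {
        vec4 gl_Position;
    };

    layout(location = 0) in vec4 Position;

    void main()
    {
        gl_Position = Position;
    }
)";
```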
Separate programs and non-separate programs evolve with different sets of rules, which keeps them apart even though they are technically connected. There are good reasons to use non-separate programs for compiler optimization purposes, but there are also good reasons to use separate programs for software design purposes, leaving OpenGL programmers in this middle ground.
Since OpenGL 3.1, and especially OpenGL 3.3, the specifications have made a move toward direct state access (DSA), and the new OpenGL program pipeline object is no exception, with a pretty much fully DSA API... with one exception! The specification clearly says that a program pipeline object is actually created by binding the object...
> A program pipeline object is created by binding a name returned by GenProgramPipelines with the command `void BindProgramPipeline(uint pipeline);`
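In other words, even though every other program pipeline entry point takes the pipeline name directly, DSA style, the following bind is still required by the letter of the specification (a sketch; VertProgram is an assumed separable program):

```cpp
GLuint Pipeline = 0;
glGenProgramPipelines(1, &Pipeline);

// By the letter of the specification, the name returned by
// glGenProgramPipelines is not an object yet: this bind is what
// actually creates the program pipeline object.
glBindProgramPipeline(Pipeline);
glBindProgramPipeline(0);

// Only then are the DSA-style entry points guaranteed to operate
// on an actual object.
glUseProgramStages(Pipeline, GL_VERTEX_SHADER_BIT, VertProgram);
```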
Adding verbose declarations, using different matching rules for separate and non-separate programs, and having to use glBindProgramPipeline to create the effective pipeline object don't make sense, but this is what is written in the specification and what the ARB has agreed on. AMD and nVidia have logically implemented OpenGL 4.1 following their own philosophies: AMD has interpreted the OpenGL specification to the letter, implementing some fairly stupid ideas, and nVidia has interpreted the OpenGL specification in its own way, a clever way but a non-conformant one... Well, all in all, we are pretty much doomed if we want to use the full capabilities of separate programs.
How could everything have been better? I quite believe that if the ARB had put more attention into reviewing the specification, which probably means taking more time, these issues would have been fixed, as these problems are maybe "details" but quite obvious ones.
Let's start with the DSA issue. As my OpenGL 4.1 samples demonstrated, this gross specification mistake has been implemented by both AMD and nVidia in a way that lets the program pipeline object be used as a pure DSA object. The AMD and nVidia OpenGL teams are particularly talented, and it makes sense that the implementations are written this way, as it doesn't make any difference when they are used strictly according to the specification. Could we really rely on this workaround, though? What is going to happen when Intel and Apple provide implementations of OpenGL 4.1? (within 10 years from now...) It could surface as a software bug, so I think the specification has to be followed to the letter. Anyway, OpenGL 4.1 is far from being completely DSA, which makes it impossible to design a fully DSA renderer.
Regarding the verbose and useless gl_PerVertex redeclarations: they currently cause a compilation error on nVidia, but this is something that will eventually be fixed, so unfortunately they have to be used, following the specification.
Finally, the shader matching rules: as much as I love explicit varying locations, since they aren't supported with non-separate programs, I think they should not be used. Fortunately, name matching works the same way for separate and non-separate programs, and using varying structures allows defining a clear protocol between stages. It's less flexible than explicit varying locations, but really robust.
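This is the kind of interface I have in mind, a hypothetical sketch where the struct type and the variable name Vert act as the protocol and must match on both sides:

```cpp
const char* StructVertSource = R"(
    #version 410 core

    struct vertex
    {
        vec4 Color;
    };

    layout(location = 0) in vec4 Position;
    out vertex Vert; // matched by name, identically for both program types

    out gl_PerVertex // only needed once the program is separable
    {
        vec4 gl_Position;
    };

    void main()
    {
        Vert.Color = vec4(1.0);
        gl_Position = Position;
    }
)";

const char* StructFragSource = R"(
    #version 410 core

    struct vertex
    {
        vec4 Color;
    };

    in vertex Vert; // same name, same struct: the protocol holds
    layout(location = 0) out vec4 Color;

    void main()
    {
        Color = Vert.Color;
    }
)";
```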
Following this discussion, I updated the OpenGL 4.1 samples pack to report the drivers' status. I really wish nVidia's implementation was what OpenGL specifies, but it's not. The point of a specification is to be followed; whether or not the specification is good is another problem. Hence, for my samples, I decided to follow the specification to the letter. However, I decided to add some sort of extended samples, using the postfix "gtc", to illustrate the changes I would enjoy seeing in OpenGL 4.2, some of which are already supported.
Sample | AMD Catalyst 10.10c (beta) | nVidia Forceware 260.93 (beta) |
---|---|---|
410-debug-output-arb | AMD_debug_output support only | |
410-program-varying | | gl_PerVertex redeclaration causes compiler errors... |
410-program-separate | | gl_PerVertex redeclaration causes compiler errors... |
410-program-binary | GL_PROGRAM_BINARY_RETRIEVABLE_HINT must be set to GL_TRUE or the binary can't be retrieved on some platforms | |
410-program-64 | glVertexAttribLPointer is null | |
410-primitive-tessellation-5 | | gl_PerVertex redeclaration causes compiler errors... |
410-primitive-tessellation-2 | | gl_PerVertex redeclaration causes compiler errors... |
410-primitive-instanced | Explicit locations, which should be silently ignored, throw a parsing error | Unexpected warning |
410-fbo-layered | Unexpected warning | |
400-transform-feedback-object | ||
400-texture-compression-arb | ||
400-texture-buffer-rgb | ||
400-sampler-gather | ||
400-sampler-fetch | ||
400-sampler-array | ||
400-program-varying-structs | Doesn't support varying structs, with an offensive error message | |
400-program-varying-blocks | Unexpected warning / gl_in.length() not fully supported | |
400-program-subroutine | ||
400-program-64 | ||
400-primitive-tessellation | Unexpected warning | |
400-primitive-smooth-shading | Unexpected warning | |
400-primitive-instanced | Unexpected warning | |
400-fbo-rtt-texture-array | ||
400-fbo-rtt | ||
400-fbo-multisample | ||
400-fbo-layered | ||
400-draw-indirect | ||
400-buffer-uniform | Unsupported uniform block array | |
400-blend-rtt | ||
330-texture-array | Requires glTexParameteri to set up filtering; sampler objects unsupported | |
330-sampler-object | Sampler objects don't always override texture parameters | |
Following are some samples that illustrate some OpenGL 4.2+ feature requests I have made, using the "gtc" postfix. I wrote these samples because they show what I think are either specification bugs, design mistakes, lack of accuracy or lack of perspective, like the issues discussed in this post.
Sample | AMD Catalyst 10.10c (beta) | nVidia Forceware 260.93 (beta) |
---|---|---|
410-program-varying-gtc | Not supported, as the OpenGL specification requires... | A GLSL compiler warning would be nice |
410-program-separate-dsa-gtc | A debug output warning would be nice | A debug output warning would be nice |
400-sampler-array-gtc | Not supported, as the OpenGL specification requires... | A GLSL compiler warning would be nice |
400-buffer-uniform-shared-gtc | Not supported, as the OpenGL specification requires... | A GLSL compiler warning would be nice |