After my previous post dedicated to compilers, I received a lot of interesting comments. First, I would like to remind everyone that my test covers a single scenario. The conclusions don't extend beyond the scenarios illustrated in the post, and I don't attempt to generalize these results into a universal truth about compilers.
Thanks to your feedback, I also learned that GCC 4.5 now supports link-time optimization through the -flto and -fwhole-program flags. In this new post, I would like to show the performance benefits these options provide in GCC, and also in Visual C++.
Finally, I previously ran the tests as 32-bit programs, and I think it is about time to switch to 64-bit builds. That is why this post also compares 32-bit and 64-bit performance.
The tests were run with SSE2 optimizations enabled, fast math (/fp:fast, -ffast-math), /Ox or -O3, and link-time optimization where specified. With these settings, we are looking for maximum performance and are willing to lose some accuracy. Measurements were taken on a Phenom II X6 1055T and a Core 2 Q6600 running Windows 7 64-bit.
The tests are based on Ovt'sa, a pure C++ program that is neither especially efficient at what it does nor optimized for anything in particular. It uses GLM and GLI but has no other dependencies. Although it uses SSE optimizations, the program isn't specifically designed to take advantage of them, and it runs on a single thread. No disk access is included in the measurements.
As we saw in my previous post, Visual C++ 2005 and 2008 were quite inefficient at generating SSE2 code in 32-bit mode, which left them behind GCC 4.5 in terms of performance. Visual C++ 2010 fixed this issue: on the Phenom II it delivers the same level of performance for 32-bit and 64-bit builds, while on the Core 2 the 64-bit build is more efficient. GCC remains behind Visual C++ 2010 in both cases, but GCC 4.5 also performs better in 64-bit mode.
On the Phenom II, the compiler results form a nice-looking graph in which Visual C++ with LTO improves with each version. GCC, however, provides the same level of performance with or without LTO. When LTO is disabled, Visual C++ loses its lead over GCC. I don't think this means GCC would become more efficient with proper LTO support; it is more likely that GCC and Visual C++ simply perform their optimizations in different places. The results on the Core 2 are surprisingly fuzzy, with the best performance coming from Visual C++ 2005 and Visual C++ 2008 finishing last.
Optimizing code for one platform is manageable with proper measurements, but optimizing for all platforms at once is pretty challenging. According to this test, building in 64-bit mode brings nothing but performance benefits. Link-time optimizations are quite mature in Visual C++ but seem to have no effect with GCC 4.5 in this scenario. I am looking forward to GCC 4.6, where I expect them to bring more benefits.