nVidia is well known among graphics developers for its SDK and tools. ATI's tools are probably not as shiny, but ATI still provides some crazy good ones!
Today, I would like to talk about ATI Tootle, which is simply the best mesh optimizer available, way better than D3DXMesh optimization and nVidia NvTriStrip. It doesn't reduce mesh quality; it just makes meshes a lot faster to render, based on the latest research in this area!
When using indexed vertices, the vertices processed by the vertex shader are saved in a cache: the vertex cache. It holds N vertices (N depends on the graphics chip), and once N different vertices have been processed, the oldest one is removed from the cache, so we had better reuse a vertex before it gets removed.
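To make the effect of index order concrete, here is a minimal sketch that simulates such a cache and counts how many vertices have to be processed per triangle (the average cache miss ratio, ACMR). It assumes a strict FIFO replacement policy, which real chips may not follow exactly, and the function name `computeAcmr` is made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simulates a FIFO vertex cache over a triangle list and returns the
// average number of cache misses per triangle (ACMR). Lower is better:
// 3.0 means no reuse at all, values near 0.5 are about the best a large
// regular mesh can reach.
double computeAcmr(const std::vector<std::uint32_t>& indices, std::size_t cacheSize)
{
    assert(cacheSize > 0);
    assert(indices.size() % 3 == 0); // triangle list: 3 indices per triangle
    if (indices.empty()) {
        return 0.0;
    }

    std::deque<std::uint32_t> cache; // front = oldest entry
    std::size_t misses = 0;

    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end()) {
            continue; // cache hit: the vertex shader result is reused
        }
        ++misses; // cache miss: the vertex must be processed again
        cache.push_back(index);
        if (cache.size() > cacheSize) {
            cache.pop_front(); // evict the oldest vertex
        }
    }

    return static_cast<double>(misses) / (indices.size() / 3);
}
```

Running it on the same index buffer before and after optimization, with a cache size close to the target hardware, gives a rough idea of how much vertex processing the new ordering saves.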
Graphics chips have several mechanisms to discard fragments based on occlusion: Z-Cull, Early Z and of course the Z test. The sooner you discard fragments, the less processing you do, and you also save some memory bandwidth (framebuffer writes).
Finally, of course, when there is a cache miss, the data has to be read from graphics memory...
ATI Tootle provides 2 methods: a slow one that generates faster meshes and a fast one that generates slower (but still really fast!) meshes. Don't worry, the optimized meshes are going to be just as fast on GeForce or PowerVR hardware!
Another really popular method for vertex cache optimization comes from the research of Tom Forsyth: Linear-Speed Vertex Cache Optimisation. Several implementations are available.
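The heart of that method is a per-vertex score: the optimizer repeatedly emits the triangle whose three vertices have the highest total score. Below is a minimal sketch of the scoring function only, not the full optimizer. The constants and the 32-entry modelled cache are the values as I recall them from the article, so treat them as tunable assumptions; the name `vertexScore` and its parameters are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Tuning constants, assumed from memory of Forsyth's article.
const float kCacheDecayPower   = 1.5f;
const float kLastTriangleScore = 0.75f;
const float kValenceBoostScale = 2.0f;
const float kValenceBoostPower = 0.5f;

// cachePosition: position in the modelled cache (0 = most recently used),
//                or -1 if the vertex is not in the cache.
// remainingTriangles: triangles not yet emitted that still use the vertex.
float vertexScore(int cachePosition, int remainingTriangles, int cacheSize = 32)
{
    assert(cacheSize > 3);
    assert(cachePosition < cacheSize);

    if (remainingTriangles <= 0) {
        return -1.0f; // no triangle needs this vertex any more
    }

    float score = 0.0f;
    if (cachePosition >= 0) {
        if (cachePosition < 3) {
            // Vertices of the last emitted triangle get a fixed score, so
            // the optimizer does not just keep reusing the same edge.
            score = kLastTriangleScore;
        } else {
            // The higher a vertex sits in the cache, the more points it gets.
            const float scaler = 1.0f / (cacheSize - 3);
            score = std::pow(1.0f - (cachePosition - 3) * scaler, kCacheDecayPower);
        }
    }

    // Bonus for vertices with few triangles left, so lone vertices are
    // finished off quickly instead of being left behind.
    score += kValenceBoostScale *
             std::pow(static_cast<float>(remainingTriangles), -kValenceBoostPower);
    return score;
}
```

A triangle's score is then simply the sum of the scores of its three vertices, which is what keeps the whole method close to linear in the number of triangles.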
Image data is usually twiddled or compressed in memory to optimize texel fetches, cache misses, memory bandwidth... Mesh optimization is more complicated but just as useful! Getting your assets right provides fast rendering!