CONN BURANICZ
TECH ARTIST
GRAPHICS - SHADERS - CODING - TOOLS
128x128, notice especially around the straw and the children's mouths in the background
64x64, the BC7 is very close, but notice the lower face is completely 4-Color "purplelized"
256x256, the BC1 Blocks are especially noticeable on Rock's leg and the Van
PREAMBLE
BLOCK COMPRESSION HO!
Man, I've used texture compression my entire Tech Art career, and of course unknowingly most of my life between PNGs and JPGs etc. I remember quite vividly back in 2014 when Shane Calimlim first explained to me the difference between DXT1 and DXT5. And yet, for years, my Tech Artist "expertise" on the topic boiled down to: "BC4 is for grayscale, BC5 is for Normal Maps etc." For a Tech Artist who loves getting pretty low-level, I was no longer happy with that. Sure, I'd read the DirectX documentation before, but for someone like me, simply reading through white papers or watching GDC talks doesn't stick. In order to thoroughly grasp these algorithms, I need to implement them myself, and that's what I did.
This article is here to talk about how I wrote my own BC1 and BC7 file formats! To be clear, my Block Compression is purely software-based: it writes the compressed images out and reads them back in. There is no GPU hardware decoding or anything like that involved.
My articles always kind of waffle between tutorial and anecdote. I'll try to give an overview of the basics, and focus on some parts that I really struggled with. If you'd like to know more, please feel free to reach out, I'd love to hear from ya!
SOURCES (GOOD AND MEH)
When looking for Block Compression documentation, I'm sure a lot of people first stumble onto the official Direct3D one:
Since BC1 is so straightforward, that documentation is all I needed to implement it. However, BC7 was another story. Frankly, I found the Direct3D documentation lacking with regard to BC7. For example, it doesn't even have a full list of all the partition sets! I jotted down copious notes and scribbles, trying to keep up with all the terminology: subsets, p-bits etc. I got to a point where I realized...this documentation wasn't enough for me. That's when I stumbled upon even better documentation:
BPTC is the OpenGL equivalent of BC7, and MAN this documentation filled in so many of the holes that Direct3D left me with: particularly with p-bits, and a FULL list of Partition Tables for Two Subsets and Three Subsets!! I highly recommend using Khronos Group for the BC7 breakdown instead of the Direct3D one. Sometimes, it's just a matter of finding good sources and test data!
Lastly, I have to mention Nathan Reed's excellent breakdown of the seven Block Compression types. I'm sure plenty of folks are already familiar with it; I know I certainly read it a few years back. He breaks each down better than I ever could. Great quick-read resource, pictures and all. I love his line about: "BC2 is a bit of an odd duck.." haha
Front and back, these papers helped me organize my thoughts, keep a To-Do List, and mad ravings!
NOT A FAN OF PYTHON
Yes, I used Python 3 and the PyCharm IDE to develop this project. Let me be the first guy to say I dislike programming in Python very much, and this project only solidified my preference for C++ over Python. Why did I do it then? Python is certainly an odd choice for bitwise-heavy algorithms. Well, I wanted to challenge myself to become stronger in Python. I was inspired by one of John Carmack's keynote speeches, where he described how, when he was learning Haskell, immersing yourself in a heavy-duty project is the only real way to get to know the strengths and weaknesses of a language TO BE CONTINUED
The biggest advantage of Python, in my opinion, is its ubiquity across Software Tools such as SideFX Houdini and Autodesk Maya. Using Python as glue to interface between Houdini and a proprietary engine blah blah blah is excellent!
BC1
THE BASICS
Alright, let's quickly go over the basics for anyone unfamiliar. Why don't videogames simply read in PNGs or TGAs while the game is running? I mean, we certainly use those formats when working in our editors. However, those conventional formats are not hardware supported. The BCn family of texture compression formats is directly supported by modern GPUs, and can be decompressed very quickly!
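To make the format concrete, here is a minimal software sketch of decoding a single 8-byte BC1 block: two RGB565 endpoints followed by sixteen 2-bit indices. The function names are my own, and the integer rounding of the interpolated colors is an assumption (hardware decoders may round slightly differently).

```python
import struct

def rgb565_to_rgb888(c):
    """Expand a packed 5:6:5 color to 8 bits per channel by bit replication."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_bc1_block(block):
    """Decode one 8-byte BC1 block into 16 (r, g, b, a) tuples, row by row."""
    c0, c1, indices = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:
        # Four-color mode: two extra colors at 1/3 and 2/3 between the endpoints.
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), p3 + (255,)]
    else:
        # Three-color mode: one midpoint, and index 3 means transparent black.
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), (0, 0, 0, 0)]
    # Each texel takes 2 bits, with texel 0 stored in the lowest bits.
    return [palette[(indices >> (2 * i)) & 0x3] for i in range(16)]

# White and black endpoints; the first four texels use indices 0, 1, 2, 3.
example = struct.pack("<HHI", 0xFFFF, 0x0000, 0xE4)
pixels = decode_bc1_block(example)
```

The ordering of the two endpoints doubles as a mode switch, which is how BC1 squeezes its one-bit alpha into the same 64 bits.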
To be clear: hardware texture compression does not only reduce memory footprint...it helps performance too! BLAH BLAH BLAH
A 256x256 image of Robocop. Notice how much more detail is preserved using BC1 compared to downsampling to get an equivalent memory footprint!
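The footprint arithmetic behind that comparison is easy to sketch. BC1 stores each 4x4 block in 8 bytes and BC7 in 16 bytes; the helper names below are hypothetical, and the uncompressed baseline is assumed to be RGBA8 (4 bytes per pixel).

```python
def uncompressed_bytes(width, height, bytes_per_pixel=4):
    """Size of a raw image; 4 bytes per pixel assumes RGBA8."""
    return width * height * bytes_per_pixel

def block_compressed_bytes(width, height, bytes_per_block):
    """Size of a BCn image: every 4x4 block costs a fixed number of bytes."""
    blocks_x = (width + 3) // 4   # round up for sizes that are not multiples of 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * bytes_per_block

rgba8 = uncompressed_bytes(256, 256)            # 262144 bytes
bc1 = block_compressed_bytes(256, 256, 8)       # 32768 bytes, 8:1 versus RGBA8
bc7 = block_compressed_bytes(256, 256, 16)      # 65536 bytes, 4:1 versus RGBA8
print(rgba8, bc1, bc7)
```

Put another way, the 32,768 bytes BC1 spends on a 256x256 image would only buy 8,192 uncompressed RGBA8 pixels, roughly a 90x90 image.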
TODO:
Add BC1 Alpha vs Uncompressed Alpha Sphere etc
Comparing BC1's One-Bit Stencil Alpha to full 8-Bit Alpha
Showcasing how much the Alpha Threshold changes the image
DOWNFALL OF BC1
TODO: Explain how rapid change of color (Noise) does not play well. Perhaps here is where Resolution affects results a lot. But at that point, at those resolutions, the memory footprint delta between the two is only a few dozen KB, so perhaps for those BC7 is totally worth it.
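One way to see the noise problem numerically: every BC1 block gets only four colors, all lying on one line through RGB space. The sketch below uses a deliberately naive encoder (bounding-box endpoints, which is an assumption and not necessarily how any real encoder picks them) and compares the error on a smooth grayscale ramp against a block of unrelated colors.

```python
def quantize565(rgb):
    """Round-trip an 8-bit color through 5:6:5, as BC1 endpoints are stored."""
    r, g, b = rgb[0] >> 3, rgb[1] >> 2, rgb[2] >> 3
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def bc1_round_trip(pixels):
    """Naively squeeze 16 RGB texels into a 4-color BC1 palette and back."""
    lo = quantize565(tuple(min(p[c] for p in pixels) for c in range(3)))
    hi = quantize565(tuple(max(p[c] for p in pixels) for c in range(3)))
    palette = [
        tuple(((3 - i) * a + i * b) // 3 for a, b in zip(lo, hi))
        for i in range(4)
    ]
    # Each texel snaps to whichever palette entry is closest.
    return [min(palette, key=lambda q: squared_distance(p, q)) for p in pixels]

def mean_squared_error(original, decoded):
    total = sum(squared_distance(p, q) for p, q in zip(original, decoded))
    return total / (len(original) * 3)

gradient = [(17 * i, 17 * i, 17 * i) for i in range(16)]
noise = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)] * 4

gradient_error = mean_squared_error(gradient, bc1_round_trip(gradient))
noise_error = mean_squared_error(noise, bc1_round_trip(noise))
print(gradient_error, noise_error)
```

The ramp loses some precision but stays recognizable. The noisy block collapses to grays, because no single line through RGB space passes near red, green, blue and yellow at once.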
The Top Row shows the original, uncompressed images decreasing in resolution. The Bottom Row shows the same images using BC1. Notice how BC1 has a harder time maintaining the image the lower the resolution becomes
CONCLUSION?
TODO: End BC1 Section somehow!
BC7
MORE DETAIL, MORE COMPLICATED
I went straight from the simplest to the most difficult and newest format. Introduced with DirectX 11, BC7 was exponentially more difficult to author. BLAH BLAH
Anecdotally, BC7 really saved my bacon! As a Tech Artist working on complex, interactive visuals, I encode a lot of data into Vertex Color, UV Channels and Textures. We used all four channels of our Texture, each with special data. BC1-BC5 simply BUTCHERED our data, and so we had to resort to uncompressed RGBA8 textures. However, several months in, one of our Graphics Engineers implemented BC7 into our proprietary engine, and voilà, my data was mostly intact! BLAH BLAH TODO: talk about Blue Noise!!!!!!!!!!!
64x64 Closeup of Nina's Face. BC7 is able to match the original pretty closely, while BC1 of course struggles with the usual issues
WHAT DO ALL THESE TERMS MEAN
As I mentioned in the Sources section above, I felt that Microsoft's documentation on BC7 didn't lay out the terms clearly for me. This probably speaks to my shortcomings, but I needed to jot down notes and reread the paragraphs several times to finally grasp what the characteristics that make up the Modes meant. I hope this next section helps somebody understand how to implement a BC7 Tool faster than I did!
SUBSETS
TODO: Explain Subsets
PARTITIONS
TODO: Explain Partitions
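A minimal sketch of how subsets and partitions fit together: a partition is a 4x4 map that assigns every texel to a subset, and each subset then gets its own pair of endpoints. The table below is the first entry of the two-subset partition table as I understand it from the Khronos BPTC listing (left half versus right half); double-check it against the spec before relying on it. The helper names and the bounding-box endpoint choice are assumptions for illustration.

```python
# One subset id per texel, row-major. Assumed to be two-subset partition 0.
PARTITION_2_SUBSETS_0 = [
    0, 0, 1, 1,
    0, 0, 1, 1,
    0, 0, 1, 1,
    0, 0, 1, 1,
]

def split_into_subsets(pixels, partition):
    """Group the 16 texels of a block by the subset id the partition gives them."""
    subsets = {}
    for pixel, subset_id in zip(pixels, partition):
        subsets.setdefault(subset_id, []).append(pixel)
    return subsets

def bounding_endpoints(subset):
    """Naive endpoint pick: the per-channel min and max of the subset."""
    lo = tuple(min(p[c] for p in subset) for c in range(3))
    hi = tuple(max(p[c] for p in subset) for c in range(3))
    return lo, hi

# A block whose left half is reds and whose right half is blues.
block = [(200, 0, 0), (255, 0, 0), (0, 0, 200), (0, 0, 255)] * 4
subsets = split_into_subsets(block, PARTITION_2_SUBSETS_0)
endpoints = {sid: bounding_endpoints(texels) for sid, texels in subsets.items()}
```

With a single subset, this block would need one line through RGB space to cover both reds and blues. With two subsets, each half gets its own line.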
INDICES
TODO: Explain This
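As a rough sketch of what an index does: each texel stores a small index that picks a point between its subset's two endpoints. BC7 does not divide evenly; it uses fixed weight tables out of 64. The tables and rounding below follow the BPTC specification as I read it, and the function names are mine.

```python
# Interpolation weights (out of 64) for 2-bit and 3-bit indices.
BC7_WEIGHTS = {
    2: [0, 21, 43, 64],
    3: [0, 9, 18, 27, 37, 46, 55, 64],
}

def interpolate(e0, e1, index, index_bits):
    """Blend two 8-bit endpoint channels using the weight for this index."""
    w = BC7_WEIGHTS[index_bits][index]
    return ((64 - w) * e0 + w * e1 + 32) >> 6

def build_palette(endpoint0, endpoint1, index_bits):
    """All colors a subset can reach with the given endpoints and index width."""
    return [
        tuple(interpolate(a, b, i, index_bits) for a, b in zip(endpoint0, endpoint1))
        for i in range(1 << index_bits)
    ]

palette_2bit = build_palette((0, 0, 0), (255, 255, 255), 2)   # 4 entries
palette_3bit = build_palette((0, 0, 0), (255, 255, 255), 3)   # 8 entries
```

This is why 3-bit indices matter so much for gradients: the same pair of endpoints yields eight steps instead of four.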
P-BITS
TODO: Explain This
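A minimal sketch of the P-bit idea, as I understand it from the BPTC spec: the P-bit is one extra low-order bit appended to every channel of an endpoint, after which the value is widened to 8 bits by replicating its top bits. Some Modes store one P-bit per endpoint, one shares a single P-bit between both endpoints of a subset, and one has none. The function name is hypothetical.

```python
def expand_endpoint(color, color_bits, p_bit=None):
    """Expand one stored endpoint channel to a full 8-bit value."""
    value, bits = color, color_bits
    if p_bit is not None:
        value = (value << 1) | p_bit   # the P-bit becomes the new lowest bit
        bits += 1
    value <<= 8 - bits                 # left-align inside a byte
    return value | (value >> bits)     # replicate the top bits into the gap

# 6-bit channel with a P-bit (7 bits total, then widened to 8):
print(expand_endpoint(0b111111, 6, 1), expand_endpoint(0b111111, 6, 0))
# 7-bit channel with a P-bit is already 8 bits, so nothing is replicated:
print(expand_endpoint(0x7F, 7, 1), expand_endpoint(0x7F, 7, 0))
```

Because the same P-bit lands in red, green and blue alike, it nudges the whole endpoint up or down together, so it only helps when all three channels agree on which way to round.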
ENCODING THE MODE
TODO: Explain This
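As a sketch of the mode encoding: a BC7 block is 128 bits, and the Mode is stored as a unary prefix starting at the lowest bit, meaning Mode n is n zero bits followed by a single one bit. The partition number and the other fields are packed after it. The `BitWriter` class and helper names here are my own illustration, not part of any library.

```python
def mode_prefix(mode):
    """Return (value, bit_count) for the prefix: `mode` zeros, then a 1."""
    return 1 << mode, mode + 1

def read_mode(block):
    """Read the Mode from a 16-byte block by counting zero bits from the LSB."""
    first_byte = block[0]
    for mode in range(8):
        if first_byte & (1 << mode):
            return mode
    return None  # an all-zero first byte is not a valid mode prefix

class BitWriter:
    """Packs fields LSB-first into a 128-bit integer."""

    def __init__(self):
        self.value = 0
        self.position = 0

    def write(self, value, bit_count):
        mask = (1 << bit_count) - 1
        self.value |= (value & mask) << self.position
        self.position += bit_count

    def to_bytes(self):
        return self.value.to_bytes(16, "little")

writer = BitWriter()
writer.write(*mode_prefix(3))   # Mode 3 costs four bits: 0, 0, 0, 1
writer.write(37, 6)             # a 6-bit partition number follows the prefix
block = writer.to_bytes()
print(read_mode(block))
```

Python's arbitrary-size integers make this kind of packing fairly painless, since the whole block fits in one `int` until it is time to write bytes.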
256x256, it's funny how Mode3 was selected for the entire white border around the characters, creating a green border
A pretty even distribution between the four opaque BC7 Modes, with Mode1 dominating as usual
THE MODES
BC7 has four opaque Modes (0 through 3), and I will go over them one by one, along with what I feel are their greatest strengths
TODO: include chart breakdown for each Mode.
Also include examples of same image using different mode one at time
Mode 0
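Best for Blocks with high color delta; it might be the best Mode for "Blue Noise" because its three Subsets and 3-Bit Indices give it a palette of up to 24 colors (3 Subsets x 8 entries), more than enough to give each of the 16 texels its own color! However, it has the weakest Color Depth, and is very bad at capturing gradients.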
Mode 1
Because of its 3-Bit Indices, Color Depth, and Partitions, this Mode is the best at smooth, accurate gradients. Two Subsets means it's no good with "junction points" where colors shift in the image, though. So as long as the "hue" is similar across the block, it's golden!
Mode 2
Great with a limited number of high-contrast colors. Similar to Mode 0, but it gains more partition choices at the cost of measly 2-Bit Indices. If there are no more than six distinct colors (two endpoints for each of the three Subsets), it'll preserve the data very well.
Mode 3
Has the highest Color Depth, and can capture a limited number of colors very accurately. The best Mode when the Block can be split into two distinct gradient patterns. Of the four opaque Modes, it can (at best) capture only 8 unique colors...
The Only Mode with 16 Partition Choices
Only Mode with Shared P-Bit, so it's extra difficult for that P-Bit to truly be useful
Three Subsets, with high Partition Selection means it has the best chance of finding the right "jigsaw piece"
Great Color Depth means it could be very good with a gradient block, but 2-Bit Indices means it could also be pretty bad!
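The trade-offs between the four opaque Modes can be summarized as a small table and sanity-checked: whatever a Mode spends on Subsets, Color Depth or P-Bits it has to take from Partitions or Indices, because every block is exactly 128 bits. The field values below follow the BC7/BPTC specifications as I read them; the `Bc7Mode` class is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bc7Mode:
    subsets: int
    partition_bits: int
    color_bits: int   # per channel, before any P-bit
    p_bits: int       # total P-bits stored in the block
    index_bits: int

    def total_bits(self, mode_number):
        prefix = mode_number + 1                        # unary mode prefix
        endpoints = self.subsets * 2 * 3 * self.color_bits
        # One anchor texel per subset stores its index with one bit fewer.
        indices = 16 * self.index_bits - self.subsets
        return prefix + self.partition_bits + endpoints + self.p_bits + indices

    def palette_size(self):
        return self.subsets * (1 << self.index_bits)

OPAQUE_MODES = {
    0: Bc7Mode(subsets=3, partition_bits=4, color_bits=4, p_bits=6, index_bits=3),
    1: Bc7Mode(subsets=2, partition_bits=6, color_bits=6, p_bits=2, index_bits=3),
    2: Bc7Mode(subsets=3, partition_bits=6, color_bits=5, p_bits=0, index_bits=2),
    3: Bc7Mode(subsets=2, partition_bits=6, color_bits=7, p_bits=4, index_bits=2),
}

for number, mode in OPAQUE_MODES.items():
    print(number, mode.total_bits(number), mode.palette_size())
```

Each Mode comes out to 128 bits, with palette sizes of 24, 16, 12 and 8 colors for Modes 0 through 3.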
Each Image shows compression using one Mode Only. Helps showcase the strengths/weaknesses between the Modes
A 16x16 Image where each 8x8 Corner was tailor-made to show the best of Modes 0, 1, 2, and 3
The Pixels are compared with the original. The whiter the pixel, the more inaccurate it has become compared to the original
Here is Robocop completely compressed using one Mode each. By far the biggest difference you'll notice is the background, where Mode 0 and 2 have a lot more banding because of their weakness with gradients. The reflections in Robocop's suit come out slightly stronger though
TIMING
No doubt about it, BC7 takes dreadfully longer than BC1 to write out an image. This is because, for each 4x4 Block, BC7 needs to compare "the score" of every Mode to determine which Mode is best suited. On top of that, every Mode needs to choose the appropriate Partition.
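A back-of-the-envelope sketch of why that search is slow, assuming a brute-force encoder that tries every Partition of every opaque Mode (real encoders often prune this list). The `score` callback is hypothetical: it stands in for whatever error metric compares a candidate encoding against the original block.

```python
# Partition choices per opaque Mode: 4 partition bits for Mode 0, 6 for the rest.
PARTITION_COUNTS = {0: 16, 1: 64, 2: 64, 3: 64}

def candidates_per_block():
    """How many (Mode, Partition) pairs a brute-force search must score."""
    return sum(PARTITION_COUNTS.values())

def pick_best(block, score):
    """Return (error, mode, partition) for the lowest-scoring candidate."""
    best = None
    for mode, partition_count in PARTITION_COUNTS.items():
        for partition in range(partition_count):
            error = score(block, mode, partition)
            if best is None or error < best[0]:
                best = (error, mode, partition)
    return best

blocks_in_image = (256 // 4) * (256 // 4)
print(candidates_per_block())                     # 208
print(blocks_in_image * candidates_per_block())   # 851968
```

So even a 256x256 image means hundreds of thousands of trial encodings, each one fitting endpoints and indices, where BC1 only ever has a single layout to fill in.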
MAYBE PUT THIS EARLIER IN THE BC7 SECTION!!
TODO: Add Chart Showing How Long BC7 vs BC1
Mode2 is selected for the 3-Subset Blocks while Mode3 is selected for the 2-Subset Blocks as they should be
More examples of Modes being correctly chosen based on which Mode has the correct number of Partitions and best Color Depth
For gradients, Mode1 dominates because of its 6-Bit (plus P-Bit) Color Depth and its 3-Bit Indices, although Mode 2 and 3 win out as well
REMINDS ME OF MEMORY MAPPERS
Man, I LOVE bitwise operations and radical optimization. By the time I started on BC7, I realized that these Block Compression formats reminded me a lot of a few years back, when I was implementing Memory Mappers for Conntendo. I'll give the analogy that BC1 is like the UxROM Mapper while BC7 is like MMC5 BLAH BLAH
TODO: Add Noise Comparison Image
TODO: Add RyuNina Image with Debug Color the Modes
CONCLUSION?
TODO: End BC7 Section somehow!