CONN BURANICZ
TECH ARTIST
GRAPHICS - SHADERS - CODING - TOOLS
128x128, notice especially around the straw and the children's mouths in the background. AI generated image of my wife haha
64x64, the BC7 is very close, but notice the lower face is completely 4-Color "purplelized"
256x256, the BC1 Blocks are especially noticeable on Rock's leg and the Van
PREAMBLE
BLOCK COMPRESSION HO!
Man, I've used texture compression my entire Tech Art career, and in fact I've been using it most of my life unknowingly between PNGs, JPGs etc. I remember quite vividly back in 2014 when Shane Calimlim first explained to me the difference between DXT1 and DXT5. And yet, for years, my Tech Artist "expertise" on the topic boiled down to: "BC4 is for grayscale, BC5 is for Normal Maps etc." For a Tech Artist who loves getting pretty low-level, I was no longer happy with that. Sure, I read the DirectX documentation, but for someone like me, simply reading through white papers or watching GDC talks doesn't make it stick. In order to thoroughly grasp these algorithms, I need to implement them myself: and that's what I did.
This article is here to talk about how I wrote my own BC1 and BC7 file formats! To be clear, my Block Compression is purely software-based, writing and reading the images back out. There is no GPU hardware decoding, or anything like that involved.
My articles always kind of waffle between tutorial and anecdote. I'll try to give an overview of the basics, and focus on some parts that I really struggled with. This article ain't much, but believe it or not, it's taken as long to write this article as it did to actually finish the programming! If you'd like to know more, please feel free to reach out, I'd love to hear from ya!
SOURCES (GOOD AND MEH)
When looking for Block Compression documentation, I'm sure a lot of people first stumble onto the official Direct3D one:
Since BC1 is so straightforward, that documentation is all I needed to implement it. However, BC7 was another story. Frankly, I found the Direct3D documentation lacking severely in regards to BC7. For example, it doesn't even have a full list of all the partition sets! I jotted down a copious amount of notes and scribbles, trying to keep up with all the terminologies: subsets, p-bits etc. I got to a point where I realized...this documentation wasn't enough for me. That's when I stumbled upon even better documentation:
BPTC is the OpenGL equivalent of BC7, and MAN, this documentation filled in so many of the holes that the Direct3D doc left me with: particularly with p-bits, and a FULL list of Partition Tables for Two-Subset and Three-Subset modes!! I highly recommend using the Khronos Group breakdown of BC7 instead of the Direct3D one. Sometimes it's just a matter of finding good sources and test data!
I have to mention Nathan Reed's excellent breakdown of the seven Block Compression types. I'm sure plenty of folks are already familiar with it; I know I certainly read it a few years back. He breaks each down better than I ever could. Great quick-read resource, pictures and all. I love his line about: "BC2 is a bit of an odd duck.." haha.
A joint American and Russian university white paper with very thorough breakdowns and images of not only the BCn Family of Compression, but other formats as well. There are many more sources at the bottom of that paper too. The start has excellent breakdowns of why Block Compression is so beneficial for game development. Cannot recommend this highly enough. I'd love to revisit this one again if or when I implement any more compression techniques in the future!
My article inevitably retreads what these sources have already covered. I don't see my article as "stand-alone"; it's certainly not a tutorial. I see it more as a companion piece to the more thorough sources out there. Even though a lot of the descriptions and visual aids I provide may be redundant, I hope I've at least conveyed the "same info" in a different way that will be helpful or inspirational to someone.
Front and back, these papers helped me organize my thoughts, keep a To-Do List, and mad ravings!
NOT A FAN OF PYTHON
Yes, I used Python 3 and the PyCharm IDE to develop this project. Let me be the first guy to say I dislike programming in Python very much, and this project only solidified my preference for C++ over Python. Why did I do it then? Python is certainly an odd choice for bitwise-heavy algorithms. Well, I wanted to challenge myself to become stronger in Python. Actually, I was inspired by one of John Carmack's keynote speeches I attended a few years back. Carmack described how, when he learns a new language, like Haskell, the only real way to get to know its strengths and weaknesses is to immerse yourself heavily in it, really put it through its paces. For me, yes, I've used Python plenty of times professionally over the years. But most of my experience was tied to using it with an API, reading from XMLs etc. I hadn't yet worked on something from the ground up; a meaty, thorough project like this.
So, this article is about Block Compression, but let me just go into a couple of reasons why I dislike using Python. One, with a dynamically-typed language, if your code has a typo, guess what...you just declared a new variable! Tracking down bugs is just plain harder. And this may sound petty, but I don't like using indentation to denote code logic. I personally find brackets and semicolons much more elegant and readable. And that's the crux of the Python matter for me: usability and readability.
So, the biggest advantage of Python, in my opinion, is its ubiquity across software tools such as SideFX Houdini and Autodesk Maya. Using Python as glue to interface between Houdini and a proprietary engine is excellent! Granted, the real glue is actually the interchange files in-between, such as Transform Data written out to XML. But anyway, thanks to Python, I've been able to contribute to pipelines where the data remains agnostic, and can be seamlessly imported-exported between the Engine, Maya, and Houdini!
There are plenty of more knowledgeable people out there who can provide a much more thorough breakdown of the yays and nays of Python. I just know that for future projects, I'm looking forward to using other languages.
BC1
THE BASICS
Alright, let's quickly go over the basics of compression for anyone unfamiliar. This part gets very simplified, so please feel free to skip.
The goal of texture compression is to reduce the file size while maintaining as much of the original image data as possible. Most texture formats out there use some sort of texture compression already. When raw pixels are written to a file format, they must be compressed. When a program wants to display the saved image, it needs to decompress it.
Wait, if we have to decompress when our program reads the image, that means extra calculations, right...why not shrink the texture and avoid compression altogether? Well, take a look at the Robocop image below: notice how the BC1 version on the Left is 32 KB, the same as the uncompressed, down-sampled version on the Right...but it looks a LOT closer to the original Center Image.
A 256x256 image of Robocop. Notice how much more detail is preserved using BC1 compared to downsampling to get an equivalent memory footprint!
112x112 Image from the Goof Troop SNES game. Sprite Tiles and Palettes are MUCH better for storing Pixel Art like this: with BC1, we are storing data for all of the green background Art, for one. While BC1 retains a lot more data than bicubic downsampling, the color accuracy falters tremendously because each 4x4 Block calculates its own palette
So, why Block Compression? Why don't videogames simply read-in PNGs or JPGs while the game is running? I mean, we certainly use those formats when authoring, and saving out the images. However, when we import a PNG into a Game Editor, that Game Engine is most likely converting that file into another format already. Game Editors do not directly display JPGs etc.
Games don't use conventional image formats like PNG for a few reasons. For one, they are not hardware supported, and the BCn family of texture compressions are directly supported by modern GPUs: the integrated circuits are literally built to handle decompressing quickly! GPU-accelerated file formats minimize the runtime tax of using compressed textures.
File formats like JPG are meant for the internet, PowerPoint slides etc, where the entire image is displayed at once. Videogames need to access different chunks of the image very quickly, known as "Random Texel Access," and those other formats cannot decompress chunks as needed, like Block Compression can.
PERFORMANCE INCREASE
To be clear: hardware texture compression doesn't only reduce memory footprint...it helps performance too! At the risk of over-simplifying, performance is (potentially) improved because smaller texture sizes mean less data has to transfer from storage to the GPU. Reading through the texture faster means the GPU can get to the next task quicker. So you see, Block Compression shrinks down game cartridge sizes and improves the frame rate! Important to note: my program encodes/decodes the entire Image at once, but the beauty of Block Compression is that the GPU only needs to decode the few visible 4x4 Blocks at a time.
4x4 BLOCKS
So it's called Block Compression because the Image is subdivided into four-pixel-wide, four-pixel-tall blocks. Each of these 4x4 Blocks is compressed one at a time: the Blocks are their own islands; they have no knowledge of neighboring Blocks or the Image as a whole. That's why "Random Texel Access" works with Block Compression: any 4x4 chunk of the image can be encoded/decoded without needing any other data.
The BCn algorithm decides which color data to store expressly by iterating over the 16 pixels in the 4x4 Block. All 4x4 Blocks store the same amount of data (the data size depends on the BCn type). A 16x16 image will consist of 16 Blocks, while a 512x512 will consist of over 16,000 Blocks!
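To make the bookkeeping concrete, here's a minimal sketch (not my actual tool code) of carving an image into independent Blocks; `pixels` is assumed to be a row-major list of (r, g, b) tuples for an image whose sides are multiples of four:

```python
# A minimal sketch of splitting an image into independent 4x4 Blocks.
def split_into_blocks(pixels, width, height):
    blocks = []
    for by in range(0, height, 4):          # Block's top row
        for bx in range(0, width, 4):       # Block's left column
            block = [pixels[(by + y) * width + (bx + x)]
                     for y in range(4) for x in range(4)]
            blocks.append(block)            # 16 pixels, its own island
    return blocks

# A 16x16 image yields 16 Blocks; a 512x512 yields 128 * 128 = 16384.
```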
40x40 Eyeball. We overlayed a Grid to help see the 4x4 Blocks. Notice how each Block has no more than four colors
INDICES AND COLOR
So, what exactly are we storing inside a 4x4 Block for a BC1 Image? First, let's discuss what would be inside a 4x4 area of an uncompressed image. Each pixel in a "raw" image stores a unique color. Each "raw" pixel consists of a 24-Bit RGB Color, with Red, Green, and Blue each consisting of 8 Bits (we are ignoring Alpha for now). And so, one uncompressed 4x4 Block contains 48 Bytes. Documentation refers to the RGBA channels as components. So for now, we're only talking about three components.
In comparison, BC1 Images are 1/6th the file size of the original, each Block totaling just 8 Bytes, wow! So, to put it simply, Block Compression shrinks the file size by reducing how many colors a Block stores. Rather than 16 unique colors, each 4x4 Block stores only two, also known as Endpoints. The different BCn types have different ways of going about it, but that's the gist of it: reducing colors!
For BC1, the two colors stored are compressed from 24-Bit [8:8:8] to 16-Bit [5:6:5]. This reduction of color precision means that BC1 compression is very bad with smooth color gradients. Green gets the most precision because the human eye perceives a shift in green more than red or blue.
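For the curious, the [5:6:5] round trip is easy to sketch. The decompressor here expands back to 8 bits by replicating the high bits into the low end, which is a common convention (actual hardware may differ slightly):

```python
# A sketch of the 24-bit -> 16-bit [5:6:5] round trip that causes BC1's
# banding. Green keeps 6 bits; Red and Blue drop to 5.
def pack_565(r, g, b):
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def unpack_565(c):
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Expand back to 8 bits by replicating the high bits into the low end
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

# e.g. (200, 100, 50) only survives approximately: it comes back (206, 101, 49)
```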
RGB Half Gradients (0-128), the Top is BC1, and Bottom is Raw full 8-Bit Precision. Chose Half Gradient to more easily see the Banding
The later BCn versions get more complex, but BC1 is very simple: each 4x4 Block stores two Colors. Each Pixel gets a choice between four colors though. This four-Color Table consists of the two stored "endpoint colors" plus a 33% and a 66% blend between the two Endpoints. Each of the sixteen Pixels stores a 2-Bit Index: enough to pick one of four colors from the table (00, 01, 10, and 11). All sixteen Indices together total only 4 Bytes, the size of one RGBA color! With the two 16-Bit Colors, that makes a lean 64 Bits.
Comparing the Data Size of one Uncompressed Block versus one BC1 Block. From sixteen Unique Colors, compressed to four
DECIDING ON THE DATA
Let me explain how my program decides what the two Endpoint Colors are (also known as the minimum and maximum colors) for each 4x4 Block. The Microsoft Documentation doesn't provide details on how to decide this, but BC1 is simple enough, so I deduced what to do myself, and implemented it. Take a look at the Python Snippet below. A For Loop iterates over all sixteen pixels. The "score" of the Pixel Color is determined by getting the "length" of the color (the sum of the color channels divided by three). Max Color starts at Black and Min Color starts at White, so they're almost guaranteed to be overwritten. If the current Pixel value is higher or lower than the current Max and Min Color, they are overwritten, until all Pixels are iterated over.
def get_endpoint_colors(self, pixel_set):
    # Get Starting Values: Black for Max, White for Min
    max_color = qRgb(0, 0, 0)
    max_score = ch.get_color_length(max_color)
    min_color = qRgb(255, 255, 255)
    min_score = ch.get_color_length(min_color)
    for pixel in pixel_set:
        pix_score = ch.get_color_length(pixel)
        # Overwrite Min Color if current Pixel "lower"
        if pix_score < min_score:
            min_color = pixel
            min_score = pix_score
        # Overwrite Max Color if current Pixel "higher"
        if pix_score > max_score:
            max_color = pixel
            max_score = pix_score
    return [min_color, max_color]
Once the Endpoint Colors are found, we blend between the two Endpoints to get the middle Colors for our Color Table. Apparently, different GPUs use different blend ratios, but I stuck with 33% and 66% blend for the middle Colors.
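In sketch form, building the four-entry Color Table looks something like this (plain (r, g, b) tuples stand in for the qRgb values my tool actually uses):

```python
# A minimal sketch of building the BC1 Color Table from the two Endpoints.
def build_color_table(color0, color1):
    def blend(a, b, t):
        # Per-channel linear blend, t of the way from a to b
        return tuple(round(x + (y - x) * t) for x, y in zip(a, b))
    # Entries 2 and 3 are the 33% and 66% blends between the Endpoints
    return [color0, color1,
            blend(color0, color1, 1 / 3),
            blend(color0, color1, 2 / 3)]
```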
Finally, we need to calculate the Indices. The Index for each Pixel is calculated very similarly to the Min and Max Colors. Each Pixel is iterated over and compared with each of the four colors. The Pixel's "score" is subtracted from the current color's "score" to get a "delta."
delta = abs(pixel_score - cur_color_score)
The Color with the lowest "delta" wins, and its Index value is used for that pixel!
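Put together, the index-picking pass looks roughly like this sketch, reusing the same "length" score as the Endpoint search (`color_table` holds the four (r, g, b) entries of the Block's Color Table):

```python
# A sketch of the index-picking pass, scoring each pixel against the four
# Color Table entries and keeping the closest one.
def get_color_length(color):
    # Sum of the channels divided by three, same score as the Endpoint search
    return sum(color) / 3

def pick_index(pixel, color_table):
    pixel_score = get_color_length(pixel)
    best_index, best_delta = 0, float("inf")
    for index, cur_color in enumerate(color_table):
        delta = abs(pixel_score - get_color_length(cur_color))
        if delta < best_delta:
            best_index, best_delta = index, delta
    return best_index  # the 2-Bit Index stored for this pixel
```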
COLOR CLASH AND STAIR STEPS
One major side-effect of using 4x4 Blocks is that the Image is now comprised of a Grid, with neighboring Blocks knowing nothing about each other's final colors. As a result, there can be "color clashes," not unlike ZX Spectrum games. Two neighboring 4x4 Blocks might have very different Color Tables where, in the original image, that set of pixels looked more seamless. Take a look at the BC1 Eyeball image above again: the Iris is green, but because of the limitations, some 4x4 Blocks turn out with a Grayscale Color Table. Meanwhile, the neighboring Blocks have green in their Color Table; there's a clear transition from one Block to the next... a Color Clash!
The 4x4 Block Grid mixed with "Color Clashing" leads to the issue of "Stair Stepping." Diagonal areas of the Image tend to create these "Steps" because of the four-color limitation. Take a look at the images below: the bumper of the van and the diagonal bars of the fence (versus the straight bars).
The Stair-Stepping is most noticeable on the Van Bumper, as it's diagonal, not aligned with the Block Grid. Also, this image shows how Image Noise causes BC1 to look "dirtier"
The Stair-Stepping is very noticeable on the Diagonal Bars of the Fence. Look at the Outlines of the Characters though; the Blocks jump out
STENCIL ALPHA
BC1 is 1/6th the size of the Raw Image, and it's able to store 1-Bit Alpha without needing any more Bits!! This is thanks to the mathematical concept known as Data Degeneracy, also known as Data Redundancy. Nathan Reed's article already explains this very well, so I'll be brief in my own explanation.
It reminds me of the Commutative Property back in Pre-Algebra: rearranging the Data produces the same result. For BC1, we store two Endpoint Colors and blend them. The blended Colors are the same even if we switch the Endpoint Colors around. Swapping the Endpoint Colors is how BC1 stores whether or not a 4x4 Block is using 1-Bit Alpha. If the 1st Endpoint Color's value is greater than the 2nd, then BC1 is in "Opaque Mode"; otherwise it's in "Alpha Mode."
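The whole trick fits in a few lines; this sketch assumes the Endpoints are already packed into 16-bit [5:6:5] values:

```python
# A minimal sketch of BC1's "free" 1-Bit Alpha flag: the mode is implied
# purely by the order of the two packed 16-bit Endpoint Colors.
def is_opaque_mode(color0, color1):
    return color0 > color1          # otherwise the Block is in Alpha Mode

def encode_endpoints(max_565, min_565, use_alpha):
    # Swap the Endpoint order so the decoder infers the right mode
    return (min_565, max_565) if use_alpha else (max_565, min_565)
```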
When a BC1 Block is in "Alpha Mode," the final color in our Color Table (Index 3) now represents transparency. That's right, we now only have three colors...oh no! When working with only four colors, losing one takes its toll. But luckily, this "Alpha Mode" is set per 4x4 Block, so in the Blue Sphere example below, only the edge Blocks suffer with three colors, while the middle "Opaque" Blocks still have four.
When I was implementing this Stencil Alpha Mode, I immediately thought about Background Tile vs Sprites on the NES! NES Tiles use 2-Bit Palettes, but Sprites reserve the 4th Palette for "Transparent" pixels, EXACTLY like what BC1 is working with!
Comparing BC1's One-Bit Stencil Alpha to full 8-Bit Alpha
Showcasing how much the Alpha Threshold changes the image
ALPHA TROUBLE
So, when I implemented 1-Bit Alpha in BC1, I ran into some edge-cases that the documentation didn't warn me about. I'm talking about them here briefly just in case anyone else runs into these problems. In short, I was getting False Positives: 4x4 Blocks that were opaque when they should have been transparent, or, in other images, in "Alpha Mode" when they should not have been. Take a look at the images below to see what I mean.
Digging into it, it seemed I was reaching these edge-cases because compressing down to [5:6:5] RGB, then decompressing on read, was causing the Min Color to suddenly have a higher value than the Max Color. I'm still not entirely sure of all the factors that caused the issue. I 'solved' it by rechecking the Min Color and Max Color post-16-Bit compression, making sure the Endpoint order is truly Max to Min (or Min to Max for Alpha Mode).
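Here's a simplified sketch of that fix; `quantize_565` is a stand-in for my actual packing helper:

```python
# A sketch of the fix: quantize both Endpoints to [5:6:5] FIRST, then
# re-check their order, because the precision loss can flip which one is
# "higher" and silently toggle Opaque/Alpha Mode on decode.
def quantize_565(color):
    # Stand-in for the packing helper: (r, g, b) -> 16-bit [5:6:5]
    r, g, b = color
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def order_endpoints(max_color, min_color, use_alpha=False):
    c_max, c_min = quantize_565(max_color), quantize_565(min_color)
    # Re-check the order AFTER quantizing, not before
    if c_max < c_min:
        c_max, c_min = c_min, c_max
    # color0 > color1 decodes as Opaque Mode; color0 <= color1 as Alpha Mode
    return (c_min, c_max) if use_alpha else (c_max, c_min)
```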
With 5:6:5 Color Compression, I got False Positives where Blocks that should have Alpha did not, and Blocks that should be Opaque were Stenciled-Out...
DOWNFALL OF BC1
So BC1 is pretty great: it maintains a lot of the image's integrity at a sixth of the size, with a limited alpha solution! But when does BC1 really fall flat? We already pointed out the problem with Diagonals; BC1 doesn't play nice with Image details that pass through multiple 4x4 Blocks. Block Compression looks real bad when viewers can easily identify where the Blocks start and end.
All these problems are exacerbated at lower resolutions. At low resolution, each 4x4 Block covers a larger percentage of the image, and the four-color limitation really becomes an issue. One Block in a 256x256 image covers only ~0.02% of the image, while in a 64x64 it covers ~0.4%, and believe me...those Block borders get real noticeable!
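The coverage math is simply the Block's 16 pixels over the image's total pixel count:

```python
# One 4x4 Block is always 16 pixels, so its share of the image grows
# fast as the resolution shrinks.
def block_coverage_percent(side):
    return 16 / (side * side) * 100

coverage_256 = block_coverage_percent(256)  # ~0.024% of a 256x256 image
coverage_64 = block_coverage_percent(64)    # ~0.39% of a 64x64 image
```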
The Top Row is the original, uncompressed images decreasing in resolution. The Bottom Row are images using BC1. Notice how BC1 has a harder time maintaining the image the lower the resolution becomes
Huge deltas between pixels, i.e. rapid changes of color (Noise), do not play well at all. Take a look at the 16x16 Blue Noise image below: BC1 (on the left) completely decimates the concept of "random noise" entirely. The four-color limitation and Block grid are laid bare for all to see. The original Blue Noise has 256 Pixels, 256 unique colors. While BC7 is able to match that, BC1 caps out at only 64 colors (16 Blocks * 4 Colors).
I want to mention that good Blue Noise is comfortably possible with just 24 unique colors; it all depends on how the unique colors are distributed amongst each other. However, Block Compression is meant for compressing Textures that are authored by Artists without concern for color limitations, tiles etc. Take a look at the second set of noise images to see examples of "good" noise with a limited color palette.
The downfall of BC1, for all to see...the 4x4 Blocks boundaries are very obvious, and color limitation breaks any illusion of "Random Noise." While still flawed, BC7 does a better job
Here I'm showing 16x16 Noise alternatives using same or less colors. On Left, the BC1 Block Palettes, but hand-authored to look more "noisy". Middle is 24-Color Blue Noise, and Right is 4-Tone Grayscale Noise
CONCLUSION
BC1 works wonderfully for high resolution, opaque Textures, shaving the data down to 1/6th. Smooth color gradients will definitely lead to banding, but it all depends on how close the Player can even get to seeing them! When working with low resolution Textures though, it might be better to switch to another Block Compression...but you know, when resolutions are so low, the memory footprint delta between using BC1 and BC7 is only several KB... maybe we don't even need to worry about it!
Implementing a software version of BC1 is super straightforward, and believe me, crunching down all those images into Blocks and seeing the results is a lot of fun!
BC7
MORE DETAIL, MORE COMPLICATED
I went straight from the simplest BCn to the newest, most difficult format. Introduced with DirectX 11, BC7 was exponentially more difficult to author for me. First of all, you need to support multiple Modes of encoding and decoding. Furthermore, each 4x4 Block contains a lot more data. It's no longer just 2-Bit Indices per Pixel; there are Partitions, Subsets, and the dreaded P-Bits! This section goes over BC7, and I really hope the Mode comparison pictures come in handy (because generating all these examples took more work than I thought).
While BC1 uses eight bytes (64 Bits) per 4x4 Block, BC7 doubles that with sixteen bytes (128 Bits) per Block. Compared to BC1, the image quality is MUCH closer to lossless; take a look at this comparison of Nina below!
64x64 Closeup of Nina's Face. BC7 is able to match original pretty closely while BC1 struggles of course with the usual issues
SAVED MY BACON
So first, I'd like to tell ya anecdotally how BC7 really saved my bacon on a past project! As a Tech Artist working on complex, interactive visuals, I encode a lot of Data into Vertex Color, UV Channels, and Textures. Often I use all four channels of our Textures, each with special data (encoded into 8-Bit Color channels). These "Tech Art" Textures were large, but we were forced to use uncompressed RGBA8 textures because BC1-BC5 simply BUTCHERED our data. Splitting this encoded data into multiple Textures hurt our performance more, so we had settled on using just one uncompressed texture.
However, several months into the project, one of our Graphics Engineers implemented BC7 into our proprietary engine, and voilà, our data was mostly intact! The Indirection in the Red and Green Channels was no longer causing "holes" to appear in the center of the raindrops, but properly distorting them into non-uniform shapes! In addition to smaller file sizes, we had a slight performance increase, all thanks to BC7!
WHAT DO ALL THESE TERMS MEAN
Alright, let's get into BC7. As I mentioned in the Sources section above, I felt that Microsoft's Documentation on BC7 didn't lay out the terms clearly for me. This probably speaks to my own shortcomings, but I needed to jot down notes and reread the paragraphs several times to finally grasp what the characteristics that make up the Modes actually meant. I hope this next section helps somebody understand how to implement a BC7 Tool faster than I did!
MODE
Each 4x4 Block is assigned one of eight Modes, which decides what kind of data the Block will be compressed into (number of Subsets, Color Depth etc). I have a whole section below going into detail about each Mode type. When it comes to determining which Mode to use...each 4x4 Block needs to be compressed N times, and the Mode that "scores" the best is chosen.
SUBSETS
The 4x4 Block's pixels are split into groups, with each "Subset" having two unique Colors (also referred to as Endpoints) to blend between. Modes 0 and 2 have three Subsets, while Modes 1 and 3 split the pixels into two Subset "groups".
PARTITIONS
The pattern for how the Subset "groups" are divided up. Each 4x4 Block stores a Partition ID, which is used to grab the specific array from the Partition Table when decompressing the 4x4 Block. Mode 0 only has room for a 4-Bit Partition ID, so it uses only the first 16 Partitions from the Partition Table.
Discovering which Partition is "best" for the 4x4 Block is the most labor-intensive part of the compression process. Once again, the documentation doesn't go into detail about how to determine the best Partition, so this is my solution. We have to brute-force "score" the 16 pixels through all available Partitions...that's right, compressing the Block 64 (or 16) times, then discarding all but the most ideal compression. Once compressed (discovering the Endpoints and Indices for each Subset), we subtract each compressed Pixel value from the original Pixel value. The sum of the deltas across the 16 pixels is our "score." Like in Golf, the lowest-scoring Partition is the winner.
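As a sketch, the brute-force search looks like this; `compress_with_partition` is a hypothetical stand-in for the real per-Partition encode (find the Subset Endpoints and Indices, then decode back out to 16 pixels):

```python
# A heavily simplified sketch of the brute-force Partition search,
# Golf scoring included.
def score_partition(original_pixels, compressed_pixels):
    # Sum of per-pixel, per-channel deltas; lower is better, like Golf
    score = 0
    for orig, comp in zip(original_pixels, compressed_pixels):
        score += sum(abs(o - c) for o, c in zip(orig, comp))
    return score

def find_best_partition(block_pixels, partition_count, compress_with_partition):
    best_id, best_score = 0, float("inf")
    for partition_id in range(partition_count):   # 64 (or 16 for Mode 0)
        result = compress_with_partition(block_pixels, partition_id)
        score = score_partition(block_pixels, result)
        if score < best_score:
            best_id, best_score = partition_id, score
    return best_id
```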
The Khronos Group Documentation has all 64 Partition Tables for both Subset Categories. These numbers can easily be copy-pasted into your code as a series of arrays. Also, I found a great image from Jon Rocatis's Blog that visualizes all the Partition Tables:
The 64 Partition Tables for Two and Three Subsets Source: Jon Rocatis's Blog!
INDICES
They behave the same as in BC1: Indices determine the blend between the two Endpoint Colors. BC7 Modes use either 2-Bit Indices or 3-Bit Indices, the latter giving a whopping eight colors interpolated from the Endpoints. The total number of potential colors a 4x4 Block can represent is the Subset Count times the colors per Subset. I say potential because, depending on Color Depth and the actual Colors being blended, some entries might come out identical.
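One detail worth knowing: the hardware doesn't blend with exact fractions. The Khronos documentation specifies fixed weight tables and an integer blend; here's a sketch of decoding one Index with those tables:

```python
# BC7 interpolates with fixed weight tables (from the BPTC documentation)
# rather than exact thirds.
WEIGHTS_2BIT = [0, 21, 43, 64]
WEIGHTS_3BIT = [0, 9, 18, 27, 37, 46, 55, 64]

def interpolate(e0, e1, index, weights):
    w = weights[index]
    # Per-channel fixed-point blend: ((64 - w) * a + w * b + 32) >> 6
    return tuple(((64 - w) * a + w * b + 32) >> 6 for a, b in zip(e0, e1))
```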
P-BITS
Man, these were the hardest to wrap my head around, just exactly how to use them! What does that "P" stand for anyway ("Per" endpoint??). Anyway, P-Bits are a shared least significant bit between the RGB color channels. So instead of 5-Bit precision, for example, the P-Bit appends one extra Bit, for 6 Bits! There are two types of P-Bits (depending on how many Bits the Mode has left over):
- Unique: One P-Bit per Color
- Shared: One P-Bit per Subset
The documentation I read doesn't specify how your code should decide whether the P-Bit should be Set or Cleared. To me, it made sense to tally the Least-Significant-Bit (LSB) of each Color Channel that would be using the P-Bit. If the majority of the Color Channels' LSBs were Set, the P-Bit would be Set; otherwise Cleared. The P-Bit should be calculated after the Colors have already been converted to the lower precision, not at full 8-Bit Precision.
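The majority-vote heuristic boils down to a few lines; `channels` is assumed to hold the already-quantized channel values that will share the P-Bit (e.g. both Endpoints' R, G, and B for a Shared P-Bit):

```python
# A sketch of the majority-vote P-Bit heuristic described above.
def decide_p_bit(channels):
    set_count = sum(value & 1 for value in channels)   # tally the LSBs
    return 1 if set_count * 2 > len(channels) else 0   # strict majority wins
```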
So why even do this? I suppose the idea of the P-Bit is so each Mode uses exactly its sixteen-Byte (128-Bit) budget. For example, without P-Bits, Mode 1 would have two Bits left over...that's wasteful!
DECODING THE MODE ID
So when decompressing a BC7 Block, how do we know which Mode it's using? Microsoft's documentation has this to say about it:
A BC7 implementation can specify one of 8 modes, with the mode specified
in the least significant bit of the 16 byte (128 bit) block. The mode is encoded
by zero or more bits with a value of 0 followed by a 1.
I don't know about you, but that leaves some gaps for me...no pseudo code either. I hope this helps fill the gaps for some!
So, for BC7, the selected Mode is determined by how many Zeroes appear before a One in the Least Significant Byte of the 4x4 Block. Mode 0 has NO Zeroes, while Mode 7 has seven Zeroes. Yes, this means the Modes gradually get more and more "wasted" bits. I wondered: why can't they store the Mode using the last three Bits of the Block? My best guess is that they wanted to save as many Bits as possible for the earlier Modes (on the flipside, the later Modes suffer). Anyway, because of how the Mode ID is encoded, it's possible to get false positives. That's why it's important that the Mode ID is conditionally tested starting from Mode 7 (which uses the entire Byte) down to Mode 0 (which uses a single Bit). Please take a look at this Python Snippet for a visual aid.
def extract_mode(self, block):
    final_byte = block[15]  # Last Byte (LSB) should contain Mode Bits
    if (final_byte & 0xFF) == 128:
        return md.mode_7
    elif (final_byte & 0x7F) == 64:
        return md.mode_6
    elif (final_byte & 0x3F) == 32:
        return md.mode_5
    elif (final_byte & 0x1F) == 16:
        return md.mode_4
    elif (final_byte & 0x0F) == 8:
        return md.mode_3
    elif (final_byte & 0x07) == 4:
        return md.mode_2
    elif (final_byte & 0x03) == 2:
        return md.mode_1
    else:  # Mode 0
        return md.mode_0
256x256, it's funny how Mode3 was selected for the entire white border around the characters, creating a green border
128x128: A pretty even distribution between the four opaque BC7 Modes, with Mode1 dominating as usual.
THE MODES
BC7 has eight unique Modes that a 4x4 Block can be encoded into. The Modes allow specific parts of the image to be encoded in different ways instead of a one-size-fits-all solution. Mode 0 to Mode 3 are Opaque, while Mode 4 to Mode 7 have Alpha. For my project, I've only implemented the four Opaque Modes, and I will go over them one by one and explain what I feel are their greatest strengths.
The Tool automatically decides the best Mode; users never need to select Modes themselves, but I hope this section helps show why certain Modes "win" over the others. No other Block Compression documentation that I've seen goes over the Modes in this type of detail, so I hope this section in particular gives folks a helpful perspective.
Trivia: there's also apparently a super-secret Mode 8!! The documentation warns that if we pass this Mode to hardware, a zeroed-out 4x4 Block is returned.
Mode2 is selected for the 3-Subset Blocks while Mode3 is selected for the 2-Subset Blocks as they should be
More examples of Modes being correctly chosen based on which Mode has the correct number of Partitions and best Color Depth
For gradients, Mode1 dominates because of its 6-Bit (plus P-Bit) Color Depth and its 3-Bit Indices. Although Mode 2 and 3 win out as well
REMINDS ME OF MEMORY MAPPERS
Before I go into the Modes, I want to quickly interject: while I was implementing BC7, I realized that these Block Compression formats really reminded me of implementing Memory Mappers for my Conntendo Project a few years back. Both NES Memory Mappers and Block Compression started with simple versions, but got drastically more complex as they went along. For this analogy, I'd say that BC1 is like the UxROM Mapper: simple, very straightforward. Meanwhile, BC7 is definitely like MMC5; many more features, and they both have Modes! Man, I LOVE bitwise operations and radical optimization, and these projects give me the chance to do that! Anyway, back to our feature presentation.
Mode 0
Best for Blocks with high color delta; it might be the best Mode for "Blue Noise" because it can (potentially) capture 24 unique colors! However, it has the weakest Color Depth, and is very bad with capturing gradients.
Mode 1
Because of its 3-Bit Indices, Color Depth, and Partitions, this Mode is the best at smooth, accurate gradients. Two Subsets means it's no good with "junction points" where colors shift in the image, though. As long as the "hue" is similar across the Block, it's golden. According to my tests, this is the most often used Mode!!
Mode 2
Great with a limited amount of high-contrast colors. Similar to Mode 0, but it trades down to measly 2-Bit Indices in exchange for far more Partition choices. If there are no more than six colors, it'll preserve the data very well.
Mode 3
Has the highest Color Depth, and can capture a limited number of colors very accurately. It's the best Mode if the Block can be split into two distinct gradient patterns. Of the four Opaque Modes, it can (at best) capture only 8 unique colors...
The Only Mode with 16 Partition Choices
Only Mode with Shared P-Bit, so it's extra difficult for that P-Bit to truly be useful
Three Subsets, with high Partition Selection means it has the best chance of finding the right "jigsaw piece"
Great Color Depth means it could be very good with a gradient block, but 2-Bit Indices means it could also be pretty bad!
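The unique-color counts quoted above (24 for Mode 0, 8 for Mode 3) fall straight out of Subsets times 2^Index-Bits, since each Subset interpolates its own little palette. A quick sketch:

```python
# Max unique palette colors a Block can produce = subsets * 2**index_bits,
# because each Subset interpolates its own 2**index_bits palette entries.
SUBSETS_AND_INDEX_BITS = {0: (3, 3), 1: (2, 3), 2: (3, 2), 3: (2, 2)}

def max_palette_colors(mode):
    subsets, index_bits = SUBSETS_AND_INDEX_BITS[mode]
    return subsets * 2 ** index_bits

print(max_palette_colors(0))  # 24 -- Mode 0's "Blue Noise" advantage
print(max_palette_colors(1))  # 16
print(max_palette_colors(2))  # 12
print(max_palette_colors(3))  # 8  -- very accurate colors, but few of them
```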
COMPARING THE MODES
I created a 16x16 Test Image meant specifically to demonstrate the range of each Opaque Mode: Top Left for Mode 0, Top Right for Mode 1, Bottom Left for Mode 2, and Bottom Right for Mode 3. The two collages below show this Test Image compressed entirely using one Mode each. The bottom, black-and-white collage shows how accurate the colors are: the whiter the pixel, the more the compressed pixel strays from the original.
As you can see, because of its low Color Depth, Mode 0 strays more than the others. That said, it handles the "Blue Noise" Top Left better than the other three Modes (although still quite flawed). The beauty is that BC7 uses all these Modes to minimize loss from compression. That's why BC7 is the champ...its Modes turn it into a swiss-army-compression-knife!
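For the curious, here's roughly how a "whiter = more wrong" error image can be built. This is a minimal sketch assuming NumPy arrays, not my exact code (the error scale factor is arbitrary, just enough to make small errors visible):

```python
import numpy as np

def error_heatmap(original, compressed):
    """Per-pixel error image: the whiter the pixel, the further the
    compressed color strays from the original. Both inputs are HxWx3
    uint8 arrays."""
    # Widen to a signed type first so the subtraction doesn't wrap around
    diff = original.astype(np.int16) - compressed.astype(np.int16)
    # Mean absolute channel difference, scaled up so small errors are visible
    err = np.abs(diff).mean(axis=2)
    return np.clip(err * 4, 0, 255).astype(np.uint8)
```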
Each Image shows compression using one Mode Only. Helps showcase the strengths/weaknesses between the Modes
A 16x16 Image where each 8x8 Corner was tailor-made to show the best of Modes 0, 1, 2, and 3
The Pixels are compared with original. The whiter the pixel the more inaccurate the Pixel has become compared to the original
Here is Robocop completely compressed using one Mode each. By far the biggest difference you'll notice is the background: Mode 0 and Mode 2 have a lot more banding because of their weakness with gradients. The reflections in Robocop's suit come out slightly stronger though
TIMING
No doubt about it, BC7 takes dreadfully longer than BC1 to write out an image. This is because, for each 4x4 Block, BC7 needs to compare "the score" of every Mode to determine which Mode is best suited. On top of that, every Mode needs to choose the appropriate Partition, iterating through all of those too. In short, BC7 needs to encode each 4x4 Block dozens of times, yikes.
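That Mode contest boils down to a brute-force loop. Here's a simplified sketch with dummy stand-in encoders (the real per-Mode trial functions also loop over every Partition before reporting their best error):

```python
def best_encoding(block, encoders):
    # Brute-force Mode selection: try every Mode on the 4x4 block,
    # keep whichever decodes closest to the original.
    # 'encoders' maps mode number -> trial function returning (error, encoded_data).
    best = None
    for mode, try_encode in encoders.items():
        error, data = try_encode(block)
        if best is None or error < best[0]:
            best = (error, mode, data)
    return best

# Dummy encoders standing in for the real per-Mode trial functions
encoders = {0: lambda b: (12.5, "mode0-bits"), 1: lambda b: (3.1, "mode1-bits")}
err, mode, data = best_encoding(None, encoders)
print(mode)  # 1 -- Mode 1 wins with the lower error
```

This is also exactly why BC7 encoding is so much slower than BC1: every Block pays for all the Modes it *didn't* pick.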
The chart below shows the time it takes for BC1 and BC7 to encode then decode the image at various resolutions. The timing increase roughly correlates with the resolution increase: doubling the resolution quadruples the pixel count, very roughly quadrupling the timing. As you can see, BC7 takes painfully longer. At very low resolutions this is hardly felt, but as early as 128x128 there are substantial wait times.
To record the timings, I used the 'time.process_time' function in Python. Perhaps it's the function, or some other aspect of Python, but the timing results are wildly nondeterministic: time to encode/decode varied by as much as 50% between runs. So, I chose the median time from running the compression five times each. Please consider this chart a rough approximation of the time-cost between BC1 and BC7.
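The timing harness amounts to something like this sketch (the lambda workload at the bottom is just a stand-in for the real encode/decode call):

```python
import statistics
import time

def median_process_time(fn, runs=5):
    # CPU time fluctuated by as much as ~50% run-to-run,
    # so take the median of several runs instead of trusting one.
    samples = []
    for _ in range(runs):
        start = time.process_time()
        fn()
        samples.append(time.process_time() - start)
    return statistics.median(samples)

# Example: time a stand-in workload
elapsed = median_process_time(lambda: sum(range(100_000)))
```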
No doubt, real hardware-accelerated Block Compression is done much quicker. I mean my program is single-threaded, using Python...multi-threading and a compiled language would run circles around this.
A Chart that shows the Time it takes for BC1 and BC7 to encode/decode an Image at various resolutions. BC7 takes substantially longer
CONCLUSION
Yeah, it's true, ya don't need to thoroughly understand the ins-n-outs of how Block Compression works to use it effectively on your Textures. Heck, for most cases, leaving hi-res Textures at the defaults will probably work fine: BC1 for BaseColor, BC5 for Normal Maps, etc. Even if I were hired as a Graphics Programmer, I doubt any game company would need me to actually implement custom Block Compression techniques. That said, you never know when your experiences will come in handy, even if it's just a springboard to a seemingly unrelated issue.
At the very least, when Artists ask why their Textures look so "stair-steppy," I can now thoroughly explain it, rather than simply tell 'em, "switch to RGBA8"! I certainly had a lot of fun with this project...apart from the struggles of muscling through it with Python, of course haha.