(Reading: We strongly suggest you read chapters 5 and 9 of Head First HTML with these notes.)
Today, we will build on our idea of representation to discuss how images are represented in digital form. We'll work up to it, first starting with how color is represented (which is based on the physiology of the human eye), then looking at images as rectangular arrangements of spots of pure color. Finally, we'll calculate the file size of an image and discuss one way of compressing the file so that it is smaller and therefore faster to download. This compression is, in fact, a different representation of the information. We'll also briefly mention two other representations, but we won't spend time in class discussing those formats; you can read about them on your own.
Before we see how all possible colors can be represented, let's first start with a simple list of a few built-in colors that we can specify by name.
What color names can we use? All standards-compliant browsers promise to handle (at least) the following 17 color names. (Ignore the middle column for now.) View this page on the web to see the color samples.
You can use such colors in a variety of ways. For example, to make colored text, you can set the CSS properties "color" and "background-color" in the style attribute of a SPAN tag. There are many other CSS properties whose values are colors.
This sentence uses too many colors!
This is accomplished by:
<span style="color: black; background-color: yellow">This</span> <span style="color: red; background-color: gray">sentence</span> <span style="color: white; background-color: olive">uses</span> <span style="color: lime; background-color: maroon">too</span> <span style="color: aqua; background-color: purple">many</span> <span style="color: fuchsia; background-color: gray">colors!</span>
For other colors, it is safest to express them numerically. Many browsers recognize additional color names, but those non-standard names will not pass the validator, as many of you discovered in earlier assignments.
Like numbers and text characters, colors are represented by numbers in the computer. How? For that, we need to understand additive colors and color vision.
Our retinas happen to have rod-shaped cells that are sensitive to all light, and cone-shaped cells that come in three kinds: red-sensitive, green-sensitive, and blue-sensitive. Therefore, there are three (additive) primary colors: Red, Green and Blue or RGB. All visible colors are seen by exciting these three types of cells in various degrees. (For more information, consult these Wikipedia articles on additive color and color vision.)
Color monitors and TV sets use RGB to display all their colors, including yellow, chartreuse, you name it. So, every color is some amount of Red, some amount of Green, and some amount of Blue.
On computers, RGB color components are standardly defined on a scale from 0 to 255, which is 8 bits or 1 byte. Play with the Color Pad applet to get a feel for this. Examples:
The knowledge of RGB colors comes in handy with CSS. In CSS, you can specify a color in several ways. In the following examples, the three numerical forms all specify turquoise, a light blue-green color; the last line uses its name instead.
color: rgb(64,224,208);   /* three RGB numbers in the range 0-255 */
color: rgb(25%,88%,82%);  /* three RGB percentages */
color: #40E0D0;           /* three RGB numbers expressed as a hexadecimal triple */
color: turquoise;         /* a color name supported in many, but not all, browsers */
Many browsers support more than the 17 standard names, but it is unwise to count on all browsers supporting an odd name like "turquoise." It's safest to use one of the numerical methods.
People use decimal (base 10), computers use binary (base 2), but programmers often use hexadecimal (base 16) for convenience.
Binary numerals get long very fast. It is not easy to remember 24 binary digits, but you can more easily remember 6 hexadecimal digits. Each hexadecimal digit represents exactly four binary digits (bits). (This is because $2^4 = 16$.)
One way to understand hexadecimal is by analogy with decimal, but we're all so familiar with decimal numerals that our reflexes get in the way. (In fact, humans throughout history have used many different numeral systems; decimal is not sacrosanct.) So, we first need to break down decimal notation so that you can see the analogy with hexadecimal. For now, we'll stick with two-digit numerals, but the same ideas extend to any larger numbers.
Decimal notation works by organizing things into groups of ten, then counting the groups and the leftovers: Suppose you had a bunch of sticks on the ground and you bundled them all into groups of 10 with some left over (fewer than 10). Now, use a symbol to denote the number of bundles and another symbol to denote the number of sticks left over. You've just invented two-digit numbers in base 10.
Hexadecimal: Do the same thing with bundles of 16, and you've invented two-digit numbers in base 16. For example, if you had thirty-five sticks (I'm trying to avoid decimal notation), they could be bundled into two groups of sixteen and three left over, so the hexadecimal notation is 23. Careful! That numeral isn't the decimal number twenty-three! It's still thirty-five sticks, but we write it down in hexadecimal as 23.
To distinguish a decimal numeral from a hexadecimal numeral, we use subscripts. So, to say that thirty-five sticks is written 23 in hexadecimal, we can write:
$35_{10} = 23_{16}$
Both decimal and hexadecimal notations are based on place value. We say that $23_{16}$ means $35_{10}$ because it has a "2" in the sixteens place and a "3" in the ones place, just like $35_{10}$ has a "3" in the tens place and a "5" in the ones place.
Let's take another example. Suppose we have $26_{10}$ sticks. That's one group of 16 and 10 left over. How do we write that number in hexadecimal? Is it $110_{16}$? That is, a "1" in the sixteens place followed by a "10" in the ones place? No; that would be confusing, since it would look like a three-digit numeral. We need a symbol that means ten. We can't use "10," since that's not a single symbol. Instead, we use "A"; that is, $\mathrm{A}_{16} = 10_{10}$, so $26_{10} = \mathrm{1A}_{16}$. Similarly, "B" means 11, "C" means 12, "D" means 13, "E" means 14, and "F" means 15. We don't need any more symbols, because we can't have 16 things left over, since that would make another group of 16. The following table summarizes these correspondences and what we've done so far.
To convert a big decimal number to hexadecimal, just divide. For example, $230_{10}$ divided by 16 is $14_{10}$ with a remainder of $6_{10}$. Since fourteen is written E, the hexadecimal numeral is $\mathrm{E6}_{16}$. To convert a hexadecimal number to decimal, just multiply: $\mathrm{E6}_{16} = 14 \times 16 + 6 = 224 + 6 = 230_{10}$.
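The divide-to-convert and multiply-to-convert-back procedures can be sketched in a few lines of Python. This is a minimal illustration under the assumption of non-negative whole numbers; the names `to_hex` and `from_hex` are hypothetical (Python's built-ins `format(n, "X")` and `int(s, 16)` do the same job). For numbers too big for two digits, the division is simply repeated on the quotient.

```python
# Sketch of decimal <-> hexadecimal conversion, as described above.
# to_hex and from_hex are hypothetical helper names, for illustration only.
HEX_DIGITS = "0123456789ABCDEF"  # A=10, B=11, ..., F=15


def to_hex(n: int) -> str:
    """Convert a non-negative integer to a hex numeral by repeated division by 16."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 16)   # bundles of 16, plus the leftovers
        digits.append(HEX_DIGITS[remainder])
    return "".join(reversed(digits))   # remainders come out ones-place first


def from_hex(numeral: str) -> int:
    """Convert a hex numeral to an integer: multiply by 16, then add the next digit."""
    value = 0
    for ch in numeral.upper():
        value = value * 16 + HEX_DIGITS.index(ch)
    return value


print(to_hex(230))      # E6
print(from_hex("E6"))   # 230
print(to_hex(35))       # 23
```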
Try the following conversions as an in-class exercise. You can use a calculator, you can ask your neighbors, anything you like.
Now that we know both hexadecimal and binary, you can convert binary to hexadecimal (and vice versa). However, you would probably do so by converting the binary number to decimal and then the decimal number to hexadecimal. There's a better way, involving almost no arithmetic (or, rather, all the arithmetic is with one-digit numbers you can add in your head). Indeed, this technique is the reason that computer scientists like using hexadecimal. (Well, this and getting to spell words like ACE and DEADBEEF with hex digits.)
Let's start with an example. Suppose you need to convert the following from binary to hexadecimal:
$01010100_2 = ??_{16}$
What we're going to do is to take the bits in chunks of four bits, so to mark the chunks we'll insert a period in the middle of the number:
$0101.0100_2 = ??_{16}$
Now, we just convert each chunk directly into hex. The first chunk, 0101, is just the number 5. The second chunk, 0100, is just the number 4. Those are already in hex, so we are done:
$0101.0100_2 = 54_{16}$
(Try doing it via decimal, to check. The decimal value corresponding to both of these is 80+4=84.)
Let's do another one, this time with slightly larger values:
$10101100_2 = ??_{16}$
Again, take the bits in chunks of four bits:
$1010.1100_2 = ??_{16}$
Now, we just convert each chunk directly into hex. The first chunk, 1010, is $8+2 = 10_{10}$, which is the digit A in hex. The second chunk, 1100, is $8+4 = 12_{10}$, which is the digit C in hex. So we are now done:
$1010.1100_2 = \mathrm{AC}_{16}$
(Again, check our work by doing it via decimal. The decimal value corresponding to both of these is 160+12=172.)
Why does this work? Suppose we needed to convert 172 from decimal to hex: our first step would be to divide the number by 16. In binary, moving the binary point to the left by one place is equivalent to dividing by two, so moving the binary point four places to the left is equivalent to dividing by 16. So when we put a period in the middle of the 8-bit binary number, it was exactly the same as dividing by 16. We then have the quotient to the left of the binary point, and the remainder to the right of the binary point. Just convert each to hex, and we are done.
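The chunking argument can be checked with a short Python sketch. Shifting an integer right by four bits (`n >> 4`) divides it by 16 and keeps the quotient, while masking with `0b1111` keeps the remainder: exactly the two halves on either side of the period. The helper name `byte_to_hex` is hypothetical and only handles one 8-bit numeral.

```python
# Sketch of the four-bit chunking method for one byte (8 bits).
# byte_to_hex is a hypothetical helper name, for illustration only.
HEX_DIGITS = "0123456789ABCDEF"


def byte_to_hex(bits: str) -> str:
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    high, low = bits[:4], bits[4:]     # the "binary point" goes in the middle
    return HEX_DIGITS[int(high, 2)] + HEX_DIGITS[int(low, 2)]


n = 0b10101100                        # 172 in decimal
print(n >> 4, n & 0b1111)             # 10 12  (quotient and remainder)
print(divmod(n, 16))                  # (10, 12), the same split by division
print(byte_to_hex("01010100"))        # 54
print(byte_to_hex("10101100"))        # AC
```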
Notice that the only arithmetic we have to do is converting each chunk of four bits to the equivalent hex digit. The mental arithmetic involved is limited: we know that (1) we are adding one-digit numbers, (2) at most four of them, and (3) the sum will always be less than 16.
This technique works in reverse, too. Let's convert DEADBEEF from hexadecimal to binary:
$????????_2 = \mathrm{DEADBEEF}_{16}$
$1101.1110.1010.1101.1011.1110.1110.1111_2 = \mathrm{DEADBEEF}_{16}$
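Going from hex to binary is a digit-by-digit lookup, since each hex digit becomes exactly four bits. A minimal Python sketch (the name `hex_to_binary` is hypothetical) that reproduces the DEADBEEF example:

```python
# Sketch: convert a hex numeral to binary, one digit -> four bits at a time.
# hex_to_binary is a hypothetical helper name, for illustration only.


def hex_to_binary(numeral: str) -> str:
    # int(digit, 16) gives the digit's value (0-15);
    # "04b" writes it as exactly four bits, with leading zeros.
    chunks = [format(int(digit, 16), "04b") for digit in numeral]
    return ".".join(chunks)   # periods mark the four-bit chunks, as in the text


print(hex_to_binary("DEADBEEF"))   # 1101.1110.1010.1101.1011.1110.1110.1111
```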
Now, to be honest, it may seem ridiculous to convert a large hex digit like D to binary in your head. First, you don't have to do it in your head; a few quick calculations on paper will suffice. Second, if you can manage to remember (memorize) just three values, you can get the rest by mental arithmetic. Remember:
$\mathrm{A} = 10_{10} = 8+2 = 1010_2$
$\mathrm{C} = 12_{10} = 8+4 = 1100_2$
$\mathrm{F} = 15_{10} = 8+4+2+1 = 1111_2$
F is the easiest to remember, because it's the largest four-bit number, so all the bits have to be one. So, F is just 1111, with no arithmetic at all.
Then, if we have to convert B, just remember that B is one more than A, so write down the bits for A (1010), but add one and write down 1011.
If we have to convert D, just remember that D is one more than C, and so we write down 1101.
If we have to convert E, just remember that E is one less than F, and so we write down 1110.
We already know that every color in a computer is a combination of some amount of each of the three primary colors: red, green and blue. The amounts are always given in the same order: red, green, blue. The amounts are numbers from 0 to $255_{10}$, or, in hexadecimal, 00 to $\mathrm{FF}_{16}$. Each primary is expressed as a two-digit numeral in hexadecimal, using a leading zero if necessary so that the numeral is always two digits. Three pairs of hexadecimal digits completely specify a color. Finally, the notation for a color always starts with a pound sign (#). For example, the color (35, 230, 10) is written #23E60A, since $35_{10} = 23_{16}$, $230_{10} = \mathrm{E6}_{16}$, and $10_{10} = \mathrm{0A}_{16}$.
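The #RRGGBB rule (two hex digits per primary, in red-green-blue order, with leading zeros) can be sketched in Python. The helper names `rgb_to_hex` and `hex_to_rgb` are hypothetical; the point is the `02X` format, which pads 10 to `0A` rather than `A`.

```python
# Sketch: build and parse the #RRGGBB notation.
# rgb_to_hex and hex_to_rgb are hypothetical helper names, for illustration only.


def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Format an (R, G, B) triple as #RRGGBB, two uppercase hex digits per primary."""
    for amount in (red, green, blue):
        if not 0 <= amount <= 255:
            raise ValueError("each amount must be between 0 and 255")
    # "02X" pads with a leading zero, so 10 becomes "0A" rather than "A".
    return f"#{red:02X}{green:02X}{blue:02X}"


def hex_to_rgb(color: str) -> tuple[int, int, int]:
    """Split #RRGGBB into three two-digit hex numerals and convert each to decimal."""
    if len(color) != 7 or not color.startswith("#"):
        raise ValueError("expected a string of the form #RRGGBB")
    return (int(color[1:3], 16), int(color[3:5], 16), int(color[5:7], 16))


print(rgb_to_hex(35, 230, 10))   # #23E60A
print(hex_to_rgb("#23E60A"))     # (35, 230, 10)
```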
Now you can construct the colors above (gold, cornflower and DodgerBlue). Furthermore, you can understand the middle column of the table at the top. Let's go back to that table and observe the following:
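- FF is greater than 80; in fact, a component of 80 is roughly half of FF, which is roughly half the brightness of that primary. For example, red (#FF0000) is preceded by maroon (#800000), the same hue at about half the brightness.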
Here is a more complete color name list.
Using a web page you created previously (or this example web page), experiment with defining a color numerically. Use the SPAN tag to color some text. If you can't think of a color to try, try Chocolate. The syntax is:
<span style="color: #RRGGBB"> text </span>
That's it! It takes some practice to get the hang of computing the hexadecimal numerals, but nothing you haven't done before.
Now that we know how to represent a color, we can represent images. You can think of an image as a rectangular 2D grid of spots of pure color, each represented as RRGGBB. A spot of pure color is called a pixel, short for picture element, the atom of a picture. Pixels are easier to see if you enlarge an image several times; here are some examples. Check out the following description of pixels on page 44.
Every image on the computer monitor is represented with pixels, including the windows themselves! Such images are saved in files that, in addition to the image data, contain information on the size of the image, the set of colors used, the origin of the image, etc. Depending on how exactly this information is saved, we refer to them as image formats. GIF, JPG, PNG, QT, and BMP are some of the well-known image formats. We will talk more about image formats below. For now, we will focus on the number of pixels and the representation of each pixel, and consequently, the file size of the image.
We said above that the amount of each primary color is a number from 0 to $255_{10}$ or 00 to $\mathrm{FF}_{16}$. It is no coincidence that this is exactly one byte (8 bits). A byte is a convenient chunk of computer memory, so one byte was devoted to representing the amount of a single primary color. Thus, it takes 3 bytes (24 bits) to represent a single spot of pure color.
Aside: with 256 values for each primary, that yields 256 x 256 x 256 = 16,777,216 colors. Humans can distinguish roughly 10 million colors, so 24-bit color can represent more colors than humans can distinguish. All modern monitors use this so-called 24-bit color. Some old monitors used 16-bit or 8-bit color, which was relatively impoverished: it could represent only 65,536 colors (on a 16-bit monitor) or 256 colors (on an 8-bit monitor). Of course, a black-and-white monitor can only represent two colors, which could be called 1-bit color.
Since each pixel takes 24 bits (3 bytes) to represent, even a small picture can require a surprising amount of space.
Example: A good monitor might have 100 pixels to the inch, so a picture the size of a 3x5 index card would be 300 pixels by 500 pixels. That's a total of 300x500=150,000 pixels. Since each pixel takes 3 bytes, the file size for the image is at least
300 x 500 x 3 = 450,000 bytes
This is about 450 kilobytes (abbreviated kB: the "k" is lowercase, but the "B" is uppercase; see the note on abbreviations), or nearly half a megabyte. Not only is that a lot of storage space, but, more importantly, it takes significant time to download unless your connection is very fast. For example, if you have an old-style telephone modem that can only handle 56 kbps (56k bits per second) = 7 kBps (7k bytes per second), since 1 byte = 8 bits, you will need 450 kB / 7 kBps ≈ 64 seconds, a little over 1 minute, to download it. That's a lot of time.
Telephone modems? Yes, some people still use telephone modems. But faster DSL modems (ranging from 128 kbps to 1500 kbps) and cable modems (ranging from 300 kbps to 6000 kbps) have become very popular.
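The size and download-time arithmetic can be sketched in Python. This assumes 24-bit color (3 bytes per pixel) and decimal units (1 kB = 1000 bytes, 1 kbps = 1000 bits per second), as in the text; the function names are hypothetical, and real downloads also carry protocol overhead that this ignores.

```python
# Sketch of the uncompressed-size and download-time arithmetic.
# Assumes 3 bytes per pixel and decimal units; function names are hypothetical.


def uncompressed_size_bytes(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> int:
    """Bytes needed to store every pixel's full RGB value."""
    return width_px * height_px * bytes_per_pixel


def download_seconds(size_bytes: int, link_kbps: float) -> float:
    """Time to transfer size_bytes over a link of link_kbps kilobits per second."""
    bits = size_bytes * 8          # 1 byte = 8 bits
    return bits / (link_kbps * 1000)


size = uncompressed_size_bytes(300, 500)
print(size)                                   # 450000
print(round(download_seconds(size, 56), 1))   # 64.3
```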
However, the advent of faster connection speeds has been accompanied by the rise of websites with content (higher-resolution photos, songs, videos) that completely consumes the additional bandwidth. So no one ever has enough network bandwidth, and it's wise to avoid squandering it. If someone in your audience finds your website slow to download, they'll move on to another website.
On the first day of class, Scott took approximately 30 digital pictures of students, each of which was about 2MB.
Short of making our images smaller (fewer pixels), what can we do to speed up the downloads? We can compress the files.
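There are two classes of compression techniques: lossless compression, which lets the original image be reconstructed exactly, and lossy compression, which throws away some information to make the file smaller still.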
We will look in detail at one kind of lossless compression, which is indexed color (GIF encoding), because it gives us a window into the kinds of ideas and techniques that matter in designing representations of information.
The idea behind indexed color is that if a particular color is used many times in an image, we can create a "shorthand" for it. In fact, if we limit the number of colors, each one can be assigned a shorthand. What will be confusing is that the colors are, of course, represented as numerals and so are the shorthands! For example, instead of saying (for the umpteenth time), color #D619E0, we'll just say, for example, color number 5. This will only work, however, if the shorthands really are shorter. They are, and we'll see exactly how much.
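One way to think about indexed color is that we are creating a "paint-by-numbers" picture. We choose a numbered list of colors, and then we label each pixel with the number of its color rather than with the color itself.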
Example (see this earlier example): Imagine that the 300x500 picture uses only two colors, say red and yellow. Suppose we make up a table of colors (two entries) and then represent the image with an array of "color indexes," like a paint-by-numbers set.
- What is the numbered list of colors? There are just two:
  | index | color |
  |---|---|
  | 0 | #FF0000 |
  | 1 | #FFFF00 |
- We then paint the picture using just two numbers, 0 and 1. A zero means a pixel is red, and a one means the pixel is yellow.
- How many bits does it take to represent this image? Well, there are 300x500 or 150,000 pixels, but each one is just 1 bit, so it takes 150,000 bits or 150,000/8 = 18,750 bytes, or about 18.75 kB. Compare that with the 450 kB in our earlier example, and you can see this is much smaller. In fact, it's 1/24th the size, since each pixel takes 1 bit to represent rather than 24, so it will be roughly 24 times faster to download.
- What about that table of colors? That's called the color palette, by analogy with an artist's palette. That has to be represented too. Otherwise, the browser would know there were only two colors in the picture, but wouldn't know what colors they are. There are two entries in this palette, each of which is 3 bytes (24 bits), so add at least 6 more bytes to the representation.
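The paint-by-numbers scheme can be sketched in Python. This only illustrates the idea of a palette plus a grid of indexes; it is not the actual GIF file layout, and `index_image` and the tiny sample image are hypothetical. Looking each index back up in the palette recovers the original pixels exactly, which is why indexed color is lossless as long as the image fits in the palette.

```python
# Sketch of indexed ("paint-by-numbers") color: a palette of distinct colors
# plus one palette index per pixel. Illustrative only; not the GIF file layout.
# index_image is a hypothetical name.


def index_image(pixels):
    palette = []    # "#RRGGBB" strings, in order of first appearance
    lookup = {}     # color -> its index in the palette
    indexed = []
    for row in pixels:
        indexed_row = []
        for color in row:
            if color not in lookup:
                lookup[color] = len(palette)
                palette.append(color)
            indexed_row.append(lookup[color])
        indexed.append(indexed_row)
    return palette, indexed


image = [
    ["#FF0000", "#FF0000", "#FFFF00"],
    ["#FFFF00", "#FF0000", "#FFFF00"],
]
palette, indexed = index_image(image)
print(palette)   # ['#FF0000', '#FFFF00']
print(indexed)   # [[0, 0, 1], [1, 0, 1]]

# Decoding: replace each index by its palette entry to get the original pixels back.
assert [[palette[i] for i in row] for row in indexed] == image
```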
You can see the general scheme at work: we create a table of all the colors used in the picture. The shorthand for a color is simply its index in the table. We will limit the table so that the shorthands will be at most 8 bits. Since the shorthands are all replacing 24-bit color specifications, the shorthand is at most one-third the size. In the example above, the shorthand is 1/24th the size.
Let's continue with the example. What is the file size if the image uses 4 colors, say red, yellow, blue and lime? In that case, the table looks like this:
| index | color |
|---|---|
| 00 | #FF0000 |
| 01 | #FFFF00 |
| 10 | #0000FF |
| 11 | #00FF00 |
As you can see, the shorthand is now two bits instead of one. Therefore, the 150,000 pixels require 300,000 bits or 300,000/8 = 37,500 bytes or about 37.5 kB. Obviously, this is about twice the size of the previous example, since each shorthand is now twice as big. Nevertheless, it's still much smaller than the 450 kB uncompressed file.
What about the size of the palette? That's now twice as big, too. Four entries at 3 bytes each adds 12 bytes to the file size, which is a negligible increase to the 37.5 kB.
What's the pattern here? The number of colors in the original image determines the size of the palette, which determines the number of bits in each shorthand, which then determines the size of the file as a whole. The shorthand for a color is simply the binary numeral for the row that the color is in the table. For example, the color red in the last example was in row zero (00 in binary) and the color lime was in row 3 (11 in binary). However, the relationship between the number of colors and the size of the shorthand is not an obvious one. Let's do one more example before we state the rule.
Suppose that the same 300x500 image uses 16 colors, say sixteen of the named colors that we began this lecture with. In that case, the table looks like this:
Indexes, color names, and hexadecimal values
| shorthand | color name | #RRGGBB |
|---|---|---|
| 0000 | black | #000000 |
| 0001 | gray | #808080 |
| 0010 | silver | #C0C0C0 |
| 0011 | white | #FFFFFF |
| 0100 | maroon | #800000 |
| 0101 | red | #FF0000 |
| 0110 | olive | #808000 |
| 0111 | yellow | #FFFF00 |
| 1000 | green | #008000 |
| 1001 | lime | #00FF00 |
| 1010 | teal | #008080 |
| 1011 | aqua | #00FFFF |
| 1100 | navy | #000080 |
| 1101 | blue | #0000FF |
| 1110 | purple | #800080 |
| 1111 | fuchsia | #FF00FF |
As you can see, the shorthand is now four bits. Therefore, the 150,000 pixels require 600,000 bits or 600,000/8=75,000 bytes or about 75 kB. Larger, but still much smaller than the 450 kB uncompressed file.
What about the size of the palette? Sixteen entries at 3 bytes each adds 48 bytes to the file size.
You can see that the number of bits required for each pixel is the key quantity. This quantity is called bits per pixel, or "bpp." It's also often called "bit depth," and the size of the pixel data, in bits, is just width times height times bit depth, almost as if the image were a physical 3D box.
Finally, we can state the rule:
The bit depth of an image must be large enough that the table has a row for every color. If the bit depth is $d$, the number of rows in the table is $2^d$.
Here's the exact relationship, along with the size of a 300x500 image:
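Mapping bit depth to number of colors
| bit depth | max colors | file size of 300x500 image (pixel data only) |
|---|---|---|
| 1 | 2 | 18.75 kB |
| 2 | 4 | 37.5 kB |
| 3 | 8 | 56.25 kB |
| 4 | 16 | 75 kB |
| 5 | 32 | 93.75 kB |
| 6 | 64 | 112.5 kB |
| 7 | 128 | 131.25 kB |
| 8 | 256 | 150 kB |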
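The rule and the table can be regenerated with a short Python sketch: the bit depth is the smallest $d$ with $2^d$ at least the number of colors. Two assumptions are labeled in the comments: a one-color image is given 1 bit per pixel, and sizes count pixel data only (no palette or GIF overhead) with 1 kB = 1000 bytes. The function names are hypothetical.

```python
# Sketch of the bit-depth rule: smallest d such that 2**d >= num_colors.
# Assumption: even a one-color image gets 1 bit per pixel.
# Sizes count pixel data only (no palette or GIF overhead), with 1 kB = 1000 bytes.


def bit_depth(num_colors: int) -> int:
    if num_colors < 1:
        raise ValueError("an image needs at least one color")
    d = 1
    while 2 ** d < num_colors:
        d += 1
    return d


def pixel_data_kb(width: int, height: int, depth: int) -> float:
    return width * height * depth / 8 / 1000


for depth in range(1, 9):
    print(depth, 2 ** depth, pixel_data_kb(300, 500, depth))

# The step behavior: one extra color can cost a whole extra bit per pixel, or nothing.
print(bit_depth(16), bit_depth(17), bit_depth(32))   # 4 5 5
```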
Consider an image that is 80 x 100.
In summary, you can reduce your image file size by using fewer colors. Of course, this may reduce the quality of your image. It's a tradeoff.
The GIF file format (i.e., image representation) is the best known example of indexed color format. Here is how it works: Imagine a mural painter who will go to your house and paint a mural on your wall, anything you want. But there's a catch: she'll only make one trip to your house, and her van only holds 256 cans of paint. She has a warehouse of 16 million cans of paint, and you can choose any 256 that you want, but you can't have a mural with more than 256 different colors in it.
This is the essential idea behind GIF images and indexed color.
We've learned how indexed color works and how it affects file size. This is important not only for the theoretical understanding of why representations matter, but also for the practical usefulness of understanding how to reduce the sizes of your images. In this section, we'll review how to compute the approximate size of an indexed-color (GIF) image. Why do we do this? Because it combines all the conceptual issues into one small calculation.
Note that there are many additional details affecting the size of GIF images that we will not cover here. One detail is a certain amount of fixed overhead for representing information like the file type (so that the computer can tell it's a GIF image and not a JPEG, PNG or even a DOC file), and a few bytes to store the dimensions of the image (its width and height and the number of colors). Another detail is that the bits representing the pixels can be compressed further (in a lossless way) using a standard bit-compression algorithm. And GIF images support transparency and animations as well. For more information see this summary and this detailed explanation of GIF file format. We will not be concerned with these details. We will focus on the relationship between the file size and the dimensions of the image, including the number of colors.
A key concept in the computation is the bit depth of the image. Read the definition of bit depth on page 19. It's the number of bits necessary to represent the desired number of colors. Remember that the number of colors is at most $2^d$, where $d$ is the bit depth. It's an exponential relationship: adding just one bit to the bit depth doubles the number of colors you can have.
Recall that the GIF representation comes in two parts: the pixels, each stored as a color index that is bit-depth bits long, and the color palette, which stores 3 bytes for each color.
Thus, our computation breaks down into two parts. The pixel data, in bytes, is:
width * height * bit-depth / 8
and the palette, in bytes, is:
num_colors * 3
To find the rough size of an image, we first determine the bit-depth, then we compute the file size using the two formulas above. (This is the rough size because, remember, we are omitting some fixed overhead and further compression techniques.) You can combine them into one formula:
(width * height * bit-depth) / 8 + (3 * num_colors)
Finally, because the file size will usually be large (thousands or millions of bytes), we divide by 1000 or 1,000,000 to convert to kilobytes or megabytes, as appropriate.
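The combined formula can be sketched as one Python function. Like the text, it ignores GIF's fixed overhead and its further compression, so it is a rough estimate only. The name `gif_size_estimate` is hypothetical, and rounding the pixel data up to whole bytes is an added assumption. Divide the result by 1000 for kilobytes.

```python
# Sketch of the combined formula:
#   (width * height * bit-depth) / 8 + (3 * num_colors)
# Rough estimate only: ignores GIF's fixed overhead and extra compression.
# gif_size_estimate is a hypothetical name; rounding up to whole bytes is an assumption.


def gif_size_estimate(width: int, height: int, num_colors: int) -> int:
    if num_colors < 1:
        raise ValueError("an image needs at least one color")
    depth = max(1, (num_colors - 1).bit_length())    # smallest d with 2**d >= num_colors
    pixel_bytes = (width * height * depth + 7) // 8   # bits -> bytes, rounded up
    palette_bytes = 3 * num_colors                    # 3 bytes (RRGGBB) per palette entry
    return pixel_bytes + palette_bytes


print(gif_size_estimate(300, 500, 2))    # 18756
print(gif_size_estimate(300, 500, 16))   # 75048
print(gif_size_estimate(300, 500, 17))   # 93801
```

The last two lines show the stepwise behavior: going from 16 to 17 colors adds a whole bit per pixel, so the estimate jumps by almost 19 kB.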
This can be confusing at first, so you're encouraged to read through this series of examples using images of flags. Then try the following exercise.
Find the file size for the following images:
Most image manipulation programs will tell you the file size in whatever format you're working with (GIF, JPEG, PNG, BMP, TIFF, ...), and many will also estimate the download times for various network speeds. Thus, in practice there is no reason to compute these by hand. Nevertheless, in this course, we expect you to understand this computation and be able to do it.
Why would we do this? The main reason is intuition. Why does adding one more color sometimes matter a lot and sometimes hardly at all? This formula explains why. The relationship it captures between the number of colors and the bit depth is logarithmic, and it goes in discrete steps rather than along a smooth curve: going from 16 to 17 colors raises the bit depth from 4 to 5, but going from 17 to 32 colors doesn't change it at all. This may not be something you've encountered very often in your life, and so, frankly, it may seem a bit odd right now. If you just rely on computer programs to do the arithmetic for you, the results will continue to surprise you. It's worth acquiring some intuition about this, so that you'll be more confident and effective when you're manipulating images.
Furthermore, many other relationships in computer science have this kind of stepwise logarithmic (or stepwise exponential) behavior, and so intuition about this kind of relationship is a good foundation for further exploration in the field.
(The following sections will probably not be covered in class, due to time constraints, but feel free to bring questions.)
GIF is popular, but it's not the only image format. A big limitation of GIF is that it can have a maximum of only 256 colors. Images of real-world objects have thousands of subtle shades of colors: reducing the number of colors to 256 would make them look cartoon-like.
For a real-world image, we would probably choose the JPEG image format. JPEG is a lossy compression technique. The details are complicated, but you can look at examples (say in the Wikipedia article, above) and see that blocks of similar colors are replaced with something like their average color. This reduces the number of colors, which is then compressed by a scheme somewhat like the indexed-color idea.
Another popular and important format is PNG.
| PNG | GIF | JPEG |
|---|---|---|
| (27,639 bytes) | w/ 2 colors (738 bytes) | at 5% quality (918 bytes) |
Consider the comparison above, in which the same simple image is represented in three different formats. The JPEG image looks bad, and yet it's actually bigger than the GIF. This example illustrates a situation in which GIF does better than JPEG. Of course, there are situations in which JPEG does better than GIF. Finally, although PNG is bigger than either, it can be edited by Fireworks in vector mode, not bitmap mode, which means that the red cross is an object you can manipulate, not just a collection of bits.
No one format is best.
The following topics are for the intellectually curious student.
You've gone to a lot of work to learn hexadecimal, when in many cases you can just look up the color you want, or use decimal in CSS. Is there any other reason to have any familiarity with hexadecimal? There are lots, but here's one that is quite practical: URLs.
We've told you that you can't use certain characters in a URL. For example you can't use spaces, ampersands, slashes, question marks, and many others. What if, for some reason, you really want to? You're linking to something with a particular name, and you can't (or don't want to) rename it. Here are some examples:
How does this all work?
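One standard mechanism, and a place where hexadecimal shows up directly, is percent-encoding: a character that isn't allowed in a URL is replaced by `%` followed by the two hex digits of its character code, so a space (code 32, or $20_{16}$) becomes `%20`. A minimal sketch using Python's standard `urllib.parse` module; the sample file names are made up for illustration.

```python
from urllib.parse import quote, unquote

# A space has character code 32, which is 20 in hexadecimal.
print(f"{ord(' '):02X}")               # 20

# quote() percent-encodes characters that are unsafe in a URL;
# by default it leaves "/" alone.
print(quote("my file.html"))           # my%20file.html

# With safe="", slashes are encoded too: & -> %26, / -> %2F, ? -> %3F.
print(quote("a b&c/d?", safe=""))      # a%20b%26c%2Fd%3F

# unquote() reverses the process.
print(unquote("my%20file.html"))       # my file.html
```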