The internet is not secure. Sending an email is like sending a postcard: anyone can read it along the way. The same is true of web traffic: when your browser sends a URL, it's readable by anyone along the path, and when a server sends the page back, it's readable also.
So what? What do we care if someone can read a web page? We don't. But what about sending your credit card number to a server, like Amazon.com? For that, you can use HTTPS.
The way this works is by cryptography, which we'll learn more about in a few minutes. Here is an outline of the mechanism:
Thanks to the secret code, any information that you send to the web site, such as your credit card number, can't be gotten by anyone along the path.
Very secure. Serious thought by lots of smart people has gone into these secret codes. It would take tremendous resources to crack the codes, and almost no one short of government agencies has those resources.
On the other hand, the web server could be hacked into. None of the things we are talking about make the web server machine any more secure. For that matter, nothing here makes your personal machine any more secure. In fact, a famous security guy once joked:
[It's] like using an armored truck to deliver cash from someone who lives on a park bench to someone who lives in a cardboard box.
To our knowledge, the only thefts of credit card numbers on the web have been from people hacking into web sites. No one has yet cracked the secure communications.
Moral: If you're worried, read the security information on the web site. Ideally, the personal info (credit card numbers, social security numbers, whatever) shouldn't be stored at all. Failing that, it should be stored on a separate computer.
All the preceding discussion of "trunks" and so forth was rather abstract. We'll now make that much more concrete by talking about encryption.
Let's talk about how machines can communicate in a secure way. A little while ago, we talked about analog and digital signals, and this lecture will remind you of that one, but we're talking about a higher level: not just how information gets from point A to point B, but how it can get there without anyone in between knowing what is being said.
Lots of us had fun with "secret codes" and such when we were kids. We'd get "secret decoder rings" in our cereal boxes and make up messages to send to each other. Could be useful if they were intercepted by the teacher! We hope that today's lecture evokes that sense of fun.
When we discuss encryption, it's traditional to name the people involved, just so we can keep things straight. The cast of our little drama is:
The story is that Alice is sending Bob a message through some insecure medium (not whispering in his ear): think postcard or radio or the Internet. Alice might be a field agent and Bob is her contact at HQ, or vice versa. Alice and Bob might not even know each other: perhaps Alice is trying to persuade Bob to defect, and Eve is trying to find traitors.
The messages might be sent by radio, or phone, or email, or carrier pigeon. All media are potentially insecure. Radio waves can be picked up by other receivers, phone wires can be tapped, as can computer networks. Even carrier pigeons can be captured. In this discussion, we will assume that Eve can hear or read everything that Alice and Bob say to each other.
One approach for Alice and Bob to talk privately (without Eve understanding what they say) is to put the message in code. Humans have invented lots of codes in history, and code-making and code-breaking have often been pivotal. Indeed, Alan Turing and his code-breakers in England may have done more to win WWII against Hitler than General Patton, because they were able to break the "Enigma" machine used by the Germans. Similarly, the Navajo Indians, who used their native language as an unbreakable code in the war in the Pacific, contributed greatly to the war effort without ever firing a shot.
Terminology:
Caesar codes are so named because Julius Caesar used them to communicate with his troops, so they're hardly new. They're also not particularly hard to break, but if you have to write your message on a postcard, it's better than nothing. The idea is letter substitution. A popular and easy way of doing it is "rotating": each letter is replaced by one that is "n" letters before or after it in the alphabet. For example, we replace each letter with the one preceding it in the alphabet (and "A" is replaced with "Z").
Plaintext: IBM
Ciphertext: HAL
(You may remember that the computer in Arthur C. Clarke's classic science fiction novel 2001: A Space Odyssey was the HAL 9000. Many people believe this was a thinly veiled reference to IBM, using a "-1" rotation code.)
Caesar actually used a rotation of 3, so A was written as D, B as E, and so forth. Thus, "ATTACK" would be sent as "DWWDFN".
Another popular Caesar code is called "rot13" and has been used for many years on the Internet as a way to hide things from inadvertent reading, such as movie spoilers or dirty jokes. You have to "decode" it using "rot13" to read the spoiler or the dirty joke. Rot13 is just a Caesar code incrementing each letter by 13. This is convenient because you can decode using the same software as encoding, since rotating twice by 13 brings a letter back to where it started.
If you want to encrypt using a Caesar code "by hand," you can use the following form. Choose an amount to rotate by, then "write" your message using the letter from the cipher alphabet instead of the one from the plain alphabet -- in other words, the one below the one you really mean. For example, setting "rotation" to 1, you can see "B" below "A" and "C" below "B" and so forth. Setting it to 25, you can see "H" below "I" and "A" below "B" and "L" below "M," giving "HAL" for "IBM."
Use the form above to translate some simple words into code. Translate the first two using Caesar's code (rotate by 3) and the last by rot13.
| plaintext | ciphertext |
|---|---|
| BY SEA | |
| BY LAND | |
| ATTACK | |
You can check your answers and try out other Caesar codes using the following form. Note that this form converts the string to uppercase. Also, it only rotates the letters; it leaves the numbers, punctuation and other characters alone. It could be easily re-written to handle more stuff.
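For readers who would rather see the idea as a program, here is a minimal Python sketch of what a form like this might do. It is an illustration, not the code behind the form: the function name `caesar` and the choice of Python are assumptions. Like the form, it converts the message to uppercase and rotates only the letters.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def caesar(message, rotation):
    """Rotate each letter of `message` by `rotation` places (hypothetical helper)."""
    result = []
    for ch in message.upper():
        if ch in ALPHABET:
            # Wrap around the end of the alphabet with % 26.
            shifted = (ALPHABET.index(ch) + rotation) % 26
            result.append(ALPHABET[shifted])
        else:
            # Digits, spaces and punctuation pass through unchanged.
            result.append(ch)
    return "".join(result)


print(caesar("ATTACK", 3))                  # Caesar's own rotation of 3
print(caesar("IBM", 25))                    # rotating by 25 is the same as rotating by -1
print(caesar(caesar("Spoiler!", 13), 13))   # rot13 applied twice restores the uppercased text
```

Decoding is just rotating the other way: `caesar(ciphertext, -3)` undoes a rotation of 3.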
So, now you and a friend can exchange secret messages, say by email, just by encrypting them using this form. You just have to agree on a rotation amount (and keep it secret!).
Earlier, we ignored punctuation and anything other than a letter. Obviously, that makes for a bad code. What we really should do is treat all characters the same. How is this done? First, we have to say what a character is.
As you know, everything in a computer is just numbers. How does it deal with characters? Essentially, early computer designers agreed on a set of characters and a standard numbering for each one. For example, A is 65. One such system was ASCII: American Standard Code for Information Interchange. Here's part of the ASCII code:
You read this table just like an addition table. Add together the row and column headers, and you find the numerical value (ASCII code) of the character at the intersection of that row and column.
Note: The ASCII code is not about encryption; it's just a standard for numbering characters. The existence of such a numbering means that we can do rotation codes numerically, like this:
y = (x+amount)%128;
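As a rough sketch, the formula above can be applied to a whole message in Python. The function name `rotate_ascii` is made up for this example, and the sketch assumes the message contains only plain ASCII characters (codes 0 to 127).

```python
def rotate_ascii(text, amount):
    """Apply y = (x + amount) % 128 to the ASCII code of every character.

    Assumes every character in `text` has a code below 128.
    """
    return "".join(chr((ord(ch) + amount) % 128) for ch in text)


secret = rotate_ascii("By sea, at 9!", 5)
print(repr(secret))               # repr() because some results may be unprintable characters
print(rotate_ascii(secret, -5))   # rotating back by the same amount recovers the message
```

Now spaces, digits and punctuation are disguised along with the letters. The price is that some ciphertext characters are control characters, which is why the example prints the ciphertext with `repr`.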
ASCII is now being supplanted by Unicode, which is a vastly larger code, designed to handle all the world's languages.
Caesar ciphers are relatively easy to break, as is any cipher based just on substitution of letters. For example, the most common letter in the ciphertext is probably the encrypted form of a very common letter, such as "e" or "t" or "a." With more sophisticated statistics, and trial and error, both of which computers are good at, it's fairly easy to crack a substitution cipher.
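Here is a small Python sketch of the simplest version of that attack on a Caesar cipher. It assumes that the most common letter in the ciphertext stands for "E," which will often be wrong for short messages. The names `caesar` and `guess_rotation` are invented for this example.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def caesar(message, rotation):
    """Rotate letters by `rotation`; leave everything else alone."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + rotation) % 26] if ch in ALPHABET else ch
        for ch in message.upper()
    )


def guess_rotation(ciphertext, assumed_plain="E"):
    """Guess the rotation, assuming the most common ciphertext letter is `assumed_plain`.

    Assumes the ciphertext contains at least one letter.
    """
    letters = [ch for ch in ciphertext.upper() if ch in ALPHABET]
    most_common = Counter(letters).most_common(1)[0][0]
    return (ALPHABET.index(most_common) - ALPHABET.index(assumed_plain)) % 26


# Alice encrypts; Eve sees only `intercepted` and does not know the rotation.
intercepted = caesar("MEET ME NEAR THE GREEN TREE AT SEVEN", 3)
rotation = guess_rotation(intercepted)
print(rotation)
print(caesar(intercepted, -rotation))
```

If the first guess produces gibberish, Eve can try "T" or "A" as the assumed letter, or simply try all 26 rotations and read the results.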
One way to spoil the statistics of a Caesar cipher is to use multiple Caesar ciphers. In other words, suppose we rotate the first letter of the plaintext by 3, the second letter by 1, and the third letter by 20. Then we repeat, rotating by 3, 1, 20, until we're done with the plaintext. Such a technique is called a "Vigenere Cipher," named for its inventor, Blaise de Vigenere from the court of Henry III of France in the sixteenth century. It was considered unbreakable for some 300 years!
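Another way to think of the Vigenere Cipher is as follows. Write down a "keyword," such as "cat." Then, use the position of each letter in the alphabet as the amount to rotate. For example, counting from A=1, "c=3," "a=1" and "t=20." (The worked example below counts from A=0 instead, so that a keyword letter of "A" leaves the plaintext letter unchanged; there, "c=2," "a=0" and "t=19.") Write down the keyword above the plaintext and use it to select the correct rotation. It helps to have a table of all the rotations: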
Here's the idea. To encrypt the message:
ATTACK AT DAWN
using the keyword "CAT," we write the keyword above the message repeatedly:
keyword    CATCATCATCATCA
plaintext  ATTACK AT DAWN
ciphertext
Now, below, we compute each letter of the ciphertext by looking at the intersection of the row of the keyword and the column of the plaintext. (In fact, because of the way the table is set up, it doesn't matter whether the keyword character is the row or the column.) In our example:
keyword    CATCATCATCATCA
plaintext  ATTACK AT DAWN
ciphertext CTMCCD AM DTYN
Notice:
Use the table above to encrypt "ATTACK" using the Vigenere cipher and the keyword "CODE". Notice how the two A's get different substitutions, and so do the two T's.
Because a Vigenere cipher uses more than one substitution alphabet, it's one of a bunch of ciphers known as polyalphabetic.
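The table lookup can also be done by a program. The following Python sketch is one way to write it, under two assumptions read off the "ATTACK AT DAWN" example: a keyword letter of "A" means a rotation of 0, and the keyword is written above every character, spaces included. The function name `vigenere` is made up, and the keyword is assumed to contain only letters.

```python
import string

ALPHABET = string.ascii_uppercase


def vigenere(plaintext, keyword):
    """Encrypt `plaintext` with a repeating `keyword` (letters only)."""
    key = keyword.upper()
    result = []
    for position, ch in enumerate(plaintext.upper()):
        # The keyword advances with every character, even spaces,
        # just as when it is written above the message by hand.
        key_letter = key[position % len(key)]
        if ch in ALPHABET:
            rotation = ALPHABET.index(key_letter)   # A=0, B=1, ..., Z=25
            result.append(ALPHABET[(ALPHABET.index(ch) + rotation) % 26])
        else:
            result.append(ch)
    return "".join(result)


print(vigenere("ATTACK AT DAWN", "CAT"))   # CTMCCD AM DTYN, as in the example above
```

Many implementations skip spaces without using up a keyword letter; that is an equally valid convention, but it gives a different ciphertext from the one worked out above.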
The Germans in WWII used an encryption scheme based on a polyalphabetic cipher. They built machines to do the encryption and named them "Enigma" machines. They considered the encryption by the Enigma machine unbreakable, and relied on it to communicate with their U-boats in the North Atlantic. The Enigma machine looked like a manual typewriter with keys and hammers, but internally the keys were attached to rotors that did the substitutions by "re-wiring": mapping the keys to different hammers. There were three (later four) rotors that did the substitutions, and after each letter was typed, the rotors turned like a car odometer (the rightmost fastest, and so forth).
The British managed to steal one of the machines and figure out how it worked. (This was the story of the movie U-571, except that in the movie the Americans stole the machine, but in real life the British were the heroes.) Unfortunately, they still needed the keyword (the settings of the rotors). A group of mathematicians led by Alan Turing (and including quite a few women) were able to analyze the transmissions and, eventually, crack the code. This was a major turning point in the war, and the Allies went to enormous effort and sacrifice to conceal the fact that they had cracked the code.
Here are some links about the Enigma machine:
Now that we understand something about encryption, let's go back to our initial scenario and see how it works. Alice wants to send a message to Bob, without Eve or anyone else being able to read the message. She encrypts it using the secret key, and sends it to Bob. Even if Eve intercepts the message, she won't be able to read it. Bob uses the secret key to decrypt the message and read what Alice says.
Suppose that Alice is behind enemy lines and Bob is back at home base. She gets to a radio transmitter, gets out her secret code book, encrypts her message using today's key, and sends it to Bob. Bob has to get out the matching code book to decrypt the message. If Eve is able to capture the code book, disaster!
If Eve captures the code book, Alice needs to send a new secret key to Bob. She needs to send it securely, so that Eve can't read it. Thus, we're back at square one: Alice needs to send a key securely to Bob, so that she can send a message securely to Bob. Recall the secret codes we used earlier: we said you could send email messages in code just by agreeing on a rotation amount. But if you had to send the rotation amount by email, you're stuck!
For literally thousands of years, this was the essential paradox of encryption: you had to have a secure way of communicating in order to have a secure way of communicating. A real-life chicken-and-egg problem!
The kernel of the paradox is that all of these encryption methods are kinds of private key systems: they require a shared private key.
In 1976, Whitfield Diffie and Martin Hellman invented public key encryption. The main ideas:
How does this help? Very simple: Bob knows that Alice wants to send him a message, so he creates a pair of keys. He advertises his public key all over the world (maybe he puts it on his web page). Alice sees it, and so does Eve. Alice downloads it and uses it to encrypt her message. She radios it to Bob, and Eve intercepts it, but Eve can't decrypt the message--only Bob can. No one but Bob can decrypt the message, because only Bob has the secret key.
Remember the "trunks" we talked about earlier? Here's an elaboration:
Diffie and Hellman were brilliant for thinking this up, and they also came up with a way to do it. However, I'll describe a different way, due to Ronald Rivest, Adi Shamir, and Leonard Adleman. The solution, called the RSA method, involves advanced number theory that makes my head hurt, but the idea is very cool.
The last fact makes the RSA method difficult to "crack." For their contribution, the members of the RSA team received the 2002 Turing Award, which is considered the Nobel equivalent for Computer Science.
What does "hard" mean? Consider factoring a number. The simplest method is to try all the possible factors to see if they divide evenly into the number. (The best known methods are cleverer than that, but they still take impractically long on numbers of the size RSA uses.) Suppose the number is 100 digits long. How many numbers would you have to try? All the primes up to the square root of the 100-digit number, and the square root will have 50 digits. So, you'd have to try about this many candidate factors:
1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
Yes, but computers are fast, right? Suppose we have a fast computer that can try a billion factors every second. Then we only have to wait for
1,000,000,000,000,000,000,000,000,000,000,000,000,000,000
seconds. Since there are about 32 million seconds in a year, this will only take
31,000,000,000,000,000,000,000,000,000,000,000
years.
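To get a feel for why this is slow, here is a small Python sketch of the "try all the possible factors" approach, which counts how many divisions it needs. The function name `smallest_factor` is made up, and for simplicity the sketch tries every whole number up to the square root rather than only the primes.

```python
import math


def smallest_factor(n):
    """Return (factor, divisions_tried) by trying 2, 3, 4, ... up to sqrt(n)."""
    tries = 0
    for candidate in range(2, math.isqrt(n) + 1):
        tries += 1
        if n % candidate == 0:
            return candidate, tries
    return n, tries   # nothing divided evenly, so n itself is prime


p, q = 10007, 10009        # two small primes
n = p * q                  # multiplying them takes one step...
print(n)
print(smallest_factor(n))  # ...but finding p again takes about ten thousand divisions
```

Each extra digit in the two primes multiplies the work by about ten, which is how a 100-digit number ends up needing the astronomical number of tries computed above.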
The centerpiece of the RSA method uses numbers that are the products of two very large prime numbers, forcing Eve to buy very fast computers and wait a very long time to crack the encryption.
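For the curious, here is a toy Python sketch of the textbook form of RSA with two tiny primes. It is only meant to show that one key undoes what the other does; the arithmetic is not spelled out in these notes, and the numbers 61, 53 and 17 and the helper name `apply_key` are chosen just for illustration. Real keys use primes hundreds of digits long, plus extra safeguards that are left out here.

```python
# Bob picks two primes and keeps them secret.
p, q = 61, 53
n = p * q                    # 3233: published as part of both keys
phi = (p - 1) * (q - 1)      # 3120: easy to compute only if you know p and q

e = 17                       # public exponent: shares no factor with phi
d = pow(e, -1, phi)          # private exponent: e * d leaves remainder 1 when divided by phi
                             # (pow with exponent -1 needs Python 3.8 or later)

public_key = (e, n)          # Bob puts this on his web page
private_key = (d, n)         # Bob never shows this to anyone


def apply_key(number, key):
    """Raise `number` to the key's exponent, keeping only the remainder mod n."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


message = 65                                   # the ASCII code for "A"
ciphertext = apply_key(message, public_key)    # Alice encrypts with Bob's public key
recovered = apply_key(ciphertext, private_key) # only Bob's private key undoes it
print(ciphertext, recovered)
```

Eve knows `e` and `n`, but to compute `d` she needs `phi`, and to get `phi` she has to factor `n` into `p` and `q`.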
Suppose Alice wants to send Bob the message:
Call off the attack, it's a trap! Signed, Alice
She encrypts her message with Bob's public key and radios it to him. Meanwhile, Eve sends Bob the message:
Go on with the attack, it's all clear! Signed, Alice
She also encrypts the message with Bob's public key and radios it to him. She's pretending to be Alice! What is Bob to think?
There's a cool aspect to public key encryption that we haven't mentioned. We said that one key decrypts what the other encrypts. In fact, either key can encrypt, and the two keys are opposites: whatever one key encrypts, the other can decrypt.
So, here's what Alice does: she encrypts her message with her own private key. Bob gets it and successfully decrypts it with Alice's public key. (She has a web site with her public key listed.) He then realizes that only Alice could have sent this message, since only her private key can create a message that her public key can decrypt.
Thus, public key encryption can give us digital signatures. The purpose of a regular, real-life signature is that, presumably, only you can sign your name the way you do. By comparison to a known signature on file, your bank or any other interested party can verify that something has been signed by you. We'll see in a moment how this is part of HTTPS.
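Here is a toy Python sketch of Alice signing her message, using textbook RSA with tiny numbers. The numbers (61, 53, 17), the function names `sign` and `read_signed`, and the character-by-character approach are all assumptions made for illustration. Real signature schemes sign a short digest of the message and use far larger keys.

```python
# Alice's toy key pair (tiny textbook numbers, not secure).
n = 61 * 53                  # 3233, public
e = 17                       # Alice's public exponent
d = pow(e, -1, 60 * 52)      # Alice's private exponent (needs Python 3.8 or later)


def sign(message, private_exponent):
    """Encrypt each character's code with a private exponent."""
    return [pow(ord(ch), private_exponent, n) for ch in message]


def read_signed(signed, public_exponent):
    """Decrypt each number with the matching public exponent."""
    return "".join(chr(pow(number, public_exponent, n)) for number in signed)


real_text = "Call off the attack, it's a trap! Signed, Alice"
fake_text = "Go on with the attack, it's all clear! Signed, Alice"

genuine = sign(real_text, d)     # Alice uses her real private key
forged = sign(fake_text, 1234)   # Eve does not know d, so she has to guess

print(read_signed(genuine, e))               # Bob reads Alice's message
print(read_signed(forged, e) == fake_text)   # False: Eve's forgery decrypts to gibberish
```

Because only Alice's private key produces something that her public key turns back into readable text, Bob can tell the two messages apart.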
When you access a secure web server, your browser and the server become Alice and Bob. They want to communicate securely, but they don't know each other and they didn't make prior arrangements to have a secret key. Instead, they use the idea of public key encryption:
When the little "padlock" icon closes, your browser has established this cryptographically secure connection to the server, and so your credit card numbers and other private information are safe from the Eves on all the network connections between you and the server.
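The general shape of that exchange can be sketched in a few lines of Python. This is a simplified toy, not the actual HTTPS protocol: it uses textbook RSA with tiny numbers for the public key step, and the rotation code y = (x+amount)%128 from earlier stands in for the private key cipher. All the names and numbers are assumptions made for illustration.

```python
import secrets

# Server's toy RSA key pair (tiny textbook numbers, not secure).
n, e = 61 * 53, 17               # public: anyone may see these
d = pow(e, -1, 60 * 52)          # private: stays on the server (needs Python 3.8+)


def rotate_ascii(text, amount):
    """Stand-in private key cipher: rotate every ASCII code by `amount`."""
    return "".join(chr((ord(ch) + amount) % 128) for ch in text)


# Browser: invent a random secret key and lock it with the server's public key.
session_key = secrets.randbelow(127) + 1    # a rotation amount from 1 to 127
locked_key = pow(session_key, e, n)         # this is what travels past Eve

# Server: unlock the secret key with its private key.
unlocked_key = pow(locked_key, d, n)

# Both sides now share a secret key that was never sent in the clear.
on_the_wire = rotate_ascii("card 1234-5678", session_key)
print(rotate_ascii(on_the_wire, -unlocked_key))
```

The public key step is used once, to agree on the secret; everything after that uses the shared secret key.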
Q. Why do you suppose the browser and server use public-key encryption in order to set up private-key encryption? Why not just use public key all the time?
Is your information safe? Not entirely. Maybe someone can hack into the server. Maybe someone at the destination isn't trustworthy. Maybe the server is a complete fraud, put up by Eve. Maybe someone was looking over your shoulder when you typed. There is probably no such thing as perfect security. We all have to decide how much effort to put into security.
Encryption isn't just for web traffic. It could also be used for email and for telephone messages, including cell phones. Right now, anyone with a radio scanner and a little know-how can listen to cell phone conversations. Similarly, tapping a phone is very easy. Encryption lets us use any of these insecure media to send a private message.
Intercepting traffic doesn't work, thanks to encryption, but here's something that does work. It's called "spoofing," and it basically means to impersonate a web site. Spoofing takes some hacking ability, but that's common enough.
Let's make some (illegal) money:
What can be done about it? How do we deal with possible impersonation in real life? We have ID cards.
A certificate is like an ID card for the web. As we saw, when your browser wants to make a secure connection to a server (such as Amazon.com), it requests a "trunk" from the server (actually, a public key).
How does your browser know that the trunk isn't bogus? (If you're going to impersonate Amazon.com's server, you might as well supply a bogus trunk).
The answer is that the trunk is labeled with the owner's name and signed by some authority, just as a driver's license might have the state governor's signature on it. Of course, the signature is computerized (digital). This is done using exactly the digital signatures that we talked about above.
The signed, identified trunk is called a certificate.
How does your browser know that the signature on the certificate is valid? Can't the spoofer just put a bogus signature on the certificate? No: a signature can only come from a valid signing authority, called a certificate authority, or CA.
A certificate authority is a company that makes money verifying identity and signing trunks. For example, one is called "Verisign." The IT people at Amazon.com go to Verisign (or some other CA), prove that they are valid representatives of Amazon.com, and present their trunk for signature. Verisign signs the trunk, and Amazon.com is now ready to set up a secure server.
We now have:
When viewing a secure site, you can double-click on the padlock icon to view the certificate of that site.
Do the following:
The signing authority doesn't guarantee that the company is an honest, upstanding pillar of the community, any more than a driver's license does. It just guarantees that the owner is who it claims to be.
What if the certificate is signed by an unknown authority? Then you get to engage in the following interesting dialog. To see this dialog, visit
Certificates expire every so often, just like driver's licenses and for the same reasons.
From the W3C Security FAQ, some more information about security certificates and even more information about security certificates