How the Internet Works Part II
In part 1 of this post I wrote about some of the history behind the Internet as well as basic descriptions of networks and the Internet. In this post, we’ll take a deep dive into how the Internet actually works. We’ll use a lot of the terminology from the last post, so if you haven’t read it yet, I suggest you do so now. This post will focus on two main aspects of internetworking - how data is sent across the Internet, and where it is sent. In order to do that, I will first describe packet switching.
Packet Switching and TCP
First let’s talk about what data is transmitted over the Internet. This can be anything from an hour-long 4K video to a one-page text document. These files have very different lengths. A movie may be as large as 100GB and the text file may be only a few KB (for translation, 100GB is 100,000,000KB). Now imagine instead of file sizes, these are weights in pounds. Would you rather have an army of disciples carrying a bunch of small items, or just yourself carrying one giant item? If you’re lazy like me, probably the former. The designers of the Internet thought in roughly the same way, and thus they decided to break up files into smaller, more manageable portions that could be sent from one computer to another and reassembled at their destination. These portions are called packets, the approach is called packet switching, and it is the backbone of how data is sent over the Internet. With packet switching, packets may be dropped for many reasons. The network may be congested, or a node between the source and destination may go down, though as I will describe, this isn’t a very big deal.
This method enabled many of the important functions and protocols which were later built on top of it. Imagine, for example, that you wanted to send a picture of yourself across the Internet to your best friend. This picture is first broken up into a bunch of packets, let’s say 10 of them. Each packet then takes a route to your friend. The route for each packet is not necessarily the same, so the packets may arrive out of order. Luckily, each packet is labeled with a sequential number, so your friend is able to reorder them upon arrival and piece them together into the original photograph. This numbering and reordering is the job of the Transmission Control Protocol, or TCP. What happens if one of the packets is missing for some reason? Your friend merely asks you to resend the missing packet! That is one of the benefits of packet switching. If information is lost in transit, the whole message need not be resent, only the small missing packet. How, though, does one find a route to another computer?
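To make the picture example concrete, here is a small Python sketch of the idea. It is only a toy under my own assumptions: the function names are made up for illustration, the "photo" is just 100 bytes, and real TCP is considerably more involved than this.

```python
import random


def split_into_packets(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]


def missing_sequence_numbers(received: dict[int, bytes], total: int) -> list[int]:
    """List the sequence numbers that never arrived."""
    return [seq for seq in range(total) if seq not in received]


def reassemble(received: dict[int, bytes]) -> bytes:
    """Join the chunks in sequence order, whatever order they arrived in."""
    return b"".join(received[seq] for seq in sorted(received))


photo = bytes(range(100))                 # stand-in for the picture
packets = split_into_packets(photo, 10)   # 10 packets of 10 bytes each

in_transit = packets.copy()
random.shuffle(in_transit)                # packets may arrive out of order
lost = in_transit.pop()                   # pretend one packet was dropped

received = dict(in_transit)
missing = missing_sequence_numbers(received, len(packets))
for seq in missing:                       # ask the sender for just that packet
    received[seq] = packets[seq][1]

print(reassemble(received) == photo)
```

Notice that only the one lost packet is sent again, not the whole photo.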
One key piece of networking hardware is a router. A router is a device which forwards packets from your local network (LAN) to another network. It is the one node on each network which connects to the routers on other networks. When data is sent from one network to another, the packets containing that data flow through many routers between your router and the destination’s router. Routers come in many different sizes. There are small routers for home use, which you’re most likely using right now, and larger routers for companies. Your Internet Service Provider (ISP) also owns some very large routers that direct traffic around the globe. These routers connect different ISP networks to each other. So, your LAN has a router which connects it to other LANs on your ISP’s network, and your ISP has routers which connect its network to the networks of other ISPs, thus giving you a connection to the entire Internet.
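Here is a rough Python sketch of that chain of routers. Everything in it is hypothetical: the router names, the "friend-lan" network, and the idea of a simple lookup table per router are all my own simplifications, and real routers work with IP address ranges rather than names.

```python
DELIVER = "deliver-locally"  # marker: the destination network is attached here

# Hypothetical forwarding tables: for each router, the neighbor that should
# receive a packet headed for a given destination network.
FORWARDING_TABLES = {
    "home-router":   {"friend-lan": "isp-a-router"},
    "isp-a-router":  {"friend-lan": "isp-b-router"},
    "isp-b-router":  {"friend-lan": "friend-router"},
    "friend-router": {"friend-lan": DELIVER},
}


def trace_route(start, destination, tables, max_hops=16):
    """Follow next-hop entries from `start` until the packet is delivered."""
    path = [start]
    while True:
        next_hop = tables[path[-1]].get(destination)
        if next_hop is None:
            raise LookupError(f"{path[-1]} has no route to {destination}")
        if next_hop == DELIVER:
            return path
        if len(path) >= max_hops:
            raise RuntimeError("too many hops, giving up")
        path.append(next_hop)


route = trace_route("home-router", "friend-lan", FORWARDING_TABLES)
print(" -> ".join(route))
```

No single router knows the whole route. Each one only knows which neighbor to hand the packet to next.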
IP and DNS
Computers on the Internet are all labeled with an address called an Internet Protocol Address, or IP Address. In IPv4, this address is a unique 32-bit number, which means there are about 4.3 billion (2^32) possible addresses. An IPv4 address is written as four numbers between 0 and 255 separated by dots, such as 192.0.2.1. Believe it or not, we’ve actually run out of IPv4 addresses (yes, all 4 billion) and have had to begin rolling out a new standard called IPv6. In IPv6, addresses are 128 bits, so there are 2^128 (roughly 3.4 × 10^38) possible addresses. To put this in perspective, by rough estimates we could assign every grain of sand on the planet one IP address each for every millisecond of the entire life of the universe, and we would just barely use them all up.
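If you want to see that a dotted address really is just one 32-bit number, here is a short Python sketch. The helper name dotted_to_int is my own, and 192.0.2.1 is an address set aside for documentation, not a real machine.

```python
import ipaddress


def dotted_to_int(dotted: str) -> int:
    """Pack the four dotted numbers into one 32-bit integer."""
    parts = dotted.split(".")
    if len(parts) != 4:
        raise ValueError("an IPv4 address has exactly four numbers")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"{part} is not between 0 and 255")
        value = (value << 8) | octet  # shift left one byte, add the next number
    return value


example = "192.0.2.1"
number = dotted_to_int(example)
print(number)                         # 3221225985
print(ipaddress.IPv4Address(number))  # the standard library converts it back

print(2 ** 32)                        # number of possible IPv4 addresses
print(f"{2 ** 128:.1e}")              # number of possible IPv6 addresses
```

Each of the four numbers fills one byte (8 bits), and four bytes make up the 32 bits.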
When one computer wants to talk to another computer on the Internet, its packets pass through a series of routers, each handing them on toward the network to which the requested IP address is assigned, until they arrive. Getting a packet from one computer to its destination this way is a process called routing. Before any routing can happen, though, your computer usually needs to turn a name like “google.com” into an IP address, and that is the job of the Domain Name System, or DNS. I’ll work with an example to explain this one.
Imagine you’re sending a message from your computer with IP address 126.96.36.199 to a website “google.com”. First, your browser contacts a DNS server to resolve the domain name (www.google.com) into the IP Address of the computer on which google.com is running - 188.8.131.52 (p.s. you can verify this is the correct IP address by typing it into your browser address window and seeing where it takes you). The DNS server you contact may not know what “google.com” is though, and in that case, it will ask other DNS servers until one of them knows the IP Address of the resource you’re trying to locate. Once your browser knows the IP Address of the website you’re attempting to query, your message is broken up into packets, each with a sequential packet number, destination, source and other information attached, and the packets are sent through your router and several others until they are received by “google.com”, processed, and a response is returned to you. But what is “google.com”?
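The “ask someone else if you don’t know” part of DNS can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how real DNS software is written: the DnsServer class, the server names, and the address for example.com are all made up (192.0.2.x addresses are reserved for documentation), and the real system has more levels and rules than this.

```python
class DnsServer:
    """A toy DNS server that asks an upstream server when it has no answer."""

    def __init__(self, name, records, upstream=None):
        self.name = name
        self.records = dict(records)  # domain name -> IP address
        self.upstream = upstream      # who to ask when we don't know

    def resolve(self, domain):
        if domain in self.records:
            return self.records[domain]
        if self.upstream is None:
            raise LookupError(f"{domain} not found")
        address = self.upstream.resolve(domain)
        self.records[domain] = address  # remember the answer for next time
        return address


# Hypothetical chain: your home server asks your ISP's, which asks a bigger one.
big_server = DnsServer("big", {"example.com": "192.0.2.10"})
isp_server = DnsServer("isp", {}, upstream=big_server)
home_server = DnsServer("home", {}, upstream=isp_server)

print(home_server.resolve("example.com"))
```

After the first lookup, the home server has stored the answer, so the next request for the same name does not need to travel up the chain again.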
Web Servers and HTTP
“Google.com” is a server. All that means is that it is a computer on the Internet to which other computers can connect directly. The Internet has servers for everything: mail servers, proxy servers, and, more relevant to our discussion, web servers. A web server is just a server which knows how to speak the language the web uses, called HTTP. I already talked about how to find and send data to servers on the Internet, but what does that server do with that information, and what does it send back? I will continue our example using Google.
The story left off with finding the IP Address of google.com and sending some data. What you’re actually doing is sending an HTTP “GET” request to the server. This request says to Google’s server that you’re looking for the homepage of that website (often a file called index.html). Below is a sample GET request. Don’t worry if you don’t understand it; I’m going to explain it line by line.
authority:www.google.com
method:GET
path:/
scheme:https
accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
accept-encoding:gzip, deflate, sdch, br
accept-language:en-US,en;q=0.8
cache-control:max-age=0
cookie:
upgrade-insecure-requests:1
user-agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
The first line, “authority:www.google.com”, just says to whom the request is being sent. The next line tells the server how we are asking for data. In this case, we are asking for a file by using a GET request. If we were trying to send something to Google instead of receiving something, this method would instead be POST. The path field tells the server what file we want. “/” refers to the root directory of the server, usually its homepage. This homepage is actually just a file on the server’s hard drive, not unlike a text document on your hard drive. The scheme is either HTTP or, in our case, HTTPS. HTTPS is a secure version of HTTP, the protocol which defines the GET request. The next four lines aren’t as important; they mostly say what sort of format files are expected to be in. The next line, “cookie:” (only blank because I deleted its entry), transfers cookies between the server and client. I’m not going to talk in depth about cookies, but you can read about them here. The last line, believe it or not, tells the server which web browser I’m using. In this example, it is Chrome for Linux. The reason this string is so long and contains the names of 4 different web browsers is pretty funny and can be found here.
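For the curious, here is a Python sketch that builds a bare-bones GET request as plain text. It uses the older HTTP/1.1 text layout, which is my choice for illustration and not the exact form shown in the sample above. In that layout the method and path share the first line, and a Host header plays the role of the authority field. The function name build_get_request is made up.

```python
from typing import Optional


def build_get_request(host: str, path: str = "/",
                      headers: Optional[dict] = None) -> bytes:
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, path, and protocol version
        f"Host: {host}",         # which site we are asking for
    ]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Lines end with \r\n, and one blank line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request(
    "www.google.com",
    "/",
    {"Accept": "text/html", "Accept-Language": "en-US,en;q=0.8"},
)
print(request.decode("ascii"))
```

This only builds the text of the request and does not send anything. Sending it over HTTPS would also require an encrypted connection, which is beyond this sketch.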
Once Google finds the index.html file, it breaks it up into packets and sends it to you. After putting it back together, your browser renders this file on your screen and you see Google’s homepage! This is how nearly all requests work on the web, and is similar to some other Internet protocols as well. That pretty much sums up this post. I may continue this series by going more in depth into some of the protocols later on, so check back!