James's TCP/IP FAQ - Understanding Port Numbers

This is the second page of a three-part FAQ on TCP/IP basics. I recommend reading the parts in order.

Introduction

In my introduction to IP networking, I described how IP addresses are used to route packets of information from one computer to another through a TCP/IP network. I talked about the importance of both source and destination addresses, how we find addresses for other machines, and how we get our own addresses assigned to our machines. I started off with an analogy to sending regular mail through the postal service.

Sticking with that analogy, how do we ensure that the right person reads the mail we send? That mail may need to go to a particular person or department. With mail, we don't just include the P.O. Box number or street address of our destination - we also include the name of the person or department that should open the mail.

Computers need something similar to make sure the correct software application on the destination computer gets the data packets we are sending, and some way to make sure replies get routed to the correct application on the computer at our end of the conversation. This is accomplished through the use of "port numbers".

If you followed along with the discussion on IP routing with addresses, port numbers will be very easy to understand. We talked about how "IP" is a networking protocol and requires addresses. On top of that networking protocol, we make use of "transport" protocols to direct packets to specific software applications. The most common transport protocol on the Internet is the Transmission Control Protocol (or TCP). We also sometimes see something called the User Datagram Protocol (UDP) at the transport layer. Both are transport protocols, and both use port numbers (we'll talk about the differences between the two a little later on).

Basically, just as every packet needs a source and destination IP address, every packet must also include a source and destination port number. There are two types of port numbers to consider - port numbers used by server software, and port numbers used by client software.
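
To make the "two port numbers in every packet" idea concrete, here is a minimal sketch in Python. It relies on the fact that both TCP and UDP headers begin with a 16-bit source port followed by a 16-bit destination port. The function name parse_ports and the sample header bytes are made up for illustration; this is not how any particular networking stack is written.

```python
import struct

def parse_ports(transport_header):
    """Return (source_port, destination_port) from the start of a TCP or UDP header."""
    # The first four bytes hold two 16-bit numbers in network (big-endian)
    # byte order: the source port, then the destination port.
    source_port, destination_port = struct.unpack("!HH", transport_header[:4])
    return source_port, destination_port

# Hypothetical header: a client on port 49152 talking to a web server on port 80.
# The remaining header bytes are zero-filled because we do not look at them here.
example_header = struct.pack("!HH", 49152, 80) + b"\x00" * 16
print(parse_ports(example_header))  # (49152, 80)
```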

Server Ports

Let's define what a "server" really is. It's not a particular kind of computer. It doesn't have to run on Unix, or Windows NT, or any other specific operating system. A server is simply a computer that is running a software package that "listens" for incoming network connection requests. For example, a web server is a computer running web server software. Once that software is started on the computer, it sits idle until it receives a network request for its services. Then it kicks into gear.
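
Here is a rough sketch of what "listening" looks like in code, using Python's standard socket module. The names serve_once and demo and the greeting text are invented for this example, and the server answers only one connection so the sketch stays short. It also binds to port 0, which asks the operating system for any free port; a real service would bind to its agreed-upon port instead.

```python
import socket
import threading

def serve_once(listener):
    """Sit idle until one client connects, send a greeting, then hang up."""
    connection, client_address = listener.accept()  # blocks until a request arrives
    with connection:
        connection.sendall(b"hello from the server\n")

def demo():
    """Start a tiny server on this machine, connect to it, and return its reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen()
    server_port = listener.getsockname()[1]

    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()

    # The client side: initiate a connection and read until the server closes it.
    chunks = []
    with socket.create_connection(("127.0.0.1", server_port)) as client:
        while True:
            data = client.recv(1024)
            if not data:
                break
            chunks.append(data)

    worker.join()
    listener.close()
    return b"".join(chunks)

if __name__ == "__main__":
    print(demo())
```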

Now it's confusing enough trying to figure out the IP address for a server - we have to make use of DNS servers to help with that. Luckily, we don't have to try to remember or figure out the port numbers for most common kinds of servers. That's because there are a number of "commonly used port numbers" which have been agreed upon as a standard. For example, a web server typically listens at TCP port 80. You don't have to put that port number on the URL bar of your browser - your web browser assumes that it is sending information to port 80 on the target machine. It is possible to override that behavior and make a connection to a different port number, if for some reason the web server was listening on a different port - but the vast majority of the time it's port 80.
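
As an illustration of that default, the sketch below uses Python's urllib.parse to work out which port a URL points at. The helper name effective_port, the DEFAULT_PORTS table, and the example.com addresses are assumptions made for this example; a real browser does considerably more than this.

```python
from urllib.parse import urlsplit

# Only the default mentioned in the text; other schemes are left out on purpose.
DEFAULT_PORTS = {"http": 80}

def effective_port(url):
    """Return the port given in the URL, or the scheme's default if none is given."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # the URL overrides the default
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://www.example.com/index.html"))       # 80
print(effective_port("http://www.example.com:8080/index.html"))  # 8080
```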

Other "normal" services all have their standard ports. FTP servers listen on port 21; machines that allow telnet connections listen on port 23. There are also standard port numbers for mail (both POP3 and SMTP are mail services), and many database servers have standard port numbers as well. The cool part is that your "client" software which you use to access those servers will already "know" which port numbers to use as the destination port.

Port numbers have to be in the range of 0-65535. However, standardized server ports are almost always in the range 0-1023. There are some exceptions, especially as more and more types of new servers are being developed.
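
The ranges above are easy to express as a small check. The upper limit of 65535 comes from the port field being 16 bits wide (2**16 - 1). The function name classify_port and the labels it returns are my own wording for this sketch, not official terminology.

```python
MAX_PORT = 2**16 - 1  # 65535, the largest value a 16-bit port field can hold

def classify_port(port):
    """Say which of the ranges described in the text a port number falls into."""
    if not 0 <= port <= MAX_PORT:
        raise ValueError(f"port must be between 0 and {MAX_PORT}, got {port}")
    if port <= 1023:
        return "standardized server range"
    return "client range"

print(classify_port(80))     # standardized server range
print(classify_port(49152))  # client range
```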

Client Ports

Client software is not nearly so well-behaved when it comes to port numbers. Basically, any client can use any port number at or above 1024. As we shall see shortly, a single client application may use many different port numbers, even within a single session. When a client application needs to contact a server, it begins by asking the TCP/IP networking stack on the local operating system for an unused port number. The client may ask for a specific port number, or it may simply let the stack assign any free one.
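
The "just give me any free port" request can be sketched in a few lines of Python: binding a socket to port 0 tells the operating system to pick an unused port on our behalf. The function name pick_client_port is hypothetical, and the number you get back will differ from run to run and from one operating system to another.

```python
import socket

def pick_client_port():
    """Ask the operating system for an unused port number and report which one it chose."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))   # port 0 means "assign me any free port"
        return sock.getsockname()[1]  # the port the OS actually assigned

if __name__ == "__main__":
    print(pick_client_port())
```

Normally a client does not even need the explicit bind; connecting to a server triggers the same assignment automatically.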

Frequently, a client will need lots of port numbers. Take the humble web browser, for example. You open the browser and point to a single web page. If you are looking at a web page that consists of a single HTML text file (like this one), your browser will use a single client-side port number to open the connection to the web server.

However, as soon as you click a link on the site to go to a different page, your browser will pick a new port number for that next session. And if you are pointing to a web page with frames, or with some number of embedded graphics, or Java applets, or whatever - your browser will open a separate network session for each and every one of those pieces. Each HTML file and each graphic is a separate file being downloaded to your PC. Every one of those downloads gets a separate session; each session requires a separate client-side port number.

Important - Every "session" on a TCP/IP network has four pieces of information that define it as a unique conversation:

1. Source IP address
2. Source port number
3. Destination IP address
4. Destination port number

In order for two packets to be considered part of the same "session" all four of the above items have to match. If any one of those items is different, the two packets are part of different sessions. Once a client application "finishes" using a port number and a particular session is closed, the port number is reserved for a short period of time, and then is returned to the "pool" of available port numbers.
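
To see this matching rule in action, the sketch below opens two connections from the same machine to the same listening port and compares the source and destination address and port of each. Everything runs on the local loopback address, and the names session_id and two_sessions are invented for this example.

```python
import socket

def session_id(sock):
    """Return the four values that identify a connected TCP socket's session."""
    local_ip, local_port = sock.getsockname()
    remote_ip, remote_port = sock.getpeername()
    return (local_ip, local_port, remote_ip, remote_port)

def two_sessions():
    """Open two connections to one local listener and return both session identifiers."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen()
    server_address = listener.getsockname()

    first = socket.create_connection(server_address)
    second = socket.create_connection(server_address)
    try:
        return session_id(first), session_id(second)
    finally:
        first.close()
        second.close()
        listener.close()

if __name__ == "__main__":
    a, b = two_sessions()
    print(a)
    print(b)
```

The two results share the same addresses and the same server port; only the client-side port differs, and that one difference is enough to make them separate sessions.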

This scheme, while confusing, has some strong advantages. For example, imagine a web page with a single HTML text file and 5 graphics. Downloading that "page" requires 6 TCP/IP sessions, and thus uses 6 client-side port numbers. Assume for a moment that a temporary problem on the Internet causes one of the graphics to fail to come in cleanly. Since these are all separate sessions, the page can still be displayed in the browser using the pieces that arrived safely.

When A Client Is Really A Server...

In general, there is a clear line drawn between a client and a server. Server software sits and listens for connections; it does not initiate connections on its own. Server software has to be up and running all the time in order to be useful. Client software is used to access information on servers; it is used to initiate connections to servers. Client software only has to be running at the moment you want to use it.

But some of the newer applications available for use on the Internet are blurring the line between client and server. These are sometimes called "peer-to-peer" applications, and effectively operate as both client AND server. Examples include Napster (which allows for peer-to-peer file sharing) and just about any instant-messaging software you can name (AOL Instant Messenger, ICQ, etc.). These programs not only allow you to connect to someone else, but they also "listen" for incoming connections just like a server.

There are also an increasing number of games that can be operated in a "server" mode for multiplayer gaming. You can be a client connecting to someone else's game, or you can run your own game server allowing others to connect to you. For "normal" Internet connections, these distinctions may seem irrelevant, but when you begin looking at sharing an Internet connection they will assume more importance.

TCP or UDP?

I mentioned earlier that there are a couple of different transport layer protocols commonly used that both make use of port numbers. You will not normally need to be concerned about when to use TCP and when to use UDP - this will be defined by the application you're using. Your client or server software will automatically use whichever is most appropriate. However, there are a couple of rare cases where it might be useful to know the difference. This will mainly be with streaming media applications, like RealAudio, where you may be allowed to choose one or the other.

TCP is oriented towards reliability. It uses not only the port numbers, but also sequence numbers and frequent acknowledgment packets to ensure that packets arrive intact and in order. This is usually desirable, but it can create some overhead on the connection, especially if there are a lot of errors or latency on the network. By contrast, UDP is oriented towards raw throughput. For example, applications that can do all their work in a single packet don't need to worry about sequencing. DNS is a good example of a message that can go over the network in a single packet. Another place where UDP is appropriate is streaming audio and video. You don't WANT to drop a lot of packets, but neither do you want to constantly stop and ask for packets to be re-sent. Using UDP for this kind of connection allows your computer to ignore dropped packets and play back the stream, warts and all.
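
For comparison with a TCP connection, here is a minimal sketch of the "single packet" style of UDP in Python. There is no connection setup and no acknowledgment: the sender fires off one datagram and the receiver picks it up. The function name udp_round_trip is hypothetical, and the two-second timeout is an arbitrary choice so the example cannot hang if the datagram never arrives.

```python
import socket

def udp_round_trip(message):
    """Send one UDP datagram over loopback; return the data and the sender's port."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # SOCK_DGRAM selects UDP
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    receiver.settimeout(2.0)         # UDP gives no delivery guarantee, so do not wait forever

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No connect or handshake: the destination address and port go with the datagram.
        sender.sendto(message, receiver.getsockname())
        data, sender_address = receiver.recvfrom(2048)
        return data, sender_address[1]
    finally:
        sender.close()
        receiver.close()

if __name__ == "__main__":
    print(udp_round_trip(b"ping"))
```

Notice that the sender still ends up with a source port of its own, even though it never asked for one; UDP uses port numbers just as TCP does.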

So...our example was RealAudio - which would you choose? Under most circumstances, UDP would give you the best chance of listening to a live data stream. BUT - if you planned to "record" a copy of that data stream on your computer, TCP would let you guarantee that every packet is recorded for best quality.