A Crash Course in UNIX TCP/IP Socket Programming
John Selbie
CEN 4500
Spring 1997
Introduction
A "socket" is a loose term for "an end point for communication." The traditional Berkeley Socket API is a set of C function calls that support network communication. The Sockets API is not specific to TCP/IP, so developing TCP/IP network applications takes slightly more programming effort and understanding to account for the generic parameters of the library's function calls. Once understood, socket programming is about as easy as reading and writing disk files.
The material presented here is somewhat specific to C, UNIX, and TCP/IP. However, the general format of the Socket API has been ported to languages such as Java and Perl. In addition, some UNIX vendors support other protocols such as IPX, SNA, and DECnet with their socket libraries. The Microsoft Windows version of the Socket API ("WinSock") is also very similar. A software developer who gains a good understanding of UNIX/C sockets should be able to quickly understand implementations for other languages and operating systems.
Include Files
When writing C or C++ programs that use the socket library, include all of these header files:
UNIX:
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/time.h>
#include <stdlib.h>
#include <string.h>
Compiling and Linking
Under most versions of UNIX (Linux, BSD, SunOS, IRIX) compiling is done as usual:
gcc my_socket_program.c -o my_socket_program
However, Solaris requires the developer to explicitly link the socket and network services library with the program:
cc my_socket_program.c -o my_socket_program -lsocket -lnsl
The Solaris C compiler, usually located in /opt/SUNWspro/bin, is recommended over gcc.
Applications and TCP/IP
Programs written by a software developer may use either TCP or UDP for communicating with remote hosts on the Internet. Both are services that work on top of the IP network protocol. TCP is a reliable "streams" service that requires a connection establishment phase between a host that makes an active connection and a remote server host that accepts it passively. UDP is an unreliable datagram service and does not require any connection establishment before sending.
The general order of library calls for a UDP communication session is as follows:
socket()
bind()
sendto() and/or recvfrom()
close()
For TCP clients, the order of library calls is as follows:
socket()
bind() (optional; connect() assigns a local port if bind is skipped)
connect()
send() and/or recv()
close()
For TCP server programs, the order of library calls is as follows:
socket()
bind()
listen()
accept()
send() and/or recv()
close()
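The TCP call sequences above can be sketched in a single process by connecting to a listening socket on the loopback address. This is only an illustration under stated assumptions: the function name tcp_loopback_demo is made up for this example, getsockname (not otherwise covered here) is used to learn which port the system assigned, and the address structures it uses are described in the sections that follow. The client side skips bind and lets connect choose a source port.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical demo: run the server and client call sequences in one
   process over 127.0.0.1. Copies what the server side receives into
   out and returns the byte count, or -1 on any failure. */
int tcp_loopback_demo(const char *msg, char *out, int out_len)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int n = -1;
    int conn = -1;
    int srv = socket(AF_INET, SOCK_STREAM, 0);   /* server: socket() */
    int cli = socket(AF_INET, SOCK_STREAM, 0);   /* client: socket() */

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                    /* let the system pick a port */
    addr.sin_addr.s_addr = inet_addr("127.0.0.1");

    if (srv >= 0 && cli >= 0 &&
        bind(srv, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&    /* server: bind() */
        getsockname(srv, (struct sockaddr *)&addr, &len) == 0 &&     /* learn the port */
        listen(srv, 1) == 0 &&                                       /* server: listen() */
        connect(cli, (struct sockaddr *)&addr, sizeof(addr)) == 0 && /* client: connect() */
        (conn = accept(srv, NULL, NULL)) >= 0 &&                     /* server: accept() */
        send(cli, msg, strlen(msg) + 1, 0) >= 0) {                   /* client: send() */
        /* server: recv(); in general TCP may deliver fewer bytes per call */
        n = (int)recv(conn, out, out_len, 0);
    }

    if (conn >= 0) close(conn);                  /* close() on every socket */
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    return n;
}
```

A real server and client would run as separate programs; putting both in one function only works here because the system completes the connection for a listening socket before accept is called.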
Socket Address Structures
From an application programming point of view, the only differences between network protocols are the address schemes used. Otherwise, operations such as connect, send, receive, and disconnect are probably the only things a developer has to think about when designing a network application. For TCP/IP, an ideal API would be one that understood IP addresses and port numbers. Since the socket library is designed to be used for multiple protocols, addresses are referenced by a generic structure as follows:
struct sockaddr {
    unsigned short sa_family;
    char sa_data[14];
};
The sa_family field specifies the type of protocol. For TCP/IP, this field is always set to AF_INET. The remaining 14 bytes (sa_data) of this structure are always protocol dependent. For TCP/IP, IP addresses and port numbers are placed in this field. To facilitate operating with these fields, a specific type of socket address structure is used instead of the one above.
struct sockaddr_in {
    short sin_family;
    unsigned short sin_port;
    struct in_addr sin_addr;
    char sin_zero[8];
};
If it's not already apparent, these structures are compatible with each other. They are both 16 bytes in size, and the first two bytes of each structure are the family field. Thus, a pointer to a struct sockaddr_in can always be cast to a pointer to a struct sockaddr.
A sockaddr_in structure contains an in_addr structure as a member field. It has the following form:
struct in_addr {
    unsigned long s_addr;
};
Browsing the header file reveals that this really isn't the form of the structure. It's really a very complicated union designed to hold an IP address in a variety of ways. Regardless, the in_addr struct is exactly 4 bytes long, which is the same size as an IP address. In the sockaddr_in structure, the sin_port field is a 16-bit unsigned value used to represent a port number. It's important to remember that these fields always need to be set and interpreted in network byte order. For example:
struct sockaddr_in sin;
sin.sin_family = AF_INET;
sin.sin_port = htons(9999);
sin.sin_addr.s_addr = inet_addr("128.227.224.3");
In the above code example, the structure sin holds the IP address 128.227.224.3 and references the port number 9999. Two utility functions are used to set these values. The function htons returns the integer argument passed into it in network byte order. The function inet_addr converts the string argument from a dotted-quad into a 32-bit integer. Its return value is also in network byte order.
The structure above could be used to reference a host and application in which a datagram is to be delivered. The uses of the sockaddr_in structure will be covered in more detail below.
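To make the byte-order point concrete, here is a minimal sketch that fills a sockaddr_in and rejects strings that are not dotted-quads. The helper name fill_address is an assumption made for this example and is not part of the socket library.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill *sin for the given dotted-quad string and
   port (host byte order). Returns 0 on success, or -1 if the string
   cannot be interpreted as a dotted-quad. */
int fill_address(struct sockaddr_in *sin, const char *dotted_quad,
                 unsigned short port)
{
    in_addr_t ip = inet_addr(dotted_quad);   /* already network byte order */
    if (ip == (in_addr_t)-1)
        return -1;

    memset(sin, 0, sizeof(*sin));            /* also clears sin_zero */
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);             /* convert to network byte order */
    sin->sin_addr.s_addr = ip;
    return 0;
}
```

After fill_address(&sin, "128.227.224.3", 9999), the two bytes of sin.sin_port in memory are 0x27 then 0x0F (9999 is 0x270F), regardless of the byte order of the host.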
socket
The socket library call has the following prototype:
int socket(int family, int type, int protocol);
In short, this function creates "an end point for communication". The return value from this function is a handle to a socket. This number is passed as a parameter to almost all of the other library calls.
Since the focus of this document is on TCP/IP based sockets, the family parameter should be set to AF_INET. The type parameter can be either SOCK_STREAM (for TCP) or SOCK_DGRAM (for UDP). The protocol field is intended for specifying a specific protocol in case the network model supports different types of stream and datagram models. However, TCP/IP only has one protocol for each, so this field should always be set to 0.
Examples:
To create a UDP socket:
int s;
s = socket(AF_INET, SOCK_DGRAM, 0);
To create a TCP socket:
int s;
s = socket(AF_INET, SOCK_STREAM, 0);
bind
Before a socket can send and receive data, it must first be associated with a local source port and a network interface address. The mapping of a socket to a TCP/UDP source port and IP address is called a "binding".
It may be the case where the socket is being used as a server, and thus must be able to listen for client requests on a specific port. It can also be the case that a client program doesn't need a specific source port, since all it's concerned about doing is sending and receiving messages with a specific port on the remote host.
Further complications arise when there is more than one network device on the host running the program. So the question of sending through "which network" must be answered as well. The bind function call is used to declare the mapping between the socket, the TCP/UDP source port, and the network interface device.
The prototype for bind is as follows:
int bind(int socket, struct sockaddr *address, int address_length);
The first argument is a socket handle (the number returned from the socket function call). The second argument is a socket address structure. With TCP/IP, this is really a sockaddr_in structure. The sin_port field of the address argument is the local source port number associated with this socket. That is, for every "send" operation with this socket, the source port field in the TCP/UDP header gets set with this value. If specifying an exact source port is not required, setting this value to 0 allows the operating system to pick any available port number. The sin_addr field specifies which network interface device to use. Since most hosts have only one network interface and only one IP address, this field could be set to the host's own IP address. However, the socket library provides no immediate way for a host to determine its own IP address! Instead, specifying the value INADDR_ANY (0) in this field tells the operating system to use any available interface and address.
The address of the sockaddr_in structure is passed into the bind call, so that the socket will now be ready to communicate with remote hosts. The third parameter passed to bind is the length of the sockaddr_in structure.
Example:
struct sockaddr_in sin;
int s;
s = socket(AF_INET, SOCK_DGRAM, 0);
memset(&sin, 0, sizeof(sin));
sin.sin_family = AF_INET;
sin.sin_port = htons(9999);
sin.sin_addr.s_addr = htonl(INADDR_ANY);
if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
    perror("bind"); /* for example, port 9999 is already in use */
/* otherwise s is now a usable UDP socket. Source port is 9999 */
It is recommended that the return value from bind be checked. Bind will fail by returning -1 if the port that is being requested for use is already taken. Once bind succeeds on a UDP socket, the socket is ready to send and receive datagrams. For TCP sockets, the socket is now ready for the connect or accept calls.
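When the exact source port does not matter, a program can bind to port 0 and then ask which port it was given. The following is a minimal sketch, assuming the getsockname call (not otherwise covered in this document) is available; the helper name open_udp_any_port is made up for this example.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: create a UDP socket bound to a port chosen by
   the operating system. On success returns the socket handle and
   stores the chosen port (host byte order) in *port_out. Returns -1
   on failure. */
int open_udp_any_port(unsigned short *port_out)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(0);                  /* 0: any available port */
    sin.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */

    /* Check both calls; release the socket if either one fails. */
    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        getsockname(s, (struct sockaddr *)&sin, &len) < 0) {
        close(s);
        return -1;
    }
    *port_out = ntohs(sin.sin_port);          /* back to host byte order */
    return s;
}
```

The caller owns the returned handle and should pass it to close when finished.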
UDP Sockets
Once a UDP socket has been created and bound to a local source port, it can be used for sending and receiving datagrams. The functions for sending and receiving datagrams are sendto and recvfrom. Sendto has the following prototype:
int sendto(int socket, char *buffer, int length, int flags, struct sockaddr *destination_address, int address_size);
Here, socket is a UDP socket that has been created and bound to a source port, and buffer is a pointer to an array of bytes that are to be sent over the network. The length field specifies how long this array is. The flags field is normally 0.
The destination address is also a sockaddr structure. A pointer to a sockaddr_in structure can be cast for this argument. Use the sin_addr field to specify the destination IP address and sin_port for the destination port.
For example:
struct sockaddr_in sin;
sin.sin_family = AF_INET;
sin.sin_port = htons(12345); // htons for network byte order
sin.sin_addr.s_addr = inet_addr("128.227.22.43");
char *msg = "Hello, World";
sendto(s, msg, strlen(msg)+1, 0, (struct sockaddr *)&sin, sizeof(sin));
In the above example, s is assumed to be a created UDP socket that has already been bound to a local port. When sendto is called, a UDP datagram is sent to the host at 128.227.22.43. It's assumed there is a process with a socket bound to port 12345 waiting on a recvfrom call to receive the contents of the message being sent. The sendto function returns the number of bytes sent, or -1 if an error occurred. With UDP sockets, it's not usually necessary to check how many bytes were sent, because a datagram is sent whole or not at all: a successful call returns the value given in the length field.
Recvfrom has the following prototype:
int recvfrom(int socket, char *buffer, int length, int flags, struct sockaddr *sender_address, int *address_size);
Recvfrom is similar to sendto. Buffer is a pointer to a byte array that is to be filled with the contents of the datagram. The length argument specifies the maximum length to copy into buffer. This is to prevent buffer over-run errors in case the datagram is larger than expected. The flags field is normally 0. The sender_address argument is a pointer to a socket address structure that gets filled with a copy of the sender's IP address and source port. The address_size parameter must be initialized to the size of the sockaddr structure being used. On return it will hold the number of bytes that were copied into the sender_address structure.
Recvfrom returns the number of bytes copied into the byte array pointed to by buffer. If the buffer space specified in length is less than that of the original datagram, only length bytes will be copied into buffer, and the rest will be lost.
For example:
struct sockaddr_in sin;
char msg[10000];
int ret;
socklen_t sin_length; /* declared as int on older systems */
sin_length = sizeof(sin);
ret = recvfrom(s, msg, sizeof(msg), 0, (struct sockaddr *)&sin, &sin_length);
printf("%d bytes received from %s (port %d)\n",
       ret, inet_ntoa(sin.sin_addr), ntohs(sin.sin_port));
In the above example, recvfrom will wait until it receives a datagram on the local port associated with the socket s. The printf statement will list information regarding the size, source IP address, and source port of the datagram received.
For any open socket that has been successfully bound to a port, the application may call sendto and recvfrom using that socket as many times as it needs to.
Fragmentation is completely transparent to the applications that are sending and receiving datagrams.
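As a way to try sendto and recvfrom without a second machine, the sketch below binds one UDP socket on the loopback address and sends a datagram to itself. The function name udp_loopback_roundtrip and the use of getsockname to discover the assigned port are assumptions made for this example.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical demo: send msg (including its terminating NUL) from a
   UDP socket to its own address on 127.0.0.1, then receive it into
   out. Returns the number of bytes received, or -1 on failure. */
int udp_loopback_roundtrip(const char *msg, char *out, int out_len)
{
    struct sockaddr_in self, from;
    socklen_t self_len = sizeof(self);
    socklen_t from_len = sizeof(from);
    int n = -1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    memset(&self, 0, sizeof(self));
    self.sin_family = AF_INET;
    self.sin_port = htons(0);                        /* any available port */
    self.sin_addr.s_addr = inet_addr("127.0.0.1");

    if (bind(s, (struct sockaddr *)&self, sizeof(self)) == 0 &&
        getsockname(s, (struct sockaddr *)&self, &self_len) == 0 &&
        sendto(s, msg, strlen(msg) + 1, 0,
               (struct sockaddr *)&self, sizeof(self)) >= 0) {
        /* from is filled in with the sender's address and port */
        n = (int)recvfrom(s, out, out_len, 0,
                          (struct sockaddr *)&from, &from_len);
    }
    close(s);
    return n;
}
```

Calling udp_loopback_roundtrip("Hello, World", buf, sizeof(buf)) should return 13, the 12 characters of the message plus the terminating NUL.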
TCP Sockets
<TO BE ADDED SOMETIME LATER>
connect()
listen() / accept()
send()
recv()
close
When the data transfer session is over, simply call close on the socket as you would a file:
close(s); // s is a created socket
For UDP sockets, this will release the ownership on the local port that is bound to this socket. For TCP, this will initiate a two-way shutdown between both hosts before giving up port ownership.
If a TCP socket calls close, any pending or subsequent recv calls by the remote host will result in recv returning 0 to indicate that a connection shutdown has occurred on the other end. Attempting to call send on a socket that is connected to a host that has called close will eventually result in send returning -1. Unless it's known a priori that the remote host has only called shutdown, it is recommended that the application call close on its socket so that the TCP connection will be properly terminated on both sides.
shutdown
TCP sockets can also engage in a half-close operation using the shutdown function call. Its prototype is as follows:
int shutdown(int socket, int how);
If the how field is 0, this will disallow further reading (recv) from the socket. If the how field is 1, subsequent writes (send) will be disallowed. If the how field is 2, both reading and writing are disallowed. The socket will still need to be passed to close.
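The effect of a half-close can be observed with a short sketch. As an assumption made to keep the example small, it uses socketpair to create two connected local stream sockets as a stand-in for a TCP connection; the function name half_close_demo is also made up for this example.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* Hypothetical demo: returns 1 if, after the write side of sv[0] is
   shut down, sv[1] sees end-of-stream while data can still flow from
   sv[1] back to sv[0]. Returns 0 otherwise. */
int half_close_demo(void)
{
    int sv[2];
    char buf[8];
    int ok = 0;

    /* A connected pair of local stream sockets (stand-in for TCP). */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 0;

    if (shutdown(sv[0], 1) == 0 &&                  /* how = 1: no more sends on sv[0] */
        recv(sv[1], buf, sizeof(buf), 0) == 0 &&    /* peer sees the shutdown as 0 */
        send(sv[1], "bye", 4, 0) == 4 &&            /* other direction still works */
        recv(sv[0], buf, sizeof(buf), 0) == 4 &&
        strcmp(buf, "bye") == 0)
        ok = 1;

    close(sv[0]);                                   /* close is still required */
    close(sv[1]);
    return ok;
}
```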
Relationship Between Sockets and File Descriptors
Socket handles are integer values. In UNIX, socket handles can be passed to most of the low-level POSIX I/O functions. For example:
read(s, buffer, buffer_len);
In the above example, s could be either a socket or a file handle. Calling read on an open socket is equivalent to calling recv or recvfrom. However, if the socket is UDP, information about the sender of the datagram will not be returned. Similarly, the write function call is equivalent to send and sendto. Because write and send take no destination address, a UDP socket must first call connect to set one before using them. It's always recommended that the socket library functions be used instead of the file I/O equivalents.
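A minimal sketch of using read and write on a UDP socket follows. It assumes the socket may be connected to its own loopback address so that the example needs only one socket; the function name udp_connected_roundtrip and the getsockname call are assumptions made for this example.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical demo: connect a UDP socket to its own address on
   127.0.0.1, then use the file I/O calls write and read on it.
   Returns the number of bytes read, or -1 on failure. */
int udp_connected_roundtrip(const char *msg, char *out, int out_len)
{
    struct sockaddr_in self;
    socklen_t len = sizeof(self);
    int n = -1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    memset(&self, 0, sizeof(self));
    self.sin_family = AF_INET;
    self.sin_port = htons(0);                        /* any available port */
    self.sin_addr.s_addr = inet_addr("127.0.0.1");

    if (bind(s, (struct sockaddr *)&self, sizeof(self)) == 0 &&
        getsockname(s, (struct sockaddr *)&self, &len) == 0 &&
        /* connect on UDP only records the default destination */
        connect(s, (struct sockaddr *)&self, sizeof(self)) == 0 &&
        write(s, msg, strlen(msg) + 1) >= 0) {
        n = (int)read(s, out, out_len);              /* no sender address returned */
    }
    close(s);
    return n;
}
```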
Utility Functions
There are several library calls that are not actually part of the socket library family, but are nevertheless used in socket programming. Below is a brief description of each.
unsigned int inet_addr(char *str);
If the string contained in str represents an IP address in dotted-quad notation, inet_addr will return its equivalent 32-bit value in network byte order. This value can be passed into the sin_addr.s_addr field of a sockaddr_in structure. If the string cannot be interpreted as a dotted-quad, -1 is returned (cast to an unsigned integer).
char *inet_ntoa(struct in_addr ip);
Converts the 32-bit value contained in ip, which is assumed to be in network byte order, to a dotted-quad string and returns a pointer to that string. However, subsequent calls to inet_ntoa will always return the same pointer and overwrite the previous string, so copying the string to another buffer is recommended before calling again.
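One way to follow that advice is a small wrapper that copies the result into a caller-supplied buffer right away. This is a sketch only; the helper name ip_to_string is made up for this example.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: copy the dotted-quad form of ip into out.
   At most out_len bytes are written, and the result is always
   NUL-terminated when out_len is greater than 0. */
void ip_to_string(struct in_addr ip, char *out, size_t out_len)
{
    if (out_len == 0)
        return;
    /* Copy immediately: the next inet_ntoa call reuses the same buffer. */
    strncpy(out, inet_ntoa(ip), out_len - 1);
    out[out_len - 1] = '\0';
}
```

A 16-byte buffer is large enough for any dotted-quad, since the longest form ("255.255.255.255") is 15 characters plus the terminating NUL.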
int gethostname(char *name, int length);
Copies the host name of the local computer (up to length bytes) into the character array pointed to by name.
struct hostent *gethostbyname(char *strHost);
If the string contained in strHost represents a host name (such as "rain" or "rain.cise.ufl.edu"), gethostbyname will return a pointer to a hostent structure containing additional information about the host, including additional names and IP addresses associated with that host. Gethostbyname does all the work of looking up address entries in local database files as well as making DNS queries. NULL is returned if the host name is unknown.
The format for the hostent structure is as follows:
struct hostent {
    char * h_name; /* official name of host */
    char ** h_aliases; /* alias list */
    short h_addrtype; /* host address type */
    short h_length; /* length of address */
    char ** h_addr_list; /* list of addresses */
#define h_addr h_addr_list[0] /* address, for backward compat */
};
In short, the first IP address is contained within the first 4 bytes of the first entry in h_addr_list. h_addr can be used to reference this value. Using gethostbyname and inet_addr, a very good resolver function can be written to convert strings the user types as Internet addresses into equivalent 32-bit numbers for socket calls.
unsigned int resolve(char *ip_addr)
{
    struct hostent *hp;
    unsigned int ip;
    hp = gethostbyname(ip_addr);
    if (!hp)
    {
        /* not a known host name; try it as a dotted-quad.
           inet_addr returns (unsigned int)-1 if that fails too */
        return inet_addr(ip_addr);
    }
    /* hp->h_length should equal 4 for an AF_INET address */
    memcpy(&ip, hp->h_addr, 4);
    return ip;
}
unsigned long htonl(unsigned long ul);
unsigned long ntohl(unsigned long ul);
unsigned short ntohs(unsigned short us);
unsigned short htons(unsigned short us);
These functions are very useful for converting integer values to and from network byte order. On big-endian machines such as Sun Sparcs and Motorola processors, these functions simply return the value passed as an argument. On little endian machines such as the Intel x86 and any system running Windows NT, these calls will perform byte swapping operations. On most machines, htons is equivalent to ntohs. This may not be true for future 64-bit systems or other architectures.
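The same functions are useful for integers carried inside a message, not just for address fields. The sketch below stores and loads a 32-bit value in network byte order; the helper names put_u32 and get_u32 are assumptions made for this example.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: store value into buf[0..3] in network byte
   order (most significant byte first). */
void put_u32(unsigned char *buf, uint32_t value)
{
    uint32_t net = htonl(value);
    memcpy(buf, &net, sizeof(net));
}

/* Hypothetical helper: load a 32-bit value that was stored in network
   byte order and return it in host byte order. */
uint32_t get_u32(const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof(net));
    return ntohl(net);
}
```

After put_u32(buf, 0x01020304), the buffer holds the bytes 1, 2, 3, 4 in that order on both big-endian and little-endian hosts, and get_u32(buf) returns 0x01020304 again.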
int select (int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
const struct timeval *timeout);
When an application calls recv or recvfrom, it is blocked until data arrives for that socket. An application could be doing other useful processing while the incoming data stream is empty. Another potential problem arises when an application receives data from multiple sockets. Calling recv or recvfrom on a socket that has no data in its input queue prevents immediate reception of data from other sockets. The select function call solves this problem by allowing the program to poll all the socket handles to see if they are available for non-blocking reading and writing operations.
A description of the arguments in select is as follows:
nfds - Some socket implementations ignore this argument. Its value should be equal to 1 + (the socket handle with the highest value).
readfds - A pointer to a set of file and socket descriptors that are to be polled for non-blocking reading operations. Can be NULL to indicate an empty set.
writefds, exceptfds - Same as readfds, except these sets contain the file/socket handles to poll for non-blocking writing operations and error detection. Can be NULL to indicate an empty set.
timeout - A pointer to a timeval struct that specifies how long the select call should poll the descriptors for an available I/O operation. If the timeout value is 0, then select will return immediately. If the timeout argument is NULL, then select will block until at least one file/socket handle is ready for an available I/O operation. Otherwise select will return after the amount of time in the timeout has elapsed OR when at least one file/socket descriptor is ready for an I/O operation.
The return value from select is the number of handles specified in the file descriptor sets that are ready for I/O. If the time limit specified by the timeout field is reached, select returns 0. The following macros exist for manipulating a file descriptor set:
FD_CLR(s, *set) Removes the descriptor s from set.
FD_ISSET(s, *set) Nonzero if s is a member of the set, zero otherwise.
FD_SET(s, *set) Adds descriptor s to set.
FD_ZERO(*set) Initializes the set to the empty set.
Example:
fd_set fds;
struct timeval tv;
// sock is an initialized socket handle
tv.tv_sec = 2;
tv.tv_usec = 500000;
// tv now represents 2.5 seconds
FD_ZERO(&fds);
FD_SET(sock, &fds); // adds sock to the file descriptor set
/* wait up to 2.5 seconds for data to arrive on sock */
if (select(sock+1, &fds, NULL, NULL, &tv) > 0 && FD_ISSET(sock, &fds))
    recvfrom(sock, buffer, buffer_len, 0, (struct sockaddr *)&sa, &sa_len);
else
    ; /* timed out or error: do something else */
Conclusions
Developers who use the function calls described in this document should always check the return value for each. Consulting the UNIX on-line manual pages ("man") for a complete description of each function call is recommended as well.