Linux TCP/IP Protocol Stack Study Notes (4): Linux Socket (Part II)

Packet, Raw, Netlink, and Routing Sockets :
 
Netlink, routing, packet, and raw sockets are all specialized socket types.
Netlink provides a socket-based interface for exchanging messages and settings between user space and the kernel's internal protocols.
 
Rtnetlink is used for application-level management of the neighbor tables and IP routing tables.
 
An application gets a packet socket by setting the family field of the socket call to AF_PACKET (PF_PACKET is the same value).
ps = socket(PF_PACKET, int type, int protocol);

Type is set to either SOCK_RAW or SOCK_DGRAM. Protocol is the link-layer (IEEE 802.3) protocol number in network byte order, taken from the ETH_P_* values in <linux/if_ether.h>; it is not the IP header protocol number. Passing htons(ETH_P_ALL) selects all protocols.
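
A minimal sketch of the call above, assuming a Linux system with the usual headers. Per packet(7), the protocol argument is a link-layer protocol number in network byte order, hence the htons. The helper name open_packet_socket is hypothetical, and the call normally fails with EPERM or EACCES unless the process holds CAP_NET_RAW.

```c
#include <assert.h>
#include <arpa/inet.h>       /* htons */
#include <errno.h>           /* callers inspect errno on failure */
#include <linux/if_ether.h>  /* ETH_P_IP, ETH_P_ALL */
#include <sys/socket.h>
#include <unistd.h>          /* close, for callers */

/*
 * Hypothetical helper: open a packet socket for one link-layer protocol.
 * Returns the descriptor, or -1 with errno set by socket().
 */
int open_packet_socket(int type, unsigned short eth_proto)
{
    /* Only the two types named in the text are meaningful here. */
    assert(type == SOCK_RAW || type == SOCK_DGRAM);

    /* The protocol number must be passed in network byte order. */
    return socket(PF_PACKET, type, htons(eth_proto));
}
```

A caller would typically write something like open_packet_socket(SOCK_DGRAM, ETH_P_IP), check for -1, and close the descriptor when done.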
 
 
Raw sockets allow user-level application code to receive and transmit network layer packets by intercepting them before they pass through the transport layer.
 
rs = socket(PF_INET, SOCK_RAW, int protocol);
 
Protocol is set to the protocol number that the application wants to transmit or receive. A common example of the use of raw sockets is the ping command. When the ping application opens the socket, it sets the protocol field in the socket call to IPPROTO_ICMP. Ping and other programs for route and network maintenance use one of the library calls documented in getprotoent(3), such as getprotobyname, to convert a protocol name into a protocol number.
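
The name-to-number lookup and the raw socket call can be sketched roughly as follows. This uses getprotobyname, which is documented on the same manual page as getprotoent. The helper names protocol_number and open_icmp_raw_socket are hypothetical, the lookup depends on a protocols database (usually /etc/protocols) being present, and opening the raw socket normally needs CAP_NET_RAW.

```c
#include <assert.h>
#include <errno.h>        /* callers inspect errno on failure */
#include <netdb.h>        /* getprotobyname, struct protoent */
#include <netinet/in.h>   /* IPPROTO_ICMP */
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>       /* close, for callers */

/* Hypothetical helper: map a protocol name to its number, or -1 if unknown. */
int protocol_number(const char *name)
{
    assert(name != NULL);
    struct protoent *pe = getprotobyname(name);  /* consults the protocols database */
    return pe ? pe->p_proto : -1;
}

/* Hypothetical helper: open a raw ICMP socket, as the text describes for ping. */
int open_icmp_raw_socket(void)
{
    int proto = protocol_number("icmp");
    if (proto < 0)
        proto = IPPROTO_ICMP;  /* fall back to the constant if the lookup fails */

    /* Returns -1 with errno set (often EPERM) when not permitted. */
    return socket(PF_INET, SOCK_RAW, proto);
}
```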
 
 
Netlink sockets are accessed by calling socket with family set to AF_NETLINK.
 
ns = socket(AF_NETLINK, int type, int netlink_family);
 
The type parameter can be set to either SOCK_RAW or SOCK_DGRAM, but it doesn't really matter: netlink does not distinguish between the two, and the protocol that is accessed is determined by netlink_family, which is set to one of the values in Table 5.6. The send and recv socket calls are generally used with netlink sockets.
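
As a rough illustration, opening and binding a netlink socket from user space might look like the sketch below. The helper name open_netlink_socket is hypothetical. NETLINK_ROUTE, used in the usage note, is the netlink family that carries rtnetlink messages.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>  /* struct sockaddr_nl, NETLINK_ROUTE */
#include <unistd.h>

/*
 * Hypothetical helper: open and bind a netlink socket for one netlink family.
 * Returns the descriptor, or -1 with errno set.
 */
int open_netlink_socket(int netlink_family)
{
    assert(netlink_family >= 0);

    int fd = socket(AF_NETLINK, SOCK_DGRAM, netlink_family);
    if (fd < 0)
        return -1;

    struct sockaddr_nl local;
    memset(&local, 0, sizeof(local));
    local.nl_family = AF_NETLINK;  /* nl_pid left at 0: the kernel assigns an ID */

    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        int saved = errno;         /* keep bind's errno across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A routing tool would call open_netlink_socket(NETLINK_ROUTE) and then exchange messages on the returned descriptor with send and recv.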
 
 
Implementation of the Socket API System Calls :
 
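There are several steps involved in directing each application-layer socket call to the specific protocol that must respond to the request.
 
In other words, a socket can be associated with many protocol families, so once a protocol is chosen, the call has to be bound to that protocol's functions, and that takes several steps.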
 
First, any address referenced in the call’s arguments must be mapped from user space to kernel space. 
Next, the functions themselves must be translated from generic socket layer functions to the specific functions for the protocol family. 
Finally, the functions must be translated from the protocol family generic functions to the specific functions for the member protocol in the family.
 
Once we have a pointer to the socket structure for the open socket, we use it to retrieve the function specific to the address family and protocol type. To do this, we call the protocol-specific function through a pointer in the structure pointed to by the ops field of the socket structure.
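
The indirection through ops can be pictured with a small user-space model. This is not kernel code: every name in it (model_socket, model_proto_ops, demo_ops, generic_bind, and so on) is hypothetical and only mirrors the shape of the mechanism described above.

```c
#include <assert.h>
#include <stddef.h>

struct model_socket;

/* Per-protocol operations table, loosely modeled on the kernel's proto_ops. */
struct model_proto_ops {
    int family;
    int (*bind)(struct model_socket *sock, int port);
    int (*getname)(struct model_socket *sock, int *port);
};

/* Generic socket: the only protocol knowledge it holds is the ops pointer. */
struct model_socket {
    const struct model_proto_ops *ops;
    int local_port;
};

/* One "protocol" supplying its own implementations. */
static int demo_bind(struct model_socket *sock, int port)
{
    sock->local_port = port;
    return 0;
}

static int demo_getname(struct model_socket *sock, int *port)
{
    *port = sock->local_port;
    return 0;
}

const struct model_proto_ops demo_ops = {
    .family  = 2,   /* arbitrary family number for the model */
    .bind    = demo_bind,
    .getname = demo_getname,
};

/* Generic layer: dispatches through ops without knowing the protocol. */
int generic_bind(struct model_socket *sock, int port)
{
    assert(sock != NULL && sock->ops != NULL);
    return sock->ops->bind(sock, port);
}

int generic_getname(struct model_socket *sock, int *port)
{
    assert(sock != NULL && sock->ops != NULL && port != NULL);
    return sock->ops->getname(sock, port);
}
```

Swapping demo_ops for another table changes the behavior of generic_bind without touching the generic code, which is the point of the ops field.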
 
 
asmlinkage long sys_socketcall(int call, unsigned long __user *args);
 
The first thing it does is copy the argument block that args points to from user space into kernel space. It does this by calling copy_from_user. Next, sys_socketcall invokes the system call function that corresponds to a user-level socket call. For example, when the user calls bind, sys_socketcall maps the user-level bind to the kernel function sys_bind, and listen is mapped to sys_listen.
 
Sys_sendmsg and sys_recvmsg have a bit more work to do than the other socket functions. They must first verify that the iovec buffer array contains valid addresses. The data behind each address is copied between user and kernel space later, when it is actually transferred, but the addresses are validated now. After the iovec structure has been validated, sock_sendmsg and sock_recvmsg are called, respectively.
Sys_accept is a bit more complicated because it has to establish a new socket for the incoming connection. The first thing it does is call sock_alloc to allocate a new socket. Next, it gets a name for the socket by calling the function pointed to by the getname field of the ops structure in the socket structure. Remember that the "name" of a socket is the address and port number associated with the socket. Next, it calls sock_map_fd to map the new socket into the pseudo socket filesystem.
 
 
The functions sock_read and sock_write set up an iovec-type msghdr structure before calling sock_recvmsg and sock_sendmsg, respectively.
 
Sock_setsockopt and sock_getsockopt are called from the system call if level is set to SOL_SOCKET. Their purpose is to set and read values in the sock structure according to the options passed in by the application layer.
 
Sock_setsockopt gets a pointer to the sock structure from the sk field of the socket structure, sock, which was passed as an argument. Next, it sets options in the sock structure, sk, based on the optname and optval arguments. Refer to Section 5.3.1 (see the Socket API notes) for a description of the fields in the sock structure. If optname is SO_DEBUG, debug is set; likewise, reuse is set to the value of SO_REUSEADDR, localroute to the value of SO_DONTROUTE, no_check to the value of SO_NO_CHECK, and priority to the value of SO_PRIORITY.
 
Sock_getsockopt reverses what sock_setsockopt does. It retrieves certain values from the sock structure of the open socket and returns them to the user.
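
Seen from user space, this pair of kernel functions sits behind setsockopt and getsockopt when level is SOL_SOCKET. A minimal sketch of that round trip follows; the helper name set_and_read_option is hypothetical, and it only handles options whose value is an int.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>       /* close, for callers */

/*
 * Hypothetical helper: set one integer SOL_SOCKET option and read it back.
 * Returns the value reported by getsockopt, or -1 on error.
 */
int set_and_read_option(int fd, int optname, int value)
{
    assert(fd >= 0);

    /* level == SOL_SOCKET routes the call to the generic socket layer. */
    if (setsockopt(fd, SOL_SOCKET, optname, &value, sizeof(value)) < 0)
        return -1;

    int current = 0;
    socklen_t len = sizeof(current);
    if (getsockopt(fd, SOL_SOCKET, optname, &current, &len) < 0)
        return -1;
    return current;
}
```

For example, calling set_and_read_option(fd, SO_REUSEADDR, 1) on a freshly created UDP socket should report a non-zero value, reflecting the reuse field mentioned above.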
 
 
How does each member protocol communicate with the socket layer?
 
The file descriptor fd is used to map each socket API call to a function specific to each protocol.
 
In addition, as we saw in Section 5.5.2, each protocol registers itself with the protocol switch table. When the socket structure was initialized, as described in Section 5.4, the ops field was set to the set of protocol-specific operations from the entry in the protocol switch table.
 
Once all the complex initialization is done as described in other sections, the actual mapping is quite simple. In most cases, the "sys_" versions of the socket functions simply call sockfd_lookup to get a pointer to the socket structure and call the protocol’s function through the ops field.
 
A simple example: this function is called in the kernel when the user executes the getsockname socket API function to get the address (name) of a socket.
 
As a quick refresher, this is how the system call reaches its kernel function:
 
getsockname() ----> sys_getsockname() ----> SYSCALL_DEFINE3(getsockname,...) (linux/net/socket.c)
 
/*
 *    Get the local address ('name') of a socket object. Move the obtained
 *    name to user space.
 */
 
SYSCALL_DEFINE3(getsockname, int, fd, struct sockaddr __user *, usockaddr,
                int __user *, usockaddr_len)
{
        struct socket *sock;
        struct sockaddr_storage address;
        int len, err, fput_needed;

        /* Map the file descriptor to its socket structure. */
        sock = sockfd_lookup_light(fd, &err, &fput_needed);
        if (!sock)
                goto out;

        err = security_socket_getsockname(sock);
        if (err)
                goto out_put;

        /* Call the protocol's getname function through the ops field. */
        err = sock->ops->getname(sock, (struct sockaddr *)&address, &len, 0);
        if (err)
                goto out_put;
        err = move_addr_to_user((struct sockaddr *)&address, len, usockaddr, usockaddr_len);

out_put:
        fput_light(sock->file, fput_needed);
out:
        return err;
}
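
For comparison, here is a sketch of the user-space side that ends up in this kernel function. The helper name bind_and_get_port is hypothetical. It binds to INADDR_ANY with port 0 so that the kernel picks a free port, then reads the socket's name back with getsockname.

```c
#include <assert.h>
#include <arpa/inet.h>    /* htonl, ntohs */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_ANY */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>       /* close, for callers */

/*
 * Hypothetical helper: bind fd to an ephemeral port and ask the kernel for
 * the socket's name. Returns the port in host byte order, or -1 on error.
 */
int bind_and_get_port(int fd)
{
    assert(fd >= 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;                    /* 0: let the kernel choose a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    struct sockaddr_in name;
    socklen_t len = sizeof(name);         /* in: buffer size, out: name length */
    if (getsockname(fd, (struct sockaddr *)&name, &len) < 0)
        return -1;

    return ntohs(name.sin_port);
}
```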
 
 
Creation of a Socket :
 
Sock_create, defined in the file linux/net/socket.c, is called from sys_socket. This function initiates the creation of a new socket.
int sock_create(int family, int type, int protocol, struct socket **res);
 
First, sock_create verifies that family is one of the allowed family types shown in Table 5.1. Then it allocates a socket by calling sock_alloc, which returns a new socket structure, sock. The socket structure is actually part of an inode structure, created when sock_alloc calls new_inode.
Once the inode is created, sock_alloc retrieves the socket structure from the inode. Then it initializes a few fields in the socket structure.
 
Sockets maintain a state that records whether an open socket represents a connection to a peer.
 
The socket state really only reflects whether there is an active connection.
 
After returning from the sock_alloc call, sock_create calls the create function for the protocol family. It accesses the array net_families to get the family’s create function. For TCP/IP, family will be set to AF_INET. 
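
The lookup through the family array can be modeled in user space as below. This is only a sketch of the idea, not the kernel implementation: the toy_ names, the table size, and the demo family are all hypothetical, and the error codes are chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NPROTO 8   /* hypothetical table size for the model */

struct toy_socket {
    int family;
    int type;
};

/* One entry per address family, holding that family's create function. */
struct toy_proto_family {
    int family;
    int (*create)(struct toy_socket *sock, int protocol);
};

static const struct toy_proto_family *toy_families[TOY_NPROTO];

/* Register a family so that later create calls can find it. */
int toy_sock_register(const struct toy_proto_family *pf)
{
    assert(pf != NULL);
    if (pf->family < 0 || pf->family >= TOY_NPROTO)
        return -ENOBUFS;
    if (toy_families[pf->family] != NULL)
        return -EEXIST;
    toy_families[pf->family] = pf;
    return 0;
}

/* Generic create: validate the family, then call the family's own create. */
int toy_sock_create(int family, int type, int protocol, struct toy_socket *sock)
{
    assert(sock != NULL);
    if (family < 0 || family >= TOY_NPROTO || toy_families[family] == NULL)
        return -EAFNOSUPPORT;
    sock->family = family;
    sock->type = type;
    return toy_families[family]->create(sock, protocol);
}

/* A demo family standing in for AF_INET in this model. */
static int demo_create(struct toy_socket *sock, int protocol)
{
    (void)sock;
    return protocol >= 0 ? 0 : -EPROTONOSUPPORT;
}

const struct toy_proto_family demo_family = {
    .family = 2,
    .create = demo_create,
};
```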
 
The create function for AF_INET is inet_create, which is defined in the file af_inet.c.
static int inet_create(struct socket *sock, int protocol);
 
In inet_create, we create a new sock structure called sk and initialize a few more fields. We call sk_alloc to allocate the sock structure from the slab cache that is specific to the protocol for this socket (inet_sk_slab).
sk = sk_alloc(PF_INET, GFP_KERNEL, inet_sk_size(protocol),
              inet_sk_slab(protocol));
 
After creation:
Sk now points to the sock structure allocated from the slab cache. Next, inet_create searches the protocol switch table for an entry that matches the protocol. After getting the result of that search, the capability flags are checked against the capabilities of the current process, and if the caller doesn’t have permission to create this type of socket, the user-level socket call returns the EPERM error.
 
Once the sock structure has been created, inet_create initializes it as follows:
 
Inet_create sets some fields in the new sock data structure; many other fields are already pre-initialized when the allocation is done from the slab cache. The field sk_family is set to PF_INET. The prot field is set to the protocol’s protocol block structure, which defines the specific functions for each of the transport protocols. No_check and ops are set according to their respective values in the protocol switch table. If the type of the socket is SOCK_RAW, the num field is set to the protocol number. As will be shown in later chapters, this field is used by IP to route packets internally depending on whether there is a raw socket open. The sk_destruct field of sk is set to inet_sock_destruct, the sock structure destructor. The sk_backlog_rcv field is set to point to the protocol-specific backlog receive function. Next, some fields in the protocol family-specific part of the sock structure are initialized.
 
This macro retrieves the inet field of an inet_sock structure from a sock pointer:
#define inet_sk(__sk) (&((struct inet_sock *)__sk)->inet) 
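
The cast in this macro works because the generic sock is the first member of the larger structure. The following user-space model shows the same layout trick; all toy_ names and the fields in them are hypothetical and much smaller than the real kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Generic part, shared by every protocol family in this model. */
struct toy_sock {
    int sk_family;
    int sk_type;
};

/* IPv4-specific part. */
struct toy_inet_opt {
    unsigned short sport;
    unsigned char  ttl;
};

/* The generic sock must come first so both pointer types share an address. */
struct toy_inet_sock {
    struct toy_sock     sk;
    struct toy_inet_opt inet;
};

/* Same shape as the macro above: cast to the larger type, take the inet field. */
#define toy_inet_sk(__sk) (&((struct toy_inet_sock *)(__sk))->inet)

/* Code that only holds a generic pointer can still reach the IPv4 fields. */
void toy_set_ttl(struct toy_sock *sk, unsigned char ttl)
{
    assert(sk != NULL);
    assert(offsetof(struct toy_inet_sock, sk) == 0);  /* the cast relies on this */
    toy_inet_sk(sk)->ttl = ttl;
}
```

The cast is only valid when the sock pointer really points into a toy_inet_sock, which in the kernel is guaranteed by allocating from the protocol-specific slab cache.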
 
See the later analysis of the inet_sock and proto structures.
 
The sk_prot field in the sock structure points to the protocol block structure. The init field in the proto structure is specific for each protocol and socket type within the AF_INET protocol family. 
 
 
Netlink and Rtnetlink :
 
Netlink is an internal communication protocol. It mainly exists to transmit and receive messages between the application layer and various protocols in the Linux kernel. Netlink is implemented as a protocol with its own address family, AF_NETLINK, and it supports most of the socket API functions. Rtnetlink is a set of message extensions to the basic netlink protocol messages. The most common use of netlink is for applications to exchange routing information with the kernel’s internal routing table.
 
 
Netlink sockets are accessed like any other sockets. Both socket calls and system IO calls will work with netlink sockets. For example, the sendmsg and recvmsg calls are generally used by user-level applications to add and delete routes. Both these calls pass a pointer to the nlmsghdr structure in the msg argument.
 
 
struct nlmsghdr {
       __u32       nlmsg_len;    /* Length of message including header */
       __u16       nlmsg_type;   /* Message content */
       __u16       nlmsg_flags;  /* Additional flags */
       __u32       nlmsg_seq;    /* Sequence number */
       __u32       nlmsg_pid;    /* Sending process ID */
};
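
To show how these fields are filled in practice, here is a sketch that builds (but does not send) an rtnetlink request asking for a dump of the IPv4 routing table. The struct route_dump_request and the helper build_route_dump_request are hypothetical names. NLMSG_LENGTH, RTM_GETROUTE, NLM_F_REQUEST, NLM_F_DUMP, and struct rtmsg come from the Linux netlink headers.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>     /* struct nlmsghdr, NLMSG_LENGTH, NLM_F_* */
#include <linux/rtnetlink.h>   /* struct rtmsg, RTM_GETROUTE */

/* Request layout: netlink header immediately followed by the rtnetlink payload. */
struct route_dump_request {
    struct nlmsghdr nlh;
    struct rtmsg    rtm;
};

/* Hypothetical helper: fill in a request for a dump of the IPv4 routes. */
void build_route_dump_request(struct route_dump_request *req, unsigned int seq)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));

    req->nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct rtmsg)); /* header + payload */
    req->nlh.nlmsg_type  = RTM_GETROUTE;                /* message content */
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;  /* request, whole table */
    req->nlh.nlmsg_seq   = seq;                         /* used to match replies */
    req->nlh.nlmsg_pid   = 0;                           /* left at 0 in this sketch */
    req->rtm.rtm_family  = AF_INET;
}
```

On a socket opened with AF_NETLINK and NETLINK_ROUTE, such a request could then be passed to send with nlmsg_len as the length, and the replies read back with recv.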
 
 
The netlink protocol is implemented in the file linux/net/netlink/af_netlink.c.
It is similar to UDP or TCP in that it defines a proto_ops structure to bind internal calls to the socket calls made through AF_NETLINK sockets.
 
static const struct proto_ops netlink_ops = {
      .family =     PF_NETLINK,
      .owner =      THIS_MODULE,
      .release =    netlink_release,
      .bind =       netlink_bind,
      .connect =    netlink_connect,
      .socketpair = sock_no_socketpair,
      .accept =     sock_no_accept,
      .getname =    netlink_getname,
      .poll =       datagram_poll,
      .ioctl =      sock_no_ioctl,
      .listen =     sock_no_listen,
      .shutdown =   sock_no_shutdown,
      .setsockopt = netlink_setsockopt,
      .getsockopt = netlink_getsockopt,
      /* Sendmsg and recvmsg are the main functions used to send and
         receive messages through AF_NETLINK sockets. */
      .sendmsg =    netlink_sendmsg,
      .recvmsg =    netlink_recvmsg,
      .mmap =       sock_no_mmap,
      .sendpage =   sock_no_sendpage,
};
 
Just like other protocols that register with the socket layer, such as UDP and TCP, the netlink address family declares a global instance of the net_proto_family structure in the file af_netlink.c.
struct net_proto_family netlink_family_ops = {
    .family = PF_NETLINK,
    .create = netlink_create,
    .owner  = THIS_MODULE,
};
 
The netlink module also provides an initialization function for the protocol, netlink_proto_init.
 
static int __init netlink_proto_init( void );
 
This function registers the netlink family operations with the socket layer by calling 
posted @ 2012-12-31 19:50  KingsLanding