Chapter 5 Sending and Receiving Data
There is no magic: any programs that exchange information must agree on how that information will be encoded (represented as a sequence of bits), as well as which program sends what information when, and how the information received affects the behavior of the program. This agreement regarding the form and meaning of information exchanged over a communication channel is called a protocol.
Most application protocols are defined in terms of discrete messages made up of sequences of fields. Each field contains a specific piece of information encoded as a sequence of bits.
5.1 Encoding Integers
What “platform” means:
By “platform” in this book we mean the combination of compiler, operating system, and hardware architecture. The gcc compiler with the Linux operating system, running on Intel’s IA-32 architecture, is an example of a platform.
5.1.1 Sizes of Integers
Determining the sizes of the integer types on a platform.
Here are a couple of things to note about sizeof(). First, the language specifies that sizeof(char) is 1—always. Thus in the C language a “byte” is the amount of space occupied by a variable of type char, and the units of sizeof() are actually sizeof(char). But exactly how big is a C-language “byte”? That’s the second thing: the predefined constant CHAR_BIT tells how many bits it takes to represent a value of type char —usually 8, but possibly 10 or even 32.
The C99 language standard offers a solution in the form of a set of optional types declared in <stdint.h>: int8_t, int16_t, int32_t, and int64_t (along with their unsigned counterparts uint8_t, etc.) all have the size (in bits) indicated by their names. On a platform where CHAR_BIT is eight, these are 1-, 2-, 4-, and 8-byte integers, respectively. Although these types may not be implemented on every platform, each is required to be defined if any native primitive type has the corresponding size. (So if, say, the size of an int on the platform is 32 bits, the “optional” type int32_t is required to be defined.)
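As a minimal sketch of the points above, the following computes type widths from sizeof() and CHAR_BIT and checks that the fixed-width types have the widths their names promise. The macro BITS_OF and the helper names IntWidthBits and FixedWidthsOk are hypothetical, introduced only for this illustration.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* int8_t ... int64_t (C99, optional types) */

/* Width in bits of a type: sizeof() counts chars, CHAR_BIT counts bits per char.
 * (Hypothetical helper macro, not part of any standard header.) */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* Width in bits of the platform's int (often 32, but not guaranteed). */
size_t IntWidthBits(void) {
  return BITS_OF(int);
}

/* Returns 1 if each fixed-width type has exactly the advertised width. */
int FixedWidthsOk(void) {
  return BITS_OF(int8_t) == 8 && BITS_OF(int16_t) == 16 &&
         BITS_OF(int32_t) == 32 && BITS_OF(int64_t) == 64;
}
```

A protocol implementation would normally declare its wire fields with the fixed-width types rather than with int or long, so that the field sizes do not change from one platform to another.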
5.1.2 Byte Ordering
When a multibyte integer is transmitted one byte at a time, sender and receiver must agree on the order of the bytes. There are two obvious choices: start at the “right” end of the number, with the least significant bits (so-called little-endian order), or at the left end, with the most significant bits (big-endian order). (Note that the ordering of bits within bytes is, fortunately, handled by the implementation in a standard way.)
Most protocols that send multibyte quantities in the Internet today use big-endian byte order; in fact, it is sometimes called network byte order. The byte order used by the hardware (whether it is big- or little-endian) is called the native byte order.
Addresses and ports that cross the Sockets API are always in network byte order.
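The sketch below illustrates network byte order, assuming a POSIX system that provides htonl() in <arpa/inet.h>. The helper names PutUint32BE, GetUint32BE, NativeIsLittleEndian, and MatchesHtonl are hypothetical; they encode a 32-bit value by shifting, which works regardless of the native byte order, and compare the result with what htonl() produces.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl() */

/* Store val in out[0..3], most significant byte first (big-endian). */
void PutUint32BE(uint8_t out[4], uint32_t val) {
  out[0] = (uint8_t) (val >> 24);
  out[1] = (uint8_t) (val >> 16);
  out[2] = (uint8_t) (val >> 8);
  out[3] = (uint8_t) val;
}

/* Rebuild a 32-bit value from four big-endian bytes. */
uint32_t GetUint32BE(const uint8_t in[4]) {
  return ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16) |
         ((uint32_t) in[2] << 8) | (uint32_t) in[3];
}

/* Returns 1 if the native byte order is little-endian. */
int NativeIsLittleEndian(void) {
  uint16_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);  /* Look at the byte stored at the lowest address */
  return first == 1;
}

/* Returns 1 if the in-memory bytes of htonl(val) equal the shifted encoding. */
int MatchesHtonl(uint32_t val) {
  uint8_t manual[4];
  uint32_t net = htonl(val);
  PutUint32BE(manual, val);
  return memcmp(&net, manual, sizeof(net)) == 0;
}
```

For example, PutUint32BE(buf, 0x01020304) leaves the bytes 01 02 03 04 in buf on every platform, which is the layout a big-endian protocol field expects.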
5.1.3 Signedness and Sign Extension
Given k bits, we can represent values in the range $-2^{k-1}$ through $2^{k-1} - 1$ using two’s-complement. Note that the most significant bit (msb) tells whether the value is nonnegative (msb=0) or negative (msb=1). On the other hand, a k-bit unsigned integer can encode values in the range 0 through $2^{k} - 1$ directly.
The signedness of the integers being transmitted should be determined by the range of values that need to be encoded.
Some care is required when dealing with integers of different signedness because of sign extension.
1. When a signed value is copied to any wider type, the additional bits are copied from the sign (i.e., most significant) bit.
2. The value of an unsigned integer type is, reasonably enough, not sign-extended.
One final point to remember: when expressions are evaluated, values of variables are widened (if needed) to the “native” (int) size before any computation occurs. Thus, if you add the values of two char variables together, the type of the result will be int, not char.
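The rules above can be sketched in a few lines. The helper names WidenSigned, WidenUnsigned, and CharSumSize are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Widening a signed 8-bit value copies the sign bit into the new high bits,
 * so the numeric value is preserved. */
int32_t WidenSigned(int8_t v) {
  return v;
}

/* Widening an unsigned 8-bit value fills the new high bits with zeros. */
int32_t WidenUnsigned(uint8_t v) {
  return v;
}

/* Both operands are promoted to int before the addition,
 * so the sum has the size of an int, not of a char. */
size_t CharSumSize(void) {
  char a = 1, b = 2;
  return sizeof(a + b);
}
```

The same bit pattern therefore widens differently depending on its declared type: the 8-bit pattern 0xFF becomes -1 when held in an int8_t but 255 when held in a uint8_t. This is why bytes taken from a receive buffer are usually handled as uint8_t.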
5.1.5 Wrapping TCP Sockets in Streams
A way of encoding multibyte integers for transmission over a stream (TCP) socket is to use the built-in FILE-stream facilities.
FILE * fdopen(int socketdes, const char* mode)
The fdopen() function “wraps” the socket in a stream and returns the result. This allows buffered I/O to be performed on the socket via operations like fgets(), fputs(), fread() and fwrite().
int fclose(FILE* stream)
fclose() closes the stream along with the underlying socket.
int fflush(FILE* stream)
fflush() causes any buffered data to be sent over the underlying socket.
size_t fwrite(const void * ptr, size_t size, size_t nmemb, FILE * stream)
The fwrite() method writes the specified number of objects of the given size to the stream.
size_t fread(void * ptr, size_t size, size_t nmemb, FILE * stream)
The fread() method goes in the other direction, reading the given number of objects of the given size from the given stream and placing them sequentially in the location pointed to by ptr.
Note that the sizes are given in units of sizeof(char), while the return values of these methods are the number of objects read/written, not the number of bytes. In particular, fread() never reads part of an object from the stream, and similarly fwrite() never writes a partial object. If the underlying connection terminates, these methods will return a short item count.
if (fwrite(&val8, sizeof(val8), 1, outstream) != 1) ...
Among the advantages of using buffered FILE-streams with sockets is the ability to “put back” a byte after reading it from the stream (via ungetc()); this can sometimes be useful when parsing messages.
FILE-streams can only be used with TCP sockets.
5.1.6 Structure Overlays: Alignment and Padding
The C language rules for laying out data structures include specific alignment requirements, including that the fields within a structure begin on certain boundaries based on their type. The main points of the requirements can be summarized as follows:
1. Data structures are maximally aligned. That is, the address of any instance of a structure (including one in an array) will be divisible by the size of its largest native integer field.
2. Fields whose type is a multibyte integer type are aligned to their size (in bytes). Thus, an int32_t integer field’s beginning address is always divisible by four, and a uint16_t integer field’s address is guaranteed to be divisible by two.
To enforce these constraints, the compiler may add padding between the fields of a structure.
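The effect of these rules can be observed with offsetof() and sizeof(). The structure Mixed and the helper names below are hypothetical, and the offsets noted in the comments are what common platforms typically produce, not values guaranteed by the language.

```c
#include <stddef.h>  /* offsetof(), size_t */
#include <stdint.h>

struct Mixed {
  uint8_t tag;     /* Offset 0 */
  uint32_t value;  /* Typically offset 4: 3 bytes of padding precede it */
  uint16_t port;   /* Typically offset 8 */
};                 /* Typically 12 bytes: 2 bytes of trailing padding */

/* Offset at which the compiler placed the 32-bit field. */
size_t ValueOffset(void) {
  return offsetof(struct Mixed, value);
}

/* Bytes of padding the compiler added: structure size minus the field sizes. */
size_t PaddingBytes(void) {
  return sizeof(struct Mixed) -
         (sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint16_t));
}
```

The fields occupy only 7 bytes, so any difference reported by PaddingBytes() is padding. Bytes copied directly from such a structure onto the wire would include that padding, which is why a structure overlaid on a message buffer has to be laid out so that no padding is needed.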
5.1.7 Strings and Text
A mapping between a set of symbols and a set of integers is called a coded character set.
The C99 extensions standard defines a type wchar_t (“wide character”) to store characters from charsets that may use more than one byte per symbol. In addition, various library functions are defined that support conversion between byte sequences and arrays of wchar_t, in both directions. (In fact, there is a wide character string version of virtually every library function that operates on character strings.) To convert back and forth between wide strings and encoded char (byte) sequences suitable for transmission over the network, we would use the wcstombs() (“wide character string to multibyte string”) and mbstowcs() functions.
#include <stdlib.h>
size_t wcstombs(char *restrict s, const wchar_t *restrict pwcs, size_t n);
size_t mbstowcs(wchar_t *restrict pwcs, const char *restrict s, size_t n);
The terminating null is an artifact of the language, and not part of the string itself. It therefore should not be transmitted with the string unless the protocol explicitly specifies that method of marking the end of the string.
The bad news is that C99’s wide character facilities are not designed to give the programmer explicit control over the encoding scheme. Indeed, they assume a single, fixed charset defined according to the “locale” of the platform. Although the facilities support a variety of charsets, they do not even provide the programmer any way to learn which charset or encoding is in use. In fact, the C99 standard states in several situations that the effect of changing the locale’s charset at runtime is undefined. What this means is that if you want to implement a protocol using a particular charset, you’ll have to implement the encoding yourself.
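A minimal sketch of the conversion functions named above. The helper names WideToWire and WireToWide are hypothetical, and the encoding produced is whatever the current locale dictates, which is exactly the limitation just described. The sketch also leaves the terminating null out of the byte count to be transmitted.

```c
#include <stdlib.h>  /* wcstombs(), mbstowcs() */
#include <string.h>
#include <wchar.h>

/* Convert a wide string to the locale's multibyte encoding.
 * Returns the number of bytes to transmit (terminating null excluded),
 * or (size_t) -1 if a character cannot be converted. */
size_t WideToWire(const wchar_t *ws, char *buf, size_t bufSize) {
  return wcstombs(buf, ws, bufSize);
}

/* Convert len received bytes (which carry no terminator) to a wide string.
 * Returns the number of wide characters stored, or (size_t) -1 on failure. */
size_t WireToWide(const char *bytes, size_t len, wchar_t *out, size_t outLen) {
  char tmp[256];
  if (len >= sizeof(tmp))
    return (size_t) -1;
  memcpy(tmp, bytes, len);
  tmp[len] = '\0';  /* mbstowcs() requires a null-terminated input */
  return mbstowcs(out, tmp, outLen);
}
```

In the default "C" locale only basic characters are certain to convert, so a protocol that mandates a specific charset would still need its own encoder.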
5.2 Constructing, Framing, and Parsing Messages
A clean design further decomposes the process into two parts:
The first is concerned with framing, or marking the boundaries of the message, so the receiver can find it in the stream.
The second is concerned with the actual encoding of the message, whether it is represented using text or binary data.
Notice that these two parts can be independent of each other, and in a well-designed protocol they should be separated.
struct VoteInfo {
  uint64_t count;   // invariant: !isResponse => count==0
  int candidate;    // invariant: 0 <= candidate <= MAX_CANDIDATE
  bool isInquiry;
  bool isResponse;
};

typedef struct VoteInfo VoteInfo;

enum {
  MAX_CANDIDATE = 1000,
  MAX_WIRE_SIZE = 500
};

int GetNextMsg(FILE *in, uint8_t *buf, size_t bufSize);
int PutMsg(uint8_t buf[], size_t msgSize, FILE *out);
bool Decode(uint8_t *inBuf, size_t mSize, VoteInfo *v);
size_t Encode(VoteInfo *v, uint8_t *outBuf, size_t bufSize);
5.2.1 Framing
If a receiver tries to receive more bytes from a socket than were in the message, one of two things can happen.
If no other message is in the channel, the receiver will block and will be prevented from processing the message; if the sender is also blocked waiting for a reply, the result will be deadlock: each side of the connection waiting for the other to send more information.
On the other hand, if another message is already in the channel, the receiver may read some or all of it as part of the first message, leading to other kinds of errors. Therefore framing is an important consideration when using TCP sockets.
Two general techniques enable a receiver to unambiguously find the end of the message:
1. Delimiter-based: The end of the message is indicated by a unique marker, a particular, agreed-upon byte (or sequence of bytes) that the sender transmits immediately following the data.
The downside of such techniques is that both sender and receiver have to scan every byte of the message.
2. Explicit length: The variable-length field or message is preceded by a length field that tells how many bytes it contains. The length field is generally of a fixed size; this limits the maximum size message that can be framed.
The length-based approach is simpler but requires a known upper bound on the size of the message.
//--------------------------------------------------------------------DelimFramer.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include "Practical.h"

static const char DELIMITER = '\n';

/* Read up to bufSize bytes or until delimiter, copying into the given
 * buffer as we go.
 * Encountering EOF after some data but before delimiter results in failure.
 * (That is: EOF is not a valid delimiter.)
 * Returns the number of bytes placed in buf (delimiter NOT transferred).
 * If buffer fills without encountering delimiter, negative count is returned.
 * If stream ends before first byte, -1 is returned.
 * Precondition: buf has room for at least bufSize bytes.
 */
int GetNextMsg(FILE *in, uint8_t *buf, size_t bufSize) {
  size_t count = 0;
  int nextChar = EOF; // Initialized so the final test is defined when bufSize == 0
  while (count < bufSize) {
    nextChar = getc(in);
    if (nextChar == EOF) {
      if (count > 0)
        DieWithUserMessage("GetNextMsg()", "Stream ended prematurely");
      else
        return -1;
    }
    if (nextChar == DELIMITER)
      break;
    buf[count++] = (uint8_t) nextChar;
  }
  if (nextChar != DELIMITER) { // Out of space: count==bufSize
    return -(int) count;
  } else { // Found delimiter
    return (int) count;
  }
}

/* Write the given message to the output stream, followed by
 * the delimiter. Return number of bytes written, or -1 on failure.
 */
int PutMsg(uint8_t buf[], size_t msgSize, FILE *out) {
  // Check for delimiter in message
  size_t i;
  for (i = 0; i < msgSize; i++)
    if (buf[i] == DELIMITER)
      return -1;
  if (fwrite(buf, 1, msgSize, out) != msgSize)
    return -1;
  if (fputc(DELIMITER, out) == EOF || fflush(out) == EOF)
    return -1;
  return (int) msgSize;
}
//--------------------------------------------------------------------DelimFramer.c
//--------------------------------------------------------------------LengthFramer.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <netinet/in.h>
#include "Practical.h"

/* Read the 2-byte length, which arrives in big-endian order, and convert
 * it to native order. Then read the indicated number of bytes.
 * If the input buffer is too small for the data, truncate to fit and
 * return the negation of the *indicated* length. Thus a negative return
 * other than -1 indicates that the message was truncated.
 * (Ambiguity is possible only if the caller passes an empty buffer.)
 * The rest of a truncated message is read and discarded.
 */
int GetNextMsg(FILE *in, uint8_t *buf, size_t bufSize) {
  uint16_t mSize = 0;
  uint16_t extra = 0;

  if (fread(&mSize, sizeof(uint16_t), 1, in) != 1)
    return -1;
  mSize = ntohs(mSize);
  if (mSize > bufSize) {
    extra = (uint16_t) (mSize - bufSize);
    mSize = (uint16_t) bufSize; // Truncate
  }
  if (fread(buf, sizeof(uint8_t), mSize, in) != mSize) {
    fprintf(stderr, "Framing error: expected %d, read less\n", mSize);
    return -1;
  }
  if (extra > 0) { // Message was truncated
    const int indicated = mSize + extra; // Length announced by the sender
    uint8_t waste[BUFSIZE];
    while (extra > 0) { // Discard the rest in chunks that fit in waste[]
      size_t chunk = (extra < BUFSIZE) ? extra : BUFSIZE;
      if (fread(waste, sizeof(uint8_t), chunk, in) != chunk)
        break;
      extra = (uint16_t) (extra - chunk);
    }
    return -indicated; // Negation of indicated size
  } else
    return mSize;
}

/* Write the given message to the output stream, preceded by its
 * 2-byte length in network byte order.
 * Precondition: buf[] is at least msgSize.
 * Returns -1 on any error.
 */
int PutMsg(uint8_t buf[], size_t msgSize, FILE *out) {
  if (msgSize > UINT16_MAX)
    return -1;
  uint16_t payloadSize = htons((uint16_t) msgSize);
  if ((fwrite(&payloadSize, sizeof(uint16_t), 1, out) != 1)
      || (fwrite(buf, sizeof(uint8_t), msgSize, out) != msgSize))
    return -1;
  if (fflush(out) == EOF)
    return -1;
  return (int) msgSize;
}
//--------------------------------------------------------------------LengthFramer.c
//--------------------------------------------------------------------VoteEncodingText.c
/* Routines for Text encoding of vote messages.
 * Wire Format:
 * "Voting <v|i> [R] <candidate ID> <count>"
 */
#include <string.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include "Practical.h"
#include "VoteProtocol.h"

static const char *MAGIC = "Voting";
static const char *VOTESTR = "v";
static const char *INQSTR = "i";
static const char *RESPONSESTR = "R";
static const char *DELIMSTR = " ";

enum {
  BASE = 10
};

/* Encode voting message info as a text string.
 * WARNING: Message will be silently truncated if buffer is too small!
 * Invariants (e.g. 0 <= candidate <= 1000) not checked.
 * Returns the number of bytes placed in outBuf (terminating null excluded).
 */
size_t Encode(VoteInfo *v, uint8_t *outBuf, size_t bufSize) {
  uint8_t *bufPtr = outBuf;
  size_t size = bufSize;
  int rv = snprintf((char *) bufPtr, size, "%s %c %s %d", MAGIC,
      (v->isInquiry ? 'i' : 'v'), (v->isResponse ? "R" : ""), v->candidate);
  if (rv < 0)
    return 0;
  if ((size_t) rv >= size) // Truncated: at most size-1 characters were stored
    return (size > 0) ? size - 1 : 0;
  bufPtr += rv;
  size -= (size_t) rv;
  if (v->isResponse) {
    // Cast so the argument matches %llu on every platform
    rv = snprintf((char *) bufPtr, size, " %llu", (unsigned long long) v->count);
    if (rv > 0)
      bufPtr += ((size_t) rv >= size) ? size - 1 : (size_t) rv;
  }
  return (size_t) (bufPtr - outBuf);
}

/* Extract message information from given buffer.
 * Note: modifies input buffer.
 * Precondition: inBuf holds a null-terminated string (required by strtok()).
 */
bool Decode(uint8_t *inBuf, size_t mSize, VoteInfo *v) {
  (void) mSize; // Not needed: the string terminator marks the end of input
  char *token;
  token = strtok((char *) inBuf, DELIMSTR);
  // Check for magic
  if (token == NULL || strcmp(token, MAGIC) != 0)
    return false;

  // Get vote/inquiry indicator
  token = strtok(NULL, DELIMSTR);
  if (token == NULL)
    return false;
  if (strcmp(token, VOTESTR) == 0)
    v->isInquiry = false;
  else if (strcmp(token, INQSTR) == 0)
    v->isInquiry = true;
  else
    return false;

  // Next token is either Response flag or candidate ID
  token = strtok(NULL, DELIMSTR);
  if (token == NULL)
    return false; // Message too short
  if (strcmp(token, RESPONSESTR) == 0) { // Response flag present
    v->isResponse = true;
    token = strtok(NULL, DELIMSTR); // Get candidate ID
    if (token == NULL)
      return false;
  } else { // No response flag; token is candidate ID
    v->isResponse = false;
  }

  // Get candidate #
  v->candidate = atoi(token);
  if (v->isResponse) { // Response message should contain a count value
    token = strtok(NULL, DELIMSTR);
    if (token == NULL)
      return false;
    v->count = strtoull(token, NULL, BASE); // count is unsigned 64-bit
  } else {
    v->count = 0;
  }
  return true;
}
//--------------------------------------------------------------------VoteEncodingText.c
//--------------------------------------------------------------------VoteEncodingBin.c
/* Routines for binary encoding of vote messages
 * Wire Format:
 *                                1  1  1  1  1  1
 *  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
 * +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 * |      Magic      |Flags|         ZERO          |
 * +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 * |                 Candidate ID                  |
 * +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 * |                                               |
 * |         Vote Count (only in response)         |
 * |                                               |
 * |                                               |
 * +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 *
 */
#include <string.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdint.h>
#include <netinet/in.h>
#include "Practical.h"
#include "VoteProtocol.h"

enum {
  REQUEST_SIZE = 4,
  RESPONSE_SIZE = 12,
  COUNT_SHIFT = 32,
  INQUIRE_FLAG = 0x0100,
  RESPONSE_FLAG = 0x0200,
  MAGIC = 0x5400,
  MAGIC_MASK = 0xfc00
};

typedef struct voteMsgBin voteMsgBin;

// Overlay for the wire format. The fields are ordered so that no padding
// is needed; buffers cast to this type must be suitably aligned.
struct voteMsgBin {
  uint16_t header;
  uint16_t candidateID;
  uint32_t countHigh;
  uint32_t countLow;
};

size_t Encode(VoteInfo *v, uint8_t *outBuf, size_t bufSize) {
  const size_t msgSize = v->isResponse ? RESPONSE_SIZE : REQUEST_SIZE;
  if (bufSize < msgSize)
    DieWithUserMessage("Output buffer too small", "");
  voteMsgBin *vm = (voteMsgBin *) outBuf;
  memset(outBuf, 0, msgSize); // Be sure; clear only the bytes that will be sent
  vm->header = MAGIC;
  if (v->isInquiry)
    vm->header |= INQUIRE_FLAG;
  if (v->isResponse)
    vm->header |= RESPONSE_FLAG;
  vm->header = htons(vm->header); // Byte order
  vm->candidateID = htons((uint16_t) v->candidate); // Know it will fit, by invariants
  if (v->isResponse) {
    vm->countHigh = htonl((uint32_t) (v->count >> COUNT_SHIFT));
    vm->countLow = htonl((uint32_t) v->count);
  }
  return msgSize;
}

/* Extract message info from given buffer.
 * Leave input unchanged.
 */
bool Decode(uint8_t *inBuf, size_t mSize, VoteInfo *v) {
  if (mSize < REQUEST_SIZE) // Check the size before reading any field
    return false;
  voteMsgBin *vm = (voteMsgBin *) inBuf;

  // Attend to byte order; leave input unchanged
  uint16_t header = ntohs(vm->header);
  if ((header & MAGIC_MASK) != MAGIC)
    return false;
  /* message is big enough and includes correct magic number */
  v->isResponse = ((header & RESPONSE_FLAG) != 0);
  v->isInquiry = ((header & INQUIRE_FLAG) != 0);
  v->candidate = ntohs(vm->candidateID);
  if (v->isResponse) {
    if (mSize < RESPONSE_SIZE)
      return false; // Response flag set but the count is missing
    v->count = ((uint64_t) ntohl(vm->countHigh) << COUNT_SHIFT)
        | (uint64_t) ntohl(vm->countLow);
  } else {
    v->count = 0; // Invariant: !isResponse => count==0
  }
  return true;
}
//--------------------------------------------------------------------VoteEncodingBin.c