前言

本科期间修读了《计算机网络》课程，但是课上布置的作业比较简单，只是分析了一下 Wireshark 抓包的结构，没有动手实现过协议。所以最近在哔哩大学在线学习了斯坦福大学的 CS144 计算机网课程，这门课搭配了几个 Lab，要求动手实现一个 TCP 协议，而不是简单地调用系统为我们提供好的 Socket。

实验准备

CS144 Fall2019 的课件和实验指导书可以下载自 CS144 镜像网站，代码可以从我的 Github 仓库获取。

本篇博客将会介绍 Lab0 的实验过程，实验环境为 Ubuntu20.04 虚拟机，使用 VSCode 完成代码的编写。

实验过程

Lab0 有两个任务，第一个任务是实现能发送 Get 请求到任意网址的 webget 程序，第二个任务是实现内存内的可靠字节流。

webget

实验指导书中让我们先用 Telnet 程序连接到斯坦福大学的 Web 服务器上，在命令行中输入 telnet cs144.keithw.org http 并回车，不出意外的话会提示已成功连接上服务器。之后手动构造请求报文，包括请求行和请求头，输入两次回车就能得到响应，响应体内容为 Hello, CS144。

应用层的 Http 协议使用 TCP 传输层协议进行数据的可靠性传输，由于我们目前还没有实现 TCP 协议，只能先借用一下操作系统写好的的 socket 来发送 http 请求。CS144 的老师们十分贴心地对 socket 库进行了二次封装，类图如下所示：

FileDescriptor 的部分代码如下，可以看到内部类 FDWrapper 持有文件描述符，会在析构的时候调用 close() 函数释放对文件描述符的引用。FileDescriptor 还提供了 read() 和 write() 函数进行文件读写操作：

复制class FileDescriptor {
    //! \brief A handle on a kernel file descriptor.
    //! \details FileDescriptor objects contain a std::shared_ptr to a FDWrapper.
    class FDWrapper {
      public:
        int _fd;                    //!< The file descriptor number returned by the kernel
        bool _eof = false;          //!< Flag indicating whether FDWrapper::_fd is at EOF
        bool _closed = false;       //!< Flag indicating whether FDWrapper::_fd has been closed

        //! Construct from a file descriptor number returned by the kernel
        explicit FDWrapper(const int fd);
        //! Closes the file descriptor upon destruction
        ~FDWrapper();
        //! Calls [close(2)](\ref man2::close) on FDWrapper::_fd
        void close();
    };

    //! A reference-counted handle to a shared FDWrapper
    std::shared_ptr<FDWrapper> _internal_fd;

  public:
    //! Construct from a file descriptor number returned by the kernel
    explicit FileDescriptor(const int fd);

    //! Free the std::shared_ptr; the FDWrapper destructor calls close() when the refcount goes to zero.
    ~FileDescriptor() = default;

    //! Read up to `limit` bytes
    std::string read(const size_t limit = std::numeric_limits<size_t>::max());

    //! Read up to `limit` bytes into `str` (caller can allocate storage)
    void read(std::string &str, const size_t limit = std::numeric_limits<size_t>::max());

    //! Write a string, possibly blocking until all is written
    size_t write(const char *str, const bool write_all = true) { return write(BufferViewList(str), write_all); }

    //! Write a string, possibly blocking until all is written
    size_t write(const std::string &str, const bool write_all = true) { return write(BufferViewList(str), write_all); }

    //! Close the underlying file descriptor
    void close() { _internal_fd->close(); }

    int fd_num() const { return _internal_fd->_fd; }        //!< \brief underlying descriptor number
    bool eof() const { return _internal_fd->_eof; }         //!< \brief EOF flag state
    bool closed() const { return _internal_fd->_closed; }   //!< \brief closed flag state
};

// 析构的时候自动释放文件描述符
FileDescriptor::FDWrapper::~FDWrapper() {
    try {
        if (_closed) {
            return;
        }
        close();
    } catch (const exception &e) {
        // don't throw an exception from the destructor
        std::cerr << "Exception destructing FDWrapper: " << e.what() << std::endl;
    }
}

我们知道，在 Linux 系统中 “万物皆文件”，socket 也被认为是一种文件，socket 被表示成文件描述符，调用 socket() 函数返回就是一个文件描述符，对 socket 的读写就和文件的读写一样。所以 Socket 类继承自 FileDescriptor 类，同时拥有三个子类 TCPSocket、UDPSocket 和 LocalStreamSocket，我们将使用 TCPSocket 完成第一个任务。

第一个任务需要补全 apps/webget.cc 的 get_URL() 函数，这个函数接受两个参数：主机名 host 和请求路径 path ：

复制void get_URL(const string &host, const string &path) {
    TCPSocket socket;
    
    // 连接到 Web 服务器
    socket.connect(Address(host, "http"));
    
    // 创建请求报文
    socket.write("GET " + path + " HTTP/1.1\r\n");
    socket.write("Host: " + host + "\r\n\r\n");
    
    // 结束写操作
    socket.shutdown(SHUT_WR);

    // 读取响应报文
    while (!socket.eof()) {
        cout << socket.read();
    }

    // 关闭 socket
    socket.close();
}

首先调用 connect() 函数完成 TCP 的三次握手，建立与主机的连接，接着使用 write() 函数手动构造请求报文。请求报文的格式如下图所示，其中请求行的方法是 GET，URI 为请求路径 path，Http 协议版本为 HTTP/1.1，而首部行必须含有一个 Host 键值对指明将要连接的主机：

发送完请求报文后就可以结束写操作，并不停调用 TCPSocket.read() 函数读取响应报文的内容直至结束，最后关闭套接字释放资源。其实这里也可以不手动关闭，因为 socket 对象被析构的时候会自动调用 FDWrapper.close() 释放文件描述符。

在命令行中输入下述命令完成编译：

复制mkdir build
cd build
cmake ..
make -j8

之后运行 ./apps/webget cs144.keithw.org /hello 就能看到响应报文了：

接着运行测试程序，也顺利通过了：

in-memory reliable byte stream

任务二要求我们实现一个内存内的有序可靠字节流：

字节流可以从写入端写入，并以相同的顺序，从读取端读取
字节流是有限的，写者可以终止写入。而读者可以在读取到字节流末尾时，不再读取。
字节流支持流量控制，以控制内存的使用。当所使用的缓冲区爆满时，将禁止写入操作。
写入的字节流可能会很长，必须考虑到字节流大于缓冲区大小的情况。即便缓冲区只有1字节大小，所实现的程序也必须支持正常的写入读取操作。

在单线程环境下执行，无需考虑多线程生产者-消费者模型下各类条件竞争问题。

由于写入顺序和读出顺序相同，这种先入先出的 IO 特性可以使用队列来实现。C++ 标准库提供了 std::queue 模板类，但是 std::queue 不支持迭代器，这会对后续编码造成一点麻烦，所以这里换成双端队列 std::deque。

类声明如下所示，使用 deque<char> 存储数据，_capacity 控制队列长度，_is_input_end 代表写入是否结束：

复制class ByteStream {
  private:
    size_t _capacity;
    std::deque<char> _buffer{};
    size_t _bytes_written{0};
    size_t _bytes_read{0};
    bool _is_input_end{false};
    bool _error{};  //!< Flag indicating that the stream suffered an error.

  public:
    //! Construct a stream with room for `capacity` bytes.
    ByteStream(const size_t capacity);

    //! Write a string of bytes into the stream. Write as many
    //! as will fit, and return how many were written.
    //! \returns the number of bytes accepted into the stream
    size_t write(const std::string &data);

    //! \returns the number of additional bytes that the stream has space for
    size_t remaining_capacity() const;

    //! Signal that the byte stream has reached its ending
    void end_input();

    //! Indicate that the stream suffered an error.
    void set_error() { _error = true; }

    //! Peek at next "len" bytes of the stream
    //! \returns a string
    std::string peek_output(const size_t len) const;

    //! Remove bytes from the buffer
    void pop_output(const size_t len);

    //! Read (i.e., copy and then pop) the next "len" bytes of the stream
    //! \returns a vector of bytes read
    std::string read(const size_t len) {
        const auto ret = peek_output(len);
        pop_output(len);
        return ret;
    }

    //! \returns `true` if the stream input has ended
    bool input_ended() const;

    //! \returns `true` if the stream has suffered an error
    bool error() const { return _error; }

    //! \returns the maximum amount that can currently be read from the stream
    size_t buffer_size() const;

    //! \returns `true` if the buffer is empty
    bool buffer_empty() const;

    //! \returns `true` if the output has reached the ending
    bool eof() const;

    //! Total number of bytes written
    size_t bytes_written() const;

    //! Total number of bytes popped
    size_t bytes_read() const;
};

类实现：

复制ByteStream::ByteStream(const size_t capacity) : _capacity(capacity) {}

size_t ByteStream::write(const string &data) {
    size_t ws = min(data.size(), remaining_capacity());

    for (size_t i = 0; i < ws; ++i)
        _buffer.push_back(data[i]);

    _bytes_written += ws;
    return ws;
}

//! \param[in] len bytes will be copied from the output side of the buffer
string ByteStream::peek_output(const size_t len) const {
    auto rs = min(buffer_size(), len);
    return {_buffer.begin(), _buffer.begin() + rs};
}

//! \param[in] len bytes will be removed from the output side of the buffer
void ByteStream::pop_output(const size_t len) {
    auto rs = min(len, buffer_size());
    _bytes_read += rs;
    for (size_t i = 0; i < rs; ++i)
        _buffer.pop_front();
}

void ByteStream::end_input() { _is_input_end = true; }

bool ByteStream::input_ended() const { return _is_input_end; }

size_t ByteStream::buffer_size() const { return _buffer.size(); }

bool ByteStream::buffer_empty() const { return _buffer.empty(); }

bool ByteStream::eof() const { return buffer_empty() && input_ended(); }

size_t ByteStream::bytes_written() const { return _bytes_written; }

size_t ByteStream::bytes_read() const { return _bytes_read; }

size_t ByteStream::remaining_capacity() const { return _capacity - buffer_size(); }