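C++ Multithreading and Concurrency

1 Creating Threads

Before C++11 the language had no native support for multithreading. C++11 added thread support to the standard library, and later standards have extended it.

std::thread is declared in the <thread> header, so include <thread> to use it.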

#include <chrono>
#include <iostream>
#include <thread>

void func(int a) {
    while (true) {
        std::cout << "hello world" << std::endl;
        std::this_thread::sleep_for(std::chrono::milliseconds(50));  // sleep for 50 ms
    }
}
int main() {
    int n = 0;
    std::thread t1(func, n);
    return 0;  // t1 is still joinable when it is destroyed here
}

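The code above creates a std::thread object t1 that runs func and passes n as its argument.

❗️A thread starts running as soon as it is constructed; there is no run()-style function to call.

However, the program aborts almost immediately. main returns right after creating t1, so t1 is destroyed while it is still joinable (neither join() nor detach() has been called), and the destructor of a joinable std::thread calls std::terminate.

There are two ways to fix this:

1️⃣ Call join()

join() blocks the calling thread until the joined thread finishes, so main does not exit before t1 is done. (Here func loops forever, so join() never returns.)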

...
int main() {
    int n = 0;
    std::thread t1(func, n);
    t1.join();
    return 0;
}

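2️⃣ Call detach()

detach() separates the thread of execution from the std::thread object, so t1 no longer manages it. main creates the thread, carries on with the rest of its code, and exits when it is done. Returning from main ends the process, which also ends any detached thread that is still running.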

...
int main() {
    int n = 0;
    std::thread t1(func, n);
    t1.detach();
    return 0;
}

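A detached thread behaves much like a daemon thread.

When using detach(), pay attention to the lifetime of the data the thread accesses. If t1 receives a pointer or reference to an object owned by main (a local variable, for example), that object is destroyed when main finishes, and any later access from the thread is an error.

A thread's entry point can be an ordinary function, a static or non-static member function of a class, or a lambda expression.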
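The kinds of entry point listed above can be sketched as follows. The Counter type and the function names are made up for this illustration; the point to notice is that a non-static member function needs the object (here a pointer to it) as the first argument after the member pointer.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper type, used only for this illustration.
struct Counter {
    int value = 0;
    void add(int n) { value += n; }                   // non-static member function
    static void reset(Counter* c) { c->value = 0; }   // static member function
};

void add_one(int* out) { *out += 1; }                 // ordinary function

// Starts one thread per kind of callable, waits for all of them,
// and returns the sum of the values they produced.
int run_all_entry_kinds() {
    int plain = 0;
    Counter member;
    Counter cleared;
    cleared.value = 7;
    int from_lambda = 0;

    std::thread t1(add_one, &plain);                        // ordinary function
    std::thread t2(&Counter::add, &member, 5);              // non-static member: object pointer comes first
    std::thread t3(&Counter::reset, &cleared);              // static member: used like a free function
    std::thread t4([&from_lambda] { from_lambda = 10; });   // lambda expression

    t1.join();
    t2.join();
    t3.join();
    t4.join();

    assert(cleared.value == 0);  // reset() ran on t3
    return plain + member.value + cleared.value + from_lambda;
}
```

Each thread writes to its own variable, so no locking is needed here; the joins make the results visible to the caller before they are read.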

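1.1 Other Operations

Operation	Description	Example
swap(std::thread& other)	Swaps two thread objects	std::thread t1(func, n); std::thread t2(func, n); t1.swap(t2);
get_id()	Returns the thread id	t1.get_id();
hardware_concurrency()	Static member; returns the number of concurrent threads the hardware supports (only a hint, and 0 if the value cannot be determined)	std::thread::hardware_concurrency();
native_handle()	Returns the implementation-defined underlying thread handle	t1.native_handle();

These members are called on the std::thread object, which lives in the thread that created t1 (main here). To get the id from inside func, the function that t1 runs, use the std::this_thread namespace, which operates on the current thread. hardware_concurrency() is a static member of std::thread, so std::thread::hardware_concurrency() can be called from any thread. std::this_thread has no counterpart to native_handle(); inside the thread, the handle has to come from a platform API.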

#include <chrono>
#include <iostream>
#include <thread>

void func(int a) {
    while (true) {
        std::cout << "thread_id = " << std::this_thread::get_id() << std::endl;
        // hardware_concurrency() is a static member of std::thread, not part of std::this_thread
        std::cout << "hardware_concurrency = " << std::thread::hardware_concurrency() << std::endl;
        std::this_thread::sleep_for(std::chrono::milliseconds(50));  // sleep for 50 ms
    }
}
int main() {
    int n = 0;
    std::thread t1(func, n);
    t1.join();  // keep main alive while t1 runs
    return 0;
}

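std::this_thread provides these functions as well:

Function	Description	Example
sleep_for()	Sleeps for a duration	std::this_thread::sleep_for(std::chrono::seconds(1));
sleep_until()	Sleeps until an absolute time point	std::this_thread::sleep_until(std::chrono::steady_clock::now() + std::chrono::seconds(1));
yield()	Hints that the current thread gives up its time slice so the OS can schedule another thread	while (!ready) { // wait until main() sets ready... std::this_thread::yield(); }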
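As a small illustration of sleep_until(), the sketch below computes a deadline from std::chrono::steady_clock and measures how long the call actually blocked. The function name and the choice of steady_clock are assumptions for this example; any clock's time_point can be passed.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Sleeps until (now + delay_ms) and returns the elapsed time in whole milliseconds.
long long sleep_until_deadline_ms(int delay_ms) {
    assert(delay_ms >= 0);
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes
    const auto start = clock::now();
    const auto deadline = start + std::chrono::milliseconds(delay_ms);

    std::this_thread::sleep_until(deadline);  // blocks at least until the deadline is reached

    const auto elapsed = clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

The elapsed time is never shorter than the requested delay, but it can be longer because of scheduling.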

2 Mutexes (mutex)

2.1 Basic Usage

#include <iostream>
#include <thread>

int global_veriable = 0;
void task() {
    for (int i = 0; i < 1000; i++) {
        global_veriable++;
        global_veriable--;
    }
}
int main() {
    std::thread t1(task);
    std::thread t2(task);
    t1.join();
    t2.join();
    std::cout << "current value is " << global_veriable;
    return 0;
}

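Reading the code, global_veriable looks like it should end up as 0, but the value may differ from run to run. Both threads read and write the shared variable global_veriable without synchronization, and ++ and -- are read-modify-write operations rather than single atomic steps, so updates from the two threads can interleave and overwrite each other. Formally this is a data race, which is undefined behavior.

Multithreaded code must protect shared resources; otherwise any code that touches them is not thread-safe. The basic tool for this is the mutex.

A std::mutex object provides exclusive ownership: at most one thread can hold it at a time. It is declared in the <mutex> header, so include <mutex> to use it.

Now let's use a mutex to protect the shared variable global_veriable in the code 👆. We lock before modifying global_veriable and unlock afterwards. The code between lock and unlock is called the critical section, and only one thread can execute it at a time.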

#include <iostream>
#include <thread>
#include <mutex>

std::mutex mtx;
int global_veriable = 0;
void task() {
    for (int i = 0; i < 1000; i++) {
        mtx.lock();
        global_veriable++;
        global_veriable--;
        mtx.unlock();
    }
}
int main() {
    std::thread t1(task);
    std::thread t2(task);
    t1.join();
    t2.join();
    std::cout << "current value is " << global_veriable;
    return 0;
}
// Output: current value is 0

2.2 std::lock_guard & std::unique_lock

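Sometimes the code in a critical section is complex: it may return early or throw an exception, in which case execution never reaches the unlock call and the mutex stays locked.

The fix is std::lock_guard.

std::lock_guard is a class template that locks in its constructor and unlocks in its destructor. This is a common C++ idiom called RAII (Resource Acquisition Is Initialization).

A std::lock_guard takes a mutex when it is created and locks it, then unlocks it when it is destroyed. So if the critical section returns early or throws, the guard is destroyed as the scope is left and the mutex is released automatically.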

#include <iostream>
#include <thread>
#include <mutex>

std::mutex mtx;
int global_veriable = 0;
void task() {
    for (int i = 0; i < 1000; i++) {
        std::lock_guard<std::mutex> lock(mtx);
        global_veriable++;
        global_veriable--;
    }
}
int main() {
    std::thread t1(task);
    std::thread t2(task);
    t1.join();
    t2.join();
    std::cout << "current value is " << global_veriable;
    return 0;
}
// Output: current value is 0
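The early-return and exception cases can be made concrete with a small sketch. The names demo_mtx, demo_balance and withdraw are hypothetical and exist only for this illustration; every exit path of withdraw() leaves through the lock_guard's destructor.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex demo_mtx;     // hypothetical names for this sketch
int demo_balance = 100;

// Withdraws `amount`; may throw or return early while the guard is alive.
bool withdraw(int amount) {
    std::lock_guard<std::mutex> lock(demo_mtx);
    if (amount < 0) throw std::invalid_argument("negative amount");  // unlocked during stack unwinding
    if (amount > demo_balance) return false;                         // unlocked on early return
    demo_balance -= amount;
    return true;                                                     // unlocked on normal exit
}

// Exercises all three exit paths one after another. Each later call would
// block forever if the previous one had left the mutex locked.
bool mutex_released_on_all_paths() {
    bool threw = false;
    try {
        withdraw(-1);
    } catch (const std::invalid_argument&) {
        threw = true;
    }
    assert(threw);
    const bool rejected = !withdraw(1000);
    const bool accepted = withdraw(30);
    return threw && rejected && accepted;
}
```

With manual lock()/unlock() calls, the throw and the early return would each need their own unlock(); the guard removes that bookkeeping.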

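To make this easier to picture, std::lock_guard is roughly equivalent to 👇 (the real one is a bit more involved, since it is a class template):

class A {
public:
    explicit A(std::mutex &mtx) : mtx_(mtx) {
        mtx_.lock();
    }
    ~A() {
        mtx_.unlock();
    }
private:
    std::mutex &mtx_;  // reference to the mutex being guarded
};

std::lock_guard is not flexible: it has no member functions to call besides its constructor and destructor, so the granularity of the lock cannot be changed. In the example 👆 its scope is the body of the for loop, so the mutex is held until the end of each iteration. If we want to unlock partway through, std::lock_guard cannot do that; it releases the lock only on destruction. This is where std::unique_lock comes in.

std::unique_lock is also an RAII-style class template, but it offers finer control over locking and unlocking. You can call unlock() within the object's scope to release the lock early, and the destructor unlocks only if the lock is still held at that point.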

#include <iostream>
#include <thread>
#include <mutex>

std::mutex mtx;
int global_veriable = 0;
void task() {
    for (int i = 0; i < 1000; i++) {
        std::unique_lock<std::mutex> lock(mtx);
        global_veriable++;
        global_veriable--;
        lock.unlock();
        // further work that does not need the lock...
    }
}
int main() {
    std::thread t1(task);
    std::thread t2(task);
    t1.join();
    t2.join();
    std::cout << "current value is " << global_veriable;
    return 0;
}
// Output: current value is 0

2.3 std::lock() & std::try_lock() & std::scoped_lock

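Sometimes a thread needs to hold several locks at once, and it is easy to deadlock when different threads lock them in different orders.

🌰 In the example, thread t1 runs task1, which locks mtx1 and then mtx2; thread t2 runs task2, which locks mtx2 and then mtx1. It can happen that t1 holds mtx1 while t2 holds mtx2, and each waits for the lock the other holds. Neither can proceed, which is a deadlock.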

#include <iostream>
#include <thread>
#include <mutex>

std::mutex mtx1;
std::mutex mtx2;
void task1() {
    for (int i = 0; i < 1000; i++) {
        mtx1.lock();  // lock mtx1 first, then mtx2
        mtx2.lock();
        // critical section
        mtx2.unlock();
        mtx1.unlock();
    }
}
void task2() {
    for (int i = 0; i < 1000; i++) {
        mtx2.lock();  // lock mtx2 first, then mtx1
        mtx1.lock();
        // critical section
        mtx1.unlock();
        mtx2.unlock();
    }
}
int main() {
    std::thread t1(task1);
    std::thread t2(task2);
    t1.join();
    t2.join();
    return 0;
}

Solutions:

1️⃣ Use a consistent lock order: both threads lock mtx1 first, then mtx2.

2️⃣ Use std::lock() to lock several mutexes in one call.

#include <iostream>
#include <thread>
#include <mutex>

std::mutex mtx1;
std::mutex mtx2;
void task1() {
    for (int i = 0; i < 1000; i++) {
        std::lock(mtx1, mtx2);
        // critical section
        mtx1.unlock();
        mtx2.unlock();
    }
}
void task2() {
    for (int i = 0; i < 1000; i++) {
        std::lock(mtx1, mtx2);
        // critical section
        mtx1.unlock();
        mtx2.unlock();
    }
}
int main() {
    std::thread t1(task1);
    std::thread t2(task2);
    t1.join();
    t2.join();
    return 0;
}

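std::lock() locks all the given mutexes using a deadlock-avoidance algorithm (the order in which it locks them is unspecified) and blocks until it holds all of them.

std::try_lock() tries to lock the given mutexes in order without blocking. It returns -1 if all of them were locked; if one fails, it releases any it has already locked and returns the 0-based index of the mutex that failed.

3️⃣ Use std::scoped_lock (C++17), an RAII-style class template similar to lock_guard that can manage several mutexes at once and avoids deadlock. Its advantage over std::lock() is that it unlocks automatically on destruction.

std::scoped_lock<std::mutex, std::mutex> lock(mtx1, mtx2);  // with C++17 deduction: std::scoped_lock lock(mtx1, mtx2);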
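For code that has to build as C++11 or C++14, a similar effect can be sketched by combining std::lock() with std::unique_lock objects created with std::defer_lock, so that unlocking is still automatic. The Account type and the function names below are assumptions made for this example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account type for this sketch.
struct Account {
    std::mutex mtx;
    int balance;
    explicit Account(int b) : balance(b) {}
};

// Moves `amount` between two different accounts.
// Precondition: `from` and `to` are distinct objects.
void transfer(Account& from, Account& to, int amount) {
    std::unique_lock<std::mutex> lock_from(from.mtx, std::defer_lock);  // not locked yet
    std::unique_lock<std::mutex> lock_to(to.mtx, std::defer_lock);      // not locked yet
    std::lock(lock_from, lock_to);  // acquires both without risking deadlock
    from.balance -= amount;
    to.balance += amount;
}  // both unique_lock destructors unlock here

// Two threads transfer in opposite directions, which would be deadlock-prone
// with naive nested lock() calls.
int run_transfers() {
    Account a(1000), b(1000);
    std::thread t1([&] { for (int i = 0; i < 1000; i++) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; i++) transfer(b, a, 2); });
    t1.join();
    t2.join();
    assert(a.balance + b.balance == 2000);  // money is conserved
    return a.balance;
}
```

With C++17 available, the three locking lines in transfer() collapse into a single std::scoped_lock over both mutexes.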

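3 Atomic Variables (atomic)

std::atomic is a class template that provides atomic operations on a variable without a mutex. Just turn the variable that is subject to contention into an atomic variable. It is declared in the <atomic> header.

🌰 In the earlier example, we used std::mutex to guarantee mutually exclusive access to the shared variable global_veriable: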

#include <iostream>
#include <thread>
#include <mutex>

std::mutex mtx;
int global_veriable = 0;
void task() {
    for (int i = 0; i < 1000; i++) {
        mtx.lock();
        global_veriable++;
        global_veriable--;
        mtx.unlock();
    }
}
int main() {
    std::thread t1(task);
    std::thread t2(task);
    t1.join();
    t2.join();
    std::cout << "current value is " << global_veriable;
    return 0;
}
// Output: current value is 0

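Instead of std::mutex, we can make global_veriable an atomic variable. Each individual operation on it, such as ++ or --, is then atomic, with no extra locking required.

When instantiating the std::atomic class template, pass the type of the variable as the template argument.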

#include <iostream>
#include <thread>
#include <atomic>

std::atomic<int> global_veriable{0};  // brace initialization also compiles before C++17
void task() {
    for (int i = 0; i < 1000; i++) {
        global_veriable++;
        global_veriable--;
    }
}
int main() {
    std::thread t1(task);
    std::thread t2(task);
    t1.join();
    t2.join();
    std::cout << "current value is " << global_veriable;
    return 0;
}
// Output: current value is 0

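How std::atomic is implemented depends on the standard library and the target platform: some specializations are backed by an internal lock, while others use CPU instructions for atomic access. The is_lock_free() member reports which one applies to a given object.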
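A minimal sketch of the explicit member functions, assuming a global counter named hits (a name made up for this example): fetch_add() is the named form of ++, and is_lock_free() reports whether the platform implements the type without an internal lock.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> hits{0};  // hypothetical shared counter

// Starts num_threads workers that each increment the counter per_thread times.
int count_hits(int num_threads, int per_thread) {
    assert(num_threads >= 0 && per_thread >= 0);
    hits.store(0);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; t++) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; i++) {
                hits.fetch_add(1);  // one atomic read-modify-write, equivalent to hits++
                // Note: hits = hits + 1 would NOT be atomic as a whole (separate load and store).
            }
        });
    }
    for (auto& w : workers) w.join();
    return hits.load();
}

// True if std::atomic<int> is implemented with CPU instructions on this platform.
bool counter_is_lock_free() { return hits.is_lock_free(); }
```

No increment is lost regardless of how the threads interleave, so the result is always num_threads * per_thread.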

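4 Condition Variables (condition_variable)

🌰 Below we create two threads: a producer thread that keeps pushing data into the queue q, and a consumer thread that keeps taking data out of q.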

#include <deque>
#include <iostream>
#include <thread>
#include <mutex>

std::mutex mtx;
std::deque<int> q;
void producer() {
    int i = 0;
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        q.push_back(i);
        // std::this_thread::sleep_for(std::chrono::milliseconds(10));
        if (i < 999) i++;
        else i = 0;
    }
}
void costumer() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        if (!q.empty()) {
            std::cout << "Get value from producer: " << q.front() << std::endl;
            q.pop_front();
        }
        // std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
int main() {
    std::thread t1(producer);
    std::thread t2(costumer);
    t1.join();
    t2.join();
    return 0;
}

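Both threads spin in an infinite loop, which burns a lot of CPU. The first idea for reducing CPU usage is probably to add a delay, but then the length of the delay matters: if it is too long, the producer does not produce data in time or the consumer does not pick it up in time; if it is too short, it does little to reduce CPU usage.

A better option is a condition variable (condition_variable), declared in the <condition_variable> header.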

#include <deque>
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>

std::mutex mtx;
std::deque<int> q;
std::condition_variable cv;
void producer() {
    int i = 0;
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        q.push_back(i);
        cv.notify_one();  // wake one waiting thread
        if (i < 999) i++;
        else i = 0;
    }
}
void costumer() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        // if q is empty, wait() releases the lock and blocks the current thread
        if (q.empty()) {
            cv.wait(lock);
        }
        std::cout << "Get value from producer: " << q.front() << std::endl;
        q.pop_front();
    }
}
int main() {
    std::thread t1(producer);
    std::thread t2(costumer);
    t1.join();
    t2.join();
    return 0;
}

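This version usually appears to work with one producer and one consumer, but it breaks with several consumers: a thread can return from wait() even though the condition it was waiting for does not hold. (The standard also allows wait() to return without any notification at all, a spurious wakeup, so checking with if is not strictly safe even with a single consumer.) 🌰 Extending the example from one consumer to two, as below, can fail with an error such as Exception: front() called on empty deque.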

#include <deque>
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>

std::mutex mtx;
std::deque<int> q;
std::condition_variable cv;
void producer() {
    int i = 0;
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        q.push_back(i);
        cv.notify_one();
        if (i < 999) i++;
        else i = 0;
    }
}
void costumer1() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        if (q.empty()) {
            cv.wait(lock);
        }
        std::cout << "Customer1 get value from producer: " << q.front() << std::endl;
        q.pop_front();
    }
}
void costumer2() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        if (q.empty()) {
            cv.wait(lock);
        }
        std::cout << "Customer2 get value from producer: " << q.front() << std::endl;
        q.pop_front();
    }
}
int main() {
    std::thread t1(producer);
    std::thread t2(costumer1);
    std::thread t3(costumer2);
    t1.join();
    t2.join();
    t3.join();
    return 0;
}

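Here is why. Suppose the producer pushes one item and calls notify_one(); costumer1 wakes up, consumes the item (so q is empty again), and goes around its while loop. Before costumer1 reaches the if (q.empty()) check, and so before it waits, the producer pushes another item and calls notify_one() again. costumer1 finds q non-empty, skips wait(), and consumes the new item right away. That second notify_one(), however, woke costumer2. By the time costumer2 gets the mutex back and returns from wait(), costumer1 has already emptied q, but costumer2 does not recheck and calls q.front() on an empty deque, hence the error.

So write while (q.empty()) instead of if (q.empty()). A woken thread then rechecks the condition and goes back to waiting if q is still empty, which fixes this problem and also covers genuine spurious wakeups.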

#include <deque>
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>

std::mutex mtx;
std::deque<int> q;
std::condition_variable cv;
void producer() {
    int i = 0;
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        q.push_back(i);
        cv.notify_one();
        if (i < 999) i++;
        else i = 0;
    }
}
void costumer1() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        while (q.empty()) {  // !!!
            cv.wait(lock);
        }
        std::cout << "Customer1 get value from producer: " << q.front() << std::endl;
        q.pop_front();
    }
}
void costumer2() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        while (q.empty()) {  // !!!
            cv.wait(lock);
        }
        std::cout << "Customer2 get value from producer: " << q.front() << std::endl;
        q.pop_front();
    }
}
int main() {
    std::thread t1(producer);
    std::thread t2(costumer1);
    std::thread t3(costumer2);
    t1.join();
    t2.join();
    t3.join();
    return 0;
}

PS: cv.notify_all() wakes all waiting threads.
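The while loop around wait() can also be written with the predicate overload, cv.wait(lock, pred), which loops internally until pred() returns true. The sketch below is a finite variant of the producer/consumer example so that it terminates; the done flag and the function name are assumptions added for this illustration and are not part of the original example.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Pushes 1..count into a queue; two consumers pop and sum the values.
// Returns the total seen by both consumers.
long long produce_and_consume(int count) {
    std::mutex m;
    std::condition_variable cv;
    std::deque<int> q;
    bool done = false;    // hypothetical shutdown flag, protected by m
    long long total = 0;  // protected by m

    auto consumer = [&] {
        while (true) {
            std::unique_lock<std::mutex> lock(m);
            // Equivalent to: while (!(!q.empty() || done)) cv.wait(lock);
            cv.wait(lock, [&] { return !q.empty() || done; });
            if (q.empty()) return;  // producer finished and nothing is left
            total += q.front();
            q.pop_front();
        }
    };

    std::thread c1(consumer), c2(consumer);
    for (int i = 1; i <= count; i++) {
        {
            std::lock_guard<std::mutex> lock(m);
            q.push_back(i);
        }
        cv.notify_one();  // wake one consumer for the new item
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_all();  // wake every consumer so each can observe done
    c1.join();
    c2.join();
    assert(q.empty());
    return total;
}
```

Here notify_all() is the right call at shutdown because every waiting consumer has to wake up and exit, not just one.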

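5 Semaphores (semaphore)

Semaphores require C++20 and are declared in the <semaphore> header. A counting semaphore allows a shared resource to be accessed by several threads at the same time. A binary semaphore can be used in place of a condition variable in some cases and may perform better.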

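// counting_semaphore: the template argument is the maximum value the internal counter must be able to reach;
// the constructor argument is the initial counter value and is required (there is no default constructor)
std::counting_semaphore<6> csem(0);
// binary_semaphore is an alias for a counting semaphore whose maximum is 1:
// using binary_semaphore = std::counting_semaphore<1>;
std::binary_semaphore bsem(0);

// The initial counter value can be anything from 0 up to the maximum
std::counting_semaphore<6> csem2(6);
std::binary_semaphore bsem2(1);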

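Member function	Description
release(std::ptrdiff_t update = 1)	Atomically increments the internal counter by update (default 1). Threads waiting for the counter to become greater than 0, such as those blocked in acquire(), are then unblocked.
acquire()	Decrements the internal counter by 1 if it is greater than 0; otherwise blocks until it is greater than 0 and the decrement succeeds.

A semaphore's release()/acquire() play a role similar to a condition variable's notify/wait, with one difference: the semaphore remembers its count, so a release() that happens before the matching acquire() is not lost.

release() must not push the counter above the maximum, so a std::binary_semaphore must not be passed a value greater than 1.

🌰 Create five threads that all wait on the semaphore. When the main thread calls release(2), two of them can continue.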

#include <iostream>
#include <thread>
#include <semaphore>
std::counting_semaphore<4> csem(0);  // initial counter 0, so every acquire() blocks until release()
void task() {
    std::cout << "Ready to acquire signal" << std::endl;
    csem.acquire();
    std::cout << "Acquire end" << std::endl;
}
int main() {
    std::thread t0(task);
    std::thread t1(task);
    std::thread t2(task);
    std::thread t3(task);
    std::thread t4(task);
    std::cout << "Ready to release signal" << std::endl;
    csem.release(2);
    std::cout << "Release end" << std::endl;
    t0.join();
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    return 0;
}

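Sample output (the interleaving varies between runs; the three threads that never acquire stay blocked, so the program does not exit on its own):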

Ready to release signal
Release end
Ready to acquire signal
Acquire end
Ready to acquire signal
Ready to acquire signal
Ready to acquire signal
Ready to acquire signal
Acquire end

6 std::promise & std::future

Scenario #1: The main thread wants to hand some computation to a child thread and then get the result back from it.

#include <iostream>
#include <thread>

void task(int a, int b, int &ret) {
    ret = a * a + b * 2;
}
int main() {
    int ret = 0;
    // std::thread offers no member function for retrieving a return value, so the result has to come back through a pointer or reference.
    // Here we pass a reference to ret.
    std::thread t(task, 1, 2, std::ref(ret));
    std::cout << "return value is " << ret;
    t.join();
    return 0;
}

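👆 This does not work, because:

(1) Both the main thread and the child thread t access the shared variable ret, so it needs to be protected by a lock. (Even in code this simple, the unsynchronized access is a data race and not thread-safe.)

(2) The main thread has no way of knowing when the child thread assigns to ret, so it cannot just read ret directly; it needs a condition variable.

So we change the code to 👇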

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>

std::mutex mtx;
std::condition_variable cv;
bool ready = false;  // set once ret has been written; guards against a missed or spurious wakeup

void task(int a, int b, int &ret) {
    std::unique_lock<std::mutex> lock(mtx);
    ret = a * a + b * 2;
    ready = true;
    cv.notify_one();
}
int main() {
    int ret = 0;
    std::thread t(task, 1, 2, std::ref(ret));
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [] { return ready; });  // returns only once ready is true
    std::cout << "return value is " << ret;
    t.join();
    return 0;
}

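That is quite a lot of code when all we want is to start a thread to do some computation and fetch the result in the main thread. The standard library offers something simpler: std::promise and std::future.

std::promise and std::future are both class templates defined in the <future> header. Like unique_lock and friends, they cannot be copied; pass them with std::move() or through a pointer or reference.

(1) The main thread creates a std::promise object and obtains the associated std::future from it with the member function get_future(), which ties the two together.
(2) The main thread passes a reference to the promise to the child thread, connecting the two threads (the pair acts as a one-shot data channel).
(3) The child thread calls set_value() on the promise to store a value, and the main thread calls get() on the future to retrieve it.

❗️get() can be called only once on a future. After the first call the future is no longer valid, and calling get() again is undefined behavior (many implementations throw std::future_error, and the program may crash).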

#include <iostream>
#include <thread>
#include <future>

void task(int a, int b, std::promise<int> &ret) {
    ret.set_value(a * a + b * 2);
}
int main() {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    
    std::thread t(task, 1, 2, std::ref(p));
    std::cout << "return value is " << f.get();
    t.join();
    return 0;
}
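Whether a future can still be read can be checked with its valid() member. The sketch below, with a made-up function name, shows that valid() is true before get() and false afterwards, which is why a second get() must not be attempted.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Runs a one-shot promise/future exchange and returns the received value.
int get_once() {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread t([&p] { p.set_value(1 * 1 + 2 * 2); });  // same formula as task(1, 2, ...)

    assert(f.valid());        // the future refers to a shared state
    int value = f.get();      // blocks until set_value() has run
    assert(!f.valid());       // get() released the shared state; do not call get() again

    t.join();
    return value;
}
```

Code that may need the value more than once can store the result of the single get() call in a variable, or use std::shared_future.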

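Scenario #2: When the main thread creates the child thread, some arguments are not yet known, and the child thread has to get them from the main thread while it is running.

In Scenario #1 the child thread sets the value and the main thread gets it.

In Scenario #2 the child thread gets the value and the main thread sets it.

The two are exact opposites.

The second argument is unknown when the main thread creates the child thread t, so it passes a reference to a std::future object, on which the child thread later calls get() to obtain the value.

As before, the std::future object has to be tied to a std::promise object through get_future().

The main thread can then call set_value() on the std::promise, after which the child thread receives the value and continues its computation.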

#include <iostream>
#include <thread>
#include <future>

void task(int a, std::future<int> &b, std::promise<int> &ret) {
    int ret_b = b.get();
    ret.set_value(a * a + ret_b * 2);
}
int main() {
    std::promise<int> p_in;
    std::promise<int> p_out;
    std::future<int> f_in = p_in.get_future();
    std::future<int> f_out = p_out.get_future();
    
    std::thread t(task, 1, std::ref(f_in), std::ref(p_out));
    p_in.set_value(2);
    std::cout << "return value is " << f_out.get();
    t.join();
    return 0;
}

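Scenario #3: Several child threads need to get the same value from the main thread while they are running.

If several child threads share the same std::future<int> &b, get() is called more than once on it, which is undefined behavior and typically crashes the program.

Use std::shared_future instead. It is also a class template, but unlike std::future it is copyable, several shared_future objects can refer to the same shared state, and get() can be called on each of them, so multiple threads can wait for the same result.

Accessing the same shared state from multiple threads is safe as long as each thread does so through its own copy of the shared_future object.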

#include <iostream>
#include <thread>
#include <future>

void task(int a, std::shared_future<int> b, std::promise<int> &ret) {
    int ret_b = b.get();
    ret.set_value(a * a + ret_b * 2);
}
int main() {
    std::promise<int> p_in;
    // One output promise per thread: set_value() may be called only once on a promise
    std::promise<int> p_out0, p_out1, p_out2;
    std::future<int> f_in = p_in.get_future();
    std::future<int> f_out0 = p_out0.get_future();
    std::future<int> f_out1 = p_out1.get_future();
    std::future<int> f_out2 = p_out2.get_future();

    std::shared_future<int> f_share = f_in.share();

    // Each thread holds its own copy of the shared_future and calls get() on that copy, which is fine
    std::thread t0(task, 1, f_share, std::ref(p_out0));
    std::thread t1(task, 1, f_share, std::ref(p_out1));
    std::thread t2(task, 1, f_share, std::ref(p_out2));
    p_in.set_value(2);
    std::cout << "return values are " << f_out0.get() << " " << f_out1.get() << " " << f_out2.get();
    t0.join();
    t1.join();
    t2.join();
    return 0;
}

7 std::async() & std::packaged_task

In section 6 we used std::promise and std::future from the standard library to hand some computation to a child thread and then get the result back from it.

#include <iostream>
#include <thread>
#include <future>

void task(int a, int b, std::promise<int> &ret) {
    ret.set_value(a * a + b * 2);
}
int main() {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    
    std::thread t(task, 1, 2, std::ref(p));
    std::cout << "return value is " << f.get();
    t.join();
    return 0;
}

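If that still feels like too much code, the standard library also provides std::async().

std::async() is defined in the <future> header.

As mentioned earlier, std::thread has no way of retrieving a thread's return value; std::async() wraps that functionality for us. It returns a std::future that will hold the result.

So the code can be simplified further: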

#include <iostream>
#include <future>

int task(int a, int b) {
    return a * a + b * 2;
}
int main() {
    std::future<int> f = std::async(task, 1, 2);
    // std::future<int> f = std::async(std::launch::async, task, 1, 2);
    std::cout << "return value is " << f.get();
    return 0;
}

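Note that std::async() does not necessarily start a thread at the point of the call. With the default policy, std::launch::async | std::launch::deferred, the implementation decides whether to run the task on a new thread or to defer it. To force the task onto a new thread right away, pass std::launch::async as the first argument.

std::launch constant	Meaning
std::launch::async	Runs the task asynchronously on a new thread
std::launch::deferred	Runs the task on the calling thread the first time its result is requested (lazy evaluation)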
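The difference between the two policies can be observed by having the task report the id of the thread it runs on. This is a sketch with made-up function names; it relies only on std::async, std::future::wait_for and std::this_thread::get_id.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Task that reports which thread executed it.
std::thread::id where_am_i() { return std::this_thread::get_id(); }

// With std::launch::deferred the task runs lazily, inside get(), on the caller's thread.
bool deferred_runs_on_caller() {
    std::future<std::thread::id> f = std::async(std::launch::deferred, where_am_i);
    // Nothing has run yet: wait_for() reports the deferred status instead of waiting.
    assert(f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred);
    return f.get() == std::this_thread::get_id();
}

// With std::launch::async the task runs on a separate thread.
bool async_runs_on_other_thread() {
    std::future<std::thread::id> f = std::async(std::launch::async, where_am_i);
    return f.get() != std::this_thread::get_id();
}
```

A deferred task whose future is never asked for its result is never executed at all, which matters if the task has side effects.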

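std::packaged_task is a class template defined in the <future> header.

To instantiate std::packaged_task, specify the task's return type and parameter list as the template argument, and initialize it with a callable target (a function, lambda expression, bind expression, or other function object). This wraps the task, which can then be invoked asynchronously. Its return value or thrown exception is stored in a shared state that can be accessed through a std::future object.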

#include <iostream>
#include <future>

int task(int a, int b) {
    return a * a + b * 2;
}
int main() {
    std::packaged_task<int(int, int)> t(task);
    t(1, 2);  // the arguments are supplied when the task is invoked
    std::cout << "return value is " << t.get_future().get();
    return 0;
}

The arguments can also be fixed when the task is wrapped, using std::bind().

#include <functional>
#include <iostream>
#include <future>

int task(int a, int b) {
    return a * a + b * 2;
}
int main() {
    std::packaged_task<int()> t(std::bind(task, 1, 2));
    t();
    std::cout << "return value is " << t.get_future().get();
    return 0;
}

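std::bind() binds the task to its arguments and returns a callable object of unspecified type (it can be stored in a std::function), so no arguments are needed at the call site.

This is equivalent to 👇 so the task run by std::packaged_task has the signature int(): no parameters, only a return value.

int task() {
    int a = 1, b = 2;
    return task(a, b);  // forwards to the two-argument overload
}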
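The examples above invoke the packaged task on the calling thread. To run it asynchronously, one option is to move it into a std::thread, as in the sketch below; the names compute and run_packaged_on_thread are made up for this illustration.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

int compute(int a, int b) { return a * a + b * 2; }  // same formula as task() in the article

// Wraps compute() in a packaged_task and executes it on a worker thread.
int run_packaged_on_thread(int a, int b) {
    std::packaged_task<int(int, int)> pt(compute);
    std::future<int> f = pt.get_future();     // take the future before giving the task away
    assert(f.valid());

    std::thread worker(std::move(pt), a, b);  // packaged_task is move-only
    int result = f.get();                     // blocks until the worker has run the task
    worker.join();
    return result;
}
```

Taking the future first matters: once the task has been moved into the thread, the original pt object no longer owns a shared state to hand out.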

8 Producer-Consumer Model

[TODO]

9 Implementing a Thread Pool

[TODO]

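Summary

Creating a thread: std::thread t1(func, n); takes the function to call and its arguments.

❗️A thread starts running as soon as it is constructed; there is no run()-style function to call.

If a std::thread object is destroyed while it is still joinable, for example because main returns before the child thread has been joined or detached, the program is terminated. Solutions:

Call t1.join();: the calling thread waits for the joined thread to finish before it continues.
Call t1.detach();: the thread is separated from the std::thread object and runs independently; main can then exit without waiting for it.
- When using detach(), pay attention to the lifetime of the data being accessed. If t1 receives a pointer or reference to an object owned by main, that object is destroyed when main finishes, and any later access from the thread is an error.

Mutex: std::mutex mtx; protects a shared resource that several threads access.

Add mtx.lock(); and mtx.unlock(); around accesses to the shared resource to form a critical section. Only one thread at a time can execute the code in the critical section.

If the code in the critical section is complex, it may return early or throw, and execution never reaches the unlock call.

std::lock_guard<std::mutex> lock(mtx);: an RAII-style class template that locks in its constructor and unlocks in its destructor.
std::unique_lock<std::mutex> lock(mtx);: an RAII-style class template with finer control over locking and unlocking. lock.unlock(); releases the lock early, and the destructor unlocks only if the lock is still held.

Locking several mutexes in different orders in different threads can cause deadlock. Solutions:

Use the same lock order in every thread, 🌰 lock mtx1 first, then mtx2.
std::lock(mtx1, mtx2);: locks all the given mutexes using a deadlock-avoidance algorithm, blocking until it holds all of them.
std::try_lock(mtx1, mtx2);: tries to lock the given mutexes in order without blocking; returns -1 if all were locked, otherwise releases any it locked and returns the 0-based index of the one that failed.
std::scoped_lock<std::mutex, std::mutex> lock(mtx1, mtx2);: an RAII-style class template (C++17) similar to lock_guard that can manage several mutexes and avoids deadlock. Its advantage over std::lock() is that it unlocks automatically on destruction.

Atomic variable: std::atomic<int> global_veriable{0}; makes the contended variable atomic, so no mutex is needed for atomic operations on it.

How std::atomic is implemented depends on the standard library and the target platform: some specializations are backed by an internal lock, while others use CPU instructions for atomic access.

Condition variable: a thread waiting for a resource blocks instead of spinning, which reduces CPU usage.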

#include <deque>
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>

std::mutex mtx;
std::deque<int> q;
std::condition_variable cv;
void producer() {
    int i = 0;
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        q.push_back(i);
        cv.notify_one();  // wakes one blocked thread; cv.notify_all() wakes all blocked threads
        if (i < 999) i++;
        else i = 0;
    }
}
void costumer1() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        while (q.empty()) {  // !!! recheck after every wakeup
            cv.wait(lock);
        }
        std::cout << "Customer1 get value from producer: " << q.front() << std::endl;
        q.pop_front();
    }
}
void costumer2() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        while (q.empty()) {  // !!!
            cv.wait(lock);
        }
        std::cout << "Customer2 get value from producer: " << q.front() << std::endl;
        q.pop_front();
    }
}
int main() {
    std::thread t1(producer);
    std::thread t2(costumer1);
    std::thread t3(costumer2);
    t1.join();
    t2.join();
    t3.join();
    return 0;
}

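Semaphore: a counting semaphore allows a shared resource to be accessed by several threads at the same time. A binary semaphore can be used in place of a condition variable in some cases and may perform better.

A semaphore's release()/acquire() play a role similar to a condition variable's notify/wait.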

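// counting_semaphore: the template argument is the maximum value the internal counter must be able to reach;
// the constructor argument is the initial counter value and is required (there is no default constructor)
std::counting_semaphore<6> csem(0);
// binary_semaphore is an alias for a counting semaphore whose maximum is 1:
// using binary_semaphore = std::counting_semaphore<1>;
std::binary_semaphore bsem(0);

// The initial counter value can be anything from 0 up to the maximum
std::counting_semaphore<6> csem2(6);
std::binary_semaphore bsem2(1);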

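std::promise & std::future use cases: the main thread gets a value from a child thread, or a child thread needs a value from the main thread while it is running.

The main thread creates a std::promise object and obtains the associated std::future from it with the member function get_future(), which ties the two together.
If the main thread wants a value from the child thread, it passes the child a reference to the promise; if the child thread needs a value from the main thread while running, the main thread passes it a reference to the future. Either way the pair connects the two threads (it acts as a one-shot data channel).
A thread calls set_value() on the promise to store a value, and get() on the future retrieves it.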

#include <iostream>
#include <thread>
#include <future>

void task(int a, std::future<int> &b, std::promise<int> &ret) {
    int ret_b = b.get();
    ret.set_value(a * a + ret_b * 2);
}
int main() {
    std::promise<int> p_in;
    std::promise<int> p_out;
    std::future<int> f_in = p_in.get_future();
    std::future<int> f_out = p_out.get_future();
    
    std::thread t(task, 1, std::ref(f_in), std::ref(p_out));
    p_in.set_value(2);
    std::cout << "return value is " << f_out.get();
    t.join();
    return 0;
}

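❗️get() can be called only once on a future; calling it a second time is undefined behavior and typically crashes the program. If several child threads need to get the same value from the main thread while they are running, use std::shared_future. It is also a class template, but unlike std::future it is copyable, several shared_future objects can refer to the same shared state, and multiple threads can wait for the same result.

Accessing the same shared state from multiple threads is safe as long as each thread does so through its own copy of the shared_future object.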

#include <iostream>
#include <thread>
#include <future>

void task(int a, std::shared_future<int> b, std::promise<int> &ret) {
    int ret_b = b.get();
    ret.set_value(a * a + ret_b * 2);
}
int main() {
    std::promise<int> p_in;
    // One output promise per thread: set_value() may be called only once on a promise
    std::promise<int> p_out0, p_out1, p_out2;
    std::future<int> f_in = p_in.get_future();
    std::future<int> f_out0 = p_out0.get_future();
    std::future<int> f_out1 = p_out1.get_future();
    std::future<int> f_out2 = p_out2.get_future();

    std::shared_future<int> f_share = f_in.share();

    // Each thread holds its own copy of the shared_future and calls get() on that copy, which is fine
    std::thread t0(task, 1, f_share, std::ref(p_out0));
    std::thread t1(task, 1, f_share, std::ref(p_out1));
    std::thread t2(task, 1, f_share, std::ref(p_out2));
    p_in.set_value(2);
    std::cout << "return values are " << f_out0.get() << " " << f_out1.get() << " " << f_out2.get();
    t0.join();
    t1.join();
    t2.join();
    return 0;
}