Operating System: Process, Thread, IPC, Shared Memory

Filling in another gap. Yesterday I read some of *Modern Operating Systems*, so here are a few notes......

1. To implement the process model, the operating system maintains a table (usually an array of structures) called the process table, with one entry per process; each entry is often called a Process Control Block.

2. CPU utilization can be improved by adding more memory: more processes fit in memory at once, raising the degree of multiprogramming. With n processes, each waiting on I/O a fraction p of the time, utilization is roughly 1 - p^n.

3. Processes are used to group resources together (resource management), while threads are the entities scheduled for execution on the CPU (think of the phrase "thread of control").

4. How can non-blocking I/O work? (When a non-blocking I/O operation finishes, it simply interrupts the executing process. However, non-blocking I/O is hard to program against.)

5. How can an interrupt be delivered? (When the system (not necessarily the kernel; it could be the runtime environment) decides it is the right moment to deliver an interrupt to a process, it forcibly saves the registers of the executing process, pushes the stack, and so on.)

6. thread_join: a thread blocks, waiting for another thread to exit.

7. thread_yield: a thread voluntarily gives up the CPU to let another thread run.

8. User-level-thread:

  Advantage:

     --- Efficiency. Thread switches don't have to trap into the kernel, so no context switch from user space to kernel space is needed.

 Disadvantage:

    --- How are blocking system calls handled? If one thread makes a blocking system call, the whole process blocks, because the kernel doesn't know anything about the threads.

    --- How are page faults handled? If a page fault happens, the kernel blocks the entire process while fetching the required page from disk.

    --- Since the kernel knows nothing about the threads, no other thread in the process will ever run unless the running thread voluntarily gives up the CPU (alternatively, the user-level thread library needs a clock signal of its own to force the running thread off the CPU).

 The point is that multithreaded programs usually make a lot of system calls, which inevitably involve the kernel, making a pure user-level thread implementation unsuitable.

9. kernel-level-thread

10. hybrid thread implementation

11. Three concerns about IPC

       1. How one process can pass information to another

       2. Making sure that two or more processes don't get in each other's way

       3. Proper sequencing when dependencies are present (some process has to wait for another process to finish its work before it can continue)

(We use processes as the example here, but these concerns apply to threads as well.)

 

 

Now for the main part: I want to work through the IPC problem by presenting and comparing a series of approaches, to get a clear picture of where it comes from and where it leads.

In *Modern Operating Systems*, the author covers the following solutions to the IPC problem, from shallow to deep, from surface to essence; you have to read closely to see the logic running through them (I only noticed on my second read...).

(Note: the discussion below mixes processes and threads, because these IPC issues apply identically to both.)

The first attempt is:

         1. Disable Interrupts

The idea: once a process enters its critical region, all interrupts are disabled. Switching among processes (or threads) is normally driven by the clock interrupt on the CPU (everyone gets the CPU for a slice of time) or by other interrupt signals from the system; with interrupts disabled, the running process "has the machine to itself" and cannot be disturbed while inside its critical region. When it leaves the critical region, interrupts are enabled again.

But this approach has a fatal flaw: what if the process never terminates, or loops forever with interrupts off? Besides, it only works on single-processor machines; it doesn't apply to multiprocessor or multicore CPUs, since disabling interrupts on one CPU has no effect on the others.

(Worth mentioning: this trick is used inside OS kernels. While executing certain critical tasks, the kernel disables interrupts so it cannot be disturbed.)

 

The second approach is:

      2. Lock Variable

A process decides whether it may enter the critical region by checking a variable. For example, set up a variable i: when its value is 0, no other process is in the critical region, so process A sets it to 1 (marking that it is inside) and enters the critical region. Any other process that sees the variable at 1 knows someone is inside, and waits. When process A finishes its work and leaves the critical region, it sets the variable back to 0, and other processes may enter.

int flag = 0;

enter_region()
{
    while (flag == 1)
        ;   //spin wait
    flag = 1;
}

leave_region()
{
    flag = 0;
}

Unfortunately, this approach doesn't work either. Suppose process A sees the variable at 0 and is about to set it to 1 and enter the critical region, when the kernel suddenly decides to hand the CPU to process B. What then? B also finds the variable at 0, sets it to 1, and enters the critical region... While B is inside, the CPU goes back to A, and A sets the variable to 1 (it already did its check before being switched out; too bad the value is now 1...) and enters as well. Now two processes are inside the critical region at the same time.

(The root cause of the failure is that

while (flag == 1)
    ;
flag = 1;

is not atomic: execution can be interrupted in the middle of it. If the test and the set could be glued into one indivisible step, there would be no problem. x86 has special instructions for exactly this; see the TSL and xchg instructions below.)

 

 

Although this approach fails, a simple transformation turns it into something workable:

     3. Strict Alternation

The pseudocode for strict alternation is:

1 //process 0                          //process 1
2 while (TRUE) {                       while (TRUE) {
3     while (turn != 0)                    while (turn != 1)
4         ;     // loop                        ;     // loop
5     critical_region();                   critical_region();
6     turn = 1;                            turn = 0;
7     non_critical_region();               non_critical_region();
8 }                                    }

A variable turn is used here, seemingly just like the lock-variable approach above, but thanks to the strict alternation there is no longer any danger in the process being switched out at an arbitrary moment.

Why is it called strict alternation? Because of the turn-taking: in the two columns of code above, process 0 may enter the critical region only when turn == 0, while process 1 may enter only when turn == 1. The two conditions differ (the lock-variable approach used the same condition for everyone), and each condition only becomes true after the other process leaves its critical region. For instance, for process 0 to enter, turn == 0 must hold, and the statement turn = 0 is at line 6 of process 1's column, i.e. it runs after process 1 exits its critical region. In other words, when turn becomes 0 is decided by process 1: each process's chance to enter the critical region is granted by the other one.

However, although this approach does solve the race condition, it wastes resources. Suppose process 0 finishes its critical region and sets turn to 1, and suppose process 1 is still in its non-critical region at that moment. Now the system lets process 0 keep running; it finishes its non-critical region, comes back around to the while loop, and blocks there, because it has already set turn to 1. The two processes are now in this state: process 1 is in its non-critical region, yet process 0 is blocked! That is a waste. Why? Blocking other processes is only justified while some process is inside its critical region; if a process is merely in its non-critical region, the CPU could be serving other processes, and blocking them anyway is simply wasteful.

 

In fact, according to *Modern Operating Systems*, a well-designed solution to the IPC problem should satisfy the following conditions:

       1. No two processes may be simultaneously inside their critical regions.

       2. No assumptions may be made about speeds or the number of CPUs.

       3. No process running outside its critical region may block other processes.        // strict alternation violates this one

       4. No process should have to wait forever to enter its critical region.

 

 

The fourth approach is:

      4. Peterson's algorithm

The code is as follows:

#define FALSE 0
#define TRUE  1
#define N     2                      /* number of processes */

int turn;                            /* whose turn is it? */
int interested[N];                   /* all values initially FALSE (0) */

void enter_region(int process)       /* process is 0 or 1 */
{
    int other;
    other = 1 - process;             /* number of the other process (only 0 or 1) */
    interested[process] = TRUE;      /* show that you are interested */
    turn = process;                  /* set flag */
    while (turn == process && interested[other] == TRUE)
        ;                            /* loop */
}

void leave_region(int process)       /* process: who is leaving */
{
    interested[process] = FALSE;     /* indicate departure from critical region */
}

The beauty of the algorithm... savor it slowly...

Notably, this approach neatly avoids the race condition without violating any of the four conditions above.

 (One rather "extreme" caveat needs stating. Peterson's algorithm is correct only under these premises:

  1. Loads and stores are atomic
  2. They execute in order

That is, a statement like x = 1 must be atomic, and the statements must not be reordered...

The C++11 memory model has machinery for exactly this; I won't expand on it here.)

 

 

The fifth approach is a bit more blunt:

      5. The TSL Instruction (TSL means 'test and set lock')

It relies on the atomicity of the TSL and XCHG machine instructions to avoid the race condition. For example, the TSL instruction can be used like this:

enter_region:
    TSL REGISTER,LOCK    //copy lock to register and set lock to 1
    CMP REGISTER,#0      //was lock zero?
    JNE enter_region     //if it was non-zero, lock was set, so loop
    RET                  //return to caller; critical region entered

leave_region:
    MOVE LOCK,#0         //store a 0 in lock
    RET                  //return to caller

Note that the TSL instruction is atomic.

The xchg instruction can be used as well; an example appears further below.

In fact, these TSL/xchg versions are just a strengthened form of the lock-variable approach above. As mentioned, the lock-variable approach fails because

while (flag == 1)
    ;
flag = 1;

cannot be made atomic: a switch may happen right between the two statements, causing a race condition. With the TSL or xchg instruction, the test-and-set is atomic, and the problem is solved.

 

As an extra bonus, a passage from the book worth writing down:

    "The CPU executing the TSL instruction locks the memory bus to prohibit other CPUs from accessing memory until it is done. Note that locking the memory bus is very different from disabling interrupts. Disabling interrupts, then performing a read on a memory word followed by a write, does not prevent a second processor on the bus from accessing the word between the read and the write. In fact, disabling interrupts on processor 1 has no effect at all on processor 2. The only way to keep processor 2 out of memory until processor 1 is finished is to lock the bus, which requires a special hardware facility (basically, a bus line asserting that the bus is locked and unavailable to other processors)."

 

 

As we have seen, strict alternation, Peterson's algorithm, and the TSL instruction can all solve the problem. The blemish is that they all rely on "busy waiting" (locks of this kind are called spin locks): when we say another process is "blocked" because some process is in the critical region, the blocked process hasn't actually stopped running; it is spinning in an empty loop, which wastes resources. A further problem is that busy waiting leads to the priority inversion problem, which involves the priorities the OS assigns to processes. Suppose process H has a higher priority than process K, K is currently inside the critical region, and H is in its useless spin loop. If the CPU is in the hands of the higher-priority H, the scheduler will keep running H (it has higher priority, and the system cannot tell a useless spin from real work; it only sees H running), so K never gets the chance to leave the critical region, which in turn leaves H spinning forever.

 

To solve this, the "sleep and wakeup" approach was introduced.

With it, a program no longer needs to loop (busy waiting) to check whether it may enter the critical region. When another process is in the critical region, the current process sleeps; when the process inside leaves the critical region, it wakes up the sleeping process.

 

So how are sleep and wakeup implemented? I haven't studied the kernel source carefully, but I found a very valuable reference (link), which gives one scheme:

In this scheme, two functions are implemented: lock() and unlock().

They play the same roles as enter_region() and leave_region() above: use lock() to take the lock variable before entering the critical region, and unlock() to release it when leaving. To stay consistent with the reference, I'll write lock()/unlock() below rather than enter_region()/leave_region().

Let's see step by step how this approach takes shape (to understand why it is implemented this way, this is the way to look at it):

 

The initial lock()/unlock() look a lot like the TSL-instruction version described above; they too lean on the atomicity of an x86 instruction:

int flag = 0;

lock()
{
    while (test_and_set(&flag))
        ;
}

unlock()
{
    flag = 0;
}

The test_and_set() function is implemented as follows:

int test_and_set(volatile int *lock)
{
    int old;
    asm("xchgl %0, %1"
        : "=r"(old), "+m"(*lock)   // output
        : "0"(1)                   // input
        : "memory"                 // may clobber anything in memory
        );
    return old;
}

The x86 xchg instruction provides the atomicity here; gcc's asm extension is used to embed assembly in C. If the asm is hard to read, the following pseudocode captures the behavior:

int test_and_set(int *lock) {
    int old = *lock;
    *lock = 1;
    return old;
}

( xchg reg, addr : atomically swap *addr and reg )

(Most spin locks on x86 are implemented using this instruction.)

 

It is easy to see that this lock()/unlock() is really no different from the TSL-instruction approach above: both are spin locks, with the same busy-waiting and priority-inversion drawbacks.

 

To address this, apply the following idea: if lock() fails (someone else has called lock() but not yet unlock()), call a yield() function to give up the CPU, thereby avoiding busy waiting:

lock() {
    while(test_and_set(&flag))
        yield();
}

But this still has big problems:

  1) There are still many context switches, and context switches are expensive.

  2) It can lead to starvation: yield() doesn't say who should get the CPU next, so what if the CPU keeps being passed around and comes back to us? Suppose there are 100 processes, 0...99, with process 0 inside the critical region and the other 99 calling lock() and then yield(). The CPU may circulate among processes 1...99 without ever returning to process 0, starving it. (It should go to process 0, so it can finish its critical region and unlock(); then one of processes 1...99 can lock().)

 

So we can do this instead:

lock() {
    while (test_and_set(&flag)) {
        add_myself_to_wait_queue();
        yield();
    }
}

unlock() {
    flag = 0;
    if (any_thread_in_wait_queue())
        wake_up_one_waiting_thread();
}

(add_myself_to_wait_queue() and friends above are pseudocode, just to convey the idea.)

 The idea behind this approach: if lock() fails, the caller should give up the CPU via yield(); but before yielding, it adds itself to a wait queue. Once on the wait queue it will no longer be scheduled by the kernel; it must wait for another process (the one inside the critical region) to wake it up via unlock(). This avoids both the "too many context switches" and the "starvation" problems described above.

An example:

  Again take 100 processes 0...99, with process 0 inside the critical region. Processes 1...99 fail in lock(), add themselves to the wait queue, and from then on cannot be scheduled; only process 0's unlock() can wake one of them. So after at most 99 scheduling decisions, the CPU is guaranteed to return to process 0, which finishes its critical region and finally calls unlock(), waking up exactly one of processes 1...99.

 

But this is still not perfect:

  1) It allows a lost wakeup. Suppose process p1 fails in lock() (p0 has locked and not yet unlocked), so p1 is about to run add_myself_to_wait_queue(). Now suppose that after p1's test_and_set(&flag) completes, but before add_myself_to_wait_queue() runs, the kernel switches the CPU to p0. p0 finishes its critical region and calls unlock(); there it finds the wait queue empty (p1 was switched out before it could enqueue itself), so it skips the wakeup, completes unlock(), and moves on. Then p1 gets the CPU again and adds itself to the wait queue, but by now there is nobody left to ever wake it up...

  2) The wrong process may get the lock. Suppose a process finishes its critical region and calls unlock(). Inside unlock() it first executes flag = 0, and only then wakes up a process from the wait queue. But note that the lock is effectively released at the flag = 0 statement: if a switch occurs after flag = 0 completes and before the wakeup runs, who ends up acquiring the lock? Unknown.

 

So, finally, to solve all of the problems above, lock() and unlock() should be implemented like this:

typedef struct __mutex_t {
    int flag;       // 0: mutex is available, 1: mutex not available
    int guard;      // guard lock to avoid lost wakeups
    queue_t *q;     // queue of waiting threads/processes
} mutex_t;

void lock(mutex_t *m) {
    while (test_and_set(&m->guard))
        ;   // acquire guard lock by spinning
    if (m->flag == 0) {
        m->flag = 1;    // acquire mutex
        m->guard = 0;
    } else {
        enqueue(m->q, self);
        m->guard = 0;
        yield();
    }
}

void unlock(mutex_t *m) {
    while (test_and_set(&m->guard))
        ;
    if (queue_empty(m->q))
        m->flag = 0;    // just release the mutex; no one wants it
    else
        wakeup(dequeue(m->q));  // directly transfer the mutex to the next thread
    m->guard = 0;
}

The addition is the guard variable: a second layer of protection. It turns the test of flag and the enqueue into one guarded unit, closing the lost-wakeup window.

 

With lock() and unlock() implemented this way, busy waiting (apart from the very short spin on guard) and priority inversion are avoided, which greatly improves CPU efficiency.

 

(Done.)

 

There is also the readers-writers lock problem. I'll write about it when I have time.

 

 

 Many interprocess communication primitives are implemented with the help of the methods described above. By "interprocess communication primitives" I mean things like semaphores, monitors, and message passing:

      6. Semaphores 

      7. Mutexes      (a mutex is a simplified version of a semaphore)

      8. Monitors

      9. Message passing (Message queue)

      10. barriers 

 (Note that interprocess communication primitives are implemented with the help of atomic system calls.)

More on these when I have time.


An OS lab assignment: the producer-consumer problem, implemented once with processes and once with threads:

 

Process version:

  1 #include <sys/types.h>
  2 #include <unistd.h>
  3 #include <fstream>
  4 #include <iostream>
  5 #include <cstdio>
  6 #include <cstring>
  7 #include <cstdlib>
  8 #include <sys/signal.h>
  9 #include <sys/shm.h>
 10 #include <sys/ipc.h>
 11 #include <sys/stat.h>
 12 
 13 using std::cout;
 14 using std::cin;
 15 using std::endl;
 16 using std::cerr;
 17 
 18 void createPair(char *sh, int *turn, int *interested, int flag);
 19 
 20 void enter_region(int process, int *turn, int *interested) {
 21     int other = 1 - process;
 22     interested[process] = true;
 23     *turn = process;
 24     while (*turn == process && interested[other] == true)
 25         ;
 26     /* The condition
 27      * while (turn == process && interested[other] == true) ;
 28      * can be explained as follows:
 29      *     'interested[other] == true' means the other process has already
 30      *     passed line 22 and declared interest in entering.
 31      *     'turn == process' means you were the last to write 'turn'
 32      *     (the other process may have set it first; you overwrote it).
 33      *     Together: if the other process is interested in entering the
 34      *     critical region and you currently hold the 'turn' flag,
 35      *     be polite and let the other process enter first.
 36      */
 37 }
 38 
 39 void leave_region(int process, int *interested) {
 40     interested[process] = false;
 41 }
 42 
 43 //permission bits of share memory object
 44 #define OBJ_PERMS (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP)
 45 #define LENGTH_SHARED_BUFFER (64 * 1024 * 1024)
 46 #define LENGTH_TURN (sizeof(int))
 47 #define LENGTH_INTERESTED (2 * sizeof(int))
 48 
 49 int main(int argc, char *argv[]) {
 50 
 51     int shmid;
 52 
 53     shmid = shmget(IPC_PRIVATE, LENGTH_SHARED_BUFFER, IPC_CREAT | OBJ_PERMS);
 54     if (-1 == shmid) {
 55         perror("Error in the first shmget(): ");
 56         exit(1);
 57     }
 58     //shared buffer
 59     char *sh = (char *)shmat(shmid, NULL, 0);
 60     if (((void *)-1 == sh)) {
 61         perror("Error in the first shmat(): ");
 62         exit(1);
 63     }
 64 
 65     shmid = shmget(IPC_PRIVATE, LENGTH_TURN, IPC_CREAT | OBJ_PERMS);
 66     if (-1 == shmid) {
 67         perror("Error in the second shmget(): ");
 68         exit(1);
 69     }
 70     int *turn = (int *)shmat(shmid, NULL, 0);
 71     if ((void *)-1 == turn) {
 72         perror("Error in the second shmat(): ");
 73         exit(2);
 74     }
 75 
 76     shmid = shmget(IPC_PRIVATE, LENGTH_INTERESTED, IPC_CREAT | OBJ_PERMS);
 77     if (-1 == shmid) {
 78         perror("Error in the third shmget(): ");
 79         exit(1);
 80     }
 81     int *interested = (int *)shmat(shmid, NULL, 0);
 82     if ((void *)-1 == interested) {
 83         perror("Error in the third shmat(): ");
 84         exit(1);
 85     }
 86 
 87     createPair(sh, turn, interested, 0);
 88     memset(sh, 0, LENGTH_SHARED_BUFFER);
 89     memset(turn, 0, LENGTH_TURN);
 90     memset(interested, 0, LENGTH_INTERESTED);
 91     createPair(sh, turn, interested, 1);
 92 
 93     //detach shared memory
 94     shmdt(sh);
 95     shmdt(turn);
 96     shmdt(interested);
 97     return 0;
 98 }
 99 
100 void createPair(char *sh, int *turn, int *interested, int flag) {
101     pid_t ret = fork();
102     if (ret > 0) { //parent. also the producer. note that now ret is the child process ID
103         int pid = getpid();
104         if (0 == flag) {
105             //pair one
106             for (int i = 0; i < 10000; i++) {
107                 enter_region(1, turn, interested);
108                 sprintf(&sh[strlen(sh)], "%d %d\n", pid, i);
109                 leave_region(1, interested);
110             }
111         } else {
112             //pair 2
113             for (int i = 0; i < 26; i++) {
114                 enter_region(1, turn, interested);
115                 sprintf(&sh[strlen(sh)], "%d %c\n", pid, (char)(i + 'a'));
116                 leave_region(1, interested);
117             }
118         }
119         while (true) {
120             enter_region(1, turn, interested);
121             if (strlen(sh) == 0) {
122                 kill(ret, SIGKILL);
123                 return;
124             }
125             leave_region(1, interested);
126         }
127     } else if (0 == ret) {
128         std::ofstream outfile;
129         if (0 == flag) {
130             outfile.open("a.out", std::ofstream::trunc);
131         } else {
132             outfile.open("b.out", std::ofstream::trunc);
133         }
134         outfile<<std::noskipws;
135         while (true) {
136             enter_region(0, turn, interested);
137             outfile<<sh<<std::flush;//very important to flush the output buffer
138             memset(sh,0,strlen(sh));
139             leave_region(0, interested);
140         }
141     } else {
142         std::cout << "what the **** error\n";
143         return;
144     }
145 }

 

 

Thread (pthread) version:

#include <pthread.h>
#include <iostream>
#include <fstream>
#include <unistd.h>
#include <signal.h>
#include <cstdio>
#include <cstring>
#include <cstdint>
#include <string>
using std::string;
using std::cout;
using std::cin;
using std::cerr;
using std::endl;

//statically initialize the mutexes (leaving them uninitialized happens to
//work on Linux but is not guaranteed by POSIX)
pthread_mutex_t mutex_1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex_2 = PTHREAD_MUTEX_INITIALIZER;
char *buffer_1;
char *buffer_2;
bool finished_1;
bool finished_2;
pthread_t threads[4];

void *produce(void *ptr);
void *consume(void *ptr);

int main(int argc, char *argv[]) {
    buffer_1 = new char[64 * 1024 * 1024];
    buffer_2 = new char[64 * 1024];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    char flag;

    flag = '0';
    if (pthread_create(&threads[0], &attr, produce, (void *)(intptr_t)flag) != 0)
        std::cerr << "What the **** error when creating producer_1\n";
    if (pthread_create(&threads[1], &attr, consume, (void *)(intptr_t)flag) != 0)
        std::cerr << "what the **** error when creating consumer_1\n";

    flag = '1';
    if (pthread_create(&threads[2], &attr, produce, (void *)(intptr_t)flag) != 0)
        std::cerr << "what the **** error when creating producer_2\n";
    if (pthread_create(&threads[3], &attr, consume, (void *)(intptr_t)flag) != 0)
        std::cerr << "what the **** error when creating consumer_2\n";

    pthread_attr_destroy(&attr);

    std::cout << "Main program complete. Calling pthread_exit()\n";
    pthread_exit(NULL);
}

void *produce(void *ptr) {
    char *buffer;
    pthread_mutex_t *mutex_ptr;
    int thread_index;
    bool *finished;
    char flag = (char)(intptr_t)ptr;
    if (flag == '0') {
        buffer = buffer_1;
        mutex_ptr = &mutex_1;
        thread_index = 1;
        finished = &finished_1;
    } else if (flag == '1') {
        buffer = buffer_2;
        mutex_ptr = &mutex_2;
        thread_index = 3;
        finished = &finished_2;
    } else {
        cerr << "flag invalid" << endl;
        exit(1);
    }

    pid_t pid = getppid();
    if (flag == '0') {
        for (int i = 0; i < 10000; i++) {
            pthread_mutex_lock(mutex_ptr);
            sprintf(&buffer[strlen(buffer)], "%d %d\n", pid, i);
            pthread_mutex_unlock(mutex_ptr);
        }
    } else {
        for (int i = 0; i < 26; i++) {
            pthread_mutex_lock(mutex_ptr);
            sprintf(&buffer[strlen(buffer)], "%d %c\n", pid, (char)i + 'a');
            pthread_mutex_unlock(mutex_ptr);
        }
    }
    /*
    while (true) {
        pthread_mutex_lock(mutex_ptr);
        if (0 == strlen(buffer)) {
            //if (pthread_cancel(threads[thread_index]) != 0) {
            if (pthread_kill(threads[thread_index], SIGKILL) != 0) {
                cout << "F**k cannot kill/cancel\n";
                perror("Error: ");
            } else {
                cout << "Canceled successfully \n";
                pthread_exit(NULL);
            }
        }
        pthread_mutex_unlock(mutex_ptr);
    }
    */
    pthread_mutex_lock(mutex_ptr);
    *finished = true;
    pthread_mutex_unlock(mutex_ptr);
    pthread_exit(NULL);
    //return nullptr;
}

void *consume(void *ptr) {
    string file_name;
    pthread_mutex_t *mutex_ptr;
    char *buffer;
    char flag = (char)(intptr_t)ptr;
    bool *finished;
    if (flag == '0') {
        file_name = "a.out";
        mutex_ptr = &mutex_1;
        buffer = buffer_1;
        finished = &finished_1;
    } else if (flag == '1') {
        file_name = "b.out";
        mutex_ptr = &mutex_2;
        buffer = buffer_2;
        finished = &finished_2;
    } else {
        cerr << "flag invalid" << endl;
        exit(1);
    }

    std::ofstream outfile;
    outfile.open(file_name, std::ofstream::trunc);
    outfile << std::noskipws;
    while (true) {
        pthread_mutex_lock(mutex_ptr);
        outfile << buffer << std::flush; //very important to flush the buffer
        memset(buffer, 0, strlen(buffer));
        if (*finished) {
            cout << "return now" << endl;
            pthread_mutex_unlock(mutex_ptr); //release the mutex before exiting
            pthread_exit(NULL);
        }
        pthread_mutex_unlock(mutex_ptr);
    }
}

The producer-consumer implemented here isn't exactly the textbook version, because I used one huge buffer. I misread the problem... [facepalm]

There is also a multithreaded vector dot-product computation:

#include <pthread.h>
#include <iostream>
#include <fstream>
#include <unistd.h>
#include <signal.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <ctime>
using std::string;
using std::cout;
using std::cin;
using std::cerr;
using std::endl;

double *vec_1;
double *vec_2;
double result;
int thread_unfinished;
//protects result and thread_unfinished (they were updated without any
//synchronization before, which is a data race)
pthread_mutex_t sum_mutex = PTHREAD_MUTEX_INITIALIZER;
void compute_result(int thread_num, long long N);
void initVec(double *vec, long long N);
void *compute(void *determine_task);

int main(int argc, char *argv[]) {
    if (argc != 3) {
        cerr << "Incorrect cmd-argument!" << endl;
        cerr << "Usage: ";
        cerr << "vec_mul thread_num N" << endl;
        exit(1);
    }
    int thread_num;
    long long N;
    thread_num = atoi(argv[1]);
    N = atoll(argv[2]);
    if (thread_num < 1 || thread_num > 16) {
        cerr << "Error: "
             << "Too many threads" << endl;
        exit(1);
    } else if (N < 100000) {
        cerr << "Warning: vector too small" << endl;
    }
    thread_unfinished = thread_num;

    try {
        vec_1 = new double[N];
        vec_2 = new double[N];
    } catch (...) {
        cerr << "Exception when allocating memory for vectors." << endl;
        exit(1);
    }
    initVec(vec_1, N);
    initVec(vec_2, N);

    clock_t start = clock();  //note: clock() measures CPU time, not wall time
    compute_result(thread_num, N);
    while (true) {
        pthread_mutex_lock(&sum_mutex);
        int remaining = thread_unfinished;
        pthread_mutex_unlock(&sum_mutex);
        if (remaining == 0) {
            clock_t stop = clock();
            double elapsed = (double)(stop - start) * 1000.0 / CLOCKS_PER_SEC;
            printf("s=%.2lf,t=%.9lf(ms)\n", result, elapsed);
            break;
        }
    }
}

void initVec(double *vec, long long N) {
    for (long long i = 0; i < N; i++) {
        if (i % 3 == 0) {
            vec[i] = 1.0;
        } else if (i % 3 == 1) {
            vec[i] = -1.0;
        } else {
            vec[i] = 0.0;
        }
    }
}

void compute_result(int thread_num, long long N) {
    pthread_t *threads = new pthread_t[thread_num];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    long long piece_length = N / thread_num;
    for (int i = 0; i < thread_num; i++) {
        long long *determine_task = new long long[2]; //freed by the worker
        determine_task[0] = i * piece_length;
        if (i != thread_num - 1) {
            determine_task[1] = piece_length;
        } else {
            determine_task[1] = N - i * piece_length;
        }
        pthread_create(&threads[i], &attr, compute, (void *)determine_task);
    }
    delete[] threads;
}

void *compute(void *determine_task) {
    long long *task = (long long *)determine_task;
    double piece_result = 0.0;
    for (long long index = task[0], times = 0; times < task[1]; times++, index++) {
        piece_result += vec_1[index] * vec_2[index];
    }
    pthread_mutex_lock(&sum_mutex);  //serialize the shared updates
    result += piece_result;
    thread_unfinished--;
    pthread_mutex_unlock(&sum_mutex);
    delete[] task;                   //fix the leak noted earlier
    return nullptr;
}

 

 

A question about shared_memory occurred to me:

  After fork(), the parent and the child have two completely different address spaces. Will the memory (virtual memory) pointed to by the shared-memory pointer in the parent and in the child be mapped to the same block of physical memory?

In practice, yes, exactly as the code above relies on. But why? After all, if you new a block of memory in the parent, then after fork() the parent and the child each end up with their own copy, so changing one does not affect the other.

Shared memory, however, really is shared: changing it in one process affects the other. This puzzled me for a long time.

It made sense once I saw a diagram of per-process page tables (the figure is not reproduced here): both processes' page tables map the shared segment's virtual pages to the same physical page frames, while ordinary pages are duplicated (copy-on-write).

 

 

 

 

Memory management:

  • Each process has its own page table (used to map virtual memory to physical memory)

 

 

 

 

 

 

 

:)

posted @ 2016-03-31 23:20  walkerlala