Some problems in OpenMP's parallel for

Overview

Somehow I started preparing for the ASC competition.
While working on my second demo, pi, a program that estimates π with a Monte Carlo algorithm running on multiple threads, I ran into a problem.

Question-Solution

1. Initial program

// pi.cpp

#include <iostream>
#include <fstream>
#include <omp.h>
using namespace std;

fstream fin("/dev/urandom", ios::in|ios::binary);

u_int32_t randomNum() {
    u_int32_t ret;
    fin.read((char*)&ret, sizeof(u_int32_t));
    return ret;
}

int main() {

    int64_t inCircleHits = 0;
    int64_t totalHits = 1000000;

#pragma omp parallel for num_threads(4) reduction(+:inCircleHits)
    for (int i=1 ; i<=totalHits ; i++) {
        double x=1.0*randomNum()/__UINT32_MAX__;
        double y=1.0*randomNum()/__UINT32_MAX__;
        if ((x*x+y*y)<=1.0) {
            inCircleHits++;
        }
    }   

    clog << 4.0*inCircleHits/totalHits << endl;
    fin.close();
    return 0;
}

Running environment and results:

LLVM-clang++ + external OpenMP library on macOS: 3.9969 (always 3.9+)
GCC-g++ + built-in OpenMP library on macOS: 10: Bus Error
GCC-g++ + built-in OpenMP library on Linux: Segmentation fault (core dumped)

Analysis: ??? (Searched online for a couple of hours…)
Observation: In online tutorials and examples, the loop variable is always initialized to 0.

2. Second Trial

This time, I initialized the loop variable i to 0:

for (int i=0 ; i<totalHits ; i++)

Running environment and results:

LLVM-clang++ + external OpenMP library on macOS: 3.9925 (always 3.9+)
GCC-g++ + built-in OpenMP library on macOS: 3.9912 (always 3.9+)
GCC-g++ + built-in OpenMP library on Linux: Segmentation fault (core dumped)

Analysis: Errr… at least the crash on macOS is gone.
But why are the answers incorrect? And why does it still fail on Linux?
Hypothesis: Maybe std::fstream is to blame. All four threads call fin.read() on the same stream at once, and C++ stream objects may not be thread-safe.
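One way to test this hypothesis without leaving C++ is to let only one thread read from the stream at a time. The following is a minimal sketch, not the original program: the helper names randomNumLocked and estimatePiLocked and the critical-section name urandom_read are made up for illustration, and it assumes /dev/urandom exists and the code is built with OpenMP enabled (for example with -fopenmp).

```cpp
// Sketch: keep std::ifstream, but serialize every read with an OpenMP critical section.
#include <cassert>
#include <cstdint>
#include <fstream>
#include <limits>

// Shared stream, as in pi.cpp; every access below is guarded.
static std::ifstream fin("/dev/urandom", std::ios::in | std::ios::binary);

// Hypothetical helper: reads 32 random bits while holding the critical section.
uint32_t randomNumLocked() {
    uint32_t ret = 0;
#pragma omp critical(urandom_read)
    {
        fin.read(reinterpret_cast<char*>(&ret), sizeof(ret));
    }
    return ret;
}

// Hypothetical helper: same Monte Carlo loop as pi.cpp, using the guarded read.
double estimatePiLocked(int64_t totalHits) {
    assert(totalHits > 0);
    const double scale = static_cast<double>(std::numeric_limits<uint32_t>::max());
    int64_t inCircleHits = 0;
#pragma omp parallel for num_threads(4) reduction(+:inCircleHits)
    for (int64_t i = 0; i < totalHits; i++) {
        double x = randomNumLocked() / scale;
        double y = randomNumLocked() / scale;
        if (x * x + y * y <= 1.0) {
            inCircleHits++;
        }
    }
    return 4.0 * inCircleHits / totalHits;
}
```

If the estimate from estimatePiLocked(1000000) lands near 3.14, that supports the idea that unsynchronized access to the stream was the problem. The lock makes the threads take turns reading, so this is a diagnostic rather than a fast solution.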

3. Third Trial

Try replacing std::fstream with our old friend FILE *.
The whole program is changed from C++ to C:

// pi.c

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

FILE *randomFile;

u_int32_t randomNum() {
    u_int32_t ret;
    fread((char*)(&ret), sizeof(u_int32_t), 1, randomFile);
    return ret;
}

int main() {

    randomFile = fopen("/dev/urandom", "rb");
    int64_t inCircleHits = 0;
    int64_t totalHits = 1000000;

#pragma omp parallel for num_threads(4) reduction(+:inCircleHits)
    for (int i=0 ; i<totalHits ; i++) {
        double x=1.0*randomNum()/__UINT32_MAX__;
        double y=1.0*randomNum()/__UINT32_MAX__;
        if ((x*x+y*y)<=1.0) {
            inCircleHits++;
        }   
    }   

    printf("%f", 4.0*inCircleHits/totalHits);
    fclose(randomFile);
    fflush(stdout);
    return 0;

}

Running environment and results:

LLVM-clang++ + external OpenMP library on macOS: 3.135 (correct)
GCC-g++ + built-in OpenMP library on macOS: 3.141 (correct)
GCC-g++ + built-in OpenMP library on Linux: 3.132 (correct)

Yes. As we can see, it now runs correctly on all three platforms.

4. Variable control

Let’s see what happens if we initialize the loop variable i to 1 again:

for (int i=1 ; i<=totalHits ; i++)

Running environment and results:

LLVM-clang++ + external OpenMP library on macOS: (correct)
GCC-g++ + built-in OpenMP library on macOS: (correct)
GCC-g++ + built-in OpenMP library on Linux: (correct)

So the loop variable isn’t the decisive factor; std::fstream is!

Conclusion

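Do not share C++ objects between threads unless you are sure they are safe in a multithreaded context.
The likely reason the C version works is that POSIX requires stdio functions such as fread to lock the FILE internally, while a std::fstream shared between threads comes with no such guarantee.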
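Another way to follow this advice is to share nothing at all and give each thread its own random number generator. The sketch below swaps /dev/urandom for std::mt19937_64 seeded from std::random_device, which is a different random source from the one used in the trials above. The function name estimatePiPrivateRng is made up for illustration, and the code assumes a build with OpenMP enabled (for example with -fopenmp).

```cpp
// Sketch: every thread owns its own generator, so no stream or object is shared.
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: Monte Carlo estimate of pi with per-thread generators.
double estimatePiPrivateRng(int64_t totalHits) {
    assert(totalHits > 0);
    int64_t inCircleHits = 0;
#pragma omp parallel num_threads(4) reduction(+:inCircleHits)
    {
        // Declared inside the parallel region, so each thread gets a private copy.
        std::random_device seedSource;
        std::mt19937_64 engine(seedSource());
        std::uniform_real_distribution<double> unit(0.0, 1.0);
#pragma omp for
        for (int64_t i = 0; i < totalHits; i++) {
            double x = unit(engine);
            double y = unit(engine);
            if (x * x + y * y <= 1.0) {
                inCircleHits++;
            }
        }
    }
    return 4.0 * inCircleHits / totalHits;
}
```

Calling estimatePiPrivateRng(1000000) from main and printing the result should give an estimate close to 3.14. The only value the threads share is the reduction variable, which OpenMP combines for us.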

 

posted @ 2020-10-17 13:27  whsu