A Preliminary Comparison of Compiler Optimization Performance

OS: Windows XP 32-bit

CPU: Intel Mobile Core 2 Duo T6600

1. Mixed arithmetic operations

main.c

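#include <stdio.h>
#include <time.h>

int main(void)
{
      int i, j, a = 1, b = 1;
      float c = 1.0, d = 1.0;
      double e = 1.0, f = 1.0;
      clock_t start, finish;
      double duration;

      start = clock();
      for (i = 0; i < 1000; i++)
      {
            for (j = 0; j < 1000000; j++)
            {
                  /* Integer arithmetic. Note: a overflows after a few
                     iterations, which is undefined behavior for signed int;
                     kept unchanged so the timings below still apply. */
                  a = a + 50;
                  b = a - 100;
                  a = b * 20;
                  /* float variables; the double constants make each
                     operation run in double and round back to float */
                  c = a + 300.89;
                  d = c - 600.89;
                  c = d * 90.89;
                  d = c / 55.89;
                  /* Double-precision arithmetic */
                  e = c * 90.89;
                  f = e / 55.89;
            }
      }
      finish = clock();
      /* clock() counts ticks; convert to seconds */
      duration = (double)(finish - start) / CLOCKS_PER_SEC;
      printf("%f,%f\n", e, f);
      printf("%10e", duration);
      return 0;
}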

Elapsed time (seconds)

	O1	O2	O3 (Ox)	Optimization set (no fast floating-point)	Optimization set
VS2008 C/C++ Compiler	10.015	9.530	9.530	2.734	1.968
gcc4.4.4	10.250	10.250	10.265	7.203	5.328
gcc4.5.1	10.390	10.375	10.969	6.156	4.265
Intel C/C++ Compiler 11.1	9.375	9.343	9.343	9.015	8.843

The optimization set for each compiler is:

VS2008 C/C++ Compiler: /Ox /Ob2 /Og /Oi /Ot /Oy /fp:fast /arch:SSE2

gcc4.4.4, gcc4.5.1: -O3 -ftracer -fivopts -ftree-loop-linear -ftree-vectorize -fforce-addr -fomit-frame-pointer -fno-bounds-check -funroll-loops -ffast-math -march=native -mfpmath=sse -mmmx -msse -msse2 -msse3

Intel C/C++ Compiler 11.1: /fast /O3 /Ot /Og /Oi /Qipo /QxHost /arch:SSE3 /Qunroll /Qvec /Quse-intel-optimized-headers /Qparallel /fp:fast=2 /Ob2 /GT /GA

2. Trigonometric functions

main.c (source: Intel)
#include <stdio.h>
#include <stdlib.h> 
#include <time.h> 
#include <math.h>

/* fabs, not abs: in C, abs() takes an int and would truncate sin(x) */
#define INTEG_FUNC(x)  fabs(sin(x))

int main(void)
{
   unsigned int i, j, N;
   double step, x_i, sum;
   double start, finish, duration;
   double interval_begin = 0.0;
   double interval_end = 2.0 * 3.141592653589793238;

   start = clock();

   printf("     \n");
   printf("    Number of    | Computed Integral | \n");
   printf(" Interior Points |                   | \n");
   for (j=2;j<27;j++)
   {
    printf("------------------------------------- \n");

     N =  1 << j;
     step = (interval_end - interval_begin) / N;
     sum = INTEG_FUNC(interval_begin) * step / 2.0;

     for (i=1;i<N;i++)
     {
        x_i = i * step;
        sum += INTEG_FUNC(x_i) * step;
     }

     sum += INTEG_FUNC(interval_end) * step / 2.0;

     printf(" %10u      |  %14e   | \n", N, sum);
   }
   finish = clock();
   duration = (finish - start);
   printf("     \n");
   printf("   Application Clocks   = %10e  \n", duration);
   printf("     \n");
   return 0;
}

Elapsed time (seconds)

	O1	O2	O3 (Ox)	Optimization set (no fast floating-point)	Optimization set
VS2008 C/C++ Compiler	9.687	9.343	8.734	8.281	6.843
gcc4.4.4	20.219	20.296	20.593	15.062	15.046
gcc4.5.1	20.125	19.953	20.094	15.000	15.187
Intel C/C++ Compiler 11.1	6.640	4.828	4.828	4.812	4.812

The optimization sets are the same as above.

3. OpenMP test

prime.cpp
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int i;
    int start, end; 
    int    number_of_primes=0; 
    int number_of_41primes=0;
    int number_of_43primes=0;
    double s1,s2;
    start = 1;
    end = 40000000; 

    printf("Range to check for Primes: %d - %d\n\n",start, end);
    s1=clock();
#pragma omp parallel for schedule(dynamic,100) \
        reduction(+:number_of_primes,number_of_41primes,number_of_43primes)
    for (i = start; i <= end; i += 2) {
        int limit, j, prime;
        limit = (int) sqrt((float)i) + 1;
        prime = 1; 
        j = 3;
        
        while (prime && (j <= limit)) {
            if (i%j == 0) prime = 0;
            j += 2;
        }
        if (prime) {
            number_of_primes++;
            if (i%4 == 1) number_of_41primes++;
            if (i%4 == 3) number_of_43primes++;
        }
    }
    s2=clock();
      printf("\n%10e\n",s2-s1);
    printf("\nProgram Done.\n %d primes found\n",number_of_primes);
    printf("\nNumber of 4n+1 primes found: %d\n",number_of_41primes);
    printf("\nNumber of 4n-1 primes found: %d\n",number_of_43primes);

    return 0; 
}

Built with the optimization set plus each compiler's OpenMP flag: /openmp for VS2008, -fopenmp for gcc, and /Qopenmp for the Intel compiler.

VS2008 C/C++ Compiler	16.781
gcc4.4.4	16.828
gcc4.5.1	15.672
Intel C/C++ Compiler 11.1	16.703

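4. Fortran compiler tests

The Fortran compilers give results similar to those above, except that VS2008 does not support Fortran. gfortran is very close to the Intel compiler on ordinary computation and only falls well behind on trigonometric functions.

linpk standard benchmark

Code source: http://www.polyhedron.com/compare0html

	O1	O2	O3	Optimization set (no fast floating-point)	Optimization set
gfortran4.4.4	25.109	24.938	25.172	24.846	24.922
gfortran4.5.1	24.375	24.313	24.203	24.063	24.234
Intel Fortran Compiler 11.1	25.813	25.188	25.016	25.484	25.203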

Matrix multiplication test (intrinsic function)

main.f90
program main
implicit none
real(kind = 8) :: A(2000, 2000), B(2000, 2000), C(2000, 2000)
real(kind = 8) :: time_begin, time_end

CALL RANDOM_SEED()
CALL RANDOM_NUMBER(A)
CALL RANDOM_NUMBER(B)

CALL CPU_TIME(time_begin)
C=matmul(A, B)
CALL CPU_TIME(time_end)
WRITE(*,*)"consumed CPU_time(s):", time_end - time_begin

end program

	O1	O2	O3	Optimization set (no fast floating-point)	Optimization set
gfortran4.4.4	15.500	15.563	15.688	15.656	15.469
Intel Fortran Compiler 11.1	37.734	37.359	4.484	5.047	4.953

Matrix multiplication test (calling the reference BLAS)

BLAS source: http://www.netlib.org/lapack/

main.f90
program main
implicit none
real(kind = 8) :: A(2000, 2000), B(2000, 2000), C(2000, 2000)
real(kind = 8) :: time_begin, time_end

CALL RANDOM_SEED()
CALL RANDOM_NUMBER(A)
CALL RANDOM_NUMBER(B)

CALL CPU_TIME(time_begin)
CALL dgemm('N', 'N', 2000, 2000, 2000, 1.0_8, A, 2000, B, 2000, 0.0_8, C, 2000)
CALL CPU_TIME(time_end)
WRITE(*,*)"consumed CPU_time(s):", time_end - time_begin

end program

	O1	O2	O3	Optimization set (no fast floating-point)	Optimization set
gfortran4.4.4	18.500	17.844	17.391	17.016	17.156
Intel Fortran Compiler 11.1	14.938	13.969	13.938	18.227	18.430

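5. Conclusion

The Intel compiler performed well in these tests and optimizes intrinsic functions especially heavily. VS2008 also did well. gcc fell far behind on the trigonometric test, but its performance elsewhere was still good. Given that gcc is open source and cross-platform, it holds a more important position than the Intel and Microsoft compilers.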

Posted on 2010-08-26 00:20 by PcX