This post was last edited by 八宝粥 on 2013-7-1 22:29
CPU
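Linpack benchmark
The Linpack benchmark has been run on the ARM, with the code compiled by gcc at -O3 (optimization level 3) [1]. It was run with an array size of 200. Software floating point results are included.
Source code build/run
- cc -O3 -o linpack linpack.c -lm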
- linpack.c: In function ‘main’:
- linpack.c:69: warning: return type of ‘main’ is not ‘int’
- ./linpack
- Enter array size (q to quit) [200]: 200
Results
Crippled (software floating point)
- Memory required: 315K.
- LINPACK benchmark, Double precision.
- Machine precision: 15 digits.
- Array size 200 X 200.
- Average rolled and unrolled performance:
- Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
- 2 0.53 92.45% 1.89% 5.66% 5493.333
- 4 1.07 92.52% 2.80% 4.67% 5385.621
- 8 2.12 92.45% 2.36% 5.19% 5466.003
- 16 4.24 92.45% 2.83% 4.72% 5438.944
- 32 8.49 92.11% 2.71% 5.18% 5459.213
- 64 16.98 92.05% 2.89% 5.06% 5452.440
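For anyone wondering where the KFLOPS column comes from, here is a minimal sketch that reproduces the table rows to within rounding. The formula is inferred from the numbers, not taken from the post: it assumes the solved system is half the entered array size (n = 100), that one factor-plus-solve costs 2n^3/3 + 2n^2 floating point operations, and that each repetition runs the solve twice (rolled and unrolled). The function name `linpack_kflops` is made up for this illustration.

```python
def linpack_kflops(reps, total_time_s, dgefa_pct, dgesl_pct, array_size=200):
    """Recompute the KFLOPS column from one row of the results table."""
    # Assumption: the benchmark solves an n x n system with n = array_size / 2.
    n = array_size // 2
    # Assumed operation count for one LU factorisation plus one solve.
    ops = 2.0 * n**3 / 3.0 + 2.0 * n**2
    # Only the time spent in DGEFA and DGESL counts; OVERHEAD is excluded.
    solve_time_s = total_time_s * (dgefa_pct + dgesl_pct) / 100.0
    # Assumption: factor 2 because each repetition runs rolled and unrolled code.
    return 2.0 * reps * ops / (1000.0 * solve_time_s)


# First row of the table above: 2 reps, 0.53 s, 92.45% DGEFA, 1.89% DGESL.
print(round(linpack_kflops(2, 0.53, 92.45, 1.89)))
```

The result differs slightly from the printed KFLOPS value because the time and percentage columns are rounded in the output.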
Hardware floating point (-mfloat-abi=softfp)
- Memory required: 315K.
- LINPACK benchmark, Double precision.
- Machine precision: 15 digits.
- Array size 200 X 200.
- Average rolled and unrolled performance:
- Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
- 8 0.51 90.20% 3.92% 5.88% 22888.889
- 16 1.02 89.22% 4.90% 5.88% 22888.889
- 32 2.05 90.24% 3.41% 6.34% 22888.889
- 64 4.08 91.42% 2.94% 5.64% 22829.437
- 128 8.16 91.54% 2.94% 5.51% 22799.827
- 256 16.31 91.35% 2.76% 5.89% 22903.800
Full hardware floating point under Raspbian (-mfloat-abi=hard -mfpu=vfp), clocked at arm_freq=700
- Memory required: 315K.
- LINPACK benchmark, Double precision.
- Machine precision: 15 digits.
- Array size 200 X 200.
- Average rolled and unrolled performance:
- Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
- 16 0.58 89.66% 3.45% 6.90% 40691.358
- 32 1.17 87.18% 4.27% 8.55% 41071.651
- 64 2.32 88.36% 3.02% 8.62% 41459.119
- 128 4.67 88.22% 3.43% 8.35% 41071.651
- 256 9.33 88.85% 3.32% 7.82% 40880.620
- 512 18.63 89.00% 2.95% 8.05% 41047.675
Full hardware floating point under Raspbian (-mfloat-abi=hard -mfpu=vfp), clocked at arm_freq=1000 core_freq=500
- Memory required: 315K.
- LINPACK benchmark, Double precision.
- Machine precision: 15 digits.
- Array size 200 X 200.
- Average rolled and unrolled performance:
- Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
- 32 0.79 89.87% 0.00% 10.13% 61896.714
- 64 1.58 89.24% 1.27% 9.49% 61463.869
- 128 3.16 90.19% 1.90% 7.91% 60407.789
- 256 6.32 88.13% 3.80% 8.07% 60511.761
- 512 12.65 87.83% 3.56% 8.62% 60825.836
Full hardware floating point under Gentoo, with compiler optimization (gcc-4.6.3 -Ofast -fno-fast-math), default clocks
- Memory required: 315K.
- LINPACK benchmark, Double precision.
- Machine precision: 15 digits.
- Array size 200 X 200.
- Average rolled and unrolled performance:
- Reps Time(s) DGEFA DGESL OVERHEAD KFLOPS
- ----------------------------------------------------
- 16 0.56 89.29% 1.79% 8.93% 43084.967
- 32 1.13 91.15% 4.42% 4.42% 40691.358
- 64 2.25 89.78% 3.56% 6.67% 41853.968
- 128 4.51 87.80% 4.21% 7.98% 42358.233
- 256 9.01 88.68% 3.88% 7.44% 42155.076
- 512 18.01 89.23% 2.78% 8.00% 42434.923
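To put the five Linpack runs side by side, the sketch below takes the last (longest) row of each table and prints MFLOPS and the speedup over the software floating point run. The numbers are copied from the tables above; the short labels and the choice of the last row as the representative value are my own.

```python
# KFLOPS from the last row of each Linpack table above.
kflops = {
    "crippled (soft float)": 5452.440,
    "softfp": 22903.800,
    "Raspbian hard, arm_freq=700": 41047.675,
    "Raspbian hard, arm_freq=1000": 60825.836,
    "Gentoo hard, -Ofast": 42434.923,
}

baseline = kflops["crippled (soft float)"]

for name, value in kflops.items():
    mflops = value / 1000.0
    speedup = value / baseline
    print(f"{name:30s} {mflops:6.2f} MFLOPS  {speedup:5.2f}x")
```

On these figures softfp is roughly 4x the software floating point result, and the hard float builds at the default clock are roughly 7.5x.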
Whetstone/Dhrystone synthetic benchmarks
All code was compiled with gcc using -mfloat-abi=softfp -O3
Source code build/run
Results
Dhrystone
- Microseconds for one run through Dhrystone: 1.2
- Dhrystones per Second: 809061.5
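Dhrystone results are often quoted as DMIPS, which is Dhrystones per second divided by 1757 (the score of the VAX 11/780 reference machine). A minimal sketch of the conversion follows. The 700 MHz clock used for the per-MHz figure is an assumption, since the post does not say what clock this test was run at.

```python
VAX_DHRYSTONES_PER_SEC = 1757.0  # reference score used to define 1 DMIPS

dhrystones_per_sec = 809061.5    # from the result above
assumed_clock_mhz = 700.0        # assumption: clock is not stated for this test

dmips = dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC
dmips_per_mhz = dmips / assumed_clock_mhz
us_per_run = 1e6 / dhrystones_per_sec  # cross-check against the first output line

print(f"{dmips:.1f} DMIPS, {dmips_per_mhz:.2f} DMIPS/MHz, {us_per_run:.1f} us per run")
```

The microseconds-per-run value computed this way agrees with the 1.2 printed by the benchmark.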
Whetstone Crippled
- Loops: 1000, Iterations: 10, Duration: 24 sec.
- C Converted Double Precision Whetstones: 41.7 MIPS
Recompiling Whetstone with 'gcc -mfpu -mfloat-abi=softfp' gives better results:
- Loops: 1000, Iterations: 100, Duration: 106 sec.
- C Converted Double Precision Whetstones: 94.3 MIPS
The test above was not compiled with -mfpu=vfp, so most of the run time was spent in the SQRT function. With vfp the improvement is large:
- Loops: 1000, Iterations: 100, Duration: 15 sec.
- C Converted Double Precision Whetstones: 666.7 MIPS
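The three Whetstone MIPS figures can be reproduced from the loops, iterations and duration printed with them. The sketch below uses a formula inferred from those numbers. The factor of 100 is an assumption about how this Whetstone build counts instructions per loop, and `whetstone_mips` is a name made up for this example.

```python
def whetstone_mips(loops, iterations, duration_s):
    """Recompute the MIPS figure from the printed run parameters."""
    # Assumption: each loop iteration counts as 100 Whetstone instructions.
    kips = 100.0 * loops * iterations / duration_s
    return kips / 1000.0


runs = {
    "crippled": (1000, 10, 24),
    "softfp": (1000, 100, 106),
    "softfp + vfp": (1000, 100, 15),
}

baseline = whetstone_mips(*runs["crippled"])
for name, params in runs.items():
    mips = whetstone_mips(*params)
    print(f"{name:14s} {mips:6.1f} MIPS  {mips / baseline:5.1f}x")
```

On these figures the vfp build is about 16x the crippled run and about 7x the softfp run without vfp.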
OpenSSL benchmark
Source code build/run
- openssl version;
- openssl speed;
Results
With assembly optimizations disabled:
- OpenSSL 0.9.8o 01 Jun 2010
- built on: Thu Aug 26 18:56:26 UTC 2010
- options:bn(64,32) md2(int) rc4(ptr,int) des(idx,risc1,4,long) aes(partial) blowfish(idx)
- compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -Wa,--noexecstack -g -Wall
- available timing options: TIMES TIMEB HZ=100 [sysconf value]
- timing function used: times
- The 'numbers' are in 1000s of bytes per second processed.
- type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
- md2 148.81k 372.18k 624.81k 769.95k 832.90k
- mdc2 0.00 0.00 0.00 0.00 0.00
- md4 615.30k 2468.76k 7612.19k 16707.01k 28104.86k
- md5 380.13k 1501.12k 4800.77k 11312.81k 21682.77k
- hmac(md5) 1022.28k 3480.23k 9587.80k 17492.25k 25441.78k
- sha1 303.72k 1092.39k 3106.50k 6302.57k 9852.39k
- rmd160 244.29k 849.04k 2414.53k 4747.26k 7513.00k
- rc4 14658.70k 16836.49k 17462.03k 17628.21k 17522.08k
- des cbc 2913.17k 3221.30k 3289.77k 3360.09k 3367.21k
- des ede3 1149.87k 1188.59k 1198.46k 1206.00k 1208.25k
- idea cbc 0.00 0.00 0.00 0.00 0.00
- seed cbc 0.00 0.00 0.00 0.00 0.00
- rc2 cbc 2812.71k 3012.02k 3054.19k 3077.82k 3076.12k
- rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00
- blowfish cbc 6091.32k 7007.89k 7250.62k 7288.21k 7163.88k
- cast cbc 5068.25k 6020.03k 6345.71k 6367.64k 6260.44k
- aes-128 cbc 3205.76k 3497.72k 3616.00k 3652.49k 3665.85k
- aes-192 cbc 2730.65k 2981.88k 3073.20k 3102.38k 3111.86k
- aes-256 cbc 2383.90k 2596.12k 2659.91k 2702.13k 2732.50k
- camellia-128 cbc 0.00 0.00 0.00 0.00 0.00
- camellia-192 cbc 0.00 0.00 0.00 0.00 0.00
- camellia-256 cbc 0.00 0.00 0.00 0.00 0.00
- sha256 679.98k 1629.47k 2905.43k 3708.32k 4175.45k
- sha512 41.02k 163.83k 232.63k 318.20k 353.81k
- aes-128 ige 3089.03k 3579.08k 3698.68k 3689.14k 3578.18k
- aes-192 ige 2641.68k 3019.45k 3111.38k 3144.95k 3035.70k
- aes-256 ige 2334.50k 2632.35k 2705.04k 2735.69k 2687.74k
- sign verify sign/s verify/s
- rsa 512 bits 0.013747s 0.001193s 72.7 838.4
- rsa 1024 bits 0.063481s 0.002742s 15.8 364.7
- rsa 2048 bits 0.321250s 0.007378s 3.1 135.5
- rsa 4096 bits 1.805000s 0.022528s 0.6 44.4
- sign verify sign/s verify/s
- dsa 512 bits 0.011690s 0.013597s 85.5 73.5
- dsa 1024 bits 0.027233s 0.031683s 36.7 31.6
- dsa 2048 bits 0.073897s 0.087304s 13.5 11.5
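Two quick conversions help when reading the OpenSSL output. The cipher and hash table is in thousands of bytes per second, and the sign/s and verify/s columns are the reciprocal of the time per operation. A minimal sketch, using a few values copied from the tables above (the helper names are my own):

```python
def ops_per_second(seconds_per_op):
    """Convert time per operation into operations per second."""
    return 1.0 / seconds_per_op


def to_mb_per_s(thousands_of_bytes_per_s):
    """Convert openssl speed's '1000s of bytes per second' to MB/s (10^6 bytes)."""
    return thousands_of_bytes_per_s / 1000.0


# (sign time, verify time) in seconds, from the rsa rows above.
rsa = {
    512: (0.013747, 0.001193),
    1024: (0.063481, 0.002742),
    2048: (0.321250, 0.007378),
    4096: (1.805000, 0.022528),
}

for bits, (sign_s, verify_s) in rsa.items():
    print(f"rsa {bits:4d}: {ops_per_second(sign_s):6.1f} sign/s, "
          f"{ops_per_second(verify_s):6.1f} verify/s")

# aes-128 cbc with 8192-byte blocks, from the table above.
print(f"aes-128 cbc: {to_mb_per_s(3665.85):.2f} MB/s")
```

Small differences from the printed sign/s and verify/s values are expected, because the times in the table are rounded.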