cpufp's Introduction

cpufp

cpufp is a CPU tool for benchmarking the peak performance of floating-point and AI ISAs.

It automatically detects the SIMD and DSA ISAs of the local machine at compile time.

Supported OS and ISA

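| Arch        | Linux | MacOS | Windows |
|-------------|-------|-------|---------|
| arm64       | yes   | no    | no      |
| e2k         | yes   | no    | no      |
| loongarch64 | yes   | no    | no      |
| riscv64     | yes   | no    | no      |
| x86-64      | yes   | no    | no      |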

Supported x86-64 SIMD and DSA ISAs

| Arch | ISA           | Feature | Data Type  | Description                 |
|------|---------------|---------|------------|-----------------------------|
| SIMD | SSE           | Vector  | fp32       | Before Sandy Bridge         |
| SIMD | SSE2          | Vector  | fp64       | Before Sandy Bridge         |
| SIMD | AVX           | Vector  | fp32/fp64  | From Sandy Bridge           |
| SIMD | FMA           | Vector  | fp32/fp64  | From Haswell/Zen            |
| SIMD | AVX512f       | Vector  | fp32/fp64  | From Skylake X/Zen4         |
| SIMD | AVX512_VNNI   | Vector  | int8/int16 | From IceLake                |
| SIMD | AVX_VNNI      | Vector  | int8/int16 | From Alder Lake             |
| SIMD | AVX512_FP16   | Vector  | fp16       | From Intel Sapphire Rapids  |
| SIMD | AVX512_BF16   | Vector  | bf16       | From AMD Zen4               |
| SIMD | AVX_VNNI_INT8 | Vector  | int8       | Unknown                     |
| DSA  | AMX_INT8      | Matrix  | int8       | From Intel Sapphire Rapids  |
| DSA  | AMX_BF16      | Matrix  | bf16       | From Intel Sapphire Rapids  |
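
As a rough illustration of what detecting the local ISAs can look like on Linux, the sketch below maps the `flags` line of `/proc/cpuinfo` to the ISA names in the x86-64 table. It is not how cpufp itself detects features; the function name `detect_isas` and the flag spellings in `FLAG_TO_ISA` are assumptions made for this example.

```python
# Hypothetical mapping from Linux /proc/cpuinfo flag names to the ISA
# names used in the x86-64 table. The flag spellings are an assumption.
FLAG_TO_ISA = {
    "sse": "SSE",
    "sse2": "SSE2",
    "avx": "AVX",
    "fma": "FMA",
    "avx512f": "AVX512f",
    "avx512_vnni": "AVX512_VNNI",
    "avx_vnni": "AVX_VNNI",
    "avx512_fp16": "AVX512_FP16",
    "avx512_bf16": "AVX512_BF16",
    "avx_vnni_int8": "AVX_VNNI_INT8",
    "amx_int8": "AMX_INT8",
    "amx_bf16": "AMX_BF16",
}


def detect_isas(cpuinfo_text):
    """Return the table ISAs advertised by the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            flags = set(value.split())
            # Iterate the mapping so the result follows the table order.
            return [isa for flag, isa in FLAG_TO_ISA.items() if flag in flags]
    return []


# A shortened flags line in the style of the ones quoted in the issues.
sample = "flags : fpu sse sse2 fma avx avx2 avx512f avx512_vnni"
print(detect_isas(sample))
```

On a real machine the same function could be fed `open("/proc/cpuinfo").read()`. Flags are matched as whole words, so `avx512_vnni` does not imply `avx_vnni`.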

Supported arm64 SIMD ISAs

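| Arch | ISA      | Feature | Data Type | Description              |
|------|----------|---------|-----------|--------------------------|
| SIMD | asimd    | Vector  | fp32/fp64 | From Cortex-A57/A53      |
| SIMD | asimd_hp | Vector  | fp16      | From Cortex-A75/A55      |
| SIMD | asimd_dp | Vector  | int8      | From Cortex-A75/A55      |
| SIMD | bf16     | Matrix  | bf16      | From Cortex-X2/A710/A510 |
| SIMD | i8mm     | Matrix  | int8      | From Cortex-X2/A710/A510 |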

Supported riscv64 vector ISAs

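| Arch | ISA | Feature | Data Type      | Description                                   |
|------|-----|---------|----------------|-----------------------------------------------|
| SIMD | V   | Vector  | fp16/fp32/fp64 | From RISC-V "V" vector extension, version 1.0 |
| DSA  | ime | Matrix  | int8           | From SpacemiT-X60                             |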

NOTE: ime is a SpacemiT custom vendor extension.

Supported loongarch64 ISAs

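| Arch   | ISA  | Feature | Data Type | Description          |
|--------|------|---------|-----------|----------------------|
| SIMD   | LASX | Vector  | fp32/fp64 | From Loongson 3A5000 |
| SIMD   | LSX  | Vector  | fp32/fp64 | From Loongson 3A5000 |
| Scalar | FP   | Scalar  | fp32/fp64 | From Loongson 3A5000 |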

Supported e2k ISAs

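| Arch   | ISA   | Feature | Vector Width | Data Type | Description         |
|--------|-------|---------|--------------|-----------|---------------------|
| SIMD   | v6    | Vector  | 128          | fp32/fp64 | FMA                 |
| SIMD   | v5    | Vector  | 128          | fp32/fp64 | Combined operations |
| Scalar | v1-v4 | Scalar  |              | fp64      | Combined operations |
| SIMD   | v1-v4 | Vector  | 64           | fp32      | Combined operations |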

Combined operations

E2K supports instructions that perform two independent operations. Such an instruction resembles FMA, but it rounds one additional time because the two operations are independent.
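
The effect of the extra rounding can be seen with ordinary IEEE 754 doubles. The sketch below is only a numerical illustration, not e2k code: `mul_add_two_roundings` rounds the product and then the sum, while `mul_add_one_rounding` emulates a fused operation by computing exactly with rationals and rounding once. Both function names and the chosen inputs are assumptions made for this example.

```python
from fractions import Fraction


def mul_add_two_roundings(a, b, c):
    """Round the product to a double, then round the sum."""
    product = a * b        # first rounding
    return product + c     # second rounding


def mul_add_one_rounding(a, b, c):
    """Compute a*b + c exactly with rationals, then round once (FMA-like)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)    # single rounding


# The exact product is 1 - 2**-54, which is not representable as a double.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

print(mul_add_two_roundings(a, b, c))  # the product rounds to 1.0 first
print(mul_add_one_rounding(a, b, c))   # keeps the -2**-54 term
```

With these inputs the two-rounding version loses the low-order term entirely, which is the kind of difference the combined operations can show relative to a true FMA.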

Example: fmul_addd

fmul_addd src1, src2, src3, dst

Description

Multiply double-precision (64-bit) floating-point values from src1 and src2, and add the intermediate result to the value from src3. Store the result in dst.

Operation

dst[63:0] := src3[63:0] + src1[63:0] * src2[63:0]

Latency and Throughput

| Architecture | Latency | Throughput (CPI) | ALC    |
|--------------|---------|------------------|--------|
| elbrus-v4    | 8       | 0.16             | 012345 |
| elbrus-v1    | 8       | 0.25             | 01-34- |

  • ALC (Arithmetic Logic Complex/Channel) is an execution port for RISC-like instructions

How to build

build x64 version:

./build_x64.sh

build arm64 version:

./build_arm64.sh

build riscv64 version:

./build_riscv64.sh

build loongarch64 version:

./build_loongarch64.sh

build e2k version:

./build_e2k.sh

clean:

./clean.sh

How to benchmark

./cpufp --thread_pool=[xxx] --idle_time=yyy

--thread_pool: [xxx] is the list of CPU threads to benchmark; the benchmark threads are bound to them by setting affinities. Refer to the output of the lstopo command to choose the IDs. For example, [0,3,5-8,13-15].

--idle_time: the idle interval, in seconds, between any two adjacent benchmarks. The default is 0.
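
To make the list syntax concrete, here is a minimal sketch of how a value such as [0,3,5-8,13-15] expands into individual CPU thread IDs. It is not cpufp's own parser; `parse_thread_pool` is a hypothetical helper, and the error handling is an assumption.

```python
def parse_thread_pool(spec):
    """Expand a list such as "[0,3,5-8,13-15]" into individual CPU ids."""
    body = spec.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a list in square brackets: %r" % spec)
    cpus = []
    for item in body[1:-1].split(","):
        item = item.strip()
        if not item:
            raise ValueError("empty entry in %r" % spec)
        # "5-8" is an inclusive range; a bare "3" is a single id.
        first, dash, last = item.partition("-")
        lo = int(first)
        hi = int(last) if dash else lo
        if hi < lo:
            raise ValueError("descending range: %r" % item)
        cpus.extend(range(lo, hi + 1))
    return cpus


print(parse_thread_pool("[0,3,5-8,13-15]"))
```

The example from the option description expands to nine threads: 0, 3, 5 to 8, and 13 to 15.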

Benchmark results

x86-64 cpufp benchmark results

arm64 cpufp benchmark results

riscv64 cpufp benchmark results

loongarch64 cpufp benchmark results

e2k cpufp benchmark results

Todo list

Add Armv9 (SVE, SVE2 and SME) support.

cpufp's People

Contributors

concyclics, katyushascarlet, numas13, phoebus-ma, pigirons, zhangyuef

cpufp's Issues

I cannot reach the peak performance on K1. What should I modify?

  1. K1 system info:
    Linux k1 6.1.15 #1.0 SMP PREEMPT Thu May 30 13:16:13 UTC 2024 riscv64 riscv64 riscv64 GNU/Linux
    I made sure DVFS is disabled.

  2. I cannot reach the 533 GFLOPS of vfmacc.vf on 8 threads, nor the 66 GFLOPS of vfmacc.vf on a single core.
    I can only achieve 80% of that performance. What should I modify, on the K1 or in the cpufp code?

The -std=c99 option is needed when compiling the binary

[@l /tmp/cpufp]# gcc -pthread -O3 -c smtl.c
smtl.c: In function ‘smtl_begin_tasks’:
smtl.c:331:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < sh->num_threads; i++)
^
smtl.c:331:5: note: use option -std=c99 or -std=gnu99 to compile your code
[@l /tmp/cpufp]#
[@l /tmp/cpufp]# gcc -pthread -O3 -c smtl.c -std=c99
[@l /tmp/cpufp]#

cpufp_kernel_x86_avx512_vnni.s:16: Error: no such instruction: `vpdpbusd %zmm0,%zmm0,%zmm0'

@pigirons
Why does this error appear in my build? Should I upgrade as to a newer version? My test steps and environment are listed below.
cpufp_kernel_x86_avx512_vnni.s:16: Error: no such instruction: `vpdpbusd %zmm0,%zmm0,%zmm0'

PS: I have already updated gcc to 9.2.0 on CentOS 7.7.
Thanks for your help.

Test steps and environment
[root@baseimage cpufp]# as -o cpufp_kernel_x86_avx512_vnni.o cpufp_kernel_x86_avx512_vnni.s
cpufp_kernel_x86_avx512_vnni.s: Assembler messages:
cpufp_kernel_x86_avx512_vnni.s:16: Error: no such instruction: `vpdpbusd %zmm0,%zmm0,%zmm0'
cpufp_kernel_x86_avx512_vnni.s:17: Error: no such instruction: `vpdpbusd %zmm1,%zmm1,%zmm1'
cpufp_kernel_x86_avx512_vnni.s:18: Error: no such instruction: `vpdpbusd %zmm2,%zmm2,%zmm2'
cpufp_kernel_x86_avx512_vnni.s:19: Error: no such instruction: `vpdpbusd %zmm3,%zmm3,%zmm3'
cpufp_kernel_x86_avx512_vnni.s:20: Error: no such instruction: `vpdpbusd %zmm4,%zmm4,%zmm4'
cpufp_kernel_x86_avx512_vnni.s:21: Error: no such instruction: `vpdpbusd %zmm5,%zmm5,%zmm5'
cpufp_kernel_x86_avx512_vnni.s:22: Error: no such instruction: `vpdpbusd %zmm6,%zmm6,%zmm6'
cpufp_kernel_x86_avx512_vnni.s:23: Error: no such instruction: `vpdpbusd %zmm7,%zmm7,%zmm7'
cpufp_kernel_x86_avx512_vnni.s:24: Error: no such instruction: `vpdpbusd %zmm8,%zmm8,%zmm8'
cpufp_kernel_x86_avx512_vnni.s:25: Error: no such instruction: `vpdpbusd %zmm9,%zmm9,%zmm9'
[root@baseimage cpufp]#
[root@baseimage cpufp]#
[root@baseimage cpufp]# as --version
GNU assembler version 2.27-41.base.el7_7.1
Copyright (C) 2016 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-redhat-linux'.
[root@baseimage cpufp]#
[root@baseimage cpufp]#
[root@baseimage cpufp]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
Stepping: 7
CPU MHz: 3099.998
BogoMIPS: 6199.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni
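
In the log above the CPU flags include avx512_vnni, yet the error comes from the assembler (as, version 2.27), so CPU support and assembler support are separate questions. The sketch below extracts the version from `as --version` output and compares it with a minimum. `assembler_version` is a hypothetical helper, and `ASSUMED_MIN_VERSION` is a placeholder: the post does not say which binutils release first accepts vpdpbusd, so check the binutils release notes before relying on the value.

```python
import re

# Placeholder only: the source does not state the first binutils release
# that assembles vpdpbusd. Replace it after checking the release notes.
ASSUMED_MIN_VERSION = (2, 30)


def assembler_version(version_text):
    """Return (major, minor) parsed from `as --version` output, or None."""
    match = re.search(r"GNU assembler\D*?(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_too_old(version_text, minimum=ASSUMED_MIN_VERSION):
    """True when the parsed version is known and below `minimum`."""
    version = assembler_version(version_text)
    return version is not None and version < minimum


# First line of the `as --version` output quoted in the issue.
reported = "GNU assembler version 2.27-41.base.el7_7.1"
print(assembler_version(reported), is_too_old(reported))
```

Updating gcc alone does not necessarily change the assembler, since as is shipped with binutils rather than with the compiler.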

Error when executing "sh build.sh"

I got the following errors when executing sh build.sh

cpufp_x86.c: In function ‘main’:
cpufp_x86.c:333:51: error: ‘i’ undeclared (first use in this function)
  if (strspn(argv[1], "0123456789") == strlen(argv[i]))
                                                   ^
cpufp_x86.c:333:51: note: each undeclared identifier is reported only once for each function it appears in
cpufp_x86.c:334:3: error: expected expression before ‘int’
   int num_threads = atoi(argv[1]);
   ^
cpufp_x86.c:335:28: error: ‘num_threads’ undeclared (first use in this function)
  printf("Thread(s): %d\n", num_threads);
                            ^
gcc: error: cpufp_x86.o: No such file or directory
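
The diagnostics point at two separate problems in the quoted C lines: `argv[i]` uses an index `i` that is never declared, while the same line already reads `argv[1]`, and a declaration (`int num_threads = ...`) directly follows the `if` without braces, which C rejects because a declaration is not a statement. The sketch below restates the check the line appears to intend, in Python. It is an interpretation of the log, not the project's fix; `strspn` and `parse_thread_count` are helpers written for this example.

```python
def strspn(text, accept):
    """Length of the leading run of `text` made only of chars in `accept`."""
    count = 0
    for ch in text:
        if ch not in accept:
            break
        count += 1
    return count


def parse_thread_count(arg):
    """Return the thread count if `arg` is all decimal digits, else None."""
    # Compare against the length of the same argument, not argv[i].
    if arg and strspn(arg, "0123456789") == len(arg):
        return int(arg)
    return None


print(parse_thread_count("8"))
print(parse_thread_count("8x"))
```

The extra `arg and` guard rejects the empty string, which the bare length comparison would accept.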

XuanTie C908 benchmark results

Hi, I ran the benchmark on the CanMV Kendryte K230, which has a single XuanTie C908 RISC-V core that supports RVV 1.0.
Here are the results:

$ ./cpufp --thread_pool=[0]
Number Threads: 1
Thread Pool Binding: 0
---------------------------------------------------------------
| Instruction Set | Core Computation       | Peak Performance |
| vector          | vfmacc.vf(f16,f16,f16) | 25.014 GFLOPS    |
| vector          | vfmacc.vv(f16,f16,f16) | 25.01 GFLOPS     |
| vector          | vfmacc.vf(f32,f32,f32) | 12.507 GFLOPS    |
| vector          | vfmacc.vv(f32,f32,f32) | 12.508 GFLOPS    |
| vector          | vfmacc.vf(f64,f64,f64) | 6.254 GFLOPS     |
| vector          | vfmacc.vv(f64,f64,f64) | 6.2541 GFLOPS    |
---------------------------------------------------------------
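
The reported rates halve each time the element width doubles, which is what a fixed-width vector unit would give. The sketch below checks that scaling against the numbers in the table and compares it with a simple peak model. The model parameters (`assumed_vlen_bits`, `assumed_freq_ghz`, one FMA instruction per cycle) are illustrative placeholders, not values taken from the post.

```python
# vfmacc.vf rates from the table above, keyed by element width in bits.
measured_gflops = {16: 25.014, 32: 12.507, 64: 6.254}


def lanes(vlen_bits, elem_bits):
    """Number of elements that fit in one vector register."""
    return vlen_bits // elem_bits


def peak_gflops(vlen_bits, elem_bits, freq_ghz, fma_per_cycle=1.0):
    """Simple peak model: each fused multiply-add counts as two FLOPs."""
    return lanes(vlen_bits, elem_bits) * 2 * fma_per_cycle * freq_ghz


def ratio_to_f64(results):
    """Express every rate as a multiple of the 64-bit rate."""
    return {bits: results[bits] / results[64] for bits in results}


# Hypothetical hardware parameters, for illustration only.
assumed_vlen_bits = 128
assumed_freq_ghz = 1.6

model = {bits: peak_gflops(assumed_vlen_bits, bits, assumed_freq_ghz)
         for bits in (16, 32, 64)}

print(ratio_to_f64(measured_gflops))
print(ratio_to_f64(model))
```

The ratios do not depend on the assumed frequency or vector length, so the 4:2:1 pattern can be checked even without knowing the exact hardware parameters.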

BTW: please indicate that "ime" is not a standard RISC-V extension, but rather a SpacemiT custom vendor extension.

Edit: I just saw that the README mentions it, but imo it shouldn't be called ime in the table.

sh build.sh issue

cpufp_x86.c: In function ‘main’:
cpufp_x86.c:333:51: error: ‘i’ undeclared (first use in this function)
333 | if (strspn(argv[1], "0123456789") == strlen(argv[i]))
| ^
cpufp_x86.c:333:51: note: each undeclared identifier is reported only once for each function it appears in
cpufp_x86.c:334:3: error: expected expression before ‘int’
334 | int num_threads = atoi(argv[1]);
| ^~~
cpufp_x86.c:335:28: error: ‘num_threads’ undeclared (first use in this function)
335 | printf("Thread(s): %d\n", num_threads);
| ^~~~~~~~~~~
gcc: error: cpufp_x86.o: No such file or directory

link with -lrt for old glibc

The librt library needs to be specified explicitly on systems with an old glibc.
On my machine with gcc 4.8.5 and glibc 2.12, adding the -lrt flag at the end of the last command in gen.sh resolves the compilation error.

The compilation errors are listed below.

[dongxiao@695189 cpufp]$ ./build.sh
smtl.c: In function ‘smtl_begin_tasks’:
smtl.c:331:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < sh->num_threads; i++)
^
smtl.c:331:5: note: use option -std=c99 or -std=gnu99 to compile your code
cpufp_x86.o: In function `cpufp_x86_fma':
cpufp_x86.c:(.text+0xb9): undefined reference to `clock_gettime'
cpufp_x86.c:(.text+0xf7): undefined reference to `clock_gettime'
cpufp_x86.c:(.text+0x194): undefined reference to `clock_gettime'
cpufp_x86.c:(.text+0x1d4): undefined reference to `clock_gettime'
cpufp_x86.c:(.text+0x25e): undefined reference to `clock_gettime'
cpufp_x86.o:cpufp_x86.c:(.text+0x282): more undefined references to `clock_gettime' follow
collect2: error: ld returned 1 exit status

Compile error on an Intel Sapphire Rapids CPU

CPU: Intel 8458P

# ./build_x64.sh
x64/cpufp.cpp: In function ‘void cpufp_register_isa()’:
x64/cpufp.cpp:291:31: error: ‘avx512f_fp16_fma_f16f16f16’ was not declared in this scope; did you mean ‘avx512_fp16_fma_f16f16f16’?
291 | 0x20000000LL, 1024LL, avx512f_fp16_fma_f16f16f16);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
| avx512_fp16_fma_f16f16f16
/usr/bin/ld: cannot find build_dir/cpufp.o: No such file or directory
collect2: error: ld returned 1 exit status

Error when executing "sh build.sh"

When executing "sh build.sh", the command line reports the following:
/usr/bin/ld: smtl.o: in function `smtl_thread_func(void*)':
smtl.cpp:(.text+0x78): undefined reference to `pthread_setaffinity_np'
/usr/bin/ld: smtl.o: in function `smtl_init(smtl_t**, std::vector<int, std::allocator<int> >&)':
smtl.cpp:(.text+0x3ff): undefined reference to `pthread_create'
/usr/bin/ld: smtl.o: in function `smtl_fini(smtl_t*)':
smtl.cpp:(.text+0x639): undefined reference to `pthread_join'
When I add -lpthread to build.sh, the errors are still reported.

asm/cpufp_kernel_x86_avx_vnni.S:16: Error: unsupported instruction `vpdpbusd'

When I execute sh build.sh, an error occurs:

asm/cpufp_kernel_x86_avx_vnni.S: Assembler messages:
asm/cpufp_kernel_x86_avx_vnni.S:16: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:17: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:18: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:19: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:20: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:21: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:22: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:23: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:24: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:25: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:42: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:43: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:44: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:45: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:46: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:47: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:48: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:49: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:50: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:51: Error: unsupported instruction `vpdpwssd'
gcc: error: cpufp_kernel_x86_avx_vnni.o: No such file or directory

Should I use a specific version of gcc, or add some options to build.sh?

My current gcc version is shown below:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.4.0-1ubuntu1~20.04.1' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)

Compile error on WSL2 (Windows 10)

Compile error on WSL2:
asm/cpufp_kernel_x86_avx_vnni.S:16: Error: unsupported instruction `vpdpbusd'

cat /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm flush_l1d arch_capabilities
