NumPy 1.26 Chinese Documentation (Part 22)

Building

cd /path/to/numpy
python setup.py build --cpu-baseline="avx2 fma3" install --user 

cd /path/to/numpy
python setup.py build_ext --cpu-baseline="avx2 fma3" install --user 

cd /path/to/numpy
python setup.py build_clib --cpu-baseline="avx2 fma3" install --user 

pip install --no-use-pep517 --global-option=build \
--global-option="--cpu-baseline=avx2 fma3" \
--global-option="--cpu-dispatch=max" ./ 

python setup.py build --cpu-baseline="native" bdist 

python setup.py build --cpu-baseline=native --cpu-dispatch=none bdist 

python setup.py build --cpu-baseline="avx f16c" bdist 

python setup.py build --cpu-baseline="vsx2" bdist 

python setup.py build --cpu-dispatch="max -avx512f -avx512cd \
-avx512_knl -avx512_knm -avx512_skx -avx512_clx -avx512_cnl -avx512_icl" \
bdist 

python setup.py build --cpu-dispatch="SSE41 avx2 FMA3" 

python setup.py build --cpu-dispatch="SSE41 AVX2 FMA3"
# equivalent to
python setup.py build --cpu-dispatch="FMA3 AVX2 SSE41" 

python setup.py build --cpu-dispatch="avx2 avx512f"
# or
python setup.py build --cpu-dispatch=avx2,avx512f
# or
python setup.py build --cpu-dispatch="avx2+avx512f" 
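The commands above suggest that the option strings are case-insensitive, order-insensitive, and accept spaces, commas, or `+` as separators. Here is a minimal Python sketch of that normalization. `parse_features` is a hypothetical helper, not part of NumPy's build system, and it ignores special tokens such as `max`, `native`, or `-feature` removals:

```python
import re


def parse_features(option: str) -> frozenset:
    """Normalize a --cpu-baseline / --cpu-dispatch style string.

    Hypothetical helper for illustration only; NumPy's real parser also
    handles tokens such as 'max', 'min', 'native', 'none' and '-feature'.
    """
    # Split on any run of whitespace, commas, or plus signs.
    tokens = re.split(r"[\s,+]+", option.strip())
    # Upper-case the names and drop empty tokens; a set discards ordering.
    return frozenset(tok.upper() for tok in tokens if tok)


# All three spellings name the same feature set.
same = (
    parse_features("avx2 avx512f")
    == parse_features("avx2,avx512f")
    == parse_features("avx2+avx512f")
)
print(same)
```

Returning a set rather than a list is a deliberate choice here, since the examples treat `"SSE41 AVX2 FMA3"` and `"FMA3 AVX2 SSE41"` as the same request.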

python setup.py build --cpu-baseline=sse42
# equivalent to
python setup.py build --cpu-baseline="sse sse2 sse3 ssse3 sse41 popcnt sse42" 
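Requesting one feature also pulls in every feature it implies. The sketch below models this as a transitive closure over a small table. The `IMPLIES` table is an assumption, simplified from the gcc/x64 build log in this document. It is not NumPy's actual configuration, and other compilers may differ:

```python
# Direct implications, simplified from the x86 build log in this document.
# Assumption: illustrative table only, not NumPy's real configuration.
IMPLIES = {
    "SSE": [],
    "SSE2": ["SSE"],
    "SSE3": ["SSE2"],
    "SSSE3": ["SSE3"],
    "SSE41": ["SSSE3"],
    "POPCNT": ["SSE41"],
    "SSE42": ["POPCNT"],
    "AVX": ["SSE42"],
    "F16C": ["AVX"],
    "FMA3": ["F16C"],
    "AVX2": ["F16C"],
    "AVX512F": ["FMA3", "AVX2"],
}
# Canonical print order (dicts preserve insertion order).
ORDER = list(IMPLIES)


def expand(feature: str) -> list:
    """Return the feature plus everything it implies, in canonical order."""
    wanted = set()
    stack = [feature.upper()]
    while stack:
        name = stack.pop()
        if name not in wanted:
            wanted.add(name)
            stack.extend(IMPLIES[name])
    return [name for name in ORDER if name in wanted]


print(" ".join(expand("sse42")).lower())
```

With this table, `expand("avx2")` does not include `FMA3`, which matches the `AVX2` entry in the gcc log.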

export CFLAGS="-march=native"
python setup.py install --user
# is equivalent to
python setup.py build --cpu-baseline=native install --user 

# Requesting `AVX2,FMA3` but the compiler only supports **SSE** features
python setup.py build --cpu-baseline="avx2 fma3"
# is equivalent to
python setup.py build --cpu-baseline="sse sse2 sse3 ssse3 sse41 popcnt sse42" 

# Only dispatches AVX2 and FMA3
python setup.py build --cpu-dispatch=avx2,fma3
# Dispatches AVX and SSE features
python setup.py build --cpu-baseline=ssse3,sse41,sse42,avx,avx2,fma3 

# On ARMv8/A64, specifying NEON enables Advanced SIMD
# and all predecessor extensions
python setup.py build --cpu-baseline=neon
# which is equivalent to
python setup.py build --cpu-baseline="neon neon_fp16 neon_vfpv4 asimd"

# Specifying AVX2 also force-enables FMA3 on Intel compilers
python setup.py build --cpu-baseline=avx2
# which is equivalent to
python setup.py build --cpu-baseline="avx2 fma3"

########### EXT COMPILER OPTIMIZATION ###########
Platform  :
  Architecture:  x64
  Compiler  :  gcc

CPU  baseline  :
  Requested  :  'min'
  Enabled  :  SSE  SSE2  SSE3
  Flags  :  -msse  -msse2  -msse3
  Extra  checks:  none

CPU  dispatch  :
  Requested  :  'max -xop -fma4'
  Enabled  :  SSSE3  SSE41  POPCNT  SSE42  AVX  F16C  FMA3  AVX2  AVX512F  AVX512CD  AVX512_KNL  AVX512_KNM  AVX512_SKX  AVX512_CLX  AVX512_CNL  AVX512_ICL
  Generated  :
  :
  SSE41  :  SSE  SSE2  SSE3  SSSE3
  Flags  :  -msse  -msse2  -msse3  -mssse3  -msse4.1
  Extra  checks:  none
  Detect  :  SSE  SSE2  SSE3  SSSE3  SSE41
  :  build/src.linux-x86_64-3.9/numpy/core/src/umath/loops_arithmetic.dispatch.c
  :  numpy/core/src/umath/_umath_tests.dispatch.c
  :
  SSE42  :  SSE  SSE2  SSE3  SSSE3  SSE41  POPCNT
  Flags  :  -msse  -msse2  -msse3  -mssse3  -msse4.1  -mpopcnt  -msse4.2
  Extra  checks:  none
  Detect  :  SSE  SSE2  SSE3  SSSE3  SSE41  POPCNT  SSE42
  :  build/src.linux-x86_64-3.9/numpy/core/src/_simd/_simd.dispatch.c
  :
  AVX2  :  SSE  SSE2  SSE3  SSSE3  SSE41  POPCNT  SSE42  AVX  F16C
  Flags  :  -msse  -msse2  -msse3  -mssse3  -msse4.1  -mpopcnt  -msse4.2  -mavx  -mf16c  -mavx2
  Extra  checks:  none
  Detect  :  AVX  F16C  AVX2
  :  build/src.linux-x86_64-3.9/numpy/core/src/umath/loops_arithm_fp.dispatch.c
  :  build/src.linux-x86_64-3.9/numpy/core/src/umath/loops_arithmetic.dispatch.c
  :  numpy/core/src/umath/_umath_tests.dispatch.c
  :
  (FMA3  AVX2)  :  SSE  SSE2  SSE3  SSSE3  SSE41  POPCNT  SSE42  AVX  F16C
  Flags  :  -msse  -msse2  -msse3  -mssse3  -msse4.1  -mpopcnt  -msse4.2  -mavx  -mf16c  -mfma  -mavx2
  Extra  checks:  none
  Detect  :  AVX  F16C  FMA3  AVX2
  :  build/src.linux-x86_64-3.9/numpy/core/src/_simd/_simd.dispatch.c
  :  build/src.linux-x86_64-3.9/numpy/core/src/umath/loops_exponent_log.dispatch.c
  :  build/src.linux-x86_64-3.9/numpy/core/src/umath/loops_trigonometric.dispatch.c
  :
  AVX512F  :  SSE  SSE2  SSE3  SSSE3  SSE41  POPCNT  SSE42  AVX  F16C  FMA3  AVX2
  Flags  :  -msse  -msse2  -msse3  -mssse3  -msse4.1  -mpopcnt  -msse4.2  -mavx  -mf16c  -mfma  -mavx2  -mavx512f
  Extra  checks:  AVX512F_REDUCE
  Detect  :  AVX512F
  :  build/src.linux-x86_64-3.9/numpy/core/src/_simd/_simd.dispatch.c
  :  build/src.linux-x86_64-3.9/numpy/core/src/umath/loops_arithm_fp.dispatch.c
  :  build/src.linux-x86_64-3.9/numpy/core/src/umath/loops_arithmetic.dispatch.c
  :  build/src.linux-x86_64-3.9/numpy/core/src/umath/loops_exponent_log.dispatch.c
  :  build/src.linux-x86_64-3.9/numpy/core/src/umath/loops_trigonometric.dispatch.c
  :
  AVX512_SKX  :  SSE  SSE2  SSE3  SSSE3  SSE41  POPCNT  SSE42  AVX  F16C  FMA3  AVX2  AVX512F  AVX512CD
  Flags  :  -msse  -msse2  -msse3  -mssse3  -msse4.1  -mpopcnt  -msse4.2  -mavx  -mf16c  -mfma  -mavx2  -mavx512f  -mavx512cd  -mavx512vl  -mavx512bw  -mavx512dq
  Extra  checks:  AVX512BW_MASK  AVX512DQ_MASK
  Detect  :  AVX512_SKX
  :  build/src.linux-x86_64-3.9/numpy/core/src/_simd/_simd.dispatch.c
  :  build/src.linux-x86_64-3.9/numpy/core/src/umath/loops_arithmetic.dispatch.c
  :  build/src.linux-x86_64-3.9/numpy/core/src/umath/loops_exponent_log.dispatch.c
CCompilerOpt.cache_flush[804]  :  write  cache  to  path  ->  /home/seiko/work/repos/numpy/build/temp.linux-x86_64-3.9/ccompiler_opt_cache_ext.py

########### CLIB COMPILER OPTIMIZATION ###########
Platform  :
  Architecture:  x64
  Compiler  :  gcc

CPU  baseline  :
  Requested  :  'min'
  Enabled  :  SSE  SSE2  SSE3
  Flags  :  -msse  -msse2  -msse3
  Extra  checks:  none

CPU  dispatch  :
  Requested  :  'max -xop -fma4'
  Enabled  :  SSSE3  SSE41  POPCNT  SSE42  AVX  F16C  FMA3  AVX2  AVX512F  AVX512CD  AVX512_KNL  AVX512_KNM  AVX512_SKX  AVX512_CLX  AVX512_CNL  AVX512_ICL
  Generated  :  none 

NPY_DISABLE_CPU_FEATURES="AVX2,FMA3" 
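The variable above has to be set in the environment before NumPy is imported. One way to do that from Python is to build the environment for a child process. This is a minimal sketch, and `env_with_disabled_features` is a hypothetical helper name, not a NumPy API:

```python
import os


def env_with_disabled_features(features, base_env=None):
    """Return a copy of the environment with NPY_DISABLE_CPU_FEATURES set.

    Hypothetical helper for illustration; `features` is a list of feature
    names such as ["AVX2", "FMA3"].
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NPY_DISABLE_CPU_FEATURES"] = ",".join(features)
    return env


env = env_with_disabled_features(["AVX2", "FMA3"], base_env={})
print(env["NPY_DISABLE_CPU_FEATURES"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run([sys.executable, "script.py"], env=...)`, so that the child interpreter sees the variable before it imports NumPy.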

// The header should be located at numpy/numpy/core/src/common/_cpu_dispatch.h
/**NOTE
 ** C definitions prefixed with "NPY_HAVE_" represent
 ** the required optimizations.
 **
 ** C definitions prefixed with 'NPY__CPU_TARGET_' are protected and
 ** shouldn't be used by any NumPy C sources.
 */
/******* baseline features *******/
/** SSE **/
#define NPY_HAVE_SSE 1
#include  <xmmintrin.h>
/** SSE2 **/
#define NPY_HAVE_SSE2 1
#include  <emmintrin.h>
/** SSE3 **/
#define NPY_HAVE_SSE3 1
#include  <pmmintrin.h>

/******* dispatch-able features *******/
#ifdef NPY__CPU_TARGET_SSSE3
  /** SSSE3 **/
  #define NPY_HAVE_SSSE3 1
  #include  <tmmintrin.h>
#endif
#ifdef NPY__CPU_TARGET_SSE41
  /** SSE41 **/
  #define NPY_HAVE_SSE41 1
  #include  <smmintrin.h>
#endif 

/*@targets avx2 avx512f vsx2 vsx3 asimd asimdhp */
// C code 

/*
 * this definition is used by NumPy utilities as suffixes for the
 * exported symbols
 */
#define NPY__CPU_TARGET_CURRENT AVX512F
/*
 * The following definitions enable
 * definitions of the dispatch-able features that are defined within the main
 * configuration header. These are definitions for the implied features.
 */
#define NPY__CPU_TARGET_SSE
#define NPY__CPU_TARGET_SSE2
#define NPY__CPU_TARGET_SSE3
#define NPY__CPU_TARGET_SSSE3
#define NPY__CPU_TARGET_SSE41
#define NPY__CPU_TARGET_POPCNT
#define NPY__CPU_TARGET_SSE42
#define NPY__CPU_TARGET_AVX
#define NPY__CPU_TARGET_F16C
#define NPY__CPU_TARGET_FMA3
#define NPY__CPU_TARGET_AVX2
#define NPY__CPU_TARGET_AVX512F
// our dispatch-able source
#include  "/the/absolute/path/of/hello.dispatch.c"

// hello.dispatch.c
/*@targets baseline sse42 avx512f */
#include  <stdio.h>
#include  "numpy/utils.h" // NPY_CAT, NPY_TOSTR

#ifndef NPY__CPU_TARGET_CURRENT
  // Wrapping the dispatch-able source only happens for the additional optimizations,
  // but if the keyword 'baseline' is provided within the configuration statements,
  // the infrastructure also compiles the dispatch-able source once more by
  // passing it as-is to the compiler without any changes.
  #define CURRENT_TARGET(X) X
  #define NPY__CPU_TARGET_CURRENT baseline // for printing only
#else
  // Since we reached this point, we're dealing with
  // the additional optimizations, so it could be SSE42 or AVX512F
  #define CURRENT_TARGET(X) NPY_CAT(NPY_CAT(X, _), NPY__CPU_TARGET_CURRENT)
#endif
// The macro 'CURRENT_TARGET' adds the current target as a suffix to the exported symbols
// to avoid duplicate symbols at link time. NumPy already has a similar macro called
// 'NPY_CPU_DISPATCH_CURFX', located at
// numpy/numpy/core/src/common/npy_cpu_dispatch.h
// NOTE: we tend not to add suffixes to the baseline exported symbols
void  CURRENT_TARGET(simd_whoami)(const  char  *extra_info)
{
  printf("I'm "  NPY_TOSTR(NPY__CPU_TARGET_CURRENT)  ", %s\n",  extra_info);
} 

#ifndef NPY__CPU_DISPATCH_EXPAND_
  // To expand the macro calls in this header
  #define NPY__CPU_DISPATCH_EXPAND_(X) X
#endif
// Undefine the following macros, since config headers may be included
// multiple times within the same source and each config header represents
// different required optimizations according to the configuration
// statements in the dispatch-able source it is derived from.
#undef NPY__CPU_DISPATCH_BASELINE_CALL
#undef NPY__CPU_DISPATCH_CALL
// nothing strange here, just a normal preprocessor callback,
// enabled only if 'baseline' is specified within the configuration statements
#define NPY__CPU_DISPATCH_BASELINE_CALL(CB, ...) \
 NPY__CPU_DISPATCH_EXPAND_(CB(__VA_ARGS__))
// 'NPY__CPU_DISPATCH_CALL' is an abstract macro used for dispatching
// the required optimizations that are specified within the configuration statements.
//
// @param CHK, expected to be a macro that can be used to detect CPU features
// at runtime; it takes a CPU feature name without string quotes and
// returns the test result as a boolean value.
// NumPy already has a macro called "NPY_CPU_HAVE", which fits this requirement.
//
// @param CB, a callback macro that is expected to be called multiple times depending
// on the required optimizations. The callback should receive the following arguments:
//  1- The pending calls of @param CHK filled in with the required CPU features,
//     which need to be tested at runtime before executing the call that belongs to
//     the compiled object.
//  2- The required optimization name, same as in 'NPY__CPU_TARGET_CURRENT'
//  3- Extra arguments in the macro itself
//
// By default the callback calls are sorted by highest interest
// unless the policy "$keep_sort" is in place within the configuration statements;
// see "Dive into the CPU dispatcher" for more clarification.
#define NPY__CPU_DISPATCH_CALL(CHK, CB, ...) \
 NPY__CPU_DISPATCH_EXPAND_(CB((CHK(AVX512F)), AVX512F, __VA_ARGS__)) \
 NPY__CPU_DISPATCH_EXPAND_(CB((CHK(SSE)&&CHK(SSE2)&&CHK(SSE3)&&CHK(SSSE3)&&CHK(SSE41)), SSE41, __VA_ARGS__)) 

// NOTE: The following macros are defined for demonstration purposes only.
// NumPy already has a collection of macros located at
// numpy/numpy/core/src/common/npy_cpu_dispatch.h that covers all dispatching
// and declaration scenarios.

#include  "numpy/npy_cpu_features.h" // NPY_CPU_HAVE
#include  "numpy/utils.h" // NPY_CAT, NPY_EXPAND

// An example for setting a macro that calls all the exported symbols at once
// after checking if they're supported by the running machine.
#define DISPATCH_CALL_ALL(FN, ARGS) \
 NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, DISPATCH_CALL_ALL_CB, FN, ARGS) \
 NPY__CPU_DISPATCH_BASELINE_CALL(DISPATCH_CALL_BASELINE_ALL_CB, FN, ARGS)
// The preprocessor callbacks.
// They use the same suffixes as defined in the dispatch-able source.
#define DISPATCH_CALL_ALL_CB(CHECK, TARGET_NAME, FN, ARGS) \
 if (CHECK) { NPY_CAT(NPY_CAT(FN, _), TARGET_NAME) ARGS; }
#define DISPATCH_CALL_BASELINE_ALL_CB(FN, ARGS) \
 FN NPY_EXPAND(ARGS);

// An example of a macro that calls the exported symbol of the highest-interest
// optimization, after checking that it is supported by the running machine.
#define DISPATCH_CALL_HIGH(FN, ARGS) \
 if (0) {} \
 NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, DISPATCH_CALL_HIGH_CB, FN, ARGS) \
 NPY__CPU_DISPATCH_BASELINE_CALL(DISPATCH_CALL_BASELINE_HIGH_CB, FN, ARGS)
// The preprocessor callbacks
// They use the same suffixes as defined in the dispatch-able source.
#define DISPATCH_CALL_HIGH_CB(CHECK, TARGET_NAME, FN, ARGS) \
 else if (CHECK) { NPY_CAT(NPY_CAT(FN, _), TARGET_NAME) ARGS; }
#define DISPATCH_CALL_BASELINE_HIGH_CB(FN, ARGS) \
 else { FN NPY_EXPAND(ARGS); }

// NumPy has a macro called 'NPY_CPU_DISPATCH_DECLARE' that can be used
// to forward-declare any kind of prototype based on
// 'NPY__CPU_DISPATCH_CALL' and 'NPY__CPU_DISPATCH_BASELINE_CALL'.
// However, in this example we just handle it manually.
void  simd_whoami(const  char  *extra_info);
void  simd_whoami_AVX512F(const  char  *extra_info);
void  simd_whoami_SSE41(const  char  *extra_info);

void  trigger_me(void)
{
  // Bring in the auto-generated config header,
  // which contains the config macros 'NPY__CPU_DISPATCH_CALL' and
  // 'NPY__CPU_DISPATCH_BASELINE_CALL'.
  // It is highly recommended to include the config header before executing
  // the dispatching macros in case there's another header in the scope.
  #include  "hello.dispatch.h"
  DISPATCH_CALL_ALL(simd_whoami,  ("all"))
  DISPATCH_CALL_HIGH(simd_whoami,  ("the highest interest"))
  // An example of including multiple config headers in the same source
  // #include "hello2.dispatch.h"
  // DISPATCH_CALL_HIGH(another_function, ("the highest interest"))
} 
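The two demonstration macros differ only in control flow: `DISPATCH_CALL_ALL` emits independent `if` blocks followed by the baseline call, while `DISPATCH_CALL_HIGH` emits an `if / else if / else` chain. The Python sketch below mirrors that selection logic. The `TARGETS` list copies the checks from the generated `NPY__CPU_DISPATCH_CALL` above, and the function names are hypothetical stand-ins for the macros:

```python
# (target name, CPU features that must all be present at runtime),
# listed from highest to lowest interest, as in NPY__CPU_DISPATCH_CALL.
TARGETS = [
    ("AVX512F", ("AVX512F",)),
    ("SSE41", ("SSE", "SSE2", "SSE3", "SSSE3", "SSE41")),
]


def _supported(required, cpu_features):
    # Stand-in for chaining NPY_CPU_HAVE(...) checks with &&.
    return all(name in cpu_features for name in required)


def call_all(fn_name, cpu_features):
    """Mimic DISPATCH_CALL_ALL: every supported target, then the baseline."""
    names = [
        f"{fn_name}_{target}"
        for target, required in TARGETS
        if _supported(required, cpu_features)
    ]
    names.append(fn_name)  # the baseline symbol carries no suffix
    return names


def call_high(fn_name, cpu_features):
    """Mimic DISPATCH_CALL_HIGH: first supported target, else the baseline."""
    for target, required in TARGETS:
        if _supported(required, cpu_features):
            return f"{fn_name}_{target}"
    return fn_name


sse_only = {"SSE", "SSE2", "SSE3", "SSSE3", "SSE41"}
print(call_all("simd_whoami", sse_only))
print(call_high("simd_whoami", sse_only))
```

On a machine modeled by `sse_only`, both helpers skip the `AVX512F` symbol. `call_all` still reaches the baseline symbol, whereas `call_high` stops at the `SSE41` one.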

double rms(double* seq, int n); 

def rms(seq):
  """
 rms: return the root mean square of a sequence
 rms(numpy.ndarray) -> double
 rms(list) -> double
 rms(tuple) -> double
 """ 

%{
#define SWIG_FILE_WITH_INIT
#include "rms.h"
%}

%include "numpy.i"

%init %{
import_array();
%}

%apply (double* IN_ARRAY1, int DIM1) {(double* seq, int n)};
%include "rms.h" 

PyObject *_wrap_rms(PyObject *args) {
  PyObject *resultobj = 0;
  double *arg1 = (double *) 0 ;
  int arg2 ;
  double result;
  PyArrayObject *array1 = NULL ;
  int is_new_object1 = 0 ;
  PyObject * obj0 = 0 ;

  if (!PyArg_ParseTuple(args,(char *)"O:rms",&obj0)) SWIG_fail;
  {
    array1 = obj_to_array_contiguous_allow_conversion(
                 obj0, NPY_DOUBLE, &is_new_object1);
    npy_intp size[1] = {
      -1
    };
    if (!array1 || !require_dimensions(array1, 1) ||
        !require_size(array1, size, 1)) SWIG_fail;
    arg1 = (double*) array1->data;
    arg2 = (int) array1->dimensions[0];
  }
  result = (double)rms(arg1,arg2);
  resultobj = SWIG_From_double((double)(result));
  {
    if (is_new_object1 && array1) Py_DECREF(array1);
  }
  return resultobj;
fail:
  {
    if (is_new_object1 && array1) Py_DECREF(array1);
  }
  return NULL;
}
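Seen from Python, the generated wrapper converts its argument into a contiguous one-dimensional array of doubles and then passes the data pointer and length to the C function. The pure-Python sketch below imitates those steps with the standard library's `array` module. The root-mean-square formula and the empty-input error are assumptions, since the body of the C `rms` is not shown here:

```python
import math
from array import array


def rms(seq):
    """Rough Python analogue of the wrapped rms(double* seq, int n)."""
    # Conversion step: build a contiguous buffer of C doubles from a
    # list, tuple, or other iterable; nested sequences raise TypeError.
    data = array("d", seq)
    n = len(data)
    if n == 0:
        # Assumption: the C function's behavior for n == 0 is not shown.
        raise ValueError("rms() requires at least one element")
    return math.sqrt(sum(x * x for x in data) / n)


print(rms([3.0, 4.0]))
print(rms((1, 1, 1)))
```

As with the typemap, lists and tuples are accepted alongside array-like input, and the caller never passes the length explicitly.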

double rms(int n, double* seq); 

%apply (int DIM1, double* IN_ARRAY1) {(int n, double* seq)}; 

%{
#define SWIG_FILE_WITH_INIT
%}
%include "numpy.i"
%init %{
import_array();
%} 

%numpy_typemaps(DATA_TYPE, DATA_TYPECODE, DIM_TYPE) 

%numpy_typemaps(double, NPY_DOUBLE, int)
%numpy_typemaps(int,    NPY_INT   , int) 

double[3] newVector(double x, double y, double z); 

%typemap(out) (TYPE[ANY]); 

%numpy_typemaps(bool, NPY_BOOL, int) 

%numpy_typemaps(bool, NPY_UINT, int) 

/* Python */
typedef struct {double real; double imag;} Py_complex;

/* NumPy */
typedef struct {float  real, imag;} npy_cfloat;
typedef struct {double real, imag;} npy_cdouble; 

%numpy_typemaps(Py_complex , NPY_CDOUBLE, int)
%numpy_typemaps(npy_cfloat , NPY_CFLOAT , int)
%numpy_typemaps(npy_cdouble, NPY_CDOUBLE, int) 

TypeError: in method 'MyClass_MyMethod', argument 2 of type 'int' 

pyfragments.swg 

%fragment("NumPy_Fragments"); 

double dot(int len, double* vec1, double* vec2); 

def dot(vec1, vec2):
  """
 dot(PyObject,PyObject) -> double
 """ 

%apply (int DIM1, double* IN_ARRAY1) {(int len1, double* vec1),
                                      (int len2, double* vec2)}
%rename (dot) my_dot;
%exception my_dot {
    $action
    if (PyErr_Occurred()) SWIG_fail;
}
%inline %{
double my_dot(int len1, double* vec1, int len2, double* vec2) {
    if (len1 != len2) {
        PyErr_Format(PyExc_ValueError,
                     "Arrays of lengths (%d,%d) given",
                     len1, len2);
        return 0.0;
    }
    return dot(len1, vec1, vec2);
}
%} 
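From the Python side, the wrapped function takes two sequences and raises `ValueError` when their lengths differ, instead of exposing the separate length arguments. The sketch below shows that behavior in plain Python. It assumes the underlying C `dot` computes an inner product, which its signature suggests but the source does not show:

```python
def dot(vec1, vec2):
    """Python-level analogue of the wrapped my_dot (illustrative only)."""
    if len(vec1) != len(vec2):
        # Same message format as the PyErr_Format call in my_dot.
        raise ValueError(
            "Arrays of lengths (%d,%d) given" % (len(vec1), len(vec2))
        )
    # Assumption: the C dot() returns the sum of element-wise products.
    return float(sum(a * b for a, b in zip(vec1, vec2)))


print(dot([1, 2, 3], [4, 5, 6]))
```

Checking the lengths in the wrapper keeps the original `dot(int len, double* vec1, double* vec2)` untouched while still rejecting mismatched arrays.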

%numpy_typemaps(double, NPY_DOUBLE, long) 

%apply (double* IN_ARRAY1, int DIM1) {(double* vector, int length)}
%include "my_header.h"
%clear (double* vector, int length); 

double rms(double* seq, int n); 

def rms(seq):
  """
 rms: return the root mean square of a sequence
 rms(numpy.ndarray) -> double
 rms(list) -> double
 rms(tuple) -> double
 """ 

%{
#define SWIG_FILE_WITH_INIT
#include "rms.h"
%}

%include "numpy.i"

%init %{
import_array();
%}

%apply (double* IN_ARRAY1, int DIM1) {(double* seq, int n)};
%include "rms.h" 

 1 PyObject *_wrap_rms(PyObject *args) {
 2   PyObject *resultobj = 0;
 3   double *arg1 = (double *) 0 ;
 4   int arg2 ;
 5   double result;
 6   PyArrayObject *array1 = NULL ;
 7   int is_new_object1 = 0 ;
 8   PyObject * obj0 = 0 ;
 9
10   if (!PyArg_ParseTuple(args,(char *)"O:rms",&obj0)) SWIG_fail;
11   {
12     array1 = obj_to_array_contiguous_allow_conversion(
13                  obj0, NPY_DOUBLE, &is_new_object1);
14     npy_intp size[1] = {
15       -1
16     };
17     if (!array1 || !require_dimensions(array1, 1) ||
18         !require_size(array1, size, 1)) SWIG_fail;
19     arg1 = (double*) array1->data;
20     arg2 = (int) array1->dimensions[0];
21   }
22   result = (double)rms(arg1,arg2);
23   resultobj = SWIG_From_double((double)(result));
24   {
25     if (is_new_object1 && array1) Py_DECREF(array1);
26   }
27   return resultobj;
28 fail:
29   {
30     if (is_new_object1 && array1) Py_DECREF(array1);
31   }
32   return NULL;
33 } 

double rms(int n, double* seq); 

%apply (int DIM1, double* IN_ARRAY1) {(int n, double* seq)}; 

%{
#define SWIG_FILE_WITH_INIT
%}
%include "numpy.i"
%init %{
import_array();
%} 

%numpy_typemaps(DATA_TYPE, DATA_TYPECODE, DIM_TYPE) 

%numpy_typemaps(double, NPY_DOUBLE, int)
%numpy_typemaps(int,    NPY_INT   , int) 

double[3] newVector(double x, double y, double z); 

%typemap(out) (TYPE[ANY]); 

%numpy_typemaps(bool, NPY_BOOL, int) 

%numpy_typemaps(bool, NPY_UINT, int) 

/* Python */
typedef struct {double real; double imag;} Py_complex;

/* NumPy */
typedef struct {float  real, imag;} npy_cfloat;
typedef struct {double real, imag;} npy_cdouble; 

%numpy_typemaps(Py_complex , NPY_CDOUBLE, int)
%numpy_typemaps(npy_cfloat , NPY_CFLOAT , int)
%numpy_typemaps(npy_cdouble, NPY_CDOUBLE, int) 

TypeError: in method 'MyClass_MyMethod', argument 2 of type 'int' 

pyfragments.swg 

%fragment("NumPy_Fragments"); 

Vector.h
Vector.cxx 

Vector.i 

testVector.py 

class VectorTestCase(unittest.TestCase): 

length = Vector.__dict__[self.typeStr + "Length"] 

class doubleTestCase(VectorTestCase):
    def __init__(self, methodName="runTest"):
        VectorTestCase.__init__(self, methodName)
        self.typeStr  = "double"
        self.typeCode = "d" 
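
The fragments above describe a test pattern in which a base test case looks up the function to test by name from a type string. The sketch below shows that pattern in a self-contained form. The `Vector` module here is a hypothetical stand-in built in pure Python (the real one is generated by SWIG), and the helper `run_sketch` is an assumption added for illustration.

```python
import math
import types
import unittest

# Hypothetical stand-in for the SWIG-generated Vector module.
Vector = types.ModuleType("Vector")
Vector.doubleLength = lambda seq: math.sqrt(sum(x * x for x in seq))
Vector.floatLength = Vector.doubleLength


class VectorTestCase(unittest.TestCase):
    typeStr = None  # set by subclasses, e.g. "double"

    def testLength(self):
        if self.typeStr is None:
            self.skipTest("base class has no type string")
        # Same lookup as in the fragment above: build the name from typeStr.
        length = Vector.__dict__[self.typeStr + "Length"]
        self.assertAlmostEqual(length([5, 12, 0]), 13.0)


class doubleTestCase(VectorTestCase):
    typeStr = "double"


class floatTestCase(VectorTestCase):
    typeStr = "float"


def run_sketch():
    """Run both typed test cases and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(doubleTestCase),
        loader.loadTestsFromTestCase(floatTestCase),
    ])
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The sketch sets `typeStr` as a class attribute rather than in `__init__` purely to keep it short. Either way, each subclass reuses every test method of the base class for its own data type.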

git clone --recurse-submodules https://github.com/your-username/numpy.git 

cd numpy 

git remote add upstream https://github.com/numpy/numpy.git 

git checkout main
git pull upstream main --tags 

git submodule update --init 

git checkout -b linspace-speedups 

git push origin linspace-speedups 

import numpy as np 

$ python -m pip install -r test_requirements.txt 

$ spin test --coverage 

$ firefox build/coverage/index.html 

spin docs 

Name	Implies	Gathers
`SSE`	`SSE2`
`SSE2`	`SSE`
`SSE3`	`SSE` `SSE2`
`SSSE3`	`SSE` `SSE2` `SSE3`
`SSE41`	`SSE` `SSE2` `SSE3` `SSSE3`
`POPCNT`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41`
`SSE42`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT`
`AVX`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42`
`XOP`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX`
`FMA4`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX`
`F16C`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX`
`FMA3`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX` `F16C`
`AVX2`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX` `F16C`
`AVX512F`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX` `F16C` `FMA3` `AVX2`
`AVX512CD`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX` `F16C` `FMA3` `AVX2` `AVX512F`
`AVX512_KNL`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX` `F16C` `FMA3` `AVX2` `AVX512F` `AVX512CD`	`AVX512ER` `AVX512PF`
`AVX512_KNM`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX` `F16C` `FMA3` `AVX2` `AVX512F` `AVX512CD` `AVX512_KNL`	`AVX5124FMAPS` `AVX5124VNNIW` `AVX512VPOPCNTDQ`
`AVX512_SKX`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX` `F16C` `FMA3` `AVX2` `AVX512F` `AVX512CD`	`AVX512VL` `AVX512BW` `AVX512DQ`
`AVX512_CLX`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX` `F16C` `FMA3` `AVX2` `AVX512F` `AVX512CD` `AVX512_SKX`	`AVX512VNNI`
`AVX512_CNL`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX` `F16C` `FMA3` `AVX2` `AVX512F` `AVX512CD` `AVX512_SKX`	`AVX512IFMA` `AVX512VBMI`
`AVX512_ICL`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX` `F16C` `FMA3` `AVX2` `AVX512F` `AVX512CD` `AVX512_SKX` `AVX512_CLX` `AVX512_CNL`	`AVX512VBMI2` `AVX512BITALG` `AVX512VPOPCNTDQ`
`AVX512_SPR`	`SSE` `SSE2` `SSE3` `SSSE3` `SSE41` `POPCNT` `SSE42` `AVX` `F16C` `FMA3` `AVX2` `AVX512F` `AVX512CD` `AVX512_SKX` `AVX512_CLX` `AVX512_CNL` `AVX512_ICL`	`AVX512FP16`
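
To illustrate how the "Implies" column is meant to be read, the sketch below expands a requested feature list into the full set it implies, using a small subset of the rows above. It is not NumPy's build logic. The names `IMPLIES` and `expand_features` are hypothetical, and only the SSE family is included.

```python
# Subset of the x86 table above: feature name -> features it implies.
IMPLIES = {
    "SSE": ["SSE2"],
    "SSE2": ["SSE"],
    "SSE3": ["SSE", "SSE2"],
    "SSSE3": ["SSE", "SSE2", "SSE3"],
    "SSE41": ["SSE", "SSE2", "SSE3", "SSSE3"],
    "POPCNT": ["SSE", "SSE2", "SSE3", "SSSE3", "SSE41"],
    "SSE42": ["SSE", "SSE2", "SSE3", "SSSE3", "SSE41", "POPCNT"],
}
# Table order, used only to print the result in a stable, readable order.
ORDER = list(IMPLIES)


def expand_features(requested):
    """Return the requested features plus everything they imply, in table order."""
    wanted = set()
    for name in requested:
        key = name.upper()  # option values are written in either case
        if key not in IMPLIES:
            raise KeyError("feature not in this sketch's table: %s" % name)
        wanted.add(key)
        wanted.update(IMPLIES[key])
    return [name for name in ORDER if name in wanted]
```

With this table, `expand_features(["sse42"])` yields the same seven names that the build-option example lists as the equivalent of `--cpu-baseline=sse42`.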

Name	Implies
`VSX`
`VSX2`	`VSX`
`VSX3`	`VSX` `VSX2`
`VSX4`	`VSX` `VSX2` `VSX3`

Name	Implies
`VSX`	`VSX2`
`VSX2`	`VSX`
`VSX3`	`VSX` `VSX2`
`VSX4`	`VSX` `VSX2` `VSX3`

Name	Implies
`NEON`
`NEON_FP16`	`NEON`
`NEON_VFPV4`	`NEON` `NEON_FP16`
`ASIMD`	`NEON` `NEON_FP16` `NEON_VFPV4`
`ASIMDHP`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD`
`ASIMDDP`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD`
`ASIMDFHM`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD` `ASIMDHP`

Name	Implies
`NEON`	`NEON_FP16` `NEON_VFPV4` `ASIMD`
`NEON_FP16`	`NEON` `NEON_VFPV4` `ASIMD`
`NEON_VFPV4`	`NEON` `NEON_FP16` `ASIMD`
`ASIMD`	`NEON` `NEON_FP16` `NEON_VFPV4`
`ASIMDHP`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD`
`ASIMDDP`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD`
`ASIMDFHM`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD` `ASIMDHP`

For Arch	Implies
x86 (32-bit mode)	`SSE` `SSE2`
x86_64	`SSE` `SSE2` `SSE3`
IBM/POWER (big-endian mode)	`NONE`
IBM/POWER (little-endian mode)	`VSX` `VSX2`
ARMHF	`NONE`
ARM64 A.K.A. AARCH64	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD`
IBM/ZSYSTEM(S390X)	`NONE`

Name	Implies	Gathers
FMA3	SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C AVX2
AVX2	SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3
AVX512F	SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512CD
XOP	SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX
FMA4	SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX
AVX512_SPR	SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL	AVX512FP16

Name	Implies
`VX`
`VXE`	`VX`

Name	Implies
`VX`
`VXE`	`VX`
`VXE2`	`VX` `VXE`

NumPy 1.26 Chinese Documentation (Part 22)

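CPU/SIMD Optimizations

CPU build options

Description

Quick Start

I am building NumPy for my local use

I do not want to support the old processors of the x86 architecture

I am facing the same case as above, but with the ppc64 architecture

Having issues with AVX512 features?

Supported Features

On x86

On IBM/POWER big-endian

On IBM/POWER little-endian

On ARMv7/A32

On ARMv8/A64

On IBM/ZSYSTEM(S390X)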

Special Options

Behaviors

Platform differences

On x86::Intel Compiler

On x86::Microsoft Visual C/C++

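Build report

Runtime dispatch

How does the CPU dispatcher work?

1- Configuration

2- Discovering the environment

3- Validating the requested optimizations

4- Generating the main configuration header

5- Dispatch-able sources and configuration statements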

NumPy security

Advice for using NumPy on untrusted data

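NumPy and SWIG

numpy.i: a SWIG interface file for NumPy

Introduction

Using numpy.i

Available typemaps

Input arrays

In-place arrays

Argout arrays

Argout view arrays

Memory managed argout view arrays

Output arrays

`numpy.i: a SWIG interface file for NumPy`

Divergence between `upstream/main` and your feature branch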