Dalian Artificial Intelligence Computing Platform (Huawei Ascend AI Platform): Notes on Building PyTorch with CUDA from Source for HPC
This post summarizes a few caveats for compiling PyTorch with CUDA for the aarch64 architecture inside an x86 Docker environment:
CUDA support:
Be sure to delete the symlink named cuda and rename the actual CUDA installation directory to cuda; for details, see:
Dalian Artificial Intelligence Computing Platform (Huawei Ascend AI Platform): PyTorch source build error on HPC (USE_CUDA=OFF): fixing the compiled PyTorch not supporting CUDA
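As a concrete sketch of that fix (assuming CUDA 11.8 is installed under /usr/local/cuda-11.8 and /usr/local/cuda is currently a symlink to it; adjust the version to match your system):

```shell
# /usr/local/cuda is assumed to be a symlink pointing at the real install.
# Remove the symlink, then rename the real installation directory to "cuda",
# so the PyTorch build resolves a plain directory rather than a link.
rm /usr/local/cuda
mv /usr/local/cuda-11.8 /usr/local/cuda
```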
MPI and NUMA support:
Install OpenMPI before building PyTorch from source:
apt install openmpi-bin libopenmpi-dev
Note that installing MPI this way makes the PyTorch build print the following warning:
OpenMPI found, but it is not built with CUDA support
The OpenMPI packages shipped by Ubuntu are not built with CUDA support; enabling it requires rebuilding OpenMPI yourself. Since CUDA-aware MPI only eliminates one extra memory copy per transfer, its impact on overall performance is small, and the rebuild is fairly involved, so the feature is left disabled here.
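You can check in advance whether an OpenMPI installation was built with CUDA support; the `ompi_info` tool ships with the openmpi-bin package:

```shell
# For the stock Ubuntu package this prints a line ending in ":value:false";
# a CUDA-aware build prints ":value:true".
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
```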
Build command (note: dependency setup is not shown here; see the official documentation for configuring the build environment):
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py build
================================
Environment variables for CUDA, NVCC, cuDNN, and MPI:
export USE_CUDA=1
export CUDA_PATH=/usr/local/cuda
export CUDA_HOME=${CUDA_PATH}
export CUDA_BIN_PATH=${CUDA_PATH}/bin
export PATH=${CUDA_BIN_PATH}:$PATH
export LD_LIBRARY_PATH=${CUDA_PATH}/lib64:$LD_LIBRARY_PATH
export CMAKE_CUDA_COMPILER=${CUDA_BIN_PATH}/nvcc
export CUDNN_LIBRARY_PATH=${CUDA_PATH}/lib64
export CUDNN_INCLUDE_PATH=${CUDA_PATH}/include
export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/openmpi/lib:$LD_LIBRARY_PATH
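A quick way to confirm the exports took effect is to check that the CUDA bin directory actually landed on PATH; the snippet below is a minimal sketch of that check and runs in a plain POSIX shell:

```shell
# Re-derive the expected bin directory and verify it appears on PATH.
CUDA_PATH=/usr/local/cuda
CUDA_BIN_PATH=${CUDA_PATH}/bin
PATH=${CUDA_BIN_PATH}:$PATH
case ":$PATH:" in
  *":${CUDA_BIN_PATH}:"*) echo "PATH ok: ${CUDA_BIN_PATH}" ;;
  *) echo "PATH missing ${CUDA_BIN_PATH}" >&2; exit 1 ;;
esac
```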
Configuration output:
(base) root@afa50e5922a4:~/pytorch# python setup.py develop Building wheel torch-2.1.0a0+git3c70d4b -- Building version 2.1.0a0+git3c70d4b cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/root/pytorch/torch -DCMAKE_PREFIX_PATH=/root/anaconda3/lib/python3.11/site-packages;/root/anaconda3 -DNUMPY_INCLUDE_DIR=/root/anaconda3/lib/python3.11/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/root/anaconda3/bin/python -DPYTHON_INCLUDE_DIR=/root/anaconda3/include/python3.11 -DPYTHON_LIBRARY=/root/anaconda3/lib/libpython3.11.a -DTORCH_BUILD_VERSION=2.1.0a0+git3c70d4b -DUSE_CUDA=1 -DUSE_NUMPY=True /root/pytorch -- The CXX compiler identification is GNU 7.5.0 -- The C compiler identification is GNU 7.5.0 -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/c++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/cc - skipped -- Detecting C compile features -- Detecting C compile features - done -- /usr/bin/c++ /root/pytorch/torch/abi-check.cpp -o /root/pytorch/build/abi-check -- Determined _GLIBCXX_USE_CXX11_ABI=1 -- Not forcing any particular BLAS to be found -- Could not find ccache. Consider installing ccache to speed up compilation. 
-- Performing Test C_HAS_AVX_1 -- Performing Test C_HAS_AVX_1 - Failed -- Performing Test C_HAS_AVX_2 -- Performing Test C_HAS_AVX_2 - Failed -- Performing Test C_HAS_AVX_3 -- Performing Test C_HAS_AVX_3 - Failed -- Performing Test C_HAS_AVX2_1 -- Performing Test C_HAS_AVX2_1 - Failed -- Performing Test C_HAS_AVX2_2 -- Performing Test C_HAS_AVX2_2 - Failed -- Performing Test C_HAS_AVX2_3 -- Performing Test C_HAS_AVX2_3 - Failed -- Performing Test C_HAS_AVX512_1 -- Performing Test C_HAS_AVX512_1 - Failed -- Performing Test C_HAS_AVX512_2 -- Performing Test C_HAS_AVX512_2 - Failed -- Performing Test C_HAS_AVX512_3 -- Performing Test C_HAS_AVX512_3 - Failed -- Performing Test CXX_HAS_AVX_1 -- Performing Test CXX_HAS_AVX_1 - Failed -- Performing Test CXX_HAS_AVX_2 -- Performing Test CXX_HAS_AVX_2 - Failed -- Performing Test CXX_HAS_AVX_3 -- Performing Test CXX_HAS_AVX_3 - Failed -- Performing Test CXX_HAS_AVX2_1 -- Performing Test CXX_HAS_AVX2_1 - Failed -- Performing Test CXX_HAS_AVX2_2 -- Performing Test CXX_HAS_AVX2_2 - Failed -- Performing Test CXX_HAS_AVX2_3 -- Performing Test CXX_HAS_AVX2_3 - Failed -- Performing Test CXX_HAS_AVX512_1 -- Performing Test CXX_HAS_AVX512_1 - Failed -- Performing Test CXX_HAS_AVX512_2 -- Performing Test CXX_HAS_AVX512_2 - Failed -- Performing Test CXX_HAS_AVX512_3 -- Performing Test CXX_HAS_AVX512_3 - Failed -- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS -- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS - Failed -- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY -- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Success -- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY -- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Success -- Performing Test COMPILER_SUPPORTS_RDYNAMIC -- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Success -- Found CUDA: /usr/local/cuda (found version "11.8") -- The CUDA compiler identification is NVIDIA 11.8.89 -- Detecting CUDA compiler ABI info -- 
Detecting CUDA compiler ABI info - done -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped -- Detecting CUDA compile features -- Detecting CUDA compile features - done -- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89") -- Looking for pthread.h -- Looking for pthread.h - found -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed -- Check if compiler accepts -pthread -- Check if compiler accepts -pthread - yes -- Found Threads: TRUE -- Caffe2: CUDA detected: 11.8 -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc -- Caffe2: CUDA toolkit directory: /usr/local/cuda -- Caffe2: Header version is: 11.8 -- /usr/local/cuda/lib64/libnvrtc.so shorthash is 20655db5 -- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH) CMake Warning at cmake/public/cuda.cmake:251 (message): Cannot find cuDNN library. Turning the option off Call Stack (most recent call first): cmake/Dependencies.cmake:43 (include) CMakeLists.txt:718 (include) -- Automatic GPU detection failed. Building for common architectures. -- Autodetected CUDA architecture(s): 3.5;5.0;8.0;8.6;8.9;9.0;8.9+PTX;9.0+PTX -- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_89,code=compute_89;-gencode;arch=compute_90,code=compute_90 -- Building using own protobuf under third_party per request. -- Use custom protobuf build. 
-- -- 3.13.0.0 -- Performing Test protobuf_HAVE_BUILTIN_ATOMICS -- Performing Test protobuf_HAVE_BUILTIN_ATOMICS - Success -- Caffe2 protobuf include directory: $<BUILD_INTERFACE:/root/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include> -- Trying to find preferred BLAS backend of choice: MKL -- MKL_THREADING = OMP -- Looking for sys/types.h -- Looking for sys/types.h - found -- Looking for stdint.h -- Looking for stdint.h - found -- Looking for stddef.h -- Looking for stddef.h - found -- Check size of void* -- Check size of void* - done -- MKL_THREADING = OMP CMake Warning at cmake/Dependencies.cmake:211 (message): MKL could not be found. Defaulting to Eigen Call Stack (most recent call first): CMakeLists.txt:718 (include) CMake Warning at cmake/Dependencies.cmake:248 (message): Preferred BLAS (MKL) cannot be found, now searching for a general BLAS library Call Stack (most recent call first): CMakeLists.txt:718 (include) -- MKL_THREADING = OMP -- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - gomp - pthread - m - dl] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl] -- Library mkl_intel: not found -- Checking for [mkl_intel - mkl_intel_thread - mkl_core - gomp - pthread - m - dl] -- Library mkl_intel: not found -- Checking for [mkl_gf_lp64 - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl] -- Library mkl_gf_lp64: not found -- Checking for [mkl_gf_lp64 - mkl_intel_thread - mkl_core - gomp - pthread - m - dl] -- Library mkl_gf_lp64: not found -- Checking for [mkl_gf - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl] -- Library mkl_gf: not found -- Checking for [mkl_gf - mkl_intel_thread - mkl_core - gomp - pthread - m - dl] -- Library mkl_gf: not found -- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - iomp5 - pthread - m - dl] -- 
Library mkl_intel_lp64: not found -- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - iomp5 - pthread - m - dl] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_gnu_thread - mkl_core - iomp5 - pthread - m - dl] -- Library mkl_intel: not found -- Checking for [mkl_intel - mkl_intel_thread - mkl_core - iomp5 - pthread - m - dl] -- Library mkl_intel: not found -- Checking for [mkl_gf_lp64 - mkl_gnu_thread - mkl_core - iomp5 - pthread - m - dl] -- Library mkl_gf_lp64: not found -- Checking for [mkl_gf_lp64 - mkl_intel_thread - mkl_core - iomp5 - pthread - m - dl] -- Library mkl_gf_lp64: not found -- Checking for [mkl_gf - mkl_gnu_thread - mkl_core - iomp5 - pthread - m - dl] -- Library mkl_gf: not found -- Checking for [mkl_gf - mkl_intel_thread - mkl_core - iomp5 - pthread - m - dl] -- Library mkl_gf: not found -- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - pthread - m - dl] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - pthread - m - dl] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_gnu_thread - mkl_core - pthread - m - dl] -- Library mkl_intel: not found -- Checking for [mkl_intel - mkl_intel_thread - mkl_core - pthread - m - dl] -- Library mkl_intel: not found -- Checking for [mkl_gf_lp64 - mkl_gnu_thread - mkl_core - pthread - m - dl] -- Library mkl_gf_lp64: not found -- Checking for [mkl_gf_lp64 - mkl_intel_thread - mkl_core - pthread - m - dl] -- Library mkl_gf_lp64: not found -- Checking for [mkl_gf - mkl_gnu_thread - mkl_core - pthread - m - dl] -- Library mkl_gf: not found -- Checking for [mkl_gf - mkl_intel_thread - mkl_core - pthread - m - dl] -- Library mkl_gf: not found -- Checking for [mkl_intel_lp64 - mkl_sequential - mkl_core - m - dl] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_sequential - mkl_core - m - dl] -- Library mkl_intel: not found -- Checking for [mkl_gf_lp64 - mkl_sequential - mkl_core - m - 
dl] -- Library mkl_gf_lp64: not found -- Checking for [mkl_gf - mkl_sequential - mkl_core - m - dl] -- Library mkl_gf: not found -- Checking for [mkl_intel_lp64 - mkl_core - gomp - pthread - m - dl] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_core - gomp - pthread - m - dl] -- Library mkl_intel: not found -- Checking for [mkl_gf_lp64 - mkl_core - gomp - pthread - m - dl] -- Library mkl_gf_lp64: not found -- Checking for [mkl_gf - mkl_core - gomp - pthread - m - dl] -- Library mkl_gf: not found -- Checking for [mkl_intel_lp64 - mkl_core - iomp5 - pthread - m - dl] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_core - iomp5 - pthread - m - dl] -- Library mkl_intel: not found -- Checking for [mkl_gf_lp64 - mkl_core - iomp5 - pthread - m - dl] -- Library mkl_gf_lp64: not found -- Checking for [mkl_gf - mkl_core - iomp5 - pthread - m - dl] -- Library mkl_gf: not found -- Checking for [mkl_intel_lp64 - mkl_core - pthread - m - dl] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_core - pthread - m - dl] -- Library mkl_intel: not found -- Checking for [mkl_gf_lp64 - mkl_core - pthread - m - dl] -- Library mkl_gf_lp64: not found -- Checking for [mkl_gf - mkl_core - pthread - m - dl] -- Library mkl_gf: not found -- Checking for [mkl - guide - pthread - m] -- Library mkl: not found -- MKL library not found -- Checking for [blis] -- Library blis: BLAS_blis_LIBRARY-NOTFOUND -- Checking for [Accelerate] -- Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND -- Checking for [vecLib] -- Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND -- Checking for [flexiblas] -- Library flexiblas: BLAS_flexiblas_LIBRARY-NOTFOUND -- Checking for [openblas] -- Library openblas: /root/anaconda3/lib/libopenblas.so -- Looking for sgemm_ -- Looking for sgemm_ - found -- Performing Test BLAS_F2C_DOUBLE_WORKS -- Performing Test BLAS_F2C_DOUBLE_WORKS - Failed -- Performing Test BLAS_F2C_FLOAT_WORKS -- Performing Test 
BLAS_F2C_FLOAT_WORKS - Success -- Performing Test BLAS_USE_CBLAS_DOT -- Performing Test BLAS_USE_CBLAS_DOT - Success -- Looking for sbgemm_ -- Looking for sbgemm_ - not found -- Found a library with BLAS API (open). Full path: (/root/anaconda3/lib/libopenblas.so) -- Using pocketfft in directory: /root/pytorch/third_party/pocketfft/ -- The ASM compiler identification is GNU -- Found assembler: /usr/bin/cc -- Brace yourself, we are building NNPACK -- NNPACK backend is neon -- Found Python: /root/anaconda3/bin/python3 (found version "3.11.3") found components: Interpreter -- Found Git: /usr/bin/git (found version "2.17.1") -- git version: v1.6.1 normalized to 1.6.1 -- Version: 1.6.1 -- Looking for shm_open in rt -- Looking for shm_open in rt - found -- Performing Test HAVE_CXX_FLAG_STD_CXX11 -- Performing Test HAVE_CXX_FLAG_STD_CXX11 - Success -- Performing Test HAVE_CXX_FLAG_WALL -- Performing Test HAVE_CXX_FLAG_WALL - Success -- Performing Test HAVE_CXX_FLAG_WEXTRA -- Performing Test HAVE_CXX_FLAG_WEXTRA - Success -- Performing Test HAVE_CXX_FLAG_WSHADOW -- Performing Test HAVE_CXX_FLAG_WSHADOW - Success -- Performing Test HAVE_CXX_FLAG_WERROR -- Performing Test HAVE_CXX_FLAG_WERROR - Success -- Performing Test HAVE_CXX_FLAG_WSUGGEST_OVERRIDE -- Performing Test HAVE_CXX_FLAG_WSUGGEST_OVERRIDE - Success -- Performing Test HAVE_CXX_FLAG_PEDANTIC -- Performing Test HAVE_CXX_FLAG_PEDANTIC - Success -- Performing Test HAVE_CXX_FLAG_PEDANTIC_ERRORS -- Performing Test HAVE_CXX_FLAG_PEDANTIC_ERRORS - Success -- Performing Test HAVE_CXX_FLAG_WSHORTEN_64_TO_32 -- Performing Test HAVE_CXX_FLAG_WSHORTEN_64_TO_32 - Failed -- Performing Test HAVE_CXX_FLAG_FSTRICT_ALIASING -- Performing Test HAVE_CXX_FLAG_FSTRICT_ALIASING - Success -- Performing Test HAVE_CXX_FLAG_WNO_DEPRECATED_DECLARATIONS -- Performing Test HAVE_CXX_FLAG_WNO_DEPRECATED_DECLARATIONS - Success -- Performing Test HAVE_CXX_FLAG_WNO_DEPRECATED -- Performing Test HAVE_CXX_FLAG_WNO_DEPRECATED - Success -- Performing 
Test HAVE_CXX_FLAG_WSTRICT_ALIASING -- Performing Test HAVE_CXX_FLAG_WSTRICT_ALIASING - Success -- Performing Test HAVE_CXX_FLAG_WD654 -- Performing Test HAVE_CXX_FLAG_WD654 - Failed -- Performing Test HAVE_CXX_FLAG_WTHREAD_SAFETY -- Performing Test HAVE_CXX_FLAG_WTHREAD_SAFETY - Failed -- Performing Test HAVE_CXX_FLAG_COVERAGE -- Performing Test HAVE_CXX_FLAG_COVERAGE - Success -- Performing Test HAVE_STD_REGEX -- Performing Test HAVE_STD_REGEX -- Performing Test HAVE_STD_REGEX -- success -- Performing Test HAVE_GNU_POSIX_REGEX -- Performing Test HAVE_GNU_POSIX_REGEX -- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile -- Performing Test HAVE_POSIX_REGEX -- Performing Test HAVE_POSIX_REGEX -- Performing Test HAVE_POSIX_REGEX -- success -- Performing Test HAVE_STEADY_CLOCK -- Performing Test HAVE_STEADY_CLOCK -- Performing Test HAVE_STEADY_CLOCK -- success CMake Warning at cmake/Dependencies.cmake:796 (message): A compiler with AVX512 support is required for FBGEMM. Not compiling with FBGEMM. Turn this warning off by USE_FBGEMM=OFF. Call Stack (most recent call first): CMakeLists.txt:718 (include) CMake Warning at cmake/Dependencies.cmake:838 (message): Turning USE_FAKELOWP off as it depends on USE_FBGEMM. Call Stack (most recent call first): CMakeLists.txt:718 (include) -- Found Numa: /usr/include -- Found Numa (include: /usr/include, library: /usr/lib/aarch64-linux-gnu/libnuma.so) -- Using third party subdirectory Eigen. -- Found PythonInterp: /root/anaconda3/bin/python (found suitable version "3.11.3", minimum required is "3.0") -- Found PythonLibs: /root/anaconda3/lib/libpython3.11.a (found suitable version "3.11.3", minimum required is "3.0") -- Using third_party/pybind11. 
-- pybind11 include dirs: /root/pytorch/cmake/../third_party/pybind11/include -- Found MPI_C: /usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so (found version "3.1") -- Found MPI_CXX: /usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1") -- Found MPI: TRUE (found version "3.1") -- MPI support found -- MPI compile flags: -pthread -- MPI include path: /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include/usr/lib/aarch64-linux-gnu/openmpi/include -- MPI LINK flags path: -L/usr/lib -pthread -- MPI libraries: /usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so CMake Warning at cmake/Dependencies.cmake:1164 (message): OpenMPI found, but it is not built with CUDA support. Call Stack (most recent call first): CMakeLists.txt:718 (include) -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- Adding OpenMP CXX_FLAGS: -fopenmp -- Will link against OpenMP libraries: /usr/lib/gcc/aarch64-linux-gnu/7/libgomp.so;/usr/lib/aarch64-linux-gnu/libpthread.so -- Disabling kernel asserts for ROCm -- Automatic GPU detection failed. Building for common architectures. 
-- Autodetected CUDA architecture(s): 3.5;5.0;8.0;8.6;8.9;9.0;8.9+PTX;9.0+PTX CMake Warning at cmake/External/nccl.cmake:69 (message): Enabling NCCL library slimming Call Stack (most recent call first): cmake/Dependencies.cmake:1345 (include) CMakeLists.txt:718 (include) -- Found CUB: /usr/local/cuda/include -- Converting CMAKE_CUDA_FLAGS to CUDA_NVCC_FLAGS: CUDA_NVCC_FLAGS = -Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_89,code=compute_89;-gencode;arch=compute_90,code=compute_90;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl;--expt-relaxed-constexpr;--expt-extended-lambda CUDA_NVCC_FLAGS_DEBUG = -g CUDA_NVCC_FLAGS_RELEASE = -O3;-DNDEBUG CUDA_NVCC_FLAGS_RELWITHDEBINFO = -O2;-g;-DNDEBUG CUDA_NVCC_FLAGS_MINSIZEREL = -O1;-DNDEBUG -- Performing Test UV_LINT_W4 -- Performing Test UV_LINT_W4 - Failed -- Performing Test UV_LINT_NO_UNUSED_PARAMETER_MSVC -- Performing Test UV_LINT_NO_UNUSED_PARAMETER_MSVC - Failed -- Performing Test UV_LINT_NO_CONDITIONAL_CONSTANT_MSVC -- Performing Test UV_LINT_NO_CONDITIONAL_CONSTANT_MSVC - Failed -- Performing Test UV_LINT_NO_NONSTANDARD_MSVC -- Performing Test UV_LINT_NO_NONSTANDARD_MSVC - Failed -- Performing Test UV_LINT_NO_NONSTANDARD_EMPTY_TU_MSVC -- Performing Test UV_LINT_NO_NONSTANDARD_EMPTY_TU_MSVC - Failed -- Performing Test UV_LINT_NO_NONSTANDARD_FILE_SCOPE_MSVC -- Performing Test UV_LINT_NO_NONSTANDARD_FILE_SCOPE_MSVC - Failed -- Performing Test UV_LINT_NO_NONSTANDARD_NONSTATIC_DLIMPORT_MSVC -- Performing Test 
UV_LINT_NO_NONSTANDARD_NONSTATIC_DLIMPORT_MSVC - Failed -- Performing Test UV_LINT_NO_HIDES_LOCAL -- Performing Test UV_LINT_NO_HIDES_LOCAL - Failed -- Performing Test UV_LINT_NO_HIDES_PARAM -- Performing Test UV_LINT_NO_HIDES_PARAM - Failed -- Performing Test UV_LINT_NO_HIDES_GLOBAL -- Performing Test UV_LINT_NO_HIDES_GLOBAL - Failed -- Performing Test UV_LINT_NO_CONDITIONAL_ASSIGNMENT_MSVC -- Performing Test UV_LINT_NO_CONDITIONAL_ASSIGNMENT_MSVC - Failed -- Performing Test UV_LINT_NO_UNSAFE_MSVC -- Performing Test UV_LINT_NO_UNSAFE_MSVC - Failed -- Performing Test UV_LINT_WALL -- Performing Test UV_LINT_WALL - Success -- Performing Test UV_LINT_NO_UNUSED_PARAMETER -- Performing Test UV_LINT_NO_UNUSED_PARAMETER - Success -- Performing Test UV_LINT_STRICT_PROTOTYPES -- Performing Test UV_LINT_STRICT_PROTOTYPES - Success -- Performing Test UV_LINT_EXTRA -- Performing Test UV_LINT_EXTRA - Success -- Performing Test UV_LINT_UTF8_MSVC -- Performing Test UV_LINT_UTF8_MSVC - Failed -- Performing Test UV_F_STRICT_ALIASING -- Performing Test UV_F_STRICT_ALIASING - Success -- summary of build options: Install prefix: /root/pytorch/torch Target system: Linux Compiler: C compiler: /usr/bin/cc CFLAGS: -- Found uv: 1.38.1 (found version "1.38.1") CMake Warning (dev) at third_party/gloo/CMakeLists.txt:21 (option): Policy CMP0077 is not set: option() honors normal variables. Run "cmake --help-policy CMP0077" for policy details. Use the cmake_policy command to set the policy and suppress this warning. For compatibility with older versions of CMake, option is clearing the normal variable 'BUILD_BENCHMARK'. This warning is for project developers. Use -Wno-dev to suppress it. 
-- Gloo build as SHARED library -- MPI include path: /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include/usr/lib/aarch64-linux-gnu/openmpi/include -- MPI libraries: /usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so CMake Warning (dev) at third_party/gloo/cmake/Cuda.cmake:109 (find_package): Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables. Run "cmake --help-policy CMP0074" for policy details. Use the cmake_policy command to set the policy and suppress this warning. CMake variable CUDAToolkit_ROOT is set to: /usr/local/cuda For compatibility, CMake is ignoring the variable. Call Stack (most recent call first): third_party/gloo/cmake/Dependencies.cmake:115 (include) third_party/gloo/CMakeLists.txt:111 (include) This warning is for project developers. Use -Wno-dev to suppress it. -- Found CUDAToolkit: /usr/local/cuda/include (found suitable version "11.8.89", minimum required is "7.0") -- CUDA detected: 11.8.89 CMake Warning at cmake/Dependencies.cmake:1489 (message): Metal is only used in ios builds. 
Call Stack (most recent call first): CMakeLists.txt:718 (include) -- Found PythonInterp: /root/anaconda3/bin/python (found version "3.11.3") Generated: /root/pytorch/build/third_party/onnx/onnx/onnx_onnx_torch-ml.proto Generated: /root/pytorch/build/third_party/onnx/onnx/onnx-operators_onnx_torch-ml.proto Generated: /root/pytorch/build/third_party/onnx/onnx/onnx-data_onnx_torch.proto -- -- ******** Summary ******** -- CMake version : 3.22.1 -- CMake command : /root/anaconda3/bin/cmake -- System : Linux -- C++ compiler : /usr/bin/c++ -- C++ compiler version : 7.5.0 -- CXX flags : -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -Wnon-virtual-dtor -- Build type : Release -- Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;__STDC_FORMAT_MACROS -- CMAKE_PREFIX_PATH : /root/anaconda3/lib/python3.11/site-packages;/root/anaconda3;/usr/local/cuda -- CMAKE_INSTALL_PREFIX : /root/pytorch/torch -- CMAKE_MODULE_PATH : /root/pytorch/cmake/Modules;/root/pytorch/cmake/public/../Modules_CUDA_fix -- -- ONNX version : 1.14.0 -- ONNX NAMESPACE : onnx_torch -- ONNX_USE_LITE_PROTO : OFF -- USE_PROTOBUF_SHARED_LIBS : OFF -- Protobuf_USE_STATIC_LIBS : ON -- ONNX_DISABLE_EXCEPTIONS : OFF -- ONNX_WERROR : OFF -- ONNX_BUILD_TESTS : OFF -- ONNX_BUILD_BENCHMARKS : OFF -- -- Protobuf compiler : -- Protobuf includes : -- Protobuf libraries : -- BUILD_ONNX_PYTHON : OFF -- -- ******** Summary ******** -- CMake version : 3.22.1 -- CMake command : /root/anaconda3/bin/cmake -- System : Linux -- C++ compiler : /usr/bin/c++ -- C++ compiler version : 7.5.0 -- CXX flags : -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -Wnon-virtual-dtor -- Build type : Release -- Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1 -- CMAKE_PREFIX_PATH : /root/anaconda3/lib/python3.11/site-packages;/root/anaconda3;/usr/local/cuda -- CMAKE_INSTALL_PREFIX : /root/pytorch/torch -- CMAKE_MODULE_PATH : 
/root/pytorch/cmake/Modules;/root/pytorch/cmake/public/../Modules_CUDA_fix -- -- ONNX version : 1.4.1 -- ONNX NAMESPACE : onnx_torch -- ONNX_BUILD_TESTS : OFF -- ONNX_BUILD_BENCHMARKS : OFF -- ONNX_USE_LITE_PROTO : OFF -- ONNXIFI_DUMMY_BACKEND : -- -- Protobuf compiler : -- Protobuf includes : -- Protobuf libraries : -- BUILD_ONNX_PYTHON : OFF -- Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor -- Adding -DNDEBUG to compile flags -- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 -- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 - False CMake Warning at cmake/Dependencies.cmake:1690 (message): Not compiling with MAGMA. Suppress this warning with -DUSE_MAGMA=OFF. Call Stack (most recent call first): CMakeLists.txt:718 (include) -- Could not find hardware support for NEON on this machine. -- No OMAP3 processor on this machine. -- No OMAP4 processor on this machine. -- Looking for cheev_ -- Looking for cheev_ - found -- Looking for cgesdd_ -- Looking for cgesdd_ - found -- Found a library with LAPACK API (open). disabling ROCM because NOT USE_ROCM is set -- MIOpen not found. Compiling without MIOpen support disabling MKLDNN because USE_MKLDNN is not set -- Looking for clock_gettime in rt -- Looking for clock_gettime in rt - found -- Looking for mmap -- Looking for mmap - found -- Looking for shm_open -- Looking for shm_open - found -- Looking for shm_unlink -- Looking for shm_unlink - found -- Looking for malloc_usable_size -- Looking for malloc_usable_size - found -- Performing Test C_HAS_THREAD -- Performing Test C_HAS_THREAD - Success -- <FindZVECTOR> -- check z16 -- Performing Test COMPILE_OUT_z16 -- Performing Test COMPILE_OUT_z16 - Failed -- check z15 -- Performing Test COMPILE_OUT_z15 -- Performing Test COMPILE_OUT_z15 - Failed -- check z14 -- Performing Test COMPILE_OUT_z14 -- Performing Test COMPILE_OUT_z14 - Failed -- </FindZVECTOR> -- Module support is disabled. 
-- Version: 9.1.0 -- Build type: Release -- CXX_STANDARD: 17 -- Performing Test has_std_17_flag -- Performing Test has_std_17_flag - Success -- Performing Test has_std_1z_flag -- Performing Test has_std_1z_flag - Success -- Required features: cxx_variadic_templates -- Using Kineto with CUPTI support -- Configuring Kineto dependency: -- KINETO_SOURCE_DIR = /root/pytorch/third_party/kineto/libkineto -- KINETO_BUILD_TESTS = OFF -- KINETO_LIBRARY_TYPE = static -- CUDA_SOURCE_DIR = /usr/local/cuda -- CUDA_INCLUDE_DIRS = /usr/local/cuda/include -- CUPTI_INCLUDE_DIR = /usr/local/cuda/extras/CUPTI/include -- CUDA_cupti_LIBRARY = /usr/local/cuda/extras/CUPTI/lib64/libcupti.so -- Found CUPTI INFO ROCM_SOURCE_DIR = -- Kineto: FMT_SOURCE_DIR = /root/pytorch/third_party/fmt -- Kineto: FMT_INCLUDE_DIR = /root/pytorch/third_party/fmt/include INFO CUPTI_INCLUDE_DIR = /usr/local/cuda/extras/CUPTI/include INFO ROCTRACER_INCLUDE_DIR = /include/roctracer INFO DYNOLOG_INCLUDE_DIR = /root/pytorch/third_party/kineto/libkineto/third_party/dynolog/ INFO IPCFABRIC_INCLUDE_DIR = /root/pytorch/third_party/kineto/libkineto/third_party/dynolog//dynolog/src/ipcfabric/ -- Configured Kineto CMake Warning (dev) at /root/anaconda3/share/cmake-3.22/Modules/CMakeDependentOption.cmake:84 (message): Policy CMP0127 is not set: cmake_dependent_option() supports full Condition Syntax. Run "cmake --help-policy CMP0127" for policy details. Use the cmake_policy command to set the policy and suppress this warning. Call Stack (most recent call first): CMakeLists.txt:721 (cmake_dependent_option) This warning is for project developers. Use -Wno-dev to suppress it. 
-- GCC 7.5.0: Adding gcc and gcc_s libs to link line -- Performing Test HAS_WERROR_RETURN_TYPE -- Performing Test HAS_WERROR_RETURN_TYPE - Success -- Performing Test HAS_WERROR_NON_VIRTUAL_DTOR -- Performing Test HAS_WERROR_NON_VIRTUAL_DTOR - Success -- Performing Test HAS_WERROR_BRACED_SCALAR_INIT -- Performing Test HAS_WERROR_BRACED_SCALAR_INIT - Failed -- Performing Test HAS_WERROR_RANGE_LOOP_CONSTRUCT -- Performing Test HAS_WERROR_RANGE_LOOP_CONSTRUCT - Failed -- Performing Test HAS_WERROR_BOOL_OPERATION -- Performing Test HAS_WERROR_BOOL_OPERATION - Success -- Performing Test HAS_WNARROWING -- Performing Test HAS_WNARROWING - Success -- Performing Test HAS_WNO_MISSING_FIELD_INITIALIZERS -- Performing Test HAS_WNO_MISSING_FIELD_INITIALIZERS - Success -- Performing Test HAS_WNO_TYPE_LIMITS -- Performing Test HAS_WNO_TYPE_LIMITS - Success -- Performing Test HAS_WNO_ARRAY_BOUNDS -- Performing Test HAS_WNO_ARRAY_BOUNDS - Success -- Performing Test HAS_WNO_UNKNOWN_PRAGMAS -- Performing Test HAS_WNO_UNKNOWN_PRAGMAS - Success -- Performing Test HAS_WNO_UNUSED_PARAMETER -- Performing Test HAS_WNO_UNUSED_PARAMETER - Success -- Performing Test HAS_WNO_UNUSED_FUNCTION -- Performing Test HAS_WNO_UNUSED_FUNCTION - Success -- Performing Test HAS_WNO_UNUSED_RESULT -- Performing Test HAS_WNO_UNUSED_RESULT - Success -- Performing Test HAS_WNO_STRICT_OVERFLOW -- Performing Test HAS_WNO_STRICT_OVERFLOW - Success -- Performing Test HAS_WNO_STRICT_ALIASING -- Performing Test HAS_WNO_STRICT_ALIASING - Success -- Performing Test HAS_WVLA_EXTENSION -- Performing Test HAS_WVLA_EXTENSION - Failed -- Performing Test HAS_WNEWLINE_EOF -- Performing Test HAS_WNEWLINE_EOF - Failed -- Performing Test HAS_WINCONSISTENT_MISSING_OVERRIDE -- Performing Test HAS_WINCONSISTENT_MISSING_OVERRIDE - Failed -- Performing Test HAS_WINCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE -- Performing Test HAS_WINCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE - Failed -- Performing Test HAS_WNO_ERROR_PEDANTIC -- Performing Test 
HAS_WNO_ERROR_PEDANTIC - Success -- Performing Test HAS_WNO_ERROR_OLD_STYLE_CAST -- Performing Test HAS_WNO_ERROR_OLD_STYLE_CAST - Success -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_OVERRIDE -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_OVERRIDE - Failed -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE - Failed -- Performing Test HAS_WCONSTANT_CONVERSION -- Performing Test HAS_WCONSTANT_CONVERSION - Failed -- Performing Test HAS_WNO_INVALID_PARTIAL_SPECIALIZATION -- Performing Test HAS_WNO_INVALID_PARTIAL_SPECIALIZATION - Success -- Performing Test HAS_WNO_UNUSED_PRIVATE_FIELD -- Performing Test HAS_WNO_UNUSED_PRIVATE_FIELD - Success -- Performing Test HAS_WNO_ALIGNED_ALLOCATION_UNAVAILABLE -- Performing Test HAS_WNO_ALIGNED_ALLOCATION_UNAVAILABLE - Success -- Performing Test HAS_WNO_MISSING_BRACES -- Performing Test HAS_WNO_MISSING_BRACES - Success -- Performing Test HAS_WUNUSED_LAMBDA_CAPTURE -- Performing Test HAS_WUNUSED_LAMBDA_CAPTURE - Failed -- Performing Test HAS_QUNUSED_ARGUMENTS -- Performing Test HAS_QUNUSED_ARGUMENTS - Failed -- Performing Test HAS_FDIAGNOSTICS_COLOR_ALWAYS -- Performing Test HAS_FDIAGNOSTICS_COLOR_ALWAYS - Success -- Performing Test HAS_FALIGNED_NEW -- Performing Test HAS_FALIGNED_NEW - Success -- Performing Test HAS_WNO_UNUSED_BUT_SET_VARIABLE -- Performing Test HAS_WNO_UNUSED_BUT_SET_VARIABLE - Success -- Performing Test HAS_WNO_MAYBE_UNINITIALIZED -- Performing Test HAS_WNO_MAYBE_UNINITIALIZED - Success -- Performing Test HAS_FSTANDALONE_DEBUG -- Performing Test HAS_FSTANDALONE_DEBUG - Failed -- Performing Test HAS_FNO_MATH_ERRNO -- Performing Test HAS_FNO_MATH_ERRNO - Success -- Performing Test HAS_FNO_TRAPPING_MATH -- Performing Test HAS_FNO_TRAPPING_MATH - Success -- Performing Test HAS_WERROR_FORMAT -- Performing Test HAS_WERROR_FORMAT - Success -- Performing Test HAS_WERROR_CAST_FUNCTION_TYPE -- Performing Test 
HAS_WERROR_CAST_FUNCTION_TYPE - Failed -- Performing Test HAS_VST1 -- Performing Test HAS_VST1 - Failed -- Performing Test HAS_VLD1 -- Performing Test HAS_VLD1 - Failed -- Performing Test HAS_WNO_STRINGOP_OVERFLOW -- Performing Test HAS_WNO_STRINGOP_OVERFLOW - Success -- Looking for backtrace -- Looking for backtrace - found -- backtrace facility detected in default set of libraries -- Found Backtrace: /usr/include -- NUMA paths: -- /usr/include -- /usr/lib/aarch64-linux-gnu/libnuma.so -- headers outputs: -- sources outputs: -- declarations_yaml outputs: -- Using ATen parallel backend: OMP CMake Deprecation Warning at third_party/sleef/CMakeLists.txt:91 (cmake_policy): The OLD behavior for policy CMP0066 will be removed from a future version of CMake. The cmake-policies(7) manual explains that the OLD behaviors of all policies are deprecated and that a policy should be set to OLD only under specific short-term circumstances. Projects should be ported to the NEW behavior and not rely on setting a policy to OLD. 
-- Found OpenSSL: /root/anaconda3/lib/libcrypto.so (found version "1.1.1u")
-- Check size of long double
-- Check size of long double - done
-- Performing Test COMPILER_SUPPORTS_LONG_DOUBLE
-- Performing Test COMPILER_SUPPORTS_LONG_DOUBLE - Success
-- Performing Test COMPILER_SUPPORTS_FLOAT128
-- Performing Test COMPILER_SUPPORTS_FLOAT128 - Failed
-- Performing Test COMPILER_SUPPORTS_SVE
-- Performing Test COMPILER_SUPPORTS_SVE - Failed
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Performing Test COMPILER_SUPPORTS_OPENMP
-- Performing Test COMPILER_SUPPORTS_OPENMP - Success
-- Performing Test COMPILER_SUPPORTS_WEAK_ALIASES
-- Performing Test COMPILER_SUPPORTS_WEAK_ALIASES - Success
-- Performing Test COMPILER_SUPPORTS_BUILTIN_MATH
-- Performing Test COMPILER_SUPPORTS_BUILTIN_MATH - Success
-- Performing Test COMPILER_SUPPORTS_SYS_GETRANDOM
-- Performing Test COMPILER_SUPPORTS_SYS_GETRANDOM - Success
-- Configuring build for SLEEF-v3.6.0
   Target system: Linux-6.2.0-26-generic
   Target processor: aarch64
   Host system: Linux-6.2.0-26-generic
   Host processor: aarch64
   Detected C compiler: GNU @ /usr/bin/cc
   CMake: 3.22.1
   Make program: /root/anaconda3/bin/ninja
-- Using option `-Wall -Wno-unused -Wno-attributes -Wno-unused-result -Wno-psabi -ffp-contract=off -fno-math-errno -fno-trapping-math` to compile libsleef
-- Building shared libs : OFF
-- Building static test bins: OFF
-- MPFR : /root/anaconda3/lib/libmpfr.so
-- MPFR header file in /root/anaconda3/include
-- GMP : /root/anaconda3/lib/libgmp.so
-- RT : /usr/lib/aarch64-linux-gnu/librt.so
-- FFTW3 : LIBFFTW3-NOTFOUND
-- OPENSSL : 1.1.1u
-- SDE : SDE_COMMAND-NOTFOUND
-- RUNNING_ON_TRAVIS :
-- COMPILER_SUPPORTS_OPENMP : 1
AT_INSTALL_INCLUDE_DIR include/ATen/core
core header install: /root/pytorch/build/aten/src/ATen/core/TensorBody.h
core header install: /root/pytorch/build/aten/src/ATen/core/aten_interned_strings.h
core header install: /root/pytorch/build/aten/src/ATen/core/enum_tag.h
-- Generating sources for unboxing kernels /root/anaconda3/bin/python;-m;torchgen.gen_executorch;--source-path=/root/pytorch/test/edge/../../test/edge;--install-dir=/root/pytorch/build/out;--tags-path=/root/pytorch/test/edge/../../aten/src/ATen/native/tags.yaml;--aten-yaml-path=/root/pytorch/test/edge/../../aten/src/ATen/native/native_functions.yaml;--use-aten-lib;--op-selection-yaml-path=/root/pytorch/test/edge/../../test/edge/selected_operators.yaml;--custom-ops-yaml-path=/root/pytorch/test/edge/../../test/edge/custom_ops.yaml
-- Performing Test HAS_WNO_UNUSED_VARIABLE
-- Performing Test HAS_WNO_UNUSED_VARIABLE - Success
-- Performing Test HAS_WNO_UNUSED_BUT_SET_PARAMETER
-- Performing Test HAS_WNO_UNUSED_BUT_SET_PARAMETER - Success
-- _GLIBCXX_USE_CXX11_ABI=1 is already defined as a cmake variable
CMake Warning (dev) at torch/CMakeLists.txt:389:
  Syntax Warning in cmake code at column 107
  Argument not separated from preceding token by whitespace.
This warning is for project developers.  Use -Wno-dev to suppress it.
CMake Warning (dev) at torch/CMakeLists.txt:389:
  Syntax Warning in cmake code at column 115
  Argument not separated from preceding token by whitespace.
This warning is for project developers.  Use -Wno-dev to suppress it.
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.5;5.0;8.0;8.6;8.9;9.0;8.9+PTX;9.0+PTX
-- Using lib/python3.11/site-packages as python relative installation path
CMake Warning at CMakeLists.txt:1109 (message):
  Generated cmake files are only fully tested if one builds with system
  glog, gflags, and protobuf.  Other settings may generate files that are
  not well tested.
--
-- ******** Summary ********
-- General:
--   CMake version : 3.22.1
--   CMake command : /root/anaconda3/bin/cmake
--   System : Linux
--   C++ compiler : /usr/bin/c++
--   C++ compiler id : GNU
--   C++ compiler version : 7.5.0
--   Using ccache if found : ON
--   Found ccache : CCACHE_PROGRAM-NOTFOUND
--   CXX flags : -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow
--   Build type : Release
--   Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS;BUILD_NVFUSER
--   CMAKE_PREFIX_PATH : /root/anaconda3/lib/python3.11/site-packages;/root/anaconda3;/usr/local/cuda
--   CMAKE_INSTALL_PREFIX : /root/pytorch/torch
--   USE_GOLD_LINKER : OFF
--
--   TORCH_VERSION : 2.1.0
--   BUILD_CAFFE2 : OFF
--   BUILD_CAFFE2_OPS : OFF
--   BUILD_STATIC_RUNTIME_BENCHMARK: OFF
--   BUILD_TENSOREXPR_BENCHMARK: OFF
--   BUILD_NVFUSER_BENCHMARK: OFF
--   BUILD_BINARY : OFF
--   BUILD_CUSTOM_PROTOBUF : ON
--     Link local protobuf : ON
--   BUILD_DOCS : OFF
--   BUILD_PYTHON : True
--     Python version : 3.11.3
--     Python executable : /root/anaconda3/bin/python
--     Pythonlibs version : 3.11.3
--     Python library : /root/anaconda3/lib/libpython3.11.a
--     Python includes : /root/anaconda3/include/python3.11
--     Python site-packages: lib/python3.11/site-packages
--   BUILD_SHARED_LIBS : ON
--   CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
--   BUILD_TEST : True
--   BUILD_JNI : OFF
--   BUILD_MOBILE_AUTOGRAD : OFF
--   BUILD_LITE_INTERPRETER: OFF
--   INTERN_BUILD_MOBILE :
--   TRACING_BASED : OFF
--   USE_BLAS : 1
--     BLAS : open
--     BLAS_HAS_SBGEMM :
--   USE_LAPACK : 1
--     LAPACK : open
--   USE_ASAN : OFF
--   USE_TSAN : OFF
--   USE_CPP_CODE_COVERAGE : OFF
--   USE_CUDA : 1
--     Split CUDA :
--     CUDA static link : OFF
--     USE_CUDNN : OFF
--     USE_EXPERIMENTAL_CUDNN_V8_API: ON
--     CUDA version : 11.8
--     USE_FLASH_ATTENTION : ON
--     CUDA root directory : /usr/local/cuda
--     CUDA library : /usr/local/cuda/lib64/stubs/libcuda.so
--     cudart library : /usr/local/cuda/lib64/libcudart.so
--     cublas library : /usr/local/cuda/lib64/libcublas.so
--     cufft library : /usr/local/cuda/lib64/libcufft.so
--     curand library : /usr/local/cuda/lib64/libcurand.so
--     cusparse library : /usr/local/cuda/lib64/libcusparse.so
--     nvrtc : /usr/local/cuda/lib64/libnvrtc.so
--     CUDA include path : /usr/local/cuda/include
--     NVCC executable : /usr/local/cuda/bin/nvcc
--     CUDA compiler : /usr/local/cuda/bin/nvcc
--     CUDA flags : -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_89,code=compute_89 -gencode arch=compute_90,code=compute_90 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__
--     CUDA host compiler :
--     CUDA --device-c : OFF
--     USE_TENSORRT : OFF
--   USE_ROCM : OFF
--   BUILD_NVFUSER : ON
--   USE_EIGEN_FOR_BLAS : ON
--   USE_FBGEMM : OFF
--     USE_FAKELOWP : OFF
--   USE_KINETO : ON
--   USE_FFMPEG : OFF
--   USE_GFLAGS : OFF
--   USE_GLOG : OFF
--   USE_LEVELDB : OFF
--   USE_LITE_PROTO : OFF
--   USE_LMDB : OFF
--   USE_METAL : OFF
--   USE_PYTORCH_METAL : OFF
--   USE_PYTORCH_METAL_EXPORT : OFF
--   USE_MPS : OFF
--   USE_FFTW : OFF
--   USE_MKL : OFF
--   USE_MKLDNN : OFF
--   USE_UCC : OFF
--   USE_ITT : OFF
--   USE_NCCL : ON
--     USE_SYSTEM_NCCL : OFF
--     USE_NCCL_WITH_UCC : OFF
--   USE_NNPACK : ON
--   USE_NUMPY : ON
--   USE_OBSERVERS : ON
--   USE_OPENCL : OFF
--   USE_OPENCV : OFF
--   USE_OPENMP : ON
--   USE_TBB : OFF
--   USE_MIMALLOC : OFF
--   USE_VULKAN : OFF
--   USE_PROF : OFF
--   USE_QNNPACK : ON
--   USE_PYTORCH_QNNPACK : ON
--   USE_XNNPACK : ON
--   USE_REDIS : OFF
--   USE_ROCKSDB : OFF
--   USE_ZMQ : OFF
--   USE_DISTRIBUTED : ON
--     USE_MPI : ON
--     USE_GLOO : ON
--     USE_GLOO_WITH_OPENSSL : OFF
--     USE_TENSORPIPE : ON
--   Public Dependencies :
--   Private Dependencies : Threads::Threads;pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;/usr/lib/aarch64-linux-gnu/libnuma.so;fp16;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so;caffe2::openmp;tensorpipe;gloo;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
--   Public CUDA Deps. : caffe2::cufft;caffe2::curand;caffe2::cublas
--   Private CUDA Deps. : __caffe2_nccl;tensorpipe_cuda;gloo_cuda;/usr/local/cuda/lib64/libcudart.so;CUDA::cusparse;CUDA::curand;CUDA::cufft;ATEN_CUDA_FILES_GEN_LIB
--   USE_COREML_DELEGATE : OFF
--   BUILD_LAZY_TS_BACKEND : ON
--   TORCH_DISABLE_GPU_ASSERTS : ON
-- Performing Test HAS_WMISSING_PROTOTYPES
-- Performing Test HAS_WMISSING_PROTOTYPES - Failed
-- Performing Test HAS_WERROR_MISSING_PROTOTYPES
-- Performing Test HAS_WERROR_MISSING_PROTOTYPES - Failed
-- Configuring done
CMake Warning at caffe2/CMakeLists.txt:813 (add_library):
  Cannot generate a safe runtime search path for target torch_cpu because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libgomp.so.1] in /usr/lib/gcc/aarch64-linux-gnu/7 may be hidden by files in:
      /root/anaconda3/lib

  Some of these libraries may not be found correctly.

-- Generating done
-- Build files have been written to: /root/pytorch/build
cmake --build . --target install --config Release
[1/4] Generating ATen declarations_yaml
[3/4] Generating ATen sources
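Two details in the configure summary above are worth flagging. First, `Automatic GPU detection failed. Building for common architectures.` means CMake could not see a GPU inside the x86 docker environment, so nvcc falls back to compiling for eight architectures (including the long-deprecated 3.5), which lengthens the build considerably. If the target GPU model is known, the list can be pinned through PyTorch's standard `TORCH_CUDA_ARCH_LIST` build variable. Second, the summary reports `USE_CUDNN : OFF`, i.e. cuDNN was not picked up despite the `CUDNN_LIBRARY_PATH`/`CUDNN_INCLUDE_PATH` exports. A minimal sketch, where the value "8.0" is an assumption for an A100-class card and should be replaced with the compute capability of the actual target GPUs:

```shell
# Pin the target CUDA architectures explicitly instead of relying on
# auto-detection (which failed in this docker environment).
# "8.0" is an assumed example value; use your GPU's compute capability,
# with multiple values separated by ";" (e.g. "8.0;9.0").
export TORCH_CUDA_ARCH_LIST="8.0"

# Optional: cap parallel compile jobs if the build host runs out of memory.
export MAX_JOBS=8

echo "TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}, MAX_JOBS=${MAX_JOBS}"
```

After changing these variables, clear the stale CMake cache (for example with `python setup.py clean`) before re-running `python setup.py develop`, otherwise the cached architecture list from this run will be reused.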
Posted on 2023-08-11 08:15 by Angry_Panda · 393 views · 0 comments