torch-int's Introduction

torch-int

This repository contains integer operators on GPUs for PyTorch.

Dependencies

  • CUTLASS
  • PyTorch with CUDA 11.3
  • NVIDIA-Toolkit 11.3
  • CUDA Driver 11.3
  • gcc/g++ 9.4.0
  • cmake >= 3.12

Installation

git clone --recurse-submodules https://github.com/Guangxuan-Xiao/torch-int.git
conda create -n int python=3.8
conda activate int
conda install -c anaconda gxx_linux-64=9
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
source environment.sh
bash build_cutlass.sh
python setup.py install

Test

python tests/test_linear_modules.py
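If the build succeeded, a quick sanity check before running the full test is to import one of the quantized layers directly (W8A16Linear is exported from torch_int.nn.linear):

```shell
# Verify the package and its CUDA extension import cleanly before running the tests.
python -c "from torch_int.nn.linear import W8A16Linear; print('torch_int OK')"
```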

torch-int's People

Contributors

guangxuan-xiao, merrymercy, mickaelseznec


torch-int's Issues

ModuleNotFoundError: No module named 'torch_int._CUDA'

The error happens as shown below:

>>> import torch_int
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/jiang/torch-int/torch_int/__init__.py", line 1, in <module>
    from . import nn
  File "/home/xxx/jiang/torch-int/torch_int/nn/__init__.py", line 1, in <module>
    from .linear import W8A16Linear, W8FakeA8Linear
  File "/home/xxx/jiang/torch-int/torch_int/nn/linear.py", line 2, in <module>
    from .._CUDA import (linear_a8_w8_b32_o32,
ModuleNotFoundError: No module named 'torch_int._CUDA'

Where is the _CUDA module?
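This usually means the C++/CUDA extension was never built or installed into the environment. A hedged way to check (module layout taken from the traceback above):

```shell
# Print where torch_int was imported from, then look for the compiled extension there.
python -c "import torch_int, os; print(os.path.dirname(torch_int.__file__))"
# If that directory contains no _CUDA*.so file, re-run the build from the repo root:
# source environment.sh && bash build_cutlass.sh && python setup.py install
```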

is it possible to install torch-int on CUDA version 12.3

My CUDA version is 12.3 and it is nontrivial for me to downgrade to 11.3. When I run python setup.py install I get a RuntimeError:
The detected CUDA version (12.3) mismatches the version that was used to compile
PyTorch (11.3). Please make sure to use the same CUDA versions.

Is it possible to install it on 12.3 CUDA?
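If downgrading the system CUDA is not an option, one workaround is installing a CUDA 11.3 toolkit side-by-side and pointing the build at it. A sketch, assuming the toolkit lands at /usr/local/cuda-11.3 (the path is an assumption):

```shell
# Assumption: a CUDA 11.3 toolkit is installed alongside 12.3 at /usr/local/cuda-11.3.
export CUDA_HOME=/usr/local/cuda-11.3
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
nvcc --version   # should now report 11.3 before re-running python setup.py install
```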

fatal error: crypt.h: No such file or directory

python setup.py install
running install
/opt/conda/envs/smoothquant/lib/python3.8/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

    ********************************************************************************
    Please avoid running ``setup.py`` directly.
    Instead, use pypa/build, pypa/installer or other
    standards-based tools.

    See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
    ********************************************************************************

!!
self.initialize_options()
/opt/conda/envs/smoothquant/lib/python3.8/site-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

    ********************************************************************************
    Please avoid running ``setup.py`` and ``easy_install``.
    Instead, use pypa/build, pypa/installer or other
    standards-based tools.

    See https://github.com/pypa/setuptools/issues/917 for details.
    ********************************************************************************

!!
self.initialize_options()
running bdist_egg
running egg_info
writing torch_int.egg-info/PKG-INFO
writing dependency_links to torch_int.egg-info/dependency_links.txt
writing top-level names to torch_int.egg-info/top_level.txt
reading manifest file 'torch_int.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'torch_int.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
copying torch_int/__init__.py -> build/lib.linux-x86_64-cpython-38/torch_int
copying torch_int/nn/fused.py -> build/lib.linux-x86_64-cpython-38/torch_int/nn
copying torch_int/nn/linear.py -> build/lib.linux-x86_64-cpython-38/torch_int/nn
copying torch_int/nn/bmm.py -> build/lib.linux-x86_64-cpython-38/torch_int/nn
copying torch_int/nn/__init__.py -> build/lib.linux-x86_64-cpython-38/torch_int/nn
copying torch_int/utils/__init__.py -> build/lib.linux-x86_64-cpython-38/torch_int/utils
copying torch_int/functional/fused.py -> build/lib.linux-x86_64-cpython-38/torch_int/functional
copying torch_int/functional/bmm.py -> build/lib.linux-x86_64-cpython-38/torch_int/functional
copying torch_int/functional/quantization.py -> build/lib.linux-x86_64-cpython-38/torch_int/functional
copying torch_int/functional/__init__.py -> build/lib.linux-x86_64-cpython-38/torch_int/functional
copying torch_int/models/opt.py -> build/lib.linux-x86_64-cpython-38/torch_int/models
copying torch_int/models/__init__.py -> build/lib.linux-x86_64-cpython-38/torch_int/models
running build_ext
/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/utils/cpp_extension.py:813: UserWarning: The detected CUDA version (11.8) has a minor version mismatch with the version that was used to compile PyTorch (11.3). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/utils/cpp_extension.py:820: UserWarning: There are no /opt/conda/envs/smoothquant/bin/x86_64-conda-linux-gnu-c++ version bounds defined for CUDA version 11.8
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building 'torch_int._CUDA' extension
/opt/conda/envs/smoothquant/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/smoothquant/include -fPIC -O2 -isystem /opt/conda/envs/smoothquant/include -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /opt/conda/envs/smoothquant/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /opt/conda/envs/smoothquant/include -fPIC -Itorch_int/kernels/include -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.8/include -I/opt/conda/envs/smoothquant/include/python3.8 -c torch_int/kernels/bindings.cpp -o build/temp.linux-x86_64-cpython-38/torch_int/kernels/bindings.o -std=c++14 -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=_CUDA -D_GLIBCXX_USE_CXX11_ABI=0
/usr/local/cuda-11.8/bin/nvcc -Itorch_int/kernels/include -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.8/include -I/opt/conda/envs/smoothquant/include/python3.8 -c torch_int/kernels/bmm.cu -o build/temp.linux-x86_64-cpython-38/torch_int/kernels/bmm.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -DCUDA_ARCH=800 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=_CUDA -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -ccbin /opt/conda/envs/smoothquant/bin/x86_64-conda-linux-gnu-cc
/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/c10/core/SymInt.h(84): warning #68-D: integer conversion resulted in a change of sign

/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/c10/core/SymInt.h(84): warning #68-D: integer conversion resulted in a change of sign

/usr/local/cuda-11.8/bin/nvcc -Itorch_int/kernels/include -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.8/include -I/opt/conda/envs/smoothquant/include/python3.8 -c torch_int/kernels/fused.cu -o build/temp.linux-x86_64-cpython-38/torch_int/kernels/fused.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -DCUDA_ARCH=800 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=_CUDA -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -ccbin /opt/conda/envs/smoothquant/bin/x86_64-conda-linux-gnu-cc
In file included from /opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/torch/csrc/python_headers.h:10,
from /opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/torch/csrc/Device.h:3,
from /opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
from /opt/conda/envs/smoothquant/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:6,
from torch_int/kernels/include/common.h:8,
from torch_int/kernels/fused.cu:2:
/opt/conda/envs/smoothquant/include/python3.8/Python.h:44:10: fatal error: crypt.h: No such file or directory
44 | #include <crypt.h>
| ^~~~~~~~~
compilation terminated.
error: command '/usr/local/cuda-11.8/bin/nvcc' failed with exit code 1
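The conda Python headers reference crypt.h, which newer distros no longer ship by default. Two commonly suggested workarounds; both are assumptions, not project-documented fixes:

```shell
# Workaround 1: copy the system header into the environment's Python include dir
# (assumes /usr/include/crypt.h exists on the host).
cp /usr/include/crypt.h "$CONDA_PREFIX/include/python3.8/"

# Workaround 2: install libxcrypt into the environment so it provides the header.
# conda install -c conda-forge libxcrypt
```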

Test `test_linear_shape.py` fails

Thanks for this wonderful repo - it's a pleasure to work with it.

While playing around with the repo, I realized that the Linear layer accommodates only some input sizes. In fact, when running the test_linear_shape.py test, I get the following output:

test_quant_linear_a8_w8_bfp32_ofp32
Traceback (most recent call last):
  File "/notebooks/torch-int/tests/test_linear_shape.py", line 42, in <module>
    test_quant_linear_a8_w8_bfp32_ofp32()
  File "/usr/local/lib/python3.9/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/notebooks/torch-int/tests/test_linear_shape.py", line 18, in test_quant_linear_a8_w8_bfp32_ofp32
    y = linear_a8_w8_bfp32_ofp32(
RuntimeError: cutlass cannot implement

Are there shape restrictions on B, N, M that need to be satisfied for working with torch-int?

your environment can never be created

I followed the instructions but always encounter errors such as:
/opt/conda/envs/int/compiler_compat/ld: cannot find -lcublas_static: No such file or directory

Even when using the NVIDIA docker image, the error is still there.

Undefined symbol when running test_linear_modules.py

I followed all instructions, but when I ran test_linear_modules.py, the following error happened:

ImportError: /home/ubuntu/anaconda3/envs/int/lib/python3.8/site-packages/torch_int/_CUDA.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

How can I fix it?
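The undefined symbol is a mangled libtorch function, which usually means the extension was compiled against a different PyTorch build (or C++ ABI) than the one installed. Demangling it shows what the extension expects; the rebuild suggestion at the end is an assumption:

```shell
# Demangle the missing symbol (requires binutils' c++filt).
echo '_ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE' | c++filt
# The demangled name is a c10::detail assertion helper from libtorch; reinstalling
# the pinned torch==1.12.1+cu113 and rebuilding torch-int should make them match.
```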

what is ellm.tools.quantize_int?

I really enjoy reading this repository.

I am wondering about this line:

/benchmark/bench_model.py
Line 3: from ellm.tools.quantize_int import quantize_model_int

what is ellm.tools.quantize_int?

Could not find compiler set in environment variable CUDACXX

When I run "bash build_cutlass.sh", an error happens:

-- CMake Version: 3.18.2
-- The CXX compiler identification is GNU 8.4.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at /usr/share/cmake/Modules/CMakeDetermineCUDACompiler.cmake:25 (message):
Could not find compiler set in environment variable CUDACXX:

/usr/local/cuda/bin/nvcc.

Call Stack (most recent call first):
CUDA.cmake:46 (enable_language)
CMakeLists.txt:42 (include)

CMake Error: CMAKE_CUDA_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!
See also "/home/lizhangming/Project/torch-int-main/submodules/cutlass/build/CMakeFiles/CMakeOutput.log".
make: *** No targets specified and no makefile found. Stop.

what should I do before "bash build_cutlass.sh"?
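The CMake message means CUDACXX is set to /usr/local/cuda/bin/nvcc but no nvcc exists at that path. A sketch of the usual fix before re-running build_cutlass.sh, assuming nvcc is somewhere on PATH (otherwise install the CUDA toolkit first):

```shell
# Find a real nvcc and point CUDACXX at it (assumes nvcc is on PATH).
command -v nvcc
export CUDACXX="$(command -v nvcc)"
```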

Cannot read from remote repo when cloning './torch-int/submodules/cutlass'

The error message is as follows:

Submodule 'submodules/cutlass' (git@github.com:NVIDIA/cutlass.git) registered for path 'submodules/cutlass'
Cloning into '/opt/conda/bin/torch-int/submodules/cutlass'...
kex_exchange_identification: Connection closed by remote host
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:NVIDIA/cutlass.git' into submodule path '/opt/conda/bin/torch-int/submodules/cutlass' failed
Failed to clone 'submodules/cutlass'. Retry scheduled
Cloning into '/opt/conda/bin/torch-int/submodules/cutlass'...
kex_exchange_identification: Connection closed by remote host
fatal: Could not read from remote repository.

Why can't I clone from the NVIDIA repo? Has the name or path changed?
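The submodule is registered with an SSH URL, and the kex_exchange_identification failure suggests SSH to github.com is blocked on this machine. A common workaround (not a project-documented fix) is to rewrite SSH GitHub URLs to HTTPS:

```shell
# Rewrite SSH GitHub URLs to HTTPS for this user, then retry the submodule clone.
git config --global url."https://github.com/".insteadOf "git@github.com:"
git submodule update --init --recursive
```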

error: "auto" is not allowed here

I always run into this kind of problem when compiling:

type_traits.hpp(43): error: namespace "std" has no member "conjunction"
/torch-int/submodules/cutlass/include/cute/util/type_traits.hpp(44): error: namespace "std" has no member "conjunction_v"
/torch-int/submodules/cutlass/include/cute/util/type_traits.hpp(46): error: namespace "std" has no member "disjunction"
/torch-int/submodules/cutlass/include/cute/util/type_traits.hpp(47): error: namespace "std" has no member "disjunction_v"
/torch-int/submodules/cutlass/include/cute/util/type_traits.hpp(49): error: namespace "std" has no member "negation"
/torch-int/submodules/cutlass/include/cute/util/type_traits.hpp(50): error: namespace "std" has no member "negation_v"
/torch-int/submodules/cutlass/include/cute/util/type_traits.hpp(52): error: namespace "std" has no member "void_t"
/torch-int/submodules/cutlass/include/cute/numeric/integral_constant.hpp(78): error: "auto" is not allowed here
/torch-int/submodules/cutlass/include/cute/numeric/integral_constant.hpp(80): error: "auto" is not allowed here
/torch-int/submodules/cutlass/include/cute/numeric/integral_constant.hpp(82): error: "auto" is not allowed here
/torch-int/submodules/cutlass/include/cute/numeric/integral_constant.hpp(84): error: "auto" is not allowed here
/torch-int/submodules/cutlass/include/cute/numeric/integral_constant.hpp(86): error: "auto" is not allowed here
/torch-int/submodules/cutlass/include/cute/numeric/integral_constant.hpp(88): error: "auto" is not allowed here
/torch-int/submodules/cutlass/include/cute/underscore.hpp(67): error: disjunction is not a template
/torch-int/submodules/cutlass/include/cute/underscore.hpp(79): error: conjunction is not a template
/torch-int/submodules/cutlass/include/cutlass/gemm/gemm.h(562): error: namespace "std" has no member "void_t"
/torch-int/submodules/cutlass/include/cutlass/gemm/gemm.h(562): error: expected a ">"
/torch-int/submodules/cutlass/include/cutlass/gemm/gemm.h(562): error: expected a ";"

1) You need to change c++14 to c++17 in setup.py.
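As the note above suggests, the cute/ headers in newer CUTLASS need C++17 features (std::conjunction, std::void_t, etc.) while setup.py builds with -std=c++14. A minimal sketch of the change:

```shell
# Switch the C++ standard flags in setup.py from c++14 to c++17, then rebuild.
sed -i 's/c++14/c++17/g' setup.py
python setup.py install
```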

cannot install via setup.py because ld: cannot find -lcublas_static

Hi,
I think this repo is great and tried it using the pre-baked docker image for NVIDIA. After following the instructions in Readme.md, I got these errors:
/root/anaconda3/envs/int/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lcublas_static
/root/anaconda3/envs/int/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lcublasLt_static

I guess the cublas location is different, but I cannot figure out how to fix it. Would you mind helping me out? Thanks!

I also cannot find cublasLt_static or cublas_static using find. The results were:
(int) root@bafd706ba0bf:/workspace/torch-int# find / -name "*cublasLt*" -print
/root/anaconda3/lib/python3.10/site-packages/torch/lib/libcublasLt.so.11
/root/anaconda3/envs/int/lib/python3.8/site-packages/torch/lib/libcublasLt.so.11
/usr/local/cuda-11.3/targets/x86_64-linux/lib/stubs/libcublasLt.so
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcublasLt.so
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcublasLt.so.11
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcublasLt.so.11.5.1.109
/usr/local/cuda-11.3/targets/x86_64-linux/include/cublasLt.h

BTW, if I link against the non-static cublas, the compilation finishes, but it then fails the README test, python tests/test_linear_modules.py.
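The find output above only shows shared libcublasLt.so files; the static archives (libcublas_static.a, libcublasLt_static.a) ship with a full CUDA toolkit install, not with the pip/conda torch packages. A hedged check (the paths are assumptions):

```shell
# Look for the static cuBLAS archives under any local CUDA toolkit install.
find /usr/local/cuda* \( -name 'libcublas_static.a' -o -name 'libcublasLt_static.a' \) -print 2>/dev/null
# If nothing turns up, install the full CUDA 11.3 toolkit, or make the archives
# visible to the linker, e.g.: export LIBRARY_PATH=/usr/local/cuda-11.3/lib64:$LIBRARY_PATH
```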

Compilation errors, what could be the cause?

/usr/bin/ld: warning: /home/xxx/anaconda3090/envs/zach_glm/lib/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /home/xxx/anaconda3090/envs/zach_glm/lib/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/ld: warning: /home/xxx/anaconda3090/envs/zach_glm/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /home/xxx/anaconda3090/envs/zach_glm/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/ld: warning: /home/xxx/anaconda3090/envs/zach_glm/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /home/xx/anaconda3090/envs/zach_glm/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
