openbmb / bmtrain
Efficient Training (including pre-training and fine-tuning) for Big Models
License: Apache License 2.0
If I want to use CPU offloading in my model, which API should I call in BMTrain?
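When using this framework, ZeRO optimization is enabled by default: backward takes roughly 3× as long as forward, in exchange for lower GPU memory usage. If GPU memory is not a constraint, I would like faster training instead. How can I disable ZeRO optimization?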
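For context on this trade-off, the ZeRO paper (Rajbhandari et al.) estimates per-GPU memory for model states under mixed-precision Adam as 2Ψ bytes of fp16 parameters, 2Ψ bytes of fp16 gradients and KΨ bytes of optimizer state (K = 12: fp32 parameter copy, momentum and variance), where Ψ is the parameter count. Each ZeRO stage divides more of these terms by the number of data-parallel GPUs, at the cost of extra communication; stage 3 has to all-gather parameters again during backward. The sketch below only reproduces that arithmetic. It assumes, without confirmation from BMTrain's documentation, that BMTrain's default behaves like stage 3, and it ignores activations and temporary buffers.

```python
def zero_model_state_bytes(num_params: int, num_gpus: int, stage: int, k: int = 12) -> float:
    """Per-GPU bytes for model states under mixed-precision Adam (ZeRO paper accounting).

    Ignores activations, temporary buffers and memory fragmentation.
    """
    params = 2 * num_params   # fp16 parameters
    grads = 2 * num_params    # fp16 gradients
    optim = k * num_params    # fp32 master weights + Adam momentum + variance
    if stage == 0:            # plain data parallelism: everything replicated
        return params + grads + optim
    if stage == 1:            # partition optimizer states
        return params + grads + optim / num_gpus
    if stage == 2:            # also partition gradients
        return params + (grads + optim) / num_gpus
    if stage == 3:            # also partition parameters
        return (params + grads + optim) / num_gpus
    raise ValueError(f"unknown ZeRO stage: {stage}")


# Example used in the ZeRO paper: a 7.5B-parameter model on 64 GPUs.
for stage in range(4):
    gb = zero_model_state_bytes(7_500_000_000, 64, stage) / 1e9
    print(f"stage {stage}: {gb:.2f} GB per GPU")
```

If the model states fit comfortably without partitioning, a lower stage saves the extra parameter gathers, which is the speed-for-memory exchange described above.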
python setup.py install
running install
running bdist_egg
running egg_info
writing bmtrain.egg-info\PKG-INFO
writing dependency_links to bmtrain.egg-info\dependency_links.txt
writing requirements to bmtrain.egg-info\requires.txt
writing top-level names to bmtrain.egg-info\top_level.txt
reading manifest file 'bmtrain.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'bmtrain.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
running build_ext
building 'bmtrain.nccl._C' extension
Emitting ninja build file D:\code\python\nlp\BMTrain-main\build\temp.win-amd64-3.7\Release\build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
F:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:F:\Python\Python37\lib\site-packages\torch\lib "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\lib\x64" /LIBPATH:F:\Python\Python37\libs /LIBPATH:F:\Python\Python37\PCbuild\amd64 "/LIBPATH:F:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\ATLMFC\lib\x64" "/LIBPATH:F:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22000.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22000.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib cudart.lib c10_cuda.lib torch_cuda_cu.lib torch_cuda_cpp.lib /EXPORT:PyInit__C D:\code\python\nlp\BMTrain-main\build\temp.win-amd64-3.7\Release\csrc/nccl.obj /OUT:build\lib.win-amd64-3.7\bmtrain\nccl\_C.cp37-win_amd64.pyd /IMPLIB:D:\code\python\nlp\BMTrain-main\build\temp.win-amd64-3.7\Release\csrc\_C.cp37-win_amd64.lib
Creating library D:\code\python\nlp\BMTrain-main\build\temp.win-amd64-3.7\Release\csrc\_C.cp37-win_amd64.lib and object D:\code\python\nlp\BMTrain-main\build\temp.win-amd64-3.7\Release\csrc\_C.cp37-win_amd64.exp
nccl.obj : error LNK2001: unresolved external symbol ncclCommInitRank
nccl.obj : error LNK2001: unresolved external symbol ncclReduce
nccl.obj : error LNK2001: unresolved external symbol ncclRecv
nccl.obj : error LNK2001: unresolved external symbol ncclGroupEnd
nccl.obj : error LNK2001: unresolved external symbol ncclSend
nccl.obj : error LNK2001: unresolved external symbol ncclCommCount
nccl.obj : error LNK2001: unresolved external symbol ncclGetUniqueId
nccl.obj : error LNK2001: unresolved external symbol ncclCommDestroy
nccl.obj : error LNK2001: unresolved external symbol ncclBroadcast
nccl.obj : error LNK2001: unresolved external symbol ncclGroupStart
nccl.obj : error LNK2001: unresolved external symbol ncclCommUserRank
nccl.obj : error LNK2001: unresolved external symbol ncclReduceScatter
nccl.obj : error LNK2001: unresolved external symbol ncclAllGather
nccl.obj : error LNK2001: unresolved external symbol ncclAllReduce
nccl.obj : error LNK2001: unresolved external symbol ncclGetErrorString
build\lib.win-amd64-3.7\bmtrain\nccl\_C.cp37-win_amd64.pyd : fatal error LNK1120: 15 unresolved externals
error: command 'F:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\bin\HostX86\x64\link.exe' failed with exit status 1120
cuda 11.7
torch 1.13.1+cu117
torchaudio 0.13.1+cu117
torchvision 0.14.1+cu117
vs 2022
I can't find a solution. I have tried many different versions of torch and Visual Studio, and it still fails. Is it because the CUDA version is too new? My GPU is an RTX 3070. Is compiling on Windows really this hard?
Thanks for your great work!
According to the official website, BMTrain supports the following architectures:
Encoder(bert-base-cased bert-base-uncased bert-large-cased bert-large-uncased bert-base-chinese bert-base-multilingual-cased)
Decoder(CPM-1(large) GPT-2(base) GPT-2(medium) GPT-2(large) GPT-2(XL) GPT-J(6B))
Encoder-Decoder(CPM-2(large) T5-small T5-base T5-large T5(3B) T5(11B))
Does BMTrain support models outside this list (e.g., ResNet)?
Is there a tutorial?
Looking forward to your reply.
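Hello, my questions are as follows:
I have 4 machines, each with two 2080 Ti GPUs (11 GB). If the model is too large to load on a single machine, will it be spread across the other machines through model parallelism?
If so, should each node run the following command to start training?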
torchrun --nnodes=4 --nproc_per_node=2 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=xxx.xxx.xxx.xx:88688} train.py
Note: when each node runs the command above, should I change rdzv_id= to 1, 2, 3 and 4 respectively, one value per machine?
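According to the torchrun documentation, `--rdzv_id` identifies the job as a whole, so every node runs the same command with the same id; with the `c10d` rendezvous backend the node ranks are assigned during rendezvous. Separately, the port 88688 in the command above exceeds 65535, the largest TCP port number, so it cannot be a valid endpoint. The sketch below builds such a command string; the job id, host address and port 29400 are hypothetical example values.

```python
import shlex


def torchrun_command(job_id: str, host: str, port: int, nnodes: int = 4,
                     nproc_per_node: int = 2, script: str = "train.py") -> str:
    """Build the torchrun command line that every node runs unchanged."""
    # TCP ports are 16-bit, so the endpoint port must lie in 1..65535.
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid rendezvous port: {port}")
    args = [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={nproc_per_node}",
        f"--rdzv_id={job_id}",           # identical on all nodes: it names the job
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={host}:{port}",  # one reachable host:port shared by all nodes
        script,
    ]
    return shlex.join(args)


# Hypothetical values: the host should be the address of one of the four machines.
commands = [torchrun_command("bmtrain-job-1", "10.0.0.1", 29400) for _node in range(4)]
print(commands[0])
```

Because nothing in the command depends on the node index, the same line can be pasted on all four machines.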
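My environment is CUDA 11.2 with torch 1.12.1+cu113. BMTrain installs successfully, but importing it fails with the error below. What could be the cause? I need to use BMTrain urgently; thanks a lot!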
import bmtrain
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/__init__.py", line 16, in <module>
from . import optim
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/optim/__init__.py", line 1, in <module>
from .adam import AdamOptimizer
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/optim/adam.py", line 4, in <module>
from . import _cuda as C
ImportError: /root/anaconda3/lib/python3.8/site-packages/bmtrain/optim/_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor8data_ptrIhEEPT_v
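The missing symbol `_ZNK2at6Tensor8data_ptrIhEEPT_v` demangles to `unsigned char* at::Tensor::data_ptr<unsigned char>() const`, a libtorch symbol, so the compiled `_cuda` extension and the PyTorch library loaded at import time do not appear to agree. One thing worth checking is the reported mismatch between the CUDA 11.2 toolkit and the `cu113` PyTorch wheel; rebuilding against the exact PyTorch that is imported is another. The sketch below only compares version strings and is a diagnostic aid, not a confirmed root cause; in a real environment its inputs would come from `torch.__version__` and the release reported by `nvcc --version`.

```python
import re
from typing import Optional, Tuple


def torch_cuda_tag(torch_version: str) -> Optional[Tuple[int, int]]:
    """Parse the CUDA tag of a PyTorch wheel version, e.g. '1.12.1+cu113' -> (11, 3)."""
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None  # CPU-only or source build without a '+cuXYZ' suffix
    return int(match.group(1)), int(match.group(2))


def toolkit_version(release: str) -> Tuple[int, int]:
    """Parse a CUDA toolkit release string such as '11.2' into (major, minor)."""
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def cuda_versions_match(torch_version: str, toolkit_release: str) -> bool:
    """True when the wheel's CUDA tag equals the local toolkit's major.minor."""
    return torch_cuda_tag(torch_version) == toolkit_version(toolkit_release)


# Values reported in the issue above.
print(torch_cuda_tag("1.12.1+cu113"), toolkit_version("11.2"))
print(cuda_versions_match("1.12.1+cu113", "11.2"))
```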
Hi,
When using BMCook with BMTrain, I ran into a bug: the second call to bmtrain.synchronize() always hangs. Do you have any idea what might cause this?
Below is the code:
import os
import json
import torch
import random
import time
import bmtrain as bmt
from data import MMapIndexedDataset, Dataset
from bmcook import CookTrainer
from bmcook.utils.config import ConfigParser
from bmcook.utils.arguments import parse_args
from pathlib import Path
bmt.init_distributed()
args = parse_args()
save_dir = Path(args.save_dir)
ckpt_dir = save_dir / 'checkpoints'
os.makedirs(ckpt_dir, exist_ok=True)
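with open(save_dir / 'train_args.json', 'w') as f:
    json.dump(vars(args), f, indent=2)
# config_map and model_map are defined elsewhere in the script (not shown in the issue)
model_config = config_map[args.model].from_pretrained(args.model)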
model = model_map[args.model].from_pretrained(args.model, config=model_config)
# teacher model has the same config as the student model
teacher = model_map[args.model].from_pretrained(args.model, config=model_config)
bmt.synchronize()  # this works
...
CookTrainer.set_compression(config, model, optimizer, teacher)  # this step calls bmt.synchronize() again, and that is where it hangs
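Assuming `bmt.synchronize()` acts as a barrier across all ranks, it can only return once every rank has entered it, so one common cause of a hang at a later call is that some rank never reaches it (for example, because it took a rank-dependent branch or failed earlier). The sketch below does not use BMTrain or BMCook; it simulates that situation with `threading.Barrier`, where a timeout stands in for the indefinite hang a real distributed job would show. The function and parameter names are hypothetical.

```python
import threading


def simulate_synchronize(world_size: int, skipping_rank=None, timeout: float = 1.0) -> dict:
    """Simulate ranks calling a barrier; `skipping_rank` never reaches it."""
    barrier = threading.Barrier(world_size, timeout=timeout)
    outcome = {}

    def rank_main(rank: int) -> None:
        if rank == skipping_rank:
            outcome[rank] = "skipped"  # e.g. a branch that returns before the barrier
            return
        try:
            barrier.wait()  # blocks until all `world_size` parties have arrived
            outcome[rank] = "passed"
        except threading.BrokenBarrierError:
            outcome[rank] = "hung (timed out)"

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


print(simulate_synchronize(4))                   # every rank reaches the barrier
print(simulate_synchronize(4, skipping_rank=2))  # one rank never arrives
```

In the real script, logging the rank just before and after each synchronize call is a simple way to see which rank fails to arrive.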
Error message: csrc/adam_cpu.cpp:158:27: error: 'const class at::Tensor' has no member named 'is_cpu'
(cpm) D:\GitHub\BMTrain>python setup.py install
running install
C:\ProgramData\Anaconda3\envs\cpm\lib\site-packages\setuptools\command\install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
C:\ProgramData\Anaconda3\envs\cpm\lib\site-packages\setuptools\command\easy_install.py:147: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
EasyInstallDeprecationWarning,
running bdist_egg
running egg_info
writing bmtrain.egg-info\PKG-INFO
writing dependency_links to bmtrain.egg-info\dependency_links.txt
writing requirements to bmtrain.egg-info\requires.txt
writing top-level names to bmtrain.egg-info\top_level.txt
reading manifest file 'bmtrain.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'bmtrain.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
creating build\lib.win-amd64-cpython-37
creating build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\block_layer.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\checkpointing.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\debug.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\global_var.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\init.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\layer.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\parameter.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\param_init.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\pipe_layer.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\store.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\synchronize.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\utils.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\wrapper.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain
creating build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\all_gather.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\reduce_scatter.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\send_recv.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\shape.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\utils.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
creating build\lib.win-amd64-cpython-37\bmtrain\distributed
copying bmtrain\distributed\ops.py -> build\lib.win-amd64-cpython-37\bmtrain\distributed
copying bmtrain\distributed\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\distributed
creating build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\format.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\model.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\tensor.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
creating build\lib.win-amd64-cpython-37\bmtrain\loss
copying bmtrain\loss\cross_entropy.py -> build\lib.win-amd64-cpython-37\bmtrain\loss
copying bmtrain\loss\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\loss
creating build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\cosine.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\exponential.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\linear.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\noam.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\no_decay.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\warmup.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
creating build\lib.win-amd64-cpython-37\bmtrain\nccl
copying bmtrain\nccl\enums.py -> build\lib.win-amd64-cpython-37\bmtrain\nccl
copying bmtrain\nccl\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\nccl
creating build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\adam.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\adam_offload.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\optim_manager.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
running build_ext
error: [WinError 5] Access is denied.
Sometimes, we need to use multiple optimizers for different parameters so that we can turn on and off the optimization of different parameters easily.
However, in the current implementation of BMTrain, every optimizer has its own loss scale. To get correct gradients, I either have to put all parameters into one optimizer, or call backward multiple times, once per optimizer with its own scale (I'm not sure the latter works; I haven't tried it yet).
So I'd like to request a utility that synchronizes the loss scales of multiple optimizers. It would take the loss and a list of optimizers as arguments and, as far as I can see, work roughly like this:
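def sync_loss_scale(loss, optimizers):
    # hypothetical helper: assumes every optimizer exposes a mutable `scale` attribute
    min_scale = min(optimizer.scale for optimizer in optimizers)
    for optimizer in optimizers:
        optimizer.scale = min_scale  # every optimizer now unscales by the same value
    return loss * min_scale  # scale the loss once with the shared value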
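To see why a shared scale matters: backpropagating `loss * S` multiplies every gradient by `S`, and each optimizer divides by its own scale before stepping, so an optimizer whose scale differs from the one applied to the loss receives gradients that are off by the ratio of the two scales. The sketch uses a hypothetical `StubOptimizer` with a `scale` attribute instead of BMTrain's optimizer classes. It picks the minimum because, under dynamic loss scaling, the smallest scale is the least likely to overflow in fp16.

```python
from dataclasses import dataclass


@dataclass
class StubOptimizer:
    """Hypothetical stand-in that only models a per-optimizer loss scale."""
    name: str
    scale: float

    def unscale(self, scaled_grad: float) -> float:
        # Each optimizer divides gradients by *its own* scale before stepping.
        return scaled_grad / self.scale


def backward_once(true_grad: float, loss_scale: float) -> float:
    # Backprop through (loss * loss_scale) multiplies every gradient by loss_scale.
    return true_grad * loss_scale


true_grad = 0.25
opts = [StubOptimizer("encoder", 1024.0), StubOptimizer("decoder", 256.0)]

# Without synchronization: the loss is scaled with the first optimizer's value only.
scaled = backward_once(true_grad, opts[0].scale)
mismatched = [opt.unscale(scaled) for opt in opts]

# With synchronization: every optimizer adopts the smallest scale first.
shared = min(opt.scale for opt in opts)
for opt in opts:
    opt.scale = shared
scaled = backward_once(true_grad, shared)
synced = [opt.unscale(scaled) for opt in opts]

print(mismatched)
print(synced)
```

Powers of two are used as scales so that the float arithmetic is exact and the mismatch shows up as a clean factor of 4.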
Building wheels for collected packages: bmtrain
Building wheel for bmtrain (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [156 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/backward.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/block_layer.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/debug.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/global_var.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/init.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/layer.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/param_init.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/parameter.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/store.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/synchronize.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/utils.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/wrapper.py -> build/lib.linux-x86_64-3.8/bmtrain
creating build/lib.linux-x86_64-3.8/bmtrain/benchmark
copying bmtrain/benchmark/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/benchmark
copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-3.8/bmtrain/benchmark
copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-3.8/bmtrain/benchmark
copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-3.8/bmtrain/benchmark
copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-3.8/bmtrain/benchmark
creating build/lib.linux-x86_64-3.8/bmtrain/distributed
copying bmtrain/distributed/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/distributed
copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-3.8/bmtrain/distributed
creating build/lib.linux-x86_64-3.8/bmtrain/inspect
copying bmtrain/inspect/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/inspect
copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-3.8/bmtrain/inspect
copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-3.8/bmtrain/inspect
copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-3.8/bmtrain/inspect
creating build/lib.linux-x86_64-3.8/bmtrain/loss
copying bmtrain/loss/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/loss
copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-3.8/bmtrain/loss
creating build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
creating build/lib.linux-x86_64-3.8/bmtrain/nccl
copying bmtrain/nccl/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/nccl
copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-3.8/bmtrain/nccl
creating build/lib.linux-x86_64-3.8/bmtrain/optim
copying bmtrain/optim/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/optim
copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-3.8/bmtrain/optim
copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-3.8/bmtrain/optim
copying bmtrain/optim/clip_grad.py -> build/lib.linux-x86_64-3.8/bmtrain/optim
running build_ext
/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py:329: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (g++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
building 'bmtrain.nccl._C' extension
creating /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8
creating /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc
/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py:329: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
Emitting ninja build file /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc/nccl.o.d -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Icsrc/nccl/build/include -I/usr/local/lib/python3.8/site-packages/torch/include -I/usr/local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.8/site-packages/torch/include/TH -I/usr/local/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/local/include/python3.8 -c -c /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/csrc/nccl.cpp -o /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc/nccl.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc/nccl.o
c++ -MMD -MF /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc/nccl.o.d -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Icsrc/nccl/build/include -I/usr/local/lib/python3.8/site-packages/torch/include -I/usr/local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.8/site-packages/torch/include/TH -I/usr/local/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/local/include/python3.8 -c -c /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/csrc/nccl.cpp -o /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc/nccl.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
c++: error: unrecognized command line option ‘-std=c++14’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1740, in _run_ninja_build
subprocess.run(
File "/usr/local/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/setup.py", line 74, in <module>
setup(
File "/usr/local/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/local/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/local/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/local/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 325, in run
self.run_command("build")
File "/usr/local/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.8/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/usr/local/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/usr/local/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/usr/local/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 741, in build_extensions
build_ext.build_extensions(self)
File "/usr/local/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/usr/local/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/usr/local/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/usr/local/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
_build_ext.build_extension(self, ext)
File "/usr/local/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
objects = self.compiler.compile(sources,
File "/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 562, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1419, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for bmtrain
Running setup.py clean for bmtrain
Failed to build bmtrain
Installing collected packages: bmtrain, blis, absl-py, requests-oauthlib, pathy, markdown, google-auth, confection, thinc, google-auth-oauthlib, tensorboard, spacy
Running setup.py install for bmtrain ... error
error: subprocess-exited-with-error
× Running setup.py install for bmtrain did not run successfully.
│ exit code: 1
╰─> [158 lines of output]
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/backward.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/block_layer.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/debug.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/global_var.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/init.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/layer.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/param_init.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/parameter.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/store.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/synchronize.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/utils.py -> build/lib.linux-x86_64-3.8/bmtrain
copying bmtrain/wrapper.py -> build/lib.linux-x86_64-3.8/bmtrain
creating build/lib.linux-x86_64-3.8/bmtrain/benchmark
copying bmtrain/benchmark/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/benchmark
copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-3.8/bmtrain/benchmark
copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-3.8/bmtrain/benchmark
copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-3.8/bmtrain/benchmark
copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-3.8/bmtrain/benchmark
creating build/lib.linux-x86_64-3.8/bmtrain/distributed
copying bmtrain/distributed/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/distributed
copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-3.8/bmtrain/distributed
creating build/lib.linux-x86_64-3.8/bmtrain/inspect
copying bmtrain/inspect/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/inspect
copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-3.8/bmtrain/inspect
copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-3.8/bmtrain/inspect
copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-3.8/bmtrain/inspect
creating build/lib.linux-x86_64-3.8/bmtrain/loss
copying bmtrain/loss/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/loss
copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-3.8/bmtrain/loss
creating build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-3.8/bmtrain/lr_scheduler
creating build/lib.linux-x86_64-3.8/bmtrain/nccl
copying bmtrain/nccl/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/nccl
copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-3.8/bmtrain/nccl
creating build/lib.linux-x86_64-3.8/bmtrain/optim
copying bmtrain/optim/__init__.py -> build/lib.linux-x86_64-3.8/bmtrain/optim
copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-3.8/bmtrain/optim
copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-3.8/bmtrain/optim
copying bmtrain/optim/clip_grad.py -> build/lib.linux-x86_64-3.8/bmtrain/optim
running build_ext
/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py:329: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (g++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
building 'bmtrain.nccl._C' extension
creating /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8
creating /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc
/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py:329: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
Emitting ninja build file /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc/nccl.o.d -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Icsrc/nccl/build/include -I/usr/local/lib/python3.8/site-packages/torch/include -I/usr/local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.8/site-packages/torch/include/TH -I/usr/local/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/local/include/python3.8 -c -c /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/csrc/nccl.cpp -o /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc/nccl.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc/nccl.o
c++ -MMD -MF /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc/nccl.o.d -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Icsrc/nccl/build/include -I/usr/local/lib/python3.8/site-packages/torch/include -I/usr/local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.8/site-packages/torch/include/TH -I/usr/local/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/local/include/python3.8 -c -c /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/csrc/nccl.cpp -o /tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/build/temp.linux-x86_64-3.8/csrc/nccl.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
c++: error: unrecognized command line option ‘-std=c++14’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1740, in _run_ninja_build
subprocess.run(
File "/usr/local/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-9z6c_ooh/bmtrain_926bd6a4c18c493a82619c6ec3553d2e/setup.py", line 74, in <module>
setup(
File "/usr/local/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/local/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/local/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/local/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.8/site-packages/setuptools/command/install.py", line 61, in run
return orig.install.run(self)
File "/usr/local/lib/python3.8/distutils/command/install.py", line 545, in run
self.run_command('build')
File "/usr/local/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.8/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/usr/local/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/usr/local/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/usr/local/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 741, in build_extensions
build_ext.build_extensions(self)
File "/usr/local/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/usr/local/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/usr/local/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/usr/local/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
_build_ext.build_extension(self, ext)
File "/usr/local/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
objects = self.compiler.compile(sources,
File "/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 562, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1419, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/usr/local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> bmtrain
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
Several errors occur when I try to reproduce the project:
https://modelcenter.readthedocs.io/en/latest/notes/quickstart.html
1. TypeError: __init__() got an unexpected keyword argument 'architectures'
It seems Hugging Face updated the configuration file. The problem was fixed when I emptied the configuration file.
2. TypeError: linear(): argument 'input' (position 1) must be Tensor, not NoneType
The traceback shows that it occurs at:
logits = model(input_ids, attention_mask)
...
TypeError: linear(): argument 'input' (position 1) must be Tensor, not NoneType
return torch._C._nn.linear(input, weight, bias)
Many thanks for your fantastic project. Just a kind suggestion: while using bmtrain, it throws a lot of warnings, all related to torch.storage, such as the following:
/opt/conda/envs/compression/lib/python3.10/site-packages/bmtrain/store.py:178: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
It seems this is caused by a deprecation in newer versions of torch. You could replace all tensor.storage() calls with tensor.untyped_storage() so that we get clean logs :)
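A minimal sketch of the suggested change, assuming the code must keep working on PyTorch versions that predate Tensor.untyped_storage(): prefer the new method when it exists and fall back to storage() otherwise. The helper name `raw_storage` is hypothetical and not part of BMTrain.

```python
def raw_storage(tensor):
    """Return the tensor's storage without triggering the TypedStorage warning.

    Hypothetical helper: uses Tensor.untyped_storage() when the installed
    PyTorch provides it, and falls back to the older Tensor.storage().
    """
    untyped = getattr(tensor, "untyped_storage", None)
    if untyped is not None:
        return untyped()
    return tensor.storage()
```

One caveat with a blind search-and-replace: as far as I know, an UntypedStorage reports its size in bytes rather than in elements, so call sites that rely on .size() of the returned storage may need adjusting.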
If I want to train a T5 model with BMTrain, how can I do it?
Hi, is a distributed data loader necessary when using bmtrain to accelerate training, or does bmtrain itself optimize both memory usage and speed?
I noticed an error in how scale is applied, which may lead to NaN issues.
Here, eps*scale should be eps*sqrtf(scale):
https://github.com/OpenBMB/BMTrain/blob/9cc975593f628a3fcc8c71328081e238914eca1d/csrc/cuda/adam.cu#LL29C115-L29C115
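Which eps term is correct depends on how the moments are stored, and that can be checked numerically once a convention is fixed. The sketch below assumes the optimizer only sees the loss-scaled gradient `scale * g` and accumulates both moments from it, so the first moment carries a factor of `scale` and the second a factor of `scale**2`. It does not reproduce the code at the linked line; if the kernel stores the second moment with a different power of `scale`, the matching eps term changes with it.

```python
import math

def reference_step(g, eps):
    """First Adam step from a zero state on the true gradient g.

    After bias correction the moments are m_hat = g and v_hat = g**2,
    so the update is g / (sqrt(g**2) + eps).
    """
    m_hat, v_hat = g, g * g
    return m_hat / (math.sqrt(v_hat) + eps)

def scaled_step(g, scale, eps_term):
    """Same step when only the loss-scaled gradient scale * g is available.

    Assumption: both moments are accumulated from the scaled gradient,
    so m_hat = scale * m and v_hat = scale**2 * v.
    """
    gs = scale * g
    m_hat, v_hat = gs, gs * gs
    return m_hat / (math.sqrt(v_hat) + eps_term)

g, eps, scale = 1e-6, 1e-6, 1024.0
print(reference_step(g, eps))                         # about 0.5
print(scaled_step(g, scale, eps * scale))             # about 0.5, matches
print(scaled_step(g, scale, eps * math.sqrt(scale)))  # about 0.97, does not match
```

Under this particular assumption, sqrt(v_hat) carries a factor of `scale`, so eps has to carry the same factor for the update to be independent of the loss scale; with a second moment stored as `scale * v`, the analysis would come out differently.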
bmt.OpTransformerBlockList can only handle the hidden states returned by the transformer block (hidden_states) together with residual, in order to fuse Dropout -> Add -> LN. Additionally, these two are passed to the next block as input:
from typing import Optional
from torch import nn, Tensor

class Block(nn.Module):
    def forward(self, hidden_states: Tensor, residual: Optional[Tensor] = None,
                mixer_subset=None, mixer_kwargs=None):
        if self.prenorm:
            ...
        return hidden_states, residual
...
bmt.OpTransformerBlockList and cannot be properly handled by us.
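For reference, the calling convention this block needs from a block list can be sketched in plain Python: every block receives the previous block's (hidden_states, residual) pair and returns a new pair. `run_block_list` is a hypothetical helper, not a BMTrain API; a real wrapper would also have to handle checkpointing and parameter gathering for each block.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

Pair = Tuple[Any, Optional[Any]]

def run_block_list(blocks: Iterable[Callable[[Any, Optional[Any]], Pair]],
                   hidden_states: Any,
                   residual: Optional[Any] = None) -> Pair:
    """Thread (hidden_states, residual) through a sequence of blocks.

    Each block is called as block(hidden_states, residual) and must return
    a new (hidden_states, residual) pair, mirroring Block.forward above.
    """
    for block in blocks:
        hidden_states, residual = block(hidden_states, residual)
    return hidden_states, residual

# Toy blocks on plain numbers: add 1 to the hidden state, keep the old one as residual.
toy_blocks = [lambda h, r: (h + 1, h)] * 3
print(run_block_list(toy_blocks, 0))  # (3, 2)
```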
I'm trying to run EleutherAI/gpt-j-6B on a Titan V (12 GB), which can't fit the weights, so after loading the PyTorch model I'm doing:
with torch.cuda.device(0): model = bminf.wrapper(model)
But now I get the error (full error log at the end):
TypeError: object of type 'TransformerBlockList' has no len()
So I tried patching class TransformerBlockList(torch.nn.Module) by adding a __len__ method:
def __len__(self):
    return len(self.layers)
and the model finally runs, but I'm getting nonsense (random words) as output.
The code is roughly:
import os
import shutil

import bminf
import torch
import transformers

# MODELN, DATAD and prompt are defined elsewhere in the script.
model = transformers.GPTJForCausalLM.from_pretrained(f'EleutherAI/{MODELN}', revision='float16', torch_dtype=torch.float16, low_cpu_mem_usage=True)
torch.save(model, f'{MODELN}.pt')  # save under the same file name that is copied below
print(model)
if not os.path.exists(f'{DATAD}/{MODELN}.pt'): shutil.copy(f'{MODELN}.pt', DATAD)
model = torch.load(f'{DATAD}/{MODELN}.pt')  # 11 s @ /dev/shm
tokenizer = transformers.AutoTokenizer.from_pretrained(f'EleutherAI/{MODELN}')  # 8 s
with torch.cuda.device(0): model = bminf.wrapper(model)  # 14 s
pl = transformers.pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=100, device=0)  # create pipeline
otxts = pl(prompt)
for txt in otxts:
    print(f"\x1b[32m{txt['generated_text']}\x1b[0m")
Full error log:
/home/da/py38/lib/python3.8/site-packages/torch/nn/modules/module.py:673: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:480.)
if param.grad is not None:
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
/home/da/.local/lib/python3.8/site-packages/transformers/generation/utils.py:1387: UserWarning: Neither `max_length` nor `max_new_tokens` has been set, `max_length` will default to 50 (`self.config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
warnings.warn(
Traceback (most recent call last):
File "./app.py", line 84, in <module>
txt = pl(prompt) # inf # [{'generated_text': 'My Name is philipp k. and I live just outside of Detroit....
File "/home/da/.local/lib/python3.8/site-packages/transformers/pipelines/text_generation.py", line 202, in __call__
return super().__call__(text_inputs, **kwargs)
File "/home/da/.local/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1074, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/home/da/.local/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1081, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
File "/home/da/.local/lib/python3.8/site-packages/transformers/pipelines/base.py", line 990, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "/home/da/.local/lib/python3.8/site-packages/transformers/pipelines/text_generation.py", line 244, in _forward
generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
File "/home/da/py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/da/.local/lib/python3.8/site-packages/transformers/generation/utils.py", line 1571, in generate
return self.sample(
File "/home/da/.local/lib/python3.8/site-packages/transformers/generation/utils.py", line 2534, in sample
outputs = self(
File "/home/da/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/da/.local/lib/python3.8/site-packages/transformers/models/gptj/modeling_gptj.py", line 821, in forward
transformer_outputs = self.transformer(
File "/home/da/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/da/.local/lib/python3.8/site-packages/transformers/models/gptj/modeling_gptj.py", line 587, in forward
past_key_values = tuple([None] * len(self.h))
TypeError: object of type 'TransformerBlockList' has no len()
When I tried to use bmt.init_distributed(seed=0), I ran into the following problem:
Traceback (most recent call last):
File "train_inner.py", line 19, in main
bmt.init_distributed(seed=0)
File "/python3.9/site-packages/bmtrain/init.py", line 88, in init_distributed
config['comm'] = nccl.commInitRank(unique_id, world_size, rank)
File "//python3.9/site-packages/bmtrain/nccl/__init__.py", line 77, in commInitRank
return NCCLCommunicator(C.ncclCommInitRank(unique_id, world_size, rank))
Any idea why this would occur, or how I might solve it? Thanks.
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py
This is how I run the program.
ERROR: Command errored out with exit status 1: 'C:\Users\46213\anaconda3\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\46213\AppData\Local\Temp\pip-install-0s5mkooc\bmtrain_23dc03e3d7e841b88ef095cfb3ede34b\setup.py'"'"'; file='"'"'C:\Users\46213\AppData\Local\Temp\pip-install-0s5mkooc\bmtrain_23dc03e3d7e841b88ef095cfb3ede34b\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\46213\AppData\Local\Temp\pip-record-z5irnjba\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\Users\46213\anaconda3\Include\bmtrain' Check the logs for full command output.
Command run:
python setup.py install
Error message:
The detected CUDA version (12.1) mismatches the version that was used to compile
PyTorch (11.8). Please make sure to use the same CUDA versions.
Question:
PyTorch doesn't have a CUDA 12.1 build yet. Is there any way to resolve this error?
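A quick way to see this mismatch before building is to compare the toolkit that nvcc reports with `torch.version.cuda`, the CUDA version the installed PyTorch was built with. The sketch below only parses and compares version strings; the helper names are hypothetical, and treating a shared major version as compatible is a simplifying assumption, not PyTorch's exact rule. Note that torch.utils.cpp_extension picks the toolkit from the CUDA_HOME environment variable when it is set, so a separately installed 11.8 toolkit can be selected that way.

```python
import re

def nvcc_release(nvcc_output: str) -> str:
    """Extract "X.Y" from the "release X.Y" field of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field in nvcc output")
    return f"{match.group(1)}.{match.group(2)}"

def same_major(cuda_a: str, cuda_b: str) -> bool:
    """True when two "X.Y" CUDA version strings share the major version X."""
    return cuda_a.split(".")[0] == cuda_b.split(".")[0]

# In a real environment (assumption: nvcc on PATH, torch installed):
#   out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
#   same_major(nvcc_release(out), torch.version.cuda)
print(same_major("12.1", "11.8"))  # False: the situation in this report
```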
The existing TransformerBlockList cannot output the hidden states and attention scores of each transformer layer. Sometimes we want the hidden states and attention scores to conduct analysis and to feed them into the next modules.
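Until such an option exists, one workaround is to run the layers one by one and record each layer's outputs. The plain-Python sketch below shows that loop; `run_and_collect` is hypothetical, assumes each block returns (hidden, attention_scores), and trades away any optimization a block list applies across layers.

```python
from typing import Any, Callable, List, Sequence, Tuple

def run_and_collect(blocks: Sequence[Callable[[Any], Tuple[Any, Any]]],
                    hidden: Any) -> Tuple[Any, List[Any], List[Any]]:
    """Run blocks in order, keeping every layer's hidden state and attention scores.

    Assumption: each block returns (hidden, attention_scores).
    """
    all_hidden: List[Any] = []
    all_attn: List[Any] = []
    for block in blocks:
        hidden, attn = block(hidden)
        all_hidden.append(hidden)
        all_attn.append(attn)
    return hidden, all_hidden, all_attn

# Toy blocks on numbers: double the input and report a fake "attention score".
blocks = [lambda h: (h * 2, f"attn({h})")] * 3
final, hiddens, attns = run_and_collect(blocks, 1)
print(final, hiddens, attns)  # 8 [2, 4, 8] ['attn(1)', 'attn(2)', 'attn(4)']
```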
I tried several approaches:
1. model.load_state_dict(torch.load(args.model_path), strict=False)
2. bmt.load(model, args.LoRA_path, strict=False)
However, after printing the model parameters, I found that they had not been loaded. Why does this happen?
bmt.init_distributed(seed=0)
config = CPMAntConfig.from_json_file(args.config_path)
model = CPMAntPlus(config=config)
bmt.load(model, args.model_path,strict=True)
#delta_model = AutoDeltaModel.from_finetuned(args.LoRA_path, backbone_model=model)
delta_model = LoraModel(backbone_model=model, modified_modules=["project_q", "project_v"], backend="bmt")
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
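With strict=False, parameters whose names don't match are skipped silently, which could produce exactly this symptom. PyTorch's Module.load_state_dict returns the missing_keys and unexpected_keys it found, so printing that return value is a useful first check. The same comparison can be done on key names alone, as in this sketch; `diff_keys` and the example key names are hypothetical.

```python
from typing import Iterable, List, Tuple

def diff_keys(model_keys: Iterable[str],
              ckpt_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) parameter names.

    missing:    in the model but absent from the checkpoint (left unloaded)
    unexpected: in the checkpoint but not in the model (ignored with strict=False)
    """
    model_set, ckpt_set = set(model_keys), set(ckpt_keys)
    return sorted(model_set - ckpt_set), sorted(ckpt_set - model_set)

# Hypothetical names: a prefix mismatch leaves the LoRA weight unloaded.
model_keys = ["encoder.0.project_q.weight", "encoder.0.project_q.lora.lora_A"]
ckpt_keys = ["encoder.0.project_q.weight", "backbone.encoder.0.project_q.lora.lora_A"]
missing, unexpected = diff_keys(model_keys, ckpt_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

In a real script the inputs would be model.state_dict().keys() and the keys of the loaded checkpoint dictionary.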
I added an op as a CUDA extension, and running it with the BMTrain framework reports OOM. It seems ZeRO isn't taking effect. How can I solve this?
''''''
File "train.py", line 15, in main
bmt.init_distributed(
File "lib/python3.9/site-packages/bmtrain/init.py", line 40, in init_distributed
local_rank = int(os.environ["LOCAL_RANK"])
File "lib/python3.9/os.py", line 679, in __getitem__
raise KeyError(key) from None
KeyError: 'LOCAL_RANK'
''''''
An error occurred when calling bmt.init_distributed in train.py.
After checking os.environ.keys(), I couldn't find 'LOCAL_RANK'.
It seems that bmtrain wasn't installed successfully.
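The traceback shows bmt.init_distributed reading os.environ["LOCAL_RANK"] (init.py, line 40). That variable is normally set for each worker process by a launcher such as torchrun, so running the script directly with python would raise this KeyError even with a working install. A small pre-flight check, sketched below with a hypothetical helper name, makes the cause explicit.

```python
import os
from typing import List, Mapping, Optional

# A subset of the variables torchrun exports to every worker process.
LAUNCHER_VARS = ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")

def missing_launcher_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the launcher variables absent from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in LAUNCHER_VARS if name not in env]

missing = missing_launcher_vars()
if missing:
    print("Not started by a distributed launcher; missing:", ", ".join(missing))
    print("Try e.g.: torchrun --nproc_per_node=1 train.py")
```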
With PyTorch 1.12.1, CUDA 11.1, and Python 3.8.0, I failed to install BMTrain 0.2.0 using "python setup.py install"; the installer says CUDA needs to be version 11.3. Does BMTrain 0.2.0 support CUDA 11.1?
Say I have a tensor z of size [1] and a tensor x of size [batch_size, intermediate_dim, model_dim]. When calculating z*x, z should be broadcast to the same size as x.
However, in the current implementation of BMTrain, if we have 4 GPUs, as far as I can see, z is first split into tensors z0 to z3 of sizes [1], [0], [0], [0], and x is also split into 4 tensors x0 to x3. Then things like zi*xi are calculated. However, z1*x1 fails because z1, of size [0], does not match x1 in size.
The code causing the problem will be attached later.
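The shard sizes in this report follow from splitting a flattened tensor into equal contiguous chunks, one per rank. The sketch assumes a chunk size of ceil(numel / world_size), a common ZeRO-style layout; BMTrain's exact partitioning rule is not reproduced here. With one element and four ranks only rank 0 receives data, which matches the [1], [0], [0], [0] split described above, and it shows why a broadcasting op like z*x would need the full, gathered z rather than a per-rank slice.

```python
import math
from typing import List

def shard_sizes(numel: int, world_size: int) -> List[int]:
    """Element count held by each rank when a flat tensor of numel elements
    is split into contiguous chunks of ceil(numel / world_size)."""
    chunk = math.ceil(numel / world_size)
    sizes = []
    for rank in range(world_size):
        start = min(rank * chunk, numel)
        end = min(start + chunk, numel)
        sizes.append(end - start)
    return sizes

print(shard_sizes(1, 4))   # [1, 0, 0, 0]
print(shard_sizes(10, 4))  # [3, 3, 3, 1]
```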
The installation fails with "RuntimeError: Error compiling objects for extension".
Thank you!
ValueError: Unknown CUDA arch (9.0+PTX) or GPU not supported
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
Hi developers, when I tried to use the gather() method of DistributedParameter, I received the following error:
, line 43, in __init__
self.rel_embed.weight /= torch.norm(self.rel_embed.weight.gather().detach(), p=self.p_norm, dim=-1)[:, None]
TypeError: gather() received an invalid combination of arguments - got (), but expected one of:
* (int dim, Tensor index, *, bool sparse_grad)
* (name dim, Tensor index, *, bool sparse_grad)
I couldn't find any information about the arguments it requires. Any idea why this occurs or how I can solve it?
Thanks a lot.
As mentioned, I tried to run the example in the example folder using the run.sh script, but it threw this:
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -7) local_rank: 0 (pid: 37831) of binary: /usr/local/bin/python
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train.py FAILED
Failures:
[1]:
time : 2023-03-21_01:00:08
host : 2b0ea7cdb636
rank : 1 (local_rank: 1)
exitcode : -7 (pid: 37832)
error_file: <N/A>
traceback : Signal 7 (SIGBUS) received by PID 37832
[2]:
time : 2023-03-21_01:00:08
host : 2b0ea7cdb636
rank : 2 (local_rank: 2)
exitcode : -7 (pid: 37833)
error_file: <N/A>
traceback : Signal 7 (SIGBUS) received by PID 37833
[3]:
time : 2023-03-21_01:00:08
host : 2b0ea7cdb636
rank : 3 (local_rank: 3)
exitcode : -7 (pid: 37834)
error_file: <N/A>
traceback : Signal 7 (SIGBUS) received by PID 37834
Root Cause (first observed failure):
[0]:
time : 2023-03-21_01:00:08
host : 2b0ea7cdb636
rank : 0 (local_rank: 0)
exitcode : -7 (pid: 37831)
error_file: <N/A>
traceback : Signal 7 (SIGBUS) received by PID 37831
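Exit code -7 means each worker was killed by signal 7 (SIGBUS), as the log states. The hostname looks like a Docker container ID, and one frequent cause in that setting (an assumption here, not a confirmed diagnosis) is a small /dev/shm: Docker limits it to 64 MB by default, while NCCL and PyTorch use shared memory to communicate between local processes. The sketch below, with a hypothetical helper name, reports its size; if it is tiny, restarting the container with a larger --shm-size or with --ipc=host is the usual remedy.

```python
import os
import shutil
from typing import Optional

def shm_size_mib(path: str = "/dev/shm") -> Optional[float]:
    """Total size of the shared-memory filesystem in MiB, or None if it doesn't exist."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total / (1024 * 1024)

size = shm_size_mib()
print("no /dev/shm found" if size is None else f"/dev/shm: {size:.0f} MiB")
```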
I am using a server with:
8 × RTX 3090
630 GB CPU RAM
Python 3.8
CUDA 11.8
When I run the following command to install BMTrain:
python setup.py install
I get the following error:
running install
/home/yhlin/torch_env/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/home/yhlin/torch_env/lib/python3.8/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing bmtrain.egg-info/PKG-INFO
writing dependency_links to bmtrain.egg-info/dependency_links.txt
writing requirements to bmtrain.egg-info/requires.txt
writing top-level names to bmtrain.egg-info/top_level.txt
reading manifest file 'bmtrain.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'bmtrain.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/lib.linux-x86_64-cpython-38
creating build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/layer.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/block_layer.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/synchronize.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/parameter.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/param_init.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/global_var.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/wrapper.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/utils.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/store.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/pipe_layer.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/debug.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-cpython-38/bmtrain
creating build/lib.linux-x86_64-cpython-38/bmtrain/loss
copying bmtrain/loss/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/loss
copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-cpython-38/bmtrain/loss
creating build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/send_recv.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
creating build/lib.linux-x86_64-cpython-38/bmtrain/distributed
copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-cpython-38/bmtrain/distributed
copying bmtrain/distributed/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/distributed
creating build/lib.linux-x86_64-cpython-38/bmtrain/optim
copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-cpython-38/bmtrain/optim
copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-cpython-38/bmtrain/optim
copying bmtrain/optim/optim_manager.py -> build/lib.linux-x86_64-cpython-38/bmtrain/optim
copying bmtrain/optim/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/optim
creating build/lib.linux-x86_64-cpython-38/bmtrain/nccl
copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-cpython-38/bmtrain/nccl
copying bmtrain/nccl/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/nccl
creating build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
creating build/lib.linux-x86_64-cpython-38/bmtrain/inspect
copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-cpython-38/bmtrain/inspect
copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-cpython-38/bmtrain/inspect
copying bmtrain/inspect/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/inspect
copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-cpython-38/bmtrain/inspect
running build_ext
building 'bmtrain.nccl._C' extension
creating /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38
creating /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc
Emitting ninja build file /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/nccl.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/nccl/build/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/TH -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/yhlin/torch_env/include -I/opt/conda/include/python3.8 -c -c /home/yhlin/BMTrain/csrc/nccl.cpp -o /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/nccl.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
g++ -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -pthread -shared -B /opt/conda/compiler_compat -L/opt/conda/lib -Wl,-rpath=/opt/conda/lib -Wl,--no-as-needed -Wl,--sysroot=/ /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/nccl.o -L/home/yhlin/torch_env/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-cpython-38/bmtrain/nccl/_C.cpython-38-x86_64-linux-gnu.so
building 'bmtrain.optim._cuda' extension
creating /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/cuda
Emitting ninja build file /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/adam_cuda.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/TH -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/yhlin/torch_env/include -I/opt/conda/include/python3.8 -c -c /home/yhlin/BMTrain/csrc/adam_cuda.cpp -o /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/adam_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
[2/3] /usr/local/cuda/bin/nvcc -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/TH -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/yhlin/torch_env/include -I/opt/conda/include/python3.8 -c -c /home/yhlin/BMTrain/csrc/cuda/has_inf_nan.cu -o /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/cuda/has_inf_nan.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/cuda/has_inf_nan.o
/usr/local/cuda/bin/nvcc -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/TH -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/yhlin/torch_env/include -I/opt/conda/include/python3.8 -c -c /home/yhlin/BMTrain/csrc/cuda/has_inf_nan.cu -o /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/cuda/has_inf_nan.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
/home/yhlin/BMTrain/csrc/cuda/has_inf_nan.cu(11): error: identifier "__heq" is undefined
1 error detected in the compilation of "/home/yhlin/BMTrain/csrc/cuda/has_inf_nan.cu".
I looked into has_inf_nan.cu and found that "__heq" is not defined anywhere else. Can you help me solve this? Thanks!
Can BMTrain be used together with tools like JAX or Apex? Are there any comparisons or planned experiments with these tools? Thanks
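One likely explanation, offered as an assumption rather than a confirmed diagnosis: __heq is a half-precision comparison intrinsic from CUDA's cuda_fp16.h, and with CUDA 11.x it is only available when compiling device code for compute capability 5.3 or higher, while the nvcc command above also targets compute_52. torch.utils.cpp_extension takes its target architectures from the TORCH_CUDA_ARCH_LIST environment variable when it is set, so restricting it to the GPU actually in use (8.6 for an RTX 3090) avoids the sm_52 pass. The helper below only prepares such an environment; its name is hypothetical.

```python
import os
from typing import Dict

def arch_limited_env(arch_list: str = "8.6") -> Dict[str, str]:
    """Copy of the current environment with TORCH_CUDA_ARCH_LIST overridden.

    torch.utils.cpp_extension turns this list into nvcc -gencode flags.
    "8.6" matches an RTX 3090; adjust it for other GPUs (e.g. "8.0;8.6").
    """
    env = dict(os.environ)
    env["TORCH_CUDA_ARCH_LIST"] = arch_list
    return env

# From the BMTrain source directory, the build would then be re-run with:
#   subprocess.run([sys.executable, "setup.py", "install"],
#                  env=arch_limited_env(), check=True)
```

The shell equivalent is prefixing the original command: TORCH_CUDA_ARCH_LIST=8.6 python setup.py install.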
pip install bmtrain
Collecting bmtrain
Using cached bmtrain-0.1.8.post1.tar.gz (48 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in c:\users\chenliyu\anaconda3\envs\cpm\lib\site-packages (from bmtrain) (1.21.6)
Building wheels for collected packages: bmtrain
Building wheel for bmtrain (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [55 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-37
creating build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\backward.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\block_layer.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\checkpointing.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\debug.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\global_var.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\init.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\layer.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\parameter.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\param_init.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\store.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\synchronize.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\utils.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\wrapper.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain
creating build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\all_gather.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\reduce_scatter.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\shape.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\utils.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
creating build\lib.win-amd64-cpython-37\bmtrain\distributed
copying bmtrain\distributed\ops.py -> build\lib.win-amd64-cpython-37\bmtrain\distributed
copying bmtrain\distributed\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\distributed
creating build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\format.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\model.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\tensor.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
creating build\lib.win-amd64-cpython-37\bmtrain\loss
copying bmtrain\loss\cross_entropy.py -> build\lib.win-amd64-cpython-37\bmtrain\loss
copying bmtrain\loss\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\loss
creating build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\cosine.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\exponential.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\linear.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\noam.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\no_decay.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\warmup.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
creating build\lib.win-amd64-cpython-37\bmtrain\nccl
copying bmtrain\nccl\enums.py -> build\lib.win-amd64-cpython-37\bmtrain\nccl
copying bmtrain\nccl\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\nccl
creating build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\adam.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\adam_offload.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\clip_grad.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
running build_ext
error: [WinError 2] The system cannot find the file specified.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for bmtrain
Running setup.py clean for bmtrain
Failed to build bmtrain
Installing collected packages: bmtrain
Running setup.py install for bmtrain ... error
error: subprocess-exited-with-error
× Running setup.py install for bmtrain did not run successfully.
│ exit code: 1
╰─> [57 lines of output]
running install
C:\Users\chenliyu\anaconda3\envs\cpm\lib\site-packages\setuptools\command\install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-37
creating build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\backward.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\block_layer.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\checkpointing.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\debug.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\global_var.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\init.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\layer.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\parameter.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\param_init.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\store.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\synchronize.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\utils.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\wrapper.py -> build\lib.win-amd64-cpython-37\bmtrain
copying bmtrain\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain
creating build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\all_gather.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\reduce_scatter.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\shape.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\utils.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
copying bmtrain\benchmark\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\benchmark
creating build\lib.win-amd64-cpython-37\bmtrain\distributed
copying bmtrain\distributed\ops.py -> build\lib.win-amd64-cpython-37\bmtrain\distributed
copying bmtrain\distributed\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\distributed
creating build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\format.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\model.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\tensor.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
copying bmtrain\inspect\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\inspect
creating build\lib.win-amd64-cpython-37\bmtrain\loss
copying bmtrain\loss\cross_entropy.py -> build\lib.win-amd64-cpython-37\bmtrain\loss
copying bmtrain\loss\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\loss
creating build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\cosine.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\exponential.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\linear.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\noam.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\no_decay.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\warmup.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
copying bmtrain\lr_scheduler\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\lr_scheduler
creating build\lib.win-amd64-cpython-37\bmtrain\nccl
copying bmtrain\nccl\enums.py -> build\lib.win-amd64-cpython-37\bmtrain\nccl
copying bmtrain\nccl\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\nccl
creating build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\adam.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\adam_offload.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\clip_grad.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
copying bmtrain\optim\__init__.py -> build\lib.win-amd64-cpython-37\bmtrain\optim
running build_ext
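error: [WinError 2] The system cannot find the file specified.
I have already added cl to the PATH environment variable; the Python version is 3.7, and I have updated the VC++ Build Tools.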
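[WinError 2] raised during build_ext often means the build tried to launch an executable that could not be found. A minimal diagnostic sketch, assuming the missing file is one of the usual extension-build tools (the tool list below is an assumption, not taken from the log): it reports where each tool resolves on PATH and whether the CUDA_HOME / CUDA_PATH variables that torch.utils.cpp_extension consults are set.

```python
import os
import shutil
from typing import Dict, Iterable, Optional

# Assumed set of executables a PyTorch C++/CUDA extension build may launch on Windows.
DEFAULT_TOOLS = ("cl", "link", "ninja", "nvcc")


def find_build_tools(tools: Iterable[str] = DEFAULT_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to the full path it resolves to on PATH, or None if not found."""
    return {tool: shutil.which(tool) for tool in tools}


def cuda_env() -> Dict[str, Optional[str]]:
    """Report the CUDA location variables; None means the variable is unset."""
    return {name: os.environ.get(name) for name in ("CUDA_HOME", "CUDA_PATH")}


if __name__ == "__main__":
    for name, path in {**find_build_tools(), **cuda_env()}.items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```

Run it from the same shell that runs python setup.py install; any NOT FOUND entry is a candidate for the file the build could not find.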
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/build/temp.linux-x86_64-cpython-37/csrc/adam_cuda.o.d -pthread -B /data/home/youzan/xiaoyang/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/TH -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/home/youzan/xiaoyang/py37env_tf2/include -I/data/home/youzan/xiaoyang/anaconda3/include/python3.7m -c -c /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/csrc/adam_cuda.cpp -o /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/build/temp.linux-x86_64-cpython-37/csrc/adam_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
[2/3] /usr/local/cuda/bin/nvcc -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/TH -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/home/youzan/xiaoyang/py37env_tf2/include -I/data/home/youzan/xiaoyang/anaconda3/include/python3.7m -c -c /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/csrc/cuda/has_inf_nan.cu -o /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/build/temp.linux-x86_64-cpython-37/csrc/cuda/has_inf_nan.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
FAILED: /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/build/temp.linux-x86_64-cpython-37/csrc/cuda/has_inf_nan.o
/usr/local/cuda/bin/nvcc -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/TH -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/home/youzan/xiaoyang/py37env_tf2/include -I/data/home/youzan/xiaoyang/anaconda3/include/python3.7m -c -c /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/csrc/cuda/has_inf_nan.cu -o /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/build/temp.linux-x86_64-cpython-37/csrc/cuda/has_inf_nan.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134: required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6688:95: required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’ without object
__p->_M_set_sharable();
~~~~~~~~~^~
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134: required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6693:95: required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’ without object
[3/3] /usr/local/cuda/bin/nvcc -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/TH -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/home/youzan/xiaoyang/py37env_tf2/include -I/data/home/youzan/xiaoyang/anaconda3/include/python3.7m -c -c /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/csrc/cuda/adam.cu -o /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/build/temp.linux-x86_64-cpython-37/csrc/cuda/adam.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
FAILED: /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/build/temp.linux-x86_64-cpython-37/csrc/cuda/adam.o
/usr/local/cuda/bin/nvcc -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/TH -I/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/home/youzan/xiaoyang/py37env_tf2/include -I/data/home/youzan/xiaoyang/anaconda3/include/python3.7m -c -c /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/csrc/cuda/adam.cu -o /tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/build/temp.linux-x86_64-cpython-37/csrc/cuda/adam.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134: required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6688:95: required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’ without object
__p->_M_set_sharable();
~~~~~~~~~^~
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134: required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6693:95: required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’ without object
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
env=env)
File "/data/home/youzan/xiaoyang/anaconda3/lib/python3.7/subprocess.py", line 468, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 36, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-c6m09ftc/bmtrain_85cbac71a04c4746abbb40699f06db93/setup.py", line 86, in <module>
'build_ext': BuildExtension
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 148, in setup
return run_commands(dist)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
dist.run_commands()
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
self.run_command(cmd)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/dist.py", line 1224, in run_command
super().run_command(command)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/wheel/bdist_wheel.py", line 299, in run
self.run_command('build')
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/dist.py", line 1224, in run_command
super().run_command(command)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/command/build.py", line 136, in run
self.run_command(cmd_name)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/dist.py", line 1224, in run_command
super().run_command(command)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
build_ext.build_extensions(self)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 534, in build_extension
depends=ext.depends)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 565, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1404, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for bmtrain
Running setup.py clean for bmtrain
Successfully built model-center
Failed to build bmtrain
Installing collected packages: bmtrain, model-center
Running setup.py install for bmtrain ... error
error: subprocess-exited-with-error
× Running setup.py install for bmtrain did not run successfully.
│ exit code: 1
╰─> [183 lines of output]
running install
/data/home/youzan/xiaoyang/py37env_tf2/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
Change: simply negate p.grad when ('maximize' in group) and (group['maximize'] is True), so that the code better matches the description in https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam.
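The change above follows the maximize branch of the torch.optim.Adam pseudocode: when maximizing, the gradient is negated before weight decay and the moment estimates are applied. A minimal scalar sketch of that update rule, in plain Python rather than BMTrain's fused CUDA kernel; the function name adam_step and its defaults are illustrative, not BMTrain's API.

```python
import math


def adam_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, maximize=False):
    """One Adam update on a scalar, following the torch.optim.Adam pseudocode."""
    if maximize:
        grad = -grad  # ascend the objective by descending on its negation
    if weight_decay != 0:
        grad = grad + weight_decay * param  # L2 penalty folded into the gradient
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1 ** step)               # bias corrections (step starts at 1)
    v_hat = v / (1 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

With a positive gradient, the default call moves the parameter down by about lr, while maximize=True moves it up by the same amount; negating the gradient once at the top keeps the rest of the update identical for both modes.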
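BMTrain mentions support for bf16 and pipeline parallelism.
Are there any usage examples? Can pipeline parallelism and ZeRO be used at the same time? Thanks.
Are there plans for adam and adam_offload to support bf16? Thanks.
We want to try a large language model (>30B parameters).
Are there any examples of how to do that?
Cannot load a model with BMTrain under torch==1.12.0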
File "/home/ubuntu/anaconda3/envs/xx/lib/python3.9/site-packages/model_center/model/basemodel.py", line 33, in from_pretrained
bmt.load(model, os.path.join(path, 'pytorch_model.pt'), strict=False)
File "/home/ubuntu/anaconda3/envs/xx/lib/python3.9/site-packages/bmtrain/store.py", line 197, in load
ret = model.load_state_dict(
File "/home/ubuntu/anaconda3/envs/xx/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1559, in load_state_dict
raise TypeError("Expected state_dict to be dict-like, got {}.".format(type(state_dict)))
TypeError: Expected state_dict to be dict-like, got <class 'bmtrain.store.DistributedStateDictWrapper'>.
This is because load_state_dict in torch/nn/modules/module.py adds a type restriction:
def load_state_dict(self, state_dict: Mapping[str, Any],
                    strict: bool = True):
    r"""Copies parameters and buffers from :attr:`state_dict` into
    this module and its descendants. If :attr:`strict` is ``True``, then
    the keys of :attr:`state_dict` must exactly match the keys returned
    by this module's :meth:`~torch.nn.Module.state_dict` function.

    Args:
        state_dict (dict): a dict containing parameters and
            persistent buffers.
        strict (bool, optional): whether to strictly enforce that the keys
            in :attr:`state_dict` match the keys returned by this module's
            :meth:`~torch.nn.Module.state_dict` function. Default: ``True``

    Returns:
        ``NamedTuple`` with ``missing_keys`` and ``unexpected_keys`` fields:
            * **missing_keys** is a list of str containing the missing keys
            * **unexpected_keys** is a list of str containing the unexpected keys

    Note:
        If a parameter or buffer is registered as ``None`` and its corresponding key
        exists in :attr:`state_dict`, :meth:`load_state_dict` will raise a
        ``RuntimeError``.
    """
    # The check that rejects bmtrain.store.DistributedStateDictWrapper:
    if not isinstance(state_dict, Mapping):
        raise TypeError("Expected state_dict to be dict-like, got {}.".format(type(state_dict)))
    # ....
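The check that fails is isinstance(state_dict, Mapping). One way a lazily evaluated state-dict wrapper can pass it is to subclass collections.abc.Mapping and implement its three abstract methods. The sketch below is hypothetical (LazyStateDict and load_fn are illustrative names); it does not reproduce BMTrain's DistributedStateDictWrapper and only shows how the type check can be satisfied.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict, Iterator


class LazyStateDict(Mapping):
    """Hypothetical read-only wrapper that materializes each value on access.

    Subclassing Mapping supplies keys(), items(), values(), get(),
    __contains__ and __eq__ from the three methods below, and makes
    isinstance(obj, Mapping) return True.
    """

    def __init__(self, raw: Dict[str, Any], load_fn: Callable[[Any], Any]):
        self._raw = raw          # underlying storage, e.g. one rank's data
        self._load_fn = load_fn  # hypothetical hook that materializes one entry

    def __getitem__(self, key: str) -> Any:
        return self._load_fn(self._raw[key])  # KeyError propagates for unknown keys

    def __iter__(self) -> Iterator[str]:
        return iter(self._raw)

    def __len__(self) -> int:
        return len(self._raw)
```

For example, LazyStateDict({"w": 1, "b": 2}, lambda v: v * 10) is accepted as a Mapping, and dict() of it yields {"w": 10, "b": 20}.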
Hi, I found what may be an incompatibility between bmt.save() and bmt.load().
Following the BMCook examples, I loaded a gpt2-base model and trained it for several epochs. Note that all BMCook operations had been disabled in --cook-config. After training, I called bmt.save() to save the checkpoint. However, this checkpoint does not seem to match the parameters of a freshly initialized model:
Traceback (most recent call last):
File "eval.py", line 207, in <module>
main()
File "eval.py", line 202, in main
bmt.load(model, args.load_path)
File "/opt/conda/lib/python3.8/site-packages/bmtrain/store.py", line 202, in load
ret = model.load_state_dict()
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format()
RuntimeError: Error(s) in loading state_dict for GPT2:
While copying the parameter named "encoder.layers.0.self_att.self_attention.project_q.weight", whose dimensions in the model are torch.Size([768, 768]) and whose dimensions in the checkpoint are torch.Size([768, 768]), an exception occurred : ('The size of tensor a (768) must match the size of tensor b (589824) at non-singleton dimension 1',).
While copying the parameter named "encoder.layers.0.self_att.self_attention.project_k.weight", whose dimensions in the model are torch.Size([768, 768]) and whose dimensions in the checkpoint are torch.Size([768, 768]), an exception occurred : ('The size of tensor a (768) must match the size of tensor b (589824) at non-singleton dimension 1',).
While copying the parameter named "encoder.layers.0.self_att.self_attention.project_v.weight", whose dimensions in the model are torch.Size([768, 768]) and whose dimensions in the checkpoint are torch.Size([768, 768]), an exception occurred : ('The size of tensor a (768) must match the size of tensor b (589824) at non-singleton dimension 1',).
While copying the parameter named "encoder.layers.0.self_att.self_attention.attention_out.weight", whose dimensions in the model are torch.Size([768, 768]) and whose dimensions in the checkpoint are torch.Size([768, 768]), an exception occurred : ('The size of tensor a (768) must match the size of tensor b (589824) at non-singleton dimension 1',).
While copying the parameter named "encoder.layers.0.ffn.ffn.w_in.w.weight", whose dimensions in the model are torch.Size([3072, 768]) and whose dimensions in the checkpoint are torch.Size([3072, 768]), an exception occurred : ('The size of tensor a (768) must match the size of tensor b (2359296) at non-singleton dimension 1',).
...
While copying the parameter named "encoder.layers.11.ffn.ffn.w_out.weight", whose dimensions in the model are torch.Size([768, 3072]) and whose dimensions in the checkpoint are torch.Size([768, 3072]), an exception occurred : ('The size of tensor a (3072) must match the size of tensor b (2359296) at non-singleton dimension 1',).
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 253443) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch()
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError()
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
eval.py FAILED
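The sizes in the log above are consistent with a flattened weight being copied into a parameter that still has its 2-D shape: 589824 = 768 × 768 and 2359296 = 3072 × 768. The dependency-free sketch below uses a hypothetical `broadcast_error` helper (not BMTrain or PyTorch code) that mimics the trailing-dimension broadcasting check `Tensor.copy_` performs, and reproduces the logged messages from the shapes alone. It illustrates the symptom only; it does not by itself show which step of the script produced the flattened tensor.

```python
import math


def broadcast_error(dst_shape, src_shape):
    """Hypothetical helper: mimic the broadcasting check PyTorch runs for
    dst.copy_(src). Return the error message, or None if the shapes broadcast."""
    ndim = max(len(dst_shape), len(src_shape))
    for offset in range(1, ndim + 1):
        # Shapes are aligned from the trailing dimension; missing leading
        # dimensions behave like size 1.
        a = dst_shape[-offset] if offset <= len(dst_shape) else 1
        b = src_shape[-offset] if offset <= len(src_shape) else 1
        if a != b and a != 1 and b != 1:
            return (f"The size of tensor a ({a}) must match the size of "
                    f"tensor b ({b}) at non-singleton dimension {ndim - offset}")
    return None


# Model-side parameter shapes taken from the log.
for shape in [(768, 768), (3072, 768), (768, 3072)]:
    flat = (math.prod(shape),)  # shape of a fully flattened copy of that weight
    print(shape, flat, broadcast_error(shape, flat))
```

Each printed message matches the corresponding log line, whereas copying a tensor of the same 2-D shape (`broadcast_error(shape, shape)`) returns `None`.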
It seems that bmt.load simply cannot align the saved parameters with the flattened parameters of an initialized model. I'm not sure whether this is caused by BMCook or BMTrain. All hyper-parameters have been aligned, including the BMCook preprocessing. This is the main part of my code, which is the same as the BMCook example except for the final bmt.load():
bmt.init_distributed()
args = parse_args()
save_dir = Path(args.save_dir)
ckpt_dir = save_dir / 'checkpoints'
os.makedirs(ckpt_dir, exist_ok=True)
json.dump(vars(args), open(save_dir / 'train_args.json', 'w'), indent=2)
model_config = config_map[args.model].from_pretrained(args.model)
model = model_map[args.model].from_pretrained(args.model, config=model_config)
# teacher model has the same config as the student model
teacher = model_map[args.model].from_pretrained(args.model, config=model_config)

def new_forward(model_self, enc_input, enc_length, dec_input, dec_length, return_logits=False):
    return model_self.forward_old(dec_input, dec_length, output_logits=return_logits)

model.forward_old = model.forward
model.forward = types.MethodType(new_forward, model)
teacher.forward_old = teacher.forward
teacher.forward = types.MethodType(new_forward, teacher)
bmt.synchronize()

# data
batch_size = 8
dec_len = 512
loss_func = torch.nn.CrossEntropyLoss(ignore_index=-100)
optimizer = bmt.optim.AdamOptimizer(model.parameters(), scale=2**20)
lr_scheduler = bmt.lr_scheduler.Noam(optimizer, start_lr=args.start_lr, warmup_iter=2000, end_iter=100000)

# bmcook config
from bmcook.utils.config import ConfigParser
config = ConfigParser(args.cook_config)

# remove checkpointing
for _, v in model.named_modules():
    if isinstance(v, bmt.TransformerBlockList):
        def new_func(list_self, hidden_states, *args):
            for i in range(len(list_self._modules)):
                hidden_states = list_self._modules[str(i)](hidden_states, *args)
            return hidden_states
        v.forward = types.MethodType(new_func, v)
        for k in v._modules.keys():
            state_dict = v._modules[k].state_dict()
            for kk, vv in v._modules[k]._module.named_modules():
                if kk+'.weight' in state_dict:
                    vv.weight.data = state_dict[kk+'.weight'].clone().cuda()
                if kk+'.bias' in state_dict:
                    vv.bias.data = state_dict[kk+'.bias'].clone().cuda()
            v._modules[k] = v._modules[k]._module

# for distillation
Trainer.forward = BMDistill.set_forward(model, teacher, Trainer.forward, config)
# for pruning
BMPrune.compute_mask(model, config)
BMPrune.set_optim_for_pruning(optimizer)
# for quantization
BMQuant.quantize(model, config)
# for moefication
Trainer.forward = BMMoE.get_hidden(model, config, Trainer.forward)
bmt.synchronize()

average_time = 0
average_time_shift = 0.9
dataset = Dataset(
    MMapIndexedDataset(args.data_path),
    dec_len
)

if config.get('MoEfication')['is_moefy']:
    os.makedirs(save_dir / 'hiddens', exist_ok=True)
    model.eval()
    for iteration, data in enumerate(Trainer.batch_iter(dataset, batch_size, bmt.rank(), bmt.world_size())):
        if iteration == 100:
            break
        dec_input = data["ctx"].int()
        dec_length = data["len_ctx"].int()
        dec_mask = torch.arange(dec_len)[None, :].repeat(batch_size, 1) < dec_length[:, None]
        targets = torch.where(dec_mask, data["target"].long(), torch.scalar_tensor(-100, dtype=torch.long))
        targets = targets.cuda()
        dec_input = dec_input.cuda()
        dec_length = dec_length.cuda()
        with torch.no_grad():
            outputs = Trainer.forward(model, None, None, dec_input, dec_length, targets, loss_func)
        torch.save(outputs[-1], save_dir / 'hiddens' / '{}_{}'.format(iteration, bmt.rank()))
        bmt.print_rank("Iteration:", iteration)
    exit()

do_distill = True
distill_config = config.get('distillation')
if distill_config['ce_scale'] + distill_config['mse_hidn_scale'] + distill_config['mse_att_scale'] == 0:
    do_distill = False

bmt.load(model, args.load_path)
# model.train()
teacher.eval()
model.eval()
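The `new_forward` adapter in the code above keeps each model's original bound `forward` as `forward_old` and installs a per-instance replacement with `types.MethodType`, so callers can pass `(enc_input, enc_length, dec_input, dec_length)` while only the decoder arguments reach the original method. A dependency-free sketch of the same pattern, where `ToyDecoder` is a hypothetical stand-in for the real model class:

```python
import types


class ToyDecoder:
    """Hypothetical stand-in for a decoder-only model (not a real ModelCenter class)."""

    def forward(self, dec_input, dec_length, output_logits=False):
        return ("logits" if output_logits else "hidden", dec_input, dec_length)


def new_forward(model_self, enc_input, enc_length, dec_input, dec_length, return_logits=False):
    # Ignore the encoder arguments and delegate to the saved original method.
    return model_self.forward_old(dec_input, dec_length, output_logits=return_logits)


model = ToyDecoder()
model.forward_old = model.forward                     # bound method from the class
model.forward = types.MethodType(new_forward, model)  # instance attribute shadows the class method

print(model.forward(None, None, [1, 2, 3], 3, return_logits=True))
# ('logits', [1, 2, 3], 3)
```

Because the override is stored on the instance, other instances of the class keep the original `forward`; the saved `forward_old` is what makes the delegation possible without recursion.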