niutrans / niutrans.nmt

A Fast Neural Machine Translation System developed in C++.

License: Apache License 2.0

CMake 4.69% C++ 86.39% Python 6.83% Cuda 2.09%

neural-machine-translation machine-translation transformer fast-decoding

niutrans.nmt's Introduction

NiuTrans.NMT

Features

NiuTrans.NMT is a lightweight and efficient Transformer-based neural machine translation system. An introduction in Chinese is also available.

Its main features are:

Few dependencies. It is implemented in pure C++, and all dependencies are optional.
High efficiency. It is heavily optimized for fast decoding; see our WMT paper for more details.
Flexible running modes. The system runs on various platforms and devices (Linux or Windows, CPUs or GPUs, FP32 or FP16, etc.).
Framework agnostic. It supports models trained with other tools, e.g., fairseq models.

Recent Updates

November 2021: Released the code of our submissions to the WMT21 efficiency task. We speed up the inference by 3 times on the GPU (up to 250k words/s on a single NVIDIA A100 GPU card)!

December 2020: Added support for training DLCL and RPR attention models.

December 2020: Heavily reduced the memory footprint of training by optimizing the backward functions.

Installation

Requirements

OS: Linux or Windows
GCC/G++ >=4.8.5 (on Linux)
VC++ >=2015 (on Windows)
cmake >= 3.5
CUDA >= 10.2 (optional)
MKL latest version (optional)
OpenBLAS latest version (optional)

Build from Source

Configure with cmake

The default configuration enables compiling for the pure CPU version.

# Download the code
git clone https://github.com/NiuTrans/NiuTrans.NMT.git
git clone https://github.com/NiuTrans/NiuTensor.git
# Merge with NiuTrans.Tensor
mv NiuTensor/source NiuTrans.NMT/source/niutensor
rm NiuTrans.NMT/source/niutensor/Main.cpp
rm -rf NiuTrans.NMT/source/niutensor/sample NiuTrans.NMT/source/niutensor/tensor/test
mkdir NiuTrans.NMT/build && cd NiuTrans.NMT/build
# Run cmake
cmake ..

You can add compilation options to the cmake command to support accelerations with MKL, OpenBLAS, or CUDA.

Please note that you can only select at most one of MKL or OpenBLAS.

Use CUDA (required for training)

Add -DUSE_CUDA=ON, -DCUDA_TOOLKIT_ROOT=$CUDA_PATH and -DGPU_ARCH=$GPU_ARCH to the cmake command, where $CUDA_PATH is the path of the CUDA toolkit and $GPU_ARCH is the GPU architecture.

Supported GPU architectures are listed below:

K: Kepler
M: Maxwell
P: Pascal
V: Volta
T: Turing
A: Ampere

See NVIDIA's official page for more details.

You can also add -DUSE_HALF_PRECISION=ON to the cmake command to enable half-precision support. Note that half precision requires Pascal or newer GPU architectures.

Use MKL (optional)

Add -DUSE_MKL=ON and -DINTEL_ROOT=$MKL_PATH to the cmake command, where $MKL_PATH is the path of MKL.

Use OpenBLAS (optional)

Add -DUSE_OPENBLAS=ON and -DOPENBLAS_ROOT=$OPENBLAS_PATH to the cmake command, where $OPENBLAS_PATH is the path of OpenBLAS.
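The options above can be combined in one cmake invocation. The following is a minimal sketch, not part of the project, of how a helper script might assemble the arguments. The function name build_cmake_args and its parameters are hypothetical; only the -D flags themselves come from this README. It enforces the two constraints stated above (at most one of MKL or OpenBLAS, and half precision only on Pascal or newer), and it assumes that half precision is only meaningful in a CUDA build.

```python
def build_cmake_args(cuda_path=None, gpu_arch=None, half_precision=False,
                     mkl_path=None, openblas_path=None):
    """Assemble a cmake command line from the options described in this README.

    The helper and its parameter names are illustrative only.
    """
    if mkl_path and openblas_path:
        raise ValueError("select at most one of MKL or OpenBLAS")
    if half_precision and not cuda_path:
        # Assumption: half precision is only used together with CUDA.
        raise ValueError("half precision requires a CUDA build")
    if half_precision and gpu_arch in ("K", "M"):
        # Kepler and Maxwell are older than Pascal.
        raise ValueError("half precision requires Pascal or newer")

    args = ["cmake"]
    if cuda_path:
        args += ["-DUSE_CUDA=ON", f"-DCUDA_TOOLKIT_ROOT={cuda_path}"]
        if gpu_arch:
            args.append(f"-DGPU_ARCH={gpu_arch}")
    if half_precision:
        args.append("-DUSE_HALF_PRECISION=ON")
    if mkl_path:
        args += ["-DUSE_MKL=ON", f"-DINTEL_ROOT={mkl_path}"]
    if openblas_path:
        args += ["-DUSE_OPENBLAS=ON", f"-DOPENBLAS_ROOT={openblas_path}"]
    args.append("..")  # run from the build directory
    return args


# Example: a CUDA build for a Volta GPU with half precision enabled.
# The CUDA path here is a placeholder.
print(" ".join(build_cmake_args(cuda_path="/usr/local/cuda",
                                gpu_arch="V", half_precision=True)))
```

Returning a list rather than a single string keeps paths with spaces intact if the result is later passed to a process launcher.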

Configuration Example

We provide several examples to build the project with different options.

Compile on Linux

make -j && cd ..

Compile on Windows

Add -A x64 to the cmake command to generate a Visual Studio project on Windows, i.e., NiuTrans.NMT.sln, which you can open and build with Visual Studio (>= Visual Studio 2015).

If it succeeds, you will get an executable file NiuTrans.NMT in the 'bin' directory.

Usage

Training

Commands

Make sure to compile the program with CUDA, because training on CPUs is not currently supported.

Step 1: Prepare the training data.

# Convert the BPE vocabulary
python3 tools/GetVocab.py \
  -raw $bpeVocab \
  -new $niutransVocab

Description:

raw - Path of the BPE vocabulary.
new - Path of the NiuTrans.NMT vocabulary to be saved.

# Binarize the training data
python3 tools/PrepareParallelData.py \
  -src $srcFile \
  -tgt $tgtFile \
  -sv $srcVocab \
  -tv $tgtVocab \
  -maxsrc 200 \
  -maxtgt 200 \
  -output $trainingFile

Description:

src - Path of the source language data. One sentence per line with tokens separated by spaces or tabs.
tgt - Path of the target language data. The same format as the source language data.
sv - Path of the source language vocabulary. Its first line is the vocabulary size and the first index, followed by a word and its index in each following line.
tv - Path of the target language vocabulary. The same format as the source language vocabulary.
maxsrc - The maximum length of a source sentence. Default: 200.
maxtgt - The maximum length of a target sentence. Default: 200.
output - Path of the training data to be saved.
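The vocabulary format described for sv and tv can be read with a few lines of Python. This is a sketch under stated assumptions, not the project's own loader: it assumes fields are separated by whitespace, that the header line holds the vocabulary size optionally followed by the first index, and that every other line is a word followed by its index. The function name parse_vocab and the sample entries are hypothetical.

```python
def parse_vocab(lines):
    """Parse vocabulary lines into (size, first_index, word_to_index).

    Assumed layout (whitespace separated):
        <vocabulary size> [<first index>]
        <word> <index>
        ...
    """
    rows = [line.strip() for line in lines if line.strip()]
    header = rows[0].split()
    size = int(header[0])
    # The first index is optional in this sketch; None when absent.
    first_index = int(header[1]) if len(header) > 1 else None

    word_to_index = {}
    for row in rows[1:]:
        # Split from the right so the index is always the last field.
        word, index = row.rsplit(None, 1)
        word_to_index[word] = int(index)
    return size, first_index, word_to_index


# Hypothetical three-word vocabulary, for illustration only.
sample = ["3 0", "hello 0", "world 1", "@@ 2"]
size, first_index, vocab = parse_vocab(sample)
print(size, first_index, vocab["world"])
```

The sketch deliberately does not check that the number of entries equals the declared size, since this README does not say whether the size counts entries below the first index.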

Step 2: Train the model

bin/NiuTrans.NMT \
  -dev 0 \
  -nepoch 50 \
  -model model.bin \
  -ncheckpoint 10 \
  -train train.data \
  -valid valid.data

Description:

dev - Device id (>= 0 for GPUs). Default: 0.
model - Path of the model to be saved.
train - Path to the training file. The same format as the output file in step 1.
valid - Path to the validation file. The same format as the output file in step 1.
wbatch - Word batch size. Default: 4096.
sbatch - Sentence batch size. Default: 32.
dropout - Dropout rate for the model. Default: 0.3.
fnndrop - Dropout rate for fnn layers. Default: 0.1.
attdrop - Dropout rate for attention layers. Default: 0.1.
lrate - Learning rate. Default: 0.0015.
minlr - The minimum learning rate for training. Default: 1e-9.
warmupinitlr - The initial learning rate for warm-up. Default: 1e-7.
weightdecay - The weight decay factor. Default: 0.
nwarmup - Step number of warm-up for training. Default: 8000.
adam - Indicates whether Adam is used. Default: true.
adambeta1 - The beta1 hyperparameter of Adam. Default: 0.9.
adambeta2 - The beta2 hyperparameter of Adam. Default: 0.98.
adambeta - A hyperparameter of Adam. Default: 1e-9.
labelsmoothing - Label smoothing factor. Default: 0.1.
updatefreq - Update the model every updatefreq steps. Default: 1.
nepoch - The maximum number of training epochs. Default: 50.
nstep - The maximum number of training steps. Default: 100000.
ncheckpoint - The maximum checkpoint to be saved. Default: 0.1.
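To show how lrate, warmupinitlr, nwarmup and minlr could interact, here is a sketch of a linear warm-up followed by inverse-square-root decay, the schedule commonly used for Transformer training. This README does not state which schedule NiuTrans.NMT actually implements, so the formula is an assumption; the function name learning_rate is hypothetical, and only the default values are taken from the list above.

```python
def learning_rate(step, lrate=0.0015, warmupinitlr=1e-7, nwarmup=8000, minlr=1e-9):
    """Learning rate at a given update step.

    Assumed schedule (not confirmed by the README):
      - steps below nwarmup: linear increase from warmupinitlr to lrate
      - afterwards: lrate scaled by sqrt(nwarmup / step)
      - never below minlr
    """
    if step < nwarmup:
        lr = warmupinitlr + step * (lrate - warmupinitlr) / nwarmup
    else:
        lr = lrate * (nwarmup / step) ** 0.5
    return max(lr, minlr)


# Print the rate at a few steps to see the rise and the decay.
for step in (0, 4000, 8000, 32000, 100000):
    print(step, learning_rate(step))
```

Under this assumption the peak is reached exactly at step nwarmup, and quadrupling the step count afterwards halves the rate.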

Training Example

Refer to this page for the training example.

Translating

Make sure to compile the program with CUDA and half-precision support if you want to translate with FP16 on GPUs.

Commands

bin/NiuTrans.NMT \
 -dev $deviceID \
 -input $inputFile \
 -model $modelPath \
 -wbatch $wordBatchSize \
 -sbatch $sentenceBatchSize \
 -beamsize $beamSize \
 -srcvocab $srcVocab \
 -tgtvocab $tgtVocab \
 -output $outputFile

Description:

model - Path of the model.
sbatch - Sentence batch size. Default: 32.
dev - Device id (-1 for CPUs, and >= 0 for GPUs). Default: 0.
beamsize - Size of the beam. 1 for the greedy search.
input - Path of the input file. One sentence per line with tokens separated by spaces.
output - Path of the output file to be saved. The same format as the input file.
srcvocab - Path of the source language vocabulary. Its first line is the vocabulary size, followed by a word and its index in each following line.
tgtvocab - Path of the target language vocabulary. The same format as the source language vocabulary.
fp16 (optional) - Inference with FP16. This will not work if the model is stored in FP32. Default: false.
lenalpha - The alpha parameter controls the length preference. Default: 0.6.
maxlenalpha - Scalar of the input sequence (for the max number of search steps). Default: 1.2.
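The two length-related options can be illustrated with a small sketch. It assumes that lenalpha is used as the exponent of the GNMT-style length penalty ((5 + length) / 6) ** alpha and that the step limit is the source length multiplied by maxlenalpha and rounded up. Neither formula is confirmed by this README, and the function names are hypothetical.

```python
import math


def length_penalty(length, lenalpha=0.6):
    """GNMT-style length penalty (assumed, not confirmed by the README)."""
    return ((5.0 + length) / 6.0) ** lenalpha


def normalized_score(log_prob, length, lenalpha=0.6):
    """Hypothesis score after length normalization; higher is better."""
    return log_prob / length_penalty(length, lenalpha)


def max_decode_steps(src_length, maxlenalpha=1.2):
    """Upper bound on search steps; rounding up is an assumption."""
    return math.ceil(src_length * maxlenalpha)


# A longer hypothesis with a slightly lower log-probability can win
# once scores are normalized by length.
short = normalized_score(-4.0, 4)
longer = normalized_score(-4.5, 8)
print(short, longer, longer > short)
print(max_decode_steps(50))
```

With lenalpha set to 0 the penalty is always 1, so the raw log-probability decides and shorter outputs tend to be favored; larger values shift the preference toward longer outputs.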

An Example

Refer to this page for the translating example.

Low Precision Inference

NiuTrans.NMT supports inference with FP16 and INT8. You can convert a model to FP16 with our tools:

python3 tools/FormatConverter.py \
  -input $inputModel \
  -output $outputModel \
  -format $targetFormat

Description:

input - Path of the raw model file.
output - Path of the new model file.
format - Target storage format, FP16 (Default) or FP32.
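Storing parameters in FP16 halves the storage per value but rounds each value to the nearest representable half-precision number. The sketch below shows that rounding with Python's standard struct module, whose "e" format is IEEE 754 half precision. It only illustrates the numeric effect; it says nothing about how tools/FormatConverter.py is implemented, and the function name fp16_round_trip is hypothetical.

```python
import struct


def fp16_round_trip(value):
    """Round a Python float to FP16 and convert it back."""
    packed = struct.pack("<e", value)   # 2 bytes, IEEE 754 half precision
    return struct.unpack("<e", packed)[0]


# Values that are exactly representable survive; others are rounded.
for value in (1.0, 0.5, 0.1, 0.0015):
    restored = fp16_round_trip(value)
    print(value, restored, abs(value - restored))

# Storage cost per parameter: 2 bytes for FP16 versus 4 bytes for FP32.
print(struct.calcsize("<e"), struct.calcsize("<f"))
```

Values whose magnitude exceeds the FP16 range cannot be packed this way; struct.pack raises an error for them.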

Converting Models from Fairseq

The core implementation is framework agnostic, so we can easily convert models trained with other frameworks to a binary format for efficient inference.

The following frameworks and models are currently supported:

Model	fairseq (>=0.6.2)
Transformer (Vaswani et al. 2017)	✓
RPR attention (Shaw et al. 2018)	✓
Deep Transformer (Wang et al. 2019)	✓

Refer to this page for the details about training models with fairseq.

After training, you can convert the fairseq checkpoint and vocabulary with the following steps.

Step 1: Convert parameters of a single fairseq model

python3 tools/ModelConverter.py -i $fairseqCheckpoint -o $niutransModel

Description:

i - Path of the fairseq checkpoint, refer to this for more details.
o - Path to save the converted model parameters. All parameters are stored in a binary format.
fp16 (optional) - Save the parameters with 16-bit data type. Default: disabled.

Step 2: Convert the vocabulary

python3 tools/VocabConverter.py -raw $fairseqVocabPath -new $niutransVocabPath

Description:

raw - Path of the fairseq vocabulary, refer to this for more details.
new - Path to save the converted vocabulary. Its first line is the vocabulary size, followed by a word and its index in each following line.

You may need to convert both the source language vocabulary and the target language vocabulary if they are not shared.
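The vocabulary conversion in step 2 can be pictured as follows. A fairseq dictionary file lists one token and its count per line, and fairseq reserves the first four indices for the special symbols <s>, <pad>, </s> and <unk>. The sketch below writes the target layout described above: the vocabulary size on the first line, then one word and its index per line. The tab separator, the exact header contents and the function name convert_fairseq_vocab are assumptions, and tools/VocabConverter.py may differ in all of them.

```python
def convert_fairseq_vocab(fairseq_lines,
                          specials=("<s>", "<pad>", "</s>", "<unk>")):
    """Turn fairseq dictionary lines ("token count") into
    "size" + "word<TAB>index" lines. Illustrative sketch only."""
    # fairseq assigns the special symbols the lowest indices.
    words = list(specials)
    for line in fairseq_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        # The count is the last field; the token is everything before it.
        word, _count = line.rsplit(" ", 1)
        words.append(word)

    converted = [str(len(words))]
    converted += [f"{word}\t{index}" for index, word in enumerate(words)]
    return converted


# Hypothetical two-token dictionary, for illustration only.
for row in convert_fairseq_vocab(["the 100", "cat 7"]):
    print(row)
```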

A Model Zoo

We provide several pre-trained models to test the system. All models and runnable systems are packaged into docker files so that our results can be easily reproduced.

Refer to this page for more details.

Papers

Here are the papers related to this project:

Learning Deep Transformer Models for Machine Translation. Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao. 2019. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.

The NiuTrans System for WNGT 2020 Efficiency Task. Chi Hu, Bei Li, Yinqiao Li, Ye Lin, Yanyang Li, Chenglong Wang, Tong Xiao, Jingbo Zhu. 2020. Proceedings of the Fourth Workshop on Neural Generation and Translation.

The NiuTrans System for the WMT21 Efficiency Task. Chenglong Wang, Chi Hu, Yongyu Mu, Zhongxiang Yan, Siming Wu, Minyi Hu, Hang Cao, Bei Li, Ye Lin, Tong Xiao, Jingbo Zhu. 2021.

Team Members

This project is maintained by a joint team from NiuTrans Research and NEU NLP Lab. Current team members are

Chi Hu, Chenglong Wang, Siming Wu, Bei Li, Yinqiao Li, Ye Lin, Quan Du, Tong Xiao and Jingbo Zhu

Feel free to contact huchinlp[at]gmail.com or niutrans[at]mail.neu.edu.cn if you have any questions.

niutrans.nmt's Issues

Add -A x64 to the cmake command not -A 64

Compile on Windows
Add -A 64 to the cmake command

This sentence appears to contain a typo.

Try this:
cmake -DUSE_CUDA=ON -DCUDA_TOOLKIT_ROOT="E:/Program Files/NVIDIA GPU Computing Toolkit/CUDA" -DGPU_ARCH=M -A x64 ..

Replace "E:/Program Files/NVIDIA GPU Computing Toolkit/CUDA" with your own CUDA directory.

This generated a Visual Studio project on Windows.

(XMem.cpp line 721): Cannot allocate the memory

Hi, it's an amazing project, but I cannot get it to work with my own data, which has a larger vocabulary of about 40000 entries. The error is "(XMem.cpp line 721): Cannot allocate the memory", and my log is:
-nepoch=50
-maxcheckpoint=10
-enclayer=9
-declayer=1
-embsize=256
-modelsize=256
-nhead=8
-maxpos=128

encoder layer: 9
decoder layer: 1
attention heads: 8
model size: 256
source vocab size: 41056
target vocab size: 41056
[INFO] loaded 160239 sentences
[ERROR] (XMem.cpp line 721): Cannot allocate the memory.
terminate called without an active exception
scripts/train.deen.sh: line 28: 1006 Aborted bin/NiuTrans.NMT -dev ${deviceID} -model $modelFile -train ${dataDir}/train.bin -valid ${dataDir}/valid.bin -nepoch 50 -maxcheckpoint 10 -enclayer 9 -declayer 1 -embsize 256 -modelsize 256 -nhead 8 -maxpos 128

Could you help me solve this problem?

About Dataset

Hi! Nice work!
I would like to know where I can download a Chinese-English parallel dataset containing 100,000 entries each in Chinese and English.

How to deploy a trained model

As the title says, I would like to deploy a trained model as a web API. Do you have any suggestions?

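Discussion of two details in the papers

Hello!
I have read the two papers listed at the end of your project and have two questions I would like to discuss.
1. The best student model in 2021 even outperforms last year's teacher model. What were the main optimizations behind this? I see that this year's teacher model adds back-translation; how much gain did that bring?
2. For en-de sequence-level distillation, should the back-translated data produced by the de-en model, which was used to train the teacher model, also be re-translated by the teacher model before it is given to the student model? My concern is that if both the source and the target sequences come from machine translation, the training data may be noisy.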

Cannot allocate memory

[INFO] elapsed=813.4, step=100, epoch=1, total word=713106, total batch=29584, loss=14.962, lr=1.87e-05
[ERROR] (XMem.cpp line 721): Cannot allocate the memory.
terminate called without an active exception
Aborted (core dumped)
The log is shown above. The error occurs right after the first training message is printed. Have you encountered this before?

Cannot compile under cuda 11.8

The build fails at the linking stage. Here is the full log:

[main] Building folder: niutensor 
[build] Starting build
[proc] Executing command: /usr/bin/cmake --build /home/pzzzzz/MyProjects/niutensor/build --config Debug --target all --
[build] -- CUDA_TOOLKIT_ROOT: /opt/cuda
[build] -- GPU_ARCH: -arch=compute_75;-code=sm_75
[build] -- try to compile with half precision
[build] -- ARCH_FLAGS:-arch=compute_75;-code=sm_75
[build] -- CUDA_LIB_PATH:
[build] -- Generate Makefile For Executable File
[build] -- Name of Executable File: NiuTrans.NMT
[build] -- On Linux or macOS; Use CUDA
[build] -- Configuring done
[build] -- Generating done
[build] -- Build files have been written to: /home/pzzzzz/MyProjects/niutensor/build
[build] Consolidate compiler generated dependencies of target NiuTrans.NMT
[build] [  1%] Linking CXX executable /home/pzzzzz/MyProjects/niutensor/bin/NiuTrans.NMT
[build] /usr/bin/ld: /opt/cuda/lib64/libcurand_static.a(curand.o): in function `curandDestroyGenerator':
[build] curand.compute_90.cudafe1.cpp:(.text+0x9cdb): undefined reference to `culibosEnterCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0x9d18): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0x9d33): undefined reference to `culibosEnterCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0x9d75): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0x9d90): undefined reference to `culibosEnterCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0x9dd1): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0x9e9c): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0x9eaf): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0x9ec2): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: /opt/cuda/lib64/libcurand_static.a(curand.o): in function `curandCreateGenerator':
[build] curand.compute_90.cudafe1.cpp:(.text+0xc3f5): undefined reference to `culibosEnterCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0xc435): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0xc464): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0xc4e2): undefined reference to `culibosInitializeCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0xc570): undefined reference to `culibosEnterCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0xc5b0): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0xc5e4): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text+0xc6c1): undefined reference to `culibosInitializeCriticalSection'
[build] /usr/bin/ld: /opt/cuda/lib64/libcurand_static.a(curand.o): in function `curandDeviceConstants<unsigned int>::get(int)':
[build] curand.compute_90.cudafe1.cpp:(.text._ZN21curandDeviceConstantsIjE3getEi[_ZN21curandDeviceConstantsIjE3getEi]+0x1b): undefined reference to `culibosEnterCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text._ZN21curandDeviceConstantsIjE3getEi[_ZN21curandDeviceConstantsIjE3getEi]+0x3b): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text._ZN21curandDeviceConstantsIjE3getEi[_ZN21curandDeviceConstantsIjE3getEi]+0xe0): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: /opt/cuda/lib64/libcurand_static.a(curand.o): in function `curandDeviceConstants<unsigned long long>::curandDeviceConstants(void*, unsigned long, void (*)())':
[build] curand.compute_90.cudafe1.cpp:(.text._ZN21curandDeviceConstantsIyEC2EPvmPFvvE[_ZN21curandDeviceConstantsIyEC5EPvmPFvvE]+0x82): undefined reference to `culibosInitializeCriticalSection'
[build] /usr/bin/ld: /opt/cuda/lib64/libcurand_static.a(curand.o): in function `curandDeviceConstants<unsigned long long>::get(int)':
[build] curand.compute_90.cudafe1.cpp:(.text._ZN21curandDeviceConstantsIyE3getEi[_ZN21curandDeviceConstantsIyE3getEi]+0x1b): undefined reference to `culibosEnterCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text._ZN21curandDeviceConstantsIyE3getEi[_ZN21curandDeviceConstantsIyE3getEi]+0x3b): undefined reference to `culibosLeaveCriticalSection'
[build] /usr/bin/ld: curand.compute_90.cudafe1.cpp:(.text._ZN21curandDeviceConstantsIyE3getEi[_ZN21curandDeviceConstantsIyE3getEi]+0xe0): undefined reference to `culibosLeaveCriticalSection'
[build] clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
[build] make[2]: *** [CMakeFiles/NiuTrans.NMT.dir/build.make:15303: /home/pzzzzz/MyProjects/niutensor/bin/NiuTrans.NMT] Error 1
[build] make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/NiuTrans.NMT.dir/all] Error 2
[build] make: *** [Makefile:91: all] Error 2
[proc] The command: /usr/bin/cmake --build /home/pzzzzz/MyProjects/niutensor/build --config Debug --target all -- exited with code: 2 and signal: null
[build] Build finished with exit code 2
[cpptools] The build configurations generated do not contain the active build configuration. Using "" for CMAKE_BUILD_TYPE instead of "Debug" to ensure that IntelliSense configurations can be found
