Note: This tutorial is written in Chinese. For an English version, see The Annotated Transformer.
- The code in the original blog post may fail to train the model on the Multi30k dataset because the dataset cannot be accessed. Follow the dataset part of this tutorial to download and preprocess the dataset.
This is a step-by-step implementation of the Transformer model described in the paper Attention Is All You Need, inspired by The Annotated Transformer blog post.
The goal of this project is to provide a simple, readable implementation of the Transformer model that also serves as a quick guide for researchers and ML practitioners who want to understand the model and how it is implemented.
- Python 3.10
- PyTorch 1.10 + CUDA 11.3
- torchdata==0.3.0
- torchtext==0.12.0
- spacy==3.2.0
- altair==5.1.1
- GPUtil
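
Before running the tutorial, it can help to confirm that the installed packages match the pins listed above. The snippet below is a minimal sketch, not part of the project: `PINNED` and `check_versions` are hypothetical names chosen for illustration, and only three of the pins are included (PyTorch is left out because its installed version string usually carries a CUDA suffix such as `+cu113`, which an exact comparison would flag).

```python
from importlib import metadata

# Pins copied from the requirements list above (assumption: adjust to your setup).
PINNED = {
    "torchtext": "0.12.0",
    "spacy": "3.2.0",
    "altair": "5.1.1",
}


def check_versions(pinned):
    """Return {package: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Prints one line per package whose installed version differs from the pin.
    for name, (wanted, installed) in check_versions(PINNED).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

If the script prints nothing, the checked packages match their pins; a package reported as `found None` is not installed.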