awslabs / sockeye

Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch

Home Page: https://awslabs.github.io/sockeye/

License: Apache License 2.0

Shell 0.26% Python 98.96% CSS 0.07% JavaScript 0.37% Dockerfile 0.09% TeX 0.25%
deep-learning deep-neural-networks machine-learning machine-translation neural-machine-translation encoder-decoder attention-mechanism sequence-to-sequence sequence-to-sequence-models sockeye

sockeye's Introduction

Sockeye

Sockeye has entered maintenance mode and is no longer adding new features. We are grateful to everyone who has contributed to Sockeye throughout its development with pull requests, issue reports, and more.


Sockeye is an open-source sequence-to-sequence framework for Neural Machine Translation built on PyTorch. It implements distributed training and optimized inference for state-of-the-art models, powering Amazon Translate and other MT applications. Recent developments and changes are tracked in our CHANGELOG.

For a quickstart guide to training a standard NMT model on any size of data, see the WMT 2014 English-German tutorial.

For questions and issue reports, please file an issue on GitHub.

Version 3.1.x: PyTorch only

With version 3.1.x, we remove support for MXNet 2.x. Models trained with PyTorch and Sockeye 3.0.x remain compatible with Sockeye 3.1.x. Models trained with 2.3.x (using MXNet) and converted to PyTorch with Sockeye 3.0.x's conversion tool can NOT be used with Sockeye 3.1.x.

Version 3.0.0: Concurrent PyTorch and MXNet support

Starting with version 3.0.0, Sockeye is also based on PyTorch. Sockeye 3.0.x maintains backwards compatibility with MXNet models trained with version 2.3.x. If MXNet 2.x is installed, Sockeye can run with either PyTorch or MXNet.

All models trained with 2.3.x (using MXNet) can be converted to models running with PyTorch using the converter CLI (sockeye.mx_to_pt). This will create a PyTorch parameter file (<model>/params.best) and back up the existing MXNet parameter file to <model>/params.best.mx. Note that this only applies to fully-trained models that are to be used for inference. Continued training of an MXNet model with PyTorch is not supported (because we do not convert training and optimizer states). sockeye.mx_to_pt requires MXNet to be installed in the environment.
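For example (a hypothetical invocation; the exact argument name is an assumption, so check python -m sockeye.mx_to_pt --help):

python -m sockeye.mx_to_pt --model <model>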

All CLIs in version 3.0.0 use PyTorch by default, e.g. sockeye-{train,translate,score}. MXNet-based CLIs and modules remain operational and are accessible via sockeye-{train,translate,score}-mx.

Sockeye 3 can be installed and run without MXNet; if MXNet is installed, an extended test suite is executed to ensure equivalence between PyTorch and MXNet models. Note that running Sockeye 3.0.0 with MXNet requires MXNet 2.x to be installed (pip install --pre -f https://dist.mxnet.io/python 'mxnet>=2.0.0b2021').

Installation

Download the current version of Sockeye:

git clone https://github.com/awslabs/sockeye.git

Install the sockeye module and its dependencies:

cd sockeye && pip3 install --editable .
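To verify the installation, you can print the installed version and check that the main CLI is on your path (a quick sanity check, not part of the official instructions; sockeye exposes its version as sockeye.__version__):

python3 -c "import sockeye; print(sockeye.__version__)"
sockeye-train --help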

For faster GPU training, install NVIDIA Apex. NVIDIA also provides PyTorch Docker containers that include Apex.

Documentation

Older versions

  • Sockeye 3.0, based on PyTorch & MXNet 2.x, is available in the sockeye_30 branch.
  • Sockeye 2.x, based on the MXNet Gluon API, is available in the sockeye_2 branch.
  • Sockeye 1.x, based on the MXNet Module API, is available in the sockeye_1 branch.

Citation

For more information about Sockeye, see our papers (BibTeX).

Sockeye 3.x

Felix Hieber, Michael Denkowski, Tobias Domhan, Barbara Darques Barros, Celina Dong Ye, Xing Niu, Cuong Hoang, Ke Tran, Benjamin Hsu, Maria Nadejde, Surafel Lakew, Prashant Mathur, Anna Currey, Marcello Federico. Sockeye 3: Fast Neural Machine Translation with PyTorch. ArXiv e-prints.

Sockeye 2.x

Tobias Domhan, Michael Denkowski, David Vilar, Xing Niu, Felix Hieber, Kenneth Heafield. The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020. Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (AMTA'20).

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar. Sockeye 2: A Toolkit for Neural Machine Translation. Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, Project Track (EAMT'20).

Sockeye 1.x

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post. The Sockeye Neural Machine Translation Toolkit at AMTA 2018. Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (AMTA'18).

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton and Matt Post. 2017. Sockeye: A Toolkit for Neural Machine Translation. ArXiv e-prints.

Research with Sockeye

Sockeye has been used for both academic and industrial research. A list of known publications that use Sockeye is shown below. If you know more, please let us know or submit a pull request (last updated: May 2022).

2023

  • Zhang, Xuan, Kevin Duh, Paul McNamee. "A Hyperparameter Optimization Toolkit for Neural Machine Translation Research". Proceedings of ACL (2023).

2022

  • Currey, Anna, Maria Nădejde, Raghavendra Pappagari, Mia Mayer, Stanislas Lauly, Xing Niu, Benjamin Hsu, Georgiana Dinu. "MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation". Proceedings of EMNLP (2022).
  • Domhan, Tobias, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne and Felix Hieber. "The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation". Proceedings of NAACL-HLT (2022)
  • Fischer, Lukas, Patricia Scheurer, Raphael Schwitter, Martin Volk. "Machine Translation of 16th Century Letters from Latin to German". Workshop on Language Technologies for Historical and Ancient Languages (2022).
  • Knowles, Rebecca, Patrick Littell. "Translation Memories as Baselines for Low-Resource Machine Translation". Proceedings of LREC (2022)
  • McNamee, Paul, Kevin Duh. "The Multilingual Microblog Translation Corpus: Improving and Evaluating Translation of User-Generated Text". Proceedings of LREC (2022)
  • Nadejde Maria, Anna Currey, Benjamin Hsu, Xing Niu, Marcello Federico, Georgiana Dinu. "CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality". Proceedings of NAACL (2022).
  • Weller-Di Marco, Marion, Matthias Huck, Alexander Fraser. "Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies ". arXiv preprint arXiv:2203.13550 (2022)

2021

  • Bergmanis, Toms, Mārcis Pinnis. "Facilitating Terminology Translation with Target Lemma Annotations". arXiv preprint arXiv:2101.10035 (2021)
  • Briakou, Eleftheria, Marine Carpuat. "Beyond Noise: Mitigating the Impact of Fine-grained Semantic Divergences on Neural Machine Translation". arXiv preprint arXiv:2105.15087 (2021)
  • Hasler, Eva, Tobias Domhan, Sony Trenous, Ke Tran, Bill Byrne, Felix Hieber. "Improving the Quality Trade-Off for Neural Machine Translation Multi-Domain Adaptation". Proceedings of EMNLP (2021)
  • Tang, Gongbo, Philipp Rönchen, Rico Sennrich, Joakim Nivre. "Revisiting Negation in Neural Machine Translation". Transactions of the Association for Computational Linguistics 9 (2021)
  • Vu, Thuy, Alessandro Moschitti. "Machine Translation Customization via Automatic Training Data Selection from the Web". arXiv preprint arXiv:2102.1024 (2021)
  • Xu, Weijia, Marine Carpuat. "EDITOR: An Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints." Transactions of the Association for Computational Linguistics 9 (2021)
  • Müller, Mathias, Rico Sennrich. "Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation". Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (2021)
  • Popović, Maja, Alberto Poncelas. "On Machine Translation of User Reviews." Proceedings of RANLP (2021)
  • Popović, Maja. "On nature and causes of observed MT errors." Proceedings of the 18th MT Summit (Volume 1: Research Track) (2021)
  • Jain, Nishtha, Maja Popović, Declan Groves, Eva Vanmassenhove. "Generating Gender Augmented Data for NLP." Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing (2021)
  • Vilar, David, Marcello Federico. "A Statistical Extension of Byte-Pair Encoding." Proceedings of IWSLT (2021)

2020

  • Dinu, Georgiana, Prashant Mathur, Marcello Federico, Stanislas Lauly, Yaser Al-Onaizan. "Joint translation and unit conversion for end-to-end localization." Proceedings of IWSLT (2020)
  • Exel, Miriam, Bianka Buschbeck, Lauritz Brandt, Simona Doneva. "Terminology-Constrained Neural Machine Translation at SAP". Proceedings of EAMT (2020).
  • Hisamoto, Sorami, Matt Post, Kevin Duh. "Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?" Transactions of the Association for Computational Linguistics, Volume 8 (2020)
  • Naradowsky, Jason, Xuan Zhan, Kevin Duh. "Machine Translation System Selection from Bandit Feedback." arXiv preprint arXiv:2002.09646 (2020)
  • Niu, Xing, Prashant Mathur, Georgiana Dinu, Yaser Al-Onaizan. "Evaluating Robustness to Input Perturbations for Neural Machine Translation". arXiv preprint arXiv:2005.00580 (2020)
  • Niu, Xing, Marine Carpuat. "Controlling Neural Machine Translation Formality with Synthetic Supervision." Proceedings of AAAI (2020)
  • Keung, Phillip, Julian Salazar, Yichao Liu, Noah A. Smith. "Unsupervised Bitext Mining and Translation via Self-Trained Contextual Embeddings." arXiv preprint arXiv:2010.07761 (2020).
  • Sokolov, Alex, Tracy Rohlin, Ariya Rastrow. "Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion." arXiv preprint arXiv:2006.14194 (2020)
  • Stafanovičs, Artūrs, Toms Bergmanis, Mārcis Pinnis. "Mitigating Gender Bias in Machine Translation with Target Gender Annotations." arXiv preprint arXiv:2010.06203 (2020)
  • Stojanovski, Dario, Alexander Fraser. "Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation." arXiv preprint arXiv:2004.14927 (2020)
  • Stojanovski, Dario, Benno Krojer, Denis Peskov, Alexander Fraser. "ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation". Proceedings of COLING (2020)
  • Zhang, Xuan, Kevin Duh. "Reproducible and Efficient Benchmarks for Hyperparameter Optimization of Neural Machine Translation Systems." Transactions of the Association for Computational Linguistics, Volume 8 (2020)
  • Swe Zin Moe, Ye Kyaw Thu, Hnin Aye Thant, Nandar Win Min, and Thepchai Supnithi, "Unsupervised Neural Machine Translation between Myanmar Sign Language and Myanmar Language", Journal of Intelligent Informatics and Smart Technology, April 1st Issue, 2020, pp. 53-61. (Submitted December 21, 2019; accepted March 6, 2020; revised March 16, 2020; published online April 30, 2020)
  • Thazin Myint Oo, Ye Kyaw Thu, Khin Mar Soe and Thepchai Supnithi, "Neural Machine Translation between Myanmar (Burmese) and Dawei (Tavoyan)", In Proceedings of the 18th International Conference on Computer Applications (ICCA 2020), Feb 27-28, 2020, Yangon, Myanmar, pp. 219-227
  • Müller, Mathias, Annette Rios, Rico Sennrich. "Domain Robustness in Neural Machine Translation." Proceedings of AMTA (2020)
  • Rios, Annette, Mathias Müller, Rico Sennrich. "Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation." Proceedings of the 5th WMT: Research Papers (2020)
  • Popović, Maja, Alberto Poncelas. "Neural Machine Translation between similar South-Slavic languages." Proceedings of the 5th WMT: Research Papers (2020)
  • Popović, Maja, Alberto Poncelas. "Extracting correctly aligned segments from unclean parallel data using character n-gram matching." Proceedings of Conference on Language Technologies & Digital Humanities (JTDH 2020).
  • Popović, Maja, Alberto Poncelas, Marija Brkic, Andy Way. "Neural Machine Translation for translating into Croatian and Serbian." Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects (2020)

2019

  • Agrawal, Sweta, Marine Carpuat. "Controlling Text Complexity in Neural Machine Translation." Proceedings of EMNLP (2019)
  • Beck, Daniel, Trevor Cohn, Gholamreza Haffari. "Neural Speech Translation using Lattice Transformations and Graph Networks." Proceedings of TextGraphs-13 (EMNLP 2019)
  • Currey, Anna, Kenneth Heafield. "Zero-Resource Neural Machine Translation with Monolingual Pivot Data." Proceedings of EMNLP (2019)
  • Gupta, Prabhakar, Mayank Sharma. "Unsupervised Translation Quality Estimation for Digital Entertainment Content Subtitles." IEEE International Journal of Semantic Computing (2019)
  • Hu, J. Edward, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, and Benjamin Van Durme. "Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting." Proceedings of NAACL-HLT (2019)
  • Rosendahl, Jan, Christian Herold, Yunsu Kim, Miguel Graça, Weiyue Wang, Parnia Bahar, Yingbo Gao and Hermann Ney. "The RWTH Aachen University Machine Translation Systems for WMT 2019". Proceedings of the 4th WMT: Research Papers (2019)
  • Thompson, Brian, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, and Philipp Koehn. "Overcoming catastrophic forgetting during domain adaptation of neural machine translation." Proceedings of NAACL-HLT 2019 (2019)
  • Tättar, Andre, Elizaveta Korotkova, Mark Fishel. "University of Tartu's Multilingual Multi-domain WMT19 News Translation Shared Task Submission". Proceedings of the 4th WMT: Research Papers (2019)
  • Thazin Myint Oo, Ye Kyaw Thu and Khin Mar Soe, "Neural Machine Translation between Myanmar (Burmese) and Rakhine (Arakanese)", In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, NAACL-2019, June 7th 2019, Minneapolis, United States, pp. 80-88

2018

  • Domhan, Tobias. "How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures". Proceedings of 56th ACL (2018)
  • Kim, Yunsu, Yingbo Gao, and Hermann Ney. "Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies." arXiv preprint arXiv:1905.05475 (2019)
  • Korotkova, Elizaveta, Maksym Del, and Mark Fishel. "Monolingual and Cross-lingual Zero-shot Style Transfer." arXiv preprint arXiv:1808.00179 (2018)
  • Niu, Xing, Michael Denkowski, and Marine Carpuat. "Bi-directional neural machine translation with synthetic parallel data." arXiv preprint arXiv:1805.11213 (2018)
  • Niu, Xing, Sudha Rao, and Marine Carpuat. "Multi-Task Neural Models for Translating Between Styles Within and Across Languages." COLING (2018)
  • Post, Matt and David Vilar. "Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation." Proceedings of NAACL-HLT (2018)
  • Schamper, Julian, Jan Rosendahl, Parnia Bahar, Yunsu Kim, Arne Nix, and Hermann Ney. "The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018." Proceedings of the 3rd WMT: Shared Task Papers (2018)
  • Schulz, Philip, Wilker Aziz, and Trevor Cohn. "A stochastic decoder for neural machine translation." arXiv preprint arXiv:1805.10844 (2018)
  • Tamer, Alkouli, Gabriel Bretschner, and Hermann Ney. "On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation." Proceedings of the 3rd WMT: Research Papers (2018)
  • Tang, Gongbo, Rico Sennrich, and Joakim Nivre. "An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation." Proceedings of 3rd WMT: Research Papers (2018)
  • Thompson, Brian, Huda Khayrallah, Antonios Anastasopoulos, Arya McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, and Philipp Koehn. "Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation." arXiv preprint arXiv:1809.05218 (2018)
  • Vilar, David. "Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models." Proceedings of NAACL-HLT (2018)
  • Vyas, Yogarshi, Xing Niu and Marine Carpuat. "Identifying Semantic Divergences in Parallel Text without Annotations". Proceedings of NAACL-HLT (2018)
  • Wang, Weiyue, Derui Zhu, Tamer Alkhouli, Zixuan Gan, and Hermann Ney. "Neural Hidden Markov Model for Machine Translation". Proceedings of 56th ACL (2018)
  • Zhang, Xuan, Gaurav Kumar, Huda Khayrallah, Kenton Murray, Jeremy Gwinnup, Marianna J Martindale, Paul McNamee, Kevin Duh, and Marine Carpuat. "An Empirical Exploration of Curriculum Learning for Neural Machine Translation." arXiv preprint arXiv:1811.00739 (2018)
  • Swe Zin Moe, Ye Kyaw Thu, Hnin Aye Thant and Nandar Win Min, "Neural Machine Translation between Myanmar Sign Language and Myanmar Written Text", In the second Regional Conference on Optical character recognition and Natural language processing technologies for ASEAN languages 2018 (ONA 2018), December 13-14, 2018, Phnom Penh, Cambodia.
  • Tang, Gongbo, Mathias Müller, Annette Rios and Rico Sennrich. "Why Self-attention? A Targeted Evaluation of Neural Machine Translation Architectures." Proceedings of EMNLP (2018)

2017

  • Domhan, Tobias and Felix Hieber. "Using target-side monolingual data for neural machine translation through multi-task learning." Proceedings of EMNLP (2017).

sockeye's People

Contributors

abdelrahmanbadawy, annacurrey, artemsok, b0noi, blchu, bricksdont, davvil, deseaus, fhieber, gonzaloiglesiasiglesias, graehl, hmashlah, hoangcuong2011, hyandell, kellensunderland, ketranm, kpuatamazon, logogin, lorisbaz, marismmm, mjdenkowski, mjpost, samuellarkin, tdomhan, tholiao, tomlippincott, tuglat, xingniu, xinyu-intel, ye-kyaw-thu


sockeye's Issues

Missing params.best

If you set low max_num_epochs and train_early_stop_when values, the params.best file is often missing even though training has succeeded. This is likely because the model never improved over its initial checkpoint.
The params.best file should always be present after a model is trained.

Implementing Copy Mechanism/Pointer Network

Hi, I'm interested in implementing pointer networks / a copy mechanism in Sockeye and would like to know the best way to implement this in your framework.

Best practice of using dropout

Hello,
I used the following options to get 22.15 BLEU, but got terrible scores (6.09 BLEU) when adding dropout.

	python3 -m sockeye.train \
		-s $train_clean.$lang_in \
		-t $train_clean.$lang_out \
		-vs $data_dir/$tune.$lang_in \
		-vt $data_dir/$tune.$lang_out \
		-o $exp_dir/model \
		--rnn-cell-type gru \
		--rnn-num-hidden 1000 \
		--max-seq-len 50:50 \
		--batch-size 80 \
		--checkpoint-frequency 1000 \
		--max-num-checkpoint-not-improved 8 \
		--monitor-bleu -1 \
		--device-ids $gpu_id

I tried

--rnn-dropout .1:.1 \

for Sockeye 1.5.1, and

--rnn-dropout-inputs .1:.1 \
--rnn-dropout-states .1:.1 \
--embed-dropout .1:.1 \ # (either used or not used)

for Sockeye 1.7.1, but they all produced exactly the same low BLEU score.
Any suggestions for using dropout with Sockeye? Thanks!

Using cell type lnlstm causes a training error

Hi, I want to use the built-in LSTM to train a model. However, when I choose the cell type "lnlstm" there is an error. I guess the error is caused by the different dimensions of the i2h and h2h vectors.
Here is my training script.

Training script:

python -m sockeye.train --source /home/user/multi30k/train.en \
                        --target /home/user/multi30k/train.de \
                        --validation-source /home/user/multi30k/val.en \
                        --validation-target /home/user/multi30k/val.de \
                        --output test_train \
                        --encoder rnn --decoder rnn \
                        --num-layers '6:6' \
                        --rnn-num-hidden 512 \
                        --rnn-cell-type lnlstm \
                        --rnn-residual-connections \
                        --optimizer adam \
                        --initial-learning-rate 0.0002 \
                        --learning-rate-reduce-factor 0.7 \
                        --learning-rate-reduce-num-not-improved 8 \
                        --max-num-checkpoint-not-improved 32 \
                        --batch-size 4000 --batch-type word \
                        --rnn-attention-type mlp \
                        --rnn-dropout-inputs 0.1 \
                        --rnn-decoder-hidden-dropout 0.2 \
                        --use-tensorboard \
                        --checkpoint-frequency 4000 \
                        --rnn-attention-in-upper-layers \
                        --device-id

Error message:

[11:33:47] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [11:33:47] src/operator/./slice_channel-inl.h:208: Check failed: dshape[real_axis] % param_.num_outputs == 0U (10 vs. 0) You are trying to split the 0-th axis of input tensor with shape [10,253,512] into num_outputs=100 evenly sized chunks, but this is not possible because 100 does not evenly divide 10

Stack trace returned 10 entries:
[bt] (0) /home/user/miniconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x28965c) [0x7f5d6776565c]
[bt] (1) /home/user/miniconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2964e97) [0x7f5d69e40e97]
[bt] (2) /home/user/miniconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2688337) [0x7f5d69b64337]
[bt] (3) /home/user/miniconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x249c52f) [0x7f5d6997852f]
[bt] (4) /home/user/miniconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x249f039) [0x7f5d6997b039]
[bt] (5) /home/user/miniconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2482aa9) [0x7f5d6995eaa9]
[bt] (6) /home/user/miniconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2483564) [0x7f5d6995f564]
[bt] (7) /home/user/miniconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXExecutorSimpleBind+0x2250) [0x7f5d698cec80]
[bt] (8) /home/user/miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f5d84fcf550]
[bt] (9) /home/user/miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(ffi_call+0x1f5) [0x7f5d84fcecf5]

I modified sockeye/rnn.py:215

        if True or self._counter == 0:
            self._shape_fix = mx.sym.zeros_like(i2h)
        else:
            assert self._shape_fix is not None

        h2h = mx.sym.FullyConnected(data=states[0], weight=self._hW, bias=self._hB,
                                    num_hidden=self._num_hidden * 4,
                                    name='%sh2h' % name)

        gates = self._hN.normalize(self._shape_fix + h2h)

I always allocate self._shape_fix so training can run without an exception.

Recommended parameters for the transformer model

Hi,
what are your recommended command line options for the transformer model? I would like to see if I can reproduce the results from tensor2tensor with sockeye. My first attempts with your transformer implementation do not really seem to converge.

My current options:

python3 -m sockeye.train -s train.en \
                         -t train.de \
                         -vs valid.en \
                         -vt valid.de \
                         --num-embed 512 \
                         --rnn-num-hidden 2048 \
                         --attention-type dot \
                         --max-seq-len 50 \
                         --encoder transformer --decoder transformer \
                         --num-layers 6 --layer-normalization \
                         --weight-tying-type src_trg_softmax --weight-tying \
                         -o wmt_model

I guess some learning rate magic might be needed?
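For comparison only, here is a rough sketch of settings closer to the usual Transformer recipe. This is not an official recommendation: most flags are copied from other commands on this page, --label-smoothing is an assumption, and the exact options should be checked against python3 -m sockeye.train --help; the learning rate schedule in particular usually needs tuning:

python3 -m sockeye.train -s train.en \
                         -t train.de \
                         -vs valid.en \
                         -vt valid.de \
                         --encoder transformer --decoder transformer \
                         --num-layers 6 \
                         --num-embed 512 \
                         --max-seq-len 100 \
                         --batch-type word --batch-size 4096 \
                         --optimizer adam --initial-learning-rate 0.0002 \
                         --weight-tying --weight-tying-type src_trg_softmax \
                         --label-smoothing 0.1 \
                         -o wmt_model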

What is the purpose of the equivalent NDArray implementation for OutputLayer?

It's never been used.

sockeye/sockeye/layers.py

Lines 183 to 191 in c0f1296

# Equivalent NDArray implementation (requires passed weights/biases)
assert isinstance(hidden, mx.nd.NDArray)
utils.check_condition(weight is not None and bias is not None,
                      "OutputLayer NDArray implementation requires passing weight and bias NDArrays.")
return mx.nd.FullyConnected(data=hidden,
                            num_hidden=bias.shape[0],
                            weight=weight,
                            bias=bias,
                            flatten=False)

I'm not sure if it's proper to ask questions like this in Issues.

ModelConfig getting cluttered with parameters

With more and more flags being added to training, the ModelConfig object gets cluttered with parameters (and their default settings) that are often not even used, as they are conditional parameters.
It would be much better to have hierarchical configs that are put together into a composite configuration. We could split configurations into

  • encoder
  • decoder
  • attention

configurations, each of which is an individual configuration object.

Configuration objects should support JSON serialization, ideally also for nested configurations.

We might not be able to continue to use a NamedTuple for a config object, as it may contain another config object as a member and this cannot be easily serialized.

Let's discuss whether we should come up with our own ConfigObject implementation or use some existing library.
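As an illustration only (not Sockeye's actual implementation; all class and attribute names below are made up), a minimal sketch of nested config objects with JSON serialization:

import json


class Config:
    """Minimal base class: nested Config members serialize recursively to plain dicts."""

    def to_dict(self):
        return {k: (v.to_dict() if isinstance(v, Config) else v)
                for k, v in vars(self).items()}

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), indent=2)


class EncoderConfig(Config):
    def __init__(self, cell_type="lstm", num_layers=1, num_hidden=512):
        self.cell_type = cell_type
        self.num_layers = num_layers
        self.num_hidden = num_hidden


class AttentionConfig(Config):
    def __init__(self, attention_type="mlp", num_hidden=512):
        self.attention_type = attention_type
        self.num_hidden = num_hidden


class ModelConfig(Config):
    def __init__(self, encoder: EncoderConfig, attention: AttentionConfig):
        self.encoder = encoder
        self.attention = attention


# Composite configuration serialized as nested JSON:
print(ModelConfig(EncoderConfig(num_layers=2), AttentionConfig()).to_json())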

fairseq support

Is sockeye going to support Facebook's fairseq and Google's tensor2tensor?

no pair has frequency >= 2. Stopping

Hi all,
python -m learn_joint_bpe_and_vocab --input corpus.en corpus.ch -s 30000 -o bpe.codes --write-vocabulary bpe.vocab.en bpe.vocab.ch
After running the above command, the following message appears: "no pair has frequency >= 2. Stopping".
I don't understand what this message means. I hope someone can give me an answer, thank you!

Segmentation Fault when training with GPU

When I am trying the demo introduced on this blog:

python3 -m sockeye.train -s ~/wmt17/train.de \
                         -t ~/wmt17/train.en \
                         -vs ~/wmt17/newstest2016.tc.de \
                         -vt ~/wmt17/newstest2016.tc.en \
                         --num-embed 128 \
                         --rnn-num-hidden 512 \
                         --attention-type dot \
                         --dropout 0.5 \
                         -o modeldeen

it shows the Segmentation fault below:

[INFO:__main__] Optimizer Parameters: {'wd': 0.0, 'rescale_grad': 0.015625, 'learning_rate': 0.0003, 'clip_gradient': 1.0, 'lr_scheduler': LearningRateSchedulerPlateauReduce(reduce_factor=0.50, reduce_num_not_improved=0)}
[INFO:sockeye.model] Saved config to "/home/ec2-user/sockeye/modeldeen/config"
[INFO:sockeye.training] Training started.
[INFO:sockeye.callback] Early stopping by optimizing 'perplexity' (minimize=True)
/usr/lib64/python3.4/multiprocessing/semaphore_tracker.py:129: UserWarning: semaphore_tracker: There appear to be 3 leaked semaphores to clean up at shutdown
  len(cache))
traindeen.sh: line 9:  3380 Segmentation fault      python3 -m sockeye.train -s ~/wmt17/train.de -t ~/wmt17/train.en -vs ~/wmt17/newstest2016.tc.de -vt ~/wmt17/newstest2016.tc.en --num-embed 128 --rnn-num-hidden 512 --attention-type dot --dropout 0.5 -o modeldeen

After I added --use-cpu, it works fine.


Or if I delete the args

                         --num-embed 128 \
                         --rnn-num-hidden 512 \
                         --attention-type dot \
                         --dropout 0.5 \

It works fine.

Initialize embedding weights with pretrained word representations

In order to initialize Sockeye embedding weights with pretrained word representations, I am considering the following workflow:

  1. Pretrain word representations using Corpus-A and get Embed-A and Vocab-A.
  2. Create the vocabulary of Corpus-B which will be used for NMT, i.e. Vocab-B. (Using sockeye-vocab)
  3. Embed-A, Vocab-A, Vocab-B -> Embed-B.
    3.1. Load Embed-A and Vocab-A;
    3.2. Create Embed-B with random values;
    3.3. Copy word representations from Embed-A to Embed-B for overlaps between Vocab-A and Vocab-B.
    3.4. Store Embed-B in the format of Dict[str, mx.nd.NDArray], str can be for example "target_embed_weight".
  4. Start Sockeye training by loading Embed-B with model parameter --params.

Is this workflow correct? If so I can create a tool to do Step 3.
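A rough sketch of Step 3 (illustration only; this is not an existing Sockeye tool, and the function name and the parameter key "target_embed_weight" are assumptions taken from the description above):

import mxnet as mx
import numpy as np


def build_embed_b(embed_a: np.ndarray, vocab_a: dict, vocab_b: dict,
                  param_name: str = "target_embed_weight"):
    """Create Embed-B with random values and copy rows from Embed-A for words
    shared between Vocab-A and Vocab-B; return it as Dict[str, mx.nd.NDArray]."""
    num_embed = embed_a.shape[1]
    embed_b = np.random.normal(scale=0.01, size=(len(vocab_b), num_embed))
    for word, idx_b in vocab_b.items():
        if word in vocab_a:
            embed_b[idx_b] = embed_a[vocab_a[word]]
    return {param_name: mx.nd.array(embed_b)}


# The resulting dict could then be stored with mx.nd.save("embed_b.params", ...)
# and loaded for training via --params (Step 4 above).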

Adding metadata for source sentences as arguments

Greetings,

Some published NMT architectures rely on adding extra linguistic information to the source sentence. Examples include POS tags (https://www.aclweb.org/anthology/U/U16/U16-1001.pdf) and syntactic trees (http://www.aclweb.org/anthology/P16-1078 https://arxiv.org/pdf/1704.04675.pdf). This issue is an enhancement idea to add provisions to sockeye to read this kind of metadata into the toolkit.

At least initially, the idea is not to implement any of these encoders, but just to make sure the metadata is properly read and propagated all the way to the encoder interface. I was thinking about adding an optional keyword argument "metadata" to the "encode" method in the Encoder interface. Then users can extend or implement their own encoders while safely assuming the metadata (POS, syntactic trees, etc.) will be available in that metadata field.

I started to implement this idea in the context of adding semantic graphs to the input. Code is being hosted here:
https://github.com/ws17mt/sockeye/tree/source_metadata

I am happy to keep working on this but some feedback would be appreciated. Specifically:

  • Is the "metadata keyword" approach the best?
  • How would this affect bucketing? Right now I am just propagating the metadata through the bucketing interface to make sure everything reaches the decoder, but the bucketing approaches might not be useful in this setting anymore.
  • How to make things work at decoding time?
  • I am assuming tokens in the metadata are separated by spaces so I can just reuse the data reading methods. Not sure if this is the best approach though.

Fused Bidirectional Encoder

Hi,

The fused bidirectional encoder seems to be disabled at all times (code snippet), even when the user specifies that they prefer a bidirectional encoder, as well as fusion (via cuDNN). Is this something that is known to be problematic and is hence turned off?

Mentioning @fhieber here since this is probably of interest.

NameError: name 'broadcast_maximum' is not defined

OS: Ubuntu 16.04
Python: 3.5.2
mxnet: 0.11.1
sockeye: 1.5.1

Run:

Error:
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/lianghong/Project/github/sockeye/sockeye/train.py", line 417, in <module>
    main()
  File "/home/lianghong/Project/github/sockeye/sockeye/train.py", line 413, in main
    mxmonitor_stat_func=args.monitor_stat_func)
  File "/home/lianghong/Project/github/sockeye/sockeye/training.py", line 246, in fit
    mxmonitor=monitor)
  File "/home/lianghong/Project/github/sockeye/sockeye/training.py", line 324, in _fit
    self.module.update_metric(metric_train, batch.label)
  File "/home/lianghong/Project/github/mxnet/python/mxnet/module/bucketing_module.py", line 488, in update_metric
    self._curr_module.update_metric(eval_metric, labels)
  File "/home/lianghong/Project/github/mxnet/python/mxnet/module/module.py", line 735, in update_metric
    self.exec_group.update_metric(eval_metric, labels)
  File "/home/lianghong/Project/github/mxnet/python/mxnet/module/executor_group.py", line 582, in update_metric
    eval_metric.update_dict(labels_, preds)
  File "/home/lianghong/Project/github/mxnet/python/mxnet/metric.py", line 280, in update_dict
    metric.update_dict(labels, preds)
  File "/home/lianghong/Project/github/mxnet/python/mxnet/metric.py", line 108, in update_dict
    self.update(label, pred)
  File "/home/lianghong/Project/github/mxnet/python/mxnet/metric.py", line 657, in update
    loss -= ndarray.sum(ndarray.log(ndarray.maximum(1e-10, pred))).asscalar()
  File "/home/lianghong/Project/github/mxnet/python/mxnet/ndarray/ndarray.py", line 1793, in maximum
    broadcast_maximum,
NameError: name 'broadcast_maximum' is not defined

Many thanks.

Unsolicited acquisition of GPU device

For some reason, sockeye acquires a device I did not ask for. For the following training command:

python3 -m sockeye.train \
--decode-and-evaluate-device-id 1 \
--device-ids 5 6 7 \
--lock-dir /var/tmp \
[...]

(The command includes all relevant options; I omitted the rest because I suspect they do not matter.) After the devices are acquired, the output of gpustat is:

[0] GeForce GTX TITAN X | 40'C, 0 % | 217 / 12207 MB | mmueller(213M)
[1] GeForce GTX TITAN X | 42'C, 0 % | 2 / 12207 MB |
[2] GeForce GTX TITAN X | 66'C, 85 % | 11146 / 12207 MB | [not me]
[3] GeForce GTX TITAN X | 62'C, 87 % | 9995 / 12207 MB | [not me]
[4] GeForce GTX TITAN X | 68'C, 81 % | 11231 / 12207 MB | [not me]
[5] GeForce GTX TITAN X | 77'C, 60 % | 6631 / 12207 MB | mmueller(6627M)
[6] GeForce GTX TITAN X | 70'C, 95 % | 6095 / 12207 MB | mmueller(6091M)
[7] GeForce GTX TITAN X | 67'C, 92 % | 6094 / 12207 MB | mmueller(6090M)

Everything as expected, GPU 1 is not showing up yet because it is the validation device, but on device 0, 213 MB are occupied. And nvidia-smi shows that the same Python 3 process has taken those 213 MB on device 0:

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 14929 C python3 213MiB |
| 2 8981 C python 11144MiB |
| 3 18504 C python 9993MiB |
| 4 5915 C python 11229MiB |
| 5 14929 C python3 6627MiB |
| 6 14929 C python3 6091MiB |
| 7 14929 C python3 6090MiB |
+-----------------------------------------------------------------------------+

In my lock directory, gpu0 does not even have a lock:

$ ll /var/tmp/sockeye.*.lock
-rw-r--r-- 1 mmueller users 6 Jan 17 17:37 /var/tmp/sockeye.gpu5.lock
-rw-r--r-- 1 mmueller users 6 Jan 17 17:37 /var/tmp/sockeye.gpu6.lock
-rw-r--r-- 1 mmueller users 6 Jan 17 17:37 /var/tmp/sockeye.gpu7.lock

Any idea why this is happening? Thanks!

Problems with building from source

When I try to build from source using "python setup.py install -r requirements.gpu-cu80.txt" I get the following error:

Searching for mxnet-cu80==0.10.0
Reading https://pypi.python.org/simple/mxnet-cu80/
No local packages or working download links found for mxnet-cu80==0.10.0
error: Could not find suitable distribution for Requirement.parse('mxnet-cu80==0.10.0')

However, typing "pip install mxnet-cu80==0.10.0" solves that problem.

Next, when I run "pip install -e '.[optional]'" the command fails on the first attempt with the following output:

Obtaining file:///home/pschulz/sockeye
Collecting pyyaml (from sockeye==1.7.1)
Downloading PyYAML-3.12.tar.gz (253kB)
100% |████████████████████████████████| 256kB 5.9MB/s
Collecting mxnet==0.10.0 (from sockeye==1.7.1)
Using cached mxnet-0.10.0-py2.py3-none-manylinux1_x86_64.whl
Requirement already satisfied: numpy>=1.12 in /home/pschulz/toy-env/lib/python3.6/site-packages/numpy-1.13.1-py3.6-linux-x86_64.egg (from sockeye==1.7.1)
Collecting tensorboard (from sockeye==1.7.1)
Using cached tensorboard-1.0.0a6-cp36-cp36m-manylinux1_x86_64.whl
Collecting matplotlib (from sockeye==1.7.1)
Using cached matplotlib-2.0.2-cp36-cp36m-manylinux1_x86_64.whl
Collecting werkzeug>=0.11.10 (from tensorboard->sockeye==1.7.1)
Using cached Werkzeug-0.12.2-py2.py3-none-any.whl
Requirement already satisfied: wheel>=0.26 in /home/pschulz/toy-env/lib/python3.6/site-packages (from tensorboard->sockeye==1.7.1)
Collecting six>=1.10.0 (from tensorboard->sockeye==1.7.1)
Using cached six-1.10.0-py2.py3-none-any.whl
Collecting Pillow>=4.0.0 (from tensorboard->sockeye==1.7.1)
Using cached Pillow-4.2.1-cp36-cp36m-manylinux1_x86_64.whl
Collecting protobuf>=3.1.0 (from tensorboard->sockeye==1.7.1)
Using cached protobuf-3.4.0-cp36-cp36m-manylinux1_x86_64.whl
Collecting pytz (from matplotlib->sockeye==1.7.1)
Using cached pytz-2017.2-py2.py3-none-any.whl
Collecting cycler>=0.10 (from matplotlib->sockeye==1.7.1)
Using cached cycler-0.10.0-py2.py3-none-any.whl
Collecting pyparsing!=2.0.0,!=2.0.4,!=2.1.2,!=2.1.6,>=1.5.6 (from matplotlib->sockeye==1.7.1)
Using cached pyparsing-2.2.0-py2.py3-none-any.whl
Collecting python-dateutil (from matplotlib->sockeye==1.7.1)
Using cached python_dateutil-2.6.1-py2.py3-none-any.whl
Collecting olefile (from Pillow>=4.0.0->tensorboard->sockeye==1.7.1)
Requirement already satisfied: setuptools in /home/pschulz/toy-env/lib/python3.6/site-packages (from protobuf>=3.1.0->tensorboard->sockeye==1.7.1)
Building wheels for collected packages: pyyaml
Running setup.py bdist_wheel for pyyaml ... done
Stored in directory: /home/pschulz/.cache/pip/wheels/2c/f7/79/13f3a12cd723892437c0cfbde1230ab4d82947ff7b3839a4fc
Successfully built pyyaml
Installing collected packages: pyyaml, mxnet, werkzeug, six, olefile, Pillow, protobuf, tensorboard, pytz, cycler, pyparsing, python-dateutil, matplotlib, sockeye
Found existing installation: sockeye 1.7.1
Uninstalling sockeye-1.7.1:
Successfully uninstalled sockeye-1.7.1
Running setup.py develop for sockeye
Successfully installed Pillow-4.2.1 cycler-0.10.0 matplotlib-2.0.2 mxnet-0.10.0 olefile-0.44 protobuf-3.4.0 pyparsing-2.2.0 python-dateutil-2.6.1 pytz-2017.2 pyyaml-3.12 six-1.10.0 sockeye tensorboard-1.0.0a6 werkzeug-0.12.2
Traceback (most recent call last):
  File "/home/pschulz/toy-env/bin/pip", line 11, in <module>
    sys.exit(main())
  File "/home/pschulz/toy-env/lib/python3.6/site-packages/pip/__init__.py", line 233, in main
    return command.main(cmd_args)
  File "/home/pschulz/toy-env/lib/python3.6/site-packages/pip/basecommand.py", line 252, in main
    pip_version_check(session)
  File "/home/pschulz/toy-env/lib/python3.6/site-packages/pip/utils/outdated.py", line 102, in pip_version_check
    installed_version = get_installed_version("pip")
  File "/home/pschulz/toy-env/lib/python3.6/site-packages/pip/utils/__init__.py", line 838, in get_installed_version
    working_set = pkg_resources.WorkingSet()
  File "/home/pschulz/toy-env/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 644, in __init__
    self.add_entry(entry)
  File "/home/pschulz/toy-env/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 700, in add_entry
    for dist in find_distributions(entry, True):
  File "/home/pschulz/toy-env/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 1949, in find_eggs_in_zip
    if metadata.has_metadata('PKG-INFO'):
  File "/home/pschulz/toy-env/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 1463, in has_metadata
    return self.egg_info and self._has(self._fn(self.egg_info, name))
  File "/home/pschulz/toy-env/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 1823, in _has
    return zip_path in self.zipinfo or zip_path in self._index()
  File "/home/pschulz/toy-env/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 1703, in zipinfo
    return self._zip_manifests.load(self.loader.archive)
  File "/home/pschulz/toy-env/lib/python3.6/site-packages/pip/_vendor/pkg_resources/__init__.py", line 1643, in load
    mtime = os.stat(path).st_mtime
FileNotFoundError: [Errno 2] No such file or directory: '/home/pschulz/toy-env/lib/python3.6/site-packages/sockeye-1.7.1-py3.6.egg'

It does, however, succeed when I run it a second time. Any idea where this problem might stem from?

Thanks a lot in advance.

Potential error of "permission denied" when trying to open lock files

A "permission denied" error would be raised if the lock file was owned by another user.
Is moving L552 into try-except a possible solution?

sockeye/sockeye/utils.py

Lines 549 to 575 in d436ae8

def __enter__(self) -> Optional[int]:
    for gpu_id in self.candidates:
        lockfile_path = os.path.join(self.lock_dir, "sockeye.gpu%d.lock" % gpu_id)
        lock_file = open(lockfile_path, 'w')
        try:
            # exclusive non-blocking lock
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
            # got the lock, let's write our PID into it:
            lock_file.write("%d\n" % os.getpid())
            lock_file.flush()
            self._acquired_lock = True
            self.gpu_id = gpu_id
            self.lock_file = lock_file
            self.lockfile_path = lockfile_path
            logger.info("Acquired GPU %d." % gpu_id)
            return gpu_id
        except IOError as e:
            # raise on unrelated IOErrors
            if e.errno != errno.EAGAIN:
                logger.error("Failed acquiring GPU lock.", exc_info=True)
                raise
            else:
                logger.debug("GPU %d is currently locked.", gpu_id)
    return None
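A minimal sketch of the suggested change (illustration only, not the current implementation): wrapping the open() call in the loop above so that a lock file owned by another user is skipped instead of raising:

        try:
            lock_file = open(lockfile_path, 'w')
        except PermissionError:
            # lock file exists but is owned by another user; try the next GPU candidate
            logger.debug("Could not open lock file %s, skipping GPU %d.", lockfile_path, gpu_id)
            continue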

python versioning

Is anyone working on a Python 2.7 version of the sacrebleu library? If not, are there any particular roadblocks I should be aware of if I try to convert it myself?

Question on CrossEntropyMetric in loss.py line 159

When computing smoothed cross entropy, my understanding is that the purpose of this line:
label_dist = mx.nd.where(ignore, label_dist, mx.nd.zeros_like(label_dist))
is to set the distribution at PAD positions to zero while keeping the smoothed label distribution for non-PAD elements. If so, shouldn't the condition in nd.where() be 1 - ignore rather than ignore?
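For context, mx.nd.where(condition, x, y) takes elements from x where condition is non-zero and from y otherwise, so the correct condition depends on whether ignore is 1 at PAD positions or at non-PAD positions. A small illustration (assuming an MXNet environment):

import mxnet as mx

cond = mx.nd.array([1, 0, 1])
x = mx.nd.array([10, 10, 10])
y = mx.nd.array([0, 0, 0])
print(mx.nd.where(cond, x, y))  # [10. 0. 10.] -> values from x where cond is 1, from y where cond is 0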

End-to-end tests?

We're reaching a large enough number of components that it's probably not reasonable to expect everyone to know what everyone else is working on and avoid potential breaks. Our unit tests are doing a great job catching local problems but it would be nice to run end-to-end tests to make sure our code produces output, however bad.

One option would be to build a tiny model on literally a few lines of data just to make sure we can go from data to model to decoded output and a zero exit code. I can look at this next week. Anyone else have thoughts on automated end-to-end tests?
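A minimal sketch of such a test (illustration only; the file names are placeholders, and the flags --max-num-epochs, -m, --input and --output are assumptions that should be checked against the train/translate --help output):

set -e
python3 -m sockeye.train -s tiny.src -t tiny.trg -vs tiny.src -vt tiny.trg \
                         --max-num-epochs 1 --batch-size 2 -o tiny_model
python3 -m sockeye.translate -m tiny_model --input tiny.src --output tiny.out
test -s tiny.out  # non-empty output plus a zero exit code means the pipeline ran end-to-end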

monitor-bleu option does not exist anymore

The WMT15 English-German translation example seems to be having issues. I can't run it exactly as described in the README and successfully run training.

For instance, that example uses --monitor-bleu arg, which no longer exists in top-of-tree Sockeye.

I'm not sure if the solution is to change the metric from perplexity to BLEU for monitoring (that would be very slow), or keep monitoring perplexity, and report BLEU after training, by post-processing the inference output (fast but doesn't give useful feedback during training, since perplexity and BLEU are very loosely correlated). However, the example won't work in its current state.

Also, running the example as-is, by just removing --monitor-bleu from the command line args, fails due to a disagreement in the number of source and target text lines (see below). It seems to me that the data preprocessing does not clean up empty lines. That's at least how Sockeye used to fail for me before I removed empty lines from the parallel corpora. See the example of my solution. It's kind of simplistic compared to the byte-pair encoding and other preprocessing steps used in the official Sockeye example, but it runs at least.

[INFO:sockeye.vocab] Building vocabulary from dataset(s): ['corpus.tc.BPE.de']
[INFO:sockeye.vocab] Vocabulary: types: 22305/22305/22305/22309 (initial/min_pruned/max_pruned/+special) [min_frequency=1, max_num_types=50000]
[INFO:sockeye.vocab] Building vocabulary from dataset(s): ['corpus.tc.BPE.en']
[INFO:sockeye.vocab] Vocabulary: types: 22662/22662/22662/22666 (initial/min_pruned/max_pruned/+special) [min_frequency=1, max_num_types=50000]
[INFO:sockeye.data_io] ===============================
[INFO:sockeye.data_io] Creating training data iterator
[INFO:sockeye.data_io] ===============================
[INFO:sockeye.utils] Releasing GPU 0.
[ERROR:__main__] Uncaught exception
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.5/dist-packages/sockeye-1.16.0-py3.5.egg/sockeye/train.py", line 822, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/sockeye-1.16.0-py3.5.egg/sockeye/train.py", line 746, in main
    output_folder=output_folder)
  File "/usr/local/lib/python3.5/dist-packages/sockeye-1.16.0-py3.5.egg/sockeye/train.py", line 362, in create_data_iters_and_vocab
    bucket_width=args.bucket_width)
  File "/usr/local/lib/python3.5/dist-packages/sockeye-1.16.0-py3.5.egg/sockeye/data_io.py", line 736, in get_training_data_iters
    max_seq_len_source, max_seq_len_target)
  File "/usr/local/lib/python3.5/dist-packages/sockeye-1.16.0-py3.5.egg/sockeye/data_io.py", line 230, in analyze_sequence_lengths
    "Different number of lines in source and target data.")
  File "/usr/local/lib/python3.5/dist-packages/sockeye-1.16.0-py3.5.egg/sockeye/utils.py", line 117, in check_condition
    raise SockeyeError(error_message)
sockeye.utils.SockeyeError: Different number of lines in source and target data.

GPU training fails

Hey guys,

just started building sockeye using pip (as described in the readme). Before, I would always install mxnet by hand and use "python setup.py develop --no-deps". That strategy still works. Now with pip the dependencies are satisfied, but when I start training the model I get the following error:

train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [INFO:sockeye.loss] Loss: CrossEntropy(normalization_type=valid, label_smoothing=0.0)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [INFO:sockeye.training] Using bucketing. Default max_seq_len=(100, 100)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [INFO:__main__] Optimizer: adam
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [INFO:__main__] Optimizer Parameters: {'wd': 0.0, 'learning_rate': 0.0003, 'lr_scheduler': LearningRateSchedulerPlateauReduce(reduce_factor=0.50, reduce_num_not_improved=0), 'clip_gradient': 1.0, 'rescale_grad': 1.0}
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [INFO:__main__] kvstore: device
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [INFO:sockeye.model] Saved config to "/data/philip/variational-latent-variable/train_model/Attention.dot+LatentDim.1024+RnnCell.gru/model/config"
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [06:58:53] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [06:58:53] src/storage/storage.cc:113: Compile with USE_CUDA=1 to enable GPU usage
train_model/Attention.dot+LatentDim.1024+RnnCell.gru:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: Stack trace returned 10 entries:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (0) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x175b6c) [0x7f45bf6cdb6c]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (1) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x255adaf) [0x7f45c1ab2daf]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (2) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x255d2a7) [0x7f45c1ab52a7]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (3) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x255d81c) [0x7f45c1ab581c]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (4) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1ff8741) [0x7f45c1550741]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (5) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2015e24) [0x7f45c156de24]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (6) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1ffd70a) [0x7f45c155570a]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (7) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2001827) [0x7f45c1559827]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (8) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x20095ea) [0x7f45c15615ea]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (9) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2009ce4) [0x7f45c1561ce4]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [INFO:sockeye.utils] Releasing GPU 1.
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [INFO:sockeye.utils] Releasing GPU 0.
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [ERROR:__main__] UNCAUGHT EXCEPTION
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: Traceback (most recent call last):
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1485, in simple_bind
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: ctypes.byref(exe_handle)))
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/base.py", line 146, in check_call
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: raise MXNetError(py_str(_LIB.MXGetLastError()))
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: mxnet.base.MXNetError: [06:58:53] src/storage/storage.cc:113: Compile with USE_CUDA=1 to enable GPU usage
train_model/Attention.dot+LatentDim.1024+RnnCell.gru:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: Stack trace returned 10 entries:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (0) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x175b6c) [0x7f45bf6cdb6c]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (1) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x255adaf) [0x7f45c1ab2daf]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (2) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x255d2a7) [0x7f45c1ab52a7]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (3) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x255d81c) [0x7f45c1ab581c]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (4) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1ff8741) [0x7f45c1550741]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (5) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2015e24) [0x7f45c156de24]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (6) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1ffd70a) [0x7f45c155570a]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (7) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2001827) [0x7f45c1559827]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (8) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x20095ea) [0x7f45c15615ea]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (9) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2009ce4) [0x7f45c1561ce4]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: During handling of the above exception, another exception occurred:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: Traceback (most recent call last):
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/python-3.2.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: "__main__", mod_spec)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/python-3.2.6/lib/python3.6/runpy.py", line 85, in _run_code
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: exec(code, run_globals)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/mxnet-env/lib/python3.6/site-packages/sockeye-1.11.1-py3.6.egg/sockeye/train.py", line 792, in <module>
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/mxnet-env/lib/python3.6/site-packages/sockeye-1.11.1-py3.6.egg/sockeye/train.py", line 788, in main
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/mxnet-env/lib/python3.6/site-packages/sockeye-1.11.1-py3.6.egg/sockeye/training.py", line 274, in fit
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: for_training=True, force_rebind=True, grad_req='write')
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/module/bucketing_module.py", line 324, in bind
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: force_rebind=False, shared_module=None, grad_req=grad_req)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/module/module.py", line 417, in bind
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: state_names=self._state_names)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 231, in __init__
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: self.bind_exec(data_shapes, label_shapes, shared_group)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 327, in bind_exec
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: shared_group))
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 603, in _bind_ith_exec
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: shared_buffer=shared_data_arrays, **input_shapes)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: File "/home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1491, in simple_bind
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: raise RuntimeError(error_msg)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: RuntimeError: simple_bind error. Arguments:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: source: (40, 100)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: target: (40, 100)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: target_label: (40, 100)
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [06:58:53] src/storage/storage.cc:113: Compile with USE_CUDA=1 to enable GPU usage
train_model/Attention.dot+LatentDim.1024+RnnCell.gru:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: Stack trace returned 10 entries:
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (0) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x175b6c) [0x7f45bf6cdb6c]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (1) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x255adaf) [0x7f45c1ab2daf]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (2) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x255d2a7) [0x7f45c1ab52a7]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (3) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x255d81c) [0x7f45c1ab581c]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (4) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1ff8741) [0x7f45c1550741]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (5) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2015e24) [0x7f45c156de24]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (6) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1ffd70a) [0x7f45c155570a]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (7) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2001827) [0x7f45c1559827]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (8) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x20095ea) [0x7f45c15615ea]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru: [bt] (9) /home/philip/mxnet-env/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2009ce4) [0x7f45c1561ce4]
train_model/Attention.dot+LatentDim.1024+RnnCell.gru:
Failed train_model/Attention.dot+LatentDim.1024+RnnCell.gru/22: Task train_model/Attention.dot+LatentDim.1024+RnnCell.gru/22 failed

sockeye/git_version.py not deleted if tests fail

If a test fails while running python setup.py test, the file sockeye/git_version.py is not deleted. Running the tests again (and probably other commands in setup) fails with

Traceback (most recent call last):
  File "setup.py", line 128, in <module>
    with temporarily_write_git_hash(get_git_hash()):
  File "/Users/dvilar/anaconda3/lib/python3.6/contextlib.py", line 82, in __enter__
    return next(self.gen)
  File "setup.py", line 48, in temporarily_write_git_hash
    raise RuntimeError("%s already exists, will not overwrite" % filename)
RuntimeError: sockeye/git_version.py already exists, will not overwrite
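A minimal sketch of the kind of fix this suggests (illustrative only, not the actual setup.py code; the names follow the traceback above): let temporarily_write_git_hash remove the file in a finally block, so it is cleaned up even when the wrapped command fails.

import os
from contextlib import contextmanager

@contextmanager
def temporarily_write_git_hash(git_hash, filename="sockeye/git_version.py"):
    # Refuse to clobber an existing file, as the current code does.
    if os.path.exists(filename):
        raise RuntimeError("%s already exists, will not overwrite" % filename)
    with open(filename, "w") as out:
        out.write('git_hash = "%s"\n' % git_hash)
    try:
        yield
    finally:
        # Remove the file even if the wrapped command (e.g. the test run) raised.
        os.remove(filename)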

CNN Decoder only supports a maximum length of 128, but 130 was requested....

Hi! When I try to train (and translate with) a CNN encoder & decoder, I constantly get a maximum length error like

sockeye.utils.SockeyeError: Decoder only supports a maximum length of 128, but 130 was requested. Note that the maximum output length depends on the input length and the source/target length ratio observed during training.

I did get it to work by specifying a small enough maximum input length (see the rough calculation sketched after the training parameters below):

python -m sockeye.translate --models . \
                            --use-cpu \
                            --max-input-len 97 \
                            < decode.source \
                            > decode.output

Training parameters I used:

python -m sockeye.train --prepared-data data-prepared \
                        -vs newsdev2017.bpe.en \
                        -vt newsdev2017.bpe.lv \
                        --batch-size 128 \
                        --num-embed 512 \
                        --decode-and-evaluate -1 \
                        --checkpoint-frequency 3000 \
                        --use-tensorboard \
                        -o wmt_conv_model \
                        --encoder cnn \
                        --decoder cnn \
                        --num-layers 6:6 \
                        --max-seq-len 128
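For reference, a hedged back-of-the-envelope way to choose --max-input-len so that the predicted output length stays within the decoder's limit; the formula below is illustrative only, since the actual limit also depends on the source/target length-ratio statistics gathered during training, as the error message says.

import math

def safe_max_input_len(decoder_limit: int, length_ratio: float, slack: int = 2) -> int:
    # Keep input_len * length_ratio (plus a little slack) below the decoder's
    # trained maximum output length. Illustrative only, not Sockeye's formula.
    return math.floor((decoder_limit - slack) / length_ratio)

print(safe_max_input_len(128, 1.3))  # 96, close to the 97 that worked above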

smallest_k_mx just partitions but does not sort

Writing this down before I forget about it: _beam_search expects the first hypothesis to be the best hypothesis (see _get_best_from_beam). The function utils.smallest_k guarantees this by sorting the partition returned by numpy. smallest_k_mx does not, as it takes the output of mxnet's topk as-is. Right now this is not an issue because 1. smallest_k_mx is not used and 2. mxnet's topk actually sorts. However, we cannot rely on a particular implementation of topk and should change smallest_k_mx accordingly.
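A minimal numpy sketch of the guarantee in question (illustrative, not the actual utils code): partition to find the k smallest scores, then explicitly sort those k candidates so that index 0 is always the best hypothesis. An MXNet-based smallest_k_mx would need the same explicit sort instead of relying on the current behaviour of topk.

import numpy as np

def smallest_k_sorted(scores: np.ndarray, k: int):
    # argpartition finds the k smallest entries in O(n), but in arbitrary order...
    flat = scores.reshape(-1)
    candidates = np.argpartition(flat, k - 1)[:k]
    # ...so sort just those k candidates to make index 0 the best hypothesis.
    best_first = candidates[np.argsort(flat[candidates])]
    return np.unravel_index(best_first, scores.shape), flat[best_first]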

Alternative suggestions - Nearest Neighbours

Hi,

I'm looking to find alternative translations where the user would be able to change words in the translated sentence.

Here is an example: English to French
"I love apples." -> "J'aimes les pommes."

The user would be able to click on "aimes" and see other possibilities like "adores", "..", etc.
When the user changes a word, the sentence needs to be retranslated given the choice the user submitted.
Two challenges:
1. The nearest neighbour of "aimes" given the context of the sentence "I love apples."
2. Also, how to recalculate the other words given some words we NEED to use?
ex: I love apples -> J'adores .....
Any advice on where to start?

Simon

raw_corpus_bleu doesn't work for tokenized sentences

if (tokenize == 'none' or not force) and lines[0].rstrip().endswith(' .'):
    tokenized_count += 1
    if tokenized_count > 100:
        logging.error('FATAL: That\'s > 100 lines that end in a tokenized period (\'.\')')
        logging.error('It looks like you forgot to detokenize your test data, which will hurt your score.')
        logging.error('If you insist your data is tokenized, rerun with \'--force\'.')
        sys.exit(1)

raw_corpus_bleu sets force=True and tokenize='none', so the condition quoted above (around line 800 of the source) is still satisfied. Should it be
if (tokenize != 'none' or not force) and lines[0].rstrip().endswith(' .'):
or even
if not force and lines[0].rstrip().endswith(' .'): ?

Best Parameters For Transformer Model on WMT 2014?

Hello,
I am a research assistant who is part of a team that is profiling machine learning frameworks across different applications. Among other applications, I am studying the transformer model applied to translation tasks. I would like to know if there is a listing of the best settings to use with this framework to reproduce the results in the "Attention Is All You Need" paper.

Thank you

examples for s2s distributed training with --kvstore=dist_async/dist_sync

hi all,

Can you provide some examples of distributed training, especially with --kvstore=dist_async/dist_sync?

I succeeded with --kvstore=device --device-ids -4 --batch-size 486, and the speed is fine,
but failed with --kvstore=dist_sync --device-ids -4 --batch-size 162, which complains "can't find DMLC_WORKER .. " (even though I compiled with USE_DIST_KVSTORE=1, mxnet_cu80-1.0.0, sockeye-1.15.8).

I then tried several variants, shown below; they run, but the speed is slow:
python3 ../incubator-mxnet/tools/launch.py --cluster=local -n 1 ' python3 -m sockeye.train --device-ids -3 --kvstore dist_sync --batch-size 162... '
python3 ../incubator-mxnet/tools/launch.py --cluster=local -n 3 ' python3 -m sockeye.train --device-ids -1 --kvstore dist_sync --batch-size 162... '
python3 ../incubator-mxnet/tools/launch.py --cluster=local -n 3 -s 1 ' python3 -m sockeye.train --device-ids -1 --kvstore dist_sync --batch-size 162 ... '

Thus I suspect I am not using it properly.
Thanks.

Issues with GPUs

Hi,

I got some issues when running sockeye on GPUs.

Issue 1: leaked semaphores

sockeye-train --source ../data/multi30k/train.en.atok \
              --target ../data/multi30k/train.de.atok \
              --validation-source ../data/multi30k/val.en.atok \
              --validation-target ../data/multi30k/val.de.atok \
              --word-min-count 2 \
              --rnn-num-layers 1 \
              --rnn-cell-type gru \
              --rnn-num-hidden 128 \
              --num-embed-source 128 \
              --num-embed-target 128 \
              --attention-type mlp \
              --attention-num-hidden 128 \
              --batch-size 64 \
              --normalize-loss \
              --dropout 0.1 \
              --initial-learning-rate 0.001 \
              --device-ids 0 \
              --output ../models/multi30k

... (truncated)
[INFO:sockeye.training] Training started.
[INFO:sockeye.callback] Early stopping by optimizing 'perplexity' (minimize=True)
script-multi30k.sh: line 49: 19451 Segmentation fault sockeye-train --source ../data/multi30k/train.en.atok --target ../data/multi30k/train.de.atok --validation-source ../data/multi30k/val.en.atok --validation-target ../data/multi30k/val.de.atok --word-min-count 2 --rnn-num-layers 1 --rnn-cell-type gru --rnn-num-hidden 128 --num-embed-source 128 --num-embed-target 128 --attention-type mlp --attention-num-hidden 128 --batch-size 64 --normalize-loss --dropout 0.1 --initial-learning-rate 0.001 --device-ids 0 --output ../models/multi30k
[vhoang@gpu047 scripts]$ /pylon2/ci560op/fosterg/tools/anaconda3/lib/python3.6/multiprocessing/semaphore_tracker.py:129: UserWarning: semaphore_tracker: There appear to be 3 leaked semaphores to clean up at shutdown
len(cache))

Issue 2: Permission denied

sockeye-train --source ../data/multi30k/train.en.atok \
              --target ../data/multi30k/train.de.atok \
              --validation-source ../data/multi30k/val.en.atok \
              --validation-target ../data/multi30k/val.de.atok \
              --word-min-count 2 \
              --rnn-num-layers 1 \
              --rnn-cell-type gru \
              --rnn-num-hidden 128 \
              --num-embed-source 128 \
              --num-embed-target 128 \
              --attention-type mlp \
              --attention-num-hidden 128 \
              --batch-size 64 \
              --normalize-loss \
              --dropout 0.1 \
              --initial-learning-rate 0.001 \
              --device-ids -1 \
              --output ../models/multi30k

... (truncated)
[INFO:sockeye.train] Attempting to acquire 1 GPUs.
[INFO:sockeye.utils] Trying to acquire one of 2 available GPUs.
Traceback (most recent call last):
File "/pylon2/ci560op/fosterg/tools/anaconda3/bin/sockeye-train", line 11, in
load_entry_point('sockeye==1.0.3', 'console_scripts', 'sockeye-train')()
File "/pylon2/ci560op/fosterg/tools/anaconda3/lib/python3.6/site-packages/sockeye-1.0.3-py3.6.egg/sockeye/train.py", line 147, in main
File "/pylon2/ci560op/fosterg/tools/anaconda3/lib/python3.6/site-packages/sockeye-1.0.3-py3.6.egg/sockeye/train.py", line 147, in
File "/pylon2/ci560op/fosterg/tools/anaconda3/lib/python3.6/contextlib.py", line 330, in enter_context
result = _cm_type.__enter__(cm)
File "/pylon2/ci560op/fosterg/tools/anaconda3/lib/python3.6/contextlib.py", line 82, in __enter__
return next(self.gen)
File "/pylon2/ci560op/fosterg/tools/anaconda3/lib/python3.6/site-packages/sockeye-1.0.3-py3.6.egg/sockeye/utils.py", line 291, in acquire_gpu
PermissionError: [Errno 13] Permission denied: '/var/lock/sockeye.gpu0.lock'

Do you have any ideas on how to resolve these issues?

Thanks!

--
Cheers,
Vu

perplexity-val values are incorrect

Version: 1.10.1
"perplexity-val" values in model/metrics are incorrect, e.g.

1	avg-sec-per-sent-val=0.406752	bleu-val=0.069343	perplexity-train=162.859366	perplexity-val=121212.751628	time-elapsed=356.103674	used-gpu-memory=5037.000000
2	avg-sec-per-sent-val=0.403712	bleu-val=0.106434	perplexity-train=40.775847	perplexity-val=203796.727398	time-elapsed=765.858212	used-gpu-memory=5037.000000
3	avg-sec-per-sent-val=0.351910	bleu-val=0.153550	perplexity-train=24.803743	perplexity-val=266022.905890	time-elapsed=1181.168308	used-gpu-memory=5037.000000
4	avg-sec-per-sent-val=0.339560	bleu-val=0.175705	perplexity-train=16.951490	perplexity-val=428458.420087	time-elapsed=1593.712781	used-gpu-memory=5037.000000
5	avg-sec-per-sent-val=0.317350	bleu-val=0.193712	perplexity-train=13.384035	perplexity-val=571508.121336	time-elapsed=1997.266849	used-gpu-memory=5037.000000
6	avg-sec-per-sent-val=0.297145	bleu-val=0.193210	perplexity-train=12.003007	perplexity-val=609632.199078	time-elapsed=2403.232710	used-gpu-memory=5037.000000
7	avg-sec-per-sent-val=0.288253	bleu-val=0.203492	perplexity-train=10.607567	perplexity-val=689141.407104	time-elapsed=2808.074823	used-gpu-memory=5037.000000
8	avg-sec-per-sent-val=0.295245	bleu-val=0.205443	perplexity-train=9.640161	perplexity-val=591580.788622	time-elapsed=3206.099968	used-gpu-memory=5037.000000
9	avg-sec-per-sent-val=0.188597	bleu-val=0.209383	perplexity-train=9.180378	perplexity-val=780487.730170	time-elapsed=3607.191386	used-gpu-memory=5037.000000

Line buffering / benchmark timing

It looks like we're currently not able to operate in "unbuffered" mode, interactively translating one line at a time with --batch-size 1. We're also getting all zeros for translation time for --output-type benchmark.
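For context, the requested behaviour is roughly the following (a hypothetical sketch; translate_line stands in for whatever call produces a translation): read stdin line by line and flush stdout after every result, so an interactive caller sees each translation immediately and per-line timing is meaningful.

import sys
import time

def run_unbuffered(translate_line):
    # translate_line is a placeholder for the actual translation call.
    for line in sys.stdin:
        start = time.time()
        output = translate_line(line.rstrip("\n"))
        elapsed = time.time() - start
        sys.stdout.write(output + "\n")
        sys.stdout.flush()  # emit the result immediately instead of buffering
        sys.stderr.write("translated in %.3f seconds\n" % elapsed)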

Gluon & Sockeye

Thanks for such a neat project!

Recently Gluon, MXNet's imperative interface, was released.
What are your thoughts on it with respect to Sockeye?

Speed up decoding time

Hi, our team has been trying to use an MXNet translation model in our product; however, the decoding speed is not satisfactory. Any suggestions for speeding up translation?

import error on train CLI

First commit that exhibits the error:
34748cb

Error:
Training the model...
Traceback (most recent call last):
File "/Users/cherryc/nlp/jsalt/git/upstream/sockeye/sockeye/train.py", line 32, in
from . import arguments
ImportError: cannot import name 'arguments'

Command:
PYTHONPATH=$SOCKEYE python3 $SOCKEYE/sockeye/train.py \
    --source data/multi30k/train-toy.$1.atok \
    --target data/multi30k/train-toy.$2.atok \
    --validation-source data/multi30k/val.$1.atok \
    --validation-target data/multi30k/val.$2.atok \
    --word-min-count 2 \
    --rnn-num-layers 1 \
    --rnn-cell-type gru \
    --rnn-num-hidden 64 \
    --num-embed-source 64 \
    --num-embed-target 64 \
    --attention-type mlp \
    --attention-num-hidden 64 \
    --batch-size 64 \
    --normalize-loss \
    --dropout 0.1 \
    --optimizer adam \
    --initial-learning-rate 0.001 \
    --use-cpu \
    --output models/multi30k-$1-$2/baseline
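A note on the likely cause (an assumption, not a confirmed diagnosis): train.py starts with a relative import (from . import arguments), which only resolves when the file is executed as part of the sockeye package. Invoking the module form, as the other commands in these reports do, avoids the problem, e.g.:

PYTHONPATH=$SOCKEYE python3 -m sockeye.train --source data/multi30k/train-toy.$1.atok ... (remaining arguments as above)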

Train Error : src/storage/./pooled_storage_manager.h:84: cudaMalloc failed: out of memory

mxnet version: 0.10.0
GPU: gtx1080 8G
Ubuntu 16.04

Error Information:
[10:16:06] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:304: [10:16:06] src/storage/./pooled_storage_manager.h:84: cudaMalloc failed: out of memory

Stack trace returned 10 entries:
[bt] (0) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7fc8b883f0dc]
[bt] (1) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xf797e8) [0x7fc8b962d7e8]
[bt] (2) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xf7b677) [0x7fc8b962f677]
[bt] (3) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb4392b) [0x7fc8b91f792b]
[bt] (4) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb3b1ee) [0x7fc8b91ef1ee]
[bt] (5) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7fc8b92325bc]
[bt] (6) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb81590) [0x7fc8b9235590]
[bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fc8d190bc80]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fc8d772c6ba]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc8d74623dd]

[10:16:06] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:304: [10:16:06] src/engine/./threaded_engine.h:329: [10:16:06] src/storage/./pooled_storage_manager.h:84: cudaMalloc failed: out of memory

Stack trace returned 10 entries:
[bt] (0) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7fc8b883f0dc]
[bt] (1) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xf797e8) [0x7fc8b962d7e8]
[bt] (2) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xf7b677) [0x7fc8b962f677]
[bt] (3) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb4392b) [0x7fc8b91f792b]
[bt] (4) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb3b1ee) [0x7fc8b91ef1ee]
[bt] (5) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7fc8b92325bc]
[bt] (6) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb81590) [0x7fc8b9235590]
[bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fc8d190bc80]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fc8d772c6ba]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc8d74623dd]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 6 entries:
[bt] (0) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7fc8b883f0dc]
[bt] (1) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb7e84f) [0x7fc8b923284f]
[bt] (2) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb81590) [0x7fc8b9235590]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fc8d190bc80]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fc8d772c6ba]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc8d74623dd]

terminate called after throwing an instance of 'dmlc::Error'
what(): [10:16:06] src/engine/./threaded_engine.h:329: [10:16:06] src/storage/./pooled_storage_manager.h:84: cudaMalloc failed: out of memory

Stack trace returned 10 entries:
[bt] (0) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7fc8b883f0dc]
[bt] (1) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xf797e8) [0x7fc8b962d7e8]
[bt] (2) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xf7b677) [0x7fc8b962f677]
[bt] (3) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb4392b) [0x7fc8b91f792b]
[bt] (4) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb3b1ee) [0x7fc8b91ef1ee]
[bt] (5) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb7e5bc) [0x7fc8b92325bc]
[bt] (6) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb81590) [0x7fc8b9235590]
[bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fc8d190bc80]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fc8d772c6ba]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc8d74623dd]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 6 entries:
[bt] (0) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x18b0dc) [0x7fc8b883f0dc]
[bt] (1) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb7e84f) [0x7fc8b923284f]
[bt] (2) /home/abner/new/software/mxnet_env/lib/python3.5/site-packages/mxnet/libmxnet.so(+0xb81590) [0x7fc8b9235590]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fc8d190bc80]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fc8d772c6ba]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc8d74623dd]

Aborted (core dumped)

Getting 'RuntimeError: simple_bind error' while training a cnn-cnn model

Hi,
I am training a model with a CNN encoder and decoder, using 40 million source-target pairs as the training set and 10 million source-target pairs as the validation set.
Here is my training script:

python -m sockeye.train -s train_53_50M_sampled.source \
                        -t train_53_50M_sampled.target \
                        -vs test_53_50M_sampled.source \
                        -vt test_53_50M_sampled.target \
                        --encoder cnn \
                        --decoder cnn \
                        --max-seq-len 5:4 \
                        --decode-and-evaluate 500 \
                        --max-num-epochs 10 \
                        -o s2s_out_cnn_cnn_53_50M_sampled \
                        --device-ids 2 3 4 5 6 7 \
                        --batch-size 384 \
                        --use-tensorboard \
                        --num-words 100000 \
                        --shared-vocab \
                        --checkpoint-frequency 1000

Getting the following error message:

[INFO:root]
Saved params to "/mnt/pankajd/sequence-to-sequence/s2s_out_cnn_cnn_53_50M_sampled/params.00001"
[INFO:sockeye.utils] GPU 2: 3370/11439 MB (29.46%) GPU 3: 3370/11439 MB (29.46%) GPU 4: 3370/11439 MB (29.46%) GPU 5: 1702/11439 MB (14.88%) GPU 6: 1666/11439 MB (14.56%) GPU 7: 1633/11439 MB (14.28%)
[INFO:sockeye.training] Checkpoint [1] Updates=1000 Epoch=0 Samples=384000 Time-cost=701.575
[INFO:sockeye.training] Checkpoint [1] Train-perplexity=171.711222
metrics = checkpoint_decoder.decode_and_evaluate(checkpoint, output_name)
File "/home/pankajd/sockeye-cu8-env/lib/python3.5/site-packages/sockeye/checkpoint_decoder.py", line 125, in decode_and_evaluate
max_output_length_num_stds=self.max_output_length_num_stds)
File "/home/pankajd/sockeye-cu8-env/lib/python3.5/site-packages/sockeye/inference.py", line 408, in load_models
model.initialize(max_input_len, get_max_output_length)
File "/home/pankajd/sockeye-cu8-env/lib/python3.5/site-packages/sockeye/inference.py", line 130, in initialize
self.encoder_module.bind(data_shapes=max_encoder_data_shapes, for_training=False, grad_req="null")
File "/home/pankajd/sockeye-cu8-env/lib/python3.5/site-packages/mxnet/module/bucketing_module.py", line 337, in bind
force_rebind=False, shared_module=None, grad_req=grad_req)
File "/home/pankajd/sockeye-cu8-env/lib/python3.5/site-packages/mxnet/module/module.py", line 428, in bind
state_names=self._state_names)
File "/home/pankajd/sockeye-cu8-env/lib/python3.5/site-packages/mxnet/module/executor_group.py", line 237, in init
self.bind_exec(data_shapes, label_shapes, shared_group)
File "/home/pankajd/sockeye-cu8-env/lib/python3.5/site-packages/mxnet/module/executor_group.py", line 333, in bind_exec
shared_group))
File "/home/pankajd/sockeye-cu8-env/lib/python3.5/site-packages/mxnet/module/executor_group.py", line 611, in _bind_ith_exec
shared_buffer=shared_data_arrays, **input_shapes)
File "/home/pankajd/sockeye-cu8-env/lib/python3.5/site-packages/mxnet/symbol/symbol.py", line 1494, in simple_bind
raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
source: (16, 0)
Error in operator _arange1: [17:18:00] src/operator/tensor/./init_op.h:410: Check failed: param.start < param.stop.value() Invalid range (start, stop, step) = (0,0,1)

I have also tried using LSTM as the encoder and decoder, but I still get the same runtime error.

--max-seq-len not used for inference

It appears that --max-seq-len is no longer transferred to --max-input-len of the inference arguments when tracking BLEU for a transformer encoder-decoder model. The log below shows that inference buckets up to a length of 1540 are created, which resulted in an error while tracking BLEU, although training continued on normal bucket lengths of up to 80, as expected from the parameter --max-seq-len 80.
The BLEU tracking works properly for both rnn and cnn configurations.

/sockeye/model_fr_en_trans/params.00006"
[INFO:sockeye.inference] Translator (1 model(s) beam_size=5 ensemble_mode=None batch_size=16 buckets_source=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950] buckets_target=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, 1100, 1110, 1120, 1130, 1140, 1150, 1160, 1170, 1180, 1190, 1200, 1210, 1220, 1230, 1240, 1250, 1260, 1270, 1280, 1290, 1300, 1310, 1320, 1330, 1340, 1350, 1360, 1370, 1380, 1390, 1400, 1410, 1420, 1430, 1440, 1450, 1460, 1470, 1480, 1490, 1500, 1510, 1520, 1530, 1540])
[17:55:25] /mxnet/dmlc-core/include/dmlc/./logging.h:308: [17:55:25] src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory

fp16 training support

Purpose of this issue: This is a placeholder for a conversation about fp16 training support. I will be sharing my findings regarding this issue here, and if others have tried to make this happen, sharing your findings here as well would be much appreciated.

Background: fp16 training is important, particularly on Volta GPUs, which will soon be on the market. The fp16 FLOPS on Volta are almost 8x more compared to fp32 (~120 TFLOPS fp16 vs. ~15 TFLOPS fp32), and on Pascal that's still about 2x.

Reproducibility: In the meantime, before Volta is widely available, testing whether the implementation supports fp16 training (irrespective of whether that affects training accuracy - just to check the "plumbing") can be done on Pascal GPUs, especially the P100 (true fp16). It is also possible on a Titan XP, which provides emulated fp16 - slower, but still enough to verify MXNet and Sockeye support. From previous experiments using TensorFlow, we know that training with fp16 can reach the same accuracy as fp32 for various seq2seq models, including the public TensorFlow model.

Is GPU support available?

Hi all,
I've tried to run training on a GPU with the following command

python -m sockeye.train --source data/wmt15-de-en/train.en --target data/wmt15-de-en/train.de                --validation-source data/wmt15-de-en/valid.en --validation-target data/wmt15-de-en/valid.de --output models/ --device-ids 0

but got the following error

Traceback (most recent call last):
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/sockeye-1.0.2-py3.6.egg/sockeye/train.py", line 265, in <module>
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/sockeye-1.0.2-py3.6.egg/sockeye/train.py", line 261, in main
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/sockeye-1.0.2-py3.6.egg/sockeye/training.py", line 192, in fit
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/mxnet/module/bucketing_module.py", line 298, in bind
    force_rebind=False, shared_module=None, grad_req=grad_req)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/mxnet/module/module.py", line 388, in bind
    state_names=self._state_names)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 216, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 312, in bind_exec
    shared_group))
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 632, in _bind_ith_exec
    context, self.logger)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 610, in _get_or_reshape
    arg_arr = nd.zeros(arg_shape, context, dtype=arg_type)
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/mxnet/ndarray.py", line 1003, in zeros
    return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype)
  File "<string>", line 15, in _zeros
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 72, in _imperative_invoke
    c_array(ctypes.c_char_p, [c_str(str(val)) for val in vals])))
  File "/hltmt0/data/digangi/anaconda3/lib/python3.6/site-packages/mxnet/base.py", line 84, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [16:16:21] src/c_api/c_api_ndarray.cc:392: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered

I don't get this error with --use-cpu.

I'm quite new to MXNet, so I can't tell whether it is a problem with my installation.

Thanks,
Mattia
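A hedged observation on this error (an assumption based on the message, not a confirmed diagnosis): "Operator _zeros cannot be run; requires at least one of FCompute<xpu> ..." typically indicates an MXNet build without GPU support, which is consistent with --use-cpu working. Installing a CUDA-enabled MXNet wheel matching the local CUDA version, for example

pip3 install mxnet-cu80

and rerunning with --device-ids 0 should exercise the GPU path, assuming the NVIDIA driver and CUDA toolkit are in place.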

Command for translating a text file with Sockeye

Hello, I would like to ask what the command is for translating a text with a Sockeye model, rather than:
echo "er ist so ein toller Kerl und ein Familienvater." |
python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.en \
    --vocabulary-threshold 50 |
python -m sockeye.translate -m wmt_model 2>/dev/null |
sed -r 's/@@( |$)//g'
Hoping you can provide an answer, thank you
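For a whole file rather than a single echoed sentence, a minimal variant of the same pipeline (file names here are placeholders) reads the source file on stdin and writes the output, with BPE markers stripped, to a file:

python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.en --vocabulary-threshold 50 < input.txt \
    | python -m sockeye.translate -m wmt_model 2>/dev/null \
    | sed -r 's/@@( |$)//g' > output.txt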

Could not replicate results obtained with OpenNMT-py

Hi all,

I've run a training on a private small dataset (~200K parallel sentences) trying to use the same hyperparameters for both OpenNMT-py and Sockeye, as far as it was possible.

After 50 epochs I stopped the training with OpenNMT-py, which had a validation perplexity of 3.03, while after 50 epochs the validation perplexity was more than 400 with Sockeye. The training continued until epoch 137, where the validation perplexity was still more than 200.

The following are the training commands:

OpenNMT-py

python $OpenNMTpy/train.py -data data/train.pt -save_model models/output -brnn -batch_size 120 -epochs 50 -start_epoch=1 -optim sgd -learning_rate 1 -learning_rate_decay 0.9 -start_decay_at 9 -gpus 0 -dropout 0.3 -brnn_merge sum

Sockeye

python -m sockeye.train --source data/train.en --target data/train.it --validation-source data/dev.en --validation-target data/dev.it --output models --device-ids 1 --rnn-num-layers 2 --rnn-num-hidden 500 --num-embed 500 --max-seq-len 50 --batch-size 120 --dropout 0.3 --optimizer sgd --initial-learning-rate 1.0 --learning-rate-reduce-factor 0.9 --clip-gradient 5 --attention-type 'dot' --use-fused-rnn --normalize-loss

Something to point out:

  • 500 is the default value in OpenNMT-py for both rnn-units and word embedding size.
  • From what I've understood, the attention implemented there is equivalent to the dot attention.
  • Without --normalize-loss the model diverges immediately, producing really high perplexity.

I'll try with a publicly available dataset to provide you with more information.

UPDATE: I'm trying with IWSLT2016 En-Fr and the same training commands, but the situation is exactly the same.

Yaml dependency missing

On a fresh installation, I got this exception right after the first run

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/elav01/.local/lib/python3.5/site-packages/sockeye/train.py", line 33, in <module>
    from . import attention
  File "/home/elav01/.local/lib/python3.5/site-packages/sockeye/attention.py", line 22, in <module>
    from . import config
  File "/home/elav01/.local/lib/python3.5/site-packages/sockeye/config.py", line 16, in <module>
    import yaml
ImportError: No module named 'yaml'

pip install pyyaml solved it. I guess it should be added to the package's install dependencies.

When I train the model I get the error: TypeError: __init__() got an unexpected keyword argument 'forget_bias'

mxnet version: 0.9.4

CMD:
python -m sockeye.train --source ./nmt_data/train2.en --target ./nmt_data/train2.vi --validation-source ./nmt_data/tst2012.en --validation-target ./nmt_data/tst2012.vi --output model

ERROR INFORMATION:
Traceback (most recent call last):
File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/new/software/mxnet_env/lib/python3.4/site-packages/sockeye/train.py", line 270, in
main()
File "/home/new/software/mxnet_env/lib/python3.4/site-packages/sockeye/train.py", line 227, in main
rnn_forget_bias=args.rnn_forget_bias)
File "/home/new/software/mxnet_env/lib/python3.4/site-packages/sockeye/training.py", line 86, in init
self._build_model_components(self.config.max_seq_len, fused, rnn_forget_bias)
File "/home/new/software/mxnet_env/lib/python3.4/site-packages/sockeye/model.py", line 154, in _build_model_components
fused_encoder)
File "/home/new/software/mxnet_env/lib/python3.4/site-packages/sockeye/encoder.py", line 68, in get_encoder
forget_bias=forget_bias))
File "/home/new/software/mxnet_env/lib/python3.4/site-packages/sockeye/encoder.py", line 360, in init
forget_bias=forget_bias)
File "/home/new/software/mxnet_env/lib/python3.4/site-packages/sockeye/encoder.py", line 247, in init
residual, forget_bias)
File "/home/new/software/mxnet_env/lib/python3.4/site-packages/sockeye/rnn.py", line 45, in get_stacked_rnn
cell = mx.rnn.LSTMCell(num_hidden=num_hidden, prefix=cell_prefix, forget_bias=forget_bias)
TypeError: __init__() got an unexpected keyword argument 'forget_bias'
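A hedged guess at the cause: the forget_bias keyword of mx.rnn.LSTMCell appears only in MXNet releases newer than 0.9.4, so Sockeye's call fails on this version; upgrading MXNet to the version required by Sockeye's setup.py should resolve it. A quick check of the installed version:

import mxnet as mx
print(mx.__version__)  # 0.9.4 here; Sockeye 1.x expects a newer release (assumption)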

Not as fast as expected, and the performance is not as good as the paper claims

I find that training is not very fast: it takes 4 days to train WMT16 En-De, whereas tensor2tensor only needs 2 days. I do not know why Sockeye needs 1M updates. With the batch size set to 4096 for both Sockeye and tensor2tensor, tensor2tensor runs 6000 updates per epoch, but Sockeye runs 30000 updates per epoch. The script for running Sockeye is:
export CUDA_VISIBLE_DEVICES='8,9,10,11,12,13'

python3 -m sockeye.train \
    -s /data/zhyang/dl4mt/corpus/data_450w_en_de/transformer/450w_ende_data/train.tok.clean.bpe.32000.en \
    -t /data/zhyang/dl4mt/corpus/data_450w_en_de/transformer/450w_ende_data/train.tok.clean.bpe.32000.de \
    -vs /data/zhyang/dl4mt/corpus/data_450w_en_de/transformer/450w_ende_data/newstest2013.tok.bpe.32000.en \
    -vt /data/zhyang/dl4mt/corpus/data_450w_en_de/transformer/450w_ende_data/newstest2013.tok.bpe.32000.de \
    -o en-de \
    --seed=1 --batch-type=word --batch-size=4096 --checkpoint-frequency=4000 --device-ids=-6 \
    --embed-dropout=0:0 --encoder=transformer --decoder=transformer --num-layers=6:6 \
    --transformer-model-size=512 --transformer-attention-heads=8 --transformer-feed-forward-num-hidden=2048 \
    --transformer-preprocess=n --transformer-postprocess=dr --transformer-dropout-attention=0.1 \
    --transformer-dropout-act=0.1 --transformer-dropout-prepost=0.1 --transformer-positional-embedding-type fixed \
    --fill-up=replicate --max-seq-len=100:100 --label-smoothing 0.1 --weight-tying --weight-tying-type=src_trg_softmax \
    --num-embed 512:512 --num-words 50000:50000 --word-min-count 1:1 --optimizer=adam --optimized-metric=perplexity \
    --initial-learning-rate=0.0001 --learning-rate-reduce-num-not-improved=8 --learning-rate-reduce-factor=0.7 \
    --learning-rate-scheduler-type=plateau-reduce --learning-rate-warmup=0 --max-num-checkpoint-not-improved=32 \
    --min-num-epochs=0 --max-updates 1001000 \
    --weight-init xavier --weight-init-scale 3.0 --weight-init-xavier-factor-type avg

After training finished, I only get 26.01 BLEU on newstest2014, while tensor2tensor gets 26.75.

Validation with process-exclusive devices

Most GPUs on our servers are process-exclusive. If I specify a process-exclusive GPU as a sockeye --device for training, the validation step throws an error:

cudaMalloc failed: all CUDA-capable devices are busy or unavailable

Presumably because I have specified a single device to be acquired and it's in Exclusive Mode.

With Nematus, I usually use a different device for the validation process. With Sockeye, as far as I can see, I cannot specify a device that would be used only for validation. Is that correct?

Do you have a recommendation to deal with this? Thanks!

sacreBLEU encoding compatibility issues with utf8

Hi,
I've been trying to use the tool; it works well on my machine but crashes in my Docker container for some reason (it attempts to encode ASCII instead of UTF-8; I'm using Python 3.5.2 and have set PYTHONIOENCODING="utf-8").
I had to manually add some encoding information to both setup.py and sacrebleu.py, but I assume this is just a symptom and more careful encoding handling is required.

tensorboard cannot find runfiles

Likely trivial, but I can't figure it out.
I have Sockeye up and running, having followed the instructions, but tensorboard does not find the .runfiles directory.
Tailing the logfile that I piped stdout/stderr to:
[INFO:sockeye.inference] Translator (1 model(s) beam_size=5 ensemble_mode=None buckets_source=[10, 20, 30, 40, 50, 60] buckets_target=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 118])
[INFO:root] Epoch[0] Batch [6050] Speed: 315.80 samples/sec perplexity=121.795174
[INFO:root] Epoch[0] Batch [6100] Speed: 319.00 samples/sec perplexity=118.806768
but in another shell
~/sockeye/wmt_model$ tensorboard --logdir .
Traceback (most recent call last):
File "/home/levinth/.local/bin/tensorboard", line 152, in
Main()
File "/home/levinth/.local/bin/tensorboard", line 102, in Main
module_space = FindModuleSpace()
File "/home/levinth/.local/bin/tensorboard", line 83, in FindModuleSpace
sys.argv[0])
AssertionError: Cannot find .runfiles directory for /home/levinth/.local/bin/tensorboard

I did do the pip install tensorboard part and invoked Sockeye with the invocation for the 1-layer German-to-English model (hidden=512, embedding=256, etc.),
but tensorboard seems to want something else.
