niutrans / niutrans.smt

NiuTrans.SMT is an open-source statistical machine translation system developed by a joint team from the NLP Lab at Northeastern University and the NiuTrans Team. The system is written entirely in C++, so it runs fast and uses less memory. It currently supports phrase-based, hierarchical phrase-based, and syntax-based (string-to-tree, tree-to-string, and tree-to-tree) models for research-oriented studies.

License: GNU General Public License v2.0

Perl 1.94% Makefile 0.01% C++ 97.86% C 0.15% Shell 0.01% Python 0.01% Prolog 0.02% Batchfile 0.01% Raku 0.01%

machine-translation statistical-machine-translation decoder phrase-based-translation parsing

niutrans.smt's Issues

Domain-specific translation: roughly how much data does a statistical translation model need to produce reasonable results?

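First of all, thank you for this project. Despite knowing nothing about Perl, I managed to run the whole pipeline on my own corpus (I hit only one error, caused by a "#" character).

I currently have only a few thousand entries. With no data processing at all, my results are a BLEU of 0.76 on the training set and 0.26 on the test set.
The model I used is the hierarchical phrase-based model.

Besides the question in the title, I would also like to know whether switching to another open-source translation model would improve translation quality.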

clang: error: linker command failed with exit code 1 (use -v to see invocation) make[1]: *** [NiuTrans.Decoder] Error 1 make: *** [all] Error 2

clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [NiuTrans.Decoder] Error 1
make: *** [all] Error 2

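Corpus alignment question

The example uses a Chinese-to-English system: src is set to the Chinese corpus path and tgt to the English corpus path.
If I want to train an English-to-Chinese system, should src still be the Chinese corpus and tgt the English one?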

Error about "NiuTrans-running-segmenter"

Some errors occurred when I ran this script:
perl NiuTrans-running-segmenter.pl \ # Chinese preprocessing
    -lang ch \
    -input ../work/preprocessing/chinese.clean.txt \
    -output ../work/preprocessing/chinese.clean.txt.prepro \
    -method 01

The error output is as follows:

`########### SCRIPT ########### SCRIPT ############ SCRIPT ##########

NiuTrans Running NiuSeg (version 1.2.0 Beta) --www.nlplab.com

########### SCRIPT ########### SCRIPT ############ SCRIPT ##########
Running: ../bin/NiuSegmenter_CN_x64 ../config/NiuTrans.NiuSeg.ch.config tmp.config.chi 11101
--- Initialize Chinese program ...
--- Chinese_Wrapper : Load configure file.
--- Chinese_Wrapper : Configure file load finished.
--- Chinese_Wrapper : Initialize segmentation ...
Reading keys from ../resource/Dict0920/len2.lex...
Sorting keys...
Analyzing ...
keys wcstok failed

Error ##### chi_LM-Based_word_breaker reports lex:../resource/Dict0920/len2.lex||||||loc:../resource/Dict0920/len2.loc||||||org:../resource/bi.org.dict||||||psn:../resource/Dict0920/len2.psn not found or can't open!

--- Chinese_Wrapper : Segmentations initialize finished.
--- Chinese_Wrapper : Initialize preprocessor ...
--- all_PreProcessing_FullToHalf stand ready.
--- Chinese_Wrapper : PreProcessors initialize finished.
--- Chinese_Wrapper : Initialize prev-recognizers ...
--- all_PrevRecognition_RegexRecognizer stand ready.
--- Chinese_Wrapper : Prev-recognizers initialize finished.
--- Chinese_Wrapper : Initialize post-recognizers ...
--- chi_All_Post_Details stand ready.
--- all_PostRecognition_MergeAtomToCompose stand ready.
--- Chinese_Wrapper : Post-recognizers initialize finished.
--- Chinese_Wrapper : Initialize translators ...
--- chi_Translation_ChinumToArabicnum stand ready.
--- chi_Translation_ArabicNumToEngTranslate stand ready.
--- chi_Translation_BilingualDictionary stand ready.
--- chi_Translation_NumberTranslator stand ready.
Error: Execution of: ../bin/NiuSegmenter_CN_x64 ../config/NiuTrans.NiuSeg.ch.config tmp.config.chi 11101
die with signal 11, with coredump
`
Environment
Linux version 4.4.0-62-generic (buildd@lcy01-30) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) )

perl NiuTrans-running-segmenter.pl -lang ch -input ../sample-data/sample-submission-version/Test-set/Niu.test.txt -output ./sample-data/sample-submission-version/Test-set/pred -method 11

########### SCRIPT ########### SCRIPT ############ SCRIPT ##########

NiuTrans Running NiuSeg (version 1.2.0 Beta) --www.nlplab.com

########### SCRIPT ########### SCRIPT ############ SCRIPT ##########
Running: ../bin/NiuSegmenter_CN_x64 ../config/NiuTrans.NiuSeg.ch.config tmp.config.chi 11111
--- Initialize Chinese program ...
--- Chinese_Wrapper : Load configure file.
--- Chinese_Wrapper : Configure file load finished.
--- Chinese_Wrapper : Initialize segmentation ...
Reading keys from ../resource/Dict0920/len2.lex...
Sorting keys...
Analyzing ...
keys wcstok failed

Error ##### chi_LM-Based_word_breaker reports lex:../resource/Dict0920/len2.lex||||||loc:../resource/Dict0920/len2.loc||||||org:../resource/bi.org.dict||||||psn:../resource/Dict0920/len2.psn; not found or can't open!

--- Chinese_Wrapper : Segmentations initialize finished.
--- Chinese_Wrapper : Initialize preprocessor ...
--- all_PreProcessing_FullToHalf stand ready.
--- Chinese_Wrapper : PreProcessors initialize finished.
--- Chinese_Wrapper : Initialize prev-recognizers ...
--- all_PrevRecognition_RegexRecognizer stand ready.
--- Chinese_Wrapper : Prev-recognizers initialize finished.
--- Chinese_Wrapper : Initialize post-recognizers ...
--- chi_All_Post_Details stand ready.
--- all_PostRecognition_MergeAtomToCompose stand ready.
--- Chinese_Wrapper : Post-recognizers initialize finished.
--- Chinese_Wrapper : Initialize translators ...
--- chi_Translation_ChinumToArabicnum stand ready.
--- chi_Translation_ArabicNumToEngTranslate stand ready.
--- chi_Translation_BilingualDictionary stand ready.
--- chi_Translation_NumberTranslator stand ready.
Error: Execution of: ../bin/NiuSegmenter_CN_x64 ../config/NiuTrans.NiuSeg.ch.config tmp.config.chi 11111
die with signal 11, with coredump
zyyt@ubuntu:~/liuqingmin/enkk_wmt/tools/NiuTrans.SMT/scripts$ vi ../sample-data/sample-submission-version/Test-set/Niu.test.txt
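
Both logs above stop at the same point: the segmenter reports that the dictionary files under ../resource/Dict0920/ are "not found or can't open". One quick check, assuming those relative paths are resolved against the directory the script is run from (the logs do not confirm this), is to test whether each file exists when seen from that directory. The C++17 sketch below does only that. FindMissingFiles is a hypothetical helper written for illustration and is not part of NiuTrans.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not from the NiuTrans sources).
// Returns every entry of `paths` that does not resolve to a regular file
// when interpreted relative to `base_dir`.
std::vector<std::string> FindMissingFiles(const fs::path& base_dir,
                                          const std::vector<std::string>& paths) {
    std::vector<std::string> missing;
    for (const std::string& p : paths) {
        std::error_code ec;  // use the non-throwing overload
        if (!fs::is_regular_file(base_dir / p, ec)) {
            missing.push_back(p);
        }
    }
    return missing;
}
```

Called from the scripts directory with "." as base_dir and the four paths named in the error message (len2.lex, len2.loc, bi.org.dict, len2.psn), it returns whichever of them are absent. An empty result would suggest the files are present and the failure comes from somewhere else, possibly the "keys wcstok failed" step printed just before the error.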

Multiple compile issues on Linux

My C++ is very weak, but a fresh pull followed by make produces multiple errors when building from master on Ubuntu 18.04. There appear to be many issues, the first of which is:

OurTree.cpp: In member function ‘bool smt::Tree::CreateForest(const char*)’:
OurTree.cpp:377:23: error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
         while(ibeg != '\0'){
                       ^~~~
Makefile:13: recipe for target 'OurTree.o' failed
make[1]: *** [OurTree.o] Error 1
make[1]: Leaving directory '/home/a.melser/dev/NiuTrans.SMT/src/NiuTrans.Decoder'
Makefile:12: recipe for target 'all' failed
make: *** [all] Error 2

But there seem to be many others, such as missing variables (src/NiuTrans.PhraseExtractor/dispatcher.cpp, options.sort_phrase_table) and missing methods:

ruletable_scorer.cpp: In member function ‘bool ruletable_scorer::PhraseTable::generatePhraseTable(ruletable_scorer::PhraseAlignment&, bool&, std::ofstream&, bool&, ruletable_scorer::OptionsOfScore&, ruletable_scorer::ScoreClassifyNum&)’:
ruletable_scorer.cpp:280:80: error: no matching function for call to ‘ruletable_scorer::PhraseTable::output(std::ofstream&, bool&, ruletable_scorer::OptionsOfScore&, ruletable_scorer::ScoreClassifyNum&, double&)’
         output( outfile, inverseFlag, options, scoreClassifyNum ,totalFrequency);

And maybe more. Is there something I am missing, or has this version not been tested on Linux? If you have a version that is known to compile on Linux, I can compare against it and help get this working!

FYI, none of the links to download packages on http://www.nlplab.com/NiuPlan/NiuTrans.html or http://www.niutrans.com/niutrans/NiuTrans.html are still working.
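
On the first error quoted above: the diagnostic says that `ibeg != '\0'` compares a pointer with an integer. If ibeg is a char pointer walking a C string (an assumption, since the report does not show its declaration), the loop was probably meant to test the character it points at, not the pointer itself. A minimal sketch of that pattern follows. The function name is hypothetical and the body is not taken from OurTree.cpp.

```cpp
#include <cstddef>

// Hypothetical helper, not from the NiuTrans sources: counts characters up to
// the terminating NUL, using the loop condition the diagnostic points toward.
std::size_t CountUntilTerminator(const char* ibeg) {
    std::size_t n = 0;
    // Writing `ibeg != '\0'` would compare the pointer with a character
    // constant, which GCC rejects here. Dereferencing compares the
    // character that ibeg currently points at.
    while (*ibeg != '\0') {
        ++ibeg;
        ++n;
    }
    return n;
}
```

Whether `*ibeg` is the right fix at line 377 depends on how ibeg is declared and used in CreateForest, so this shows the general pattern only.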

How should emoji and special symbols be handled in machine translation?

Segmentation fault when using NiuTrans.Decoder
