dtopwords's Issues
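Can the dictionary be replaced?
I want to do something like adding new words to an existing dictionary. Can the initial dictionary be swapped for a different one?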
Question about the evaluation metric
Hello.
I have recently been reading this paper [Chen, A. and Sun, M., 2017, August. Domain-specific new words detection in Chinese. In Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017) (pp. 44-53)]. I am a little confused about the P(k) function used for MAP in the Evaluation Metric section. Could you share the detailed code for the evaluation part?
Thanks.
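For readers with the same question, here is a minimal sketch of MAP as it is usually defined in information retrieval, where P(k) is the precision of the top k ranked candidates. This thread does not reproduce the paper's exact formula, so the normalisation here is an assumption and may differ from the authors' evaluation code. The function names and the placeholder words w1..w4 are hypothetical.

```python
def precision_at_k(ranked, gold, k):
    """P(k): fraction of the top-k ranked candidates that appear in the gold set."""
    if k <= 0:
        return 0.0
    hits = sum(1 for word in ranked[:k] if word in gold)
    return hits / float(k)


def average_precision(ranked, gold):
    """Average of P(k) over every rank k that holds a correct word.

    Assumption: normalise by the number of correct words found in the
    ranked list (the standard IR convention), not by the list length.
    """
    hits = 0
    total = 0.0
    for k, word in enumerate(ranked, start=1):
        if word in gold:
            hits += 1
            total += hits / float(k)  # equals precision_at_k(ranked, gold, k)
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """Mean of average_precision over (ranked, gold) pairs, e.g. one per domain."""
    if not runs:
        return 0.0
    return sum(average_precision(r, g) for r, g in runs) / len(runs)


# Hypothetical ranked output and gold set.
ranked_words = ['w1', 'w2', 'w3', 'w4']
gold_words = {'w1', 'w3'}
print('%.4f' % average_precision(ranked_words, gold_words))  # 0.8333
```

In the example, correct words sit at ranks 1 and 3, so the score is (1/1 + 2/3) / 2.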
How should I fix this path problem?
E:\dtopword>python pipeline.py corpus/c4x@[email protected] result.txt result1.txt
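The syntax of the command is incorrect.
The syntax of the command is incorrect.
'split' is not recognized as an internal or external command,
operable program or batch file.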
Traceback (most recent call last):
  File "pipeline.py", line 103, in <module>
    generate_domain_words(src_path, dtop_res_path)
  File "E:\dtopword\dtopwords.py", line 336, in generate_domain_words
    initialize_static_files(trg_files)
  File "E:\dtopword\dtopwords.py", line 99, in initialize_static_files
    init_dict(tfp, cnp, inp)
  File "E:\dtopword\dtopwords.py", line 106, in init_dict
    generate_all_ngram(src_path, cn_path, len_threshold, fre_threshold, splitor=splitor)
  File "E:\dtopword\ngram.py", line 486, in generate_all_ngram
    generate_large_ngram_by_filtering(input_file_path, input_file_path + '_tmp%d' % i, gram_num=i, filter_num=filter_num, sort=True, filter_function=punctuation_filter, merge=False, splitor=splitor)
  File "E:\dtopword\ngram.py", line 257, in generate_large_ngram_by_filtering
    piece_file_list = [(f, os.path.join(tmp_dir, f)) for f in os.listdir(tmp_dir)]
WindowsError: [Error 3] : '/tmp/large_ngram_pieces_c4x@[email protected]_filtered/.'
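I found that following only the steps in the README produces a complaint about a missing path parameter, so I added initialization statements such as USE_RULE0 = 1 to setting.py. After that, the error above appears. It looks like a path problem, but I do not know how to fix it.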
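Judging from the output alone, the script appears to rely on a Unix-style /tmp directory and on the Unix `split` command, neither of which exists on a stock Windows install. The sketch below shows one portable alternative: take the scratch directory from `tempfile.gettempdir()` and split the input in pure Python. It is not the repository's code. `make_piece_dir`, `split_by_lines`, and the piece size are hypothetical names and values, and how ngram.py actually splits its input is an assumption.

```python
import os
import tempfile


def make_piece_dir(name):
    """Create (if needed) a scratch directory under the platform's temp dir."""
    path = os.path.join(tempfile.gettempdir(), 'large_ngram_pieces_' + name)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


def split_by_lines(src_path, out_dir, lines_per_piece=100000):
    """Pure-Python stand-in for `split -l`: write numbered pieces, return their paths.

    Files are opened in binary mode and cut only at line boundaries,
    so the bytes of each line are copied through untouched.
    """
    pieces = []
    out = None
    with open(src_path, 'rb') as src:
        for index, line in enumerate(src):
            if index % lines_per_piece == 0:
                if out is not None:
                    out.close()
                piece_path = os.path.join(out_dir, 'piece_%05d' % len(pieces))
                pieces.append(piece_path)
                out = open(piece_path, 'wb')
            out.write(line)
    if out is not None:
        out.close()
    return pieces
```

A call such as `split_by_lines(src_path, make_piece_dir('my_corpus'))` would then replace the shell step, and `os.listdir` on the returned directory works the same way on Windows and Linux.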
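Encoding problem with a large dataset (5.4 GB)
Earlier runs on small datasets (under 500 MB) worked fine, but after switching to the large dataset the run always fails. When I run only the sentences that triggered the error, no error occurs.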
Traceback (most recent call last):
  File "pipeline.py", line 103, in <module>
    generate_domain_words(src_path, dtop_res_path)
  File "/home/mjj/project/H/dtopwords.py", line 336, in generate_domain_words
    initialize_static_files(trg_files)
  File "/home/mjj/project/H/dtopwords.py", line 99, in initialize_static_files
    init_dict(tfp, cnp, inp)
  File "/home/mjj/project/H/dtopwords.py", line 106, in init_dict
    generate_all_ngram(src_path, cn_path, len_threshold, fre_threshold, splitor=splitor)
  File "/home/mjj/project/H/ngram.py", line 486, in generate_all_ngram
    generate_large_ngram_by_filtering(input_file_path, input_file_path + '_tmp%d' % i, gram_num=i, filter_num=filter_num, sort=True, filter_function=punctuation_filter, merge=False, splitor=splitor)
  File "/home/mjj/project/H/ngram.py", line 265, in generate_large_ngram_by_filtering
    get_ngram(gram_num, piece_file_path, piece_ngram_path, filter_num=filter_num, preprocessor=preprocessor, splitor=splitor)
  File "/home/mjj/project/H/ngram.py", line 135, in get_ngram
    segs = [a.encode('utf-8') for a in list(line.decode('utf-8'))]
  File "/home/mjj/virtualenv/py2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe6 in position 79: unexpected end of data
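The message "unexpected end of data" means the byte string ended partway through a multi-byte UTF-8 character. One possible cause, which is an assumption since the thread does not confirm it, is that the large file is cut into pieces at byte offsets, so a character is split across two pieces. That would also explain why the same sentence decodes fine on its own. The sketch below reproduces the error and shows an incremental decoder that tolerates such cuts. `decode_chunks` is a hypothetical helper, not part of the repository.

```python
import codecs

# Each of these three characters takes three bytes in UTF-8.
data = u'大数据'.encode('utf-8')

# Cutting after 4 bytes leaves the lead byte 0xe6 of the second character dangling.
broken = data[:4]
try:
    broken.decode('utf-8')
except UnicodeDecodeError as err:
    print(err.reason)  # unexpected end of data


def decode_chunks(chunks):
    """Decode byte chunks whose boundaries may fall inside a character."""
    decoder = codecs.getincrementaldecoder('utf-8')()
    for chunk in chunks:
        text = decoder.decode(chunk)  # incomplete trailing bytes are buffered
        if text:
            yield text
    tail = decoder.decode(b'', final=True)  # raises if bytes are still missing
    if tail:
        yield tail


# The same cut, decoded incrementally, yields the original text.
restored = u''.join(decode_chunks([data[:4], data[4:]]))
```

If the input itself is truncated or corrupt, passing `errors='replace'` to `decode` is a way to keep going at the cost of losing the damaged character.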
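Question about the corpus data file "corpus.tar.gz"
Hello!
My recent research draws on your paper "Domain-Specific New Words Detection in Chinese", which is how I found this GitHub repository. However, it seems the corpus data file "corpus.tar.gz" was never uploaded. If it is convenient, could you upload it or send a copy to my email address?
Thank you very much!
Neo (Beihang University)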
e-mail: [email protected]