monpa-team / monpa

MONPA 罔拍 is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition.

License: Other

Jupyter Notebook 12.54% Python 87.46%
nlp ner pos named-entity-recognition word-segmentation chinese-word-segmentation pos-tagging bert albert

monpa's Issues

Compatibility with spaCy?

Hello,

Thank you for sharing this. Is there currently a way, or a plan, to integrate the Monpa package into a spaCy language model? Thanks for your reply!

downloading model 404?

http 404: http://nlp.tmu.edu.tw/monpa_model/model-511.pt

/monpa/__init__.py in
73 with open(path_model, 'wb') as f:
74 r = requests.get(download_model_url, stream=True)
---> 75 total_length = int(r.headers.get('content-length'))
76 print("Total file size:", total_length, "KB")
77 dl = 0

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

I guess you have changed the link?
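
The traceback above shows `int()` failing because `r.headers.get('content-length')` returned `None`, which can happen when the server replies without that header, as an error response such as a 404 may. Below is a minimal sketch of reading the size defensively. The helpers `content_length_or_none` and `check_download_response` are hypothetical names for illustration and are not part of monpa.

```python
from typing import Mapping, Optional


def content_length_or_none(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if it is absent or malformed.

    Hypothetical helper for illustration; not part of monpa.
    """
    raw = headers.get("content-length")
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None


def check_download_response(status_code: int, headers: Mapping[str, str]) -> Optional[int]:
    """Fail early with a clear message instead of a TypeError from int(None)."""
    if status_code != 200:
        raise RuntimeError(f"model download failed with HTTP {status_code}")
    return content_length_or_none(headers)
```

With `requests`, the values passed in would be `r.status_code` and `r.headers`. Calling `r.raise_for_status()` before reading any header is another way to surface the 404 directly.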

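How to dismiss the "Welcome to MONPA: Multi-Objective NER POS Annotator for Chinese" message

@windyeh
The Apache server's error log always shows the following wsgi:error entries.
The same text also appears when running from the CLI, but no error is raised there. How can I turn off this kind of message so that it no longer shows up as wsgi:error?
Thanks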

[Wed Mar 06 12:13:47.823665 2024] [wsgi:error] [pid 6932] +---------------------------------------------------------------------+
[Wed Mar 06 12:13:47.823691 2024] [wsgi:error] [pid 6932] Welcome to MONPA: Multi-Objective NER POS Annotator for Chinese
[Wed Mar 06 12:13:47.823694 2024] [wsgi:error] [pid 6932] +---------------------------------------------------------------------+
[Wed Mar 06 12:13:47.851108 2024] [wsgi:error] [pid 6932] \xe5\xb7\xb2\xe6\x89\xbe\xe5\x88\xb0 model\xe6\xaa\x94\xe3\x80\x82Found model file.
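
One possible workaround, sketched under the assumption that the banner is written with `print()` to `sys.stdout` while the package is being imported, is to redirect standard output for the duration of the import. `quiet_import` is a hypothetical helper, not a monpa function. Messages emitted through the `logging` module normally go to stderr and would not be affected by this.

```python
import contextlib
import importlib
import io


def quiet_import(module_name: str):
    """Import a module while discarding anything it prints to stdout."""
    with contextlib.redirect_stdout(io.StringIO()):
        return importlib.import_module(module_name)


# Usage (assumes monpa is installed and prints its banner at import time):
# monpa = quiet_import("monpa")
```

The redirect only covers the first import in a process; later `import monpa` statements reuse the cached module and print nothing.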

Fix to avoid returning an empty result if the user dict has an empty line


+++ __init__.py	2020-04-14 22:09:55.000000000 +0800
@@ -246,7 +246,7 @@
     # empty previous userdict
     _userdict = []
     for input_item in io.open(pathtofile, 'r', encoding="utf-8").read().split("\n"):
-        _userdict.append(input_item.split(" "))
+        if input_item: _userdict.append(input_item.split(" "))

 def findall(p, s):
     ''' Yields all the positions of the pattern p in the string s
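
The same idea as the patch can be sketched as a standalone function. `parse_userdict` is a hypothetical name and not monpa's API. Unlike the one-line patch, this version also skips lines that contain only whitespace and tolerates Windows line endings.

```python
def parse_userdict(text: str) -> list:
    """Split user-dictionary text into [term, frequency, tag] entries.

    Blank and whitespace-only lines are skipped so they never produce
    an empty entry in the resulting list.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            entries.append(line.split(" "))
    return entries


entries = parse_userdict("風騷 1000 NER\n\n領風騷 1000 NER\n")
print(entries)
```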

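Problem with import monpa in Jupyter Notebook

Question from a Python beginner

Sorry to bother you. A Jupyter Notebook launched from Anaconda cannot import monpa, but a Jupyter Notebook launched from the DOS command line works fine. Thanks.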
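
A common cause of this symptom is that the two notebooks run under different Python environments, only one of which has monpa installed. That is an assumption here, not something the report confirms. A small diagnostic sketch that can be run in a cell of each notebook to compare them (`describe_environment` is a hypothetical helper):

```python
import importlib.util
import sys


def describe_environment(package: str) -> dict:
    """Report the running interpreter and whether `package` is importable from it."""
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,
        "package_found": spec is not None,
        "package_location": spec.origin if spec is not None else None,
    }


print(describe_environment("monpa"))
```

If the two notebooks report different interpreters, installing monpa into the environment that the Anaconda notebook uses would be the next thing to try.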

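Segmentation error: AssertionError: lengths ['名嘴'] (words) and ['Na', 'VI'] (pos tags) mismatch

I ran into this error during word segmentation and do not know why it occurs.
I am reporting it here; please test it.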

Traceback (most recent call last):
  File "monpaseg_from_sql.py", line 45, in <module>
    stmts = stmts.compute()
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\base.py", line 175, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\base.py", line 446, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\threaded.py", line 82, in get
    **kwargs
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\local.py", line 491, in get_async
    raise_exception(exc, tb)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\compatibility.py", line 130, in reraise
    raise exc
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\local.py", line 233, in execute_task
    result = execute_task(task, data)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\core.py", line 119, in execute_task
    return func(*args2)
  File "monpaseg_from_sql.py", line 27, in generate_update_monpa_stmt
    seg_newscontent = monpa_seg(row_contains_title_and_content[2], config_filepath=commonvar_file_path)
  File "E:\Software\scripts\python\ML_DL_final\segmentation_functions.py", line 39, in monpa_seg
    cuttedwords = [monpa.cut(sent) for sent in sents]
  File "E:\Software\scripts\python\ML_DL_final\segmentation_functions.py", line 39, in <listcomp>
    cuttedwords = [monpa.cut(sent) for sent in sents]
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\monpa\__init__.py", line 410, in cut
    return cut_w_userdict(text)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\monpa\__init__.py", line 327, in cut_w_userdict
    conll_formatted, segmented_words, pos_tags = to_CoNLL_format(sentence, model_out)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\monpa\__init__.py", line 210, in to_CoNLL_format
    (segmented_words), (pos_tags))
AssertionError: lengths ['名嘴'] (words) and ['Na', 'VI'] (pos tags) mismatch

RuntimeError: unexpected EOF, expected 2095002 more bytes. The file might be corrupted.

env

  • win10
  • python 3.7.3
  • torch==1.1.0

my code:

import monpa
sentence = "蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。"
result = monpa.cut(sentence)

for t in result:
    print(t)

Result:

+---------------------------------------------------------------------+
  Welcome to use MONPA: Multi-Objective NER POS Annotator for Chinese
+---------------------------------------------------------------------+
Good, we can found the model file
07/25/2019 14:17:06 - INFO - monpa -   running on device cpu
Traceback (most recent call last):
  File "c:\Users\Tom Chen\.vscode\extensions\ms-python.python-2019.5.18875\pythonFiles\ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "c:\Users\Tom Chen\.vscode\extensions\ms-python.python-2019.5.18875\pythonFiles\lib\python\ptvsd\__main__.py", line 434, in main
    run()
  File "c:\Users\Tom Chen\.vscode\extensions\ms-python.python-2019.5.18875\pythonFiles\lib\python\ptvsd\__main__.py", line 312, in run_file
    runpy.run_path(target, run_name='__main__')
  File "c:\programdata\anaconda3\Lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "c:\programdata\anaconda3\Lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "c:\programdata\anaconda3\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "d:\Codes\PythonPlayground\monpa_test.py", line 1, in <module>
    import monpa
  File "d:\Codes\PythonPlayground\env\lib\site-packages\monpa\__init__.py", line 128, in <module>
    checkpoint_dict = torch.load(args.model, map_location='cpu')
  File "d:\Codes\PythonPlayground\env\lib\site-packages\torch\serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "d:\Codes\PythonPlayground\env\lib\site-packages\torch\serialization.py", line 581, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 2095002 more bytes. The file might be corrupted.

Note:
This error occurs only on the first run.
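
The "unexpected EOF" from `torch.load`, together with the note that it happens only on the first run, is consistent with the model file being loaded before it was completely written. That is an assumption, not a confirmed diagnosis. A sketch of one way to guard against partial files is to write to a temporary file and rename it only after every byte has arrived. `write_atomically` is a hypothetical helper, not part of monpa.

```python
import os
import tempfile
from typing import Iterable, Optional


def write_atomically(path: str, chunks: Iterable[bytes],
                     expected_size: Optional[int] = None) -> int:
    """Write chunks to a temporary file, then move it to `path` only when complete."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
                written += len(chunk)
        if expected_size is not None and written != expected_size:
            raise IOError(f"incomplete download: got {written} of {expected_size} bytes")
        # The rename happens only after the file is complete and closed.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial file so a later run does not try to load it.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written
```

With `requests`, `chunks` could be `r.iter_content(chunk_size=8192)` and `expected_size` the parsed `content-length` header when the server provides one.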

Unsupported string type: <class 'list'>

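I tried setting up a web API on my own server.
Everything worked at first, but today the following error suddenly appeared. How can I fix it? Thanks.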

File "/home/shilik/hug_api/lib/python3.10/site-packages/monpa/__init__.py", line 390, in pseg
[wsgi:error] [pid 1672] [client 192.168.0.17:57731]     text = convert_to_unicode(text)
  File "/home/shilik/hug_api/lib/python3.10/site-packages/monpa/tokenization.py", line 56, in convert_to_unicode
    raise ValueError("Unsupported string type: %s" % (type(text)))
ValueError: Unsupported string type: <class 'list'>
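
The error message indicates that `pseg` received a `list` where a single string was expected. If the web API sometimes receives a list of sentences, one option is to normalise the input before calling the segmenter. The sketch below assumes a hypothetical wrapper `segment_each`; the segmenter is passed in as a parameter, so nothing about monpa's internals is assumed beyond it accepting one `str` at a time.

```python
from typing import Callable, List, Sequence, Union


def segment_each(texts: Union[str, Sequence[str]],
                 segment: Callable[[str], list]) -> List[list]:
    """Apply a string-only segmenter to one string or to each string in a sequence."""
    if isinstance(texts, str):
        texts = [texts]
    results = []
    for text in texts:
        if not isinstance(text, str):
            raise TypeError(f"expected str, got {type(text).__name__}")
        results.append(segment(text))
    return results


# Usage (assumes monpa is installed):
# results = segment_each(request_payload, monpa.pseg)
```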

Custom dictionary causes duplicate words in the segmentation output

Example user dictionary:
賣弄風騷的人 1000 NER
獨領世界風騷 1000 NER
獨領風騷 1000 NER
賣弄風騷 1000 NER
帶領風騷 1000 NER
領風騷 1000 NER
風騷 1000 NER

print(monpa.cut("15檔龍年雙旺領風騷"))

['15', '檔', '龍年', '雙', '旺領', '風騷']

monpa.load_userdict(file_path_to_dict)

print(monpa.cut("15檔龍年雙旺領風騷"))

['15', '檔', '龍年', '雙', '旺', '領風騷', '風騷']

============================================
Second example user dictionary (with the entry 風騷 1000 NER removed):
賣弄風騷的人 1000 NER
獨領世界風騷 1000 NER
獨領風騷 1000 NER
賣弄風騷 1000 NER
帶領風騷 1000 NER
領風騷 1000 NER

print(monpa.cut("15檔龍年雙旺領風騷"))

['15', '檔', '龍年', '雙', '旺領', '風騷']

monpa.load_userdict(file_path_to_dict)

print(monpa.cut("15檔龍年雙旺領風騷"))

['15', '檔', '龍年', '雙', '旺', '領風騷']
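
The duplicated '風騷' in the first case looks like what happens when overlapping dictionary matches ('領風騷' and its substring '風騷') are both emitted. The sketch below is not monpa's algorithm; it only illustrates one common way to resolve such overlaps, by preferring longer matches and discarding any match that overlaps one already kept. `find_dictionary_spans` is a hypothetical name.

```python
def find_dictionary_spans(sentence: str, terms: list) -> list:
    """Return non-overlapping (start, end, term) matches, preferring longer terms."""
    candidates = []
    for term in terms:
        if not term:
            continue  # ignore empty entries
        start = sentence.find(term)
        while start != -1:
            candidates.append((start, start + len(term), term))
            start = sentence.find(term, start + 1)
    # Longest match first; ties are broken by the leftmost position.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    taken = []
    for start, end, term in candidates:
        # Keep a candidate only if it does not overlap any span already kept.
        if all(end <= s or start >= e for s, e, _ in taken):
            taken.append((start, end, term))
    return sorted(taken)


terms = ["賣弄風騷的人", "獨領世界風騷", "獨領風騷", "賣弄風騷", "帶領風騷", "領風騷", "風騷"]
print(find_dictionary_spans("15檔龍年雙旺領風騷", terms))
# [(7, 10, '領風騷')]
```

Here the shorter entry '風騷' is dropped because it lies entirely inside the longer match '領風騷'.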

__init__.py overrides all logging settings

logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s', \
                        datefmt = '%m/%d/%Y %H:%M:%S', \
                        level = logging.INFO)
logger = logging.getLogger(__name__)

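This snippet causes the logging configuration to be overridden at import time for every module in a project that imports monpa. I suggest at least removing the level=logging.INFO line. Otherwise, every project that uses monpa either cannot use the DEBUG level or is forced to override this setting in every module that imports monpa.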
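
For comparison, the pattern the Python logging documentation recommends for libraries is to create a named logger, attach a `NullHandler`, and leave `logging.basicConfig()` to the application. A minimal sketch, in which the logger name `monpa_example` and the function `configure_application_logging` are illustrative only:

```python
import logging

# Library side: a named logger with a NullHandler.
# There is no call to logging.basicConfig(), so the root logger is left untouched.
logger = logging.getLogger("monpa_example")  # hypothetical logger name
logger.addHandler(logging.NullHandler())


def configure_application_logging(level: int = logging.DEBUG) -> None:
    """Application side: the application, not the library, picks format and level."""
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=level,
    )
```

With this split, importing the library no longer changes the level or format that the importing project has chosen.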

AttributeError: module 'monpa' has no attribute 'cut'

import monpa
sentence = "蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。"
result = monpa.cut(sentence)

for t in result:
    print(t)

Traceback (most recent call last):
  File "monpa.py", line 1, in <module>
    import monpa
  File "/Users/jay/py/tmu/monpa.py", line 5, in <module>
    result = monpa.cut(sentence)
AttributeError: module 'monpa' has no attribute 'cut'

Python 3.6.8 :: Anaconda custom (64-bit) / macOS & Ubuntu 16
1. Can't download model-511.pt
2. Error: AttributeError: module 'monpa' has no attribute 'cut'
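
The traceback shows that the script itself is named `monpa.py` (`/Users/jay/py/tmu/monpa.py`), so `import monpa` resolves to the script rather than to the installed package, which would explain the missing `cut` attribute. A small sketch for detecting this situation follows; `is_shadowed_by_script` is a hypothetical helper.

```python
import os


def is_shadowed_by_script(package: str, directory: str) -> bool:
    """Return True if `directory` contains a file named `<package>.py`.

    A script with the same name as an installed package is found first
    when Python runs from that directory, hiding the real package.
    """
    return os.path.isfile(os.path.join(directory, package + ".py"))


print(is_shadowed_by_script("monpa", os.getcwd()))
```

If this prints `True`, renaming the script (for example to `monpa_test.py`) should let the import reach the installed package.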

How to run on GPU?

I found that after adding a user_dict,
word segmentation runs very slowly.
Is there a way to run it on the GPU?

Need to upgrade torch API

Got the following warning:

crf_layer.py:374: UserWarning: where received a uint8 condition tensor. 
This behavior is deprecated and will be removed in a future version of PyTorch. 
Use a boolean condition instead. 

Need to upgrade crf_layer and other callers.
