taishi-i / nagisa

A Japanese tokenizer based on recurrent neural networks

Home Page: https://huggingface.co/spaces/taishi-i/nagisa-demo

License: MIT License

Python 86.75% Cython 13.25%
dynet word-segmentation pos-tagging japanese nlp-library sequence-labeling natural-language-processing nlp tokenizer

nagisa's Introduction


Nagisa is a Python module for Japanese word segmentation and POS-tagging. It is designed to be a simple and easy-to-use tool.

This tool has the following features.

  • Based on recurrent neural networks.
  • The word segmentation model uses character- and word-level features [池田+].
  • The POS-tagging model uses tag dictionary information [Inoue+].

For more details refer to the following links.

  • The stop words for nagisa are available here.
  • The presentation slide at PyCon JP (2022) is available here.
  • The article in Japanese is available here.
  • The documentation is available here.

Installation

To use nagisa, you need Python 3.6 through 3.12 on Linux, or Python 3.9 through 3.12 on macOS (Intel or M1/M2). You can install nagisa with the following command.

pip install nagisa

On Windows, use 64-bit Python 3.6, 3.7, or 3.8. nagisa is also compatible with the Windows Subsystem for Linux (WSL).
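The version ranges above can be checked before running pip. The following is a minimal sketch that hard-codes the ranges stated in this section; the `SUPPORTED` table and the `is_supported` helper are illustrative names, not part of nagisa, and the check does not cover the 64-bit requirement on Windows.

```python
import platform
import sys

# Inclusive (major, minor) ranges copied from the installation notes above.
SUPPORTED = {
    "Linux": ((3, 6), (3, 12)),
    "Darwin": ((3, 9), (3, 12)),   # macOS
    "Windows": ((3, 6), (3, 8)),
}


def is_supported(system, version):
    """Return True if (major, minor) is inside the stated range for this OS."""
    if system not in SUPPORTED:
        return False
    low, high = SUPPORTED[system]
    return low <= tuple(version[:2]) <= high


if __name__ == "__main__":
    ok = is_supported(platform.system(), sys.version_info)
    label = "listed as supported" if ok else "not listed as supported"
    print(f"{platform.system()}, Python {sys.version_info[0]}.{sys.version_info[1]}: {label}")
```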

Basic usage

Sample of word segmentation and POS-tagging for Japanese.

import nagisa

text = 'Pythonで簡単に使えるツールです'
words = nagisa.tagging(text)
print(words)
#=> Python/名詞 で/助詞 簡単/形状詞 に/助動詞 使える/動詞 ツール/名詞 です/助動詞

# Get a list of words
print(words.words)
#=> ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']

# Get a list of POS-tags
print(words.postags)
#=> ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']

Post-processing functions

Filter and extract words by specific POS tags.

# Filter out the words with the specified POS tags.
words = nagisa.filter(text, filter_postags=['助詞', '助動詞'])
print(words)
#=> Python/名詞 簡単/形状詞 使える/動詞 ツール/名詞

# Extract only nouns.
words = nagisa.extract(text, extract_postags=['名詞'])
print(words)
#=> Python/名詞 ツール/名詞

# This is a list of available POS-tags in nagisa.
print(nagisa.tagger.postags)
#=> ['補助記号', '名詞', ... , 'URL']
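Judging from the outputs above, filtering drops the listed tags and extraction keeps only the listed tags. The sketch below reproduces that behaviour on plain parallel lists such as `words.words` and `words.postags`. It is not nagisa's implementation; `filter_by_postags` and `extract_by_postags` are hypothetical helper names, and the input lists are copied from the basic usage example.

```python
def filter_by_postags(words, postags, filter_postags):
    """Drop every (word, tag) pair whose tag is listed in filter_postags."""
    blocked = set(filter_postags)
    return [(w, t) for w, t in zip(words, postags) if t not in blocked]


def extract_by_postags(words, postags, extract_postags):
    """Keep only the (word, tag) pairs whose tag is listed in extract_postags."""
    wanted = set(extract_postags)
    return [(w, t) for w, t in zip(words, postags) if t in wanted]


# Parallel lists taken from the basic usage output.
words = ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
postags = ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']

print(filter_by_postags(words, postags, ['助詞', '助動詞']))
#=> [('Python', '名詞'), ('簡単', '形状詞'), ('使える', '動詞'), ('ツール', '名詞')]

print(extract_by_postags(words, postags, ['名詞']))
#=> [('Python', '名詞'), ('ツール', '名詞')]
```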

Add a user dictionary in an easy way.

# default
text = "3月に見た「3月のライオン」"
print(nagisa.tagging(text))
#=> 3/名詞 月/名詞 に/助詞 見/動詞 た/助動詞 「/補助記号 3/名詞 月/名詞 の/助詞 ライオン/名詞 」/補助記号

# If a word ("3月のライオン") is included in the single_word_list, it is recognized as a single word.
new_tagger = nagisa.Tagger(single_word_list=['3月のライオン'])
print(new_tagger.tagging(text))
#=> 3/名詞 月/名詞 に/助詞 見/動詞 た/助動詞 「/補助記号 3月のライオン/名詞 」/補助記号

Train a model

Nagisa (v0.2.0+) provides a simple train method for a joint word segmentation and sequence labeling (e.g., POS-tagging, NER) model.

The train/dev/test files are in TSV format. Each line holds one word and its tag, separated by a tab (word \t tag). Put EOS on its own line between sentences. Refer to the sample datasets and the tutorial (Train a model for Universal Dependencies).

$ cat sample.train
唯一	NOUN
の	ADP
趣味	NOUN
は	ADP
料理	NOUN
EOS
とても	ADV
おいしかっ	ADJ
た	AUX
です	AUX
。	PUNCT
EOS
ドル	NOUN
は	ADP
主要	ADJ
通貨	NOUN
EOS

# After training finishes, three model files (*.vocabs, *.params, *.hp) are saved.
nagisa.fit(train_file="sample.train", dev_file="sample.dev", test_file="sample.test", model_name="sample")

# Build the tagger by loading the trained model files.
sample_tagger = nagisa.Tagger(vocabs='sample.vocabs', params='sample.params', hp='sample.hp')

text = "福岡・博多の観光情報"
words = sample_tagger.tagging(text)
print(words)
#=> 福岡/PROPN ・/SYM 博多/PROPN の/ADP 観光/NOUN 情報/NOUN
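The file format described in this section (one word, a tab, one tag per line, with EOS closing each sentence) can be produced from in-memory data. The sketch below is one way to write and re-read such a file; `write_tagged_sentences` and `read_tagged_sentences` are hypothetical helpers, not nagisa functions, and the file is written to a temporary directory for illustration.

```python
import os
import tempfile


def write_tagged_sentences(sentences, path):
    """Write sentences as 'word<TAB>tag' lines, with an EOS line after each one."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            for word, tag in sentence:
                f.write(f"{word}\t{tag}\n")
            f.write("EOS\n")


def read_tagged_sentences(path):
    """Parse a file written in the format above back into lists of (word, tag)."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line == "EOS":
                sentences.append(current)
                current = []
            elif line:
                word, tag = line.split("\t")
                current.append((word, tag))
    return sentences


sentences = [
    [("唯一", "NOUN"), ("の", "ADP"), ("趣味", "NOUN"), ("は", "ADP"), ("料理", "NOUN")],
    [("ドル", "NOUN"), ("は", "ADP"), ("主要", "ADJ"), ("通貨", "NOUN")],
]
path = os.path.join(tempfile.mkdtemp(), "sample.train")
write_tagged_sentences(sentences, path)
print(read_tagged_sentences(path) == sentences)
```

Reading the file back is a cheap way to catch lines with a missing tab before passing the path to nagisa.fit.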

nagisa's People

Contributors

bung87, codacy-badger, dependabot[bot], felixonmars, fossabot, taishi-i

nagisa's Issues

dict_file format

I'm trying to train with nagisa.fit on UD_Classical_Chinese-Kyoto (漢文) under a 4-level classified POS system (4階層品詞). See my blog for what I tried. I found the dict_file parameter in nagisa.fit and guessed that it is an external dictionary (外部辞書), but I could not find any explanation or usage of dict_file in your documentation. How do I use dict_file? Does it (and train_file) support a classified POS system?

Returning a generator instead of a list in nagisa.postagging

Hi, I'm trying to figure out how to POS-tag a list of tokens that have already been tokenized and I found #8 , which works fine.

And I think that returning a generator instead of a list would be better for users, since it will create a long list of POS tags in-memory for a large input text. And in most cases, the returned POS-tags are to be iterated over (usually only once) to be zipped with the tokens.

Or, you could provide two functions, like postagging and lpostagging, the former one returning a generator and the latter one returning a common list.
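The generator variant proposed above can be approximated on the caller's side without changing the library. The sketch below tags one token list at a time and yields (token, tag) pairs lazily, so only the current sentence's tags are held in memory. `iter_postags` is a hypothetical wrapper and `dummy_tagger` is a stand-in for a real tagging function such as the nagisa.postagging call mentioned in this issue.

```python
def iter_postags(token_lists, tag_fn):
    """Lazily yield (token, tag) pairs, tagging one token list at a time.

    token_lists: any iterable of token lists (it may itself be a generator).
    tag_fn: a function that maps a list of tokens to a list of tags.
    """
    for tokens in token_lists:
        yield from zip(tokens, tag_fn(tokens))


def dummy_tagger(tokens):
    # Stand-in for a real tagger; labels every token "X".
    return ["X"] * len(tokens)


pairs = iter_postags(iter([["a", "b"], ["c"]]), dummy_tagger)
print(next(pairs))
#=> ('a', 'X')
```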

Details about pre-trained nagisa model

I have questions about the hyper parameters and corpus used to train the built-in model.

When I execute code below:

import nagisa
tagger = nagisa.Tagger()
print(tagger._hp)

I get

{
	'LAYERS': 1,
	'THRESHOLD': 2,
	'DECAY': 3,
	'EPOCH': 50,
	'WINDOW_SIZE': 3,
	'DIM_UNI': 32,
	'DIM_BI': 16,
	'DIM_WORD': 16,
	'DIM_CTYPE': 8,
	'DIM_TAGEMB': 16,
	'DIM_HIDDEN': 100,
	'LEARNING_RATE': 0.075,
	'DROPOUT_RATE': 0.2,
	'TRAINSET': '../../nlp2018/workshop/nagisa-train/data/bccwj.train',
	'TESTSET': '../../nlp2018/workshop/nagisa-train/data/bccwj.test',
	'DEVSET': '../../nlp2018/workshop/nagisa-train/data/bccwj.dev',
	'DICTIONARY': '../../nlp2018/workshop/nagisa-train/data/unidict.txt',
	'HYPERPARAMS': 'data/nagisa_v002.hp',
	'MODEL': 'data/nagisa_v002.model',
	'VOCAB': 'data/nagisa_v002.dict',
	'EPOCH_MODEL': 'data/epoch.model',
	'TMP_PRED': 'data/pred',
	'TMP_GOLD': 'data/gold',
	'VOCAB_SIZE_UNI': 3090,
	'VOCAB_SIZE_BI': 82114,
	'VOCAB_SIZE_WORD': 59260,
	'VOCAB_SIZE_POSTAG': 24
}

Here I have 3 questions:

  1. The prefix of the files (nagisa_v002) is different from the actual files (nagisa_v001). Is this just a matter of the filename?
  2. It says it used BCCWJ as the source data. I believe it's this BCCWJ (https://pj.ninjal.ac.jp/corpus_center/bccwj/), but would like to confirm this is the case.
  3. If the answer for the previous question is yes, could you share more about the training data such as the # of lines, word unit (short/long)?

importing nagisa gives error "source code string cannot contain null bytes"

Full output

nagisa-0.2.11-cp310-cp310-manylinux_2_5_x86_64

ValueError                                Traceback (most recent call last)
Cell In[10], line 6
      1 # now, tokenizing the data
      2 #Text preprocessing, tokenizing and filtering of stopwords are all included in CountVectorizer, which builds a dictionary of features and transforms documents to feature vectors:
      3 
      4 # custom tokenization, this also removes common words
      5 from keyword_extraction import extract_keyword
----> 6 import nagisa
      7 # Takes in a document, returns the list of words
      8 def tokenize_jp(doc):

File ~/.local/lib/python3.10/site-packages/nagisa/__init__.py:4
      1 import nagisa_utils as utils
      3 from nagisa.tagger import Tagger
----> 4 from nagisa.train import fit
      6 version = '0.2.11'
      7 # Initialize instance

File ~/.local/lib/python3.10/site-packages/nagisa/train.py:11
      7 import logging
      8 from collections import OrderedDict
---> 11 import model
     12 import prepro
     13 import mecab_system_eval

Improving the handling of numerals of nagisa's word tokenizer

I'm using nagisa v0.1.1. There are some problems with the tokenizer's handling of numerals: numbers and decimals are split into single characters, and each is tagged as "名詞".
357 -> 3_名詞 5_名詞 7_名詞 # Numbers
1.48 -> 1_名詞 ._名詞 4_名詞 8_名詞 # Decimals
$5.5 -> $_補助記号 5_名詞 ._補助記号 5_名詞 # Numbers with currency symbols (and other symbols)
133-1111-2222 -> 1_名詞 3_名詞 3_名詞 -_補助記号 1_名詞 1_名詞 1_名詞 1_名詞 -_補助記号 2_名詞 2_名詞 2_名詞 2_名詞 # Phone numbers

and so on. Is it possible to improve this?
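Until the tokenizer itself handles numerals, one workaround is to merge digit runs after tagging. The sketch below is a post-processing step over parallel word and tag lists, not a change to nagisa: it joins consecutive digit tokens, plus a "." that sits between two digits, into a single token. `merge_numerals` is a hypothetical helper, and giving the merged token the tag "名詞" is an assumption taken from the outputs reported above.

```python
def merge_numerals(words, postags, number_tag="名詞"):
    """Merge runs of digit tokens (and a '.' between digits) into one token."""
    merged_words, merged_tags = [], []
    i = 0
    while i < len(words):
        if words[i].isdigit():
            j = i + 1
            while j < len(words):
                if words[j].isdigit():
                    j += 1
                elif words[j] == "." and j + 1 < len(words) and words[j + 1].isdigit():
                    j += 2  # consume the decimal point and the digit after it
                else:
                    break
            merged_words.append("".join(words[i:j]))
            merged_tags.append(number_tag)
            i = j
        else:
            merged_words.append(words[i])
            merged_tags.append(postags[i])
            i += 1
    return merged_words, merged_tags


print(merge_numerals(["1", ".", "4", "8"], ["名詞"] * 4))
#=> (['1.48'], ['名詞'])
print(merge_numerals(["$", "5", ".", "5"], ["補助記号", "名詞", "補助記号", "名詞"]))
#=> (['$', '5.5'], ['補助記号', '名詞'])
```

Symbols such as "$" and "-" are left as separate tokens, so a phone number becomes digit groups separated by hyphens. A string like "1.2.3" would also be merged into one token, which may or may not be desirable.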

Heroku deployment of NLP model Nagisa Tokenizer showing error

Hi,
I deployed my Flask app (an NLP model) on Heroku. It is basically a price prediction model in which some columns are in Japanese, to which I applied NLP with the nagisa library for tokenization, and some columns are numerical data. I pickled the vectorizers and the model and added them to my Flask API. But after deployment, when I enter values in the frontend and click the Predict button, the result is not displayed. This is the exact error I am facing.
image
The exact code of tokenize_jp is:
def tokenize_jp(doc):
    doc = nagisa.tagging(doc)
    return doc.words

I am not able to figure out how to fix this. Does nagisa work in a Heroku deployment?
PS: I am not sure whether the problem is with Heroku or nagisa. Please help me with this.

Dynet38 is not compatible with python3.11 on macos m2

Hi Taishi-i,

I know you have been battling with dynet not having the proper wheel for some python versions...
unfortunately, I cannot install nagisa on MacOS Sonoma 14.3.1 (23D60), with python3.11.

I got the following error using pip-sync:

ERROR: Ignored the following versions that require a different python version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11
ERROR: Could not find a version that satisfies the requirement DyNet38 (from nagisa) (from versions: none)

Fail to install the package

Hi!
I am trying to install nagisa on Windows. I installed Microsoft Visual C++ 14.0 as suggested in the error message, but I still run into the same error shown below. Can somebody help?
Collecting nagisa
Using cached https://files.pythonhosted.org/packages/29/25/f8a7916c541c79eb59c3a30f80ab2055ff26330518bdffa3e38ee4d76edf/nagisa-0.2.4.tar.gz
Requirement already satisfied: six in c:\users\sophi\appdata\local\continuum\miniconda3\envs\uipath\lib\site-packages (from nagisa) (1.12.0)
Requirement already satisfied: numpy in c:\users\sophi\appdata\local\continuum\miniconda3\envs\uipath\lib\site-packages (from nagisa) (1.16.4)
Requirement already satisfied: DyNet in c:\users\sophi\appdata\local\continuum\miniconda3\envs\uipath\lib\site-packages (from nagisa) (2.1)
Requirement already satisfied: cython in c:\users\sophi\appdata\local\continuum\miniconda3\envs\uipath\lib\site-packages (from DyNet->nagisa) (0.29.13)
Building wheels for collected packages: nagisa
Building wheel for nagisa (setup.py) ... error
ERROR: Complete output from command 'C:\Users\sophi\AppData\Local\Continuum\miniconda3\envs\uipath\python.exe' -u -c 'import setuptools, tokenize;file='"'"'C:\Users\sophi\AppData\Local\Temp\pip-install-lgcnxvk5\nagisa\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\sophi\AppData\Local\Temp\pip-wheel-nygi_3k8' --python-tag cp37:
ERROR: running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\nagisa
copying nagisa\mecab_system_eval.py -> build\lib.win-amd64-3.7\nagisa
copying nagisa\model.py -> build\lib.win-amd64-3.7\nagisa
copying nagisa\prepro.py -> build\lib.win-amd64-3.7\nagisa
copying nagisa\tagger.py -> build\lib.win-amd64-3.7\nagisa
copying nagisa\train.py -> build\lib.win-amd64-3.7\nagisa
copying nagisa\__init__.py -> build\lib.win-amd64-3.7\nagisa
running egg_info
writing nagisa.egg-info\PKG-INFO
writing dependency_links to nagisa.egg-info\dependency_links.txt
writing requirements to nagisa.egg-info\requires.txt
writing top-level names to nagisa.egg-info\top_level.txt
reading manifest file 'nagisa.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'nagisa.egg-info\SOURCES.txt'
copying nagisa\utils.c -> build\lib.win-amd64-3.7\nagisa
copying nagisa\utils.pyx -> build\lib.win-amd64-3.7\nagisa
creating build\lib.win-amd64-3.7\nagisa\data
copying nagisa\data\models.jpg -> build\lib.win-amd64-3.7\nagisa\data
copying nagisa\data\nagisa_logo.png -> build\lib.win-amd64-3.7\nagisa\data
copying nagisa\data\nagisa_v001.dict -> build\lib.win-amd64-3.7\nagisa\data
copying nagisa\data\nagisa_v001.hp -> build\lib.win-amd64-3.7\nagisa\data
copying nagisa\data\nagisa_v001.model -> build\lib.win-amd64-3.7\nagisa\data
creating build\lib.win-amd64-3.7\nagisa\data\sample_datasets
copying nagisa\data\sample_datasets\sample.dev -> build\lib.win-amd64-3.7\nagisa\data\sample_datasets
copying nagisa\data\sample_datasets\sample.dict -> build\lib.win-amd64-3.7\nagisa\data\sample_datasets
copying nagisa\data\sample_datasets\sample.emb -> build\lib.win-amd64-3.7\nagisa\data\sample_datasets
copying nagisa\data\sample_datasets\sample.test -> build\lib.win-amd64-3.7\nagisa\data\sample_datasets
copying nagisa\data\sample_datasets\sample.train -> build\lib.win-amd64-3.7\nagisa\data\sample_datasets
running build_ext
building 'utils' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/

ERROR: Failed building wheel for nagisa
Running setup.py clean for nagisa
Failed to build nagisa
Installing collected packages: nagisa
Running setup.py install for nagisa ... error
ERROR: Complete output from command 'C:\Users\sophi\AppData\Local\Continuum\miniconda3\envs\uipath\python.exe' -u -c 'import setuptools, tokenize;file='"'"'C:\Users\sophi\AppData\Local\Temp\pip-install-lgcnxvk5\nagisa\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\sophi\AppData\Local\Temp\pip-record-ed6qn2hl\install-record.txt' --single-version-externally-managed --compile:
ERROR: running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\nagisa
copying nagisa\mecab_system_eval.py -> build\lib.win-amd64-3.7\nagisa
copying nagisa\model.py -> build\lib.win-amd64-3.7\nagisa
copying nagisa\prepro.py -> build\lib.win-amd64-3.7\nagisa
copying nagisa\tagger.py -> build\lib.win-amd64-3.7\nagisa
copying nagisa\train.py -> build\lib.win-amd64-3.7\nagisa
copying nagisa\__init__.py -> build\lib.win-amd64-3.7\nagisa
running egg_info
writing nagisa.egg-info\PKG-INFO
writing dependency_links to nagisa.egg-info\dependency_links.txt
writing requirements to nagisa.egg-info\requires.txt
writing top-level names to nagisa.egg-info\top_level.txt
reading manifest file 'nagisa.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'nagisa.egg-info\SOURCES.txt'
copying nagisa\utils.c -> build\lib.win-amd64-3.7\nagisa
copying nagisa\utils.pyx -> build\lib.win-amd64-3.7\nagisa
creating build\lib.win-amd64-3.7\nagisa\data
copying nagisa\data\models.jpg -> build\lib.win-amd64-3.7\nagisa\data
copying nagisa\data\nagisa_logo.png -> build\lib.win-amd64-3.7\nagisa\data
copying nagisa\data\nagisa_v001.dict -> build\lib.win-amd64-3.7\nagisa\data
copying nagisa\data\nagisa_v001.hp -> build\lib.win-amd64-3.7\nagisa\data
copying nagisa\data\nagisa_v001.model -> build\lib.win-amd64-3.7\nagisa\data
creating build\lib.win-amd64-3.7\nagisa\data\sample_datasets
copying nagisa\data\sample_datasets\sample.dev -> build\lib.win-amd64-3.7\nagisa\data\sample_datasets
copying nagisa\data\sample_datasets\sample.dict -> build\lib.win-amd64-3.7\nagisa\data\sample_datasets
copying nagisa\data\sample_datasets\sample.emb -> build\lib.win-amd64-3.7\nagisa\data\sample_datasets
copying nagisa\data\sample_datasets\sample.test -> build\lib.win-amd64-3.7\nagisa\data\sample_datasets
copying nagisa\data\sample_datasets\sample.train -> build\lib.win-amd64-3.7\nagisa\data\sample_datasets
running build_ext
building 'utils' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/
ERROR: Command "'C:\Users\sophi\AppData\Local\Continuum\miniconda3\envs\uipath\python.exe' -u -c 'import setuptools, tokenize;file='"'"'C:\Users\sophi\AppData\Local\Temp\pip-install-lgcnxvk5\nagisa\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\sophi\AppData\Local\Temp\pip-record-ed6qn2hl\install-record.txt' --single-version-externally-managed --compile" failed with error code 1 in C:\Users\sophi\AppData\Local\Temp\pip-install-lgcnxvk5\nagisa\

Failed to install on docker on pi4b

I have the following Dockerfile

FROM python:3.10-bookworm

WORKDIR /app
COPY . .
RUN pip install -r requirements.txt

CMD ["python", "main.py"]

where content of requirements.txt is

atproto == 0.0.42
python-dotenv == 1.0.1
nagisa == 0.2.11

I am getting error

ERROR: Could not find a version that satisfies the requirement DyNet38 (from nagisa) (from versions: none)
ERROR: No matching distribution found for DyNet38

How can I resolve it?
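The message "from versions: none" generally means pip saw no candidate files for that package that match the running interpreter and platform. When reporting such a failure, it helps to include the values pip matches against. The sketch below collects them with the standard library only; `environment_report` is an illustrative name, and the sketch does not claim which platforms DyNet38 publishes wheels for.

```python
import platform
import struct
import sys


def environment_report():
    """Collect the interpreter and platform details relevant to wheel selection."""
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "implementation": platform.python_implementation(),
        "system": platform.system(),          # e.g. Linux, Darwin, Windows
        "machine": platform.machine(),        # e.g. x86_64, aarch64, arm64
        "pointer_bits": struct.calcsize("P") * 8,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Inside a Docker build, the same script can be run in a temporary RUN step to see what the container, rather than the host, reports.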

building nagisa on m1

I am facing this issue:

[notice] To update, run: pip install --upgrade pip
(venv) b@m1 vocab % pip install nagisa
Collecting nagisa
  Using cached nagisa-0.2.8.tar.gz (20.9 MB)
  Preparing metadata (setup.py) ... done
Collecting six
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting numpy
  Using cached numpy-1.23.4-cp310-cp310-macosx_11_0_arm64.whl (13.3 MB)
Collecting nagisa
  Using cached nagisa-0.2.7.tar.gz (20.9 MB)
  Preparing metadata (setup.py) ... done
Collecting DyNet
  Using cached dyNET-2.1.2.tar.gz (509 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting cython
  Using cached Cython-0.29.32-py2.py3-none-any.whl (986 kB)
Building wheels for collected packages: nagisa, DyNet
  Building wheel for nagisa (setup.py) ... done
  Created wheel for nagisa: filename=nagisa-0.2.7-cp310-cp310-macosx_11_0_arm64.whl size=21306402 sha256=c559ab30293dffc0d1ae36d215725dec08da0910ed1c3331728c398397258d2f
  Stored in directory: /Users/b/Library/Caches/pip/wheels/cf/38/0b/463d99fdf6d3c736cfcb4124124496513831eeefdc7f896391
  Building wheel for DyNet (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for DyNet (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [101 lines of output]
      /private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-build-env-rvxcggqa/overlay/lib/python3.10/site-packages/setuptools/dist.py:530: UserWarning: Normalizing 'v2.1.2' to '2.1.2'
        warnings.warn(tmpl.format(**locals()))
      /private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-build-env-rvxcggqa/overlay/lib/python3.10/site-packages/setuptools/dist.py:771: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
        warnings.warn(
      running bdist_wheel
      running build
      INFO:root:CMAKE_PATH='/opt/homebrew/bin/cmake'
      INFO:root:MAKE_PATH='/usr/bin/make'
      INFO:root:MAKE_FLAGS='-j 8'
      INFO:root:EIGEN3_INCLUDE_DIR='/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit/eigen'
      INFO:root:EIGEN3_DOWNLOAD_URL='https://github.com/clab/dynet/releases/download/2.1/eigen-b2e267dc99d4.zip'
      INFO:root:CC_PATH='/usr/bin/gcc'
      INFO:root:CXX_PATH='/usr/bin/g++'
      INFO:root:SCRIPT_DIR='/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a'
      INFO:root:BUILD_DIR='/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit'
      INFO:root:INSTALL_PREFIX='/Users/b/study/jap/vocab/venv/lib/python3.10/site-packages/../../..'
      INFO:root:PYTHON='/Users/b/study/jap/vocab/venv/bin/python3.10'
      cmake version 3.24.1

      CMake suite maintained and supported by Kitware (kitware.com/cmake).
      Apple clang version 13.1.6 (clang-1316.0.21.2.5)
      Target: arm64-apple-darwin21.6.0
      Thread model: posix
      InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
      INFO:root:Creating build directory /private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit
      INFO:root:Fetching Eigen...
      INFO:root:Unpacking Eigen...
      INFO:root:Configuring...
      -- The C compiler identification is AppleClang 13.1.6.13160021
      -- The CXX compiler identification is AppleClang 13.1.6.13160021
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/usr/bin/gcc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/usr/bin/g++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      CMake Deprecation Warning at CMakeLists.txt:2 (cmake_minimum_required):
        Compatibility with CMake < 2.8.12 will be removed from a future version of
        CMake.

        Update the VERSION argument <min> value or use a ...<max> suffix to tell
        CMake that the project does not need compatibility with older versions.


      -- Optimization level: fast
      -- BACKEND not specified, defaulting to eigen.
      -- Eigen dir is /private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit/eigen
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Found Cython version 0.29.32

      CMAKE_INSTALL_PREFIX="/Users/b/study/jap/vocab/venv"
      PROJECT_SOURCE_DIR="/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a"
      PROJECT_BINARY_DIR="/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit"
      LIBS=""
      EIGEN3_INCLUDE_DIR="/private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit/eigen"
      MKL_LINK_DIRS=""
      WITH_CUDA_BACKEND=""
      CUDA_RT_FILES=""
      CUDA_RT_DIRS=""
      CUDA_CUBLAS_FILES=""
      CUDA_CUBLAS_DIRS=""
      MSVC=""
      fatal: not a git repository (or any of the parent directories): .git
      -- Configuring done
      -- Generating done
      -- Build files have been written to: /private/var/folders/yv/lystpk8n2015cf8vmqd2yj_c0000gp/T/pip-install-v2h7cwoe/dynet_f6727a54d6ce4c5d83d9578e2d0a272a/build/py3.10-64bit
      INFO:root:Compiling...
      [  4%] Building CXX object dynet/CMakeFiles/dynet.dir/deep-lstm.cc.o
      [  4%] Building CXX object dynet/CMakeFiles/dynet.dir/exec.cc.o
      [  4%] Building CXX object dynet/CMakeFiles/dynet.dir/aligned-mem-pool.cc.o
      [  5%] Building CXX object dynet/CMakeFiles/dynet.dir/cfsm-builder.cc.o
      [  8%] Building CXX object dynet/CMakeFiles/dynet.dir/dynet.cc.o
      [  8%] Building CXX object dynet/CMakeFiles/dynet.dir/dict.cc.o
      [ 10%] Building CXX object dynet/CMakeFiles/dynet.dir/devices.cc.o
      [ 11%] Building CXX object dynet/CMakeFiles/dynet.dir/dim.cc.o
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      make[2]: *** [dynet/CMakeFiles/dynet.dir/devices.cc.o] Error 1
      make[2]: *** Waiting for unfinished jobs....
      make[2]: *** [dynet/CMakeFiles/dynet.dir/aligned-mem-pool.cc.o] Error 1
      make[2]: *** [dynet/CMakeFiles/dynet.dir/dynet.cc.o] Error 1
      make[2]: *** [dynet/CMakeFiles/dynet.dir/cfsm-builder.cc.o] Error 1
      clang: error: the clang compiler does not support '-march=native'
      clang: error: the clang compiler does not support '-march=native'
      make[2]: *** [dynet/CMakeFiles/dynet.dir/dim.cc.o] Error 1
      make[2]: *** [dynet/CMakeFiles/dynet.dir/deep-lstm.cc.o] Error 1
      make[2]: *** [dynet/CMakeFiles/dynet.dir/dict.cc.o] Error 1
      make[2]: *** [dynet/CMakeFiles/dynet.dir/exec.cc.o] Error 1
      make[1]: *** [dynet/CMakeFiles/dynet.dir/all] Error 2
      make: *** [all] Error 2
      error: /usr/bin/make -j 8
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for DyNet
Successfully built nagisa

any ideas?

About referencing this library

I want to cite this library. Is there a paper or journal article where I can read more about how it works, or a BibTeX entry for citation?

Deferring loading of DyNet on import

Hi, I'm using nagisa in my GUI-based project. When importing nagisa, it requires a few seconds to load DyNet, adding extra time to the loading time of my program, so I have to defer the import of nagisa until it is needed.
I'm not sure whether DyNet is required by all functions of nagisa, but I think that it would be better if the loading of DyNet is deferred until it is needed by nagisa the first time.
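The caller-side workaround described here, importing the heavy module only on first use, can be sketched as follows. `lazy_module` is a hypothetical helper built on the standard importlib module; the `tokenize` function shows where nagisa would be loaded, and it pays the import cost on its first call instead of at program start-up. This does not change how nagisa itself loads DyNet.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def lazy_module(name):
    """Import a module on first request and reuse it afterwards."""
    return importlib.import_module(name)


def tokenize(text):
    # nagisa (and therefore DyNet) is imported here, on the first call only.
    nagisa = lazy_module("nagisa")
    return nagisa.tagging(text).words


# Demonstration with a standard-library module so the sketch runs anywhere.
json_module = lazy_module("json")
print(json_module.dumps([1, 2]))
#=> [1, 2]
```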

core dumped

I am running Manjaro Linux on a ThinkPad X230, using Python 3.9.7 and the version of nagisa from pip. When I run import nagisa, I get Illegal instruction (core dumped).

Illegal instruction (core dumped)

Thanks for building this. I've been trying mecab and not been getting the exact results that I need and thought I'd give this a try.

For now, I have this working on a centos box, but I'm wanting to get this working on ubuntu as it's my main dev machine.

I keep getting:

[dynet] random seed: 1234
[dynet] allocating memory: 32MB
Illegal instruction (core dumped)

Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Codename: focal

  • Python 3.8.5
  • 8GB laptop.

Is there any more information you need? Thanks
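An "Illegal instruction" crash means the process executed a CPU instruction that the processor does not support. A useful detail to attach to reports like the two above is the list of instruction-set flags the CPU advertises. The sketch below reads them from /proc/cpuinfo on Linux; `read_cpu_flags` is a hypothetical helper, and which instruction sets the installed binaries actually require is not stated in these reports, so the flags printed at the end (avx, avx2) are only examples.

```python
def read_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the set of CPU feature flags listed in a cpuinfo-style file.

    Returns an empty set if the file cannot be read (for example, on
    systems that do not provide /proc/cpuinfo).
    """
    flags = set()
    try:
        with open(cpuinfo_path, encoding="utf-8", errors="replace") as f:
            for line in f:
                key, _, value = line.partition(":")
                # x86 kernels label the line "flags"; ARM kernels use "Features".
                if key.strip() in ("flags", "Features"):
                    flags.update(value.split())
    except OSError:
        pass
    return flags


flags = read_cpu_flags()
for name in ("avx", "avx2"):
    print(f"{name}: {'yes' if name in flags else 'no'}")
```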

Nagisa changes Japanese zenkaku to hankaku

Hi Taishi-i.
When I use nagisa to tokenize Japanese text, it automatically changes zenkaku (full-width) symbols to hankaku (half-width). For example:
"（" → "("
"）" → ")"
"〜" → "~"
Can you tell me how to keep the zenkaku characters after tokenizing Japanese text?
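Full-width to half-width conversion of this kind is what Unicode NFKC normalization does to full-width parentheses. Assuming the conversion comes from NFKC normalization applied to the input before tokenization (this issue does not confirm where it happens), the original characters can be recovered by slicing the original string with the token lengths. The sketch below does this only when normalization maps every character to exactly one character; `restore_surface` is a hypothetical helper, and the token list is written by hand rather than produced by nagisa.

```python
import unicodedata


def restore_surface(original, tokens):
    """Map tokens of the NFKC-normalized text back to slices of the original.

    Raises ValueError when the tokens cannot be aligned one-to-one with the
    original characters.
    """
    per_char = [unicodedata.normalize("NFKC", ch) for ch in original]
    normalized = "".join(per_char)
    if (any(len(c) != 1 for c in per_char)
            or normalized != unicodedata.normalize("NFKC", original)
            or "".join(tokens) != normalized):
        raise ValueError("tokens cannot be aligned with the original text")
    restored, start = [], 0
    for token in tokens:
        restored.append(original[start:start + len(token)])
        start += len(token)
    return restored


original = "\uff08日本語\uff09"      # full-width parentheses around 日本語
tokens = ["(", "日本語", ")"]        # hand-written tokens of the normalized text
print(restore_surface(original, tokens))
```

Characters whose normalized form has a different length (for example, some squared katakana symbols) trigger the ValueError instead of producing misaligned output.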

error: command 'cl.exe' failed: No such file or directory

When I use pip install nagisa to install, the error message is:

Collecting nagisa
Using cached https://files.pythonhosted.org/packages/a1/40/a94f7944ee5d6a4d44eadcc966fe0d46b5155fb139d7b4d708e439617df1/nagisa-0.1.1.tar.gz
Requirement already satisfied: six in e:\anaconda3\lib\site-packages (from nagisa) (1.11.0)
Requirement already satisfied: numpy in e:\anaconda3\lib\site-packages (from nagisa) (1.14.0)
Requirement already satisfied: DyNet in e:\anaconda3\lib\site-packages (from nagisa) (2.1)
Requirement already satisfied: cython in e:\anaconda3\lib\site-packages (from DyNet->nagisa) (0.27.3)
Building wheels for collected packages: nagisa
Running setup.py bdist_wheel for nagisa ... error
Complete output from command e:\anaconda3\python.exe -u -c "import setuptools, tokenize;file='C:\Users\test\AppData\Local\Temp\pip-install-t_9_vdzk\nagisa\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d C:\Users\test\AppData\Local\Temp\pip-wheel-dmgx_3eh --python-tag cp36:
running bdist_wheel
Warning: Extension name 'utils' does not match fully qualified name 'nagisa.utils' of 'nagisa/utils.pyx'
running build
running build_py
creating build
creating build\lib.win-amd64-3.6
creating build\lib.win-amd64-3.6\nagisa
copying nagisa\mecab_system_eval.py -> build\lib.win-amd64-3.6\nagisa
copying nagisa\model.py -> build\lib.win-amd64-3.6\nagisa
copying nagisa\prepro.py -> build\lib.win-amd64-3.6\nagisa
copying nagisa\tagger.py -> build\lib.win-amd64-3.6\nagisa
copying nagisa\train.py -> build\lib.win-amd64-3.6\nagisa
copying nagisa\__init__.py -> build\lib.win-amd64-3.6\nagisa
running egg_info
writing nagisa.egg-info\PKG-INFO
writing dependency_links to nagisa.egg-info\dependency_links.txt
writing requirements to nagisa.egg-info\requires.txt
writing top-level names to nagisa.egg-info\top_level.txt
reading manifest file 'nagisa.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'nagisa.egg-info\SOURCES.txt'
copying nagisa\utils.c -> build\lib.win-amd64-3.6\nagisa
copying nagisa\utils.pyx -> build\lib.win-amd64-3.6\nagisa
creating build\lib.win-amd64-3.6\nagisa\data
copying nagisa\data\models.jpg -> build\lib.win-amd64-3.6\nagisa\data
copying nagisa\data\nagisa_image.jpg -> build\lib.win-amd64-3.6\nagisa\data
copying nagisa\data\nagisa_v001.dict -> build\lib.win-amd64-3.6\nagisa\data
copying nagisa\data\nagisa_v001.hp -> build\lib.win-amd64-3.6\nagisa\data
copying nagisa\data\nagisa_v001.model -> build\lib.win-amd64-3.6\nagisa\data
running build_ext
building 'utils' extension
creating build\temp.win-amd64-3.6
creating build\temp.win-amd64-3.6\Release
creating build\temp.win-amd64-3.6\Release\nagisa
cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ie:\anaconda3\lib\site-packages\numpy\core\include -Ie:\anaconda3\include -Ie:\anaconda3\include /Tcnagisa/utils.c /Fobuild\temp.win-amd64-3.6\Release\nagisa/utils.obj
error: command 'cl.exe' failed: No such file or directory


Failed building wheel for nagisa
Running setup.py clean for nagisa
Failed to build nagisa
Installing collected packages: nagisa
Running setup.py install for nagisa ... error
Complete output from command e:\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\test\AppData\Local\Temp\pip-install-t_9_vdzk\nagisa\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\test\AppData\Local\Temp\pip-record-p2d6rr5x\install-record.txt --single-version-externally-managed --compile:
running install
Warning: Extension name 'utils' does not match fully qualified name 'nagisa.utils' of 'nagisa/utils.pyx'
running build
running build_py
creating build
creating build\lib.win-amd64-3.6
creating build\lib.win-amd64-3.6\nagisa
copying nagisa\mecab_system_eval.py -> build\lib.win-amd64-3.6\nagisa
copying nagisa\model.py -> build\lib.win-amd64-3.6\nagisa
copying nagisa\prepro.py -> build\lib.win-amd64-3.6\nagisa
copying nagisa\tagger.py -> build\lib.win-amd64-3.6\nagisa
copying nagisa\train.py -> build\lib.win-amd64-3.6\nagisa
copying nagisa\__init__.py -> build\lib.win-amd64-3.6\nagisa
running egg_info
writing nagisa.egg-info\PKG-INFO
writing dependency_links to nagisa.egg-info\dependency_links.txt
writing requirements to nagisa.egg-info\requires.txt
writing top-level names to nagisa.egg-info\top_level.txt
reading manifest file 'nagisa.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'nagisa.egg-info\SOURCES.txt'
copying nagisa\utils.c -> build\lib.win-amd64-3.6\nagisa
copying nagisa\utils.pyx -> build\lib.win-amd64-3.6\nagisa
creating build\lib.win-amd64-3.6\nagisa\data
copying nagisa\data\models.jpg -> build\lib.win-amd64-3.6\nagisa\data
copying nagisa\data\nagisa_image.jpg -> build\lib.win-amd64-3.6\nagisa\data
copying nagisa\data\nagisa_v001.dict -> build\lib.win-amd64-3.6\nagisa\data
copying nagisa\data\nagisa_v001.hp -> build\lib.win-amd64-3.6\nagisa\data
copying nagisa\data\nagisa_v001.model -> build\lib.win-amd64-3.6\nagisa\data
running build_ext
building 'utils' extension
creating build\temp.win-amd64-3.6
creating build\temp.win-amd64-3.6\Release
creating build\temp.win-amd64-3.6\Release\nagisa
cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ie:\anaconda3\lib\site-packages\numpy\core\include -Ie:\anaconda3\include -Ie:\anaconda3\include /Tcnagisa/utils.c /Fobuild\temp.win-amd64-3.6\Release\nagisa/utils.obj
error: command 'cl.exe' failed: No such file or directory

----------------------------------------

Command "e:\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\test\AppData\Local\Temp\pip-install-t_9_vdzk\nagisa\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\test\AppData\Local\Temp\pip-record-p2d6rr5x\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\test\AppData\Local\Temp\pip-install-t_9_vdzk\nagisa\

How to fix it?
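
Reading the log above: pip fetched the source archive (nagisa-0.1.1.tar.gz) and tried to compile nagisa/utils.c with cl.exe, the Microsoft C/C++ compiler, which was not found. A minimal check, as a sketch, is to see whether the compiler is reachable from the same shell that runs pip. `find_compiler` is a hypothetical helper name. Installing Microsoft's C++ build tools and running pip from a shell where they are on PATH is the usual remedy, though whether that is sufficient on this machine is an assumption.

```python
import shutil


def find_compiler(name="cl"):
    """Return the full path of the compiler executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_compiler()
if path is None:
    print("cl.exe is not on PATH; a source build that needs it will fail")
else:
    print("cl.exe found at", path)
```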

install error on UBUNTU 18.04--python3.6

home/wentao/programming/weibospider/config/conf.py:12: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
cf = load(cont)
[dynet] random seed: 1234
[dynet] allocating memory: 32MB
[dynet] memory allocation done.
Traceback (most recent call last):
  File "multi_token.py", line 12, in <module>
    import nagisa
  File "/home/wentao/programming/tweet/py3_all/lib/python3.6/site-packages/nagisa/__init__.py", line 3, in <module>
    from nagisa.train import fit
  File "/home/wentao/programming/tweet/py3_all/lib/python3.6/site-packages/nagisa/train.py", line 12, in <module>
    import prepro
  File "/home/wentao/programming/tweet/py3_all/lib/python3.6/site-packages/nagisa/prepro.py", line 11, in <module>
    OOV = utils.OOV
AttributeError: module 'utils' has no attribute 'OOV'

My working directory has no utils.py.
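
The traceback suggests that the name `utils` resolved to some other top-level module than the one nagisa ships, since that module has no `OOV` attribute. A small diagnostic sketch can show which file Python would load for a given top-level name. `module_origin` is a hypothetical helper, and run in the failing environment it only reports where the name resolves, not why.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module name resolves to, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Names taken from the traceback above.
for name in ("utils", "prepro"):
    print(name, "->", module_origin(name))
```

If `utils` points outside the nagisa package directory, another installed package or a directory on `sys.path` is probably shadowing it.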

Could not install nagisa with poetry (without complicated configurations)

Hello @taishi-i, thank you for maintaining the nagisa package! I'd like to report that nagisa apparently cannot be installed with poetry.
I'm not sure why, but poetry tried to build DyNet even though the manifest explicitly requires Python 3.8+ (in this case, 3.10+).

For now I work around this by pointing poetry at the GitHub repository, so this report is mainly to share the finding. Maybe switch_install_requires doesn't work as expected. I'm sorry that I can't yet tell whether poetry or my configuration is at fault.
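
One way this could happen, stated as an assumption rather than a confirmed diagnosis: if the dependency is chosen by Python code that runs when the package is built, the choice is frozen into the published metadata and reflects the interpreter that built it, not the one that installs it. The sketch below contrasts such a build-time switch with PEP 508 environment markers, which the installer evaluates in the target environment. `choose_dynet_requirement` is a hypothetical stand-in and is not claimed to match what `switch_install_requires` actually does.

```python
import sys


def choose_dynet_requirement(version_info=sys.version_info):
    """Hypothetical build-time switch: pick the DyNet distribution by Python version."""
    return "DyNet38" if tuple(version_info[:2]) >= (3, 8) else "DyNet"


# The same intent written as PEP 508 environment markers. Each installer
# evaluates these against its own interpreter, so nothing is decided at build time.
MARKER_REQUIREMENTS = [
    'DyNet; python_version < "3.8"',
    'DyNet38; python_version >= "3.8"',
]

print(choose_dynet_requirement((3, 10)))
print(choose_dynet_requirement((3, 6)))
```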

From PyPI

It might use the sdist.

> cat pyproject.toml
[tool.poetry]
name = "testtt"
version = "0.1.0"
description = ""
authors = ["himkt <[email protected]>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10"


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
> poetry add nagisa -vvv
Loading configuration file /Users/himkt/Library/Application Support/pypoetry/config.toml
Using virtualenv: /Users/himkt/Desktop/testtt/.venv
[keyring.backend] Loading KWallet
[keyring.backend] Loading SecretService
[keyring.backend] Loading Windows
[keyring.backend] Loading chainer
[keyring.backend] Loading libsecret
[keyring.backend] Loading macOS
Creating new session for pypi.org
Source (PyPI): 23 packages found for nagisa *
Using version ^0.2.10 for nagisa

Updating dependencies
Resolving dependencies...
   1: fact: testtt is 0.1.0
   1: derived: testtt
   1: fact: testtt depends on nagisa (^0.2.10)
   1: selecting testtt (0.1.0)
   1: derived: nagisa (>=0.2.10,<0.3.0)
Source (PyPI): 1 packages found for nagisa >=0.2.10,<0.3.0
   1: fact: nagisa (0.2.10) depends on six (*)
   1: fact: nagisa (0.2.10) depends on numpy (*)
   1: fact: nagisa (0.2.10) depends on DyNet (*)
   1: selecting nagisa (0.2.10)
   1: derived: DyNet
   1: derived: numpy
   1: derived: six
Source (PyPI): 12 packages found for dynet *
Source (PyPI): 103 packages found for numpy *
Source (PyPI): 27 packages found for six *
   1: selecting numpy (1.26.3)
   1: selecting six (1.16.0)
   1: fact: dynet (2.1.2) depends on cython (*)
   1: fact: dynet (2.1.2) depends on numpy (*)
   1: selecting dynet (2.1.2)
   1: derived: cython
Source (PyPI): 116 packages found for cython *
   1: selecting cython (3.0.8)
   1: Version solving took 0.172 seconds.
   1: Tried 1 solutions.

Finding the necessary packages for the current system
Source (PyPI): 1 packages found for nagisa >=0.2.10,<0.3.0
Source (PyPI): 1 packages found for dynet *
Source (PyPI): 1 packages found for numpy *
Source (PyPI): 1 packages found for six *
Source (PyPI): 1 packages found for cython *

Package operations: 2 installs, 0 updates, 0 removals, 3 skipped

  • Installing dynet (2.1.2): Pending...
Skipping wheel dyNET-2.1.2-cp27-cp27m-macosx_10_13_x86_64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp27-cp27m-manylinux1_i686.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp27-cp27m-manylinux1_x86_64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp27-cp27mu-manylinux1_i686.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp27-cp27mu-manylinux1_x86_64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp35-cp35m-macosx_10_13_x86_64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp35-cp35m-manylinux1_i686.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp35-cp35m-manylinux1_x86_64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp36-cp36m-macosx_10_13_x86_64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp36-cp36m-manylinux1_i686.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp36-cp36m-manylinux1_x86_64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp36-cp36m-win_amd64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp37-cp37m-macosx_10_13_x86_64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp37-cp37m-manylinux1_i686.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp37-cp37m-manylinux1_x86_64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp37-cp37m-win_amd64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp38-cp38-macosx_10_13_x86_64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp38-cp38-manylinux1_i686.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp38-cp38-manylinux1_x86_64.whl as this is not supported by the current environment
Skipping wheel dyNET-2.1.2-cp38-cp38-win_amd64.whl as this is not supported by the current environment
  • Installing dynet (2.1.2): Preparing...
  • Installing dynet (2.1.2): Failed

  Stack trace:

  8  ~/.pyenv/versions/3.10.12/lib/python3.10/site-packages/poetry/installation/executor.py:269 in _execute_operation
      267│
      268│             try:
    → 269│                 result = self._do_execute_operation(operation)
      270│             except EnvCommandError as e:
      271│                 if e.e.returncode == -2:
...
  /var/folders/v1/77c226vs4m7_bfwn8m9y7yzr0000gn/T/tmp0nxnyp_h/.venv/lib/python3.10/site-packages/setuptools/command/sdist.py:125: SetuptoolsDeprecationWarning: `build_py` command does not inherit from setuptools' `build_py`.
  !!

          ********************************************************************************
          Custom 'build_py' does not implement 'get_data_files_without_manifest'.
          Please extend command classes from setuptools instead of distutils.

          See https://peps.python.org/pep-0632/ for details.
          ********************************************************************************

  !!
    self._add_data_files(self._safe_data_files(build_py))
  INFO:root:reading manifest file 'dyNET.egg-info/SOURCES.txt'
  INFO:root:adding license file 'LICENSE.txt'
  INFO:root:writing manifest file 'dyNET.egg-info/SOURCES.txt'
  INFO:root:Copying dyNET.egg-info to build/bdist.macosx-13.5-x86_64/wheel/dyNET-2.1.2-py3.10.egg-info
  INFO:root:running install_scripts
  [WARNING] This wheel needs a higher macOS version than the version your Python interpreter is compiled against.  To silence this warning, set MACOSX_DEPLOYMENT_TARGET to at least 14_0 or recreate these files with lower MACOSX_DEPLOYMENT_TARGET:
  build/bdist.macosx-13.5-x86_64/wheel/libdynet.dylib
  error: [Errno 2] No such file or directory: 'LICENSE.txt'


  at ~/.pyenv/versions/3.10.12/lib/python3.10/site-packages/poetry/installation/chef.py:164 in _prepare
      160│
      161│                 error = ChefBuildError("\n\n".join(message_parts))
      162│
      163│             if error is not None:
    → 164│                 raise error from None
      165│
      166│             return path
      167│
      168│     def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:

Note: This error originates from the build backend, and is likely not a problem with poetry but with dynet (2.1.2) not supporting PEP 517 builds. You can verify this by running 'pip wheel --no-cache-dir --use-pep517 "dynet (==2.1.2)"'.

Full log: https://gist.github.com/himkt/f714d3c6a1f1e21f268a0ba5f4b7ee8b

From GitHub (git repo)

It uses DyNet38 as expected.

> cat pyproject.toml
[tool.poetry]
name = "testtt"
version = "0.1.0"
description = ""
authors = ["himkt <[email protected]>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10"


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
> poetry add git+https://github.com/himkt/nagisa.git#7b760c5 -vvv
Loading configuration file /Users/himkt/Library/Application Support/pypoetry/config.toml
Using virtualenv: /Users/himkt/Desktop/testtt/.venv
[keyring.backend] Loading KWallet
[keyring.backend] Loading SecretService
[keyring.backend] Loading Windows
[keyring.backend] Loading chainer
[keyring.backend] Loading libsecret
[keyring.backend] Loading macOS
[urllib3.connectionpool] Starting new HTTPS connection (1): github.com:443
[urllib3.connectionpool] https://github.com:443 "GET /himkt/nagisa.git/info/refs?service=git-upload-pack HTTP/1.1" 200 None
[urllib3.connectionpool] Starting new HTTPS connection (2): github.com:443
[urllib3.connectionpool] https://github.com:443 "POST /himkt/nagisa.git/git-upload-pack HTTP/1.1" 200 None
Cloning https://github.com/himkt/nagisa.git at '7b760c5' to /Users/himkt/Desktop/testtt/.venv/src/nagisa

Updating dependencies
Resolving dependencies...
   1: fact: testtt is 0.1.0
   1: derived: testtt
   1: fact: testtt depends on nagisa (0.2.10)
   1: selecting testtt (0.1.0)
   1: derived: nagisa (0.2.10) @ git+https://github.com/himkt/nagisa.git@7b760c5
   1: fact: nagisa (0.2.10) depends on six (*)
   1: fact: nagisa (0.2.10) depends on numpy (*)
   1: fact: nagisa (0.2.10) depends on DyNet38 (*)
   1: selecting nagisa (0.2.10 7b760c5)
   1: derived: DyNet38
   1: derived: numpy
   1: derived: six
   1: selecting numpy (1.26.3)
   1: selecting six (1.16.0)
   1: fact: dynet38 (2.2) depends on cython (*)
   1: fact: dynet38 (2.2) depends on numpy (*)
   1: selecting dynet38 (2.2)
   1: derived: cython
   1: selecting cython (3.0.8)
   1: Version solving took 0.192 seconds.
   1: Tried 1 solutions.

Finding the necessary packages for the current system

Package operations: 0 installs, 1 update, 0 removals, 4 skipped

  • Installing cython (3.0.8): Pending...
  • Installing cython (3.0.8): Skipped for the following reason: Already installed
  • Installing dynet38 (2.2): Pending...
  • Installing dynet38 (2.2): Skipped for the following reason: Already installed
  • Installing six (1.16.0): Pending...
  • Installing six (1.16.0): Skipped for the following reason: Already installed
  • Installing numpy (1.26.3): Pending...
  • Installing numpy (1.26.3): Skipped for the following reason: Already installed
  • Updating nagisa (0.2.10 989a080 -> 0.2.10 7b760c5): Pending...
  • Updating nagisa (0.2.10 989a080 -> 0.2.10 7b760c5): Cloning...
  • Updating nagisa (0.2.10 989a080 -> 0.2.10 7b760c5): Preparing...
  • Updating nagisa (0.2.10 989a080 -> 0.2.10 7b760c5): Installing...
  • Updating nagisa (0.2.10 989a080 -> 0.2.10 7b760c5)

Writing lock file

Wheel request for Python 3.8

Hello, thank you for maintaining the awesome toolkit!

I think nagisa cannot be installed with pip install nagisa on Python>=3.8.
This is because:

  • (a) dynet uses the old URL for eigen (clab/dynet#1616). Some commits (e.g. clab/dynet@b800ed0) were pushed to fix this problem, but no release that includes them is available.
  • (b) nagisa doesn't provide wheels for the latest versions of Python. Installing nagisa on Python<=3.7 works well because wheels are uploaded to https://pypi.org/project/nagisa/#files. For Python>=3.8, however, pip installs nagisa from source, which may fail because of problem (a).

The full output of pip install nagisa on Python3.8: https://gist.github.com/himkt/1bc75b83f1735535c4df0b952f352bf6
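
To illustrate why an installer falls back to a source build, here is a simplified sketch of matching a wheel filename against the running interpreter. It checks only the CPython tag in a filename such as dyNET-2.1.2-cp38-cp38-manylinux1_x86_64.whl. Real installers also compare the ABI and platform tags and accept generic tags, so the helper names and the reduced logic are illustrative assumptions.

```python
import sys


def wheel_python_tag(filename):
    """Extract the Python tag from a wheel filename.

    Wheel filenames follow name-version[-build]-python-abi-platform.whl,
    so the Python tag is the third field from the end.
    """
    stem = filename[:-len(".whl")]
    return stem.split("-")[-3]


def matches_cpython(filename, version_info=sys.version_info):
    """Simplified check: does the wheel's Python tag name this CPython version?"""
    expected = "cp{}{}".format(version_info[0], version_info[1])
    return expected in wheel_python_tag(filename).split(".")


name = "dyNET-2.1.2-cp38-cp38-manylinux1_x86_64.whl"
print(wheel_python_tag(name))
print(matches_cpython(name, (3, 8)), matches_cpython(name, (3, 10)))
```

When no uploaded wheel matches, the installer has only the source distribution left, which is where the build problem described in (a) comes into play.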

Suppress output messages

Currently, every time I import nagisa, messages regarding nagisa and dynet appear.

Is there an option/argument to suppress these output messages?
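
In case no built-in option exists, one generic workaround is to redirect the underlying file descriptor while the import runs. This is a sketch under the assumption that the messages are written at the C level to stderr (descriptor 2) or stdout (descriptor 1), in which case reassigning `sys.stdout` or `sys.stderr` from Python would not catch them. `silence_fd` is a hypothetical helper, not a nagisa or DyNet API.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def silence_fd(fd):
    """Temporarily point file descriptor `fd` at the null device."""
    sys.stdout.flush()
    sys.stderr.flush()
    saved = os.dup(fd)                          # keep a copy of the real target
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, fd)                    # writes to fd are now discarded
        yield
    finally:
        sys.stdout.flush()
        sys.stderr.flush()
        os.dup2(saved, fd)                      # restore the original target
        os.close(saved)
        os.close(devnull)


# Intended use (assumes nagisa is installed; not executed here):
# with silence_fd(2), silence_fd(1):
#     import nagisa
```

Anything else printed inside the `with` block is discarded too, so it is best kept around the import only.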

request: comparison to other tokenizers/PoS taggers

Could you include some notes briefly comparing nagisa to other parsers such as MeCab? MeCab includes a comparison to other tokenizers/parsers. I think users would benefit greatly from parsing speed and accuracy comparisons, along with notes on the finer differences and typical use cases.

Pip/pip3 install nagisa Error

Hello @taishi-i, when I try to run pip install nagisa, I get the error below. I also tried to install through conda.

Windows 7
C:\Users\SAIKIRAN>python --version
Python 3.8.3

Error:
ERROR: Command errored out with exit status 1: 'c:\users\saikiran\appdata\local\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0
= '"'"'C:\Users\SAIKIRAN\AppData\Local\Temp\pip-install-a31d0hp1\DyNet\setup.py'"'"'; file='"'"'C:\Users\SAIKIRAN\AppData\Local\Temp\pip-install-a31d0
1\DyNet\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '
"'exec'"'"'))' install --record 'C:\Users\SAIKIRAN\AppData\Local\Temp\pip-record-mg2btvbb\install-record.txt' --single-version-externally-managed --compile Check the lo
for full command output.
