Coder Social home page Coder Social logo

trendingtechnology / bunkai Goto Github PK

View Code? Open in Web Editor NEW

This project forked from megagonlabs/bunkai

0.0 2.0 0.0 285 KB

Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)

Home Page: https://pypi.org/project/bunkai/

License: Apache License 2.0

Python 97.66% Makefile 2.34%

bunkai's Introduction

Bunkai

PyPI version Python Versions License Downloads

CircleCI CodeQL Maintainability Test Coverage markdownlint jsonlint yamllint

Bunkai is a sentence boundary (SB) disambiguation tool for Japanese texts.
Bunkaiは日本語文境界判定器です.

Quick Start

Install

$ pip install -U bunkai

Disambiguation without Models

$ echo -e '宿を予約しました♪!まだ2ヶ月も先だけど。早すぎかな(笑)楽しみです★\n2文書目の先頭行です。▁改行はU+2581で表現します。' \
    | bunkai
宿を予約しました♪!│まだ2ヶ月も先だけど。│早すぎかな(笑)│楽しみです★
2文書目の先頭行です。▁│改行はU+2581で表現します。
  • Feed a document as one line by using (U+2581) for line breaks.
    1行は1つの文書を表します.文書中の改行は (U+2581) で与えてください.
  • The output shows sentence boundaries with (U+2502).
    出力では文境界は (U+2502) で表示されます.

Disambiguation for Line Breaks with a Model

If you want to disambiguate sentence boundaries for line breaks, please add a --model option with the path to the model.
改行記号に対しても文境界判定を行いたい場合は,--modelオプションを与える必要があります.

First time, please setup a model. It will take some time.
はじめにモデルをセットアップする必要があります.セットアップには少々時間がかかります.

$ bunkai --model bunkai-model-directory --setup

Then, please designate the directory.
そしてモデルを指定して動かしてください.

$ echo -e "文の途中で改行を▁入れる文章ってありますよね▁それも対象です。" | bunkai --model bunkai-model-directory
文の途中で改行を▁入れる文章ってありますよね▁│それも対象です。

Morphological Analysis Result

You can get morphological analysis results with --ma option.
--maオプションを付与すると形態素解析結果が得られます.

$ echo -e '形態素解析し▁ます。結果を 表示します!' | bunkai --ma
形態素	名詞,一般,*,*,*,*,形態素,ケイタイソ,ケイタイソ
解析	名詞,サ変接続,*,*,*,*,解析,カイセキ,カイセキ
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ

EOS
ます	助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
。	記号,句点,*,*,*,*,。,。,。
EOS
結果	名詞,副詞可能,*,*,*,*,結果,ケッカ,ケッカ
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
 	記号,空白,*,*,*,*, ,*,*
表示	名詞,サ変接続,*,*,*,*,表示,ヒョウジ,ヒョージ
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
ます	助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
!	記号,一般,*,*,*,*,!,!,!
EOS

Python Library

You can also use Bunkai as Python library.
BunkaiはPythonライブラリとしても使えます.

from bunkai import Bunkai
bunkai = Bunkai()
for sentence in bunkai("はい。このようにpythonライブラリとしても使えます!"):
    print(sentence)

For more information, see examples.
ほかの例はexamplesをご覧ください.

Documents

References

  • Yuta Hayashibe and Kensuke Mitsuzawa. Sentence Boundary Detection on Line Breaks in Japanese. Proceedings of The 6th Workshop on Noisy User-generated Text (W-NUT 2020), pp.71-75. November 2020. [PDF] [bib]

License

Apache License 2.0

bunkai's People

Contributors

dependabot[bot] avatar shirayu avatar t-yamamura avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.