Coder Social home page Coder Social logo

jianlog / open-korean-text-python Goto Github PK

View Code? Open in Web Editor NEW

This project forked from jonghwanhyeon/open-korean-text-python

0.0 0.0 0.0 6.44 MB

Python interface to Open Korean Text Processor inspired by KoNLPy

License: Apache License 2.0

Python 100.00%

open-korean-text-python's Introduction

Open Korean Text Python

Python interface to Open Korean Text Processor inspired by KoNLPy

Requirements

  • Python 3+
  • Java 8+

Installation

pip install open-korean-text-python

Usage

Normalizing text

openkoreantext.normalize('안녕하세욬ㅋㅋㅋㅋ') # 안녕하세요ㅋㅋㅋ

Tagging part-of-speech

openkoreantext.pos('대한민국은 민주공화국이다.')
# [ ('대한민국', 'Noun'), ('은', 'Josa'), ('민주공화국', 'Noun'), ('이다', 'Josa'), ('.', 'Punctuation') ]

Extracting morphemes

openkoreantext.morphs('대한민국의 주권은 국민에게 있고, 모든 권력은 국민으로부터 나온다.')
# [ '대한민국', '의', '주권', '은', '국민', '에게', '있고', ',', '모든', '권력', '은', '국민', '으로부터', '나온다', '.' ]

Extracting nouns

openkoreantext.nouns('대한민국은 민주공화국이다.')
# [ '대한민국', '민주공화국' ]

Extracting phrases

openkoreantext.phrases('불법 토토 신고하는 방법 #포상금', filter_spam=False, include_hashtags=True)
# [ '불법', '불법 토토', '불법 토토 신고', '불법 토토 신고하는 방법', '토토', '신고', '방법', '#포상금' ]

Splitting sentences

openkoreantext.sentences('대한민국은 민주공화국이다. 대한민국의 주권은 국민에게 있고, 모든 권력은 국민으로부터 나온다.')
# [ '대한민국은 민주공화국이다.', '대한민국의 주권은 국민에게 있고, 모든 권력은 국민으로부터 나온다.' ]

Adding words to dictionary

openkoreantext.add_words_to_dictionary('Noun', [ '앎읾슮', '앎멞릶칾놂' ])
openkoreantext.add_words_to_dictionary('Adverb', '살랑설렁')

API

openkoreantext.normalize(text)

Normalizes text. Returns a normalized text

Parameter

  • text: text to normalize

openkoreantext.pos(text, stem=False)

Tokenizes text into morphemes and tags their part-of-speech. Returns a list of pairs of morpheme and part-of-speech.

Parameters

  • text: text to tokenize
  • stem: stem morphemes if True

openkoreantext.morphs(text, stem=False)

Extract morphemes from text. Returns a list of morphemes.

Parameters

  • text: text to extract morphemes
  • stem: stem morphemes if True

openkoreantext.nouns(text)

Extracts nouns from text. Returns a list of nouns.

Parameter

  • text: text to extract nouns

openkoreantext.phrases(text, filter_spam=True, include_hashtags=True)

Extracts phrases from text. Returns a list of phrases.

Parameters

  • text: text to extract phrases
  • filter_spam: ignore spam words if True
  • include_hashtags: include hashtags if True

openkoreantext.sentences(text)

Splits text into sentences. Returns a list of sentences

Parameter

  • text: text to split into sentences

openkoreantext.add_words_to_dictionary(pos, words)

Adds user-defined words to the dictionary

Parameters

  • pos: part-of-speech of words (Noun, Verb, Adjective, Adverb, Determiner, Exclamation, Josa, Eomi, PreEomi, Conjunction, Modifier, VerbPrefix, Suffix)
  • words: list of words to add

open-korean-text-python's People

Contributors

jonghwanhyeon avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.