
nseg's Introduction

Node.js Version of MMSEG for Chinese Word Segmentation


MMSEG, originally invented by Chih-Hao Tsai, is a very popular Chinese word segmentation algorithm. Implementations are available for many platforms, including Python and Java.

This package provides a Node.js version of the MMSEG algorithm. The API is asynchronous and uses an evented style.

The package is still under development, but the basic functionality is ready.
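
For readers new to the topic, Tsai's algorithm builds on maximum matching against a dictionary. The sketch below shows only the simplest greedy form of that idea. It is not nseg's implementation: `toyDict`, `MAX_WORD_LEN` and `simpleMaxMatch` are hypothetical names chosen for illustration, and the four-word dictionary is made up.

```javascript
// Hypothetical toy dictionary; nseg's real dictionary format is not shown here.
var toyDict = { '研究': true, '研究生': true, '生源': true, '计划': true };

// Longest dictionary entry we bother to look for (assumption for this toy).
var MAX_WORD_LEN = 3;

// Greedy forward maximum matching: at each position take the longest
// dictionary word that starts there, falling back to a single character.
function simpleMaxMatch(text, dict) {
    var words = [];
    var pos = 0;
    while (pos < text.length) {
        var matched = text.charAt(pos);
        var longest = Math.min(MAX_WORD_LEN, text.length - pos);
        for (var len = longest; len > 1; len--) {
            var candidate = text.substr(pos, len);
            if (Object.prototype.hasOwnProperty.call(dict, candidate)) {
                matched = candidate;
                break;
            }
        }
        words.push(matched);
        pos += matched.length;
    }
    return words;
}

console.log(simpleMaxMatch('研究生源计划', toyDict));
```

With this toy dictionary the greedy pass yields 研究生 / 源 / 计划, leaving 源 stranded, whereas the intended reading is 研究 / 生源 / 计划. Ambiguities of this kind are why the full algorithm compares several candidate words ahead before committing, rather than always taking the longest match.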

Install

Use nseg in your own package

$ npm install nseg

Or, if you want to use the nseg command:

$ npm install -g nseg

Command line

After installing globally, you can use the nseg command to:

help

$ nseg help

segment text using default dictionary

$ nseg segf -i ~/project/text/shi.txt -o ~/project/output/shi.txt
$ nseg segd -i ~/project/text -o ~/project/output

build a user dictionary to load later

$ nseg dict ~/project/data/dict.js ~/dict/dict1.txt ~/dict/dict2.txt
$ nseg dict ~/project/data/dict.js ~/dict

build a character-frequency map to load later

$ nseg freq ~/project/data/freq.js ~/freq/data1.csv ~/freq/data2.csv
$ nseg freq ~/project/data/freq.js ~/freq

segment text using customized settings

$ nseg segd -d ~/project/data/dict.js -f ~/project/data/freq.js -i ~/project/text -o ~/project/output
$ nseg segd -l ~/project/lex/ -i ~/project/text -o ~/project/output

check the existence of a word

$ nseg check "石狮"
$ nseg check -d ~/project/data/dict.js "石狮"

Using nseg in a program

Preparation

  • (Optional) build your own dictionary and frequency map
  • (Optional) create your own lexical handlers for special text patterns

Examples

Stream-pipe style

var fs    = require('fs'),
    dict  = require('../data/dict'),
    freq  = require('../data/freq'),
    date  = require('../lex/datetime'),
    sina  = require('../lex/sina');

// Custom dictionary, frequency map and lexical handlers
var opts  = {
        dict: dict,
        freq: freq,
        lexers: [date, sina],
    };

var nseg = require('nseg').evented(opts);

// `input` and `target` are the paths of the source and destination files
var strmOut = fs.createWriteStream(target, {flags: 'w+', encoding: 'utf-8'}),
    strmIn  = fs.createReadStream(input);

// Connect the two streams through the segmenter and report any error
var pipe = nseg(strmIn, strmOut);
pipe.on('error', function (err) {
    console.log('error', err);
});

pipe.start();

Normal callback style (buggy)

var dict  = require('../data/dict'),
    freq  = require('../data/freq'),
    date  = require('../lex/datetime'),
    sina  = require('../lex/sina');

var opts  = {
        dict: dict,
        freq: freq,
        lexers: [date, sina],
    };

var nseg = require('nseg').normal(opts);

nseg('研究生源计划', function (result) {
    console.log(result);
});

Lexical handler customization

Lexical handlers are defined by acceptor functions.

An acceptor function is a function with the signature

function accept(curchar, undecidedprefix, nextchar)

Its return value should be one of -1, 0, or 1:

  • -1: we can decide a negative result for the current character.
  • 0 : we should read more characters.
  • 1 : we can decide a positive result for the current character.
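
As an illustration of the three return values, here is a minimal sketch of an acceptor that recognises a run of ASCII digits. Everything beyond the `accept(curchar, undecidedprefix, nextchar)` signature is an assumption: `isDigit`, `acceptDigits` and the `matchPrefix` driver are hypothetical helpers, and the way nseg itself calls acceptors or registers them as lexers may differ.

```javascript
// Hypothetical helper: true for a single ASCII digit character.
function isDigit(ch) {
    return typeof ch === 'string' && ch.length === 1 && ch >= '0' && ch <= '9';
}

// Acceptor sketch for digit runs.
// Assumption: undecidedprefix holds the characters read so far but not yet
// decided; it is unused here because each decision depends only on the
// current and next characters.
function acceptDigits(curchar, undecidedprefix, nextchar) {
    if (!isDigit(curchar)) {
        return -1;  // negative result: not part of a number
    }
    if (isDigit(nextchar)) {
        return 0;   // undecided: the number continues, read more
    }
    return 1;       // positive result: the number ends here
}

// Hypothetical driver, NOT part of nseg: feeds characters to an acceptor
// and returns the accepted prefix, or null if the acceptor rejects.
function matchPrefix(text, accept) {
    var prefix = '';
    for (var i = 0; i < text.length; i++) {
        var verdict = accept(text.charAt(i), prefix, text.charAt(i + 1));
        if (verdict === -1) {
            return null;
        }
        prefix += text.charAt(i);
        if (verdict === 1) {
            return prefix;
        }
    }
    return null;
}

console.log(matchPrefix('2013年', acceptDigits));
```

Under these assumptions the driver accepts `2013` from `2013年` and rejects a string such as `年2013` at its first character.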

License

MIT License

nseg's People

Contributors

mountain


nseg's Issues

Use nseg in regular javascript

Hi, do you know how much work it would take to get this to work in normal javascript? I would like to use this in a phonegap app if possible.

Sina lex unavailable

On the following variable declaration:

var sina = require('../lex/sina');

We don't have a "sina" lexer in the repository or package build.
