ultraicy / nlpcc_2018_task2_gec

This project forked from yingywang/nlpcc_2018_task2_gec

Code for the paper: "A Sequence to Sequence Learning for Chinese Grammatical Error Correction" (NLPCC-18).

Shell 3.17% Python 92.97% C++ 2.32% C 0.15% Lua 1.40%

nlpcc_2018_task2_gec's Introduction

This is the code of our team (Zlbnlp) for the NLPCC 2018 Shared Task 2, Grammatical Error Correction. The paper is "A Sequence to Sequence Learning for Chinese Grammatical Error Correction".

Usage

Prerequisites

  • Python 3.6
  • PyTorch 0.2.0 (use the following commands to install it from source)
export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [anaconda root directory]
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install -c mingfeima mkldnn
conda install -c pytorch magma-cuda80

git clone https://github.com/pytorch/pytorch.git
cd pytorch
git reset --hard a03e5cb40938b6b3f3e6dbddf9cff8afdff72d1b
git submodule update --init
pip install -r requirements.txt
python setup.py install
# The path below is relative to the root of this repository, not to the pytorch checkout
cd CS2S+BPE+Emb/software/fairseq-py
pip install -r requirements.txt
python setup.py build 
python setup.py develop  
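
After the installation steps above, a quick sanity check can confirm that the interpreter and the PyTorch build are visible. The following is only a minimal sketch: the function name check_environment and the shape of the report are illustrative assumptions, not part of this repository.

```python
import sys


def check_environment(min_python=(3, 6)):
    """Return a small report about the interpreter and the PyTorch install.

    Hypothetical helper: the name and the report layout are assumptions.
    """
    report = {
        "python": tuple(sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= min_python,
        "torch": None,  # stays None when PyTorch cannot be imported
    }
    try:
        import torch  # only importable after the source build succeeded

        report["torch"] = str(torch.__version__)
    except Exception:
        # ImportError when PyTorch is missing; a broken source build
        # may raise other errors, so they are treated the same way here.
        pass
    return report


if __name__ == "__main__":
    print(check_environment())
```

If the "torch" entry is None, the source build did not install into the active environment.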

Data

The data and embeddings can be found in Zlbnlp_data. You need to manually split the whole dataset into two parts:

  • training dataset: contains 1,215,876 sentence pairs. The file paths are CS2S+BPE+Emb/data/train.tok.src and CS2S+BPE+Emb/data/train.tok.trg
  • development dataset: contains 5k sentence pairs. The file paths are CS2S+BPE+Emb/data/dev.tok.src and CS2S+BPE+Emb/data/dev.tok.trg
  • test data: source.txt.jieba.seg, segmented with the jieba toolkit.
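
The manual split described above could be done along the following lines. This is a sketch under assumptions: how the original split was made is not documented here, so the seeded random shuffle, the function names split_parallel and write_pairs, and the default of 5,000 development pairs (taken from the "5k" figure above) are all illustrative choices.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def split_parallel(
    src: Sequence[str], trg: Sequence[str], dev_size: int = 5000, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Split aligned source/target sentences into (train, dev) pairs."""
    if len(src) != len(trg):
        raise ValueError("source and target must have the same number of lines")
    if not 0 <= dev_size <= len(src):
        raise ValueError("dev_size must be between 0 and the number of pairs")
    pairs = list(zip(src, trg))
    # Assumption: a seeded shuffle, so the split is random but reproducible.
    random.Random(seed).shuffle(pairs)
    return pairs[dev_size:], pairs[:dev_size]


def write_pairs(pairs: Sequence[Pair], src_path: str, trg_path: str) -> None:
    """Write pairs to two line-aligned UTF-8 files (e.g. *.tok.src / *.tok.trg)."""
    with open(src_path, "w", encoding="utf-8") as f_src, \
            open(trg_path, "w", encoding="utf-8") as f_trg:
        for source, target in pairs:
            f_src.write(source + "\n")
            f_trg.write(target + "\n")
```

Shuffling the pairs together, rather than each file separately, keeps every source sentence aligned with its correction. The resulting lists can then be written to the train.tok.* and dev.tok.* paths listed above.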

Data processing

cd CS2S+BPE+Emb/training/
chmod +x preprocess.sh
./preprocess.sh

Training

  • Training command

The command below is what we used to train a model on the NLPCC-2018 Task 2 dataset.

./train_embed.sh

Decoding

The following commands were used to generate the outputs and the F0.5 score:

cd CS2S+BPE+Emb/
./run.sh ./data/source.txt.jieba.seg ./output/CS2S+BPE+Emb/ 0 ./training/models/mlconv_embed/model1
cd libgrass-ui/
./remove_spac_pkunlp_segment.sh 
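
For reference, F0.5 is the F-beta measure with beta = 0.5, which weights precision more heavily than recall. The sketch below shows only that final formula. It does not reproduce the scorer used by the scripts above, which also has to match system edits against gold edits, and the function name f_beta is an illustrative assumption.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Combine precision and recall into an F-beta score (F0.5 by default)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator
```

With beta = 0.5, a system with high precision and low recall scores better than one with the two values swapped, which suits error correction, where a wrong "correction" is costly.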

nlpcc_2018_task2_gec's People

Contributors

styxjedi, renhongkai, tianlinyang
