BERT-NER-KO

BERT for Korean NER (TensorFlow version 1.x)

This is an implementation of Korean named entity recognition (NER) using BERT. The code is a modified version of BERT-NER (https://github.com/kyzhouhzau/BERT-NER). The default data is a new corpus for medical diagnosis information extraction, built from raw data collected from Naver Knowledge iN.

Usage:

  1. Create a directory "BERT".

  2. Go to the directory.
cd BERT

  3. Clone the repository "bert".
git clone https://github.com/google-research/bert.git

  4. Download the files and folder of this project to the directory "BERT".

  5. Create a directory "pretrained" in the directory "bert".

  6. Download the pretrained model "BERT-Base, Multilingual Cased" (https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip) and unzip it in the directory "pretrained".

  7. Create a directory "output" in the directory "BERT". The structure is as follows:

BERT
|____ bert
|____ data
      |____ diagnosis
|____ BERT_NER_DS.py
|____ tf_metrics.py
|____ output

  8. Run BERT_NER_DS.py from the directory "BERT".
python3 BERT_NER_DS.py  \
  --task_name="NER"  \
  --do_lower_case=False  \
  --do_train=True   \
  --do_eval=True   \
  --do_predict=True  \
  --do_demo=False  \
  --data_dir=./data/diagnosis   \
  --input_file_train=train_diagnosis_shuffle1.txt   \
  --input_file_eval=eval_diagnosis_shuffle1.txt  \
  --vocab_file=./bert/pretrained/multi_cased_L-12_H-768_A-12/vocab.txt  \
  --bert_config_file=./bert/pretrained/multi_cased_L-12_H-768_A-12/bert_config.json  \
  --init_checkpoint=./bert/pretrained/multi_cased_L-12_H-768_A-12/bert_model.ckpt   \
  --max_seq_length=128   \
  --train_batch_size=32   \
  --learning_rate=2e-5   \
  --num_train_epochs=16.0   \
  --save_checkpoints_steps=1000   \
  --output_dir=./output
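
Before launching the command above, it can help to confirm that the layout from the setup steps is in place. The following is a minimal sketch, not part of the project: `missing_paths` is a hypothetical helper, and it assumes that the script resolves `--input_file_train` and `--input_file_eval` relative to `--data_dir`, and that the unzipped checkpoint is stored as several files sharing the `bert_model.ckpt` prefix.

```python
from pathlib import Path

# Paths taken from the directory tree and the command flags, relative to "BERT".
MODEL_DIR = "bert/pretrained/multi_cased_L-12_H-768_A-12"
REQUIRED_PATHS = [
    "BERT_NER_DS.py",
    "tf_metrics.py",
    "output",
    # Assumption: the input file names are joined onto --data_dir.
    "data/diagnosis/train_diagnosis_shuffle1.txt",
    "data/diagnosis/eval_diagnosis_shuffle1.txt",
    f"{MODEL_DIR}/vocab.txt",
    f"{MODEL_DIR}/bert_config.json",
]
# Assumption: the checkpoint is a set of files sharing this prefix
# (for example bert_model.ckpt.index), so it is matched by prefix.
CHECKPOINT_PREFIX = f"{MODEL_DIR}/bert_model.ckpt"


def missing_paths(root):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    missing = [p for p in REQUIRED_PATHS if not (root / p).exists()]
    prefix = root / CHECKPOINT_PREFIX
    ckpt_dir = prefix.parent
    # Accept any file whose name starts with the checkpoint prefix.
    found = ckpt_dir.is_dir() and any(ckpt_dir.glob(prefix.name + "*"))
    if not found:
        missing.append(CHECKPOINT_PREFIX)
    return missing


if __name__ == "__main__":
    # Run from inside the "BERT" directory.
    for path in missing_paths("."):
        print("missing:", path)
```

If the script prints nothing, every path referenced by the command was found.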

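The flags `--train_batch_size`, `--num_train_epochs` and `--save_checkpoints_steps` together determine how long training runs and how often the save interval elapses. The sketch below shows the arithmetic, assuming BERT_NER_DS.py derives the total step count the same way `run_classifier.py` in the cloned "bert" repository does. `training_schedule` is a hypothetical helper, and the corpus size of 4000 examples is made up for illustration.

```python
def training_schedule(num_examples, batch_size=32, epochs=16.0, save_every=1000):
    """Estimate total training steps and how many save intervals elapse."""
    # Step formula used by run_classifier.py in google-research/bert;
    # assumed (not verified) to apply to BERT_NER_DS.py as well.
    num_train_steps = int(num_examples / batch_size * epochs)
    # Number of times the save interval is reached during training.
    save_intervals = num_train_steps // save_every
    return num_train_steps, save_intervals


# 4000 is a made-up corpus size, used only to show the arithmetic.
steps, intervals = training_schedule(4000)
print(steps, intervals)  # 2000 2
```

The defaults mirror the values in the command, so only the number of training examples needs to be supplied.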
Citation

Kim, YM., Lee, TH. Korean clinical entity recognition from diagnosis text using BERT. BMC Med Inform Decis Mak 20, 242 (2020). https://doi.org/10.1186/s12911-020-01241-8

Contributors

labihem, taehooonlee
