Fine_tune_BERT_with_spacy3

This is the code to reproduce the results shown in the tutorial: https://towardsdatascience.com/how-to-fine-tune-bert-transformer-with-spacy-3-6a90bfe57647

First, we convert the IOB files exported from the UBIAI annotation tool to spaCy JSON:

!python -m spacy convert drive/MyDrive/train.tsv ./ -t json -n 1 -c iob
!python -m spacy convert drive/MyDrive/test.tsv ./ -t json -n 1 -c iob
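
For readers unfamiliar with the IOB scheme, here is a minimal sketch of how IOB tags mark entity boundaries: `B-` opens an entity, `I-` continues it, and `O` is outside any entity. The function name `iob_to_spans`, the sample tokens, and the labels `EXPERIENCE` and `SKILLS` are illustrative assumptions; they are not taken from the UBIAI export or from spaCy's converter.

```python
def iob_to_spans(tokens, tags):
    """Group IOB-tagged tokens into (entity_text, label) pairs."""
    spans = []
    current_tokens, current_label = [], None

    def flush():
        # Close the entity that is currently open, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag that does not continue the open entity.
            flush()
            current_tokens, current_label = [], None
    flush()
    return spans


# Hypothetical sample sentence and tags, for illustration only.
tokens = ["Minimum", "4", "years", "experience", "with", "C#"]
tags = ["O", "B-EXPERIENCE", "I-EXPERIENCE", "O", "O", "B-SKILLS"]
print(iob_to_spans(tokens, tags))
```

In this sketch a stray `I-` tag with no matching open entity is treated like `O`; other tools may handle that case differently.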

After converting the training and test files to JSON, we convert them to spaCy's binary format:

!python -m spacy convert drive/MyDrive/train.json ./ -t spacy
!python -m spacy convert drive/MyDrive/test.json ./ -t spacy

Next, we install spaCy and download the transformer pipeline:

!pip install -U spacy
!python -m spacy download en_core_web_trf

Next, we install CUDA 9.2:

!wget https://developer.nvidia.com/compute/cuda/9.2/Prod/local_installers/cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64 -O cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb
!dpkg -i cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb
!apt-key add /var/cuda-repo-9-2-local/7fa2af80.pub
!apt-get update
!apt-get install cuda-9.2

Install PyTorch:

!pip install torch==1.7.1+cu92 torchvision==0.8.2+cu92 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

Install spaCy with CUDA and transformers support:

!export CUDA_PATH="/usr/local/cuda-9.2"
!pip install -U spacy[cuda92,transformers]

Install CuPy:

!pip install cupy

Set the CUDA library path:

!export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH
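
In a Jupyter or Colab notebook, each `!` command typically runs in its own subshell, so a variable set with `!export` may not be visible to later cells. A minimal sketch of an alternative, assuming the same CUDA location as above, is to set the variables from Python through `os.environ`, which child processes started afterwards inherit. The helper name `prepend_dir` is hypothetical.

```python
import os

# Same location that the export commands above point at.
CUDA_PATH = "/usr/local/cuda-9.2"


def prepend_dir(current_value, new_dir):
    """Return new_dir followed by the existing search path, skipping empty parts."""
    existing = [p for p in current_value.split(os.pathsep) if p and p != new_dir]
    return os.pathsep.join([new_dir] + existing)


os.environ["CUDA_PATH"] = CUDA_PATH
os.environ["LD_LIBRARY_PATH"] = prepend_dir(
    os.environ.get("LD_LIBRARY_PATH", ""), f"{CUDA_PATH}/lib64"
)

print(os.environ["CUDA_PATH"])
print(os.environ["LD_LIBRARY_PATH"])
```

This affects commands launched later from the notebook (for example the training command); it does not change libraries already loaded in the running Python process.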

After updating the spaCy config.cfg file with the training and test paths, we auto-fill it with the remaining parameters that the BERT model needs:

!python -m spacy init fill-config drive/MyDrive/config.cfg drive/MyDrive/config_spacy.cfg
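
Before filling the config, it can help to confirm that the data paths were actually set. The sketch below assumes the config has a `[paths]` section with `train` and `dev` keys that default to `null`, as in spaCy's generated base configs. It uses the standard-library `configparser`, which is not the parser spaCy itself uses, so treat it only as a rough check. The function name `unset_paths` and the sample config text are hypothetical.

```python
import configparser


def unset_paths(config_text, required=("train", "dev")):
    """Return the required [paths] keys that are missing or still set to null."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    missing = []
    for key in required:
        # A missing section or key is treated the same as an explicit null.
        value = parser.get("paths", key, fallback="null")
        if value.strip() == "null":
            missing.append(key)
    return missing


# Hypothetical config fragment: train is set, dev was forgotten.
sample = """
[paths]
train = "./train.spacy"
dev = null

[system]
gpu_allocator = "pytorch"
"""
print(unset_paths(sample))
```

To check the real file, read it first, for example `unset_paths(open("drive/MyDrive/config.cfg").read())`.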

Before launching the training, let's run the debug command to make sure the config and the data are correct:

!python -m spacy debug data drive/MyDrive/config.cfg

Finally, we are ready to start the training:

!python -m spacy train -g 0 drive/MyDrive/config.cfg --output ./
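
The same training command can also be assembled in Python, which makes it easier to vary the arguments. This is a sketch under assumptions: the `.spacy` files are taken to be in `./` (where the convert step above writes its output), and the `--paths.train` and `--paths.dev` arguments use spaCy's config override syntax to set the data paths from the command line instead of editing the file. The function name `build_train_command` is hypothetical.

```python
import sys


def build_train_command(config_path, output_dir, train_path, dev_path, gpu_id=0):
    """Build the argument list for `python -m spacy train` without running it."""
    return [
        sys.executable, "-m", "spacy", "train", config_path,
        "--output", output_dir,
        "--gpu-id", str(gpu_id),
        # Config overrides: replace the [paths] values from the command line.
        "--paths.train", train_path,
        "--paths.dev", dev_path,
    ]


command = build_train_command(
    "drive/MyDrive/config.cfg", "./", "./train.spacy", "./test.spacy"
)
print(" ".join(command))

# To actually launch training (requires spaCy and a GPU):
# import subprocess
# subprocess.run(command, check=True)
```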

After training, the model is saved in a folder named model-best. Let's try to extract entities using the newly trained model:

import spacy

# Load the best checkpoint written by the training run
nlp = spacy.load("./model-best")
text = [
'''Qualifications
- A thorough understanding of C# and .NET Core
- Knowledge of good database design and usage
- An understanding of NoSQL principles
- Excellent problem solving and critical thinking skills
- Curious about new technologies
- Experience building cloud hosted, scalable web services
- Azure experience is a plus
Requirements
- Bachelor's degree in Computer Science or related field
(Equivalent experience can substitute for earned educational qualifications)
- Minimum 4 years experience with C# and .NET
- Minimum 4 years overall experience in developing commercial software
'''
]
for doc in nlp.pipe(text, disable=["tagger", "parser"]):
    print([(ent.text, ent.label_) for ent in doc.ents])
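
The printed list of `(text, label)` pairs can be regrouped by label for easier reading. The sketch below uses made-up pairs rather than real model output, and the labels `SKILLS` and `EXPERIENCE` are assumptions about what the trained model might predict. The function name `group_by_label` is hypothetical.

```python
from collections import defaultdict


def group_by_label(entities):
    """Map each label to the unique entity texts found for it, in first-seen order."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Hypothetical pairs in the same shape as [(ent.text, ent.label_) ...] above.
sample_entities = [
    ("C#", "SKILLS"),
    (".NET Core", "SKILLS"),
    ("4 years", "EXPERIENCE"),
    ("C#", "SKILLS"),
]
print(group_by_label(sample_entities))
```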

If you have any questions or run into issues, just email us at [email protected].
