ai21labs / sense-bert


License: Apache License 2.0

Python 100.00%
bert-model nlp bert-embeddings bert bert-models nlp-machine-learning natural-language-processing natural-language-understanding

sense-bert's Introduction

SenseBERT: Driving Some Sense into BERT

This is the code for loading the SenseBERT model, described in our paper from ACL 2020.

Available models

We made two SenseBERT models public:

  • sensebert-base-uncased
  • sensebert-large-uncased

These models have the same number of parameters as Google's BERT models, except for the following two changes (both are described thoroughly in our paper; a sketch of change 2 follows this list):

  1. We use a larger vocabulary.
  2. We add a supersense prediction head. The sense embeddings are also used as inputs to the model.
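
For intuition, here is a minimal sketch of change 2 in TensorFlow 1.15. It is not the repository's actual implementation; the variable names, the supersense count, and the tying of the prediction head to the sense embedding matrix are assumptions based on the paper's description:

import tensorflow as tf  # TensorFlow 1.15

HIDDEN_SIZE = 768     # hidden size of the base model
NUM_SUPERSENSES = 45  # WordNet supersense categories (assumption for illustration)

# [batch, seq_len, hidden] contextualized output of the transformer stack
contextualized = tf.placeholder(tf.float32, [None, None, HIDDEN_SIZE])

# Sense embedding matrix; per change 2, the same sense embeddings also
# feed into the model's input representation.
sense_embeddings = tf.get_variable("sense_embeddings", shape=[NUM_SUPERSENSES, HIDDEN_SIZE])

# Supersense prediction head: project every position onto the sense space.
supersense_logits = tf.einsum("bsh,vh->bsv", contextualized, sense_embeddings)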

Requirements

  • Python 3.7 or higher
  • TensorFlow 1.15
  • NLTK

You can install these using:

pip install -r requirements.txt
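
For reference, the pinned requirements amount to roughly the following (the repository's requirements.txt is authoritative; only the TensorFlow pin is stated above, the rest is unpinned here):

tensorflow==1.15
nltk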

Usage

Supersense and MLM predictions

This is an example of making Masked Language Modeling (MLM) and supersense predictions with SenseBERT:

import tensorflow as tf
from sensebert import SenseBert

with tf.Session() as session:
    sensebert_model = SenseBert("sensebert-base-uncased", session=session)  # or sensebert-large-uncased
    input_ids, input_mask = sensebert_model.tokenize(["I went to the store to buy some groceries.", "The store was closed."])
    model_outputs = sensebert_model.run(input_ids, input_mask)

contextualized_embeddings, mlm_logits, supersense_logits = model_outputs  # these are NumPy arrays

Note that both vocabularies (tokens and supersenses) are available via sensebert_model.tokenizer. For example, to predict the supersense of the word 'groceries' in the above example, you may run

import numpy as np

print(sensebert_model.tokenizer.convert_ids_to_senses([np.argmax(supersense_logits[0][9])]))

This will output:

['noun.artifact']
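
The MLM logits can be decoded the same way. A minimal sketch, assuming the tokenizer also exposes a convert_ids_to_tokens method analogous to convert_ids_to_senses above:

import numpy as np

# Most likely token at position 9 ('groceries') according to the MLM head.
# convert_ids_to_tokens is an assumption here, mirroring convert_ids_to_senses.
print(sensebert_model.tokenizer.convert_ids_to_tokens([np.argmax(mlm_logits[0][9])]))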

Fine-tuning

If you want to fine-tune SenseBERT, run

sensebert_model = SenseBert("sensebert-base-uncased", session=session)  # or sensebert-large-uncased

sensebert_model.model.input_ids and sensebert_model.model.input_mask are the model's TensorFlow placeholders, while sensebert_model.model.contextualized_embeddings, sensebert_model.model.mlm_logits, and sensebert_model.model.supersense_logits are its output tensors. You can take any of these three tensors and build your own graph on top of them.
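
For example, here is a minimal sketch of a sequence-classification head built on top of these tensors; the pooling choice, layer names, label placeholder, and optimizer settings are illustrative assumptions, not part of the repository:

import tensorflow as tf  # TensorFlow 1.15
from sensebert import SenseBert

NUM_CLASSES = 2  # hypothetical downstream task

with tf.Session() as session:
    sensebert_model = SenseBert("sensebert-base-uncased", session=session)
    model = sensebert_model.model
    pretrained_vars = set(tf.global_variables())

    # Pool the [CLS] position and add a classification layer on top.
    cls_embedding = model.contextualized_embeddings[:, 0, :]
    logits = tf.layers.dense(cls_embedding, NUM_CLASSES, name="classifier")

    labels = tf.placeholder(tf.int32, [None])
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    train_op = tf.train.AdamOptimizer(learning_rate=2e-5).minimize(loss)

    # Initialize only the variables added above; SenseBERT's own weights
    # were already loaded from the checkpoint.
    session.run(tf.variables_initializer(list(set(tf.global_variables()) - pretrained_vars)))

    # One training step (feed tokenized inputs and integer labels):
    # session.run([train_op, loss], feed_dict={model.input_ids: ...,
    #                                          model.input_mask: ..., labels: ...})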

Download SenseBERT to your local machine

To avoid high latency, we recommend downloading the model once to your local machine. Our code also supports initialization from local directories. For that, you will need to install gsutil. Once you have it, run one of the following:

gsutil -m cp -r gs://ai21-public-models/sensebert-base-uncased PATH/TO/DIR
gsutil -m cp -r gs://ai21-public-models/sensebert-large-uncased PATH/TO/DIR

Then you can go ahead and use our code exactly as before, with

sensebert_model = SenseBert("PATH/TO/DIR", session=session)

Citation

If you use our model for your research, please cite our paper:

@inproceedings{levine-etal-2020-sensebert,
   title = "{S}ense{BERT}: Driving Some Sense into {BERT}",
   author = "Levine, Yoav  and
     Lenz, Barak  and
     Dagan, Or  and
     Ram, Ori  and
     Padnos, Dan  and
     Sharir, Or  and
     Shalev-Shwartz, Shai  and
     Shashua, Amnon  and
     Shoham, Yoav",
   booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
   month = jul,
   year = "2020",
   address = "Online",
   publisher = "Association for Computational Linguistics",
   url = "https://www.aclweb.org/anthology/2020.acl-main.423",
   pages = "4656--4667",
}

sense-bert's People

Contributors

barakp-ai21, dependabot[bot], oriram, pelegb, yoel-zeldes


sense-bert's Issues

Upload on Huggingface

Hey,

A very interesting paper, thanks a lot for this contribution!
Do you have plans to upload a model to Hugging Face?

Best,
Max

Tensorflow version

Hi! Can you please release a version compatible with newer versions of TensorFlow?
Thanks!

Word in Context

Hello. Thank you very much for sharing your code. I'm a student and I want to reproduce your results on the WiC task. As I understand it, at inference you get two supersense predictions for the word in its two contexts, then compare these two supersenses and decide whether they carry the same meaning or not. The question is: do you fine-tune the model, and if so, how is it done?
As I see it, you have to take the supersense logits and compare them, maybe with cosine similarity, to get the loss (see the sketch below).
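
(A minimal NumPy sketch of the comparison described in this question; this reflects the questioner's proposed approach, not the authors' published method:)

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two supersense logit vectors.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# logits_1, logits_2 would be the supersense logits of the target word in its
# two contexts; thresholding the similarity would give the WiC same/different label.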

Code for fine-tuning the model for sequence classification

Hi. Thank you very much for sharing the code for loading your models. I am trying to use the SenseBERT model to get the sense of some words in tweets, and for that I want to fine-tune your model for a simple sequence classification task that classifies tweets as offensive or non-offensive. Could you share code that would be able to do this, or tell me how I could do it?

Thanks in advance!

Cased Version?

Hi,

As far as I know, word sense is case-sensitive.

Have you ever trained the cased version? Could you please share the checkpoint?

Aligning tokens with supersenses?

Thank you very much for sharing the code for your excellent paper.
Pardon me for asking this newbie question: how do I align the tokens in the input sentence with the supersenses output by the model?
For example, the words in the sentence "I went to the store to buy some groceries." do not appear to be aligned with the following senses

['noun.person']
['verb.communication']
['verb.social']
['verb.communication']
['noun.artifact']
['noun.artifact']
['verb.communication']
['verb.cognition']
['noun.artifact']
['noun.artifact']
['adv.all']
['adv.all']

as printed using the following code:

import numpy as np

for i, id_ in enumerate(input_ids[0]):
    print(sensebert_model.tokenizer.convert_ids_to_senses([np.argmax(supersense_logits[0][i])]))

Could you please provide some example code for how to do this properly? Thanks a lot in advance!
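
(A minimal sketch of one way to inspect the alignment, assuming the tokenizer also exposes a convert_ids_to_tokens method: the senses are predicted per wordpiece position, and position 0 is the [CLS] token, which is why the sense of 'groceries' sits at index 9 in the README example:)

import numpy as np

# Print each wordpiece next to its predicted supersense; special tokens
# ([CLS], [SEP]) get predictions too, so account for them when aligning words.
tokens = sensebert_model.tokenizer.convert_ids_to_tokens(input_ids[0])  # assumed method
for i, token in enumerate(tokens):
    sense = sensebert_model.tokenizer.convert_ids_to_senses([np.argmax(supersense_logits[0][i])])[0]
    print(token, sense)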
