SimCSE-Chinese-Pytorch

A reproduction of SimCSE on Chinese data, covering both unsupervised and supervised training.

1. Background

After recently reading the SimCSE paper, I reproduced it in PyTorch and evaluated the results.

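2. Files

> datasets              dataset folder
   > cnsd-snli
   > STS-B
> pretrained_model      folder for the various pretrained models
> saved_model           folder for models saved after fine-tuning
  data_preprocess.py    data preprocessing for the SNLI dataset
  simcse_sup.py         supervised training
  simcse_unsup.py       unsupervised training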

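3. Usage

Place the public datasets and the pretrained models in the designated directories, and check that the paths in the code match:

# Pretrained model directory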
BERT = 'pretrained_model/bert_pytorch'
model_path = BERT 
# Where the fine-tuned parameters are saved
SAVE_PATH = './saved_model/simcse_unsup.pt'
# Data paths
SNIL_TRAIN = './datasets/cnsd-snli/train.txt'
STS_TRAIN = './datasets/STS-B/cnsd-sts-train.txt'
STS_DEV = './datasets/STS-B/cnsd-sts-dev.txt'
STS_TEST = './datasets/STS-B/cnsd-sts-test.txt'
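The path check described above can be sketched as a small standalone script. This is only an illustration and not part of the repository: the helper name `find_missing` and the `REQUIRED_PATHS` dictionary are hypothetical, and the paths are copied from the configuration above. Run it at whichever point you expect all of the files to be present.

```python
from pathlib import Path

# Paths copied from the configuration above; adjust them if your layout differs.
# REQUIRED_PATHS and find_missing are illustrative names, not part of the repo.
REQUIRED_PATHS = {
    "pretrained model": "pretrained_model/bert_pytorch",
    "SNLI train": "./datasets/cnsd-snli/train.txt",
    "STS-B train": "./datasets/STS-B/cnsd-sts-train.txt",
    "STS-B dev": "./datasets/STS-B/cnsd-sts-dev.txt",
    "STS-B test": "./datasets/STS-B/cnsd-sts-test.txt",
}


def find_missing(paths, root="."):
    """Return the labels of all entries whose path does not exist under root."""
    root = Path(root)
    return [label for label, p in paths.items() if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_PATHS)
    for label in missing:
        print(f"missing: {label} -> {REQUIRED_PATHS[label]}")
    if not missing:
        print("all expected files and folders are in place")
```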

Data preprocessing (run this file first):

python data_preprocess.py

Unsupervised training:

python simcse_unsup.py
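For readers unfamiliar with the objective, here is a minimal sketch of the unsupervised SimCSE loss as described in the paper: the same batch is encoded twice with dropout active, and each sentence's second view is its positive while the other sentences in the batch act as negatives. It is not taken from `simcse_unsup.py`, whose batching may differ. The function name, the toy encoder, and the temperature of 0.05 (the paper's default) are assumptions.

```python
import torch
import torch.nn.functional as F


def unsup_simcse_loss(z1, z2, temperature=0.05):
    """Contrastive (InfoNCE) loss over two dropout views of the same batch.

    z1, z2: (batch, dim) embeddings of the same sentences from two forward
    passes with dropout active. Row i of z1 and row i of z2 form the positive
    pair; every other row of z2 serves as an in-batch negative.
    """
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature  # (batch, batch) scaled cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, labels)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy stand-in for a BERT encoder: a linear layer followed by dropout.
    encoder = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.Dropout(0.3))
    encoder.train()  # keep dropout on so the two passes differ
    x = torch.randn(4, 16)
    loss = unsup_simcse_loss(encoder(x), encoder(x))
    print(loss.item())
```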

Supervised training:

python simcse_sup.py
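The supervised objective in the SimCSE paper works on NLI triples: a premise, an entailment hypothesis as the positive, and a contradiction hypothesis as a hard negative. The sketch below follows that formulation under stated assumptions; it is not copied from `simcse_sup.py`, and the function name, argument layout, and temperature of 0.05 are illustrative.

```python
import torch
import torch.nn.functional as F


def sup_simcse_loss(anchor, positive, negative, temperature=0.05):
    """Contrastive loss with hard negatives over (batch, dim) embeddings.

    For anchor i, positive i is the target; all other positives and all
    negatives in the batch are treated as negatives.
    """
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negative, dim=-1)
    # Columns 0..batch-1 score the positives, the remaining columns the negatives.
    logits = torch.cat([a @ p.T, a @ n.T], dim=1) / temperature
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    torch.manual_seed(0)
    a, p, n = torch.randn(4, 8), torch.randn(4, 8), torch.randn(4, 8)
    print(sup_simcse_loss(a, p, n).item())
```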

4. Downloads

Datasets:

Pretrained models:

5. Evaluation

The evaluation metric is the Spearman correlation coefficient.

Unsupervised: batch_size=64, lr=1e-5, dropout_rate=0.3, pooling=cls, 10,000 sampled examples

Model             STS-B dev   STS-B test
BERT              0.7308      0.6725
BERT-wwm          0.7229      0.6628
BERT-wwm-ext      0.7271      0.6669
RoBERTa-wwm-ext   0.7558      0.7141

Supervised: batch_size=64, lr=1e-5, pooling=cls

Model             STS-B dev   STS-B test   Samples needed to converge
BERT              0.8016      0.7624       23040
BERT-wwm          0.8022      0.7572       16640
BERT-wwm-ext      0.8081      0.7539       33280
RoBERTa-wwm-ext   0.8135      0.7763       38400

6. References
