nCoV-2019 related sentence similarity

If you find this useful, please consider giving it a star to encourage our work.

introduction

ERNIE- and RoBERTa-based models for sentence-pair similarity.

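For example (each row is id, category, question 1, question 2, label; label 1 means the two questions ask the same thing, 0 means they do not):

387,支原体肺炎,支原体肺炎的症状及治疗方法是什么,肺炎衣原体与肺炎支原体有什么区别?,0
388,支原体肺炎,支原体肺炎的症状及治疗方法是什么,肺炎支原体培养及药敏的检验单怎么看?,0
389,支原体肺炎,支原体肺炎的症状及治疗方法是什么,小儿支原体与小儿支原体肺炎相同吗?,0
390,支原体肺炎,宝宝支原体肺炎感染的症状有哪些?,宝宝肺炎支原体感染的症状是什么?,1
391,支原体肺炎,宝宝支原体肺炎感染的症状有哪些?,宝宝支原体肺炎感染有什么症状?,1

In English (category: mycoplasma pneumonia):

  • 387: "What are the symptoms and treatments of mycoplasma pneumonia?" vs. "What is the difference between Chlamydia pneumoniae and Mycoplasma pneumoniae?" → 0
  • 388: "What are the symptoms and treatments of mycoplasma pneumonia?" vs. "How do I read a Mycoplasma pneumoniae culture and antibiotic susceptibility lab report?" → 0
  • 389: "What are the symptoms and treatments of mycoplasma pneumonia?" vs. "Is mycoplasma infection in children the same as mycoplasma pneumonia in children?" → 0
  • 390: "What are the symptoms of mycoplasma pneumonia infection in babies?" vs. "What are the symptoms of Mycoplasma pneumoniae infection in babies?" → 1
  • 391: "What are the symptoms of mycoplasma pneumonia infection in babies?" vs. "What symptoms does mycoplasma pneumonia infection cause in babies?" → 1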
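A minimal sketch of loading pairs in this format with pandas. It assumes a headerless CSV (like the sample rows shown) and the column names id, category, query1, query2, label; these names and the `load_pairs` helper are assumptions for illustration, not taken from the repository's train.py.

```python
import io

import pandas as pd

# Assumed column names; the sample rows above carry no header line.
COLUMNS = ["id", "category", "query1", "query2", "label"]


def load_pairs(path_or_buffer):
    """Read a headerless CSV of question pairs into a DataFrame (hypothetical helper)."""
    df = pd.read_csv(path_or_buffer, header=None, names=COLUMNS)
    df["label"] = df["label"].astype(int)  # 1 = same meaning, 0 = different
    return df


SAMPLE = """387,支原体肺炎,支原体肺炎的症状及治疗方法是什么,肺炎衣原体与肺炎支原体有什么区别?,0
388,支原体肺炎,支原体肺炎的症状及治疗方法是什么,肺炎支原体培养及药敏的检验单怎么看?,0
389,支原体肺炎,支原体肺炎的症状及治疗方法是什么,小儿支原体与小儿支原体肺炎相同吗?,0
390,支原体肺炎,宝宝支原体肺炎感染的症状有哪些?,宝宝肺炎支原体感染的症状是什么?,1
391,支原体肺炎,宝宝支原体肺炎感染的症状有哪些?,宝宝支原体肺炎感染有什么症状?,1
"""

if __name__ == "__main__":
    pairs = load_pairs(io.StringIO(SAMPLE))
    # Each row becomes one (query1, query2) input for a sentence-pair classifier.
    print(pairs[["query1", "query2", "label"]])
```

For a BERT-style model, each (query1, query2) pair would then be tokenized jointly as a single two-segment input.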

95.2% accuracy on the online leaderboard (using only the model from the first of 6 folds), with:

  • ERNIE 1.0
  • Nadam optimizer, learning rate 2e-5
  • OHEM cross-entropy loss with label smoothing
  • cosine learning-rate schedule with warmup
  • noisy training data cleaned with an overfitted model
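The loss and schedule in the list above could look roughly like this in PyTorch. This is a sketch under assumptions, not the repository's code: the class and function names, the OHEM `keep_ratio` of 0.7, and the smoothing value 0.1 are hypothetical, and `torch.optim.NAdam` only exists in PyTorch 1.10 and later (older versions need a separate Nadam implementation).

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class OHEMLabelSmoothingCE(nn.Module):
    """Label-smoothed cross-entropy averaged over the hardest samples only (OHEM)."""

    def __init__(self, num_classes=2, smoothing=0.1, keep_ratio=0.7):
        super().__init__()
        self.num_classes = num_classes
        self.smoothing = smoothing
        self.keep_ratio = keep_ratio  # fraction of the batch kept as "hard" examples

    def forward(self, logits, target):
        log_probs = F.log_softmax(logits, dim=-1)
        # Smoothed target: 1 - eps on the true class, eps / (C - 1) on each other class.
        with torch.no_grad():
            soft = torch.full_like(log_probs, self.smoothing / (self.num_classes - 1))
            soft.scatter_(1, target.unsqueeze(1), 1.0 - self.smoothing)
        per_sample = -(soft * log_probs).sum(dim=-1)
        # OHEM: back-propagate only through the k samples with the largest loss.
        k = max(1, int(self.keep_ratio * per_sample.numel()))
        hard_losses, _ = per_sample.topk(k)
        return hard_losses.mean()


def cosine_with_warmup(optimizer, warmup_steps, total_steps):
    """Linear warmup from 0 to the base lr, then cosine decay to 0 at total_steps."""

    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)


if __name__ == "__main__":
    head = nn.Linear(768, 2)  # stand-in for an ERNIE pooled output -> 2 classes
    optimizer = torch.optim.NAdam(head.parameters(), lr=2e-5)  # PyTorch >= 1.10
    scheduler = cosine_with_warmup(optimizer, warmup_steps=100, total_steps=1000)
    criterion = OHEMLabelSmoothingCE(num_classes=2, smoothing=0.1, keep_ratio=0.7)

    features = torch.randn(32, 768)  # random inputs, for illustration only
    labels = torch.randint(0, 2, (32,))
    loss = criterion(head(features), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()
```

Dropping the easiest samples makes each update focus on the pairs the model currently gets wrong; keeping `keep_ratio` well above 0 avoids training on only a handful of (possibly mislabeled) pairs per batch.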

more tricks that may help

  • simply swap in a different pretrained model

  • add word2vec-style features

  • split the data into N pieces, train N BERT models, and use
    their outputs as features to train a tree-based model
    (LightGBM, XGBoost, ...)

  • for hard examples, feed the nearest sentence pair (with its
    label) into BERT as additional reference information

  • pseudo-labeling

  • more open data (e.g. Ping An CHIP 2019)

  • ...
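Pseudo-labeling can be sketched as below: a trained model scores the unlabeled test pairs, and only the confidently predicted ones are added back to the training data with their predicted labels. The `select_pseudo_labels` name and the 0.95 threshold are illustrative assumptions, not values from this repository.

```python
import pandas as pd


def select_pseudo_labels(unlabeled, probs, threshold=0.95):
    """Return rows of `unlabeled` whose predicted P(similar) is >= threshold
    or <= 1 - threshold, with a `label` column set from the prediction."""
    probs = pd.Series(probs, index=unlabeled.index, dtype=float)
    confident = (probs >= threshold) | (probs <= 1.0 - threshold)
    pseudo = unlabeled.loc[confident].copy()  # copy so the input is not modified
    pseudo["label"] = (probs[confident] >= 0.5).astype(int)
    return pseudo


if __name__ == "__main__":
    test_pairs = pd.DataFrame({"query1": ["a", "b", "c"], "query2": ["x", "y", "z"]})
    # Hypothetical model outputs for the three pairs.
    pseudo = select_pseudo_labels(test_pairs, [0.99, 0.6, 0.01])
    print(pseudo)  # keeps rows 0 and 2, labelled 1 and 0
```

The selected rows would then be concatenated with the original training data (e.g. `pd.concat([train_df, pseudo])`) before retraining; a high threshold trades fewer extra pairs for fewer wrong labels.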

dependencies

  • opencv-python
  • pytorch >= 1.4
  • pandas
  • yacs
  • sklearn

prepare

train

You may need to change the data paths; see train.py and test.py.

export PYTHONPATH=./
sh train_pipeline.sh

ref

https://tianchi.aliyun.com/competition/entrance/231776/introduction?spm=5176.12281949.1003.4.21eb2448atCLQk

Contributors

lhwcv
