
nlp_project's Introduction

This is my NLP project. It includes many sub-projects, all written in Python.


Text classification and keyword extraction based on abstracts

relative link: Text classification and keyword extraction based on abstracts

This is my first NLP project: not perfect, but interesting.

  • note contains my Markdown notes.

  • baseline1 is the traditional baseline of the project. It runs on Baidu AI Studio (relative link); this is the local version.

  • NLP_baseline is a series of baselines that try different classifiers, including Logistic Regression, Support Vector Machine, and Random Forest. The parameters of these classifiers are then fine-tuned with parameter_tuning.py and baseline_tuning.py.

    According to the score given by the platform, the fine-tuned Logistic Regression model (a.k.a. the fine-tuned baseline) performs best so far, reaching 0.99401.

    The organizers released another dataset, testB.csv, on July 24. This dataset omits the Keywords column, so I updated baseline2 to baseline3 to handle it.

  • NLP_upper is the upper project, using the BERT model from transformers to solve the classification problem.

    Regrettably, my local environment could not support the project (my poor GTX 1650 with 4 GB).

    SOLUTION: run the project on Ali Cloud (not successful yet). <--- It's still a good solution

    However, the project ran for 26 epochs before I stopped the interpreter, and the score was unsatisfactory. <--- maybe overfitting

    With epoch=10, the model works well, reaching an accuracy of 0.9850. <--- for task 1

    The latest version of NLP_upper is complete. It uses the BERT model to solve both tasks, compared with only one in the previous version. The result is quite good, but a bit late :).

  • NLP_chatGLM is the project that uses an LLM, leveraging chatGLM when the connection is stable. However, using the API can cause a problem: input containing sensitive words stops the program, which underlines the importance of training the LLM locally.
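
The API problem described in the last item can be softened by isolating each request, so that one rejected input does not stop the whole run. The following is a minimal sketch, not the code used in NLP_chatGLM. It assumes that a rejected input surfaces as a raised exception; `classify_all`, `fake_llm`, and the labels are hypothetical names introduced only for illustration, with `fake_llm` standing in for the real chatGLM client call.

```python
from typing import Callable, Dict, List, Optional


def classify_all(
    texts: List[str],
    call_llm: Callable[[str], str],
    fallback: Optional[str] = None,
) -> Dict[str, list]:
    """Classify every text, recording failures instead of stopping the run."""
    labels: List[Optional[str]] = []
    failed: List[int] = []
    for index, text in enumerate(texts):
        try:
            labels.append(call_llm(text))
        except Exception:
            # Assumption: the API client raises when it rejects an input.
            labels.append(fallback)
            failed.append(index)
    return {"labels": labels, "failed": failed}


def fake_llm(text: str) -> str:
    """Hypothetical stand-in for the real API call, used only in this demo."""
    if "blocked" in text:
        raise RuntimeError("input rejected by the API")
    return "positive" if "patient" in text else "negative"


result = classify_all(
    ["patient cohort study", "blocked content", "graph neural networks"],
    fake_llm,
)
print(result["labels"])  # one label (or the fallback) per input
print(result["failed"])  # indices of the inputs that raised
```

The indices in `failed` can be retried later or handed to a local model, which keeps the rest of the predictions usable.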


ChatGPT-generated Text Tester

relative link: ChatGPT-generated Text Tester

This is a program that identifies whether a piece of content was generated by GPT.

  • note contains my Markdown notes.

  • baseline is the baseline of this sub-project. It uses Logistic Regression and gives average results.

  • upper is the upper project, using TF-IDF features to classify the content.

  • bert is another solution, using the BERT model, and it is the best model so far.

  • chatGLM_api is a failed project, but not a meaningless one.

    On the one hand, the LLM performs well at classification; on the other hand, using the API is not a good idea. From my point of view, the solution is to build a training set and fine-tune the LLM on a GPU.

  • ernie performs best. It uses the ERNIE model in a Paddle environment and runs on AI Studio. Set epochs=100 and run all cells.
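
To make the TF-IDF weighting mentioned for upper concrete, here is a minimal pure-Python sketch. It is not the code from upper: the `tf_idf` function and the sample sentences are made up for illustration, and it assumes one common textbook variant, tf = count / document length and idf = log(N / df). Library implementations such as scikit-learn's TfidfVectorizer apply smoothing and normalization by default, so their numbers differ.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample sentences, used only to exercise the function.
docs = [
    "the model writes fluent text",
    "the student writes short notes",
    "the model repeats itself",
]
weights = tf_idf(docs)
for term, weight in sorted(weights[0].items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.3f}")
```

A term that occurs in every document ("the" here) gets weight zero, while terms unique to one document get the largest weights. These per-document weight vectors are what a classifier such as Logistic Regression would take as input.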

To be continued...

nlp_project's People

Contributors

jiang-wu-19
