jonathanstefanov / cefr_classifier_french

The CEFR Level Classifier Project is an AI-powered, Streamlit-based application that uses a two-phase Camembert model to classify French texts into the six CEFR language proficiency levels. It offers straightforward navigation for both training and text classification.

Python 100.00%
bert-model camembert cefr embeddings french nlp ai

cefr_classifier_french's Introduction

Welcome to the CEFR Level Classifier Project! 🚀

Run in Saturn Cloud

This application, built with Streamlit, uses a two-phase Camembert AI model 🧀 to classify texts into the six levels of the Common European Framework of Reference for Languages (CEFR): A1, A2, B1, B2, C1, and C2.

Video Presentation

What is CEFR? 📘

The CEFR is an internationally recognized standard for describing language ability. It's widely used across the globe to assess and describe the language proficiency of learners.

Installation via pip 📦

Easily install the CEFR Level Classifier package with pip:

pip install CEFR-Classifier-French

Example Usage 🌟

Here's a quick example of how to use the CEFR-Classifier-French package to predict the CEFR level of a French sentence:

from CEFR_Classifier_French.inference.predict import Predictor

# Create the predictor
predictor = Predictor()

# Predict the CEFR level of a text
text = "Je ne sais pas quoi dire."
level = predictor.inference_sentence(text)

print("Level of the sentence is ->", level)
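To classify several sentences at once, the single-sentence call can be wrapped in a small helper. The sketch below is not part of the package: `level_distribution` is a hypothetical helper, and it assumes only that the prediction function takes a string and returns a hashable label such as "A1".

```python
from collections import Counter
from typing import Callable, Iterable


def level_distribution(predict: Callable[[str], str],
                       sentences: Iterable[str]) -> Counter:
    """Count how many sentences fall into each predicted level.

    `predict` is any callable mapping a sentence to a label. Passing
    predictor.inference_sentence is the intended use; its return value
    is assumed (not confirmed by the README) to be a hashable label.
    """
    return Counter(predict(sentence) for sentence in sentences)
```

With the package installed, `level_distribution(predictor.inference_sentence, sentences)` would tally the predicted levels for a list of French sentences.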

How to Run the GUI 🚀

On Your Own Computer

  1. Clone the Repository:
git clone git@github.com:JonathanStefanov/CEFR_Classifier_French.git
  2. Navigate to the Folder:
cd CEFR_Classifier_French
  3. Install the Requirements:
pip install -r requirements.txt
  4. Run the Streamlit App:
streamlit run CEFR_Classifier_French/app.py

On Saturn Cloud

  • Why Use Saturn Cloud?: Ideal if you don't have a GPU. Offers 10 hours for free.
  • Steps:
  1. Click on the "Run in Saturn Cloud" Button at the top of this README.
  2. Create the CEFR_French Resource and click on Run. All necessary configurations are pre-set.

About Our Model 🤖

Our application uses Camembert, a language model for French, arranged in a two-phase system to assess and classify texts:

  1. Phase 1 - Initial Classification: This phase classifies texts into broad categories: A, B, or C.
  2. Phase 2 - Detailed Assessment:
    • Phase 2 A: Distinguishes between A1 and A2 levels for texts classified as 'A' in Phase 1.
    • Phase 2 B: Distinguishes between B1 and B2 levels for texts classified as 'B' in Phase 1.
    • Phase 2 C: Distinguishes between C1 and C2 levels for texts classified as 'C' in Phase 1.

This multi-phase approach ensures precise and nuanced classification in line with CEFR standards.
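The routing between the phases can be sketched as follows. This is a minimal illustration, not the project's implementation: `classify_two_phase`, `coarse_model`, and `fine_models` are hypothetical names, and the models are assumed to be plain callables that map a text to a label ("A", "B", or "C" in Phase 1; "1" or "2" in Phase 2).

```python
from typing import Callable, Dict

# Assumption: each model is a callable taking a text and returning a label.
Classifier = Callable[[str], str]


def classify_two_phase(text: str,
                       coarse_model: Classifier,
                       fine_models: Dict[str, Classifier]) -> str:
    """Route a text through a coarse classifier, then a band-specific one."""
    band = coarse_model(text)        # Phase 1: "A", "B", or "C"
    fine_model = fine_models[band]   # select the Phase 2 model for that band
    sublevel = fine_model(text)      # Phase 2: "1" or "2"
    return band + sublevel           # e.g. "B" + "2" gives "B2"
```

One consequence of this design is that a Phase 1 mistake cannot be recovered in Phase 2, because each text is only ever shown to one of the three Phase 2 models.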

How to Use the App 🖱️

  1. Navigation: Use the sidebar to easily navigate through the application.
  2. Training the Model: Head over to the Training section 👨‍🏫. Here, you can train the model on your own dataset so that it adapts to your specific language use cases.
  3. Text Classification: Visit the Inference section to enter text. The app analyzes the text and returns its CEFR level classification.

Our Model's Evolution 🤖

Initial Attempts

  • Logistic Regression Approach: We began by analyzing sentence structure - counting length, verbs, punctuation, and checking for passive sentences. Despite these efforts, a logistic regression model yielded unsatisfactory results.
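A minimal sketch of the kind of surface features described above, assuming simple regex tokenization. The function name and the exact feature set are illustrative, not the project's code; verb counts and passive detection would need a part-of-speech tagger and are left out here.

```python
import re
from typing import Dict


def surface_features(sentence: str) -> Dict[str, float]:
    """Compute simple length and punctuation features for one sentence."""
    words = re.findall(r"\w+", sentence)             # Unicode-aware word tokens
    punctuation = re.findall(r"[^\w\s]", sentence)   # non-word, non-space characters
    return {
        "n_chars": float(len(sentence)),
        "n_words": float(len(words)),
        "n_punctuation": float(len(punctuation)),
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
    }
```

Feature dictionaries like this could then be turned into vectors and fed to a logistic regression model.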

Transition to Camembert Model

  • First Camembert Trial: Shifting gears, we implemented a Camembert language model. Although it improved accuracy to 58%, the model's size and training speed were concerning.

Final, Optimized Model

  • Two-Phase Camembert System: Our breakthrough came with a refined version of the Camembert model, structured in two phases for precise, efficient classification. This significantly reduced training time without compromising accuracy; accuracy even rose to 60.2% on the same dataset.

Advanced Metrics

⚠️ Attention: The accuracy reported here is lower than in the Kaggle file because I had to retrain the model with an 85-15 train-test split in order to recompute the F1 score, accuracy, precision, and recall, as well as the confusion matrix. This is why the accuracy is about 5% lower here.

Metric      Value
Accuracy    0.5402777777777777
F1 Score    0.5399223400401946
Recall      0.5402777777777777
Precision   0.5455209131671773

And now the confusion matrix:

[[78 22 11  1  0  0]
 [22 65 39  2  0  0]
 [11 27 72 14  1  1]
 [ 2  2 13 70 24 10]
 [ 1  1 12 26 56 24]
 [ 0  0  5 11 49 48]]
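The reported figures can be reproduced from the confusion matrix alone. The sketch below assumes that rows are true levels and columns are predicted levels (A1 to C2, in order), and that precision, recall, and F1 are support-weighted averages. These conventions are inferred from the numbers rather than stated in the README; under them, the results agree with the Accuracy, F1 Score, Recall, and Precision values listed above. `weighted_metrics` is a hypothetical helper name.

```python
from typing import Dict, List

# Confusion matrix from the README (assumed: rows = true, columns = predicted).
CONFUSION = [
    [78, 22, 11, 1, 0, 0],
    [22, 65, 39, 2, 0, 0],
    [11, 27, 72, 14, 1, 1],
    [2, 2, 13, 70, 24, 10],
    [1, 1, 12, 26, 56, 24],
    [0, 0, 5, 11, 49, 48],
]


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    support = [sum(row) for row in cm]  # true count per class
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted count per class
    precision = recall = f1 = 0.0
    for k in range(n):
        tp = cm[k][k]
        p = tp / predicted[k] if predicted[k] else 0.0
        r = tp / support[k] if support[k] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support[k] / total  # weight each class by its share of true samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = weighted_metrics(CONFUSION)
```

With support weighting, recall always equals accuracy, which is why the two values in the table are identical.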

Explore More 🔗

Interested in learning more about this project? Looking for source code or detailed documentation? Visit our GitHub Repository 🌟 for all the resources you need.

We hope you enjoy exploring and using our CEFR Level Classifier! Happy Classifying! 🎉

Feedback and Contributions

Your feedback is valuable to us! If you have suggestions or want to contribute to this project, please feel free to open an issue or submit a pull request on our GitHub repository. Let's make language learning and classification better, together!

License

This project is licensed under the GNU General Public License (GPL). This license ensures users have the freedom to share and change all versions of a program to make sure it remains free software for all its users. For more details, see the LICENSE file in the repository.
