text_summarization_nlp's Introduction

French Text Summarization APM

Introduction

This module is a summarizer built on Hugging Face's Transformers library, which provides general-purpose architectures for NLP, in particular the BERT architecture.

It can be used with two pre-trained French BERT-based models: FlauBERT and CamemBERT.

Three extractive summarization techniques are available in this module:

Mean Summarization:

This technique is the simplest one. It computes the mean embedding of the text and returns the sentences whose embeddings are closest to it to compose the summary.
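The idea can be sketched in a few lines. This is a minimal illustration, not the module's actual code: it assumes the sentence embeddings have already been computed (here they are made-up 2D vectors), uses cosine similarity as the notion of "closeness", and the function name mean_summary is hypothetical. Plain Python lists are used so the sketch runs without extra dependencies.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_summary(sentences, embeddings, nb_sentences=2):
    """Pick the nb_sentences sentences closest to the mean embedding, kept in text order."""
    dim = len(embeddings[0])
    # Mean embedding of the whole text, one coordinate at a time.
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    scores = [cosine(e, mean) for e in embeddings]
    # Indices of the best-scoring sentences.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:nb_sentences]
    return [sentences[i] for i in sorted(top)]


# Toy data: four "sentences" with made-up 2D embeddings (assumption for illustration).
sentences = ["A", "B", "C", "D"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
summary = mean_summary(sentences, embeddings, nb_sentences=2)
print(summary)
```

Returning the selected sentences in their original order keeps the summary readable; whether the module does the same is not stated here.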

Clustering Summarization:

The clustering summarization model works like the mean summarization one, except that the summarizer first runs a clustering algorithm (K-means) on the sentence embeddings. nb_clusters centroid embeddings are computed, one per cluster. Then the nb_top sentences closest to each centroid are selected to compose the summary. With this method, the cluster labels can also be visualized in 2D after t-SNE dimensionality reduction.
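A rough sketch of this selection step follows. It is an illustration under several assumptions rather than the module's implementation: embeddings are precomputed toy vectors, K-means is written by hand with a deterministic farthest-point initialization so the result is reproducible (in practice a library such as scikit-learn would normally be used), and the function names kmeans and clustering_summary are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iterations=20):
    """Tiny K-means: returns (centroids, labels). Farthest-point init, for illustration only."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    # Final assignment against the final centroids.
    labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
    return centroids, labels


def clustering_summary(sentences, embeddings, nb_clusters=2, nb_top=1):
    """Select the nb_top sentences closest to each cluster centroid, kept in text order."""
    centroids, labels = kmeans(embeddings, nb_clusters)
    selected = set()
    for c, centroid in enumerate(centroids):
        members = [i for i, label in enumerate(labels) if label == c]
        members.sort(key=lambda i: sq_dist(embeddings[i], centroid))
        selected.update(members[:nb_top])
    return [sentences[i] for i in sorted(selected)]


# Toy data: two well-separated groups of made-up 2D embeddings.
sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
embeddings = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.1],
              [5.0, 5.0], [5.2, 5.0], [5.1, 5.1]]
summary = clustering_summary(sentences, embeddings, nb_clusters=2, nb_top=1)
print(summary)
```

Compared with the mean method, picking sentences per cluster tends to cover several topics of the text instead of only its overall center.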

Graph Summary:

This third summarization method builds a similarity graph between the sentences of the text. It then computes a score for each sentence using the PageRank algorithm, and this score is used to produce the final summary. To use this method, call the graph_summary method on the summarizer. Once again, you can choose how many sentences you want in the summary with the nb_sentences parameter.
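The graph approach can be sketched as below. This is a standalone illustration, not the summarizer's graph_summary method: it assumes edge weights are cosine similarities between precomputed (here made-up) sentence embeddings, runs PageRank by simple power iteration with a damping factor of 0.85, and the names pagerank and pagerank_summary are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on a square matrix of edge weights."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            # Score flowing into node j, split in proportion to each edge weight.
            incoming = sum(
                scores[i] * weights[i][j] / out_sums[i]
                for i in range(n)
                if out_sums[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def pagerank_summary(sentences, embeddings, nb_sentences=2):
    """Rank sentences by PageRank over a similarity graph; return the top ones in text order."""
    n = len(sentences)
    # Similarity graph: no self-loops, negative similarities clipped to zero.
    weights = [
        [max(cosine(embeddings[i], embeddings[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:nb_sentences]
    return [sentences[i] for i in sorted(top)], scores


# Toy data: "C" is dissimilar to the other sentences (made-up 2D embeddings).
sentences = ["A", "B", "C", "D"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
summary, scores = pagerank_summary(sentences, embeddings, nb_sentences=2)
print(summary)
```

A sentence scores highly when it is similar to many other sentences that are themselves well connected, so an outlier such as "C" in the toy data is unlikely to be selected.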

Usage

Requirements

The code is designed to run on Python 3 and works with PyTorch 1.6. The following major dependencies need to be installed:

pip install transformers
python3 -m spacy download fr_core_news_md

Running

python3 main.py --text_path=path/to/text --model='flaubert' --method='clustering' --nb_sentences=5 

You can choose flaubert or camembert for the model, and clustering, mean or graph for the summarization method.
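For readers who want to see how such options could be validated, here is a hypothetical sketch using argparse. The flag names and allowed values come from the command above; the defaults, help strings, and the build_parser function are assumptions and may differ from what main.py actually does.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="French extractive text summarizer (CLI sketch)")
    parser.add_argument("--text_path", required=True, help="path to the text file to summarize")
    parser.add_argument("--model", choices=["flaubert", "camembert"], default="flaubert",
                        help="pre-trained French model to use")
    parser.add_argument("--method", choices=["clustering", "mean", "graph"], default="clustering",
                        help="summarization method")
    parser.add_argument("--nb_sentences", type=int, default=5,
                        help="number of sentences in the summary")
    return parser


# Parse an explicit argument list so the sketch can run without a real command line.
args = build_parser().parse_args(
    ["--text_path", "path/to/text", "--model", "camembert", "--method", "graph", "--nb_sentences", "3"]
)
print(args.model, args.method, args.nb_sentences)
```

Using choices makes argparse reject any model or method name outside the documented ones with a clear error message.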

Demos

The original text can be seen in the first part of this article. The following text is the summary produced by the graph method:


La Terre est la troisième planète par ordre d'éloignement au Soleil et la cinquième plus grande aussi bien par la masse que le diamètre du Système solaire.
L'axe de rotation de la Terre possède une inclinaison de 23°, ce qui cause l'apparition des saisons.
Une combinaison de facteurs tels que la distance de la Terre au Soleil (environ 150 millions de kilomètres, aussi appelée unité astronomique), son atmosphère, sa couche d'ozone, son champ magnétique et son évolution géologique ont permis à la vie d'évoluer et de se développer.
Elle est la planète la plus dense du Système solaire ainsi que la plus grande et massive des quatre planètes telluriques.
La structure interne de la Terre est géologiquement active, le noyau interne solide et le noyau externe liquide (composés tous deux essentiellement de fer) permettant notamment de générer le champ magnétique terrestre par effet dynamo et la convection du manteau terrestre (composé de roches silicatées) étant la cause de la tectonique des plaques

Contributors

ialifinaritra
