
aj-naik / text-summarization


Abstractive and Extractive Text summarization using Transformers.

License: MIT License

Jupyter Notebook 99.29% Python 0.64% Dockerfile 0.07%
text-summarization transformers pegasus flask fastapi api xlnet streamlit abstractive-summarization extractive-summarization


Text-Summarization

Abstractive and Extractive Text summarization Transformer model and API.

Project History

I wanted to create an abstractive text summarization app as a tool to help with university studies. I researched and tried various models for text summarization, including LSTMs and other RNNs. The output was acceptable for a project but not good enough for actual use. Hence I decided to go with Transformers, which produce summaries good enough for real-world use. I used T5, Pegasus, Longformer2RoBerta, BART and LED. According to my tests, Pegasus surprisingly produced better output than the others. Longformer2RoBerta should have been the best model, as it is meant for summarizing long documents, but its output wasn't up to the mark. BART and LED also gave decent outputs. Overall, Pegasus provided a good abstractive summary.

I also tried a few extractive Transformer models: BERT, GPT2 and XLNet. The output was almost indistinguishable from a human summary.

Project

  1. 'src' directory contains 3 sub directories, including:
    • 'abstractive', which contains notebooks for the T5, Pegasus, Longformer2RoBerta, BART and LED abstractive summarization models.
    • 'extractive', which contains the BERT, GPT2 and XLNet extractive summarization models.
  2. 'prototype' directory contains a web app prototype created using the Streamlit framework (uses T5) for testing purposes. To run it locally:
    • Clone the repo with Git
    • Go to the 'prototype' directory, open a command prompt there and run 'streamlit run app.py'
  3. 'app' directory contains an API created for both Abstractive and Extractive (Pegasus and XLNet) summaries. To test the API locally:
    • Run pip install -r requirements.txt to install all dependencies
    • Open a terminal in the project directory and run uvicorn app.main:app --reload
    • After the application startup is completed, go to localhost:8000/docs to try it out
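Besides the interactive page at localhost:8000/docs, the locally running API can also be called from a script. The following is only a minimal sketch: the README does not name the route or the request fields, so the path `/summarize` and the field name `text` are hypothetical placeholders. Check localhost:8000/docs for the real ones before using it.

```python
import json
import urllib.request

# Hypothetical values: the README does not name the route or the request fields.
BASE_URL = "http://localhost:8000"
SUMMARY_PATH = "/summarize"  # assumption, look up the real path at localhost:8000/docs


def build_summary_request(text, path=SUMMARY_PATH, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the local API."""
    # "text" is an assumed field name, not taken from the project.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started through uvicorn, `send(build_summary_request("Some long text"))` would return whatever JSON the chosen route produces. A generous timeout is used because model inference can be slow.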

Note:-

  • The API will soon be deployed to the cloud for inference and then integrated into the Flask application, as running the Transformer models directly leads to timeouts.
  • Don't paste two or more paragraphs directly while testing. Remove all line breaks so that the text becomes one continuous paragraph; otherwise the request fails with Error 422.
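Removing the line breaks by hand gets tedious for longer inputs. A small helper can do it before the text is pasted or sent; this is a sketch with a hypothetical function name, not part of the project. A plausible cause of the Error 422 (an assumption, not confirmed by the project) is that raw line breaks inside a JSON string make the request body invalid.

```python
def to_single_paragraph(text):
    """Collapse line breaks and repeated whitespace into single spaces."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops empty pieces.
    return " ".join(text.split())


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph,\nwrapped over two lines.\n"
    print(to_single_paragraph(sample))
```

Note that this also collapses tabs and repeated spaces, which is usually harmless for summarization input.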

Tech Used

These are the libraries and technologies that are used, or will be used, in the project.

  1. PyTorch
  2. Transformers Library
  3. Streamlit
  4. Flask (Work in Progress)
  5. FastAPI

To Do

  1. Create a web app using Flask and host on cloud platforms for easy usage. (Done)
  2. Build a Chrome extension for use on websites (more portable and faster than a web app). (WIP)
