Efficient Data Conversation

Chat with your data by uploading a PDF file and using a local LLM.
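Chatting with a PDF through a local LLM is commonly implemented as retrieval-augmented generation (RAG): the passages most relevant to a question are placed in the prompt ahead of the question itself. The following is a minimal sketch of that prompt-assembly step only. The helper `build_prompt`, its character budget, and the prompt wording are illustrative assumptions, not this project's code:

```python
from typing import List


def build_prompt(question: str, passages: List[str], max_chars: int = 2000) -> str:
    """Assemble a RAG-style prompt from retrieved passages.

    Hypothetical helper: illustrates RAG prompt assembly,
    not this project's implementation.
    """
    context_parts: List[str] = []
    used = 0
    for passage in passages:
        # Stop adding passages once the character budget would be exceeded.
        if used + len(passage) > max_chars:
            break
        context_parts.append(passage)
        used += len(passage)
    context = "\n---\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("What dataset was used?",
                       ["We use dataset X.", "Results are in Table 2."]))
```

A character budget is the simplest way to keep the prompt within the model's context window. A token-based budget would be more precise.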

PDF File Structure Support:

  1. Upcoming: files with well-organized tables, i.e., tables in which a single row/column is not split across multiple rows/columns
  2. Typical research-paper structure:
    • Abstract
    • Introduction
    • Background Works
    • Dataset
    • Methodology
    • Result Analysis
    • Discussion
    • Future Works
    • Conclusion
  3. No image support for now
  4. Upcoming: metadata support
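Because the supported documents follow this research-paper layout, one way to exploit it is to split the extracted text at the section headings before chunking and embedding. The sketch below is minimal. It assumes the PDF text has already been extracted and that each heading sits on its own line, optionally numbered (e.g. "2. Dataset"). `split_sections` is a hypothetical helper:

```python
import re
from typing import Dict

# Section headings from the list above. Matching them this way is an assumption
# about how headings appear in the extracted text (one per line, optionally numbered).
SECTION_HEADINGS = [
    "Abstract", "Introduction", "Background Works", "Dataset", "Methodology",
    "Result Analysis", "Discussion", "Future Works", "Conclusion",
]

_HEADING_RE = re.compile(
    r"^[ \t]*(?:\d+(?:\.\d+)*\.?[ \t]+)?("
    + "|".join(map(re.escape, SECTION_HEADINGS))
    + r")[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(text: str) -> Dict[str, str]:
    """Map each recognized heading to the text that follows it (hypothetical helper)."""
    sections: Dict[str, str] = {}
    matches = list(_HEADING_RE.finditer(text))
    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Normalize the heading to the canonical spelling in SECTION_HEADINGS.
        name = next(h for h in SECTION_HEADINGS if h.lower() == match.group(1).lower())
        sections[name] = text[start:end].strip()
    return sections
```

Each section can then be chunked separately, so a retrieved chunk never mixes, say, the end of the Dataset section with the start of the Methodology section.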

Language Support:

  1. English
  2. Other languages: coming soon

Key Dependencies:

  • Ollama with or without GPU
  • Sentence-transformers
  • Langchain
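Ollama serves models over a local HTTP API (by default on port 11434), so the LLM can be queried without extra client libraries. The following is a minimal sketch that uses only the Python standard library against the `/api/generate` endpoint. The helper names are hypothetical, and this is not necessarily how the project (which uses Langchain) talks to Ollama:

```python
import json
import urllib.request
from typing import Any, Dict, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_payload(model: str, prompt: str,
                           options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a non-streaming request body for /api/generate (hypothetical helper)."""
    payload: Dict[str, Any] = {"model": model, "prompt": prompt, "stream": False}
    if options:
        payload["options"] = options
    return payload


def generate(model: str, prompt: str, options: Optional[Dict[str, Any]] = None,
             url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    body = json.dumps(build_generate_payload(model, prompt, options)).encode("utf-8")
    request = urllib.request.Request(url, data=body,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(response.read().decode("utf-8"))["response"]
```

With a running Ollama server and a pulled model, a call would look like `generate("mistral:7b-instruct-v0.2-q6_K", "Summarize the abstract.")`. The exact tag is an assumption and must match what `ollama list` reports on your machine.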

The models in use:

  1. Attempted sentence-embedding models, chosen mainly based on the MTEB leaderboard and personal experience:
  2. Attempted LLMs, chosen because Mistral-7b performs acceptably on low-resource devices:
    • Mistral-7b: instruct-v0.2-q2_K
    • Mistral-7b: instruct-v0.2-q5_K_M
    • Mistral-7b: instruct-v0.2-q6_K (currently in use)
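The embedding model is used for retrieval. The question and every text chunk are embedded, and the chunks whose vectors are closest to the question's vector are passed to the LLM. The following is a minimal sketch of that ranking step using cosine similarity. It assumes the vectors have already been computed, for instance with sentence-transformers' `SentenceTransformer.encode`. Plain lists are used here so the example runs without downloading a model. `top_k` is a hypothetical helper:

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs for the k most similar chunks, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, `top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2)` ranks the second chunk first (score 1.0) and the third chunk second (score about 0.707).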

To store the models, create a sub-directory inside the "api" directory.

For example: "lang_models".
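The following is a minimal sketch of creating that directory from Python with `pathlib`. It assumes the project root is passed as the base path. `model_dir` is a hypothetical helper, and "lang_models" is just the example name above:

```python
from pathlib import Path


def model_dir(project_root: Path, name: str = "lang_models") -> Path:
    """Return <project_root>/api/<name>, creating it if it does not exist (hypothetical helper)."""
    path = Path(project_root) / "api" / name
    # parents=True also creates "api" if needed; exist_ok=True makes reruns harmless.
    path.mkdir(parents=True, exist_ok=True)
    return path
```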

Setup Guidelines:

  1. Tested OS: Ubuntu >= 20.04 LTS
  2. Create a Python >= 3.11 environment using conda or a virtual environment
  3. Use the requirements file to install the dependencies:
       pip install -r requirements.txt
  4. Use the Ollama Docker image and Hugging Face to pull/download all the models. Refer to the Key Dependencies and "The models in use" sections for details and for where to store the models on your machine.
  5. Set up the .env file following the structure of .env.example. Note: for CPU inference, set USE_GPU=0.
  6. From the parent directory, run the system by executing the following command in the terminal:
       streamlit run api/app.py
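The USE_GPU flag from the .env file has to be translated into something the inference backend understands. The following is a minimal sketch. It assumes simple KEY=VALUE lines and treats a missing key as GPU on. `parse_env` and `ollama_options` are hypothetical helpers; a library such as python-dotenv would normally handle the parsing. Mapping USE_GPU=0 to Ollama's `num_gpu` option is an assumption, not necessarily what this project does:

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single/double quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def ollama_options(env: Dict[str, str]) -> Dict[str, int]:
    """Translate USE_GPU into Ollama request options (assumed mapping)."""
    use_gpu = env.get("USE_GPU", "1").strip() != "0"
    # num_gpu is the number of model layers Ollama offloads to the GPU;
    # 0 keeps inference on the CPU. An empty dict leaves Ollama's defaults.
    return {} if use_gpu else {"num_gpu": 0}
```

Setting `num_gpu` to 0 asks Ollama to offload no model layers to the GPU, which keeps inference on the CPU.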

System Support:

  1. Integrated frontend with Streamlit
  2. Upcoming: separate backend support
  3. Upcoming: Docker support

Credits and special thanks to my friends:

  1. Sharif Ahamed, MSc. in AI, University of Bradford, Bradford, United Kingdom, Email:
    • For advising me through
  2. Soroush Yaghoubi, BSc. in Informatics, Technical University Dortmund, Dortmund, Germany:
    • For the frontend idea and more work in the future

Contributors:

  • alcatraz47
