AI Review Ingestion

Purpose

This repository provides a set of tools for generating topics from sets of questions or predetermined topics, or in a zero-shot setting, using techniques such as LLMs or BERT-based models.

Technologies

This repository uses the following technologies:

  1. DSPy: used for prompt management and, in the future, for prompt optimization and bootstrapping multi-shot prompts.
  2. Arize: temporarily used for monitoring and debugging DSPy until MLflow tracing support is added.
  3. LanceDB: temporarily used, together with UC (Unity Catalog) volumes, to back the search tool; it provides full-text search (FTS) over an unlimited result set so that aggregations can be computed over all matches.
  4. Llama 3.1 70B: used for topic generation and sentiment analysis, with DSPy as the prompt management tool.
  5. Delta tables: all data is stored in Delta tables.

High-level Design

The repository is organized into several modules:

  • auto_topic.index: Provides functionality for indexing and searching text data.
  • auto_topic.sentiment: Analyzes text data to extract sentiment and other relevant information.
  • auto_topic.domains: Defines domains and questions to be answered by the LLM.
  • auto_topic.source: Specifies the source of the text data.
  • auto_topic.extract: Extracts relevant information from the text data.

How to use

  1. Install the required packages by running pip install -r requirements.txt.
  2. Set up your data by creating a table with the required columns (e.g., review_id, rating, review).
  3. Configure the 00_CONFIG notebook with your specific settings (e.g., CATALOG, SCHEMA, REVIEWS_TABLE).
  4. Run the 01_SETUP_DATA notebook to set up your data.
  5. Run the 05_SEARCH_TOOL notebook to create an index and search for text data.
  6. Run the 04_BATCH_ETL notebook to extract relevant information from the text data.
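Step 2 expects a table with columns such as review_id, rating and review. As a minimal sketch, assuming the rows arrive as plain Python dicts (in the notebooks they live in a Delta table), a pre-flight check for those columns could look like this. `REQUIRED_COLUMNS` and `validate_rows` are hypothetical names, not part of the repository.

```python
from typing import Iterable, List, Mapping

# Hypothetical: the example columns named in step 2 above.
REQUIRED_COLUMNS = ("review_id", "rating", "review")


def validate_rows(rows: Iterable[Mapping[str, object]]) -> List[str]:
    """Return one human-readable problem per bad row (an empty list means OK)."""
    problems: List[str] = []
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
        elif not isinstance(row["review"], str) or not row["review"].strip():
            problems.append(f"row {i}: empty review text")
    return problems


rows = [
    {"review_id": 1, "rating": 5, "review": "Great battery life."},
    {"review_id": 2, "rating": 1},                   # no review column
    {"review_id": 3, "rating": 3, "review": "   "},  # blank review text
]
for problem in validate_rows(rows):
    print(problem)
```

Running a check like this before 01_SETUP_DATA surfaces schema problems early, rather than as failed LLM calls later in the batch ETL.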

Notebook Explanations

00_CONFIG

Configures the settings for the repository, including the catalog, schema, and table names.
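The configured names are typically combined into Unity Catalog's three-level `catalog.schema.table` identifier. The sketch below is a hypothetical mirror of the 00_CONFIG settings; the `Config` class and the placeholder values are assumptions for illustration, not the notebook's actual contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical mirror of the CATALOG / SCHEMA / REVIEWS_TABLE settings."""
    catalog: str
    schema: str
    reviews_table: str

    def fq(self, table: str) -> str:
        """Backtick-quoted three-level name, safe for identifiers with hyphens."""
        return ".".join(f"`{part}`" for part in (self.catalog, self.schema, table))

    @property
    def reviews_fq(self) -> str:
        return self.fq(self.reviews_table)


# Placeholder values; replace them with your own catalog, schema and table.
CATALOG = "main"
SCHEMA = "reviews_demo"
REVIEWS_TABLE = "raw_reviews"

config = Config(CATALOG, SCHEMA, REVIEWS_TABLE)
print(config.reviews_fq)  # → `main`.`reviews_demo`.`raw_reviews`
```

The resulting string can be used directly in a SQL statement such as `SELECT * FROM <name>`.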

01_SETUP_DATA

Sets up the data by creating a table with the required columns.

02_SETUP_DOMAINS

Defines domains and questions to be answered by the LLM using the auto_topic.domains module.
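A domain pairs a subject area with the questions the LLM should answer about each review. The repository manages prompts through DSPy; the plain-string sketch below only illustrates the idea, and the `Domain` class, its fields and the prompt wording are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Domain:
    """Hypothetical shape of a domain: a name plus questions for the LLM."""
    name: str
    description: str
    questions: List[str] = field(default_factory=list)

    def render_prompt(self, review: str) -> str:
        """Number the questions and append the review text."""
        numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(self.questions, start=1))
        return (
            f"Domain: {self.name} ({self.description})\n"
            "Answer each question about the review below.\n"
            f"{numbered}\n\n"
            f"Review: {review}"
        )


electronics = Domain(
    name="electronics",
    description="consumer electronics product reviews",
    questions=[
        "Which product defects, if any, are mentioned?",
        "What is the overall sentiment (positive, negative, neutral)?",
    ],
)
print(electronics.render_prompt("The screen cracked after a week."))
```

Keeping questions as data rather than hard-coding them in prompts makes it easy to add domains without touching the extraction code.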

03_PLAYGROUND

Provides a playground for testing and experimenting with the repository.

04_BATCH_ETL

Extracts relevant information from the text data using the auto_topic.extract module.
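A batch extraction job usually processes reviews in fixed-size chunks and writes each finished chunk out, so a failure midway does not lose earlier work. In this sketch, `extract_fn` is a hypothetical stand-in for the LLM call (made via DSPy in the repository) and `sink` stands in for appending to an output table; both are assumptions.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows (similar to itertools.batched in 3.12+)."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def run_batch_etl(
    rows: Iterable[Row],
    extract_fn: Callable[[str], Dict[str, object]],
    sink: Callable[[List[Row]], None],
    batch_size: int = 100,
) -> None:
    """Apply extract_fn to each review and hand every finished batch to sink."""
    for batch in batched(rows, batch_size):
        out: List[Row] = []
        for row in batch:
            try:
                fields = extract_fn(str(row["review"]))
                out.append({"review_id": row["review_id"], **fields, "error": None})
            except Exception as exc:  # record the failure per row and keep going
                out.append({"review_id": row["review_id"], "error": str(exc)})
        sink(out)  # e.g. append this batch to an output table


def fake_extract(text: str) -> Dict[str, object]:
    """Hypothetical keyword rule standing in for the LLM."""
    if not text:
        raise ValueError("empty review")
    return {"sentiment": "negative" if "broke" in text.lower() else "positive"}


written: List[List[Row]] = []
reviews = [
    {"review_id": 1, "review": "Works great"},
    {"review_id": 2, "review": "It broke on day two"},
    {"review_id": 3, "review": ""},
]
run_batch_etl(reviews, fake_extract, written.append, batch_size=2)
print(len(written))  # → 2
```

Recording errors per row instead of aborting keeps one malformed review from stopping the whole run.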

05_SEARCH_TOOL

Creates an index and searches for text data using the auto_topic.index module.
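The search tool relies on full-text search that returns every match (not just the top k), so aggregations reflect the full result set. The toy inverted index below illustrates that idea in pure Python; it is a stand-in for the concept, not LanceDB's API, and `InvertedIndex` and `rating_histogram` are hypothetical names.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str):
    return TOKEN.findall(text.lower())


class InvertedIndex:
    """Toy full-text index: token -> set of review_ids."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self.docs: Dict[int, dict] = {}

    def add(self, review_id: int, text: str, rating: int) -> None:
        self.docs[review_id] = {"review": text, "rating": rating}
        for tok in tokenize(text):
            self.postings[tok].add(review_id)

    def search(self, query: str) -> Set[int]:
        """Return ALL ids containing every query token (no top-k cutoff)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        hits = set(self.postings.get(tokens[0], set()))  # copy, don't mutate postings
        for tok in tokens[1:]:
            hits &= self.postings.get(tok, set())
        return hits

    def rating_histogram(self, query: str) -> Counter:
        """Aggregate over the complete match set, e.g. ratings per query."""
        return Counter(self.docs[i]["rating"] for i in self.search(query))


idx = InvertedIndex()
idx.add(1, "Battery died quickly", 1)
idx.add(2, "Great battery, fast charging", 5)
idx.add(3, "Battery died after update", 2)
idx.add(4, "Nice screen", 4)
print(sorted(idx.search("battery died")))  # → [1, 3]
```

A ranked search capped at k results would undercount here; returning the complete match set is what makes the histogram trustworthy.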

Optional Notebooks Explained

00_TRACING (Optional)

Enables tracing in the Arize UI for monitoring and debugging purposes.

01_SETUP_TOPIC_ANALYSIS (Optional but Recommended)

Analyzes the text data to extract topics and sentiment using the auto_topic.sentiment module.

Note: This notebook is optional but recommended for gaining a deeper understanding of the text data.
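Once each review has topics and a sentiment, a per-topic summary shows which themes drive negative feedback. The record shape below (`topics` list plus `sentiment` label) and the `summarize_topics` helper are assumptions for illustration, not the output format of auto_topic.sentiment.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def summarize_topics(results: List[dict]) -> Dict[str, dict]:
    """Per topic: number of mentions and share of negative mentions."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for r in results:
        for topic in set(r["topics"]):  # count each topic once per review
            counts[topic][r["sentiment"]] += 1
    summary = {}
    for topic, c in counts.items():
        total = sum(c.values())
        summary[topic] = {"mentions": total, "negative_share": c["negative"] / total}
    # Most-mentioned topics first.
    return dict(sorted(summary.items(), key=lambda kv: -kv[1]["mentions"]))


results = [
    {"review_id": 1, "topics": ["battery", "price"], "sentiment": "negative"},
    {"review_id": 2, "topics": ["battery"], "sentiment": "positive"},
    {"review_id": 3, "topics": ["screen", "battery"], "sentiment": "negative"},
]
for topic, stats in summarize_topics(results).items():
    print(topic, stats)
```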

Example Use Cases

  • Analyzing customer reviews to extract sentiment and topics.
  • Identifying defects and issues with products.
  • Extracting relevant information from text data for decision-making purposes.

Troubleshooting

  • Check the 00_CONFIG notebook for correct settings.
  • Verify that the required packages are installed.
  • Consult the documentation for each module for specific troubleshooting tips.
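To verify installed packages without importing them, `importlib.util.find_spec` returns `None` for a top-level module that cannot be found. The module list below is an assumption; check the import names against requirements.txt.

```python
import importlib.util
from typing import List


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that are not importable."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Hypothetical list; align it with requirements.txt.
REQUIRED = ["dspy", "lancedb"]
missing = missing_modules(REQUIRED)
print("missing:", missing or "none")
```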

Disclaimer

This is a "what you see is what you get" set of notebooks, and you are responsible for any use of them in production. The notebooks are provided as examples, and you should modify them to suit your specific needs.

Contributors

stikkireddy