daxa-ai / pebblo

Pebblo enables developers to safely load data and promote their Gen AI app to deployment

Home Page: https://daxa.ai/pebblo

License: MIT License

Languages: Python 55.90%, JavaScript 20.70%, HTML 14.07%, CSS 8.49%, Makefile 0.65%, HCL 0.19%
Topics: data-governance, gen-ai, llm, rag, topic-classification, entity-classification

pebblo's Issues

Dockerize Pebblo

Create a Dockerfile so that Pebblo can be run quickly in a containerized environment.

Tasks:

  • Add Pebblo Dockerfile
  • CI: Add docker image push on release/tag

[Enhancement] App History

As of pebblo 0.1.7, we have good information about the current state of the app load. It would be useful to also capture a history of the last 5 loads of the app.
The history would record the following for each load:

  • Location of the report and report file name
  • How many findings were there in that report
  • How many files were there with findings
  • When the report was generated.
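
A minimal sketch of how such a bounded history could be kept, assuming an in-memory store. The names `LoadRecord`, `new_history`, `record_load` and the `reports/load_N/` paths are hypothetical and not part of Pebblo; persisting the history to disk is left out.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone

MAX_LOADS = 5  # the issue asks for the last 5 loads


@dataclass
class LoadRecord:
    report_path: str          # location and file name of the report
    findings: int             # number of findings in that report
    files_with_findings: int  # number of files that had findings
    generated_at: datetime    # when the report was generated


def new_history():
    # A bounded deque drops the oldest record once MAX_LOADS is exceeded.
    return deque(maxlen=MAX_LOADS)


def record_load(history, report_path, findings, files_with_findings, generated_at=None):
    # Default to the current UTC time when no timestamp is supplied.
    record = LoadRecord(report_path, findings, files_with_findings,
                        generated_at or datetime.now(timezone.utc))
    history.append(record)
    return record


# Hypothetical usage: seven loads are recorded, only the last five are kept.
history = new_history()
for i in range(1, 8):
    record_load(history, f"reports/load_{i}/pebblo_report.pdf",
                findings=i, files_with_findings=1)
print([r.report_path for r in history])
```

A bounded deque keeps the "last 5" rule in one place, so callers never have to trim the list themselves.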

Anonymize document snippets in the report

Feature: Document snippet anonymizer

Anonymize document snippets in the Pebblo report. As Pebblo is being considered for environments beyond dev, anonymization will make it possible to distribute the report to more app stakeholders.
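
To make the request concrete, here is a rough sketch of snippet anonymization using two regular expressions. This is only an illustration: the patterns, the placeholder labels, and the `anonymize_snippet` function are assumptions, and a real anonymizer would more likely reuse the entity detection that already produces the findings.

```python
import re

# Illustrative patterns only; they are not how Pebblo classifies entities.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def anonymize_snippet(snippet):
    # Replace each match with a placeholder naming the entity type.
    for label, pattern in PATTERNS.items():
        snippet = pattern.sub(f"<{label}>", snippet)
    return snippet


print(anonymize_snippet("Contact jane@example.com, SSN 123-45-6789."))
```

Keeping the entity type in the placeholder lets a reader of the report still see what kind of finding the snippet contained.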

Pebblo --help should not show empty progress bar

$ pebblo --help
  0%|                                                                                                      | 0/10 [00:00<?, ?it/s]usage: pebblo [-h] [--config CONFIG]

Pebblo CLI

options:
  -h, --help       show this help message and exit
  --config CONFIG  Config file path
  0%|                                                                                                       | 0/10 [00:00<?, ?it/s]
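
One possible way to avoid this, sketched under the assumption that the progress bar comes from setup work that runs before the arguments are parsed: parse first, initialize afterwards. The parser mirrors the usage text above; `main` and the `init` callback are hypothetical stand-ins for whatever Pebblo does at startup.

```python
import argparse


def build_parser():
    # Mirrors the usage text shown in the report above.
    parser = argparse.ArgumentParser(prog="pebblo", description="Pebblo CLI")
    parser.add_argument("--config", help="Config file path")
    return parser


def main(argv=None, init=lambda: None):
    # parse_args() exits on --help, so nothing after this line runs
    # (and no progress bar is drawn) when the user only asks for help.
    args = build_parser().parse_args(argv)
    # Hypothetical expensive startup step that would show the progress bar.
    init()
    return args


# Hypothetical usage with an explicit argument list.
args = main(["--config", "config.yaml"])
print(args.config)
```

With this ordering, `pebblo --help` prints only the usage text, because argparse raises `SystemExit` before any initialization starts.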

LlamaIndex support

Description

Pebblo currently supports LangChain RAG apps. Extend the client to support another popular RAG development framework: LlamaIndex.

https://www.llamaindex.ai/

Components

  • Pebblo SafeLoader for LlamaIndex

OSError: cannot load library 'pango-1.0-0'

  • The Pebblo server fails in a conda environment because of a Pango (WeasyPrint) dependency issue.
  • The failure appears once installation has completed and the user tries to run Pebblo.
  • The following runtime error occurs:
    WeasyPrint could not import some external libraries. Please carefully follow the installation steps before reporting an issue: https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#troubleshooting
  • See the screenshot below for the complete error message:

image

[Enhancement] Support for Multiple Data Sources

Pebblo (as of version 0.1.12) supports a single data source. Support for multiple data sources within a single RAG application would be a good feature.

Description:
When my app uses multiple data sources, I should be able to see all of those data sources and their details in the pebblo report.

As part of this feature, the following changes would be needed in the report:

  1. Report Summary: Aggregate details about all data sources.
  2. Top Files With Most Findings: Add a new column showing which data source each file belongs to.
  3. Data Source: Show snippets from all data sources.
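
The first two report changes could be sketched roughly as below. This is an assumption-laden illustration, not Pebblo's report model: `FileFindings`, `summarize`, `top_files`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileFindings:
    data_source: str  # which data source the file was loaded from
    file_name: str
    findings: int


def summarize(files):
    # Report Summary: aggregate across every data source.
    return {
        "data_sources": sorted({f.data_source for f in files}),
        "total_findings": sum(f.findings for f in files),
        "files_with_findings": sum(1 for f in files if f.findings > 0),
    }


def top_files(files, limit=5):
    # Top Files With Most Findings, with the extra data source column.
    ranked = sorted((f for f in files if f.findings > 0),
                    key=lambda f: f.findings, reverse=True)
    return [(f.file_name, f.data_source, f.findings) for f in ranked[:limit]]


# Hypothetical sample data covering two data sources.
files = [
    FileFindings("csv-loader", "customers.csv", 4),
    FileFindings("pdf-loader", "contract.pdf", 9),
    FileFindings("pdf-loader", "brochure.pdf", 0),
]
print(summarize(files))
print(top_files(files))
```

Tagging each file with its data source once, at load time, lets both the summary and the per-file table be derived from the same list.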

[Bug] Unable to reach Pebblo Server

Description

When we execute the RAG app, we get the error "Unable to reach pebblo server.", but the report is still generated as expected on the Pebblo server.

Error message
$ python3 fin_corp_rag_app.py
Loading RAG documents ...
Unable to reach pebblo server.
Loaded 93 documents ...

Hydrating Vector DB ...
Finished hydrating Vector DB ...

Expected behavior
The app should call the Pebblo APIs and the PDF report should be generated without any error.

Additional context
The Pebblo server was healthy when this error occurred.

System:

  • OS: Mac
  • GPU/CPU:
  • Pebblo version (commit or version number): 0.1.11
  • Langchain version: 0.1.9
  • DocumentStore:
  • Reader:
  • Retriever:

[Enhancement] Local UI

As of pebblo 0.1.9, the only output of the pebblo package is pebblo_report.pdf.
It would be useful to have a simple UI running locally on the pebblo server that shows all discovered apps and gives details about each app (equivalent to pebblo_report.pdf).

OSError: [E050] can't find model

Description:

  • The failure appears once installation has completed and the user tries to run Pebblo.
  • The following runtime error occurs: OSError: [E050] can't find model en_core_web_lg
  • Environment: Ubuntu VM running Ubuntu 22.04.
  • See the screenshot below for the complete error message:
Screenshot 2024-01-30 at 12 41 03 AM
