
daxa-ai / pebblo


Pebblo enables developers to safely load data and promote their Gen AI app to deployment

Home Page: https://daxa.ai/pebblo

License: MIT License

Python 61.39% CSS 6.24% HTML 10.17% Makefile 0.62% JavaScript 21.44% HCL 0.13%
data-governance gen-ai llm rag topic-classification entity-classification

pebblo's Issues

[Bug] Unable to reach Pebblo Server

Description

When we run the RAG app, it prints the error "Unable to reach pebblo server." even though the report is generated as expected on the Pebblo server.

Error message
$ python3 fin_corp_rag_app.py
Loading RAG documents ...
Unable to reach pebblo server.
Loaded 93 documents ...

Hydrating Vector DB ...
Finished hydrating Vector DB ...

Expected behavior
It should call the Pebblo APIs and generate the PDF report without any error.

Additional context
The Pebblo server was healthy when this error occurred.

System:

  • OS: Mac
  • GPU/CPU:
  • Pebblo version (commit or version number): 0.1.11
  • Langchain version: 0.1.9
  • DocumentStore:
  • Reader:
  • Retriever:
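
To rule out a plain connectivity problem when this message appears, a TCP-level check can be run from the machine hosting the RAG app. This is only a diagnostic sketch: the helper name is made up for illustration, and the host and port in the usage line are placeholders for wherever your Pebblo server actually listens.

```python
import socket


def server_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Placeholder address (assumption); substitute your server's host and port.
    print(server_reachable("localhost", 8000))
```

If this prints True while the app still reports the error, the problem is more likely in how the client handles the API response than in the network path.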

[Enhancement] Local UI

As of pebblo 0.1.9, the only output of the pebblo package is pebblo_report.pdf.
It would be useful to have a simple UI running locally on the pebblo server that lists all discovered apps and shows details for each app (equivalent to pebblo_report.pdf).

Pebblo Safe Retriever details in Local UI

We would like to show Safe Retriever details in the local UI, with two separate tabs for loader-type and retrieval-type applications.
The Safe Retrieval app listing page would include the following details:

  • Active Users (with mouse over showing actual "Top" users - cut off at N user-ids, N = 3)
  • Retrieved Documents (with mouse over showing "Top" documents retrieved - cut off at N documents, N = 3)
  • Retrievals (i.e. prompt count, cumulative)
  • VectorDB (with mouse over showing first N vector db names, N=3)
  • Owner(app owner)
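
The listing fields above could be derived from per-prompt retrieval events roughly as follows. This is a sketch under assumptions: the event shape (keys "user", "documents", "vector_db") and the function name are hypothetical and not taken from the Pebblo schema; only the N = 3 cutoff comes from the list above.

```python
from collections import Counter

TOP_N = 3  # cutoff used for the mouse-over lists


def summarize_retrievals(events, top_n=TOP_N):
    """Aggregate retrieval events into the fields shown on the listing page."""
    events = list(events)
    users = Counter(e["user"] for e in events)
    docs = Counter(d for e in events for d in e["documents"])
    # Keep vector DB names in first-seen order, without duplicates.
    vector_dbs = list(dict.fromkeys(e["vector_db"] for e in events))
    return {
        "active_users": len(users),
        "top_users": [u for u, _ in users.most_common(top_n)],
        "documents": len(docs),
        "top_documents": [d for d, _ in docs.most_common(top_n)],
        "retrievals": len(events),  # cumulative prompt count
        "vector_dbs": vector_dbs[:top_n],
    }
```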

Dockerize Pebblo

Create a Dockerfile for pebblo to quickly run pebblo in a containerized environment.

Tasks:

  • Add Pebblo Dockerfile
  • CI: Add docker image push on release/tag

Linter for UI code

We already use the ruff linter for Python code; we would like a linter for the UI code as well.

[Enhancement] LlamaIndex SafeRetriever support

Follow-on to #296, which introduced SafeLoader for LlamaIndex.

Add support for SafeRetriever with:

  • identity enforcement / filtering on the doc snippets retrieved from vector db
  • semantic topic policy enforcement / filtering on the doc snippets retrieved from vector db
  • send app-discover and per-prompt stats to pebblo server and pebblo cloud
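
The first two bullets amount to a post-retrieval filter over doc snippets. The sketch below shows the idea in plain Python without any LlamaIndex types; the Snippet class, its field names, and filter_snippets are illustrative assumptions, not the API proposed in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class Snippet:
    """Hypothetical stand-in for a doc snippet returned by the vector DB."""
    text: str
    authorized_identities: set[str] = field(default_factory=set)
    topics: set[str] = field(default_factory=set)


def filter_snippets(snippets, user_identities, denied_topics=frozenset()):
    """Keep snippets the user may see and that match no denied topic."""
    user_identities = set(user_identities)
    denied_topics = set(denied_topics)
    allowed = []
    for snippet in snippets:
        # Identity enforcement: require at least one shared identity.
        if not (snippet.authorized_identities & user_identities):
            continue
        # Semantic topic policy: drop snippets tagged with a denied topic.
        if snippet.topics & denied_topics:
            continue
        allowed.append(snippet)
    return allowed
```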

Add make command to format all/changed files

Description:
We need to streamline our code formatting process by implementing a make command that can format either all files in the repository or only the changed files. This will help maintain consistency in our codebase and make it easier for developers to adhere to our coding standards.


Tasks:

  • Configure the formatting commands (format and format-diff) to use the ruff formatting tool
  • Document the usage of these commands in the project's README or documentation.

Labels: enhancement, formatting
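
For the format-diff case, the changed-file selection could look roughly like the sketch below, which a make target might invoke. It assumes git and ruff are on PATH; the helper names and the choice of HEAD as the comparison base are assumptions, not the project's actual Makefile logic.

```python
import subprocess


def python_files(paths):
    """Keep only the paths that ruff should format."""
    return [p for p in paths if p.endswith((".py", ".pyi"))]


def changed_files(base="HEAD"):
    """List added/copied/modified/renamed Python files relative to base."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", base],
        check=True, capture_output=True, text=True,
    )
    return python_files(result.stdout.splitlines())


def format_command(paths):
    """Build the ruff invocation for the given files."""
    return ["ruff", "format", *paths]


def main():
    files = changed_files()
    if files:  # nothing to do when no Python files changed
        subprocess.run(format_command(files), check=True)
```

Formatting all files is the simpler case: running ruff format on the repository root covers it.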

Anonymize document snippets in the report

Feature: Document snippet anonymizer

Anonymize document snippets in the Pebblo report. As Pebblo is considered for environments beyond dev, anonymization will help distribute the report to more app stakeholders.
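
As a rough illustration of what snippet anonymization could mean, the sketch below masks two entity types with regular expressions. The patterns, labels, and function name are assumptions chosen for the example; a real implementation would more likely reuse the entity classifier Pebblo already runs rather than hand-written regexes.

```python
import re

# Illustrative patterns only; not an exhaustive or production-grade set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def anonymize(snippet: str) -> str:
    """Replace each matched entity with a <LABEL> placeholder."""
    for label, pattern in PATTERNS.items():
        snippet = pattern.sub(f"<{label}>", snippet)
    return snippet


print(anonymize("Contact jane.doe@example.com, SSN 123-45-6789."))
# prints: Contact <EMAIL>, SSN <US_SSN>.
```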

Capture and display topic classification confidence score

Capture and display the classifier confidence score for each RAG document snippet in the Local UI and the PDF report.

Tasks

  • Capture confidence score in backend schema
  • Display confidence score in PDF report
  • Display confidence score in Local UI on SafeLoader and SafeRetriever pages
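
One possible shape for the backend schema change is sketched below. The class and field names are hypothetical, and the score is assumed to be a probability in [0, 1]; the actual classifier output range is not stated in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicFinding:
    """Hypothetical record for one classified snippet."""
    snippet: str
    topic: str
    confidence: float  # assumed to be a probability in [0, 1]

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

    def label(self) -> str:
        """Text suitable for a report cell or UI badge."""
        return f"{self.topic} ({self.confidence:.0%})"
```

The same label() output could be reused by both the PDF report and the Local UI so the two displays stay consistent.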

OSError: [E050] can't find model

Description:

  • After Pebblo installation completes, running Pebblo fails.
  • Runtime error: OSError: [E050] can't find model en_core_web_lg
  • Environment: Ubuntu 22.04 VM
  • See the screenshot below for the complete error message:
[Screenshot 2024-01-30 at 12 41 03 AM]
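
spaCy raises E050 when the named model package is not installed in the active environment. The sketch below checks for the package using only the standard library and prints the usual spaCy download command; the helper names are made up for illustration, and whether Pebblo's installer is expected to fetch the model itself is not stated in this issue.

```python
import importlib.util
import sys

MODEL = "en_core_web_lg"  # model named in the error message


def model_installed(name: str = MODEL) -> bool:
    """spaCy models install as regular Python packages, so look one up."""
    return importlib.util.find_spec(name) is not None


def install_hint(name: str = MODEL) -> str:
    """Command that downloads the model into the current interpreter's env."""
    return f"{sys.executable} -m spacy download {name}"


if not model_installed():
    print(f"Model {MODEL} is missing. Try: {install_hint()}")
```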

[Enhancement] Support for Multiple Data Sources

As of version 0.1.12, Pebblo supports a single data source. Supporting multiple data sources within a single RAG application would be a useful feature.

Description:
When I have multiple data sources to be used in my app, I should be able to see all those data sources and their details in the pebblo report.

As part of this feature, the following changes are needed in the report:

  1. Report Summary: Aggregate details across all data sources.
  2. Top Files With Most Findings: Add a new column showing which data source each file belongs to.
  3. Data Source: Show snippets from all data sources.
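
The first two report changes can be sketched as a small aggregation step. The input shape (a list of dicts with "name" and a "files" mapping of path to finding count) and the function name are assumptions made for this example, not Pebblo's report schema.

```python
def build_report(data_sources, top_n=5):
    """Aggregate findings across data sources and rank files by findings."""
    rows = [
        (path, source["name"], count)
        for source in data_sources
        for path, count in source["files"].items()
        if count > 0
    ]
    # Highest finding count first; each row carries its data source name.
    rows.sort(key=lambda row: row[2], reverse=True)
    return {
        "summary": {
            "data_sources": len(data_sources),
            "files_with_findings": len(rows),
            "findings": sum(count for _, _, count in rows),
        },
        "top_files": rows[:top_n],
    }
```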

OSError: cannot load library 'pango-1.0-0'

  • Pebblo server fails in a conda environment because of a Pango (WeasyPrint) dependency issue.
  • After Pebblo installation completes, running Pebblo fails with the following runtime error:
    WeasyPrint could not import some external libraries. Please carefully follow the installation steps before reporting an issue: https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#troubleshooting
  • See the screenshot below for the complete error message:

[image]
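
A quick way to see whether the system loader can locate Pango at all is sketched below using the standard library. This is only a heuristic: the helper name is made up, and a library installed inside a conda environment may exist on disk without being on the search path this check uses, so a False result points at the loader path as much as at a missing install.

```python
import ctypes.util


def pango_available() -> bool:
    """Return True if the platform's library lookup finds Pango."""
    return ctypes.util.find_library("pango-1.0") is not None


if not pango_available():
    print("Pango not found; see the WeasyPrint installation guide linked above.")
```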

Pebblo --help should not show empty progress bar

$ pebblo --help
  0%|                                                                                                      | 0/10 [00:00<?, ?it/s]usage: pebblo [-h] [--config CONFIG]

Pebblo CLI

options:
  -h, --help       show this help message and exit
  --config CONFIG  Config file path
  0%|                                                                                                       | 0/10 [00:00<?, ?it/s]
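
One way to get this behavior is to parse arguments before any work that shows a progress bar starts, since argparse exits on --help during parsing. The sketch below mirrors the usage text above; the load_models callback is a hypothetical stand-in for whatever startup step currently emits the progress bar.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pebblo", description="Pebblo CLI")
    parser.add_argument("--config", help="Config file path")
    return parser


def main(argv=None, load_models=lambda: None):
    # Parse first: on --help, argparse prints usage and raises SystemExit here,
    # so the expensive step below never runs.
    args = build_parser().parse_args(argv)
    load_models()  # hypothetical expensive startup (progress bar lives here)
    return args
```

The same effect can be had by moving heavy module-level imports inside the function that needs them.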

[Enhancement] App History

As of pebblo 0.1.7, we have good information about the current state of app loading. It would be useful to capture the history of the last 5 loads of the app.
This would give the following information about each load:

  • Location of the report and report file name
  • How many findings were there in that report
  • How many files were there with findings
  • When the report was generated.
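
A bounded history like this maps naturally onto a fixed-size queue. The sketch below keeps the last 5 loads in memory; the class and field names are hypothetical, and persisting the history across server restarts is left out.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime

MAX_LOADS = 5  # history depth requested in this issue


@dataclass
class LoadRecord:
    report_path: str           # location and file name of the report
    findings: int              # findings in that report
    files_with_findings: int   # files that had findings
    generated_at: datetime     # when the report was generated


class AppHistory:
    def __init__(self, max_loads: int = MAX_LOADS):
        # deque drops the oldest record automatically once full.
        self._loads = deque(maxlen=max_loads)

    def record(self, load: LoadRecord) -> None:
        self._loads.append(load)

    def recent(self) -> list[LoadRecord]:
        """Return stored loads, newest first."""
        return list(reversed(self._loads))
```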

[Local UI] Add delete app

Add delete app support for SafeLoader and SafeRetriever apps

Tasks

  • Add delete app API to pebblo server
  • Add Delete link in Local UI
