sinaptik-ai / pandas-ai

Chat with your database (SQL, CSV, pandas, Polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT-3.5/4, Anthropic, VertexAI) and RAG.

Home Page: https://pandas-ai.com

License: Other

Topics: llm, pandas, ai, data-analysis, data-science, gpt-3, gpt-4, csv, data, sql

pandas-ai's Introduction

PandasAI


PandasAI is a Python library that makes it easy to ask questions about your data in natural language. It helps you explore, clean, and analyze your data using generative AI.

πŸ”§ Getting started

Documentation on using PandasAI with specific LLMs, vector stores, and connectors can be found here.

πŸ“¦ Installation

With pip:

pip install pandasai

With poetry:

poetry add pandasai

πŸ” Demo

Try out PandasAI yourself in your browser:

Open in Colab

πŸš€ Deploying PandasAI

PandasAI can be deployed in a variety of ways. You can easily use it in your Jupyter notebooks or Streamlit apps, or you can deploy it as a REST API with FastAPI or Flask, for example.

If you are interested in managed PandasAI Cloud or self-hosted Enterprise Offering, take a look at our website or book a meeting with us.

πŸ’» Usage

Ask questions

import os
import pandas as pd
from pandasai import Agent

# Sample DataFrame
sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]
})

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

agent = Agent(sales_by_country)
agent.chat('Which are the top 5 countries by sales?')
China, United States, Japan, Germany, Australia

Or you can ask more complex questions:

agent.chat(
    "What is the total sales for the top 3 countries by sales?"
)
The total sales for the top 3 countries by sales is 16500.

Visualize charts

You can also ask PandasAI to generate charts for you:

agent.chat(
    "Plot a histogram of countries, showing the sales of each, using a different color for each bar",
)

Chart

Multiple DataFrames

You can also pass multiple dataframes to PandasAI and ask questions that relate them.

import os
import pandas as pd
from pandasai import Agent

employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}

salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

agent = Agent([employees_df, salaries_df])
agent.chat("Who gets paid the most?")
Olivia gets paid the most.

You can find more examples in the examples directory.

πŸ”’ Privacy & Security

In order to generate the Python code to run, we take a few random samples from the dataframe, randomize them (using random generation for sensitive data and shuffling for non-sensitive data), and send only the randomized head to the LLM.

If you want to further enforce your privacy, you can instantiate PandasAI with enforce_privacy = True, which will send only the column names (not the head) to the LLM.
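The sample-and-shuffle idea described above can be sketched roughly as follows. This is a simplified illustration, not the library's actual implementation; the function name and seed handling are made up for the example.

```python
import pandas as pd

def anonymized_head(df: pd.DataFrame, n: int = 5, seed: int = 0) -> pd.DataFrame:
    """Take a small random sample, then shuffle each column independently so
    that no real row survives intact (simplified illustration only)."""
    sample = df.sample(n=min(n, len(df)), random_state=seed).reset_index(drop=True)
    for i, col in enumerate(sample.columns):
        # A different seed per column breaks row alignment across columns
        sample[col] = sample[col].sample(frac=1, random_state=seed + 1 + i).reset_index(drop=True)
    return sample

df = pd.DataFrame({"name": ["Ann", "Bob", "Cy", "Dee", "Ed"],
                   "spend": [10, 20, 30, 40, 50]})
print(anonymized_head(df, n=3))
```

Only this randomized head (or, with enforce_privacy, only the column names) would ever leave your machine.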

πŸ“œ License

PandasAI is available under the MIT Expat license, except for the pandasai/ee directory (which has its own license, where applicable).


Resources

  • Docs for comprehensive documentation
  • Examples for example notebooks
  • Discord for discussion with the community and PandasAI team

🀝 Contributing

Contributions are welcome! Please check the outstanding issues and feel free to open a pull request. For more information, please check out the contributing guidelines.

Thank you!

pandas-ai's Issues

No Output?

Python newbie question: using the code below (with a correct API key), the graph is generated correctly, but there is no response, and no error message, for the line/question: pandas_ai.run(df, prompt='Which are the 2 happiest countries?')

Could you please help?

First, I installed pandasai from the command line with "pip install pandasai"

#import the dependencies:

import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

#create a dataframe using pandas
df = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
"gdp": [21400000, 2940000, 2830000, 3870000, 2160000, 1350000, 1780000, 1320000, 516000, 14000000],
"happiness_index": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0]
})

#print results
#print(df)

OPENAI_API_KEY = "XXX"
llm = OpenAI(api_token=OPENAI_API_KEY)

pandas_ai = PandasAI(llm)
pandas_ai.run(df, prompt='Which are the 2 happiest countries?')

# create a chart, using pandasai to set up the prompt

pandas_ai.run(df, "Plot the histogram of countries showing for each the gdp, using different colors for each bar")

Starcoder Hallucinations, Graph Issues, Output issues

πŸ› Describe the bug

When using StarCoder, it works great for text prompts 90% of the time, but graphing prompts don't seem to work. I have listed two issues I have observed and their respective prompts:

Prompt 1: Show a bar chart of the order qty for all unique part id's.
[Screenshot attached]
Issue 1: Besides not generating the plt.show() call, the output seems to cut off randomly. I don't know if this is a token issue or some kind of execution issue. The matplotlib window appears in the macOS Dock, so it is being initiated, but no window ever opens.

Prompt 2: Make a bar chart of the order qty for all desired ship dates.
[Screenshot attached]
Issue 2: This prompt makes the LLM output quite an ambiguous message, and I don't see any code generated.

I have attached the test data file
filename.csv

Minimal code is below:

import pandas

from pandasai import PandasAI
from pandasai.llm.starcoder import Starcoder
        
dataPrompt = "Make a bar chart of the order qty for all desired ship dates."

llm = Starcoder(api_token="HF_API_TOKEN")

pandasAI = PandasAI(llm)

pandasDataFrame = pandas.read_csv("filename.csv")

response = pandasAI.run(pandasDataFrame, prompt = dataPrompt)

print(response)

SyntaxError: invalid syntax changing the query in the demo example

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [21400000, 2940000, 2830000, 3870000, 2160000, 1350000, 1780000, 1320000, 516000, 14000000],
    "happiness_index": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0]
})

from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

llm = OpenAI(api_token="")

pandas_ai = PandasAI(llm)
pandas_ai.run(df, prompt='What is the data about?')

The above code (changing the prompt to "What is the data about?") gives the following error. It looks like it is still able to describe the data, but it's giving a SyntaxError.

Traceback (most recent call last):
 ......
  File "<unknown>", line 2
    The data is about a dataframe with 26 columns and 5 rows. The columns include api_id, email, name, phone number, and various survey questions such as age, gender, and income. There is also a column for whether the respondent was invited by a friend. 
        ^^^^
SyntaxError: invalid syntax

Randomly getting incomplete answers

πŸ› Describe the bug

Hi there,

Firstly, I think this is an awesome project, thanks to those who create and maintain it!

I am using this football dataset - https://www.kaggle.com/datasets/rishikeshkanabar/premier-league-player-statistics-updated-daily

My code is exactly as the example suggests.

When I give it a prompt, sometimes it works fine - for instance, asking it "How many nationalities are there? How many occurrences are there of each nationality?" generates a response of 'There are a total of 47 nationalities in the dataset. The number of occurrences for each nationality varies, with England having the highest count at 221 and many other nationalities having only one occurrence.', which is brilliant.

However, when I ask it about players or clubs, it essentially rewords the question and doesn't give an answer. For example:
Prompt = pandas_ai.run(df, prompt="Who has the most offsides?")
Response = 'has the most offsides, did you know that?'

Prompt = pandas_ai.run(df, prompt="Which club has the most red cards?")
Response = 'has the most red cards, did you know that?'

Any idea why this might be happening?

Edit : I should note, I have only just set up my OpenAI API key, but I'd be surprised if that was a factor in this issue given it works fine some of the time.

Support for Azure OpenAI APIs

πŸš€ The feature

Hi, I was wondering if we can get support for Azure OpenAI directly, as we have for other LLMs? Is something under development for this?

Motivation, pitch

Azure OpenAI is a great way to get used to LLM technology without having to deal with anything else. Hence, I would love to see the features from pandas-ai implemented for it, which would help greatly towards creating analysis reports.

Alternatives

No response

Additional context

No response

Add support for containerized code execution

Formally documenting this suggestion: #43 (comment)

  1. Create CodeExecutionService
  2. Implement a basic UnsafeCodeExecutionService using the current exec approach, using globals or whatever else makes people comfortable with that approach
  3. Replace the current copy-paste usages of exec (a DRY violation) with the CES interface, defaulting to the UnsafeCES implementation
  4. Implement SandboxedCodeExecutionService using the docker Python library:
    1. Define a Docker image with Python support and the following Python server script:
    2. Create a Python server listening for requests
    3. On request received, execute the code
    4. On success/failure/any condition, collect results and return them in the server response
    5. On tool startup, create a Docker client
    6. Forward SandboxedCES execution requests to the Docker client and return the response
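Steps 1–3 above could be sketched like this (class names follow the issue's proposal; the sandboxed variant is only hinted at in a comment, since it needs a running Docker daemon):

```python
from abc import ABC, abstractmethod

class CodeExecutionService(ABC):
    """Common interface so all exec() usage lives in one place (step 3: DRY)."""
    @abstractmethod
    def execute(self, code: str, env: dict) -> dict:
        ...

class UnsafeCodeExecutionService(CodeExecutionService):
    """Step 2: the current behaviour, plain exec() with a controlled globals dict."""
    def execute(self, code: str, env: dict) -> dict:
        exec(code, env)  # runs in the caller's process: unsafe by design
        return env

# A SandboxedCodeExecutionService (step 4) would implement the same interface,
# but forward `code` to a container via the docker Python library instead.

svc: CodeExecutionService = UnsafeCodeExecutionService()
result = svc.execute("answer = 2 + 2", {})
print(result["answer"])  # 4
```

Callers would depend only on CodeExecutionService, so swapping in the sandboxed implementation later requires no changes at the call sites.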

Add multiturn capability

Currently, you can only ask a single question to the LLM and get an answer in return: technically, this is called a "single turn" or "single interaction" process. Once I ask a new question, the context from the preceding question isn't kept, which makes it hard to refine answers. For example, it would be great to have something like this (which is how I use GPT-4 in practice):

Q: create boxplots for each column, using the seaborn library
A: < saves boxplots >
Q: ok, now change the style to 'darkgrid' instead of 'whitegrid'
A: <save boxplots again, with new style>
Q: it would be better to condition on the value "employment_status"
A: < changes the plot again, using maybe the hue parameter to show the different values of "employment_status" >

I.e., multiturn mode, also called "dialogue" or "conversation"

Note that this wouldn't work because currently you don't allow importing packages...but I'm opening another issue for that

Deploy pandas-ai as APIs locally/cloud with langchain-serve

Great work with pandas-ai. Opens up lots of possibilities with dataframes.

langchain-serve can help achieve many of the planned Todos (and more) by expanding the current codebase. I understand that pandas-ai doesn't use langchain, but langchain-serve works with any python-based LLM apps.

πŸ‘€ See how pdfGPT integrates with langchain-serve to deploy PDF Q&A bot on production.

Highlights:

  • Exposes APIs from function definitions locally as well as on the cloud.
  • Very few lines of code changes and ease of development remain the same as local.
  • Supports both REST & WebSocket endpoints with custom authorization.
  • Serverless/autoscaling endpoints with automatic tls certs on the cloud.
  • Real-time streaming, human-in-the-loop support - crucial for chatbots.

Disclaimer: I'm the primary author of langchain-serve. Would be happy to collaborate on this.

OpenAssistant Error

Hey Thanks for the package, When I tried using OpenAssistant. I got this error. The same code works fine with OpenAI.

Traceback (most recent call last):

  File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File ["<ipython-input-9-ac30c70b206d>"](https://localhost:8080/#), line 2, in <cell line: 2>
    pandas_ai.run(df, prompt='Which are the 5 happiest countries?')

  File "/usr/local/lib/python3.10/dist-packages/pandasai/__init__.py", line 70, in run
    code = self._llm.generate_code(

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 76, in generate_code
    return self._extract_code(self.call(instruction, prompt))

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 62, in _extract_code
    code = self._polish_code(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 45, in _polish_code
    self._remove_imports(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 24, in _remove_imports
    tree = ast.parse(code)

  File "/usr/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,

  File "<unknown>", line 5
    df =
        ^
SyntaxError: invalid syntax

Token limits

Any plans to unlock current large dataframes limitation related to token limits?

Congrats on the amazing work btw!

Scalability issues

You clearly have to pass the whole dataframe to the OpenAI API. Even for small dataframes (hundreds of rows, dozens of columns) this could easily fill up a 4096-token context, or make users spend a lot of money. You should compute the number of tokens before you make the API call, and if it's over some threshold, warn the user.

Also, this will clearly not scale to the size of the datasets used in industry. Try a random dataset with 10000 rows and 100 columns, for example. If it doesn't work (as I expect), consider testing some fix, such as splitting the df into chunks, summarizing them, and using the summaries to answer the research question. Summaries will most likely mess up the floating-point numbers, though. All in all, I don't see how this can work even for medium-sized dataframes.
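The pre-flight check suggested above could look like this. It uses a rough heuristic of ~4 characters per token for English text; a real implementation would use an actual tokenizer such as tiktoken, and the 4096 limit is just the example figure from the issue.

```python
def estimate_tokens(text):
    """Very rough token estimate (~4 characters per token for English text).
    A real implementation would use a tokenizer such as tiktoken instead."""
    return max(1, len(text) // 4)

def check_prompt_budget(prompt, limit=4096):
    """Warn before spending money or overflowing the context window."""
    est = estimate_tokens(prompt)
    if est > limit:
        print(f"Warning: prompt is ~{est} tokens, over the {limit}-token limit")
    return est

check_prompt_budget("x" * 20_000)  # ~5000 estimated tokens -> prints a warning
```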

Crash in conversational mode

πŸ› Describe the bug

I load data from CSV file:

df = pd.read_csv(dataFile, encoding='ISO-8859-1')

This file has columns name, price, width and height.

I created pandasAI with conversational = True:

pandas_ai = PandasAI(
    llm, 
    verbose = True, 
    conversational = True 
)

Some questions work fine:

  • How many items you have with size less than 200x200?
    There are 18 items that have a size smaller than 200x200.
  • How many rows?
    There are 21 rows, did that answer your question?

But some questions return with crash.

How many chairs do you have?

File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File ["<ipython-input-45-508395eda1c3>"](https://localhost:8080/#), line 1, in <cell line: 1>
    pandas_ai.run(df, prompt='How many chairs have you?')

  File "/usr/local/lib/python3.10/dist-packages/pandasai/__init__.py", line 103, in run
    code = self._llm.generate_code(

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 117, in generate_code
    return self._extract_code(self.call(instruction, prompt))

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 91, in _extract_code
    code = self._polish_code(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 60, in _polish_code
    self._remove_imports(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 36, in _remove_imports
    tree = ast.parse(code)

  File "/usr/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,

  File "<unknown>", line 1
    <startCode>
    ^
SyntaxError: invalid syntax

I have $200. What I can buy?

File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File ["<ipython-input-38-77eb4102830b>"](https://localhost:8080/#), line 1, in <cell line: 1>
    pandas_ai.run(df, prompt='I have $200. What I can buy?')

  File "/usr/local/lib/python3.10/dist-packages/pandasai/__init__.py", line 103, in run
    code = self._llm.generate_code(

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 117, in generate_code
    return self._extract_code(self.call(instruction, prompt))

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 91, in _extract_code
    code = self._polish_code(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 60, in _polish_code
    self._remove_imports(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 36, in _remove_imports
    tree = ast.parse(code)

  File "/usr/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,

  File "<unknown>", line 1
    <startCode>
    ^
SyntaxError: invalid syntax

Add feedback loop if the code fails to execute

We should add a retry mechanism so that if the code fails to execute and an error occurs, the LLM can self-improve.
We should also add a max_retry variable that defaults to 3 so that we don't run into infinite loops.
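The proposed feedback loop could be sketched like this. The function names are hypothetical; generate_code stands in for the LLM call and receives the previous error (if any) so the model can correct itself.

```python
def run_with_retries(generate_code, execute, max_retries=3):
    """Sketch of the proposed retry loop: if generated code raises, feed the
    error message back into the next LLM prompt, up to max_retries attempts."""
    error = None
    for attempt in range(max_retries):
        code = generate_code(error)       # LLM call; sees the last error, if any
        try:
            return execute(code)
        except Exception as exc:          # capture the failure for the next prompt
            error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"Code still failing after {max_retries} attempts: {error}")

# Demo with a fake "LLM" that fails once, then corrects itself:
attempts = []
def fake_generate(error):
    attempts.append(error)
    return "1/0" if error is None else "40 + 2"

result = run_with_retries(fake_generate, eval)
print(result)  # 42
```

Defaulting max_retries to 3, as the issue suggests, bounds both latency and API cost.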

Incorrect example

Thanks for this awesome library. However, the example shown is not factually correct, and it should be.

question with current date

Hi there, I was playing around with your project. It's amazing, but I found some problems when using it with scheduled data.

When I try to ask questions that relate to the current time, like "What is today's work?", "Who has work tomorrow?", and "Who had a shift yesterday?",

I try to use prompts like "today's date is 4/22/2023........" or "use datetime.now() to know the current DateTime...".
It works well sometimes, but sometimes it gives me a hallucinated answer.

So, it might be a great idea to add this feature to your project.
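The workaround described above could be built in by prepending the current date to every prompt. A sketch (the function name is invented; this is not pandas-ai's API):

```python
from datetime import date

def with_current_date(prompt):
    """Prepend today's date so the LLM can resolve 'today'/'tomorrow'/'yesterday'
    deterministically instead of hallucinating a date."""
    return f"Today's date is {date.today().isoformat()}.\n{prompt}"

print(with_current_date("Who has work tomorrow?"))
```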

Is adding Contributing guidelines still a todo?

Hi,

I see that adding Contributing Guidelines is still marked as a TODO.

However, I see that there is already a Contributing.md present over here.

Is there something you want to add to Contributing.md, or shall we mark it as completed in the TODO?

OpenAI key error

πŸ› Describe the bug

I can use the key to access the OpenAI API directly, but when I used PandasAI it failed.

Add privacy flag

Passing private data to a third-party API might be a privacy concern (for example credit cards, users' personal info, etc.).

We should create an "enforce_privacy" flag that, if passed, prevents the library from sending any data from within the table to third-party APIs.

Graph/Plot Continue Execution Feature

Right now, with a prompt asking to show a graph, PandasAI returns the plt.show() call, thus displaying the graph and blocking code execution until the graph is closed.

Funny enough, I was able to force the LLM to return plt.show(block=False) with the following prompt: Make a bar chart of the order qty for all unique part id's. After showing the chart, don't block the process. Continue with plt.show(block=False).

(and it worked)

I'm not sure if this will be a PandasAI code change or some kind of prompt concatenation trick, but allowing code execution to continue would be a very useful feature to have at hand. Great project, for the record πŸ‘πŸΌ
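Rather than relying on the prompt trick, the library could post-process the generated code before executing it. A string-level sketch (this is an illustration, not what PandasAI actually does):

```python
import re

def make_show_nonblocking(code):
    """Rewrite bare plt.show() calls in LLM-generated code so displaying a
    chart doesn't block further execution (post-processing sketch)."""
    return re.sub(r"plt\.show\(\s*\)", "plt.show(block=False)", code)

generated = "plt.bar(ids, qty)\nplt.show()"
print(make_show_nonblocking(generated))
```

Doing the rewrite in code keeps the behavior deterministic, instead of hoping the LLM honors the instruction every time.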

Do not display charts

πŸ› Describe the bug

I am a novice and would like to ask whether charts are no longer displayed, or whether a specific operation is needed to display them.

[Screenshot attached]

Adding Issue and Pull Request template

Hi @gventuri ,

I just wanted to congratulate on the excellent work done on pandas-ai. It looks great and seeing it gain so many stars in such a short time is nothing short of just incredible.

However, the increasing popularity of this project will result in more issues and more pull requests. Therefore, it would be a good idea to have some sort of issue template and pull request template, which would enable users to post issues in a manner that helps the maintainers either debug them or add a new feature.

Similarly, it would be nice to add a pull request template to keep track of which issues were solved by which PR.

Key error with Excel data

When loading an Excel spreadsheet, GPT hallucinates column names (e.g., 'HQ Location' or 'Country' instead of 'Location'). If there's vector embedding behind this, column names should be included in the prompt.

Include show code feature

Hi @gventuri,

I have included a feature that lets the user view the code that was used to generate the answer. A parameter named show_code can be included in the run function, which will create a new cell under the prompt and paste the code into it. Let me know if I can integrate it.

Thank you
[Screenshot attached]

Allow importing packages, capture the error and allow the user to choose whether to install them or not

I see that your current approach to dealing with the installation of new libraries is to strongly discourage it in the prompt:

Return the python code (do not import anything) and make sure to prefix the python code with exactly and suffix the code with exactly to get the answer to the following question

I don't think this is optimal: some of the best pandas code generated by GPT-4 requires importing seaborn or numpy. Also, shouldn't at least importing matplotlib be allowed? Otherwise, how do you generate plots?

IMO, it would be far better to:

  1. allow packages to be imported
  2. when the LLM generates code that imports packages, capture all the import statements when reading the code and before execution
  3. check whether all packages are already installed in the active environment (this adds a bit of complexity, because you now need to detect whether conda, pip, or poetry is being used to install packages)
  4. if not, ask the user for permission to install the packages. If permission is denied, you may print an informative message and query the LLM again with a different prompt that contains the words (do not import anything).

Many variations are possible:

  • you could add a parameter allow_imports to run that switches between a prompt that allows imports, and an another one that doesn't
  • you could never install packages, but only ask the user if they want to install the suggested packages themselves
  • etc.
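Steps 2 and 3 above (capture imports before execution, check what's installed) can be done with the standard library alone. A sketch, with invented helper names:

```python
import ast
import importlib.util

def find_imports(code):
    """Step 2: collect the top-level package names imported by generated code."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names

def missing_packages(code):
    """Step 3: which imported packages are not installed in this environment."""
    return {name for name in find_imports(code)
            if importlib.util.find_spec(name) is None}

code = "import numpy as np\nfrom matplotlib import pyplot as plt"
print(find_imports(code))  # a set containing 'numpy' and 'matplotlib'
```

The result of missing_packages is exactly what step 4 needs to ask the user about; detecting conda vs. pip vs. poetry for the actual install is the part this sketch leaves out.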

FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'

When calling this code:
pandas_ai.run(data, "plot the growth of Internet popularity in Entity Russia")
this error is displayed:
FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'
The file name may change; that's not the point. I think this is because ChatGPT writes code that starts with imports and loading of the dataset. You could solve it by removing a few lines of code using regular expressions or in other ways. I haven't solved this problem yet.
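The regex-based stripping suggested above could look like this: drop import lines and hard-coded read_csv(...) calls from the generated code, since the dataframe is already supplied by the caller. A sketch with an invented function name:

```python
import re

def strip_data_loading(code):
    """Remove import lines and read_csv(...) lines from LLM-generated code,
    because the dataframe is injected by the library, not loaded from disk."""
    cleaned = []
    for line in code.splitlines():
        if re.match(r"\s*(import\s|from\s.+\simport\s)", line):
            continue
        if re.search(r"\bread_csv\(", line):
            continue
        cleaned.append(line)
    return "\n".join(cleaned)

generated = "import pandas as pd\ndf = pd.read_csv('filename.csv')\nprint(df.head())"
print(strip_data_loading(generated))  # print(df.head())
```

Line-based regex filtering is brittle (it misses multi-line calls), so an AST-based approach like the one used for import stripping would be more robust.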

Adding Sphinx-Based Documentation

πŸš€ The feature

To read through the code and the methods implemented, I currently have to go through all of the code. Sphinx-based documentation could be added and hosted on the Read the Docs server.

Motivation, pitch

It makes it easy to read through the code and its features. It would also add transparency, showing how sensitive information is blocked from being sent to the LLM model.

Alternatives

No response

Additional context

No response

Add Cohere LLM Support

I added support for the Cohere LLM, but the prompt needs to be refactored in order to get Python code as output.

CLI

I saw on the TODO list that you'd like to add CLI functionality. I'd like to do that; I wanted to make sure it's okay before I start.

Increase test coverage

πŸš€ The feature

Improve test coverage by writing tests for openai and starcoder.
This improves code quality and allows a better user and developer experience.

Motivation, pitch

Mentioned by @gventuri on Discord.

Alternatives

No response

Additional context

No response

pip install error

Hello authors:
When I installed pandasai, something went wrong on macOS. I used the command from the README:

pip install pandas

the error is:

ERROR: Could not find a version that satisfies the requirement pandasai (from versions: none)
ERROR: No matching distribution found for pandasai

can you help me?

error requesting data in `json` format

traceback:

Traceback (most recent call last):
  File "/Users/avelino/projects/buser/openapi/recommendation-next-travel/use-pandasai.py", line 43, in <module>
    print(pandas_ai.run(
          ^^^^^^^^^^^^^^
  File "/Users/avelino/projects/buser/openapi/recommendation-next-travel/.venv/lib/python3.11/site-packages/pandasai/__init__.py", line 120, in run
    answer = self.run_code(code, data_frame, False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/avelino/projects/buser/openapi/recommendation-next-travel/.venv/lib/python3.11/site-packages/pandasai/__init__.py", line 166, in run_code
    exec(code)
  File "<string>", line 9, in <module>
NameError: name 'json' is not defined

source:

data = {...}
df = pd.DataFrame(data)
llm = OpenAI()
pandas_ai = PandasAI(llm)

print(pandas_ai.run(
    df,
    prompt="""
    suggest when the next trip will be, destination and seat type,
    in JSON format: {"destination": "", "seat-type": ""}
    """))

proposed solution

This is how I solved it locally, though I believe it is not the best way:

add import json in __init__.py
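An alternative to patching __init__.py would be to pre-load the modules the LLM tends to assume into the globals dict passed to exec. A sketch (the function name and module selection are illustrative, not pandas-ai's actual code):

```python
import json
import datetime

def run_generated_code(code, df=None):
    """Execute generated code with a globals dict that pre-loads modules the
    LLM tends to assume are available (json, datetime, ...)."""
    env = {"json": json, "datetime": datetime, "df": df}
    exec(code, env)
    return env.get("result")

code = 'result = json.dumps({"destination": "Paris", "seat-type": "window"})'
print(run_generated_code(code))  # {"destination": "Paris", "seat-type": "window"}
```

This keeps the fix local to the execution path instead of polluting the package's module namespace.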

Apply linting

There are a lot of linting issues in the code. They should be analyzed, and the critical ones fixed.

My suggestion is to add pylint and mypy as the main linters.
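A possible starting configuration, assuming the project uses pyproject.toml; the specific rules and settings below are suggestions, not the project's actual config:

```toml
# Hypothetical starting point; tune rule selection to the codebase.
[tool.pylint.messages_control]
disable = ["missing-module-docstring"]

[tool.mypy]
python_version = "3.10"
ignore_missing_imports = true
warn_unused_ignores = true
```

Both tools read their configuration from pyproject.toml, so no extra config files are needed.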
