sinaptik-ai / pandas-ai

Chat with your database (SQL, CSV, pandas, Polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT-3.5/4, Anthropic, VertexAI) and RAG.

Home Page: https://pandas-ai.com

License: Other

Topics: llm, pandas, ai, data-analysis, data-science, gpt-3, gpt-4, csv, data, sql

pandas-ai's Introduction

PandasAI


PandasAI is a Python library that makes it easy to ask questions about your data in natural language. It helps you explore, clean, and analyze your data using generative AI.

πŸ”§ Getting started

Documentation on using PandasAI with specific LLMs, vector stores, and connectors can be found here.

πŸ“¦ Installation

With pip:

pip install pandasai

With poetry:

poetry add pandasai

πŸ” Demo

Try out PandasAI yourself in your browser:

Open in Colab

πŸš€ Deploying PandasAI

PandasAI can be deployed in a variety of ways. You can easily use it in your Jupyter notebooks or Streamlit apps, or you can deploy it as a REST API with FastAPI or Flask, for example.

If you are interested in managed PandasAI Cloud or self-hosted Enterprise Offering, take a look at our website or book a meeting with us.

πŸ’» Usage

Ask questions

import os
import pandas as pd
from pandasai import Agent

# Sample DataFrame
sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]
})

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

agent = Agent(sales_by_country)
agent.chat('Which are the top 5 countries by sales?')
China, United States, Japan, Germany, Australia

Or you can ask more complex questions:

agent.chat(
    "What is the total sales for the top 3 countries by sales?"
)
The total sales for the top 3 countries by sales is 16500.

Visualize charts

You can also ask PandasAI to generate charts for you:

agent.chat(
    "Plot a histogram of countries, showing the sales of each, using a different color for each bar",
)

Chart

Multiple DataFrames

You can also pass multiple dataframes to PandasAI and ask questions that relate them.

import os
import pandas as pd
from pandasai import Agent

employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}

salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"

agent = Agent([employees_df, salaries_df])
agent.chat("Who gets paid the most?")
Olivia gets paid the most.

You can find more examples in the examples directory.

πŸ”’ Privacy & Security

In order to generate the Python code to run, we take a few random samples from the dataframe, randomize them (using random generation for sensitive data and shuffling for non-sensitive data), and send only the randomized head to the LLM.

If you want to further enforce your privacy, you can instantiate PandasAI with enforce_privacy = True, which will send only the column names (not the head) to the LLM.
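The sample-and-shuffle idea described above can be sketched roughly as follows. This is a simplified illustration, not the library's actual implementation; the function name and seed handling are made up for the example.

```python
import pandas as pd

def anonymized_head(df: pd.DataFrame, n: int = 5, seed: int = 0) -> pd.DataFrame:
    """Take a small random sample, then shuffle each column independently so
    that no real row survives intact (simplified illustration only)."""
    sample = df.sample(n=min(n, len(df)), random_state=seed).reset_index(drop=True)
    for i, col in enumerate(sample.columns):
        # A different seed per column breaks row alignment across columns
        sample[col] = sample[col].sample(frac=1, random_state=seed + 1 + i).reset_index(drop=True)
    return sample

df = pd.DataFrame({"name": ["Ann", "Bob", "Cy", "Dee", "Ed"],
                   "spend": [10, 20, 30, 40, 50]})
print(anonymized_head(df, n=3))
```

Only this randomized head (or, with enforce_privacy, only the column names) would ever leave your machine.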

πŸ“œ License

PandasAI is available under the MIT Expat license, except for the pandasai/ee directory (which has its own license, where applicable).


Resources

  • Docs for comprehensive documentation
  • Examples for example notebooks
  • Discord for discussion with the community and PandasAI team

🀝 Contributing

Contributions are welcome! Please check the outstanding issues and feel free to open a pull request. For more information, please check out the contributing guidelines.

Thank you!

pandas-ai's Issues

No Output?

Python newbie question: using the code below (with a correct API key), the graph is generated correctly, but there is no response, and no error message, for the line/question: pandas_ai.run(df, prompt='Which are the 2 happiest countries?')

Could you please help?

First, I installed pandasai from the command line with "pip install pandasai"

#import the dependencies:

import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

#create a dataframe using pandas
df = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
"gdp": [21400000, 2940000, 2830000, 3870000, 2160000, 1350000, 1780000, 1320000, 516000, 14000000],
"happiness_index": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0]
})

#print results
#print(df)

OPENAI_API_KEY = "XXX"
llm = OpenAI(api_token=OPENAI_API_KEY)

pandas_ai = PandasAI(llm)
pandas_ai.run(df, prompt='Which are the 2 happiest countries?')

# create a chart, using pandasai to set up the prompt

pandas_ai.run(df, "Plot the histogram of countries showing for each the gdp, using different colors for each bar")

Starcoder Hallucinations, Graph Issues, Output issues

πŸ› Describe the bug

When using StarCoder, it works great for text prompts 90% of the time, but graphing prompts don't seem to work. I have listed two issues I have observed and their respective prompts:

Prompt 1: Show a bar chart of the order qty for all unique part id's.
[Screenshot attached]
Issue 1: Besides not generating the plt.show() call, the output seems to cut off randomly. I don't know if this is a token issue or some kind of execution issue. The matplotlib window appears in the macOS Dock, so it is being initiated, but no window ever opens.

Prompt 2: Make a bar chart of the order qty for all desired ship dates.
[Screenshot attached]
Issue 2: This prompt makes the LLM output quite an ambiguous message, and I don't see any code generated.

I have attached the test data file
filename.csv

Minimal code is below:

import pandas

from pandasai import PandasAI
from pandasai.llm.starcoder import Starcoder
        
dataPrompt = "Make a bar chart of the order qty for all desired ship dates."

llm = Starcoder(api_token="HF_API_TOKEN")

pandasAI = PandasAI(llm)

pandasDataFrame = pandas.read_csv("filename.csv")

response = pandasAI.run(pandasDataFrame, prompt = dataPrompt)

print(response)

SyntaxError: invalid syntax changing the query in the demo example

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [21400000, 2940000, 2830000, 3870000, 2160000, 1350000, 1780000, 1320000, 516000, 14000000],
    "happiness_index": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0]
})

from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

llm = OpenAI(api_token="")

pandas_ai = PandasAI(llm)
pandas_ai.run(df, prompt='What is the data about?')

The above code (changing the prompt to "What is the data about?") gives the following error. It looks like it is still able to describe the data, but it's giving a SyntaxError.

Traceback (most recent call last):
 ......
  File "<unknown>", line 2
    The data is about a dataframe with 26 columns and 5 rows. The columns include api_id, email, name, phone number, and various survey questions such as age, gender, and income. There is also a column for whether the respondent was invited by a friend. 
        ^^^^
SyntaxError: invalid syntax

Randomly getting incomplete answers

πŸ› Describe the bug

Hi there,

Firstly, I think this is an awesome project, thanks to those who create and maintain it!

I am using this football dataset - https://www.kaggle.com/datasets/rishikeshkanabar/premier-league-player-statistics-updated-daily

My code is exactly as the example suggests.

When I give it a prompt, sometimes it works fine - for instance, asking it "How many nationalities are there? How many occurrences are there of each nationality?" generates a response of 'There are a total of 47 nationalities in the dataset. The number of occurrences for each nationality varies, with England having the highest count at 221 and many other nationalities having only one occurrence.', which is brilliant.

However, when I ask it about players or clubs, it essentially rewords the question and doesn't give an answer. For example:
Prompt = pandas_ai.run(df, prompt="Who has the most offsides?")
Response = 'has the most offsides, did you know that?'

Prompt = pandas_ai.run(df, prompt="Which club has the most red cards?")
Response = 'has the most red cards, did you know that?'

Any idea why this might be happening?

Edit : I should note, I have only just set up my OpenAI API key, but I'd be surprised if that was a factor in this issue given it works fine some of the time.

Support for Azure OpenAI APIs

πŸš€ The feature

Hi, I was wondering if we can get support for Azure OpenAI directly, as we have for other LLMs? Is something under development for this?

Motivation, pitch

Azure OpenAI is a great way to get used to LLM technology without having to deal with anything else. Hence, I would love to see the features from pandas-ai implemented for it, which would help greatly towards creating analysis reports.

Alternatives

No response

Additional context

No response

Add support for containerized code execution

Formally documenting this suggestion: #43 (comment)

  1. Create CodeExecutionService
  2. Implement a basic UnsafeCodeExecutionService using the current exec approach, using globals or whatever else makes people comfortable with that approach
  3. Replace the current copy-paste usages of exec (a DRY violation) with the CES interface, defaulting to the UnsafeCES implementation
  4. Implement SandboxedCodeExecutionService using the docker Python library:
    1. Define a Docker image with Python support and the following Python server script:
    2. Create a Python server listening for requests
    3. On request received, execute the code
    4. On success/failure/any condition, collect results and return them in the server response
    5. On tool startup, create a Docker client
    6. Forward SandboxedCES execution requests to the Docker client and return the response
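Steps 1–3 above could be sketched like this (class names follow the issue's proposal; the sandboxed variant is only hinted at in a comment, since it needs a running Docker daemon):

```python
from abc import ABC, abstractmethod

class CodeExecutionService(ABC):
    """Common interface so all exec() usage lives in one place (step 3: DRY)."""
    @abstractmethod
    def execute(self, code: str, env: dict) -> dict:
        ...

class UnsafeCodeExecutionService(CodeExecutionService):
    """Step 2: the current behaviour, plain exec() with a controlled globals dict."""
    def execute(self, code: str, env: dict) -> dict:
        exec(code, env)  # runs in the caller's process: unsafe by design
        return env

# A SandboxedCodeExecutionService (step 4) would implement the same interface,
# but forward `code` to a container via the docker Python library instead.

svc: CodeExecutionService = UnsafeCodeExecutionService()
result = svc.execute("answer = 2 + 2", {})
print(result["answer"])  # 4
```

Callers would depend only on CodeExecutionService, so swapping in the sandboxed implementation later requires no changes at the call sites.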

Add multiturn capability

Currently, you can only ask a single question to the LLM and get an answer in return: technically, this is called a "single turn" or "single interaction" process. Once I ask a new question, the context from the preceding question isn't kept, which makes it hard to refine answers. For example, it would be great to have something like this (which is how I use GPT-4 in practice):

Q: create boxplots for each column, using the seaborn library
A: < saves boxplots >
Q: ok, now change the style to 'darkgrid' instead of 'whitegrid'
A: <save boxplots again, with new style>
Q: it would be better to condition on the value "employment_status"
A: < changes the plot again, using maybe the hue parameter to show the different values of "employment_status" >

I.e., multiturn mode, also called "dialogue" or "conversation"

Note that this wouldn't work because currently you don't allow importing packages...but I'm opening another issue for that

Deploy pandas-ai as APIs locally/cloud with langchain-serve

Great work with pandas-ai. Opens up lots of possibilities with dataframes.

langchain-serve can help achieve many of the planned Todos (and more) by expanding the current codebase. I understand that pandas-ai doesn't use langchain, but langchain-serve works with any python-based LLM apps.

πŸ‘€ See how pdfGPT integrates with langchain-serve to deploy PDF Q&A bot on production.

Highlights:

  • Exposes APIs from function definitions locally as well as on the cloud.
  • Very few lines of code changes and ease of development remain the same as local.
  • Supports both REST & WebSocket endpoints with custom authorization.
  • Serverless/autoscaling endpoints with automatic tls certs on the cloud.
  • Real-time streaming, human-in-the-loop support - crucial for chatbots.

Disclaimer: I'm the primary author of langchain-serve. Would be happy to collaborate on this.

OpenAssistant Error

Hey Thanks for the package, When I tried using OpenAssistant. I got this error. The same code works fine with OpenAI.

Traceback (most recent call last):

  File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File ["<ipython-input-9-ac30c70b206d>"](https://localhost:8080/#), line 2, in <cell line: 2>
    pandas_ai.run(df, prompt='Which are the 5 happiest countries?')

  File "/usr/local/lib/python3.10/dist-packages/pandasai/__init__.py", line 70, in run
    code = self._llm.generate_code(

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 76, in generate_code
    return self._extract_code(self.call(instruction, prompt))

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 62, in _extract_code
    code = self._polish_code(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 45, in _polish_code
    self._remove_imports(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 24, in _remove_imports
    tree = ast.parse(code)

  File "/usr/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,

  File "<unknown>", line 5
    df =
        ^
SyntaxError: invalid syntax

Token limits

Any plans to unlock current large dataframes limitation related to token limits?

Congrats on the amazing work btw!

Scalability issues

You clearly have to pass the whole dataframe to the OpenAI API. Even for small dataframes (hundreds of rows, dozens of columns) this could easily fill up a 4096-token context, or make users spend a lot of money. You should compute the number of tokens before you make the API call, and if it's over some threshold, warn the user.

Also, this will clearly not scale to the size of the datasets used in industry. Try a random dataset with 10000 rows and 100 columns, for example. If it doesn't work (as I expect), consider testing some fix, such as splitting the df into chunks, summarizing them, and using the summaries to answer the research question. Summaries will most likely mess up the floating-point numbers, though. All in all, I don't see how this can work even for medium-sized dataframes.
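The pre-flight check suggested above could look like this. It uses a rough heuristic of ~4 characters per token for English text; a real implementation would use an actual tokenizer such as tiktoken, and the 4096 limit is just the example figure from the issue.

```python
def estimate_tokens(text):
    """Very rough token estimate (~4 characters per token for English text).
    A real implementation would use a tokenizer such as tiktoken instead."""
    return max(1, len(text) // 4)

def check_prompt_budget(prompt, limit=4096):
    """Warn before spending money or overflowing the context window."""
    est = estimate_tokens(prompt)
    if est > limit:
        print(f"Warning: prompt is ~{est} tokens, over the {limit}-token limit")
    return est

check_prompt_budget("x" * 20_000)  # ~5000 estimated tokens -> prints a warning
```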

Crash in conversational mode

πŸ› Describe the bug

I load data from CSV file:

df = pd.read_csv(dataFile, encoding='ISO-8859-1')

This file has columns name, price, width and height.

I created pandasAI with conversational = True:

pandas_ai = PandasAI(
    llm, 
    verbose = True, 
    conversational = True 
)

Some questions work fine:

  • How many items you have with size less than 200x200?
    There are 18 items that have a size smaller than 200x200.
  • How many rows?
    There are 21 rows, did that answer your question?

But some questions return with crash.

How many chairs do you have?

File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File ["<ipython-input-45-508395eda1c3>"](https://localhost:8080/#), line 1, in <cell line: 1>
    pandas_ai.run(df, prompt='How many chairs have you?')

  File "/usr/local/lib/python3.10/dist-packages/pandasai/__init__.py", line 103, in run
    code = self._llm.generate_code(

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 117, in generate_code
    return self._extract_code(self.call(instruction, prompt))

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 91, in _extract_code
    code = self._polish_code(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 60, in _polish_code
    self._remove_imports(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 36, in _remove_imports
    tree = ast.parse(code)

  File "/usr/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,

  File "<unknown>", line 1
    <startCode>
    ^
SyntaxError: invalid syntax

I have $200. What I can buy?

File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File ["<ipython-input-38-77eb4102830b>"](https://localhost:8080/#), line 1, in <cell line: 1>
    pandas_ai.run(df, prompt='I have $200. What I can buy?')

  File "/usr/local/lib/python3.10/dist-packages/pandasai/__init__.py", line 103, in run
    code = self._llm.generate_code(

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 117, in generate_code
    return self._extract_code(self.call(instruction, prompt))

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 91, in _extract_code
    code = self._polish_code(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 60, in _polish_code
    self._remove_imports(code)

  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 36, in _remove_imports
    tree = ast.parse(code)

  File "/usr/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,

  File "<unknown>", line 1
    <startCode>
    ^
SyntaxError: invalid syntax

Add feedback loop if the code fails to execute

We should add a retry mechanism so that if the code fails to execute and an error occurs, the LLM can self-improve.
We should also add a max_retry variable that defaults to 3 so that we don't run into infinite loops.
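The proposed feedback loop could be sketched like this. The function names are hypothetical; generate_code stands in for the LLM call and receives the previous error (if any) so the model can correct itself.

```python
def run_with_retries(generate_code, execute, max_retries=3):
    """Sketch of the proposed retry loop: if generated code raises, feed the
    error message back into the next LLM prompt, up to max_retries attempts."""
    error = None
    for attempt in range(max_retries):
        code = generate_code(error)       # LLM call; sees the last error, if any
        try:
            return execute(code)
        except Exception as exc:          # capture the failure for the next prompt
            error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"Code still failing after {max_retries} attempts: {error}")

# Demo with a fake "LLM" that fails once, then corrects itself:
attempts = []
def fake_generate(error):
    attempts.append(error)
    return "1/0" if error is None else "40 + 2"

result = run_with_retries(fake_generate, eval)
print(result)  # 42
```

Defaulting max_retries to 3, as the issue suggests, bounds both latency and API cost.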

Incorrect example

Thanks for this awesome library. However, the example shown is not factually correct, and it should be.

question with current date

Hi there, I was playing around with your project. It's amazing, but I found some problems when using it with scheduled data.

When I try to ask questions that relate to the current time, like "What is today's work?", "Who has work tomorrow?", and "Who had a shift yesterday?",

I try to use prompts like "today's date is 4/22/2023........" or "use datetime.now() to know the current DateTime...".
It works well sometimes, but sometimes it gives me a hallucinated answer.

So, it might be a great idea to add this feature to your project.
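The workaround described above could be built in by prepending the current date to every prompt. A sketch (the function name is invented; this is not pandas-ai's API):

```python
from datetime import date

def with_current_date(prompt):
    """Prepend today's date so the LLM can resolve 'today'/'tomorrow'/'yesterday'
    deterministically instead of hallucinating a date."""
    return f"Today's date is {date.today().isoformat()}.\n{prompt}"

print(with_current_date("Who has work tomorrow?"))
```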

Is adding Contributing guidelines still a todo?

Hi,

I see that adding Contributing Guidelines is still marked as a TODO.

However, I see that there is already a Contributing.md present over here.

Is there something you want to add to Contributing.md, or shall we mark it as completed in the TODO?

OpenAI key error

πŸ› Describe the bug

I can use the key to access the OpenAI API directly, but when I used PandasAI it failed.

Add privacy flag

Passing private data to a third-party API might be a privacy concern (for example credit cards, users' personal info, etc.).

We should create an "enforce_privacy" flag that, if passed, prevents the library from sending any data from within the table to third-party APIs.

Graph/Plot Continue Execution Feature

Right now, with a prompt asking to show a graph, PandasAI returns the plt.show() call, thus displaying the graph and blocking code execution until the graph is closed.

Funny enough, I was able to force the LLM to return plt.show(block=False) with the following prompt: Make a bar chart of the order qty for all unique part id's. After showing the chart, don't block the process. Continue with plt.show(block=False).

(and it worked)

I'm not sure if this will be a PandasAI code change or some kind of prompt concatenation trick, but allowing code execution to continue would be a very useful feature to have at hand. Great project, for the record πŸ‘πŸΌ
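Rather than relying on the prompt trick, the library could post-process the generated code before executing it. A string-level sketch (this is an illustration, not what PandasAI actually does):

```python
import re

def make_show_nonblocking(code):
    """Rewrite bare plt.show() calls in LLM-generated code so displaying a
    chart doesn't block further execution (post-processing sketch)."""
    return re.sub(r"plt\.show\(\s*\)", "plt.show(block=False)", code)

generated = "plt.bar(ids, qty)\nplt.show()"
print(make_show_nonblocking(generated))
```

Doing the rewrite in code keeps the behavior deterministic, instead of hoping the LLM honors the instruction every time.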

Do not display charts

πŸ› Describe the bug

I am a novice and would like to ask whether charts are no longer displayed, or whether a specific operation is needed to display them.

[Screenshot attached]

Adding Issue and Pull Request template

Hi @gventuri ,

I just wanted to congratulate on the excellent work done on pandas-ai. It looks great and seeing it gain so many stars in such a short time is nothing short of just incredible.

However, the increasing popularity of this project will result in more issues and more pull requests. Therefore, it would be a good idea to have some sort of issue template and pull request template, which would enable users to post issues in a manner that helps the maintainers either debug them or add a new feature.

Similarly, it would be nice to add a pull request template to keep track of which issues were solved by which PR.

Key error with Excel data

When loading an Excel spreadsheet, GPT hallucinates column names (e.g., 'HQ Location' or 'Country' instead of 'Location'). If there's vector embedding behind this, column names should be included in the prompt.

Include show code feature

Hi @gventuri,

I have included a feature that lets the user view the code that was used to generate the answer. A parameter named show_code can be included in the run function, which will create a new cell under the prompt and paste the code into it. Let me know if I can integrate it.

Thank you
[Screenshot attached]

Allow importing packages, capture the error and allow the user to choose whether to install them or not

I see that your current approach to dealing with the installation of new libraries is to strongly discourage it in the prompt:

Return the python code (do not import anything) and make sure to prefix the python code with exactly and suffix the code with exactly to get the answer to the following question

I don't think this is optimal: some of the best pandas code generated by GPT-4 requires importing seaborn or numpy. Also, shouldn't at least importing matplotlib be allowed? Otherwise, how do you generate plots?

IMO, it would be far better to:

  1. allow packages to be imported
  2. when the LLM generates code that imports packages, capture all the import statements when reading the code and before execution
  3. check whether all packages are already installed in the active environment (this adds a bit of complexity, because you now need to detect whether conda, pip, or poetry is being used to install packages)
  4. if not, ask the user for permission to install the packages. If permission is denied, you may print an informative message and query the LLM again with a different prompt that contains the words (do not import anything).

Many variations are possible:

  • you could add a parameter allow_imports to run that switches between a prompt that allows imports, and an another one that doesn't
  • you could never install packages, but only ask the user if they want to install the suggested packages themselves
  • etc.
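Steps 2 and 3 above (capture imports before execution, check what's installed) can be done with the standard library alone. A sketch, with invented helper names:

```python
import ast
import importlib.util

def find_imports(code):
    """Step 2: collect the top-level package names imported by generated code."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names

def missing_packages(code):
    """Step 3: which imported packages are not installed in this environment."""
    return {name for name in find_imports(code)
            if importlib.util.find_spec(name) is None}

code = "import numpy as np\nfrom matplotlib import pyplot as plt"
print(find_imports(code))  # a set containing 'numpy' and 'matplotlib'
```

The result of missing_packages is exactly what step 4 needs to ask the user about; detecting conda vs. pip vs. poetry for the actual install is the part this sketch leaves out.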

FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'

When calling this code:
pandas_ai.run(data, "plot the growth of Internet popularity in Entity Russia")
this error is displayed:
FileNotFoundError: [Errno 2] No such file or directory: 'filename.csv'
The file name may change; that's not the point. I think this is because ChatGPT writes code that starts with imports and loading of the dataset. You could solve it by removing a few lines of code using regular expressions or in other ways. I haven't solved this problem yet.
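The regex-based stripping suggested above could look like this: drop import lines and hard-coded read_csv(...) calls from the generated code, since the dataframe is already supplied by the caller. A sketch with an invented function name:

```python
import re

def strip_data_loading(code):
    """Remove import lines and read_csv(...) lines from LLM-generated code,
    because the dataframe is injected by the library, not loaded from disk."""
    cleaned = []
    for line in code.splitlines():
        if re.match(r"\s*(import\s|from\s.+\simport\s)", line):
            continue
        if re.search(r"\bread_csv\(", line):
            continue
        cleaned.append(line)
    return "\n".join(cleaned)

generated = "import pandas as pd\ndf = pd.read_csv('filename.csv')\nprint(df.head())"
print(strip_data_loading(generated))  # print(df.head())
```

Line-based regex filtering is brittle (it misses multi-line calls), so an AST-based approach like the one used for import stripping would be more robust.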

Adding Sphinx-Based Documentation

πŸš€ The feature

To read through the code and the methods implemented, I currently have to go through all of the code. Sphinx-based documentation could be added and hosted on the Read the Docs server.

Motivation, pitch

It makes it easy to read through the code and its features. It would also add transparency, showing how sensitive information is blocked from being sent to the LLM model.

Alternatives

No response

Additional context

No response

Add Cohere LLM Support

I added support for the Cohere LLM, but the prompt needs to be refactored in order to get Python code as output.

CLI

I saw on the TODO list that you'd like to add CLI functionality. I'd like to do that; I wanted to make sure it's okay before I start.

Increase test coverage

πŸš€ The feature

Improve test coverage by writing tests for openai and starcoder.
This improves code quality and allows a better user and developer experience.

Motivation, pitch

Mentioned by @gventuri on Discord.

Alternatives

No response

Additional context

No response

pip install error

Hello authors:
When I installed pandasai, something went wrong on macOS. I used the command from the README:

pip install pandas

the error is:

ERROR: Could not find a version that satisfies the requirement pandasai (from versions: none)
ERROR: No matching distribution found for pandasai

can you help me?

error requesting data in `json` format

traceback:

Traceback (most recent call last):
  File "/Users/avelino/projects/buser/openapi/recommendation-next-travel/use-pandasai.py", line 43, in <module>
    print(pandas_ai.run(
          ^^^^^^^^^^^^^^
  File "/Users/avelino/projects/buser/openapi/recommendation-next-travel/.venv/lib/python3.11/site-packages/pandasai/__init__.py", line 120, in run
    answer = self.run_code(code, data_frame, False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/avelino/projects/buser/openapi/recommendation-next-travel/.venv/lib/python3.11/site-packages/pandasai/__init__.py", line 166, in run_code
    exec(code)
  File "<string>", line 9, in <module>
NameError: name 'json' is not defined

source:

data = {...}
df = pd.DataFrame(data)
llm = OpenAI()
pandas_ai = PandasAI(llm)

print(pandas_ai.run(
    df,
    prompt="""
    suggest when the next trip will be, destination and seat type,
    in JSON format: {"destination": "", "seat-type": ""}
    """))

proposed solution

This is how I solved it locally, though I believe it is not the best way:

add import json in __init__.py
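An alternative to patching __init__.py would be to pre-load the modules the LLM tends to assume into the globals dict passed to exec. A sketch (the function name and module selection are illustrative, not pandas-ai's actual code):

```python
import json
import datetime

def run_generated_code(code, df=None):
    """Execute generated code with a globals dict that pre-loads modules the
    LLM tends to assume are available (json, datetime, ...)."""
    env = {"json": json, "datetime": datetime, "df": df}
    exec(code, env)
    return env.get("result")

code = 'result = json.dumps({"destination": "Paris", "seat-type": "window"})'
print(run_generated_code(code))  # {"destination": "Paris", "seat-type": "window"}
```

This keeps the fix local to the execution path instead of polluting the package's module namespace.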

Apply linting

There are a lot of linting issues in the code. They should be analyzed, and the critical ones fixed.

My suggestion is to add pylint and mypy as the main linters.
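A possible starting configuration, assuming the project uses pyproject.toml; the specific rules and settings below are suggestions, not the project's actual config:

```toml
# Hypothetical starting point; tune rule selection to the codebase.
[tool.pylint.messages_control]
disable = ["missing-module-docstring"]

[tool.mypy]
python_version = "3.10"
ignore_missing_imports = true
warn_unused_ignores = true
```

Both tools read their configuration from pyproject.toml, so no extra config files are needed.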
