zwq2018 / data-copilot

Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow

Home Page: https://arxiv.org/abs/2306.07209

License: MIT License

Python 100.00%

data-copilot's Issues

What should I do if I want to use my own dataset

Do I need to modify some interfaces?

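Where do I replace the OpenAI API key?

I set the OPENAI_API_KEY variable in both main.py and lab_gpt4_call.py, but I still get the message "No API key provided."

Which version of gradio is required? It is not listed in the requirements file.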

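Generated an exception: Sorry, you do not have permission to access this interface. For details on permissions, see https://tushare.pro/document/1?doc_id=108.

Could you please share a Tushare token with us? When I use the instruction '预测未来4个季度的GDP增长率' ("Predict the GDP growth rate for the next 4 quarters"), it calls the tool API.
Many thanks!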

step: step1=
content: {
"arg1": ["2000101","20230707","gdp_yoy"],
"function1": "get_GDP_data",
"output1": "result1",
"description1": "GDP year-over-year growth data"
}
It has parallel steps: 1.0
Traceback (most recent call last):
  File "D:/workspace/02_from_git/Data-Copilot/main.py", line 380, in <module>
    output, image, df, output_result = run(instruction, send_chat_request_Azure=openai_call, openai_key=openai_key, api_base='', engine='')
  File "D:/workspace/02_from_git/Data-Copilot/main.py", line 273, in run
    Previous_result[rename] = result_buffer[output_name][1]
KeyError: 'result2'
Generated an exception: Sorry, you do not have permission to access this interface. For details on permissions, see https://tushare.pro/document/1?doc_id=108.

Process finished with exit code 1

IndexError: list index out of range

After running main.py

Question: What is today's stock price of 可孚医疗 (Kefu Medical)?

===============================Intent Detecting===========================================
2023-07-07 12:19:09
new_instruction: Today's date is 2023-07-06; please look up today's stock price of 可孚医疗 for me.
===============================Task Planing===========================================
2023-07-07 12:19:11
stock_task : Get today's stock price data for 可孚医疗
===============================Tool select and using Stage===========================================
2023-07-07 12:19:13
==================


step: step1=
content: {
 "arg1": ["可孚医疗","20220720","20220720","daily"],
 "function1": "get_stock_prices_data",
 "output1": "result1",
 "description1": "Today's stock price data for 可孚医疗"
}
It has parallel steps: 1.0
parallel step: 1
===============================Visualization Stage===========================================
Traceback (most recent call last):
  File "/root/Data-Copilot/main.py", line 381, in <module>
    output, image, df , output_result = run(instruction, send_chat_request_Azure = openai_call, openai_key=openai_key, api_base='', engine='')
  File "/root/Data-Copilot/main.py", line 267, in run
    task_name = list(task_plan.keys())[1].split('_task')[0] #visualization_task
IndexError: list index out of range
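
The traceback above comes from taking the second key of `task_plan` when the planner returned a single task (the log shows only `stock_task`, with no `visualization_task`). A minimal sketch of a more defensive lookup follows. It assumes `task_plan` is a dict keyed by `<name>_task`, as the log suggests; the helper name `split_task_plan` is hypothetical and not part of the repository.

```python
def split_task_plan(task_plan):
    """Separate data tasks from the optional visualization task.

    Assumes keys look like 'stock_task' or 'visualization_task',
    as printed in the task planning stage of the log.
    """
    data_tasks = {}
    viz_task = None
    for key, description in task_plan.items():
        name = key.split('_task')[0]
        if name == 'visualization':
            viz_task = description
        else:
            data_tasks[name] = description
    return data_tasks, viz_task


# A plan with a single entry and no visualization task, as in the log.
plan = {'stock_task': "Get today's stock price data"}
data_tasks, viz_task = split_task_plan(plan)
if viz_task is None:
    print('No visualization task planned; skipping the visualization stage.')
```

Looking keys up by name rather than by position avoids the `IndexError` whenever the planner emits fewer tasks than expected.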

Hello. The paper mentions that GPT-4 is used for interface design. How exactly is this part implemented with GPT?

You exceeded your current quota, please check your plan and billing details., Retrying in 20 seconds.

I really appreciate your work, but when I try this app I run into the error "You exceeded your current quota, please check your plan and billing details., Retrying in 20 seconds." It looks like a problem with my openai_key, yet the error persists even after I renewed the key. Do you have any suggestions for fixing this?

Which version of Python should I use?

When I use Python 3.11, I get this error:
Collecting numpy==1.22.4 (from -r requirements.txt (line 4))
Downloading numpy-1.22.4.zip (11.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.5/11.5 MB 44.9 kB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [86 lines of output]
:66: RuntimeWarning: NumPy 1.22.4 may not yet support Python 3.11.
Running from numpy source directory.
running egg_info
running build_src
INFO: build_src
creating numpy.egg-info
writing numpy.egg-info/PKG-INFO
writing dependency_links to numpy.egg-info/dependency_links.txt
writing entry points to numpy.egg-info/entry_points.txt
writing top-level names to numpy.egg-info/top_level.txt
writing manifest file 'numpy.egg-info/SOURCES.txt'
/usr/local/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/setuptools/command/egg_info.py:643: SetuptoolsDeprecationWarning: Custom 'build_py' does not implement 'get_data_files_without_manifest'.
Please extend command classes from setuptools instead of distutils.
warnings.warn(
INFO: unifing config_cc, config, build_clib, build_ext, build commands --compiler options
INFO: unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
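
The RuntimeWarning in the log points at the likely cause: the pinned numpy==1.22.4 predates Python 3.11, so pip falls back to building from source and the build fails. The sketch below checks the interpreter version before installing. It assumes NumPy 1.22.x targets Python 3.8 through 3.10, per that release's notes; the function name is hypothetical.

```python
import sys

# Assumption: NumPy 1.22.x supports Python 3.8 through 3.10 only.
NUMPY_1_22_PYTHONS = [(3, 8), (3, 9), (3, 10)]


def numpy_1_22_supported(version_info=sys.version_info):
    """Return True if this Python minor version is covered by NumPy 1.22.x."""
    return (version_info[0], version_info[1]) in NUMPY_1_22_PYTHONS


if not numpy_1_22_supported():
    print('This Python version is outside the range NumPy 1.22.4 targets; '
          'consider a Python 3.8-3.10 environment or a newer NumPy pin.')
```

Running this in a fresh virtual environment before `pip install -r requirements.txt` shows early whether the pinned version can be installed from a prebuilt wheel.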

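When running the main method, the line chart displays correctly. When running the app, the line chart in the visual interface is a blank, all-white image.

When running app.py, the last log entry is: StopIteration.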

Cannot run app.py locally

Hello, I followed the steps on my local machine but keep getting errors. How can I resolve this? Thank you!

Are there plans to integrate data from other industries?

Currently only some financial data is integrated.

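Tushare token: how is everyone dealing with insufficient account points?

It looks like calling the data interfaces requires a Tushare token. I registered a Tushare account, but my points seem to be insufficient.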

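Plans to open-source the LLM-based interface design

Thank you for your open-source work!
The paper says the work can be divided into two parts: the first uses an LLM for interface design, and the second analyzes and processes user requests to produce the results. The open-source code mostly covers the second part. Are there plans to open-source the code for the first part, the LLM-based interface design? If so, roughly when?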

Do I define the functions myself and let the LLM choose among them?

Is this meant to provide a framework?

Which version of Python should I use?

I tried 3.10, 3.11, and 3.12, and none of them worked. Which version of Python does the author use? I am on a Mac.

How do I call a custom function?

Outputting Figure from Main.py Source Code and Seeking Feedback/Correction

I read through the source code of main.py and drew the figure below, hoping it helps others who want to read the source. Corrections are welcome. 😊

How is this different from openai's function call

Hi, thanks for your Data-Copilot project.
After reading about Data-Copilot, it seems to me that this is a prompt-based implementation of function calling. No offense intended, but I am unsure whether I could use OpenAI's function calling in place of this project. ❓

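How to understand user intent

Hello. In your demo, the user's questions all appear to be fixed. What if the user's questions are not fixed? I would like to ask the following:

How should user intent be understood? If a question is ill-posed, how should the system get the user to fill in the missing information?
Once the user intent is obtained, the next step should be to extract the user's information and break it down into time, dimensions, metrics, and where-style conditions such as greater than or less than.
With the structured user information in hand, the LLM then plans the execution steps, which are (1) fetch the data and (2) compute on the data.

The hardest part right now is understanding user intent and performing the information extraction. Do you have any ideas for solving this?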

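Decoding error when starting locally

When I start the project locally with python main.py, I get the gbk decoding error below. Why does this happen?
I checked, and the system uses UTF-8 encoding.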
(venv) PS D:\Tools\PyCharm 2023.2\project\Data-Copilot-main> python main.py
None
None
===============================Intent Detecting===========================================
Traceback (most recent call last):
  File "D:\Tools\PyCharm 2023.2\project\Data-Copilot-main\main.py", line 382, in <module>
    output, image, df , output_result = run(instruction, send_chat_request_Azure = openai_call, openai_key=openai_key, api_base='', engine='')
  File "D:\Tools\PyCharm 2023.2\project\Data-Copilot-main\main.py", line 128, in run
    prompt_task_dict = json.load(f)
  File "D:\refer-self\study-self\python\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa5 in position 46: illegal multibyte sequence
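
This error typically appears when a UTF-8 file is opened without an explicit encoding: on Windows, Python's `open()` defaults to the locale encoding (gbk on a Chinese-locale system), regardless of how the file itself was saved. A minimal sketch of the usual workaround follows. The helper name `load_json_utf8` and the demo file are illustrative, not taken from the repository.

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Load a JSON file, always decoding it as UTF-8."""
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)


# Demo: write a small UTF-8 JSON file with non-ASCII text and read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'prompt_task.json')
    with open(demo_path, 'w', encoding='utf-8') as f:
        json.dump({'stock_task': '获取股价数据'}, f, ensure_ascii=False)
    prompt_task_dict = load_json_utf8(demo_path)
    print(prompt_task_dict)
```

Passing `encoding='utf-8'` at every `open()` call that reads the prompt or tool JSON files makes the behavior identical across operating systems.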

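The image is not displayed

After setting up the environment as instructed, everything else works, but the image is not displayed.
I am running instructions such as "给我画一下可孚医疗2022年年中到今天的股价" ("Plot the stock price of 可孚医疗 from mid-2022 to today").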

Something went wrong Connection errored out.

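Error: 'MyThread' object has no attribute 'result'

I just tried both my own example and the official examples on the Hugging Face platform, and in every case the program ended at the tool selection stage with "'MyThread' object has no attribute 'result'".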

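Questions about Interface Design

The design of data-copilot is very novel. I have some questions about the Interface Design part described in the paper and hope you can answer them.

Can interface design be understood as prompting the LLM to automatically generate code in a specified programming language?
Is there code for this part in the repository? I could not find the relevant code or prompts.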

error

Expecting value: line 1 column 1 (char 0). This error appears when entering the key in the UI.

KeyError: 'result2'

After starting main.py, the following exception is raised:

===============================Intent Detecting===========================================
2023-06-28 09:36:54
new_instruction: Predict the GDP growth rate for the next 4 quarters; show the forecast of the GDP growth rate for the next 4 quarters based on data as of 2023-06-27 and print a table
===============================Task Planing===========================================
2023-06-28 09:37:00
economic_task : Get quarterly GDP data from 20230627 to 20240627 and predict the growth rate for the next 4 quarters
visualization_task : Print a table of the predicted GDP growth rates for the next 4 quarters
===============================Tool select and using Stage===========================================
2023-06-28 09:37:05

step: step1=
content: {
"arg1": ["20230627","20240627","gdp_yoy"],
"function1": "get_GDP_data",
"output1": "result1",
"description1": "GDP year-over-year growth data"
}
It has parallel steps: 1.0
Generated an exception: Sorry, you do not have permission to access this interface. For details on permissions, see https://tushare.pro/document/1?doc_id=108.

step: step2=
content: {
"arg1": ["result1","gdp_yoy",4],
"function1": "predict_next_value",
"output1": "result2",
"description1": "Predicted GDP year-over-year growth data for the next 4 quarters"
}
It has parallel steps: 1.0
Generated an exception: 'result1'
===============================Visualization Stage===========================================
11111111111111111111111111111111111111
input1
['result2']
{}
Traceback (most recent call last):
  File "/home/Data-Copilot/Data-Copilot/main.py", line 390, in <module>
    output, image, df , output_result = run(instruction, send_chat_request_Azure = openai_call, openai_key=openai_key, api_base='', engine='')
  File "/home/Data-Copilot/Data-Copilot/main.py", line 283, in run
    Previous_result[rename] = result_buffer[output_name][1]
KeyError: 'result2'
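
The log suggests a chain of failures: step 1 fails on the Tushare permission error, so `result1` is never stored; step 2 then fails looking up `result1`; and the visualization stage finally raises `KeyError: 'result2'` on an empty buffer. A minimal sketch of checking for missing upstream results before using them follows. It assumes the buffer is a plain dict keyed by output name; the helper name `collect_results` is hypothetical.

```python
def collect_results(result_buffer, output_names):
    """Split the requested output names into those present and those missing."""
    found = {}
    missing = []
    for name in output_names:
        if name in result_buffer:
            found[name] = result_buffer[name]
        else:
            missing.append(name)
    return found, missing


# The buffer is empty here, mirroring the `{}` printed in the log.
result_buffer = {}
found, missing = collect_results(result_buffer, ['result2'])
if missing:
    print(f'Cannot visualize: upstream steps produced no output for {missing}')
```

Reporting the missing names makes the root cause (the failed data call) visible instead of the secondary `KeyError`.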

Spelling error in prompt file and file not found error when executing main.py

In the file https://github.com/zwq2018/Data-Copilot/blob/main/prompt_lib/prompt_task.json, there is a spelling error on line 8 where "finanical" is written instead of "financial". When I ask "What is the net profit of Ping An Insurance in China?", and execute the line in main.py where tool_lib = './tool_lib/' + 'tool_' + task_name + '.json' and tool_prompt = './prompt_lib/' + 'prompt_' + task_name + '.json', it returns an error saying that the file cannot be found.
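
One way to make such a misspelling fail with a clearer message, or recover from it, is to validate the task name before building the file paths. The sketch below uses the standard library's `difflib.get_close_matches`. The list of known task names is a hypothetical placeholder, and the actual file names in `tool_lib/` and `prompt_lib/` may differ.

```python
import difflib

# Hypothetical task names; the real set depends on the files that
# actually exist in tool_lib/ and prompt_lib/.
KNOWN_TASKS = ['stock', 'economic', 'financial', 'visualization']


def resolve_task_name(task_name, known_tasks=KNOWN_TASKS):
    """Return task_name if known, else the closest known name, else raise."""
    if task_name in known_tasks:
        return task_name
    close = difflib.get_close_matches(task_name, known_tasks, n=1)
    if close:
        return close[0]
    raise ValueError(f'Unknown task name: {task_name!r}')


task_name = resolve_task_name('finanical')
tool_lib = './tool_lib/' + 'tool_' + task_name + '.json'
tool_prompt = './prompt_lib/' + 'prompt_' + task_name + '.json'
print(tool_lib, tool_prompt)
```

Correcting the spelling in prompt_task.json remains the direct fix; the lookup above only guards against similar slips in model output.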

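No response

I ran into a problem: I started app.py locally and filled in the tushare.pro token in tool.py. I then entered the key, API base, and engine on the gradio page and clicked OK. The page has shown a loading state ever since, and I see no output in the backend.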

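Can this be connected to open-source models?

How can I build a data analysis assistant for a specialized domain?

Thank you for the open-source contribution!
I have recently become very interested in applications of large language models and would like to ask two questions:
1. If I want to analyze data reports in a vertical domain (for example, metric data from a production process), how should I adapt this project? Is it mainly a matter of modifying the methods that handle the data sources?
2. Can the base LLM be replaced with an open-source one, such as the ChatGLM series or the LLaMA series?

I look forward to any pointers. Thank you!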

local variable 'thread' referenced before assignment

The thread is declared as a local variable, which causes an error when it is referenced, so startup fails.

Error: KeyError: 'arg'

The error occurs in this statement: result_buffer_viz = parse_and_exe(call_dict, result_buffer_viz, parallel_step = '' )

call_dict = {'arg1': ['20230809', '20240809', 'gdp_yoy'], 'function1': 'get_GDP_data', 'output1': 'result1', 'description1': 'GDP growth rate data for four quarters'}
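
Judging only from the error message, the lookup key is probably built as `'arg' + parallel_step`, so an empty `parallel_step` asks for `'arg'` while the dict holds `'arg1'`. This is an inference and has not been checked against the repository. A sketch of a parser that discovers the numeric suffix from the keys themselves follows; the helper name `find_call_fields` is hypothetical.

```python
import re


def find_call_fields(call_dict):
    """Group keys such as 'arg1', 'function1', 'output1' by numeric suffix."""
    calls = {}
    for key, value in call_dict.items():
        match = re.fullmatch(r'(arg|function|output|description)(\d*)', key)
        if match:
            field, suffix = match.groups()
            calls.setdefault(suffix, {})[field] = value
    return calls


call_dict = {
    'arg1': ['20230809', '20240809', 'gdp_yoy'],
    'function1': 'get_GDP_data',
    'output1': 'result1',
    'description1': 'GDP growth rate data for four quarters',
}
calls = find_call_fields(call_dict)
for suffix, call in calls.items():
    print(suffix, call['function'], call['arg'], '->', call['output'])
```

Because the suffix is read from the dict, the same code handles both suffixed keys (`arg1`) and unsuffixed ones (`arg`).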

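call_steps is returned incorrectly

In main.py, call_steps, _ = response.split('###') sometimes fails because the GPT response does not end with the ### suffix. A check could be added for this case.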
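
Unpacking the result of `split('###')` into two names raises `ValueError` whenever the response contains no `###` or more than one. A minimal sketch using `str.partition`, which always returns three parts, follows. The function name `extract_call_steps` and the sample strings are illustrative.

```python
def extract_call_steps(response):
    """Return the text before the first '###', or the whole text if absent."""
    call_steps, _sep, _rest = response.partition('###')
    return call_steps.strip()


# Works whether or not the model appended the '###' terminator.
print(extract_call_steps('step1={"function1": "get_GDP_data"}###'))
print(extract_call_steps('step1={"function1": "get_GDP_data"}'))
```

`partition` splits only at the first occurrence, so a stray second `###` later in the response no longer breaks the unpacking.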

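Questions about the interface design part

Hello! Regarding the interface design part, you mention using seed requests to self-instruct and obtain more requests. Roughly how many new requests need to be generated for the interface definition stage to work well? Also, would you be able to share the prompts for the self-request and interface implementation stages? Thank you!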

How to create new tools(functions)

Thanks for the excellent work! It is exactly what I was looking for.
I looked through the code and read the paper, and there seems to be no code for "saving the running unresolved problem and generating new tools".
Does this mean the tools are pre-generated or predefined and cannot be expanded later? If not, could you please point me to where this part of the code lives?

Can a similar workflow be implemented with a LangChain agent?

Have you considered reproducing this work with LangChain?

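How is the model-designed interface code implemented, and how can I make the project read local data?

Hello. The paper describes how the project uses self-request and self-designed interfaces to fetch data, but it does not explain in detail how the financial information is fetched or how the interface code is implemented. Is this done entirely through the LLM's code generation ability? The paper also does not seem to cover how to read local data or how to add my own data descriptions. How should these be implemented? Thank you for your answer.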

A question about data information

Why does parsing the data file record the first and last rows of the data? What purpose do they serve?

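Can it connect to local data?

Thank you very much for your open-source work.
Can this project connect to local BI data and perform analysis and prediction on data from other domains?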

Error encountered while attempting "json.decoder.JSONDecodeError"

I encountered a "json.decoder.JSONDecodeError". Please refer to the details in the attached screenshot.

Thanks

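Seeking advice on building an internal enterprise data report analysis tool

Thank you for the open-source contribution; the project has been very inspiring to me!
I have recently been working on analyzing my company's internal data reports (for example, asking about project progress, defect statistics, or staffing distribution). I have the following questions and would appreciate a reply. Thank you!
1. Replacing the LLM interface: to keep the data secure, it cannot leave the company, so GPT-3.5, GPT-4, and similar APIs cannot be called. We can only deploy open-source LLMs such as Qwen or ChatGLM as substitutes. Does the project currently support swapping out the model-calling interface?
2. Replacing the data source: the project currently uses financial data sources as its example. Suppose I want to switch to internal enterprise data, which is real-time and lives in multiple structured tables in a SQL database. How should I connect it, and once connected, how do I get data-copilot to generate the calling interfaces?
3. One approach I tried previously is NL2SQL: feed the table schemas to the model, give it a question, let it write the SQL directly, run the query, and render the result as a table in the UI. data-copilot currently queries data through multiple interfaces, but for any single interface to pull data out of the database, NL2SQL is still unavoidable as the bridge. Have you done any research in this area, and how can the accuracy of the interface implementation be improved?
4. Finally, is there any possibility of a technical collaboration between your university and industry?
Thanks!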