zwq2018 / data-copilot
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
Home Page: https://arxiv.org/abs/2306.07209
License: MIT License
Do I need to modify some interfaces?
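I set the OPENAI_API_KEY variable in both main.py and lab_gpt4_call.py, but I still get the message "No API key provided."
Would you consider replacing the LLM interface with a model interface such as Tsinghua's GLM?
Would you please share the Tushare token with us? When I use the instruction '预测未来**4个季度的GDP增长率' (predict the GDP growth rate for the next 4 quarters), it calls the tool API.
Many thanks!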
step: step2=
content: {
"arg1": ["result1","gdp_yoy",4],
"function1": "predict_next_value",
"output1": "result2",
"description1": "未来4个季度gdp同比增速数据预测数据"
}
It has parallel steps: 1.0
Generated an exception: 'result1'
===============================Visualization Stage===========================================
Process finished with exit code 1
After running main.py:
Question: 今天的可孚医疗股价是多少 (What is the stock price of 可孚医疗 today?)
===============================Intent Detecting===========================================
2023-07-07 12:19:09
new_instruction: 今天的日期是2023-07-06,请帮我查询可孚医疗今天的股价。
===============================Task Planing===========================================
2023-07-07 12:19:11
stock_task : 获取可孚医疗今天的股价数据
===============================Tool select and using Stage===========================================
2023-07-07 12:19:13
==================
step: step1=
content: {
"arg1": ["可孚医疗","20220720","20220720","daily"],
"function1": "get_stock_prices_data",
"output1": "result1",
"description1": "可孚医疗今日股价数据"
}
It has parallel steps: 1.0
parallel step: 1
===============================Visualization Stage===========================================
Traceback (most recent call last):
File "/root/Data-Copilot/main.py", line 381, in <module>
output, image, df , output_result = run(instruction, send_chat_request_Azure = openai_call, openai_key=openai_key, api_base='', engine='')
File "/root/Data-Copilot/main.py", line 267, in run
task_name = list(task_plan.keys())[1].split('_task')[0] #visualization_task
IndexError: list index out of range
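The traceback fails at list(task_plan.keys())[1], and the log shows only one planned task (stock_task), so there is no second key to read. A minimal sketch of a guard, assuming task_plan is a plain dict keyed by names ending in "_task"; the helper name second_task_name is hypothetical and not part of the repository:

```python
def second_task_name(task_plan):
    """Return the prefix of the second planned task, or None if there is none.

    Mirrors list(task_plan.keys())[1].split('_task')[0] from the traceback,
    but does not raise IndexError when the plan holds fewer than two tasks.
    """
    keys = list(task_plan.keys())
    if len(keys) < 2:
        return None
    return keys[1].split('_task')[0]


# The plan printed in the log contains a single entry.
plan_from_log = {"stock_task": "获取可孚医疗今天的股价数据"}
print(second_task_name(plan_from_log))

# A plan with a second entry yields that entry's prefix.
full_plan = {"stock_task": "...", "visualization_task": "..."}
print(second_task_name(full_plan))
```

The caller would still need to decide what to do when None comes back, for example skipping the visualization stage.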
Hello, the paper mentions that GPT-4 is used for interface design. How is this part implemented with GPT?
I really appreciate your work, but when I try this app I run into this problem: "You exceeded your current quota, please check your plan and billing details., Retrying in 20 seconds." It seems to be a problem with my openai_key, but it persists even after I renewed the key. Do you have any suggestions for fixing this?
When I use Python 3.11, I get this error:
Collecting numpy==1.22.4 (from -r requirements.txt (line 4))
Downloading numpy-1.22.4.zip (11.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.5/11.5 MB 44.9 kB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [86 lines of output]
:66: RuntimeWarning: NumPy 1.22.4 may not yet support Python 3.11.
Running from numpy source directory.
running egg_info
running build_src
INFO: build_src
creating numpy.egg-info
writing numpy.egg-info/PKG-INFO
writing dependency_links to numpy.egg-info/dependency_links.txt
writing entry points to numpy.egg-info/entry_points.txt
writing top-level names to numpy.egg-info/top_level.txt
writing manifest file 'numpy.egg-info/SOURCES.txt'
/usr/local/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/setuptools/command/egg_info.py:643: SetuptoolsDeprecationWarning: Custom 'build_py' does not implement 'get_data_files_without_manifest'.
Please extend command classes from setuptools instead of distutils.
warnings.warn(
INFO: unifing config_cc, config, build_clib, build_ext, build commands --compiler options
INFO: unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
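When running app.py, the last log entry is: StopIteration.
At present only some financial data sources are connected.
I see that calling the data interfaces requires a TuShare token. I registered a TuShare account, but my points do not seem to be enough.
Thank you for your open-source work!
The paper says the work can be divided into two parts: the first uses a large model for interface design, and the second analyzes and processes user requests to produce the results. The open-source code mostly reflects the second part. Is there a plan to open-source the code for the first part, interface design with a large model? If so, roughly when?
Is this meant to provide a framework?
I tried Python 3.10, 3.11, and 3.12, and none of them work. Which Python version does the author use? I am on a Mac.
Hi, thanks for your Data-Copilot project.
After reading Data-Copilot, I think this is a kind of function-call implementation based on prompts. No offense intended, but I am wondering whether I can use function calling instead of your project. ❓
Hello, in your demo the user's input questions all appear to be fixed. What if the user's questions are not fixed? I would like to ask about this.
The hardest part right now is understanding the user's intent and extracting information. Do you have any ideas for solving this?
When I start python main.py locally, it reports the gbk decoding error below. Why is this?
I checked, and the system uses UTF-8 encoding.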
(venv) PS D:\Tools\PyCharm 2023.2\project\Data-Copilot-main> python main.py
None
None
===============================Intent Detecting===========================================
Traceback (most recent call last):
File "D:\Tools\PyCharm 2023.2\project\Data-Copilot-main\main.py", line 382, in <module>
output, image, df , output_result = run(instruction, send_chat_request_Azure = openai_call, openai_key=openai_key, api_base='', engine='')
File "D:\Tools\PyCharm 2023.2\project\Data-Copilot-main\main.py", line 128, in run
prompt_task_dict = json.load(f)
File "D:\refer-self\study-self\python\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
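On Windows, open() without an encoding argument decodes with the locale's preferred encoding, which can be GBK even when the file itself is UTF-8. A minimal sketch of reading a JSON file with an explicit encoding, assuming the prompt files are stored as UTF-8; the helper name load_json_utf8 and the demo file are hypothetical:

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Load a JSON file, decoding it as UTF-8 regardless of the OS locale."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Demo: write a small UTF-8 JSON file with non-ASCII text, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "prompt_demo.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"stock_task": "股票数据"}, f, ensure_ascii=False)
    loaded = load_json_utf8(demo_path)

print(loaded)
```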
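I just tried both my own example and the official examples on the Hugging Face platform, and in every case the program ends at the tool selection stage with "'MyThread' object has no attribute 'result'".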
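One common way to get this kind of AttributeError is a Thread subclass that only sets self.result inside run(): if the wrapped function raises, the attribute never exists. Whether the repository's MyThread works this way is an assumption. The sketch below uses a hypothetical class, ResultThread, that initializes the attribute up front and records the exception instead:

```python
import threading


class ResultThread(threading.Thread):
    """Thread that stores the return value or the exception of its function."""

    def __init__(self, func, args=()):
        super().__init__()
        self.func = func
        self.args = args
        self.result = None  # defined before run(), so reading it never fails
        self.error = None

    def run(self):
        try:
            self.result = self.func(*self.args)
        except Exception as exc:  # keep the failure for the caller to inspect
            self.error = exc


def failing_tool():
    # Stand-in for a tool call that fails, like the KeyError seen in other logs.
    raise KeyError("result1")


ok = ResultThread(lambda a, b: a + b, (1, 2))
bad = ResultThread(failing_tool)
for t in (ok, bad):
    t.start()
for t in (ok, bad):
    t.join()

print(ok.result, ok.error)
print(bad.result, repr(bad.error))
```

With this shape the caller can check error after join() and report the underlying failure rather than a missing attribute.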
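The design of Data-Copilot is very novel. I have some questions about the Interface Design part of the paper and hope to get your reply.
Expecting value: line 1 column 1 (char 0): error when entering the key in the UI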
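This message is what Python's json module raises when the text it is asked to parse is empty or does not start with a JSON value. Where in the app the parse happens is not stated in the report; the sketch below only reproduces the message, and the helper name try_parse is hypothetical:

```python
import json


def try_parse(text):
    """Return (value, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


# An empty string and a non-JSON string both fail at the first character.
print(try_parse(""))
print(try_parse("Invalid key"))

# A well-formed JSON document parses normally.
print(try_parse('{"status": "ok"}'))
```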
After starting main.py, the following exception is reported:
step: step2=
content: {
"arg1": ["result1","gdp_yoy",4],
"function1": "predict_next_value",
"output1": "result2",
"description1": "未来4个季度gdp同比增速数据预测数据"
}
It has parallel steps: 1.0
Generated an exception: 'result1'
===============================Visualization Stage===========================================
11111111111111111111111111111111111111
input1
['result2']
{}
Traceback (most recent call last):
File "/home/Data-Copilot/Data-Copilot/main.py", line 390, in <module>
output, image, df , output_result = run(instruction, send_chat_request_Azure = openai_call, openai_key=openai_key, api_base='', engine='')
File "/home/Data-Copilot/Data-Copilot/main.py", line 283, in run
Previous_result[rename] = result_buffer[output_name][1]
KeyError: 'result2'
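In this log the step that should produce result2 raised an exception ('result1'), the printed buffer is {}, and the later lookup of 'result2' raises KeyError. A minimal sketch of checking for missing outputs before reading them, assuming result_buffer is a dict keyed by output name; the helper name missing_outputs is hypothetical:

```python
def missing_outputs(expected_names, result_buffer):
    """Return the expected output names that no step has produced."""
    return [name for name in expected_names if name not in result_buffer]


# Values taken from the log: the buffer is empty and 'result2' is expected.
result_buffer = {}
expected = ["result2"]

missing = missing_outputs(expected, result_buffer)
if missing:
    print(f"Cannot continue, missing results: {missing}")
else:
    print("All expected results are available.")
```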
In the file https://github.com/zwq2018/Data-Copilot/blob/main/prompt_lib/prompt_task.json, there is a spelling error on line 8 where "finanical" is written instead of "financial". When I ask "What is the net profit of Ping An Insurance in China?", and execute the line in main.py where tool_lib = './tool_lib/' + 'tool_' + task_name + '.json' and tool_prompt = './prompt_lib/' + 'prompt_' + task_name + '.json', it returns an error saying that the file cannot be found.
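The two paths quoted from main.py are built from task_name, so a misspelled task name produces file names that do not exist. A minimal sketch that builds the same paths and reports which ones are missing; the helper names are hypothetical, and the demo assumes the files on disk use the spelling "financial":

```python
import os
import tempfile


def task_files(task_name, root="."):
    """Build the per-task paths with the same pattern quoted from main.py."""
    tool_lib = os.path.join(root, "tool_lib", "tool_" + task_name + ".json")
    tool_prompt = os.path.join(root, "prompt_lib", "prompt_" + task_name + ".json")
    return tool_lib, tool_prompt


def missing_task_files(task_name, root="."):
    """Return the expected per-task files that are not present on disk."""
    return [path for path in task_files(task_name, root) if not os.path.isfile(path)]


# Demo in a throwaway directory (assumed layout, not the real repository).
with tempfile.TemporaryDirectory() as root:
    for folder, prefix in (("tool_lib", "tool_"), ("prompt_lib", "prompt_")):
        os.makedirs(os.path.join(root, folder))
        file_path = os.path.join(root, folder, prefix + "financial.json")
        with open(file_path, "w", encoding="utf-8") as f:
            f.write("{}")
    missing_for_typo = missing_task_files("finanical", root)
    missing_for_fixed = missing_task_files("financial", root)

print(len(missing_for_typo), len(missing_for_fixed))
```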
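Thank you for the open-source contribution!
I have recently become very interested in applications of large language models and would like to ask two questions:
1. If I want to do data report analysis for a specific vertical domain (for example, metric data from a production process), how should I adapt this project? Is it mainly a matter of modifying the methods that handle the data sources?
2. Can the base large language model be replaced with an open-source one, such as the ChatGLM series or the LLaMA series?
I look forward to some pointers. Thank you!
An error occurs in this statement: result_buffer_viz = parse_and_exe(call_dict, result_buffer_viz, parallel_step = '' )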
call_dict = {'arg1': ['20230809', '20240809', 'gdp_yoy'], 'function1': 'get_GDP_data', 'output1': 'result1', 'description1': '四个季度的GDP增长率数据'}
In main.py, for call_steps, _ = response.split('###'), the response returned by GPT sometimes does not carry the ### suffix; a check could be added.
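Unpacking the result of split('###') into two names raises ValueError when the separator is absent or appears more than once. A minimal sketch of one possible check using str.partition, which always returns three parts; the helper name split_call_steps is hypothetical:

```python
def split_call_steps(response):
    """Split a model response at the first '###'.

    Returns (call_steps, trailing_text). When '###' is missing, the whole
    response is treated as call_steps and trailing_text is empty.
    """
    call_steps, _separator, trailing_text = response.partition('###')
    return call_steps.strip(), trailing_text.strip()


print(split_call_steps("step1={...}###done"))   # separator present
print(split_call_steps("step1={...}"))          # separator missing
print(split_call_steps("a###b###c"))            # separator repeated
```

Treating a missing separator as "everything is call_steps" is a design choice for this sketch; rejecting such responses and retrying would be another option.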
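Hello! Regarding the interface design part, you mention using seed requests to self-instruct and obtain more requests. Roughly how many new requests need to be generated for the interface definition part to work well? Also, would it be possible for you to share the prompts for the self-request and interface implementation parts? Thank you!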
Thanks for the excellent invention! That's exactly what I want.
I looked through the code and read the paper, and there seems to be no code for 「saving the running unresolved problem and generating new tools」.
Does this mean that the tools are pre-generated or predefined and cannot be expanded later? If not, could you please point me to where this part of the code is?
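Have you considered reproducing this work with LangChain?
Hello, the paper mentions how this project retrieves data through self-request and self-designed interfaces, but it does not explain in detail how the financial information is retrieved or how the interface code is implemented. Is this done automatically through the LLM's code generation ability? Also, the paper does not seem to specifically cover how to retrieve local data or how to add one's own data descriptions. How should these be done? Thank you for your answer.
Why does the parsing data file step record the first and last rows of the data? What role do they play?
Thank you very much for your open-source work.
Can this project connect to local BI data and perform analysis and prediction on data from other domains?
Thank you for the open-source contribution; the whole project has inspired me a lot!
Recently I have been studying report analysis on my company's internal data (for example, asking about project progress, defect statistics, or staffing distribution). I would like to ask the following questions and hope to get a reply. Thank you!
1. Replacing the LLM interface: to keep the data secure, it cannot leave the company, so interfaces such as GPT-3.5 and GPT-4 cannot be called; only open-source large models such as Qwen or ChatGLM can be deployed as substitutes. Does the project currently support replacing the model-calling interface?
2. Replacing the data source: the project currently uses financial data sources as its example. Suppose I want to switch to internal company data, which is real-time data in multiple structured tables in a SQL database. How should I connect it, and once connected, how do I get Data-Copilot to generate the calling interfaces?
3. One approach I implemented earlier is NL2SQL: feed the table schema to the model, give it the question, have it write the SQL directly, query the data, and then render a table in the UI. Data-Copilot currently queries data through multiple interfaces, but for a single interface to fetch data from the database it still cannot avoid the NL2SQL bridge. Have you done any research in this area, and how can the accuracy of the interface implementation be improved?
4. Also, is there any possibility of a technical collaboration between the university and a company?
Thank you!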