
data-copilot's Issues

API replacement

Would you consider replacing the LLM interface with a model interface such as Tsinghua's GLM?

Generated an exception: Sorry, you do not have permission to access this interface. For permission details, see https://tushare.pro/document/1?doc_id=108.

Could you please share a tushare token with us? When I use the instruction '预测未来**4个季度的GDP增长率' (predict the GDP growth rate for the next 4 quarters), it calls the tool API.
Many thanks!

step: step1=
content: {
"arg1": ["2000101","20230707","gdp_yoy"],
"function1": "get_GDP_data",
"output1": "result1",
"description1": "gdp同比增速数据"
}
It has parallel steps: 1.0
Traceback (most recent call last):
File "D:/workspace/02_from_git/Data-Copilot/main.py", line 380, in <module>
output, image, df, output_result = run(instruction, send_chat_request_Azure=openai_call, openai_key=openai_key, api_base='', engine='')
File "D:/workspace/02_from_git/Data-Copilot/main.py", line 273, in run
Previous_result[rename] = result_buffer[output_name][1]
KeyError: 'result2'
Generated an exception: 抱歉,您没有访问该接口的权限,权限的具体详情访问:https://tushare.pro/document/1?doc_id=108。

step: step2=
content: {
"arg1": ["result1","gdp_yoy",4],
"function1": "predict_next_value",
"output1": "result2",
"description1": "未来4个季度gdp同比增速数据预测数据"
}
It has parallel steps: 1.0
Generated an exception: 'result1'
===============================Visualization Stage===========================================

Process finished with exit code 1

IndexError: list index out of range

After running main.py

Question: 今天的可孚医疗股价是多少 (What is the stock price of 可孚医疗 today?)

===============================Intent Detecting===========================================
2023-07-07 12:19:09
new_instruction: 今天的日期是2023-07-06,请帮我查询可孚医疗今天的股价。
===============================Task Planing===========================================
2023-07-07 12:19:11
stock_task : 获取可孚医疗今天的股价数据
===============================Tool select and using Stage===========================================
2023-07-07 12:19:13
==================


step: step1=
content: {
 "arg1": ["可孚医疗","20220720","20220720","daily"],
 "function1": "get_stock_prices_data",
 "output1": "result1",
 "description1": "可孚医疗今日股价数据"
}
It has parallel steps: 1.0
parallel step: 1
===============================Visualization Stage===========================================
Traceback (most recent call last):
  File "/root/Data-Copilot/main.py", line 381, in <module>
    output, image, df , output_result = run(instruction, send_chat_request_Azure = openai_call, openai_key=openai_key, api_base='', engine='')
  File "/root/Data-Copilot/main.py", line 267, in run
    task_name = list(task_plan.keys())[1].split('_task')[0] #visualization_task
IndexError: list index out of range
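
The traceback points at `list(task_plan.keys())[1]`, which fails when the plan holds only one task, as in this log where only `stock_task` was planned. A minimal sketch of a guarded lookup follows. `split_task_plan` is a hypothetical helper, not part of the project, and the assumption that the visualization key starts with `visualization` is taken from the `visualization_task` key that appears in other logs on this page.

```python
def split_task_plan(task_plan):
    """Separate data tasks from the optional visualization task.

    Assumes keys look like 'stock_task' or 'visualization_task'.
    """
    data_tasks = {}
    viz_task = None
    for key, description in task_plan.items():
        if key.startswith("visualization"):
            viz_task = (key, description)
        else:
            data_tasks[key] = description
    return data_tasks, viz_task


# The plan from the log above contains a single entry.
plan = {"stock_task": "获取可孚医疗今天的股价数据"}
data_tasks, viz_task = split_task_plan(plan)
if viz_task is None:
    print("No visualization task was planned; skipping the visualization stage.")
```

Looking the key up by name rather than by position means a plan without a visualization entry can be handled explicitly instead of raising `IndexError`.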

Which version of Python should I use?

When I use Python 3.11, I get this error:
Collecting numpy==1.22.4 (from -r requirements.txt (line 4))
Downloading numpy-1.22.4.zip (11.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.5/11.5 MB 44.9 kB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [86 lines of output]
:66: RuntimeWarning: NumPy 1.22.4 may not yet support Python 3.11.
Running from numpy source directory.
running egg_info
running build_src
INFO: build_src
creating numpy.egg-info
writing numpy.egg-info/PKG-INFO
writing dependency_links to numpy.egg-info/dependency_links.txt
writing entry points to numpy.egg-info/entry_points.txt
writing top-level names to numpy.egg-info/top_level.txt
writing manifest file 'numpy.egg-info/SOURCES.txt'
/usr/local/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/setuptools/command/egg_info.py:643: SetuptoolsDeprecationWarning: Custom 'build_py' does not implement 'get_data_files_without_manifest'.
Please extend command classes from setuptools instead of distutils.
warnings.warn(
INFO: unifing config_cc, config, build_clib, build_ext, build commands --compiler options
INFO: unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
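
The log shows pip downloading the NumPy 1.22.4 source archive and trying to build it, which is what happens when no prebuilt wheel matches the interpreter. NumPy 1.22 supports Python 3.8 through 3.10, so a quick check before running `pip install -r requirements.txt` can save a failed build. The helper name `numpy_1_22_supported` is hypothetical, and this sketch only covers the NumPy pin, not the other entries in requirements.txt.

```python
import sys

# NumPy 1.22.x supports CPython 3.8 to 3.10.
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 10)


def numpy_1_22_supported(version_info=None):
    """Return True if the given (or current) Python can use numpy==1.22.4."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if not numpy_1_22_supported():
    print(
        "Python %d.%d is outside 3.8-3.10; use a matching interpreter "
        "or relax the numpy pin." % sys.version_info[:2]
    )
```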

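Plans to open-source the LLM-based interface design

Thank you for your open-source work!
The paper says the work can be divided into two parts: the first uses a large model for interface design, and the second analyzes and processes user requests to obtain the results. The open-source code mostly reflects the second part. Is there a plan to open-source the code for the first part, interface design with a large model? If so, roughly when?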

How is this different from OpenAI's function calling?

Hi, thanks for your Data-Copilot project.
After reading about Data-Copilot, I think it is a kind of prompt-based function-calling implementation. So, no offense, but I am wondering whether I could use function calling instead of your project. ❓

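How to understand user intent

Hello, in your demo the user's input questions all appear to be fixed. What if the user's questions are not fixed? I would like to ask a few questions:

  1. How do you understand user intent, and if a question is ill-posed, how should the user be prompted to fill in the missing information?
  2. After obtaining the user intent, the next step should be to extract the user's information and break it down into time, dimensions, metrics, and where-style conditions such as greater than or less than.
  3. After obtaining the structured user information, the LLM plans the execution steps, that is, (1) fetch the data and (2) compute on the data.

The hardest part right now is how to understand user intent and perform information extraction. Do you have any ideas for solving this?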
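
The structured form described in the list above can be sketched as a small data class that also reports which fields are still missing, so the user can be asked to complete them. This is only an illustration of the question's idea, not how Data-Copilot works; `ParsedRequest`, its fields, and `clarification_prompt` are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ParsedRequest:
    """Structured view of a user question (hypothetical schema)."""
    metric: Optional[str] = None                      # e.g. "gdp_yoy"
    dimensions: List[str] = field(default_factory=list)
    start_date: Optional[str] = None                  # e.g. "20230101"
    end_date: Optional[str] = None
    # Where-style conditions as (column, operator, value) triples.
    filters: List[Tuple[str, str, float]] = field(default_factory=list)

    def missing_fields(self):
        required = {
            "metric": self.metric,
            "start_date": self.start_date,
            "end_date": self.end_date,
        }
        return [name for name, value in required.items() if value is None]


def clarification_prompt(request):
    """Return a follow-up question, or None if the request is complete."""
    missing = request.missing_fields()
    if not missing:
        return None
    return "Please provide: " + ", ".join(missing)


request = ParsedRequest(metric="gdp_yoy", filters=[("gdp_yoy", ">", 5.0)])
print(clarification_prompt(request))
```

Filling such a structure is the step where an LLM or a rule-based extractor would be used; the planning step can then run only once `missing_fields()` returns an empty list.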

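Decoding error when launching locally

Launching locally with python main.py raises the gbk decoding error below. Why does this happen?
I checked, and the system uses UTF-8 encoding.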
(venv) PS D:\Tools\PyCharm 2023.2\project\Data-Copilot-main> python main.py
None
None
===============================Intent Detecting===========================================
Traceback (most recent call last):
File "D:\Tools\PyCharm 2023.2\project\Data-Copilot-main\main.py", line 382, in <module>
output, image, df , output_result = run(instruction, send_chat_request_Azure = openai_call, openai_key=openai_key, api_base='', engine='')
File "D:\Tools\PyCharm 2023.2\project\Data-Copilot-main\main.py", line 128, in run
prompt_task_dict = json.load(f)
File "D:\refer-self\study-self\python\lib\json\__init__.py", line 293, in load
return loads(fp.read(),

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa5 in position 46: illegal multibyte sequence
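
The traceback shows `json.load(f)` failing with the `gbk` codec, which suggests the file was opened without an explicit encoding, so Python fell back to the Windows locale encoding. A minimal sketch of the usual remedy, assuming the prompt JSON files are stored as UTF-8, is to pass `encoding="utf-8"` to `open`. The helper name `load_json_utf8` and the demo file name are hypothetical.

```python
import json
import os
import tempfile


def load_json_utf8(path):
    # Pass the encoding explicitly so the result does not depend on the OS locale.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Round-trip demo with non-ASCII content in a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "prompt_demo.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"task": "查询股价"}, f, ensure_ascii=False)
    loaded = load_json_utf8(demo_path)

print(loaded)
```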

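Images are not displayed

After setting up the environment as instructed, everything else works, but images are not displayed.
I ran an instruction similar to "给我画一下可孚医疗2022年年中到今天的股价" (plot the stock price of 可孚医疗 from mid-2022 to today).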

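Questions about Interface Design

The design of data-copilot is novel. I have some questions about the Interface Design part described in the paper and hope to get your reply.

  1. Can interface design be understood as using prompts to have the LLM automatically generate code in a specified programming language?
  2. Does the repository contain code for this part? I could not seem to find the related code or prompts.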

error

Expecting value: line 1 column 1 (char 0). The error occurs when entering the key in the UI.

KeyError: 'result2'

After launching main.py, the following exception is raised:

===============================Intent Detecting===========================================
2023-06-28 09:36:54
new_instruction: 预测未来**4个季度的GDP增长率,展示基于2023年06月27日的数据预测未来4个季度的GDP增长率并打印表格
===============================Task Planing===========================================
2023-06-28 09:37:00
economic_task : 获取从20230627到20240627的季度GDP数据并预测未来4个季度的增长率
visualization_task : 打印未来4个季度的GDP增长率预测数据表格
===============================Tool select and using Stage===========================================
2023-06-28 09:37:05

step: step1=
content: {
"arg1": ["20230627","20240627","gdp_yoy"],
"function1": "get_GDP_data",
"output1": "result1",
"description1": "gdp同比增速数据"
}
It has parallel steps: 1.0
Generated an exception: 抱歉,您没有访问该接口的权限,权限的具体详情访问:https://tushare.pro/document/1?doc_id=108。

step: step2=
content: {
"arg1": ["result1","gdp_yoy",4],
"function1": "predict_next_value",
"output1": "result2",
"description1": "未来4个季度gdp同比增速数据预测数据"
}
It has parallel steps: 1.0
Generated an exception: 'result1'
===============================Visualization Stage===========================================
11111111111111111111111111111111111111
input1
['result2']
{}
Traceback (most recent call last):
File "/home/Data-Copilot/Data-Copilot/main.py", line 390, in <module>
output, image, df , output_result = run(instruction, send_chat_request_Azure = openai_call, openai_key=openai_key, api_base='', engine='')
File "/home/Data-Copilot/Data-Copilot/main.py", line 283, in run
Previous_result[rename] = result_buffer[output_name][1]
KeyError: 'result2'
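
In this log the final `KeyError: 'result2'` is a downstream symptom: step 1 failed with the tushare permission error, so `result1` was never stored, step 2 then failed on `'result1'`, and the buffer printed just before the traceback is empty (`{}`). A minimal sketch of a guard that reports the upstream failure instead is shown below. `collect_results` is a hypothetical helper, and the use of index 1 simply mirrors `result_buffer[output_name][1]` from the traceback.

```python
def collect_results(result_buffer, output_names):
    """Fetch step outputs, failing with a readable message if any are absent."""
    missing = [name for name in output_names if name not in result_buffer]
    if missing:
        raise RuntimeError(
            "Upstream steps produced no output for: " + ", ".join(missing)
            + ". Check the earlier 'Generated an exception' messages."
        )
    # Index 1 mirrors result_buffer[output_name][1] in the traceback.
    return {name: result_buffer[name][1] for name in output_names}


# Reproduce the situation from the log: an empty buffer and one wanted output.
try:
    collect_results({}, ["result2"])
except RuntimeError as exc:
    message = str(exc)
    print(message)
```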

Spelling error in prompt file and file not found error when executing main.py

In the file https://github.com/zwq2018/Data-Copilot/blob/main/prompt_lib/prompt_task.json, there is a spelling error on line 8 where "finanical" is written instead of "financial". When I ask "What is the net profit of Ping An Insurance in China?", and execute the line in main.py where tool_lib = './tool_lib/' + 'tool_' + task_name + '.json' and tool_prompt = './prompt_lib/' + 'prompt_' + task_name + '.json', it returns an error saying that the file cannot be found.

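No response

I ran into a problem: I started app.py locally, having already filled in the tushare.pro token in tool.py, and then entered the key, API base, and engine on the Gradio page. After clicking OK it just keeps loading, and I see no output in the backend either.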


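How to build a data analysis assistant for a specific domain?

Thank you for the open-source contribution!
I have recently become very interested in applications of large language models and would like to ask two questions:
1. If I want to do data report analysis for a vertical domain (for example, metric data from a production process), how should I adapt this project? Is it mainly a matter of modifying the methods that handle the data source?
2. Can the base large language model be swapped for an open-source one, such as the ChatGLM series or the LLaMA series?

I look forward to any pointers. Thank you!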

Error: KeyError: 'arg'

The error occurs in this statement: result_buffer_viz = parse_and_exe(call_dict, result_buffer_viz, parallel_step = '' )

call_dict = {'arg1': ['20230809', '20240809', 'gdp_yoy'], 'function1': 'get_GDP_data', 'output1': 'result1', 'description1': '四个季度的GDP增长率数据'}

call_steps is returned incorrectly

In main.py, for call_steps, _ = response.split('###'), the GPT response sometimes has no trailing ###, so a check could be added.
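
One way to make that line tolerant of a missing or repeated `###` is `str.partition`, which always returns three parts. The sketch below assumes only that the text before the first `###` is what `call_steps` should hold; `extract_call_steps` is a hypothetical helper name.

```python
def extract_call_steps(response):
    """Return the text before the first '###', or the whole response if absent."""
    # partition never raises: with no separator it returns (response, "", "").
    call_steps, _separator, _rest = response.partition("###")
    return call_steps


print(extract_call_steps("step1={...}###"))
print(extract_call_steps("step1={...}"))
```

Unlike unpacking the result of `split('###')` into two names, this does not raise `ValueError` when the separator is absent or appears more than once.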

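Question about the interface design part

Hello! Regarding the interface design part, you mention using seed requests to self-instruct more requests. Roughly how many new requests need to be generated for the interface definition to work well? Also, could you share the prompts for the self-request and interface implementation parts? Thank you!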

How to create new tools(functions)

Thanks for the excellent invention! That's exactly what I want.
I looked through the code and read the paper, and it seems that there is no code for "saving the running unresolved problem and generating new tools".
Does that mean the tools are pre-generated or predefined and cannot be expanded later? If not, could you please point me to where this part of the code is?

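How is the model-designed interface code implemented? How can I make the project read local data?

Hello, the paper mentions that this project retrieves data through self-request and self-designed interfaces, but it does not explain in detail how the financial information is retrieved or how the interface code is implemented. Is this done by the LLM's code-generation ability on its own? Also, the paper does not seem to specifically cover how to read local data or how to add one's own data descriptions. How should these be implemented? Thank you for your answer.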

Question about data information

Why does parsing data file record the first and last rows of the data? What purpose do they serve?

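Can it connect to local data?

Thank you very much for your open-source work.
Can this project connect to local BI data and perform analysis and forecasting on data from other domains?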

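Seeking advice on an approach for an internal enterprise data report analysis tool

Thank you for the open-source contribution. The whole project has been a great inspiration to me!
I have recently been working on data report analysis for my company's internal data (for example, questions about project progress, defect statistics, or staffing distribution). I have the following questions and hope to get a reply. Thank you!
1. Replacing the LLM interface: to keep the data secure, data cannot leave the company, so interfaces such as GPT-3.5 and GPT-4 cannot be called, and only open-source models such as Qwen or ChatGLM can be deployed as substitutes. Does the project already support replacing the model-calling interface?
2. Replacing the data source: the project uses financial data sources as its example. Suppose I want to switch to internal enterprise data, which is real-time data in multiple structured tables in a SQL database. How should I connect it, and once connected, how do I get data-copilot to generate the calling interfaces?
3. My earlier approach was NL2SQL: feed the table schema to the model, give it the question, have it write the SQL directly, run the query, and then display the table in the UI. data-copilot currently fetches data through multiple interfaces, but if a single interface has to pull data out of the database, it still cannot avoid NL2SQL as the bridge. Have you done any research on this, and how can the accuracy of the interface implementation be improved?
4. One more question: is there any possibility of a technical collaboration project between the university and our company?
Thank you!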
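
As a point of comparison for questions 2 and 3, one alternative to free-form NL2SQL is to expose a few fixed, parameterized query functions and let the model choose a function and its arguments. The sketch below is an illustration under stated assumptions, not Data-Copilot's method: the `defects` table, its columns, and the function `get_defect_counts` are all hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3


def get_defect_counts(conn, project, start_date, end_date):
    """Count defects per severity for one project within a date range."""
    query = (
        "SELECT severity, COUNT(*) FROM defects "
        "WHERE project = ? AND opened_on BETWEEN ? AND ? "
        "GROUP BY severity ORDER BY severity"
    )
    # Parameters are bound by the driver, so the model never writes raw SQL.
    return conn.execute(query, (project, start_date, end_date)).fetchall()


# Demo data in an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE defects (project TEXT, severity TEXT, opened_on TEXT)")
conn.executemany(
    "INSERT INTO defects VALUES (?, ?, ?)",
    [
        ("alpha", "high", "2023-07-01"),
        ("alpha", "low", "2023-07-02"),
        ("alpha", "low", "2023-07-03"),
        ("beta", "high", "2023-07-02"),
    ],
)
rows = get_defect_counts(conn, "alpha", "2023-07-01", "2023-07-31")
print(rows)
```

With this shape, the model's output only has to name a function and supply its arguments, which is easier to validate than generated SQL, at the cost of covering only the queries that have been written in advance.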
