cocacola-lab / chatie Goto Github PK

The online version is temporarily unavailable because we cannot afford the key. You can clone and run it locally. Note: we set defaul openai key. If keys exceed plan and are invalid, please tell us. The response speed depends on openai. ( sometimes, the official is too crowded and slow)

Home Page: http://124.221.16.143:5000/

License: Other

Python 76.01% HTML 2.24% CSS 5.06% JavaScript 16.69%

chatgpt information-extraction zero-shot event-extraction ner tool nlp relation-extraction ai openai

chatie's Introduction

ChatIE🐬

Official repository of paper "Zero-Shot Information Extraction via Chatting with ChatGPT". Please star, watch, and fork our repo for the active updates!

Abstract

Zero-shot information extraction (IE) aims to build IE systems from the unannotated text. It is challenging due to involving little human intervention. Challenging but worthwhile, zero-shot IE reduces the time and effort that data labeling takes. Recent efforts on large language models (LLMs, e.g., GPT3, ChatGPT) show promising performance on zero-shot settings, thus inspiring us to explore prompt-based methods. In this work, we ask whether strong IE models can be constructed by directly prompting LLMs. Specifically, we transform the zero-shot IE task into a multi-turn question-answering problem with a two-stage framework (ChatIE). With the power of ChatGPT, we extensively evaluate our framework on three IE tasks: entityrelation triple extract, named entity recognition, and event extraction. Empirical results on six datasets across two languages show that ChatIE achieves impressive performance and even surpasses some full-shot models on several datasets (e.g., NYT11-HRL). We believe that our work could shed light on building IE models with limited resources.

零样本信息抽取（Information Extraction，IE）旨在从无标注文本中建立IE系统，因为很少涉及人为干预，该问题非常具有挑战性。但零样本IE不再需要标注数据时耗费的时间和人力，因此十分重要。近来的大规模语言模型（例如GPT-3，Chat GPT）在零样本设置下取得了很好的表现，这启发我们探索基于提示的方法来解决零样本IE任务。我们提出一个问题：不经过训练来实现零样本信息抽取是否可行？我们将零样本IE任务转变为一个两阶段框架的多轮问答问题（Chat IE）,并在三个IE任务中广泛评估了该框架：实体关系三元组抽取、命名实体识别和事件抽取。在两个语言的6个数据集上的实验结果表明，Chat IE取得了非常好的效果，甚至在几个数据集上（例如NYT11-HRL）上超过了全监督模型的表现。我们的工作能够为有限资源下IE系统的建立奠定基础。

Methods

Results

Tools🧰

UPDATE： we use the official api, the tool becomes more faster!!! if the key exceed limits please tell us.

NOTICE： The response speed depends on the official openai chatgpt api. (sometimes, the official is too crowded and the speed will be slow or the chatgpt will be overloaded.) Moreover, you better use your own openai key because if our default account is used by multiple people at the same time, the account may be overloaded.

NOTICE: because official api is not available in domestic, so we use api from revChatGPT and v1 version. But it's too slow, so we advise you use the tool offline for study. We will update the api further in the future (TODO).

we also provide a IE tool based on GPT3.5, you can see in GPT4IE

Description

ChatIE (Zero-Shot Information Extraction via Chatting with ChatGPT) is a open-source and powerful IE tool demo. Enhanced by ChatGPT and prompting, it aims to automatically extract structured information from a raw sentence and make a valuable in-depth analysis of the input sentence. Harnessing valuable structured information helps corporations make incisive and business–improving decisions.

We support the following functions:

Task	Name	Lauguages
RE	entity-relation joint extraction	Chinese, English
NER	named entity recoginzation	Chinese, English
EE	event extraction	Chinese, English

RE

This task aims to extract triples from plain texts, such as (China, capital, Beijing) , (《如懿传》, 主演, 周迅).

Input

sentence: a plain text.
relation type list (rtl)* : {'relation type 1': ['subject1', 'object1'], 'relation type 2': ['subject2', 'object2'], ...}

PS: * denote optional, we set default value for them. But for better extraction, you should specify the three list according to application scenarios.

Examples

sentence: Four other Google executives the chief financial officer , George Reyes ; the senior vice president for business operations , Shona Brown ; the chief legal officer , David Drummond ; and the senior vice president for product management , Jonathan Rosenberg earned salaries of $ 250,000 each .
rtl: default, see file "default-types"
ouptut:

sentence: 第五部：《如懿传》《如懿传》是一部古装宫廷情感电视剧，由汪俊执导，周迅、霍建华、张钧甯、董洁、辛芷蕾、童瑶、李纯、邬君梅等主演。
rtl: default, see file "default-types"
ouptut:

NER

This task aims to extract entities from plain texts, such as (LOC, Beijing) , (人物, 周恩来).

Input

sentence: a plain text.
entity type list (etl)* : ['entity type 1', 'entity type 2', ...]

Examples

sentence: James worked for Google in Beijing, the capital of China. etl: ['LOC', 'MISC', 'ORG', 'PER']
ouptut:

sentence: **共产党创立于中华民国大陆时期，由陈独秀和李大钊领导组织。
etl: ['组织机构', '地点', '人物']
ouptut:

EE

This task aims to extract event from plain texts, such as {Life-Divorce: {Person: Bob, Time: today, Place: America}} , {竞赛行为-晋级: {时间: 无, 晋级方: 西北狼, 晋级赛事: 中甲榜首之争}}.

Input

sentence: a plain text.
event type list (etl)* : {'event type 1': ['argument role 1', 'argument role 2', ...], ...}

sentence: Yesterday Bob and his wife got divorced in Guangzhou.
etl: default, see file "default-types"
ouptut:

sentence: 在2022年卡塔尔世界杯决赛中，阿根廷以点球大战险胜法国。
etl: default, see file "default-types"
ouptut:

Setup

react+flask

cd front-end and Run npm install to download required dependencies.
Run npm run start. ChatIE should open up in a new browser tab.
cd back-end and Run python run.py.
note: node-version v14.17.4 npm-version 9.6.0
you may need to configure proxy on your machine.

Examples

RE

NER

EE

Data usage policy

We are committed to improving our project and providing you with the best possible experience. To achieve this, we will collect your data to help us understand how you interact with our project and identify areas for improvement. We value the privacy and security of your data and ensure the data only for the purposes of improving our project.

Citation

Checkout this paper arxiv: 2302.10205

@article{wei2023zero,
  title={Zero-Shot Information Extraction via Chatting with ChatGPT},
  author={Wei, Xiang and Cui, Xingyu and Cheng, Ning and Wang, Xiaobin and Zhang, Xin and Huang, Shen and Xie, Pengjun and Xu, Jinan and Chen, Yufeng and Zhang, Meishan and others},
  journal={arXiv preprint arXiv:2302.10205},
  year={2023}
}

chatie's People

Contributors

Stargazers

Watchers

Forkers

threecolorfr dumpmemory hellodannyliu maniyantingliu shmctchina denglizong mplebron zhiqic yezhwi chengli0327 vu1seek shellingford221 jhlim-gsds nanqiai 331000738 lightdgx smarttraffic2021 huyhoang17 aicodehunt zchengzhong fangzheng354 cainiaogoroad mars-wei huberywjh russelmcgrady bobo04020802 autowds rourouz qcxyzff hy160518 kioco gallllong wayson20 tian64873493 liuq29 skliu2001 chenylong greitzmann huangtao36 ami4411 zhangzhiyi0108 zhenghliu5 jiyulongxu smilingwalker yuanmeng1120 another-noob-coder mtcto yeungbo zwygit2025 millerjin zh-feng elegant-spider wurentidai handsomemao knowledgehacker zl-comment harry8207 ytwu1314 tiago-clementino xiuixb raidery zouxiaodong

chatie's Issues

关于默认API key

Incorrect API key

"error-stage1:Incorrect API key provided: sk-5ZHQn*****************************************************************************************pYxy. You can find your API key at https://platform.openai.com/account/api-keys."

请问下API key测试可用，为啥代码里面报错incorrect api key

如何接入ChatGLM呢

Prompt数据集是否可以开源

您好，很棒的一篇工作。
请问会不会将经过Prompt转换后的数据集分享出来 :)

为什么ChatIE在两个Du数据集上效果那么好呢？

按理说这两个Du数据集在关系类型上比其他两个同类任务的数据集要丰富的多，根据其他研究者的类似结论，当关系类型很丰富的情况下，chatGPT应该更难做，为什么在这两个数据集上这么反常呢？

DUIE2.0数据集问题

请问DUIE2.0官方没提供测试集的标签，您如何划分的测试集

关于prompt template

您好，请问你们的prompt template是人工定义的吧，定义时有没有考虑测试更换不同的template内容呢？还有测试不同数据集的时候是将一条一条句子带入template后测试的吗？能不能整个数据集上传完成抽取？

请问出现这个问题HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions 是什么原因

Error communicating with OpenAI: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f20e6998910>: Failed to establish a new connection: [Errno 111] Connection refused

关于Vanilla Prompt

你好，
我看论文中的你们对“Vanilla Prompt vs. Our Chat-based Prompt”做了对比，这里的Vanilla Prompt具体有文章吗？是哪篇文章做的提示学习吗？想了解下，方便解释一下吗，谢谢。

Note and solve: Rate Limit openai gpt-3.5-turbo

"error-stage1:Rate limit reached for default-gpt-3.5-turbo in organization org-NATszHOraQIVn1LhMbHGpI3Q on requests per min. Limit: 3 / min. Please try again in 20s. Contact [email protected] if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method."

openai官方对请求速率限制一分钟3个，在此提醒一下。
因为是多轮框架，所以一个实例中间可能会断，影响结果。
Solve：TODO

项目运行的问题，前端界面报错：SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON

按照项目说明的步骤：

Run npm run start 。浏览器自动弹出界面。
cd back-end and Run python run.py. 。弹出提示：Running on http://127.0.0.1:3000/ (Press CTRL+C to quit)，进入链接http://127.0.0.1:3000，界面内容为：Not Found。The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
我的key，openai.proxy都设置完成了。
4.我在前端尝试输入一段话，就报错：http://localhost:5000/ says SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON。后端也报错：TypeError: The view function did not return a valid response. The function either returned None or ended without a return statement.

NER Evaluation

This may relate more to the paper than the tool, sorry if it is the wrong channel.

How exactly is your NER evaluation performed. The relevant section in the paper is as follows:

We only consider the complete matching and use the micro F1 to evaluate NER task.
Only when both the border and the type of the predicted entity and the true entity are the same will we regard it as a correct prediction.

Given your example prompts in the appendix only list expected outputs of the form: ["Japan", "LOC"], there is no direct access to a border/offset. Do you just look for the answer text in the original sentence and take that index?

unable to land demo http://124.221.16.143:5000/

关于API Key

我的API Key在GPT4IE中可以正确返回结果，为什么在ChatIE中报错error-stage1:Incorrect API key provided: key 3. You can find your API key at https://platform.openai.com/account/api-keys.

您好，图例中的命令在哪里运行呢，npm install在哪里呢

API问题

谢谢你们的开源贡献和新想法！

最近openai API似乎很慢，总是报RateLimitError的错误，几乎没法批量调用数据生成答案。想了解一下你们是如何解决的？付费成为plus会不会好一点。

access token问题

{"detail":{"message":"Your authentication token has expired. Please try signing in again.","type":"invalid_request_error","param":null,"code":"token_expired"}} OpenAI: {"detail":{"message":"Your authentication token has expired. Please try signing in again.","type":"invalid_request_error","param":null,"code":"token_expired"}} (code: 401)
以上是错误信息
应该是指access token过期了。
不太懂access token这个是什么意思，和openai的key有关吗？如何修复？

cocacola-lab / chatie Goto Github PK

chatie's Introduction

ChatIE🐬

Abstract

Methods

Results

Tools🧰

Description

RE

Input

Examples

NER

Input

Examples

EE

Input

Setup

Examples

RE

NER

EE

Data usage policy

Citation

chatie's People

Contributors

Stargazers

Watchers

Forkers

chatie's Issues

Recommend Projects

Recommend Topics

Recommend Org