Coder Social home page Coder Social logo

cocacola-lab / chatie Goto Github PK

View Code? Open in Web Editor NEW
772.0 7.0 63.0 5.8 MB

The online version is temporarily unavailable because we cannot afford the key. You can clone and run it locally. Note: we set defaul openai key. If keys exceed plan and are invalid, please tell us. The response speed depends on openai. ( sometimes, the official is too crowded and slow)

Home Page: http://124.221.16.143:5000/

License: Other

Python 76.01% HTML 2.24% CSS 5.06% JavaScript 16.69%
chatgpt information-extraction zero-shot event-extraction ner tool nlp relation-extraction ai openai

chatie's Introduction

ChatIE🐬

Official repository of paper "Zero-Shot Information Extraction via Chatting with ChatGPT". Please star, watch, and fork our repo for the active updates!

Abstract

Zero-shot information extraction (IE) aims to build IE systems from the unannotated text. It is challenging due to involving little human intervention. Challenging but worthwhile, zero-shot IE reduces the time and effort that data labeling takes. Recent efforts on large language models (LLMs, e.g., GPT3, ChatGPT) show promising performance on zero-shot settings, thus inspiring us to explore prompt-based methods. In this work, we ask whether strong IE models can be constructed by directly prompting LLMs. Specifically, we transform the zero-shot IE task into a multi-turn question-answering problem with a two-stage framework (ChatIE). With the power of ChatGPT, we extensively evaluate our framework on three IE tasks: entityrelation triple extract, named entity recognition, and event extraction. Empirical results on six datasets across two languages show that ChatIE achieves impressive performance and even surpasses some full-shot models on several datasets (e.g., NYT11-HRL). We believe that our work could shed light on building IE models with limited resources.

零样本信息抽取(Information Extraction,IE)旨在从无标注文本中建立IE系统,因为很少涉及人为干预,该问题非常具有挑战性。但零样本IE不再需要标注数据时耗费的时间和人力,因此十分重要。近来的大规模语言模型(例如GPT-3,Chat GPT)在零样本设置下取得了很好的表现,这启发我们探索基于提示的方法来解决零样本IE任务。我们提出一个问题:不经过训练来实现零样本信息抽取是否可行?我们将零样本IE任务转变为一个两阶段框架的多轮问答问题(Chat IE),并在三个IE任务中广泛评估了该框架:实体关系三元组抽取、命名实体识别和事件抽取。在两个语言的6个数据集上的实验结果表明,Chat IE取得了非常好的效果,甚至在几个数据集上(例如NYT11-HRL)上超过了全监督模型的表现。我们的工作能够为有限资源下IE系统的建立奠定基础。

Methods

architecture

Results

result

Tools🧰

UPDATE: we use the official api, the tool becomes more faster!!! if the key exceed limits please tell us.

NOTICE: The response speed depends on the official openai chatgpt api. (sometimes, the official is too crowded and the speed will be slow or the chatgpt will be overloaded.) Moreover, you better use your own openai key because if our default account is used by multiple people at the same time, the account may be overloaded.

NOTICE: because official api is not available in domestic, so we use api from revChatGPT and v1 version. But it's too slow, so we advise you use the tool offline for study. We will update the api further in the future (TODO).

we also provide a IE tool based on GPT3.5, you can see in GPT4IE

Description

ChatIE (Zero-Shot Information Extraction via Chatting with ChatGPT) is a open-source and powerful IE tool demo. Enhanced by ChatGPT and prompting, it aims to automatically extract structured information from a raw sentence and make a valuable in-depth analysis of the input sentence. Harnessing valuable structured information helps corporations make incisive and business–improving decisions.
Present

We support the following functions:

Task Name Lauguages
RE entity-relation joint extraction Chinese, English
NER named entity recoginzation Chinese, English
EE event extraction Chinese, English

RE

This task aims to extract triples from plain texts, such as (China, capital, Beijing) , (《如懿传》, 主演, 周迅).

Input
  • sentence: a plain text.
  • relation type list (rtl)* : {'relation type 1': ['subject1', 'object1'], 'relation type 2': ['subject2', 'object2'], ...}

PS: * denote optional, we set default value for them. But for better extraction, you should specify the three list according to application scenarios.

Examples

sentence: Four other Google executives the chief financial officer , George Reyes ; the senior vice president for business operations , Shona Brown ; the chief legal officer , David Drummond ; and the senior vice president for product management , Jonathan Rosenberg earned salaries of $ 250,000 each .
rtl: default, see file "default-types"
ouptut:
ouptut

sentence: 第五部:《如懿传》《如懿传》是一部古装宫廷情感电视剧,由汪俊执导,周迅、霍建华、张钧甯、董洁、辛芷蕾、童瑶、李纯、邬君梅等主演。
rtl: default, see file "default-types"
ouptut:
ouptut


NER

This task aims to extract entities from plain texts, such as (LOC, Beijing) , (人物, 周恩来).

Input
  • sentence: a plain text.
  • entity type list (etl)* : ['entity type 1', 'entity type 2', ...]
Examples

sentence: James worked for Google in Beijing, the capital of China. etl: ['LOC', 'MISC', 'ORG', 'PER']
ouptut:
ouptut

sentence: **共产党创立于中华民国大陆时期,由陈独秀和李大钊领导组织。
etl: ['组织机构', '地点', '人物']
ouptut:
ouptut


EE

This task aims to extract event from plain texts, such as {Life-Divorce: {Person: Bob, Time: today, Place: America}} , {竞赛行为-晋级: {时间: 无, 晋级方: 西北狼, 晋级赛事: 中甲榜首之争}}.

Input
  • sentence: a plain text.
  • event type list (etl)* : {'event type 1': ['argument role 1', 'argument role 2', ...], ...}

sentence: Yesterday Bob and his wife got divorced in Guangzhou.
etl: default, see file "default-types"
ouptut:
ouptut

sentence: 在2022年卡塔尔世界杯决赛中,阿根廷以点球大战险胜法国。
etl: default, see file "default-types"
ouptut:
ouptut


Setup

react+flask

  1. cd front-end and Run npm install to download required dependencies.
  2. Run npm run start. ChatIE should open up in a new browser tab.
  3. cd back-end and Run python run.py.
  4. note: node-version v14.17.4 npm-version 9.6.0
  5. you may need to configure proxy on your machine.

Examples

RE

re-1 re-4 re-3

NER

ner-3 ner-2

EE

EE-1 EE-3


Data usage policy

We are committed to improving our project and providing you with the best possible experience. To achieve this, we will collect your data to help us understand how you interact with our project and identify areas for improvement. We value the privacy and security of your data and ensure the data only for the purposes of improving our project.

Citation

Checkout this paper arxiv: 2302.10205

@article{wei2023zero,
  title={Zero-Shot Information Extraction via Chatting with ChatGPT},
  author={Wei, Xiang and Cui, Xingyu and Cheng, Ning and Wang, Xiaobin and Zhang, Xin and Huang, Shen and Xie, Pengjun and Xu, Jinan and Chen, Yufeng and Zhang, Meishan and others},
  journal={arXiv preprint arXiv:2302.10205},
  year={2023}
}

chatie's People

Contributors

hackers267 avatar threecolorfr avatar winniehan avatar xiaoen0 avatar xingyucui avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

chatie's Issues

Incorrect API key

"error-stage1:Incorrect API key provided: sk-5ZHQn*****************************************************************************************pYxy. You can find your API key at https://platform.openai.com/account/api-keys."

image
请问下API key测试可用,为啥代码里面报错incorrect api key

为什么ChatIE在两个Du数据集上效果那么好呢?

按理说这两个Du数据集在关系类型上比其他两个同类任务的数据集要丰富的多,根据其他研究者的类似结论,当关系类型很丰富的情况下,chatGPT应该更难做,为什么在这两个数据集上这么反常呢?

关于prompt template

您好,请问你们的prompt template是人工定义的吧,定义时有没有考虑测试更换不同的template内容呢?还有测试不同数据集的时候是将一条一条句子带入template后测试的吗?能不能整个数据集上传完成抽取?

关于Vanilla Prompt

你好,
我看论文中的你们对“Vanilla Prompt vs. Our Chat-based Prompt”做了对比,这里的Vanilla Prompt具体有文章吗?是哪篇文章做的提示学习吗?想了解下,方便解释一下吗,谢谢。

Note and solve: Rate Limit openai gpt-3.5-turbo

"error-stage1:Rate limit reached for default-gpt-3.5-turbo in organization org-NATszHOraQIVn1LhMbHGpI3Q on requests per min. Limit: 3 / min. Please try again in 20s. Contact [email protected] if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method."

openai官方对请求速率限制一分钟3个,在此提醒一下。
因为是多轮框架,所以一个实例中间可能会断,影响结果。
Solve:TODO

项目运行的问题,前端界面报错:SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON

按照项目说明的步骤:

  1. Run npm run start 。 浏览器自动弹出界面。
  2. cd back-end and Run python run.py. 。弹出提示:Running on http://127.0.0.1:3000/ (Press CTRL+C to quit),进入链接http://127.0.0.1:3000,界面内容为:Not Found。The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
  3. 我的key,openai.proxy都设置完成了。
    4.我在前端尝试输入一段话,就报错:http://localhost:5000/ says SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON。后端也报错:TypeError: The view function did not return a valid response. The function either returned None or ended without a return statement.

NER Evaluation

This may relate more to the paper than the tool, sorry if it is the wrong channel.

How exactly is your NER evaluation performed. The relevant section in the paper is as follows:

We only consider the complete matching and use the micro F1 to evaluate NER task.
Only when both the border and the type of the predicted entity and the true entity are the same will we regard it as a correct prediction.

Given your example prompts in the appendix only list expected outputs of the form: ["Japan", "LOC"], there is no direct access to a border/offset. Do you just look for the answer text in the original sentence and take that index?

API问题

谢谢你们的开源贡献和新想法!

最近openai API似乎很慢,总是报RateLimitError的错误,几乎没法批量调用数据生成答案。想了解一下你们是如何解决的?付费成为plus会不会好一点。

access token问题

{"detail":{"message":"Your authentication token has expired. Please try signing in again.","type":"invalid_request_error","param":null,"code":"token_expired"}} OpenAI: {"detail":{"message":"Your authentication token has expired. Please try signing in again.","type":"invalid_request_error","param":null,"code":"token_expired"}} (code: 401)
以上是错误信息
应该是指access token过期了。
不太懂access token这个是什么意思,和openai的key有关吗?如何修复?

event detection performance on the ACE2005 dataset

hello, did you test the event detection performance (trigger identification and trigger classification) on the ACE2005 dataset ? According to the paper, only contain arguments classification are considered. Did I miss anything?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.