ysymyth / react
[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models
License: MIT License
Hi! I'm replicating the ReAct results on WebShop, and I have several questions about webshopEnv in the Jupyter notebook.
if prod_cnt >= 3:
processed_t = ''
Is this also what you used in the paper?
assert False
when the Next or Prev button is clicked. Is this also intentional? Also, I have obtained ReAct results on WebShop with session ids fixed_{1-500}, which I believe is the same setup as the paper, using this environment unmodified but with different LLMs (not PaLM-540B):
gpt-3.5-turbo
Act - Score: 64.99 Success Rate: 34.0
ReAct - Score: 59.9 Success Rate: 30.0
code-davinci-002
Act - Score: 64.99 Success Rate: 34.0
ReAct - Score: 65.60 Success Rate: 38.8
Is this to be expected? I'm wondering if you have any thoughts on this. After some research, I found people saying that chain-of-thought might not be as effective for models that were trained with RLHF, like ChatGPT. But I don't have a good explanation for why I'm not seeing a performance boost from Act to ReAct with Codex (code-davinci-002).
Thank you in advance! Love the simplicity of your work and I'm trying to come up with new ideas based off of this paper :)
Hello, I would like to ask whether there is a code implementation of the CoT -> ReAct and ReAct -> CoT methods mentioned in the paper.
I tried to run WebShop.ipynb, and here are some of the outputs:
Observation: Invalid action!
Action: click[Add to Cart]
Observation:
Action: click[Add to Cart]
Observation:
Action: click[Checkout]
Observation:
Action: click[Checkout]
Observation:
Action: click[Proceed to checkout]
Observation:
Action: click[Proceed to checkout]
Observation:
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
1 0.0 0.0 0.0
-------------
-----------------
1
Action: reset
Observation:
Action: click[Buy Now]
Observation: Invalid action!
Action: click[Add to Cart]
Observation:
Action: click[Add to Cart]
Observation:
Action: click[Checkout]
Observation:
Action: click[Checkout]
Observation:
Action: click[Proceed to checkout]
Observation:
Action: click[Proceed to checkout]
Observation:
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
Action: fill out form[name: John Doe, email: [email protected], phone: 555-555-1212, address: 123 Main St, city: Anytown, state: CA, zip: 99999]
Observation: Invalid action!
2 0.0 0.0 0.0
-------------
-----------------
How can I get the right score? Thank you
Thank you for your code. But I cannot access the WebShop URL in your Jupyter notebook. Do I have to launch another service?
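A question about the WebShop experiment.
The web address used there is: WEBSHOP_URL = "http://3.83.245.205:3000"
How should I replace this address with my own site? Is there code for building the site?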
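One way to swap in a self-hosted address, sketched under assumptions: read the base URL from an environment variable and fall back to the address quoted above. Using WEBSHOP_URL as an environment variable, the helper names, and the per-session path layout are all assumptions of this sketch, not something confirmed by the ReAct or WebShop code. The site itself still has to be running and reachable at whatever address is configured.

```python
import os

# Address quoted in the question above; used here only as the fallback.
DEFAULT_WEBSHOP_URL = "http://3.83.245.205:3000"


def get_webshop_url() -> str:
    """Return the WebShop base URL, preferring the WEBSHOP_URL environment variable."""
    # Reading WEBSHOP_URL from the environment is an assumption of this sketch.
    return os.environ.get("WEBSHOP_URL", DEFAULT_WEBSHOP_URL).rstrip("/")


def session_url(session_id: str) -> str:
    """Build a per-session URL (hypothetical helper; the path layout is assumed)."""
    return f"{get_webshop_url()}/{session_id}"


if __name__ == "__main__":
    # Point the client at a locally hosted instance (hypothetical address).
    os.environ["WEBSHOP_URL"] = "http://localhost:3000/"
    print(session_url("fixed_0"))
```

With this in place, the notebook would only need to call get_webshop_url() wherever the hard-coded constant is used.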
Hi,
For webshop env, what was the number of retrieved items displayed per page?
As per the code, it seems item names indexed after 3 are purposefully omitted, which does not seem to be clarified in the actual paper.
Could you please explicitly clarify this setting just so that I am clear whether this was a small change for visualization in code or was it done for all results reported in the paper?
I was looking through the earlier issues in the repo and couldn't find this resolved in the closed issues.
Thanks!
Hi Shunyu,
Could you provide text-davinci-002 trajectory on HotpotQA 500 (30.8EM in Table 5 of A.1 GPT-3 Experiments)?
Thank you!
@ysymyth Thanks for your good work!
Can you attach the output of HotpotQA (hotpotqa.ipynb), like those in (FEVER.ipynb)? Thank you!
Don't give me links to Alfworld! The installations there don't work, the support is nonexistent.
How can I install ReAct on my Ubuntu 22.04?
Hi, I'm trying to run ReAct with GPT-3.5-Turbo on the HotpotQA dataset with the provided Jupyter notebook, but I only get 0.182 accuracy. Is this a reasonable result? I think it is much lower than the result shown in the paper.
I was wondering if AutoGPT was inspired by your ideas. Anyway, thanks for your great efforts.
Hi there, I cannot seem to find any information on the fine-tuning process in your paper and this repository.
A snippet from your paper:
However, when finetuned with just 3,000 examples, ReAct becomes the best method among the four, with PaLM-8B finetuned ReAct outperforming all PaLM-62B prompting methods, and PaLM-62B finetuned ReAct outperforming all 540B prompting methods. In contrast, finetuning Standard or CoT is significantly worse than finetuning ReAct or Act for both PaLM-8/62B, as the former essentially teaches models to memorize (potentially hallucinated) knowledge facts, and the latter teaches models how to (reason and) act to access information from Wikipedia, a more generalizable skill for knowledge reasoning.
Hi, thanks for publishing this code. It is really helpful for me!
I have some questions about the code.
Line 162 in 6bdb3a1
def reset(self, seed=None, return_info=False, options=None, idx=None):
    self.env.reset(seed=seed, return_info=return_info, options=options)
    try:
        self.env.step('')
    except:
        pass
    self.env.reset(seed=seed, return_info=return_info, options=options)
    self.data_idx = int(np.random.randint(len(self.data))) if idx is None else idx
    observation = f"Claim: {self.data[self.data_idx][0]}"
    info = self._get_info()
    return (observation, info) if return_info else observation
I cannot figure out why we need this try-except code; it seems this part of the code does nothing. The second self.env.reset will reset the env, so there is no need for the first reset.
Still on the reset code: the return_info argument seems to always be False, so I think this argument can be dropped. Besides, the options and seed arguments are never used in WikiEnv.reset, FeverWrapper.reset and WikiEnv.reset.
def reset(self, seed=None, return_info=False, options=None, idx=None):
Line 44 in 6bdb3a1
Line 158 in 6bdb3a1
Line 214 in 6bdb3a1
In WikiEnv.step, the reward has not been changed since it was initialized and is never used. Why do we need this variable?
Besides, in FeverWrapper.step, the reward is obtained from self.get_reward, not from WikiEnv.step. At line 188 in 6bdb3a1, _ is used to receive the reward from WikiEnv.step, which also shows that the reward in WikiEnv.step is not useful.
Thanks for your patience ~
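The reset simplification suggested in this issue could look roughly like the sketch below. It is only an illustration with stand-in classes: StubWikiEnv and SimplifiedFeverWrapper are hypothetical names, the data is made up, and the sketch does not verify whether the first reset and the empty step are truly redundant in the real environment.

```python
import random


class StubWikiEnv:
    """Stand-in for the inner environment; only reset() matters here."""

    def __init__(self):
        self.reset_calls = 0

    def reset(self):
        self.reset_calls += 1
        return "stub observation"


class SimplifiedFeverWrapper:
    """Hypothetical wrapper whose reset() drops the extra reset, the empty step,
    and the unused seed/return_info/options arguments."""

    def __init__(self, env, data):
        self.env = env
        self.data = data  # list of (claim, label) pairs
        self.data_idx = None

    def reset(self, idx=None):
        # A single reset of the inner env, as the issue argues is sufficient.
        self.env.reset()
        # Pick a random example unless the caller asks for a specific index.
        self.data_idx = random.randrange(len(self.data)) if idx is None else idx
        return f"Claim: {self.data[self.data_idx][0]}"


if __name__ == "__main__":
    toy_data = [("claim a", "SUPPORTS"), ("claim b", "REFUTES")]  # made-up data
    wrapper = SimplifiedFeverWrapper(StubWikiEnv(), toy_data)
    print(wrapper.reset(idx=1))
```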
Hello, thank you for this important work and project!
I'm already seeing many references to the paradigm. The problem is that there was already a massively popular project named React. This makes searches for ReAct somewhat difficult.
When I run alfworld.ipynb, it returns:
Initializing AlfredTWEnv...
Checking for solvable games...
Overall we have 134 games
Evaluating with 134 games
Traceback (most recent call last):
  File "/home/ict/ReAct/react.py", line 55, in <module>
    env = env.init_env(batch_size=1)
  File "/home/ict/miniconda3/envs/react1/lib/python3.9/site-packages/alfworld/agents/environment/alfred_tw_env.py", line 224, in init_env
    infos = textworld.EnvInfos(won=True, admissible_commands=True, expert_type=expert_type, expert_plan=expert_plan, extras=["gamefile"])
  File "/home/ict/miniconda3/envs/react1/lib/python3.9/site-packages/textworld/core.py", line 109, in __init__
    raise ValueError(msg)
ValueError: Unknown information requested: ['expert_plan', 'expert_type']. Available information are: ['admissible_commands', 'command_templates', 'description', 'entities', 'extras', 'facts', 'fail_facts', 'feedback', 'game', 'intermediate_reward', 'inventory', 'last_action', 'last_command', 'location', 'lost', 'max_score', 'moves', 'objective', 'policy_commands', 'score', 'verbs', 'win_facts', 'won']
It seems that textworld does not work any more.
Dear Authors,
Thank you for the great work on introducing ReAct.
Since the original model that you used, text-davinci-002, is deprecated on OpenAI, the two closest alternatives are gpt-3.5-turbo and davinci-002. The best performance we get on, e.g., the first 10 environments is 0.3, while the reported result on the first 10 envs of ALFWorld is 0.7.
Could you share the traces, or advise what your latest scores on this environment are, or how to reproduce your score of 0.7? @ysymyth @john-b-yang @descrip
Thanks.
I am finding that certain strings can break clean_str:
p = "This is a test string with unicode escape: \\u00e9"
This will break clean_str:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 43: unexpected end of data
Why do we need to convert the string to UTF-8? And if it's required, why not just ignore conversion errors?
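The failure above can be reproduced with a small sketch. The body of clean_str below is an assumption (an escape-decoding round trip that happens to yield exactly the reported error); it is not copied from the repository. clean_str_lenient is a hypothetical variant that does what the question suggests and ignores undecodable bytes.

```python
def clean_str(p: str) -> str:
    # Assumed shape of the helper: undo unicode escapes, then reinterpret the
    # resulting Latin-1 bytes as UTF-8.
    return p.encode().decode("unicode-escape").encode("latin1").decode("utf-8")


def clean_str_lenient(p: str) -> str:
    # Same round trip, but bytes that are not valid UTF-8 are dropped
    # instead of raising UnicodeDecodeError.
    return (
        p.encode()
        .decode("unicode-escape")
        .encode("latin1")
        .decode("utf-8", errors="ignore")
    )


p = "This is a test string with unicode escape: \\u00e9"

try:
    clean_str(p)
except UnicodeDecodeError as exc:
    print("strict version failed:", exc.reason)

print(repr(clean_str_lenient(p)))
```

Ignoring errors trades a crash for silent data loss: the lone 0xe9 byte is discarded rather than shown as "é". Passing errors="replace" instead would keep a visible placeholder character. Neither variant helps if the escape decodes to a character above U+00FF, since the Latin-1 encode step would then raise a different error.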
I am impressed with your research. Thank you for your good research.
But I have a question and would like to ask.
According to Table 2 of the paper, success and failure modes are divided.
Thanks!
Hi, I was wondering how we could fine-tune the small ReAct model given the prompts generated by the prompted LLM.
Are we meant to use LoRA or P-Tuning for the fine-tuning step?
How should the prompt data be used?
(1) Let all the actions and thoughts be the input and the final action (answer) be the output
(2) Parse the whole ReAct process and use the previous in-context info as input and the current action as output
(3) Or any other way you used?
Really appreciate your help.
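To make options (1) and (2) from this question concrete, here is a minimal sketch of how a single trajectory could be turned into training pairs under each reading. This is only an illustration of the two options as worded above, not the authors' fine-tuning procedure; the trajectory is toy data, the function names are hypothetical, and emitting thought plus action as the target in option (2) is one interpretation.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, Optional[str]]  # (thought, action, observation)


def final_answer_pair(question: str, steps: List[Step]) -> Tuple[str, str]:
    """Option (1): everything before the last action is input, the last action is output."""
    lines = [question]
    for thought, action, observation in steps[:-1]:
        lines += [thought, action]
        if observation is not None:
            lines.append(observation)
    last_thought, last_action, _ = steps[-1]
    lines.append(last_thought)
    return "\n".join(lines), last_action


def per_step_pairs(question: str, steps: List[Step]) -> List[Tuple[str, str]]:
    """Option (2): one example per step; the context so far is input,
    the current thought and action are output."""
    pairs = []
    context = [question]
    for thought, action, observation in steps:
        pairs.append(("\n".join(context), f"{thought}\n{action}"))
        context += [thought, action]
        if observation is not None:
            context.append(observation)
    return pairs


# Toy trajectory, written by hand for illustration only.
question = "Question: Which city hosted the 1900 Summer Olympics?"
steps = [
    ("Thought 1: I should search for the 1900 Summer Olympics.",
     "Action 1: Search[1900 Summer Olympics]",
     "Observation 1: The 1900 Summer Olympics were held in Paris."),
    ("Thought 2: The host city is Paris.",
     "Action 2: Finish[Paris]",
     None),
]

if __name__ == "__main__":
    print(final_answer_pair(question, steps)[1])
    print(len(per_step_pairs(question, steps)))
```

Option (2) yields one example per step, so a trajectory with n steps contributes n training pairs instead of one.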
Have you ever considered applying ReAct prompting to numerical reasoning tasks (like GSM8K, or datasets that contain more difficult symbolic operations)?
If you have, does it show any improvement?
Thank you.
Hi,
I wondered if you had more details or numbers from your GPT-3 results on Alfworld? For instance, do you have the splits of accuracy across the different subtasks (as in Table 3 in the paper)?
I would try to reproduce it, but I reckon the total cost would be > $100 and would like to avoid it if possible.
Is davinci-002 referring to text-davinci-002 or davinci-002 (not-finetuned model)?
Hi, I'm trying to reproduce your ReAct results on Webshop using some LLM APIs. However, I sometimes encountered the following errors.
Basically, sometimes, after you select some specific options and then click[Buy Now], it's going to show the error below:
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 2095, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 2080, in wsgi_app
    response = self.handle_exception(e)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 2077, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 1525, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 1523, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/site-packages/flask/app.py", line 1509, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/user/webshop/web_agent_site/app.py", line 221, in done
    options = literal_eval(options)
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/home/user/anaconda3/envs/webshop/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    {'color': '2
                ^
SyntaxError: EOL while scanning string literal
To reproduce the error, you can try this:
In the ipython file of ReAct webshop, select the task id 83: i need a slim fit gray colored coat that has long sleeves. it should be in x-large size, and price lower than 40.00 dollars. Then do the following actions:
Then the error occurs. When doing these actions directly on the website, there is no such error. Therefore there may be something wrong when passing the argument to the environment.
(The errors I notice all come when an option that has '#' inside it is selected, maybe that's useful. )
Could you please help check that? Thank you so much!
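The observation about '#' fits how URLs are parsed: everything after an unescaped '#' is treated as a fragment and is not part of the path, which would explain the options string being cut off at {'color': '2. The sketch below reproduces that effect with the standard library only. The host, the ASIN, the option value, and the helper options_from_url are hypothetical; how the notebook actually builds its request is not shown in the issue, so this is a likely cause rather than a confirmed diagnosis.

```python
from ast import literal_eval
from urllib.parse import quote, unquote, urlsplit


def options_from_url(url: str) -> dict:
    """Parse the last path segment of a URL as a Python literal,
    roughly as a done-style route would."""
    segment = urlsplit(url).path.rsplit("/", 1)[-1]
    return literal_eval(unquote(segment))


options = {"color": "2#gray"}  # hypothetical option value containing '#'
base = "http://localhost:3000/done/fixed_83/B000000000"  # hypothetical host and ASIN

# Unescaped: the '#' starts a fragment, so the path ends at "{'color': '2".
raw_url = f"{base}/{options}"
# Percent-encoded: '#' becomes %23 and stays inside the path.
quoted_url = f"{base}/{quote(str(options), safe='')}"

if __name__ == "__main__":
    try:
        options_from_url(raw_url)
    except SyntaxError as exc:
        print("unescaped URL failed:", type(exc).__name__)
    print(options_from_url(quoted_url))
```

If this is indeed the cause, percent-encoding the options on the client side before building the URL should be enough, since web frameworks typically decode path segments before passing them to the view.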
You can actually close this issue -- just in case anyone is looking for running it with OAI Chat Completion -
import os
import openai
import requests
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)

def llm(prompt, stop=["\n"]):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        temperature=0,
        max_tokens=100,
        top_p=1,
        frequency_penalty=0.0,
        presence_penalty=0.0,
        stop=stop
    )
    return response.choices[0].message.content
I used the code as it is for the hotpotqa.ipynb and found the following error:
APIRemovedInV1 Traceback (most recent call last)
Cell In[53], line 10
8 old_time = time.time()
9 for i in idxs[:500]:
---> 10 r, info = webthink(i, to_print=True)
11 rs.append(info['em'])
12 infos.append(info)
Cell In[47], line 26
24 for i in range(1, 8):
25 n_calls += 1
---> 26 thought_action = llm(prompt + f"Thought {i}:", stop=[f"\nObservation {i}:"])
27 try:
28 thought, action = thought_action.strip().split(f"\nAction {i}: ")
Cell In[52], line 10
9 def llm(prompt, stop=["\n"]):
---> 10 response = openai.Completion.create(
11 model="text-davinci-002",
12 prompt=prompt,
13 temperature=0,
14 max_tokens=100,
15 top_p=1,
16 frequency_penalty=0.0,
17 presence_penalty=0.0,
18 stop=stop
19 )
20 return response["choices"][0]["text"]
File c:\Users\fattoh.alqershi\ReAct\ReAct\myvenv\lib\site-packages\openai\lib\_old_api.py:39, in APIRemovedInV1Proxy.__call__(self, *_args, **_kwargs)
38 def __call__(self, *_args: Any, **_kwargs: Any) -> Any:
---> 39 raise APIRemovedInV1(symbol=self._symbol)
APIRemovedInV1:
You tried to access openai.Completion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.
You can run openai migrate to automatically upgrade your codebase to use the 1.0.0 interface.
Alternatively, you can pin your installation to the old version, e.g. pip install openai==0.28
It seems to be a version error. When I went back to openai==0.28, it raised another error related to the client parameters: it expected messages (and other arguments), but there are no messages in the code.
Please help me.
Thanks.
Did you use a prompt like https://github.com/hwchase17/langchain/blob/bc2ed93b77cf9c40920ca5bf96968c90bb3e322e/langchain/agents/react/textworld_prompt.py#L4-L45 to ask GPT-3 to generate results in the ReAct format?
Or did you just create many examples and fine-tune the model so that it generates them? And does this only work with your fine-tuned model, and not with GPT-3 or GPT-4?
I'd like to know whether the method in LangChain is actually correct and works.
Hi,
Thanks for your great work! I have a question on Table 3, where results of Act and ReAct are reported as avg/best of 6. I am wondering where does 6 come from, given that the decoding strategy is greedy.
Thank you!
Hello @ysymyth, thanks for sharing your code, excellent work! Is there any plan to release the code of FEVER and WebShop? Thank you!