maitrix-org / llm-reasoners Goto Github PK

A library for advanced large language model reasoning

Home Page: https://www.llm-reasoners.net/

License: Apache License 2.0

Python 83.72% Jupyter Notebook 16.28%

llm-reasoners's Introduction

[Home] [Paper (COLM2024)] [Blog]

LLM Reasoners is a library to enable LLMs to conduct complex reasoning, with advanced reasoning algorithms. It approaches multi-step reasoning as planning and searches for the optimal reasoning chain, which achieves the best balance of exploration vs exploitation with the idea of "World Model" and "Reward".

Given any reasoning problem, simply define the reward function and an optional world model (explained below), and let LLM reasoners take care of the rest, including Reasoning Algorithms, Visualization, LLM calling, and more!

News

Jul. 10, 2024: Our paper on LLM Reasoners is accepted to COLM 2024!
Jun. 24, 2024: PromptAgent is in LLM Reasoners! Let it help you write down a super detailed prompt for your task (here).
May. 14, 2024: Check out Eurus, a suit of LLMs optimized for reasoning. With LLM Reasoners, Eurus-RM can easily boost Llama-8B from 0.49 to 0.73 📈 on GSM8k (code).
May. 2, 2024: We have integrated our first reasoning method for scientific reasoning, StructChem! Check it out here.
Apr. 22, 2024: We integrated Llama-3, with additional useful APIs (e.g., customizing EOS tokens, calculating likelihood)
Apr. 8, 2024: Our new paper introducing LLM Reasoners is available!
Mar. 29, 2024: Grace Decoding has been incoporated!
Oct. 25, 2023: A video tutorial on the visualizer of LLM Reasoners are available.
Oct. 23, 2023: Reasoning-via-Planning is accepted to EMNLP 2023! Check our paper with updated results and discussion!
Aug. 21, 2023: A batch of quantized Llama-2 models has arrived! BitsandBytes with huggingface API, GPT-Q with exllama are available. Now you can try llama-2-70B with 2 x 24G GPUs.
Aug. 10, 2023: Llama-2 is supported! You can run examples with Llama-2 now.

Why Choose LLM Reasoners?

Cutting-Edge Reasoning Algorithms: We offer the most up-to-date search algorithms for reasoning with LLMs, such as:
Intuitive Visualization and Interpretation: Our library provides a visualization tool to aid users in comprehending the reasoning process. Even for complex reasoning algorithms like Monte-Carlo Tree Search, users can easily diagnose and understand the process with one line of python code. See an exmaple in the tutorial notebook.
Compatibility with popular LLM libraries: Our framework is compatible with popular LLM frameworks, e.g. Huggingface transformers, OpenAI/Google/Anthropic API, etc. Specifically, we have integrated LLaMA-1/2/3 with the option of using fairscale (1,2, 3), LLaMA.cpp, Exllama or huggingface for different needs, e.g., fastest inference speed, minimal hardware requirements, etc.

Experiment Results

LLM Reasoners is applied to analyze the reasoning abilities of LLMs and the performance of multiple reasoning algorithms. See the comprehensive experiment results in the AutoRace Leaderboard, and more analysis in the blog and paper.
It has been tested to successfully reproduce the performance of Tree-of-Thoughts, Guided Decoding and GRACE Decoding with their official implementation. We list the results reported in their paper / reproduced from their official repositories for reference (†). Some results are on the subsets of the first 100 examples (*).

Method	Base LLM	GSM8k
Guided Decoding^†	CodeX (PAL)	0.80
Guided Decoding	CodeX (PAL)	0.83*

Method	Base LLM	Game of 24
Tree-of-Thoughts^†	GPT-3.5-turbo	0.22
Tree-of-Thoughts	GPT-3.5-turbo	0.22

Method	Base LLM	GSM8k
GRACE Decoding^†	Flan-T5-Large (Fine-tuned)	0.34
GRACE Decoding	Flan-T5-Large (Fine-tuned)	0.33*

Background of LLM Reasoning

Consider the following problem:

Let's start with a naive method for LLM reasoning: Prompted with a few examples of problem-solving step by step, an LLM can generate a chain of thoughts (or a sequence of actions) to solve a new problem. For the problem above, the prompt inputted to the LLM and the expected output (in bold) is shown below:

I am playing with a set of blocks where I need to arrange the blocks into stacks.

(Example problems and solutions * 4)

[STATEMENT] 
As initial conditions I have that, the red block is clear, the blue block is clear, the orange block is clear, the hand is empty, the red block is on the yellow block, the yellow block is on the table, the blue block is on the table and the orange block is on the table. My goal is to have that the orange block is on top of the blue block and the yellow block on top of the orange block.

[PLAN]
pick up the orange block
stack the orange block on top of the blue block
unstack the red block from on top of the yellow block
put the red block on the table
pick up the yellow block
stack the yellow block on top of the orange block

Regarding each reasoning step as an action, we have $a_1=$"pick up the orange block", $a_2=$"stack the orange block on top of the blue block", and so on. At each time step, the next action is sampled from the LLM conditioned on the previous actions. This simple method is often referred to as Chain-of-thoughts reasoning. Unfortunately, it doesn't always work for complex reasoning problems. For Blocksworld dataset where the problem above comes from, even the strongest GPT-4 model can only reach the success rate of ~30%.

LLM Reasoners formulate reasoning as planning (RAP). Different from Chain-of-thoughts reasoning which autoregressively samples the next action, our goal is to efficiently search in the reasoning space for the optimal reasoning chain. To achieve this, two components need to be defined: a world model and a reward function.

World model defines the state transition, formally $P(s_{i+1} | s_i, a_i)$. A default world model regards the partial solution as the state and simply appends a new action/thought to the state as the state transition (the same formulation of Tree-of-Thoughts). However, you’ll have the option to design a better world model which predicts and keeps track of a more meaningful state (e.g., environment status, intermediate variable values, etc. Check RAP for more examples), thus enhancing the reasoning. For the example shown above, we can naturally define the state as the condition of blocks (e.g., the red block is on the yellow block...), and a world model is to predict the condition of blocks after every potential action.
Reward function provides a criterion to evaluate a reasoning step. Ideally, a reasoning chain with a higher accumulated reward should be more likely to be correct. For the example shown above, we can reward actions based on the increased number of accomplished subgoals they lead to. Besides, the likelihood of LLMs generating the action can also be used as a reward, to give the search a good prior.

After we have the world model and reward function, it's time to apply an algorithm to search for the optimal reasoning trace. Here, we show the process of Monte-Carlo Tree Search with a gif:

Introduction of the library

The three key components in a reasoning algorithm, reward function, world model, and search algorithm in the formulation (top), correspond to three classes in the library, SearchConfig, WorldModel and SearchAlgorithm respectively. Besides, there are LLM APIs to power other modules, Benchmark, and Visualization to evaluate or debug the reasoning algorithm (middle). To implement a reasoning algorithm for a certain domain (a Reasoner object), a user may inherit the SearchConfig and WorldModel class, and import a pre-implemented SearchAlgorithm. We also show a concrete example of solving Blocksworld with RAP using LLM Reasoners (bottom).

Quick Tour

Let's go through the code of reasoning over Blocksworld problems. Note that the code is simplified for demonstration (check here for a runnable notebook).

The first step is to define the world model: you will set up an initial state given a question in init_state, judge whether a state is terminal in is_terminal, and most importantly, define the world dynamics with step:

from typing import NamedTuple
import utils
from reasoners import WorldModel, LanguageModel
import copy

BWState = str
BWAction = str

class BlocksWorldModel(WorldModel[BWState, BWAction]):
    def __init__(self,
                 base_model: LanguageModel,
                 prompt: dict) -> None:
        super().__init__()
        self.base_model = base_model
        self.prompt = prompt

    def init_state(self) -> BWState:
        # extract the statement from a given problem
        # e.g., "the red block is clear, the blue block is clear..."
        return BWState(utils.extract_init_state(self.example)) 

    def step(self, state: BWState, action: BWAction) -> tuple[BWState, dict]:
        # call the LLM to predict the state transition
        state = copy.deepcopy(state)
        # load the prompt for the LLM to predict the next state
        # e.g. "... I have that <state>, if I <action>, then ..."
        world_update_prompt = self.prompt["update"].replace("<state>", state).replace("<action>", action)
        world_output = self.base_model.generate([world_update_prompt],
                                    eos_token_id="\n", hide_input=True, temperature=0).text[0].strip()
        new_state = utils.process_new_state(world_output)
        # till now, we have the new state after the action
        # the following part is to speed up the reward calculation

        # we want to check the portion of the satisfied subgoals, and use it as a part of the reward
        # since we have predicted the new state already, we can just check it here at convenience
        goal_reached = utils.goal_check(utils.extract_goals(self.example, new_state))
        # return the new state and the additional dictionary (to be passed to the reward function)
        return new_state, {"goal_reached": goal_reached}

    def is_terminal(self, state: BWState) -> bool:
        # define the condition the terminal state to stop the search
        # e.g., all the subgoals are met
        if utils.goal_check(utils.extract_goals(self.example), state.blocks_state) == 1:
            return True
        return False

Then, it's time to consider how to search for the optimal reasoning chain. It involves get_actions to get the action space given a state, and the most important reward as the guidance for reasoning. For Monte-Carlo Tree Search, we can additionally define a fast_reward to speed up the roll-out stage.

import utils
from world_model import BWState, BWAction
from reasoners import SearchConfig, LanguageModel
class BWConfig(SearchConfig):
    def __init__(self,
                 base_model: LanguageModel,
                 prompt: dict,
                 reward_alpha=0.5,
                 goal_reward_default=0.,
                 goal_reached_reward=100) -> None:
        super().__init__()
        self.base_model = base_model
        self.example = None
        self.prompt = prompt
        # some parameters to calculate the fast reward or reward (explained below)
        self.reward_alpha = reward_alpha
        self.goal_reward_default = goal_reward_default
        self.goal_reached_reward = goal_reached_reward

    def get_actions(self, state: BWState) -> list[BWAction]:
        # use a rule-based function to extract all legal actions
        return utils.generate_all_actions(state)

    def fast_reward(self, state: BWState, action: BWAction) -> tuple[float, dict]:
        # build an in-context learning prompt (similar to the one used in Chain-of-thoughts reasoning)
        inputs = self.prompt["icl"].replace("<init_state>", state)\
            .replace("<goals>", utils.extract_goals(self.example))
        # concatenate a candidate action after the prompt, and test its loglikelihood
        intuition = self.base_model.get_loglikelihood(inputs, [inputs + action])[0]
        # the reward is a combination of intuition and goal satisfaction
        # in fast_reward, we skip the calculation of goal satisfaction and use a default value
        fast_reward = intuition * self.reward_alpha + self.goal_reward_default * (1 - self.reward_alpha)
        # cache some information for the reward calculation later (will be passed to `reward` function)
        details = {'intuition': intuition}
        return fast_reward, details

    def reward(self, state: BWState, action: BWAction,
               intuition: float = None,
               goal_reached: tuple[bool, float] = None) -> float:
        # note that `intuition` (cached in `fast_reward`) and `goal_reached` (cached in `step`) are automatically passed as parameters to this reward function
        if goal_reached == 1:
            # if the goal state is reached, we will assign a large reward
            goal_reward = self.goal_reached_reward
        else:
            # otherwise assign the reward based on the portion of satisfied subgoals
            goal_reward = goal_reached
        # the reward is a combination of intuition and goal satisfaction
        reward = intuition * self.reward_alpha + goal_reward * (1 - self.reward_alpha)
        # return the reward and an additional dictionary (to be saved in the log for visualization later)
        return reward, {'intuition': intuition, 'goal_reached': goal_reached}

Now, we are ready to apply a reasoning algorithm to solve the problem:

from reasoners.algorithm import MCTS
from reasoners.lm import LLaMAModel
from world_model import BlocksWorldModel
from search_config import BWConfig

llama_model = LLaMAModel(llama_ckpts, llama_size, max_batch_size=1)
with open(prompt_path) as f:
    prompt = json.load(f)
world_model = BlocksWorldModel(base_model=base_model, prompt=prompt)
config = BWConfig(base_model=llama_model, prompt=prompt)
# save the history of every iteration for visualization
search_algo = MCTS(output_trace_in_each_iter=True)
reasoner = Reasoner(world_model=world_model, search_config=config, search_algo=search_algo)
for i, example in enumerate(dataset):
    algo_output = reasoner(example)
    # save the MCTS results as pickle files
    with open(os.path.join(log_dir, 'algo_output', f'{resume + i + 1}.pkl'), 'wb') as f:
        pickle.dump(algo_output, f)

Finally, we can easily visualize the reasoning process:

import pickle
from reasoners.visualization import visualize
with open("logs/bw_MCTS/xxx/algo_output/1.pkl", 'rb') as f:
    mcts_result = pickle.load(f)

from reasoners.visualization.tree_snapshot import NodeData
from reasoners.algorithm.mcts import MCTSNode

# by default, a state will be presented along with the node, and the reward with saved dictionary in `SearchConfig.reward` will be presented along with the edge. 
# we can also define a helper function to customize what we want to see in the visualizer.
def blocksworld_node_data_factory(n: MCTSNode) -> NodeData:
    return NodeData({"block state": n.state.blocks_state if n.state else None,
                     "satisfied": n.fast_reward_details if n.fast_reward_details else "Not expanded"})
def blocksworld_edge_data_factory(n: MCTSNode) -> EdgeData:
    return EdgeData({"reward": n.reward, "intuition": n.fast_reward_details["intuition"]})
visualize(mcts_result, node_data_factory=blocksworld_node_data_factory,
                       edge_data_factory=blocksworld_edge_data_factory)

Then a URL of the visualized results will pop up. The figure will be interactive and look like the examples shown on our demo website.

Installation

Make sure to use Python 3.10 or later.

conda create -n reasoners python=3.10
conda activate reasoners

Clone the repository and install the package:

git clone https://github.com/Ber666/llm-reasoners --recursive
cd llm-reasoners
pip install -e .

Adding --recursive will help you clone exllama automatically. Note that some other optional modules may need other dependencies. Please refer to the error message for details.

Citation

This project is an extension of the following paper:

@inproceedings{hao2023reasoning,
  title={Reasoning with Language Model is Planning with World Model},
  author={Hao, Shibo and Gu, Yi and Ma, Haodi and Hong, Joshua and Wang, Zhen and Wang, Daisy and Hu, Zhiting},
  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  pages={8154--8173},
  year={2023}
}
@article{hao2024llm,
  title={LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models},
  author={Hao, Shibo and Gu, Yi and Luo, Haotian and Liu, Tianyang and Shao, Xiyan and Wang, Xinyuan and Xie, Shuhua and Ma, Haodi and Samavedhi, Adithya and Gao, Qiyue and others},
  journal={arXiv preprint arXiv:2404.05221},
  year={2024}
}

llm-reasoners's People

Contributors

Stargazers

Watchers

Forkers

valar23 shannonsands codeaudit touristshaun lewieyasu orenelbaum liujuncn mivanovitch tejas1995 stevegyutyan yangyi-chen dumpmemory mjdhasan advit200 openlmlab keyzf ysshannon fgenie nickydark1 islammohamedmosaad ziyu-deep techthiyanes andreasbinder xiaoeric hertera1 jeff52415 artivus2023 joydajunspacecraft kingdomad anminhhung hanzvivatma spidartist jjhw nabilkoneylaryea aaron617 pandora-alias gaohuaninchi asantoso xintianpan plowsai quester-one yangzhou12 wanjiazhao1203 huskydoge puyuan1996 edna-cyber muriloluz j0hngou githubchengpeng rong4ivy hillzhang1999 liwenju0 banguiskode kustomzone beimingmaster jawond dadaqingjian monstertail 1989ryan rahulschand zhan72 ber666 sailfish009 rhyang2021 sora1998 jianglai-0023 atanu2531 h-hp lukelike1001 creative-emporium mberkanbicer brunoscaglione chat19 optimacare yu-fangxu knowledgehacker akarinmoe yanbin-yin parajain kaynewest yangwang92

llm-reasoners's Issues

Setting the VAL env parameter for evaluation on Blocksworld

Hello, thanks for the great work! I have been trying to get the results on Blocksworld with a Llama2-7b model. When I run the MCTS algorithm from examples/blocksworld/rap_inference.py using the command in examples/blocksworld/test_rap_v2.sh, I am able to generate the correct plans. For instance, this is the first output json:

Case #1: correct=False, output='unstack the blue block from on top of the red block\nstack the blue block on top of the orange block', answer={'init': 'the blue block is clear, the orange block is clear, the hand is empty, the blue block is on top of the red block, the red block is on the table and the orange block is on the table', 'goal': 'the blue block is on top of the orange block', 'plan': '\nunstack the blue block from on top of the red block\nstack the blue block on top of the orange block\n[PLAN END]\n', 'question': '\n[STATEMENT]\nAs initial conditions I have that, the blue block is clear, the orange block is clear, the hand is empty, the blue block is on top of the red block, the red block is on the table and the orange block is on the table.\nMy goal is to have that the blue block is on top of the orange block.\n\nMy plan is as follows:\n\n[PLAN]\n', 'instance_file': 'LLMs-Planning/llm_planning_analysis/instances/blocksworld/generated_basic_3/instance-96.pddl'};accuracy=0.000 (0/1)

The plan is correct, however, during evaluation I do not get an accuracy point for that. Going through the script, my issue seems to be in reasoners/benchmark/bw_utils.py in validate_plan(), where the os.getenv('VAL') returns None, due to which the response string remains empty.

What value should this env variable set to?

Thanks!

tot game24 reproduce error

Hi I am trying to reproduce tot game24 using beam search and i found out that sometimes even if model get 24 sucessfully, we still can get correct=False because the output=None. How to fix it? I run this search on question 900.

following is my search config:
model: gpt-turbo-3.5
beam_size: 5
max_depth: 4
sampling_strategy: argmax
replace: None
temperature: None
temperature_decay: None
reject_sample: None
reject_min_reward: None
unbiased: None
reward_aggregator: last
action_dedup: False
early_terminate: True
return_beam: False

AtrributeError: 'str' object has no attribute 'name' when I build more example prompts on Blocksworld benchmark.

When running the utils.load_blocksworld function in build_prompts.ipynb to build prompts containing intermediate state information, when calling the get_intermediatiate_states function, an error is reported when running the s.add(Predicate(*i[1:-1].split(" "))): AtrributeError: 'str' object has no attribute 'name'

EOS token '\n' not working properly in llama3

There is an issue with '\n' not working properly in llama3. When passing '\n' through tokenizer.encode, it outputs the token ID 198, but it does not terminate the sentence generation appropriately and continues generating subsequent text.
eos_token_id = base_model.tokenizer.encode("\n", bos=False, eos=False)[-1]
In contrast, using other strings like 'Q' works correctly. Additionally, testing with llama2 shows that all strings, including '\n', work as expected.

Could you please look into this issue?

Guided Decoding Issues

put GPT api key to init()
aux and state issue

why so slow when I run rag_stragegyQA

llm: llama3
rap strageqa

the first question takes more than 1 hours runing on A100

Beam Search issue

beam search terminal sort
add paras for reward type
terminal handle of original code

Question about Prompt Tuning Process: Seeking Insights from Authors

Hi again! :)

I really like the RAP framework and especially the GSM8k experiment which I would like to extend! Basically, what I want to achieve is to have more actions to tackle a multi-modal dataset which includes retrieval.

However, when changing the prompt, the model (TheBloke/Llama-2-13B-GPTQ and meta-llama/Llama-2-13b-hf) fail completely.

I want to ask you about your experiences when it comes to questions like, how sensitive is the model if you reduce the number of demonstrations, or where to put the overall question etc.

Problems that I encounter are for example:

Giving the overall question and asking to create a subquestion, it just rephrases the question with no decomposition
When I do not give demonstration but instead formulate a query combining the overall question and the task description ("Generate a textual query for finding the university that started offering courses in the community with ZIP code 29707 in August 2018.\n”), the model continues with the description ('The goal of this project is to generate a textual query in SQL that would find the university that started offering courses in the community with ZIP code 29707 in August 2018.\n' or 'We assume that you have the data in the following column.\n')

I am aware those are more general questions and not totally specific to your code, but I was curious how much effort I have to invest to guide the model in a meaningful capacity (I hope not to just spend weeks prompt engineering but also actually implement some actions :D )

Great Work btw!

prompts for MATH are not uploaded

The prompts for the MATH dataset are not uploaded. Could you please upload them?

Applicability to Programming Problems?

Do you have suggestions on how to apply llm-reasoners or RAP to programming challenges?

Unbound TypeVar `Example` in `SearchConfig` and `WorldModel`

https://github.com/Ber666/llm-reasoners/blob/504afc75e158faeabcda1b92215b691dee941627/reasoners/base.py#L93

https://github.com/Ber666/llm-reasoners/blob/504afc75e158faeabcda1b92215b691dee941627/reasoners/base.py#L111

The type variable seems unbound in SearchConfig and WorldModel

Should they accept Example as a type parameter, and WorldModel passes the Example parameter to them? i.e. change

https://github.com/Ber666/llm-reasoners/blob/504afc75e158faeabcda1b92215b691dee941627/reasoners/base.py#L127-L131

 class Reasoner(ABC, Generic[State, Action, Example]): 
     def __init__(self, 
                  world_model: WorldModel[State, Action, Example], 
                  search_config: SearchConfig[State, Action, Example], 
                  search_algo: SearchAlgorithm) -> None:

Python version prerequisite

Now that the repo only runs with python version >= 3.10, do we need to put this in the README?

Download val error in Blocksworld

We follow the readme file in the examples folder. In Blocksworld, we download the val files via the link provided, but the link page say: This page does not exist yet.
Screenshot below：

Could you tell us how to solve this problem. we will be very grateful. Thanks.

XOT?

https://arxiv.org/pdf/2311.04254.pdf basically alpha zero mcts mixed with tot, looks cool

Llama3-70B inference stopping issue

First of all, thank you for your well-constructed code!

I have tried running rap_gam8k with the newly published Llama3 and successfully conducted the experiment with Llama3-8B. However, when using Llama3-70B, the experiment stops during the inference stage (max_seq_len = 2048). (It pauses without any specific error message.)

Therefore, I attempted to reduce the max_seq_len value or decrease the input prompt composed of 4-shot to one-shot or zero-shot. In this case, during the loop generating the next token, the initial dozens or hundreds of inferences proceed without any issues, but then it pauses without any error message when either mid-loop or upon receiving the next prompt.

If you've experienced a similar issue or have resolved it before, any assistance would be greatly appreciated! Thanks!

My command is as follows: torchrun --nproc-per-node 8 --master-port 6666 examples/rap_gsm8k/inference.py --base_lm llama-3 --llama_3_ckpts /home/llama3/ --llama_size 70B

World model `step` return type mismatch in MCTS implementation

World models' step function is supposed to return the world model's state.
https://github.com/Ber666/llm-reasoners/blob/504afc75e158faeabcda1b92215b691dee941627/reasoners/base.py#L88

However, in the MCTS implementation, it is expected to also return aux.
https://github.com/Ber666/llm-reasoners/blob/52ef9901d35f3074576345f17d4b3e8fe9c600f4/reasoners/algorithm/mcts.py#L168

Should we update base class' step function signature to match the usages?

RuntimeError: CUDA error: device-side assert triggered when using Llama 2 from HF

Good Day!
I tried to run the GSM8k example with the model from HF as you described: (only adjust the log and prompt paths)

  CUDA_VISIBLE_DEVICES=0,1 python examples/rap_gsm8k/inference.py --base_lm hf --hf_path meta-llama/Llama-2-70b-hf --hf_peft_path None --hf_quantized 'nf4'

However, I receive the following error

    RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I think this is related to the warning also mentioned in the log trace:

    llm-reasoners/reasoners/lm/hf_model.py:137: UserWarning: the eos_token '\n' is encoded into [29871, 13] with length != 1, using 13 as the eos_token_id
  warnings.warn(f'the eos_token {repr(token)} is encoded into {tokenized} with length != 1, '

When searching on GitHub, I think it is related to input mismatching due to some false tokenisation 1 2 3.
Did you also encounter this problem or how did you go about it?
I will try the other versions of Llama in the meantime.

I am using transformers 4.33.1

Thx!

Including GRACE decoding

Thanks for the amazing repo. I am the first author of GRACE Decoding and I think LLM-reasoners could benefit from having GRACE included. I would be happy to help with this, so let me know!

fast_reward() for GSM8K always returns 0

Why does fast_reward return 0 here?

Is this a bug?

TypeError: Too few parameters for <class 'reasoners.base.WorldModel'>; actual 2, expected 3

  File "/home/dev/PycharmProjects/llm-reasoners/examples/rap_gsm8k/world_model.py", line 29, in <module>
    class GSM8kWorldModel(WorldModel[GSM8kState, GSM8kAction]):
  File "/home/dev/anaconda3/envs/llm/lib/python3.10/typing.py", line 312, in inner
    return func(*args, **kwds)
  File "/home/dev/anaconda3/envs/llm/lib/python3.10/typing.py", line 1345, in __class_getitem__
    _check_generic(cls, params, len(cls.__parameters__))
  File "/home/dev/anaconda3/envs/llm/lib/python3.10/site-packages/typing_extensions.py", line 165, in _check_generic
    raise TypeError(f"Too {'many' if alen > elen else 'few'} parameters for {cls};"
TypeError: Too few parameters for <class 'reasoners.base.WorldModel'>; actual 2, expected 3

Build new task

Thanks for your contribution，I think your method is logic.And I want to build my own
task,could you teach me how can I do it

You forgot to change the path of some datasets.

There are some examples:

~/reasoners/benchmark/aqua.py

~/examples/RAP/gsm8k

You may need to recheck these paths.

No module named "llama3"

Visualizer edge selection

Now it selects the node with max Q. Should save the next edge selection to the MCTSResult.

Setup - GSM8k

--> 245 assert torch.all(prompt_tokens[: len(prefix_tokens)] == prefix_tokens), (prompt_tokens, prefix_tokens)

getting this assertion error when using openchat-3.6

Too few parameters for <class 'reasoners.algorithm.mcts.MCTSNode'>; actual 2, expected 3

TypeError: Too few parameters for <class 'reasoners.algorithm.mcts.MCTSNode'>; actual 2, expected 3
[2024-04-12 07:26:48,924] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1263) of binary: /mlsteam/data/LLM/llama/venv/bin/python
Traceback (most recent call last):
  File "/mlsteam/data/LLM/llama/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
examples/rap_gsm8k/inference.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-12_07:26:48
  host      : 8dede9e2fb55
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1263)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Any Chinese based demo?

Hello, new to llm reasoner, wanna ask is there any speicifc problem solving examples in Chinese? Such as let llm do a real world task with reasoning? Like querying LK99 news based on it's web tools? Or calculating with 100% correctness on some complicated math problems?

visualizer fails for gsm8k

when trying to load a gsm8k run with the visualizer, the code fails

    import pickle
    from reasoners.visualization import visualize
    with open("logs/gsm8k/myfolder/1.pkl", 'rb') as f:
        mcts_result = pickle.load(f)
    visualize(mcts_result)

  llm-reasoners/reasoners/visualization/tree_log.py:52, in TreeLog.from_mcts_results.<locals>.default_node_data_factory(n)
       51 def default_node_data_factory(n: MCTSNode) -> NodeData:
  ---> 52     return NodeData(n.state._asdict() if n.state else {})
  
  AttributeError: 'list' object has no attribute '_asdict'

EDIT:

the json file works fine with the online visualizer upload

EDIT 2:

where can I find the code for the visualizer, I see the code here connecting to the llm-reasoners.net front & backend which is really not ideal as a dependency.

Blocksworld Example Runtime Error: 'NoneType' object has no attribute 'cadam32bit_grad_fp32'

Dear llm-reasoners team, we have setup reasoners in a Mac machine and are following the Blocksworld example in order to try out how reasoner works. Executing this command (CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nproc_per_node 4 examples/rap_blocksworld/inference.py --llama_size "30B" --data_path 'examples/rap_blocksworld/data/step_4.json' --depth_limit 4 --output_trace_in_each_iter ) results in the following error:

'NoneType' object has no attribute 'cadam32bit_grad_fp32'

Could we get some hints on how to resolve this?

Thank you for your attention.

LLama 2 compatibility

Given the restricted access to LLama 1, is it possible to use llm-reasoner examples with LLama 2 out of the box?