
This project forked from yangling0818/buffer-of-thought-llm


Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Home Page: https://arxiv.org/abs/2406.04271


This repository contains the official implementation of our Buffer of Thoughts (BoT) framework. Affiliation: Peking University, UC Berkeley, Stanford University

Introduction

We introduce BoT, a novel and versatile thought-augmented reasoning approach designed to enhance the accuracy, efficiency, and robustness of large language models (LLMs). Specifically, we propose a meta-buffer to store a series of high-level thoughts, referred to as thought-templates, distilled from problem-solving processes across various tasks. For each problem, we retrieve a relevant thought-template and adaptively instantiate it with specific reasoning structures to conduct efficient reasoning. To ensure scalability and stability, we also propose a buffer-manager that dynamically updates the meta-buffer, enhancing its capacity as more tasks are solved. We conduct extensive experiments on 10 challenging reasoning-intensive tasks, achieving significant performance improvements over previous state-of-the-art (SOTA) methods: 11% on Game of 24, 20% on Geometric Shapes, and 51% on Checkmate-in-One. Further analysis demonstrates the superior generalization ability and robustness of BoT, which requires only 12% of the cost of multi-query prompting methods (e.g., tree/graph of thoughts) on average. Notably, we find that Llama3-8B + BoT has the potential to surpass the Llama3-70B model.

Overview of our BoT
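The retrieve, instantiate, and update loop described in the introduction can be illustrated with a toy sketch. This is not the repository's implementation: the names `ThoughtTemplate`, `MetaBuffer`, `similarity`, and `instantiate` are hypothetical, word-overlap (Jaccard) similarity is swapped in as a simple stand-in for whatever retrieval metric the framework uses, and the novelty threshold of 0.5 is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ThoughtTemplate:
    """A reusable high-level solution outline (hypothetical structure)."""
    name: str
    description: str
    steps: list


def similarity(a, b):
    """Jaccard word overlap between two texts (assumed stand-in metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class MetaBuffer:
    templates: list = field(default_factory=list)

    def retrieve(self, problem):
        """Return the template whose description best matches the problem."""
        if not self.templates:
            return None
        return max(self.templates,
                   key=lambda t: similarity(problem, t.description))

    def update(self, candidate, threshold=0.5):
        """Buffer-manager step: store a template only if it is novel enough."""
        is_novel = all(
            similarity(candidate.description, t.description) < threshold
            for t in self.templates
        )
        if is_novel:
            self.templates.append(candidate)
        return is_novel


def instantiate(template, problem):
    """Combine a template with a concrete problem to form a prompt."""
    lines = [f"Problem: {problem}", f"Approach: {template.name}"]
    lines += [f"{i}. {step}" for i, step in enumerate(template.steps, 1)]
    return "\n".join(lines)


# Illustrative usage with two made-up templates.
buffer = MetaBuffer()
arithmetic = ThoughtTemplate(
    "arithmetic search",
    "use four numbers and arithmetic operations to reach 24",
    ["Enumerate operand orders", "Try each operator", "Check the result"],
)
sorting = ThoughtTemplate(
    "sorting",
    "sort a list of words alphabetically",
    ["Split the input into words", "Sort them", "Join the result"],
)
buffer.update(arithmetic)
buffer.update(sorting)

problem = "use the numbers 4 7 8 8 and arithmetic operations to reach 24"
template = buffer.retrieve(problem)
prompt = instantiate(template, problem)
print(prompt)
```

In this sketch a single retrieval replaces the repeated LLM queries that multi-query methods spend exploring a tree or graph of thoughts, which is the intuition behind the cost comparison in the introduction.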

🚩 New Updates

[2024.6] Our test code on three benchmarks is now available, supporting different LLMs (e.g., GPT-4, Llama3-70B).

TODO

  • Release BoT with more functions
  • Release BoT with meta-buffer and buffer-manager
  • Release initial code of BoT

Comparison between Different Methods

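| Task/Method           | GPT-4 | PAL  | ToT  | Meta Prompting | BoT (Ours) |
|-----------------------|-------|------|------|----------------|------------|
| Game of 24            | 3.0   | 64.0 | 74.0 | 67.0           | 82.4       |
| MGSM (avg)            | 84.4  | 72.0 | 86.4 | 84.8           | 89.2       |
| Multi-Step Arithmetic | 84.0  | 87.4 | 88.2 | 90.0           | 99.8       |
| WordSorting           | 80.4  | 93.2 | 96.4 | 99.6           | 100.0      |
| Python Puzzles        | 31.1  | 47.3 | 43.5 | 45.8           | 52.4       |
| Geometric Shapes      | 52.6  | 51.2 | 56.8 | 78.2           | 93.6       |
| Checkmate-in-One      | 36.4  | 10.8 | 49.2 | 57.0           | 86.4       |
| Date Understanding    | 68.4  | 76.2 | 78.6 | 79.2           | 88.2       |
| Penguins              | 71.1  | 93.3 | 84.2 | 88.6           | 94.7       |
| Sonnet Writing        | 62.0  | 36.2 | 68.4 | 79.6           | 80.0       |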
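The headline improvements quoted in the introduction (11%, 20%, and 51%) can be related to this table with a short calculation. The sketch below assumes they are relative gains over the strongest baseline in each row; that reading is an assumption, although the arithmetic is consistent with it. The helper name `relative_gain` is hypothetical.

```python
# Scores copied from the comparison table for the three highlighted tasks.
scores = {
    "Game of 24": {"GPT-4": 3.0, "PAL": 64.0, "ToT": 74.0,
                   "Meta Prompting": 67.0, "BoT": 82.4},
    "Geometric Shapes": {"GPT-4": 52.6, "PAL": 51.2, "ToT": 56.8,
                         "Meta Prompting": 78.2, "BoT": 93.6},
    "Checkmate-in-One": {"GPT-4": 36.4, "PAL": 10.8, "ToT": 49.2,
                         "Meta Prompting": 57.0, "BoT": 86.4},
}


def relative_gain(row, ours="BoT"):
    """Percentage gain of `ours` over the best other method in the row."""
    best_baseline = max(v for k, v in row.items() if k != ours)
    return 100.0 * (row[ours] - best_baseline) / best_baseline


for task, row in scores.items():
    print(f"{task}: {relative_gain(row):.1f}%")
# Game of 24: 11.4%
# Geometric Shapes: 19.7%
# Checkmate-in-One: 51.6%
```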

Evaluation with Buffer of Thoughts

1. Benchmarks

For now, we release a demo version of BoT based on three benchmarks:

  • Game of 24 (gameof24)
  • Checkmate-in-One (checkmate)
  • Word Sorting (wordsorting)

2. Meta Buffer

For each task, we choose one thought template sampled from our meta-buffer library. Stay tuned for our complete meta-buffer library update!

3. Quick Start

First, set up the environment:

git clone https://github.com/YangLing0818/buffer-of-thought-llm
cd buffer-of-thought-llm
conda create -n BoT python==3.9
conda activate BoT
pip install -r requirements.txt

3.1. Running on Three Benchmarks

Our BoT is easy to use. Just run:

python run_benchmarks.py --task_name 'gameof24' --api_key 'input your API key here if you want to use GPT-4' --model_id 'the model ID of GPT-4 or the path to your local LLM'

Here, --task_name must be one of gameof24, checkmate, or wordsorting.

The --api_key is required only for GPT-series models; omit it when running a local LLM.

The --model_id should be a GPT-series model ID such as gpt-4o or gpt-4-turbo, or, if you do not set --api_key, the path to your local LLM.

The data for these three tasks are located in the /benchmarks directory.

The results generated during the experiment are stored in the /test_results directory.
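How the three flags interact can be sketched with `argparse`. This is an illustration of the documented behavior, not the code in run_benchmarks.py: the helper `select_backend`, the backend labels "openai" and "local", and the choice to make --model_id mandatory are all assumptions, and the model path is a placeholder.

```python
import argparse

# Task names as documented for --task_name.
TASKS = ("gameof24", "checkmate", "wordsorting")


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Illustrative BoT-style CLI")
    parser.add_argument("--task_name", choices=TASKS, required=True)
    parser.add_argument("--api_key", default=None,
                        help="Only needed for GPT-series models")
    parser.add_argument("--model_id", required=True,
                        help="GPT-series model ID or path to a local LLM")
    return parser


def select_backend(args):
    """Hypothetical helper: an API key implies a hosted GPT-series model."""
    return "openai" if args.api_key else "local"


# Without --api_key, --model_id is treated as a local model path.
args = build_parser().parse_args(
    ["--task_name", "gameof24", "--model_id", "/path/to/local-llm"]
)
print(select_backend(args), args.task_name)
```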

3.2. Validate the Test Results

Run the command below to validate the test results of our BoT:

python validate_results.py --task_name 'gameof24'

This will print out the accuracy of the selected task.
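As an illustration of what validating a Game of 24 answer can involve, the sketch below checks that an expression uses exactly the given numbers and evaluates to 24, then averages over a set of results. It is not the logic of validate_results.py: the function names, the input format of (expression, numbers) pairs, and the restriction to +, -, *, and / are assumptions made for this example.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Binary operators allowed in a candidate expression.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate an expression tree exactly, recording every integer used."""
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def is_valid_24(expression, numbers):
    """True if the expression uses each given number once and equals 24."""
    used = []
    try:
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return value == 24 and Counter(used) == Counter(numbers)


def accuracy(results):
    """Fraction of (expression, numbers) pairs that are valid solutions."""
    results = list(results)
    if not results:
        return 0.0
    return sum(is_valid_24(expr, nums) for expr, nums in results) / len(results)


sample = [
    ("(8 - 4) * (7 - 1)", [1, 4, 7, 8]),  # valid: 4 * 6
    ("4 * 6", [1, 4, 7, 8]),              # reaches 24 with the wrong numbers
]
print(accuracy(sample))
```

Exact rational arithmetic via `Fraction` avoids floating-point error in solutions that pass through non-integer intermediates, such as 8 / (3 - 8 / 3).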

📖 BibTeX

@article{yang2024buffer,
  title={Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models},
  author={Yang, Ling and Yu, Zhaochen and Zhang, Tianjun and Cao, Shiyi and Xu, Minkai and Zhang, Wentao and Gonzalez, Joseph E and Cui, Bin},
  journal={arXiv preprint arXiv:2406.04271},
  year={2024}
}
