iiis-ai / cumulative-reasoning


Cumulative Reasoning With Large Language Models

Homepage: https://cumulative-reasoning.github.io.

Introduction

Official implementation of paper "Cumulative Reasoning with Large Language Models" (https://arxiv.org/abs/2308.04371).

  • Achieving 98% accuracy on the Game of 24 (+24 percentage points compared to Tree-of-Thoughts)!

  • Achieving 58% accuracy on the MATH dataset without a code environment using GPT-4-0314 (+4.2 percentage points compared to PHP)!

  • Achieving a 43% relative improvement on the hardest Level 5 MATH problems (22.4% to 32.1%)!

  • Achieving 72.2% accuracy on the MATH dataset with a code environment using GPT-4-1106-preview (+20.2 percentage points compared to PAL (PoT))!

  • On Level 5 MATH problems, CR Agent v0.1 achieves a 66.8% relative improvement over PAL (31.3% to 52.2%)!
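The figures above mix two kinds of comparison: differences in accuracy (percentage points) and relative improvements (the gain as a fraction of the baseline accuracy). A minimal check using numbers reported in this README; the helper names are illustrative only:

```python
def point_gain(old: float, new: float) -> float:
    """Absolute improvement in accuracy, in percentage points."""
    return new - old


def relative_gain(old: float, new: float) -> float:
    """Relative improvement, as a percentage of the baseline accuracy."""
    return (new - old) / old * 100


# Level 5 MATH without code environment: 22.4% -> 32.1%
print(f"{relative_gain(22.4, 32.1):.1f}%")      # 43.3%
# MATH with code environment, PAL vs. CR Agent: 52% -> 72.2%
print(f"{point_gain(52.0, 72.2):.1f} points")   # 20.2 points
# Level 5 with code environment, PAL vs. CR Agent: 31.3% -> 52.2%
print(f"{relative_gain(31.3, 52.2):.1f}%")      # 66.8%
```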

Installation

conda create -n cr python=3.10
conda activate cr
pip install -r requirements.txt

For more usage help, please refer to the README.md in each subdirectory.

CR Agent: Solving MATH Problems with Code Environment

Please see the ./CR-Agent folder for the output logs and prompts on the MATH dataset. We have released the code for CR Agent v0.1, a minimalist implementation based on ToRA.

Experimental Results

For these experiments, we used GPT-4-1106-preview with a Python code environment and no additional tools such as external memory or retrieval systems. The setup was minimalist: a single reasoning context session, managed by simply accumulating and concatenating the context string, and a single LLM with no separate verifier LLM. The implementation uses plain Python strings only, without specialized frameworks such as LangChain or Guidance.
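A single-session loop of this kind can be sketched in a few lines: one context string that only ever grows by concatenation, with any code the model emits executed and its output appended before the next call. This is a minimal sketch assuming a ToRA-style alternation of reasoning and code execution, not the released CR Agent code. The `<python>`/`<output>` markers, the `scripted_llm` stand-in (which replays two canned steps instead of calling GPT-4-1106-preview), the `\boxed{}` stopping rule, and the `solve` helper are all hypothetical.

```python
import contextlib
import io
import re

# Hypothetical markers for emitted code inside the context string.
CODE_RE = re.compile(r"<python>\n(.*?)</python>", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def run_python(code: str) -> str:
    """Run a snippet and return what it printed.

    Uses exec() with no sandboxing: for illustration only, never for untrusted code.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


def scripted_llm(context: str) -> str:
    """Hypothetical stand-in for an LLM call; no API is contacted."""
    if "<output>" not in context:
        return "Compute 2**10 with Python.\n<python>\nprint(2 ** 10)\n</python>"
    return "The program printed 1024, so the answer is \\boxed{1024}."


def solve(question: str, llm, max_steps: int = 5):
    """Single reasoning session: one context string, grown only by concatenation."""
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(context)
        context += step + "\n"
        code = CODE_RE.search(step)
        if code:
            # Execute the emitted code and append its output to the same context.
            context += f"<output>\n{run_python(code.group(1))}\n</output>\n"
            continue
        answer = BOXED_RE.search(step)
        if answer:
            return answer.group(1), context
    return None, context


answer, transcript = solve("What is 2 to the power of 10?", scripted_llm)
print(answer)  # 1024
```

Because the whole session is one string, there is no separate memory or verifier component to coordinate; swapping `scripted_llm` for a real model call is the only change needed to run it against an actual LLM.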

The results of this setup on the MATH dataset were as follows:

  • PAL (Program-Aided Language models): 52.0% accuracy.
  • ToRA (Tool-Integrated Reasoning Agent): 60.8% accuracy.
  • CR Agent (Cumulative Reasoning Agent) v0.1: 72.2% accuracy, 11.4 percentage points above ToRA and 20.2 above PAL.
  • On Level 5 problems, the CR Agent achieved a 66.8% relative improvement over PAL (31.3% to 52.2%) and a 12.7% relative improvement over ToRA (46.3% to 52.2%).

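Category-wise Scores (accuracy, %)

| Method | Algebra | Counting & Probability | Geometry | Intermediate Algebra | Number Theory | Prealgebra | Precalculus |
|---|---|---|---|---|---|---|---|
| PAL (PoT) | 65.3 | 57.9 | 31.7 | 30.9 | 66.1 | 73.2 | 23.2 |
| ToRA | 71.8 | 68.4 | 48.8 | 49.5 | 66.1 | 67.1 | 44.6 |
| CR Agent | 86.3* | 71.1* | 53.7* | 51.5* | 88.7* | 86.6* | 51.8* |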

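Difficulty Level Scores (accuracy, %)

| Method | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
|---|---|---|---|---|---|
| PAL (PoT) | 88.4 | 65.6 | 60.0 | 45.3 | 31.3 |
| ToRA | 74.4 | 75.6 | 69.5 | 53.9 | 46.3 |
| CR Agent | 90.7* | 90.0* | 81.9* | 66.4* | 52.2* |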

Asterisks (*) mark the best-performing method in each category and difficulty level.

These tables break down the performance of each method by category and difficulty level on the MATH dataset. The CR Agent scores highest in every category and at every difficulty level, although its margin over the stronger baseline is small in a few cases (2.0 points on Intermediate Algebra, 2.3 on Level 1, and 2.7 on Counting & Probability). This suggests the approach is robust and effective on complex mathematical problems, even within the constraints of a simplified experimental setup.
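The per-row margins behind this comparison can be computed directly from the tables. The sketch below copies the reported scores into dictionaries and, for each category or level, subtracts the strongest baseline score from the CR Agent score. The data layout and the `margin_over_best_baseline` name are illustrative choices, not part of the repository.

```python
# Accuracy (%) on MATH, copied from the two tables above.
BY_CATEGORY = {
    "Algebra": {"PAL (PoT)": 65.3, "ToRA": 71.8, "CR Agent": 86.3},
    "Counting & Probability": {"PAL (PoT)": 57.9, "ToRA": 68.4, "CR Agent": 71.1},
    "Geometry": {"PAL (PoT)": 31.7, "ToRA": 48.8, "CR Agent": 53.7},
    "Intermediate Algebra": {"PAL (PoT)": 30.9, "ToRA": 49.5, "CR Agent": 51.5},
    "Number Theory": {"PAL (PoT)": 66.1, "ToRA": 66.1, "CR Agent": 88.7},
    "Prealgebra": {"PAL (PoT)": 73.2, "ToRA": 67.1, "CR Agent": 86.6},
    "Precalculus": {"PAL (PoT)": 23.2, "ToRA": 44.6, "CR Agent": 51.8},
}
BY_LEVEL = {
    "Level 1": {"PAL (PoT)": 88.4, "ToRA": 74.4, "CR Agent": 90.7},
    "Level 2": {"PAL (PoT)": 65.6, "ToRA": 75.6, "CR Agent": 90.0},
    "Level 3": {"PAL (PoT)": 60.0, "ToRA": 69.5, "CR Agent": 81.9},
    "Level 4": {"PAL (PoT)": 45.3, "ToRA": 53.9, "CR Agent": 66.4},
    "Level 5": {"PAL (PoT)": 31.3, "ToRA": 46.3, "CR Agent": 52.2},
}


def margin_over_best_baseline(table, method="CR Agent"):
    """Points by which `method` beats the strongest other method in each row."""
    return {
        row: round(scores[method] - max(v for m, v in scores.items() if m != method), 1)
        for row, scores in table.items()
    }


for table in (BY_CATEGORY, BY_LEVEL):
    margins = margin_over_best_baseline(table)
    tightest = min(margins, key=margins.get)
    print(tightest, margins[tightest])
```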

CR Agent Assistant v0.1 based on Meta Prompting

See ./CR-Agent-Assistant/cr-agent-assistant-v0.1.md for a minimalist implementation based on the OpenAI Assistants API.

See https://chat.openai.com/g/g-L3a4ZCIHx-cr-agent-v0-1 for an online demo.

Meta Prompting (general definition): Meta Prompting is a prompting technique, inspired by type theory, that emphasizes the structure and syntax of examples rather than their detailed content. Instead of concrete worked examples, it presents the outline or framework of a problem or topic as a scaffold that can be filled in with specific details as needed. The technique is particularly useful when understanding the form and pattern of a problem or solution matters more than its specific content.
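To make the contrast with few-shot prompting concrete, here is a hypothetical meta prompt written as a Python string template. It fixes the structure of a solution (which steps appear, in what order, and how the answer is formatted) and contains no example problem or worked solution. The wording of the steps and the `build_prompt` helper are assumptions for illustration, not the prompt used in this repository.

```python
from string import Template

# Hypothetical meta prompt: it specifies the *shape* of a solution
# without any worked example content. Only $problem varies.
META_PROMPT = Template(
    "Problem: $problem\n"
    "\n"
    "Solution Structure:\n"
    "1. Restate the problem and list the known quantities.\n"
    "2. Propose intermediate propositions, each building on earlier ones.\n"
    "3. Verify each proposition before using it.\n"
    "4. State the final answer as \\boxed{answer}.\n"
)


def build_prompt(problem: str) -> str:
    """Fill the single content slot; the scaffold is identical for every problem."""
    return META_PROMPT.substitute(problem=problem)


print(build_prompt("Find all real x such that x^2 - 5x + 6 = 0."))
```

Every problem gets the same scaffold, which is the point of the technique: the model is shown the form of a good solution rather than the content of a particular one.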

Revisiting Game of 24

We have implemented the CR Agent using pure Meta Prompting so that the AI agent directly writes a Python program that solves the Game of 24 tasks and processes all n samples in a single response, n times faster than previous methods. Please see https://github.com/meta-prompting/meta-prompting for details.

MP-CR-Agent-XML v0.2: success rate 100%, time usage 0.08 s per sample.
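As an illustration of the kind of program an agent might write for this task, the sketch below solves a Game of 24 instance by exhaustive search: it repeatedly picks an ordered pair of numbers, combines them with +, -, *, or /, and recurses until a single value remains. This is a generic brute-force solver written for illustration; it is not the program generated by MP-CR-Agent-XML and does not reproduce the timing reported above. The names `solve_24` and `_search` are hypothetical. Exact arithmetic with `fractions.Fraction` avoids floating-point error in cases such as 8 / (3 - 8 / 3).

```python
from fractions import Fraction


def solve_24(numbers, target=24):
    """Return an expression using each number exactly once that equals target, or None."""
    # Each item pairs an exact value with the expression that produced it.
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    # Ordered pairs cover both a - b and b - a (and a / b, b / a).
    for i, (a, ea) in enumerate(items):
        for j, (b, eb) in enumerate(items):
            if i == j:
                continue
            rest = [item for k, item in enumerate(items) if k not in (i, j)]
            candidates = [
                (a + b, f"({ea} + {eb})"),
                (a - b, f"({ea} - {eb})"),
                (a * b, f"({ea} * {eb})"),
            ]
            if b != 0:
                candidates.append((a / b, f"({ea} / {eb})"))
            for value, expr in candidates:
                found = _search(rest + [(value, expr)], target)
                if found is not None:
                    return found
    return None


for puzzle in ([4, 9, 10, 13], [3, 3, 8, 8], [1, 1, 1, 1]):
    print(puzzle, "->", solve_24(puzzle))
```

Processing a whole batch of puzzles is then just a loop over the inputs inside one program, rather than a separate model interaction per puzzle.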

Acknowledgement

This repo is mainly based on Guidance, HuggingFace, Tree of Thoughts and ToRA. Thanks for their wonderful work!

Citations

Please cite the paper and star this repo if you use Cumulative Reasoning (CR) and find it interesting/useful, thanks! Feel free to contact [email protected] | [email protected] or open an issue if you have any questions.

@article{zhang2023cumulative,
  title={Cumulative Reasoning With Large Language Models},
  author={Zhang, Yifan and Yang, Jingqin and Yuan, Yang and Yao, Andrew Chi-Chih},
  journal={arXiv preprint arXiv:2308.04371},
  year={2023}
}
