This README provides instructions for setting up, training, and evaluating large language models (LLMs) specifically tailored for formal theorem proving using the Lean theorem prover. Follow these steps to replicate our development environment and experiment settings.
- Install the required dependencies via the provided `requirements.txt` file using the following command:

  ```
  pip install -r requirements.txt
  ```

  Alternatively, ensure you have compatible dependencies installed manually.
- Download the MUSTARDSauce dataset from the official repository and unzip it into the `data` directory as `data/MUSTARDSauce`.
- Download the MMA data from the official repository and place it in `data/MMA` (the expected layout is sketched after this list).
- Execute the `prep/process.py` script to process the datasets:

  ```
  python prep/process.py
  ```
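After these steps, the `data` directory should look roughly like the following (a sketch inferred from the paths above; the exact file names inside each folder depend on the dataset releases):

```
data/
├── MUSTARDSauce/   # unzipped MUSTARDSauce dataset
└── MMA/            # MMA data
```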
- We have trained Mistral 7B and CodeLLaMa 7B models, but you can fine-tune other models by modifying the `MODEL_NAME` parameter within the relevant Python scripts (a loading sketch follows this list).
- Run the script `train/train_mustard.py` to fine-tune the base models. Adjust the settings within this file to accommodate different experimental parameters.
- The script `train/train_mma.py` is specifically for the autoformalization task. Though not directly involved in our final experiments, it can be used for related tasks.
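To illustrate the `MODEL_NAME` switch mentioned above, here is a minimal sketch assuming a Hugging Face `transformers` setup; only the variable name comes from this README, and the checkpoint ID is just an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint; swap in CodeLLaMa 7B or any other causal LM ID here.
MODEL_NAME = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
```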
- Clone the miniF2F repository under the root directory as `minif2f`.
- Transfer the `eval/verify.py` script into the `minif2f/lean` directory.
- Set up the Lean environment according to the Lean specifications provided in the miniF2F repository.
- Modify the data fields (e.g., the path to the directory containing inference results) in the Python scripts contained in the `eval/` directory to reflect your environment.
- To conduct the evaluation, execute the corresponding Python scripts within `eval/methods` for the different prompting methods and local/OpenAI LLMs, then sequentially run `extract.py`, `verify.py`, and `get_results.py`:

  ```
  python eval/methods/<script_name>.py
  python eval/extract.py
  python eval/verify.py
  python eval/get_results.py
  ```
The `all_results.json` file containing the final results will be generated in the designated results directory.
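If you want to inspect the output programmatically, a minimal sketch follows; the file name comes from above, but its location and schema are not documented here, so the snippet only pretty-prints whatever it finds:

```python
import json

# Adjust the path to point at your designated results directory.
with open("results/all_results.json") as f:
    results = json.load(f)

# Pretty-print the structure for a quick look.
print(json.dumps(results, indent=2))
```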
Unless instructions specify otherwise, run all scripts from the root directory of the main repository.
`Lean3 Local Copilot` was developed as a side product of our project. You can find it in the editor's Marketplace; the source code is also provided in `lean3-local-copilot`.