
madaan / pie-perf


Training language models to make programs faster

Home Page: https://pie4perf.com

Languages: C++ 0.72%, Python 43.67%, Jupyter Notebook 55.61%
Topics: code-generation, code-optimization, llms, optimization, software-engineering

pie-perf's Introduction

Learning Performance-Improving Code Edits

  • Repository for Learning Performance-Improving Code Edits (paper, website).


Updates 📢

[May 2023] A large number of problem statements in CodeNet were in Japanese. We have translated them into English using ChatGPT/GPT-4. The files are located here.

Dataset

  • PIE is based on IBM CodeNet. Huge thanks to the authors of CodeNet for making their curated dataset available!

  • All trajectories (tsv) are located here. Column descriptions:

  • user_id: user id

  • problem_id: problem id. Details about the problems can be found in data/problem_list.csv

  • language: programming language

  • submission_id_v0: submission id of the first version of the code

  • submission_id_v1: submission id of the improved version of the code

  • cpu_time_v0: cpu time of the first version of the code

  • cpu_time_v1: cpu time of the second version of the code. cpu_time_v0 exceeds cpu_time_v1 by at least 1% for all pairs in the dataset. For pairs where the first version was TLE (Time Limit Exceeded), cpu_time_v0 is set to a high sentinel value (e.g. 1000000).

  • memory_v{0,1}: memory used by the two versions of the code. memory_v0 > memory_v1 can also be used to filter pairs.

  • status_v{0,1}: status of the code in the two versions. status_v0 can be Accepted or Time Limit Exceeded, but status_v1 is always Accepted.

  • improvement_frac: percentage improvement of the second version over the first. improvement_frac is always > 0.

Each file is a jsonl. An example record:

{
    "user_id": "u187233527",
    "problem_id": "p03317",
    "language": "python",
    "submission_id_v0": "s743350482",
    "submission_id_v1": "s961810347",
    "cpu_time_v0": 28.0,
    "cpu_time_v1": 17.0,
    "memory_v0": 3060.0,
    "memory_v1": 3060.0,
    "status_v0": "Accepted",
    "status_v1": "Accepted",
    "improvement_frac": 39.29,
    "input": "N, K = list(map(int, input().split()))\n\nN -= K\n\nans = 1\n\nwhile N > 0:\n\n  N -= K - 1\n\n  ans += 1\n\nprint(ans)",
    "target": "import math\n\n\n\nn, k = list(map(int, input().split()))\n\nprint((math.ceil((n - 1) / (k - 1))))",
    "code_v0_loc": 7.0,
    "code_v1_loc": 4.0,
    "code_v0_num_chars": 101,
    "code_v1_num_chars": 84,
    "code_v0_no_empty_lines": "N, K = list(map(int, input().split()))\nN -= K\nans = 1\nwhile N > 0:\n    N -= K - 1\n    ans += 1\nprint(ans)\n",
    "code_v1_no_empty_lines": "import math\n\nn, k = list(map(int, input().split()))\nprint((math.ceil((n - 1) / (k - 1))))\n",
    "code_same": false,
    "relative_loc_diff_percent": 42.8571428571,
    "diff": [
        "-N, K = list(map(int, input().split()))",
        "-N -= K",
        "-ans = 1",
        "-while N > 0:",
        "-    N -= K - 1",
        "-    ans += 1",
        "-print(ans)",
        "+import math",
        "+",
        "+n, k = list(map(int, input().split()))",
        "+print((math.ceil((n - 1) / (k - 1))))"
    ],
    "diff_only_import_comment": false,
    "measured_runtime_v0": 0.045435272,
    "measured_runtime_v1": 0.0459265449,
    "runtime_lift": 0.9893030722
}
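
As a quick illustration, here is a minimal, hedged sketch of loading such a file with pandas and inspecting the improvement fields. The filename pie_trajectories.jsonl is a placeholder for whichever trajectory file you downloaded, and the recomputation of improvement_frac is only checked against the example record above.

# Minimal sketch: load a PIE trajectory file and inspect the improvement fields.
# "pie_trajectories.jsonl" is a placeholder filename, not the actual release file name.
import pandas as pd

df = pd.read_json("pie_trajectories.jsonl", lines=True)

# improvement_frac is consistent with the relative cpu-time reduction, in percent;
# for the example record above: (28.0 - 17.0) / 28.0 * 100 ≈ 39.29.
df["recomputed_improvement"] = (df["cpu_time_v0"] - df["cpu_time_v1"]) / df["cpu_time_v0"] * 100

# Optionally keep only pairs that also do not regress memory usage.
memory_ok = df[df["memory_v0"] >= df["memory_v1"]]

print(df[["problem_id", "language", "cpu_time_v0", "cpu_time_v1", "improvement_frac"]].head())
print(f"{len(memory_ok)} of {len(df)} pairs do not regress memory")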

We use src/make_splits.py to create the dataset splits. The exact configuration for creating each split is specified in the corresponding folder.

Evaluating Your Method

  • Suppose you have a new method for code optimization, say awesome_optimization. We provide a sandbox for evaluating the generated code. The sandbox runs the input and the generated code over a set of test cases and reports the performance of both. To use it:
  1. Save the generations in a jsonl file with the following fields (a minimal writer sketch follows the field description below):
{
    "slow_code_col": "the column name for the input code",
    "model_generated_potentially_faster_code_col": "slow_code_col after applying awesome_optimization. This is the code that will be evaluated. You can also provide a list of different candidates here, and the evaluation will be done for each candidate",
}
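
For concreteness, here is a hedged sketch of writing such a generations file. The column names input and generated_answers mirror the yaml example further below; awesome_optimization, the sample slow program, and the output path are placeholders for your own method and files.

# Sketch: dump model generations in the jsonl layout expected by the evaluator.
# awesome_optimization() and the file paths are placeholders for your own setup.
import json

def awesome_optimization(slow_code: str) -> list[str]:
    # Return one or more candidate rewrites for the given slow program.
    return [slow_code]  # placeholder: no actual optimization performed

slow_programs = ["N, K = list(map(int, input().split()))\nprint(N + K)"]

with open("my_generations.jsonl", "w") as f:
    for slow_code in slow_programs:
        record = {
            "input": slow_code,                                    # slow_code_col
            "generated_answers": awesome_optimization(slow_code),  # candidate list
        }
        f.write(json.dumps(record) + "\n")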
  2. Next, we need to provide the path to a file with some metadata. We call it the reference_file, though providing references is optional. The main purpose of this file is to provide information such as the language of the code, the problem id, etc. The file should have slow_code_col (same as the generations file) and problem_id. We join the generations file and the references file on slow_code_col to get the problem id.

  3. We also need to provide the path to the actual test cases. We call it the inputs_outputs_basepath. This is a directory with the following structure:

inputs_outputs_basepath/{problem_id}/{inputs, outputs}.txt

where {inputs, outputs}.txt are the input and output files for the problem with id problem_id. The input and output are plain text files. Each program is fed inputs.txt and its output is compared with outputs.txt (a minimal sketch of this check is shown below).
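
To make the check concrete, here is a hedged sketch of the basic protocol for a single Python submission. The real sandbox in src/codenet_eval also handles timing, repeated trials, and other languages, so treat this only as an illustration; the problem id and candidate.py filename are assumptions.

# Illustration of the basic check: feed inputs.txt on stdin, compare stdout to outputs.txt.
# The problem id and "candidate.py" are placeholders for this sketch.
import subprocess
from pathlib import Path

problem_dir = Path("data/codenet/public_test_cases") / "p03317"
stdin_text = (problem_dir / "inputs.txt").read_text()
expected = (problem_dir / "outputs.txt").read_text()

result = subprocess.run(
    ["python3", "candidate.py"],  # the generated program under test
    input=stdin_text,
    capture_output=True,
    text=True,
    timeout=10,                   # roughly mirrors max_time_per_run in the yaml below
)

passed = result.stdout.strip() == expected.strip()
print("accepted" if passed else "wrong answer")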

  4. So far, we have discussed the generation file, the reference file, and the inputs/outputs directory. In addition to these, we need to provide some information about the run: specifically, the number of times each program should be run, the number of programs to evaluate, the timeout, and so on.

All of this information is wrapped in a yaml file. Here is an example:

model_generated_outputs_path: "data/sample/codex_greedy_outputs.jsonl"
inputs_outputs_basepath: "data/codenet/public_test_cases/"
reference_file_path: "data/sample/py_reference.jsonl"
output_report_file_path: "data/sample/codex_greedy_outputs.jsonl.report"
num_problems_to_evaluate: -1
num_trials: 25
ignore_first_k: 1
max_time_per_run: 10
temp_dir: null
model_generated_potentially_faster_code_col: "generated_answers"
slow_code_col: "input"
reference_code_col: "target"
is_prompt_based: true
cpu_number: 0

Please see src/codenet_eval/evalconfig.py for the full list of parameters and their descriptions.
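
As a quick sanity check before running the evaluation, the yaml can be loaded and inspected with PyYAML. This sketch does not reproduce the actual config class; the authoritative field list lives in src/codenet_eval/evalconfig.py.

# Sketch: load the evaluation yaml and print a few settings it will run with.
# The config path is the sample one used in the command below; adjust to your own file.
import yaml

with open("data/sample/sample_eval_config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["model_generated_outputs_path"])
print(cfg["num_trials"], "trials, ignoring the first", cfg["ignore_first_k"])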

  5. Finally, we can run the evaluation. We provide a script for this: src/codenet_eval/run_eval.py. The script takes the yaml file as input. Here is an example:
python src/codenet_eval/run_eval.py --eval_config data/sample/sample_eval_config.yaml

Citation

@article{madaan2023learning,
    title={Learning Performance-Improving Code Edits},
    author={Madaan, Aman and Shypula, Alexander and Alon, Uri and Hashemi, Milad and Ranganathan, Parthasarathy and Yang, Yiming and Neubig, Graham and Yazdanbakhsh, Amir},
    journal={arXiv preprint arXiv:2302.07867},
    year={2023}
}

pie-perf's People

Contributors

alexshypula, madaan, yazdanbakhsh


pie-perf's Issues

Question regarding the paper

Hello. Thank you so much for the great research!
I had a question while reading the paper, so I'm leaving it here.

I read section 5 "Analysis of Generated Code Edits" with great interest and was wondering if I could find more detailed information in the subsection "Comparing CODEX and CODEGEN", since it only shows the number of code edits that each model generated (i.e. optimized successfully). For example, I would like to see a list of problems that CODEX successfully optimized and the detailed %OPT/SpeedUp/%RTR for each problem.

Again, thank you so much for this great work.

Pretrained Model

Thank you for sharing the dataset and code!
I couldn't find the pretrained model trained with this data. Can you provide the CodeLlama 13B w/ FineTune used for reporting in Table 1 of the paper?
Also, is it possible to provide the synthetic data generated using GPT?
Thank you.
@madaan

input.*.txt files in public_test_cases do not start with 0

For the following problems with test cases in public_test_cases, the input.*.txt and output.*.txt files do not start with index 0:

p01875
p02069
p02067
p00754
p02068
p02072
p02871
p01895
p02074
p02224
p01660
p02197
p01516
p01589
p00000
p01779
p01581
p01588
p01969
p00685
p00683
p01590
p02064
p03978
p02857
p02226
p02076
p02071
p01664
p02227
p02070
p02077
p02592
p01584
p01515
p01972
p01585
p01582
p00696
p01918

This causes an error in the run_eval.py script, which expects the index to start at 0.

One solution is to simply rename these files.
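
If renaming is the route taken, a hedged sketch along these lines could shift the indices so they start at 0. It assumes the files follow the pattern input.<k>.txt / output.<k>.txt and that the smallest existing index should become 0; verify against the actual layout before running it.

# Sketch: re-index input.*.txt / output.*.txt so the smallest index becomes 0.
# Assumes the naming pattern input.<k>.txt / output.<k>.txt; check before use.
from pathlib import Path

def reindex(problem_dir: Path) -> None:
    for prefix in ("input", "output"):
        files = sorted(
            problem_dir.glob(f"{prefix}.*.txt"),
            key=lambda p: int(p.suffixes[0][1:]),  # ".3" -> 3
        )
        if not files:
            continue
        offset = int(files[0].suffixes[0][1:])  # how far the indices are shifted
        for path in files:
            old_idx = int(path.suffixes[0][1:])
            path.rename(problem_dir / f"{prefix}.{old_idx - offset}.txt")

reindex(Path("data/codenet/public_test_cases/p01875"))  # one of the affected problems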

Running run_eval.py

Thanks for script for evaluation on the CodeNet dataset!

I am trying to evaluate some predictions on the dataset with the command:

python3 src/codenet_eval/run_eval.py --eval_config eval_files/example_eval_config.yaml

In the output report file, all input_* and generated_answers_* columns are either null or 0. I tried to submit the file (generated by setting the temp_dir option of run_eval.py) to the AtCoder website, and it was able to compile and run.

In the attachment, I have uploaded one line of the jsonl files and the yaml config file. Thanks if you can take a look.

example_data.zip

How to post-process the generated code?

I checked the whole paper but failed to find any information about how you post-process the generated code.
Is there any post-processing or not?
Thanks for any reply.
