zorazrw / odex
[EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation
Home Page: https://code-eval.github.io
License: Creative Commons Attribution Share Alike 4.0 International
Hi,
Great work!
You are welcome to join OpenCompass to reach more users.
https://github.com/open-compass/opencompass
OpenCompass Team
The __init__.py in the metric folder imports compute_codebleu, but codebleu is not included in the repo. Could you provide the corresponding code?
It seems that the ODEX prompts fed to the model have trailing whitespace, which degrades the performance of models (CodeGen here) on the benchmark. Adding a strip to the prompt here would increase the performance. Here are some numbers:
python nl2code_codegen.py --language en --model_size 2B --model_data mono \
--num_tests_input 0 --num_tests_eval 100 --num_examples 0 --temperature 0.8 \
--top_p 0.95 --num_return_sequences 50
gives:
Overall Pass@K Scores:
[pass@1] 0.4137 (439)
[pass@2] 0.4662 (439)
[pass@3] 0.4920 (439)
[pass@4] 0.5078 (439)
[pass@5] 0.5188 (439)
[pass@6] 0.5270 (439)
[pass@7] 0.5335 (439)
[pass@8] 0.5387 (439)
[pass@9] 0.5431 (439)
[pass@10] 0.5467 (439)
as opposed to
"pass@1": 14.28,
"pass@2": 15.69,
"pass@5": 16.99,
"pass@10": 17.54
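without stripping (also the numbers reported in the paper). Note that the first set of scores is printed as fractions and the second as percentages, so pass@1 goes from 14.28% to 41.37%.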
(thanks @murthyrudra for running the code)
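The fix described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: `build_prompt` and the sample prompt text are hypothetical, and `rstrip()` is used here so that only the trailing side is touched (the issue itself just says "a strip"). A likely explanation for the gap is that a prompt ending in whitespace leaves the model at an unusual token boundary, but that is an assumption rather than something the issue confirms.

```python
def build_prompt(raw_prompt: str) -> str:
    """Return the prompt with trailing whitespace (spaces, tabs, newlines) removed."""
    # Hypothetical helper: the real prompt construction lives in the ODEX codebase.
    return raw_prompt.rstrip()


# Hypothetical prompt that ends with a newline and indentation.
raw = 'def f(l):\n    """move first 3 elements to the end of the list"""\n    '
clean = build_prompt(raw)

# The cleaned prompt now ends with the closing docstring quotes.
print(repr(clean[-3:]))
```

Applying the strip at the point where the prompt is handed to the tokenizer keeps the few-shot formatting code unchanged.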
Hi, when I run the CodeGen code, I get the following error.
Command:
python nl2code_codegen.py --language en --model_size 350M --model_data mono --output_dir codegen_350M
Error:
Traceback (most recent call last):
File "/home/rudra/odex/nl2code_codegen.py", line 203, in <module>
main()
File "/home/rudra/odex/nl2code_codegen.py", line 175, in main
scores_dict = evaluate(model, eval_dataloader, tokenizer, args)
File "/home/rudra/odex/nl2code_codegen.py", line 77, in evaluate
for i, batch_inputs in enumerate(dataloader):
File "/home/rudra/.cache/CGLLM/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
data = self._next_data()
File "/home/rudra/.cache/CGLLM/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
return self._process_data(data)
File "/home/rudra/.cache/CGLLM/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
data.reraise()
File "/home/rudra/.cache/CGLLM/lib/python3.9/site-packages/torch/_utils.py", line 543, in reraise
raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/rudra/.cache/CGLLM/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/rudra/.cache/CGLLM/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/rudra/.cache/CGLLM/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/rudra/odex/src/data.py", line 65, in __getitem__
prompt = create_fewshot_prompt_nl2code(
TypeError: create_fewshot_prompt_nl2code() got an unexpected keyword argument 'replace_function_name'
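The traceback shows a caller passing a keyword that the function definition does not accept. The sketch below reproduces that kind of mismatch and shows one way to diagnose it with `inspect.signature`. The function body and its parameters (`sample`, `examples`, `num_tests`) are hypothetical stand-ins; only the function name and the `replace_function_name` keyword come from the traceback. Dropping unsupported keywords is a diagnostic workaround under these assumptions; the likely proper fix is to bring the caller in `src/data.py` and the function definition back in sync.

```python
import inspect


def create_fewshot_prompt_nl2code(sample, examples, num_tests=0):
    """Hypothetical stand-in whose signature lacks `replace_function_name`."""
    return f"{len(examples)} examples, {num_tests} tests: {sample}"


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments that its signature does not accept."""
    params = inspect.signature(func).parameters
    # A function that declares **kwargs accepts every keyword as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(*args, **kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **supported)


# Reproduce the mismatch: the caller passes a keyword the function does not know.
error_message = ""
try:
    create_fewshot_prompt_nl2code("s", [], replace_function_name=True)
except TypeError as err:
    error_message = str(err)

# Same call routed through the filter: the unknown keyword is dropped.
prompt = call_with_supported_kwargs(
    create_fewshot_prompt_nl2code, "s", [], num_tests=1, replace_function_name=True
)
print(error_message)
print(prompt)
```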
I get this when using a codebase based on ODEX:
/Users/gneubig/work/gemini-benchmark/benchmarking/Code/verify.py:13: FutureWarning: load_metric is deprecated and will be removed in the next major version of datasets. Use 'evaluate.load' instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate
code_eval_metric = load_metric("code_eval")
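The warning itself names the replacement, `evaluate.load`. A minimal sketch of the migration is shown below, assuming the `evaluate` package is installed; the helper name `load_code_eval` is hypothetical, and the fallback branch simply keeps the old `load_metric` call for environments that only have `datasets`.

```python
def load_code_eval():
    """Load the code_eval metric, preferring the newer `evaluate` library."""
    try:
        # Replacement suggested by the FutureWarning.
        import evaluate
        return evaluate.load("code_eval")
    except ImportError:
        # Fallback (assumption): older setups where only `datasets` is available.
        from datasets import load_metric
        return load_metric("code_eval")
```

In `verify.py`, the line that triggers the warning would then become `code_eval_metric = load_code_eval()`.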
I have found several description errors and answer errors in your English data:
l
and move first 3 elements to the end of the list
Many other problems still have bugs (semantic ambiguity or an incorrect answer).
I hope you can revise your dataset carefully: it covers many diverse libraries, so it can have a large impact on code generation progress as a whole.