Retrieved Content about codet (open issue)

yiqingxyq commented on September 4, 2024
Retrieved Content

from codet.

Comments (7)

yiqingxyq commented on September 4, 2024

If you're interested, here's our implementation of GT retrieval (without filtering the unfinished part): https://github.com/code-rag-bench/code-rag-bench/tree/main?tab=readme-ov-file#retrieval.

zfj1998 commented on September 4, 2024

We are truly sorry the generated content cannot be restored for now. We would love to help you reproduce the results though.

yiqingxyq commented on September 4, 2024

Thanks for being willing to help! Just want to ask about the retrieval setting. When you do retrieval, do you filter out the file containing the code to complete?

If not, the model could retrieve the target of code generation as context, which does not make much sense to me: if you want to use a model to help you complete the code, then by the time you call the model, the target code does not exist in the repo yet.

If you do, is there an efficient way to do that?

zfj1998 commented on September 4, 2024

Of course, we need to filter out the target file to avoid leakage. However, we did not filter out all of its content: we keep the part at the front of the target file that is not covered by the context provided to the LM. For example, suppose file A has 100 lines, with lines 20-80 as the unfinished code and line 81 as the completion hole. During retrieval, we also retrieve lines 1-19 as useful supplementary information for the completion.

The relevant line of code is:

if metadata['end_line_no'] <= query_line['metadata']['context_start_lineno']:

context_start_lineno is a metadata field we stored for each completion case.
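
To make the rule above concrete, here is a minimal sketch of how such a filter might look. It is not the project's actual code: only `end_line_no` and `context_start_lineno` come from the line quoted above, while the `fpath` and `id` fields and the `filter_candidates` helper are hypothetical names introduced for illustration. It also assumes both line numbers use the same indexing convention, with `end_line_no` as an exclusive end.

```python
from typing import Any, Dict, List


def filter_candidates(
    candidates: List[Dict[str, Any]],
    query_line: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Keep candidates that cannot leak the unfinished code or the hole."""
    query_meta = query_line["metadata"]
    kept = []
    for cand in candidates:
        metadata = cand["metadata"]
        # 'fpath' is a hypothetical field naming the file a window came from.
        if metadata["fpath"] != query_meta["fpath"]:
            # Windows from other files are always eligible.
            kept.append(cand)
        elif metadata["end_line_no"] <= query_meta["context_start_lineno"]:
            # Same file, but the window ends before the in-file context starts.
            kept.append(cand)
    return kept


# Toy data: the in-file context of A.py starts at line index 19.
query_line = {"metadata": {"fpath": "A.py", "context_start_lineno": 19}}
candidates = [
    {"id": "A_head", "metadata": {"fpath": "A.py", "end_line_no": 19}},
    {"id": "A_mid", "metadata": {"fpath": "A.py", "end_line_no": 45}},
    {"id": "B_mid", "metadata": {"fpath": "B.py", "end_line_no": 90}},
]
kept = filter_candidates(candidates, query_line)
print([c["id"] for c in kept])  # prints ['A_head', 'B_mid']
```

Filtering after scoring, as sketched here, keeps the retrieval index shared across all completion cases in a repo; only the per-query candidate list is pruned.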

yiqingxyq commented on September 4, 2024

Thanks. The setting makes sense to me!

I retrieved the GT context for the "function" split by adapting your code (window_size=50, slice_size=5). Then I filtered out the unfinished part using your logic (line 44) and ran code generation using ChatGPT (2k tokens for GT context, 2k for infile context). I only got Pass@1=0.2895, whereas you reported 0.4263.

The full evaluation results are:

{
    "EM": 0.10723860589812333,
    "ES": 0.48067297674081083,
    "Pass@1": 0.289544235924933
}

Here's the GT context I got:
repoeval-function-4k-gt-top5-filter.jsonl.txt

Can you run your code to produce the GT context file for RepoEval-function, so I can compare the difference? Thanks!!!
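
For anyone comparing setups, here is a rough sketch of how the window settings mentioned above (window_size=50, slice_size=5) could translate into retrieval windows. The step rule is an assumption, not something confirmed in this thread: the sketch advances each window by `window_size // slice_size` lines. The `build_windows` helper and the `start_line_no` field are hypothetical; only `end_line_no` appears in the code quoted earlier. Please check it against the repository code before relying on it.

```python
from typing import Any, Dict, List


def build_windows(
    lines: List[str],
    window_size: int = 50,
    slice_size: int = 5,
) -> List[Dict[str, Any]]:
    """Cut a file into overlapping line windows (0-based, exclusive end)."""
    # Assumed step rule: advance by window_size // slice_size lines, at least 1.
    step = max(1, window_size // slice_size)
    windows = []
    for start in range(0, len(lines), step):
        end = min(start + window_size, len(lines))
        windows.append({
            "context": "\n".join(lines[start:end]),
            "metadata": {"start_line_no": start, "end_line_no": end},
        })
        if end == len(lines):
            # The last window already reaches the end of the file.
            break
    return windows


# A toy 100-line file yields windows starting at lines 0, 10, ..., 50.
toy_lines = [f"line {i}" for i in range(100)]
windows = build_windows(toy_lines)
print(len(windows))  # prints 6
```

If two runs use a different step or indexing convention, the retrieved windows and their `end_line_no` values will differ, which could change which candidates survive the filter.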

binwensun commented on September 4, 2024

I am also trying to get this project running. Cheers, and thanks for opening this issue, guys; it gives me hope that I can do it! Thanks!
