parasj / contracode

Contrastive Code Representation Learning: functionality-based JavaScript embeddings through self-supervised learning

Home Page: https://parasj.github.io/contracode/

License: Apache License 2.0

Python 13.89% JavaScript 1.34% Makefile 0.01% Jupyter Notebook 82.54% Shell 2.22%
compiler contrastive-learning deep-learning machine-learning momentum-contrast programming-language pytorch

contracode's Issues

What kind of GPU environment did you use to train the model?

I tried running both pretraining and fine-tuning on a single RTX 2080 Ti, but it takes a lot of time. What training environment did you use?
I would appreciate it if you could tell me the specs and number of GPUs you used for model pretraining and fine-tuning.

Asking for help with the codeclone dataset

Great work! I need some help with your codeclone dataset, and I would really appreciate it if you could spend a little time helping me figure it out. I downloaded it with "scripts/download_data.py" from your repo (codeclone/full_data.json.gz), but I do not know whether it is the dataset used in "4.1 Evaluating Functionality and Robustness: Zero-shot Code Clone Detection" in your paper. I also see the "split" function in "representjs/clone_detection.py", so I'm confused. Are the 2065 pairs you mention in Section 4.1 of your paper from the same dataset, and how can I get them?
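A minimal sketch for a first look at the downloaded file, assuming only that `codeclone/full_data.json.gz` is gzip-compressed JSON. Its exact schema, and whether it corresponds to the 2065 pairs in Section 4.1, is not confirmed here. `load_json_gz` is a hypothetical helper, not part of the repository:

```python
import gzip
import json
from typing import Any


def load_json_gz(path: str) -> Any:
    """Load a gzip-compressed JSON file.

    Assumption: the file layout is not documented in this thread, so this
    tries a single JSON document first and falls back to JSON Lines
    (one JSON value per line) if that fails.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fallback: treat the file as JSON Lines, skipping blank lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
```

For example, `data = load_json_gz("codeclone/full_data.json.gz")` followed by `print(type(data), len(data))` shows the top-level structure; adjust the path to wherever the download script saved the file.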

Python functions extension

@parasj Is your code applicable to Python functions? More precisely, can the automated source-to-source compiler transformations be used for Python, beyond JavaScript?
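ContraCode's transformations operate on JavaScript, and nothing in this thread confirms a Python port. As a hypothetical illustration only, Python's standard `ast` module can express one kind of semantics-preserving source-to-source transform: renaming a function's parameters and local variables. `RenameLocals` and `rename_locals` are names introduced here, not part of the repository:

```python
import ast


class RenameLocals(ast.NodeTransformer):
    """Rename each function's parameters and assigned locals to v0, v1, ...

    Hypothetical sketch: it ignores global/nonlocal declarations, closures,
    decorators, default-argument expressions, and collisions with existing
    names (e.g. a global called v0), so it is not safe for arbitrary code.
    """

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        mapping = {}
        # Parameters first, so their new names follow argument order.
        for arg in node.args.args:
            mapping.setdefault(arg.arg, f"v{len(mapping)}")
        # Then every name the function assigns to.
        for child in ast.walk(node):
            if isinstance(child, ast.Name) and isinstance(child.ctx, ast.Store):
                mapping.setdefault(child.id, f"v{len(mapping)}")
        # Apply the mapping to the signature and to every use in the body.
        for arg in node.args.args:
            arg.arg = mapping[arg.arg]
        for child in ast.walk(node):
            if isinstance(child, ast.Name) and child.id in mapping:
                child.id = mapping[child.id]
        return node


def rename_locals(source: str) -> str:
    """Parse Python source, rename locals, and print it back (Python 3.9+)."""
    tree = RenameLocals().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


source = """
def area(width, height):
    result = width * height
    return result
"""
print(rename_locals(source))
```

This prints a version of `area` whose parameters and local variable are renamed to `v0`, `v1` and `v2` while computing the same result. A real augmentation pipeline would need proper scope analysis and many more transform types.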

How to generate augmented JS files

Hi Parasj, thanks so much for your great work and for releasing the code. I noticed that in the pretraining step, the model is trained on the javascript_augmented.pickle.gz file. Could we generate augmented versions of our own JS files? If so, how? Thanks for your response and guidance. Best regards.

Proper PyTorch version

First of all, thanks for sharing the impressive work!

We were trying to get fine-tuning for the downstream task working, but got the following error:

torch.nn.modules.module.ModuleAttributeError: 'DataParallel' object has no attribute 'encoder'

It's quite likely that we were using the wrong version of PyTorch (or of other dependencies).

Would you kindly share the proper versions of the dependencies required to run the project?

Cheers
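`nn.DataParallel` keeps the wrapped network in its `.module` attribute and does not forward arbitrary attribute lookups such as `.encoder` to it, which matches the error above. Independent of the exact PyTorch version, a common workaround is to unwrap the model before accessing its submodules. A minimal sketch, where `unwrap_model` is a hypothetical helper and not part of the repository:

```python
from typing import Any


def unwrap_model(model: Any) -> Any:
    """Return the network inside a DataParallel-style wrapper.

    torch.nn.DataParallel and DistributedDataParallel store the wrapped
    network in `.module`; anything else is returned unchanged. Caveat: this
    duck-typed check would also unwrap an unrelated object that happens to
    define a `module` attribute.
    """
    return model.module if hasattr(model, "module") else model


# Hypothetical usage inside the fine-tuning code, assuming `model` is an
# nn.DataParallel around a network that defines `.encoder`:
#     encoder = unwrap_model(model).encoder
```

Because the helper is a no-op on unwrapped models, the same code path works whether or not the model is wrapped for multi-GPU training.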

data.zip

I failed to decompress the data.zip file from the cloud drive. Is there any solution?
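Before re-downloading, it can help to check whether the archive itself is damaged. A minimal sketch using Python's standard `zipfile` module; `check_zip` is a hypothetical helper, not part of the repository:

```python
import zipfile


def check_zip(path: str) -> str:
    """Return a short diagnosis of a downloaded zip archive."""
    if not zipfile.is_zipfile(path):
        # No zip end-of-central-directory record found: often a truncated
        # download or an HTML page saved under the .zip name.
        return "not a zip file"
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the name of the first
            # one that fails to read (e.g. a CRC mismatch), or None.
            bad = zf.testzip()
    except zipfile.BadZipFile as exc:
        return f"unreadable archive: {exc}"
    except NotImplementedError as exc:
        # Raised for compression methods zipfile cannot decode.
        return f"unsupported compression: {exc}"
    return "ok" if bad is None else f"corrupt member: {bad}"
```

For example, `print(check_zip("data.zip"))` run next to the downloaded file. If it reports `ok`, extracting with `zipfile.ZipFile("data.zip").extractall()` is one alternative to an external unzip tool.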

Memory explosion when pretraining the bidirectional LSTM

Hi,

Thanks for the wonderful work. May I ask a question? When I pretrain the LSTM model with the default settings, the memory overflows. My server has 180 GB of RAM, so how much RAM is needed for pretraining?

Thanks and best regards.

Cannot obtain the checkpoint

I followed the README to

Download the data subfolder from [this Google Drive link](https://drive.google.com/drive/folders/153pZfKPcr1-l8VaDPys29b1ElGLuoq3M?usp=sharing) and place at the root of the repository. This folder contains training and evaluation data, vocabularies and model checkpoints.

However, I cannot find the checkpoints, only data.zip.

How can I obtain the checkpoint?

Thanks

Memory requirements for ContraCode

Hi Parasj, thanks for publishing your code. I want to ask about the specifications of the equipment used in your experiments. I found that 16 GB of memory is not enough when using the javascript_augmented.pickle.gz file.

Originally posted by @QZH-eng in #6
