
hoppity's Introduction

Hoppity is a learning-based approach to detecting and fixing bugs in JavaScript programs.

Hoppity is trained on a dataset of (buggy, fixed) pairs from GitHub commits. Refer to our gh-crawler repo for scripts to download and generate a dataset. Alternatively, we provide a cooked dataset of pairs with a single AST difference (https://drive.google.com/file/d/1kEJBCH1weMioTcnmG6fmqz6VP-9KjH7x/view?usp=sharing) and the ZeroOneDiff cooked dataset (https://drive.google.com/file/d/1AHxXQhS2UVKOxNUfuetrM-uVKHjpjSOs/view?usp=sharing).

We also provide trained models for each of the following datasets:
  • One Diff Model

  • Zero and One Diff Model

  • Zero, One, and Two Diff Model

INSTALL

  • Install Python packages:

    pip install torch==1.3.1
    pip install numpy
    pip install -r requirements.txt

  • Install other dependencies:

    cd deps/torchext
    pip install -e .

  • Install the current package:

    hoppity$ pip install -e .

  • Install JS packages:

    npm install shift-parser
    npm install ts-morph
    npm install shift-spec-consumer

Data Preprocessing

If you would like to use your own dataset, you will need to "cook" the folder as a preprocessing step to generate graphs. Use the data_process/run_build_main.sh script to create the cooked dataset, setting the variables data_root, data_name, and ast_fmt accordingly.

This builds the AST in our graph format for each file and saves it in a .pkl file. Additionally, it creates a graph edit text file for each (buggy, fixed) pair of JS ASTs; this file is in JSON format, with each edit stored as an object in a list.
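
For orientation, here is a minimal sketch of loading one cooked sample in Python. The _buggy.pkl/_gedit.txt suffixes follow the convention described above; the internal structure of the edit objects is an assumption for illustration, not the project's actual schema:

    import json
    import pickle

    # Minimal sketch: inspect one cooked (buggy, fixed) sample.
    # The file suffixes follow the cooked-dataset naming convention;
    # treat the contents of the edit objects as unspecified here.
    with open("sample_buggy.pkl", "rb") as f:
        buggy_ast = pickle.load(f)   # AST in the graph format

    with open("sample_gedit.txt") as f:
        edits = json.load(f)         # a JSON list; each edit is one object

    print(len(edits), "graph edit(s) map the buggy AST to the fixed one")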

Data Split - Train, Validate, and Test

If you're using the cooked dataset we provided, this portion is already done for you. Once you've downloaded the compressed file, extract it by running tar xzf cooked-one-diff.gz. If you do not specify an output directory, the files will be placed in ~/cooked-full-fmt-shift_node/ by default. Extraction will take around an hour. Once the files are extracted, you can move on to the next step to begin training.

Otherwise, run data_process/run_split.sh to partition your cooked dataset. The raw JavaScript source files are needed for this script to filter out duplicates; set the raw_src variable in the script accordingly.

run_split.sh calls split_train_test.py to load triples from the save_dir and partition them according to the percentage arguments specified in config.py. The default split is 80% train, 10% validation, and 10% test. It saves three files, train.txt, val.txt, and test.txt, in the save_dir with the cooked data; each sample name in the cooked dataset is written to one of the three files.
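
Conceptually, the split amounts to the sketch below (a simplification: the real split_train_test.py also filters duplicates using the raw JavaScript sources and takes its percentages from config.py):

    import os
    import random

    def split_samples(sample_names, save_dir, train_pct=0.8, val_pct=0.1, seed=0):
        # Shuffle once, cut into 80/10/10, and write one sample name per line.
        random.Random(seed).shuffle(sample_names)
        n_train = int(len(sample_names) * train_pct)
        n_val = int(len(sample_names) * val_pct)
        splits = {
            "train.txt": sample_names[:n_train],
            "val.txt": sample_names[n_train:n_train + n_val],
            "test.txt": sample_names[n_train + n_val:],
        }
        for fname, names in splits.items():
            with open(os.path.join(save_dir, fname), "w") as f:
                f.write("\n".join(names) + "\n")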

Training

Now, run run_main.sh to train on our pre-processed dataset. Set the variables in the script accordingly; hyperparameters can be changed in common/config.py. Training runs indefinitely, so kill the script manually when you want to end it.

Finding the Best Model

To find the "Best Model", we've provided a script that evaluates each epoch's model dump on the validation set. Run find_best_model.sh to start the evaluation, setting the variables accordingly. The loss of each epoch's model will be recorded in the LOSS_FILE.
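
The selection step itself is just an arg-min over the recorded losses. A sketch, assuming one "epoch loss" pair per line in the loss file (an assumption about the format; the script may write it differently):

    def best_epoch(loss_file):
        # Return the epoch whose model had the smallest validation loss.
        best, best_loss = None, float("inf")
        with open(loss_file) as f:
            for line in f:
                epoch, loss = line.split()
                if float(loss) < best_loss:
                    best, best_loss = int(epoch), float(loss)
        return best, best_loss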

Evaluation

We provide an evaluation script that can evaluate a particular model on a number of metrics:

  • Total End-to-End Accuracy - A sample is considered accurate if the model detects the bug and predicts the entire given fix.
  • Location Accuracy - Bug detection accuracy.
  • Operator Accuracy - Since there are only 4 operators (ADD, REMOVE, REPLACE_VAL, REPLACE_TYPE), we always report top-1 accuracy.
  • Value Accuracy - If the sample is a REPLACE_VAL or ADD, it is considered accurate if the value is predicted correctly. We also include an UNKNOWN value for literals not included in the vocabulary: if the model predicts UNKNOWN for a value not in the vocabulary, it is considered correct (see the sketch below).
  • Type Accuracy - If the sample is a REPLACE_TYPE or ADD, it is considered accurate if the node type is predicted correctly.

We also include an option for accuracy breakdown per operation type. Lastly, if you would like an exhaustive evaluation of all metrics, we provide the output_all option.
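
To make the UNKNOWN rule for value accuracy concrete, here is the decision it describes as a sketch (not the repository's evaluation code; the names pred, truth, and vocab are hypothetical):

    def value_correct(pred, truth, vocab):
        # An exact match counts; predicting UNKNOWN also counts when the
        # true value really is outside the vocabulary.
        if pred == "UNKNOWN":
            return truth not in vocab
        return pred == truth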

hoppity's People

Contributors

elizabethdinella, hanjun-dai


hoppity's Issues

Confusion about the *refs.npy and *refs2.npy files in the one diff dataset

Hi,

I have downloaded the one diff dataset, extracted it using tar xzf cooked-one-diff.gz, and am now trying to parse it.

AFAIU, you use this generator with file_suffix=['_buggy.pkl', '_fixed.pkl', '_gedit.txt', '_refs.npy'] (e.g., here) to generate the DataSample. However, I noticed that there are also *refs2.npy files in the cooked-full-fmt-shift_node directory extracted from cooked-one-diff.gz. What is the difference between the *refs.npy and *refs2.npy files?

Error on training phase

Hi and thank you for the work! I tried the tool on Google Colaboratory with the cooked dataset https://drive.google.com/file/d/1AHxXQhS2UVKOxNUfuetrM-uVKHjpjSOs/view?usp=sharing and checkpoint https://drive.google.com/file/d/1xAnJwPEd1DzsxHW2Z_SLZikgiUwS6_zW/view?usp=sharing, loaded through torch.

After unzipping the cooked dataset and loading the model checkpoint, the run_main.sh script (with the variables cooked_root, data_name, and save_dir modified accordingly) was launched for the training phase. After half an hour of training, this error appears:

ValueError: MessagePassing.propagate only supports torch.LongTensor of shape [2, num_messages] or torch_sparse.SparseTensor for argument edge_index.
0% 0/100 [00:04<?, ?it/s]

Any suggestions about it?

How to produce ast edits or patches?

We were able to run the training and the inference (i.e. evaluation) scripts. However, we can only find the accuracy metrics. Is it also possible to get the AST edit operations in the tree?

./eval.sh
use cpu
loading HOPPITY from /Users/zhutao/lab/hoppity
Namespace(act_func='tanh', ast_fmt='shift_node', att_type='inner_prod', batch_size=10, beam_agg=False, beam_size=3, comp_method='mlp', data_in_mem=False, data_name='contextmltttttzzz', data_root='/Users/zhutao/lab/data/small_astPKL', dataset_stats=False, dropbox=None, dropout=0, end_epoch=10000, eval_dump_folder='~/eval_dump/', gnn_msg_dim=128, gnn_out='last', gnn_type='s2v_multi', grad_clip=5, hinge_loss_type='sum', init_model_dump=None, iters_per_val=100, lang_dict='None', latent_dim=128, learning_rate=0.001, loc_acc=True, loc_given=False, loss_file='loss.txt', max_ast_nodes=500, max_lv=4, max_modify_steps=1, max_token_len=100, min_lr=1e-06, mlp_hidden=256, msg_agg_type='sum', neg_samples=1, num_cores=4, num_epochs=10000, op_acc=False, op_breakdown=False, op_given=False, output_all=True, penalize_unknown=False, phase=None, rand=False, raw_srcs=None, readout_agg_type='sum', resampling=True, rnn_cell='gru', rnn_layers=2, sample_list=None, save_dir='/Users/zhutao/lab/data/small_trainingResult', seed=19260817, sibling_acc=False, sqr_data=None, start_epoch=0, target_model='/Users/zhutao/lab/data/small_trainingResult/epoch-0.ckpt', test_pct=0.1, topk=3, train_pct=0.8, type_acc=False, val_acc=True, val_pct=0.1, vocab_type='fixes')
====== begin of s2v configuration ======
| msg_average = 0
======   end of s2v configuration ======
not loading cuda jagged ops
not loading cuda metric ops
loading cooked asts and edits
51it [00:00, 3214.80it/s]
51 samples loaded.
loading vocab from /Users/zhutao/lab/data/small_astPKL/type_vocab.pkl
256 types of nodes in total.
train set has 41 samples
val set has 5 samples
test set has 5 samples
loading /Users/zhutao/lab/data/small_trainingResult/epoch-0.ckpt
Beam agg False
0it [00:00, ?it/s]----- replace_val
1it [00:03,  3.33s/it]
total accuracy 0.0000
location accuracy 0.0000
value accuracy 0.2000
number of unique edits 0 total samples 5
number of unknowns 0

How to convert a cooked sample back to code?

Hi, is it possible to convert a pre-processed sample file (.pkl) back to code?

I had a look at issue #12 and used depickle.py to print AST nodes. Can I use these outputs to convert back to the code?

(screenshot of depickle.py output)

What is comp_method used for?

The only reference to comp_method in the code is here, which looks like it predicts the node location at which to perform the edit.
Can you elaborate on the differences between the four options (i.e., inner_prod, mlp, bilinear, multilin) for comp_method?

ts-morph error on hoppity dataset preprocess

Hi,
I tried running the hoppity model with a dataset that I retrieved from the gh-crawler repo, and while running the ./run_build_dataset.sh script I'm getting the errors below. I have attached the dataset with the AST diffs generated; I was running the model with about 17k sample datapoints.

Here is a snippet of the error logs. It is the same error each time, related to the ts-morph lib:

JSON ERROR Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/02-07-2019:16_2464_0sketch_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/02-07-2019:16_2464_0sketch_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
{"9":[],"16":[12],"26":[7],"29":[],"36":[32],"46":[],"51":[77],"54":[7],"64":[7],"65":[],"70":[66],"81":[77],"85":[],"96":[],"99":[85],"100":[],"107":[],"111":[105],"114":[94],"119":[105],"129":[],"133":[127],"138":[94],"141":[105],"148":[127],"158":[],"163":[85],"165":[127],"166":[],"167":[],"174":[],"179":[85],"182":[127],"185":[],"186":[],"191":[85],"193":[127],"197":[],"200":[127],"212":[],"215":[85],"217":[127],"222":[85],"224":[127],"227":[85],"230":[127],"237":[85],"240":[127],"244":[210]}

# valid: 727: : 729it [07:33,  1.06s/it]JSON ERROR Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/01-08-2019:13_1380_2Tweet_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/01-08-2019:13_1380_2Tweet_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/01-08-2019:13_1380_2Tweet_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/01-08-2019:13_1380_2Tweet_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/01-08-2019:13_1380_2Tweet_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/01-08-2019:13_1380_2Tweet_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/01-08-2019:13_1380_2Tweet_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/01-08-2019:13_1380_2Tweet_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
{"7":[],"8":[8],"17":[],"26":[24],"29":[17],"30":[],"41":[],"51":[41],"54":[24],"55":[61],"60":[24],"61":[61],"65":[17],"66":[],"71":[],"75":[],"79":[],"83":[17],"84":[],"85":[],"87":[],"91":[],"98":[24],"99":[99],"105":[],"109":[],"118":[109],"119":[],"129":[],"133":[],"142":[133],"143":[],"154":[17],"155":[],"159":[24]}

JSON ERROR Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/02-07-2019:18_5214_0classes_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/02-07-2019:18_5214_0classes_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/02-07-2019:18_5214_0classes_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/02-07-2019:18_5214_0classes_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/02-07-2019:18_5214_0classes_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/02-07-2019:18_5214_0classes_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
{"5":[3],"17":[],"26":[26],"28":[17],"31":[29],"33":[3],"44":[],"46":[],"56":[44],"61":[61],"63":[46],"66":[64],"68":[29],"79":[],"81":[],"83":[],"93":[79],"98":[98],"100":[81],"105":[105],"107":[83],"113":[],"124":[98],"128":[105],"134":[],"146":[98],"150":[105],"155":[153],"157":[64],"168":[],"170":[],"180":[168],"182":[170],"184":[170],"190":[],"201":[98],"209":[],"220":[98],"225":[223],"227":[3],"238":[],"240":[],"250":[238],"255":[255],"257":[240],"263":[],"280":[255],"288":[],"307":[255],"310":[308],"316":[316],"323":[355],"327":[327],"334":[362],"344":[],"346":[],"355":[355],"357":[344],"362":[362],"364":[346],"370":[],"374":[],"376":[],"384":[],"389":[316],"392":[374],"396":[327],"399":[376],"405":[],"409":[],"426":[316],"430":[409],"431":[],"439":[327],"443":[409],"444":[],"453":[],"458":[3],"461":[29],"464":[64],"467":[153],"470":[223],"473":[308]}

# valid: 730: : 732it [07:34,  1.54it/s]JSON ERROR Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/02-04-2019:08_779_0index_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/02-04-2019:08_779_0index_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
{"9":[],"12":[],"21":[],"24":[7],"32":[],"37":[],"38":[],"39":[],"46":[19],"47":[],"55":[],"57":[],"62":[57],"63":[],"70":[19],"71":[],"79":[],"81":[],"86":[81],"87":[],"94":[19],"95":[],"98":[30],"116":[30]}

JSON ERROR Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
Error in file /mnt/volume1/ubc-works/hoppity-data/ml_astJSON/03-11-2019:12_3621_2main_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
{"24":[],"36":[],"40":[22],"41":[],"51":[],"58":[22],"59":[],"61":[49,49],"66":[49,49],"67":[67],"80":[22],"81":[],"83":[34],"86":[49,49],"93":[],"96":[22],"97":[],"103":[91,91],"114":[],"123":[91,91],"126":[],"147":[],"150":[22],"151":[],"157":[145,145],"168":[],"177":[145,145],"180":[],"201":[],"204":[22],"205":[],"211":[199,199],"222":[],"231":[199,199],"234":[],"255":[],"258":[22],"259":[],"265":[253,253],"276":[],"285":[253,253],"288":[],"309":[],"313":[253,253],"314":[],"319":[91,91],"320":[],"325":[],"328":[253,253],"329":[],"331":[],"334":[199,199],"335":[],"337":[],"340":[253,253],"341":[],"350":[307],"378":[],"382":[145,145],"383":[],"388":[307],"389":[],"396":[],"400":[145,145],"401":[],"407":[376],"415":[394],"417":[],"418":[],"426":[],"438":[442],"444":[],"445":[],"452":[],"456":[],"457":[457],"459":[49,49],"464":[],"469":[],"472":[],"474":[49,49],"487":[]}

Error in evaluation

Hi there,

I ran into a few problems when running the Finding the Best Model section.

  1. In the README, it says

Run find_best_model.sh to start the evaluation

Did you mean find_no_op_model.sh?

  2. In find_best_model.py, the line loading the vocabulary tries to load vocab_fixes.npy/vocab_full.npy. However, this file doesn't exist, since running run_build_dataset.sh only saves the vocabulary to a file named vocab.npy (code). Could you elaborate on what the fixes and full vocabs are and what this vocab.npy should be?

  3. If I use vocab.npy to fix the missing-file problem above, an error is raised when loading the trained model (*.ckpt). The error message is:

Traceback (most recent call last):
  File "find_best_model.py", line 57, in <module>
    model.load_state_dict(torch.load(_dir))
  File "/Users/zhutao/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for GraphTrans:
Missing key(s) in state_dict: "bilin.weight", "bilin.bias", "ops.1.bilin.weight", "ops.1.bilin.bias", "ops.2.bilin.weight", "ops.2.bilin.bias", "ops.3.bilin.weight", "ops.3.bilin.bias", "ops.4.bilin.weight", "ops.4.bilin.bias".

Does anyone know the reason?

Problem with data loading

Thanks for your meaningful work! I have a question about data loading: why does loading cooked ASTs and edits (code below) take so much time (4 hours on my computer)?
Hope for your reply!

for file_tuple in tqdm(cooked_gen):
    f_bug, f_fixed, f_diff, b_refs = file_tuple
    sample_name = get_bug_prefix(f_bug)
    if phases is not None and sample_name not in avail_set:
        continue
    if any([not os.path.isfile(f) for f in file_tuple]):
        continue
    sample = DataSample(fidx, f_bug, f_fixed, f_diff, b_refs)
    if self.resampling or sample_types is not None:
        s_type = sample.g_edits[0].op
        if sample_types is not None and not s_type in sample_types:
            continue
        self.sample_edit_type[fidx] = s_type    # edit type
    self.data_samples.append(sample)
    self.sample_index[sample_name] = fidx
    fidx += 1
assert len(self.data_samples) == fidx

No vocab files included?

Hi Dear Author, this is a clear and excellent implementation of your good work, which I really enjoyed reading. One issue when I run the code: there is no "vocab_fixed.npy" or "vocab_full.npy", even though I directly use your provided dataset named "cooked-one-diff". Can you suggest where I could download them? @Hanjun-Dai @elizabethdinella

How to use dataset

Hello,

  1. Could you guide me on how to train on the "cooked dataset of pairs with a single AST difference" that you provided (i.e., what's the command to run the training on the cooked dataset)?

  2. Also, in the readme, you said "run run_main.sh to train on our pre-processed dataset", but there seems to be no run_main.sh file in your codebase.

Thanks,
Tao

evaluation script throws error

When I run the eval.sh script, it throws the following error.

Traceback (most recent call last):
  File "eval.py", line 46, in <module>
    model.load_state_dict(torch.load(cmd_args.target_model, map_location=torch.device('cpu')))
  File "/Users/adutta/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 829, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GraphTrans:
Unexpected key(s) in state_dict: "pred.main.0.weight", "pred.main.0.bias", "pred.main.2.weight", "pred.main.2.bias", "pred.main.4.weight", "pred.main.4.bias", "ops.1.pred.main.0.weight", "ops.1.pred.main.0.bias", "ops.1.pred.main.2.weight", "ops.1.pred.main.2.bias", "ops.1.pred.main.4.weight", "ops.1.pred.main.4.bias", "ops.2.pred.main.0.weight", "ops.2.pred.main.0.bias", "ops.2.pred.main.2.weight", "ops.2.pred.main.2.bias", "ops.2.pred.main.4.weight", "ops.2.pred.main.4.bias", "ops.3.pred.main.0.weight", "ops.3.pred.main.0.bias", "ops.3.pred.main.2.weight", "ops.3.pred.main.2.bias", "ops.3.pred.main.4.weight", "ops.3.pred.main.4.bias", "ops.4.pred.main.0.weight", "ops.4.pred.main.0.bias", "ops.4.pred.main.2.weight", "ops.4.pred.main.2.bias", "ops.4.pred.main.4.weight", "ops.4.pred.main.4.bias".

Is there something I'm doing wrong?

Cannot Reproduce OneDiff results

Hi, I ran the provided code with the cooked OneDiff dataset and the result does not match the paper (total accuracy is 0.261 in the paper, but I got around 0.25x). I ran for 10000 epochs and changed comp_method in the provided code to "bilinear", because the provided model may use bilinear. Could you please advise?

run_main.sh
(screenshot of run_main.sh settings)

find_best_model.sh
(screenshot of find_best_model.sh settings)

eval.sh
(screenshot of eval.sh settings)

Purpose of find_all_refs.js and using own dataset

Hi!

I'm running into some issues when trying to use my own dataset of buggy and fixed JavaScript files with a one-line difference. Can you explain the necessity of using ts-morph in the preprocessing step (the find_all_refs.js file)? Is there something I need to keep in mind about the type of dataset to use? I keep running into the error below for many of the files and am having to discard them from the dataset (even though they can be parsed by shift-ast):

Error in file /ml_astJSON/less.js_91fd4c38a3ef30f31598831e64248b721ade4a4f_index_buggy.js --- TypeError: Cannot read property '_doActionPreNextModification' of undefined
{"5":[],"10":[],"15":[],"20":[],"25":[],"30":[],"35":[],"40":[],"45":[],"50":[],"55":[],"60":[],"65":[],"70":[],"75":[],"80":[],"85":[],"91":[],"95":[],"100":[],"109":[],"111":[],"119":[109],"122":[15],"125":[109],"127":[111],"134":[],"137":[50],"140":[109],"147":[],"150":[55],"153":[132],"155":[109],"162":[],"165":[60],"168":[145],"175":[],"178":[65],"181":[109],"188":[],"191":[70],"194":[109],"196":[160],"198":[173],"205":[],"208":[75],"211":[109],"213":[160],"215":[173],"222":[],"225":[40],"228":[109],"235":[],"239":[],"247":[5],"250":[10],"253":[15],"256":[20],"259":[25],"262":[109],"265":[30],"268":[35],"271":[220],"274":[45],"277":[132],"280":[145],"283":[160],"286":[173],"289":[186],"292":[203],"295":[80],"298":[85],"301":[],"304":[95],"307":[100],"314":[],"320":[],"329":[],"336":[320],"340":[],"347":[],"355":[],"363":[233],"370":[],"374":[233],"375":[250],"382":[347],"386":[233],"387":[250],"389":[370],"395":[347],"405":[353],"409":[370],"414":[312],"417":[347],"425":[353],"427":[370],"441":[],"444":[347],"453":[353],"455":[370],"459":[441],"464":[312],"468":[347],"470":[441],"473":[353]}

Here's a sample file where I'm getting this error:

import data from './data';
import tree from './tree';
import Environment from './environment/environment';
import AbstractFileManager from './environment/abstract-file-manager';
import AbstractPluginLoader from './environment/abstract-plugin-loader';
import visitors from './visitors';
import Parser from './parser/parser';
import Functions from './functions';
import contexts from './contexts';
import sourceMapOutput from './source-map-output';
import sourceMapBuilder from './source-map-builder';
import parseTree from './parse-tree';
import importManager from './import-manager';
import Render from './render';
import Parse from './parse';
import LessError from './less-error';
import transformTree from './transform-tree';
import * as utils from './utils';
import PluginManager from './plugin-manager';
import logger from './logger';

export default (environment, fileManagers) => {
    /**
     * @todo
     * This original code could be improved quite a bit.
     * Many classes / modules currently add side-effects / mutations to passed in objects,
     * which makes it hard to refactor and reason about. 
     */
    environment = new Environment(environment, fileManagers);

    const SourceMapOutput = sourceMapOutput(environment);
    const SourceMapBuilder = sourceMapBuilder(SourceMapOutput, environment);
    const ParseTree = parseTree(SourceMapBuilder);
    const ImportManager = importManager(environment);
    const render = Render(environment, ParseTree, ImportManager);
    const parse = Parse(environment, ParseTree, ImportManager);
    const functions = Functions(environment);

    /**
     * @todo
     * This root properties / methods need to be organized.
     * It's not clear what should / must be public and why.
     */
    const initial = {
        version: [3, 10, 2],
        data,
        tree,
        Environment,
        AbstractFileManager,
        AbstractPluginLoader,
        environment,
        visitors,
        Parser,
        functions,
        contexts,
        SourceMapOutput,
        SourceMapBuilder,
        ParseTree,
        ImportManager,
        render,
        parse,
        LessError,
        transformTree,
        utils,
        PluginManager,
        logger
    };

    // Create a public API
    const ctor = t => function (...args) {
        return new t(...args);
    };

    let t;
    const api = Object.create(initial);
    for (const n in initial.tree) {
        /* eslint guard-for-in: 0 */
        t = initial.tree[n];
        if (typeof t === 'function') {
            api[n.toLowerCase()] = ctor(t);
        }
        else {
            api[n] = Object.create(null);
            for (const o in t) {
                /* eslint guard-for-in: 0 */
                api[n][o.toLowerCase()] = ctor(t[o]);
            }
        }
    }

    return api;
};

Error trying to eval

Hello!

I am trying to perform just the evaluation using the pre-trained model and the cooked dataset, but vocab_fixes does not match the dimensions of the pretrained model. Is this ok?

It is probably my fault but I'd like to know 😄


RuntimeError: Error(s) in loading state_dict for GraphTrans:
	size mismatch for gnn.w_n2l.weight: copying a param with shape torch.Size([128, 533]) from checkpoint, the shape in current model is torch.Size([128, 292]).
	size mismatch for ops.3.node_type_pred.main.2.weight: copying a param with shape torch.Size([533, 128]) from checkpoint, the shape in current model is torch.Size([292, 128]).
	size mismatch for ops.3.node_type_pred.main.2.bias: copying a param with shape torch.Size([533]) from checkpoint, the shape in current model is torch.Size([292]).
	size mismatch for ops.3.node_type_embedding.weight: copying a param with shape torch.Size([533, 128]) from checkpoint, the shape in current model is torch.Size([292, 128]).
	size mismatch for ops.4.node_type_pred.main.2.weight: copying a param with shape torch.Size([533, 128]) from checkpoint, the shape in current model is torch.Size([292, 128]).
	size mismatch for ops.4.node_type_pred.main.2.bias: copying a param with shape torch.Size([533]) from checkpoint, the shape in current model is torch.Size([292]).
	size mismatch for ops.4.node_type_embedding.weight: copying a param with shape torch.Size([533, 128]) from checkpoint, the shape in current model is torch.Size([292, 128]).

Solved: I was missing the type_vocab.pkl file from the original dataset 😄

Embedding for Hoppity

Hi,

I'm wondering what embedding you use in Hoppity? You mentioned in the paper that

We tried different configurations of our model with different number of layers and different graph embedding methods besides the generic one in Eq 2

In the code, I saw three models: s2v_code2inv, s2v_single, s2v_multi. What are the specific embeddings you use in these three models?

Error when executing run_build_dataset.sh (missing depickle.py in gtrans/data_process/)

Hi, when I was executing run_build_dataset.sh with raw JS pairs, it raised an error:

SyntaxError: Unexpected token u in JSON at position 0
    at JSON.parse (<anonymous>)
    at Object.<anonymous> (/Users/zhutao/lab/hoppity/gtrans/data_process/find_all_refs.js:268:12)
    at Module._compile (internal/modules/cjs/loader.js:956:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:973:10)
    at Module.load (internal/modules/cjs/loader.js:812:32)
    at Function.Module._load (internal/modules/cjs/loader.js:724:14)
    at Function.Module.runMain (internal/modules/cjs/loader.js:1025:10)
    at internal/main/run_main_module.js:17:11

I found that in gtrans/data_process/find_all_refs.js:265, the line var ast = spawnSync("python", [HOPPITY_HOME + "/depickle.py", ast_file]) executes the depickle.py file. But this file seems to be missing from the repo, which results in ast being undefined and leads to the error above.

Could you @Hanjun-Dai @elizabethdinella add this depickle.py file to the repo? Thank you very much.
