Add JS support via ANTLR grammar Wrap an existing too


Add JS support (jetbrains-research/astminer)

Comments (17)

utkarsh-agrawaal commented on May 30, 2024

Hey, any update on JS support?

from astminer.

elena-lyulina commented on May 30, 2024

Hey @utkarsh-agrawaal, the news is as follows:
On the first option, we hope to add JS support via an ANTLR grammar in a few days. There is a caveat, though: a context-free grammar cannot fully describe JS syntax (here you can read about it in detail), so make sure that is okay for you.
On the second option, we were unable to find any tool that would suit our needs.


HaseebLUMS commented on May 30, 2024

Hey @elena-lyulina , any update on JS support?


elena-lyulina commented on May 30, 2024

Hey @HaseebLUMS,
The beta version of the JS parser is under review.
May I ask what you plan to use the parser for?


HaseebLUMS commented on May 30, 2024

@elena-lyulina
I need it to use with code2vec for my project, in which I am applying different ML techniques to the scripts of web pages.

When should I expect the JS parser to be publicly released?


elena-lyulina commented on May 30, 2024

@HaseebLUMS
Let's see what @egor-bogomolov can say about it, because the review stage depends on him.


HaseebLUMS commented on May 30, 2024

Hello @egor-bogomolov
Can you please give a tentative date for when the JS parser will be ready?


egor-bogomolov commented on May 30, 2024

@HaseebLUMS I will review the parser's code tonight.


HaseebLUMS commented on May 30, 2024

@egor-bogomolov
Thank you. Will the output of this parser be compatible with code2vec, like the other parsers?


egor-bogomolov commented on May 30, 2024

@HaseebLUMS hope so :)


kvenux commented on May 30, 2024

@egor-bogomolov Hi, how's it going?
I need some support here. Many thanks!


nashid commented on May 30, 2024

@egor-bogomolov @elena-lyulina I would like to know what the current status of JS support in astminer is. Is there a plan for when JS will be supported?


egor-bogomolov commented on May 30, 2024

Hi @nashid, big thanks for reminding us about JS :)
I added JS to the CLI (see #123), you can build the branch cli-javascript yourself and use it right away. If you need any further help, don't hesitate to contact us.


nashid commented on May 30, 2024

@egor-bogomolov I have attempted to use it with the following sample input:

example:
sum(a, b)

execution:
./cli.sh pathContexts --lang js --project context-ml-dataset --output context-ml-dataset-output --maxL 5000 --maxW 5000 --maxContexts 10 --maxTokens 5000 --maxPaths 10

Output files:

tokens.csv

id,token
3,
2,a
4,b
1,(
5,)

node_types.csv

id,node_type
1,OpenParen UP
2,arguments TOP
4,Comma DOWN
3,singleExpression|Identifier DOWN
6,singleExpression|Identifier UP
5,CloseParen DOWN
7,Comma UP

paths.csv

id,path
1,1 2 3
2,1 2 4
3,1 2 5
5,6 2 3
4,6 2 4
7,7 2 3
6,6 2 5
8,7 2 5

path_contexts.csv
context-ml-dataset/temp.js 1,1,2 1,2,3 1,1,4 1,3,5 2,4,3 2,5,4 2,6,5 3,7,4 3,8,5 4,6,5

A couple of pertinent questions:

Firstly, why do some paths contain parentheses and commas? Is there a way to omit them, since they are not supposed to be part of the AST path?
Finally, I presume that before feeding the data into code2vec we are supposed to replace the IDs in path_contexts with the actual values from tokens, node_types, and paths? I understand PathMiner stores IDs in the path contexts to reduce memory, and I can write a simple Python snippet to replace them. Or am I missing something, i.e. can PathMiner also perform the token replacement?

I would also be curious to know how to make the path output closer to the AST, i.e. without commas and parentheses.

Also AST output from Esprima illustrating the problem:

I am happy to contribute to the repo as required.


SpirinEgor commented on May 30, 2024

Hi!

Storing parentheses, commas, etc. is strange. But I think this is determined by the ANTLR4 grammar that we use for JS. Maybe ANTLR4 has some parameters for rule generation that control how parsing is set up... Btw, did you try running astminer on more complex examples, for example on some functions?

As for converting IDs back to words in paths, it's completely unnecessary. You can already feed this data into code2vec. You need the reverse conversion only at inference time, to produce readable output for users.
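For anyone who still wants a readable view of the contexts (for debugging, say), here is a minimal sketch of the reverse conversion. It is not part of astminer. It assumes the layout seen in the sample output above: each `id,value` file has a header row, and each line of path_contexts.csv is a label followed by space-separated `startTokenId,pathId,endTokenId` triples. The function names `parse_id_map` and `decode_line` are made up for this illustration.

```python
def parse_id_map(text):
    """Parse the contents of an `id,value` file (tokens.csv, paths.csv,
    node_types.csv) into a dict keyed by integer id.

    Each line is split on the FIRST comma only, so a value that is itself
    a comma (or is empty) does not break the parsing.
    """
    mapping = {}
    for line in text.splitlines()[1:]:  # skip the header row
        if not line:
            continue
        key, value = line.split(",", 1)
        mapping[int(key)] = value
    return mapping


def decode_line(line, tokens, paths, node_types):
    """Turn one path_contexts line into (label, [(start, path, end), ...]).

    Assumption: the label contains no spaces, and every context is a
    `startTokenId,pathId,endTokenId` triple.
    """
    label, *contexts = line.strip().split(" ")
    decoded = []
    for context in contexts:
        start_id, path_id, end_id = (int(part) for part in context.split(","))
        # A path is stored as space-separated node type ids.
        path = [node_types[int(node_id)] for node_id in paths[path_id].split(" ")]
        decoded.append((tokens[start_id], path, tokens[end_id]))
    return label, decoded
```

With the sample files above, the context `1,1,2` decodes to the start token `(`, the path `OpenParen UP`, `arguments TOP`, `singleExpression|Identifier DOWN`, and the end token `a`.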


egor-bogomolov commented on May 30, 2024

@nashid, you don't see the sum token because you've set a very tight limit on the number of extracted contexts -- only 10. If you raise it up to, let's say, 100, you will see that all the expected tokens are there.

As @SpirinEgor mentioned, all the non-alphanumeric tokens (like , and () are due to the ANTLR4 grammar used under the hood. If you run the code2vec task instead of pathContexts, all such tokens will be replaced with EMPTY_TOKEN. You can either change the code a little so that such contexts are not stored, or just clean them up afterward.
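The "clean them afterward" route could look roughly like the sketch below. It is a post-processing step written for this thread, not an astminer feature. It assumes each path_contexts line is a label followed by space-separated `startTokenId,pathId,endTokenId` triples, and that `tokens` is a dict from token id to token text built from tokens.csv. The helper names are hypothetical.

```python
def is_punctuation_token(token):
    """Return True for empty tokens and tokens with no letter or digit."""
    return not any(ch.isalnum() for ch in token)


def clean_line(line, tokens):
    """Drop every context whose start or end token is punctuation.

    `tokens` maps integer token ids to token text. The label (first field)
    is kept as-is; surviving contexts keep their original order.
    """
    label, *contexts = line.strip().split(" ")
    kept = []
    for context in contexts:
        start_id, _path_id, end_id = context.split(",")
        start_token = tokens[int(start_id)]
        end_token = tokens[int(end_id)]
        if is_punctuation_token(start_token) or is_punctuation_token(end_token):
            continue
        kept.append(context)
    return " ".join([label, *kept])
```

On the sample from this thread, only the context `2,5,4` (from `a` to `b`) survives. Note that this filters on the end tokens only; paths that merely pass through a punctuation node are left untouched.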

@SpirinEgor I guess we need to work on the configuration so that we can automatically ignore such tokens and corresponding contexts.


SpirinEgor commented on May 30, 2024

Since JavaScript was added and there are no questions at this moment, I will close the issue. But feel free to reopen it at any time if you have any.
