
Add JS support (astminer, 17 comments, closed)

jetbrains-research commented on May 30, 2024
Add JS support

from astminer.

Comments (17)

utkarsh-agrawaal commented on May 30, 2024

Hey, any update on JS support?

elena-lyulina commented on May 30, 2024

Hey @utkarsh-agrawaal, here is the news:
First, we hope to add JS support via an ANTLR grammar in a few days, but there are some issues, since JS syntax cannot be fully described by a context-free grammar (here you can read about it in detail), so please make sure that is acceptable for you.
Second, we were unable to find any tool that would suit our needs.
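A classic illustration of why JavaScript is hard to describe with a grammar alone is the `/` character: whether it begins a regular-expression literal or is the division operator depends on the surrounding syntactic context (the ECMAScript specification handles this with separate lexical goal symbols). The minimal Python sketch below uses a hypothetical helper, `slash_starts_regex`, that decides by looking only at the previous token. It is a simplified heuristic, not the logic of the ANTLR grammar astminer uses, and its wrong answer for one of the `)` cases shows why a purely local rule is not enough.

```python
from typing import Optional

# Previous tokens after which an expression is expected, so a "/" that follows
# them must start a regex literal. Deliberately incomplete, for illustration.
EXPRESSION_EXPECTED = {
    "(", "[", "{", ",", ";", ":", "?", "!", "~",
    "=", "+=", "-=", "==", "===", "!=", "!==",
    "+", "-", "*", "%", "&&", "||", "&", "|", "^",
    "<", ">", "<=", ">=",
    "return", "typeof", "instanceof", "in", "new", "delete", "void", "throw",
}


def slash_starts_regex(prev_token: Optional[str]) -> bool:
    """Hypothetical heuristic: does a '/' after prev_token begin a regex literal?"""
    if prev_token is None:  # '/' at the very start of the input
        return True
    # After an identifier, a literal, ')' or ']' the heuristic assumes division.
    return prev_token in EXPRESSION_EXPECTED


# `x = /ab+c/` -> previous token "=" -> regex literal
print(slash_starts_regex("="))  # True
# `a = b / c` -> previous token "b" -> division
print(slash_starts_regex("b"))  # False
# `(a) / 2` is division, but `if (ok) /re/.test(s)` starts a regex:
# both follow ")", so one token of look-behind cannot get both right.
print(slash_starts_regex(")"))  # False
```

Resolving the `)` case correctly requires knowing whether the parenthesis closed an `if`/`while` condition or an ordinary expression, which is information from the parser rather than the lexer.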

HaseebLUMS commented on May 30, 2024

Hey @elena-lyulina, any update on JS support?

elena-lyulina commented on May 30, 2024

Hey @HaseebLUMS,
The beta version of the JS parser is under review.
May I ask what you plan to use the parser for?

HaseebLUMS commented on May 30, 2024

@elena-lyulina
I need it to use with code2vec in my project, where I apply various ML techniques to the scripts of web pages.

When can I expect the JS parser to be publicly released?

elena-lyulina commented on May 30, 2024

@HaseebLUMS
Let's see what @egor-bogomolov says, since the review stage is up to him.

HaseebLUMS commented on May 30, 2024

Hello @egor-bogomolov
Could you please give a tentative date for when the JS parser will be ready?

egor-bogomolov commented on May 30, 2024

@HaseebLUMS I will review the parser's code tonight.

HaseebLUMS commented on May 30, 2024

@egor-bogomolov
Thank you. Will the output of this parser be compatible with code2vec, like that of the other parsers?

egor-bogomolov commented on May 30, 2024

@HaseebLUMS hope so :)

kvenux commented on May 30, 2024

@egor-bogomolov Hi, how's it going?
I need this support as well. Many thanks!

nashid commented on May 30, 2024

@egor-bogomolov @elena-lyulina What is the current status of JS support in astminer? Is there a plan for when JS will be supported?

egor-bogomolov commented on May 30, 2024

Hi @nashid, big thanks for reminding us about JS :)
I added JS to the CLI (see #123). You can build the cli-javascript branch yourself and use it right away. If you need any further help, don't hesitate to contact us.

nashid commented on May 30, 2024

@egor-bogomolov I have attempted to use it with the following sample input:

example:
sum(a, b)

execution:
./cli.sh pathContexts --lang js --project context-ml-dataset --output context-ml-dataset-output --maxL 5000 --maxW 5000 --maxContexts 10 --maxTokens 5000 --maxPaths 10

Output files:

tokens.csv

id,token
3,
2,a
4,b
1,(
5,)

node_types.csv

id,node_type
1,OpenParen UP
2,arguments TOP
4,Comma DOWN
3,singleExpression|Identifier DOWN
6,singleExpression|Identifier UP
5,CloseParen DOWN
7,Comma UP

paths.csv

id,path
1,1 2 3
2,1 2 4
3,1 2 5
5,6 2 3
4,6 2 4
7,7 2 3
6,6 2 5
8,7 2 5

path_contexts.csv
context-ml-dataset/temp.js 1,1,2 1,2,3 1,1,4 1,3,5 2,4,3 2,5,4 2,6,5 3,7,4 3,8,5 4,6,5

A couple of pertinent questions:

  • First, why do some paths contain parentheses and commas?

    • Is there a way to omit parentheses and commas, since they are not supposed to be part of the AST path?
  • Is there a way to suppress the comma (,) and the left and right parentheses in the AST paths?

  • Finally, I presume that before feeding the data into code2vec we are supposed to replace the ids in path_contexts with the actual values from tokens, node_types, and paths? I understand that PathMiner stores ids in the path contexts to reduce memory, and I can write a simple Python snippet to replace them. Or am I missing something, i.e., can PathMiner also perform the token replacement?

I would also like to know how to make the path output closer to the AST, i.e., without commas and parentheses.

Here is also the AST output from Esprima, illustrating the problem:
[Image: AST output from Esprima]

I am happy to contribute to the repo as required.

SpirinEgor commented on May 30, 2024

Hi!

Storing parentheses, commas, etc. is strange, but I think it is dictated by the ANTLR4 grammar we use for JS. Maybe ANTLR4 has some parameters for generating rules that control how parsing is done... By the way, did you try running astminer on more complex examples, for example on some functions?

As for converting ids back to words in paths, that is completely unnecessary: you can already feed this data into code2vec. You need the back-conversion only at inference time, to produce readable output for users.
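For inference, the back-conversion could be done with a short Python script like the sketch below. It is not part of astminer: the helper names `load_id_map` and `decode_line` are hypothetical, and it assumes the file layout shown in the output above (each `path_contexts.csv` line is a file name followed by space-separated `start,path,end` id triples, and each path is a space-separated list of node type ids).

```python
import csv
import io


def load_id_map(csv_text):
    """Map integer ids to values from a two-column CSV with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header, e.g. "id,token"
    return {int(row[0]): row[1] for row in reader if row}


def decode_line(line, tokens, node_types, paths):
    """Turn 'file s,p,e s,p,e ...' into (start token, node types, end token) triples."""
    # Assumes the file name itself contains no spaces.
    file_name, *contexts = line.split()
    decoded = []
    for ctx in contexts:
        start_id, path_id, end_id = (int(x) for x in ctx.split(","))
        path = [node_types[int(n)] for n in paths[path_id].split()]
        decoded.append((tokens[start_id], path, tokens[end_id]))
    return file_name, decoded


# Sample data copied from the output files above; in practice, read the
# files from the output directory, e.g. open("tokens.csv").read().
tokens = load_id_map("id,token\n3,\n2,a\n4,b\n1,(\n5,)\n")
node_types = load_id_map(
    "id,node_type\n1,OpenParen UP\n2,arguments TOP\n4,Comma DOWN\n"
    "3,singleExpression|Identifier DOWN\n6,singleExpression|Identifier UP\n"
    "5,CloseParen DOWN\n7,Comma UP\n"
)
paths = load_id_map(
    "id,path\n1,1 2 3\n2,1 2 4\n3,1 2 5\n5,6 2 3\n4,6 2 4\n7,7 2 3\n6,6 2 5\n8,7 2 5\n"
)

name, triples = decode_line("context-ml-dataset/temp.js 2,5,4", tokens, node_types, paths)
print(triples[0])
# ('a', ['singleExpression|Identifier UP', 'arguments TOP', 'singleExpression|Identifier DOWN'], 'b')
```

Keeping the id maps as dictionaries means each lookup is constant time, so decoding a whole dataset is linear in the number of contexts.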

egor-bogomolov commented on May 30, 2024

@nashid, you don't see the sum token because you set a very tight limit on the number of extracted contexts: only 10. If you raise it to, say, 100, you will see that all the expected tokens are there.
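The arithmetic behind this can be checked with a short Python sketch. It assumes one path context per unordered pair of leaves (as in code2vec-style path contexts) and assumes the six leaves listed; the real ANTLR tree may contain extra terminals. The output above contains exactly 10 contexts, and they cover all 10 pairs among the five argument-list tokens (token 3, empty in tokens.csv, sits at the Comma node), so a cap of 10 leaves no room for any context that reaches `sum`. Which pairs astminer emits first is an implementation detail this sketch does not model.

```python
from itertools import combinations

# Assumed terminals (leaves) of the parse tree for `sum(a, b)`; the real ANTLR
# tree may contain extra terminals, such as an EOF token.
leaves = ["sum", "(", "a", ",", "b", ")"]

# A path context links an unordered pair of leaves: n leaves give n*(n-1)/2 pairs.
all_pairs = list(combinations(leaves, 2))
pairs_without_sum = [pair for pair in all_pairs if "sum" not in pair]

print(len(all_pairs))          # 15
print(len(pairs_without_sum))  # 10
```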

As @SpirinEgor mentioned, all the non-alphanumeric tokens (such as the comma and parentheses) come from the ANTLR4 grammar used under the hood. If you run the code2vec task instead of pathContexts, all such tokens will be replaced with EMPTY_TOKEN. You can either change the code slightly so that such contexts are not stored, or just clean them up afterward.
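The "clean them up afterward" option could look like the Python sketch below. It is not part of astminer: the helper names are hypothetical, it assumes the `path_contexts.csv` layout shown earlier in the thread, and treating a token as punctuation when it contains no letters, digits, `_` or `$` is an assumption (`_` and `$` are kept because they are legal in JS identifiers).

```python
def is_punctuation(token):
    """True when a token has no letters, digits, '_' or '$' (e.g. '(' or ',')."""
    return not any(ch.isalnum() or ch in "_$" for ch in token)


def drop_punctuation_contexts(line, tokens):
    """Keep only contexts whose start and end tokens are not punctuation."""
    # Assumes the 'file s,p,e s,p,e ...' layout of path_contexts.csv.
    file_name, *contexts = line.split()
    kept = []
    for ctx in contexts:
        start_id, _path_id, end_id = (int(x) for x in ctx.split(","))
        if is_punctuation(tokens[start_id]) or is_punctuation(tokens[end_id]):
            continue  # one endpoint is '(', ')', ',' or an empty token
        kept.append(ctx)
    return " ".join([file_name, *kept])


# Token ids from tokens.csv above (id 3 is the empty token at the Comma node).
tokens = {1: "(", 2: "a", 3: "", 4: "b", 5: ")"}
line = "context-ml-dataset/temp.js 1,1,2 1,2,3 1,1,4 1,3,5 2,4,3 2,5,4 2,6,5 3,7,4 3,8,5 4,6,5"
print(drop_punctuation_contexts(line, tokens))
# context-ml-dataset/temp.js 2,5,4
```

For the `sum(a, b)` sample, only the `a` to `b` context survives. The sketch leaves tokens.csv, paths.csv and node_types.csv untouched, so some of their ids simply become unused.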

@SpirinEgor I guess we need to work on the configuration so that we can automatically ignore such tokens and the corresponding contexts.

SpirinEgor commented on May 30, 2024

Since JavaScript support has been added and there are no open questions at the moment, I will close the issue. Feel free to reopen it at any time if you have any.
