Comments (17)
Hey, any update on JS support?
from astminer.
Hey @utkarsh-agrawaal, the news is as follows:
For the first, we hope to add JS support via an ANTLR grammar in a few days, but there are some issues, since a context-free grammar does not fully describe JS syntax (here you can read about it in detail), so make sure it's okay for you.
For the second, we were unable to find any tool that would suit our needs.
Hey @elena-lyulina , any update on JS support?
Hey @HaseebLUMS,
The beta version of the JS parser is under review.
May I ask what you plan to use this parser for?
@elena-lyulina
I need it to use with code2vec in a project where I am applying different ML techniques to scripts from web pages.
When should I expect the JS parser to be publicly released?
@HaseebLUMS
Let's see what @egor-bogomolov can say about it, because the review stage depends on him.
Hello @egor-bogomolov
Can you please give a tentative date for when the JS parser will be ready?
@HaseebLUMS I will review the parser's code tonight.
@egor-bogomolov
Thank you. Will the output of this parser be compatible with code2vec, like the other parsers?
@HaseebLUMS hope so :)
@egor-bogomolov Hi, how's it going?
Need some support here. Many thanks!
@egor-bogomolov @elena-lyulina I would like to know the current status of JS support in astminer. Is there any plan for when JS will be supported?
Hi @nashid, big thanks for reminding us about JS :)
I added JS to the CLI (see #123), you can build the branch cli-javascript yourself and use it right away. If you need any further help, don't hesitate to contact us.
@egor-bogomolov I have attempted to use it with the following sample input:
example:
sum(a, b)
execution:
./cli.sh pathContexts --lang js --project context-ml-dataset --output context-ml-dataset-output --maxL 5000 --maxW 5000 --maxContexts 10 --maxTokens 5000 --maxPaths 10
Output files:
tokens.csv
id,token
3,
2,a
4,b
1,(
5,)
node_types.csv
id,node_type
1,OpenParen UP
2,arguments TOP
4,Comma DOWN
3,singleExpression|Identifier DOWN
6,singleExpression|Identifier UP
5,CloseParen DOWN
7,Comma UP
paths.csv
id,path
1,1 2 3
2,1 2 4
3,1 2 5
5,6 2 3
4,6 2 4
7,7 2 3
6,6 2 5
8,7 2 5
path_contexts.csv
context-ml-dataset/temp.js 1,1,2 1,2,3 1,1,4 1,3,5 2,4,3 2,5,4 2,6,5 3,7,4 3,8,5 4,6,5
A couple of pertinent questions:
- Firstly, why do some paths contain parentheses and commas?
- Is there a way to omit or suppress the comma (,) and the left and right parentheses in the AST paths, as they are not supposed to be part of the AST path?
- Finally, I presume that before feeding the data into code2vec we are supposed to replace the IDs in path_contexts with the actual values from tokens, node_types, and paths? I understand PathMiner puts IDs in the path contexts to reduce memory, and I can write a simple Python snippet to replace them. Or am I missing something, i.e. can PathMiner also perform the token replacement?
I would also be curious to know how to make the path output closer to the AST, i.e. output without commas and parentheses.
Also AST output from Esprima illustrating the problem:
I am happy to contribute to the repo as required.
Hi!
Storing parentheses, commas, etc. is strange, but I think this is predefined by the ANTLR4 grammar that we use for JS. Maybe ANTLR4 has some parameters for rule generation that control how parsing is set up... Btw, did you try to run astminer on more complex examples, for example on some functions?
As for changing IDs back to words in paths, it's completely unnecessary. You can already feed this data into code2vec. You need this back-conversion only at inference, to produce readable output for users.
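For anyone who still wants readable contexts for debugging, here is a minimal Python sketch of the back-conversion. It is not part of astminer. It assumes the layout shown in the output above: each CSV has a header row and two columns, each path is a space-separated list of node type IDs, and each context in path_contexts.csv is startTokenId,pathId,endTokenId. The function names load_id_map and decode_line are made up for illustration, and labels are assumed to contain no spaces.

```python
import csv
import io


def load_id_map(csv_text):
    """Parse an 'id,value' CSV (with a header row) into a dict {id: value}."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row, e.g. "id,token"
    return {int(row[0]): row[1] for row in reader if row}


def decode_line(line, tokens, node_types, paths):
    """Turn one path_contexts line into (label, [(start, path, end), ...])."""
    # Assumption: the label (file path) contains no whitespace.
    label, *contexts = line.split()
    decoded = []
    for context in contexts:
        start_id, path_id, end_id = (int(x) for x in context.split(","))
        node_ids = (int(x) for x in paths[path_id].split())
        readable_path = " ".join(node_types[n] for n in node_ids)
        decoded.append((tokens[start_id], readable_path, tokens[end_id]))
    return label, decoded


# A small subset of the files from the example above, inlined as strings.
tokens = load_id_map("id,token\n2,a\n4,b\n1,(\n5,)\n")
node_types = load_id_map(
    "id,node_type\n1,OpenParen UP\n2,arguments TOP\n"
    "3,singleExpression|Identifier DOWN\n"
)
paths = load_id_map("id,path\n1,1 2 3\n")

label, decoded = decode_line(
    "context-ml-dataset/temp.js 1,1,2", tokens, node_types, paths
)
print(label, decoded)
```

With real output you would read the three CSV files from disk instead of inlining them and call decode_line for every line of path_contexts.csv.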
@nashid, you don't see the sum token because you've set a very tight limit on the number of extracted contexts: only 10. If you raise it to, let's say, 100, you will see that all the expected tokens are there.
As @SpirinEgor mentioned, all the non-alphanumeric tokens (like , and ( ) are due to the ANTLR4 grammar used under the hood. If you run the code2vec task instead of pathContexts, all such tokens will be replaced with EMPTY_TOKEN. You can either change the code a little bit in order not to store such contexts or just clean them afterward.
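The "clean them afterward" option could look roughly like the Python sketch below. It is not astminer code: the function name drop_punctuation_contexts and the rule "keep a context only if both end tokens contain a letter or digit" are assumptions made for illustration, and so is the startTokenId,pathId,endTokenId order of each context. Token 3 is assumed to be the comma, which the tokens.csv listing above shows as an empty field.

```python
import re

# Assumption: a token is worth keeping if it has at least one letter or digit.
HAS_ALNUM = re.compile(r"[A-Za-z0-9]")


def drop_punctuation_contexts(line, tokens):
    """Remove contexts whose start or end token is pure punctuation."""
    # Assumption: the label (file path) contains no whitespace.
    label, *contexts = line.split()
    kept = []
    for context in contexts:
        start_id, _path_id, end_id = context.split(",")
        start, end = tokens[int(start_id)], tokens[int(end_id)]
        if HAS_ALNUM.search(start) and HAS_ALNUM.search(end):
            kept.append(context)
    return " ".join([label, *kept])


# Token IDs from the example above (token 3 assumed to be the comma).
tokens = {1: "(", 2: "a", 3: ",", 4: "b", 5: ")"}
line = (
    "context-ml-dataset/temp.js "
    "1,1,2 1,2,3 1,1,4 1,3,5 2,4,3 2,5,4 2,6,5 3,7,4 3,8,5 4,6,5"
)
cleaned = drop_punctuation_contexts(line, tokens)
print(cleaned)  # context-ml-dataset/temp.js 2,5,4
```

On the ten contexts from the example only the a-to-b context survives, which also shows why the tight --maxContexts limit matters: most of the budget was spent on punctuation.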
@SpirinEgor I guess we need to work on the configuration so that we can automatically ignore such tokens and corresponding contexts.
Since JavaScript was added and there are no questions at the moment, I will close the issue. Feel free to reopen it at any time if you have any.