Comments (12)
Numbers are just strings of digits :)
from astminer.
My CLI version is 0.3, and I use it to parse Python code.
The command I used was "java -jar cli.jar code2vec --lang py --project path/to/project --output path/to/results --maxH H --maxW W --maxContexts C --maxTokens T --maxPaths P"
------Explanation for the last question------
In path_context_xx.csv, I have many (token id, path id, token id) triples.
In paths.csv, each path id corresponds to a sequence of node_type ids.
In node_types.csv, each node_type id corresponds to many words.
Why does a single node_type id correspond to so many words?
------Explanation for the last question------
Besides, I want preprocessed data with string paths. At first, I replaced each node_type id in a path with all of its corresponding words, but the resulting paths were too long to train code2vec. I also tried training code2vec on the numeric paths, but the results were very bad.
Now I want to replace each node_type id in a path with only its first corresponding word. Do you think that makes sense?
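To make the relationship between the three files concrete, here is a minimal sketch of decoding one (token id, path id, token id) triple into strings. The tables are hand-made and hypothetical; the ids, tokens, and node type names are assumptions for illustration, not actual astminer output, and the real files may be laid out differently.

```python
# Hypothetical lookup tables standing in for tokens.csv, node_types.csv and paths.csv.
tokens = {10: "x", 11: "1"}
node_types = {
    1: "NAME",
    2: "expr_stmt|testlist_star_expr|test",  # a long, pipe-joined type
    3: "NUMBER",
}
paths = {
    7: [1, 2, 3],  # path id -> ordered node_type ids
}


def decode_context(start_token_id, path_id, end_token_id):
    """Turn one (token id, path id, token id) triple into readable strings."""
    path = " ".join(node_types[n] for n in paths[path_id])
    return tokens[start_token_id], path, tokens[end_token_id]


decoded = decode_context(10, 7, 11)
print(decoded)
```

Even in this tiny example, a single pipe-joined type makes the decoded path much longer than the three ids it came from, which is consistent with the overly long paths described above.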
Why shouldn't we substitute the path ID by the corresponding node types?
Hi! You're not supposed to substitute anything at all. Just use numerical ids.
But code2vec requires input in string format rather than in numeric format.
I tried feeding in the numeric paths, but it doesn't work well.
Also, why does a path ID correspond to so many node types?
Could you provide more information: which data you use, which language, the exact command you run astminer with, and the CLI version?
I didn't quite get the last question. Node types are stored in a separate file. Long types with pipes inside are the result of compressing chains of vertices that each have a single child.
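The compression mentioned above can be sketched as follows. This is an illustrative sketch only, not astminer's actual implementation: the `Node` class, the `compress` function, and the example type names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    type_label: str
    children: List["Node"] = field(default_factory=list)


def compress(node: Node) -> Node:
    """Merge each chain of single-child vertices into one vertex whose
    type is the chain's types joined with '|'."""
    labels = [node.type_label]
    current = node
    # Walk down while the current vertex has exactly one child.
    while len(current.children) == 1:
        current = current.children[0]
        labels.append(current.type_label)
    # The last vertex of the chain keeps its (zero or several) children.
    return Node("|".join(labels), [compress(c) for c in current.children])


tree = Node("expr", [Node("term", [Node("factor", [
    Node("atom", [Node("NAME"), Node("NUMBER")]),
])])])
compressed = compress(tree)
print(compressed.type_label)
```

Under these assumptions, four nested vertices collapse into one vertex with a single long type, which is why one node_type id can stand for many words.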
"java -jar cli.jar code2vec --lang py --project path/to/project --output path/to/results --maxH H --maxW W --maxContexts C --maxTokens T --maxPaths P"
In this command, what is the format of 'path/to/project'? What exactly are the contents of this file?
@liangxi11, can you please share the code for extracting AST paths from Python scripts? I guess the output would be a txt file, so should we just pass that to code2vec for training?
You can configure astminer to extract paths for Python code. Here is an example config:
# input directory (path to project)
inputDir: src/test/resources/
# output directory
outputDir: output
# parse Python files with ANTLR parser
parser:
  name: antlr
  languages: [py]
filters:
  - name: by tree size  # exclude the trees that have > 1000 nodes
    maxTreeSize: 1000
# use function names as labels
# this selects the function level granularity
label:
  name: function name
# extract from each tree paths with length 9 and width 2
# save paths in code2vec format
storage:
  name: code2vec
  maxPathLength: 9
  maxPathWidth: 2
To set up the code2vec storage, see its documentation: docs/storages.md
Thank you. I have finally built astminer with Gradle. I now have Python samples stored in a CSV file, one per row. Do I need raw Python script files instead of a CSV file in which each row holds the content of one Python file?
I ran the following anyway, based on your documentation, and got an error:
$ ./cli.sh python.yaml
Docker image not found, will use build/shadow/astminer.jar
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: astminer/MainKt has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$100(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.launcher.LauncherHelper.checkAndLoadMain(Unknown Source)
Reading from CSV files isn't supported. If you need a custom pipeline, look into the examples of using astminer as an API tool instead of the CLI. For Python and a path-based representation, look into this.
You need to replace the traversal of all Python files with iteration over your file.
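A different route, not suggested in this thread, is to turn each CSV row into a separate .py file so that the CLI can read an ordinary input directory. The sketch below assumes the CSV has a header row and that the source code sits in a column named `code`; the function name, the column name, and the `sample_<i>.py` file names are all hypothetical.

```python
import csv
from pathlib import Path


def dump_rows_to_files(csv_path, out_dir, column="code"):
    """Write the `column` field of every CSV row to its own .py file.

    Returns the number of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    # newline="" lets the csv module handle newlines inside quoted fields.
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            (out / f"sample_{i}.py").write_text(row[column], encoding="utf-8")
            count += 1
    return count
```

The resulting directory could then be used as `inputDir` in the config, under the assumption that one file per sample suits your labeling granularity.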
As for your error, it is caused by a Java version mismatch. I suggest cleaning the build (gradle clean) and rebuilding the tool with Java 11.
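For reference, the class file versions in the error message map to Java releases by a fixed offset: the major version minus 44. A small sketch (the helper name `java_release` is made up for this example):

```python
def java_release(class_file_major: int) -> int:
    """Map a class file major version to the Java release that introduced it."""
    if class_file_major < 45:
        raise ValueError("class file major versions start at 45")
    return class_file_major - 44


compiled_with = java_release(55)  # version reported for astminer/MainKt
running_on = java_release(52)     # highest version the installed runtime accepts
print(compiled_with, running_on)
```

So the jar was built for Java 11 while the `java` on the PATH is Java 8; running the tool with a Java 11 or newer runtime should avoid this particular error.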