
Comments (12)

egor-bogomolov commented on May 30, 2024

Numbers are just strings of digits :)

from astminer.

liangxi11 commented on May 30, 2024

My CLI version is 0.3, and I use it to parse Python code.

The command I used was "java -jar cli.jar code2vec --lang py --project path/to/project --output path/to/results --maxH H --maxW W --maxContexts C --maxTokens T --maxPaths P"

------Explanation for the last question------
In path_context_xx.csv, I have many (token id, path id, token id) triples.
In paths.csv, a path id corresponds to many node_type ids.
In node_types.csv, a node_type id corresponds to many words.
My question is: why does a single node_type id correspond to so many words?
------Explanation for the last question------

Besides, I want preprocessed data in which the paths are strings. At first, I replaced each node_type id in a path with all of its corresponding words, but the resulting paths were too long to train code2vec. I also tried training code2vec on the numeric paths, but the results were very bad.

Now I want to replace each node_type id in a path with only its first corresponding word. Do you think that makes sense?


ayushbihani commented on May 30, 2024

Why shouldn't we substitute the path ID by the corresponding node types?


egor-bogomolov commented on May 30, 2024

Hi! You're not supposed to substitute anything at all. Just use numerical ids.


liangxi11 commented on May 30, 2024

But code2vec requires input in string format rather than in numeric format.


liangxi11 commented on May 30, 2024

I tried feeding the numeric paths to code2vec, but it doesn't work well.

I also want to ask: why does a path ID correspond to so many node types?


egor-bogomolov commented on May 30, 2024

Could you provide more information: which data you use, which language, the exact command you run astminer with, and the CLI version?
I didn't quite get the last question. Node types are stored in a separate file. Long types with pipes inside are the result of compressing chains of vertices that each have a single child into one node.
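
To make the relation between the files concrete, here is a rough Python sketch of resolving a path id into node type names. The file contents are invented for illustration, and the column layout (id first, value second, node type ids in a path separated by spaces) is an assumption about the output format, not something confirmed in this thread. The helper names `load_vocab` and `resolve_path` are hypothetical.

```python
import csv
import io

# Hypothetical miniature versions of the vocabulary files.
# The column layout (id first, value second) is an assumption.
NODE_TYPES_CSV = """id,node_type
1,NAME
2,expr_stmt|testlist_star_expr|test
3,funcdef
"""

PATHS_CSV = """id,path
1,1 2 3
2,3 2
"""


def load_vocab(text):
    """Read a two-column CSV with a header row into an {id: value} dict."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {int(row[0]): row[1] for row in reader}


def resolve_path(path_id, paths, node_types, expand=False):
    """Turn a path id into a list of node type names.

    With expand=False each compressed chain stays one element, so the
    result has as many elements as the path has node type ids.
    With expand=True every chain is split on "|" into its single nodes.
    """
    names = [node_types[int(i)] for i in paths[path_id].split()]
    if not expand:
        return names
    return [part for name in names for part in name.split("|")]


node_types = load_vocab(NODE_TYPES_CSV)
paths = load_vocab(PATHS_CSV)

compact = resolve_path(1, paths, node_types)
expanded = resolve_path(1, paths, node_types, expand=True)
print(compact)
print(expanded)
```

In this toy data the piped entry is a single node type whose name lists the collapsed single-child chain, which is why one id can look like "many words". Expanding it makes the path longer without adding branching information.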


riyaj8888 commented on May 30, 2024

"java -jar cli.jar code2vec --lang py --project path/to/project --output path/to/results --maxH H --maxW W --maxContexts C --maxTokens T --maxPaths P"

In this command, what is the format of 'path/to/project'? What exactly should it contain?


Avv22 commented on May 30, 2024

@liangxi11, can you please share the code for extracting AST paths from Python scripts? I guess the output would be a txt file, so should we just give that to code2vec for training?


SpirinEgor commented on May 30, 2024

You can configure astminer to extract paths for Python code. Here is an example config:

# input directory (path to project)
inputDir: src/test/resources/
# output directory
outputDir: output

# parse Python files with ANTLR parser
parser:
  name: antlr
  languages: [py]

filters:
  - name: by tree size  # exclude the trees that have > 1000 nodes
    maxTreeSize: 1000

# use function names as labels
# this selects the function level granularity
label:
  name: function name

# extract from each tree paths with length 9 and width 2
# save paths in code2vec format
storage:
  name: code2vec
  maxPathLength: 9
  maxPathWidth: 2

To set up the code2vec storage, see its documentation: docs/storages.md


Avv22 commented on May 30, 2024

@SpirinEgor

Thank you. I have finally built astminer with Gradle. My Python samples are stored in a CSV file, one script per row. Do I need raw Python script files instead of a CSV file in which each row holds the content of one script?

I ran the following anyway, based on your documentation, and got an error:

$ ./cli.sh python.yaml
Docker image not found, will use build/shadow/astminer.jar
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: astminer/MainKt has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(Unknown Source)
        at java.security.SecureClassLoader.defineClass(Unknown Source)
        at java.net.URLClassLoader.defineClass(Unknown Source)
        at java.net.URLClassLoader.access$100(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.launcher.LauncherHelper.checkAndLoadMain(Unknown Source)


SpirinEgor commented on May 30, 2024

Reading from CSV files isn't supported. If you need a custom pipeline, look into the examples of using astminer as an API rather than through the CLI. For Python and a path-based representation, look into this.

You need to replace the traversal of all Python files with iteration over the rows of your file.
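
If you would rather keep using the CLI, a different route from the API approach above is to dump every CSV row into its own .py file and point inputDir at the resulting directory. A minimal Python sketch follows. The column name `code`, the `sample_<n>.py` naming, and the helper `csv_rows_to_files` are all assumptions made for illustration, so adjust them to your data.

```python
import csv
import tempfile
from pathlib import Path


def csv_rows_to_files(csv_path, out_dir, column="code"):
    """Write the `column` cell of every CSV row to out_dir/sample_<n>.py.

    Returns the list of written paths. Rows with an empty cell are skipped.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            source = row.get(column) or ""
            if not source.strip():
                continue
            target = out / f"sample_{index}.py"
            target.write_text(source, encoding="utf-8")
            written.append(target)
    return written


# Small demo on a throwaway CSV file with a hypothetical "code" column.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "samples.csv"
    with open(csv_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["code"])
        writer.writerow(["def add(a, b):\n    return a + b\n"])
        writer.writerow([""])
        writer.writerow(["print('hi')\n"])
    files = csv_rows_to_files(csv_file, Path(tmp) / "project")
    names = [f.name for f in files]
    contents = [f.read_text(encoding="utf-8") for f in files]
    print(names)
```

The file index follows the row position in the CSV, so a generated file can be traced back to its row later.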

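As for your error: it comes from a Java version mismatch. The jar was compiled for class file version 55.0 (Java 11), while the Java runtime you launched it with only recognizes versions up to 52.0 (Java 8). I suggest cleaning the build (gradle clean), rebuilding the tool with Java 11, and making sure the same Java 11 is used to run it.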
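
For reference, the version numbers in that message can be decoded mechanically. The Python sketch below reads the major version from a class file header and maps it to a Java release. The header bytes are hand-built for illustration rather than taken from the astminer jar, and the function names are hypothetical.

```python
import struct


def java_release(major):
    """Map a class file major version to the Java release that emits it.

    Uses the rule release = major - 44, which holds for modern releases
    (for example 52 is Java 8 and 55 is Java 11).
    """
    return major - 44


def class_file_major(header):
    """Read the major version from the first 8 bytes of a .class file."""
    # Layout: 4-byte magic, 2-byte minor version, 2-byte major version,
    # all big-endian.
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major


# Hand-built header for illustration: magic, minor 0, major 55 (0x37).
sample_header = bytes.fromhex("cafebabe00000037")
compiled_for = java_release(class_file_major(sample_header))
runtime_limit = java_release(52)
print(compiled_for, runtime_limit)
```

Running `java -version` in the same shell that launches cli.sh shows which runtime is actually being picked up.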
