Comments (2)
Hello @jumormt and sorry for the late reply!
- Parsing C++ with an ANTLR parser is a very complex task (see here), even when the file was transformed by the preprocessor. We initially tried to use a parser built from an ANTLR grammar in our other projects, but faced problems. To give you an example, when ANTLR C++ grammar parses a declaration of `vector<vector<int>>` and `vector< vector<int> >`, it gives different results, because in one case (the latter one if I remember correctly) it treats `>` as a comparison operator (which is weird I would say). After trying to fix the grammar manually, we found more and more new cases and finally switched to another parser. While the parser we use is indeed built on top of the ANTLR4 grammar, it does a lot of post-processing, and from what we see, provides way more robust results.
- Regarding your second question, we run preprocessing only for individual files and do not consider complex includes and dependencies. Thus, we use `g++` only as a preprocessor, without running linking/compilation. The goal of it is to substitute `#define` directives that are present in the same file to increase our chances of successful parsing. We omit includes during preprocessing on purpose, because if we actually substitute them, files might become too large (e.g., if the developer included a huge part of the STL) and we will analyze some included libraries instead of the actual developer-written code. As a downside of omitting includes, we might miss some defines from the header files, but I don't see an easy way to fix it without unfolding `#include` statements.
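To make the second point more concrete, here is a minimal sketch of that kind of per-file preprocessing. It is not astminer's actual code: the function names `strip_includes` and `preprocess` are made up for illustration, and it assumes `g++` is on the PATH. The `-E` flag stops after preprocessing, `-P` suppresses line markers, and `-x c++ -` reads C++ source from standard input.

```python
import re
import subprocess

# Matches a whole #include line, allowing whitespace around '#'.
# Hypothetical helper pattern, not taken from astminer.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include\b.*$', re.MULTILINE)


def strip_includes(source: str) -> str:
    """Blank out #include lines so the preprocessor does not unfold headers.

    Each matching line is replaced by an empty line, so line numbers of the
    remaining code stay the same.
    """
    return INCLUDE_RE.sub('', source)


def preprocess(source: str, compiler: str = 'g++') -> str:
    """Expand macros defined in the file itself; no compilation or linking."""
    result = subprocess.run(
        [compiler, '-E', '-P', '-x', 'c++', '-'],
        input=strip_includes(source),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the preprocessor fails
    )
    return result.stdout
```

With this sketch, a file containing `#define N 3` followed by `int a[N];` should come back with the macro expanded, while a macro that is only defined in an omitted header stays unexpanded, which is exactly the downside mentioned above.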
Hope that it answers your questions.
from astminer.
Hi,
Thanks a lot for your patient answer! The parser you recommended does indeed yield more robust results. However, it is relatively hard to use because its documentation is limited. As for the second question, I think applying more transformations to the source code before parsing it or feeding it into a neural network is an interesting topic.