staktrace / cafebabe Goto Github PK
View Code? Open in Web Editor NEWA java class file parser
A java class file parser
You have a nice parser and quite a compact repesentation. Any plans to add a writer so you could do round-trip modification of class files?
I think it would be useful to have a testing suite built-in that could be run against on CI (GitHub actions) to verify class files are parsed properly.
I would be willing to implement some of the testing, but a few questions about what you would prefer:
The current implementation only allows these characters for constructors and static initializers, but the spec only forbids them for methods and not for fields or local variables. https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.2.2
This is a problem, because the kotlin compiler uses weird local variable names like this.
Code which makes the wrong assumption:
https://github.com/staktrace/cafebabe/blob/main/src/attributes.rs#L600
Sample class:
Kotlin.zip
It would be useful to have dedicated error types for the parsing process. For use a more explicit error would be useful. You could also implement the Error
trait on the error type so you can still rely on the resulting error type being Display
.
Much of the constant_pool: Vec<Rc<ConstantPoolEntry<'a>>>
could be replaced by references into some arena type which owns the constant pool entries.
Although this causes a new problem: the constant pool arena must outlive the class file.
This could be changed by adding -D warnings
to the clippy
command in the CI.
For example the current build produces these warnings:
warning: field is never read: `attr_index`
--> src/constant_pool.rs:908:5
|
908 | attr_index: u16,
| ^^^^^^^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
note: `Dynamic` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
--> src/constant_pool.rs:906:10
|
906 | #[derive(Debug)]
| ^^^^^
= note: this warning originates in the derive macro `Debug` (in Nightly builds, run with -Z macro-backtrace for more info)
warning: field is never read: `name_and_type`
--> src/constant_pool.rs:909:5
|
909 | name_and_type: NameAndType<'a>,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
note: `Dynamic` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
--> src/constant_pool.rs:906:10
|
906 | #[derive(Debug)]
| ^^^^^
= note: this warning originates in the derive macro `Debug` (in Nightly builds, run with -Z macro-backtrace for more info)
warning: `cafebabe` (lib) generated 2 warnings
Finished dev [unoptimized + debuginfo] target(s) in 1m 12s
I'm planning to write a method descriptor parser, for example:
(JLjava/nio/ByteBuffer;)J
Would parse to something like:
struct Descriptor { arguments: Vec<Type>, result: Type }
enum Type { Void, Primitive(PrimitiveType), Array(Box<Type>), Object(Cow<'a, str>) }
We could then replace all the descriptor
fields on MethodInfo
and some other types. Would you be open to this PR?
Would you consider adding a license to this project? I think I'd suggest MIT & Apache 2 like many others in the Rust ecosystem.
Thanks in advance, and great project.
The code inside CodeData
references constants by their u16
indices, but this library does not expose any way to access these constants by index.
My idea is to add a method like pub fn get(&self, index: u16) -> Option<ConstantPoolItem<'a>>
that uses the same logic as ConstantPoolIter::next
, but for the item at self.constant_pool[index].
I have a dependency on some of the changes in main. Thanks in advance!
I'll admit, I'm kind of a rookie when it comes to rust lifetimes. I can normally get by with little to no friction, but this issue has completely stumped me.
It seems that the issue is that the data for each ClassFile must live as long or longer than the ClassFile itself, Which makes it impossible(?) to use a function to populate a struct field with ClassFiles.
I'm more than willing to fix this myself if I have a few pointers on how, since i've just been fumbling around the codebase changing things from &'a [u8] to Vec with no success. Thank you so much for this great crate :)
A lot of structs would benefit from deriving more traits than just Debug
. I'm stuck having to copy-paste structures of the library because I can't Clone
them with the current design. This checklist provides a few tips for good crate interoperability, and more specifically a list of traits you may consider eagerly deriving/implementing on every struct where it makes sense.
I can submit a PR if you're interested.
I was experimenting with AFL fuzzing and it triggered a panic. I think cafebabe::parse_class
should never panic. If anything is wrong with the class file, an error should be returned.
validate
does this:
ConstantPoolEntry::ClassInfo(x) => Ok(
x.ensure_type(ConstantPoolEntryTypes::UTF8)? &&
x.borrow().get().validate_classinfo_name()?),
ensure_type(UTF8)
returns true for ConstantPoolEntry::Utf8
and for ConstantPoolEntry::Utf8Bytes
. So validate_classinfo_name
might get called with either of these two variants. It then does this:
fn validate_classinfo_name(&self) -> Result<bool, ParseError> {
match self {
ConstantPoolEntry::Utf8(x) => { ... }
_ => panic!("Attempting to get utf-8 data from non-utf8 constant pool entry!"),
}
}
So it panics if called with Utf8Bytes
. The same issue exists for some other validate_*
functions.
I see two issues here:
The panic happens because validate
and validate_classinfo_name
both check if the constant pool entry has the correct type, but they have different ideas about what the allowed types are.
I think this could be fixed by making validate_classinfo_name
cal fail!
instead of panic!
and by removing the call to ensure_type
in validate
. Perhaps this could get rid of ensure_type
and even ConstantPoolEntryTypes
entirely. This would seem to make sense because Rust already knows at runtime what variant of ConstantPoolEntry
it is, so it seems awkward to have a parallel ConstantPoolEntryTypes
that mirrors ConstantPoolEntry
.
cesu8::from_java_cesu8
fails?I understand that "UTF-8" constant pool entries in Java class files are not standard UTF-8 (spec). But isn't that handled by the cesu8
crate? If even that method cannot make sense of the bytes, is it really worth keeping parsing? Holding on to these malformed bytes seems to be responsible for awkward code in several places and ultimately, these bytes will be passed to the end user (using StringBytes
), who also won't know how to handle them.
I think it would make more sense to fail!
in this case. However, I see that this is how the library used to behave until you changed it here. I'm curious why you made that change? Did you encounter such badly formed strings "in the wild"?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.