Coder Social home page Coder Social logo

cafebabe's People

Contributors

bluejekyll avatar earthcomputer avatar schubart avatar staktrace avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

cafebabe's Issues

Any plans for a class file writer?

You have a nice parser and quite a compact repesentation. Any plans to add a writer so you could do round-trip modification of class files?

Testing?

I think it would be useful to have a testing suite built-in that could be run against on CI (GitHub actions) to verify class files are parsed properly.

I would be willing to implement some of the testing, but a few questions about what you would prefer:

  • Build tooling - do we scale up to something on the level of Gradle or just use a bunch of Makefiles or bash scripts to compile our test files.
  • Compilers - Do we want to also test other compilers such as ECJ in our tooling? May not get to ECJ immediately but javac will be on the roster first.
  • Test artifacts - Specifically regarding code style as to prevent conflict hell with git. Ideally something that we could just run an autoformatter on.
  • Oddball artifacts - It may be useful to craft some very specific testing artifacts that javac or any compiler couldn't generate but are still considered legal bytecode - ASM may be useful for generating these special oddballs.

Class parsing fails if local variable name contains '<' or '>'

The current implementation only allows these characters for constructors and static initializers, but the spec only forbids them for methods and not for fields or local variables. https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.2.2

This is a problem, because the kotlin compiler uses weird local variable names like this.
image

Code which makes the wrong assumption:
https://github.com/staktrace/cafebabe/blob/main/src/attributes.rs#L600
Sample class:
Kotlin.zip

Consider backing class file data with an arena

Much of the constant_pool: Vec<Rc<ConstantPoolEntry<'a>>> could be replaced by references into some arena type which owns the constant pool entries.

Although this causes a new problem: the constant pool arena must outlive the class file.

Clippy warnings do not fail the build

This could be changed by adding -D warnings to the clippy command in the CI.

For example the current build produces these warnings:

warning: field is never read: `attr_index`
   --> src/constant_pool.rs:908:5
    |
908 |     attr_index: u16,
    |     ^^^^^^^^^^^^^^^
    |
    = note: `#[warn(dead_code)]` on by default
note: `Dynamic` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
   --> src/constant_pool.rs:906:10
    |
906 | #[derive(Debug)]
    |          ^^^^^
    = note: this warning originates in the derive macro `Debug` (in Nightly builds, run with -Z macro-backtrace for more info)
warning: field is never read: `name_and_type`
   --> src/constant_pool.rs:909:5
    |
909 |     name_and_type: NameAndType<'a>,
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
note: `Dynamic` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
   --> src/constant_pool.rs:906:10
    |
906 | #[derive(Debug)]
    |          ^^^^^
    = note: this warning originates in the derive macro `Debug` (in Nightly builds, run with -Z macro-backtrace for more info)
warning: `cafebabe` (lib) generated 2 warnings
    Finished dev [unoptimized + debuginfo] target(s) in 1m 12s

Method descriptor parser

I'm planning to write a method descriptor parser, for example:

(JLjava/nio/ByteBuffer;)J

Would parse to something like:

struct Descriptor { arguments: Vec<Type>, result: Type }
enum Type { Void, Primitive(PrimitiveType), Array(Box<Type>), Object(Cow<'a, str>) }

We could then replace all the descriptor fields on MethodInfo and some other types. Would you be open to this PR?

Add a project license?

Would you consider adding a license to this project? I think I'd suggest MIT & Apache 2 like many others in the Rust ecosystem.

Thanks in advance, and great project.

Add a method for accessing the constant pool by index

The code inside CodeData references constants by their u16 indices, but this library does not expose any way to access these constants by index.

My idea is to add a method like pub fn get(&self, index: u16) -> Option<ConstantPoolItem<'a>> that uses the same logic as ConstantPoolIter::next, but for the item at self.constant_pool[index].

Make ClassFile ownable(?)

I'll admit, I'm kind of a rookie when it comes to rust lifetimes. I can normally get by with little to no friction, but this issue has completely stumped me.

It seems that the issue is that the data for each ClassFile must live as long or longer than the ClassFile itself, Which makes it impossible(?) to use a function to populate a struct field with ClassFiles.

I'm more than willing to fix this myself if I have a few pointers on how, since i've just been fumbling around the codebase changing things from &'a [u8] to Vec with no success. Thank you so much for this great crate :)

[Suggestion] Derive more traits?

A lot of structs would benefit from deriving more traits than just Debug. I'm stuck having to copy-paste structures of the library because I can't Clone them with the current design. This checklist provides a few tips for good crate interoperability, and more specifically a list of traits you may consider eagerly deriving/implementing on every struct where it makes sense.

I can submit a PR if you're interested.

Panic in `validate_classinfo_name`

Background

I was experimenting with AFL fuzzing and it triggered a panic. I think cafebabe::parse_class should never panic. If anything is wrong with the class file, an error should be returned.

Details

validate does this:

ConstantPoolEntry::ClassInfo(x) => Ok(
             x.ensure_type(ConstantPoolEntryTypes::UTF8)? && 
             x.borrow().get().validate_classinfo_name()?),

ensure_type(UTF8) returns true for ConstantPoolEntry::Utf8 and for ConstantPoolEntry::Utf8Bytes. So validate_classinfo_name might get called with either of these two variants. It then does this:

    fn validate_classinfo_name(&self) -> Result<bool, ParseError> {
        match self {
            ConstantPoolEntry::Utf8(x) => { ... }
            _ => panic!("Attempting to get utf-8 data from non-utf8 constant pool entry!"),
        }
    }

So it panics if called with Utf8Bytes. The same issue exists for some other validate_* functions.

I see two issues here:

Constant pool entry type checks are out of sync

The panic happens because validate and validate_classinfo_name both check if the constant pool entry has the correct type, but they have different ideas about what the allowed types are.

I think this could be fixed by making validate_classinfo_name cal fail! instead of panic! and by removing the call to ensure_type in validate. Perhaps this could get rid of ensure_type and even ConstantPoolEntryTypes entirely. This would seem to make sense because Rust already knows at runtime what variant of ConstantPoolEntry it is, so it seems awkward to have a parallel ConstantPoolEntryTypes that mirrors ConstantPoolEntry.

Better to fail if cesu8::from_java_cesu8 fails?

I understand that "UTF-8" constant pool entries in Java class files are not standard UTF-8 (spec). But isn't that handled by the cesu8 crate? If even that method cannot make sense of the bytes, is it really worth keeping parsing? Holding on to these malformed bytes seems to be responsible for awkward code in several places and ultimately, these bytes will be passed to the end user (using StringBytes), who also won't know how to handle them.

I think it would make more sense to fail! in this case. However, I see that this is how the library used to behave until you changed it here. I'm curious why you made that change? Did you encounter such badly formed strings "in the wild"?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.