Coder Social home page Coder Social logo

artichoke / intaglio Goto Github PK

View Code? Open in Web Editor NEW
26.0 5.0 1.0 1.75 MB

๐Ÿ—ƒ UTF-8 string, byte string, and C string interner

Home Page: https://crates.io/crates/intaglio

License: MIT License

Rust 98.64% Ruby 1.36%
string-interning bytes utf-8 symbol artichoke symbol-table rust rust-crate interner

intaglio's Issues

Miri flag passing changed

cargo miri test changed to be more compatible with cargo test, but this means we had to find a different way to pass flags to the interpreter, so there now is a MIRIFLAGS environment variable for that. The old way still works for now, but is deprecated.

This affects the following line:

run: cargo miri test -- -- drop

This should now be cargo miri test drop, which runs the same tests as cargo test drop.

Using `NonZerou32` as the inner type for `Symbol`

Hi, thanks for the great library ๐Ÿ˜ I've been working on a compiler that has many Option<Symbol>s scattered throuhgout. This takes 8 bytes to store, since it needs 4 bytes for the u32, and 4 bytes for the discimminant + padding.

If you replace u32 with NonZeroU32, the compiler can see that the all-0 bit pattern is not valid, so it can use that for the None case, saving 4 bytes:

println!("{}", std::mem::size_of::<u32>());
println!("{}", std::mem::size_of::<Option<u32>>());
println!("{}", std::mem::size_of::<NonZeroU32>());
println!("{}", std::mem::size_of::<Option<NonZeroU32>>());

prints:

4
8
4
4

I've already made this change locally, and it works well, so I'd like to upstream (if it's useful to other people). However, this would be a breaking change, and it feels hard to justify a breaking change for a small (but noticeable) perf hit.

Currently, I see 3 ways of doing this:

  • change the function signatures to accept NonZeroU32 instead of u32 (and replace some From impls with TryFrom)
  • make the affected APIs panic, and add docs saying that they cannot take 0 as a parameter
  • make the APIs simply use u32::MAX when passed 0 (or some other "random" constant)

The first option seems ideal, but is a breaking change to the API signature. The second is what I've implemented in my local change, but is also a breaking change, just without the compiler errors, which is arguably worse. The third option just makes me feel really uneasy.

It's also worth mentioning that I'm not at all familiar with Ruby, and I understand this project is part of a system that needs to be compatible with existing versions of Ruby in some way. If there is a reason why NonZeroU32 is inappropriate here, then it probably makes sense to just fork it. I came across this repo in a comparison of various string interners, and it was the only one that supported PathBuf.

I'd be happy to PR the changes that I've made, but before going further, I thought it would be best to check what the best way forward would be

Thanks ๐Ÿ˜

Version 1.9.0 seems to drop Send trait

In upgrading to version 1.9.0, we hit a break with the following compilation error:

325 | impl NFSFileSystem for MirrorFS {
    |                        ^^^^^^^^ `NonNull<OsStr>` cannot be sent between threads safely
    |
    = help: within `intaglio::internal::Interned<OsStr>`, the trait `Send` is not implemented for `NonNull<OsStr>`

This seems to be a regression from 1.8.*.

This is with Cargo/rustc version 1.69.

Add a `PathBuf`/`Path` interner

Following up on #117, let's flesh out the remaining owned string/slice pairs in std.

This should be pretty easy to add and would clean up a lot of code in artichoke-backend which deals with mruby expecting NUL-terminated symbols in its own way.

Things to do:

  • Implement Slice for Path in internal.rs.
  • copy-paste the source in bytes.rs to path.rs
  • Fixup the argument types and docs.
  • Add a path feature for exposing this new interner.

Add a `OsString`/`OsStr` interner

Following up on #117, let's flesh out the remaining owned string/slice pairs in std.

This should be pretty easy to add and would clean up a lot of code in artichoke-backend which deals with mruby expecting NUL-terminated symbols in its own way.

Things to do:

  • Implement Slice for OsStr in internal.rs.
  • copy-paste the source in bytes.rs to osstr.rs
  • Fixup the argument types and docs.
  • Add a osstr feature for exposing this new interner.

Add a `CString`/`CStr` interner

This should be pretty easy to add and would clean up a lot of code in artichoke-backend which deals with mruby expecting NUL-terminated symbols in its own way.

Things to do:

  • Implement Slice for CStr in internal.rs.
  • copy-paste the source in bytes.rs to cstr.rs
  • Fixup the argument types and docs.
  • Add a cstr feature for exposing this new interner.

Miri failure on rustc 1.73.0-nightly (31395ec38 2023-07-24)

https://github.com/artichoke/intaglio/actions/runs/5663120563/job/15344308090

error: Undefined Behavior: trying to retag from <99387> for SharedReadOnly permission at alloc31495[0x0], but that tag does not exist in the borrow stack for this location
   --> /home/runner/work/intaglio/intaglio/src/bytes.rs:713:45
    |
713 |         debug_assert_eq!(self.get(id), Some(slice));
    |                                             ^^^^^
    |                                             |
    |                                             trying to retag from <99387> for SharedReadOnly permission at alloc31495[0x0], but that tag does not exist in the borrow stack for this location
    |                                             this error occurs as part of retag at alloc31495[0x0..0x64]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <99387> was created by a SharedReadOnly retag at offsets [0x0..0x64]
   --> /home/runner/work/intaglio/intaglio/src/bytes.rs:708:30
    |
708 |         let slice = unsafe { name.as_static_slice() };
    |                              ^^^^^^^^^^^^^^^^^^^^^^
help: <99387> was later invalidated at offsets [0x0..0x64] by a Unique retag (of a reference/box inside this compound value)
   --> /home/runner/work/intaglio/intaglio/src/bytes.rs:711:23
    |
711 |         self.vec.push(name);
    |                       ^^^^
    = note: BACKTRACE (of the first span):
    = note: inside `intaglio::bytes::SymbolTable::intern::<std::vec::Vec<u8>>` at /home/runner/work/intaglio/intaglio/src/bytes.rs:713:45: 713:50
note: inside `bytes::dealloc_owned_data`
   --> tests/leak_drop/bytes.rs:9:22
    |
9   |         let sym_id = table.intern(symbol.clone()).unwrap();
    |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
   --> tests/leak_drop/bytes.rs:4:25
    |
3   | #[test]
    | ------- in this procedural macro expansion
4   | fn dealloc_owned_data() {
    |                         ^
    = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error

error: test failed, to rerun pass `--test leak_drop`

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.