artichoke / intaglio Goto Github PK
View Code? Open in Web Editor NEW๐ UTF-8 string, byte string, and C string interner
Home Page: https://crates.io/crates/intaglio
License: MIT License
๐ UTF-8 string, byte string, and C string interner
Home Page: https://crates.io/crates/intaglio
License: MIT License
cargo miri test
changed to be more compatible with cargo test
, but this means we had to find a different way to pass flags to the interpreter, so there now is a MIRIFLAGS
environment variable for that. The old way still works for now, but is deprecated.
This affects the following line:
intaglio/.github/workflows/miri.yaml
Line 36 in 3697f89
cargo miri test drop
, which runs the same tests as cargo test drop
.Hi, thanks for the great library ๐ I've been working on a compiler that has many Option<Symbol>
s scattered throuhgout. This takes 8 bytes to store, since it needs 4 bytes for the u32
, and 4 bytes for the discimminant + padding.
If you replace u32
with NonZeroU32
, the compiler can see that the all-0 bit pattern is not valid, so it can use that for the None
case, saving 4 bytes:
println!("{}", std::mem::size_of::<u32>());
println!("{}", std::mem::size_of::<Option<u32>>());
println!("{}", std::mem::size_of::<NonZeroU32>());
println!("{}", std::mem::size_of::<Option<NonZeroU32>>());
prints:
4
8
4
4
I've already made this change locally, and it works well, so I'd like to upstream (if it's useful to other people). However, this would be a breaking change, and it feels hard to justify a breaking change for a small (but noticeable) perf hit.
Currently, I see 3 ways of doing this:
NonZeroU32
instead of u32
(and replace some From
impls with TryFrom
)0
as a parameteru32::MAX
when passed 0
(or some other "random" constant)The first option seems ideal, but is a breaking change to the API signature. The second is what I've implemented in my local change, but is also a breaking change, just without the compiler errors, which is arguably worse. The third option just makes me feel really uneasy.
It's also worth mentioning that I'm not at all familiar with Ruby, and I understand this project is part of a system that needs to be compatible with existing versions of Ruby in some way. If there is a reason why NonZeroU32
is inappropriate here, then it probably makes sense to just fork it. I came across this repo in a comparison of various string interners, and it was the only one that supported PathBuf
.
I'd be happy to PR the changes that I've made, but before going further, I thought it would be best to check what the best way forward would be
Thanks ๐
In upgrading to version 1.9.0, we hit a break with the following compilation error:
325 | impl NFSFileSystem for MirrorFS {
| ^^^^^^^^ `NonNull<OsStr>` cannot be sent between threads safely
|
= help: within `intaglio::internal::Interned<OsStr>`, the trait `Send` is not implemented for `NonNull<OsStr>`
This seems to be a regression from 1.8.*.
This is with Cargo/rustc version 1.69.
Following up on #117, let's flesh out the remaining owned string/slice pairs in std
.
This should be pretty easy to add and would clean up a lot of code in artichoke-backend which deals with mruby expecting NUL-terminated symbols in its own way.
Things to do:
Slice
for Path
in internal.rs
.bytes.rs
to path.rs
path
feature for exposing this new interner.Following up on #117, let's flesh out the remaining owned string/slice pairs in std
.
This should be pretty easy to add and would clean up a lot of code in artichoke-backend which deals with mruby expecting NUL-terminated symbols in its own way.
Things to do:
Slice
for OsStr
in internal.rs
.bytes.rs
to osstr.rs
osstr
feature for exposing this new interner.HashMap::try_reserve
was stabilized in Rust 1.57.0 (current MSRV is 1.58.0).
This should be pretty easy to add and would clean up a lot of code in artichoke-backend which deals with mruby expecting NUL-terminated symbols in its own way.
Things to do:
Slice
for CStr
in internal.rs
.bytes.rs
to cstr.rs
cstr
feature for exposing this new interner.See:
The current hard mode is:
$ MIRIFLAGS="-Zmiri-strict-provenance -Zmiri-check-number-validity -Zmiri-symbolic-alignment-check -Zrandomize-layout" cargo +nightly miri test --test leak_drop
Blockers:
https://github.com/artichoke/intaglio/actions/runs/5663120563/job/15344308090
error: Undefined Behavior: trying to retag from <99387> for SharedReadOnly permission at alloc31495[0x0], but that tag does not exist in the borrow stack for this location
--> /home/runner/work/intaglio/intaglio/src/bytes.rs:713:45
|
713 | debug_assert_eq!(self.get(id), Some(slice));
| ^^^^^
| |
| trying to retag from <99387> for SharedReadOnly permission at alloc31495[0x0], but that tag does not exist in the borrow stack for this location
| this error occurs as part of retag at alloc31495[0x0..0x64]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <99387> was created by a SharedReadOnly retag at offsets [0x0..0x64]
--> /home/runner/work/intaglio/intaglio/src/bytes.rs:708:30
|
708 | let slice = unsafe { name.as_static_slice() };
| ^^^^^^^^^^^^^^^^^^^^^^
help: <99387> was later invalidated at offsets [0x0..0x64] by a Unique retag (of a reference/box inside this compound value)
--> /home/runner/work/intaglio/intaglio/src/bytes.rs:711:23
|
711 | self.vec.push(name);
| ^^^^
= note: BACKTRACE (of the first span):
= note: inside `intaglio::bytes::SymbolTable::intern::<std::vec::Vec<u8>>` at /home/runner/work/intaglio/intaglio/src/bytes.rs:713:45: 713:50
note: inside `bytes::dealloc_owned_data`
--> tests/leak_drop/bytes.rs:9:22
|
9 | let sym_id = table.intern(symbol.clone()).unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure
--> tests/leak_drop/bytes.rs:4:25
|
3 | #[test]
| ------- in this procedural macro expansion
4 | fn dealloc_owned_data() {
| ^
= note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
error: aborting due to previous error
error: test failed, to rerun pass `--test leak_drop`
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.