ustr's Introduction


Fast, FFI-friendly string interning.

A Ustr (Unique str) is a lightweight handle representing a static, immutable entry in a global string cache, allowing for:

  • Extremely fast string assignment and comparisons.

  • Efficient storage. Only one copy of the string is held in memory, and getting access to it is just a pointer indirection.

  • Fast hashing ‒ the precomputed hash is stored with the string.

  • Fast FFI ‒ the string is stored with a terminating null byte so can be passed to C directly without doing the CString dance.

The downside is no strings are ever freed, so if you're creating lots and lots of strings, you might run out of memory. On the other hand, War and Peace is only 3MB, so it's probably fine.

This crate is based on OpenImageIO's (OIIO) ustring but it is not binary-compatible (yet). The underlying hash map implementation is directly ported from OIIO.
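To make the idea of interning concrete, here is a minimal sketch using only the standard library. It is not ustr's implementation (which uses its own hash map ported from OIIO); `CACHE` and `intern` are hypothetical names, and `Box::leak` merely stands in for storage that is never freed.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

// Hypothetical global cache: a set of leaked, never-freed strings.
static CACHE: OnceLock<Mutex<HashSet<&'static str>>> = OnceLock::new();

/// Returns the single canonical copy of `s`, allocating only the first time
/// a given string is seen.
pub fn intern(s: &str) -> &'static str {
    let mut cache = CACHE
        .get_or_init(|| Mutex::new(HashSet::new()))
        .lock()
        .unwrap();
    // Already cached: hand back the existing copy without allocating.
    if let Some(&existing) = cache.get(s) {
        return existing;
    }
    // First sighting: leak one copy so it lives for the rest of the program.
    let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
    cache.insert(leaked);
    leaked
}
```

Two calls with equal contents return the same address, which is why comparing handles can be a pointer comparison rather than a byte-by-byte one.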


use ustr::{Ustr, ustr};

// Creation is quick and easy using either `Ustr::from` or the `ustr` short
// function and only one copy of any string is stored
let h1 = Ustr::from("hello");
let h2 = ustr("hello");

// Comparisons and copies are extremely cheap
let h3 = h1;
assert_eq!(h2, h3);

// You can pass straight to FFI
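let len = unsafe {
    // Requires the `libc` crate; `as_char_ptr` yields a nul-terminated pointer
    libc::strlen(h1.as_char_ptr())
};
assert_eq!(len, 5);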

// For best performance when using Ustr as key for a HashMap or HashSet,
// you'll want to use the precomputed hash. To make this easier, just use
// the UstrMap and UstrSet exports:
use ustr::UstrMap;

// Key type is always Ustr
let mut map: UstrMap<usize> = UstrMap::default();
map.insert(h1, 17);
assert_eq!(*map.get(&h1).unwrap(), 17);
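Why a dedicated map type helps can be illustrated with a pass-through hasher. The sketch below is an assumption about how such a map could be built, not ustr's actual code: `Key`, `PassThroughHasher` and `KeyMap` are hypothetical names, and the hash value is simply supplied by the caller.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Hypothetical stand-in for an interned handle that carries a hash computed
/// once, when the string was first cached.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Key {
    pub text: &'static str,
    pub precomputed: u64,
}

impl Hash for Key {
    // Feed only the stored hash to the hasher; the string bytes are not rehashed.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.precomputed);
    }
}

/// A hasher that passes the precomputed value straight through.
#[derive(Default)]
pub struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
    fn write(&mut self, _bytes: &[u8]) {
        unreachable!("only write_u64 is expected for Key");
    }
    fn write_u64(&mut self, value: u64) {
        self.0 = value;
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

/// A map whose lookups cost no hashing work beyond copying a u64.
pub type KeyMap<V> = HashMap<Key, V, BuildHasherDefault<PassThroughHasher>>;
```

With a default `HashMap`, the standard hasher would run again over whatever `Hash` feeds it; the pass-through hasher avoids that second round of work.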

By enabling the "serde" feature you can serialize individual Ustrs or the whole cache with serde.

use ustr::{Ustr, ustr};

let u_ser = ustr("serialization is fun!");
let json = serde_json::to_string(&u_ser).unwrap();
let u_de : Ustr = serde_json::from_str(&json).unwrap();

assert_eq!(u_ser, u_de);

Since the cache is global, use the ustr::DeserializedCache dummy object to drive the deserialization.

ustr("Send me to JSON and back");
let json = serde_json::to_string(ustr::cache()).unwrap();

// ... some time later ...
let _: ustr::DeserializedCache = serde_json::from_str(&json).unwrap();
assert_eq!(ustr::num_entries(), 1);
assert_eq!(ustr::string_cache_iter().collect::<Vec<_>>(), vec!["Send me to JSON and back"]);

Calling from C/C++

If you are writing a library that uses ustr and want users to be able to create Ustrs to pass to your API from C, add to your crate and use include/ustr.h or include/ustr.hpp for function declarations.


Changes since 0.10

  • Actually renamed use of "serialization" feature to "serde"

Changes since 0.9

  • Fixed an issue that would stop Ustr from working on wasm32-unknown-unknown (contributed by bouk)

and thanks to virtualritz:

  • Ustr::get_cache() was renamed to cache()

  • All dependencies were bumped to latest versions

  • All features were removed (there are good defaults) except for serialization

  • The serialization feature was renamed to serde

  • ustr now uses Rust 2021

Changes since 0.8

  • Add existing_ustr function (contributed by macprog-guy)

    The idea is to allow creating a Ustr only when that string is already in the cache. This is particularly useful when Ustrs are created from untrusted user input (say, from a web server or API): a caller who provides a different value on each call would otherwise make the cache consume more and more memory until it runs out (DoS).

  • Add implementation for Ord (contributed by zigazeljko)

  • Inlined a bunch of simple functions (contributed by g-plane)

  • Fixed tests to lock rather than relying on RUST_TEST_THREADS=1 (contributed by kornelski)

  • Fixed tests to handle serialization feature properly when enabled (contributed by kornelski)

  • Added a check for a potential allocation failure in the allocator (contributed by kornelski)

  • Added FromStr impl (contributed by martinmr)

  • Add rustfmt.toml to repo
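The lookup-only idea behind existing_ustr in the list above can be sketched with a toy interner. This is an illustration under stated assumptions, not ustr's code: `Interner`, `intern` and `existing` are hypothetical names, and the cache here is a local value rather than a global.

```rust
use std::collections::HashSet;

/// Hypothetical toy interner; ustr's real cache is global and built differently.
#[derive(Default)]
pub struct Interner {
    strings: HashSet<&'static str>,
}

impl Interner {
    /// Inserts `s` if needed and returns the canonical copy.
    pub fn intern(&mut self, s: &str) -> &'static str {
        if let Some(&existing) = self.strings.get(s) {
            return existing;
        }
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.strings.insert(leaked);
        leaked
    }

    /// Lookup only: never allocates, so untrusted input cannot grow the cache.
    pub fn existing(&self, s: &str) -> Option<&'static str> {
        self.strings.get(s).copied()
    }

    /// Number of unique strings held.
    pub fn len(&self) -> usize {
        self.strings.len()
    }
}
```

A request handler would call `existing` on user-supplied text and treat `None` as "unknown name", reserving `intern` for strings the program itself chooses.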

Changes since 0.7

  • Update dependencies

    The versions of parking_lot and ahash have been updated.

  • Space optimization with NonNull

    The internal pointer is now a NonNull to take advantage of layout optimizations in Option etc.

  • Add as_cstr() method

    Added as_cstr(&self) -> &std::ffi::CStr to make it easier to interface with APIs that rely on CStr.
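Why storing the terminating nul makes a CStr view cheap can be shown with the standard library alone. In this sketch, `STORED`, `stored_as_cstr` and `c_style_len` are hypothetical names, and the byte layout is an assumption based on the README's description rather than on ustr's source.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Bytes laid out with a trailing nul (an assumption mirroring the README's
/// description of how cached strings are stored).
pub const STORED: &[u8] = b"hello\0";

/// Views the stored bytes as a `CStr` without copying or allocating.
pub fn stored_as_cstr() -> &'static CStr {
    CStr::from_bytes_with_nul(STORED).expect("exactly one trailing nul byte")
}

/// Hypothetical stand-in for a C function that receives a `const char*`.
pub fn c_style_len(s: &CStr) -> usize {
    // This pointer is what would be handed to C.
    let ptr: *const c_char = s.as_ptr();
    // SAFETY: `ptr` comes from a valid `CStr`, so it is nul-terminated.
    unsafe { CStr::from_ptr(ptr) }.to_bytes().len()
}
```

By contrast, an ordinary `&str` has no terminator, so passing it to C normally means building a `CString` first, which allocates and copies.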

Changes since 0.6

  • Derive Ord for Ustr

    So now you can sort a Vec of Ustrs lexicographically.

Changes since 0.5

  • Added From<Ustr> for &str

    This impl makes it easier to pass a Ustr to methods expecting an Into<&str>.

Changes since 0.4

  • 32-bit support added

    Removed the restriction to 64-bit systems and fixed a bug relating to pointer maths. Thanks to agaussman for bringing it up.

  • Miri leak checks re-enabled

    Thanks to RalfJung for pointing out that Miri now ignores "leaks" from statics.

  • PartialOrd is now lexicographic

    Thanks to macprog-guy for the PR implementing PartialOrd by deferring to &str. This will be slower than the previous derived implementation, which just did a pointer comparison, but is much less surprising.
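The trade-off between pointer-based and lexicographic ordering can be sketched as follows. `Handle` is a hypothetical type, and the sketch assumes every handle comes from an interner, so that equal contents always share one address.

```rust
use std::cmp::Ordering;

/// Hypothetical handle: equality by pointer, ordering by string contents.
#[derive(Clone, Copy, Debug)]
pub struct Handle(pub &'static str);

impl PartialEq for Handle {
    fn eq(&self, other: &Self) -> bool {
        // Interned strings are unique, so comparing addresses is enough.
        std::ptr::eq(self.0, other.0)
    }
}
impl Eq for Handle {}

impl Ord for Handle {
    fn cmp(&self, other: &Self) -> Ordering {
        // Defer to `str`, which compares bytes lexicographically.
        self.0.cmp(other.0)
    }
}

impl PartialOrd for Handle {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        // Keep PartialOrd consistent with Ord.
        Some(self.cmp(other))
    }
}
```

Ordering by address would be faster, but the result would depend on where each string happened to be allocated, which is the surprise the change removes.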

Changes since 0.3

  • Added Miri to CI tests

    Miri sanity-checks the unsafe parts of the code to guard against some types of UB.

  • Switched to ahash as the default hasher

    Ahash is a fast, non-cryptographic pure Rust hasher. Pure Rust is important to be able to run Miri, and ahash benchmarks the fastest I could find. The old fasthash/cityhash is available by enabling --features=hashcity.

Changes since 0.2

  • Serde support

    Ustr can now be serialized with Serde when enabling --features=serialization. The global string cache can also be serialized if you really want to.

  • Switched to parking_lot::Mutex as default synchronization

    Spinlocks have been getting a bad rap recently so the string cache now uses parking_lot::Mutex as the default synchronization primitive. spin::Mutex is still available behind the --features=spinlock feature gate if you really want that extra 5% speed.

  • Cleaned up unsafe

    Did a better job of documenting the invariants for the unsafe blocks and replaced some blind additions with checked_add() and friends to avoid potential (but very unlikely) overflow.

  • Compared to string-cache

    string-cache provides a global cache that can be created at compile time as well as at run time. Dynamic strings in the cache appear to be reference-counted so will be freed when they are no longer used, while Ustrs are never deleted.

    Creating a string_cache::DefaultAtom is much slower than creating a Ustr, especially in a multi-threaded context. On the other hand if you can just bake all your Atoms into your binary at compile-time this wouldn't be an issue.

  • Compared to string-interner

    string-interner gives you individual Interner objects to work with rather than a global cache, which could be more flexible. It's faster to create than string-cache but still significantly slower than Ustr.


Ustrs are significantly faster to create than string-interner or string-cache. Creating 100,000 cycled copies of ~20,000 path strings of the form:

... etc.

[Figure: raft bench]


It is common in certain types of applications to use strings as identifiers, but not really do any processing with them. To paraphrase from OIIO's ustring documentation:

Compared to standard strings, Ustrs have several advantages:

  • Each individual Ustr is very small -- in fact, we guarantee that a Ustr is the same size and memory layout as an ordinary *u8.

  • Storage is frugal, since there is only one allocated copy of each unique character sequence, throughout the lifetime of the program.

  • Assignment from one Ustr to another is just copy of the pointer; no allocation, no character copying, no reference counting.

  • Equality testing (do the strings contain the same characters) is a single operation, the comparison of the pointer.

  • Memory allocation only occurs when a new Ustr is constructed from raw characters the first time ‒ subsequent constructions of the same string just find it in the canonical string set, and don't need to allocate new storage. Destruction of a Ustr is trivial; there is no de-allocation because the canonical version stays in the set. Also, therefore, no user code mistake can lead to memory leaks.

    But there are some problems, too. Canonical strings are never freed from the table. So in some sense all the strings "leak", but they only leak one copy for each unique string that the program ever comes across. Creating a Ustr is slower than String::from() on a single thread, and performance will be worse if trying to create many Ustrs in tight loops from multiple threads due to lock contention for the global cache.
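The size, copy and equality claims in the list above can be checked on a toy handle. This is a sketch under stated assumptions: `Handle`, `from_static` and `layout_report` are hypothetical names, and the handle stores only a non-null data pointer, leaving out the length and hash that a real cache entry would keep alongside the bytes.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical handle: a single non-null pointer to the cached bytes.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(transparent)]
pub struct Handle(NonNull<u8>);

impl Handle {
    pub fn from_static(s: &'static str) -> Self {
        // A `&str` data pointer is never null, even for the empty string.
        Handle(NonNull::new(s.as_ptr() as *mut u8).expect("str pointers are non-null"))
    }
}

/// Sizes of the handle, an optional handle, and a raw pointer, in bytes.
pub fn layout_report() -> (usize, usize, usize) {
    (
        size_of::<Handle>(),
        size_of::<Option<Handle>>(),
        size_of::<*const u8>(),
    )
}
```

Because the pointer can never be null, `Option<Handle>` can use the null value to represent `None` and stays pointer-sized. The derived `PartialEq` compares addresses only, and `Copy` makes assignment a plain pointer copy.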

On the whole, Ustrs are a really great string representation

  • if you tend to have (relatively) few unique strings, but many copies of those strings;

  • if you tend to make the same strings over and over again, and if it's relatively rare that a single unique character sequence is used only once in the entire lifetime of the program;

  • if your most common string operations are assignment and equality testing and you want them to be as fast as possible;

  • if you are doing relatively little character-by-character assembly of strings, string concatenation, or other "string manipulation" (other than equality testing).

Ustrs are not so hot:

  • if your program tends to have very few copies of each character sequence over the entire lifetime of the program;

  • if your program tends to generate a huge variety of unique strings over its lifetime, each of which is used only a short time and then discarded, never to be needed again;

  • if you don't need to do a lot of string assignment or equality testing, but lots of more complex string manipulation.

Safety and Compatibility

This crate contains a significant amount of unsafe but usage has been checked and is well-documented. It is also run through Miri as part of the CI process.

I use it regularly on 64-bit systems, and it has passed Miri on a 32-bit system as well, but 32-bit is not checked regularly. If you want to use it on 32-bit, please make sure to run Miri and open an issue if you find any problems.


BSD+ License

Copyright © 2019—2020 Anders Langlands

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Subject to the terms and conditions of this license, each copyright holder and contributor hereby grants to those receiving rights under this license a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except for failure to satisfy the conditions of this license) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer this software, where such license applies only to those patent claims, already acquired or hereafter acquired, licensable by such copyright holder or contributor that are necessarily infringed by:

(a) their Contribution(s) (the licensed copyrights of copyright holders and non-copyrightable additions of contributors, in source or binary form) alone; or

(b) combination of their Contribution(s) with the work of authorship to which such Contribution(s) was added by such copyright holder or contributor, if, at the time the Contribution is added, such addition causes such combination to be necessarily infringed. The patent license shall not apply to any other combinations which include the Contribution.

Except as expressly stated above, no rights or licenses from any copyright holder or contributor is granted under this license, whether expressly, by implication, estoppel or otherwise.



Contains code ported from OpenImageIO, BSD 3-clause licence.

Contains a copy of Max Woolf's Big List of Naughty Strings, MIT licence.

Contains some strings from SecLists, MIT licence.



ustr's Issues

Unit test failure in 'tests::blns'

I'm getting a unit test failure in tests::blns.
I'm on a mac (mojave 10.14.10).

This is on the master and serde branches.

Release and debug builds.

running 6 tests
test hash::test_hashing ... ok
test tests::c_str_works ... ok
test tests::it_works ... ok
test tests::empty_string ... ok
test tests::blns ... FAILED
test tests::raft ... ok


---- tests::blns stdout ----
thread 'tests::blns' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1315`', src/
stack backtrace:
   0: std::panicking::default_hook::{{closure}}
   1: std::panicking::default_hook
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::continue_panic_fmt
   4: std::panicking::begin_panic_fmt
   5: ustr::tests::blns
   6: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
   7: __rust_maybe_catch_panic
   8: test::run_test::run_test_inner::{{closure}}
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

No safe clear cache function

This lib fits our use case well, except that there is no safe clear function. Is there any reason we could not have a safe clear cache function?

Wasm-pack fails with ustr

I don't know if wasm-pack is supported by ustr, but I think it'd be useful to record this, in case anyone else runs into it. I'm also not sure if the problem is with ustr or wasm-pack or some dependency.


Building WebAssembly with wasm-pack 0.10.1 fails if the project uses Ustr:

wasm-pack build --target no-modules

The error is (with some paths shortened):

error: cannot import from modules (`env`) with `--no-modules`
Error: Running the wasm-bindgen CLI
Caused by: failed to execute `wasm-bindgen`: exited with exit status: 1
  full command: "..../.cache/.wasm-pack/wasm-bindgen-e63eccbd87203048/wasm-bindgen" ".../target/wasm32-unknown-unknown/release/minimal_ustr_wasmpack_reproduction.wasm" "--out-dir" ".../pkg" "--typescript" "--target" "no-modules"

If the target is web, the error is different and happens at runtime, but I believe it's still related.


Looks a bit like rust-lang/rust issue 72758 but I'm not sure.

Minimal example

Wasm-pack is needed: cargo install wasm-pack


[package]
name = "minimal_ustr_wasmpack_reproduction"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib", "rlib"]

[dependencies]
ustr = "0.8.1"
wasm-bindgen = "0.2.78"


use ::ustr::Ustr;
use ::wasm_bindgen::prelude::*;

pub fn mre() {

`serde` feature doesn't work

Per the changelog, v0.10 of this crate renames the serialization feature to serde. However, this change was only made in Cargo.toml—the library code (and docs/readme) still references the serialization feature. Therefore, the feature is unusable.

edit: As a side note, I just noticed the readme and crate docs mention a serialize feature, but the feature was actually called serialization.

Why Ustr does not derive Ord?

Hello, I've been using this excellent library in a project of mine, and when I wanted to sort a Vec, it told me that Ustr does not implement Ord! Ustr already implements Eq and PartialEq. All that's needed is to derive Ord on line 157 of

Inconsistent Ord and PartialOrd

The derived Ord is inconsistent with the PartialOrd, since it does not implement lexicographical ordering.

By the way, it seems the compiler did already detect this issue, but #10 silenced the warning instead of fixing it.

Case-insensitive strings

I am implementing a tool that deals with case-insensitive programming languages (Ada in particular, but also a custom DSL from another company). I wonder whether you have given any thought to supporting such a use case?

Given a &str as read from the source code, with any casing, we should get the same ustr, preferably without requiring memory allocations except of course when this is a new string.
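One way the requested behavior could look is sketched below, using only the standard library. Everything here is hypothetical and not part of ustr: `CaseInsensitiveInterner` and `folded_hash` are made-up names, case folding is ASCII-only, and the first spelling seen becomes the canonical one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Hypothetical interner that treats strings differing only in ASCII case
/// as the same entry.
#[derive(Default)]
pub struct CaseInsensitiveInterner {
    // Buckets keyed by a case-folded hash; each bucket holds leaked strings.
    buckets: HashMap<u64, Vec<&'static str>>,
}

/// Hashes the ASCII-lowercased bytes of `s` without allocating.
fn folded_hash(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    for byte in s.bytes() {
        hasher.write_u8(byte.to_ascii_lowercase());
    }
    hasher.finish()
}

impl CaseInsensitiveInterner {
    /// Returns the canonical copy, allocating only for a new string.
    pub fn intern(&mut self, s: &str) -> &'static str {
        let bucket = self.buckets.entry(folded_hash(s)).or_default();
        if let Some(existing) = bucket
            .iter()
            .copied()
            .find(|entry| entry.eq_ignore_ascii_case(s))
        {
            return existing;
        }
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        bucket.push(leaked);
        leaked
    }
}
```

Looking up an already-known identifier hashes and compares in place. The only string allocation is the `to_owned` for a string not seen before, though the first entry in a new bucket also allocates the bucket's `Vec`.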


Use in HashMap

Thanks again for writing ustr, I'm using it for all object names at the moment in a hobby project I'm working on. Seems to be working okay at the moment, but I have yet to profile and measure the project properly before I'll be able to give any good feedback.

A quick question.

I need to use this type as a key in a HashMap. Is this how you'd recommend implementing the Hash trait for it? It's a bit funny re-hashing the hash but...

impl Hash for Ustr {
    fn hash<H: Hasher>(&self, hasher: &mut H) {

Just wondering what your thoughts are on this.

Support for 32-bit architectures

I would love to be able to use ustr for my WebAssembly project, but it is a 32-bit architecture. What are the blockers for supporting 32-bit?

Using as a dynamic loaded package

Hi, I am really new to Rust and I was wondering what would I have to change in order to compile the package to a .so and use it like in CPython? I think I would have to initialize, from Python, the Hash on memory first with something like lazy_static! and then drop after using it, right? Also what packages would you recommend to replace those using std?

Thanks in advance!

Possible to create multiple independent caches?


By only providing a global cache, the "address spaces" of any two different (serializable) data structures relying on Ustr will be wed. The only way to actually use the serialization feature in a coherent way is to keep your string "cache" (more like a database when serialized AND centralized/global) in its own file, and consider it coupled with any other transactions on data structures storing Ustr.


Allow for users to create their own Bins or something. I'd like to be able to have isolated Bins, and serialize them next to the data structures that depend on them; as it was mentioned in the README, string-interner has this ability. If it's important that you are using lazy_static with a static variable, maybe use once_cell in the public API instead.

Is there an easy way to use Ustrs as integers?

Hi and thanks for the great library!

I need to use Ustr backed categorical values in a ML library that only supports numeric values for categories. I see that I can easily retrieve the precomputed hash via Ustr::precomputed_hash and I can use this value as id, but when I get my results back I won't know which id is which Ustr. Or at least I cannot figure it out by briefly exploring the code.
Is there a straightforward way to do this or should I just store my own map of ids to Ustrs?
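The "store my own map" option mentioned in the question can be sketched as a small two-way table. `CategoryIds`, `id_of` and `name_of` are hypothetical names, and plain `&'static str` stands in for an interned handle.

```rust
use std::collections::HashMap;

/// Hypothetical side table assigning dense integer ids to category names.
#[derive(Default)]
pub struct CategoryIds {
    by_name: HashMap<&'static str, u32>,
    by_id: Vec<&'static str>,
}

impl CategoryIds {
    /// Returns the id for `name`, assigning the next free id on first use.
    pub fn id_of(&mut self, name: &'static str) -> u32 {
        if let Some(&id) = self.by_name.get(name) {
            return id;
        }
        let id = self.by_id.len() as u32;
        self.by_id.push(name);
        self.by_name.insert(name, id);
        id
    }

    /// Maps an id coming back from the numeric library to its name.
    pub fn name_of(&self, id: u32) -> Option<&'static str> {
        self.by_id.get(id as usize).copied()
    }
}
```

Dense ids starting at zero also suit libraries that expect category codes in a contiguous range, which a 64-bit hash value would not provide.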

Update version of parking_lot dependency

Thanks for creating ustr, it's very nice!

Running cargo audit currently flags my project because of an issue that used to exist in lock_api (rustsec), and which can cause data races.

The issue in lock_api has been fixed, and parking_lot has updated to that version. The master branch of ustr has in turn updated to the latest parking_lot, but the published version is behind.

If it's not too minor, could a new version be released with the latest dependencies, so that it is not susceptible to data races?

Display implementation ignores formatting parameters

Hi. I'm not sure if this is intentional or not but the Display implementation ignores all formatting parameters

println!("ab{:>2}", "c"); //prints "ab c"
println!("ab{:>2}", Ustr::from_str("c").unwrap()); //prints "abc"

I believe this could be resolved by changing the current implementation to something like

impl fmt::Display for Ustr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {

Obviously this would be a breaking change but possibly not a serious one as using formatting parameters currently does nothing, so it seems unlikely that anybody would be relying on the current behaviour.
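One way to make a Display implementation respect width and alignment is the standard `Formatter::pad` method. The sketch below uses a hypothetical `Name` wrapper in place of Ustr, and it is not necessarily the change proposed above.

```rust
use std::fmt;

/// Hypothetical wrapper standing in for an interned string handle.
pub struct Name(pub &'static str);

impl fmt::Display for Name {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `pad` applies the width, fill, alignment and precision given in
        // the format spec; `write_str` would ignore them.
        f.pad(self.0)
    }
}
```

With this implementation, `format!("ab{:>2}", Name("c"))` produces `"ab c"`, matching the behavior of a plain `&str`.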

Check if key in UstrMap without creating new ustr

We need a way to check whether a key is in the UstrMap without actually creating a Ustr.
Think of user input: if I want to check whether a search term is in the map, I have to keep that input string in memory forever.
A way to check without creating the Ustr in memory would be perfect.

Symbols always resolve to an empty str on wasm32-unknown-unknown

Hi again!

I spoke a little too soon in my last issue, and even with an updated parking_lot, I haven't quite been able to get things to work on wasm32-unknown-unknown, and symbols always end up resolving to an empty &str.

I've created a minimal reproduction here. The code should be fairly straightforward, but it just exports a reproduce function to Javascript, which when called, logs Ustr::from("test").as_str() to the browser's console. You'll note that an empty string is logged.

I imagine this has something to do with the pointer tagging scheme you employ, but I haven't investigated enough to be sure.

In the meantime, I've swapped out for a simpler string interning crate in my code, so this isn't a blocking issue for me or anything, but I at least wanted to raise it in case you had any ideas, or anyone else gets stuck on the same thing :)

Thanks again for the crate! If this ever gets resolved, I'd love to switch back to Ustr, because it really is an impressively fast piece of code.

Choose hash function

#3 mentioned that ahash is performant enough, but it has other characteristics (inconsistent hash) which are not desirable for us - web-infra-dev/rspack#5481

Everything still works when I switched ahash to fxhash master...Boshen:ustr-fxhash:master

Is it worthwhile to add more popular hash functions as feature toggles or should I just keep and publish the fork?

We may also need to benchmark again with the latest changes from ahash, since the current benchmark was done a few years ago.

New release with parking_lot version bump

Hi Anders!

First off, thanks for the wonderful crate: it was super easy to get started with, and did exactly what I needed :)

I ran into a slight roadbump while trying to use the published version of the library through wasm-bindgen, due to the incompatible version of parking_lot. You fixed that in f5cb285, and I'm able to get things running by pinning to that specific commit, but I was wondering whether you'd be able to cut a new release of the crate, so people don't run into this out of the box?

I know that winter holidays are just wrapping up, so I totally understand if you can't get to this anytime soon or anything!

Regardless, thanks a ton!

FFI functions should be implemented as inlined function in C/C++

Currently, the extern functions ustr_len and ustr_hash each involve an extern function call, which is usually much slower than an inlined function in C, since inlined functions like these can be optimized into a direct pointer access at compile time.

Since the layout of StringCacheEntry and char_ptr is guaranteed and ustr_len is frequently used, I think these should be easy to implement as inlined functions.

