Coder Social home page Coder Social logo

arc-interner's Introduction

Rust

An interner that deallocates unused values.

This crate is a fork of David Roundy's internment crate. It provides an alternative implementation of the internment::ArcIntern type. It inherits David's high-level design and API; however it is built completely on Rust's standard Arc type and the dashmap crate and does not contain any unsafe code.

Interning reduces the memory footprint of an application by storing a unique copy of each distinct value. It speeds up equality comparison and hashing operations, as only pointers rather than actual values need to be compared. On the flip side, object creation is slower, as it involves lookup in the interned object pool.

Interning is most commonly applied to strings; however it can also be useful for other object types. This library supports interning of arbitrary objects.

There exist several interning libraries for Rust, each with its own set of tradeoffs. This library makes the following design choices:

  • Interned objects are reference counted. When the last reference to an interned object is dropped, the object is deallocated. This prevents unbounded growth of the interned object pool in applications where the set of interned values changes dynamically at the cost of some CPU and memory overhead (due to storing and maintaining an atomic counter).
  • Multithreading. A single pool of interned objects is shared by all threads in the program. The pool is implemented as a DashMap for safe concurrent access and low contention.
  • Not just strings: this library allows interning any data type that satisfies the Eq + Hash + Send + Sync trait bound.
  • Safe: this library is built on Arc and DashMap types and does not contain any unsafe code.

Example

use arc_interner::ArcIntern;
let x = ArcIntern::new("hello");
let y = ArcIntern::new("world");
assert_ne!(x, y);
assert_eq!(x, ArcIntern::new("hello"));
assert_eq!(*x, "hello"); // dereference an ArcIntern like a pointer

arc-interner's People

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

arc-interner's Issues

`ArcIntern<T>`'s `Hash` implementation is invalid

Since ArcIntern<T> provides Borrow<T>, the Hash implementations have to be the same. However, they're not.

This caused some problems when coercing ArcIntern to Borrow, and HashMaps stopped working correctly, as the given hashmap (before the coercion) used ArcIntern's hash functions, while after coercion, the hashmaps were looked up via T's Hash via Borrow<T>, which caused wrong results.

lock contention on state::Container - could design it out at cost of api change?

With the move to DashMap, lock contention is reduced, but there is still a global lock acquired via the state::Container that in my use-case causes contention on the try_get. Try_get calls the internal lock method when the Container is not frozen which spins so it uses lots of cpu when contended.

Container has a freeze() method to reduce the lock contention once its fully populated, but with current model I can't see a way to call that as there could always be a new type T coming to ArcIntern::new().

Currently we're effectively using the state::Container as a kind of shared generic static between all interned types. Looking at alternatives to state::Container, they are effectively also a shared HashMap<TypeId, _> that already wrap in a mutex internally (e.g. generic_static crate), or we'd need to wrap in a mutex (e.g. anymap crate) . Macro approaches to generate per-type statics inside a library crate also appear to be unappealing

What would you think about an approach where the crate moves to a shared-nothing approach between types, so that interning type Foo and type Bar don't need to acquire the same locks at all? This could potentially eliminate the state::Container and its lock entirely. This could be done by introducing a factory type (which would be an api change).

Current non-factory pseudo-code

// setup - could just use ArcIntern directly, but have this wrapper so can try different interning approaches.
struct Wrapped<T>(ArcIntern<T>)
impl<T> Wrapped {
  pub fn new(t: T) Self {
     Self(ArcIntern::new(t))
  }
}
// Also implements Deref

// callsite
let a = Wrapped::new(Foo::new())
let b = Wrapped::new(Bar::new())

vs possible alternative factory code where per-type factories are separate statics containing their own DashMap<T,()> and therefore share no state.

// setup
static FOO_INTERNER= OnceCell<ArcInterner<Foo>> =  OnceCell::new();
static BAR_INTERNER= OnceCell<ArcInterner<Bar>> =  OnceCell::new();
struct Wrapped<T>(ArcIntern<T>)
impl Wrapped<Foo> {
  pub fn new(t: Foo) Self {
     Self(FOO_INTERNER::intern(t))
  }
}
impl Wrapped<Bar> {
  pub fn new(t: Bar) Self {
     Self(BAR_INTERNER::intern(t))
  }
}
...
// callsite
let a = Wrapped::new(Foo::new())
let b = Wrapped::new(Bar::new())

ArcInterner would likely need to pass a function as a type parameter to ArcIntern to handle the cleanup on drop, and ArcIntern::new would become internal only so that construction is always via ArcInterner::intern .

Allow `T: ?Sized`

We're thinking of replacing all Arc instances with ArcIntern, however, the T we're working with here is ?Sized.

Next to ArcIntern opting out of that trait, we need From<Box<T>> and From<&T> for ArcIntern, to properly convert some instances of handling these types.

Include the rest of `internment`'s API

Even though only ArcIntern is flawed in internment, using its other APIs carries the risk that later on someone may start using ArcIntern without realizing its problems.

This crate should also export Intern and LocalIntern so that it can be used as a complete functional replacement.

Avoiding a copy when instantiating an ArcIntern

As the API currently stands, it looks like the only way to initialize an ArcIntern is via new which takes a val: T. Would it be possible to instead make new take &T, and require T: Clone? That way, no additional copy would be necessary when the value being instantiated is already interned.

I have a gist for how this might work under the hood (looking up in the map for the appropriate value and only copying the Vec if necessary): https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b380b64e7ef57499d612a0114e94170a

What do you think?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.