Immutable Strings

This crate offers a cheaply cloneable and sliceable UTF-8 string type. It is inspired by the bytes crate, which offers zero-copy byte slices, and the im crate, which offers immutable copy-on-write data structures. Its API is compatible with the standard library's String.

Internally, the crate uses a standard library string stored in a smart pointer, and a range into that String. This allows for cheap zero-copy cloning and slicing of the string. This is especially useful for parsing operations, where a large string needs to be sliced into a lot of substrings.
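The layout described above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual implementation: the `SharedStr` type, its fields, and its methods are hypothetical names chosen for this example, and `Arc<String>` is assumed as the smart pointer.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for the real type: a shared buffer plus a byte range.
#[derive(Clone, Debug)]
struct SharedStr {
    data: Arc<String>,
    range: Range<usize>,
}

impl SharedStr {
    /// Allocates once: the text is copied into a reference-counted buffer.
    fn new(text: &str) -> Self {
        SharedStr {
            data: Arc::new(text.to_string()),
            range: 0..text.len(),
        }
    }

    /// Borrows the part of the buffer that this handle refers to.
    fn as_str(&self) -> &str {
        &self.data.as_str()[self.range.clone()]
    }

    /// Zero-copy slice; `range` is relative to the current view.
    /// Returns `None` if the range is out of bounds or splits a code point.
    fn slice(&self, range: Range<usize>) -> Option<Self> {
        // `str::get` performs the bounds and char-boundary checks.
        self.as_str().get(range.clone())?;
        let start = self.range.start + range.start;
        let end = self.range.start + range.end;
        Some(SharedStr {
            data: Arc::clone(&self.data),
            range: start..end,
        })
    }
}

/// Splits one input into two substrings without copying any text.
fn demo() {
    let input = SharedStr::new("key=value");
    let key = input.slice(0..3).unwrap();
    let value = input.slice(4..9).unwrap();
    assert_eq!(key.as_str(), "key");
    assert_eq!(value.as_str(), "value");
    // All three handles share the same allocation.
    assert!(Arc::ptr_eq(&input.data, &value.data));
    assert_eq!(Arc::strong_count(&input.data), 3);
}
```

Cloning or slicing only bumps a reference count and copies two integers, which is why a parser can hand out many substrings of one large input cheaply.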

TL;DR: This crate offers an ImString type that acts as a String (in that it can be modified and used in the same way), an Arc<String> (in that it is cheap to clone) and an &str (in that it is cheap to slice) all in one, owned type.

[Figure: Diagram of ImString internals]

This crate offers a safe API that ensures that every string and every string slice is valid UTF-8. It does not allow slicing inside a UTF-8 multibyte sequence. Every operation that can fail has a try_* variant, so panics can be avoided. Extensive unit tests with full test coverage help guard against unsoundness.
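To illustrate the kind of check a non-panicking slice operation has to make, here is a small sketch on a plain `&str` using only the standard library. The `try_slice` function and the `SliceError` enum are hypothetical names for this example and are not taken from the crate's API.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum SliceError {
    OutOfBounds,
    NotCharBoundary,
}

/// Returns the requested byte range, or an error instead of panicking.
fn try_slice(text: &str, start: usize, end: usize) -> Result<&str, SliceError> {
    if start > end || end > text.len() {
        return Err(SliceError::OutOfBounds);
    }
    // Both ends must fall between code points, never inside one.
    if !text.is_char_boundary(start) || !text.is_char_boundary(end) {
        return Err(SliceError::NotCharBoundary);
    }
    Ok(&text[start..end])
}

fn demo() {
    // "h\u{e9}llo" is "héllo": the 'é' occupies bytes 1 and 2.
    let text = "h\u{e9}llo";
    assert_eq!(try_slice(text, 0, 1), Ok("h"));
    assert_eq!(try_slice(text, 0, 2), Err(SliceError::NotCharBoundary));
    assert_eq!(try_slice(text, 0, 3), Ok("h\u{e9}"));
    assert_eq!(try_slice(text, 0, 7), Err(SliceError::OutOfBounds));
}
```

Indexing with `&text[0..2]` would panic on the same input, which is the behaviour a try_* variant is meant to avoid.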

Features

Efficient Cloning: The crate's architecture makes cloning a low-cost (zero-copy) operation, which suits strings that are widely shared.

Efficient Slicing: The crate's architecture enables low-cost (zero-copy) slice creation, making it ideal for parsing operations where one large input string is sliced into many smaller strings.

Copy on Write: Although cloning and slicing are cheap, strings remain mutable through copy-on-write. For strings that are not shared, an optimisation mutates the data in place, safely, to avoid unnecessary copying.

Compatibility: The API is designed to closely resemble Rust's standard library String, facilitating smooth integration and being almost a drop-in replacement. It also integrates with many popular Rust crates, such as serde, peg and nom.

Generic over Storage: The crate is flexible in terms of how the data is stored. It allows for using Arc<String> for multithreaded applications and Rc<String> for single-threaded use, providing adaptability to different storage requirements and avoiding the need to pay for atomic operations when they are not needed.

Safety: The crate enforces that all strings and string slices are UTF-8 encoded. Any methods that might violate this are marked as unsafe. All methods that can fail have a try_* variant that will not panic. Use of safe functions cannot result in unsound behaviour.
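The copy-on-write behaviour listed above can be sketched with the standard library's `Arc::make_mut`. This is an illustration under the assumption that the storage is an `Arc<String>`; the `push_shared` helper is a hypothetical name, and the crate's own mutation code may differ.

```rust
use std::sync::Arc;

/// Hypothetical helper: appends to a shared string, copying only when needed.
fn push_shared(data: &mut Arc<String>, suffix: &str) {
    // `Arc::make_mut` returns `&mut String` directly when this is the only
    // handle, and clones the String into a fresh Arc when others exist.
    Arc::make_mut(data).push_str(suffix);
}

fn demo() {
    let mut a = Arc::new(String::from("Hello"));
    let before = Arc::as_ptr(&a);

    // Unique handle: the edit happens in place, the Arc is not replaced.
    push_shared(&mut a, ", World");
    assert_eq!(Arc::as_ptr(&a), before);

    // Shared handle: the edit first copies, so `b` is left untouched.
    let b = Arc::clone(&a);
    push_shared(&mut a, "!");
    assert!(!Arc::ptr_eq(&a, &b));
    assert_eq!(a.as_str(), "Hello, World!");
    assert_eq!(b.as_str(), "Hello, World");
}
```

Replacing `Arc` with `Rc` (and `Rc::make_mut`) gives the same behaviour without atomic reference counting, which is the trade-off the storage parameter exposes.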

Example

use imstr::ImString;

// Create new ImString, allocates data.
let mut string = ImString::from("Hello, World");

// Edit: happens in-place (because this is the only reference).
string.push_str("!");

// Clone: this is zero-copy.
let clone = string.clone();

// Slice: this is zero-copy.
let hello = string.slice(0..5);
assert_eq!(hello, "Hello");

// Slice: this is zero-copy.
let world = string.slice(7..12);
assert_eq!(world, "World");

// Here we have to copy only the part that the slice refers to so it can be modified.
let hello = hello + "!";
assert_eq!(hello, "Hello!");

Optional Features

The following optional features can be turned on using feature flags.

| Feature | Description |
| --- | --- |
| serde | Serialize and deserialize ImString fields as strings with the serde crate. |
| peg | Use ImString as the data structure that is parsed with the peg crate. See peg-list.rs for an example. |
| nom | Allow ImString to be used to build parsers with nom. See nom-json.rs for an example. |

Similar

This is a comparison of this crate to other, similar crates. The comparison is made on these features:

  • Cheap Clone: is it a zero-copy operation to clone a string?
  • Cheap Slice 🍕: is it possible to cheaply slice a string?
  • Mutable: is it possible to modify strings?
  • Generic Storage: is it possible to swap out the storage mechanism?
  • String Compatible: is it compatible with String?

Here is the data, with links to the crates for further examination:

| Crate | Cheap Clone | Cheap Slice | Mutable | Generic Storage | String Compatible | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| imstr | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | This crate. |
| tendril | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | Complex implementation. API not quite compatible with String, but otherwise closest to what this crate does. |
| immut_string | ✔️ | ❌ | 🟡 (no optimization) | ❌ | ❌ | Simply a wrapper around Arc<String>. |
| immutable_string | ✔️ | ❌ | ❌ | ❌ | ❌ | Wrapper around Arc<str>. |
| arccstr | ✔️ | ❌ | ❌ | ❌ | ❌ | Not UTF-8 (null-terminated C string). Hand-written Arc implementation. |
| implicit-clone | ✔️ | ❌ | ❌ | 🟡 | ✔️ | Immutable string library. Has sync and unsync variants. |
| semistr | ❌ | ❌ | ❌ | ❌ | ❌ | Stores short strings inline. |
| quetta | ✔️ | ✔️ | ❌ | ❌ | ❌ | Wrapper around Arc<String> that can be sliced. |
| bytesstr | ✔️ | 🟡 | ❌ | ❌ | ❌ | Wrapper around Bytes. Cannot be directly sliced. |
| fast-str | ✔️ | ❌ | ❌ | ❌ | ❌ | Looks like there could be some unsafety. |
| flexstr | ✔️ | ❌ | ❌ | ✔️ | ❌ | |
| bytestring | ✔️ | 🟡 | ❌ | ❌ | ❌ | Wrapper around Bytes. Used by actix. Can be indirectly sliced using slice_ref(). |
| arcstr | ✔️ | ✔️ | ❌ | ❌ | ❌ | Can store string literal as &'static str. |
| cowstr | ✔️ | ❌ | ✔️ | ❌ | ❌ | Reimplements Arc, custom allocation strategy. |
| strck | ❌ | ❌ | ❌ | ✔️ | ❌ | Typechecked string library. |

License

MIT, see LICENSE.md.

Contributors

xfbs, stevefan1999-personal, ledjolleshaj, shnatsel
