rust-lazysort's Introduction

Lazysort

Adds a method to iterators that returns a sorted iterator over the data. The sorting is achieved lazily using a quicksort algorithm.

Available via crates.io.

Usage

extern crate lazysort;

use lazysort::Sorted;
use lazysort::SortedBy;
use lazysort::SortedPartial;

The Sorted trait adds a method sorted to all Iterator<T: Ord>, which returns an iterator over the same data in default order.

The SortedBy trait adds a method sorted_by to all Iterator<T>, which returns an iterator over the same data ordered according to the provided closure/function of type Fn(&T, &T) -> Ordering.

The SortedPartial trait adds two methods, sorted_partial_first and sorted_partial_last, to all Iterator<T: PartialOrd>. Both return an iterator over the same data in the default order; the difference between the two is whether non-comparable values go first or last in the results.

For example:

let data: Vec<u32> = vec![9, 1, 3, 4, 4, 2, 4];
for x in data.iter().sorted() {
    println!("{}", x);
}

Will print: 1, 2, 3, 4, 4, 4, 9

A more complex example sorts strings by length, then in default string order:

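use std::cmp::Ordering;

let before: Vec<&str> = vec!["a", "cat", "sat", "on", "the", "mat"];
let sorted = before.iter().sorted_by(|a, b| {
    // Compare by length first; break ties with the default string order.
    match a.len().cmp(&b.len()) {
        Ordering::Equal => a.cmp(b),
        x => x,
    }
});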

This returns an iterator which yields: a, on, cat, mat, sat, the.
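The examples above cover sorted and sorted_by. The first/last placement of non-comparable values described for SortedPartial can be illustrated with the standard library alone. The sketch below is not the crate's implementation: cmp_partial and sort_partial are hypothetical helpers, they handle only f64 (where NaN is the non-comparable value), and they sort eagerly rather than lazily.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: a total order for f64 that places NaN either
/// before every other value (`nan_first == true`) or after it.
fn cmp_partial(a: &f64, b: &f64, nan_first: bool) -> Ordering {
    match a.partial_cmp(b) {
        Some(order) => order,
        // partial_cmp returns None only when at least one side is NaN.
        None => match (a.is_nan(), b.is_nan()) {
            (true, true) => Ordering::Equal,
            (true, false) => {
                if nan_first { Ordering::Less } else { Ordering::Greater }
            }
            (false, true) => {
                if nan_first { Ordering::Greater } else { Ordering::Less }
            }
            (false, false) => unreachable!("f64 values without NaN always compare"),
        },
    }
}

/// Hypothetical helper: eagerly sort the data with the order above.
fn sort_partial(mut data: Vec<f64>, nan_first: bool) -> Vec<f64> {
    data.sort_by(|a, b| cmp_partial(a, b, nan_first));
    data
}
```

Calling sort_partial(vec![2.0, f64::NAN, 1.0], true) puts the NaN at the front, followed by 1.0 and 2.0; passing false moves the NaN to the end.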

Implementation details and performance

The algorithm is essentially the same as the one described in my blog post, which used a lazy sort as an example of Clojure's lazy sequences, but adapted to fit in with Rust's iterators.

The full sequence from the parent iterator is read, then each call to next returns the next value in the sorted sequence. The sort is done element-by-element so the full order is only realised by iterating all the way through to the end.

The algorithm is a quicksort run depth-first: each call to next does only the work necessary to find the next item, then saves its state until the following call to next.
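The idea described above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual code: LazyQuicksort and Chunk are hypothetical names, the pivot choice (the last element of each partition) is an assumption made for brevity, and equal values are grouped with the pivot.

```rust
use std::cmp::Ordering;

/// A pending piece of work on the stack (hypothetical type).
enum Chunk<T> {
    /// Values whose relative order is still unknown.
    Unsorted(Vec<T>),
    /// Values that are all equal to one another and ready to be yielded.
    Ready(Vec<T>),
}

/// Hypothetical lazy quicksort iterator. The top of the stack always
/// holds the smallest values that have not been yielded yet.
struct LazyQuicksort<T> {
    stack: Vec<Chunk<T>>,
}

impl<T: Ord> LazyQuicksort<T> {
    /// Reads the full source up front, as the text describes.
    fn new<I: IntoIterator<Item = T>>(source: I) -> Self {
        LazyQuicksort {
            stack: vec![Chunk::Unsorted(source.into_iter().collect())],
        }
    }
}

impl<T: Ord> Iterator for LazyQuicksort<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        loop {
            // An empty stack means every value has been yielded.
            match self.stack.pop()? {
                Chunk::Ready(mut items) => {
                    if let Some(item) = items.pop() {
                        if !items.is_empty() {
                            self.stack.push(Chunk::Ready(items));
                        }
                        return Some(item);
                    }
                }
                Chunk::Unsorted(mut items) => {
                    // Assumption: use the last element as the pivot.
                    let pivot = match items.pop() {
                        Some(p) => p,
                        None => continue,
                    };
                    let (mut less, mut equal, mut greater) =
                        (Vec::new(), Vec::new(), Vec::new());
                    for item in items {
                        match item.cmp(&pivot) {
                            Ordering::Less => less.push(item),
                            Ordering::Equal => equal.push(item),
                            Ordering::Greater => greater.push(item),
                        }
                    }
                    equal.push(pivot);
                    // Push in reverse order so the smallest values are on top.
                    // The `greater` partition stays unsorted until it is needed.
                    self.stack.push(Chunk::Unsorted(greater));
                    self.stack.push(Chunk::Ready(equal));
                    self.stack.push(Chunk::Unsorted(less));
                }
            }
        }
    }
}
```

With this sketch, LazyQuicksort::new(vec![9, 1, 3, 4, 4, 2, 4]).take(3) only partitions the chunks that contain the three smallest values; everything above them is left in Unsorted chunks on the stack.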

To test performance we compare it against sorting the full vector, using the sort function from the standard library, and also against std::collections::BinaryHeap.

First we compare what happens when sorting the entire vector:

test benches::c_heap_bench     ... bench:   3,703,166 ns/iter (+/- 454,189)
test benches::c_lazy_bench     ... bench:   3,961,047 ns/iter (+/- 603,083)
test benches::c_standard_bench ... bench:   3,093,873 ns/iter (+/- 430,401)

The built-in sort is fastest, as expected: the heap is roughly 20% slower and the lazy sort roughly 28% slower, though the reported variation is large.

These benchmarks sort 50,000 random unsigned integers in the range 0 <= x < 1000000. Run cargo bench to reproduce them.

So what's the point of lazy sorting? As per the linked blog post, it is useful when you do not need every value; for example, you may only need the first 1,000 ordered values from a larger set.

Comparing the lazy approach data.iter().sorted().take(x) vs a standard approach of sorting a vector then taking the first x values gives the following.

The first 1,000 out of 50,000:

test benches::a_heap_bench     ... bench:     366,767 ns/iter (+/- 55,393)
test benches::a_lazy_bench     ... bench:     171,923 ns/iter (+/- 52,784)
test benches::a_standard_bench ... bench:   3,055,734 ns/iter (+/- 348,407)

The lazy approach is roughly 18 times faster than the full sort here, because the 50,000 values are only sorted far enough to identify the first 1,000; the rest remain unsorted. BinaryHeap is also fast (about 8 times faster than the full sort) for the same reason.

The first 10,000 out of 50,000:

test benches::b_heap_bench     ... bench:   1,126,774 ns/iter (+/- 156,833)
test benches::b_lazy_bench     ... bench:     993,954 ns/iter (+/- 208,188)
test benches::b_standard_bench ... bench:   3,054,598 ns/iter (+/- 285,970)

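The lazy approach is still about three times faster than the full sort in this situation, and roughly level with BinaryHeap given the reported variation.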
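For reference, the two non-lazy baselines in these comparisons can be sketched with the standard library as follows. This is an assumption about their general shape, not the repository's benchmark code; first_k_by_full_sort and first_k_by_heap are hypothetical names.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Baseline 1: sort a full copy of the data, then keep the first `k` values.
fn first_k_by_full_sort(data: &[u32], k: usize) -> Vec<u32> {
    let mut sorted = data.to_vec();
    sorted.sort();
    sorted.truncate(k);
    sorted
}

/// Baseline 2: build a heap from the data, then pop `k` times.
fn first_k_by_heap(data: &[u32], k: usize) -> Vec<u32> {
    // BinaryHeap is a max-heap; wrapping values in Reverse makes the
    // smallest value pop first.
    let mut heap: BinaryHeap<Reverse<u32>> =
        data.iter().copied().map(Reverse).collect();
    let mut out = Vec::with_capacity(k.min(data.len()));
    while out.len() < k {
        match heap.pop() {
            Some(Reverse(x)) => out.push(x),
            None => break, // fewer than `k` values were available
        }
    }
    out
}
```

The full sort pays for ordering all 50,000 values regardless of k, while the heap pays once to build and then a logarithmic cost per popped value, which is consistent with the heap timings growing with k in the results above.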

License

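Licensed under either of

 * Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.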

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

rust-lazysort's People

Contributors

benashford, dtolnay, erickt, rookwood101, spk

rust-lazysort's Issues

Add an in-place sortlazy

Maybe add a trait that allows the user to use sortlazy on a [T] directly; this would save the extra space allocated by the Iterator. I did an example here

Very bad performance with duplicate value

I ran some benchmarks and found that the sort has terrible performance with duplicate values:

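// Assumes a nightly toolchain with #![feature(test)] and the rand crate
// (0.7-style two-argument gen_range).
use rand::{thread_rng, Rng};
use test::Bencher;

#[bench]
fn lazysort(b: &mut Bencher) {
    b.iter(|| {
        use lazysort::Sorted;
        let mut rng = thread_rng();
        // 10,000 values drawn from {0, 1}, so almost every value is a duplicate.
        let v: Vec<i32> = std::iter::repeat_with(|| rng.gen_range(0, 2))
            .take(10000)
            .collect();
        // Only the smallest value is requested.
        let _: Vec<_> = v.iter().sorted().take(1).collect();
    })
}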

You should implement the quicksort using three-way partitioning.
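As a rough sketch of what three-way partitioning means here (not code from the crate; partition3 is a hypothetical name and the first element is assumed as the pivot), a single in-place pass can split a slice into "less than", "equal to", and "greater than" regions, so runs of duplicates are settled in one step instead of being re-partitioned repeatedly:

```rust
use std::cmp::Ordering;

/// Hypothetical helper: partition `v` in place around the value of `v[0]`.
/// Returns `(lt, gt)` such that every value in `v[..lt]` is less than the
/// pivot, every value in `v[lt..gt]` equals it, and every value in
/// `v[gt..]` is greater.
fn partition3<T: Ord>(v: &mut [T]) -> (usize, usize) {
    let mut lt = 0; // start of the "equal" region; v[lt] is always the pivot
    let mut i = 1; // next value to classify
    let mut gt = v.len(); // start of the "greater" region
    while i < gt {
        match v[i].cmp(&v[lt]) {
            Ordering::Less => {
                // Move the smaller value in front of the equal region.
                v.swap(lt, i);
                lt += 1;
                i += 1;
            }
            Ordering::Greater => {
                // Move the larger value to the back; the value swapped
                // into position `i` is classified on the next iteration.
                gt -= 1;
                v.swap(i, gt);
            }
            Ordering::Equal => i += 1,
        }
    }
    (lt, gt)
}
```

For data like the benchmark above, which contains only the values 0 and 1, one such pass already places every value in its final region.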

Relicense under dual MIT/Apache-2.0

This issue was automatically generated. Feel free to close without ceremony if
you do not agree with re-licensing or if it is not possible for other reasons.
Respond to @cmr with any questions or concerns, or pop over to
#rust-offtopic on IRC to discuss.

You're receiving this because someone (perhaps the project maintainer)
published a crates.io package with the license as "MIT" xor "Apache-2.0" and
the repository field pointing here.

TL;DR the Rust ecosystem is largely Apache-2.0. Being available under that
license is good for interoperation. The MIT license as an add-on can be nice
for GPLv2 projects to use your code.

Why?

The MIT license requires reproducing countless copies of the same copyright
header with different names in the copyright field, for every MIT library in
use. The Apache license does not have this drawback. However, this is not the
primary motivation for me creating these issues. The Apache license also has
protections from patent trolls and an explicit contribution licensing clause.
However, the Apache license is incompatible with GPLv2. This is why Rust is
dual-licensed as MIT/Apache (the "primary" license being Apache, MIT only for
GPLv2 compat), and doing so would be wise for this project. This also makes
this crate suitable for inclusion and unrestricted sharing in the Rust
standard distribution and other projects using dual MIT/Apache, such as my
personal ulterior motive, the Robigalia project.

Some ask, "Does this really apply to binary redistributions? Does MIT really
require reproducing the whole thing?" I'm not a lawyer, and I can't give legal
advice, but some Google Android apps include open source attributions using
this interpretation. Others also agree with it.
But, again, the copyright notice redistribution is not the primary motivation
for the dual-licensing. It's stronger protections to licensees and better
interoperation with the wider Rust ecosystem.

How?

To do this, get explicit approval from each contributor of copyrightable work
(as not all contributions qualify for copyright, due to not being a "creative
work", e.g. a typo fix) and then add the following to your README:

## License

Licensed under either of

 * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.

and in your license headers, if you have them, use the following boilerplate
(based on that used in Rust):

// Copyright 2016 rust-lazysort developers
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

It's commonly asked whether license headers are required. I'm not comfortable
making an official recommendation either way, but the Apache license
recommends it in their appendix on how to use the license.

Be sure to add the relevant LICENSE-{MIT,APACHE} files. You can copy these
from the Rust repo for a plain-text
version.

And don't forget to update the license metadata in your Cargo.toml to:

license = "MIT/Apache-2.0"

I'll be going through projects which agree to be relicensed and have approval
by the necessary contributors and making these changes, so feel free to leave
the heavy lifting to me!

Contributor checkoff

To agree to relicensing, comment with:

I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to chose either at their option.

Or, if you're a contributor, you can check the box in this repo next to your
name. My scripts will pick this exact phrase up and check your checkbox, but
I'll come through and manually review this issue later as well.
