Comments (7)

lnicola commented on July 19, 2024

Something like GroupBy or ToLookup here https://codeblog.jonskeet.uk/category/edulinq/, I think. The latter is eager, while the former is deferred until the outer iterator is evaluated, I believe. That's in contrast with the other iterator adapters which are mostly lazy.

from itertools.

bluss commented on July 19, 2024

Sure, it's a bit like the current unique, except you collect all values that map to the same key.

matematikaadit commented on July 19, 2024

I'm interested in submitting a PR for this, but I don't quite understand the explanation. Care to give an example?

matematikaadit commented on July 19, 2024

Sorry, I'm not familiar with C#. A Rust snippet showing how the method is used and what result is intended would be appreciated.

lnicola commented on July 19, 2024

@matematikaadit And I don't really know Rust, so the following might make little sense. I offered that link as a guideline for the API, not for the implementation. Anyway, consider this code:

// numbers and their lengths in English:
let items = vec![(4, "zero"), (3, "one"), (3, "two"), (4, "four"), (4, "five"),
                         (3, "six"), (5, "seven"), (5, "eight"), (4, "nine"), (3, "ten")];
// group by length, associate keys with references to the original values
let grouped = items.iter().group_by(|&item| item.0).collect::<Vec<_>>();
// grouped: Vec<(usize, Vec<&({Integer}, &str)>)>

assert_eq!(grouped.len(), 3); // three distinct lengths

// the keys, in the original order, but distinct
assert_eq!(grouped[0].0, 4);
assert_eq!(grouped[1].0, 3);
assert_eq!(grouped[2].0, 5);

// group 0 (length 4) holds zero, four, five, nine;
// group 1 (length 3) holds one, two, six, ten
assert_eq!(grouped[0].1[0].1, "zero");
assert_eq!(grouped[0].1[1].1, "four");
assert_eq!(grouped[1].1[3].1, "ten");

// that looks rather ugly, now that I've typed it

// another variant, also available in the .NET API:
// this one has key and value selectors
let grouped = items.iter().group_by(|&item| item.0, |&item| &item.1).collect::<Vec<_>>();
// grouped: Vec<(usize, Vec<&str>)>

// or, maybe more idiomatic and easier to implement without overloading, return keys and values
let grouped = items.iter().group_by(|&item| (item.0, item.1)).collect::<Vec<_>>();
// which is more or less the same as
let grouped = items.iter().group_by(|item| item).collect::<Vec<_>>();

// a "real" example, take the length of each number, group by length,
// sort by how many of them there are (largest group first; the sort is
// stable, so groups of equal size keep their original order)
items.iter()
         .group_by(|item| item)
         .map(|group| (group.0, group.1.len()))
         .sorted_by(|group1, group2| group2.1.cmp(&group1.1))
         .collect::<Vec<_>>();
// should yield [(4, 4), (3, 4), (5, 2)]

This involves some hand-waving about lifetimes and references -- I assumed above that group_by can return references to the items, but that probably doesn't work, as they don't necessarily live long enough.

As for the specifics, the .NET implementation returns keys in the original order and is somewhat lazy, in that the result is constructed only when the iterator is first dereferenced. The values in each group are also in the original order, if I recall correctly. It probably builds a hash table, with the caveat that it also needs to remember the key order. It's different from other LINQ/Iterator methods in that it needs to allocate.
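As a rough illustration of the "hash table that also remembers key order" idea, here is a minimal standard-library sketch. The name `group_by_ordered` and its signature are hypothetical, chosen only for this example; this is not an itertools API and not a description of how .NET implements it.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Groups items by key, keeping keys in first-seen order.
/// Hypothetical helper for illustration; not part of itertools.
fn group_by_ordered<I, K, F>(iter: I, mut key_fn: F) -> Vec<(K, Vec<I::Item>)>
where
    I: Iterator,
    K: Hash + Eq + Clone,
    F: FnMut(&I::Item) -> K,
{
    // Maps each key to the position of its group in `groups`.
    let mut index: HashMap<K, usize> = HashMap::new();
    let mut groups: Vec<(K, Vec<I::Item>)> = Vec::new();
    for item in iter {
        let key = key_fn(&item);
        let existing = index.get(&key).copied();
        match existing {
            // Key seen before: append to its group.
            Some(i) => groups[i].1.push(item),
            // New key: remember its position and start a new group.
            None => {
                index.insert(key.clone(), groups.len());
                groups.push((key, vec![item]));
            }
        }
    }
    groups
}

fn main() {
    let items = vec![(4, "zero"), (3, "one"), (3, "two"), (4, "four"), (4, "five"),
                     (3, "six"), (5, "seven"), (5, "eight"), (4, "nine"), (3, "ten")];
    let grouped = group_by_ordered(items.iter(), |item| item.0);
    let keys: Vec<i32> = grouped.iter().map(|g| g.0).collect();
    assert_eq!(keys, vec![4, 3, 5]); // distinct keys, in original order
    assert_eq!(grouped[0].1[1].1, "four");
}
```

Unlike the lazy .NET version, this sketch is eager: it consumes the whole iterator and allocates before returning anything.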

Hope I made sense. You can also look at group_by in itertools, which does a similar thing but assumes that the equal keys are in consecutive positions. This allows it to work without the hash table, but is less general.
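To show why consecutive-only grouping is less general, here is a small standard-library sketch of that strategy. `group_consecutive` is a hypothetical helper written for this example; it imitates the idea rather than the actual itertools implementation.

```rust
/// Groups only *adjacent* items whose keys compare equal.
/// Hypothetical helper for illustration; needs no hash table.
fn group_consecutive<I, K, F>(iter: I, mut key_fn: F) -> Vec<(K, Vec<I::Item>)>
where
    I: Iterator,
    K: PartialEq,
    F: FnMut(&I::Item) -> K,
{
    let mut groups: Vec<(K, Vec<I::Item>)> = Vec::new();
    for item in iter {
        let key = key_fn(&item);
        // Does this item continue the most recent run of equal keys?
        let same = groups.last().map_or(false, |(last_key, _)| *last_key == key);
        if same {
            groups.last_mut().unwrap().1.push(item);
        } else {
            groups.push((key, vec![item]));
        }
    }
    groups
}

fn main() {
    let items = vec![(4, "zero"), (3, "one"), (3, "two"), (4, "four"), (4, "five"),
                     (3, "six"), (5, "seven"), (5, "eight"), (4, "nine"), (3, "ten")];
    let grouped = group_consecutive(items.iter(), |item| item.0);
    let keys: Vec<i32> = grouped.iter().map(|g| g.0).collect();
    // Keys repeat because equal lengths are not adjacent in the input.
    assert_eq!(keys, vec![4, 3, 4, 3, 5, 4, 3]);
}
```

On this unsorted input the result has seven runs rather than three groups, so the caller would have to sort by key first to get one group per distinct key.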

tobz1000 commented on July 19, 2024

I'm interested in this iterator too. The exact output still needs to be decided however. The way I see it, the returned struct could be one of:

  • A HashMap<K, Vec<V>>,
  • A Vec<(K, Vec<V>)> (preserving first-key-encounter order),
  • A wrapper over one of the above, with an Iterator implementation where Item=(K, V), similar to GroupBy.

Whichever underlying structure it is, it would mean a wasted allocation if the caller wants to then convert it to the other structure.

@bluss, do you have any preference?

bluss commented on July 19, 2024

HashMap, then it will have reasonable performance for all scales of input. I think we should just go with HashMap here and let that be the practical solution. Any more general solution will not materialize for a while now.
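For concreteness, the HashMap option could look roughly like the following standard-library sketch. The function name `group_into_map` and the choice to take an iterator of `(key, value)` pairs are assumptions made for this example, not a decided design.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Collects (key, value) pairs into a map from key to all of its values.
/// Hypothetical helper for illustration only.
fn group_into_map<I, K, V>(iter: I) -> HashMap<K, Vec<V>>
where
    I: Iterator<Item = (K, V)>,
    K: Hash + Eq,
{
    let mut map: HashMap<K, Vec<V>> = HashMap::new();
    for (key, value) in iter {
        // Create the group on first sight of the key, then append.
        map.entry(key).or_default().push(value);
    }
    map
}

fn main() {
    let items = vec![(4, "zero"), (3, "one"), (3, "two"), (4, "four"), (4, "five"),
                     (3, "six"), (5, "seven"), (5, "eight"), (4, "nine"), (3, "ten")];
    let map = group_into_map(items.into_iter());
    assert_eq!(map.len(), 3);
    assert_eq!(map[&5], vec!["seven", "eight"]);
}
```

Values within each group keep their original order, but iterating over the map yields keys in an unspecified order, which is the trade-off against the `Vec<(K, Vec<V>)>` option.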
