Coder Social home page Coder Social logo

Comments (8)

jaredhoberock avatar jaredhoberock commented on May 16, 2024

The above trick won't work because both the haystack and the needles need to be sorted wrt to the same compare (like, say, merge).

from cccl.

jaredhoberock avatar jaredhoberock commented on May 16, 2024

It seems difficult to expose this functionality without inventing a new name for it.

boost::flat_map's constructor uses a tag to indicate that a range is sorted for a similar problem:

template<typename InputIterator> 
  flat_map(ordered_unique_range_t, InputIterator, InputIterator, 
           const Pred & = Pred(), 
           const allocator_type & = allocator_type());

D's standard library has no binary search functions analogous to C++ (or functions for doing many searches in bulk like Thrust). Instead, it has a single find function and relies on ranges being wrapped in special SortedRange wrappers:

auto a = [ 1, 2, 3, 42, 52, 64 ];
auto r = assumeSorted(a);
assert(r.canFind(3));
assert(!r.canFind(32));

It doesn't seem to have a bulk "find these needles in this haystack" algorithm.

from cccl.

jaredhoberock avatar jaredhoberock commented on May 16, 2024

The structure of this problem is very similar to the set operations: both consume two sorted lists and produce a single result. In particular, this operation is close to set_intersection in some ways. However, it doesn't seem possible to express this operation with a lowering to an existing set operation.

Moreover, it seems likely that this algorithm often would be applied in contexts where the data would also be consumed by the set algorithms.

How about set_lower_bound:

template<typename InputIterator1, typename InputIterator2, typename OutputIterator, typename Compare>
  OutputIterator
    set_lower_bound(InputIterator1 first1, InputIterator last1,
                    InputIterator2 first2, InputIterator last2,
                    OutputIterator result,
                    Compare comp);

In this formulation, the [first1,last1) range would be the "needles" range: for each element e in the range [first1,last1), this algorithm writes to the result the lowest position i in the range [first2,last2) such that e could be inserted without disturbing the ordering.

On the other hand, if the result was used to actually do the hypothetical insertion, it would not necessarily be equivalent to the output of any of the standard set algorithms. For example, the insertion would not produce the result of set_union (due to special effort required by the multiset semantics), or even merge (because multiple needles would collide at the same location) without some additional effort.

from cccl.

jaredhoberock avatar jaredhoberock commented on May 16, 2024

There's also sorted_lower_bound:

template<typename ForwardIterator, typename InputIterator, typename OutputIterator, typename Compare1>
  OutputIterator
    sorted_lower_bound(ForwardIterator haystack_first,
                       ForwardIterator haystack_last,
                       InputIterator needles_first,
                       InputIterator needles_last,
                       OutputIterator result,
                       Compare comp);

I don't feel like this name really captures what's going on here.

from cccl.

jaredhoberock avatar jaredhoberock commented on May 16, 2024

Here are some ideas from Michael:

My immediate reaction is that a new name is warranted.  Some ideas:

all_lower_bounds -- because we're finding several

simultaneous_lower_bounds -- find them all at once; but this is verbose

each_lower_bound -- yet another variant of that idea

partition_with -- we're using the "needles" to partition the "hay stack"

merge_rank -- we're computing the position of all elements of one sorted sequence in the other.  Like sort, merge can be decomposed into compute ranks + permute.  This is the rank step of that process.

from cccl.

jaredhoberock avatar jaredhoberock commented on May 16, 2024

lower_merge_rank and upper_merge_rank might be a reasonable compromise, though the merge aspect of the name seems like it would be a red herring.

from cccl.

andrewcorrigan avatar andrewcorrigan commented on May 16, 2024

My code uses lower_bound and upper_bound with needles sorted all the time, so I'm definitely excited to hear that there might be further performance to be gained. For what it's worth, I think that lower_merge_rank/upper_merge_rank make sense given the explanation you quoted above.

[off-topic] I really like the search functions using arguments named using 'needles' and 'haystack' Are you considering using those names in the Thrust source code?

from cccl.

jaredhoberock avatar jaredhoberock commented on May 16, 2024

I came across the "needles" and "haystack" nomenclature in The D Programming Language. Dunno where it originated. We'd probably use it if we add this algorithm.

from cccl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.