Comments (8)
The above trick won't work because both the haystack and the needles need to be sorted wrt to the same compare (like, say, merge
).
from cccl.
It seems difficult to expose this functionality without inventing a new name for it.
boost::flat_map
's constructor uses a tag to indicate that a range is sorted for a similar problem:
template<typename InputIterator>
flat_map(ordered_unique_range_t, InputIterator, InputIterator,
const Pred & = Pred(),
const allocator_type & = allocator_type());
D's standard library has no binary search functions analogous to C++ (or functions for doing many searches in bulk like Thrust). Instead, it has a single find
function and relies on ranges being wrapped in special SortedRange
wrappers:
auto a = [ 1, 2, 3, 42, 52, 64 ];
auto r = assumeSorted(a);
assert(r.canFind(3));
assert(!r.canFind(32));
It doesn't seem to have a bulk "find these needles in this haystack" algorithm.
from cccl.
The structure of this problem is very similar to the set operations: both consume two sorted lists and produce a single result. In particular, this operation is close to set_intersection
in some ways. However, it doesn't seem possible to express this operation with a lowering to an existing set operation.
Moreover, it seems likely that this algorithm often would be applied in contexts where the data would also be consumed by the set algorithms.
How about set_lower_bound
:
template<typename InputIterator1, typename InputIterator2, typename OutputIterator, typename Compare>
OutputIterator
set_lower_bound(InputIterator1 first1, InputIterator last1,
InputIterator2 first2, InputIterator last2,
OutputIterator result,
Compare comp);
In this formulation, the [first1,last1)
range would be the "needles" range: for each element e
in the range [first1,last1)
, this algorithm writes to the result the lowest position i
in the range [first2,last2)
such that e
could be inserted without disturbing the ordering.
On the other hand, if the result was used to actually do the hypothetical insertion, it would not necessarily be equivalent to the output of any of the standard set algorithms. For example, the insertion would not produce the result of set_union
(due to special effort required by the multiset semantics), or even merge
(because multiple needles would collide at the same location) without some additional effort.
from cccl.
There's also sorted_lower_bound
:
template<typename ForwardIterator, typename InputIterator, typename OutputIterator, typename Compare1>
OutputIterator
sorted_lower_bound(ForwardIterator haystack_first,
ForwardIterator haystack_last,
InputIterator needles_first,
InputIterator needles_last,
OutputIterator result,
Compare comp);
I don't feel like this name really captures what's going on here.
from cccl.
Here are some ideas from Michael:
My immediate reaction is that a new name is warranted. Some ideas:
all_lower_bounds -- because we're finding several
simultaneous_lower_bounds -- find them all at once; but this is verbose
each_lower_bound -- yet another variant of that idea
partition_with -- we're using the "needles" to partition the "hay stack"
merge_rank -- we're computing the position of all elements of one sorted sequence in the other. Like sort, merge can be decomposed into compute ranks + permute. This is the rank step of that process.
from cccl.
lower_merge_rank
and upper_merge_rank
might be a reasonable compromise, though the merge
aspect of the name seems like it would be a red herring.
from cccl.
My code uses lower_bound and upper_bound with needles sorted all the time, so I'm definitely excited to hear that there might be further performance to be gained. For what it's worth, I think that lower_merge_rank/upper_merge_rank make sense given the explanation you quoted above.
[off-topic] I really like the search functions using arguments named using 'needles' and 'haystack' Are you considering using those names in the Thrust source code?
from cccl.
I came across the "needles" and "haystack" nomenclature in The D Programming Language. Dunno where it originated. We'd probably use it if we add this algorithm.
from cccl.
Related Issues (20)
- [DOC]: Update contributing guide to include information on how to use cmake presets
- [BUG]: CUB test relies on deprecated error code
- Missing full qualification of namespace std in CUB
- Simplify test matrix spec/usage
- Improve documentation of `vsmem_helper_default_fallback_policy_t`
- Remove old `meow(void)` function signatures
- Move to feature flag to guard for deduction guides
- [BUG]: thrust::optional<T&>::emplace() does not compile
- [BUG]: thrust::count_if and copy_if performance on Grace and x86 10x+ / 20x+ slower than libstdc++
- [FEA]: Expose `thrust::detail::contiguous_iterator_raw_pointer_cast` HOT 5
- Add tests for thrust::optional
- [FEA]: Add CI check for unqualified uses of `cuda::` namespace
- [BUG]: Thrust's OpenMP `unique_by_key` fails with large inputs/outputs.
- [FEA]: Upgrade Catch2 in CUB to version 3 HOT 1
- [DOC]: concepts library appears twice in TOC and is out of place (and ranges library page is missing)
- [BUG]: Intermittent wrong output from thrust::remove_if under heavy GPU loading HOT 5
- [BUG]: MSVC < 2022 doesn't properly handle thrust's member function detector.
- [BUG]: PTXAS emits advisory regarding cp.async.bulk.*.multicast use on sm_90
- [FEA]: Add support for SM_100 and SM_100a in <nv/target>
- Port Thrust docs to use Sphinx
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cccl.