nvidia / thrust

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

License: Other

Python 1.50% Cuda 23.04% C++ 68.59% C 3.70% CMake 2.11% Makefile 0.28% Shell 0.48% Perl 0.30%
cuda nvidia-hpc-sdk thrust gpu cpp cpp11 cpp14 cpp17 cpp20 cxx

thrust's Introduction

⚠️ The Thrust repository has been archived and is now part of the unified nvidia/cccl repository. See the announcement here for more information. Please visit the new repository for the latest updates. ⚠️

Thrust: The C++ Parallel Algorithms Library

Thrust is the C++ parallel algorithms library which inspired the introduction of parallel algorithms to the C++ Standard Library. Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs. It builds on top of established parallel programming frameworks (such as CUDA, TBB, and OpenMP). It also provides a number of general-purpose facilities similar to those found in the C++ Standard Library.

Thrust is an open source project; it is available on GitHub and included in the NVIDIA HPC SDK and CUDA Toolkit. If you have one of those SDKs installed, no additional installation or compiler flags are needed to use Thrust.

Examples

Thrust is best learned through examples.

The following example generates random numbers serially and then transfers them to a parallel device where they are sorted.

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <thrust/random.h>

int main() {
  // Generate 32M random numbers serially.
  thrust::default_random_engine rng(1337);
  thrust::uniform_int_distribution<int> dist;
  thrust::host_vector<int> h_vec(32 << 20);
  thrust::generate(h_vec.begin(), h_vec.end(), [&] { return dist(rng); });

  // Transfer data to the device.
  thrust::device_vector<int> d_vec = h_vec;

  // Sort data on the device.
  thrust::sort(d_vec.begin(), d_vec.end());

  // Transfer data back to host.
  thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
}

See it on Godbolt

This example demonstrates computing the sum of some random numbers in parallel:

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/random.h>

int main() {
  // Generate random data serially.
  thrust::default_random_engine rng(1337);
  thrust::uniform_real_distribution<double> dist(-50.0, 50.0);
  thrust::host_vector<double> h_vec(32 << 20);
  thrust::generate(h_vec.begin(), h_vec.end(), [&] { return dist(rng); });

  // Transfer to device and compute the sum.
  thrust::device_vector<double> d_vec = h_vec;
  // Use a double initial value and functor so the sum is accumulated in double.
  double x = thrust::reduce(d_vec.begin(), d_vec.end(), 0.0, thrust::plus<double>());
}
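
One detail worth keeping in mind with reductions like this: when an explicit initial value is passed, the sum is accumulated in the type of that value, so an integer 0 can silently truncate floating-point partial sums. The standalone sketch below uses std::accumulate from the C++ Standard Library, which follows the same rule, rather than Thrust itself. The helper names sum_with_int_init and sum_with_double_init are made up for illustration.

```cpp
#include <numeric>
#include <vector>

// Integer initial value: the running sum is an int, so every partial
// result is converted back to int and the fractional part is lost.
inline double sum_with_int_init(const std::vector<double>& v) {
  return std::accumulate(v.begin(), v.end(), 0);
}

// Double initial value: the running sum stays a double throughout.
inline double sum_with_double_init(const std::vector<double>& v) {
  return std::accumulate(v.begin(), v.end(), 0.0);
}
```

For a vector holding four copies of 0.5, the first helper returns 0 while the second returns 2.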

See it on Godbolt

This example shows how to perform such a reduction asynchronously:

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/async/copy.h>
#include <thrust/async/reduce.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <numeric>

int main() {
  // Generate 32M random numbers serially.
  thrust::default_random_engine rng(123456);
  thrust::uniform_real_distribution<double> dist(-50.0, 50.0);
  thrust::host_vector<double> h_vec(32 << 20);
  thrust::generate(h_vec.begin(), h_vec.end(), [&] { return dist(rng); });

  // Asynchronously transfer to the device.
  thrust::device_vector<double> d_vec(h_vec.size());
  thrust::device_event e = thrust::async::copy(h_vec.begin(), h_vec.end(),
                                               d_vec.begin());

  // After the transfer completes, asynchronously compute the sum on the device.
  thrust::device_future<double> f0 = thrust::async::reduce(thrust::device.after(e),
                                                           d_vec.begin(), d_vec.end(),
                                                           0.0, thrust::plus<double>());

  // While the sum is being computed on the device, compute the sum serially on
  // the host.
  double f1 = std::accumulate(h_vec.begin(), h_vec.end(), 0.0, thrust::plus<double>());
}

See it on Godbolt

Getting The Thrust Source Code

Thrust is a header-only library; there is no need to build or install the project unless you want to run the Thrust unit tests.

The CUDA Toolkit provides a recent release of the Thrust source code in include/thrust. This will be suitable for most users.

Users who wish to contribute to Thrust or try out newer features should recursively clone the Thrust GitHub repository:

git clone --recursive https://github.com/NVIDIA/thrust.git

Using Thrust From Your Project

For CMake-based projects, we provide a CMake package for use with find_package. See the CMake README for more information. Thrust can also be added via add_subdirectory or tools like the CMake Package Manager.

For non-CMake projects, compile with:

  • The Thrust include path (-I<thrust repo root>)
  • The libcu++ include path (-I<thrust repo root>/dependencies/libcudacxx/)
  • The CUB include path, if using the CUDA device system (-I<thrust repo root>/dependencies/cub/)
  • By default, the CPP host system and CUDA device system are used. These can be changed using compiler definitions:
    • -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_XXX, where XXX is CPP (serial, default), OMP (OpenMP), or TBB (Intel TBB)
    • -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_XXX, where XXX is CPP, OMP, TBB, or CUDA (default).

Developing Thrust

Thrust uses the CMake build system to build unit tests, examples, and header tests. To build Thrust as a developer, it is recommended that you use our containerized development system:

# Clone Thrust and CUB repos recursively:
git clone --recursive https://github.com/NVIDIA/thrust.git
cd thrust

# Build and run tests and examples:
ci/local/build.bash

That does the equivalent of the following, but in a clean containerized environment which has all dependencies installed:

# Clone Thrust and CUB repos recursively:
git clone --recursive https://github.com/NVIDIA/thrust.git
cd thrust

# Create build directory:
mkdir build
cd build

# Configure -- use one of the following:
cmake ..   # Command line interface.
ccmake ..  # ncurses GUI (Linux only).
cmake-gui  # Graphical UI, set source/build directories in the app.

# Build:
cmake --build . -j ${NUM_JOBS} # Invokes make (or ninja, etc).

# Run tests and examples:
ctest

By default, a serial CPP host system, CUDA accelerated device system, and C++14 standard are used. This can be changed in CMake and via flags to ci/local/build.bash.

More information on configuring your Thrust build and creating a pull request can be found in the contributing section.

Licensing

Thrust is an open source project developed on GitHub. Thrust is distributed under the Apache License v2.0 with LLVM Exceptions; some parts are distributed under the Apache License v2.0 and the Boost License v1.0.

thrust's People

Contributors

alliepiper, andrewcorrigan, bjude, brycelelbach, chengjiew, dfontainenvidia, djns99, dkolsen-pgi, elstehle, ericniebler, filipemaia, fkallen, germasch, gevtushenko, griwes, harrism, jaredhoberock, jrhemstad, kperelygin, mfbalin, mfrancis95, miscco, msadang, nwhitehead, robertmaynard, rongou, stephandollberg, upsj, wmaxey, wnbell

thrust's Issues

consider removing device-to-device copy workaround for absence of nvcc

IIRC we decided to eliminate this workaround because it violates the one-definition rule and there's no way to ensure we get the "good" version when both are present.

[1] http://code.google.com/p/thrust/source/browse/thrust/system/cuda/detail/copy_device_to_device.inl#41

I don't think we should modify that code unless we get a bug report

I'll take a look at this later. I can't say what the right approach is, but I'm concerned about the silent (and hard to diagnose) performance hazard that this introduces.

Forwarded from http://code.google.com/p/thrust/issues/detail?id=449

Consider support for stateful dispatch

Thrust does not introduce (much) state beyond the implicit state of each underlying backend system. However, it is occasionally desirable to perform algorithm dispatch in a stateful manner. For example, a user may desire to associate an algorithm's launch with a particular CUDA stream, or to associate a computation with a particular processor, or to associate an algorithm's temporary storage requirements with a particular segment of memory. The current method of (stateless) tag dispatch cannot support any of these use cases in general.

Thrust should generalize tag dispatch to provide stateful forms of invocation for each algorithm, e.g.

template<typename State, typename Iterator, typename Function>
void algo(State &state, Iterator first, Iterator last, Function f);

template<typename Iterator, typename Function>
void algo(Iterator first, Iterator last, Function f);

The obvious way to introduce stateful tags would be to provide this sort of interface:

thrust::algo(Tag t, Iterator first, Iterator last);

If you want to inject state, you'd just do:

vector vec; 
state s;
thrust::algo(s, vec.begin(), vec.end());

We had previously nixed allowing the user to explicitly provide a tag, because it's not clear what this should do:

cuda::vector vec;
thrust::algo(omp::tag(), vec.begin(), vec.end());  // should this be a compile time error?

nor what this should do:

cuda::vector cuda_vec;
omp::vector omp_vec;
thrust::algo2(omp::tag(), cuda_vec.begin(), cuda_vec.end(), omp_vec.begin()); // should this be a compile-time error?

This tag-explicit form raises issues:

  • if the iterator arguments' tags can't be reconciled with the provided tag, should this be a compile time error?
  • should the iterators be retagged before dispatch? If algo2 lowers to another Thrust algorithm, then it will be dispatched using the iterator's tags, not what was provided by the user.

Perhaps all lowering should be performed using the explicit tag form? There'd be no other way to communicate the state, because select_system doesn't know anything about it.

OTOH, this interface is attractive because it makes selecting a backend simpler than using retag on all the iterator arguments:

This:

thrust::algo(thrust::cuda::tag(), vec.begin(), vec.end());

Is nicer than:

thrust::algo(thrust::retag<thrust::cuda::tag>(vec.begin()), thrust::retag<thrust::cuda::tag>(vec.end()));

A tag-explicit form also makes it easy to communicate precisely what the implicit API does:

namespace thrust
{

template<typename Iterator>
  void algo(Iterator first, Iterator last)
{
  // default-construct the iterator's system tag, then dispatch on it
  typename thrust::iterator_system<Iterator>::type system;
  thrust::algo(select_system(system), first, last);
}

}

The tag-explicit form would do the ADL part:

namespace thrust
{

template<typename Tag, typename Iterator>
  void algo(Tag tag, Iterator first, Iterator last)
{
  // dispatch via adl
  algo(tag, first, last);
}

}

We'd probably have to change all Tag parameters to be references, which would create a lot of new entry points. Rvalue reference would probably solve that problem.

Alternatively, we could pass by value and ask the user to use something like reference_wrapper.
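
The pass-by-value alternative can be sketched without Thrust at all. In this minimal, hypothetical illustration, algo takes its state by value and the caller opts into shared state with std::ref. The names launch_counter and unwrap_state are invented for the example and are not part of any Thrust interface.

```cpp
#include <functional>

// Hypothetical state: counts how many times an algorithm was launched.
struct launch_counter {
  int launches = 0;
};

// unwrap_state() yields the underlying state whether or not it was wrapped.
template <typename T>
T& unwrap_state(T& state) { return state; }

template <typename T>
T& unwrap_state(std::reference_wrapper<T>& state) { return state.get(); }

// Stateful form: State is taken by value, so a plain state object is copied
// and only a std::ref-wrapped one is observed by the caller.
template <typename State, typename Iterator, typename Function>
void algo(State state, Iterator first, Iterator last, Function f) {
  ++unwrap_state(state).launches;
  for (; first != last; ++first) f(*first);
}

// Stateless form: forwards to the stateful form with a throwaway state.
template <typename Iterator, typename Function>
void algo(Iterator first, Iterator last, Function f) {
  algo(launch_counter(), first, last, f);
}
```

Calling algo(std::ref(counter), ...) updates the caller's counter, while algo(counter, ...) only updates a copy. That asymmetry is the cost of this design: forgetting std::ref compiles cleanly but drops the state.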

Document more subtleties

Here's a non-exhaustive list:

  • zip_iterator equality
  • the current implementations of some algorithms require data types to have default constructors and trivial copy constructors
  • host/cpp algorithms have parallel semantics, but the current implementation is serial
  • presence of thrust::minimum and thrust::maximum
  • generalization of some algorithms to accept stencil sequences
  • presence of stable_partition_copy
  • interpretation of get_temporary_buffer's result
  • what thrust::detail means
  • what an experimental namespace implies

Forwarded from http://code.google.com/p/thrust/issues/detail?id=459

Thrust random headers do not pass trivial tests

xor_combine_engine_max.h does not properly #include <cstddef>:

./thrust/random/detail/xor_combine_engine_max.h:275:28: error: 'size_t' has not been declared
./thrust/random/detail/xor_combine_engine_max.h:275:57: error: 'size_t' has not been declared
./thrust/random/detail/xor_combine_engine_max.h:278:16: error: 'size_t' does not name a type
./thrust/random/detail/xor_combine_engine_max.h:284:37: error: 'w' was not declared in this scope
./thrust/random/detail/xor_combine_engine_max.h:284:41: error: template argument 2 is invalid
./thrust/random/detail/xor_combine_engine_max.h:285:5: error: template argument 3 is invalid
./thrust/random/detail/xor_combine_engine_max.h:291:37: error: 'w' was not declared in this scope
./thrust/random/detail/xor_combine_engine_max.h:291:41: error: template argument 2 is invalid
./thrust/random/detail/xor_combine_engine_max.h:292:5: error: template argument 3 is invalid
./thrust/random/detail/xor_combine_engine_max.h:308:7: error: 'w' was not declared in this scop

unit tester should test on all devices by default

Right now, the unit tester defaults to device ID 0. In a multi-gpu system, it might be better to default to device ID -1, which causes it to run tests on all GPUs in the system. This makes multi-GPU testing in Jenkins a lot easier since we don't have to launder a device ID command line variable through SCons.

Tag dispatch bug (customizing transform)

#include <thrust/iterator/iterator_traits.h>
#include <thrust/device_vector.h>
#include <thrust/for_each.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <iostream>

struct my_tag : thrust::device_system_tag {};

template<typename InputIterator,
         typename OutputIterator,
         typename UnaryFunction>
OutputIterator transform(my_tag, 
                         InputIterator first,
                         InputIterator last,
                         OutputIterator result,
                         UnaryFunction f)
{
  // output a message
  std::cout << "Hello, world from transform(my_tag)!" << std::endl;

  return thrust::transform(thrust::retag<thrust::device_system_tag>(first),
                           thrust::retag<thrust::device_system_tag>(last),
                           thrust::retag<thrust::device_system_tag>(result),
                           f);
}

int main()
{
  thrust::device_vector<int> vec(1);

  thrust::transform(thrust::retag<my_tag>(vec.begin()),
                    thrust::retag<my_tag>(vec.end()),
                    thrust::retag<my_tag>(vec.begin()),
                    thrust::identity<int>());

  return 0;
}

Jared suggested a workaround: adding a select_system overload for my_tag solves the problem:

my_tag select_system(my_tag, my_tag) {
  return my_tag();
}

He also suggested that the real problem is that the thrust::cuda::detail::select_system() overload gets called and erases the derived type of my_tag. Perhaps these select_system overloads need to be templatized?

template<typename Tag> Tag select_system(Tag,Tag) { return Tag(); }

c++11 requires vector_base to construct elements using its allocator

We do it in an ad hoc fashion, but we're required to use the allocator according to S. 23.2.1 subpoint 3:

For the components affected by this subclause that declare an allocator_type, objects stored in these
components shall be constructed using the allocator_traits<allocator_type>::construct function and
destroyed using the allocator_traits<allocator_type>::destroy function (20.6.8.2). These functions
are called only for the container’s element type, not for internal types used by the container. [ Note: This
means, for example, that a node-based container might need to construct nodes containing aligned buffers
and call construct to place the element into the buffer. —end note ]

This is actually a good thing for us, because it means that the user should be able to supply an uninitialized_allocator if they so desire.

See this post [1] for why it's worthwhile to do this sort of thing.

[1] http://stackoverflow.com/questions/7218574/avoiding-default-construction-of-elements-in-standard-containers
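
As a rough illustration of what this requirement enables, here is a standalone sketch using std::vector rather than Thrust's vector_base. The allocator default_init_allocator and the counter default_init_calls are hypothetical names for this example. The allocator overrides the zero-argument construct so that elements are default-initialized instead of value-initialized, and the counter records how often the container routed construction through it.

```cpp
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Counts zero-argument construct() calls, for demonstration only.
inline std::size_t& default_init_calls() {
  static std::size_t n = 0;
  return n;
}

template <typename T>
struct default_init_allocator {
  using value_type = T;

  default_init_allocator() noexcept = default;
  template <typename U>
  default_init_allocator(const default_init_allocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }

  // No arguments: default-initialize, which leaves trivial types unwritten.
  template <typename U>
  void construct(U* p) {
    ++default_init_calls();
    ::new (static_cast<void*>(p)) U;
  }

  // One or more arguments: construct normally.
  template <typename U, typename Arg0, typename... Args>
  void construct(U* p, Arg0&& arg0, Args&&... args) {
    ::new (static_cast<void*>(p))
        U(std::forward<Arg0>(arg0), std::forward<Args>(args)...);
  }
};

template <typename T, typename U>
bool operator==(const default_init_allocator<T>&,
                const default_init_allocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const default_init_allocator<T>&,
                const default_init_allocator<U>&) noexcept { return false; }
```

With this allocator, std::vector<int, default_init_allocator<int>> v(n) allocates n elements without zero-filling them; their values are indeterminate until written, so they must be assigned before being read.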

Eliminate comma operator for placeholders

It causes problems in expressions like

thrust::transform_reduce(v.begin(), v.end(), _1 * _1, 0.0f, _1 + _2);

I don't know how to disambiguate cases like these and I can't think of any reason to keep it around.

Algorithms which assume sorted input should have _by_key variants

In practice, this would mean adding _by_key set algorithm variants.

This has value beyond Thrust; note how awkward [1] taking the difference of two std::sets is.

[1] http://stackoverflow.com/questions/7706602/how-to-subtract-one-list-of-map-keys-from-another-and-get-new-map-map-a-mab-b/7706740#7706740

How should the key and value sequences be ordered?

Option 1: [first1, last1) and [first2, last2) are the left and right key sequences

template <typename InputIterator1,
typename InputIterator2,
typename InputIterator3,
typename InputIterator4,
typename OutputIterator1,
typename OutputIterator2,
typename StrictWeakOrdering>
thrust::pair<OutputIterator1,OutputIterator2>
merge_by_key(InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
InputIterator3 first3,
InputIterator4 first4,
OutputIterator1 output1,
OutputIterator2 output2,
StrictWeakOrdering comp);

Option 2: [first1, last1) and [first3, last3) are the left and right key sequences

template <typename InputIterator1,
typename InputIterator2,
typename InputIterator3,
typename InputIterator4,
typename OutputIterator1,
typename OutputIterator2,
typename StrictWeakOrdering>
thrust::pair<OutputIterator1,OutputIterator2>
merge_by_key(InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2,
InputIterator3 first3, InputIterator3 last3,
InputIterator4 first4,
OutputIterator1 output1,
OutputIterator2 output2,
StrictWeakOrdering comp);

AFAICT there's no prior art within Thrust (in the public interface). Internally we used Option 1 for merge_by_key, but I didn't give it much consideration back when.

Forwarded from http://code.google.com/p/thrust/issues/detail?id=393

update cuda::arch constants and calculator

See Section 4.2 of the CUDA Programming Guide [1] and the new CUDA Occupancy Calculator [2]

[1] http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
[2] http://developer.download.nvidia.com/compute/DevZone/docs/html/C/tools/CUDA_Occupancy_Calculator.xls

I think we might be able to remove the occupancy calculator entirely and make decisions based on cudaFuncAttributes::maxThreadsPerBlock instead.

Forwarded from http://code.google.com/p/thrust/issues/detail?id=382

transform to immutable transform_iterator compiles without error

The following shouldn't be possible:

#include <thrust/iterator/transform_iterator.h>
#include <thrust/host_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>

int main()
{
  thrust::host_vector<int> vec;
  thrust::transform(vec.begin(), vec.end(), thrust::make_transform_iterator(vec.begin(), thrust::negate<int>()), thrust::negate<int>());
  return 0;
}

The unary transform functors in internal_functional.h seem to be assigning to the temporary returned by transform_iterator.

Potentially interpret tuple as an index type in zip_iterator

z_iter[(i,j,k)] could potentially make sense if the tuple is the right size and the element types are each convertible to zip_iterator::difference_type

Comment 1 by andrew.corrigan, Sep 16, 2011
I implemented a new iterator called multi_permutation_iterator [1]. It is a new iterator which is like a permutation_iterator but with tuple-valued indexes, and thus dereferences to produce tuples of values from ElementIterator, as opposed to a single value like permutation_iterator. This implements the functionality I described last year, which led to this issue being opened. [2]

As compared to a combination of zip_iterator and multiple permutation_iterators, this new iterator has the advantage of only using one ElementIterator, which can reduce register usage and also parameter size (and thus avoid launch_closure_by_pointer and its memory allocation and copy). This has already helped speed up my code quite a bit, where so far I use it to implement non-contiguous, coalesced multi-dimensional arrays. This multi_permutation_iterator should also help others implement things like finite-difference stencils without excessive register usage, as discussed in [3].

[1] https://code.google.com/r/andrewcorrigan-thrust/source/detail?r=22958c2f47c02cb1a67d8e5c2ad1f788da99f7ac
[2] http://groups.google.com/group/thrust-users/browse_thread/thread/d76e6f7d6206790e/1a7a307e843241f7
[3] http://groups.google.com/group/thrust-users/browse_thread/thread/853579a9bccff08c/33c67c39e7e380a1

Forwarded from http://code.google.com/p/thrust/issues/detail?id=205

compilation error when sorting bools with CUDA backend

The problem is that radix sort is being dispatched, but the underlying radix sort implementation doesn't provide a bool specialization. Since we probably don't want to use radix sort here (it would be unstable for general numerical values in bool storage) we should dispatch partition or stable_partition instead.
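
The partition-based approach can be sketched on the host with the Standard Library. The function name sort_bools is hypothetical, and this only illustrates the idea; it says nothing about how the CUDA backend would implement the dispatch.

```cpp
#include <algorithm>

// Sorting bools in ascending order is the same as moving every false
// in front of every true, which is exactly what a partition does.
template <typename ForwardIterator>
void sort_bools(ForwardIterator first, ForwardIterator last) {
  std::partition(first, last, [](bool b) { return !b; });
}
```

Because equal bool values are indistinguishable, an unstable partition is enough for plain sorting; stable_partition would only matter for a by-key variant where each bool carries an associated value.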

Minimal reproducer:

#include <thrust/device_vector.h>
#include <thrust/sort.h>

int main(void)
{
  thrust::device_vector<bool> V(10);
  thrust::sort(V.begin(), V.end());
  return 0;
}

Compiler output:

/home/nathan/NV/thrust/thrust/system/cuda/detail/detail/stable_radix_sort.inl(67): error: incomplete type is not allowed
          detected during:
            instantiation of "void thrust::system::cuda::detail::detail::stable_radix_sort(RandomAccessIterator, RandomAccessIterator) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::pointer<__nv_bool, thrust::device_system_tag, thrust::use_default, thrust::use_default>>]" 
(62): here
            instantiation of "void thrust::system::cuda::detail::detail::stable_radix_sort(RandomAccessIterator, RandomAccessIterator) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>]" 
/home/nathan/NV/thrust/thrust/system/cuda/detail/sort.inl(284): here
            instantiation of "void thrust::system::cuda::detail::first_dispatch::stable_sort(RandomAccessIterator, RandomAccessIterator, StrictWeakOrdering, thrust::detail::true_type) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>, StrictWeakOrdering=thrust::less<__nv_bool>]" 
/home/nathan/NV/thrust/thrust/system/cuda/detail/sort.inl(419): here
            instantiation of "void thrust::system::cuda::detail::stable_sort(thrust::system::cuda::detail::tag, RandomAccessIterator, RandomAccessIterator, StrictWeakOrdering) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>, StrictWeakOrdering=thrust::less<__nv_bool>]" 
/home/nathan/NV/thrust/thrust/detail/sort.inl(88): here
            instantiation of "void thrust::stable_sort(RandomAccessIterator, RandomAccessIterator, StrictWeakOrdering) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>, StrictWeakOrdering=thrust::less<__nv_bool>]" 
/home/nathan/NV/thrust/thrust/system/detail/generic/sort.inl(59): here
            instantiation of "void thrust::system::detail::generic::sort(thrust::system::detail::generic::tag, RandomAccessIterator, RandomAccessIterator, StrictWeakOrdering) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>, StrictWeakOrdering=thrust::less<__nv_bool>]" 
/home/nathan/NV/thrust/thrust/detail/sort.inl(60): here
            instantiation of "void thrust::sort(RandomAccessIterator, RandomAccessIterator, StrictWeakOrdering) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>, StrictWeakOrdering=thrust::less<__nv_bool>]" 
/home/nathan/NV/thrust/thrust/system/detail/generic/sort.inl(47): here
            instantiation of "void thrust::system::detail::generic::sort(thrust::system::detail::generic::tag, RandomAccessIterator, RandomAccessIterator) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>]" 
/home/nathan/NV/thrust/thrust/detail/sort.inl(45): here
            instantiation of "void thrust::sort(RandomAccessIterator, RandomAccessIterator) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>]" 
foo.cu(7): here

/home/nathan/NV/thrust/thrust/system/cuda/detail/detail/stable_radix_sort.inl(67): error: incomplete type is not allowed
          detected during:
            instantiation of "void thrust::system::cuda::detail::detail::stable_radix_sort(RandomAccessIterator, RandomAccessIterator) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>]" 
/home/nathan/NV/thrust/thrust/system/cuda/detail/sort.inl(284): here
            instantiation of "void thrust::system::cuda::detail::first_dispatch::stable_sort(RandomAccessIterator, RandomAccessIterator, StrictWeakOrdering, thrust::detail::true_type) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>, StrictWeakOrdering=thrust::less<__nv_bool>]" 
/home/nathan/NV/thrust/thrust/system/cuda/detail/sort.inl(419): here
            instantiation of "void thrust::system::cuda::detail::stable_sort(thrust::system::cuda::detail::tag, RandomAccessIterator, RandomAccessIterator, StrictWeakOrdering) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>, StrictWeakOrdering=thrust::less<__nv_bool>]" 
/home/nathan/NV/thrust/thrust/detail/sort.inl(88): here
            instantiation of "void thrust::stable_sort(RandomAccessIterator, RandomAccessIterator, StrictWeakOrdering) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>, StrictWeakOrdering=thrust::less<__nv_bool>]" 
/home/nathan/NV/thrust/thrust/system/detail/generic/sort.inl(59): here
            instantiation of "void thrust::system::detail::generic::sort(thrust::system::detail::generic::tag, RandomAccessIterator, RandomAccessIterator, StrictWeakOrdering) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>, StrictWeakOrdering=thrust::less<__nv_bool>]" 
/home/nathan/NV/thrust/thrust/detail/sort.inl(60): here
            instantiation of "void thrust::sort(RandomAccessIterator, RandomAccessIterator, StrictWeakOrdering) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>, StrictWeakOrdering=thrust::less<__nv_bool>]" 
/home/nathan/NV/thrust/thrust/system/detail/generic/sort.inl(47): here
            instantiation of "void thrust::system::detail::generic::sort(thrust::system::detail::generic::tag, RandomAccessIterator, RandomAccessIterator) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>]" 
/home/nathan/NV/thrust/thrust/detail/sort.inl(45): here
            instantiation of "void thrust::sort(RandomAccessIterator, RandomAccessIterator) [with RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<__nv_bool>>]" 
foo.cu(7): here

2 errors detected in the compilation of "/tmp/tmpxft_00005faf_00000000-4_foo.cpp1.ii".

improve test coverage of tests using pseudo random input

The bug in Google Code issue #80 would have been discovered more easily if our randomized testing of unique_copy() was more comprehensive. In general, we should

(1) make our random input portable and not rely on the system's rand()
(2) add a range parameter to random_integers that specifies the range of random values
(3) add a function repeated_random_integers(N, range, repeats) that generates sequences of random values where the length of each run is selected randomly from [1,repeats]

Forwarded from http://code.google.com/p/thrust/issues/detail?id=81

Ambiguous overloads of reinterpret_tag

I run into an ambiguous overload of reinterpret_tag when trying to invoke thrust::copy on thrust::detail::pointer objects which are both tagged with the thrust::system::omp::tag. This error does not occur when both pointers are tagged with thrust::system::cuda::tag. It also does not occur when pointers have different tags (i.e., one is from omp and one from cuda).

#include <thrust/copy.h>
#include <thrust/detail/pointer.h>
#include <thrust/system/omp/memory.h>
#include <thrust/system/cuda/memory.h>

//#define ERROR

int main() {
#ifdef ERROR
    typedef thrust::system::omp::tag my_tag;
#else
    typedef thrust::system::cuda::tag my_tag;
#endif

    thrust::pointer<int, my_tag> a_begin((int*)NULL);
    thrust::pointer<int, my_tag> b_begin((int*)NULL);
    thrust::copy(a_begin, a_begin,
                 b_begin);
}

The error message indicates the problem comes from

/Users/catanzar/thrust/thrust/system/omp/detail/copy.inl(60): error: more than one instance of overloaded function
"thrust::reinterpret_tag" matches the argument list:
    function template "thrust::pointer<T, Tag, thrust::use_default, thrust::use_default> thrust::reinterpret_tag<Tag,T,OtherTag>
        (thrust::pointer<T, OtherTag, thrust::use_default, thrust::use_default>)"
    function template "thrust::detail::enable_if_retaggable<OtherTag, Tag, thrust::pointer<T, Tag, thrust::use_default, 
        thrust::use_default>>::type thrust::reinterpret_tag<Tag,T,OtherTag>(thrust::pointer<T, OtherTag, thrust::use_default,
        thrust::use_default>)"
    argument types are: (thrust::pointer<int, thrust::system::omp::detail::tag, thrust::use_default, thrust::use_default>)

Unused variable warning in testing/copy.cu & testing/copy_n.cu

testing/copy.cu:212:117: warning: variable 'l_result' set but not used [-Wunused-but-set-variable]

testing/copy_n.cu:20:59: warning: variable 'end' set but not used [-Wunused-but-set-variable]

testing/copy_n.cu:176:117: warning: variable 'l_result' set but not used [-Wunused-but-set-variable]

Thrust lacks generic malloc/free

There is no thrust::malloc or thrust::free.

This makes it difficult to allocate (deallocate) tagged memory generically.

Working around this omission forces the user to dip into the private generic namespace and dispatch through ADL.

Consider adding these and exposing them in thrust/memory.h
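
One way such entry points could look, sketched in plain C++ with no Thrust headers: a public function template takes a tag and forwards, through ADL, to whatever allocation functions the tag's namespace provides. Everything here (host_backend, raw_malloc, raw_free, generic_malloc, generic_free, live_allocations) is a hypothetical name for the illustration and not a proposed Thrust signature.

```cpp
#include <cstddef>
#include <cstdlib>

// A hypothetical backend: its tag type plus the functions found through ADL.
namespace host_backend {
struct tag {};

// Tracks outstanding allocations so the dispatch can be observed.
inline int& live_allocations() {
  static int n = 0;
  return n;
}

inline void* raw_malloc(tag, std::size_t n) {
  ++live_allocations();
  return std::malloc(n);
}

inline void raw_free(tag, void* p) {
  --live_allocations();
  std::free(p);
}
}  // namespace host_backend

// Public, tag-generic entry points: the unqualified calls below are resolved
// by argument-dependent lookup in the namespace of the tag type.
namespace sketch {
template <typename Tag>
void* generic_malloc(Tag t, std::size_t n) {
  return raw_malloc(t, n);
}

template <typename Tag>
void generic_free(Tag t, void* p) {
  raw_free(t, p);
}
}  // namespace sketch
```

User code would then call sketch::generic_malloc(host_backend::tag(), 64) and sketch::generic_free(host_backend::tag(), p) without touching any private namespace.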

reported by Bryan Catanzaro
