anikristo / learnedsort Goto Github PK
View Code? Open in Web Editor NEWLearned Sort: a model-enhanced sorting algorithm
License: GNU General Public License v3.0
Learned Sort: a model-enhanced sorting algorithm
License: GNU General Public License v3.0
Describe the bug
The slope calculation in the RMI::train() function (Case 6) has an underflow error that appears for unsigned types.
To reproduce
unsigned int
)min.x - max.x > min.x
due to underflow and roll-over.Describe the bug
When the input size is not divisible by 10^6 (1M), the performance suffers. For example, for INPUT_SIZE=1M, the code executes at 37,174 keys/ms.
However for 1.2M keys, it goes down to 12,332 keys/ms, whereas for 2M it goes back up to 38,461 keys/ms.
To reproduce
Steps to reproduce the behavior:
benchmarks_driver.cc
benchmark_arguments(benchmark::internal::Benchmark *)
in Line 156.static void benchmark_arguments(benchmark::internal::Benchmark *b) {
// Set input size parameter
b->Arg(1E6);
b->Arg(1.2E6);
b->Arg(1.23E6);
b->Arg(1.234E6);
b->Arg(1.2345E6);
b->Arg(2E6);
// Set time measurement unit
b->Unit(benchmark::kMillisecond);
}
Expected behavior
The sorting rate should remain fairly stable throughout the benchmarks.
Environment (please complete the following information):
Describe the bug
In rare occasions, one of the major buckets' size might be less than BATCH_SZ. In this case the second round of bucketization fails because the vector minor_bckt_sizes
is uninitialized. This is due to the fact that the loop condition in line 648 is unmet.
To reproduce
Steps to reproduce the behavior:
std::sort
(line 957)Environment (please complete the following information):
numpages
and pages
fields in the citationDescribe the bug
The RMI training function performs an integer division for slope calculation when the input type is of integer type. This behavior is undesired since the slope and intercept terms of linear models are floating-points.
Describe the bug
As mentioned in #14, I had issues when I tried to benchmark LearnedSort because it did not always yield a sorted collection depending on the input. I managed to write a simple program demonstrating one such cases.
To reproduce
#include <algorithm>
#include <cassert>
#include <functional>
#include <random>
#include <vector>
#include "learned_sort.h"
int main()
{
std::vector<int> vec;
std::mt19937_64 prng(1604922353);
for (int i = 0 ; i < 1'000'000 ; ++i) {
vec.emplace_back(i % 16);
}
std::shuffle(vec.begin(), vec.end(), prng);
learned_sort::sort(vec.begin(), vec.end());
assert(std::is_sorted(vec.begin(), vec.end()));
}
I used a standard library random engine, so the results should be reproducible across platforms.
Expected behavior
I expect this code to run to the end and the assertion not to fire. When I run it the assertion fires, which means that LearnedSort failed to sort the collection.
Environment
I was reading through the code and I noticed on line 208 that when the supplied overallocation_ratio
is invalid (<= 0
) that the value is set to be 1
rather than RMI<T>::Params::DEFAULT_OVERALLOCATION_RATIO
(which is 1.1
). However, on line 211 it is incorrectly indicated on stderr that the overallocation_ratio
is set to the default value.
Expected behavior
Expect the output on stderr
to match what is being set in the parameters.
Describe the bug
The unit tests fail for input sizes that are not divisible by 100.
To reproduce
Steps to reproduce the behavior:
TEST_SIZE = 1234560
in unit_tests/test_driver.cctest.sh
Expected behavior
Tests of any size should pass.
Environment (please complete the following information):
Additional context
The tests pass for TEST_SIZE = 1234500
, which is divisible by 100. THRESHOLD is also set to 100, so one might think it is because the input size is not divisible by the THRESHOLD value. However, even if you set THRESHOLD=50
, tests will also fail for sizes that are divisible by the THRESHOLD (e.g. 1234550).
Currently, Radix Sort only supports double types, therefore the following data types must be also supported:
Describe the bug
The learnedSort algorithm fails to sort the arrays when the keys to be sorted are of uint64_t type.
To reproduce
Steps to reproduce the bug:
Expected behavior
It was expected to have the array output sorted, and ideally beating the performance of other state-of-the-art baselines.
Screenshots
This is a screenshot of the output error:
Environment (please complete the following information):
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.