jefferislab / rann
R package providing fast nearest neighbour search (wraps ANN library)
Home Page: http://jefferis.github.io/RANN/
I enjoy using the package and wonder whether you could add a weight feature to the nn2 function, so that distances are normalised/adjusted according to the weights? (Meanwhile, if there is any hack I can use, I would appreciate any comments.)
See jefferis/nabor#9
The ANN docs suggest that the Euclidean metric is used by default. Some digging in the code and in the above docs suggests that this is also true for RANN.
According to Section 2.2.1 (p. 14 ff.), only the macros in the code linked above need to be redefined to use other Minkowski norms. To define a code base where Manhattan, Euclidean and "max" metric are simultaneously accessible, this probably needs to be replaced by a template parameter that defines the implementation of the four operators.
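A minimal sketch of what such a template parameter could look like. The struct and function names are hypothetical and not part of ANN or RANN; the operator bodies follow my reading of the ANN_POW, ANN_ROOT, ANN_SUM and ANN_DIFF macros described in the ANN manual, so please check them against ANN.h before relying on them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical policy structs: each one supplies the four operators that ANN
// currently defines as the macros ANN_POW, ANN_ROOT, ANN_SUM and ANN_DIFF.
struct EuclideanMetric {   // L2 norm
    static double pow(double v)            { return v * v; }
    static double root(double x)           { return std::sqrt(x); }
    static double sum(double x, double y)  { return x + y; }
    static double diff(double x, double y) { return y - x; }
};

struct ManhattanMetric {   // L1 norm
    static double pow(double v)            { return std::fabs(v); }
    static double root(double x)           { return x; }
    static double sum(double x, double y)  { return x + y; }
    static double diff(double x, double y) { return y - x; }
};

struct MaxMetric {         // L-infinity norm
    static double pow(double v)            { return std::fabs(v); }
    static double root(double x)           { return x; }
    static double sum(double x, double y)  { return x > y ? x : y; }
    static double diff(double, double y)   { return y; }
};

// Distance between two points under the metric chosen at compile time.
// diff() is not needed here; ANN uses it for incremental updates in the tree search.
template <class Metric>
double minkowski_distance(const std::vector<double>& a,
                          const std::vector<double>& b) {
    assert(a.size() == b.size());
    double acc = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc = Metric::sum(acc, Metric::pow(a[i] - b[i]));
    return Metric::root(acc);
}
```

With this shape, minkowski_distance<ManhattanMetric>(a, b) and minkowski_distance<EuclideanMetric>(a, b) can coexist in one binary, which the macro-based definition does not allow.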
The purpose is to accelerate statistical matching using the Gower distance: I think I can transform a "Gower distance problem" into a "Manhattan distance problem", but then RANN needs to be able to solve it (it is called by the StatMatch package).
Does this make sense to you?
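To illustrate the transformation I have in mind, here is a rough sketch for purely numeric variables, where the Gower distance is the mean over the p variables of |x_k - y_k| divided by the range of variable k. Dividing every column by its range turns this into a Manhattan distance, up to the constant factor 1/p. The function names are made up for this example, and categorical variables are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // one row per observation

// Divide every column by its range (max - min). The plain L1 distance between
// two scaled rows then equals p times their Gower distance (numeric case only).
Matrix scale_by_range(const Matrix& x) {
    assert(!x.empty());
    const std::size_t p = x[0].size();
    Matrix out = x;
    for (std::size_t j = 0; j < p; ++j) {
        double lo = x[0][j], hi = x[0][j];
        for (const auto& row : x) {
            assert(row.size() == p);
            lo = std::min(lo, row[j]);
            hi = std::max(hi, row[j]);
        }
        const double range = hi - lo;
        for (auto& row : out)
            row[j] = range > 0.0 ? row[j] / range : 0.0;  // constant column contributes nothing
    }
    return out;
}

// Manhattan (L1) distance between two rows of equal length.
double manhattan(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += std::fabs(a[i] - b[i]);
    return s;
}
```

For matching, the ranges would have to be computed once over the combined donor and recipient data so that both sets are scaled identically. Since 1/p is constant, the nearest neighbours under L1 on the scaled data are the same as under Gower.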
The following line leaves my R process unresponsive (no segfault, no 100% CPU):
RANN::nn2(iris[,1:4], treetype = "bd")
treetype="kd" works fine.
Could you please install
remotes::install_github(c("ropenscilabs/tic", "ropenscilabs/travis"))
and run
tic::use_tic()
If asked to overwrite .travis.yml, select "yes". You may need to set up a Travis access token with travis::browse_travis_token().
Benefits:
I can also do it if you increase my permissions to this repository.
I just noticed {RANN.L1} has been orphaned on CRAN, I'd like to update.
I'm creating a package which requires "RANN.L1" as a dependency. The package "RANN.L1" is therefore listed under Imports: in the DESCRIPTION file of the package, and the remote dependency location is specified as jefferis/RANN@master-L1 under Remotes: in the same file. However, as RANN.L1 is a branch of the RANN repo, the package build under Travis CI fails, presumably because it can't associate the "RANN.L1" import with the "RANN" remote. Any suggestions? Having the jefferis/RANN@master-L1 branch as a separate repo under jefferis/RANN.L1 could be a quick fix? Thanks.
RANN1 (by @krlmlr on top of the original RANN) supports the Manhattan (L1) norm rather than the standard Euclidean (L2) metric. It should be pretty much ready for release to CRAN, and I propose that he is the maintainer! Presumably the version numbers should be kept in sync as much as possible.
I am using the version in CRAN, 2.4.1. I have a dataset that can contain objects with distances of zero. When I run a nearest neighbor search for all objects within a specified radius, the indexes can become switched. For example:
nn.idx[70,] = 71 70
instead of the expected 70 71.
I'm not sure whether objects with zero distance are not allowed or whether it's a bug. I can supply example data if necessary.
Hi, thanks for the nn2 function, it has transformed the efficiency of distance calculations for large datasets. I have a related problem, where I would like to calculate the distance from each point to the closest point on a line. I can convert the line to a set of coordinates and then run nn2; however, this is sort of cheating, as it doesn't formally calculate the distance to the line that connects those coordinates. I just wondered whether you were working on (or could be persuaded to! :)) a function to do this? At the moment dist2line from geosphere is the best option I have, but it is very slow on large datasets, even if parallelized. I'm working on a function to first find the nearest line, then sample points along that line and run nn2, but it's pretty inelegant...
Thanks again!
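For reference, the exact distance to a line given as a sequence of vertices can be computed segment by segment rather than by sampling. Below is a minimal sketch in planar (projected) coordinates; the names are invented for this example, and it does not compute geodesic distances the way geosphere's dist2line does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Distance from p to the segment a-b: project p onto the segment's supporting
// line and clamp the projection parameter t to [0, 1].
double point_segment_distance(Point p, Point a, Point b) {
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double len2 = dx * dx + dy * dy;
    double t = 0.0;                        // degenerate segment: use endpoint a
    if (len2 > 0.0) {
        t = ((p.x - a.x) * dx + (p.y - a.y) * dy) / len2;
        t = std::max(0.0, std::min(1.0, t));
    }
    const double cx = a.x + t * dx, cy = a.y + t * dy;  // closest point on segment
    return std::hypot(p.x - cx, p.y - cy);
}

// Distance from p to a polyline: minimum over all consecutive vertex pairs.
double point_polyline_distance(Point p, const std::vector<Point>& line) {
    assert(!line.empty());
    double best = point_segment_distance(p, line[0], line[0]);
    for (std::size_t i = 0; i + 1 < line.size(); ++i)
        best = std::min(best, point_segment_distance(p, line[i], line[i + 1]));
    return best;
}
```

This loop visits every segment for every point, so on large datasets a nearest-neighbour search on the vertices would still be needed to pick candidate segments first. A nearby vertex does not guarantee the nearest segment when segments are long, so several candidates per point would have to be checked.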
Hi @jefferis, it would be great if your C++ functions were exported by adding // [[Rcpp::interfaces(r, cpp)]] to the relevant C++ functions, so that your indexing methods can be used from other packages.
I already use them in my projects because they are among the fastest indexing implementations, but currently I'm just copying your code into my personal projects.
Thanks a lot!
Page 10 of the ANN manual (https://www.cs.umd.edu/~mount/ANN/Files/1.1.2/ANNmanual_1.1.pdf) mentions that the search can return the number of points within a radius when the ANNdist argument is set to sqRad and k = 0. This is very useful when we want to count, for each sample, the number of points whose distance is within the radius. Could you also return this value?
I think ANNkdFRPtsInRange is actually the value we want, but I don't know how to return that value from the get_NN_2Set and nn2 functions.
int ANNkd_tree::annkFRSearch(
    ANNpoint q, // the query point
    ANNdist sqRad, // squared radius search bound
    int k, // number of near neighbors to return
    ANNidxArray nn_idx, // nearest neighbor indices (returned)
    ANNdistArray dd, // the approximate nearest neighbor
    double eps) // the error bound
{
    ANNkdFRDim = dim; // copy arguments to static equivs
    ANNkdFRQ = q;
    ANNkdFRSqRad = sqRad;
    ANNkdFRPts = pts;
    ANNkdFRPtsVisited = 0; // initialize count of points visited
    ANNkdFRPtsInRange = 0; // ...and points in the range
    ANNkdFRMaxErr = ANN_POW(1.0 + eps);
    ANN_FLOP(2) // increment floating op count
    ANNkdFRPointMK = new ANNmin_k(k); // create set for closest k points
    // search starting at the root
    root->ann_FR_search(annBoxDistance(q, bnd_box_lo, bnd_box_hi, dim));
    for (int i = 0; i < k; i++) { // extract the k-th closest points
        if (dd != NULL)
            dd[i] = ANNkdFRPointMK->ith_smallest_key(i);
        if (nn_idx != NULL)
            nn_idx[i] = ANNkdFRPointMK->ith_smallest_info(i);
    }
    delete ANNkdFRPointMK; // deallocate closest point set
    return ANNkdFRPtsInRange; // return final point count
}
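To pin down which number I mean, here is a brute-force reference that any returned count could be checked against. It is only a sketch of the expected value, not how ANN computes it: the helper names are hypothetical, I am assuming points exactly on the boundary are included, and a query that is itself in the data is counted here (in ANN that would depend on the self-match setting).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance, the quantity that is compared against sqRad.
double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Hypothetical reference helper: number of data points within `radius` of
// `query`, found by a linear scan instead of a kd-tree search.
std::size_t count_in_radius(const std::vector<Point>& data,
                            const Point& query, double radius) {
    const double sq_rad = radius * radius;  // compare squared values, as ANN does
    std::size_t n = 0;
    for (const Point& p : data)
        if (squared_distance(p, query) <= sq_rad)
            ++n;
    return n;
}
```

The scan is O(n) per query, so it is only useful for validating results on small inputs.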
On 11 Dec 2018, at 09:00, Prof Brian Ripley wrote:
This concerns packages
FUNLDA RANN RJafroc SGL distances mixAK projmanr yaImpute
R-devel has switched to C++11 as the default standard, and your packages failed to compile on Solaris: see
https://www.stats.ox.ac.uk/pub/bdr/Solaris-gcc/ for the installation logs. (g++ 5.2 was used: Solaris is not often updated ....)
You were warned about ERR in 'Writing R Extensions' §1.6.4: it looks like 'EBP' gives another header clash.
For 'distances', you are trying to compile in a sub-directory with C++98 yet the src directory with C++11: they cannot portably be mixed. Use one or the other for both.
Please correct ASAP and before Jan 11 to safely retain the package on CRAN.
This was difficult to do, because the metric is defined via preprocessor macros. I still think we could make it work, by including the library twice in two namespaces. This would be a cleaner approach, one package less to maintain.
input with >0 rows but 0 cols can cause R to crash