jean-pierreboth / annembed
data embedding based on approximate nearest neighbour
License: Other
Hello, I use the UMAP Python library and was wondering if y'all have done any benchmarks to see which is faster. Also, it would be cool to know whether both implementations produce the exact same results, all else being equal. I use UMAP for work, so if the Rust version of UMAP is significantly faster, I can help make Python bindings for annembed.
Please create a command-line interface that accepts a CSV file as a source embedding file, applies the UMAP or HDBSCAN algorithm to it, and writes the results to disk, so that people who don't know Rust can also use your amazing project.
thanks
Hello!
Thanks for making this crate and putting in the effort of emulating UMAP in Rust. I look forward to using this crate in my work; however, I would like to use annembed with a mutual K-NN graph. To clarify, a mutual K-NN graph is a NN graph where an edge is only formed between two nodes when each node is within the k nearest neighbours of the other.
I just want to double check that I'm handling the indexing used by KGraph correctly. Here is the function I have currently:
use annembed::fromhnsw::kgraph::KGraph;
use anyhow::Result;
use log::debug;
use num_traits::{FromPrimitive, Float};
use rayon::prelude::*;

/// Take a k-nearest neighbour graph and return a mutual k-nearest neighbour graph
/// A mutual k-nearest neighbour graph is a nearest neighbour graph where edges are only kept if they are mutual
/// i.e. if node A is a nearest neighbour of node B, and node B is a nearest neighbour of node A
pub fn mutual_knn<F: FromPrimitive + Float + std::fmt::UpperExp + Sync + Send + std::iter::Sum>(mut knn_graph: KGraph<F>) -> Result<KGraph<F>> {
    let mutual_nodes = knn_graph.get_neighbours()
        .into_par_iter()
        .enumerate()
        .map(|(node, neighbours)| {
            let mut mutual_neighbours = Vec::new();
            for neighbour in neighbours {
                if knn_graph.get_neighbours()[neighbour.node].iter().any(|edge| edge.node == node) {
                    mutual_neighbours.push(neighbour.clone());
                } else {
                    debug!("Node {} is a neighbour of node {}, but node {} is not a neighbour of node {}", neighbour.node, node, node, neighbour.node)
                }
            }
            Ok(mutual_neighbours)
        })
        .collect::<Result<Vec<_>>>()?;
    knn_graph.neighbours = mutual_nodes;
    Ok(knn_graph)
}
My main confusion is: does the index of a node in the neighbours vec of the KGraph correspond to the node index stored within the OutEdge object? Or do I have to retrieve the index using the built-in functions of KGraph? Or do these functions, namely get_data_id_from_idx, only apply when you are trying to convert the KGraph index back to the index used in the original data?
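To make the indexing assumption concrete, here is a minimal, self-contained sketch of the same filter on plain adjacency lists. It does not use annembed at all: the function name mutual_knn_indices and the toy graph are hypothetical, and it simply assumes that position i in the outer vec holds the neighbours of node i and that each stored value is itself such a position. Whether KGraph follows that convention is exactly what I am asking.

```rust
use std::collections::HashSet;

/// Keep only edges (i -> j) for which the reverse edge (j -> i) also exists.
/// Assumption (hypothetical, not taken from annembed): `neighbours[i]` lists the
/// neighbours of node `i`, and every stored value is a position in `neighbours`.
fn mutual_knn_indices(neighbours: &[Vec<usize>]) -> Vec<Vec<usize>> {
    // One hash set per node, so the reverse-edge test is O(1) on average
    // instead of a linear scan of the neighbour list.
    let sets: Vec<HashSet<usize>> = neighbours
        .iter()
        .map(|list| list.iter().copied().collect())
        .collect();

    neighbours
        .iter()
        .enumerate()
        .map(|(node, list)| {
            list.iter()
                .copied()
                // Out-of-range indices are treated as non-mutual rather than panicking.
                .filter(|&other| sets.get(other).map_or(false, |s| s.contains(&node)))
                .collect()
        })
        .collect()
}

fn main() {
    // Toy 2-NN graph over 4 nodes: node 0 points at 1 and 2, and so on.
    let knn = vec![vec![1, 2], vec![0, 2], vec![1, 3], vec![2, 0]];
    let mutual = mutual_knn_indices(&knn);
    for (node, list) in mutual.iter().enumerate() {
        println!("node {node}: {list:?}");
    }
}
```

In the toy graph, the edge 0 -> 2 is dropped because node 2 does not list node 0, while 0 -> 1 survives because node 1 lists node 0. If the values stored in the edges were original data ids rather than positions, the lookup by position would silently test the wrong node, which is why the convention matters.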
Sorry for the bombardment of questions, hopefully I've made myself clear enough!
Cheers,
Rhys