biovault / hdilib

HDILib is a library for the scalable analysis of large and high-dimensional data.

License: MIT License

hdilib's Introduction

HDILib: High Dimensional Inspector Library

HDILib is a library for the scalable analysis of large and high-dimensional data. It contains scalable manifold-learning algorithms, visualizations and visual-analytics frameworks. HDILib is implemented in C++, OpenGL and JavaScript. It is developed within a joint collaboration between the Computer Graphics & Visualization group at the Delft University of Technology and the Division of Image Processing (LKEB) at the Leiden Medical Center.

Authors

  • Nicola Pezzotti initiated the HDI project, developed the A-tSNE and HSNE algorithms and implemented most of the visualizations and frameworks.
  • Thomas Höllt ported the library to MacOS.

Used

HDI is used in the following projects:

  • Cytosplore: interactive system for understanding how the immune system works
  • Brainscope: web portal for fast, interactive visual exploration of the Allen Atlases of the adult and developing human brain transcriptome
  • DeepEyes: progressive analytics system for designing deep neural networks

Reference

Reference to cite when you use HDI in a research paper:

@inproceedings{Pezzotti2016HSNE,
  title={Hierarchical stochastic neighbor embedding},
  author={Pezzotti, Nicola and H{\"o}llt, Thomas and Lelieveldt, Boudewijn PF and Eisemann, Elmar and Vilanova, Anna},
  journal={Computer Graphics Forum},
  volume={35},
  number={3},
  pages={21--30},
  year={2016}
}
@article{Pezzotti2017AtSNE,
  title={Approximated and user steerable tsne for progressive visual analytics},
  author={Pezzotti, Nicola and Lelieveldt, Boudewijn PF and van der Maaten, Laurens and H{\"o}llt, Thomas and Eisemann, Elmar and Vilanova, Anna},
  journal={IEEE transactions on visualization and computer graphics},
  volume={23},
  number={7},
  pages={1739--1752},
  year={2017}
}

Building

GIT Cloning

With recent git versions you should use the following command:

git clone --recurse-submodules https://github.com/biovault/HDILib.git

Requirements

HDILib depends on flann:

  1. Windows build requires: flann >1.8.5
  2. Ubuntu/Mac builds require: flann 1.8.4

Flann can be built from the source but we recommend vcpkg to install it, especially on Windows.

Installing flann

On Windows with vcpkg

.\vcpkg install flann:x64-windows-static

When configuring cmake, make sure to set up vcpkg by setting CMAKE_TOOLCHAIN_FILE (PATH_TO/vcpkg/scripts/buildsystems/vcpkg.cmake) and VCPKG_TARGET_TRIPLET (x64-windows-static).

On Linux, for example with apt

sudo apt-get -y install libflann-dev liblz4-dev

On macOS with Homebrew

brew install flann lz4

Generate the build files

Windows: This produces an HDILib.sln file for Visual Studio. Open the .sln in Visual Studio and build ALL_BUILD for Release or Debug, matching the CMAKE_BUILD_TYPE.

cmake -S . -B build -G "Visual Studio 16 2019" -A "x64" -DCMAKE_TOOLCHAIN_FILE=.\build\conan_toolchain.cmake -DVCPKG_TARGET_TRIPLET=x64-windows-static -DCMAKE_INSTALL_PREFIX=install

Linux: This produces a Makefile; other generators such as Ninja are also possible. Use the make commands, e.g. make -j 8 && make install, to build and install.

cmake  -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install -DHDILib_ENABLE_PID=ON -G "Unix Makefiles"

# build and install the library, independent of generator
cmake --build build --config Release --target install

macOS: Tested with Xcode 10.3 & apple-clang 10:

cmake  -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install

Using the HDILib

The subdirectory test_package builds an example that links against the HDILib binaries. Check its CMakeLists.txt, which shows how to consume the HDILib CMake package.

Find the package

find_package(HDILib COMPONENTS hdiutils hdidata hdidimensionalityreduction PATHS ${HDILib_ROOT} CONFIG REQUIRED)

Consume the package and dependencies (Windows example)

target_link_libraries(example PRIVATE HDI::hdidimensionalityreduction HDI::hdiutils HDI::hdidata ${CMAKE_DL_LIBS})

Applications

A suite of command line and visualization applications is available in the original High Dimensional Inspector repository.

CI/CD process

Conan is used in the CI/CD process to retrieve a prebuilt flann from the lkeb-artifactory and to upload the completed HDILib to the artifactory. The conanfile uses the cmake tool as builder.

CI/CD note on https

The OpenSSL in the Python libraries does not have a recent list of certificate authorities; in particular, it lacks the authority for the GEANT-issued lkeb-artifactory certificate. It is therefore essential to append the lkeb-artifactory cert.pem to the cert.pem file in the conan home directory for a successful https connection. See the CI scripts for details.

Notes on Conan

CMake compatibility

The conan file uses (starting with conan 1.38) the conan CMakeDeps + CMakeToolchain logic to create a CMake compatible toolchain file. This is a .cmake file containing all the necessary CMake variables for the build.

These variables provided by the toolchain file allow the CMake file to locate the required packages that conan has downloaded.

Build bundle

The conan build creates three versions of the package, Release, Debug and

GitHub Actions status

Currently the following build matrix is performed:

OS                    Architecture  Compiler
Windows               x64           MSVC 2019
Linux (ubuntu-22.04)  x86_64        gcc 11
macOS (12)            x86_64        clang 13

hdilib's Issues

Enable VisualStudio working

Once the exports_sources changes have been made (Issue #5), it may be possible to get Visual Studio integration working.

Weird hnswlib behaviour

HNSW produces unexpected results compared with FLANN and ANNOY. For example, embedding MNIST with the three knn algorithms using the same settings (perplexity=30, iterations=1000) results in:
image

I think the reason is how points are added in parallel when HNSW builds its trees. In their own examples they don't use omp parallel for but a custom parallel for loop.

Using the same approach yields this:
image
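For readers unfamiliar with the pattern, here is a minimal sketch of such a custom parallel for loop, written with std::thread and an atomic counter. The name parallel_for and its signature are hypothetical, it omits the exception forwarding a production version would need, and it only illustrates the loop structure; it does not by itself explain why the results differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: calls fn(id, thread_id) exactly once for every id in
// [start, end). Worker threads claim indices one at a time from a shared
// atomic counter instead of receiving fixed chunks up front.
template <class Function>
void parallel_for(std::size_t start, std::size_t end,
                  std::size_t num_threads, Function fn) {
    assert(start <= end);
    if (num_threads == 0) num_threads = 1;

    std::atomic<std::size_t> next(start);  // next unclaimed index
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            while (true) {
                const std::size_t id = next.fetch_add(1);
                if (id >= end) break;  // all indices have been claimed
                fn(id, t);
            }
        });
    }
    for (auto& w : workers) w.join();  // wait until every index is processed
}
```

A call such as parallel_for(0, num_points, num_threads, add_point), where add_point is whatever callable inserts one point into the index, would then replace the omp parallel for loop.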

Building on Linux

The CMake file that sets up the project uses the variable HDI_EXTERNAL_FLANN_INCLUDE_DIR to reference the FLANN include directory. This variable is set in the automatic CI/CD build but not when following the install instructions given in the README.

The user has to set this path manually:
cmake -DCMAKE_BUILD_TYPE=Release -DHDI_EXTERNAL_FLANN_INCLUDE_DIR=<PATH/TO/FLANN> ..
where <PATH/TO/FLANN> might be /usr/include/

Also, since the CRoaring dependencies were recently removed, install-dependencies.sh does not need to build it anymore, right?

Clarify that the packaged build has shared:False

The HDILib package comprises 3 .lib/.a files (depending on OS), not a shared library. It is created by building with the option shared:False.

The current setup of build.py is confusing in this regard: it looks like the default is shared:True, but this does not link (due to OpenGL functions).

Also update the readme to show this.

Add unit/system/integration testing

In its current state, the HDI library does not have routines for unit/system/integration testing. Having these in place could offer benefits, and be of great value for quality control. We should come up with a road map on how to achieve this.

Change conan to use exports_sources to facilitate development work

Instead of working via scm or source, change the conan setup to exports_sources to work with the current source.

This makes development work easier: when the build sources are obtained through exports_sources, a push is not required to see changes in a conan build. (See, for example, the latest nptsne feature/1.1.0-hsne which was changed to exports_sources)

Int overflow in void HierarchicalSNE<scalar_type, sparse_scalar_matrix_type>::initializeFirstScale() for large data

In void HierarchicalSNE<scalar_type, sparse_scalar_matrix_type>::initializeFirstScale() there is a danger of an integer overflow in the for loop.

int idx = i * nn + n; will overflow for moderately large data (> ~25M points, perplexity 30).

Even unsigned will be rather limited. We should consider moving to 64-bit indices wherever we index into data structures that hold multiple values per data point.
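As an illustration, the flat index can be computed in 64 bits before it is used. This is a minimal sketch, not the library's code: flat_index and fits_in_int32 are hypothetical helpers, and the neighbor count nn = 3 * perplexity + 1 = 91 used in the note below is an assumption based on a common t-SNE convention rather than something confirmed here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: flat index of neighbor n of point i when every point
// stores nn values. All arithmetic is done in 64 bits, so the product i * nn
// cannot wrap for realistic data sizes.
inline std::uint64_t flat_index(std::uint64_t i, std::uint64_t nn, std::uint64_t n) {
    assert(n < nn);
    return i * nn + n;
}

// Hypothetical helper: true if idx can still be represented by a 32-bit int.
inline bool fits_in_int32(std::uint64_t idx) {
    return idx <= static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}
```

Under that assumption, the last index for 25,000,000 points is 24,999,999 * 91 + 90 = 2,274,999,999, which exceeds the 32-bit int maximum of 2,147,483,647. Using std::size_t for such indices would have the same effect on 64-bit platforms.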

knn_utils.h should be moved to dimensionality_reduction project

knn_utils.h which contains enumerated types for KNN algorithms and KNN algorithm distance measures is currently in the utils project. However, the utils project has no information (e.g. defines) regarding supported KNN algorithms. This is currently all handled in the dimensionality_reduction project.

Moving knn_utils.h from the utils project to the dimensionality reduction project would make it a lot easier to add functions which return which KNN algorithm is supported and which distance measure is supported for which KNN algorithm.

It would be possible to add these functions in the dimensionality_reduction project but this leads to information regarding KNN algorithms to be split over several projects.
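A rough sketch of what such query functions could look like once the enumerations and the build-time defines live in the same project. Every name here is hypothetical (the enumerators, the EXAMPLE_HAS_* macros and both functions), and the metric lists are placeholders for illustration, not a statement of what each library supports.

```cpp
#include <vector>

// Hypothetical enumerations; not the actual contents of knn_utils.h.
enum class KnnLibrary { Flann, Hnsw, Annoy };
enum class KnnMetric { Euclidean, Cosine, InnerProduct };

// Returns the KNN libraries compiled into this build. The EXAMPLE_HAS_*
// macros stand in for whatever defines the dimensionality_reduction
// project actually uses.
inline std::vector<KnnLibrary> supported_knn_libraries() {
    std::vector<KnnLibrary> libs;
#ifdef EXAMPLE_HAS_FLANN
    libs.push_back(KnnLibrary::Flann);
#endif
#ifdef EXAMPLE_HAS_HNSW
    libs.push_back(KnnLibrary::Hnsw);
#endif
#ifdef EXAMPLE_HAS_ANNOY
    libs.push_back(KnnLibrary::Annoy);
#endif
    return libs;
}

// Returns the distance measures offered for one library.
// The lists are illustrative placeholders only.
inline std::vector<KnnMetric> supported_metrics(KnnLibrary lib) {
    switch (lib) {
        case KnnLibrary::Flann: return {KnnMetric::Euclidean};
        case KnnLibrary::Hnsw:  return {KnnMetric::Euclidean, KnnMetric::InnerProduct};
        case KnnLibrary::Annoy: return {KnnMetric::Euclidean, KnnMetric::Cosine};
    }
    return {};
}
```

Keeping both functions next to the defines means a caller can populate a selection menu from supported_knn_libraries() and never offer an algorithm that was not built.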

CMake Error HDI_EXTERNAL_FLANN_INCLUDE_DIR-NOTFOUND on initial CMake command Conan build

Issue encountered on branch feature/conan-lite commit 2b736dc

The first time I run cmake for a build with conan, I get an error.

D:\HDILib\_build_debug>cmake .. -G "Visual Studio 16 2019" -DCMAKE_BUILD_TYPE=Debug -DHDILIB_BUILD_WITH_CONAN=ON

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
HDI_EXTERNAL_FLANN_INCLUDE_DIR
used as include directory in directory D:/HDILib/hdi/dimensionality_reduction
used as include directory in directory D:/HDILib/hdi/dimensionality_reduction

Apparently commit b196656 did not fix the entire issue.

When I run the same cmake command again, on the same build directory, the error is gone, fortunately!
