The tandem approach in clustering combines dimensionality reduction, which reduces the complexity of the data while retaining relevant information, with clustering, which groups similar data points.
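To make the two stages concrete, here is a minimal sketch of a tandem pipeline. It is an illustration under stated assumptions, not the code shipped in this repository: it needs only NumPy, PCA (via SVD) stands in for the reduction step (the experiments described further down use UMAP), and the clustering step is a small hand-written spherical k-means with k-means++-style seeding. The function names `pca_reduce`, `l2_normalize` and `spherical_kmeans` are hypothetical helpers introduced for this example.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Stage 1 (assumed stand-in for UMAP): project X onto its top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def l2_normalize(X):
    """Scale each row to unit length (rows with zero norm are left at zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)


def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Stage 2: cluster rows of X by cosine similarity. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    X = l2_normalize(X)

    # k-means++-style seeding. For unit vectors, the squared Euclidean distance
    # equals 2 * (1 - cosine similarity), so sampling proportionally to
    # (1 - cosine) matches the usual D^2 weighting.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        sims = X @ np.array(centroids).T
        dist = np.clip(1.0 - sims.max(axis=1), 0.0, None)
        total = dist.sum()
        probs = dist / total if total > 0 else np.full(len(X), 1.0 / len(X))
        centroids.append(X[rng.choice(len(X), p=probs)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign each point to the centroid with the highest cosine similarity.
        labels = np.argmax(X @ centroids.T, axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                updated[j] = members.sum(axis=0)
        # Re-project centroids onto the unit sphere.
        updated = l2_normalize(updated)
        if np.allclose(updated, centroids):
            break
        centroids = updated

    labels = np.argmax(X @ centroids.T, axis=1)
    return labels, centroids


if __name__ == "__main__":
    # Synthetic data in place of real text embeddings (assumption for the demo).
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(3, 50)) * 5.0
    truth = np.repeat(np.arange(3), 40)
    X = centers[truth] + rng.normal(scale=0.5, size=(120, 50))

    Z = pca_reduce(X, n_components=5)      # dimensionality reduction
    labels, _ = spherical_kmeans(Z, k=3)   # clustering in the reduced space
    print(np.bincount(labels))
```

Swapping `pca_reduce` for a UMAP transform would change only the first stage; the clustering stage sees a lower-dimensional matrix either way, which is the point of the tandem design.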
- Create virtual environment
We use venv here, but feel free to use any other available tool.
Note: based on what is publicly available, Python users might need the spherecluster package. The package is currently being updated. Nevertheless, you can follow the instructions below to see the program running.
$ python -m venv tdmenv
$ source tdmenv/bin/activate
$ pip install -r requirements.txt
Let's check the results in the table below for biomedical, stackoverflow and searchsnippets (top to bottom, respectively). We apply UMAP dimensionality reduction to HuggingFace embeddings, followed by a clustering algorithm such as skmeans++ or spherical-kmeans++. Important results are highlighted.