dstein64 / kmeans1d

A Python package for optimal 1D k-means clustering.

Home Page: https://pypi.org/project/kmeans1d/

License: MIT License

Python 34.22% C++ 65.78%
kmeans dynamic-programming optimization

kmeans1d

A Python library with an implementation of k-means clustering on 1D data, based on the algorithm from Wu (1991) [1], as presented by Gronlund et al. (2017, Section 2.2) [2].

Globally optimal k-means clustering is NP-hard for multi-dimensional data. Lloyd's algorithm is a popular approach for finding a locally optimal solution. For 1-dimensional data, however, there are polynomial-time algorithms that find the exact optimum. The algorithm implemented here is an O(kn + n log n) dynamic programming algorithm for finding the globally optimal k clusters for n 1D data points.
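As a rough illustration of why 1D data admits an exact solution: once the points are sorted, every optimal cluster is a contiguous run, so a dynamic program over split positions suffices. The sketch below is a simpler O(k n^2) reference version, not the faster matrix-searching method from the cited papers and not the package's C++ code. The function name kmeans_1d_reference is hypothetical.

```python
def kmeans_1d_reference(points, k):
    """Exact 1D k-means by plain dynamic programming, O(k * n^2) time.

    Illustrative only: a hypothetical helper, not part of kmeans1d.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(points)")
    # Sort once; optimal 1D clusters are contiguous in sorted order.
    order = sorted(range(n), key=lambda i: points[i])
    xs = [float(points[i]) for i in order]

    # Prefix sums of values and squares give O(1) interval costs.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        """Sum of squared distances of xs[i:j] to their mean (i < j)."""
        total = s1[j] - s1[i]
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    inf = float("inf")
    # best[m][j]: minimal cost of splitting xs[:j] into m clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    split[m][j] = i

    # Walk the stored split points backwards to recover the clustering.
    labels = [0] * n
    centroids = [0.0] * k
    j = n
    for m in range(k, 0, -1):
        i = split[m][j]
        centroids[m - 1] = (s1[j] - s1[i]) / (j - i)
        for pos in range(i, j):
            labels[order[pos]] = m - 1
        j = i
    return labels, centroids
```

The n log n term in the stated complexity corresponds to the initial sort; the faster algorithm replaces the innermost loop with a more efficient search.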

The code is written in C++, and wrapped with Python.

Requirements

kmeans1d supports Python 3.x.

Installation

kmeans1d is available on PyPI, the Python Package Index.

$ pip3 install kmeans1d

Example Usage

import kmeans1d

x = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 80, 100, 102]
k = 4

clusters, centroids = kmeans1d.cluster(x, k)

print(clusters)   # [1, 1, 1, 0, 3, 3, 3, 2, 2, 2]
print(centroids)  # [-50.0, 4.1, 94.0, 200.5]
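
To make the two return values concrete: clusters holds one cluster id per input point, in input order, and each centroid is the mean of the points sharing that id. The following standard-library sketch recomputes the centroids from the labels shown above; centroids_from_labels is a hypothetical helper, not a kmeans1d function.

```python
from collections import defaultdict


def centroids_from_labels(points, labels):
    """Return the mean of each cluster, ordered by cluster id (hypothetical helper)."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return [sum(groups[c]) / len(groups[c]) for c in sorted(groups)]


x = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 80, 100, 102]
clusters = [1, 1, 1, 0, 3, 3, 3, 2, 2, 2]  # labels from the example above
print(centroids_from_labels(x, clusters))
```

Up to floating-point rounding, the printed means agree with the centroids in the example.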

Tests

Tests are in tests/.

# Run tests
$ python3 -m unittest discover tests -v

Development

The underlying C++ code can be built in-place, outside the context of pip. This requires Python development tools for building Python modules (e.g., the python3-dev package on Ubuntu). gcc, clang, and MSVC have been tested.

$ python3 setup.py build_ext --inplace

The packages GitHub action can be manually triggered (Actions > packages > Run workflow) to build wheels and a source distribution.

License

The code in this repository has an MIT License.

See LICENSE.

References

[1] Wu, Xiaolin. "Optimal Quantization by Matrix Searching." Journal of Algorithms 12, no. 4 (December 1, 1991): 663

[2] Gronlund, Allan, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, and Mingzhou Song. "Fast Exact K-Means, k-Medians and Bregman Divergence Clustering in 1D." ArXiv:1701.07204 [Cs], January 25, 2017. http://arxiv.org/abs/1701.07204.

kmeans1d's Issues

Using in C++

What should I do if I need to use this k-means implementation from C++ code instead of Python?

How to save the trained model to predict cluster ids of the test set?

Let's say we have a one-dimensional training set feature with 10,000 instances.

We also have a one-dimensional test set feature with 2,000 instances.

How can we save the trained model so that we can load it later and produce cluster ids for the test set?

import random
import kmeans1d


def generate_items(n_items: int) -> list:
    items = list()
    for i in range(0, n_items):
        n = random.randint(1, 1000)
        items.append(n)
    return items


training_feature = generate_items(10000)

k = 10

clusters, centroids = kmeans1d.cluster(training_feature, k)

# save the trained model

# load the trained model

test_feature = generate_items(2000)

# produce cluster ids for test_feature
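
One possible way to fill in the placeholders above, sketched with the standard library only: the centroids returned by kmeans1d.cluster are the whole "model", so they can be written to a JSON file and later used to assign each new point to its nearest centroid. The helper names (save_centroids, load_centroids, assign_clusters) are hypothetical and not part of kmeans1d, and the sketch assumes the centroids are in ascending order, as they are in the README example.

```python
import bisect
import json


def save_centroids(centroids, path):
    """Write the centroids (the whole 'model') to a JSON file."""
    with open(path, "w") as f:
        json.dump(list(centroids), f)


def load_centroids(path):
    """Read centroids previously written by save_centroids."""
    with open(path) as f:
        return json.load(f)


def assign_clusters(centroids, points):
    """Assign each point the index of its nearest centroid.

    Assumes `centroids` is sorted ascending. The midpoints between
    neighbouring centroids act as decision boundaries; a point lying
    exactly on a boundary goes to the lower cluster.
    """
    boundaries = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
    return [bisect.bisect_left(boundaries, p) for p in points]
```

With the script above, this would be used as save_centroids(centroids, "centroids.json") after training, then test_clusters = assign_clusters(load_centroids("centroids.json"), test_feature); the file name is only an example.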

AttributeError: function 'cluster' not found

(Error)

Running the example usage from the kmeans1d README raises the error shown in the title.


(Environment)

kmeans1d version used = 0.3.0
python version = 3.7.4 on Anaconda
OS = Windows 10 Version 2004 OS Build 20180.1000

build fails with Anaconda environment

Hi there,

I got the following error when building:

running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-3.7
creating build/lib.macosx-10.7-x86_64-3.7/kmeans1d
copying kmeans1d/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/kmeans1d
copying kmeans1d/core.py -> build/lib.macosx-10.7-x86_64-3.7/kmeans1d
copying kmeans1d/version.txt -> build/lib.macosx-10.7-x86_64-3.7/kmeans1d
running build_ext
building 'kmeans1d._core' extension
creating build/temp.macosx-10.7-x86_64-3.7
creating build/temp.macosx-10.7-x86_64-3.7/kmeans1d
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/shingo/anaconda3/include -arch x86_64 -I/Users/shingo/anaconda3/include -arch x86_64 -I/Users/shingo/anaconda3/include/python3.7m -c kmeans1d/_core.cpp -o build/temp.macosx-10.7-x86_64-3.7/kmeans1d/_core.o -std=c++11
warning: include path for stdlibc++ headers not found; pass '-stdlib=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
kmeans1d/_core.cpp:3:10: fatal error: 'algorithm' file not found
#include <algorithm>
         ^~~~~~~~~~~
1 warning and 1 error generated.
error: command 'gcc' failed with exit status 1

The error occurs only with Anaconda Python. I get no error when I use the Python distributed by the PSF.

I found a report of a similar error along with suggested solutions. Following it, I modified setup.py as follows.

extension = Extension('kmeans1d._core', ['kmeans1d/_core.cpp'],
                      include_dirs=['/Library/Developer/CommandLineTools/usr/include/c++/v1'],
                      extra_link_args=['-stdlib=libc++'],
                      extra_compile_args=['-std=c++11'])

With this change, I successfully built the package with Anaconda Python.

I'm sending this just for your information.
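
The setup.py change above hard-codes a macOS include path, which would not exist on other systems. A more portable variant might add the libc++ flag only on macOS, following the hint in the compiler warning ("pass '-stdlib=libc++' on the command line"). This is an untested sketch: the helper name extension_kwargs is hypothetical, and whether the flag alone resolves the missing header on a given Anaconda setup is an assumption.

```python
import sys


def extension_kwargs(platform=sys.platform):
    """Build keyword arguments for the Extension, adding libc++ flags on macOS.

    Hypothetical helper; the flag choice follows the clang warning in the log.
    """
    kwargs = {"extra_compile_args": ["-std=c++11"]}
    if platform == "darwin":
        # Ask clang to use libc++ when compiling and when linking.
        kwargs["extra_compile_args"].append("-stdlib=libc++")
        kwargs["extra_link_args"] = ["-stdlib=libc++"]
    return kwargs
```

In setup.py this would be passed as Extension('kmeans1d._core', ['kmeans1d/_core.cpp'], **extension_kwargs()).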

Does not install with pip for Python 3.11

Currently, kmeans1d does not install for Python 3.11 using pip.

It fails to build the C++ library with clang. This could simply be a problem with my clang setup, but I think that for Python 3.10 it didn't need to compile anything at all.
