
🐍 🔬 Fast Python implementation of various Kullback-Leibler divergences for 1D and 2D parametric distributions. Also provides optimized code for kl-UCB indexes

Home Page: https://naereen.github.io/Kullback-Leibler-divergences-and-kl-UCB-indexes/docs/index.html

License: MIT License



Python implementation of Kullback-Leibler divergences and kl-UCB indexes

This repository contains a small, simple, and efficient module implementing various Kullback-Leibler divergences for parametric 1D or 2D distributions, as well as optimized computations of kl-UCB indexes.

Different versions

The same module, with the same functions and the same specification, is available in several forms:

  1. A naive pure Python implementation, valid for both Python 2 and Python 3, see kullback_leibler.py,
  2. A pure Python implementation using numba for automatic speed-up, see kullback_leibler_numba.py,
  3. A Cython version, using code very close to Python and the Cython compiler to automatically generate a C version and compile it as a .so or .dll dynamically linked library that can be imported as a module from Python, see kullback_leibler_cython.pyx,
  4. A C version, using the C API of CPython, which produces the same thing as the Cython version but is harder to read and maintain, see kullback_leibler_c.c.

There is also a Julia version, on this repository.


Examples

Simple usage

If the kullback_leibler.py file is accessible in your PATH or in Python's path:

>>> import kullback_leibler
>>> p = 0.5; q = 0.01
>>> kullback_leibler.klBern(p, q)
1.614463...
>>> kullback_leibler.klGauss(q, p)
0.480199...
>>> kullback_leibler.klPoisson(q, p)
0.450879...
>>> kullback_leibler.klExp(q, p)
2.932023...
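For reference, klBern is the standard binary (Bernoulli) KL divergence, and the first value above can be checked by hand. The function below is an illustrative pure-Python re-implementation of the same formula, not the module's own code:

```python
import math

def klBern(p, q):
    """kl(B(p) || B(q)) = p * ln(p/q) + (1-p) * ln((1-p)/(1-q))."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

print(round(klBern(0.5, 0.01), 6))  # -> 1.614463, matching the doctest above
```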

Vectorized version?

None of these functions are vectorized: each one expects scalar arguments. If you want a vectorized function, use the numpy.vectorize wrapper:

>>> import numpy as np
>>> klBern_vect = np.vectorize(kullback_leibler.klBern)
>>> klBern_vect([0.1, 0.5, 0.9], 0.2)
array([0.036..., 0.223..., 1.145...])
>>> klBern_vect(0.4, [0.2, 0.3, 0.4])
array([0.104..., 0.022..., 0...])
>>> klBern_vect([0.1, 0.5, 0.9], [0.2, 0.3, 0.4])
array([0.036..., 0.087..., 0.550...])

For some functions, you are better off writing the vectorized version manually, for instance if you want to fix the value of an optional parameter:

>>> # WARNING using np.vectorize gave weird result on klGauss
>>> # klGauss_vect = np.vectorize(klGauss, excluded="y")
>>> def klGauss_vect(xs, y, sig2x=0.25):  # vectorized for first input only
...    return np.array([klGauss(x, y, sig2x) for x in xs])
>>> klGauss_vect([-1, 0, 1], 0.1)
array([2.42, 0.02, 1.62])

Documentation

See this file.

With the C extension

If the kullback_leibler_c.so or kullback_leibler_cython.so file is accessible in your PATH or in Python's path, it can be imported and used exactly like the pure-Python module.
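For instance (a hedged sketch: the compiled modules are assumed, as stated above, to expose the same functions and specification as the pure-Python module; the try/except simply guards against a missing build):

```python
try:
    import kullback_leibler_cython as kl  # or: import kullback_leibler_c as kl
except ImportError:
    kl = None  # extension not built, or not on sys.path

if kl is not None:
    # Same call and same result as the pure-Python kullback_leibler.klBern
    print(kl.klBern(0.5, 0.01))  # ~ 1.614463
```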

Small benchmark

Let's quickly compare the four implementations.

First, in an ipython console, import all of them:

$ ipython
...
>>> import kullback_leibler as kl
>>> import kullback_leibler_numba as kl_n
>>> import pyximport; _ = pyximport.install()
>>> import kullback_leibler_cython as kl_cy
>>> import kullback_leibler_c as kl_c
>>> import numpy as np; r = np.random.rand

Then let's compare a single computation of a KL divergence, for instance of two Bernoulli distributions:

>>> %timeit (r(), r())   # don't neglect this "constant"!
728 ns ± 34.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %timeit kl.klBern(r(), r())
2.42 µs ± 109 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit kl_n.klBern(r(), r())
1.26 µs ± 143 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> (2420 - 728) / (1260 - 728)  # compute speed-up factor
3.18...

>>> %timeit kl_cy.klBern(r(), r())
933 ns ± 48.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> (2420 - 728) / (933 - 728)  # compute speed-up factor
8.25

>>> %timeit kl_c.klBern(r(), r())
1.09 µs ± 127 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> (2420 - 728) / (1090 - 728)  # compute speed-up factor
4.67

This shows that the Numba version is about 3 times faster than the naive Python version, the C version is about 5 times faster, and the Cython version is the fastest, with a speed-up of about 8.

And for kl-UCB indexes, for instance:

>>> %timeit (r(), r())   # don't neglect this "constant"!
743 ns ± 37.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %timeit kl.klucbBern(r(), r())
28.9 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit kl_n.klucbBern(r(), r())
75.8 µs ± 1.44 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> (28900 - 743) / (75800 - 743)
0.375...

>>> %timeit kl_cy.klucbBern(r(), r())
3.65 µs ± 42.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> (28900 - 743) / (3650 - 743)
9.68...

>>> %timeit kl_c.klucbBern(r(), r(), 1e-6)  # needs precision
2.23 µs ± 21.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> (28900 - 743) / (2230 - 743)
18.93...

This shows that for kl-UCB indexes the Numba version is about 3 times slower than the naive Python version, the Cython version is about 10 times faster, and the C version is the fastest, with a speed-up of about 20.
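For context, the kl-UCB index klucbBern(x, d) is the largest q ≥ x with klBern(x, q) ≤ d. The sketch below uses plain bisection to show the quantity being computed; the names are illustrative and the repository's implementation may use a different root-finding scheme:

```python
import math

def klBern(x, y, eps=1e-15):
    """Bernoulli KL divergence kl(x, y), with inputs clipped away from 0 and 1."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

def klucb_bern(x, d, precision=1e-6):
    """Largest q in [x, 1] such that klBern(x, q) <= d, found by bisection."""
    lo, hi = x, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2.0
        if klBern(x, mid) <= d:
            lo = mid  # mid is still feasible, search higher
        else:
            hi = mid  # mid violates the constraint, search lower
    return (lo + hi) / 2.0

q = klucb_bern(0.5, 0.2)
print(q)  # the kl-UCB index: an upper confidence bound strictly above x = 0.5
```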

See this notebook on nbviewer, which also compares with the Julia version.


Install and build

Manually?

Easy! If you don't care about speed, just use the pure Python version.

Otherwise, you will have to clone this repository, go in the folder, compile, test, and if it works, install it.

cd /tmp/
git clone https://GitHub.com/Naereen/Kullback-Leibler-divergences-and-kl-UCB-indexes
cd Kullback-Leibler-divergences-and-kl-UCB-indexes/src/
make build
make test     # should pass
make install  # mv the build/lib*/*.so files where you need them

Be sure to include the dynamic library when you need it, or in a folder accessible by your Python interpreter (somewhere in sys.path).

  • Cython version: the file is kullback_leibler_cython.so (for Python 2) or the kullback_leibler_cython.cpython-35m-x86_64-linux-gnu.so (for Python 3.5, or higher, adapt the name).
  • C version: the file is kullback_leibler_c.so (for Python 2) or the kullback_leibler_c.cpython-35m-x86_64-linux-gnu.so (for Python 3.5, or higher, adapt the name).
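One way to make the compiled library visible is to extend sys.path from Python itself. The build directory name below is a hypothetical example; adapt it to your platform and Python version:

```python
import sys

# Hypothetical path: adapt to where `make build` placed the compiled .so files.
build_dir = "./build/lib.linux-x86_64-3.5"
if build_dir not in sys.path:
    sys.path.insert(0, build_dir)

# The compiled modules can now be imported, if the .so files are present there:
# import kullback_leibler_cython
# import kullback_leibler_c
```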

With pip?

This project is hosted on the PyPI package repository.

sudo pip install kullback_leibler
# test it
python -c "from kullback_leibler import klBern; print(round(klBern(0.1,0.5), 4) == 0.3681)"  # test



Julia implementation?

I was curious and wanted to write the same algorithm in Julia. Here it is: KullbackLeibler.jl.

The Julia package is published here: Naereen/KullbackLeibler.jl, and see here for its documentation.


About

Languages?

Python v2.7+ or Python v3.4+.

  • Numba can be used to speed up the pure Python version (in kullback_leibler_numba.py). It is entirely optional, and the speed-up it gives is modest (see the notebook for the complete benchmark).
  • Cython is needed to build the faster C extension (in kullback_leibler_cython.pyx).
  • For both the Cython and the C versions, a working version of gcc is required (probably version >= 6.0).

📜 License

MIT Licensed (file LICENSE). © Lilian Besson, 2018.



