
POT : Python Optimal Transport

License: MIT License


POT: Python Optimal Transport

This open source Python library provides several solvers for optimization problems related to Optimal Transport for signal, image processing and machine learning.

Website and documentation: https://PythonOT.github.io/

Source Code (MIT): https://github.com/PythonOT/POT

POT provides the following generic OT solvers (links to examples):

OT Network Simplex solver for the linear program/Earth Movers Distance [1].
Conditional gradient [6] and Generalized conditional gradient for regularized OT [7].
Entropic regularization OT solver with Sinkhorn-Knopp algorithm [2], stabilized version [9] [10] [34], lazy CPU/GPU solver from geomloss [60] [61], greedy Sinkhorn [22] and Screening Sinkhorn [26].
Bregman projections for Wasserstein barycenter [3], convolutional barycenter [21] and unmixing [4].
Sinkhorn divergence [23] and entropic regularization OT from empirical data.
Debiased Sinkhorn barycenters (Sinkhorn divergence barycenter) [37].
Smooth optimal transport solvers (dual and semi-dual) for KL and squared L2 regularizations [17].
Weak OT solver between empirical distributions [39]
Non regularized Wasserstein barycenters [16] with LP solver (only small scale).
Gromov-Wasserstein distances and GW barycenters (exact [13] and regularized [12,51]), differentiable using gradients from Graph Dictionary Learning [38]
Fused-Gromov-Wasserstein distances solver and FGW barycenters (exact [24] and regularized [12,51]).
Stochastic solver and differentiable losses for Large-scale Optimal Transport (semi-dual problem [18] and dual problem [19])
Sampled solver of Gromov Wasserstein for large-scale problem with any loss functions [33]
Non regularized free support Wasserstein barycenters [20].
One dimensional Unbalanced OT with KL relaxation and barycenter [10, 25]. Also exact unbalanced OT with KL and quadratic regularization and the regularization path of UOT [41]
Partial Wasserstein and Gromov-Wasserstein (exact [29] and entropic [3] formulations).
Sliced Wasserstein [31, 32] and Max-sliced Wasserstein [35] that can be used for gradient flows [36].
Wasserstein distance on the circle [44, 45]
Spherical Sliced Wasserstein [46]
Graph Dictionary Learning solvers [38].
Semi-relaxed (Fused) Gromov-Wasserstein divergences with corresponding barycenter solvers (exact and regularized [48]).
Quantized (Fused) Gromov-Wasserstein distances [68].
Efficient Discrete Multi Marginal Optimal Transport Regularization [55].
Several backends for easy use of POT with Pytorch/jax/Numpy/Cupy/Tensorflow arrays.
Smooth Strongly Convex Nearest Brenier Potentials [58], with an extension to bounding potentials using [59].
Gaussian Mixture Model OT [69]

POT provides the following Machine Learning related solvers:

Optimal transport for domain adaptation with group lasso regularization, Laplacian regularization [5] [30] and semi supervised setting.
Linear OT mapping [14] and Joint OT mapping estimation [8].
Wasserstein Discriminant Analysis [11] (requires autograd + pymanopt).
JCPOT algorithm for multi-source domain adaptation with target shift [27].
Graph Neural Network OT layers TFGW [53] and TW (OT-GNN) [54].

Some other examples are available in the documentation.

Using and citing the toolbox

If you use this toolbox in your research and find it useful, please cite POT using the following reference from our JMLR paper:

Rémi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, Léo Gautheron, Nathalie T.H. Gayraud, Hicham Janati, Alain Rakotomamonjy, Ievgen Redko, Antoine Rolet, Antony Schutz, Vivien Seguy, Danica J. Sutherland, Romain Tavenard, Alexander Tong, Titouan Vayer,
POT Python Optimal Transport library,
Journal of Machine Learning Research, 22(78):1−8, 2021.
Website: https://pythonot.github.io/

In Bibtex format:

@article{flamary2021pot,
  author  = {R{\'e}mi Flamary and Nicolas Courty and Alexandre Gramfort and Mokhtar Z. Alaya and Aur{\'e}lie Boisbunon and Stanislas Chambon and Laetitia Chapel and Adrien Corenflos and Kilian Fatras and Nemo Fournier and L{\'e}o Gautheron and Nathalie T.H. Gayraud and Hicham Janati and Alain Rakotomamonjy and Ievgen Redko and Antoine Rolet and Antony Schutz and Vivien Seguy and Danica J. Sutherland and Romain Tavenard and Alexander Tong and Titouan Vayer},
  title   = {POT: Python Optimal Transport},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {78},
  pages   = {1-8},
  url     = {http://jmlr.org/papers/v22/20-451.html}
}

Installation

The library has been tested on Linux, MacOSX and Windows. It requires a C++ compiler for building/installing the EMD solver and relies on the following Python modules:

Numpy (>=1.16)
Scipy (>=1.0)
Cython (>=0.23) (build only, not necessary when installing from pip or conda)

Pip installation

You can install the toolbox through PyPI with:

pip install POT

or get the very latest version by running:

pip install -U https://github.com/PythonOT/POT/archive/master.zip # with --user for user install (no root)

Optional dependencies may be installed with

pip install POT[all]

Note that this installs cvxopt, which is licensed under GPL 3.0. Alternatively, if you cannot use GPL-licensed software, the specific optional dependencies may be installed individually, or per-submodule. The available optional installations are backend-jax, backend-tf, backend-torch, cvxopt, dr, gnn, all.

Anaconda installation with conda-forge

If you use the Anaconda python distribution, POT is available in conda-forge. To install it and the required dependencies:

conda install -c conda-forge pot

Post installation check

After a correct installation, you should be able to import the module without errors:

import ot

Note that for easier access the module is named ot instead of pot.

Dependencies

Some sub-modules require additional dependencies which are discussed below

ot.dr (Wasserstein dimensionality reduction) depends on autograd and pymanopt that can be installed with:

pip install pymanopt autograd

Examples

Short examples

Import the toolbox

import ot

Compute Wasserstein distances

# a,b are 1D histograms (sum to 1 and positive)
# M is the ground cost matrix
Wd = ot.emd2(a, b, M) # exact linear program
Wd_reg = ot.sinkhorn2(a, b, M, reg) # entropic regularized OT
# if b is a matrix compute all distances to a and return a vector

Compute OT matrix

# a,b are 1D histograms (sum to 1 and positive)
# M is the ground cost matrix
T = ot.emd(a, b, M) # exact linear program
T_reg = ot.sinkhorn(a, b, M, reg) # entropic regularized OT

Compute Wasserstein barycenter

# A is a n*d matrix containing d  1D histograms
# M is the ground cost matrix
ba = ot.barycenter(A, M, reg) # reg is regularization parameter

Examples and Notebooks

The examples folder contains several examples and use cases for the library. The full documentation with examples and output is available on https://PythonOT.github.io/.

Acknowledgements

This toolbox has been created by

It is currently maintained by

The numerous contributors to this library are listed here.

POT has benefited from the financing or manpower from the following partners:

Contributions and code of conduct

Every contribution is welcome and should respect the contribution guidelines. Each member of the project is expected to follow the code of conduct.

Support

You can ask questions and join the development discussion:

On the POT slack channel
On the POT gitter channel
On the POT mailing list

You can also post bug reports and feature requests in Github issues. Make sure to read our guidelines first.

References

[1] Bonneel, N., Van De Panne, M., Paris, S., & Heidrich, W. (2011, December). Displacement interpolation using Lagrangian mass transport. In ACM Transactions on Graphics (TOG) (Vol. 30, No. 6, p. 158). ACM.

[2] Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems (pp. 2292-2300).

[3] Benamou, J. D., Carlier, G., Cuturi, M., Nenna, L., & Peyré, G. (2015). Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2), A1111-A1138.

[4] S. Nakhostin, N. Courty, R. Flamary, D. Tuia, T. Corpetti, Supervised planetary unmixing with optimal transport, Workshop on Hyperspectral Image and Signal Processing : Evolution in Remote Sensing (WHISPERS), 2016.

[5] N. Courty; R. Flamary; D. Tuia; A. Rakotomamonjy, Optimal Transport for Domain Adaptation, in IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.PP, no.99, pp.1-1

[6] Ferradans, S., Papadakis, N., Peyré, G., & Aujol, J. F. (2014). Regularized discrete optimal transport. SIAM Journal on Imaging Sciences, 7(3), 1853-1882.

[7] Rakotomamonjy, A., Flamary, R., & Courty, N. (2015). Generalized conditional gradient: analysis of convergence and applications. arXiv preprint arXiv:1510.06567.

[8] M. Perrot, N. Courty, R. Flamary, A. Habrard (2016), Mapping estimation for discrete optimal transport, Neural Information Processing Systems (NIPS).

[9] Schmitzer, B. (2016). Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems. arXiv preprint arXiv:1610.06519.

[10] Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F. X. (2016). Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816.

[11] Flamary, R., Cuturi, M., Courty, N., & Rakotomamonjy, A. (2016). Wasserstein Discriminant Analysis. arXiv preprint arXiv:1608.08063.

[12] Gabriel Peyré, Marco Cuturi, and Justin Solomon (2016), Gromov-Wasserstein averaging of kernel and distance matrices International Conference on Machine Learning (ICML).

[13] Mémoli, Facundo (2011). Gromov–Wasserstein distances and the metric approach to object matching. Foundations of computational mathematics 11.4 : 417-487.

[14] Knott, M. and Smith, C. S. (1984). On the optimal mapping of distributions, Journal of Optimization Theory and Applications Vol 43.

[15] Peyré, G., & Cuturi, M. (2018). Computational Optimal Transport .

[16] Agueh, M., & Carlier, G. (2011). Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2), 904-924.

[17] Blondel, M., Seguy, V., & Rolet, A. (2018). Smooth and Sparse Optimal Transport. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (AISTATS).

[18] Genevay, A., Cuturi, M., Peyré, G. & Bach, F. (2016) Stochastic Optimization for Large-scale Optimal Transport. Advances in Neural Information Processing Systems (2016).

[19] Seguy, V., Bhushan Damodaran, B., Flamary, R., Courty, N., Rolet, A.& Blondel, M. Large-scale Optimal Transport and Mapping Estimation. International Conference on Learning Representation (2018)

[20] Cuturi, M. and Doucet, A. (2014) Fast Computation of Wasserstein Barycenters. International Conference in Machine Learning

[21] Solomon, J., De Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A. & Guibas, L. (2015). Convolutional wasserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics (TOG), 34(4), 66.

[22] J. Altschuler, J.Weed, P. Rigollet, (2017) Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration, Advances in Neural Information Processing Systems (NIPS) 31

[23] Aude, G., Peyré, G., Cuturi, M., Learning Generative Models with Sinkhorn Divergences, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, (AISTATS) 21, 2018

[24] Vayer, T., Chapel, L., Flamary, R., Tavenard, R. and Courty, N. (2019). Optimal Transport for structured data with application on graphs Proceedings of the 36th International Conference on Machine Learning (ICML).

[25] Frogner C., Zhang C., Mobahi H., Araya-Polo M., Poggio T. (2015). Learning with a Wasserstein Loss Advances in Neural Information Processing Systems (NIPS).

[26] Alaya M. Z., Bérar M., Gasso G., Rakotomamonjy A. (2019). Screening Sinkhorn Algorithm for Regularized Optimal Transport, Advances in Neural Information Processing Systems 33 (NeurIPS).

[27] Redko I., Courty N., Flamary R., Tuia D. (2019). Optimal Transport for Multi-source Domain Adaptation under Target Shift, Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (AISTATS) 22, 2019.

[28] Caffarelli, L. A., McCann, R. J. (2010). Free boundaries in optimal transport and Monge-Ampere obstacle problems, Annals of mathematics, 673-730.

[29] Chapel, L., Alaya, M., Gasso, G. (2020). Partial Optimal Transport with Applications on Positive-Unlabeled Learning, Advances in Neural Information Processing Systems (NeurIPS), 2020.

[30] Flamary R., Courty N., Tuia D., Rakotomamonjy A. (2014). Optimal transport with Laplacian regularization: Applications to domain adaptation and shape matching, NIPS Workshop on Optimal Transport and Machine Learning OTML, 2014.

[31] Bonneel, Nicolas, et al. Sliced and radon wasserstein barycenters of measures, Journal of Mathematical Imaging and Vision 51.1 (2015): 22-45

[32] Huang, M., Ma S., Lai, L. (2021). A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance, Proceedings of the 38th International Conference on Machine Learning (ICML).

[33] Kerdoncuff T., Emonet R., Marc S. Sampled Gromov Wasserstein, Machine Learning Journal (MLJ), 2021

[34] Feydy, J., Séjourné, T., Vialard, F. X., Amari, S. I., Trouvé, A., & Peyré, G. (2019, April). Interpolating between optimal transport and MMD using Sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2681-2690). PMLR.

[35] Deshpande, I., Hu, Y. T., Sun, R., Pyrros, A., Siddiqui, N., Koyejo, S., ... & Schwing, A. G. (2019). Max-sliced wasserstein distance and its use for gans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10648-10656).

[36] Liutkus, A., Simsekli, U., Majewski, S., Durmus, A., & Stöter, F. R. (2019, May). Sliced-Wasserstein flows: Nonparametric generative modeling via optimal transport and diffusions. In International Conference on Machine Learning (pp. 4104-4113). PMLR.

[37] Janati, H., Cuturi, M., Gramfort, A. Debiased sinkhorn barycenters Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4692-4701, 2020

[38] C. Vincent-Cuaz, T. Vayer, R. Flamary, M. Corneli, N. Courty, Online Graph Dictionary Learning, International Conference on Machine Learning (ICML), 2021.

[39] Gozlan, N., Roberto, C., Samson, P. M., & Tetali, P. (2017). Kantorovich duality for general transport costs and applications. Journal of Functional Analysis, 273(11), 3327-3405.

[40] Forrow, A., Hütter, J. C., Nitzan, M., Rigollet, P., Schiebinger, G., & Weed, J. (2019, April). Statistical optimal transport via factored couplings. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2454-2465). PMLR.

[41] Chapel*, L., Flamary*, R., Wu, H., Févotte, C., Gasso, G. (2021). Unbalanced Optimal Transport through Non-negative Penalized Linear Regression Advances in Neural Information Processing Systems (NeurIPS), 2020. (Two first co-authors)

[42] Delon, J., Gozlan, N., and Saint-Dizier, A. Generalized Wasserstein barycenters between probability measures living on different subspaces. arXiv preprint arXiv:2105.09755, 2021.

[43] Álvarez-Esteban, Pedro C., et al. A fixed-point approach to barycenters in Wasserstein space. Journal of Mathematical Analysis and Applications 441.2 (2016): 744-762.

[44] Delon, Julie, Julien Salomon, and Andrei Sobolevski. Fast transport optimization for Monge costs on the circle. SIAM Journal on Applied Mathematics 70.7 (2010): 2239-2258.

[45] Hundrieser, Shayan, Marcel Klatt, and Axel Munk. The statistics of circular optimal transport. Directional Statistics for Innovative Applications: A Bicentennial Tribute to Florence Nightingale. Singapore: Springer Nature Singapore, 2022. 57-82.

[46] Bonet, C., Berg, P., Courty, N., Septier, F., Drumetz, L., & Pham, M. T. (2023). Spherical Sliced-Wasserstein. International Conference on Learning Representations.

[47] Chowdhury, S., & Mémoli, F. (2019). The gromov–wasserstein distance between networks and stable network invariants. Information and Inference: A Journal of the IMA, 8(4), 757-787.

[48] Cédric Vincent-Cuaz, Rémi Flamary, Marco Corneli, Titouan Vayer, Nicolas Courty (2022). Semi-relaxed Gromov-Wasserstein divergence and applications on graphs. International Conference on Learning Representations (ICLR), 2022.

[49] Redko, I., Vayer, T., Flamary, R., and Courty, N. (2020). CO-Optimal Transport. Advances in Neural Information Processing Systems, 33.

[50] Liu, T., Puigcerver, J., & Blondel, M. (2023). Sparsity-constrained optimal transport. Proceedings of the Eleventh International Conference on Learning Representations (ICLR).

[51] Xu, H., Luo, D., Zha, H., & Duke, L. C. (2019). Gromov-wasserstein learning for graph matching and node embedding. In International Conference on Machine Learning (ICML), 2019.

[52] Collas, A., Vayer, T., Flamary, F., & Breloy, A. (2023). Entropic Wasserstein Component Analysis. ArXiv.

[53] C. Vincent-Cuaz, R. Flamary, M. Corneli, T. Vayer, N. Courty (2022). Template based graph neural network with optimal transport distances. Advances in Neural Information Processing Systems, 35.

[54] Bécigneul, G., Ganea, O. E., Chen, B., Barzilay, R., & Jaakkola, T. S. (2020). Optimal transport graph neural networks.

[55] Ronak Mehta, Jeffery Kline, Vishnu Suresh Lokhande, Glenn Fung, & Vikas Singh (2023). Efficient Discrete Multi Marginal Optimal Transport Regularization. In The Eleventh International Conference on Learning Representations (ICLR).

[56] Jeffery Kline. Properties of the d-dimensional earth mover’s problem. Discrete Applied Mathematics, 265: 128–141, 2019.

[57] Delon, J., Desolneux, A., & Salmona, A. (2022). Gromov–Wasserstein distances between Gaussian distributions. Journal of Applied Probability, 59(4), 1178-1198.

[58] Paty F-P., d’Aspremont A., & Cuturi M. (2020). Regularity as regularization: Smooth and strongly convex brenier potentials in optimal transport. In International Conference on Artificial Intelligence and Statistics, pages 1222–1232. PMLR, 2020.

[59] Taylor A. B. (2017). Convex interpolation and performance estimation of first-order methods for convex optimization. PhD thesis, Catholic University of Louvain, Louvain-la-Neuve, Belgium, 2017.

[60] Feydy, J., Roussillon, P., Trouvé, A., & Gori, P. (2019). Fast and scalable optimal transport for brain tractograms. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part III 22 (pp. 636-644). Springer International Publishing.

[61] Charlier, B., Feydy, J., Glaunes, J. A., Collin, F. D., & Durif, G. (2021). Kernel operations on the gpu, with autodiff, without memory overflows. The Journal of Machine Learning Research, 22(1), 3457-3462.

[62] H. Van Assel, C. Vincent-Cuaz, T. Vayer, R. Flamary, N. Courty (2023). Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein. NeurIPS 2023 Workshop Optimal Transport and Machine Learning.

[63] Li, J., Tang, J., Kong, L., Liu, H., Li, J., So, A. M. C., & Blanchet, J. (2022). A Convergent Single-Loop Algorithm for Relaxation of Gromov-Wasserstein in Graph Data. In The Eleventh International Conference on Learning Representations.

[64] Ma, X., Chu, X., Wang, Y., Lin, Y., Zhao, J., Ma, L., & Zhu, W. (2023). Fused Gromov-Wasserstein Graph Mixup for Graph-level Classifications. In Thirty-seventh Conference on Neural Information Processing Systems.

[65] Scetbon, M., Cuturi, M., & Peyré, G. (2021). Low-Rank Sinkhorn Factorization.

[66] Pooladian, Aram-Alexandre, and Jonathan Niles-Weed. Entropic estimation of optimal transport maps. arXiv preprint arXiv:2109.12004 (2021).

[67] Scetbon, M., Peyré, G. & Cuturi, M. (2022). Linear-Time Gromov-Wasserstein Distances using Low Rank Couplings and Costs. In International Conference on Machine Learning (ICML), 2022.

[68] Chowdhury, S., Miller, D., & Needham, T. (2021). Quantized gromov-wasserstein. ECML PKDD 2021. Springer International Publishing.

[69] Delon, J., & Desolneux, A. (2020). A Wasserstein-type distance in the space of Gaussian mixture models. SIAM Journal on Imaging Sciences, 13(2), 936-970.

pot's Issues

Possible bugs in greenkhorn algorithm

Describe the bug

Whenever greenkhorn is called with log=True, the algorithm raises an error.
greenkhorn does not accept list input (I have to convert a, b, M to np.array manually), while sinkhorn_knopp does (in sinkhorn_knopp, the first three lines of code convert the inputs with np.asarray).

To Reproduce
Steps to reproduce the behavior:

import numpy as np
import ot

a = [.5, .5]
b = [.5, .5]
M = [[0., 1.], [1., 0.]]
# greenkhorn does not accept list objects, so convert manually
a, b, M = np.array(a), np.array(b), np.array(M)
# this line raises the error
T, log = ot.bregman.greenkhorn(a, b, M, 0.001, log=True)

Expected behavior
It seems that greenkhorn can never return the log dictionary, and I am not sure why.
When running the original sinkhorn, the log argument works fine.
Besides, I hope greenkhorn could accept lists as input (this is not a bug, but maybe it could be added in the future).


Desktop (please complete the following information):

OS: MacOSX
Python version [2.7,3.6]: 3.6
How was POT installed [source, pip, conda]: conda

Output of the following code snippet:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import ot; print("POT", ot.__version__)

import platform; print(platform.platform())
Darwin-18.2.0-x86_64-i386-64bit
import sys; print("Python", sys.version)
Python 3.6.7 |Anaconda custom (64-bit)| (default, Oct 23 2018, 14:01:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
import numpy; print("NumPy", numpy.__version__)
NumPy 1.15.4
import scipy; print("SciPy", scipy.__version__)
SciPy 1.1.0
import ot; print("POT", ot.__version__)
POT 0.5.1


Error: ndarray is not C-contiguous

Hello,

I am trying to compute ot.emd2() distances between two histograms and, for some reason, it fails with this error. I have managed to compute distances between other histograms, so I am wondering what might be wrong with these inputs.

They sum to 1, which is the only constraint I am aware of.

I can fix the problem using np.ascontiguousarray(), but I would like to understand whether I am doing something wrong to begin with.
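
The sum-to-one constraint is unrelated to this error, which concerns memory layout. A NumPy array obtained by slicing a column or by transposing is a view that is not C-contiguous, and the compiled EMD solver apparently requires contiguous input (an inference from the error message, not a statement about the solver's internals). The sketch below shows how to check and fix the layout; the array contents are placeholders.

```python
import numpy as np

data = np.random.default_rng(0).random((4, 3))

col = data[:, 0]                    # column view: strided, not C-contiguous
print(col.flags["C_CONTIGUOUS"])    # False

fixed = np.ascontiguousarray(col)   # copies into a contiguous buffer
print(fixed.flags["C_CONTIGUOUS"])  # True

Mt = data.T                         # transposed 2D view is not C-contiguous either
print(Mt.flags["C_CONTIGUOUS"])     # False
```

If the histograms or the cost matrix were built by slicing or transposing a larger array, calling np.ascontiguousarray() on them is a reasonable fix rather than a sign of a deeper mistake.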

GENERALIZED CONDITIONAL GRADIENT FOR SOLVING REGULARIZED OT PROBLEMS

Could you please tell me where the solver for GCG is? I have been searching for a while but couldn't find it. Thank you

sphinx-gallery

You should use https://github.com/sphinx-gallery/sphinx-gallery to generate your example gallery.

You would get the notebooks for free, with download links at the bottom of each page.

Also, to build the doc I had to comment out this in conf.py:

# sys.path.insert(0, os.path.abspath("../.."))
#sys.setrecursionlimit(1500)



# class Mock(MagicMock):
#     @classmethod
#     def __getattr__(cls, name):
#         return Mock()

# MOCK_MODULES = [ 'emd','ot.lp.emd']
# sys.modules.update((mod_name, Mock()) for mod_name in MOCK_MODULES)

thanks for making these tools easily available !

Getting Error while computing sinkhorn distance

Got below error :

/lib/python2.7/site-packages/ot/bregman.py:347: RuntimeWarning: invalid value encountered in multiply
Kp = (1 / a).reshape(-1, 1) * K
('Warning: numerical errors at iteration', 0)

Command:
ot.sinkhorn(a=input_vector, b=output_vector, M=distance_matrix, reg=0.01, verbose=True)

Details :
input_vector.shape: (8342,) [sums to 1]
output_vector.shape: (8342,) [sums to 1]
distance_matrix.shape: (8342, 8342) [Euclidean distance]

What could be the issue here? Please assist.
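
The warning is consistent with a product of inf and 0 in the reported line: (1 / a) is inf wherever a has a zero entry, and K = exp(-M / reg) is exactly 0 wherever M / reg is large enough to underflow. This is an assumption, since the actual data are not shown. The NumPy sketch below reproduces the mechanism with made-up values.

```python
import numpy as np

reg = 0.01
a = np.array([0.0, 1.0])             # histogram with an empty bin
M = np.array([[0.0, 50.0],
              [50.0, 0.0]])          # made-up unnormalized distances

K = np.exp(-M / reg)                 # exp(-5000) underflows to 0.0
with np.errstate(divide="ignore", invalid="ignore"):
    Kp = (1 / a).reshape(-1, 1) * K  # inf * 0 gives nan

print(K)
print(np.isnan(Kp).any())            # True
```

Under that assumption, possible remedies are to drop empty bins from the histograms, to rescale M (for example by its maximum) or increase reg, or to try the stabilized Sinkhorn variant listed among the solvers.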

ot.sinkhorn returns the OT matrix. How can we convert it to a single number equivalent to a distance?
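
Given a transport matrix T and the cost matrix M, the transport cost is the sum of the elementwise product of T and M; ot.sinkhorn2, shown in the short examples of the README, returns a scalar loss directly. A NumPy-only sketch with a hand-written plan standing in for the solver output:

```python
import numpy as np

M = np.array([[0.0, 1.0],
              [1.0, 0.0]])
# Hand-written plan standing in for the output of ot.sinkhorn(a, b, M, reg).
T = np.array([[0.2, 0.3],
              [0.0, 0.5]])

cost = np.sum(T * M)  # <T, M>: each unit of mass weighted by its transport cost
print(cost)
```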

Laplacian regularization

In paper [5], Optimal Transport for Domain Adaptation, you used Laplacian regularization. I am not sure how to obtain the similarity matrix S. Is there a tutorial on this?

Thanks for the help.

I have successfully installed the POT library (in windows OS), but I have issue with the emd function

In the following example, the emd function works well:

a = [0.5, 0.5]
b = [0.5, 0.5]
M = [[0., 1.], [1., 0.]]
G0 = ot.emd(a, b, M)
# G0 = array([[0.5, 0. ], [0. , 0.5]])

In the example below (failure case), the output of the emd function is all zeros:

a = [0.5, 0.5]
b = [0.2, 0.8]
G0 = ot.emd(a, b, M)
# G0 = array([[0., 0.], [0., 0.]])

The example from the documentation (Demo_1D_OT.ipynb) also fails: again the output of emd is a matrix of zeros (see the figure).

n = 100
a = ot.datasets.get_1D_gauss(n, m=20, s=5)
b = ot.datasets.get_1D_gauss(n, m=60, s=10)
x = np.arange(n, dtype=np.float64)
M = ot.dist(x.reshape((n, 1)), x.reshape((n, 1)))
M /= M.max()
G0 = ot.emd(a, b, M)

%matplotlib inline
pl.figure(1)
ot.plot.plot1D_mat(a, b, G0, 'OT Matrix G0')
pl.show()

Please let me know how to resolve the issue.
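
A quick way to tell a failed solve from a valid one is to check the marginals of the returned matrix: a transport plan must be nonnegative with row sums a and column sums b. A minimal NumPy-only sketch follows; the helper name is_transport_plan is hypothetical and not part of POT.

```python
import numpy as np

def is_transport_plan(G, a, b, atol=1e-8):
    """Return True if G is nonnegative with row sums a and column sums b."""
    G, a, b = np.asarray(G), np.asarray(a), np.asarray(b)
    return bool(
        np.all(G >= -atol)
        and np.allclose(G.sum(axis=1), a, atol=atol)
        and np.allclose(G.sum(axis=0), b, atol=atol)
    )

a = [0.5, 0.5]
b = [0.2, 0.8]

G_zero = np.zeros((2, 2))            # the all-zero output reported above
G_expected = np.array([[0.2, 0.3],   # plan that moves 0.3 from bin 0 to bin 1
                       [0.0, 0.5]])

print(is_transport_plan(G_zero, a, b))      # False
print(is_transport_plan(G_expected, a, b))  # True
```

An all-zero matrix fails this check, which points to the solver not running correctly rather than to the inputs.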

Sklearn compliant datasets functions

We should rename the datasets.get_* functions to datasets.make_* in order to be more sklearn compliant.

Also it should be possible to give the rng as input as in sklearn.

Barycentric mapping and label propagation

Could anyone tell me how to transfer labels after barycentric mapping? Since there are no labels for the target, I would like to transfer the source labels to the mapped points.

Is this available in some part of the code ? Thank you so much.
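
One way to sketch label propagation through a coupling, assuming a transport matrix T of shape (n_source, n_target) is already available (for example from ot.emd or ot.sinkhorn): each target sample receives class scores equal to the mass it gets from each source class and takes the class with the highest score. The function name propagate_labels and the toy values are illustrative; this is not necessarily how POT implements it.

```python
import numpy as np

def propagate_labels(T, ys):
    """Assign each target sample the source class sending it the most mass."""
    T = np.asarray(T, dtype=np.float64)
    ys = np.asarray(ys)
    classes = np.unique(ys)
    # One-hot encode source labels: shape (n_source, n_classes).
    Y = (ys[:, None] == classes[None, :]).astype(np.float64)
    # Mass received by each target sample from each class: (n_target, n_classes).
    scores = T.T @ Y
    return classes[np.argmax(scores, axis=1)]

# Toy coupling between 3 source samples and 2 target samples.
T = np.array([[0.30, 0.03],
              [0.25, 0.09],
              [0.05, 0.28]])
ys = np.array([0, 0, 1])

yt = propagate_labels(T, ys)
print(yt)
```

Normalizing each row of scores by its sum would turn them into per-class probabilities if soft labels are preferred.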

OT.sinkhorn, error when an input array contain zeros

I'm getting the following error
Warning: numerical errors at iteration 0
when calling
d_sinkhorn = ot.sinkhorn2(v1, v2, cm, reg)
and v1 or v2 contain zeros.

How to handle this case?
Thanks
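
One workaround, sketched under the assumption that bins with zero mass can simply be dropped: remove them from both histograms and from the matching rows and columns of the cost matrix before calling the solver. The helper name drop_empty_bins is hypothetical.

```python
import numpy as np

def drop_empty_bins(v1, v2, cm):
    """Remove zero-mass bins and the matching rows/columns of the cost matrix."""
    v1, v2, cm = np.asarray(v1), np.asarray(v2), np.asarray(cm)
    keep1 = v1 > 0
    keep2 = v2 > 0
    return v1[keep1], v2[keep2], cm[np.ix_(keep1, keep2)]

v1 = np.array([0.5, 0.0, 0.5])
v2 = np.array([0.0, 0.4, 0.6])
cm = np.arange(9, dtype=np.float64).reshape(3, 3)

v1_nz, v2_nz, cm_nz = drop_empty_bins(v1, v2, cm)
print(v1_nz, v2_nz)
print(cm_nz)
```

The reduced inputs can then be passed to ot.sinkhorn2(v1_nz, v2_nz, cm_nz, reg); since empty bins carry no mass, dropping them does not change the transport problem.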

Shape mismatch for stabilized sinkhorn with multi-distributions

Describe the bug
The following script gives a shape mismatch error when computing sinkhorn2 with stabilization and many dists.

  File "/Users/hichamjanati/Documents/github/forks/POT/ot/bregman.py", line 774, in sinkhorn_stabilized
    log['logu'] = alpha / reg + np.log(u)
ValueError: operands could not be broadcast together with shapes (100,) (100,2)

To Reproduce

import numpy as np
import ot
from ot.bregman import sinkhorn2


n = 100
x = np.arange(n, dtype=np.float64)

# Gaussian distributions
a = ot.datasets.make_1D_gauss(n, m=20, s=5)  # m= mean, s= std
b1 = ot.datasets.make_1D_gauss(n, m=60, s=8)
b2 = ot.datasets.make_1D_gauss(n, m=30, s=4)

# creating matrix A containing all distributions
b = np.vstack((b1, b2)).T

M = ot.utils.dist0(n)
M /= np.median(M)
epsilon = 0.1

w_stable, log = sinkhorn2(a, b, M, epsilon, method="sinkhorn_stabilized",
                          log=True)

Fix

Basically, when log=True, the current code does not handle the case where b contains many distributions. The if nbb block should be moved up, before the dual variables are computed.

bregman.py

    if log:
        log['logu'] = alpha / reg + np.log(u)
        log['logv'] = beta / reg + np.log(v)
        log['alpha'] = alpha + reg * np.log(u)
        log['beta'] = beta + reg * np.log(v)
        log['warmstart'] = (log['alpha'], log['beta'])
        if nbb:
            res = np.zeros((nbb))
            for i in range(nbb):
                res[i] = np.sum(get_Gamma(alpha, beta, u[:, i], v[:, i]) * M)
            return res, log

I can make a tiny PR with an additional test if you want.
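
The broadcasting failure can be reproduced without POT. In the NumPy sketch below, alpha has shape (100,) and u has shape (100, 2), mirroring the shapes in the traceback; the random values are placeholders. Handling one column of u at a time, as the if nbb branch does, avoids the mismatch.

```python
import numpy as np

rng = np.random.default_rng(0)
reg = 0.1
alpha = rng.normal(size=100)              # one dual potential, shape (100,)
u = rng.uniform(0.5, 1.5, size=(100, 2))  # one scaling vector per target, shape (100, 2)

try:
    alpha / reg + np.log(u)               # (100,) + (100, 2) cannot broadcast
except ValueError as err:
    print("ValueError:", err)

# Per-distribution computation: shapes are (100,) + (100,) for each column.
logu = [alpha / reg + np.log(u[:, i]) for i in range(u.shape[1])]
print(len(logu), logu[0].shape)
```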

Doc GPU implementation

We need proper documentation for the GPU implementation module:

copy the sinkhorn and OTDA class doc
proper doc for pairwiseEuclideanGPU with formatting

Use target class proportions in transport.

Hi,

I want to use the Sinkhorn transport (and the two regularization methods) with different estimates of the target class proportions, similar to the work done in https://hal.archives-ouvertes.fr/hal-01254329/file/OT-multitemp2015-paper.pdf.
For now, the only estimate available is the uniform one, unless I missed something.
In the deprecated classes such as OTDA_lpl1, it is possible to customize the weights used (with the ws parameter).
So my questions are:

Why do the current transport classes no longer allow customized weights?
Are there other classes that accept these parameters?
Is there any issue with the transport estimation when using estimated proportions?

Best regards,
Benjamin.

fail when using "pip install POT"

When I run "pip install POT", it fails. POT depends on Cython, but it seems that pip is not told about this dependency.

I solved the problem by installing Cython first. However, if both Cython and POT are listed in requirements.txt, the installation still fails.

Could anyone fix that?

remove import plot from ot/__init__.py

Hello,

Would it be possible to remove line 19 (from . import plot) from ot/__init__.py?

It automatically loads matplotlib, which can raise an error on an instance without a graphical display.

Thanks in advance !
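
Until the import is removed, one workaround is to select a non-interactive matplotlib backend before ot is imported. matplotlib reads the MPLBACKEND environment variable when it is first imported, so setting it beforehand avoids the need for a display. A minimal sketch:

```python
import os

# Must run before matplotlib (and therefore ot) is imported for the first time.
# setdefault keeps any backend the user has already chosen.
os.environ.setdefault("MPLBACKEND", "Agg")

# import ot  # would now load matplotlib with the file-only Agg backend
print(os.environ["MPLBACKEND"])
```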

Feature request:- convolution Wasserstein distances

Hi, I wish to request to add the code from the paper- http://people.csail.mit.edu/jsolomon/assets/convolutional_w2.compressed.pdf
Their matlab code is here https://github.com/gpeyre/2015-SIGGRAPH-convolutional-ot.git

Thanks
Kowshik

Maximum input sample size

I have two data samples, each of size 100k, from two distributions in the 50-dimensional space, say n = 100k, p = 50. Can I use this OT library to compute the earth-mover distance between these two empirical data samples?
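
A back-of-the-envelope check helps here: exact solvers such as ot.emd2 take a dense n x n cost matrix as input, so memory is the first limit. The sketch below only does the arithmetic for float64 entries; any solver workspace would come on top of it.

```python
n = 100_000          # samples per distribution
bytes_per_entry = 8  # float64

cost_matrix_bytes = n * n * bytes_per_entry
print(cost_matrix_bytes / 1e9, "GB")  # 80.0 GB for the cost matrix alone
```

For samples of this size, a method that never forms the full matrix is more realistic; Sliced Wasserstein [31, 32], for instance, works on 1D projections of the samples.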

Gromov-Wasserstein Distance between 1-D vectors

Gromov-Wasserstein fails when the cost matrices are slightly different

Describe the bug
The ot.gromov.gromov_wasserstein method fails (TypeError) when the cost matrices are very similar but not identical.

To Reproduce
The full code is available at
https://colab.research.google.com/drive/1IhnOqeLV51gWE8FodnBsgR5cQC_w2EkL

How was POT installed [pip]

Sys specifications

Linux-3.10.0-327.22.2.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Python 3.4.3 (default, Apr 28 2015, 11:29:27) 
[GCC 4.9.2]
NumPy 1.16.2
SciPy 1.2.1
POT 0.5.1

ot.gpu.sinkhorn uses dtype of the cost matrix

I tried to use ot.gpu.sinkhorn using CUDA and I got this traceback:

  File "/lib/python3.5/site-packages/ot/gpu/bregman.py", line 132, in sinkhorn_knopp
    np.divide(M, -reg, out=K)
  File "cupy/core/_kernel.pyx", line 831, in cupy.core._kernel.ufunc.__call__
  File "cupy/core/_kernel.pyx", line 355, in cupy.core._kernel._get_out_args
TypeError: output (typecode 'd') could not be coerced to provided output parameter (typecode 'h') according to the casting rule "same_kind"

I guess it has to do with: https://github.com/rflamary/POT/blob/master/ot/gpu/bregman.py#L120 which reuses the dtype of M, but M has been computed from ot.gpu.dist using a cost matrix which has been created with dtype: np.int16, so it makes sense to have this error.

I tried setting it to np.float64 to see if the error is indeed due to this. But I wonder if the current behavior is expected. I can do a PR to make this error more user-friendly, but beyond this, why not have K be np.float64 anyway? I use np.int16 for the cost matrix because the matrix is really big, and this saves a lot of RAM.

Thank you again for this project :)
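The casting failure can be reproduced on the CPU. The sketch below uses NumPy rather than CuPy, on the assumption that CuPy applies the same "same_kind" casting rule shown in the traceback; it is an illustration of the idea, not the code in ot/gpu/bregman.py.

```python
import numpy as np

reg = 1.0
M = np.array([[0, 1], [1, 0]], dtype=np.int16)  # compact integer cost matrix

# Reusing M's dtype for the output buffer reproduces the casting failure:
# the float result of the division cannot be written into an int16 array.
K_int = np.empty(M.shape, dtype=M.dtype)
try:
    np.divide(M, -reg, out=K_int)
    failed = False
except TypeError:
    failed = True

# Allocating the kernel as float64 keeps M compact and avoids the error.
K = np.empty(M.shape, dtype=np.float64)
np.divide(M, -reg, out=K)
np.exp(K, out=K)
```

With this layout only K is stored in floating point, while M keeps its small integer dtype.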

A pure numpy implementation of the network simplex algorithm, `ot.emd(a,b,M)`

Dear Remi,

thank you very much for releasing and documenting this package - it's really helpful to learn from 👍 I was wondering if there's a simpler/more explicit way to learn the network simplex algorithm?

I was looking on the web for very simple 1D emd code to help to compare the number of computational steps and accuracy of the unregularized linear program algorithm, with the regularized Sinkhorn-Knopp algorithms which you have here in pure numpy.

I could only find MATLAB code though, and I don't have access to MATLAB. I tried converting it to Octave, but the linear programming solver in Octave seems to differ from the MATLAB one, and I couldn't get the same values as ot.emd(a,b,M). I tried both Gaussians and simpler discrete distributions, but I couldn't find the problem.

It would be really great, and help my understanding a lot, if I could find some simple numpy code to calculate the emd by setting up the linear program for the network simplex algorithm, as you do in EMD_wrapper.cpp. I'm trying to do this with linprog-simplex.

I just wondered if you know of any such code which is available? It would be really helpful for people new to optimal transport to see how the different algorithms work, (side by side in numpy), and compare their accuracy at a basic level.

All the best,

Ajay
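For the 1D case specifically, a reference value can be computed without any linear programming solver. The sketch below is not the network simplex algorithm: it uses the closed form for the cost |x - y| on a shared sorted grid, where the EMD is the accumulated gap between the two cumulative sums. `emd_1d` is a hypothetical helper; its result should agree with `ot.emd2(a, b, M)` only when M holds absolute differences between grid points.

```python
def emd_1d(a, b, x):
    """W1 between histograms a and b on the sorted grid x (same total mass)."""
    if abs(sum(a) - sum(b)) > 1e-9:
        raise ValueError("a and b must have the same total mass")
    cost = 0.0
    cdf_gap = 0.0  # running difference between the two cumulative sums
    for i in range(len(x) - 1):
        cdf_gap += a[i] - b[i]
        # a mass of |cdf_gap| has to cross the interval [x[i], x[i + 1]]
        cost += abs(cdf_gap) * (x[i + 1] - x[i])
    return cost

x = [0.0, 1.0, 2.0, 3.0]
a = [0.5, 0.5, 0.0, 0.0]
b = [0.0, 0.0, 0.5, 0.5]
print(emd_1d(a, b, x))  # 2.0
```

Here every unit of mass moves two grid steps to the right, which gives the total cost of 2.0.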

Need to specify extra_link_args to compile

Hi, just wanted to mention that I needed to add extra_link_args=["-stdlib=libc++"] inside ext_modules = cythonize(Extension( ... )) to get the cython code to compile. I'm using Python 3.7 on Mojave 10.14.4 with...

$ gcc --version

Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
Apple LLVM version 10.0.1 (clang-1001.0.46.3)
Target: x86_64-apple-darwin18.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Getting these things to compile is always tricky for me. Let me know if you want more info.

Improve typos in Notebooks

Hi,
Probably a minor copy/paste fix needed in the notebooks:

For example, in the latter, the following code seems to produce EMD instead of Sinkhorn:

# prediction between images (using out of sample prediction as in [6])
transp_Xs_emd = ot_emd.transform(Xs=X1)
transp_Xt_emd = ot_emd.inverse_transform(Xt=X2)

transp_Xs_sinkhorn = ot_emd.transform(Xs=X1) # Shouldn't be ot_sinkhorn.transform(Xs=X1) ?
transp_Xt_sinkhorn = ot_emd.inverse_transform(Xt=X2)  # Same here

At least, it would match the example:
https://github.com/rflamary/POT/blob/e757b75976ece1e6e53e655852b9f8863e7b6f5a/examples/plot_otda_color_images.py#L118-L119

Thanks
PS. Sorry if I misunderstood something.

Remove rst compiled doc and notebooks from repository

The current documentation relies on sphinx-gallery, which cannot be executed on readthedocs, so we have to compile everything to rst and notebooks to get proper documentation.

This will make the repo explode, so we should find a way to keep the doc up to date (staying on readthedocs if possible), probably by keeping a compiled version of the doc in a separate repository.

The compiled notebooks are also very nice (they allow a quick look at how the toolbox works) but should also be stored in a separate repo.

Feature request:- Away-step Frank-Wolfe

Hello,

I am writing to you today to discuss the possible implementation of Frank-Wolfe variants, which can be interesting for solving the GW problem. While standard FW converges slowly, in O(1/t), other methods converge faster. One of the faster methods is away-step Frank-Wolfe, which converges linearly (https://arxiv.org/pdf/1511.05932.pdf).

This was suggested by Thomas Kedreux.

Not in simplex -- two sets of largely different sizes

I am trying to calculate the EMD of two sets. When one set has a few hundred entries and the other has only 2, the EMD calculation fails and returns Problem Infeasible.

Steps to reproduce the behavior:
** SEE BELOW COMMENT FOR FIXED SCRIPT **

Expected behavior
Should return EMD around 1, instead says that the sets spherEng1 and pencilEnergy are not in the simplex

Screenshots
Here is comparing the EMDs calculated for less densely tiled to most densely tiled (number of particles = number of segments) with the two element set

Desktop (please complete the following information):

OS: [MacOSX]
Python version [3.6]
POT installed with pip

import platform; print(platform.platform())
Darwin-16.7.0-x86_64-i386-64bit
import sys; print("Python", sys.version)
('Python', '2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 13:10:39) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]')
import numpy; print("NumPy", numpy.version)
('NumPy', '1.15.4')
import scipy; print("SciPy", scipy.version)
('SciPy', '1.1.0')
import ot; print("POT", ot.version)
('POT', '0.5.1')

Convert to sphinx-gallery only for notebooks

We should use sphinx-gallery to generate the notebooks automatically.

To do that we need to provide proper rst documentation in the examples as in
https://sphinx-gallery.readthedocs.io/en/latest/tutorials/plot_notebook.html#sphx-glr-tutorials-plot-notebook-py

UnicodeDecodeError: 'ascii' while installing with pip

Hi everyone,

I am trying to install POT on an Ubuntu 16.04 with Anaconda and

Python 3.6
Cython 0.28.3
Numpy 1.14.5
Scipy 1.1.0
Matplotlib 2.2.2

using the instructions on http://pot.readthedocs.io/en/stable/

When executing pip install POT, I obtain the following error message.

Collecting pot
Using cached https://files.pythonhosted.org/packages/50/66/714ee432a02e95a869c8e243e369ebad60e69a72ab1a72367c31df206619/POT-0.4.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "/tmp/pip-install-4awvn1uv/pot/setup.py", line 26, in <module>
    import pypandoc
ModuleNotFoundError: No module named 'pypandoc'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-4awvn1uv/pot/setup.py", line 29, in <module>
    README = open(os.path.join(ROOT, 'README.md')).read()
  File "/root/anaconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5501: ordinal not in range(128)

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-4awvn1uv/pot/

However, if I do conda install -c conda-forge pot, it updates

ca-certificates
certifi
conda
openssl

and then it installs POT successfully.

I have installed POT on OSX with pip successfully with a similar anaconda setup.

Cannot run gpu modules

Hello,

I am trying out the GPU implementation of the Sinkhorn transport, but without much success.

>>> a=[.5,.5]
>>> b=[.5,.5]
>>> M=[[0.,1.],[1.,0.]]
>>> ot.gpu.sinkhorn(a,b,M,1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'ot' has no attribute 'gpu'

However, the ot.sinkhorn(a,b,M,1) works as expected.
I have cupy installed as well as the CUDA SDK.

Could someone help?

Usage of ot.gpu

Hi,
I searched the full documentation at https://pot.readthedocs.io/en/latest/index.html but I cannot find any usage information about ot.gpu.
Is there any documentation or example showing how to use ot.gpu?
Thanks

A little help to clarify something.

Hello,

I don't have a formal background in OT, therefore pardon me if I am asking something extremely silly. In the plot_ot_1d.py, for the cost matrix calculation :
M = ot.dist(x.reshape((n, 1)), x.reshape((n, 1)))

It is a bit strange, because I was expecting the cost matrix to be something between the two distributions, but it seems that the cost matrix is rather computed between the sample positions of the two distributions. Kindly reply.
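To make the roles concrete: in a histogram setting the two distributions enter the problem as weight vectors, while the cost matrix only describes how far apart the bin positions are. The sketch below builds such a matrix with plain NumPy, assuming the squared Euclidean metric (which is taken here to be what `ot.dist` computes by default).

```python
import numpy as np

n = 5
x = np.arange(n, dtype=np.float64)  # bin positions shared by both histograms

# Squared Euclidean distance between every pair of positions.
# M[i, j] is the cost of moving one unit of mass from bin i to bin j.
X = x.reshape((n, 1))
M = (X - X.T) ** 2
```

The histograms a and b never appear in M; they are passed separately to the solver as the marginals that the transport plan has to match.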

EMD and Sinkhorn

Hi there,

great library :) I have some small questions.

I installed the package and ran some simple tests. Do I understand correctly that:

ot.emd and ot.lp.emd are the same
ot.emd2 and ot.lp.emd2 are the same
emd and emd2 use the same underlying algorithm, but one returns the transportation matrix and the other one the distance
ot.sinkhorn(method='sinkhorn'), ot.bregman.sinkhorn(method='sinkhorn') and ot.bregman.sinkhorn_knopp are the same
ot.sinkhorn2 and ot.bregman.sinkhorn2(method='sinkhorn') are the same
sinkhorn(method='sinkhorn') and sinkhorn2 use the same underlying algorithm, but one returns the transportation matrix and the other one the distance?

I observed that for solving a single transportation problem emd is much faster than sinkhorn, but I expected the opposite, since speed is one reason to use Sinkhorn. How come?
If I'm using ot.gpu.sinkhorn, how can I calculate the distance from the transportation matrix?

Thanks in advance for clarification on these issues.
Best, Patrick
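On the last question, the transport cost of any plan is the sum of the elementwise product of the plan and the cost matrix. The sketch below uses NumPy arrays with made-up values; for a plan living on the GPU the same expression should apply with the corresponding array library, which is an assumption here.

```python
import numpy as np

M = np.array([[0.0, 1.0], [1.0, 0.0]])  # cost matrix
G = np.array([[0.4, 0.1], [0.1, 0.4]])  # a transport plan with marginals (0.5, 0.5)

# Transport cost of the plan: sum over i, j of G[i, j] * M[i, j]
cost = float(np.sum(G * M))
print(cost)
```

For a Sinkhorn plan this gives the linear transport cost only; the entropic term of the regularized objective is not included.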

Text mining

Hello there,

I am currently toying around with POT to compare texts. I have a dictionary of 46k terms and I'm trying to compare 120k documents. Every document has at most 10-15 words (BibTeX titles), so comparing two texts means comparing two [46000,1] vectors with at most 10 non-zero entries.

Are there any suggestions for this process? The naive approach is too slow: comparing 10k documents takes 2 days
(emd2(p,q,C), where p,q are [46k,1] and C is [46k,46k]).
Sinkhorn is even slower!

Thanks in advance!
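One option for such sparse histograms is to drop the zero-mass bins before calling the solver: bins without mass cannot send or receive anything, so the optimal cost is unchanged while the problem shrinks from 46k x 46k to roughly 10 x 10. The sketch below shows the restriction step with NumPy; `restrict_to_support` is a hypothetical helper, and the reduced arrays would then be passed to `emd2`.

```python
import numpy as np

def restrict_to_support(p, q, C):
    """Drop zero-mass bins: returns the sub-problem on the supports of p and q."""
    p = np.asarray(p).ravel()
    q = np.asarray(q).ravel()
    i = np.flatnonzero(p)  # words present in the first document
    j = np.flatnonzero(q)  # words present in the second document
    return p[i], q[j], C[np.ix_(i, j)]

# Toy vocabulary of 6 terms instead of 46k.
C = np.abs(np.subtract.outer(np.arange(6.0), np.arange(6.0)))
p = np.array([0.5, 0.0, 0.5, 0.0, 0.0, 0.0])
q = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])
p_s, q_s, C_s = restrict_to_support(p, q, C)
print(C_s.shape)  # (2, 1)
```

Indexing with `np.ix_` copies only the needed entries, so the full matrix C can stay shared across all document pairs.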

semi supervised da - correction + example

Hi,

I may have found a line that could lead to errors when using OT objects in a supervised DA setting.

At line 992 in da.py, I propose changing classes = np.unique(ys) to classes = [c for c in np.unique(ys) if c != -1], which would let people use source samples with no labels to find the optimal coupling.

I also propose adding an example for semi-supervised DA.

Do you agree with these proposals? If yes, I'll open a PR.
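The effect of the proposed change can be seen on a small label vector. The sketch below assumes, as the proposal does, that -1 marks unlabeled samples; the values are made up for illustration.

```python
import numpy as np

ys = np.array([0, 1, -1, 1, -1, 2])  # -1 marks unlabeled source samples (assumed convention)

all_values = np.unique(ys)                        # includes the -1 placeholder
classes = [c for c in np.unique(ys) if c != -1]   # proposed: real classes only
print(len(all_values), len(classes))  # 4 3
```

With the filter in place, the unlabeled samples no longer form a spurious class of their own.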

Outdated method call in "1D Wasserstein Barycenter demo"

Hi, I'd just like to mention that there is a small issue with one of the demos

On the "1D Wasserstein Barycenter demo" of the notebooks (notebooks/plot_barycenter_1D.ipynb) on lines 24 and 25 of the second code block, the Gaussian distributions are generated with

a1 = ot.datasets.make_1D_gauss(n, m=20, s=5)

However, this method was renamed to get_1D_gauss

a1 = ot.datasets.get_1D_gauss(n, m=20, s=5)

The demo runs without issues after that is fixed

Thanks !

Displacement interpolation?

Hello,

thank you for POT!
In the [1] reference, the interpolation is discussed and an example given (see link below).
Is it feasible to do this in POT (the matlab code is https://github.com/gpeyre/2013-SIIMS-ot-splitting ) or to extend POT to do it?

Best regards

Thomas

Domain adaptation Classes

We should change the domain adaptation classes to be more sklearn compliant.

Main issues:

Use CamelCase for classes
Use __init__ for setting parameters instead of fit.

@agramfort proposed to create new classes with proper names and begin deprecating the old classes.

I think it is a good move.

Road to POT 1.0

Hello to all contributors,

The last POT 0.6 release brought new features to the library and we have now 25 papers implemented in POT. It was discussed that before making the 1.0 release, we should work on some fundamental changes inside the library. In my humble opinion, we should work on the most urgent changes before adding new features. If we keep adding new features, it will be even more complicated to make the fundamental changes afterwards. I start this issue in order to discuss these matters.

I copy-paste here what was discussed before. The list is non-exhaustive and I invite you to complete it if you have ideas/wishes:

Reform changes

Naming convention (clearer and more consistent)
Duplicated code (bregman module)
Clean commented code
a two letters package name -- ot -- can cause multiple headaches ..
The emd functions should be in a specific module not in the init file
In some functions, the transport plan is computed (which can be heavy to store on gpus) even though it is not needed. I'm thinking there should be a function that explicitly computes the transport plan given the dual variables making the call specific by the user.
sinkhorn returns the distance or the plan depending on the second dimension of the input distribution b ..
make sure we have all the working infrastructure to make this (and future releases) by the CIs.
Domain adaptation name
Torch backend

I would state that the most urgent before adding features is the naming convention, because we can't add new functions with old names (ot.sinkhorn2 ...).

Name Shifting

It will be updated each time we converge toward a new name.

------------------documentation/examples------------------

n -> n_source_samples
xs/xt -> x_source/x_target
G0/Gs -> Gamma_emd/ Gamma_sinkhorn (May be ?)
reg parameter entropic -> epsilon (blur ?)
d (dimension parameter) -> n_features
N (barycenter example) -> N_distributions
X1 -> X_source (color transfer)

------------------variable names------------------

numItermax -> num_iter_max
numInnerItermax -> num_iter_max_{function name}
stopThr -> stop_threshold
(reg -> blur ?)
log (variable not bool) -> log_{namefunction}

Assignments

Variable names (Kilian)

PEP8 cleanup

running pyflakes gives me:

$ pyflakes ot/*/*.py ot/*.py examples/*.py
ot/optim.py:9: '.bregman.sinkhorn_stabilized' imported but unused
ot/utils.py:96: undefined name 'reduce'
examples/demo_OTDA_classes.py:6: 'numpy as np' imported but unused
examples/demo_barycenter_1D.py:12: 'mpl_toolkits.mplot3d.Axes3D' imported but unused
examples/demo_barycenter_1D.py:14: 'matplotlib.colors.colorConverter' imported but unused

when running flake8

$ flake8 ot/*/*.py ot/*.py examples/*.py

you'll see that you have a lot of pep8 style violations.
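The `undefined name 'reduce'` report is the one entry in this output that can break at runtime: `reduce` is a builtin on Python 2 but has to be imported from `functools` on Python 3. A minimal sketch of the portable form, with a hypothetical helper for illustration:

```python
from functools import reduce  # builtin on Python 2, must be imported on Python 3
import operator

def product(values):
    """Multiply all values together (hypothetical helper for illustration)."""
    return reduce(operator.mul, values, 1)

print(product([2, 3, 4]))  # 24
```

The import also works on Python 2.6 and later, so it does not need a version check.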

Dockerizing POT, setting up error

While dockerizing POT using this POT Dockerfile, I came across an error that occurs during the command python3 setup.py install --user:

Traceback (most recent call last):
  File "setup.py", line 3, in <module>
    from setuptools import setup, find_packages
  File "/usr/local/lib/python3.4/dist-packages/setuptools/__init__.py", line 12, in <module>
    import setuptools.version
  File "/usr/local/lib/python3.4/dist-packages/setuptools/version.py", line 1, in <module>
    import pkg_resources
  File "/usr/local/lib/python3.4/dist-packages/pkg_resources/__init__.py", line 70, in <module>
    import packaging.version
ImportError: No module named 'packaging'

Any idea how to solve this?

Update
I worked out the first error but encountered another one. It seems related to Shippable/support#3316: the cause may be a new version of setuptools.

Another issue on Windows with ot.da.OTDA()

The following script:

import numpy as np
import ot
a = np.random.rand(1500, 95)
b = np.random.rand(50000, 95)
opt = ot.da.OTDA()
opt.fit(a,b)
print(np.sum(opt.G))

gives on Windows
0

Windows-10-10.0.15063-SP0
Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]
NumPy 1.12.1
SciPy 0.19.0
POT 0.3.1

while on Debian it gives
0.716

Linux-3.2.0-4-amd64-x86_64-with-debian-7.11
('Python', '2.7.3 (default, Jun 21 2016, 18:38:19) \n[GCC 4.7.2]')
('NumPy', '1.13.1')
('SciPy', '0.19.1')
('POT', '0.3.1')

If I'm not mistaken, it should always return 1

Free support barycenter examples

Hello,

I have been working on your free support barycenter examples.
https://github.com/rflamary/POT/blob/master/examples/plot_free_support_barycenter.py

I went through the code and there is something which looks wrong to me. To plot your figure you used :

for (x_i, b_i) in zip(measures_locations, measures_weights):
    color = np.random.randint(low=1, high=10 * N)
    pl.scatter(x_i[:, 0], x_i[:, 1], s=b * 1000, label='input measure')

but I think it should be $s=b_i * 1000$ instead of $s=b * 1000$.

I can make a PR to correct it if it is a mistake.

Docstring issues in ot.da

There are some docstring inconsistencies in the docstring of some classes such as the Sinkhorn class: the parameter mapping is not in the signature call, so, how does one control the mapping now? there is an "out_of_sample_map" parameter in the call upon class construction which should be explained in the docstring of these classes.

Example:

Init signature: SinkhornLpl1Transport(reg_e=1.0, reg_cl=0.1, max_iter=10, max_inner_iter=200, log=False, tol=1e-08, verbose=False, metric='sqeuclidean', norm=None, distribution_estimation=<function distribution_estimation_uniform at 0x7effd9dd6400>, out_of_sample_map='ferradans', limit_max=inf)
Docstring:
Domain Adapatation OT method based on sinkhorn algorithm +
LpL1 class regularization.

Parameters

reg_e : float, optional (default=1)
Entropic regularization parameter
reg_cl : float, optional (default=0.1)
Class regularization parameter
mapping : string, optional (default="barycentric")
The kind of mapping to apply to transport samples from a domain into
another one.
if "barycentric" only the samples used to estimate the coupling can
be transported from a domain to another one.
metric : string, optional (default="sqeuclidean")
The ground metric for the Wasserstein problem
norm : string, optional (default=None)
If given, normalize the ground metric to avoid numerical errors that
can occur with large metric values.

Perform proper Pytest

For the moment, we only perform doctest and a simple loading of the module.

We should begin to convert and propose tests for all functions and classes.

Cannot install library on Ubuntu 14.04.5 LTS

Hello there,

I wanted to install the library on a PC that has a GPU to test the parallelism of optimal transport.
However, I cannot, because the build fails with an error:

ot/lp/network_simplex_simple.h:234:46: error: macro "MAX" requires 2 arguments, but only 1 given
MAX(std::numeric_limits::max()),

I'm guessing MAX is already defined as a macro somewhere, perhaps left over from a previous user? If that's the case, could you give me some hints on where I should look?

OS : Ubuntu 14.04.5 LTS

Thank you for your time in advance!

Unusable parameter log for EMDTransport

Hi,

I need to get some values from the transport computation, such as the cost matrix or the value of the minimisation.
Some of these values are stored in the log. But when I do:

ot_emd,log = ot.da.EMDTransport(norm="max",log=0)

I get the following error:

TypeError: __init__() got an unexpected keyword argument 'log'

In the EMDTransport class declaration there is:

"""
Parameters
----------
...
log : int, optional (default=0)
Controls the logs of the optimization algorithm
..."""

So my question is:
Is it intentional that the log cannot be recovered with this class? If so, to get it back I would have to call the emd function directly, without using the EMDTransport class.

Another question:
I want to get the minimum value computed by the minimisation problem (first with EMD but also with Sinkhorn) to find a link between the effectiveness of the transport and the OA obtained in classification. How can I do this, and are there other values that could provide this kind of information?

In the end, my goal is to estimate several transports and automatically choose the best one.

Regards

Benjamin

Compilation issue with MacOSX Mojave

I encountered the following issue while installing POT on MacOSX Mojave, with
python 3.6

python setup.py build
running build
running build_py
running build_ext
building 'ot.lp.emd_wrap' extension
/usr/bin/gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -Iot/lp -I/anaconda3/lib/python3.6/site-packages/numpy/core/include -I/Users/nico/code/POT/ot/lp -I/anaconda3/include/python3.6m -c ot/lp/emd_wrap.cpp -o build/temp.macosx-10.7-x86_64-3.6/ot/lp/emd_wrap.o
warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
In file included from ot/lp/emd_wrap.cpp:648:
In file included from /anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1823:
/anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
 ^
In file included from ot/lp/emd_wrap.cpp:650:
ot/lp/EMD.h:19:10: fatal error: 'iostream' file not found
#include <iostream>
         ^~~~~~~~~~
2 warnings and 1 error generated.
error: command '/usr/bin/gcc' failed with exit status 1

I finally solved it by adding in setup.py the following extra argument for the compiler
extra_compile_args=["-stdlib=libc++"]

However before pushing a PR, it is not clear to me if adding this option will break compatibility with other OS. Meanwhile, it is a simple workaround for this problem.
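One way to limit the compatibility risk is to add the flag only when building on macOS. The sketch below shows just the platform check; `libcxx_flags` is a hypothetical helper, and how its result is wired into the Extension in setup.py is left as an assumption.

```python
import sys

def libcxx_flags(platform=sys.platform):
    """Return the libc++ flag only on macOS; other platforms get no extra flag."""
    return ["-stdlib=libc++"] if platform == "darwin" else []

# These lists would be passed as extra_compile_args / extra_link_args
# to the Extension in setup.py.
print(libcxx_flags("darwin"))  # ['-stdlib=libc++']
print(libcxx_flags("linux"))   # []
```

On Linux and Windows the returned list is empty, so the build command is unchanged there.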

POT calculate 2D vector EMD distance which have different length.

How can I calculate the EMD distance between 2D vectors using POT? For example, I have these 2 vectors:

[(0, 1), (1, 1), (2, 2), (3, 2), (4, 1), (5, 1)],
[(0, 1), (1, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]

distance matrix:
5x6

Actually, the 2 vectors are bags of words, and the distance matrix holds the Euclidean distances between words. I want to use this to calculate a sentence distance but don't know how to use the EMD distance. Any help?
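A possible preparation step, sketched below, is to turn the counts of each bag into weights that sum to 1, since the EMD compares two distributions of equal total mass. The interpretation of each pair as (word_id, count) is an assumption, and `to_histogram` is a hypothetical helper.

```python
# (word_id, count) pairs from the two sentences above (assumed interpretation)
s1 = [(0, 1), (1, 1), (2, 2), (3, 2), (4, 1), (5, 1)]
s2 = [(0, 1), (1, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]

def to_histogram(bag):
    """Turn raw counts into weights that sum to 1."""
    total = float(sum(count for _, count in bag))
    return [count / total for _, count in bag]

a = to_histogram(s1)  # 6 weights
b = to_histogram(s2)  # 7 weights
# The cost matrix M must then have shape (len(a), len(b)) = (6, 7), with
# M[i][j] the distance between word i of s1 and word j of s2.
```

The two bags may have different lengths; only the shape of M has to match them, and the resulting a, b and M are what would be passed to `ot.emd2`.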

"pip install POT" fail with Python 3.7

Hi guys, I am a newbie in programming, I tried "pip install POT" in my terminal, and the following (first) error happened:

ot/lp/emd_wrap.cpp:6660:65: error: too many arguments to function call, expected 3, have 4
return (*((__Pyx_PyCFunctionFast)meth)) (self, args, nargs, NULL);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~

I am sure all the packages it depends on are up to date. I would appreciate any help. Thanks!

setup.py needs to specify an encoding when opening README.md

README.md contains non-ascii characters, so setup.py will fail if the locale is ascii, e.g.

$ LC_ALL=C python setup.py install
Traceback (most recent call last):
  File "setup.py", line 26, in <module>
    import pypandoc
ModuleNotFoundError: No module named 'pypandoc'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 29, in <module>
    README = open(os.path.join(ROOT, 'README.md')).read()
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5501: ordinal not in range(128)

This is easily fixed by using open(..., encoding="utf-8") (or, if you want Py2 compatibility, codecs.open(..., encoding="utf-8")).
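A minimal sketch of the suggested fix, using `io.open`, which accepts an `encoding` argument on both Python 2 and Python 3. The temporary file and its content are made up for illustration and stand in for README.md.

```python
import io
import os
import tempfile

# Write a small file containing a non-ASCII character (first byte 0xc3 in UTF-8).
path = os.path.join(tempfile.mkdtemp(), "README.md")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"caf\u00e9\n")

# An explicit encoding makes the read independent of the locale (LC_ALL=C included).
with io.open(path, encoding="utf-8") as f:
    README = f.read()
```

Without the explicit encoding, the default is taken from the locale, which is what triggers the ascii decoding error above.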

In the following example, the emd function works well:

In the example below (failure case), the output of the emd function is all zeros

Failure case on the example mentioned in the documentation (Demo_1D_OT.ipynb): again the output of emd is a matrix of zeros (see the figure)

Please let me know how to resolve the issue
