gregversteeg / npeet
Non-parametric Entropy Estimation Toolbox
License: MIT License
The readme file lacks installation instructions -- I think it would be very helpful to include that.
I was able to puzzle out that
$ python3 setup.py build
$ python3 setup.py install
are enough to build and install the package. Maybe that info, or something like it, could be in the readme.
Also, I found that after installing, `import entropy_estimators` fails, but `import npeet.entropy_estimators` succeeds. The readme mentions the former; should it contain the latter instead?
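One way to make a script tolerant of both layouts is to try the packaged module path first and fall back to the flat one. This is only a sketch: `import_first` is a hypothetical helper, not part of NPEET. Note also that a top-level module called `entropy_estimators` may be provided by an unrelated package, so the fallback can pick up something else.

```python
import importlib


def import_first(names):
    """Return the first module in `names` that imports, or None (hypothetical helper)."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer the installed package layout, then the flat module from a source checkout.
ee = import_first(["npeet.entropy_estimators", "entropy_estimators"])
print("imported:", getattr(ee, "__name__", None))
```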
Thank you for publishing your package NPEET on GitHub. Could you add a license file (preferably an open-source license, such as the MIT, GNU, or BSD licenses) to your package? Thanks!
kbriggs:~/Downloads/NPEET> python3 test.py
For a uniform distribution with width alpha, the differential entropy is log_2 alpha, setting alpha = 2
and using k=1, 2, 3, 4, 5
Traceback (most recent call last):
File "./test.py", line 16, in <module>
print("result:", [ee.entropy([[2 * random.random()] for i in range(1000)], k=j + 1) for j in range(5)])
File "./test.py", line 16, in <listcomp>
print("result:", [ee.entropy([[2 * random.random()] for i in range(1000)], k=j + 1) for j in range(5)])
File "/home/kbriggs/Downloads/NPEET/entropy_estimators.py", line 28, in entropy
return (const + d * np.mean(map(log, nn))) / log(base)
File "/usr/local/lib/python3.4/dist-packages/numpy/core/fromnumeric.py", line 2909, in mean
out=out, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/numpy/core/_methods.py", line 82, in _mean
ret = ret / rcount
TypeError: unsupported operand type(s) for /: 'map' and 'int'
kbriggs:~/Downloads/NPEET> python2 test.py
For a uniform distribution with width alpha, the differential entropy is log_2 alpha, setting alpha = 2
and using k=1, 2, 3, 4, 5
('result:', [0.95063690299507952, 0.98051458362141108, 1.0803462913574611, 1.0316551234094444, 1.0289725544677049])
Gaussian random variables
Conditional Mutual Information
covariance matrix
[[4 3 1]
[3 4 1]
[1 1 2]]
('true CMI(x:y|x)', 0.5148736716970265)
('samples used', [10, 25, 50, 100, 200])
('estimated CMI', [0.24721094773861269, 0.39550091844389834, 0.46211431227905897, 0.48994541664326197, 0.49993287186420526])
('95% conf int. (a, b) means (mean - a, mean + b)is interval\n', [(0.32947891495120885, 0.47883105656907937), (0.42410443410041138, 0.40553319741348437), (0.29607520550148525, 0.29646667472554578), (0.17646000212101254, 0.19139043703562886), (0.1623733550388789, 0.19292824321772967)])
Mutual Information
('true MI(x:y)', 0.5963225389711981)
('samples used', [10, 25, 50, 100, 200])
('estimated MI', [0.32218586252030301, 0.54386805987295483, 0.59630897787131887, 0.60762939695898355, 0.60418593716673841])
('95% conf int.\n', [(0.42363251380820954, 0.46980791508516928), (0.46583034399247247, 0.50990786157500079), (0.35170125121037665, 0.33635610406503746), (0.23654160340493391, 0.30032100823502828), (0.2007355329953654, 0.17193438029361319)])
IF you permute the indices of x, e.g., MI(X:Y) = 0
('samples used', [10, 25, 50, 100, 200])
('estimated MI', [0.032435448506589186, -0.027013576228861892, -0.0048799193000058135, 0.0023174460892350754, -0.0002141277047037321])
('95% conf int.\n', [(0.28988781354141774, 0.41434201574331025), (0.24203605944116111, 0.29849816049646066), (0.18081726377075832, 0.18040335534919902), (0.15879645329878422, 0.22733498191676946), (0.13263900209136867, 0.13325413690339941)])
Test of the discrete entropy estimators
For z = y xor x, w/x, y uniform random binary, we should get H(x)=H(y)=H(z) = 1, H(x:y) etc = 0, H(x:y|z) = 1
Traceback (most recent call last):
File "./test.py", line 116, in <module>
print("H(x), H(y), H(z)", ee.entropyd(x), ee.entropyd(y), ee.entropyd(z))
File "/home/kbriggs/Downloads/NPEET/entropy_estimators.py", line 114, in entropyd
return entropyfromprobs(hist(sx), base=base)
File "/home/kbriggs/Downloads/NPEET/entropy_estimators.py", line 149, in hist
sx = discretize(sx)
File "/home/kbriggs/Downloads/NPEET/entropy_estimators.py", line 280, in discretize
return [discretize_one(x) for x in xs]
File "/home/kbriggs/Downloads/NPEET/entropy_estimators.py", line 275, in discretize_one
if len(x) > 1:
TypeError: object of type 'int' has no len()
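The first traceback in this report is consistent with a Python 2 versus Python 3 difference: in Python 3, `map` returns a lazy iterator, which `np.mean` cannot average. A minimal sketch of two equivalent replacements follows. The `nn` values are made up for illustration and are not NPEET output.

```python
from math import log

import numpy as np

# Made-up neighbour distances, standing in for the `nn` variable in the traceback.
nn = [0.5, 1.0, 2.0]

# Option 1: materialise the logs in a list before averaging.
mean_log_list = np.mean([log(v) for v in nn])

# Option 2: let NumPy take the logarithm element-wise.
mean_log_vec = np.log(nn).mean()

print(mean_log_list, mean_log_vec)
```

Both forms behave the same under Python 2 and Python 3.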
I observed this error, ValueError: not enough values to unpack (expected 2, got 1), when I tried to calculate the mutual information between a continuous and a discrete variable.
Can anybody help me?
import npeet.entropy_estimators as ee
ee.micd(cont.iloc[:,1].values.tolist(),disc.iloc[:,[1]].values.tolist())
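The error message would fit a function that unpacks the input's shape into two values (samples, features) and receives a flat list. That reading is a guess from the message, not a statement about the library's source. The sketch below reproduces the situation with plain NumPy; `unpack_shape` is a hypothetical stand-in for whatever the estimator does internally.

```python
import numpy as np

col = np.array([0.1, 0.4, 0.35, 0.8])

flat = col.tolist()                     # like .iloc[:, 1]: shape (4,)
nested = col.reshape(-1, 1).tolist()    # like .iloc[:, [1]]: shape (4, 1)


def unpack_shape(x):
    """Hypothetical stand-in: expects a 2-D (samples, features) input."""
    n_samples, n_features = np.asarray(x).shape
    return n_samples, n_features


print(unpack_shape(nested))
try:
    unpack_shape(flat)
except ValueError as err:
    print("flat input fails:", err)
```

If that is the cause, selecting the continuous column with a list (`cont.iloc[:, [1]]`), as is already done for `disc`, keeps the data two-dimensional.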
Hello Prof. Greg Ver Steeg,
I want to compute the MI between two high-dimensional, continuous, time-varying signals. Their dimensions are 39 and 300. It seems that this toolbox is not suitable for that. Do you know if there is an easy way to measure the MI in this situation?
Hi,
Thanks for sharing your work. I want to use the continuous entropy estimator from your project in mine.
I have a matrix like this:
x = tf.Variable( [ [0.96, -0.65, 0.99, -0.1 ],
[0.97, 0.33, 0.25 , 0.05 ],
[0.9, 0.001, 0.009, 0.33 ],
[-0.60, -0.1, -0.3, -0.5 ],
[0.49, -0.8, -0.05, -0.0036],
[0.0 , -0.45, 0.087, 0.023 ],
[0.3, -0.23, 0.82, -0.28 ]])
When I apply `ee.entropy`, I receive this error:
rev = 1/ee.entropy(row)
File "/home/sgnbx/Downloads/NPEET/npeet/entropy_estimators.py", line 21, in entropy
assert k <= len(x) - 1, "Set k smaller than num. samples - 1"
TypeError: object of type 'Tensor' has no len()
This is my code:
import tensorflow as tf
import npeet.entropy_estimators as ee

def rev_entropy(x):
    def row_entropy(row):
        rev = 1/ee.entropy(row)
        return rev
    rev = tf.map_fn(row_entropy, x, dtype=tf.float32)
    return rev

x = tf.Variable( [ [0.96, -0.65, 0.99, -0.1 ],
[0.97, 0.33, 0.25 , 0.05 ],
[0.9, 0.001, 0.009, 0.33 ],
[-0.60, -0.1, -0.3, -0.5 ],
[0.49, -0.8, -0.05, -0.0036],
[0.0 , -0.45, 0.087, 0.023 ],
[0.3, -0.23, 0.82, -0.28 ]])
p = (x + tf.abs(x)) / 2
ent_p = rev_entropy(p)
print(ent_p)
Can you please explain how I can know which `k` to use here?
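The assertion in the traceback, `k <= len(x) - 1`, bounds `k` by the number of samples, and the `TypeError` says the function received a `Tensor` rather than something with a length. The sketch below uses plain NumPy on the same matrix to show the shapes involved. Treating each row as 4 one-dimensional samples is an assumption about the intent, and `max_k` is a hypothetical helper.

```python
import numpy as np

x = np.array([[0.96, -0.65, 0.99, -0.1],
              [0.97, 0.33, 0.25, 0.05],
              [0.9, 0.001, 0.009, 0.33],
              [-0.60, -0.1, -0.3, -0.5],
              [0.49, -0.8, -0.05, -0.0036],
              [0.0, -0.45, 0.087, 0.023],
              [0.3, -0.23, 0.82, -0.28]])

# Same arithmetic as the TensorFlow expression: negative entries become 0.
p = (x + np.abs(x)) / 2


def max_k(samples):
    """Largest k allowed by the assertion `k <= len(x) - 1` (hypothetical helper)."""
    return len(samples) - 1


for row in p:
    samples = row.reshape(-1, 1)  # 4 samples with 1 feature each
    print(samples.shape, "largest admissible k:", max_k(samples))
```

With only 4 values per row, `k` can be at most 3, and any nearest-neighbour estimate from so few points will be very noisy.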
Thank you for providing these entropy estimators as open source.
I am having difficulty understanding the units of the continuous entropy estimates produced by npeet. I wrote the code below to test this:
#!/usr/bin/env python3
from entropy_estimators import continuous as paulbrodersen
from npeet import entropy_estimators as npeet
from scipy import stats
import math
import pandas as pd
import numpy as np
uniform = stats.uniform(loc=0, scale=math.e) # Uniform distribution from 0 to e
cauchy = stats.cauchy(scale=0.01)
levy_stable = stats.levy_stable(alpha=2.0, beta=0.0, scale=0.01)
count = 5000
uniform_observations = uniform.rvs(size=count)
cauchy_observations = cauchy.rvs(size=count)
levy_stable_observations = levy_stable.rvs(size=count)
distributions = ["uniform to e", "cauchy", "levy stable"]
scipy_analytical = [uniform.entropy(), cauchy.entropy(), levy_stable.entropy()]
paulbrodersen_results = [paulbrodersen.get_h(uniform_observations, k=5),
paulbrodersen.get_h(cauchy_observations, k=5),
paulbrodersen.get_h(levy_stable_observations, k=5)]
npeet_results = [npeet.entropy(np.reshape(uniform_observations, [count, 1]), k=5),
npeet.entropy(np.reshape(cauchy_observations, [count, 1]), k=5),
npeet.entropy(np.reshape(levy_stable_observations, [count, 1]), k=5)]
results = pd.DataFrame({"distribution": distributions,
"scipy analytical": scipy_analytical,
"paulbrodersen": paulbrodersen_results,
"npeet": npeet_results})
print(results)
The result:
distribution scipy analytical paulbrodersen npeet
0 uniform to e 1.0 0.994836 1.435245
1 cauchy -2.0741459390188 -2.066179 -2.980866
2 levy stable -2.8396580625037564 -2.836570 -4.092305
paulbrodersen's library also implements the Kraskov differential entropy estimation technique using k-nearest neighbors. Notice that its estimates for the Levy stable and Cauchy distributions are very close to the analytical results from SciPy's differential entropy calculation.
A nat is defined as the information content of the uniform distribution on the interval [0, e]. You can see in the table above that paulbrodersen's implementation does produce an entropy estimate of ~1.0 for the uniform distribution on the interval 0 to e. Kraskov's paper mentions the following:
where “log” will always mean natural logarithm so that information is measured in natural units
indicating that the paper uses the natural log (base e), which produces values in nats.
However, npeet's estimates are very different from the expected values in nats. Is this library not producing values in nats? I attempted to convert values from bits to nats using the conversion factor 1 nat = 1 / log(2) bits, but this did not improve the comparison.
Any pointers would be very helpful.
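A quick check on the table suggests the npeet column is in bits: each npeet value divided by the corresponding paulbrodersen value is about 1.4427, which is 1 / ln 2. Since 1 nat = 1 / ln 2 bits, a value in bits is converted to nats by multiplying by ln 2. The sketch below only redoes that arithmetic on the numbers quoted above.

```python
import math

# Values copied from the results table above.
paulbrodersen = [0.994836, -2.066179, -2.836570]
npeet_values = [1.435245, -2.980866, -4.092305]

# bits -> nats: multiply by ln(2)
npeet_in_nats = [v * math.log(2) for v in npeet_values]

for ref, converted in zip(paulbrodersen, npeet_in_nats):
    print(f"{ref:+.6f}  {converted:+.6f}")
```

Other reports in this thread call the estimator with `base=np.exp(1)`, which would request nats directly.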
For some reason, running the 'setup.py' file only gives this error:
runfile('C:/Users/Yonatan/Documents/GitHub/NPEET/setup.py', wdir='C:/Users/Yonatan/Documents/GitHub/NPEET')
Reloaded modules: npeet, npeet.entropy_estimators
An exception has occurred, use %tb to see the full traceback.
Traceback (most recent call last):
File "C:\Users\Yonatan\anaconda3\lib\distutils\core.py", line 134, in setup
ok = dist.parse_command_line()
File "C:\Users\Yonatan\anaconda3\lib\site-packages\setuptools\dist.py", line 707, in parse_command_line
result = _Distribution.parse_command_line(self)
File "C:\Users\Yonatan\anaconda3\lib\distutils\dist.py", line 501, in parse_command_line
raise DistutilsArgError("no commands supplied")
DistutilsArgError: no commands supplied
During handling of the above exception, another exception occurred:
SystemExit: usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: no commands supplied
I tried to install from CMD too, with no response.
But for some reason that is not clear to me, I can use the library from the test.py file.
Help please! Does somebody have an explanation?
** The test file, for some reason, works perfectly...
Hi Greg,
I would like to compute MMI (aka Interaction Information) for 3 variables. There are several ways to do this by combining entropies and mutual information terms, for example
I(X:Y:Z) = I(X:Y) - I(X:Y|Z)
or
I(X:Y:Z) = H(X) + H(Y) + H(Z) - H(XY) - H(XZ) - H(YZ) + H(XYZ)
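Both expressions can be checked against each other on discrete data with a plug-in entropy. The sketch below is self-contained and does not use NPEET; `H`, `ii_from_mi` and `ii_from_entropies` are illustrative names. It uses the XOR example from the test output earlier in this thread, enumerated exactly, where both formulas give -1 bit under this sign convention.

```python
from collections import Counter
from itertools import product
from math import log2


def H(*columns):
    """Plug-in joint entropy (bits) of one or more discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum(c / n * log2(c / n) for c in Counter(joint).values())


def ii_from_mi(x, y, z):
    """I(X:Y:Z) = I(X:Y) - I(X:Y|Z)."""
    mi_xy = H(x) + H(y) - H(x, y)
    cmi_xy_z = H(x, z) + H(y, z) - H(x, y, z) - H(z)
    return mi_xy - cmi_xy_z


def ii_from_entropies(x, y, z):
    """I(X:Y:Z) = H(X)+H(Y)+H(Z) - H(XY)-H(XZ)-H(YZ) + H(XYZ)."""
    return (H(x) + H(y) + H(z)
            - H(x, y) - H(x, z) - H(y, z)
            + H(x, y, z))


# All four equally likely (x, y) pairs, with z = x xor y.
pairs = list(product([0, 1], repeat=2))
x = [a for a, _ in pairs]
y = [b for _, b in pairs]
z = [a ^ b for a, b in pairs]

print(ii_from_mi(x, y, z), ii_from_entropies(x, y, z))
```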
Hi Greg,
Many thanks for making available such great Python code!
I was wondering if you could provide suggestions on how to compute normalized mutual information for discrete and continuous data. I would expect the normalized version of mutual information to be in the range [0, 1].
Kind regards,
Ivan
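For discrete variables, one common normalization divides the mutual information by the geometric mean of the marginal entropies, which stays in [0, 1]. The sketch below is a plug-in implementation independent of NPEET; `nmi` is an illustrative name, and other normalizers (the minimum or the arithmetic mean of the entropies) are also in use. For continuous variables the differential entropy can be negative, so this particular ratio does not carry over directly.

```python
from collections import Counter
from math import log2, sqrt


def entropy_bits(values):
    """Plug-in entropy (bits) of a sequence of hashable symbols."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def nmi(x, y):
    """I(X;Y) / sqrt(H(X) H(Y)) for discrete sequences (illustrative)."""
    hx, hy = entropy_bits(x), entropy_bits(y)
    if hx == 0 or hy == 0:
        return 0.0  # a constant variable shares no information
    hxy = entropy_bits(list(zip(x, y)))
    return (hx + hy - hxy) / sqrt(hx * hy)


x = [0, 0, 1, 1, 2, 2]
y_indep = [0, 1, 0, 1, 0, 1]  # every (x, y) pair occurs exactly once

print(nmi(x, x), nmi(x, y_indep))
```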
Thanks for making this wonderful package.
I'm trying to compute the mutual information in high-dimension but the case I am interested in is exceptionally simple and hence there may be a faster method than using the built-in function.
Specifically, I have a function
The documentation has a comment which is rather suggestive but I confess I don't really understand what is being said:
"On the other hand, in high-dimensions, KDTree’s are not very good, and you might be reduced to the speed of brute force
If I have
However, this question about the documentation may not be relevant. Perhaps there is a more direct way of answering my primary question.
Thanks again for the wonderful code!
Dear Greg,
I am using npeet to estimate mutual information in a distributed least-squares problem, but I often get negative mutual information, even when using shuffle_test. Interestingly, even though most of the results are negative, the trend seems right. As shown in the attached figure, the blue line first increases and then converges, while the red line starts far from the blue line and then converges. This trend is what I expected, but I cannot explain the negative values. Do you have any idea about this? Thanks in advance.
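Small negative values are expected from k-nearest-neighbour MI estimators when the true mutual information is near zero: the estimate fluctuates around the true value rather than being clipped at zero, as the permuted-index results in the test output earlier in this thread also show. The sketch below is a brute-force NumPy version of the Kraskov-Stögbauer-Grassberger estimator (first variant), written for illustration. It is a stand-in under stated assumptions (max-norm distances, no ties, result in nats), not NPEET's implementation, and `ksg_mi` is a hypothetical name.

```python
import numpy as np


def ksg_mi(x, y, k=3):
    """Brute-force KSG mutual information estimate in nats (illustrative)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    # Pairwise max-norm distances in each marginal space and in the joint space.
    dx = np.abs(x[:, None, :] - x[None, :, :]).max(axis=2)
    dy = np.abs(y[:, None, :] - y[None, :, :]).max(axis=2)
    dz = np.maximum(dx, dy)
    np.fill_diagonal(dz, np.inf)
    eps = np.sort(dz, axis=1)[:, k - 1]  # distance to the k-th joint neighbour
    # Points strictly closer than eps in each marginal, excluding the point itself.
    nx = (dx < eps[:, None]).sum(axis=1) - 1
    ny = (dy < eps[:, None]).sum(axis=1) - 1
    # psi[m - 1] == digamma(m) for integer m, via harmonic numbers.
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n))))
    psi = -np.euler_gamma + harmonic
    return psi[k - 1] + psi[n - 1] - np.mean(psi[nx] + psi[ny])


rng = np.random.default_rng(0)
a = rng.uniform(0, 1, 400)
b = rng.uniform(0, 1, 400)           # independent of a: true MI is 0
g = rng.normal(0, 1, 400)
h = 0.9 * g + np.sqrt(1 - 0.81) * rng.normal(0, 1, 400)  # correlation 0.9

mi_indep = ksg_mi(a, b)
mi_dep = ksg_mi(g, h)
print("independent:", mi_indep, "correlated:", mi_dep)
```

Estimates that are negative by much more than the sampling noise would point to something other than this finite-sample effect.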
Hi Greg,
I want to use your package to study some neuroscience data. I am having a problem with a basic sanity check.
The entropy of a uniform distribution theoretically scales as a logarithm of its standard deviation. I would expect that the entropy of the distribution in range [0, 100] would be log(100) + const, whereas the entropy for the range [0, 1] would be log(1) + const. However, my test seems to show that the entropy computed by NPEET does not change with increasing standard deviation. Why is that?
Here is a minimal example:
import numpy as np
import matplotlib.pyplot as plt
from npeet.entropy_estimators import entropy
data = np.random.uniform(0, 1, 1000)
alphaLst = np.arange(1, 100)
hLst = [entropy(a * data[:, None]) for a in alphaLst]
plt.figure()
plt.plot(alphaLst, hLst)
plt.show()
If possible, I would really appreciate a suggestion soon, I kind of discovered this problem during a validation study, and I need to submit some results soon.
Thanks,
Aleksejs
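For reference, a nearest-neighbour entropy estimate built from the mean log neighbour distance should shift by exactly log(a) per dimension when the data are multiplied by a, because every pairwise distance scales by a. The sketch below checks only that distance property with brute-force NumPy. It does not call NPEET, so it cannot say why the plotted curve stayed flat, and `mean_log_knn_distance` is a hypothetical helper.

```python
import numpy as np


def mean_log_knn_distance(x, k=3):
    """Mean log distance to the k-th nearest neighbour (max-norm, brute force)."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None, :] - x[None, :, :]).max(axis=2)
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    kth = np.sort(dist, axis=1)[:, k - 1]
    return np.log(kth).mean()


rng = np.random.default_rng(0)
data = rng.uniform(0, 1, (300, 1))

shift = mean_log_knn_distance(100 * data) - mean_log_knn_distance(data)
print(shift, np.log(100))
```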
Great code, thanks.
Can it be used for
feature_selection.mutual_info_
https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html
and then for spectral_coclustering?
PS
Could you share a link to a simple code example
that helps to understand what
entropy estimation from k-nearest neighbors distances
is, please?
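As a simple illustration of entropy estimation from k-nearest-neighbour distances, here is a brute-force sketch of the Kozachenko-Leonenko estimator in NumPy. It is written for readability under stated assumptions (max-norm distances, no ties in the data, result in nats) and is not NPEET's code; `knn_entropy` is a hypothetical name.

```python
import numpy as np


def knn_entropy(x, k=3):
    """Kozachenko-Leonenko differential entropy estimate in nats (illustrative)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    n, d = x.shape
    # Pairwise max-norm distances; the unit ball then has volume 2**d.
    dist = np.abs(x[:, None, :] - x[None, :, :]).max(axis=2)
    np.fill_diagonal(dist, np.inf)
    eps = np.sort(dist, axis=1)[:, k - 1]  # distance to the k-th neighbour
    # psi[m - 1] == digamma(m) for integer m, via harmonic numbers.
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n))))
    psi = -np.euler_gamma + harmonic
    return psi[n - 1] - psi[k - 1] + d * np.log(2.0) + d * np.log(eps).mean()


rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=(1000, 1))

estimate = knn_entropy(sample, k=3)
exact = 0.5 * np.log(2 * np.pi * np.e)  # entropy of a standard normal, in nats
print("estimate:", estimate, "exact:", exact)
```

The idea is that sparse regions have large neighbour distances and dense regions small ones, so the average log distance tracks the log density.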
Hi there!
Thanks for making this package available to us. I was wondering how it would be possible to compute the conditional mutual information of features X and Y conditioned on a set of features S (note that the set can have one or more features).
If it is possible, how would you proceed? Can this workflow be applied to continuous data (all features) and to discrete data (all features)?
Many thanks for your help,
Ivan
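For discrete data, conditioning on a set S can be handled by treating the columns of S jointly, using I(X;Y|S) = H(X,S) + H(Y,S) - H(X,Y,S) - H(S). The sketch below is a plug-in implementation independent of NPEET; `cond_mi` is an illustrative name and `s_columns` is a list with one or more conditioning features.

```python
from collections import Counter
from itertools import product
from math import log2


def H(*columns):
    """Plug-in joint entropy (bits) of one or more discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum(c / n * log2(c / n) for c in Counter(joint).values())


def cond_mi(x, y, s_columns):
    """I(X;Y|S) in bits, where S is a list of discrete columns (illustrative)."""
    return (H(x, *s_columns) + H(y, *s_columns)
            - H(x, y, *s_columns) - H(*s_columns))


# Enumerate all equally likely (x, s1, s2) triples and set y = x xor s1.
rows = list(product([0, 1], repeat=3))
x = [r[0] for r in rows]
s1 = [r[1] for r in rows]
s2 = [r[2] for r in rows]
y = [a ^ b for a, b in zip(x, s1)]

print(cond_mi(x, y, [s1, s2]))  # conditioning on the set {s1, s2}
print(cond_mi(x, y, [s2]))      # conditioning on s2 alone
```

For continuous features the same decomposition holds in terms of differential entropies, with the conditioning features stacked column-wise; whether a given estimator accepts that directly depends on its API.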
Hi, I noticed that the cmi function can also return a negative value. Can you give me some tips for obtaining the correct result? Thanks
Dear Greg,
I think I have found a bug in the code, unless I am doing something seriously wrong. Consider this minimal example
import numpy as np
import npeet.entropy_estimators as ee
x = np.random.normal(0,1,10000)
y = np.random.normal(0,1,10000)
xy = np.array([x,y]).T
entrTrue1D = 0.5*(1 + np.log(2*np.pi))
print('H(X) =', ee.entropy(x[:, None], base=np.exp(1), k=3), 'expected', entrTrue1D)
print('H(Y) =', ee.entropy(y[:, None], base=np.exp(1), k=3), 'expected', entrTrue1D)
print('H(XY) =', ee.entropy(xy, base=np.exp(1), k=3), 'expected', 2*entrTrue1D)
print('I(X:X) =', ee.mi(x[:, None], x[:, None], base=np.exp(1), k=3), 'expected', entrTrue1D)
print('I(XY:XY) =', ee.mi(xy, xy, base=np.exp(1), k=3), 'expected', 2*entrTrue1D)
The output is as follows:
H(X) = 1.4081517115316977 expected 1.4189385332046727
H(Y) = 1.3950320463484136 expected 1.4189385332046727
H(XY) = 2.794510292787968 expected 2.8378770664093453
I(X:X) = 7.95417270271105 expected 1.4189385332046727
I(XY:XY) = 7.954172702711051 expected 2.8378770664093453
Problems:
Could you please tell me what is going on and, if possible, how I can fix it?
Best regards,
Aleksejs
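Two points may help here. First, for a continuous variable I(X:X) is not H(X): X determines itself exactly, so the true mutual information is infinite, and any finite-sample estimate is capped by the sample size. Second, the two reported values coincide numerically with ψ(N) - ψ(k+1) for N = 10000 and k = 3, where ψ is the digamma function. That match is an observation from the numbers, not something read from the library source. The sketch checks the arithmetic; `digamma_int` is a hypothetical helper that is exact for positive integers.

```python
import numpy as np


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -np.euler_gamma + np.sum(1.0 / np.arange(1, n))


def identical_input_value(n_samples, k):
    """psi(N) - psi(k + 1), in nats."""
    return digamma_int(n_samples) - digamma_int(k + 1)


# Compare with the reported I(X:X) and I(XY:XY) of about 7.954 nats.
value_10000 = identical_input_value(10000, 3)
print(value_10000)

# A later report in this thread has N = 5 and prints 0.3607; assuming the
# defaults there are k = 3 and base 2, the same expression in bits is:
value_5_bits = identical_input_value(5, 3) / np.log(2.0)
print(value_5_bits)
```

If this reading is right, the output for identical inputs depends only on N and k and says nothing about the entropy of X.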
Hello,
I am using your package to calculate the entropy of a continuous variable; however, the entropy value I got is a negative number. I also tried centropy(x,x), the conditional entropy of x given itself. The result is supposed to be zero. However, the returned results are sometimes positive and sometimes negative, but not close to zero. Could you help me explain the issue? For the discrete case, the result looks fine.
Thanks
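Unlike discrete entropy, differential entropy can be negative: a uniform density on an interval of width w has entropy log(w), which is below zero whenever w < 1, and a Gaussian has 0.5 log(2πe σ²), which is negative for small σ. Likewise, the conditional differential entropy of a continuous variable given itself is minus infinity in theory rather than zero, so finite estimates of `centropy(x, x)` need not be near zero. The sketch evaluates the two closed-form expressions; no estimator is involved and the function names are illustrative.

```python
import math


def uniform_entropy(width):
    """Differential entropy (nats) of a uniform density on an interval of this width."""
    return math.log(width)


def gaussian_entropy(sigma):
    """Differential entropy (nats) of a normal density with standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


for w in (0.5, 1.0, math.e):
    print("uniform width", w, "->", uniform_entropy(w))
for s in (0.1, 1.0):
    print("gaussian sigma", s, "->", gaussian_entropy(s))
```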
Hi there! Thank you for making and sharing such a useful package.
I've noticed that estimation of the Jensen-Shannon divergence is currently not supported in this package. Do you have any plans to add it in the future? If not, is there a workaround that uses the currently supported functions to compute the Jensen-Shannon divergence? Thanks a lot!
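For discrete distributions, the Jensen-Shannon divergence can be written purely in terms of entropies: JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2 with M = (P + Q) / 2. It also equals the mutual information between a sample drawn from the equal mixture and the binary label saying which distribution it came from, which suggests one possible workaround with an MI estimator: pool the two samples and estimate the MI against the label. The sketch implements only the discrete formula in NumPy; `jsd_bits` is an illustrative name, not an NPEET function.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy (bits) of a probability vector; zero entries are skipped."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())


def jsd_bits(p, q):
    """Jensen-Shannon divergence (bits) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return entropy_bits(m) - 0.5 * (entropy_bits(p) + entropy_bits(q))


print(jsd_bits([0.5, 0.5], [0.5, 0.5]))  # identical distributions
print(jsd_bits([1.0, 0.0], [0.0, 1.0]))  # disjoint supports
```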
Dear Greg,
I am getting strange behaviour in the estimated mutual information. I want to check how much the mutual information I(aX, Y) depends on a positive scalar factor a. To the best of my knowledge, analytically the mutual information should be completely independent of the scalar factor. However, when I try to estimate it with NPEET, the mutual information decreases significantly with increasing a. Can you comment on this, please?
Here is the minimal example:
import numpy as np
import matplotlib.pyplot as plt
from npeet.entropy_estimators import mi
x = np.random.uniform(0,1,(1000,1))
y = np.random.uniform(0,1,(1000,1))
z = 0.5*x + 0.5*y
alphaLst = np.arange(1, 100)
miLst = [mi(a*x, z) for a in alphaLst]
plt.figure()
plt.plot(alphaLst, miLst)
plt.show()
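Mutual information is indeed invariant under rescaling of either variable in theory, but a nearest-neighbour estimator that measures distances in the joint space sees a different neighbourhood geometry when one coordinate is stretched, so its finite-sample estimate can drift. A common mitigation (stated here as a general practice, not as NPEET's documented recommendation) is to standardize each variable before estimation, which removes the dependence on a by construction. `standardize` is a hypothetical helper.

```python
import numpy as np


def standardize(v):
    """Rescale each column to zero mean and unit standard deviation."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean(axis=0)) / v.std(axis=0)


rng = np.random.default_rng(0)
x = rng.uniform(0, 1, (1000, 1))

# After standardization, a * x is numerically the same input for every a > 0.
for a in (1, 10, 99):
    print(a, np.allclose(standardize(a * x), standardize(x)))
```

Any estimator applied to `standardize(a * x)` and a standardized second variable then receives the same input regardless of a.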
Hi, following your example, I tried to test whether the mutual information estimate makes sense. But there seems to be something wrong with the mutual information estimator, because it is very strange that I(x;x) and h(x) are not the same.
Do you have any idea of this?
x = [[1.3],[3.7],[5.1],[2.4],[3.4]]
y = [[1.5],[3.32],[5.3],[2.3],[3.3]]
ee.mi(x,x)
Out[182]: 0.36067376022224085
ee.entropy(x)
Out[183]: 2.706665509186988
ee.entropy(y)
Out[184]: 2.6794531992583743
ee.mi(y,y)
Out[185]: 0.36067376022224085