Currently, datashader's scatterplot/heatmap approach for points data partitions the se

Example of plotting points with associated probabilities (datashader issue, open, 12 comments)

holoviz commented on August 20, 2024

Comments (12)

ianthomas23 commented on August 20, 2024 2

@naavis If you look at the contents of agg when you are using your zero weights, you will see that it contains two values, 0 and np.nan. Zeros correspond to pixels where you have data points that have a weight of zero, and np.nan to pixels where there are no data points. If there is only a single finite data value in agg, it is mapped to the top end of the cmap, hence white.

Secondly, your combination of ds.tf.shade() and plt.imshow() is almost certainly not doing what you want. ds.tf.shade() outputs a 200x200 array containing RGBA values that are encoded into uint32, and if you pass an MxN array to imshow it will treat it as scalar data and apply a colormap. Hence you are applying a colormap twice. For debugging purposes, I recommend replacing your matplotlib code with a call to ds.utils.export_image(), and it should all be easier to understand.
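To see why an MxN uint32 array is read as scalar data, the packing can be reproduced with NumPy alone. This is an illustrative sketch, not datashader's code: it assumes the four bytes of each uint32 are stored in R, G, B, A order in memory, and the array names are hypothetical.

```python
import numpy as np

# A tiny 2x3 RGBA image: half-transparent red everywhere.
rgba = np.zeros((2, 3, 4), dtype=np.uint8)
rgba[..., 0] = 255  # red channel
rgba[..., 3] = 128  # alpha channel

# Pack the four uint8 channels of each pixel into one uint32.
# The result is a plain 2-D array, which imshow would treat as scalar data.
packed = rgba.view(np.uint32).reshape(2, 3)

# Unpack again into an M x N x 4 array, which imshow treats as RGBA.
unpacked = packed.view(np.uint8).reshape(2, 3, 4)

print(packed.shape, packed.dtype)
print(unpacked.shape, unpacked.dtype)
```

Under the same assumption, applying the unpacking step to the values of a shaded image and passing the M x N x 4 result to plt.imshow without a cmap would avoid the second colormapping.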

Anyway, this is really a usage question and should have been posted to https://discourse.holoviz.org/ rather than being appended to a 6-year-old GitHub issue. If you have further questions about this, could you please ask on the Discourse instead? Thanks!

Nithanaroy commented on August 20, 2024 1

In the comment above, tf.interpolate is deprecated. The new code would be:

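import datashader as ds
from datashader import transfer_functions as tf
# Assumes df, x_range, and y_range are already defined.
cvs = ds.Canvas(plot_width=800, plot_height=500, x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'X', 'Y', ds.mean('VAL'))
img = tf.shade(agg, cmap=["white", 'darkblue'], how='linear')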

naavis commented on August 20, 2024 1

Thanks, and sorry. This GitHub issue was the only place I found that mentions using data-point-specific weights/probabilities with Datashader. The documentation isn't exactly abundant on this:
https://datashader.org/user_guide/Points.html
https://datashader.org/api.html#definitions

I was not aware of the Discourse page. I'll post any further thoughts there.

thoth291 commented on August 20, 2024

Thank you, @jbednar.
Two questions.

First:
Will this feature help to crossplot data like this:
X Y VAL
1 1 0.2
2 1 0.3
...
1 2 0.3
2 2 0.4
....
5 5 1.0

where each (X, Y) pair has a unique value VAL,
and the result is a scatter plot of these points colored by some mapping of VAL to RGB?

Basically the equivalent of

df.plot(kind='scatter', x='X', y='Y', c='VAL', s=50);

Second:
Is there (or will there be) any way to define the size of the points in datashader?

Thanks!

jbednar commented on August 20, 2024

We're working on making point sizing more flexible and automatic, and on properly documenting how to do it, but in the meantime you can apply the tf.spread function to your final image, as shown in this notebook:
https://gist.github.com/jcrist/62b366727886561356d8

The code is already available for the application you describe above; just pass the field you want to the appropriate aggregation function:

cvs = ds.Canvas(plot_width=800, plot_height=500, x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'X', 'Y', ds.mean('VAL'))
img = tf.interpolate(agg, low="white", high='darkblue', how='linear')

where mean tells datashader that you want to average the VAL of all points falling into that pixel; you could instead take the max, median, etc.

thoth291 commented on August 20, 2024

Thanks, @jbednar.
I was able to colorize my plot - thanks for the example! It was quite easy, and my understanding of datashader is now more solid!
But it looks like tf.spread is not available (version from conda -c conda) - I guess I need to use the GitHub version instead...

jbednar commented on August 20, 2024

Oops, yes -- spread requires the GitHub master version.

thoth291 commented on August 20, 2024

Thanks,

When I run

import datashader as ds

I get this error:

OSError: [Errno 13] Permission denied: '/opt/dist/anaconda/lib/python2.7/site-packages/datashader-0.1.0-py2.7.egg/datashader/__pycache__'

DatashaderImportError.txt
The reason is that I installed this package as a system admin, but I run it as my regular user.
Is there any way to prevent file creation like that in your library? Or at least to isolate it so that one user does not affect another?

The version from conda -c conda never had this problem.

For now, I just gave rwx permissions to all users on the datashader directory, and it seems to work.
Other than that - all the features are perfect! Thank you!

P.S. I'm curious whether, by design of the spread API, shape + px = mask. If so, why not generalize the shape parameter to accept numpy masks and ignore px in that case? Or, even better, somehow scale the mask based on px? I'm just curious, not demanding anything :-)

jbednar commented on August 20, 2024

I don't think that issues with __pycache__ would be due to datashader per se, as we don't access that directly ourselves (though it looks like the separate numba library that we use does access it). So I'd assume that there's a different way to install it that would avoid permissions errors, but I don't know how you originally installed it, and thus what change to suggest.

For the shape, we often want to specify a circular mask at different radius values, which the px argument makes easy to do; it would be painful to make a new mask for every px value we wanted to try. Yes, scaling the mask based on the px value would be handy, but there are lots of ways to scale matrices, and so we'd rather leave that up to the user to do based on any of the many libraries available for that.
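To illustrate the relationship between a shape, a px radius, and the resulting mask, here is a NumPy-only sketch. The helper circular_mask is hypothetical and is not datashader's implementation; it marks every cell whose center lies within px pixels of the mask center.

```python
import numpy as np

def circular_mask(px):
    """Hypothetical helper: boolean (2*px+1) x (2*px+1) mask of a filled circle."""
    offsets = np.arange(-px, px + 1)
    # Distance of every cell from the central cell.
    dist = np.sqrt(offsets[:, None] ** 2 + offsets[None, :] ** 2)
    return dist <= px

# One mask per radius, generated on demand instead of built by hand.
masks = {px: circular_mask(px) for px in (1, 2, 3)}
for px, mask in masks.items():
    print(px, mask.shape, int(mask.sum()))
```

Generating the mask from a radius is what makes it cheap to try several px values, whereas a user-supplied mask would have to be rebuilt or rescaled for each size.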

jcrist commented on August 20, 2024

The reason is that I installed this package as a system admin, but I run it as my regular user.
Is there any way to prevent file creation like that in your library? Or at least to isolate it so that one user does not affect another?

We started caching code compilation in numba, which writes a cache file on first import. I've filed an issue, see numba/numba#1771.

For now, try running python -c "import datashader" with admin privileges after install. This should cause the compilation to happen once (and you have permission to write those files). Subsequent imports should only read the cache, which should be fine.

thoth291 commented on August 20, 2024

That all makes sense!
Thank you for the ticket at numba - I'll watch it.

naavis commented on August 20, 2024

Hi! I have been trying to use this method for plotting data points with associated probabilities/weights, but bumped into something I do not understand. If I pass all-zero values in the column used as the weighting factor, I expect the image to become empty. Yet it does not! Is it a bug, or am I misunderstanding something?

Below is minimal code to reproduce it with datashader 0.13.0:

import datashader as ds
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

num_datapoints = 1000
xs = 200 * np.random.rand(num_datapoints)
ys = 200 * np.random.rand(num_datapoints)
weights = np.random.rand(num_datapoints)
# Uncommenting the line below should probably
# result in a black image, yet it doesn't?
# weights = np.zeros((num_datapoints,))

df = pd.DataFrame(np.array([xs, ys, weights]).T, columns=['x', 'y', 'weight'])
cvs = ds.Canvas(plot_width=200, plot_height=200, x_range=(0, 200), y_range=(0, 200))
agg = cvs.points(df, 'x', 'y', ds.sum('weight'))
img = ds.tf.shade(agg, cmap='white')

plt.imshow(img, origin='lower', cmap='gray')
plt.show()

And below is what I see if I uncomment the line that sets all the weights to zero.

In my other work the outputs of cvs.points(df, 'x', 'y', ds.sum('weight')) and a Matplotlib scatter plot with the weights used as colors or sizes look very different at the moment, so maybe I'm misunderstanding how it is supposed to work in Datashader. I assume using the ds.sum('weight') aggregator would make the brightness of each bin/pixel equal to the sum of the weights for data points that land in that bin.
