Note that this is basically calling dask's set_index with the calculated distance:
ddf_shuffled = ddf.set_index(ddf.geometry.hilbert_distance())
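To make the idea concrete, here is a minimal, stdlib-only sketch of what sorting by Hilbert distance does conceptually. It is not the dask-geopandas implementation. It assumes the coordinates have already been scaled to an integer grid of side n (a power of two), uses the classic iterative xy2d algorithm, and mimics set_index by sorting on the distance and cutting the result into equally sized partitions. The helper names `xy2d` and `hilbert_partitions` are hypothetical.

```python
def xy2d(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve on an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the current quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_partitions(points, n, npartitions):
    """Sort integer grid points by Hilbert distance and split them into contiguous chunks."""
    ordered = sorted(points, key=lambda p: xy2d(n, p[0], p[1]))
    size = -(-len(ordered) // npartitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


print(hilbert_partitions([(3, 0), (0, 0), (1, 1), (0, 1)], n=4, npartitions=2))
# [[(0, 0), (1, 1)], [(0, 1), (3, 0)]]
```

Points that are close on the curve are close in space, so each contiguous chunk of the sorted order tends to cover a compact region, which is what makes per-partition spatial bounds useful.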
But one thing we probably also want during that process is to calculate the spatial bounds of each partition, so that the resulting ddf_shuffled.spatial_partitions is set.
I am not fully sure whether we can do that more efficiently by integrating it tightly into set_index (which could mean copying some of the dask implementation in order to adapt it), or whether calling ddf.calculate_spatial_partitions() afterwards is just as good.
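As a rough illustration of the second option (computing the bounds after the shuffle), the sketch below takes partitions that are already sorted and computes one bounding box per partition in a single extra pass. It is stdlib-only, represents each geometry by a point, and uses a plain (minx, miny, maxx, maxy) box, whereas spatial_partitions in dask-geopandas holds one geometry per partition; what dask-geopandas computes for that geometry is not asserted here. The function name `partition_bounds` is hypothetical.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]


def partition_bounds(partitions: Iterable[Sequence[Point]]) -> List[Optional[BBox]]:
    """Return (minx, miny, maxx, maxy) per partition, or None for an empty partition."""
    bounds: List[Optional[BBox]] = []
    for part in partitions:
        if not part:
            bounds.append(None)
            continue
        xs = [x for x, _ in part]
        ys = [y for _, y in part]
        bounds.append((min(xs), min(ys), max(xs), max(ys)))
    return bounds


print(partition_bounds([[(0, 0), (1, 2)], [(3, 1), (5, 5)]]))
# [(0, 0, 1, 2), (3, 1, 5, 5)]
```

Each box depends only on its own partition, so this step is a simple per-partition map with no further data movement between partitions.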
I don't think we can make it any faster than calling one after the other.
One idea from the meeting: we could add a keyword that controls whether the calculated distance is actually kept as the index or dropped.
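A minimal sketch of such a keyword, stdlib-only and independent of the real API: rows are (index, x, y) tuples, `distance` is any callable returning the sort key (for example a Hilbert distance), and the hypothetical `keep_distance` flag decides whether the distance replaces the index (as set_index would do) or is used only for ordering while the original index is kept. The function name `sort_by_distance` is also hypothetical.

```python
from typing import Callable, List, Tuple

Row = Tuple[object, float, float]  # (original index, x, y)


def sort_by_distance(rows: List[Row], distance: Callable[[float, float], float],
                     keep_distance: bool = True) -> List[tuple]:
    """Order rows by distance(x, y); optionally make the distance the new index."""
    # Sort on the distance only; the sort is stable, so ties keep their input order.
    keyed = sorted(((distance(x, y), idx, x, y) for idx, x, y in rows), key=lambda t: t[0])
    if keep_distance:
        # Like set_index: the distance becomes the index and the old index is dropped.
        return [(d, x, y) for d, _, x, y in keyed]
    # Distance used only for ordering; the original index is preserved.
    return [(idx, x, y) for _, idx, x, y in keyed]


rows = [("a", 2.0, 0.0), ("b", 0.0, 1.0)]
print(sort_by_distance(rows, lambda x, y: x + y))                       # [(1.0, 0.0, 1.0), (2.0, 2.0, 0.0)]
print(sort_by_distance(rows, lambda x, y: x + y, keep_distance=False))  # [('b', 0.0, 1.0), ('a', 2.0, 0.0)]
```

Either way the row order is the same; the flag only decides which values end up labelling the rows.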
This might be worth having a separate issue about, but one of the ideas mentioned in a previous meeting is to also have a way to shuffle two dataframes at the same time with a consistent partitioning (or shuffle one dataframe to align with the partitioning of another).
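One way to picture the second idea (shuffling one dataframe to align with the partitioning of another) is to reuse the first dataframe's divisions, i.e. the sorted distance boundaries between partitions, and route every row of the second dataframe into the partition whose range contains its distance. The sketch below assumes the distances were computed for both inputs with the same curve and the same total bounds; otherwise the two sets of distances are not comparable. The function names are hypothetical and the boundary selection is only similar in spirit to dask's divisions.

```python
import bisect
from typing import List, Sequence


def divisions_from(distances: Sequence[int], npartitions: int) -> List[int]:
    """Pick npartitions + 1 boundaries from the sorted distances of the reference dataset."""
    ds = sorted(distances)
    step = len(ds) / npartitions
    inner = [ds[int(i * step)] for i in range(1, npartitions)]
    return [ds[0]] + inner + [ds[-1]]


def assign_partition(distance: int, divisions: Sequence[int]) -> int:
    """Partition whose [lower, upper) range holds distance; out-of-range values are clamped."""
    i = bisect.bisect_right(divisions, distance) - 1
    return min(max(i, 0), len(divisions) - 2)


def align_to(divisions: Sequence[int], distances: Sequence[int]) -> List[List[int]]:
    """Group a second dataset's distances into the partitions defined by the first one's divisions."""
    parts: List[List[int]] = [[] for _ in range(len(divisions) - 1)]
    for d in distances:
        parts[assign_partition(d, divisions)].append(d)
    return parts


divs = divisions_from([0, 3, 5, 8, 10, 12], npartitions=2)
print(divs)                          # [0, 8, 12]
print(align_to(divs, [1, 8, 15, 7]))  # [[1, 7], [8, 15]]
```

Because both datasets then share the same boundaries, partition i of one covers the same stretch of the curve as partition i of the other, which is what a partition-wise operation between them would need.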