Coder Social home page Coder Social logo

chenyangkang / stemflow Goto Github PK

View Code? Open in Web Editor NEW
13.0 3.0 1.0 432.76 MB

A Python Package for Adaptive Spatio-Temporal Exploratory Model (AdaSTEM)

Home Page: https://chenyangkang.github.io/stemflow/

License: MIT License

Python 100.00%
bird-migration geospatial machine-learning spatio-temporal-analysis species-distribution-modeling spatiotemporal adaptive-spatio-temporal-exploratory-model biodiversity biogeography ebird

stemflow's Introduction

stemflow 🐦

stemflow logo

A Python Package for Adaptive Spatio-Temporal Exploratory Model (AdaSTEM)

GitHub Anaconda version PyPI version Downloads Downloads GitHub last commit codecov status


Documentation πŸ“–

stemflow Documentation

JOSS paper


Installation πŸ”§

pip install stemflow

To install the latest beta version from github:

pip install pip@git+https://github.com/chenyangkang/stemflow.git

Or using conda:

conda install -c conda-forge stemflow

Brief introduction ℹ️

stemflow is a toolkit for Adaptive Spatio-Temporal Exploratory Model (AdaSTEM [1, 2]) in Python. Typical usage is daily abundance estimation using eBird citizen science data (survey data).

stemflow adopts "split-apply-combine" philosophy. It

  1. Splits input data using Quadtree or Sphere Quadtree.
  2. Trains each spatiotemporal split (called stixel) separately.
  3. Aggregates the ensemble to make the prediction.

The framework leverages the "adjacency" information of surroundings in space and time to model/predict the values of target spatiotemporal points. This framework ameliorates the long-distance/long-range prediction problem [3], and has a good spatiotemporal smoothing effect.

For more information, please see an introduction to stemflow and learning curve analysis


Model and data 🎰

Main functionality of stemflow Supported indexing Supported tasks
βœ… Spatiotemporal modeling & prediction
βœ… User-defined 2D spatial indexing (CRS)
βœ… Binary classification task
βœ… Calculate overall feature importances
βœ… 3D spherical indexing
βœ… Regression task
βœ… Plot spatiotemporal dynamics
βœ… User-defined temporal indexing
βœ… Hurdle task (two step regression – classify then regress the non-zero part)
βœ… Spatial-only modeling
For details see AdaSTEM Demo For details and tips see Tips for spatiotemporal indexing For details and tips see Tips for different tasks
Supported data types Supported base models
βœ… Both continuous and categorical features (prefer one-hot encoding)
βœ… sklearn style BaseEstimator classes (you can make your own base model), for example here
βœ… Both static (e.g., yearly mean temperature) and dynamic features (e.g., daily temperature)
βœ… sklearn style Maxent model. Example here.
For details and tips see Tips for data types For details see Base model choices

Usage ⭐

Use Hurdle model as the base model of AdaSTEMRegressor:

from stemflow.model.AdaSTEM import AdaSTEM, AdaSTEMClassifier, AdaSTEMRegressor
from stemflow.model.Hurdle import Hurdle
from xgboost import XGBClassifier, XGBRegressor

## "hurdle in Ada"
model = AdaSTEMRegressor(
    base_model=Hurdle(
        classifier=XGBClassifier(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1),
        regressor=XGBRegressor(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1)
    ),                                      # hurdel model for zero-inflated problem (e.g., count)
    save_gridding_plot = True,
    ensemble_fold=10,                       # data are modeled 10 times, each time with jitter and rotation in Quadtree algo
    min_ensemble_required=7,                # Only points covered by > 7 ensembles will be predicted
    grid_len_upper_threshold=25,            # force splitting if the grid length exceeds 25
    grid_len_lower_threshold=5,             # stop splitting if the grid length fall short 5         
    temporal_start=1,                       # The next 4 params define the temporal sliding window
    temporal_end=366,                            
    temporal_step=20,                       # The window takes steps of 20 DOY (see AdaSTEM demo for details)
    temporal_bin_interval=50,               # Each window will contain data of 50 DOY
    points_lower_threshold=50,              # Only stixels with more than 50 samples are trained and used for prediction
    Spatio1='longitude',                    # The next three params define the name of 
    Spatio2='latitude',                     # spatial coordinates shown in the dataframe
    Temporal1='DOY',
    use_temporal_to_train=True,             # In each stixel, whether 'DOY' should be a predictor
    njobs=1
)

Fitting and prediction methods follow the style of sklearn BaseEstimator class:

## fit
model = model.fit(X_train.reset_index(drop=True), y_train)

## predict
pred = model.predict(X_test)
pred = np.where(pred<0, 0, pred)
eval_metrics = AdaSTEM.eval_STEM_res('hurdle',y_test, pred_mean)
print(eval_metrics)

Where the pred is the mean of the predicted values across ensembles.

See AdaSTEM demo for further functionality.
See Optimizing stixel size for why and how you should tune the important gridding parameters.


Plot QuadTree ensembles 🌲

model.gridding_plot
# Here, the model is a AdaSTEM class, not a hurdle class

QuadTree example

Here, each color shows an ensemble generated during model fitting. In each of the 10 ensembles, regions (in terms of space and time) with more training samples were gridded into finer resolution, while the sparse one remained coarse. Prediction results were aggregated across the ensembles (that is, in this example, data were modeled 10 times).

If you use SphereAdaSTEM module, the gridding plot is a plotly generated interactive object by default:

See SphereAdaSTEM demo and Interactive spherical gridding plot.


Example of visualization πŸ—ΊοΈ

Daily Abundance Map of Barn Swallow

GIF visualization

See section AdaSTEM demo for how to generate this GIF.


Citation

Chen et al., (2024). stemflow: A Python Package for Adaptive Spatio-Temporal Exploratory Model. Journal of Open Source Software, 9(94), 6158, https://doi.org/10.21105/joss.06158

@article{Chen2024, 
  doi = {10.21105/joss.06158}, 
  url = {https://doi.org/10.21105/joss.06158}, 
  year = {2024}, 
  publisher = {The Open Journal}, 
  volume = {9}, 
  number = {94}, 
  pages = {6158}, 
  author = {Yangkang Chen and Zhongru Gu and Xiangjiang Zhan}, 
  title = {stemflow: A Python Package for Adaptive Spatio-Temporal Exploratory Model}, 
  journal = {Journal of Open Source Software} 
}

Contribute to stemflow πŸ’œ

We welcome pull requests. Contributors should follow contributor guidelines.

Application-level cooperation is also welcomed. We recognized that stemflow may consume large computational resources especially as data volume boosts in the future. We always welcome research collaboration of all kinds.


References:

  1. Fink, D., Damoulas, T., & Dave, J. (2013, June). Adaptive Spatio-Temporal Exploratory Models: Hemisphere-wide species distributions from massively crowdsourced eBird data. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 27, No. 1, pp. 1284-1290).

  2. Fink, D., Auer, T., Johnston, A., Ruiz‐Gutierrez, V., Hochachka, W. M., & Kelling, S. (2020). Modeling avian full annual cycle distribution and population trends with citizen science data. Ecological Applications, 30(3), e02056.

  3. Fink, D., Hochachka, W. M., Zuckerberg, B., Winkler, D. W., Shaby, B., Munson, M. A., ... & Kelling, S. (2010). Spatiotemporal exploratory models for broad‐scale survey data. Ecological Applications, 20(8), 2131-2147.

  4. Johnston, A., Fink, D., Reynolds, M. D., Hochachka, W. M., Sullivan, B. L., Bruns, N. E., ... & Kelling, S. (2015). Abundance models improve spatial and temporal prioritization of conservation resources. Ecological Applications, 25(7), 1749-1756.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.