
Targets extensions for geospatial data

Home Page: https://njtierney.github.io/geotargets/

License: Other

geospatial pipeline r r-package r-targetopia raster reproducibility reproducible-research rstats targets


geotargets

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

geotargets extends targets to work with geospatial data formats, such as rasters and vectors (e.g., shapefiles). Currently, we support raster and vector formats via the terra package.

Installation

You can install the development version of geotargets like so:

install.packages("geotargets", repos = c("https://njtierney.r-universe.dev", "https://cran.r-project.org"))

Why geotargets

If you want to use geospatial data formats (such as those provided by terra) with the targets package to build reproducible analytic pipelines, you would normally need to write a lot of custom targets wrappers. We wrote geotargets so you can use geospatial data formats with targets directly.

To provide more detail: a common problem when using popular libraries like terra with targets is running into errors on read and write. Because of limitations in the underlying C++ implementation of the terra library, these objects must be written and read in specific ways. See ?terra for details. geotargets handles these write and read steps, so you don't have to worry about them and can use targets as usual.
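The underlying issue can be reproduced outside of targets: a SpatRaster is a thin R wrapper around a C++ object, so ordinary R serialization saves only a dead external pointer. terra's wrap() and unwrap() convert to and from a serializable form, which is roughly the step that geotargets automates. A minimal sketch:

```r
library(terra)

r <- rast(system.file("ex", "elev.tif", package = "terra"))
f <- tempfile(fileext = ".rds")

# saveRDS(r, f); readRDS(f)  # broken: the reloaded object holds a dead
#                            # external pointer, triggering "NULL value
#                            # passed as symbol address" on first use

saveRDS(wrap(r), f)       # wrap() returns a serializable PackedSpatRaster
r2 <- unwrap(readRDS(f))  # unwrap() restores a live SpatRaster
```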

In essence, if you’ve ever come across the error:

Error in .External(list(name = "CppMethod__invoke_notvoid", address = <pointer: 0x0>,  : 
  NULL value passed as symbol address

or

Error: external pointer is not valid

when trying to read a geospatial raster or vector in targets, then geotargets is for you :)

Examples

We currently provide support for the terra and stars packages with targets. Below we show four examples of target factories:

  • tar_terra_rast()
  • tar_terra_vect()
  • tar_terra_sprc()
  • tar_stars()

You would use these in place of tar_target() in your targets pipeline when working with terra raster, vector, or raster collection data, or with stars objects.

If you would like to see and download working examples for yourself, see the repo, demo-geotargets.

tar_terra_rast(): targets with terra rasters

library(targets)

tar_dir({ # tar_dir() runs code from a temporary directory.
  tar_script({
    library(geotargets)
    
    get_elev <- function() {
        terra::rast(system.file("ex", "elev.tif", package = "terra"))
    }
    
    list(
      tar_terra_rast(
        terra_rast_example,
        get_elev()
      )
    )
  })
  
  tar_make()
  x <- tar_read(terra_rast_example)
  x
})
#> ▶ dispatched target terra_rast_example
#> ● completed target terra_rast_example [0.019 seconds]
#> ▶ ended pipeline [0.185 seconds]
#> class       : SpatRaster 
#> dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
#> resolution  : 0.008333333, 0.008333333  (x, y)
#> extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : terra_rast_example 
#> name        : elevation 
#> min value   :       141 
#> max value   :       547

tar_terra_vect(): targets with terra vectors

tar_dir({ # tar_dir() runs code from a temporary directory.
  tar_script({
    library(geotargets)
    
    lux_area <- function(projection = "EPSG:4326") {
      terra::project(terra::vect(system.file("ex", "lux.shp",
                                             package = "terra")),
                     projection)
    }
    
    list(
      tar_terra_vect(
        terra_vect_example,
        lux_area()
      )
    )
  })
  
  tar_make()
  x <- tar_read(terra_vect_example)
  x
})
#> ▶ dispatched target terra_vect_example
#> ● completed target terra_vect_example [0.034 seconds]
#> ▶ ended pipeline [0.173 seconds]
#>  class       : SpatVector 
#>  geometry    : polygons 
#>  dimensions  : 12, 6  (geometries, attributes)
#>  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
#>  source      : terra_vect_example
#>  coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#>  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP
#>  type        : <num>    <chr> <num>    <chr> <num> <int>
#>  values      :     1 Diekirch     1 Clervaux   312 18081
#>                    1 Diekirch     2 Diekirch   218 32543
#>                    1 Diekirch     3  Redange   259 18664

tar_terra_sprc(): targets with terra raster collections

tar_dir({ # tar_dir() runs code from a temporary directory.
  tar_script({
    
    library(geotargets)
    
    elev_scale <- function(z = 1, projection = "EPSG:4326") {
      terra::project(
        terra::rast(system.file("ex", "elev.tif", package = "terra")) * z,
        projection
      )
    }
    
    list(
      tar_terra_sprc(
        raster_elevs,
        # two rasters, one unaltered, one scaled by factor of 2 and
        # reprojected to interrupted goode homolosine
        command = terra::sprc(list(
          elev_scale(1),
          elev_scale(2, "+proj=igh")
        ))
      )
    )
  })
  
  tar_make()
  x <- tar_read(raster_elevs)
  x
})
#> ▶ dispatched target raster_elevs
#> ● completed target raster_elevs [0.112 seconds]
#> ▶ ended pipeline [0.266 seconds]
#> Warning message:
#> [rast] skipped sub-datasets (see 'describe(sds=TRUE)'):
#> /private/var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/Rtmpr9sjXA/targets_1085a12a40d0c/_targets/scratch/raster_elevs
#> class       : SpatRasterCollection 
#> length      : 2 
#> nrow        : 90, 115 
#> ncol        : 95, 114 
#> nlyr        :  1,   1 
#> extent      : 5.741667, 1558890, 49.44167, 5556741  (xmin, xmax, ymin, ymax)
#> crs (first) : lon/lat WGS 84 (EPSG:4326) 
#> names       : raster_elevs, raster_elevs

tar_stars(): targets with stars objects

tar_dir({ # tar_dir() runs code from a temporary directory.
  tar_script({
    library(geotargets)
    
    list(
      tar_stars(
        test_stars,
        stars::read_stars(system.file("tif", "olinda_dem_utm25s.tif", package = "stars"))
      )
    )
  })
  
  tar_make()
  x <- tar_read(test_stars)
  x
})
#> ▶ dispatched target test_stars
#> ● completed target test_stars [0.053 seconds]
#> ▶ ended pipeline [0.17 seconds]
#> Warning message:
#> In CPL_write_gdal(mat, file, driver, options, type, dims, from,  :
#>   GDAL Message 6: creation option '' is not formatted with the key=value format
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>             Min. 1st Qu. Median     Mean 3rd Qu. Max.
#> test_stars    -1       6     12 21.66521      35   88
#> dimension(s):
#>   from  to  offset  delta                       refsys point x/y
#> x    1 111  288776  89.99 UTM Zone 25, Southern Hem... FALSE [x]
#> y    1 111 9120761 -89.99 UTM Zone 25, Southern Hem... FALSE [y]

Code of Conduct

Please note that the geotargets project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

A note on development

geotargets is still under development. We currently consider the terra extensions to be maturing and approaching stability. We would love for people to use the package and kick the tyres. We are using it in our own work, but want users to know that the API could change in subtle or breaking ways.

geotargets's People

Contributors

aariq, brownag, njtierney, richardscottoz


geotargets's Issues

Update README with basic use cases

geotargets now works, so it would be good to demonstrate its use in the README. We could just pull some examples from the tests/examples.

`semicolon_split()` function

We use the pattern:

strsplit(options, ";")[[1]]

and

paste0(options, collapse = ";")

a fair bit in the package. I think a function, semicolon_split(options), and its converse, semicolon_paste(), would be worthwhile.

semicolon_split <- function(options){
    strsplit(options, ";")[[1]]
}

some_options <- c("THING;ANOTHER;THINGY")

semicolon_split(some_options)
#> [1] "THING"   "ANOTHER" "THINGY"

semicolon_paste <- function(vec){
    paste0(vec, collapse = ";")
}

some_options <- c("THING;ANOTHER;THINGY")

some_options
#> [1] "THING;ANOTHER;THINGY"

semicolon_split(some_options)
#> [1] "THING"   "ANOTHER" "THINGY"

semicolon_split(some_options) |> semicolon_paste()
#> [1] "THING;ANOTHER;THINGY"

Created on 2024-04-29 with reprex v2.1.0

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Hobart
#>  date     2024-04-29
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
#>  digest        0.6.35  2024-03-11 [1] CRAN (R 4.3.1)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [2] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [2] CRAN (R 4.3.0)
#>  R.oo          1.26.0  2024-01-24 [2] CRAN (R 4.3.1)
#>  R.utils       2.12.3  2023-11-18 [2] CRAN (R 4.3.1)
#>  reprex        2.1.0   2024-01-11 [2] CRAN (R 4.3.1)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.26    2024-03-05 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.3.0)
#>  styler        1.10.3  2024-04-07 [2] CRAN (R 4.3.1)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.1)
#>  xfun          0.43    2024-03-25 [1] CRAN (R 4.3.1)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
#> 
#>  [1] /Users/nick/Library/R/arm64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Make GDAL option argument names consistent across geotargets

I am going to suggest that we follow what the backend package's write function uses until we can identify how many unique "GDAL option" types we have to deal with across read and write, then choose a consistent name for each of them.

We currently only provide the ability to set "creation options" via terra's `gdal`/`options` arguments, but for some GDAL vector drivers there is the ability to set both dataset and layer creation options on write.

I note that {sf} write has these option types separately via sf::st_write(..., layer_options=foo, dataset_options=bar). It appears terra::writeVector() does not currently support "dataset" creation options. For all packages we also have the "read" options (GDAL "open options"), which also use the argument name options in most cases.

x <- sf::st_read(system.file("ex", "lux.shp", package = "terra"))
#> Reading layer `lux' from data source `/home/andrew/R/x86_64-pc-linux-gnu-library/4.3/terra/ex/lux.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 12 features and 6 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 5.74414 ymin: 49.44781 xmax: 6.528252 ymax: 50.18162
#> Geodetic CRS:  WGS 84
sf::st_write(x, "test.gpkg", append=F, dataset_options="METADATA_TABLES=YES")
#> options:        METADATA_TABLES=YES 
#> Writing layer `test' to data source `test.gpkg' using driver `GPKG'
#> Writing 12 features with 6 fields and geometry type Polygon.
terra::writeVector(terra::vect(x), "test2.gpkg", options = "METADATA_TABLES=YES")
#> Warning message:
#> In x@ptr$write(filename, layer, filetype, insert[1], overwrite[1],  :
#>   GDAL Message 6: dataset test2.gpkg does not support layer creation option METADATA_TABLES
## install.packages("gpkg")
gpkg::gpkg_list_tables("test.gpkg")
#>  [1] "gpkg_contents"           "gpkg_extensions"         "gpkg_geometry_columns"   "gpkg_metadata"          
#>  [5] "gpkg_metadata_reference" "gpkg_ogr_contents"       "gpkg_spatial_ref_sys"    "gpkg_tile_matrix"       
#>  [9] "gpkg_tile_matrix_set"    "rtree_test_geom"         "rtree_test_geom_node"    "rtree_test_geom_parent" 
#> [13] "rtree_test_geom_rowid"   "sqlite_sequence"         "test"                   
gpkg::gpkg_list_tables("test2.gpkg")
#>  [1] "gpkg_contents"           "gpkg_extensions"         "gpkg_geometry_columns"   "gpkg_ogr_contents"      
#>  [5] "gpkg_spatial_ref_sys"    "gpkg_tile_matrix"        "gpkg_tile_matrix_set"    "rtree_test2_geom"       
#>  [9] "rtree_test2_geom_node"   "rtree_test2_geom_parent" "rtree_test2_geom_rowid"  "sqlite_sequence"        
#> [13] "test2" 

Note that in the above, test.gpkg has two additional metadata tables after write.


Originally raised by @Aariq

Just noticing that sometimes this arg is called options and sometimes it is called gdal. We should decide whether we want to be consistent throughout geotargets or if we want to be consistent with whatever function the arg is being passed to (or split the difference and call it gdal_options)

This is a good idea. I was sort of leaning towards emulating whatever the "write" function uses for each particular spatial backend being used... But that won't cover everything. I definitely see the value of choosing something consistent across geotargets.

Originally posted by @brownag in #33 (comment)

create CITATION.cff

This will make it easier to cite and populate metadata if/when we archive code with Zenodo.

Add support for terra SpatVectorProxy, and `format="file"` for SpatRaster

The {terra} SpatVectorProxy allows you to create a "lazy" reference to a vector dataset with terra::vect(..., proxy=TRUE) that you can query with terra::query() rather than loading all attributes and geometry into memory. This is very helpful and can be much more efficient when working with portions of large data. The SpatVector is always in memory, SpatVectorProxy never in memory, and SpatRaster is in memory if it is sufficiently small, otherwise it automatically behaves as if it were a "SpatRasterProxy" to the source file.

Currently, tar_terra_vect() cannot handle SpatVectorProxy because there is no SpatVectorProxy writeVector() method:

library(targets)

tar_script({
    list(
        geotargets::tar_terra_vect(test_terra_proxy,
                                   terra::vect(system.file("ex", "lux.shp", package = "terra"), proxy = TRUE),
                                   filetype = "Parquet")
    )
})

tar_make()
#> Loading required namespace: terra
#> ▶ dispatched target test_terra_proxy
#> ✖ errored target test_terra_proxy
#> ✖ errored pipeline [0.121 seconds]
#> Error:
#> ! Error running targets::tar_make()
#> Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
#> Debugging guide: https://books.ropensci.org/targets/debugging.html
#> How to ask for help: https://books.ropensci.org/targets/help.html
#> Last error message:
#>     _store_ unable to find an inherited method for function ‘writeVector’ for signature ‘"SpatVectorProxy", "character"’
#> Last error traceback:
#>     No traceback available.
# ...

A philosophical question is whether creating a target for a SpatVectorProxy should copy the full source data to the target store, as we do for vector objects in memory, OR create a format="file" target for the data source returned by terra::sources().

For a proxy object, I think I might often prefer the latter option. On one hand the former might be more reproducible in general as the source data get copied, but essentially we have this as the default SpatVector and SpatRaster approach already.

I often work with some fairly large file-based databases using a SpatVectorProxy or large SpatRaster initially and materializing only small portions relevant to specific areas later with query() or crop() or similar. Usually I would be fine to have targets just track the state of the source file, rather than a full copy of the data, as those things are not changing much, and often can be downloaded through standard methods (and the download could be a preceding target in the pipeline, prior to creating the SpatVectorProxy/SpatRaster)

Perhaps the methods we have implemented should have an option to utilize an existing source and a format="file" approach. This would be the default behavior for SpatVectorProxy, and default could be based on terra::inMemory() for SpatRaster.

I see a few problems with the above:

  • I suppose that format="file" would only work for source formats that are a file to begin with (e.g. GeoPackage, FGDB, SQLite, DuckDB, Parquet...) but not for true database drivers like PostgreSQL. I think this can be addressed as a different issue for true database sources, where instead of a format="file" you store a checksum for a table or query result from a database source, and possibly? something about the database connection.

  • {terra} will automatically write temporary files for raster operations that are too big to be done entirely in memory. This means inMemory() could return FALSE, but a target could be created based on a temporary file that will be deleted after the R session cleans up, invalidating downstream targets. In the case a temp file is used, rather than a file path specifically chosen by the user, we may not want to automatically decide for the user whether the "proxy" behavior of SpatRaster should be triggered.
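A rough sketch of the format = "file" alternative, using plain targets building blocks rather than a proposed geotargets API (the target names are illustrative, and this assumes terra::query()'s vars argument for attribute selection):

```r
library(targets)

tar_script({
  library(targets)
  list(
    # track the on-disk source file itself, rather than copying the
    # data into the target store
    tar_target(
      lux_path,
      system.file("ex", "lux.shp", package = "terra"),
      format = "file"
    ),
    # downstream targets re-open the proxy and materialise only the
    # portion they need, returning an ordinary (serializable) object
    tar_target(
      lux_names,
      as.data.frame(
        terra::query(terra::vect(lux_path, proxy = TRUE), vars = "NAME_1")
      )
    )
  )
})
tar_make()
```

Here targets only invalidates downstream work when the source file changes, which matches the "track the state of the source file" behaviour described above.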

Multiple calls to `geotargets_option_set()` nulls out existing options

One more thing I noticed after #45

library(geotargets)

# set an option
geotargets_option_set(gdal_raster_driver = "GPKG")

# as expected
geotargets_option_get("gdal_raster_driver")
#> [1] "GPKG"

# set a different option
geotargets_option_set(gdal_vector_driver = "GPKG")

# both should be "GPKG"
geotargets_option_get("gdal_vector_driver")
#> NULL
geotargets_option_get("gdal_raster_driver")
#> NULL

This way of setting options all at once is the issue, due to default NULL argument values:

options(
"geotargets.gdal.raster.driver" = gdal_raster_driver,
"geotargets.gdal.raster.creation.options" = gdal_raster_creation_options,
"geotargets.gdal.vector.driver" = gdal_raster_creation_options,
"geotargets.gdal.vector.creation.options" = gdal_raster_creation_options
)

Note also that the wrong arguments are passed for the vector options. They should instead be:

        "geotargets.gdal.vector.driver" = gdal_vector_driver,
        "geotargets.gdal.vector.creation.options" = gdal_vector_creation_options
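One way to fix the clobbering is to fall back to the current option value whenever an argument is left NULL. A dependency-free sketch (not the implementation geotargets ships):

```r
# null-coalescing helper (rlang provides %||%; defined here to keep the
# sketch dependency-free)
`%||%` <- function(x, y) if (is.null(x)) y else x

geotargets_option_set <- function(gdal_raster_driver = NULL,
                                  gdal_raster_creation_options = NULL,
                                  gdal_vector_driver = NULL,
                                  gdal_vector_creation_options = NULL) {
  # keep the existing value for any argument the caller did not supply,
  # so one call no longer nulls out options set by earlier calls
  options(
    "geotargets.gdal.raster.driver" =
      gdal_raster_driver %||%
        getOption("geotargets.gdal.raster.driver"),
    "geotargets.gdal.raster.creation.options" =
      gdal_raster_creation_options %||%
        getOption("geotargets.gdal.raster.creation.options"),
    "geotargets.gdal.vector.driver" =
      gdal_vector_driver %||%
        getOption("geotargets.gdal.vector.driver"),
    "geotargets.gdal.vector.creation.options" =
      gdal_vector_creation_options %||%
        getOption("geotargets.gdal.vector.creation.options")
  )
}
```

With this version, the two sequential calls from the reprex above both stick.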

Check if invalid option names are provided in `geotargets_option_get()`

As stated by @Aariq in #19

...[give] some other method of erroring when an invalid option name is provided

Referring to after this code chunk:

geotargets_option_get <- function(option_name) {
    if (!startsWith(option_name, "geotargets.")) {
        option_name <- paste0("geotargets.", option_name)
    }

# ...

We could do something like:

rlang::arg_match0(option_name, names(geotargets.env))
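A base-R sketch of the full getter with validation, using match.arg() in place of rlang::arg_match0() to keep it dependency-free (the list of known option names here is illustrative; the real package would derive it from its own option registry):

```r
geotargets_option_get <- function(option_name) {
  # illustrative set of known option names
  known_options <- c(
    "geotargets.gdal.raster.driver",
    "geotargets.gdal.raster.creation.options",
    "geotargets.gdal.vector.driver",
    "geotargets.gdal.vector.creation.options"
  )
  if (!startsWith(option_name, "geotargets.")) {
    option_name <- paste0("geotargets.", option_name)
  }
  # error early on an unknown name instead of silently returning NULL
  option_name <- match.arg(option_name, known_options)
  getOption(option_name)
}
```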

User input `resources` not being passed from `tar_terra_*()` to `tar_target_raw()`

I'm trying to follow the targets manual section about heterogeneous workers (https://books.ropensci.org/targets/crew.html#heterogeneous-workers) and can't get a tar_terra_rast() target to override the default crew controller set in tar_option_set(), though it works with a normal tar_target(). I suspect this is because the resources argument to tar_terra_rast() is not passed on to targets::tar_target_raw(); instead, it is overwritten with targets::tar_resources(custom_format = targets::tar_resources_custom_format(.... We need to find a way to append the resources supplied by users.

consider how `tar_terra_sprc` might work in dynamic and static branching

RE @Aariq 's comment in #50

... what I really wish for is a way to use sprc() and terra::c() with dynamic and static branching. E.g., would the test pipeline in this PR perhaps be better done with tar_map() and a geotargets version of tarchetypes::tar_combine() like tar_combine_sprc()? Or would it be possible to create our own options for an iterate argument so one could use dynamic branching with iterate = "sprc" or iterate = "SpatRaster_layers"? I think we might have to dig more into the targets code or ask Will to answer those questions.

create `tar_terra_sds()`

SpatRasterDatasets are similar to SpatRasterCollections (implemented in #50) but all datasets must have the same extent (and projection?). They are a bit easier to work with though, as they appear to be coercible to lists and purrr::map() works on SpatRasterDataset but not SpatRasterCollection. I'm guessing we can borrow much if not all of the write function from tar_terra_sprc() for this.
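A quick illustration of why SpatRasterDatasets are easier to iterate over (the sub-dataset names and the scaling below are arbitrary examples):

```r
library(terra)

r <- rast(system.file("ex", "elev.tif", package = "terra"))
s <- sds(list(elev = r, elev_x2 = r * 2))  # same extent and resolution

# a SpatRasterDataset coerces cleanly to a list, so lapply() works
layer_means <- lapply(as.list(s), function(x) global(x, "mean", na.rm = TRUE))
```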

Best way to use SpatRaster tiles?

I'm working with a large SpatRaster and running out of memory trying to do pixel-wise computations on the whole thing, so I'm thinking the best thing to do is to split it into tiles with makeTiles(), then (ideally) work on each tile with dynamic branching and re-combine the results. I'm having trouble getting this to work though, and not sure what pieces of this should be handled by geotargets. Here's what I've got so far:

library(targets)
tar_script({
    library(targets)
    library(geotargets)
    library(terra)
    
    make_tiles <- function(raster) {
        rast_name <- as.character(rlang::ensym(raster))
        x <- terra::rast(ncols = 2, nrows = 2) 
        ext(x) <- ext(raster)
        fs::dir_create("tiles")
        makeTiles(raster, x, filename = fs::path("tiles", fs::path_ext_set(rast_name, "tiff")), overwrite = TRUE)
    }
    
    list(
        tar_terra_rast(
            rast_example,
            terra::rast(system.file("ex/logo.tif", package="terra"))
        ),
        tar_target(
            rast_tiles,
            make_tiles(rast_example),
            format = "file"
        ),
        #for each tile, calculate mean for each pixel across the three layers
        tar_terra_rast(
            mean_tiles,
            mean(rast(rast_tiles)),
            pattern = map(rast_tiles)
        )
    )
})
tar_make()
#> terra 1.7.71
#> ▶ dispatched target rast_example
#> ● completed target rast_example [0.016 seconds]
#> ▶ dispatched target rast_tiles
#> ● completed target rast_tiles [0.077 seconds]
#> ▶ ended pipeline [0.294 seconds]
#> Error:
#> ! Error running targets::tar_make()
#> Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
#> Debugging guide: https://books.ropensci.org/targets/debugging.html
#> How to ask for help: https://books.ropensci.org/targets/help.html
#> Last error message:
#>    Target mean_tiles tried to branch over rast_tiles, which is illegal. Patterns must only branch over explicitly 
#> declared targets in the pipeline. Stems and patterns are fine, but you cannot branch over branches or global 
#> objects. Also, if you branch over a target with format = "file", then that target must also be a pattern.
#> Last error traceback: <excluded for brevity>

tar_read(rast_tiles)
#> [1] "tiles/rast_example1.tiff" "tiles/rast_example2.tiff"
#> [3] "tiles/rast_example3.tiff" "tiles/rast_example4.tiff"

Created on 2024-05-14 with reprex v2.1.0

I'm not sure how to make rast_tiles also be a pattern here.

This is somewhat related to #53
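One way to make rast_tiles itself a pattern is tarchetypes::tar_files(), which creates an upstream target for the path vector plus a downstream format = "file" target that branches over the files. An untested sketch of the pipeline above:

```r
library(targets)
tar_script({
  library(targets)
  library(geotargets)
  library(tarchetypes)
  library(terra)

  make_tiles <- function(raster) {
    template <- terra::rast(ncols = 2, nrows = 2)
    ext(template) <- ext(raster)
    fs::dir_create("tiles")
    makeTiles(raster, template,
              filename = fs::path("tiles", "tile_.tiff"),
              overwrite = TRUE)
  }

  list(
    tar_terra_rast(
      rast_example,
      terra::rast(system.file("ex/logo.tif", package = "terra"))
    ),
    # tar_files() yields a paths target plus rast_tiles, a
    # format = "file" target that is itself a pattern, so downstream
    # targets can legally branch over it
    tar_files(rast_tiles, make_tiles(rast_example)),
    tar_terra_rast(
      mean_tiles,
      mean(terra::rast(rast_tiles)),
      pattern = map(rast_tiles)
    )
  )
})
```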

Supporting various split-apply-combine spatial workflows

So there are at least 5 different ways (yikes!) to do a split-apply-combine type workflow with spatial data just in terra that I've encountered. It may be beneficial to figure out a way to implement these workflows using targets branching in geotargets.

  1. SpatRasters or SpatVectors with multiple layers can be split with [[ and recombined with c() or rast()/vect(). Also can be iterated over with lapply() or coerced to a list with as.list()
  2. SpatRasterCollections hold a list of SpatRasters, possibly with different extents, resolutions, and projections. Can be split with [ (but not [[) and combined with terra::sprc(). lapply() also works on them and can be coerced with as.list(). (see #53)
  3. SpatRasterDatasets hold a list of SpatRasters with the same resolution and projection (sub-datasets). Can be split with [ or [[ and re-combined with terra::sds(). lapply() also works and can be coerced with as.list(). names are lost in the process (I assume this is a bug: rspatial/terra#1513). (see #59)
  4. makeTiles() splits a raster into tiles that are saved to disk and returns a vector of file paths. Can be opened individually, worked on, and re-combined with merge(sprc(rast(<list of tiles as SpatRasters>))). Alternatively, the files on disk can be opened with vrt() (see #69)
  5. ✅ iterate over lists of SpatRasters or SpatVectors (e.g. with dynamic branching)
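For example, workflow 1 can be sketched without any branching machinery; this is the shape of computation a target factory would wrap (the per-layer rescaling is an arbitrary stand-in for real work):

```r
library(terra)

r <- rast(system.file("ex/logo.tif", package = "terra"))  # 3-layer raster

layers <- as.list(r)                                      # split: one SpatRaster per layer
scaled <- lapply(layers, function(x) x / max(minmax(x)))  # apply per layer
recombined <- rast(scaled)                                # combine into one SpatRaster
```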

Some open questions:

  1. Which of these (if any) should we try to support better in geotargets (e.g. with target factories)?
  2. Are there existing similar patterns we can borrow from other Targetopia packages? (knowing that we can't customize the behavior of the iteration argument to tar_target_raw())
  3. Will any of these be easier to do if we adopt an approach like #63? (I suspect that a tar_terra_tiles() would be easier like this)
  4. Should we try to figure this stuff out soon, or later after we've added basic support for sf and stars?

Generalization of compression for spatial targets with GDAL

Just pulling this from #4 as I'm not sure we captured this as an issue?

Generalization of the "multiple file target compression" GDAL /vsizip/ approach to all backends and formats that support it

From @brownag

The /vsizip/ GDAL virtual file system functionality used in format_shapefile() is an example of something that can be generalized further with a focus on generic GDAL data source paths. I think the idea of being able to compress files that are in the target store (and keep them compressed) is attractive for spatial data which can be quite large--even if targets are not comprised of multiple files.

  • Since GDAL can read from the compressed target store efficiently, you get the benefit of a smaller file size footprint while also being able to read the file without fully extracting it.

  • We should also consider some of the other archive file formats/virtual file system types, and provide interfaces in R to produce them, e.g. /vsigzip/ or /vsitar/ analogs to /vsizip/ + utils::zip().

  • Even without creating specific compressed archive files, there should be robust tools available for controlling the GDAL file compression options, supported by many drivers, that are used to write target objects.

  • The ZIP approach is useful for GeoTIFF files where category information is stored in the .tif.aux.xml sidecar file. Convenience methods for terra SpatRaster objects could automatically store a target as a ZIP file (and warn about target naming) if the input SpatRaster is categorical and the output format is GeoTIFF.
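As a concrete illustration of the /vsizip/ idea, the example shapefile shipped with terra can be zipped and then read back without extracting it (a sketch; utils::zip() requires a zip binary on the PATH):

```r
# zip up the example shapefile and its sidecar files
shp_dir <- system.file("ex", package = "terra")
files <- list.files(shp_dir, pattern = "^lux\\.", full.names = TRUE)
zipfile <- file.path(tempdir(), "lux.zip")
utils::zip(zipfile, files, flags = "-j")  # -j: store without directory paths

# read straight from the archive via GDAL's virtual file system
v <- terra::vect(paste0("/vsizip/", zipfile, "/lux.shp"))
```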

Release geotargets 0.1.0

First release (for github):

Prepare for release:

  • git pull
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • git push
  • Draft blog post
  • Finish & publish blog post
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)
  • Toot

Combining list of SpatRasters with rast fails with multiple workers

I'm encountering a bug when trying to combine a list of rasters created by dynamic branching. It seems related to marshaling, since the error message mentions wrap() and I only get the error with multiple crew workers, but not sure.

targets::tar_script({
    targets::tar_option_set(controller = crew::crew_controller_local(workers = 2))
    list(
        geotargets::tar_terra_rast(
            rast_raw,
            terra::rast(system.file("ex/elev.tif", package = "terra"))
        ),
        targets::tar_target(x, 1:3),
        geotargets::tar_terra_rast(
            rast_plus,
            rast_raw + x,
            pattern = map(x),
            iteration = "list"
        ),
        geotargets::tar_terra_rast(
            combined,
            terra::rast(unname(rast_plus))
        )
    )
})
targets::tar_make()
#> ▶ dispatched target x
#> ▶ dispatched target rast_raw
#> ● completed target x [0.028 seconds]
#> ● completed target rast_raw [5.603 seconds]
#> ▶ dispatched branch rast_plus_29239c8a
#> ▶ dispatched branch rast_plus_7cc32924
#> ● completed branch rast_plus_7cc32924 [0.008 seconds]
#> ▶ dispatched branch rast_plus_bd602d50
#> ● completed branch rast_plus_bd602d50 [0.005 seconds]
#> ● completed branch rast_plus_29239c8a [0.006 seconds]
#> ● completed pattern rast_plus
#> ▶ dispatched target combined
#> ▶ ended pipeline [19.278 seconds]
#> Error:
#> ! Error running targets::tar_make()
#> Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
#> Debugging guide: https://books.ropensci.org/targets/debugging.html
#> How to ask for help: https://books.ropensci.org/targets/help.html
#> Last error message:
#>     target combined error: unable to find an inherited method for function ‘wrap’ for signature ‘"NULL"’
#> Last error traceback:
#>     base::tryCatch(base::withCallingHandlers({ NULL base::saveRDS(base::do.c...
#>     tryCatchList(expr, classes, parentenv, handlers)
#>     tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), na...
#>     doTryCatch(return(expr), name, parentenv, handler)
#>     tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#>     tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>     doTryCatch(return(expr), name, parentenv, handler)
#>     base::withCallingHandlers({ NULL base::saveRDS(base::do.call(base::do.ca...
#>     base::saveRDS(base::do.call(base::do.call, base::c(base::readRDS("/var/f...
#>     base::do.call(base::do.call, base::c(base::readRDS("/var/folders/wr/by_l...
#>     (function (what, args, quote = FALSE, envir = parent.frame()) { if (!is....
#>     (function (targets_function, targets_arguments, options, envir = NULL, s...
#>     tryCatch(out <- withCallingHandlers(targets::tar_callr_inner_try(targets...
#>     tryCatchList(expr, classes, parentenv, handlers)
#>     tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>     doTryCatch(return(expr), name, parentenv, handler)
#>     withCallingHandlers(targets::tar_callr_inner_try(targets_function = targ...
#>     targets::tar_callr_inner_try(targets_function = targets_function, target...
#>     do.call(targets_function, targets_arguments)
#>     (function (pipeline, path_store, names_quosure, shortcut, reporter, seco...
#>     crew_init(pipeline = pipeline, meta = meta_init(path_store = path_store)...
#>     self$run_crew()
#>     self$iterate()
#>     self$conclude_worker_task()
#>     tar_assert_all_na(result$error, msg = paste("target", result$name, "erro...
#>     tar_throw_validate(msg %|||% default)
#>     tar_error(message = paste0(...), class = c("tar_condition_validate", "ta...
#>     rlang::abort(message = message, class = class, call = tar_empty_envir)
#>     signal_abort(cnd, .file)

Created on 2024-04-09 with reprex v2.1.0

Create `tar_terra_svc()` for SpatVectorCollection objects

I think at a minimum we should add an analog for SpatVectorCollection (i.e. tar_terra_svc() for terra::svc() results)

My initial response in the case of lists, which can be nested and heterogeneous, is that this may be a bit difficult to handle in general for geotargets methods.

However I can totally see the utility of list columns within a data.frame being a useful way to manage multiple terra objects and their metadata... so this is perhaps worth considering more.

Originally posted by @brownag in #77 (comment)
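A rough, untested sketch of what such a helper could build on, using targets::tar_format() with a multi-layer GeoPackage so a single file holds the whole collection. The helper name, layer-naming scheme, and indexing behavior assumed here are hypothetical, not the geotargets API:

```r
# Hypothetical sketch: round-trip a SpatVectorCollection through a
# single multi-layer GeoPackage in the target store
format_terra_svc <- targets::tar_format(
  read = function(path) terra::svc(path),  # svc() reads all layers of a file
  write = function(object, path) {
    for (i in seq_along(object)) {
      terra::writeVector(
        object[i],               # assumed to return the i-th SpatVector
        path,
        filetype = "GPKG",
        layer = paste0("layer_", i),
        insert = i > 1,          # append layers after the first
        overwrite = i == 1
      )
    }
  },
  # wrap()/unwrap() support SpatVectorCollection in recent terra versions
  marshal = function(object) terra::wrap(object),
  unmarshal = function(object) terra::unwrap(object)
)
```

A tar_terra_svc() factory could then pass this format through to targets::tar_target_raw(), much as tar_terra_vect() does for single SpatVectors.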

Functionality for objects *containing* `terra` objects?

In projects that iterate over, e.g. scenarios, countries, etc., I often find it convenient to organise spatial objects into list elements of tables, where some columns may contain variables, and others spatial inputs / results.

However neither geotargets nor targets can handle these objects at present (as far as I can tell).

This functionality would be helpful.

The examples below, modified from the README, store a SpatVector in a list.

library(targets)
library(geotargets)


###### using `tar_terra_vect`


tar_dir({ # tar_dir() runs code from a temporary directory.
  tar_script({
    library(geotargets)
    lux_area <- function(projection = "EPSG:4326") {
      terra::project(
        terra::vect(system.file("ex", "lux.shp",
                                package = "terra"
        )),
        projection
      )
    }
    list(
      tar_terra_vect(
        terra_vect_example,
        lux_area() |> list()
      )
    )
  })
  tar_make()
  x <- tar_read(terra_vect_example)
  x
})
#> ▶ dispatched target terra_vect_example
#> ✖ errored target terra_vect_example
#> ✖ errored pipeline [0.062 seconds]
#> Error:
#> ! Error running targets::tar_make()
#> Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
#> Debugging guide: https://books.ropensci.org/targets/debugging.html
#> How to ask for help: https://books.ropensci.org/targets/help.html
#> Last error message:
#>     _store_ unable to find an inherited method for function ‘writeVector’ for signature ‘"list", "character"’
#> Last error traceback:
#>     No traceback available.


###### using `tar_target`


tar_dir({ # tar_dir() runs code from a temporary directory.
  tar_script({
    library(geotargets)
    lux_area <- function(projection = "EPSG:4326") {
      terra::project(
        terra::vect(system.file("ex", "lux.shp",
                                package = "terra"
        )),
        projection
      )
    }
    list(
      tar_target(
        terra_vect_example,
        lux_area() |> list()
      )
    )
  })
  tar_make()
  x <- tar_read(terra_vect_example)
  x
})
#> ▶ dispatched target terra_vect_example
#> ● completed target terra_vect_example [1.242 seconds]
#> ▶ ended pipeline [1.856 seconds]
#> [[1]]
#> Error: external pointer is not valid

Created on 2024-05-24 with reprex v2.1.0

Linking to documentation for specifying multiple options in `geotargets.gdal.raster.creation_options`

As asked by @Aariq in #19

Can one supply multiple options? If so, how should they be delimited?

In reference to:

#' ## Available Options
#'
#'  - `"geotargets.gdal.raster.creation_options"` - set the GDAL creation options used when writing raster files to target store (default: `"ENCODING=UTF-8"`)

It would be useful to discuss how to specify multiple options, and where users can go to find more options to set. E.g., for setting multiple options we could reference this:

geotargets::geotargets_option_set("raster_gdal_creation_options", c("COMPRESS=DEFLATE", "TFW=YES"))

And if we want users to have the capacity to provide multiple options, maybe we point them to https://gdal.org/drivers/raster/gtiff.html#creation-options to list creation options? Is that the right source?

using `geodata::gadm` with geotargets

geodata::gadm is commonly used to get shapefiles for countries at various levels of boundaries.

This issue is to talk about how to use gadm within geotargets - related to #5.

Some code to download a country shapefile might be something like:

get_gadm_country <- function(country = "Australia") {
    dir.create("data/shapefiles", recursive = TRUE, showWarnings = FALSE)
    geodata::gadm(
        country = country,
        level = 0,
        path = "data/shapefiles",
        version = "4.1",
        # low res
        resolution = 2
    )
}

My question is how should we recommend that users implement and use something like this in a targets workflow? At this stage I haven't tested this code out with the latest version of {geotargets} so it might all just work, but my specific concerns are around:

  • What do we recommend users do when creating functions like this? Do they need to create directories?
  • Should we provide helpers so users can do this kind of data retrieval easily?

Also porting in @brownag 's comment from #5 in here as it is relevant:

In the case of get_gadm_country() you will get one .rds file in the directory path for each country specified. Each .rds file contains a wrapped SpatVector of a single country. When the cache exists for a particular country/version/resolution, rather than calling download.file() the SpatVector is read from file and unwrapped. Each of these could be "file" targets if you can determine what a particular call to gadm() should produce (i.e. 1 file with specific file name given parameters in path for each country specified)

After Aariq@f82edd4 you can easily do something like tar_terra_vect(some_countries, get_gadm_country(c("Australia", "New Zealand"))) which will write the SpatVector combination of two .rds files as a new seek-optimized shapefile (.shz). If the source .rds files don't exist (i.e. on first run of the pipeline) or are deleted (i.e. a temp folder is used), they would be re-downloaded. This does not rely on external files outside the target store, but technically you have copies of the data in the initial gadm() download (stored as PackedSpatVector) and the target(s) (stored as zipped shapefile)

If you want to have the file(s) produced by gadm() be the targets... we need to consider that they are not shapefiles but rather PackedSpatVector in "rds" format... and a single call could produce multiple "rds" files (multiple targets)

You might need to have one format="file" target for each country of interest, possibly using dynamic branching or using a wrapper function around gadm() so you can identify what the file names are, rather than return the unwrapped SpatVector result in memory. EDIT: format="file" can be multiple file paths or directory, so the number of targets to use would depend on what you want to do with them

In general format="file" can probably be used for a variety of cases of geospatial data, especially when the user has some specific folder structure or process they need to follow that wouldn't work well out of the target store proper. Using format="file" does require some additional bookkeeping on the paths, filenames and types, though. https://books.ropensci.org/targets/data.html#external-files
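As a rough illustration of the format = "file" route described above (untested; the wrapper name and the assumption about where gadm() caches its files are mine):

```r
library(targets)

# Hypothetical wrapper: trigger geodata's download/cache, then return
# the cached file path(s) so {targets} can track them as files
get_gadm_country_files <- function(countries, path = "data/shapefiles") {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  for (country in countries) {
    geodata::gadm(
      country = country,
      level = 0,
      path = path,
      version = "4.1",
      resolution = 2
    )
  }
  # assumption: gadm() caches one .rds (PackedSpatVector) per country
  # somewhere under `path`
  list.files(path, pattern = "\\.rds$", full.names = TRUE, recursive = TRUE)
}

list(
  tar_target(
    gadm_files,
    get_gadm_country_files(c("Australia", "New Zealand")),
    format = "file"
  )
)
```

Downstream targets would then readRDS() and terra::unwrap() the tracked files as needed, so the pipeline reruns the download only when the cached files change or disappear.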

`format_shapefile()` `utils::zip()` usage

An alternative to utils::zip() may be to write shapefiles with a .shz/.shp.zip extension and readily use seek-optimized ZIP

From https://gdal.org/user/virtual_file_systems.html#sozip-seek-optimized-zip

GDAL (>= 3.7) has full read and write support for .zip files following the SOZip (Seek-Optimized ZIP) profile.

The ESRI Shapefile / DBF and GPKG -- GeoPackage vector drivers can directly generate SOZip-enabled .shz/.shp.zip or .gpkg.zip files.

While this would allow us to easily make use of seek-optimized ZIP, I think it still does not get us around the need to have the target object name end in ".shz" or ".zip" so that GDAL can detect the file on read. #4 (comment)
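For illustration, a sketch of the direct route (assuming a sufficiently recent GDAL; with GDAL >= 3.7 the container is additionally SOZip-enabled):

```r
v <- terra::vect(system.file("ex", "lux.shp", package = "terra"))

# The .shz extension tells the ESRI Shapefile driver to produce a single
# zipped container instead of the usual .shp/.shx/.dbf file set
f <- file.path(tempdir(), "lux.shz")
terra::writeVector(v, f, filetype = "ESRI Shapefile", overwrite = TRUE)

terra::vect(f)  # GDAL reads back directly from the zipped container
```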

Apparent bugs / problems with function body modification with covr+testthat+tar_test()

Replacement of a function body does not seem to work properly within the evaluation environment created by combination of {covr}, targets tests with tar_test(), and {testthat}. We dynamically modify our internal functions using body<- to customize the functions passed to tar_format().

Interactive runs, local runs, and runs of {testthat} locally and remotely all work fine, but the problem arises when running tests via {covr}.

I did some digging the other day into this issue, and there is a great vignette on how {covr} works behind the scenes: https://cran.r-project.org/web/packages/covr/vignettes/how_it_works.html

From the vignette:

The core function in covr is trace_calls(). This function was adapted from ideas in Advanced R - Walking the Abstract Syntax Tree with recursive functions. This recursive function modifies each of the leaves (atomic or name objects) of an R expression by applying a given function to them. If the expression is not a leaf the walker function calls itself recursively on elements of the expression instead.

My suspicion is whatever {covr} does to modify the code during its processing causes a situation where no modification to the tar_format() read/write function occurs when we call body<-.

To solve this, I think we need to isolate the bug in a reprex and report it to {covr} maintainers. Probably this is not intended behavior, since other evaluation contexts have no apparent issues. In the meantime we could possibly find a way to "protect" or exclude the specific functions from being modified by {covr}.

I have tried several alternatives to the current approach, including converting the function body to a string and modifying the string, replacing the whole function, using an intermediate object, and storing the function in a different environment and modifying it there. No method of replacement with body<- seems to work.

These tests indicate to me that it is not something simple, like the {covr}-modified function body having more than 2 elements, the names of the arguments being missing, or the replacement evaluating in the wrong environment. It seems to me almost as if a copy of the function is made that corresponds to the initial function definition, and the body<- calls do not get applied to the right version of the function, which is what is ultimately passed to tar_format()

@Aariq had some good thoughts on covr related issues worth following up on:

Would it be helpful to maybe look at how targets, tarchetypes, etc. deal with this? There might be some hints in their GitHub Actions or in commit history. I'm assuming the coverage issues have something to do with callr R sessions in tests, but I don't actually know. Not sure if this is helpful, but I've noticed that the tests use the installed version of geotargets because of the namespacing with geotargets:: rather than using the current state of the code.

Originally posted by @Aariq in #31 (comment)

Default behaviours for filetype and GDAL

As noted at https://github.com/njtierney/geotargets/blob/master/R/tar-terra-rast.R#L66-L73 by @brownag

It could be a good idea to have some default values for things like raster filetypes ("GTiff", "netCDF", ?) and GDAL ("ENCODING=UTF-8").

These could be set with options(), or maybe there's a way {targets} would like us to set target options. tar_option_set() doesn't look like it has slots for new arguments, so either we make our own geotargets_option_set() function, or we get the user to use options() or set an environment variable, or something. Note that https://design.tidyverse.org/def-user.html doesn't discuss this in great detail.

Adopt API of `tar_option_set` for `geotargets_options_set`

targets::tar_option_set() just uses named arguments with NULL defaults, and it might be nice for users to have a similar experience with geotargets_options_set(). E.g.

geotargets_options_set <- function(
  gdal_raster_driver = NULL,
  gdal_raster_creation_options = NULL,
  gdal_vector_driver = NULL,
  gdal_vector_creation_options = NULL
  ){ ...
 

And because we set these options with defaults on package load, we can give the tar_*() functions default arguments similar to targets, like so:

tar_terra_rast <- function(name,
                           command,
                           pattern = NULL,
                           filetype = geotargets::geotargets_options_get("gdal_raster_driver"),
                           gdal = geotargets::geotargets_options_get("gdal_raster_creation_options"),
                           ...,
                           tidy_eval = targets::tar_option_get("tidy_eval"),

And remove geotargets_options_get() from the body of these functions.

RConsortium ISC proposal

@brownag, @njtierney

Would either of y'all be interested in writing an RConsortium ISC proposal to get funding to work on geotargets? Next proposal deadline is April 1. More info here: https://www.r-consortium.org/all-projects/call-for-proposals

I've been on two ISC proposals in the past, and I think geotargets has a really good chance of getting funding. I just talked to my boss and she's cool with me taking this project on as part of my official duties at University of Arizona, if we were to get it funded.

April 1 does not give us a whole lot of time, and I don't know how much time/effort y'all are planning on spending on this project. The next deadline is August 1 if you'd rather shoot for that (in the meantime, I'd still be interested in contributing, I just wouldn't be able to devote as much of my time without funding).

I'm happy to start a separate repo with a template for the proposal where we could continue discussion on this. Just checking here to see who wants to be involved and when! Maybe if we are going to shoot for April 1, we should just set up a Zoom meeting to chat?

Move all supported r-spatial packages to Suggests

In #4 @brownag mentioned moving terra and other r-spatial packages from Imports to Suggests, and I think this is a good call. We shouldn't require users to have all supported r-spatial packages installed just to use targets with one of them. rlang::check_installed() prompts users to install a missing package and installs it for them, but since this code will mostly be run in a callr environment, it might be more appropriate to just use rlang::is_installed(). We could even supply a custom tip like

! package terra is required.
Did you forget to add "terra" to tar_option_set(packages = ?
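A minimal sketch of such a guard (the function name and wording are hypothetical), using rlang::is_installed() so nothing interactive happens inside the callr session:

```r
# Hypothetical helper, not the geotargets API
check_pkg_installed <- function(pkg) {
  if (!rlang::is_installed(pkg)) {
    rlang::abort(c(
      sprintf("package {%s} is required.", pkg),
      "i" = sprintf(
        "Did you forget to add \"%s\" to tar_option_set(packages = )?", pkg
      )
    ))
  }
}
```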

`targets` 1.7.0 seems to have broken `geotargets`

I update my packages like a good boy and now it no work no good 😭

library(targets)

tar_dir({ # tar_dir() runs code from a temporary directory.
  tar_script({
    library(geotargets)
    lux_area <- function(projection = "EPSG:4326") {
      terra::project(
        terra::vect(system.file("ex", "lux.shp",
                                package = "terra"
        )),
        projection
      )
    }
    list(
      tar_terra_vect(
        terra_vect_example,
        lux_area()
      )
    )
  })
  tar_make()
  x <- tar_read(terra_vect_example)
  x
})
#> Error:
#> ! Error running targets::tar_make()
#> Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
#> Debugging guide: https://books.ropensci.org/targets/debugging.html
#> How to ask for help: https://books.ropensci.org/targets/help.html
#> Last error message:
#>     `values` must have at least one element.
#> Last error traceback:
#>     base::tryCatch(base::withCallingHandlers({ NULL base::saveRDS(base::do.c...
#>     tryCatchList(expr, classes, parentenv, handlers)
#>     tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), na...
#>     doTryCatch(return(expr), name, parentenv, handler)
#>     tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#>     tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>     doTryCatch(return(expr), name, parentenv, handler)
#>     base::withCallingHandlers({ NULL base::saveRDS(base::do.call(base::do.ca...
#>     base::saveRDS(base::do.call(base::do.call, base::c(base::readRDS("/var/f...
#>     base::do.call(base::do.call, base::c(base::readRDS("/var/folders/vh/7fqn...
#>     (function (what, args, quote = FALSE, envir = parent.frame()) { if (!is....
#>     (function (targets_function, targets_arguments, options, envir = NULL, s...
#>     tryCatch(out <- withCallingHandlers(targets::tar_callr_inner_try(targets...
#>     tryCatchList(expr, classes, parentenv, handlers)
#>     tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>     doTryCatch(return(expr), name, parentenv, handler)
#>     withCallingHandlers(targets::tar_callr_inner_try(targets_function = targ...
#>     targets::tar_callr_inner_try(targets_function = targets_function, target...
#>     eval(parse(file = script, keep.source = TRUE), envir = envir)
#>     eval(parse(file = script, keep.source = TRUE), envir = envir)
#>     tar_terra_vect(terra_vect_example, lux_area())
#>     rlang::arg_match0(filetype, drv$name)
#>     abort(message = message, call = call)
#>     signal_abort(cnd, .file)


tar_dir({ # tar_dir() runs code from a temporary directory.
  tar_script({
    library(targets)
    library(geotargets)
    list(
      tar_terra_rast(
        terra_rast_example,
        system.file("ex/elev.tif", package = "terra") |> terra::rast()
      )
    )
  })
  tar_make()
  x <- tar_read(terra_rast_example)
  x
})
#> Error:
#> ! Error running targets::tar_make()
#> Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
#> Debugging guide: https://books.ropensci.org/targets/debugging.html
#> How to ask for help: https://books.ropensci.org/targets/help.html
#> Last error message:
#>     `values` must have at least one element.
#> Last error traceback:
#>     base::tryCatch(base::withCallingHandlers({ NULL base::saveRDS(base::do.c...
#>     tryCatchList(expr, classes, parentenv, handlers)
#>     tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), na...
#>     doTryCatch(return(expr), name, parentenv, handler)
#>     tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#>     tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>     doTryCatch(return(expr), name, parentenv, handler)
#>     base::withCallingHandlers({ NULL base::saveRDS(base::do.call(base::do.ca...
#>     base::saveRDS(base::do.call(base::do.call, base::c(base::readRDS("/var/f...
#>     base::do.call(base::do.call, base::c(base::readRDS("/var/folders/vh/7fqn...
#>     (function (what, args, quote = FALSE, envir = parent.frame()) { if (!is....
#>     (function (targets_function, targets_arguments, options, envir = NULL, s...
#>     tryCatch(out <- withCallingHandlers(targets::tar_callr_inner_try(targets...
#>     tryCatchList(expr, classes, parentenv, handlers)
#>     tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>     doTryCatch(return(expr), name, parentenv, handler)
#>     withCallingHandlers(targets::tar_callr_inner_try(targets_function = targ...
#>     targets::tar_callr_inner_try(targets_function = targets_function, target...
#>     eval(parse(file = script, keep.source = TRUE), envir = envir)
#>     eval(parse(file = script, keep.source = TRUE), envir = envir)
#>     tar_terra_rast(terra_rast_example, terra::rast(system.file("ex/elev.tif"...
#>     rlang::arg_match0(filetype, drv$name)
#>     abort(message = message, call = call)
#>     signal_abort(cnd, .file)

sessionInfo()
#> R version 4.3.3 (2024-02-29)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.4.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Australia/Melbourne
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] targets_1.7.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] base64url_1.4          compiler_4.3.3         reprex_2.1.0          
#>  [4] tidyselect_1.2.1       callr_3.7.6            yaml_2.3.8            
#>  [7] fastmap_1.1.1          R6_2.5.1               igraph_2.0.3          
#> [10] knitr_1.45             backports_1.4.1        tibble_3.2.1          
#> [13] R.cache_0.16.0         pillar_1.9.0           R.utils_2.12.3        
#> [16] rlang_1.1.3            utf8_1.2.4             xfun_0.43             
#> [19] fs_1.6.3               cli_3.6.2              withr_3.0.0           
#> [22] magrittr_2.0.3         ps_1.7.6               digest_0.6.35         
#> [25] processx_3.8.4         rstudioapi_0.16.0.9000 secretbase_0.4.0      
#> [28] lifecycle_1.0.4        R.methodsS3_1.8.2      R.oo_1.26.0           
#> [31] vctrs_0.6.5            evaluate_0.23          glue_1.7.0            
#> [34] data.table_1.15.4      styler_1.10.3          codetools_0.2-20      
#> [37] fansi_1.0.6            rmarkdown_2.26         purrr_1.0.2           
#> [40] tools_4.3.3            pkgconfig_2.0.3        htmltools_0.5.8.1

Created on 2024-04-23 with reprex v2.1.0

`tar_terra_rast()` doesn't save units or varnames

I suspect this is because those metadata are stored in a "sidecar file" when filetype = "GTiff" (GeoTIFF), but I haven't investigated further.

library(targets)
tar_script({
    make_rast <- function() {
        x <- terra::rast(system.file("ex/elev.tif", package = "terra"))
        terra::units(x) <- "m"
        terra::varnames(x) <- "elev"
        x
    }
    list(
        geotargets::tar_terra_rast(
            rast,
            make_rast()
        )
    )
})
tar_make()
#> ▶ dispatched target rast
#> ● completed target rast [0.017 seconds]
#> ▶ ended pipeline [0.148 seconds]
tar_load(rast)
terra::units(rast)
#> [1] ""
terra::varnames(rast)
#> [1] "rast"
terra::names(rast)
#> [1] "elevation"

Created on 2024-04-25 with reprex v2.1.0

Implement "filetype" argument

If tar_* functions are specific to packages and data types, then adding a filetype argument somewhere would be a way for users to override defaults for what kind of file targets are stored as (e.g. GeoTIFF vs netCDF). I could imagine filetype being an argument to tar_terra_rast() or an argument to a function supplied to the format argument of tar_terra_rast()

For example:
Option 1

tar_terra_rast <-
  function(name, command, pattern = NULL, filetype = c("GTiff", "netCDF"), ...)

Option 2

tar_terra_rast <-
  function(name, command, pattern = NULL, format = format_terra_rast(filetype = c("GTiff", "netCDF")), ...)

where format_terra_rast() returns the result of a call to tar_format()

The first option is probably preferable, unless there are other customizations that users might need to make to the format.

import rlang

{targets} uses {rlang} anyway, and there are some really useful things in there which I can imagine using immediately:

  • rlang::`%||%`

Does:

function (x, y) 
{
    if (is_null(x)) 
        y
    else x
}
  • rlang::arg_match0: nicer argument-matching errors that refer to the function being called

Required for #19

Ideas on generalization of spatial package backends and file sources using GDAL (terra, sf, stars, etc.)

I wanted to throw up some ideas for discussion, might be a bit rambling for a single issue. Happy to break off any particular items as new issues or address in specific PRs; I will submit some draft PRs once I have fleshed these ideas out. I say "we" a lot in here but ultimately I am just one interested opinion and welcome any thoughts or alternatives.


The current target storage format functions defined are file-format centric. This is great, because GDAL is the library behind the scenes for common interfaces to a variety of different file formats. GDAL is used in several R spatial packages, notably sf, terra, and stars. I think this project should abstract out the functionality for GDAL data source paths and provide support for multiple R package/object-type interfaces in the result the user sees.

In my opinion, {geotargets} should provide default behavior based on type of spatial data, i.e. vector geometry vs. raster--this is so the user doesn't have to think too much about the formats in their target store, just that they are able to roundtrip an R object equivalent to what they started with. If they care about the format, they should have the ability to choose.

I'd like to make (or suggest others make) a couple PRs to implement:

  1. Spatial backend options to allow, for example: GeoTIFF format object with {stars} or Shapefile format with {sf}
  2. Generalization of the "multiple file target compression" GDAL /vsizip/ approach to all backends and formats that support it

These should provide some room for discussion about specifics how the group wants to abstract or break out functionality.


Spatial backends based on GDAL

I imagine some users don't care so much what file format their target store contains, but likely will care more about the object types that are returned and the associated packages. The object type matters because of chosen dependencies and preferred workflows of the user. The file type may matter especially when it comes time to read targets back in, in part or in full, when they start taking up a lot of disk space, or some step in the process requires a specific format.

  • We may not want to require users to load both {sf} and {terra}, for example

    • Package usage gated by requireNamespace() and having all of these types of packages that produce the user-facing object in Suggests seems like a good strategy. The alternative would be to say, pick {terra} for use internally and then provide conversion methods for compatibility with other ({sf}/{stars}) objects as input/output.

    • I personally am a big fan of {terra}, but still use {sf} for quite a few things. {terra} is great in that it can do both vector and raster data, but there are many R spatial users and a much broader R ecosystem built around {sf}. I think users should be able to avoid one or the other, or interchange as needed, in their workflows if they need to be able to.

Specific result types (e.g. sf data.frame, or lazy tbl, vs SpatVector/SpatVectorProxy) would be customized with options set for the whole pipeline, for a target factory, or in wrapper functions.

For example:

  • In addition to a tar_geotiff() with multiple options set we could have functions like tar_geotiff_stars() and tar_geotiff_terra(). More generic functions would be possible if we abstract out the file type for all GDAL drivers; you might have tar_vector_sf(filetype="parquet") or tar_vector(filetype="ESRI Shapefile", package="terra")

  • Target factories and formats could utilize default arguments, possibly customized based on selected filetype; they might read a targets option or environment variable, or be settable through a function {geotargets} would offer.

    • Some formats have specific limitations. For example I think you would "always" need to use /vsizip/ or similar compressed file option if you need your target to be stored as a Shapefile (multiple files). So, /vsizip/ would be default in a tar_shapefile_*() helper method. Perhaps such a function would be better named tar_shapefile_zip() to indicate that it only works with target names ending in a ".zip" suffix (which is something users may prefer to avoid)

Generalization of compression for spatial targets with GDAL

The /vsizip/ GDAL virtual file system functionality used in format_shapefile() is an example of something that can be generalized further with a focus on generic GDAL data source paths. I think the idea of being able to compress files that are in the target store (and keep them compressed) is attractive for spatial data which can be quite large--even if targets are not comprised of multiple files.

  • Since GDAL can read from the compressed target store efficiently, you get the benefit of less file size footprint while also being able to read the file without fully extracting it.
    • Also should consider some of the other archive file formats/virtual file system types, and providing interfaces in R to produce them e.g. /vsigzip/ or /vsitar/ analogs to /vsizip/ + utils::zip().
  • Even without creating specific compressed archive files, there should be robust tools available for controlling GDAL file compression options, supported by many drivers, that are used to write target objects
  • The ZIP approach is useful for GeoTIFF files where category information is stored in the .tif.aux.xml sidecar file. Convenience methods for terra SpatRaster objects could automatically store a target as a ZIP file (and give warnings about target naming) if the input SpatRaster is categorical and output format is GeoTIFF.

tests with multiple workers

I think it's important to add tests running pipelines with multiple workers, since that is when the marshaling/unmarshaling of R objects comes into play.
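One way to exercise that path (a sketch, assuming {crew} is acceptable as a test dependency) is to run a small pipeline with a local crew controller, so targets are built on parallel workers and the objects must cross the worker boundary:

```r
targets::tar_script({
  library(geotargets)
  # two local workers force marshal/unmarshal of returned objects
  targets::tar_option_set(
    controller = crew::crew_controller_local(workers = 2)
  )
  list(
    tar_terra_rast(
      r,
      terra::rast(system.file("ex/elev.tif", package = "terra"))
    )
  )
})
targets::tar_make()
```

Wrapping this in tar_test() would give a test that fails if wrap()/unwrap() (the marshal/unmarshal hooks) ever break.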

Facilitate targets that are a list of geospatial rasters

For example, in:

targets::tar_script({
  library(geodata)
  library(targets)
  agcrop_area <- function(crop) {
    
    the_raster <- crop_spam(
      crop = crop,
      var = "area",
      path = "data/rasters",
      africa = TRUE
    )
    
    the_raster
    
  }
  
  format_geotiff <- tar_format(
    read = function(path) terra::rast(path),
    write = function(object, path) {
      terra::writeRaster(
        x = object,
        filename = path,
        filetype = "GTiff",
        overwrite = TRUE
      )
    },
    marshal = function(object) terra::wrap(object),
    unmarshal = function(object) terra::unwrap(object)
  )
  
  list(
    tar_target(
      raster_coffee,
      agcrop_area(crop = "acof")
    ),
    tar_target(
      raster_veg,
      agcrop_area(crop = "vege")
    ),
    tar_target(
      raster_countries,
      command = list(
        raster_coffee,
        raster_veg
      ),
      format = format_geotiff
    )
  )
})


Created on 2024-03-04 with reprex v2.1.0

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Melbourne
#>  date     2024-03-04
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.3.0)
#>  base64url     1.4     2018-05-14 [2] CRAN (R 4.3.0)
#>  callr         3.7.5   2024-02-19 [1] CRAN (R 4.3.1)
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
#>  codetools     0.2-19  2023-02-01 [2] CRAN (R 4.3.3)
#>  data.table    1.15.0  2024-01-30 [1] CRAN (R 4.3.1)
#>  digest        0.6.34  2024-01-11 [1] CRAN (R 4.3.1)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.1)
#>  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.1)
#>  igraph        2.0.2   2024-02-17 [1] CRAN (R 4.3.1)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  processx      3.8.3   2023-12-10 [1] CRAN (R 4.3.1)
#>  ps            1.7.6   2024-01-18 [1] CRAN (R 4.3.1)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [2] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [2] CRAN (R 4.3.0)
#>  R.oo          1.26.0  2024-01-24 [2] CRAN (R 4.3.1)
#>  R.utils       2.12.3  2023-11-18 [2] CRAN (R 4.3.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  reprex        2.1.0   2024-01-11 [2] CRAN (R 4.3.1)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
#>  secretbase    0.3.0   2024-02-21 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.3.0)
#>  styler        1.10.2  2023-08-29 [2] CRAN (R 4.3.0)
#>  targets       1.5.1   2024-02-15 [1] CRAN (R 4.3.1)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.1)
#>  xfun          0.42    2024-02-08 [1] CRAN (R 4.3.1)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
#> 
#>  [1] /Users/nick/Library/R/arm64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Require minimum version of GDAL?

Certain features (e.g. SOZip shapefile output from tar_terra_vect() with filetype = "ESRI Shapefile") require GDAL >= 3.7. However, GDAL 3.7 was released only about a year ago, so many systems still run older versions. Running the test suite against GDAL 3.0.4 gives the following:

Error (test-tar-terra.R:62:5): tar_terra_vect() works
<tar_condition_run/tar_condition_targets/rlang_error/error/condition>
Error: Error running targets::tar_make()
Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
Debugging guide: https://books.ropensci.org/targets/debugging.html
How to ask for help: https://books.ropensci.org/targets/help.html
Last error message:
    the write() function in tar_format() must not create a directory. Found directories inside the data store where there should only be files: _targets/objects/test_terra_vect_shz

I think there are a few options:

  1. Require GDAL >= 3.7 as a hard dependency
  2. Check for GDAL >= 3.7 at run time and error if it is older
  3. Check the GDAL version and fall back to a different approach when the version requirement isn't met (e.g. zipping the shapefile components with utils::zip() instead of relying on the .shz SOZip extension)
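Option 3 could be sketched roughly as below. This is only an illustration, not geotargets code: the helper names `gdal_supports_sozip()` and `write_vect_compat()` are hypothetical, and the exact fallback zipping details would need testing on an older GDAL.

```r
# Hypothetical sketch of option 3: branch on the GDAL version at run time.
gdal_supports_sozip <- function() {
  # terra::gdal() returns the GDAL version string, e.g. "3.8.1"
  package_version(terra::gdal()) >= package_version("3.7.0")
}

write_vect_compat <- function(object, path) {
  if (gdal_supports_sozip()) {
    # GDAL >= 3.7: writing to a .shz path produces a single SOZip file
    terra::writeVector(object, path, filetype = "ESRI Shapefile")
  } else {
    # Older GDAL: write the shapefile components to a temporary
    # directory, then bundle them into one file with utils::zip()
    # so the targets data store still sees a single file, not a directory
    tmp <- file.path(tempdir(), "vect_fallback")
    dir.create(tmp, showWarnings = FALSE)
    terra::writeVector(
      object,
      file.path(tmp, "data.shp"),
      filetype = "ESRI Shapefile",
      overwrite = TRUE
    )
    utils::zip(path, list.files(tmp, full.names = TRUE), flags = "-j")
  }
}
```

Either way, the read side would need a matching branch to unzip before calling terra::vect() when the fallback path was taken.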
