
fluxdatakit's Issues

The units of modelled GPP and EC GPP do not seem to match

Hi Koen,

I was trying to compare the EC-based GPP from CH-Dav with the p-model-simulated GPP. The EC GPP values are much higher than the modelled GPP. What are the units of the EC GPP and the p-model GPP? Does the p-model simulation follow the same model parameterization as Stocker et al. (2020)?

Best
test

LST ingestion

  • Implement download of MODIS MYD21A2 product as part of ingestr. Follow implementation of other MODIS products within ingestr. @Koen Hufkens might be able to help, too.
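As a starting point, a minimal sketch using MODISTools (the package that other MODIS downloads in this ecosystem build on, if I recall correctly). It assumes MYD21A2 is served by the ORNL subset API; the band name and coordinates below are assumptions, so check MODISTools::mt_bands("MYD21A2") for the exact band names.

library(MODISTools)

# download an 8-day LST subset for a single site (CH-Dav as an example)
lst <- mt_subset(
  product   = "MYD21A2",
  band      = "LST_Day_1KM",   # assumed band name, verify with mt_bands()
  lat       = 46.815,
  lon       = 9.856,
  start     = "2015-01-01",
  end       = "2015-12-31",
  site_name = "CH-Dav",
  internal  = TRUE
)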

Fixed on my dev version: https://github.com/bluegreen-labs/ingestr/commit/d4a70a6700d49af35b3f58ebb383f4688704980f

I'm writing a routine which will go into sofunCalVal for now. It will run for a while, but I prioritized LST (first in line, backfilling the rest).

@beni Stocker could you set the permissions for the modis_subsets directory so I can write to it?

Error when installing package

I'm getting the following error when I try to install the package:

Error: package or namespace load failed for ‘FluxDataKit’ in namespaceExport(ns, exports):
undefined exports: prepare_setup_sofun

prepare_setup_sofun.R was removed in the latest commit, but the function is still referenced in the NAMESPACE file. The install succeeds when I remove the reference or add the file back in.
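For reference, the stale directive in NAMESPACE is a standard export line:

export(prepare_setup_sofun)

Deleting that line, or restoring the file and re-running devtools::document() so that NAMESPACE is regenerated (assuming the package uses roxygen2), fixes the install.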

@khufkens

FluxnetLSM installation

Vignette 03_data_generation.Rmd loads the FluxnetLSM library. I tried to install it by

devtools::install_github("aukkola/FluxnetLSM")

But got this error:

* installing *source* package ‘FluxnetLSM’ ...
** using staged installation
** libs
** arch - 
Rscript zzz.R
fatal: not a git repository (or any of the parent directories): .git
Warning message:
In system("git rev-parse --verify HEAD", intern = TRUE) :
  running command 'git rev-parse --verify HEAD' had status 128
git rev: NON-GIT
** R
** data
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
ERROR: hard-coded installation path: please report to the package maintainer and use ‘--no-staged-install’
* removing ‘/Library/Frameworks/R.framework/Versions/4.2/Resources/library/FluxnetLSM’
Warning messages:
1: In readLines(old_path) :
  incomplete final line found on '/Users/benjaminstocker/.R/Makevars'
2: In i.p(...) :
  installation of package ‘/var/folders/50/6vwrc_t54pv4n9vty5dgt4fw0000gn/T//RtmplOVr6N/file610347d6221d/FluxnetLSM_1.0.tar.gz’ had non-zero exit status

Should I be installing it from a different source?
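In the meantime, the error message itself suggests a possible (untested) workaround: passing --no-staged-install through to R CMD INSTALL.

devtools::install_github(
  "aukkola/FluxnetLSM",
  INSTALL_opts = "--no-staged-install"
)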

How to install FluxnetLSM and FluxnetEO?

The vignette vignettes/03_data_generation.Rmd uses the packages FluxnetLSM and FluxnetEO, but these are not dependencies of FluxDataKit. Could you add a note on how to install them (and where to find them)?

Processing differences

Trevor's comment on Slack:
I wonder how you are integrating the Plumber2 data with the other datasets as it was processed using a different codebase with different processing decisions. The most important is likely that Plumber2 forces energy balance closure, as it is designed for land surface model evaluation, while the other datasets do not. This can lead to large differences in LE and H.

Summary statistics

Documentation:

  • list MODIS products
  • site years
  • site years added
  • source data
  • meta-data (sources?)

Package icoscp not available

library(FluxDataKit)
source("data-raw/01_collect_meta-data.R")

Returns

Error in library(icoscp) : there is no package called ‘icoscp’

And the package does not seem to be available from CRAN.

> install.packages("icoscp")
Warning in install.packages :
  package ‘icoscp’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

Meeting issues to raise:

  • Model efficiency factor in daily aggregation
  • La Thuile integration (do not expand beyond PLUMBER)
  • What is this correction? #28
  • Long term maintenance and goals - merger with other initiatives?
  • CRAN compliance code quality and standards including unit tests
  • Documentation, who's in charge of expanding this where necessary (function calls as well as overall use)?
  • site exceptions, permissible or not - those are the ones that tend to mess up the automation
  • from Dario Papale: Ameriflux direct downloads should be used instead of the ONEFlux beta (integration with {amerifluxr}?)

`flux_data_kit_site-info` object

The object flux_data_kit_site-info comes in a shape that doesn't look as expected:

`flux_data_kit_site-info` %>% 
  as_tibble()

# A tibble: 306 × 1
   sitename.lat.lon.elv.date_start.date_end.product.koeppen_code.year_end.year_start.koeppen_code_beck.whc.igbp_land_use.data_path              
   <fct>                                                                                                                                        
 1 SE-Nor,60.0865,17.479504,45,2014-01-01,2020-12-31,icos,Dfb,2020,2014,Dfb,295.938995361328,Evergreen Needleleaf Forest,data-raw/flux_data/    
 2 BE-Bra,51.30761,4.51984,16,1996-01-01,2020-12-31,icos,Cfb,2020,1996,Cfb,425.602996826172,Mixed Forest,data-raw/flux_data/                    
 3 BE-Lcr,51.11218,3.85043,6.25,2019-01-01,2020-12-31,icos,Cfb,2020,2019,Cfb,321.834991455078,Croplands,data-raw/flux_data/                     
 4 BE-Lon,50.55162,4.746234,170,2004-01-01,2020-12-31,icos,Cfb,2020,2004,Cfb,441.085998535156,Croplands,data-raw/flux_data/                     
 5 BE-Maa,50.97987,5.631851,87,2016-01-01,2020-12-31,icos,Cfb,2020,2016,Cfb,314.752014160156,Savannas,data-raw/flux_data/                       
 6 BE-Vie,50.304962,5.998099,490,1996-01-01,2020-12-31,icos,Dfb,2020,1996,Dfb,395.394012451172,Mixed Forest,data-raw/flux_data/                 
 7 DE-Geb,51.09973,10.91463,161.5,2001-01-01,2020-12-31,icos,Dfb,2020,2001,Dfb,358.447998046875,Croplands,data-raw/flux_data/                   
 8 DE-Gri,50.95004,13.51259,385,2004-01-01,2020-12-31,icos,Dfb,2020,2004,Dfb,392.226013183594,Savannas,data-raw/flux_data/                      
 9 DE-Hai,51.079407,10.452089,438.7,2000-01-01,2020-12-31,icos,Dfb,2020,2000,Dfb,358.447998046875,Deciduous Broadleaf Forest,data-raw/flux_data/
10 DE-HoH,52.08656,11.22235,193,2015-01-01,2020-12-31,icos,Dfb,2020,2015,Dfb,392.226013183594,Mixed Forest,data-raw/flux_data/                  
# … with 296 more rows
# ℹ Use `print(n = ...)` to see more rows

Should be separated into multiple columns.
Suggestion: the object could be named fdk_sites.
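A possible interim fix, as a sketch: it assumes the single column is plain comma-separated text with exactly the fields listed in the header above.

library(tidyverse)

fdk_sites <- `flux_data_kit_site-info` %>%
  as_tibble() %>%
  mutate(across(1, as.character)) %>%  # the lone column is a factor
  separate(
    col = 1,
    into = c("sitename", "lat", "lon", "elv", "date_start", "date_end",
             "product", "koeppen_code", "year_end", "year_start",
             "koeppen_code_beck", "whc", "igbp_land_use", "data_path"),
    sep = ",",
    convert = TRUE  # parse the numeric fields
  )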

Provide output

Create output of FDK based on FLUXNET2015, ICOS, PLUMBER2, ... (what we currently have) and make the files accessible (e.g., on Zenodo).

Instructions for how to create rsofun drivers

Let's say we have a list of sites and want to create the rsofun driver object for this site set. Which functions or scripts should be run, and in what sequence?
Here, I found the instruction "Run the numbered scripts in order to generate the benchmark datasets". Which numbered scripts? Where are they?
It would be nice to have this as a vignette (an Article on the website).
Thanks.

Custom site processing does not work

The binding for custom site processing fails upon a query to fluxnet.ornl.gov.

The call doesn't trace cleanly, so I'm not sure where fluxnet.ornl.gov is called, or what for. It fails on some of the sites, which then propagates and halts further processing. Reprex to follow.

Check coverage on HH output

For @lauramarques: checks the coverage of GPP values in the HH p-model-formatted output.

# quality control on coverage of site-level HH data

library(tidyverse)

df <- readRDS("site_based_drivers_HH.rds")

stats <- df %>%
  # fraction of missing GPP values per site and day
  group_by(sitename, date) %>%
  summarize(
    na_values = mean(is.na(gpp)),
    .groups = "drop"
  ) %>%
  # mean daily GPP coverage per site
  group_by(sitename) %>%
  summarize(
    coverage = 1 - mean(na_values)
  )

Release timeline FDK

  • test run - final fixes 5/12
  • first run - 6/12
  • qa/qc - 8/12

A new run will probably happen in early 2023, with the delivery of new data from Trevor.
@stineb

Francesco's LE filtering

Implement the filtering steps described in Giardina et al. in prep.:

We first applied a rainfall filter with a buffer of 6 hours after each rain event to exclude interception evaporation and to avoid sensor saturation with high relative humidity (Li et al., 2019). We removed data with relative humidity higher than the 95% quantile to exclude the impact of dew evaporation on ET (Knauer, Zaehle, et al., 2018). To avoid stable boundary layer conditions, we excluded data where the sensible heat flux was smaller than 5 W m-2 and incoming shortwave radiation was smaller than 50 W m-2. Finally, only daytime data (GPP, ET and VPD > 0) were considered. Half-hourly data were aggregated into daily data to reduce noise and to avoid the ET-VPD hysteresis effect, observed at sub-daily timescales (Q. Zhang et al., 2014). While aggregating to the daily level, the daily mean was calculated for all variables, except for VPD (for which we calculated the daily maximum), ET and precipitation (for which we used the daily sum). We only retained daily estimates with at least 8 measured half-hourly points, as in (Li et al., 2019).

@fgiardin could you please ask @khufkens where to add your code to integrate this into flux_data_kit? One possibility is to create a column in the data that identifies whether a record is Giardina-filtered or not (binary information).
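A sketch of these steps, assuming a half-hourly data frame hh with columns sitename, date, precip, rh, h (sensible heat), sw_in, gpp, et and vpd; these column names are assumptions, not necessarily FluxDataKit's variable names.

library(dplyr)
library(zoo)

# note: apply per site (e.g. after group_by(sitename)) so the rain
# buffer does not leak across site boundaries
filter_le <- function(hh, buffer_steps = 12) {  # 12 half-hours = 6 h buffer
  hh %>%
    mutate(
      # TRUE for any half-hour during or within 6 h after a rain event
      rain_recent = zoo::rollapply(
        precip > 0,
        width = buffer_steps + 1, align = "right",
        FUN = any, partial = TRUE
      )
    ) %>%
    filter(
      !rain_recent,                            # rainfall filter
      rh <= quantile(rh, 0.95, na.rm = TRUE),  # exclude dew evaporation
      h >= 5,                                  # sensible heat >= 5 W m-2
      sw_in >= 50,                             # shortwave in >= 50 W m-2
      gpp > 0, et > 0, vpd > 0                 # daytime data only
    )
}

# daily aggregation: means, except VPD (daily max) and ET/precipitation
# (daily sums); retain days with at least 8 measured half-hours
aggregate_daily <- function(hh_filtered) {
  hh_filtered %>%
    group_by(sitename, date) %>%
    summarize(
      n_hh   = n(),
      vpd    = max(vpd),
      et     = sum(et),
      precip = sum(precip),
      across(c(gpp, h, sw_in, rh), mean),
      .groups = "drop"
    ) %>%
    filter(n_hh >= 8)
}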

Skipping file writing

Message

File exists, skipping

seems to be raised only after computationally heavy calculations have already been executed. If the file is not going to be overwritten, those calculations could be skipped as well. Consider adding an overwrite argument.
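A minimal sketch of such a guard, placed before the heavy work rather than after it (function and variable names are hypothetical):

process_site <- function(site, out_file, overwrite = FALSE) {
  # check for existing output *before* doing any heavy work
  if (file.exists(out_file) && !overwrite) {
    message("File exists, skipping")
    return(invisible(NULL))
  }

  # ... computationally heavy calculations and file writing here ...
}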

reformat format function to use site data

Currently WATCH-WFDEI data is used (as it can be scaled globally to other sites and locations). This is not really required here, so fall back on site-specific data, i.e., create a new format function.

Close to release

@stineb

  • MODIS data at data/modis_data.rds
  • current site list (223 sites) at data/p_model_drivers/site_based_drivers.rds
  • visuals of coverage at new article

Cross tabulate data coverage

This table should be rendered upon the completion of the final data product.

The current meta-data file lists sites which potentially don't have enough data to process cleanly.

fdk_convert_lsm() fails with latest R version

Probably because of changes in as.POSIXct(), the fdk_convert_lsm() function fails on R version 4.3.1.

The problem is in this line: dplyr::left_join(df[[1]], df[[2]]). It returns a very large data frame with this warning:

Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 1 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship = "many-to-many"`
  to silence this warning.

Possible fix:

Change lines 68 and 69 as follows:

line 68:    time_date <- as.POSIXct(time, origin = time_units, tz="GMT")
line 69:    # time_date <- as.POSIXct(time_date, tz = "GMT")

Wrap scripts in workflow function

Create a new function which generates a full new release, mostly automated. It needs only an input and an output path and creates all required directories according to a fixed structure. This should simplify the processing workflow to the absolute minimum.

fdk_release(
  in_path,
  out_path,
  overwrite
)
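A skeleton of what this wrapper could look like; the directory names and step functions below are placeholders, not existing FluxDataKit functions.

fdk_release <- function(in_path, out_path, overwrite = FALSE) {
  # create the fixed output directory structure
  dirs <- file.path(out_path, c("fluxnet", "modis", "drivers"))
  lapply(dirs, dir.create, recursive = TRUE, showWarnings = FALSE)

  # run the processing steps in a fixed order
  # (placeholders standing in for the current numbered scripts):
  # fdk_process_fluxnet(in_path, dirs[1], overwrite = overwrite)
  # fdk_process_modis(dirs[1], dirs[2], overwrite = overwrite)
  # fdk_build_drivers(dirs[2], dirs[3], overwrite = overwrite)

  invisible(out_path)
}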

MODIS data smoothing mess

Ok, I seriously doubt that the MODIS smoothing and gap-filling routine makes any sense; in my opinion it is badly broken. The current routine introduces spurious results throughout, in particular at year-end transitions, with sharp peaks in LAI/FPAR for anything that isn't Northern-Hemisphere seasonal, as far as I can tell.

Citation info

For publication, we should always include a table listing each eddy-covariance site from which we used data and cite the appropriate paper.

I once started writing code for this in the repo stineb/fluxnet2015_citations. It contains code to create a LaTeX table with a column of entries like \cite{CH-Lae}. The accompanying .bib file contains the BibTeX entries for the relevant papers, with citation keys corresponding to the site names (here CH-Lae).
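For illustration, a minimal sketch of how such a table could be generated from a site list; the site vector is an example, and it assumes a .bib file keyed by site name as in that repo.

# build one LaTeX table row per site, citing by the site-name bib key
sites <- c("CH-Dav", "CH-Lae")
rows  <- sprintf("%s & \\cite{%s} \\\\", sites, sites)

writeLines(
  c("\\begin{tabular}{ll}",
    "Site & Reference \\\\ \\hline",
    rows,
    "\\end{tabular}"),
  "site_citations.tex"
)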

Should we provide this functionality also in FluxDataKit? If so, please adopt the code and files and note this open issue.

Documentation and dissemination

Documentation

  • half-hourly, not complemented with MODIS
  • daily: document how aggregated, complement daily interpolated with MODIS
  • two versions:
    • 1st: what we have from PLUMBER, ICOS, OneFlux beta (to be done now)
    • 2nd (once provided by Ameriflux): extended site list update
  • Document how much data is added per site, and which sites are additional, compared to FLUXNET2015
  • Show which additional MODIS variables are added
  • Explain meta info

Dissemination

  • Strategy? DOI? Data paper?
  • Share with LEMONTREE

Amending MODIS file names

ERA corrections also alter the field names of the MODIS product. This is illogical and breaks the split processing between FLUXNET and PLUMBER data. Move this section to the smoothing routine or a separate function.

Ameriflux BASE compatibility

On advice from Dario, drop the OneFlux-processed data in favour of the Ameriflux OneFlux (most recent) and/or BASE products (as successors to FLUXNET2015).

Propagate QC flags to DD

For the DD data, specify QC information as the fraction (0-1) of good (not gap-filled) HH data that went into the respective aggregated DD value.

Keep all HH data (do not apply any filtering) and provide HH data quality in the respective QC variable. At HH, the QC variable is a code (integer, 0 = good, 1 = ...).

This is to avoid DD data being aggregated from HH data that has gaps. Gaps are non-random (they tend to be more frequent at night), so daily means calculated from cleaned, gappy HH data will be biased.
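A sketch of the DD aggregation with the proposed QC fraction; the column names are assumptions, and it only relies on the HH QC convention stated above (integer code, 0 = good).

library(dplyr)

daily <- hh %>%
  group_by(sitename, date) %>%
  summarize(
    gpp    = mean(gpp, na.rm = TRUE),
    # fraction of good (not gap-filled) half-hours behind this daily value
    gpp_qc = mean(gpp_qc == 0, na.rm = TRUE),
    .groups = "drop"
  )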

Downsampling (DD)

We will need data aggregated to daily resolution for a lot of applications (some also use monthly or annual).

  • Is the aggregation implemented as part of the FluxDataKit code, based on the HH data?
  • We should provide outputs also for the daily aggregated data.

Sideloading data on GECO systems

Provision to side-load data on GECO systems.

Cloud cover requires CRU data, which is too much to ask of most people (in terms of acquiring the data, etc.). This option will be dynamic and system-name dependent (not 100% foolproof, but close enough).

On GECO workstations (Balder / Dash) the code will sideload CRU cloud-cover data; on all other systems it will be set to 0, i.e., clear-sky conditions.
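A minimal sketch of the host-name check; the hostnames and the CRU file path are assumptions.

sideload_cru_ccov <- function(cru_file = "/data/archive/cru/cloud_cover.rds") {
  host <- Sys.info()[["nodename"]]
  if (grepl("balder|dash", host, ignore.case = TRUE)) {
    # on GECO workstations, read cloud cover from the shared CRU archive
    readRDS(cru_file)
  } else {
    # elsewhere, fall back to clear-sky conditions
    0
  }
}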
