
fluxdatakit's Issues

The units of modelled GPP and EC GPP do not seem to match

Hi Koen,

I was trying to compare the EC-based GPP from CH-Dav with the p-model-simulated GPP. The EC GPP values are much higher than the modelled GPP. What are the units of the EC GPP and the p-model GPP? Does the p-model simulation follow the same model parameterization as Stocker et al. (2020)?

Best
test

LST ingestion

  • Implement download of MODIS MYD21A2 product as part of ingestr. Follow implementation of other MODIS products within ingestr. @Koen Hufkens might be able to help, too.
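As a starting point, a minimal sketch using MODISTools (the package that other MODIS downloads in this ecosystem build on, if I recall correctly). It assumes MYD21A2 is served by the ORNL subset API; the band name and coordinates below are assumptions, so check MODISTools::mt_bands("MYD21A2") for the exact band names.

library(MODISTools)

# download an 8-day LST subset for a single site (CH-Dav as an example)
lst <- mt_subset(
  product   = "MYD21A2",
  band      = "LST_Day_1KM",   # assumed band name, verify with mt_bands()
  lat       = 46.815,
  lon       = 9.856,
  start     = "2015-01-01",
  end       = "2015-12-31",
  site_name = "CH-Dav",
  internal  = TRUE
)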

Fixed on my dev version: https://github.com/bluegreen-labs/ingestr/commit/d4a70a6700d49af35b3f58ebb383f4688704980f

I'm writing a routine which will go into sofunCalVal for now. It will run for a while, but I prioritized LST (first in line, backfilling the rest).

@beni Stocker could you set the permissions for the modis_subsets directory so I can write to it?

Error when installing package

I'm getting the following error when I try to install the package:

Error: package or namespace load failed for ‘FluxDataKit’ in namespaceExport(ns, exports):
undefined exports: prepare_setup_sofun

prepare_setup_sofun.R was removed in the latest commit, but the function is still referenced in the NAMESPACE file. The install succeeds when I remove the reference or add the file back in.
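For reference, the stale directive in NAMESPACE is a standard export line:

export(prepare_setup_sofun)

Deleting that line, or restoring the file and re-running devtools::document() so that NAMESPACE is regenerated (assuming the package uses roxygen2), fixes the install.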

@khufkens

FluxnetLSM installation

Vignette 03_data_generation.Rmd loads the FluxnetLSM library. I tried to install it by

devtools::install_github("aukkola/FluxnetLSM")

But got this error:

* installing *source* package ‘FluxnetLSM’ ...
** using staged installation
** libs
** arch - 
Rscript zzz.R
fatal: not a git repository (or any of the parent directories): .git
Warning message:
In system("git rev-parse --verify HEAD", intern = TRUE) :
  running command 'git rev-parse --verify HEAD' had status 128
git rev: NON-GIT
** R
** data
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
ERROR: hard-coded installation path: please report to the package maintainer and use ‘--no-staged-install’
* removing ‘/Library/Frameworks/R.framework/Versions/4.2/Resources/library/FluxnetLSM’
Warning messages:
1: In readLines(old_path) :
  incomplete final line found on '/Users/benjaminstocker/.R/Makevars'
2: In i.p(...) :
  installation of package ‘/var/folders/50/6vwrc_t54pv4n9vty5dgt4fw0000gn/T//RtmplOVr6N/file610347d6221d/FluxnetLSM_1.0.tar.gz’ had non-zero exit status

Should I be installing it from a different source?
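In the meantime, the error message itself suggests a possible (untested) workaround: passing --no-staged-install through to R CMD INSTALL.

devtools::install_github(
  "aukkola/FluxnetLSM",
  INSTALL_opts = "--no-staged-install"
)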

How to install FluxnetLSM and FluxnetEO?

The vignette vignettes/03_data_generation.Rmd uses the packages FluxnetLSM and FluxnetEO, but these are not dependencies of FluxDataKit. Could you add a note on how to install them (and where to find them)?

Processing differences

Trevor's comment on Slack:
I wonder how you are integrating the Plumber2 data with the other datasets as it was processed using a different codebase with different processing decisions. The most important is likely that Plumber2 forces energy balance closure, as it is designed for land surface model evaluation, while the other datasets do not. This can lead to large differences in LE and H.

Summary statistics

Documentation:

  • list MODIS products
  • site years
  • site years added
  • source data
  • meta-data (sources?)

Package icoscp not available

library(FluxDataKit)
source("data-raw/01_collect_meta-data.R")

Returns

Error in library(icoscp) : there is no package called ‘icoscp’

And the package does not seem to be available from CRAN.

> install.packages("icoscp")
Warning in install.packages :
  package ‘icoscp’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

Meeting issues to raise:

  • Model efficiency factor in daily aggregation
  • La Thuile integration (do not expand beyond PLUMBER)
  • What is this correction? #28
  • Long term maintenance and goals - merger with other initiatives?
  • CRAN compliance code quality and standards including unit tests
  • Documentation, who's in charge of expanding this where necessary (function calls as well as overall use)?
  • site exceptions, permissible or not - those are the ones that tend to mess up the automation
  • from Dario Papale: Ameriflux direct downloads should be used instead of the ONEFlux beta (integration with {amerifluxr}?)

`flux_data_kit_site-info` object

The object flux_data_kit_site-info comes in a shape that doesn't look as expected:

`flux_data_kit_site-info` %>% 
  as_tibble()

# A tibble: 306 × 1
   sitename.lat.lon.elv.date_start.date_end.product.koeppen_code.year_end.year_start.koeppen_code_beck.whc.igbp_land_use.data_path              
   <fct>                                                                                                                                        
 1 SE-Nor,60.0865,17.479504,45,2014-01-01,2020-12-31,icos,Dfb,2020,2014,Dfb,295.938995361328,Evergreen Needleleaf Forest,data-raw/flux_data/    
 2 BE-Bra,51.30761,4.51984,16,1996-01-01,2020-12-31,icos,Cfb,2020,1996,Cfb,425.602996826172,Mixed Forest,data-raw/flux_data/                    
 3 BE-Lcr,51.11218,3.85043,6.25,2019-01-01,2020-12-31,icos,Cfb,2020,2019,Cfb,321.834991455078,Croplands,data-raw/flux_data/                     
 4 BE-Lon,50.55162,4.746234,170,2004-01-01,2020-12-31,icos,Cfb,2020,2004,Cfb,441.085998535156,Croplands,data-raw/flux_data/                     
 5 BE-Maa,50.97987,5.631851,87,2016-01-01,2020-12-31,icos,Cfb,2020,2016,Cfb,314.752014160156,Savannas,data-raw/flux_data/                       
 6 BE-Vie,50.304962,5.998099,490,1996-01-01,2020-12-31,icos,Dfb,2020,1996,Dfb,395.394012451172,Mixed Forest,data-raw/flux_data/                 
 7 DE-Geb,51.09973,10.91463,161.5,2001-01-01,2020-12-31,icos,Dfb,2020,2001,Dfb,358.447998046875,Croplands,data-raw/flux_data/                   
 8 DE-Gri,50.95004,13.51259,385,2004-01-01,2020-12-31,icos,Dfb,2020,2004,Dfb,392.226013183594,Savannas,data-raw/flux_data/                      
 9 DE-Hai,51.079407,10.452089,438.7,2000-01-01,2020-12-31,icos,Dfb,2020,2000,Dfb,358.447998046875,Deciduous Broadleaf Forest,data-raw/flux_data/
10 DE-HoH,52.08656,11.22235,193,2015-01-01,2020-12-31,icos,Dfb,2020,2015,Dfb,392.226013183594,Mixed Forest,data-raw/flux_data/                  
# … with 296 more rows
# ℹ Use `print(n = ...)` to see more rows

Should be separated into multiple columns.
Suggestion: the object could be named fdk_sites.
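A possible interim fix, as a sketch: it assumes the single column is plain comma-separated text with exactly the fields listed in the header above.

library(tidyverse)

fdk_sites <- `flux_data_kit_site-info` %>%
  as_tibble() %>%
  mutate(across(1, as.character)) %>%  # the lone column is a factor
  separate(
    col = 1,
    into = c("sitename", "lat", "lon", "elv", "date_start", "date_end",
             "product", "koeppen_code", "year_end", "year_start",
             "koeppen_code_beck", "whc", "igbp_land_use", "data_path"),
    sep = ",",
    convert = TRUE  # parse the numeric fields
  )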

Provide output

Create output of FDK based on FLUXNET2015, ICOS, PLUMBER2, ... (what we currently have) and make the files accessible (e.g., on Zenodo).

Instructions for how to create rsofun drivers

Let's say we have a list of sites and want to create the rsofun driver object for this site set. Which functions or scripts should be run, and in what sequence?
Here, I found the instruction "Run the numbered scripts in order to generate the benchmark datasets". Which numbered scripts? Where are they?
It would be nice to have this as a vignette (an Article on the website).
Thanks.

Custom site processing does not work

The binding for custom site processing fails upon a query to fluxnet.ornl.gov.

The call doesn't trace cleanly, so I'm not sure where fluxnet.ornl.gov is called, or what for. It fails on some of the sites, which then propagates and halts further processing. Reprex to follow.

Check coverage on HH output

For @lauramarques: checks the coverage of GPP values in the HH p-model-formatted output.

# quality control on coverage of site-level HH data

library(tidyverse)

df <- readRDS("site_based_drivers_HH.rds")

stats <- df %>%
  # fraction of missing GPP values per site and day
  group_by(sitename, date) %>%
  summarize(
    na_values = mean(is.na(gpp)),
    .groups = "drop"
  ) %>%
  # mean daily GPP coverage per site
  group_by(sitename) %>%
  summarize(
    coverage = 1 - mean(na_values)
  )

Release timeline FDK

  • test run - final fixes 5/12
  • first run - 6/12
  • qa/qc - 8/12

A new run will probably happen in early 2023, with the delivery of new data from Trevor.
@stineb

Francesco's LE filtering

Implement the filtering steps described in Giardina et al. in prep.:

We first applied a rainfall filter with a buffer of 6 hours after each rain event to exclude interception evaporation and to avoid sensor saturation with high relative humidity (Li et al., 2019). We removed data with relative humidity higher than the 95% quantile to exclude the impact of dew evaporation on ET (Knauer, Zaehle, et al., 2018). To avoid stable boundary layer conditions, we excluded data where the sensible heat flux was smaller than 5 W m-2 and incoming shortwave radiation was smaller than 50 W m-2. Finally, only daytime data (GPP, ET and VPD > 0) were considered. Half-hourly data were aggregated into daily data to reduce noise and to avoid the ET-VPD hysteresis effect, observed at sub-daily timescales (Q. Zhang et al., 2014). While aggregating to the daily level, the daily mean was calculated for all variables, except for VPD (for which we calculated the daily maximum), ET and precipitation (for which we used the daily sum). We only retained daily estimates with at least 8 measured half-hourly points, as in (Li et al., 2019).

@fgiardin could you please ask @khufkens where to add your code to integrate this into flux_data_kit? One possibility is to create a column in the data that identifies whether a record is Giardina-filtered or not (binary information).
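A sketch of these steps, assuming a half-hourly data frame hh with columns sitename, date, precip, rh, h (sensible heat), sw_in, gpp, et and vpd; these column names are assumptions, not necessarily FluxDataKit's variable names.

library(dplyr)
library(zoo)

# note: apply per site (e.g. after group_by(sitename)) so the rain
# buffer does not leak across site boundaries
filter_le <- function(hh, buffer_steps = 12) {  # 12 half-hours = 6 h buffer
  hh %>%
    mutate(
      # TRUE for any half-hour during or within 6 h after a rain event
      rain_recent = zoo::rollapply(
        precip > 0,
        width = buffer_steps + 1, align = "right",
        FUN = any, partial = TRUE
      )
    ) %>%
    filter(
      !rain_recent,                            # rainfall filter
      rh <= quantile(rh, 0.95, na.rm = TRUE),  # exclude dew evaporation
      h >= 5,                                  # sensible heat >= 5 W m-2
      sw_in >= 50,                             # shortwave in >= 50 W m-2
      gpp > 0, et > 0, vpd > 0                 # daytime data only
    )
}

# daily aggregation: means, except VPD (daily max) and ET/precipitation
# (daily sums); retain days with at least 8 measured half-hours
aggregate_daily <- function(hh_filtered) {
  hh_filtered %>%
    group_by(sitename, date) %>%
    summarize(
      n_hh   = n(),
      vpd    = max(vpd),
      et     = sum(et),
      precip = sum(precip),
      across(c(gpp, h, sw_in, rh), mean),
      .groups = "drop"
    ) %>%
    filter(n_hh >= 8)
}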

Skipping file writing

Message

File exists, skipping

seems to be raised only after computationally heavy calculations have already been executed. If the file is not going to be overwritten, those calculations could be skipped as well. Consider adding an overwrite argument.
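A minimal sketch of such a guard, placed before the heavy work rather than after it (function and variable names are hypothetical):

process_site <- function(site, out_file, overwrite = FALSE) {
  # check for existing output *before* doing any heavy work
  if (file.exists(out_file) && !overwrite) {
    message("File exists, skipping")
    return(invisible(NULL))
  }

  # ... computationally heavy calculations and file writing here ...
}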

reformat format function to use site data

Currently WATCH-WFDEI data is used (as it can be scaled globally to other sites and locations). This is not really required here, so fall back on site-specific data, i.e., create a new format function.

Close to release

@stineb

  • MODIS data at data/modis_data.rds
  • current site list (223 sites) at data/p_model_drivers/site_based_drivers.rds
  • visuals of coverage at new article

Cross tabulate data coverage

This table should be rendered upon the completion of the final data product.

The current meta-data file lists sites which potentially don't have enough data to process cleanly.

fdk_convert_lsm() fails with latest R version

Probably because of changes in as.POSIXct(), the fdk_convert_lsm() function fails on R version 4.3.1.

The problem is in this line: dplyr::left_join(df[[1]], df[[2]]). It returns a very large data frame with this warning:

Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 1 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship = "many-to-many"`
  to silence this warning.

Possible fix:

Change lines 68 and 69 as follows:

line 68:    time_date <- as.POSIXct(time, origin = time_units, tz="GMT")
line 69:    # time_date <- as.POSIXct(time_date, tz = "GMT")

Wrap scripts in workflow function

Create a new function which generates a full new release, mostly automated. It needs only an input and an output path and creates all required directories according to a fixed structure. This should simplify the processing workflow to the absolute minimum.

fdk_release(
  in_path,
  out_path,
  overwrite
)
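A skeleton of what this wrapper could look like; the directory names and step functions below are placeholders, not existing FluxDataKit functions.

fdk_release <- function(in_path, out_path, overwrite = FALSE) {
  # create the fixed output directory structure
  dirs <- file.path(out_path, c("fluxnet", "modis", "drivers"))
  lapply(dirs, dir.create, recursive = TRUE, showWarnings = FALSE)

  # run the processing steps in a fixed order
  # (placeholders standing in for the current numbered scripts):
  # fdk_process_fluxnet(in_path, dirs[1], overwrite = overwrite)
  # fdk_process_modis(dirs[1], dirs[2], overwrite = overwrite)
  # fdk_build_drivers(dirs[2], dirs[3], overwrite = overwrite)

  invisible(out_path)
}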

MODIS data smoothing mess

Ok, I seriously doubt that the MODIS smoothing and gap-filling routine makes any sense; in my opinion it is badly broken. The current routine introduces spurious results throughout, in particular at year-end transitions, with sharp peaks in LAI/FPAR for anything that isn't Northern-Hemisphere seasonal, as far as I can tell.

Citation info

For publication, we should always include a table listing each eddy-covariance site from which we used data and cite the appropriate paper.

I once started writing code for this in the repo stineb/fluxnet2015_citations. It contains code to create a LaTeX table with a column of entries like \cite{CH-Lae}. The accompanying .bib file contains the BibTeX entries for the relevant papers, with citation keys corresponding to the site names (here CH-Lae).
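For illustration, a minimal sketch of how such a table could be generated from a site list; the site vector is an example, and it assumes a .bib file keyed by site name as in that repo.

# build one LaTeX table row per site, citing by the site-name bib key
sites <- c("CH-Dav", "CH-Lae")
rows  <- sprintf("%s & \\cite{%s} \\\\", sites, sites)

writeLines(
  c("\\begin{tabular}{ll}",
    "Site & Reference \\\\ \\hline",
    rows,
    "\\end{tabular}"),
  "site_citations.tex"
)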

Should we provide this functionality also in FluxDataKit? If so, please adopt the code and files and note this open issue.

Documentation and dissemination

Documentation

  • half-hourly, not complemented with MODIS
  • daily: document how aggregated, complement daily interpolated with MODIS
  • two versions:
    • 1st: what we have from PLUMBER, ICOS, OneFlux beta (to be done now)
    • 2nd (once provided by Ameriflux): extended site list update
  • Document how much data is added per site, and which sites are additional, compared to FLUXNET2015
  • Show which additional MODIS variables are added
  • Explain meta info

Dissemination

  • Strategy? DOI? Data paper?
  • Share with LEMONTREE

Amending MODIS file names

ERA corrections also alter the field names of the MODIS product. This is illogical and breaks the split processing between FLUXNET and PLUMBER data. Move this section to the smoothing routine or a separate function.

Ameriflux BASE compatibility

On advice from Dario, drop the OneFlux-processed data in favour of the Ameriflux OneFlux (most recent) and/or BASE products (as successors to FLUXNET2015).

Propagate QC flags to DD

For the DD data, specify QC information as the fraction (0-1) of good (not gap-filled) HH data that went into the respective aggregated DD value.

Keep all HH data (do not apply any filtering) and provide HH data quality in the respective QC variable. At HH, the QC variable is a code (integer, 0 = good, 1 = ...).

This is to avoid DD data being aggregated from HH data that has gaps. Gaps are non-random (they tend to be more frequent at night), so daily means calculated from cleaned, gappy HH data will be biased.
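A sketch of the DD aggregation with the proposed QC fraction; the column names are assumptions, and it only relies on the HH QC convention stated above (integer code, 0 = good).

library(dplyr)

daily <- hh %>%
  group_by(sitename, date) %>%
  summarize(
    gpp    = mean(gpp, na.rm = TRUE),
    # fraction of good (not gap-filled) half-hours behind this daily value
    gpp_qc = mean(gpp_qc == 0, na.rm = TRUE),
    .groups = "drop"
  )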

Downsampling (DD)

We will need data aggregated to daily resolution for a lot of applications (some also use monthly or annual).

  • Is the aggregation implemented as part of the FluxDataKit code, based on the HH data?
  • We should provide outputs also for the daily aggregated data.

Sideloading data on GECO systems

Provision to side-load data on GECO systems.

Cloud cover requires CRU data, which is too much to ask of most people (in terms of acquiring the data, etc.). This option will be dynamic and system-name dependent (not 100% foolproof, but close enough).

On GECO workstations (Balder / Dash) the code will sideload CRU cloud-cover data; on all other systems it will be set to 0, i.e., clear-sky conditions.
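A minimal sketch of the host-name check; the hostnames and the CRU file path are assumptions.

sideload_cru_ccov <- function(cru_file = "/data/archive/cru/cloud_cover.rds") {
  host <- Sys.info()[["nodename"]]
  if (grepl("balder|dash", host, ignore.case = TRUE)) {
    # on GECO workstations, read cloud cover from the shared CRU archive
    readRDS(cru_file)
  } else {
    # elsewhere, fall back to clear-sky conditions
    0
  }
}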
