traitecoevo / austraits.build
Source for AusTraits
License: Other
These go into the file configPlantCharacters.csv
for each dataset, e.g. data/dataset_008/configPlantCharacters.csv
What is currently in the character column needs to go in 'variable_in', and the character column needs to be populated with standardised trait names.
Stuart, could you review the README.md and CONTRIBUTING.md pages, to see if the instructions generally make sense?
For example dataset18, which has data for a few traits which I haven't previously seen
In config/variableConversions.csv
We need a table of desired variable names and units for site variables, like this one for plant traits: config/variableDefinitions.csv
Minor issue, some datasets contain a 'primary_source' column, others have 'primary_source_id'
At different times we're using the terms character and trait to mean the same thing. I suggest we standardise. Any objections? I prefer trait, but @rachaelgallagher it's your call.
This is mostly done, just some eyeballing of data required.
Should be able to more or less use existing function (from BAAD), combined with James' updated table of conversions
Calls to regexpr in the processData function give the warning:
Error in regexpr(regex, x) :
(converted from warning) argument 'pattern' has length > 1 and only the first element will be used
A note in the file by @snubian says: " # NOTE: this throws a warning for some reason, saying the pattern has length > 1. Not sure why, doesn't seem to matter."
For now I have silenced warnings with suppressWarnings(...), but it would be good to understand why this warning appears.
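For context, base R's regexpr() accepts a single pattern, so passing a vector of patterns triggers exactly this warning and only the first pattern is used. With options(warn = 2) the warning is promoted to an error, which matches the "(converted from warning)" text. Whether this is what happens inside processData is an assumption, since the call site is not shown here. A minimal sketch with hypothetical patterns and strings, pairing each pattern with its own string instead:

```r
# Hypothetical inputs: one pattern per string (not taken from processData).
patterns <- c("^leaf", "mass$")
x <- c("leaf_area", "seed_mass")

# regexpr(patterns, x) would warn and silently use only patterns[1].
# mapply() applies each pattern to its matching string, one pair at a time.
hits <- mapply(
  function(p, s) regexpr(p, s)[1],  # [1] keeps the match position, drops attributes
  patterns, x,
  USE.NAMES = FALSE
)
```

If only one pattern is ever intended, checking length(regex) == 1 before the call would surface the problem earlier than suppressWarnings(...) does.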
A number of different formats for flowering and fruiting time data are present (see below). austraits v1 had code to standardise these data, which could be reused to similar effect in austraits v2.
e.g.
a_ <- a[a$trait_name == "flowering_time",]
unique(a_$value)
[1] "Sept - Oct" "All year" "Nov - Dec" NA "Dec - Feb" "Aug - Nov" "Aug. - Oct" "Oct - Nov"
[9] "Aug - Dec" "Dec - Jan" "Jan - Feb" "June - Aug" "Oct - Dec" "Oct -Dec" "Sept - Dec" "July - Nov"
[17] "July - OCt" "Aug - Sept" "Mar - May" "Oct - Feb" "Sept - Nov" "Feb - Apr" "July - Oct" "Aug - Oct"
[25] "April - June" "Nov - Feb" "Nov - Jan" "Nov - March" "Sept - Jan" "Summer" "Oct - Jan" "July - Dec"
[33] "Oct - Mar" "Spring - Summer" "Summer - Autumn" "Dec - Mar" "Jan - Apr" "Sept - Mar" "Apr - May" "April - July"
[41] "April - Oct" "Feb - June" "June - Sept" "May - July" "Spring" "Aug - Mar" "Dec - May" "Feb - Mar"
[49] "July - Feb" "June - July" "Nov - MAr" "May - Oct" "Nov - April" "Aug - Feb" "Sept - Feb" "Aug - Dec ?"
[57] "Mar - Nov" "Dec - Apr" "Dec. - Feb" "July - Sept" "June - Nov" "Nov - Mar" "Dec - Feb, fire" "May - June"
[65] "irregular" "Nov - Apr" "April - May" "July - Aug" "Mar - Sept" "Sept - May" "Apr - Nov" "Apr - Sept"
[73] "Feb - Apr;July - Aug" "Feb - May" "Mar - June" "Mar - May; Oct - Dec" "May - Sept" "Sept -Nov" "Apr - Aug" "Feb - July"
[81] "Jan - July" "June - Oct" "Dec - MAr" "Spring- summer" "May - Aug" "Dec - July" "Mar - MAy" "Nov. - Jan"
[89] "April - Sept" "Mar - April" "Mar - Aug" "Mar - July" "June - Jan" "May - Dec" "May - Nov" "Apr - Oct"
[97] "Aug - Oct" "Dec - June" "June - Dec" "Nov" "Nov -Dec" "Oct" "Apr - July" "Aug - Jan"
[105] "Aug - NOv" "Jan" "Jan - Apr" "Jan - June" "Jan - May" "Aug -Nov" "Nov - Dec" "Oct. - Nov"
[113] "Jan - Mar" "July - Oct" "Sept" "Oct - April" "Sept- Dec" "all year" "allyear" "apr-aug"
[121] "jan-sep" "jan-mar" "feb-may" "feb-sep" "feb-aug" "mar-jul" "mar-may" "feb-jun"
[129] "jan-jun" "jun-sep" "jan-jul" "jan-apr" "sep-nov" "sep-dec" "dec" "nov-may"
[137] "dec-jul" "oct-jan" "nov-apr" "jan-aug" "apr-dec" "dec-may" "jun-oct" "apr-oct"
[145] "jul-sep" "jul-aug" "aug-nov" "dec-jun" "mar-nov" "jul-nov" "sep-feb" "dec-jan"
[153] "mar-apr" "jan-feb" "dec-mar" "mar-sept" "feb-jul" "apr-jul" "mar-jun" "may-nov"
[161] "apr-nov" "jul-oct" "jan-may" "june" "oct-mar" "jun-feb" "nov-jan" "nov-jun"
[169] "aug-mar" "nov-aug" "dec-aug" "nov-mar" "jun-dec" "nov-feb" "oct-dec" "nov-dec"
[177] "oct-feb" "aug-jan" "apr-jun" "may-oct" "sep-mar" "mar" "jan" "jun-aug"
[185] "apr-sep" "jan-oct" "jul-mar" "jun-may" "sep-may" "ay" "oct-may" "may-aug"
[193] "apr-may" "dec-feb" "feb" "nov-jul" "feb-mar" "feb-apr" "apr" "mar-dec"
[201] "may-sep" "jul-jan" "feb-nov" "feb-oct" "mar-aug" "feb-dec" "sep-oct" "may-jul"
[209] "oct" "sep" "aug-dec" "dec-apr" "AY" "mar-sep" "dec-sep" "jun-nov"
[217] "sep-jun" "jul" "mar-oct" "jun" "jun-mar" "oct-jul" "aug-apr" "oct-jun"
[225] "nov" "jan-nov" "sep-apr" "oct-nov" "aug-feb" "aug-may" "aug-oct" "may-jun"
[233] "jun-jul" "aug" "nov-oct" "jul-dec" "apr-feb" "apr-jan" "aug-sep" "dec-oct"
[241] "ephemeral" "Feb" "feb-april" "jan/june" "jan-feb/jun-jul" "jul-apr" "jul-may" "mar,sep"
[249] "mar-aug/ephemeral" "mar-aug/nov-jan" "mar-may/aug-oct" "may" "may-dec" "may-july" "nov-sep" "oct-apr"
[257] "sep-jan" "sep-jen" "sep-jul" "spring" "spring/autumn" "spring-summer" "summer" "aug-jul"
[265] "autumn" "jan-july" "july" "summer-autumn" "periodic" "may-feb" "jun-jan" "aug-jun"
[273] "jul-feb" "may-jan" "4-10" "1-12" "8-1" "7-10" "5-9" "5-11"
[281] "8-10" "7-9" "2-7" "10-1" "4-5" "7-8" "9-10" "5-10"
[289] "12-8" "5-8" "2-8" "6-9" "11-2" "11-12" "6-7" "7-11"
[297] "9-3" "1-3" "5" "8-11" "12-7" "5-6" "4-6" "4-8"
[305] "6-8" "11-1" "8-9" "5-7" "3-5" "9-12" "12-1" "1"
[313] "12-3" "11-6" "6" "10-2" "9" "3-10" "3-8" "2-9"
[321] "1-8" "2-11" "12-4" "4-9" "3-7" "1-7" "8-12" "1-5"
[329] "1-6" "10-12" "3-6" "11-3" "1-10" "6-11" "11-4" "6-12"
[337] "4-7" "1-4" "9-11" "3" "7-1" "8" "3-9" "2-10"
[345] "1-2" "7-12" "3-12" "6-1" "6-10" "10" "10-11" "12-2"
[353] "2-4" "9-2" "2-5" "2-3" "12-5" "9-1" "8-2" "10-6"
[361] "2" "9-5" "4-12" "9-4" "2-6" "10-3" "4-2" "12"
[369] "11" "11-5" "10-4" "1-9" "4" "4-11" "7" "10-7"
[377] "5-4" "6-2" "6-3" "7-2" "5-12" "10-5" "3-11" "12-6"
[385] "7-3" "3-4" "8-5" "8-3" "5-2" "5-1" "8-4" "1-11"
a_ <- a[a$trait_name == "flowering_month_start",]
unique(a_$value)
[1] "10" "12" NA "11" "9" "3" "8" "1" "6" "2" "4" "7" "all year" "5"
[15] "May" "July" "March" "All year" "September" "November" "December" "June" "August" "April" "January" "February" "October" "NULL"
[29] "MAR" "MAY" "SEP" "FEB" "JAN" "AUG" "DEC" "APR" "JUN" "JUL" "OCT, JAN" "OCT" "NOV" "SEPT"
[43] "NOV, JAN" "ephemeral" "JULY" "JUNE" "APRIL" "JAN, AUG" "APRIL, OCT" "MAR, JULY" "APRI" "FEB,AUG" "OCT,JAN" "MAY, AUG" "FEB, MAY"
a_ <- a[a$trait_name == "flowering_month",]
unique(a_$value)
[1] "Sep" "Oct" "Nov" "Dec" "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul"
[12] "Aug" "all year" "after rain" "after flooding" "after fire"
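One way the formats listed above could be brought to a common form is to parse each value into a start and end month number. The sketch below is not the austraits v1 code. The helpers month_to_number() and parse_flowering_range() are hypothetical names, and the choice to return NA for seasons, multiple ranges and free-text values (so they can be reviewed by hand) is an assumption.

```r
# Hypothetical helper: convert one month token (name, abbreviation, or number)
# to an integer 1-12, or NA if it is not recognised.
month_to_number <- function(token) {
  token <- tolower(trimws(gsub(".", "", token, fixed = TRUE)))
  if (grepl("^[0-9]{1,2}$", token)) {
    n <- as.integer(token)
    return(if (n >= 1L && n <= 12L) n else NA_integer_)
  }
  if (nchar(token) < 3) return(NA_integer_)
  # Accept any 3+ letter prefix of a full month name ("sep", "sept", "JULY").
  idx <- which(startsWith(tolower(month.name), token))
  if (length(idx) == 1L) idx else NA_integer_
}

# Hypothetical helper: parse a single "start - end" value into c(start, end).
# Anything not recognised (seasons, "irregular", several ranges, trailing
# notes) comes back as c(NA, NA).
parse_flowering_range <- function(value) {
  unknown <- c(NA_integer_, NA_integer_)
  if (is.na(value)) return(unknown)
  value <- tolower(trimws(value))
  if (value %in% c("all year", "allyear", "ay")) return(c(1L, 12L))
  parts <- strsplit(value, "-", fixed = TRUE)[[1]]
  if (length(parts) == 1L) parts <- c(parts, parts)  # single month, e.g. "Nov"
  if (length(parts) != 2L) return(unknown)
  months <- vapply(parts, month_to_number, integer(1), USE.NAMES = FALSE)
  if (anyNA(months)) unknown else months
}

examples <- c("Sept - Oct", "Aug. - Oct", "jan-mar", "4-10", "All year", "Summer")
parsed <- t(vapply(examples, parse_flowering_range, integer(2)))
```

Ranges that wrap the end of the year, such as "8-1", are returned as given (start 8, end 1), so downstream code would need to treat end < start as spanning December to January.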
Things to check
For methods:
Also general guidelines for contributing
Create separate tables for unit renaming and unit conversion
I.e. files which the script bungled
@snubian, I noticed you only added some of the data. Is there a reason we can't add it all now? I'm doing some operations like adding a metadata file to each folder, so I'll need to redo these each time a new cache of data appears.
Also, I removed the config subfolder within the folder for each dataset. I think it is easier to have just a single folder for each dataset. I wonder if you might have recently moved the config files into the config folder, so apologies if this cost you time.
Some datasets (e.g. 29) have data which has been converted into date format (e.g. leaf length, seed length)
Rachael, can you read over the new readme page and check you are happy with the content? It will get expanded in future.
Add code in process_datasets.R
Datasets 92 and 93 have rows where neither a value nor a unit is recorded. The unit conversion code appears to be throwing errors on these rows.
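One possible guard, sketched under assumptions: the column names value and unit and the example rows are hypothetical, and dropping the rows (rather than flagging them) is only one option. Rows with both fields blank are set aside before any conversion runs.

```r
# Hypothetical rows standing in for a dataset with empty value/unit pairs.
data <- data.frame(
  trait_name = c("leaf_length", "leaf_length", "seed_length"),
  value      = c("1.2", NA, ""),
  unit       = c("mm", NA, ""),
  stringsAsFactors = FALSE
)

# Treat NA and whitespace-only strings as blank.
is_blank <- function(x) is.na(x) | trimws(x) == ""

empty_rows <- is_blank(data$value) & is_blank(data$unit)
to_convert <- data[!empty_rows, ]  # only these rows go on to unit conversion
skipped    <- data[empty_rows, ]   # kept so the gaps can be reported
```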
The testthat package has recently been updated and the tests I've written may not work with the new version. I'm running testthat_0.11.0 whereas the latest is testthat_1.0.2
I.e. lists all columns/variables
Ones we don't use can be marked as NA
currently says CC0 but this is as yet undetermined. May need to change through history.
test integrity of data directories
E.g. dataset 55 has 'lobed' under the trait leaf_compoundness. This data should come under 'lobed'. The whole trait can not be renamed as most of the data is appropriate.
Noticed in one dataset that some trait name data has trailing whitespace, will ensure these are trimmed.
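A minimal sketch of the trimming step using base R's trimws(). The trait names below are made up for illustration.

```r
# Hypothetical trait names with stray leading/trailing whitespace.
trait_names <- c("leaf_area ", " seed_mass", "wood_density")

# trimws() strips whitespace from both ends by default.
trimmed <- trimws(trait_names)
```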
Some datasets, especially those which are derived from parsed text, have duplicated rows for traits with different units, e.g. datasets 92 and 93.
Rachael and I decided to split these into separate datasets, that will remove the need to handle multiple IDs per folder
Go through and allocate realistic min_value and max_value.
Just noticed this, for example in dataset_066, that a single character "T" is converted to TRUE by read.csv. Here is a line of data from the csv:
Berrimah_66,Eucalyptus tetrodonta,T,E,S,C3,No,,63.5,,0.68,1.09,105,16.4, 7.0 ,1.1
In R this becomes:
2 Berrimah_66 Eucalyptus miniata TRUE E S C3 No NA 74.0 etc
Not only is "TRUE" written to the output file, but the lookup value is not found - in this case we want T to give the lookup value of "tree" for trait growth form.
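This behaviour comes from read.csv's automatic type conversion, which treats a column containing only "T"/"F" style values as logical. A small sketch of one way to avoid it is to read every column as character via colClasses. The three-column CSV text and the lookup vector here are hypothetical stand-ins for the real file and configLookups.csv.

```r
# Hypothetical cut-down CSV with a growth form code of "T".
csv_text <- "site,species,growth_form\nBerrimah_66,Eucalyptus tetrodonta,T\n"

# Default behaviour: the lone "T" is converted to logical TRUE.
default <- read.csv(text = csv_text)

# Reading all columns as character keeps "T" intact.
as_text <- read.csv(text = csv_text, colClasses = "character")

# Hypothetical lookup: with the raw code preserved, the match succeeds.
lookup <- c(T = "tree")
growth_form <- lookup[[as_text$growth_form]]
```

Numeric columns would then need an explicit as.numeric() later in the workflow, which is the trade-off of this approach.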
Includes short and long names, desired units, type. Like https://github.com/dfalster/baad/blob/master/config/variableDefinitions.csv
This may need to be done manually
Good to commit a copy of the existing table into code, then we'll break it up by study
Currently ignoring these until we decide how they will enter the workflow
I think Stuart's existing code has this functionality, via the file configLookups.csv. Elsewhere James has been documenting changes to be made, so the task is to import James's changes into the existing workflow.