castelao / CoTeDe
Quality Control of Oceanographic Data
Home Page: https://cotede.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
At some point I changed '==' to 'is' by mistake.
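The difference matters because 'is' tests object identity while '==' tests value equality. A minimal illustration (not CoTeDe code):

```python
# 'is' checks identity, '==' checks equality: two equal lists
# are still two distinct objects.
x = [1, 2]
y = [1, 2]
print(x == y)   # True: equal values
print(x is y)   # False: distinct objects
```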
Wanted to try out your algorithms and see if they can be applied to our data. I installed the package per the instructions and got the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-43-02ba7dda9ecb> in <module>
----> 1 pqc = fProfileQC('dPIRX003.cnv')
~/anaconda2/envs/cotede/lib/python3.7/site-packages/cotede/qc.py in __init__(self, inputfile, cfg, saveauxiliary, verbose, logger)
467 # Not the best way, but will work for now. I should pass
468 # the reference for the logger being used.
--> 469 input = cnv.fCNV(inputfile, logger=None)
470 except CNVError as e:
471 #self.attributes['filename'] = basename(inputfile)
TypeError: __init__() got an unexpected keyword argument 'logger'
Any guidance? I've tried it on Python 3.7 and 2.7 with the same error.
Thanks
Returning flag 4 where it should be 9.
My packages (e.g. https://github.com/evanleeturner/sonde3) use the PyPI seawater package to calculate salinity: https://pypi.org/project/seawater/
Since salinity calculations may differ among groups, and this also affects your written QAPP/QAQC, the ability to apply alternative methods for seawater conversion, instead of only the GSW package's Thermodynamic Equation of Seawater 2010 (TEOS-10) method or having to create specific forks of the CoTeDe package, would be an excellent feature of this codebase!
If latitude or longitude are not available in .attributes, it fails safely in an except block but does not define any flag for that. It would probably be better to return a failing flag.
The density_invertion test uses gsw, which returns NaN for invalid inputs. When comparing with the threshold, it produces an annoying message:
RuntimeWarning: invalid value encountered in greater_equal
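One way to silence that warning locally is np.errstate. This is a sketch, not CoTeDe's actual code; the densities and the threshold value are made up for illustration:

```python
import numpy as np

rho = np.array([1025.0, 1025.5, np.nan])  # hypothetical densities, NaN from gsw
threshold = 0.03  # assumed threshold, not CoTeDe's actual value

# Comparisons involving NaN can emit "RuntimeWarning: invalid value
# encountered in greater_equal"; np.errstate suppresses it locally.
with np.errstate(invalid="ignore"):
    inversion = (rho[:-1] - rho[1:]) >= threshold
print(inversion)  # comparisons against NaN evaluate to False
```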
import skfuzzy as fuzz
ModuleNotFoundError: No module named 'skfuzzy'
descentPrate() always fails.
It was only partially updated to the new pattern of receiving the whole data object, instead of the old pattern of receiving only the required variables directly.
@BillMills identified an inconsistency in the spike feature. It is the result of creating empty masked arrays and later recovering only the data.
For now, the (dirty & fast) solution is to force NaN on the first and last data points of the features that require neighbors.
There was a typo: using & instead of | in "x < min | x > max".
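For a range check the two operators behave very differently. A minimal sketch with made-up bounds:

```python
import numpy as np

x = np.array([0.5, 5.0, 12.0])
vmin, vmax = 1.0, 10.0  # hypothetical valid range

bad = (x < vmin) | (x > vmax)    # out of range: flags 0.5 and 12.0
never = (x < vmin) & (x > vmax)  # contradiction: always all False
print(bad, never)
```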
I might be missing something, but I don't see docs for the API at https://cotede.readthedocs.io/en/latest/readme.html
It looks like the code has docstrings for most classes and test methods, so this is probably just a matter of configuring Sphinx to produce the docs.
CoTeDe currently operates with temperature and salinity. What is needed to start evaluating chlorophyll fluorescence (fchl)?
I know of two good references for QC procedures. Do you know more?
Some tests like valid location already exist in CoTeDe.
Fuzzy logic fails if any input data is masked when aggregating the outcome of the rules.
It would be nice not to depend on PyDAP by default, but to let the user know what can't be done without it.
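A common pattern for optional dependencies, sketched here with hypothetical names (the guard function and its message are not part of CoTeDe):

```python
# Sketch of an optional-dependency guard; names are hypothetical.
try:
    import pydap  # noqa: F401
    HAS_PYDAP = True
except ImportError:
    HAS_PYDAP = False

def require_pydap():
    """Raise a clear error when a PyDAP-only feature is requested."""
    if not HAS_PYDAP:
        raise ImportError(
            "This feature requires PyDAP. Install it with: pip install pydap")
```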
The depth conditional spike test is trying to use variable g, instead of s.
It's a minor bug. When defining the top n% of samples to fit exponweib.pdf(), it was considering a total N that included invalid data.
A sample can be valid but impossible to evaluate in a specific test, for example at a location whose climatology was built from only one historical sample. In that case I do not consider the climatology result, so a sample could be flagged valid by other tests but have no climatology test result.
Update the instructions related to climatology and bathymetry, which are now handled by another package, OceansDB.
Using valid or ~isfinite might be a more generic way to do this.
flag[ma.getmaskarray(data[v])] = 9
Flag 9 is used for not-available or NaN data.
It is not being set due to a bad use of ~np.isfinite() on a MaskedArray.
Hi @castelao. Thanks for making this software available! I've been working through all the tests you have implemented and I noticed that the WOA_normbias config in gtspp.json includes the variable 't_an', which I think should be 't_mn'. Also, I'm not sure about this, but I think that the 'at_sea' test in the same file (and some of the other config files) may not work unless it is called 'location_at_sea'?
Our team uses conda and conda-forge to manage dependencies for our Python projects. It looks like neither CoTeDe nor oceansdb is in conda-forge, although GSW is.
Hi @castelao, I noticed you have two config files for fuzzy logic tests - fuzzylogic.json and morello2014.json. The settings look very similar in both. For AutoQC should I consider these as separate tests or would you say that they are too similar to bother?
print(numpy.fmin(numpy.zeros(1), numpy.ma.masked) is numpy.ma.masked)
evaluates to True in numpy 1.11 and 1.12, but False in 1.13. So what? This changes how some of CoTeDe's fuzzy logic gets evaluated at https://github.com/castelao/CoTeDe/blob/master/cotede/fuzzy/fuzzy_core.py#L139-L140 - numpy 1.13+ will fail to bail out on that if clause.
If you agree, you may consider pinning numpy>=1.11.1,<=1.12.1.
All preset QC configuration files were using 'valid_date' key while the QC engine was searching for 'valid_datetime'.
I'll use datetime instead of date to make it clear that time will be included, when available.
This is probably abandoned, but there is a mistake in fuzzy_logic.ipynb where the fuzzy functions are generated.
For each line where there is something like:
data['spike_lo'] = fuzz.zmf(data['x_spike'], cfg['fuzzylogic']['features']['spike']['low'])
it should be:
data['spike_lo'] = fuzz.zmf(data['x_spike'], cfg['fuzzylogic']['features']['spike']['low']['params'])
So the parameters of the z-membership function are loaded properly. The author probably modified load_cfg and forgot to update the notebook.
Hi again @castelao, I just wanted to check what threshold should be for the tukey53H_norm test in the cotede.json configuration? At the moment it looks like it is picking up the value of the threshold from the last test run (2). Is this the correct value to use?
The gradient test frequently returns the first element's flag as 9.
If there are no masked elements in the profile, .mask returns a single boolean False. The proper way to do this is with getmaskarray().
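The pitfall can be seen directly in a minimal illustration:

```python
import numpy.ma as ma

x = ma.masked_array([1.0, 2.0, 3.0])  # profile with nothing masked
print(x.mask)              # ma.nomask, a scalar False, not an array
print(ma.getmaskarray(x))  # [False False False], always a full boolean array
```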
Hi @castelao, sorry to raise another issue but I found that my code was failing if a profile location was on land. I think it might be because 'flag' is not set before it is used in location_at_sea.py. If I set flag = 3 it runs through as expected.
It would still require Pandas for some functionalities, but it should be possible to install and use the core applications without Pandas.
It's only used for some interpolation, and it's easy to avoid. Although SciPy is quite a nice package, there is no sense in requiring it just for 2-D linear interpolation.
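For reference, bilinear interpolation on a regular grid takes only a few lines of plain NumPy. This is a sketch, not the code actually used in CoTeDe, and the function name is made up:

```python
import numpy as np

def bilinear(xg, yg, z, x, y):
    """Bilinear interpolation on a regular grid, NumPy only (sketch).

    xg, yg: 1-D ascending grid coordinates.
    z: 2-D values, shape (len(xg), len(yg)).
    """
    # Locate the grid cell containing (x, y), clamped to the grid edges.
    i = np.clip(np.searchsorted(xg, x) - 1, 0, len(xg) - 2)
    j = np.clip(np.searchsorted(yg, y) - 1, 0, len(yg) - 2)
    # Fractional position inside the cell.
    tx = (x - xg[i]) / (xg[i + 1] - xg[i])
    ty = (y - yg[j]) / (yg[j + 1] - yg[j])
    # Weighted average of the four corner values.
    return (z[i, j] * (1 - tx) * (1 - ty)
            + z[i + 1, j] * tx * (1 - ty)
            + z[i, j + 1] * (1 - tx) * ty
            + z[i + 1, j + 1] * tx * ty)
```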
Create a consistency check for depth conditional spike test.
Bug #24 wasn't detected before.
The first feature evaluated in Anomaly Detection was being considered twice.
Although it's a conceptual error, since I'm not using weights at this point it shouldn't compromise the classification results.
For TSG, each measurement has its own position, requiring multiple evaluations of the location_at_sea test.
Rethink the function itself, as well as the calls in common and evaluate.
ProfileQCCollection.flags is returning woa_comparison as a float64, when it was supposed to be an integer.
Improper use of indices on Masked Arrays to set the flags.
The CONTRIBUTING docs recommend running flake8. However when I tried this, there were a lot of flake8 failures. I recommend resolving those and/or adding a list of ignored tests to flake8-ignore. Otherwise, as a contributor I'd just ignore the output of these tests.
Double check the procedure, including what happens if lat/lon are not available.
Update the consistency tests accordingly.
i2b_flags() would fail if given a pandas.Series.
The solution should be generic and avoid requiring the pandas package, i.e. it should handle a pd.Series without explicitly checking whether it is a pd.Series.
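One generic approach is np.asanyarray, which accepts lists, ndarrays, and pandas Series alike. This is a sketch; the function name and the set of "good" flags are assumptions for illustration, not CoTeDe's actual API:

```python
import numpy as np

def int_flags_to_bool(flags, good_flags=(1, 2)):
    """Convert integer QC flags to booleans without depending on pandas.

    np.asanyarray handles lists, ndarrays, and pd.Series without any
    explicit type check. good_flags is an assumption for illustration.
    """
    flags = np.asanyarray(flags)
    return np.isin(flags, good_flags)
```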
Unavailable data, flagged 9 or masked, was being treated as bad data by split_data_groups(), whereas it should be ignored by the fitting procedure that defines the parameters for the Anomaly Detection.
split_data_groups() randomly sub-samples the data into fit, test, and err groups for the anomaly detection procedure.
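A minimal sketch of such a random split; the group fractions, function name, and seed are assumptions, not CoTeDe's actual defaults:

```python
import numpy as np

def split_groups(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly split n sample indices into fit/test/err groups (sketch)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)          # shuffle all indices once
    n_fit = int(fractions[0] * n)
    n_test = int(fractions[1] * n)
    # The remainder after fit and test becomes the err group.
    return idx[:n_fit], idx[n_fit:n_fit + n_test], idx[n_fit + n_test:]
```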
If OceansDB returns only masked values, woa_normbias fails to process them.
woa_normbias should return flag 0 for every point that doesn't have a climatology.
Maybe use data from https://api.tidesandcurrents.noaa.gov/api/prod/