castelao / CoTeDe
Quality Control of Oceanographic Data
Home Page: https://cotede.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
At some point I changed '==' to 'is' by mistake.
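The difference matters because 'is' tests object identity while '==' tests value equality. A minimal illustration (not CoTeDe code):

```python
# 'is' checks identity, '==' checks equality: two equal lists
# are still two distinct objects.
x = [1, 2]
y = [1, 2]
print(x == y)   # True: equal values
print(x is y)   # False: distinct objects
```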
Wanted to try out your algorithms and see if they can be applied to our data. I installed the package per the instructions and got the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-43-02ba7dda9ecb> in <module>
----> 1 pqc = fProfileQC('dPIRX003.cnv')
~/anaconda2/envs/cotede/lib/python3.7/site-packages/cotede/qc.py in __init__(self, inputfile, cfg, saveauxiliary, verbose, logger)
467 # Not the best way, but will work for now. I should pass
468 # the reference for the logger being used.
--> 469 input = cnv.fCNV(inputfile, logger=None)
470 except CNVError as e:
471 #self.attributes['filename'] = basename(inputfile)
TypeError: __init__() got an unexpected keyword argument 'logger'
Any guidance? I've tried it on Python 3.7 and 2.7 with the same error.
Thanks
Returning flag 4 where it should be 9.
My packages (e.g. https://github.com/evanleeturner/sonde3) use the PyPI seawater package to calculate salinity: https://pypi.org/project/seawater/
Since salinity calculations may differ among groups, and this also affects your written QAPP/QAQC, the ability to apply alternative methods for seawater conversion, instead of only the GSW package's Thermodynamic Equation of Seawater 2010 (TEOS-10) method or having to create specific forks of the CoTeDe package, would be an excellent feature of this codebase!
If latitude or longitude are not available in .attributes, it fails safely in an except block but does not define any flag for that. It would probably be better to return a failing flag.
The density_invertion test uses gsw, which returns NaN for invalid inputs. When comparing with the threshold, it produces an annoying message:
RuntimeWarning: invalid value encountered in greater_equal
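One way to silence that warning locally is np.errstate. This is a sketch, not CoTeDe's actual code; the densities and the threshold value are made up for illustration:

```python
import numpy as np

rho = np.array([1025.0, 1025.5, np.nan])  # hypothetical densities, NaN from gsw
threshold = 0.03  # assumed threshold, not CoTeDe's actual value

# Comparisons involving NaN can emit "RuntimeWarning: invalid value
# encountered in greater_equal"; np.errstate suppresses it locally.
with np.errstate(invalid="ignore"):
    inversion = (rho[:-1] - rho[1:]) >= threshold
print(inversion)  # comparisons against NaN evaluate to False
```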
import skfuzzy as fuzz
ModuleNotFoundError: No module named 'skfuzzy'
descentPrate() always fails.
It was only partially updated to the new pattern of receiving the whole data object, instead of the old pattern of receiving only the required variables directly.
@BillMills identified an inconsistency in the spike feature. It is the result of creating empty masked arrays and later recovering only the data.
For now, the (dirty & fast) solution is to force NaN on the first and last data points of the features that require neighbors.
There was a typo: using & instead of | in "x < min | x > max".
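For a range check the two operators behave very differently. A minimal sketch with made-up bounds:

```python
import numpy as np

x = np.array([0.5, 5.0, 12.0])
vmin, vmax = 1.0, 10.0  # hypothetical valid range

bad = (x < vmin) | (x > vmax)    # out of range: flags 0.5 and 12.0
never = (x < vmin) & (x > vmax)  # contradiction: always all False
print(bad, never)
```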
I might be missing something, but I don't see docs for the API at https://cotede.readthedocs.io/en/latest/readme.html
It looks like the code has docstrings for most classes and test methods, so this is probably just a matter of configuring Sphinx to produce the docs.
CoTeDe currently operates with temperature and salinity. What is needed to start evaluating chlorophyll fluorescence (fchl)?
I know of two good references for QC procedures. Do you know more?
Some tests like valid location already exist in CoTeDe.
Fuzzy logic fails if any input data is masked when aggregating the outcome of the rules.
It would be nice not to depend on PyDAP by default, but to let the user know what can't be done without it.
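A common pattern for optional dependencies, sketched here with hypothetical names (the guard function and its message are not part of CoTeDe):

```python
# Sketch of an optional-dependency guard; names are hypothetical.
try:
    import pydap  # noqa: F401
    HAS_PYDAP = True
except ImportError:
    HAS_PYDAP = False

def require_pydap():
    """Raise a clear error when a PyDAP-only feature is requested."""
    if not HAS_PYDAP:
        raise ImportError(
            "This feature requires PyDAP. Install it with: pip install pydap")
```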
The depth conditional spike test is trying to use variable g, instead of s.
It's a minor bug. When defining the top n% of samples to fit exponweib.pdf(), it was considering a total N that included invalid data.
A sample can be valid but impossible to evaluate in a specific test, for example at a location whose climatology was built from only one historical sample. In that case I do not consider the climatology result, so a sample could be flagged valid by other tests but have no climatology test result.
Update the instructions related to climatology and bathymetry, which are now handled by another package, OceansDB.
Using valid or ~isfinite might be a more generic way to do this.
flag[ma.getmaskarray(data[v])] = 9
Flag 9 is used for not-available or NaN data.
It is not being set due to a bad use of ~np.isfinite() on a MaskedArray.
Hi @castelao. Thanks for making this software available! I've been working through all the tests you have implemented and I noticed that the WOA_normbias config in gtspp.json includes the variable 't_an', which I think should be 't_mn'. Also, I'm not sure about this, but I think that the 'at_sea' test in the same file (and some of the other config files) may not work unless it is called 'location_at_sea'?
Our team uses conda and conda-forge to manage dependencies for our Python projects. It looks like neither CoTeDe nor oceansdb is in conda-forge, although GSW is.
Hi @castelao, I noticed you have two config files for fuzzy logic tests - fuzzylogic.json and morello2014.json. The settings look very similar in both. For AutoQC should I consider these as separate tests or would you say that they are too similar to bother?
print(numpy.fmin(numpy.zeros(1), numpy.ma.masked) is numpy.ma.masked)
evaluates to True in numpy 1.11 and 1.12, but False in 1.13. So what? This changes how some of CoTeDe's fuzzy logic gets evaluated at https://github.com/castelao/CoTeDe/blob/master/cotede/fuzzy/fuzzy_core.py#L139-L140 - numpy 1.13+ will fail to bail out on that if clause.
If you agree, you may consider pinning numpy>=1.11.1,<=1.12.1.
All preset QC configuration files were using 'valid_date' key while the QC engine was searching for 'valid_datetime'.
I'll use datetime instead of date to make it clear that time will be included, when available.
This is probably abandoned, but there is a mistake in fuzzy_logic.ipynb where the fuzzy functions are generated.
For each line where there is something like:
data['spike_lo'] = fuzz.zmf(data['x_spike'], cfg['fuzzylogic']['features']['spike']['low'])
it should be:
data['spike_lo'] = fuzz.zmf(data['x_spike'], cfg['fuzzylogic']['features']['spike']['low']['params'])
So the parameters of the z-membership function are loaded properly. The author probably modified load_cfg and forgot to update the notebook.
Hi again @castelao, I just wanted to check what threshold should be for the tukey53H_norm test in the cotede.json configuration? At the moment it looks like it is picking up the value of the threshold from the last test run (2). Is this the correct value to use?
The gradient test frequently returns the first element's flag as 9.
If there are no masked elements in the profile, .mask returns a single boolean False. The proper way to do this is with getmaskarray().
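The pitfall can be seen directly in a minimal illustration:

```python
import numpy.ma as ma

x = ma.masked_array([1.0, 2.0, 3.0])  # profile with nothing masked
print(x.mask)              # ma.nomask, a scalar False, not an array
print(ma.getmaskarray(x))  # [False False False], always a full boolean array
```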
Hi @castelao, sorry to raise another issue but I found that my code was failing if a profile location was on land. I think it might be because 'flag' is not set before it is used in location_at_sea.py. If I set flag = 3 it runs through as expected.
It would still require Pandas for some functionalities, but it should be possible to install and use the core applications without Pandas.
It's only used for some interpolation, and it's easy to avoid. Although SciPy is quite a nice package, there is no sense in requiring it just for 2-D linear interpolation.
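For reference, bilinear interpolation on a regular grid takes only a few lines of plain NumPy. This is a sketch, not the code actually used in CoTeDe, and the function name is made up:

```python
import numpy as np

def bilinear(xg, yg, z, x, y):
    """Bilinear interpolation on a regular grid, NumPy only (sketch).

    xg, yg: 1-D ascending grid coordinates.
    z: 2-D values, shape (len(xg), len(yg)).
    """
    # Locate the grid cell containing (x, y), clamped to the grid edges.
    i = np.clip(np.searchsorted(xg, x) - 1, 0, len(xg) - 2)
    j = np.clip(np.searchsorted(yg, y) - 1, 0, len(yg) - 2)
    # Fractional position inside the cell.
    tx = (x - xg[i]) / (xg[i + 1] - xg[i])
    ty = (y - yg[j]) / (yg[j + 1] - yg[j])
    # Weighted average of the four corner values.
    return (z[i, j] * (1 - tx) * (1 - ty)
            + z[i + 1, j] * tx * (1 - ty)
            + z[i, j + 1] * (1 - tx) * ty
            + z[i + 1, j + 1] * tx * ty)
```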
Create a consistency check for depth conditional spike test.
Bug #24 wasn't detected before.
The first feature evaluated in Anomaly Detection was being considered twice.
Although it's a conceptual error, since I'm not using weights at this point it shouldn't compromise the classification results.
For TSG, each measurement has its own position, requiring multiple evaluations of the location_at_sea test.
Rethink the function itself, as well as the calls in common and evaluate.
ProfileQCCollection.flags is returning woa_comparison as a float64, when it was supposed to be an integer.
Improper use of indices on Masked Arrays to set the flags.
The CONTRIBUTING docs recommend running flake8. However when I tried this, there were a lot of flake8 failures. I recommend resolving those and/or adding a list of ignored tests to flake8-ignore. Otherwise, as a contributor I'd just ignore the output of these tests.
Double check the procedure, including what happens if lat/lon are not available.
Update the consistency tests accordingly.
i2b_flags() would fail if given a pandas.Series.
The solution should be generic and avoid requiring the pandas package, i.e. it should handle a pd.Series without explicitly checking whether it is a pd.Series.
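One generic approach is np.asanyarray, which accepts lists, ndarrays, and pandas Series alike. This is a sketch; the function name and the set of "good" flags are assumptions for illustration, not CoTeDe's actual API:

```python
import numpy as np

def int_flags_to_bool(flags, good_flags=(1, 2)):
    """Convert integer QC flags to booleans without depending on pandas.

    np.asanyarray handles lists, ndarrays, and pd.Series without any
    explicit type check. good_flags is an assumption for illustration.
    """
    flags = np.asanyarray(flags)
    return np.isin(flags, good_flags)
```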
Unavailable data, flagged 9 or masked, was being treated as bad data by split_data_groups(), whereas it should be ignored by the fitting procedure that defines the parameters for the Anomaly Detection.
split_data_groups() randomly sub-samples the data into fit, test, and err groups for the anomaly detection procedure.
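A minimal sketch of such a random split; the group fractions, function name, and seed are assumptions, not CoTeDe's actual defaults:

```python
import numpy as np

def split_groups(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly split n sample indices into fit/test/err groups (sketch)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)          # shuffle all indices once
    n_fit = int(fractions[0] * n)
    n_test = int(fractions[1] * n)
    # The remainder after fit and test becomes the err group.
    return idx[:n_fit], idx[n_fit:n_fit + n_test], idx[n_fit + n_test:]
```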
If OceansDB returns only masked values, woa_normbias fails to process them.
woa_normbias should return flag 0 for every point that doesn't have a climatology.
Maybe use data from https://api.tidesandcurrents.noaa.gov/api/prod/