cotede's People

Contributors

bkatiemills, castelao, kthyng, s-good

cotede's Issues

Problems following basic usage - logger issue

I wanted to try out your algorithms and see if they can be applied to our data. I installed the package per the instructions and got the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-02ba7dda9ecb> in <module>
----> 1 pqc = fProfileQC('dPIRX003.cnv')

~/anaconda2/envs/cotede/lib/python3.7/site-packages/cotede/qc.py in __init__(self, inputfile, cfg, saveauxiliary, verbose, logger)
    467             # Not the best way, but will work for now. I should pass
    468             #   the reference for the logger being used.
--> 469             input = cnv.fCNV(inputfile, logger=None)
    470         except CNVError as e:
    471             #self.attributes['filename'] = basename(inputfile)

TypeError: __init__() got an unexpected keyword argument 'logger'

Any guidance? I've tried it on Python 3.7 and 2.7, with the same error.
Thanks

add alternative methods for calculating salinity

My packages (e.g., https://github.com/evanleeturner/sonde3) use the PyPI seawater package to calculate salinity: https://pypi.org/project/seawater/

The way salinity is calculated may differ among groups, and it also affects a written QAPP/QAQC. Being able to apply alternative methods for seawater conversion, instead of only the GSW package with its Thermodynamic Equation of Seawater 2010 (TEOS-10) method, and without creating specific forks of the CoTeDe package, would be an excellent feature of this codebase!

No flag if lat or lon are not available

If latitude or longitude is not available in .attributes, the test fails safely inside an except block but does not define any flag for that case. It's probably better to return a failing flag.

nan in gsw output

The density_invertion test uses gsw, which returns nan for invalid inputs. When the result is compared with the threshold, it emits an annoying message:
RuntimeWarning: invalid value encountered in greater_equal
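
One way to avoid the warning is to compare only the finite values, so that nan never reaches the comparison. This is a minimal sketch, not CoTeDe's code: the function name `exceeds_threshold` and the sample values are hypothetical.

```python
import numpy as np


def exceeds_threshold(values, threshold):
    """Return True where values >= threshold; nan entries yield False."""
    values = np.asarray(values, dtype=float)
    result = np.zeros(values.shape, dtype=bool)
    valid = np.isfinite(values)
    # Only the finite entries take part in the comparison.
    result[valid] = values[valid] >= threshold
    return result


# Hypothetical values standing in for a gsw-derived quantity.
sample = np.array([0.01, np.nan, 0.05, -0.02])
above = exceeds_threshold(sample, 0.03)
```

The entries that could not be evaluated are still known through `np.isfinite(sample)`, so they can be flagged separately instead of being silently treated as passing.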

Failing on descentPrate()

descentPrate() always fails.

It was only partially updated for the new pattern, which receives the whole data object, instead of the old pattern, which received directly and only the required variables.

Create a plan to extend tests for Chlorophyll

CoTeDe currently operates with temperature and salinity. What is needed to start evaluating chlorophyll fluorescence (fchl)?

I know of two good references for QC procedures. Do you know more?

Some tests like valid location already exist in CoTeDe.

  • Create a config file (QC descriptor) for the tests already available;
  • List the desired tests to include;

Remove DAP dependency

It would be nice not to depend on PyDAP by default, but to let the user know what can't be done without it.

Incorrect top params fit

It's a minor bug. When defining the top n% of the samples to fit the exponweib.pdf(), the total N that was used included invalid data.

A sample can be valid but impossible to evaluate in a specific test, for example at a place where the climatology was built from only one historical sample. In that case, I do not consider the climatology result, so there is a sample that could be flagged valid by other tests but does not have a climatology test result.
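
The counting problem can be illustrated with a small sketch that takes the top fraction relative to the number of samples that actually have a result. It is not the CoTeDe implementation: the helper `top_fraction` is hypothetical, and the fit with exponweib is left out on purpose.

```python
import numpy as np
from numpy import ma


def top_fraction(values, fraction):
    """Return the largest `fraction` of the samples that have a result."""
    # Mask nan/inf, then keep only the samples that can be evaluated.
    valid = ma.masked_invalid(values).compressed()
    # N is the number of evaluated samples, not the length of the input.
    n_top = int(np.ceil(fraction * valid.size))
    return np.sort(valid)[valid.size - n_top:]


# Ten samples, two of them without a result for this test.
scores = [1.0, np.nan, 3.0, 2.0, 5.0, 4.0, np.nan, 6.0, 8.0, 7.0]
top = top_fraction(scores, 0.25)
```

With the two unavailable samples excluded, 25% of 8 gives two samples. Counting all ten entries would have selected three.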

Flag 9 is not being set

Flag 9 is used for non available or NaN.

It is not being set due to a bad use of ~np.isfinite() on a MaskedArray.
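
A more robust pattern is to combine the explicit mask with a finiteness check on the underlying data, so that both masked and NaN samples get flag 9. This is a sketch under assumptions: the helper `flag_unavailable` is hypothetical and is not how CoTeDe structures its flagging.

```python
import numpy as np
from numpy import ma


def flag_unavailable(values, flags, unavailable_flag=9):
    """Return a copy of flags with masked or non-finite samples set to 9."""
    # getmaskarray() always gives a full boolean array, even with no mask.
    mask = ma.getmaskarray(values)
    # Check finiteness on the plain data, outside the MaskedArray machinery.
    data = ma.getdata(values)
    unavailable = mask | ~np.isfinite(data)
    flags = np.array(flags, copy=True)
    flags[unavailable] = unavailable_flag
    return flags


temperature = ma.masked_array([1.0, 2.0, np.nan, 4.0],
                              mask=[False, True, False, False])
flags = flag_unavailable(temperature, np.ones(4, dtype="i1"))
```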

GTSPP config for WOA test

Hi @castelao. Thanks for making this software available! I've been working through all the tests you have implemented and I noticed that the WOA_normbias config in gtspp.json includes the variable 't_an', which I think should be 't_mn'. Also, I'm not sure about this, but I think that the 'at_sea' test in the same file (and some of the other config files) may not work unless it is called 'location_at_sea'?

Fuzzylogic and Morello2014 tests

Hi @castelao, I noticed you have two config files for fuzzy logic tests - fuzzylogic.json and morello2014.json. The settings look very similar in both. For AutoQC should I consider these as separate tests or would you say that they are too similar to bother?

valid_date instead of valid_datetime

All preset QC configuration files were using 'valid_date' key while the QC engine was searching for 'valid_datetime'.

I'll use datetime instead of date to make clear that the time is included, when available.

Error in fuzzylogic notebook

This is probably abandoned, but there is a mistake in the fuzzy_logic.ipnb where the fuzzy functions are generated.

For each line, where there is something like:
data['spike_lo'] = fuzz.zmf(data['x_spike'], cfg['fuzzylogic']['features']['spike']['low'])

it should be:
data['spike_lo'] = fuzz.zmf(data['x_spike'], cfg['fuzzylogic']['features']['spike']['low']['params'])

So the parameters of the z-membership function are loaded properly. The author probably modified load_cfg and forgot to update the notebook.
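
To show why the extra ['params'] key matters, here is a self-contained sketch of a Z-shaped membership function that takes its two parameters as one pair. The function `zmf_sketch`, its (x, params) signature, the layout of `cfg`, and the numeric values are assumptions for illustration. The standard Z-shaped formula is used, which may differ from the notebook's `fuzz.zmf`.

```python
import numpy as np


def zmf_sketch(x, params):
    """Z-shaped membership: 1 below a, 0 above b, smooth in between."""
    a, b = params
    x = np.asarray(x, dtype=float)
    y = np.ones_like(x)
    mid = (a + b) / 2.0
    left = (x > a) & (x <= mid)
    right = (x > mid) & (x < b)
    y[left] = 1 - 2 * ((x[left] - a) / (b - a)) ** 2
    y[right] = 2 * ((x[right] - b) / (b - a)) ** 2
    y[x >= b] = 0.0
    return y


# Hypothetical configuration layout: the numbers live under 'params'.
cfg = {"fuzzylogic": {"features": {"spike": {"low": {"params": [0.0, 2.0]}}}}}
low = cfg["fuzzylogic"]["features"]["spike"]["low"]

x_spike = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
spike_lo = zmf_sketch(x_spike, low["params"])
```

Passing `low` itself would hand the function a dictionary instead of the two numbers, which is the mistake described above.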

Threshold for tukey53H_norm in cotede configuration

Hi again @castelao, I just wanted to check what the threshold should be for the tukey53H_norm test in the cotede.json configuration. At the moment it looks like it is picking up the value of the threshold from the last test run (2). Is this the correct value to use?

Gradient test, flag 9 on first element

The gradient test frequently returns the first element flag as 9.

If no element of the profile is masked, .mask returns a single boolean False instead of an array. The proper way to do this is to use getmaskarray().
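
The difference is easy to reproduce with NumPy alone. The profile values below are made up, and the use of 1 as the initial flag is an assumption for the example.

```python
import numpy as np
from numpy import ma

profile = ma.masked_array([10.0, 9.5, 9.1])  # no element is masked

# With nothing masked, .mask collapses to one scalar False, so it cannot
# be aligned element by element with the data.
scalar_mask = profile.mask

# getmaskarray() always returns a boolean array with the data's shape.
full_mask = ma.getmaskarray(profile)

flags = np.ones(profile.shape, dtype="i1")
flags[full_mask] = 9  # nothing is masked, so no element gets flag 9
```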

flag setting in location_at_sea.py

Hi @castelao, sorry to raise another issue but I found that my code was failing if a profile location was on land. I think it might be because 'flag' is not set before it is used in location_at_sea.py. If I set flag = 3 it runs through as expected.

Remove dependency on Pandas

It would still require Pandas for some functionalities, but it must be possible to install and use the core applications without Pandas.

Avoid dependency on Scipy

It's only used for some interpolation, so it's easy to avoid. Although Scipy is quite a nice package, it makes no sense to require it just for 2D linear interpolation.
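
As an illustration that NumPy alone is enough, here is a minimal bilinear interpolation on a regular grid. It is a sketch, not the code used in CoTeDe: the function name `bilinear` and the grid layout (`field[j, i]` defined at `ygrid[j]`, `xgrid[i]`, both strictly increasing) are assumptions.

```python
import numpy as np


def bilinear(x, y, xgrid, ygrid, field):
    """Linearly interpolate field[j, i], given on (ygrid[j], xgrid[i]), at (x, y)."""
    xgrid = np.asarray(xgrid, dtype=float)
    ygrid = np.asarray(ygrid, dtype=float)
    field = np.asarray(field, dtype=float)
    # Lower corner of the enclosing cell, clipped so i + 1 and j + 1 exist.
    i = np.clip(np.searchsorted(xgrid, x) - 1, 0, xgrid.size - 2)
    j = np.clip(np.searchsorted(ygrid, y) - 1, 0, ygrid.size - 2)
    # Fractional position inside the cell along each axis.
    tx = (x - xgrid[i]) / (xgrid[i + 1] - xgrid[i])
    ty = (y - ygrid[j]) / (ygrid[j + 1] - ygrid[j])
    lower = (1 - tx) * field[j, i] + tx * field[j, i + 1]
    upper = (1 - tx) * field[j + 1, i] + tx * field[j + 1, i + 1]
    return (1 - ty) * lower + ty * upper


# A field equal to x + 2*y, which bilinear interpolation reproduces exactly.
xgrid = [0.0, 1.0, 2.0]
ygrid = [0.0, 1.0]
field = [[0.0, 1.0, 2.0],
         [2.0, 3.0, 4.0]]
center = bilinear(0.5, 0.5, xgrid, ygrid, field)
```

Points outside the grid are extrapolated from the nearest cell in this sketch. A real implementation would more likely mask them.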

Double weight on first feature of Anomaly Detection

The first feature evaluated in Anomaly Detection was being considered twice.

Although it's a conceptual error since I'm not using weights at this point, it shouldn't compromise the classification results.

location_at_sea: one vs multiple positions

For TSG, each measurement has its own position, requiring multiple evaluations of the location_at_sea test.

Rethink the function itself, as well as the call on: common and evaluate.

woa_comparison as float64

ProfileQCCollection.flags is returning woa_comparison as a float64. It was supposed to be an integer.

Resolve flake8 failures and/or add flake8-ignore

The CONTRIBUTING docs recommend running flake8. However when I tried this, there were a lot of flake8 failures. I recommend resolving those and/or adding a list of ignored tests to flake8-ignore. Otherwise, as a contributor I'd just ignore the output of these tests.

i2b_flags fails with pandas.Series()

i2b_flags() would fail if loaded with a pandas.Series().

The solution should be generic and avoid requiring the pandas package, i.e. be able to handle a pd.Series() without explicitly checking if it is a pd.Series().
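
One generic approach is to convert whatever comes in with np.asanyarray(), which accepts lists, arrays, masked arrays, and array-like objects without importing pandas. The sketch below is not the actual i2b_flags(): the name `to_binary_flags`, the default good flags (1, 2) and bad flags (3, 4), and the masking of all other flags are assumptions.

```python
import numpy as np
from numpy import ma


def to_binary_flags(flags, good_flags=(1, 2), bad_flags=(3, 4)):
    """Return a masked boolean array: True for good, False for bad.

    Flags that are neither good nor bad, or already masked, stay masked.
    """
    # asanyarray keeps a MaskedArray as is and converts other array-likes.
    flags = np.asanyarray(flags)
    data = ma.getdata(flags)
    good = np.isin(data, good_flags)
    bad = np.isin(data, bad_flags)
    mask = ma.getmaskarray(flags) | ~(good | bad)
    return ma.masked_array(good, mask=mask)


binary = to_binary_flags([1, 3, 9, 2, 4])
```

Because nothing checks the input type, the function never needs to know whether it received a Series, a list, or a MaskedArray.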

Incorrect sub-sampling by split_data_groups()

Unavailable data, flagged 9 or masked, was being considered as bad data by split_data_groups(), while it should be ignored by the adjusting procedure to define the parameters for the Anomaly Detection.

split_data_groups() randomly sub-samples the data into fit, test and err groups for the anomaly detection procedure.
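
The intended behavior can be sketched as follows: drop the unavailable samples first, then split the remainder at random. This is not the real split_data_groups(): the function name `split_groups`, the 60/20/20 fractions, and the 1-D input are assumptions.

```python
import numpy as np
from numpy import ma


def split_groups(flags, fractions=(0.6, 0.2), seed=None):
    """Split the available samples at random into fit, test and err groups.

    Returns a dict of boolean arrays. Samples that are masked or flagged 9
    belong to no group.
    """
    flags = np.asanyarray(flags)
    available = ~ma.getmaskarray(flags) & (ma.getdata(flags) != 9)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(np.flatnonzero(available))
    n_fit = int(round(fractions[0] * idx.size))
    n_test = int(round(fractions[1] * idx.size))
    members = {
        "fit": idx[:n_fit],
        "test": idx[n_fit:n_fit + n_test],
        "err": idx[n_fit + n_test:],  # whatever is left
    }
    groups = {}
    for name, selected in members.items():
        group = np.zeros(flags.shape, dtype=bool)
        group[selected] = True
        groups[name] = group
    return groups


flags = ma.masked_array([1, 1, 9, 4, 1, 2, 1, 3, 1, 1],
                        mask=[0, 1, 0, 0, 0, 0, 0, 0, 0, 0])
groups = split_groups(flags, seed=0)
```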
