meaningful-data / sdmxthon

Library with SDMX to Pandas, Pandas to SDMX, SDMX validation and SDMX metadata validation

Home Page: https://docs.sdmxthon.meaningfuldata.eu/

License: Apache License 2.0

Python 88.54% Jupyter Notebook 11.46%
pandas sdmx sdmx-format sdmx-standard validation

sdmxthon's People

Contributors

antonio-olleros, dependabot[bot], guillegrc, javihern98, marinavelascomd

sdmxthon's Issues

Wrong date string format.

Hi, thank you for this library.

I am trying to read an SDMX file (at this URL: 'https://www.i14y.admin.ch/api/CodeLists/CL_HGDE_KT/exports/SDMX-ML/2.1?annotations=false') and I get errors because of the date string format. The library only accepts the format %Y-%m-%d, but my file contains datetime values such as 1978-12-31T23:00:00. Would it be possible to update the library to accept this format?

Something like this in 'set_date_from_string' in model/utils.py

from datetime import datetime


def set_date_from_string(value: str, format_: str = "%Y-%m-%dT%H:%M:%S"):
    """Generic function to parse a string into a datetime

    Args:
        value: The string to be parsed.
        format_: The strptime format to try first.

    Returns:
        A datetime object, or None if value is None

    Raises:
        ValueError: If the value matches none of the accepted formats.
    """

    if value is None:
        return None

    # Try the requested format first, then the two fallback formats.
    for fmt in (format_, "%Y-%m-%d", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass

    raise ValueError(f"Wrong date string format. The formats {format_} "
                     f"or %Y-%m-%d or %Y-%m-%dT%H:%M:%S "
                     f"should be followed. {str(value)} passed")

Thank you :)

FMR validation on dataset

Implement a method on the DataSet class that validates the data with FMR, sending it as SDMX-CSV generated from the pandas DataFrame.

The method signature should allow the FMR host to be defined, with domain and port as separate arguments (the port must be an integer between 1 and 65535). Although the remote validation is asynchronous, the method should wait until the validation status is Completed and then return the response from FMR (we may change this in the future).

Endpoints to be used:
https://fmrwiki.sdmxcloud.org/Asynchronous_Data_Validation_and_Transformation_Web_Service

Process:

  1. Ensure the host is up and available; raise an exception if it is not.
  2. Generate the SDMX-CSV in memory and send the request (for now; large files are expected to go through disk in the future).
  3. Get the UID from the response and query the status until it is Completed, at an interval of 0.5 s (the default value, configurable through a parameter).
  4. When the status is Completed, parse the response as JSON and check for errors.

https://fmrwiki.sdmxcloud.org/Data_Validation_Web_Service#Dataset_with_Errors
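The port check and the wait loop of step 3 could be sketched as follows. This is only an illustration under assumptions: check_port and wait_until_completed are hypothetical names, the status lookup is injected as a callable rather than calling any real FMR endpoint, and max_attempts is an extra safeguard that the issue does not ask for.

```python
import time
from typing import Callable, Optional


def check_port(port: int) -> int:
    """Return the port if it is an integer between 1 and 65535."""
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        raise TypeError("port must be an integer")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return port


def wait_until_completed(
    get_status: Callable[[], str],
    interval: float = 0.5,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call get_status until it returns "Completed".

    get_status is a hypothetical callable that queries the validation
    status for a UID. Returns the number of status queries performed.
    """
    attempts = 0
    while True:
        attempts += 1
        if get_status() == "Completed":
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError("validation did not complete in time")
        # Wait before the next status query.
        sleep(interval)
```

Injecting get_status and sleep keeps the loop testable without a running FMR instance; the real method would wrap the HTTP status request in get_status.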

SDMX-CSV support in API methods

Add support for SDMX-CSV in get_pandas_df and get_datasets. Make sure the existing limitations of these methods still apply (data file checking and same output).

Improve code coverage

Add tests for the following items:

  • Reading data file with Dataflow (readingValidations)
  • Writing more than one dataset (dataWriting)
  • Read from URL using query builder (queryBuilder)
  • Test get_pandas_df with SDMX-CSV (APImethods)
  • Test get_datasets with SDMX-CSV (APImethods)
  • Test metadata download from data (APImethods)

Empty structure for sdmxthon.model.definitions.DataFlowDefinition

Running the following code returns a NoneType. Is this expected?

import sdmxthon

url = "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/MET_EDAT_LFSE4/1.0?detail=full"

message = sdmxthon.read_sdmx(url)
resource = message.payload[list(message.payload)[0]]
df = resource[list(resource)[0]]

df.structure

I would expect it to return the content of <s:Structure>, for example:
<Ref id="MET_EDAT_LFSE4" version="18.0" agencyID="ESTAT" package="datastructure" class="DataStructure"/>

Is this correct or is there another way to handle it?

Thank you very much in advance.

Message payload should return object on simple messages

Change the message payload to return the object itself when the message holds a single object of a single kind:

  • If a DataSet is present and it is the only one, return the DataSet object
  • If a single ItemScheme or Definition is present, return that sole object
  • If more than one object is present, all of the same kind, return a list
  • If more than one object is present, of different kinds, return the same type as content
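A minimal sketch of these rules, assuming content is a mapping from kind (for example "Codelists") to a mapping of unique id to object. simplify_payload is a hypothetical name, not part of the library.

```python
from typing import Any, Dict


def simplify_payload(content: Dict[str, Dict[str, Any]]) -> Any:
    """Reduce a message content mapping to its simplest form."""
    # Ignore kinds that hold no objects.
    non_empty = {kind: objs for kind, objs in content.items() if objs}
    if len(non_empty) != 1:
        # No objects or several kinds: keep the original mapping.
        return content
    objects = list(next(iter(non_empty.values())).values())
    # A single object is returned as is, several objects as a list.
    return objects[0] if len(objects) == 1 else objects
```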

Create dataset with only 1 record fails

Check the error raised at "sdmxthon/parsers/data_read.py", line 135, in create_dataset:
df = pd.DataFrame(dataset[OBS]).replace(np.nan, '')

Creating the dataset should succeed when the DataFrame has only one record.
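One possible cause, stated here as an assumption and not confirmed by the issue, is that the parsed observations arrive as a single mapping instead of a list when there is only one record; pd.DataFrame rejects a dict of scalar values unless an index is passed. A hypothetical helper that normalizes the input before building the DataFrame might look like this:

```python
from typing import Any, Dict, List, Mapping


def as_record_list(obs: Any) -> List[Dict[str, Any]]:
    """Return the observations as a list of records."""
    if obs is None:
        return []
    if isinstance(obs, Mapping):
        # A single record: wrap it so it becomes a one-row table.
        return [dict(obs)]
    return list(obs)
```

With this helper the failing line would become pd.DataFrame(as_record_list(dataset[OBS])).replace(np.nan, '').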

Validate dataset component with codelist

Add functionality to validate a dataset column against the codes of a Codelist passed as an argument to the function.

It must accept either a unique_id, downloading the codelist from the web service if available, or a Codelist object.
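The core check could be sketched as below. find_invalid_codes is a hypothetical name, and the sketch works on plain iterables (a pandas Series is one), leaving out the unique_id lookup and the download from the web service.

```python
from typing import Iterable, List


def find_invalid_codes(values: Iterable[str], codes: Iterable[str]) -> List[str]:
    """Return the distinct column values that are not codes of the codelist."""
    allowed = set(codes)
    invalid = {value for value in values if value not in allowed}
    return sorted(invalid)


column = ["ES", "FR", "XX", "ES", "YY"]
codelist_codes = ["ES", "FR", "DE"]
print(find_invalid_codes(column, codelist_codes))  # ['XX', 'YY']
```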

Issue with ILO codelist

Running the following does not return any codes:

import sdmxthon

message = sdmxthon.read_sdmx("https://www.ilo.org/sdmx/rest/codelist/ILO/CL_AREA/1.0?detail=full")
message.content["Codelists"]['ILO:CL_AREA(1.0)'].items

However, running the same code for Eurostat seems to work properly:

message = sdmxthon.read_sdmx("https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/codelist/ESTAT/GEO/1.0?detail=full")
message.content["Codelists"]['ESTAT:GEO(1.0)'].items

Could you kindly help with this issue? Thank you

Tests csv with dataflows

  • Use read_sdmx on a file that points to a dataflow.
  • Use dataset.dataflow.
  • Perform structural validation.
  • Read the CSV file and use pd.testing.assert_frame_equal to compare the two DataFrames.

Dataset attributes and SDMX-CSV bug

Review the dataset attributes bug in _check_DA_keys when removing a key from the dataset attributes.

Review SDMX-CSV v2 to include the action column (default value I -> Inform).
