nbbrd / sdmx-dl

Easily download official statistics

Home Page: https://nbbrd.github.io/sdmx-dl

License: European Union Public License 1.2

Java 99.88% PowerShell 0.12%
library java8 command-line-tool sdmx official-statistics

sdmx-dl's People

Contributors

charphi, dependabot-preview[bot], dependabot-support, dependabot[bot], hakky54

sdmx-dl's Issues

Add reading of gzipped SDMX-XML files

XML files are verbose and therefore large.
As the data grows, it would be useful to read gzip-compressed files directly.
Gzip typically reduces the size of XML files by about 90%.
The file extension would be .xml.gz.
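
A minimal sketch of how such reading could work, using only the JDK. The class SdmxFileStreams and its methods are hypothetical names for illustration and are not part of sdmx-dl; the sketch assumes that the .xml.gz extension alone decides whether to decompress.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of sdmx-dl.
final class SdmxFileStreams {

    private SdmxFileStreams() {
    }

    // True when the file name ends with ".xml.gz" (case-insensitive).
    static boolean isGzipped(Path file) {
        Path name = file.getFileName();
        return name != null
                && name.toString().toLowerCase(Locale.ROOT).endsWith(".xml.gz");
    }

    // Opens the file and decompresses it on the fly when the name indicates gzip.
    static InputStream open(Path file) throws IOException {
        InputStream raw = new BufferedInputStream(Files.newInputStream(file));
        if (!isGzipped(file)) {
            return raw;
        }
        try {
            // The constructor reads the gzip header and fails on non-gzip content.
            return new GZIPInputStream(raw);
        } catch (IOException ex) {
            raw.close();
            throw ex;
        }
    }
}
```

The returned stream can then be handed to the existing XML parser unchanged, so the parsing code would not need to know whether the file was compressed.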

Improve feedback on missing data flow

In the CLI, an invalid data flow reference produces the error ResponseError: 404: null.
It would be better to check the reference before sending queries to the server.
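
A rough sketch of such a pre-check, assuming the list of available flow references has already been obtained (for example from a cached flow listing). FlowRefCheck and requireKnownFlow are hypothetical names, and flow references are simplified to plain strings here.

```java
import java.util.Collection;

// Hypothetical helper, not part of sdmx-dl.
final class FlowRefCheck {

    private FlowRefCheck() {
    }

    // Fails early with a readable message instead of letting the server answer 404.
    static void requireKnownFlow(Collection<String> knownFlows, String flowRef) {
        if (!knownFlows.contains(flowRef)) {
            throw new IllegalArgumentException(
                    "Unknown data flow '" + flowRef + "'; "
                            + knownFlows.size() + " flows are available from this source");
        }
    }
}
```

The trade-off is one extra lookup before each data request, which a cache of the flow list would keep cheap.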

Fix unexpected exception while getting data on WITS

At least one WITS flow has a time dimension that is not the last dimension.
This triggers an IndexOutOfBoundsException in Key.Builder.

Example: http://wits.worldbank.org/API/V1/SDMX/V21/rest/datastructure/WBG_WITS/TARIFF_TRAINS/1.1/?references=children

<DimensionList id="DimensionDescriptor" urn="urn:sdmx:org.sdmx.infomodel.datastructure.DimensionDescriptor=WBG_WITS:TARIFF_TRAINS(1.1).DimensionDescriptor">
	<Dimension id="FREQ" urn="urn:sdmx:org.sdmx.infomodel.datastructure.Dimension=WBG_WITS:TARIFF_TRAINS(1.1).FREQ" position="1">
		<ConceptIdentity>
			<Ref id="FREQ" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
		</ConceptIdentity>
		<LocalRepresentation>
			<Enumeration>
				<Ref id="CL_FREQ_WITS" version="1.0" agencyID="WBG_WITS" package="codelist" class="Codelist" xmlns="" />
			</Enumeration>
		</LocalRepresentation>
	</Dimension>
	<Dimension id="REPORTER" urn="urn:sdmx:org.sdmx.infomodel.datastructure.Dimension=WBG_WITS:TARIFF_TRAINS(1.1).REPORTER" position="2">
		<ConceptIdentity>
			<Ref id="REPORTER" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
		</ConceptIdentity>
		<LocalRepresentation>
			<Enumeration>
				<Ref id="CL_COUNTRY_WITS" version="1.0" agencyID="WBG_WITS" package="codelist" class="Codelist" xmlns="" />
			</Enumeration>
		</LocalRepresentation>
	</Dimension>
	<Dimension id="PARTNER" urn="urn:sdmx:org.sdmx.infomodel.datastructure.Dimension=WBG_WITS:TARIFF_TRAINS(1.1).PARTNER" position="3">
		<ConceptIdentity>
			<Ref id="PARTNER" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
		</ConceptIdentity>
		<LocalRepresentation>
			<Enumeration>
				<Ref id="CL_COUNTRY_WITS" version="1.0" agencyID="WBG_WITS" package="codelist" class="Codelist" xmlns="" />
			</Enumeration>
		</LocalRepresentation>
	</Dimension>
	<TimeDimension id="TIME_PERIOD" urn="urn:sdmx:org.sdmx.infomodel.datastructure.TimeDimension=WBG_WITS:TARIFF_TRAINS(1.1).TIME_PERIOD" position="4">
		<ConceptIdentity>
			<Ref id="YEAR" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
		</ConceptIdentity>
		<LocalRepresentation>
			<TextFormat textType="ObservationalTimePeriod" />
		</LocalRepresentation>
	</TimeDimension>
	<Dimension id="PRODUCTCODE" urn="urn:sdmx:org.sdmx.infomodel.datastructure.Dimension=WBG_WITS:TARIFF_TRAINS(1.1).PRODUCTCODE" position="5">
		<ConceptIdentity>
			<Ref id="PRODUCTCODE" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
		</ConceptIdentity>
		<LocalRepresentation>
			<Enumeration>
				<Ref id="CL_PRODUCTCODE_WITS" version="1.0" agencyID="WBG_WITS" package="codelist" class="Codelist" xmlns="" />
			</Enumeration>
		</LocalRepresentation>
	</Dimension>
	<Dimension id="DATATYPE" urn="urn:sdmx:org.sdmx.infomodel.datastructure.Dimension=WBG_WITS:TARIFF_TRAINS(1.1).DATATYPE" position="6">
		<ConceptIdentity>
			<Ref id="DATATYPE" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
		</ConceptIdentity>
		<LocalRepresentation>
			<Enumeration>
				<Ref id="CL_DATATYPE_WITS" version="1.0" agencyID="WBG_WITS" package="codelist" class="Codelist" xmlns="" />
			</Enumeration>
		</LocalRepresentation>
	</Dimension>
</DimensionList>

The same example with the CLI:

> sdmx-dl list concepts WITS DF_WITS_Tariff_TRAINS --sort | convertfrom-csv | format-table

Concept           Label             Type      Coded Position
-------           -----             ----      ----- --------
FREQ              Freq              dimension true  1
REPORTER          Reporter_ISO_N    dimension true  2
PARTNER           Partner           dimension true  3
PRODUCTCODE       ProductCode       dimension true  5
DATATYPE          DataType          dimension true  6
DATASOURCE        DataSource        attribute true
EXCLUDEDFROM      ExcludedFrom      attribute true
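
One plausible cause, which the issue does not confirm, is that the position attribute is used directly as an index into the key: with positions 5 and 6 but only five non-time dimensions, such an index would overflow. The sketch below shows an alternative that ranks dimensions by position instead of trusting the position values. DimensionIndexes and indexByRank are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of sdmx-dl.
final class DimensionIndexes {

    private DimensionIndexes() {
    }

    // Maps each non-time dimension id to a zero-based index derived from the
    // ordering of its position, so gaps left by a time dimension do not matter.
    static Map<String, Integer> indexByRank(Map<String, Integer> positionById) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(positionById.entrySet());
        sorted.sort(Map.Entry.comparingByValue());
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            result.put(sorted.get(i).getKey(), i);
        }
        return result;
    }
}
```

With the positions listed above (1, 2, 3, 5, 6), the resulting indexes are contiguous from 0 to 4, which fits a key of five items.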

Refactor features discovery

Each provider has different capabilities, and the current method SdmxConnection#isDetailSupported() is too limited to describe them. An alternative would be to expose the capabilities as an enum.

For example:

enum SdmxWebFeature {
  KEY_FILTER,
  DETAIL_FILTER
}
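
As an illustration only, the enum could be exposed as a set of supported features, with the existing boolean check expressed in terms of that set. The FeatureDiscovery class and its method are hypothetical; only the enum and its two constants come from the proposal above.

```java
import java.util.EnumSet;
import java.util.Set;

// Enum as proposed in the issue.
enum SdmxWebFeature {
    KEY_FILTER,
    DETAIL_FILTER
}

// Hypothetical helper, not part of sdmx-dl.
final class FeatureDiscovery {

    private FeatureDiscovery() {
    }

    // Equivalent of the current isDetailSupported() check, derived from the set.
    static boolean isDetailSupported(Set<SdmxWebFeature> supported) {
        return supported.contains(SdmxWebFeature.DETAIL_FILTER);
    }

    // Example capability set for a provider that can only filter by key.
    static Set<SdmxWebFeature> keyFilterOnly() {
        return EnumSet.of(SdmxWebFeature.KEY_FILTER);
    }
}
```

A set of enum constants can grow with new capabilities without adding one boolean method per feature.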

Fix key validity check on input

It is currently possible to specify an input key whose number of dimensions differs from the one defined in the data structure. This leads to strange results.
It would be better to check key validity before using the key in requests.
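
A minimal sketch of such a check, assuming keys are dot-separated strings with one item per dimension and that an empty item acts as a wildcard. KeyCheck is a hypothetical name, and special keys (such as a keyword that selects everything) are deliberately not handled.

```java
// Hypothetical helper, not part of sdmx-dl.
final class KeyCheck {

    private KeyCheck() {
    }

    // Counts the dot-separated items of the key. The -1 limit keeps trailing
    // empty items, so "M.U2." is counted as three items, not two.
    static int itemCount(String key) {
        return key.split("\\.", -1).length;
    }

    // True when the key has exactly one item per dimension of the data structure.
    static boolean matchesDimensionCount(String key, int dimensionCount) {
        return itemCount(key) == dimensionCount;
    }
}
```

For instance, the key M.U2.MM.1D.P000.ME.Z5.EUR has eight items and would be rejected for a data structure that defines any other number of dimensions.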

Improve CLI version option

Improve CLI version option by adding relevant information and colors.

For example, Maven prints the following information:

Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
Maven home: C:\...\apache-maven-3.8.1\bin\..
Java version: 16, vendor: AdoptOpenJDK, runtime: C:\...\jdk-16+36
Default locale: en_US, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"

Related picocli documentation: https://picocli.info/#_version_help
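
A sketch of how the version lines could be assembled from standard system properties, loosely following the Maven output above. VersionInfo is a hypothetical class; the String[] result matches the shape returned by picocli's IVersionProvider#getVersion(), but wiring it into picocli and adding colors is left out here.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper, not part of sdmx-dl.
final class VersionInfo {

    private VersionInfo() {
    }

    // Builds one line per topic: application, Java runtime, locale, operating system.
    static String[] lines(String appVersion) {
        return new String[]{
                "sdmx-dl " + appVersion,
                "Java version: " + System.getProperty("java.version")
                        + ", vendor: " + System.getProperty("java.vendor")
                        + ", runtime: " + System.getProperty("java.home"),
                "Default locale: " + Locale.getDefault()
                        + ", platform encoding: " + Charset.defaultCharset(),
                "OS name: \"" + System.getProperty("os.name")
                        + "\", version: \"" + System.getProperty("os.version")
                        + "\", arch: \"" + System.getProperty("os.arch") + "\""
        };
    }
}
```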

Modify command check.properties

  1. The command properties should be renamed to config to be more concise.
  2. The output column type should be renamed to provide a clearer meaning.

Fix inconsistent formatting of NaN values

NaN values are formatted differently in CLI depending on the JDK version.

Example: sdmx-dl fetch data ECB IFI M.U2.MM.1D.P000.ME.Z5.EUR

JDK8:

Series ObsAttributes ObsPeriod ObsValue
M.U2.MM.1D.P000.ME.Z5.EUR OBS_STATUS=M,OBS_COM= ,OBS_CONF=F 1992-07-01T00:00:00 ?
M.U2.MM.1D.P000.ME.Z5.EUR OBS_STATUS=M,OBS_COM= ,OBS_CONF=F 1992-08-01T00:00:00 ?

JDK16:

Series ObsAttributes ObsPeriod ObsValue
M.U2.MM.1D.P000.ME.Z5.EUR OBS_STATUS=M,OBS_COM= ,OBS_CONF=F 1992-07-01T00:00:00 NaN
M.U2.MM.1D.P000.ME.Z5.EUR OBS_STATUS=M,OBS_COM= ,OBS_CONF=F 1992-08-01T00:00:00 NaN
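
One possible explanation, not confirmed by the issue, is that the NaN symbol supplied by the default locale data differs between JDK versions. Under that assumption, setting the symbol explicitly would make the output independent of the JDK. NanFormatting and the pattern "0.###" are hypothetical choices for illustration.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper, not part of sdmx-dl.
final class NanFormatting {

    private NanFormatting() {
    }

    // Creates a number format whose NaN text does not depend on locale data.
    static DecimalFormat newObsValueFormat() {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.ROOT);
        symbols.setNaN("NaN");
        return new DecimalFormat("0.###", symbols);
    }
}
```

Calling NanFormatting.newObsValueFormat().format(Double.NaN) then yields the text NaN regardless of the symbol the runtime would pick by default.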

Data parsing fails due to an invalid XML file

DataflowRef: StatCan:DF_10100139(1.0)

The XML file is invalid because the series headers are missing:

<?xml version='1.0' encoding='UTF-8'?>
<message:StructureSpecificData xmlns:ss="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/structurespecific" xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer" xmlns:ns1="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=StatCan:DF_10100139(1.0):ObsLevelDim:TIME_PERIOD" xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xml="http://www.w3.org/XML/1998/namespace">
	<message:Header>
		<message:ID>DS1598331006473</message:ID>
		<message:Test>false</message:Test>
		<message:Prepared>2020-08-25T04:50:06</message:Prepared>
		<message:Sender id="MetadataTechnology"/>
		<message:Structure structureID="StatCan_DF_10100139_1_0" namespace="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=StatCan:DF_10100139(1.0):ObsLevelDim:TIME_PERIOD" dimensionAtObservation="TIME_PERIOD">
			<common:StructureUsage>
				<Ref agencyID="StatCan" id="DF_10100139" version="1.0"/>
			</common:StructureUsage>
		</message:Structure>
	</message:Header>
	<message:DataSet ss:dataScope="DataStructure" xsi:type="ns1:DataSetType" ss:structureRef="StatCan_DF_10100139_1_0">
		<Series/>
	</message:DataSet>
</message:StructureSpecificData>
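
A minimal sketch of how such a file could be detected before parsing, using the JDK's StAX API. It assumes that a series header is carried by attributes on the Series element, so a Series element without any attribute counts as missing its header. SeriesHeaderCheck is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of sdmx-dl.
final class SeriesHeaderCheck {

    private SeriesHeaderCheck() {
    }

    // Counts Series elements that carry no attribute at all.
    static int countSeriesWithoutHeader(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                int count = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "Series".equals(reader.getLocalName())
                            && reader.getAttributeCount() == 0) {
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("Malformed XML", ex);
        }
    }
}
```

A non-zero count could be reported as a clear error message, or the empty series could be skipped, depending on how strict the parser should be.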
