nbbrd / sdmx-dl
Easily download official statistics
Home Page: https://nbbrd.github.io/sdmx-dl
License: European Union Public License 1.2
XML files are really verbose and generate big files.
As the data grows, it could be interesting to allow the direct reading of gzip-compressed files.
The typical compression rate of XML with gzip is around 90%.
The file extension would be .xml.gz
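A minimal sketch of how such files could be opened, assuming the decision is based on the file name only. The class `GzipAwareFiles` and its methods are hypothetical and not part of the sdmx-dl API; only JDK classes are used.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of sdmx-dl.
final class GzipAwareFiles {

    private GzipAwareFiles() {
    }

    // True when the file name ends with ".gz", for example "data.xml.gz".
    static boolean isGzipped(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".gz");
    }

    // Opens the file and decompresses it on the fly when it is gzipped.
    static InputStream newInputStream(Path file) throws IOException {
        InputStream raw = new BufferedInputStream(Files.newInputStream(file));
        if (!isGzipped(file)) {
            return raw;
        }
        try {
            return new GZIPInputStream(raw);
        } catch (IOException ex) {
            // The gzip header could not be read: release the file handle.
            raw.close();
            throw ex;
        }
    }
}
```

The returned stream can be handed to an XML parser unchanged, so the rest of the reading code would not need to know whether the file was compressed.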
API documentation: https://knoema.fr/dev/opendata/sdmx
SdmxWebSource:
Key | Value |
---|---|
Name | KNOEMA |
Description | |
Aliases | |
Driver | |
Dialect | |
Endpoint | http://knoema.com/api/1.0/sdmx |
Properties | |
Website | https://knoema.com/atlas |
Monitor |
On CLI, an invalid data flow reference generates a ResponseError: 404: null.
It would be better to check the reference before sending queries to the server.
As announced on https://stat.data.abs.gov.au/:
ABS.Stat beta is scheduled to be decommissioned on 10 December 2021.
The new server is the latest .STAT suite and supports SDMX21.
Details:
Key | Value |
---|---|
Driver | ri:sdmx21 |
Endpoint | https://api.data.abs.gov.au |
Website | https://explore.data.abs.gov.au |
The current provider is UptimeRobot but a recent modification of rate limits makes it useless.
See #129
I've created a self-hosted solution that uses Upptime:
https://nbbrd.github.io/sdmx-upptime/
At least one provider requires the presence of a user-agent header to handle queries.
The default user-agent should be sdmx-dl/<version> and should be overridable by configuration (http.agent).
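The resolution rule could look like the sketch below. It assumes the configuration is exposed as a `java.util.Properties` object; the class `UserAgents` and the way the version is passed in are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper, not part of sdmx-dl.
final class UserAgents {

    // Configuration key named in the text above.
    static final String HTTP_AGENT_PROPERTY = "http.agent";

    private UserAgents() {
    }

    // Default value in the form "sdmx-dl/<version>".
    static String defaultUserAgent(String version) {
        return "sdmx-dl/" + version;
    }

    // Returns the configured agent if present and not blank, else the default.
    static String resolve(Properties config, String version) {
        String custom = config.getProperty(HTTP_AGENT_PROPERTY);
        if (custom != null && !custom.trim().isEmpty()) {
            return custom;
        }
        return defaultUserAgent(version);
    }
}
```

The resolved value would then be set as the `User-Agent` request header on each HTTP connection.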
The current parsing of data doesn't provide feedback on errors; it just skips them.
It might be useful to provide feedback on these problems.
The current documentation is hosted on GitHub wiki and is written in Markdown.
Markdown is quite limited when used for technical documentation.
AsciiDoc is more suited for this task and is supported by GitHub wiki.
Prior to the migration, we must check the limitations of GitHub wiki against AsciiDoc.
HTTP 429 = Too Many Requests response status
At least one flow of WITS has a time dimension that is not the last dimension.
This triggers an IndexOutOfBoundsException in Key.Builder.
<DimensionList id="DimensionDescriptor" urn="urn:sdmx:org.sdmx.infomodel.datastructure.DimensionDescriptor=WBG_WITS:TARIFF_TRAINS(1.1).DimensionDescriptor">
<Dimension id="FREQ" urn="urn:sdmx:org.sdmx.infomodel.datastructure.Dimension=WBG_WITS:TARIFF_TRAINS(1.1).FREQ" position="1">
<ConceptIdentity>
<Ref id="FREQ" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
</ConceptIdentity>
<LocalRepresentation>
<Enumeration>
<Ref id="CL_FREQ_WITS" version="1.0" agencyID="WBG_WITS" package="codelist" class="Codelist" xmlns="" />
</Enumeration>
</LocalRepresentation>
</Dimension>
<Dimension id="REPORTER" urn="urn:sdmx:org.sdmx.infomodel.datastructure.Dimension=WBG_WITS:TARIFF_TRAINS(1.1).REPORTER" position="2">
<ConceptIdentity>
<Ref id="REPORTER" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
</ConceptIdentity>
<LocalRepresentation>
<Enumeration>
<Ref id="CL_COUNTRY_WITS" version="1.0" agencyID="WBG_WITS" package="codelist" class="Codelist" xmlns="" />
</Enumeration>
</LocalRepresentation>
</Dimension>
<Dimension id="PARTNER" urn="urn:sdmx:org.sdmx.infomodel.datastructure.Dimension=WBG_WITS:TARIFF_TRAINS(1.1).PARTNER" position="3">
<ConceptIdentity>
<Ref id="PARTNER" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
</ConceptIdentity>
<LocalRepresentation>
<Enumeration>
<Ref id="CL_COUNTRY_WITS" version="1.0" agencyID="WBG_WITS" package="codelist" class="Codelist" xmlns="" />
</Enumeration>
</LocalRepresentation>
</Dimension>
<TimeDimension id="TIME_PERIOD" urn="urn:sdmx:org.sdmx.infomodel.datastructure.TimeDimension=WBG_WITS:TARIFF_TRAINS(1.1).TIME_PERIOD" position="4">
<ConceptIdentity>
<Ref id="YEAR" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
</ConceptIdentity>
<LocalRepresentation>
<TextFormat textType="ObservationalTimePeriod" />
</LocalRepresentation>
</TimeDimension>
<Dimension id="PRODUCTCODE" urn="urn:sdmx:org.sdmx.infomodel.datastructure.Dimension=WBG_WITS:TARIFF_TRAINS(1.1).PRODUCTCODE" position="5">
<ConceptIdentity>
<Ref id="PRODUCTCODE" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
</ConceptIdentity>
<LocalRepresentation>
<Enumeration>
<Ref id="CL_PRODUCTCODE_WITS" version="1.0" agencyID="WBG_WITS" package="codelist" class="Codelist" xmlns="" />
</Enumeration>
</LocalRepresentation>
</Dimension>
<Dimension id="DATATYPE" urn="urn:sdmx:org.sdmx.infomodel.datastructure.Dimension=WBG_WITS:TARIFF_TRAINS(1.1).DATATYPE" position="6">
<ConceptIdentity>
<Ref id="DATATYPE" maintainableParentID="TARIFF_CONCEPTS" maintainableParentVersion="1.0" agencyID="WBG_WITS" package="conceptscheme" class="Concept" xmlns="" />
</ConceptIdentity>
<LocalRepresentation>
<Enumeration>
<Ref id="CL_DATATYPE_WITS" version="1.0" agencyID="WBG_WITS" package="codelist" class="Codelist" xmlns="" />
</Enumeration>
</LocalRepresentation>
</Dimension>
</DimensionList>
Same example with cli:
> sdmx-dl list concepts WITS DF_WITS_Tariff_TRAINS --sort | convertfrom-csv | format-table
Concept Label Type Coded Position
------- ----- ---- ----- --------
FREQ Freq dimension true 1
REPORTER Reporter_ISO_N dimension true 2
PARTNER Partner dimension true 3
PRODUCTCODE ProductCode dimension true 5
DATATYPE DataType dimension true 6
DATASOURCE DataSource attribute true
EXCLUDEDFROM ExcludedFrom attribute true
So, after verification, there are three distinct problems:
Originally posted by @charphi in #165 (comment)
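One way to avoid the out-of-bounds index is to derive key indices from the rank of each dimension rather than from its declared position minus one. The sketch below illustrates the idea with the positions listed above; `DimensionIndexer` is a hypothetical class and does not reflect how `Key.Builder` is actually implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of sdmx-dl.
final class DimensionIndexer {

    private DimensionIndexer() {
    }

    // Maps each dimension id to a zero-based key index by ranking the declared
    // positions, so gaps (such as a time dimension at position 4) are ignored.
    static Map<String, Integer> indexByRank(Map<String, Integer> positionById) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(positionById.entrySet());
        sorted.sort(Map.Entry.comparingByValue());
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : sorted) {
            result.put(entry.getKey(), result.size());
        }
        return result;
    }
}
```

With the WITS positions 1, 2, 3, 5 and 6, the five dimensions get the indices 0 to 4, which fits a key of five items.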
Each provider has different capabilities, and the current method SdmxConnection#isDetailSupported() is too limited to describe them. Another solution would be to use an enum instead.
For example:
enum SdmxWebFeature {
KEY_FILTER,
DETAIL_FILTER
}
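A sketch of how a connection could advertise such features, assuming a set-based accessor. The interface `FeatureAware`, its method `getSupportedFeatures()` and the class `FeatureDemo` are hypothetical names introduced for illustration; only the enum comes from the proposal above.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

enum SdmxWebFeature {
    KEY_FILTER,
    DETAIL_FILTER
}

// Hypothetical replacement for a boolean method such as isDetailSupported().
interface FeatureAware {
    Set<SdmxWebFeature> getSupportedFeatures();
}

// Hypothetical demo class, not part of sdmx-dl.
final class FeatureDemo {

    private FeatureDemo() {
    }

    // Builds a connection stub that supports exactly the given features.
    static FeatureAware of(SdmxWebFeature... features) {
        Set<SdmxWebFeature> set = EnumSet.noneOf(SdmxWebFeature.class);
        Collections.addAll(set, features);
        Set<SdmxWebFeature> result = Collections.unmodifiableSet(set);
        return () -> result;
    }

    // The caller filters on the client side when the server cannot do it.
    static boolean needsClientSideKeyFilter(FeatureAware connection) {
        return !connection.getSupportedFeatures().contains(SdmxWebFeature.KEY_FILTER);
    }
}
```

Adding a capability later would then only require a new enum constant instead of a new boolean method on the connection.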
Some commands have a lot of options (such as CSV format options and Obs format options).
These options add complexity to the application and clutter the integrated help.
The functionalities provided by some of these options could be achieved by using other command lines through pipes.
For example: https://csvkit.readthedocs.io/en/latest/scripts/csvformat.html
API documentation: https://osp.stat.gov.lt/en/rdb-rest
SdmxWebSource:
Key | Value |
---|---|
Name | LSD |
Description | Statistics Lithuania |
Aliases | |
Driver | |
Dialect | |
Endpoint | https://osp-rs.stat.gov.lt/rest_xml/ |
Properties | |
Website | https://osp.stat.gov.lt/ |
Monitor |
Some parameters are expected to be validated by patterns.
https://github.com/sdmx-twg/sdmx-rest/blob/master/v2_1/ws/rest/docs/4_3_structural_queries.md#parameters
https://github.com/sosna/sdmx-rest4js/blob/master/src/utils/sdmx-patterns.coffee
https://github.com/sdmx-twg/sdmx-ml-v2_1/blob/master/schemas/SDMXCommonReferences.xsd
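A possible shape for such checks is sketched below. The regular expressions are assumed to mirror the ID, NCName ID and version patterns of the SDMX 2.1 schemas and should be verified against the linked XSD before use. The class `SdmxPatterns` is hypothetical, and REST keywords such as `all` or `latest` are deliberately not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of sdmx-dl.
final class SdmxPatterns {

    // Assumed equivalents of the schema patterns; check them against the XSD.
    static final Pattern ID = Pattern.compile("[A-Za-z0-9_@$\\-]+");
    static final Pattern NCNAME_ID = Pattern.compile("[A-Za-z][A-Za-z0-9_\\-]*");
    static final Pattern VERSION = Pattern.compile("[0-9]+(\\.[0-9]+)*");

    private SdmxPatterns() {
    }

    // Resource identifiers, for example a data flow id.
    static boolean isValidId(String text) {
        return text != null && ID.matcher(text).matches();
    }

    // Identifiers that must start with a letter.
    static boolean isValidNcNameId(String text) {
        return text != null && NCNAME_ID.matcher(text).matches();
    }

    // Dotted numeric versions such as "1.0".
    static boolean isValidVersion(String text) {
        return text != null && VERSION.matcher(text).matches();
    }
}
```

Running these checks before building the request URL would let the client report a malformed parameter locally instead of relaying a server error.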
To ease demos and/or tests
IMF is currently only available through a connectors driver.
It would be useful to also have an RI driver.
API documentation: https://www.bundesbank.de/en/statistics/time-series-databases/-/help-for-sdmx-web-service-855900
SdmxWebSource:
Key | Value |
---|---|
Name | BBK |
Description | Bundesbank |
Endpoint | https://api.statistiken.bundesbank.de/rest |
Website | https://www.bundesbank.de/en/statistics/time-series-databases |
Data structure query doesn't provide codelists anymore
It is currently possible to specify an input key with a number of dimensions different from the one defined in the data structure. This leads to strange results.
It would be better to check key validity before using it in requests.
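A minimal sketch of the size check, assuming keys use the dot-separated SDMX syntax (for example `M.USD+CHF.EUR.SP00.A`) and that the expected count excludes the time dimension. The class `KeyChecks` is hypothetical, and special keys such as a catch-all keyword are not handled.

```java
// Hypothetical helper, not part of sdmx-dl.
final class KeyChecks {

    private KeyChecks() {
    }

    // Counts the dot-separated items of a key.
    // The -1 limit keeps trailing empty items, which act as wildcards.
    static int countItems(String key) {
        return key.split("\\.", -1).length;
    }

    // True when the key has exactly one item per dimension of the structure.
    static boolean hasExpectedSize(String key, int dimensionCount) {
        return countItems(key) == dimensionCount;
    }
}
```

A failed check could be turned into an explicit error message that states the expected and actual number of dimensions before any request is sent.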
Improve CLI version option by adding relevant information and colors.
For example, Maven prints the following information:
Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
Maven home: C:\...\apache-maven-3.8.1\bin\..
Java version: 16, vendor: AdoptOpenJDK, runtime: C:\...\jdk-16+36
Default locale: en_US, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"
Related picocli documentation: https://picocli.info/#_version_help
properties should be renamed config to be more concise.
type should be renamed to provide a clearer meaning.
API documentation: ?
SdmxWebSource:
Key | Value |
---|---|
Name | SGR |
Description | SDMX Global Registry |
Endpoint | https://registry.sdmx.org/ws/rest |
Website | https://registry.sdmx.org/overview.html |
API documentation: https://sis-cc.gitlab.io/dotstatsuite-documentation/
SdmxWebSource:
Key | Value |
---|---|
Name | ESCAP |
Description | Economic and Social Commission for Asia and the Pacific |
Endpoint | https://api-dataexplorer.unescap.org/rest/ |
Website | https://dataexplorer.unescap.org/ |
API documentation: https://stats2.digitalresources.jisc.ac.uk/guides/
SdmxWebSource:
Key | Value |
---|---|
Name | UKDS |
Description | UK Data Service |
Driver | ri:dotstat |
Endpoint | https://stats2.digitalresources.jisc.ac.uk/restsdmx/sdmx.ashx |
Website | https://stats2.digitalresources.jisc.ac.uk/ |
API documentation: https://sis-cc.gitlab.io/dotstatsuite-documentation/
SdmxWebSource:
Key | Value |
---|---|
Name | SPC |
Description | Pacific Data Hub |
Endpoint | https://stats-nsi-stable.pacificdata.org/rest |
Website | https://stats.pacificdata.org/?locale=en |
There is a new provider for ILO at https://www.ilo.org/sdmx/rest/
The old one doesn't produce content anymore.
API documentation: https://www.ilo.org/ilostat-files/Documents/SDMX_User_Guide.pdf
NaN values are formatted differently in CLI depending on the JDK version.
Example: sdmx-dl fetch data ECB IFI M.U2.MM.1D.P000.ME.Z5.EUR
JDK8:
Series | ObsAttributes | ObsPeriod | ObsValue |
---|---|---|---|
M.U2.MM.1D.P000.ME.Z5.EUR | OBS_STATUS=M,OBS_COM= ,OBS_CONF=F | 1992-07-01T00:00:00 | ? |
M.U2.MM.1D.P000.ME.Z5.EUR | OBS_STATUS=M,OBS_COM= ,OBS_CONF=F | 1992-08-01T00:00:00 | ? |
JDK16:
Series | ObsAttributes | ObsPeriod | ObsValue |
---|---|---|---|
M.U2.MM.1D.P000.ME.Z5.EUR | OBS_STATUS=M,OBS_COM= ,OBS_CONF=F | 1992-07-01T00:00:00 | NaN |
M.U2.MM.1D.P000.ME.Z5.EUR | OBS_STATUS=M,OBS_COM= ,OBS_CONF=F | 1992-08-01T00:00:00 | NaN |
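A likely explanation is that the NaN symbol comes from the JDK locale data, whose default provider changed after JDK 8. The output can be made independent of the JDK by setting the symbol explicitly, as in this sketch. The class `StableNumberFormat` and the number pattern are assumptions, not the formatter that sdmx-dl actually uses.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper, not part of sdmx-dl.
final class StableNumberFormat {

    private StableNumberFormat() {
    }

    // Creates a formatter whose NaN text does not depend on the JDK locale data.
    static DecimalFormat newFormat(Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        // Force the same NaN representation on every JDK version.
        symbols.setNaN("NaN");
        // Illustrative pattern: up to nine optional fraction digits.
        return new DecimalFormat("#.#########", symbols);
    }
}
```

Pinning the symbol this way keeps CSV output comparable across machines that run different Java versions.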
DataflowRef: StatCan:DF_10100139(1.0)
The XML file is invalid because the series headers are missing:
<?xml version='1.0' encoding='UTF-8'?>
<message:StructureSpecificData xmlns:ss="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/structurespecific" xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer" xmlns:ns1="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=StatCan:DF_10100139(1.0):ObsLevelDim:TIME_PERIOD" xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xml="http://www.w3.org/XML/1998/namespace">
<message:Header>
<message:ID>DS1598331006473</message:ID>
<message:Test>false</message:Test>
<message:Prepared>2020-08-25T04:50:06</message:Prepared>
<message:Sender id="MetadataTechnology"/>
<message:Structure structureID="StatCan_DF_10100139_1_0" namespace="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=StatCan:DF_10100139(1.0):ObsLevelDim:TIME_PERIOD" dimensionAtObservation="TIME_PERIOD">
<common:StructureUsage>
<Ref agencyID="StatCan" id="DF_10100139" version="1.0"/>
</common:StructureUsage>
</message:Structure>
</message:Header>
<message:DataSet ss:dataScope="DataStructure" xsi:type="ns1:DataSetType" ss:structureRef="StatCan_DF_10100139_1_0">
<Series/>
</message:DataSet>
</message:StructureSpecificData>
DataflowRef formats are different in input and output.
Solution: allow both formats in input
This will allow optional drivers
Content is missing when the fetch-meta command is called twice with slightly different queries.
To reproduce:
sdmx-dl fetch meta ECB EXR M.USD+CHF.EUR.SP00.A
sdmx-dl fetch meta ECB EXR A.CHF.EUR.SP00.A
API documentation: https://www.statcan.gc.ca/en/developers/wds
SdmxWebSource:
Key | Value |
---|---|
Name | STATCAN |
Description | Statistics Canada |
Aliases | |
Driver | |
Dialect | |
Endpoint | https://www150.statcan.gc.ca/t1/wds/rest |
Properties | |
Website | https://www150.statcan.gc.ca/n1/en/type/data?MM=1 |
Monitor |
Problems:
API documentation: ?
SdmxWebSource:
Key | Value |
---|---|
Name | UNICEF |
Description | UN International Children’s Emergency Fund |
Aliases | |
Driver | |
Dialect | |
Endpoint | https://sdmx.data.unicef.org/ws/public/sdmxapi/rest |
Properties | |
Website | https://data.unicef.org/ |
Monitor |
Attributes can be attached at different levels, such as observation-level, time series-level and sibling-level.
(see example at https://sdw.ecb.europa.eu/datastructure.do?datasetinstanceid=120)
Currently, sdmx-dl cannot distinguish these levels.
API documentation: .Stat Suite
SdmxWebSource:
Key | Value |
---|---|
Name | SIMEL |
Description | El Salvador Labour Market Information System |
Aliases | |
Driver | ri:sdmx21 |
Dialect | |
Endpoint | https://disseminatesimel.mtps.gob.sv/rest |
Properties | |
Website | https://datasimel.mtps.gob.sv/ |
Monitor |
Done in 741dd40
API documentation: https://sis-cc.gitlab.io/dotstatsuite-documentation/
SdmxWebSource:
Key | Value |
---|---|
Name | CAMSTAT |
Description | National Statistical Institute of Cambodia |
Endpoint | https://nsiws-stable-camstat-live.officialstatistics.org/rest |
Website | http://camstat.nis.gov.kh/?locale=en&start=0 |
International Monetary Fund’s SDMX Central source doesn't allow data queries anymore.
See https://pandasdmx.readthedocs.io/en/v1.0/sources.html#imf-international-monetary-fund-s-sdmx-central-source
Alternative API: https://datahelp.imf.org/knowledgebase/articles/630877-api