amattioc / sdmx
SDMX Connectors
License: European Union Public License 1.2
ISTAT has published the new SDMX 2.1 REST web services: http://sdmx.istat.it/SDMXWS/rest/help
They are fully 2.1 compliant (nice job!), so they will work even via addProvider. Anyway, I'll add them to the list available out of the box.
When I request codes from Eurostat in R using RJSDMX, non-ASCII characters are not displayed correctly. See the following example:
> c_GEO <- getCodes("EUROSTAT","demo_r_pjanaggr3","GEO")
> c_GEO[[2]] # should return "Thüringen"
[1] "Thüringen"
I ran the result through all available encodings, but did not succeed in retrieving the correct value:
> str <- sapply(iconvlist(),function(x){iconv(c_GEO[[2]], from=x, to="")[[1]]})
> grep("Thüringen",str,value = T)
named character(0)
Would it be possible to pass an encoding as an argument to getCodes() and similar functions to remedy this situation?
I am aware that this might be a problem with my specific setup; however, I am unsure which parameters to report. Please let me know if you need additional session info.
Apart from that: Great package! Thanks for the effort!
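The garbled output above has the typical shape of UTF-8 bytes that were decoded as Latin-1 somewhere along the way. Assuming that is what happened here (this is a guess, not a confirmed diagnosis of the connector), the text can be repaired by reversing the wrong decoding step. A minimal Java sketch of the idea, with a hypothetical class name and Unicode escapes so the source file encoding does not matter:

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {
    /**
     * Reverses a wrong Latin-1 decoding: turn the characters back into the
     * original bytes, then decode those bytes as UTF-8.
     * Assumption: the input really is UTF-8 text that was read as ISO-8859-1.
     */
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Th" + U+00C3 + U+00BC + "ringen" is the garbled form shown in the report
        String garbled = "Th\u00C3\u00BCringen";
        String fixed = repair(garbled);
        // The repaired string contains a single U+00FC (u with diaeresis)
        System.out.println(fixed.equals("Th\u00FCringen"));
    }
}
```

If the repaired value is correct, the wrong charset is being applied when the response is read, so an explicit encoding argument would only be a workaround.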
With Worldbank as provider, it is possible to create valid queries only as long as no selection for FREQ or REF_AREA has been made. For example, testing the query WDI/.BX_KLT_DINV_WD_GD_ZS. identifies 248 time series, while WDI/.BX_KLT_DINV_WD_GD_ZS.GBR returns an error.
I don't know if this is a problem with the World Bank (beta provider) or if it is related to RJSDMX.
Is it possible to add UNdata to the list of providers? They seem to run an SDMX-based REST service, though I don't know about its maturity or whether it runs SDMX 2.1.
On the RJSDMX Wiki page the install_github() command should be changed from
install_github(repo = "SDMX", username = "amattioc", subdir = "RJSDMX")
to
install_github(repo = "amattioc/SDMX", subdir = "RJSDMX")
as the username parameter has been deprecated.
This is clearly a minor point, and filing an issue for it seems somewhat excessive. What would be your suggestion for reporting textual changes in the Wiki? I am still new to git / GitHub and do not yet know how to address these things properly in the given framework.
It should be possible to completely suppress messages in an R session, but I think the java code is printing output even in the case when the configuration file sets levels to WARNING. (Perhaps level OFF should be available?)
z <- try(getSDMX("OECD", 'G20_PRICES.CAB.CP.IXOB.M'), silent=TRUE)
Dec 05, 2014 11:46:49 AM it.bankitalia.reri.sia.sdmx.client.RestSdmxClient runQuery
SEVERE: Connection failed. HTTP error code : 404, message: Not Found
SDMX meaning: No results matching the query.
The warning does not need to be printed, it is available to the R programmer:
attr(z,"condition")$message
[1] "it.bankitalia.reri.sia.util.SdmxException: Connection failed. HTTP error code : 404, message: Not Found\nSDMX meaning: No results matching the query."
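The log lines above look like java.util.logging output, and that API does provide a Level.OFF that suppresses everything, including SEVERE. The sketch below shows the mechanism only. The logger name "example.sdmx" is hypothetical; the name the connector actually logs under is not stated here and would have to be checked in its configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietLogging {
    // LogManager holds loggers weakly; keep strong references so the level sticks.
    private static final List<Logger> SILENCED = new ArrayList<>();

    /** Turns off all messages for the named logger and for children without their own level. */
    static void silence(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.OFF);
        SILENCED.add(logger);
    }

    /** Reports whether a SEVERE record would still be published by the named logger. */
    static boolean wouldLogSevere(String loggerName) {
        return Logger.getLogger(loggerName).isLoggable(Level.SEVERE);
    }

    public static void main(String[] args) {
        String name = "example.sdmx"; // hypothetical logger name, an assumption
        silence(name);
        // A child logger inherits the OFF level, so this message is dropped.
        Logger.getLogger(name + ".client").severe("Connection failed");
        System.out.println(wouldLogSevere(name + ".client"));
    }
}
```

With the level at OFF the failure is still reported to the caller through the exception, which is what the R condition message above relies on.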
E.G.
getSDMX("ECB", "ILM.W.U2.C.A010.Z5.Z0Z")
It would be a lot more useful if there were a way to show the remaining dimensions of a DSD available in the sdmxHelp app once one dimension has been selected. For example, if I am looking at the ECB and a specific dataflow, and I limit it to quarterly FREQ, it would be nice to at least get a full codelist of the remaining dimensions available, but also to know which other dimensions become unavailable when quarterly frequency is selected. As it is now, there is no real way to determine whether you have constructed a valid dataflow code, as each dimension of a DSD is given separately. It's not a problem on most of the other Euro government websites because their web interface is usable, but the ECB's web interface for searching available codes is so clunky as to be unusable for serious researchers who need multiple series. BTW, thanks for this package. I've been trying to find some other way of getting code definitions by installing all the SDMX tools provided through sdmx.org, and the documentation is not very helpful.
Sometimes getSDMX() is not able to retrieve the requested data. For Eurostat this is usually the case in the following two scenarios:
There is no data available for the specified ID e.g.
getSDMX(provider = "EUROSTAT",id = "teilm310.A.NSA.JOBRATE.TOTAL.C.")
The amount of data is beyond a specific size limit e.g.
getSDMX(provider = "EUROSTAT",id = "nrg_110a.A.KTOE.2410..")
In both cases I get an error message that reads The query: XXX did not match any time series on the provider.
While this is OK for downloading a single dataset, it is a problem when I want to download a list of dataflow IDs using lapply(). Once an error is encountered, the lapply() breaks. Would it be possible to return an error code in these cases, which I could use in a condition to avoid breaking the lapply() code?
An example of this situation is illustrated in the following gist, in which some information is extracted for a list of dataflows:
https://gist.github.com/Tungurahua/a1aa7044a3a46c8b8eec
In both cases the EUROSTAT server returns an HTML message containing an error message. It would be ideal if this message could be returned in the R console. But as usual, I don't know if this behaviour can be generalized over different REST providers.
Best
Albrecht
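The general remedy for a batch that stops at the first failure is to catch the error per item, record it, and carry on. A minimal sketch of that pattern in Java, the connector's core language. The Fetcher interface and the flow IDs are hypothetical stand-ins for a real connector call, and the demo uses a stub so no network is involved:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TolerantBatch {
    /** Hypothetical stand-in for a connector call that fetches one query. */
    interface Fetcher<T> {
        T fetch(String id) throws Exception;
    }

    /** Results for the IDs that worked, error messages for those that failed. */
    static final class Outcome<T> {
        final Map<String, T> results = new LinkedHashMap<>();
        final Map<String, String> errors = new LinkedHashMap<>();
    }

    static <T> Outcome<T> fetchAll(List<String> ids, Fetcher<T> fetcher) {
        Outcome<T> outcome = new Outcome<>();
        for (String id : ids) {
            try {
                outcome.results.put(id, fetcher.fetch(id));
            } catch (Exception e) {
                // Record the failure and continue with the next ID.
                outcome.errors.put(id, String.valueOf(e.getMessage()));
            }
        }
        return outcome;
    }

    /** Runs the batch against a stub that fails for one made-up ID. */
    static String demo() {
        List<String> ids = new ArrayList<>();
        ids.add("flowA");
        ids.add("flowB");
        ids.add("flowC");
        Outcome<Integer> out = fetchAll(ids, id -> {
            if (id.equals("flowB")) {
                throw new IllegalStateException("did not match any time series");
            }
            return id.length();
        });
        return "ok=" + out.results.keySet() + " failed=" + out.errors.keySet();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Keeping the error message next to the failed ID also covers the wish to see the provider's message afterwards, provided the underlying call puts it into the exception.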
Investigate a way to open Swing applications from within R in OS X environments.
Is there a way to un-select a dimension? Once I have selected e.g. an annual frequency, the only way to clear the frequency dimension is to select another data flow. The expected behaviour would be that Ctrl + Click not only clears the selection field but also the selection in the REST-ID. This does not work if a single item is to be de-selected. This is especially cumbersome when working with WB indicators as they do not distinguish individual indicators by FlowID but by the SERIES dimension.
See example below:
Handle a new conf key for default proxy, useful with single proxy environments
Add a new function for importing SDMX data files from disk or from network URL. Useful for providers that do not provide web services but disseminate SDMX data.
For data providers with a large number of flows (e.g. Eurostat/ILO) it is almost impossible to find a specific flow as they are in random order. From a user perspective I could think of the following improvements:
However, I don't know how difficult each of them would be to implement.
Using the package in R, I think that getProviders() and getFlows() do a good job of identifying a dataflow or a group of dataflows. However, the selection of dimensions and codes is where a GUI really comes in helpful. Thus another idea would be to allow the helper to accept a combination of data provider / dataflow as input parameters. Something like sdmxHelper(provider="EUROSTAT", flow="nrg105") which would open the helper with this selection set.
I was testing your Java components, but it seems to me that the OECD client doesn't work. I tried this simple request:
try {
    GenericSDMXClient client = SDMXClientFactory.createClient("OECD");
    client.getDataflows();
} catch (SdmxException e) {
    e.printStackTrace();
}
and I got
III 18, 2015 10:39:39 DOP. it.bancaditalia.oss.sdmx.client.RestSdmxClient runQuery
INFO: Contacting web service with query: http://stats.oecd.org/restsdmx/sdmx.ashx//GetDataStructure/ALL
III 18, 2015 10:39:39 DOP. it.bancaditalia.oss.sdmx.client.RestSdmxClient runQuery
SEVERE: Connection failed. HTTP error code : 400, message: Bad Request
SDMX meaning: there is a problem with the syntax of the query
it.bancaditalia.oss.sdmx.util.SdmxException: Connection failed. HTTP error code : 400, message: Bad Request
SDMX meaning: there is a problem with the syntax of the query
at it.bancaditalia.oss.sdmx.client.RestSdmxClient.runQuery(RestSdmxClient.java:288)
at it.bancaditalia.oss.sdmx.client.custom.DotStat.getDataflows(DotStat.java:109)
at eu.keyup.kejml.sdmx.App.main(App.java:14)
works:
tts <- getSDMX('ABS', 'CPI.1.50.10001.10.Q')
names(tts)
[1] "CPI.1.50.10001.10.Q"
fails:
tts <- getSDMX('ABS', 'CPI.1.*.10001.10.Q')
Dec 12, 2014 2:10:41 PM it.bankitalia.reri.sia.sdmx.client.custom.RestSdmx20Client getTimeSeries
SEVERE: Exception caught parsing results from call to provider ABS
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
it.bankitalia.reri.sia.util.SdmxException: Exception. Class: javax.xml.stream.XMLStreamException .Message: ParseError at [row,col]:[1,63]
Message: White spaces are required between publicId and systemId.
getFlows('IMF')
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
it.bancaditalia.oss.sdmx.util.SdmxException: Connection failed. HTTP error code : 404, message: Not Found
SDMX meaning: No results matching the query.
It appears this web-services query from the SDMX Helper Tool no longer works:
http://sdmxws.imf.org/RestSDMX2/sdmx.ashx/GetKeyFamily/ALL
This link seems to suggest no RESTful way of obtaining this list.
http://sdmxws.imf.org/Gateway/Help.aspx
Any thoughts on how to get this? Thanks.
The function should allow the user to easily switch from a time series perspective to a dataframe. Metadata should be added only if required by a specific parameter
In the .onLoad function the path to the conf file is wrong. The 'inst' dir must be removed.
I was wondering if RJSDMX has the possibility of handling SDMX queries with an updatedAfter parameter? I haven't been able to find any worked examples.
Does it require that the publisher is already using version 2.1 of SDMX? The link below is a replica of section 7 of the official SDMX standards, which suggests that you should be able to run a query that just returns the deltas since the last query ... any new INSERTS, UPDATES and DELETES.
https://github.com/sdmx-twg/sdmx-rest/blob/master/v2_1/ws/rest/docs/4_4_data_queries.md
As a use case, take today's IMF global growth revisions: could a query to their REST service effectively return just those values that have been revised, if you used the last forecast date as the 'updatedAfter' parameter?
Thanks for your insight.
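For reference, in the SDMX 2.1 REST syntax updatedAfter is an ordinary query-string parameter on a data query, and its timestamp has to be percent-encoded. The sketch below only builds such a URL. It uses the ECB data query quoted elsewhere on this page as the base; the date is an arbitrary illustration, and whether a given provider honours the parameter is not something this example can confirm.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DeltaQuery {
    /**
     * Builds an SDMX 2.1 style data URL of the form
     * {endpoint}/data/{flow}/{key}?updatedAfter=...[&includeHistory=true]
     */
    static String dataUrl(String endpoint, String flow, String key,
                          String updatedAfter, boolean includeHistory) {
        StringBuilder url = new StringBuilder(endpoint)
                .append("/data/").append(flow).append('/').append(key)
                .append("?updatedAfter=")
                // ':' and '+' in the timestamp must be percent-encoded
                .append(URLEncoder.encode(updatedAfter, StandardCharsets.UTF_8));
        if (includeHistory) {
            url.append("&includeHistory=true");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Endpoint, flow and key are taken from the ECB example on this page;
        // the timestamp is made up for illustration.
        System.out.println(dataUrl("http://sdw-wsrest.ecb.europa.eu/service",
                "ICP", "M.U2.N.000000.4.ANR", "2015-01-20T00:00:00+01:00", false));
    }
}
```

A client would then parse the response as usual, with the difference that only observations changed since the given timestamp should be present.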
SDMX is designed to support multiple languages, but the current client retrieves only one (English, or the first one found).
Here is a link to an example: http://stat.nbb.be/restsdmx/sdmx.ashx/GetDataStructure/CAPSTOCK2010
This is a question related to my sidenote in #46. Have you developed an idea for the overall scope of the SDMX package and RSDMX package in general? Is it intended as a low-level entry point to SDMX data which other packages are supposed to build upon, or is the intention to create a one-stop SDMX data provider?
I am asking this as for the sources I work with (EUROSTAT) I think it would make sense to wrap the retrieved data (including dictionaries) into an object structure (S3, S4). I think this would have the following advantages:
ggplot2
While this might make sense for Eurostat data (which is mostly of annual resolution), it could be problematic for data with higher resolution like exchange rates, so I don't know if such an object structure should be tackled in RJSDMX or maybe rather in a downstream package tailored to a specific data provider (e.g. RJEUROSTAT).
If you have any thoughts on that matter I would be keen to hear your opinion. My overview of the SDMX process as well as R-programming is currently quite limited, so I would be happy to get an expert opinion.
Best
Albrecht
For now only the status attribute is processed.
E.G.
http://sdw-wsrest.ecb.europa.eu/service/data/ICP/M.U2.N.000000.4.ANR
Is the World Bank beta site SDMX/REST 2.1, so that addProvider() can be used?
I have not been able to get anything from ILO:
require("RJSDMX")
z <- getFlows('ILO')
Using central configuration: /home/paul/.SdmxClient
Dec 05, 2014 11:05:16 AM it.bankitalia.reri.sia.sdmx.client.RestSdmxClient runQuery
SEVERE: Connection failed. HTTP error code : 502, message: Bad Gateway
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
it.bankitalia.reri.sia.util.SdmxException: Connection failed. HTTP error code : 502, message: Bad Gateway
tts <- getSDMX("ILO", 'EAP_TEAP_SEX_AGE_NB.AUS...*')
Dec 05, 2014 11:20:31 AM it.bankitalia.reri.sia.sdmx.client.RestSdmxClient runQuery
SEVERE: Connection failed. HTTP error code : 404, message: Not Found
SDMX meaning: No results matching the query.
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
it.bankitalia.reri.sia.util.SdmxException: Connection failed. HTTP error code : 404, message: Not Found
SDMX meaning: No results matching the query.
The user should be able to filter dataflow names and descriptions by a text pattern when browsing a provider's content.
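One simple form such a filter could take is a case-insensitive substring match against both the flow ID and its description. The sketch below is only an illustration of that idea, not the helper's actual code. The flow IDs are borrowed from other reports on this page, and the descriptions are made up.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class DataflowFilter {
    /** True if the pattern occurs in the ID or the description, ignoring case. */
    static boolean matches(String id, String description, String pattern) {
        String needle = pattern.toLowerCase(Locale.ROOT);
        return id.toLowerCase(Locale.ROOT).contains(needle)
                || description.toLowerCase(Locale.ROOT).contains(needle);
    }

    /** Keeps only the flows (ID mapped to description) that match the pattern. */
    static Map<String, String> filter(Map<String, String> flows, String pattern) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> flow : flows.entrySet()) {
            if (matches(flow.getKey(), flow.getValue(), pattern)) {
                kept.put(flow.getKey(), flow.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> flows = new LinkedHashMap<>();
        flows.put("nrg_110a", "energy balance (made-up description)");
        flows.put("teilm310", "job vacancies (made-up description)");
        System.out.println(filter(flows, "ENERGY").keySet());
    }
}
```

A regular expression could replace the substring test if more expressive patterns are wanted.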
In the case of no series being returned I think rather than returning an empty list it may be better to throw an error, "series ___ not found" or "no data available for series ___", if those can be distinguished. e.g.
z <- getSDMX('IMF', 'PGI.CA.BIS.FOSLB.A.L_M')
str(z)
Named list()
length(z)
[1] 0
If you switch from a functioning dataflow1 to an erroneous dataflow2, the "Select Dimensions" box retains the dimensions of dataflow1. The field should be empty instead.
To illustrate the behaviour, select Eurostat as provider, sort Code ID alphabetically and change the code ID from sts_trtugr_q to t2020_10. As you can see in the following screenshots, the Code at the beginning of the REST-ID stays unchanged, as does the content of the "Select Dimensions" box.
Note that t2020_10 returns a server 404 error, which seems to be due to an error at Eurostat that relates to all t2020 indicators at the moment. The behaviour will not be reproducible once the server is reachable again.
Graying out the Dimensions and Code selection boxes if the server is unreachable would be a great feature.
I could be wrong here, but I think the right behaviour for the Edit | Copy menu should always be to copy the current REST-ID to the clipboard. Right now it returns the ID only if it has been selected before.
sdmxHelp() does not work under OS X. When I run the command I get the following error message:
> sdmxHelp()
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.awt.HeadlessException
If I understand it correctly, R tries to run RJSDMX/java/SDMX.jar. I found the .jar but couldn't run it from the terminal:
> RJSDMX/java$ java -jar SDMX.jar
no main manifest attribute, in SDMX.jar
I am stabbing in the dark here, so it is quite possible that I am completely on the wrong path. Can you provide any hints on how to get sdmxHelp() to work?
With best regards
Albrecht
Agency is tied to artifacts, not provider. The model has to be refined.
This seems to be a pretty obscure issue. I've failed to resolve it through Google and following similar instances of the issue for others. But any idea why I might be getting the below error message when trying to build the Java library?
c:\E\New\MATLAB\work\SDMX>ant SDMX
Buildfile: c:\E\New\MATLAB\work\SDMX\build.xml
BUILD FAILED
c:\E\New\MATLAB\work\SDMX\build.xml:6: Unexpected element "{}html" {antlib:org.apache.tools.ant}html
Total time: 0 seconds
There is an error in this:
tts0 <- getSDMX('IMF', 'PGI.CA....')
[1] "Number of observations and time slots equal to zero,
or not matching: 0 0"
Error in FUN(X[[534L]], ...) : object 'tts' not found
This is happening in the call
result = lapply(X = rList, FUN = convertSingleTS)
rList[[534L]] seems to have no data so (I'm guessing) in convertSingleTS it prints the message "Number of observations..." and does not assign tts, but then tries to return tts.
I think you may want an empty series here, so everything else retrieved is not discarded. If not, this really should be an error message not a print statement.
@amattioc As per your suggestion in #25, this is a request for a dedicated function that derives the relevant metadata and data from an SDMX query specifying the 'updatedAfter' and 'includeHistory' parameters. In #25 I highlighted the absence of any contextual metadata being returned from a query that includes these parameters. The use case is mainly a system that looks for the latest changes applied to a dataset since a previous query (whose date is set as the updatedAfter base date).
There is some relevant discussion on the implementation of these here: sdmx-twg/sdmx-rest#17
In conjunction with the data values, a user might specify what other meta-data pertaining to those values should be returned. Perhaps as a default, it would include the following:
'action', 'validFromDate', 'OBS_STATUS', 'OBS_CONF' and 'OBS_COM'.
Perhaps others have additional ideas.
Thanks. Colum
sdmxHelp() no longer starts under Windows.
> sdmxHelp()
Warning message:
running command 'java -classpath C:/Program Files/R/R-3.1.2/library/RJSDMX/java/SDMX.jar
it.bancaditalia.oss.sdmx.helper.SDMXHelper' had status 127
I could open last week's versions without problems.
Also, how can I start sdmxHelp2() under Windows? Setting a system command does not work here, as java is not recognized as a command in the shell.
Any hints would be appreciated.
Best regards
Albrecht
As in SAS, the new table class can be a generic container, better than the timeseries class
It would be nice if a query with + or | would return results in the order of the query specification. (Enhancement request)
In the helper tool the topmost table reads Code ID and Code Description. I would suggest changing this to Flow ID and Flow Description to have consistent wording (-> Codes are the variable levels of the dimensions).
I am trying to add FAO SDMX as a new provider using addProvider(). However, I am unsure what an "endpoint" is in this context. My first attempt has been:
# no Error received
addProvider("FAO","http://data.fao.org/sdmx")
# lists FAO as new source
getProviders()
# throws error
getFlows("FAO")
I suppose the url "http://data.fao.org/sdmx" is wrong. I tried to view the other Sources, but I understand that they are somewhere in the java code (?). Is there some resource that explains in more detail what I have to look out for when entering a new "endpoint"? Any hints would be appreciated.
The rest client for ABS seems broken.
The following query doesn't work anymore:
http://stat.abs.gov.au/restsdmx/sdmx.ashx/GetDataStructure/ALL/ABS
tts <- getSDMX("ILO", "DF_YI_ALL_EMP_TEMP_SEX_AGE_NB/YI.MEX.A.463.EMP_TEMP_NB.SEX_F.AGE_10YRBANDS_TOTAL", start="1995", end="2012")
2012 specified but:
end(tts[[1]])
[1] 2011
This does not happen on OECD:
tts <- getSDMX('OECD', '7HA_A_Q.CAN.AF411LI.ST.C.A', start="2001", end="2009")
end(tts[[1]])
[1] 2009
tts <- getSDMX("INEGI", 'DF_COMTRADE.Q.MX.TOTAL.CAN...USD.Z.CAN.*')
returns an empty list, as does everything else I have tried with fields replaced by *, up to the following query, which is very slow and fails:
tts <- getSDMX("INEGI", 'DF_COMTRADE..........')
Dec 04, 2014 8:12:40 PM it.bankitalia.reri.sia.sdmx.client.RestSdmxClient runQuery
SEVERE: Exception caught calling provider INEGI
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
it.bankitalia.reri.sia.util.SdmxException: Exception. Class: java.net.SocketException .Message: Connection reset
I have not managed to get any data from INEGI, do you have an example that works?
E.G:
getSDMX('OECD', 'G20_PRICES.CAN.CPALTT01.IXOB.M')
(OECD sets monthly dates like this: 2001M1)
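A client that wants uniform dates has to normalise period labels of this kind. The following is a minimal sketch of one way to do that in Java, assuming the only variants are the OECD-style "2001M1" form (optionally written "2001-M1") and the ISO "2001-01" form. The class name is hypothetical and this is not taken from the connector's own parser.

```java
import java.time.YearMonth;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MonthlyPeriodParser {
    // Four-digit year, optional hyphen, 'M', then a one- or two-digit month.
    private static final Pattern M_STYLE = Pattern.compile("(\\d{4})-?M(\\d{1,2})");

    /** Parses "2001M1", "2001-M01" or ISO "2001-01" into a YearMonth. */
    static YearMonth parse(String period) {
        Matcher m = M_STYLE.matcher(period);
        if (m.matches()) {
            return YearMonth.of(Integer.parseInt(m.group(1)),
                                Integer.parseInt(m.group(2)));
        }
        // Fall back to the ISO yyyy-MM form; throws DateTimeParseException otherwise.
        return YearMonth.parse(period);
    }

    public static void main(String[] args) {
        System.out.println(parse("2001M1")); // prints 2001-01
    }
}
```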