datacommonsorg / docsite
License: Apache License 2.0
Move /info down (and /observations up) -- mostly ordering these in terms of expected usage.
Getting Started Guide
Triples
Properties
Property Values
Property Values (linked)
Single Observation
Single Observation (linked)
Series of Observations
Series of Observations (linked)
Place Info
Variable Info
Variable Group Info
Variables
Currently there's no way to get the parent location or area for a given place, although you can find the places within a place.
The actual problem I faced was using the CDC 500 City list and trying to find the state that each city is in.
Here's my rather hacky workaround for doing it from the other direction, which involves finding another list of places and merging:
import datacommons as dc
import pandas as pd

# Get the 500 cities in the CDC500 list
cdc500_dcids = pd.DataFrame(
    dc.get_property_values(["CDC500_City"], "member", limit=500)["CDC500_City"]
).rename(columns={0: "DCID"})
# Get the list of 50 US states by dcid
states = dc.get_property_values(["PlacePagesComparisonStateCohort"], "member")["PlacePagesComparisonStateCohort"]
# Get all the cities in those states by dcid (returns {state_dcid: [city_dcid, ...]})
cities = dc.get_places_in(states, "City")
# Convert this mapping into a dataframe with one (state, city) row per city
cities_list = [(key, x) for key, val in cities.items() for x in val]
cities_df = pd.DataFrame(cities_list, columns=["State_DCID", "City_DCID"])
# Merge with our original data so each CDC500 city carries its state
cities_cdc500 = pd.merge(cdc500_dcids, cities_df, left_on="DCID", right_on="City_DCID")
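The core of the workaround is inverting a parent-to-children mapping into a child-to-parent lookup. A minimal sketch of that step in plain Python, assuming the places-in result is a dict from each state DCID to a list of city DCIDs (the shape the list comprehension above relies on). The helper name `invert_places_in` and the sample data are illustrative only and are not part of the Data Commons API:

```python
def invert_places_in(places_in):
    """Map each child DCID to its parent DCID.

    `places_in` is assumed to look like {parent_dcid: [child_dcid, ...]}.
    If a child is listed under several parents, the last one seen wins.
    """
    parent_of = {}
    for parent, children in places_in.items():
        for child in children:
            parent_of[child] = parent
    return parent_of


# Hypothetical sample shaped like a places-in result (not real API output).
sample = {
    "geoId/06": ["geoId/0649670", "geoId/0644000"],
    "geoId/36": ["geoId/3651000"],
}
city_to_state = invert_places_in(sample)
print(city_to_state["geoId/0649670"])
```

A lookup like this avoids the merge when only the parent DCID is needed, though it still requires fetching every city in every state first.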
http://docs.datacommons.org/api/python/property_label.html
should probably replace with "foo" and "bar"
https://docs.datacommons.org/contributing/background/mcf_format.html last section has a broken img
many are "get_property_labels"
Make notes look like this one:
https://docs.datacommons.org/api/sheets/get_variable.html
This endpoint allows a user to get the value associated with a statistical variable for a set of child places of a certain type, for a given date.
Endpoint: /stat/collection
Parameters:
Available as GET
Note, a direct parent/child place relationship is required here as well, i.e. country/USA -- State works, but country/USA -- County would not:
https://api.datacommons.org/stat/collection?parent_place=geoId/06&child_type=County&date=2013&stat_vars=Count_Person
https://api.datacommons.org/stat/collection?parent_place=country/USA&child_type=County&date=2013&stat_vars=Count_Person
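For documentation examples, the two request URLs above could be generated rather than typed by hand. The sketch below only builds the URL string and uses the parameter names visible in the examples (`parent_place`, `child_type`, `date`, `stat_vars`); the helper name is hypothetical, and how multiple variables are passed is not stated in the issue, so a single variable is assumed:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.datacommons.org/stat/collection"


def stat_collection_url(parent_place, child_type, date, stat_var):
    """Build a GET URL for /stat/collection (single variable assumed)."""
    params = {
        "parent_place": parent_place,
        "child_type": child_type,
        "date": date,
        "stat_vars": stat_var,
    }
    # safe="/" keeps DCIDs such as geoId/06 readable instead of geoId%2F06.
    return f"{BASE_URL}?{urlencode(params, safe='/')}"


# Direct parent/child pair (state -> county), matching the first URL above.
direct_url = stat_collection_url("geoId/06", "County", "2013", "Count_Person")
print(direct_url)
```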
The docsite https://docs.datacommons.org/contributing/contributing_to_datacommons.html probably should be tailored to more practical how-to's rather than the standard Google https://github.com/datacommonsorg/docsite/blob/master/CONTRIBUTING.md
https://juliawu.github.io/datacommons-docsite/api/rest/v1/variables
Can we include an example for a place (of a different type) with much more data, e.g. geoId/06, India, USA? We can use ellipses, but if users try it out, they'll see how much data we have (and they will most likely just copy/paste the example).
Seems it accepts both list and series. get_property_values mentions series.
Some of our docs can be combined and reused-- a separate section:
Background
API
Contributing
E.g. we have https://docs.datacommons.org/data_model.html AND a link to schema.org data model at https://docs.datacommons.org/contributing/background/background.html
StatVars are important to users too: https://docs.datacommons.org/api/python/stat_value.html, so information about StatVars (currently only here) will be relevant.
For this issue, feel free to take large freedoms. As Guha put it, the current Representing Statistics documentation is not written with sensitivity to those who have not been in the space for years.
https://juliawu.github.io/datacommons-docsite/api/rest/v1/bulk/observations/series
At the moment, judging from the examples and request format at the top of the page, it looks as if exactly 2 entities & variables are required. Can we make it clearer that it's 1 or more? Perhaps with ellipses, and an example that uses a different number of arguments.
Will reflect this in R's PR, then subsequent PR to fix for Python/REST
e.g.:
"Given a list of Place DCID’s, return the DCID of StatisticalPopulation’s for these places, constrained by the given property values."
Probably a copy pasta
It gets tedious to keep linking to browser nodes. We should add custom markdown to automatically create these links.
A starting point: https://jekyllrb.com/docs/configuration/markdown/
This endpoint allows a user to get the value for a set of statistical variables, for a single date, across a set of places.
Endpoint: /stat/set (code)
Params:
Available as POST
Example: curl -X POST https://api.datacommons.org/stat/set -d '{ "places": ["geoId/06", "geoId/0649670", "country/FRA", "country/USA"], "stat_vars": ["Count_Person", "Count_CriminalActivities_CombinedCrime"], "date": "2017"}'
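A Python counterpart to the curl example might look like the sketch below. It only prepares the request with the standard library and does not send it. The field names (`places`, `stat_vars`, `date`) are taken from the curl body above; the helper name is hypothetical, and no Content-Type header is set, mirroring the curl command:

```python
import json
import urllib.request

STAT_SET_URL = "https://api.datacommons.org/stat/set"


def build_stat_set_request(places, stat_vars, date):
    """Prepare (but do not send) a POST request for /stat/set."""
    body = json.dumps(
        {"places": list(places), "stat_vars": list(stat_vars), "date": date}
    )
    return urllib.request.Request(
        STAT_SET_URL, data=body.encode("utf-8"), method="POST"
    )


request = build_stat_set_request(
    ["geoId/06", "geoId/0649670", "country/FRA", "country/USA"],
    ["Count_Person", "Count_CriminalActivities_CombinedCrime"],
    "2017",
)
# To actually call the API: urllib.request.urlopen(request).read()
print(request.get_method(), request.full_url)
```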
https://juliawu.github.io/datacommons-docsite/api/rest/v1/observations/series
Main byline says: Retrieve series of observations from a specific variable for an entity from the preferred facet.
but it's unclear what "preferred facet" means.
It also seems as if we should allow users to select a facet? /cc @shifucun
https://docs.datacommons.org/contributing/#add-data points to https://docs.datacommons.org/courseware.html instead of https://docs.datacommons.org/courseware/
Getting a 404 error when clicking on link for 'the courseware page'
https://juliawu.github.io/datacommons-docsite/api/rest/v1/info/place
Get information on a single place (or city)
Most SPARQL tutorials "out there" are rather confusing. It would be nice to have something more in depth attached to our docs.
Here is an illustration from this drawing.
When searching for specific properties like "scalingFactor" or "measurementMethod", it would be useful to have an endpoint that returns a list of entities possessing these specific properties.
This endpoint allows a user to explore statistical variables which are available for a set of places. Available as both GET and POST
Endpoint: /place/stat-vars (code)
Parameters: dcids (list)
Request: https://api.datacommons.org/place/stat-vars?dcids=country/USA
Response: {"places":{"country/USA":{"statVars":["dc/zl73qp466bs28","dc/z6jy58c28k7zh"]}}}
Example: https://api.datacommons.org/place/stat-vars?dcids=geoId/06&dcids=zip/94025
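As a sketch of how a client could use this endpoint, the code below builds a GET URL with a repeated `dcids` parameter and reads the variable list out of a response shaped like the one shown above. The helper names are hypothetical, and the response shape is assumed from that single sample:

```python
import json
from urllib.parse import urlencode


def place_stat_vars_url(dcids):
    """Build a GET URL for /place/stat-vars with one dcids= entry per place."""
    # doseq=True expands the list into repeated dcids=... parameters.
    query = urlencode({"dcids": list(dcids)}, doseq=True, safe="/")
    return f"https://api.datacommons.org/place/stat-vars?{query}"


def stat_vars_by_place(response_text):
    """Return {place_dcid: [stat_var_dcid, ...]} from a response body."""
    places = json.loads(response_text)["places"]
    return {dcid: info.get("statVars", []) for dcid, info in places.items()}


url = place_stat_vars_url(["geoId/06", "zip/94025"])

# Sample response copied from the description above.
sample_response = (
    '{"places":{"country/USA":{"statVars":'
    '["dc/zl73qp466bs28","dc/z6jy58c28k7zh"]}}}'
)
parsed = stat_vars_by_place(sample_response)
print(url)
print(parsed)
```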
When running the command =DCPLACESIN(A1, "Country") (where A1 = asia as a DCID) and then =DCGETNAME(B1) on the output, the list of names output includes Egypt but not Russia. This strikes me as a little strange.
This endpoint allows a user to retrieve dates with data available for each statistical variable specified. The set of places to query is specified by an ancestor place, and the place type of the child places to consider (similar to our places-in API). This is helpful for building an interactive app to explore our data, e.g. https://staging.datacommons.org/tools/scatter2
Endpoint: /place/stat/date/within-place (code)
Params:
Available as both GET and POST
Request: curl -X POST https://api.datacommons.org/place/stat/date/within-place -d '{ "ancestor_place": "geoId/06", "place_type": "City", "stat_vars": ["Count_Person"]}'
Response: {"Count_Person": ["2017", "2018", ...]}
or
/cc: @shifucun
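For an interactive app like the one mentioned, a client would typically serialize the request body and then pick a date from the response. The sketch below assumes the field names from the request above and a response that maps each variable to a list of date strings in one consistent format; the helper names and the sample response are illustrative only:

```python
import json


def within_place_body(ancestor_place, place_type, stat_vars):
    """Serialize the POST body; json.dumps guarantees valid JSON quoting."""
    return json.dumps(
        {
            "ancestor_place": ancestor_place,
            "place_type": place_type,
            "stat_vars": list(stat_vars),
        }
    )


def latest_dates(response):
    """Return the most recent date per variable.

    Assumes dates for a variable share one ISO-like format (e.g. all "YYYY"),
    so plain string comparison orders them chronologically.
    """
    return {var: max(dates) for var, dates in response.items() if dates}


body = within_place_body("geoId/06", "City", ["Count_Person"])

# Hypothetical response with the elided dates left out.
sample = {"Count_Person": ["2017", "2018"]}
latest = latest_dates(sample)
print(body)
print(latest)
```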
https://docs.datacommons.org/contributing/background/glossary.html
E.g. under Statistical variable, I see some interesting line breakages with indents
https://browser.datacommons.org/browser/country/TZA
The alternateName for Tanzania is a Tanzanian flag emoji.
The whole section of https://docs.datacommons.org/api/ is up for a refinement pass. Feel free to create new, smaller-scoped bugs.
When I run the Python code datacommons.get_triples(['dc/c3j78rpyssdmf','dc/7hfhd2ek8ppd2'],limit=2), the result limits me to three triples for the endpoint, rather than two as I would expect:
{'dc/c3j78rpyssdmf': [('dc/zn6l0flenf3m6', 'biosampleOntology', 'dc/c3j78rpyssdmf'), ('dc/tkcknpfwxfrhf', 'biosampleOntology', 'dc/c3j78rpyssdmf'), ('dc/c3j78rpyssdmf', 'provenance', 'dc/h2lkz1')], 'dc/7hfhd2ek8ppd2': [('dc/7hfhd2ek8ppd2', 'provenance', 'dc/h2lkz1'), ('dc/4mjs95b1meh1h', 'biosampleOntology', 'dc/7hfhd2ek8ppd2'), ('dc/13xcyzcr819cb', 'biosampleOntology', 'dc/7hfhd2ek8ppd2')]}
Likewise for limit=1:
>>> datacommons.get_triples(['dc/c3j78rpyssdmf','dc/7hfhd2ek8ppd2'],limit=1)
{'dc/c3j78rpyssdmf': [('dc/c3j78rpyssdmf', 'provenance', 'dc/h2lkz1'), ('dc/zn6l0flenf3m6', 'biosampleOntology', 'dc/c3j78rpyssdmf')], 'dc/7hfhd2ek8ppd2': [('dc/7hfhd2ek8ppd2', 'provenance', 'dc/h2lkz1'), ('dc/4mjs95b1meh1h', 'biosampleOntology', 'dc/7hfhd2ek8ppd2')]}
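Until the limit semantics are documented or fixed, a caller who needs a hard cap could truncate the result on the client side. This is a workaround sketch, not an explanation of what `limit` does inside the library; the helper name `cap_triples` is hypothetical, and the sample is the limit=2 output quoted above:

```python
def cap_triples(triples_by_dcid, limit):
    """Keep at most `limit` triples per DCID, preserving the returned order."""
    return {dcid: triples[:limit] for dcid, triples in triples_by_dcid.items()}


# Output reported above for limit=2 (three triples per DCID).
reported = {
    "dc/c3j78rpyssdmf": [
        ("dc/zn6l0flenf3m6", "biosampleOntology", "dc/c3j78rpyssdmf"),
        ("dc/tkcknpfwxfrhf", "biosampleOntology", "dc/c3j78rpyssdmf"),
        ("dc/c3j78rpyssdmf", "provenance", "dc/h2lkz1"),
    ],
    "dc/7hfhd2ek8ppd2": [
        ("dc/7hfhd2ek8ppd2", "provenance", "dc/h2lkz1"),
        ("dc/4mjs95b1meh1h", "biosampleOntology", "dc/7hfhd2ek8ppd2"),
        ("dc/13xcyzcr819cb", "biosampleOntology", "dc/7hfhd2ek8ppd2"),
    ],
}
capped = cap_triples(reported, 2)
print({dcid: len(triples) for dcid, triples in capped.items()})
```

Truncating this way discards whichever triples come last, so it is only appropriate when any `limit` triples will do.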
https://juliawu.github.io/datacommons-docsite/api/rest/v1/observations/point
The information is a little hidden that the latest observation is returned if a date isn't specified, especially since the main description says "Retrieve a specific observation at a set date from a variable for an entity." There should be a quick follow there that the latest is returned (otherwise you have to hunt for it in the query params). Users could be quickly scanning through APIs, and might miss this.
As discussed in the chat, the tabs plug-in might benefit from being moved in-house.
Since we have published links to our v0 API (which could be bookmarked, etc), please maintain the old link structure, or add redirects.
https://docs.datacommons.org/api/rest/place_in.html -- content
https://juliawu.github.io/datacommons-docsite/api/rest/place_in.html -- 404
Suggested fix: move old files back to the existing, prod, structure (so first link above continues to work)
As a follow on when we are ready to deprecate v0, we can move to a v0 subfolder with redirects & pointers to the new versions of the API.
We should make sure https://docs.datacommons.org/statistical_variables.html, etc. have the nav bar items.
Especially since it's important, can we increase the size to at least match the rest of the body text?
css for the icon:
background: var(--dc-red-lite);
color: white;
padding-right: 0;
margin-right: 0.5em;
and removed font-size: 0.8rem on the containing div
This applies to all divs with class alert (e.g. API key).
From the site-wide Documentation toolbar -
Line 53 in 02d6a46
When running the cURL command curl --request GET \ --url 'https://api.datacommons.org/stat/value?place=country%2FGMB&stat_var=Amount_EconomicActivity_ExpenditureActivity_EducationExpenditure_Government_AsFractionOf_Amount_EconomicActivity_GrossDomesticProduction_Nominal&scalingFactor=100.0000000000', if I try to shorten the scalingFactor to 100.0, I get a response with 404 status code back telling me that no stat data has been found.
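Given the behavior described, a client has to send the scaling factor with its exact text rather than as a parsed number. The sketch below builds the same URL as the cURL command and keeps `scalingFactor` as a string so that no float formatting rewrites it; the helper name is hypothetical and only the parameters visible in the command are used:

```python
from urllib.parse import urlencode

STAT_VALUE_URL = "https://api.datacommons.org/stat/value"


def stat_value_url(place, stat_var, scaling_factor):
    """Build a GET URL for /stat/value.

    `scaling_factor` is passed as a string on purpose: converting it through
    float would turn "100.0000000000" into "100.0", the form reported to fail.
    """
    params = {
        "place": place,
        "stat_var": stat_var,
        "scalingFactor": scaling_factor,
    }
    return f"{STAT_VALUE_URL}?{urlencode(params)}"


url = stat_value_url(
    "country/GMB",
    "Amount_EconomicActivity_ExpenditureActivity_EducationExpenditure_Government"
    "_AsFractionOf_Amount_EconomicActivity_GrossDomesticProduction_Nominal",
    "100.0000000000",
)
print(url)
```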
The markdown files that are used to build the documentation are at the top-level directory of this repository. This is an issue because non-documentation .md files also get built into production.
For example, the README of this repository is served at this link, which seems unintentional. LICENSE is also affected.
Also, since the source files are mixed with repository files in the top-level directory, it is hard to scan the contents and look for the file one is interested in!
Normally, pages that are not intended to be deployed are added to _config.yml under exclude (see #140).
However, this process is prone to human error (as evidenced by the README and LICENSE being exposed). The exclude list requires manual upkeep to make sure it stays up to date.
I'd like to move all source files to their own dedicated directory, perhaps called src or source. Then, the Jekyll configuration can be changed in one line to look at that directory for files to build. This Jekyll option is described in the official documentation.
This is a straightforward code change, and therefore it should be suitable for me as a first-time contributor.
This endpoint allows a user to explore the union of statistical variables which are available for a set of places. Only available as POST
Endpoint: /place/stat-vars/union (code)
Parameters:
Want to match the docs we've been creating on the docsite.
index.html only shows a "Welcome to Data Commons" title
Instead, it should explain a little about Data Commons and also show the ToC.
Update https://docs.datacommons.org/api/ with new get_stat_* related functions.
When I used the DCID getter tool documented in https://docs.datacommons.org/api/sheets/get_dcid.html to get the DCID for the United States, it returned the value geoId/72127. When I tried to use some functions in the Sheets API on this DCID, they consistently returned errors. Only when I manually replaced geoId/72127 with country/USA was I able to return a list of state DCIDs from the endpoint.