usgcrp / gcis

Global Change Information System

Home Page: https://data.globalchange.gov

License: Other

Perl 72.40% Shell 0.17% PLpgSQL 4.37% JavaScript 7.65% CSS 15.23% Dockerfile 0.19%
perl mojolicious rose-db climate-science government-data provenance semantic-web

gcis's People

Contributors

adthrasher, amcqueen12, amruelama, bduggan, jimbiard, jimbiardcics, justgo129, lomky, pymonger, rewolfe, rsindlin, tsizer, zacharylandes, zednis

gcis's Issues

Dereferenceable URIs

From a comment on the METHOD paper review:

Also, although the URIs are resolvable, they are not dereferenceable. For instance: curl -sH "Accept: text/turtle" -L http://data.globalchange.gov/report/nca3/chapter/our-changing-climate will return HTML. The same happens for the ontology; if I try to import it into Protege (http://data.globalchange.gov/gcis.owl), it will fail. These are small issues that hinder the reusability of the resources described in the paper.

GCIS seems to use "application/x-turtle" for the turtle type, but even when I try that with the Accept header, I still get text/html back:

$ curl -svIH "Accept: application/x-turtle" http://data.globalchange.gov/report/nca3/chapter/our-changing-climate > /dev/null
* About to connect() to data.globalchange.gov port 80 (#0)
*   Trying 128.117.225.225...
* connected
* Connected to data.globalchange.gov (128.117.225.225) port 80 (#0)
> HEAD /report/nca3/chapter/our-changing-climate HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8} zlib/1.2.5
> Host: data.globalchange.gov
> Accept: appication/x-turtle
> 
< HTTP/1.1 200 OK
< Server: nginx/1.6.2
< Date: Mon, 17 Aug 2015 18:39:53 GMT
< Content-Type: text/html;charset=UTF-8
< Content-Length: 13881
< Connection: keep-alive
< X-API-Version: 1.34
< Access-Control-Allow-Origin: *
< 
* Connection #0 to host data.globalchange.gov left intact
* Closing connection #0
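
To make this check repeatable, here is a minimal Python sketch of the same test. It only builds the HEAD request and classifies a Content-Type value; the helper names and the set of accepted media types are my own assumptions, not anything taken from the GCIS code.

```python
from urllib.request import Request

# Media types treated as Turtle here (an assumption for this sketch):
# the registered type plus the older one GCIS appears to use.
TURTLE_TYPES = {"text/turtle", "application/x-turtle"}


def turtle_head_request(url: str) -> Request:
    """Build (but do not send) a HEAD request that asks for Turtle."""
    accept = "text/turtle, application/x-turtle;q=0.9"
    return Request(url, headers={"Accept": accept}, method="HEAD")


def is_turtle(content_type: str) -> bool:
    """True when a Content-Type header value names a Turtle media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in TURTLE_TYPES
```

Sending the request with urllib.request.urlopen and passing the response's Content-Type header to is_turtle would flag the text/html answer shown above.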

Usage of place:Country

lib/Tuba/files/templates/organization/object.ttl.tut uses the place:Country class as a property.

% if (my $country = $organization->country) {
   place:Country "<%= $country->name %>"^^xsd:string;
% }

We should update this so place:Country is used as a class. We will need to determine the best property to use to relate the organization to a country.

I didn't see any properties in the GCIS ontology or Dublin Core that seemed like the right fit.

Likewise, the W3C Organization Ontology did not have anything that looked promising. (There is place:in, but that is intended to relate one place to another place.)

DBpedia has a country property that looks good, but it doesn't look like we are using DBpedia anywhere else at present.

http://dbpedia.org/ontology/country
comment: The country where the thing is located.

% if (my $country = $organization->country) {
   dbpedia:country [ 
      a place:Country ;
      rdfs:label "<%= $country->name %>"^^xsd:string;
   ] ;
% }

I think from my (very cursory) search that the dbpedia:country property is the best candidate.

Extend Merging Autocomplete to show Organizations and Journals

Regarding the fields which facilitate merging (i.e., the white box to the right of "delete") within the page for each org and journal: they are only populated with people's names, which doesn't help when trying to merge orgs or journals. This bug should be fixed.

Relating org names in searches

Allow one to search for alternative org names within the “organization” box (not the general search box).

Example: if one searches for "NESDIS" within the "organization" box in the UI, nothing comes up in autocomplete. Ideally, typing “NESDIS” would pull up “National Oceanic and Atmospheric Administration National Environmental Satellite, Data, and Information Service." Such functionality would be really good to have.

I'm not sure if this would be a logical outgrowth of #291 and therefore will not need an additional ticket.

spatial and temporal extent descriptions

We should update the dataset template to use dcterms:spatial and dcterms:temporal.

lib/Tuba/files/templates/dataset/object.ttl.tut

## Projection and resolution:
   gcis:spatialExtents "<%= $dataset->spatial_extent %>";
   dwc:geodeticDatum "<%= $dataset->spatial_ref_sys %>";
   gcis:spatialResolution "<%= $dataset->spatial_res %>"^^xsd:string;
   gcis:TemporalExtents "<%= $dataset->temporal_extent %>";
   dcterms:verticalExtents "<%= $dataset->vertical_extent %>";

The GCIS ontology has several properties that can be used to describe spatial or temporal extents.

Can we use the values from <%= $dataset->spatial_extent %> and <%= $dataset->temporal_extent %> to extract the information for these properties?

Also, are these the properties we want to use to describe the spatial and temporal extents?

gcis:extentTypeCode a owl:DatatypeProperty ;
    rdfs:label "Extent Type Code" ;
    rdfs:comment "The extent type code of a spatial extent." ;
    rdfs:domain gcis:SpatialExtents ;
    rdfs:range xsd:string .

gcis:westBoundLongitude a owl:DatatypeProperty ;
    rdfs:label "West Bound Longitude" ;
    rdfs:comment "The value of west bound longitude." ;
    rdfs:domain gcis:SpatialExtents ;
    rdfs:range xsd:float .

gcis:eastBoundLongitude a owl:DatatypeProperty ;
    rdfs:label "East Bound Longitude" ;
    rdfs:comment "The value of east bound longitude." ;
    rdfs:domain gcis:SpatialExtents ;
    rdfs:range xsd:float .

gcis:southBoundLatitude a owl:DatatypeProperty ;
    rdfs:label "South Bound Latitude" ;
    rdfs:comment "the value of south bound latitude." ;
    rdfs:domain gcis:SpatialExtents ;
    rdfs:range xsd:float .

gcis:northBoundLatitude a owl:DatatypeProperty ;
    rdfs:label "North Bound Latitude" ;
    rdfs:comment "The value of north bound latitude." ;
    rdfs:domain gcis:SpatialExtents ;
    rdfs:range xsd:float .

gcis:startedAt a owl:DatatypeProperty ;
    rdfs:label "Started At" ;
    rdfs:comment "The start date/time of a temporal extent." ;
    rdfs:domain gcis:TemporalExtents ;
    rdfs:range xsd:dateTime .

gcis:endedAt a owl:DatatypeProperty ;
    rdfs:label "Ended At" ;
    rdfs:comment "The end date/time of a temporal extent." ;
    rdfs:domain gcis:TemporalExtents ;
    rdfs:range xsd:dateTime .
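
As a rough illustration of what the dataset template could emit if we go this way, here is a Python sketch that renders a bounding box with the properties above, attached through dcterms:spatial. It assumes the four bounds have already been parsed out of the spatial_extent value (I don't know that field's format), and BoundingBox and spatial_extent_turtle are made-up names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical container for bounds already parsed from spatial_extent."""
    west: float
    east: float
    south: float
    north: float


def spatial_extent_turtle(box: BoundingBox) -> str:
    """Render the box as a dcterms:spatial blank node typed gcis:SpatialExtents."""
    if not (-180 <= box.west <= 180 and -180 <= box.east <= 180):
        raise ValueError("longitude out of range")
    if not (-90 <= box.south <= box.north <= 90):
        raise ValueError("latitude out of range")
    lines = [
        "   dcterms:spatial [",
        "      a gcis:SpatialExtents ;",
        f'      gcis:westBoundLongitude "{box.west}"^^xsd:float ;',
        f'      gcis:eastBoundLongitude "{box.east}"^^xsd:float ;',
        f'      gcis:southBoundLatitude "{box.south}"^^xsd:float ;',
        f'      gcis:northBoundLatitude "{box.north}"^^xsd:float ;',
        "   ] ;",
    ]
    return "\n".join(lines)
```

West is allowed to be greater than east so that boxes crossing the antimeridian are not rejected; whether GCIS data needs that is an open question.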

Update Organization metadata with new relationship type

Now that we have added the relationships of “center_of” and “unit_of,” use SPARQL queries to go back through and update relationship types for university centers which predated the entry of these new relationships in GCIS.

Require journals to have ISSNs

All journals have ISSNs, whether they be eISSNs, regular ISSNs, or both. Just as we require ISBNs for books, we should require ISSNs for journals.
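
If we do require ISSNs, we could also validate them on entry. Below is a minimal Python sketch of the standard ISSN check-digit test (weights 8 down to 2 over the first seven digits, modulo 11, with X standing for 10). The function name is made up and this is not existing GCIS code.

```python
import re

# Four digits, optional hyphen, three digits, then a check character (digit or X).
_ISSN_RE = re.compile(r"^(\d{4})-?(\d{3})([\dXx])$")


def is_valid_issn(issn: str) -> bool:
    """Return True when the string is a well-formed ISSN with a correct check digit."""
    match = _ISSN_RE.match(issn.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weighted sum of the first seven digits, weights 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return match.group(3).upper() == expected
```

The same check works for print ISSNs and eISSNs, since both use the same format.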

Enable redirects akin to Wikipedia / dbpedia "seeAlso"

We'd like to utilize the dbpedia "seeAlso" field in GCIS. For instance, a user searching for "U.C.L.A." and one searching for "UCLA" (or other values identical to those in the appropriate dbpedia:seeAlso field) should both be directed to the GCIS page for the "University of California, Los Angeles."

Extend autocomplete

Within dev and stage, extend autocomplete to cover more of the organization name. Currently, only the first portion of an org with a really long name shows up in autocomplete, as the rest is stored as ... Therefore, someone entering the really long name into GCIS would need to shorten it for the autocomplete to locate the existing org. This should change.

Usage of dcterms:InteractiveResource

We are using dcterms:InteractiveResource to describe the computing environment of an activity.

from activity/object.ttl.tut

## Computing environment
   dcterms:InteractiveResource "<%= $activity->computing_environment %>"^^xsd:string;
  1. We are using InteractiveResource as a property here instead of as a class
  2. InteractiveResource is not in the dcterms namespace. It is in the dctype namespace.
  3. The intended semantics of dctype:InteractiveResource do not match how we are intending to use it

from http://dublincore.org/documents/dcmi-terms/#dcmitype-InteractiveResource

definition: A resource requiring interaction from the user to be understood, executed, or experienced.
examples: Examples include forms on Web pages, applets, multimedia learning objects, chat services, or virtual reality environments.

Example of usage from the Dublin Core RDF user guide (BTW, not a great example):

ex:myStuff dcterms:type dctype:InteractiveResource , 
                        dctype:Text .

dctype:InteractiveResource rdfs:label "Interactive Resource" .

dctype:Text rdfs:label "Text" .

I suggest we find an alternate property to represent the computing environment for the activity.

usage of dcterms:RightsStatement

lib/Tuba/files/templates/figure/object.ttl.tut uses dcterms:RightsStatement as a property.

## Tags:
   dcterms:subject "<%= $image->attributes %>"^^xsd:string;
   dcterms:RightsStatement "<%= $image->usage_limits %>"^^xsd:string;

We should use the property dcterms:rights, which has the range dcterms:RightsStatement.

This link provides an example of the usage of dcterms:rights in RDF
http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dcterms:rights

I recommend changing the template to look like this

dcterms:rights [ rdf:value "<%= $image->usage_limits %>"^^xsd:string; ] ;

The instance data after this update would look like this:

  dcterms:rights [ rdf:value "Free to use with credit to the original figure source."^^xsd:string;  ] ;

Lexicon RDF issues

I believe there may be errors in the RDF generated for http://data.globalchange.gov/lexicon/nsidc (and in other lexicons generated using the same template)

The RDF we are currently generating looks like this:

<http://data.globalchange.gov/lexicon/nsidc>
   dcterms:identifier "nsidc";
   dcterms:title "National Snow and Ice Data Center NASA DAAC lexicon"^^xsd:string;
   gcis:hasURL "http://nsidc.org/api/opensearch/index.html"^^xsd:anyURI;

   a skos:Concept, dbpedia:Lexicon .

</instrument/advanced-microwave-scanning-radiometer-eos>
   a skos:Concept;
   skos:altLabel "AMSR-E".

The </instrument/x> URI is a relative reference, not an absolute HTTP URI. When the file is read by an RDF library without an HTTP base URI, it will be interpreted as follows:

<file:///instrument/advanced-microwave-scanning-radiometer-eos>
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>  skos:Concept ;
        skos:altLabel  "AMSR-E" .
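
The resolution behaviour can be reproduced outside of any RDF library with Python's standard urljoin, which follows the usual relative-reference rules. This is only an illustrative sketch; the file path used as a base is made up.

```python
from urllib.parse import urljoin

# The relative reference currently emitted by the lexicon template.
RELATIVE = "/instrument/advanced-microwave-scanning-radiometer-eos"


def resolve(base: str, reference: str = RELATIVE) -> str:
    """Resolve a relative reference against the given base URI."""
    return urljoin(base, reference)


# Hypothetical local copy of the Turtle file used as the base.
print(resolve("file:///tmp/nsidc.ttl"))
# The lexicon's own URI used as the base.
print(resolve("http://data.globalchange.gov/lexicon/nsidc"))
```

Note that even with the lexicon URI as the base, a reference starting with "/" resolves against the server root and drops the /lexicon/nsidc prefix, so the template would need to emit the full URI to get a lexicon-scoped identifier.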

Do we want this URI to instead be http://data.globalchange.gov/lexicon/nsidc/instrument/advanced-microwave-scanning-radiometer-eos ?

Also, the lexicon resource should have type skos:ConceptScheme instead of skos:Concept.

It is my guess that we want the generated RDF to look like this:

<http://data.globalchange.gov/lexicon/nsidc>
   dcterms:identifier "nsidc";
   dcterms:title "National Snow and Ice Data Center NASA DAAC lexicon"^^xsd:string;
   gcis:hasURL "http://nsidc.org/api/opensearch/index.html"^^xsd:anyURI;
   a skos:ConceptScheme, dbpedia:Lexicon .

<http://data.globalchange.gov/lexicon/nsidc/instrument/advanced-microwave-scanning-radiometer-eos>
   a skos:Concept;
   skos:inScheme <http://data.globalchange.gov/lexicon/nsidc> ;
   skos:altLabel "AMSR-E".

Indicate contributor role on html of publication page

We need to explicitly indicate the nature of the role (e.g. “author,” “funding agency,” “publisher,” etc.) on the html versions of publication pages. Otherwise a reader could think that a publishing company or agency was involved with a study when in reality its involvement is through funding or an alternate method.

instance data for roles

References USGCRP/gcis-ontology/issues/128, USGCRP/gcis-ontology/issues/101, USGCRP/gcis-ontology/issues/102.

We have decided to store role types in the database instead of in the ontology. As a result we will be removing the role instances from the ontology and building role instance RDF from the database using templates.

Role information should also be available via the RESTful API.

http://data.globalchange.gov/role/{id}

Roles will be simple records composed of an identifier, label, and comment.

Role

  • id TEXT
  • label TEXT
  • comment TEXT
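
For discussion, here is a small Python sketch of how one such record might be turned into Turtle. The choice of dcterms:identifier, rdfs:label and rdfs:comment as predicates is my assumption (no class is asserted, since that is still to be decided), and Role and role_to_turtle are hypothetical names.

```python
from dataclasses import dataclass

ROLE_BASE = "http://data.globalchange.gov/role/"


@dataclass(frozen=True)
class Role:
    """A role record with the three proposed fields."""
    id: str
    label: str
    comment: str


def _literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def role_to_turtle(role: Role) -> str:
    """Render a role as Turtle; the predicate choices are assumptions."""
    return (
        f"<{ROLE_BASE}{role.id}>\n"
        f"   dcterms:identifier {_literal(role.id)} ;\n"
        f"   rdfs:label {_literal(role.label)} ;\n"
        f"   rdfs:comment {_literal(role.comment)} .\n"
    )
```

The real implementation would presumably be a .ttl.tut template like the others; the sketch is only meant to show the shape of the output.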

Enable Inferencing through RDFS in Virtuoso

I think we should start running inferencing so we can support queries that use super-classes and super-properties or type information that is inferred from property domain and range values. Running inferencing will also put a spotlight on our class hierarchies and will potentially help expose class relationships that lead to confusion and should be re-evaluated (I consider this a good thing).

Right now I suggest we focus on RDFS inference. We can evaluate different profiles of OWL2 inference later.

There are two different strategies we can use to implement RDFS reasoning

  1. Configure Virtuoso to perform the RDFS reasoning (see the Virtuoso documentation)
  2. Run the reasoning as part of our RDF load process and import the inferred triples into the triplestore (see the Jena RIOT documentation)
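
To make concrete what kind of triples RDFS inference would add, here is a toy Python sketch that applies four RDFS entailment rules (domain, range, subclass transitivity, subclass typing) to a set of plain string triples until nothing new appears. It is not how Virtuoso or Jena implement reasoning, and it naively treats every object as a resource, so range rules would also "type" literals.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Return the input triples plus those entailed by rdfs2, rdfs3, rdfs9 and rdfs11."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == SUBCLASS:
                # rdfs11: subClassOf is transitive.
                new |= {(s, SUBCLASS, o2) for s2, p2, o2 in inferred
                        if p2 == SUBCLASS and s2 == o}
                # rdfs9: instances of a subclass are instances of the superclass.
                new |= {(x, RDF_TYPE, o) for x, p2, c in inferred
                        if p2 == RDF_TYPE and c == s}
            elif p == DOMAIN:
                # rdfs2: the subject of a property is typed by the property's domain.
                new |= {(x, RDF_TYPE, o) for x, p2, _ in inferred if p2 == s}
            elif p == RANGE:
                # rdfs3: the object of a property is typed by the property's range.
                new |= {(y, RDF_TYPE, o) for _, p2, y in inferred if p2 == s}
        if new <= inferred:
            return inferred
        inferred |= new
```

For example, given the gcis:startedAt domain declaration from the ontology, any resource with a gcis:startedAt value would be inferred to be a gcis:TemporalExtents.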

Create a "doi" field for books

Create a "DOI" (or “doi”) field in the entries for books in GCIS. The field can be populated by looking for the http://dx.doi.org/... string in the URL for some GCIS book entries. This could probably be a quick coding task involving regular expressions.
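
A rough Python sketch of the regular-expression step is below. The pattern (a doi.org or dx.doi.org host followed by a "10.NNNN/suffix" identifier) and the function name are my assumptions; the example DOI in the comment is a made-up placeholder, not a real GCIS entry.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Matches URLs such as http://dx.doi.org/10.1000/xyz123 (placeholder DOI).
_DOI_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/\S+)$", re.IGNORECASE)


def doi_from_url(url: str) -> Optional[str]:
    """Return the DOI embedded in a doi.org URL, or None if the URL is not one."""
    match = _DOI_URL.match(url.strip())
    return unquote(match.group(1)) if match else None
```

Books whose URL does not match would simply be left with an empty DOI field for manual review.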

Publication 'Publisher' as a related Contributor

Migrate the value of the "publisher" field on books into a "publisher" contributor relationship (i.e., move the value of "publisher" in "books" to that of "publisher" in "contributors"). Skip records where this has already been done automatically.

Note: the level of granularity for publishers, and how to handle the various changes in the publisher market (e.g. Wiley or Elsevier has many divisions), is a topic for discussion.

Add field / support for ISBNs for entities of class "report"

Add field / support for ISBNs for entities of class "report" and populate accordingly. Some examples include those of the IPCC, National Academy of Sciences/NRC, and Arctic Monitoring Program.

For this purpose recall that a gcis:Report can also be a book that calls itself a report.

skos:Concept references in the 'also known as' section

from https://data.globalchange.gov/dataset/nasa-nsidcdaac-0032.ttl

## Also known as:
<http://data.globalchange.gov/dataset/nasa-nsidcdaac-0032>
   skos:altLabel "oai:nsidc/NSIDC-0032";
   gcis:hasURL "http://nsidc.org/api/dataset/2/oai?verb=GetRecord&metadataPrefix=dif&identifier=oai:nsidc/NSIDC-0032";
   skos:Concept <http://data.globalchange.gov/lexicon/nsidc> .

What are we trying to say with the statement that currently uses skos:Concept as a property?

skos:Concept is a class and should not be used as a property.

If we are 'tagging' this resource with a skos:Concept we should use dcterms:subject or similar. The correct property to use here depends on the semantics of the tagging relationship.

dcterms:subject is a good general-purpose predicate to use in this case because it has the following definition:

definition: The topic of the resource.
comment: Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary.

My guess is that we should replace skos:Concept here with dcterms:subject so the example above looks like

## Also known as:
<http://data.globalchange.gov/dataset/nasa-nsidcdaac-0032>
   skos:altLabel "oai:nsidc/NSIDC-0032";
   gcis:hasURL "http://nsidc.org/api/dataset/2/oai?verb=GetRecord&metadataPrefix=dif&identifier=oai:nsidc/NSIDC-0032";
   dcterms:subject <http://data.globalchange.gov/lexicon/nsidc> .

but I would like to know more about what we are attempting to say here. Are we attempting to say that the NSIDC lexicon is a topic of this dataset? I am not confident that is what we are trying to say.

Add field for “report series” to facilitate querying

Add a field for “report series” (e.g. the series "IPCC AR5" and "IPCC AR6") to facilitate querying. Right now this relation can only be identified through a regular expression search on the reports' titles, not through a formal relationship in the database.

Adjust turtle templates appropriately to accommodate this addition.

dcterms:subject (free-text)

Inspired by the discussion of free-form text strings at #150, so I'm raising it here.

Regarding the use of "attributes" for images, e.g.:
http://data.globalchange.gov/image/ff6a7a8e-d886-4b30-acd7-a3538a787baf

Note that the line beginning with "dcterms:subject" also contains multiple objects.

I can think of the use case where someone wishes to query all images pertaining to "precipitation." As written now at:
http://data.globalchange.gov/image/ff6a7a8e-d886-4b30-acd7-a3538a787baf
(1) Will this image come up given the multiple objects?
(2) If not, could we fix this?

Thanks.

isCitedBy, isReferencedBy

@zednis @bduggan @aulenbac @rewolfe @BNewman2104 I don't know whether this would be an ontology question or a dev one so I'm posting it here.

Most pubs in GCIS have references whose representation in EndNote had only one UUID, regardless of how many times the identical reference appears. An example is at http://data.globalchange.gov/report/federal-actions-climate-resilient-nation.thtml
which, although cited in multiple chapters, had a reference in EndNote with the same UUID in all instances. Therefore, cito:isCitedBy <nca3> only appears once on this page.

However, there were a few instances of references repeated in multiple nca3 chapters which have multiple UUIDs, e.g.:
http://data.globalchange.gov/report/ceq-progrep-2010.thtml

This is because we were unable to catch 100% of the cases where a reference was cited by multiple chapters and consequently assign the same UUID to all instances of those references. In other words, it's solely an EndNote issue.

The upshot is that the cito:isCitedBy triple is repeated, once per reference UUID. For the second example provided above,

<http://data.globalchange.gov/report/ceq-progrep-2010>
   cito:isCitedBy <http://data.globalchange.gov/report/nca3>;
   biro:isReferencedBy <http://data.globalchange.gov/reference/1a73a6da-b3fb-4e34-aa3a-868579682b56>.

<http://data.globalchange.gov/report/ceq-progrep-2010>
   cito:isCitedBy <http://data.globalchange.gov/report/nca3>;
   biro:isReferencedBy <http://data.globalchange.gov/reference/37163c2c-5801-4702-9191-f32bf0f05b26>.

Is this a problem? @zednis would it be easier to just locate an alternate term for biro:isReferencedBy and thus create something like the following (improper use of RDF) pseudocode:

<report_URI> <CitedBy> <report> <inContextOf> <reference_UUID> .

where <inContextOf> would replace biro:isReferencedBy?

Names containing first and middle initials vis-a-vis URLs

As part of the last code update, support was added for contributors' names within URLs, e.g.:
http://data.globalchange.gov/person/Andrew_Buddenberg

now resolves, redirecting to:
http://data.globalchange.gov/person/1948

Please advise on what the correct URL should be for pages referring to contributors who use their first initial along with their middle and last names, e.g. for C. Ben Beard:
http://data.globalchange.gov/person/903

as neither:
http://data.globalchange.gov/person/C_Ben_Beard
http://data.globalchange.gov/person/C%20Ben%20Beard
resolves.

and similarly for those who use "first name, middle initial, and last name"

e.g.:
http://data.globalchange.gov/person/2308
since http://data.globalchange.gov/person/elizabeth_a_ainsworth doesn't resolve.

Thanks.

Annales_de_chimie

Thread began at:
#147

I'm spinning off the subject of this particular journal into its own entry.

Org types and nature of org-org relationships

This issue can fall into multiple categories and relate to many other issues, such as #229:

  • Update turtle templates to make the rdf representation (subject and predicate) for organization relationship type contingent on the nature of the relationship;
    Example: if org A is a department of org B, and org C is an affiliate of org D, relate through “org:unitOf” and “gcis:affiliateOf,” respectively within the turtle.
  • allow searchability to be independent of relationship type.
    Example: If someone wants to identify all the organizations related to a given organization without knowing the nature of the relationship or its semantically defined representation, how would one code and implement this? This is an outgrowth of the Justin-Rama use case.

Enhancements to searchability

Enhancements to searchability: increased string distance, searchability by resource type, and support for quotation marks.

Harmonize Organization Identifiers and Titles

Harmonize organization identifiers and titles, e.g. look at the inconsistent org names and identifiers for those listed at:
http://data.globalchange.gov/organization/us-department-agriculture
http://data.globalchange.gov/organization/us-environmental-protection-agency
Also, we aren’t consistent about identifiers or organization titles for departments of university medical schools. Sometimes we provide “University School of Medicine Department of” and sometimes “University School of Medicine.” This should be fixed.

Link journals with DBPedia

You've got journals such as:
http://data.globalchange.gov/journal/science

some have wikipedia pages such as:
https://en.wikipedia.org/wiki/Science_(journal)
and DBPedia pages:
http://dbpedia.org/page/Science_(journal)

you could link to.

Where you have "bibo:issn", dbpedia has "dbo:issn", and the two appear to match. A "sync" type script could look up matching ISSNs and establish the links automatically.
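
The matching step of such a script might look roughly like the Python sketch below. It assumes the two ISSN lists have already been fetched into dictionaries keyed by journal URI (how they are fetched is left out), and both function names are made up.

```python
def normalize_issn(issn: str) -> str:
    """Put an ISSN into NNNN-NNNN form so differently formatted values compare equal."""
    compact = issn.replace("-", "").strip().upper()
    return f"{compact[:4]}-{compact[4:]}" if len(compact) == 8 else compact


def match_by_issn(gcis_journals, dbpedia_journals):
    """Pair each GCIS journal URI with the single DBpedia URI that shares an ISSN."""
    by_issn = {}
    for uri, issns in dbpedia_journals.items():
        for issn in issns:
            by_issn.setdefault(normalize_issn(issn), set()).add(uri)
    links = {}
    for uri, issns in gcis_journals.items():
        candidates = set()
        for issn in issns:
            candidates |= by_issn.get(normalize_issn(issn), set())
        if len(candidates) == 1:  # skip journals with no match or an ambiguous one
            links[uri] = next(iter(candidates))
    return links
```

Ambiguous matches are skipped rather than guessed, so they could be reviewed by hand.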

DBPedia also has "dbo:impactFactor" and "dbp:impact" from which you could determine Justin's "articles with a top journal ranking (e.g. Nature and Science) per an official citation metric of one's choice".

(Actually, it would be pretty cool and not too hard I think to dynamically pull in the JSON version of the DBPedia data from your HTML template on the client side and duplicate the wikipedia box of facts about the journal on your journal HTML page -- without sucking in all the data to your database.)

Journals

Inspired by #191 , I am wondering whether we could relate dbpedia:AcademicJournal
(http://dbpedia.org/page/Academic_journal) to gcis:Journal or rewrite the end of the code for journals to:

"a gcis:Journal, dbpedia:AcademicJournal."

Looking at dbpedia, I see many publication types which relate, at least at first glance, to those in gcis.
