
gaz's Introduction

GAZ

An open source gazetteer constructed on ontological principles.

  • Creator: Michael Ashburner (retired)
  • Lead-editor: Lynn Schriml (caretaker -- coordinating community efforts)

Brief description

GAZ is a large, ontologically-oriented resource listing place names which can be treated as instances of environments (see ENVO).

Getting GAZ

Due to size constraints, GAZ's OWL- and OBO-formatted files will not be hosted here for the moment. You can, however, download them here:

Note: GitHub can handle large files through the git-lfs extension (https://git-lfs.github.com).

Issues, comments, and questions

Please use this repository's issue tracker to post issues (such as new term requests), comments, and questions.

gaz's People

Contributors

andrawaag, cmungall, lschriml, pbuttigieg

gaz's Issues

Upload all of GAZ to wikidata

We don't have resources to update gaz.obo. Unless we can find a volunteer it may make most sense to upload to wikidata and have people update on wikidata (having a way to export a gazetteer in obo or owl from wikidata will be easy).

If people are in favor I can look to getting some tips on how best to do this

How to cite?

Greetings, I am interested in citing the Gazetteer in a manuscript. Can you point me towards a publication or some other artifact you'd like me to cite? Thank you.

Elizabeth Umberfield, PhD, RN
Postdoctoral Research Fellow
Public & Population Health Informatics
Regenstrief Institute and Indiana University Fairbanks School of Public Health

Zip code, ZCTA and geographic coordinates

I am new to GAZ, but I have recently started working on geospatial data, and one of the issues I am having is the following.
The data I am working on capture zip code information but, as you know, a zip code is a label for the postal service and has no direct relation to the ZCTA (ZIP Code Tabulation Area) or to geographic coordinates.
Is this the right place to raise this issue?

Marine Regions Gazetteer

Question migrated over from #23

This is perhaps a lower-priority issue for now, but do you think there could be any scope to link GAZ to the Marine Regions Gazetteer? It's a pretty extensive resource with term hierarchies for Large Marine Ecosystems of the World, Longhurst Provinces, etc. It could be quite a useful resource for the Oceans and Seas module.

Determine who is using GAZ and how it is being used

Anyone using GAZ, please add comments to this ticket!

Also include whether you use the obo or owl file, if you download or use an API, etc. Note any specific things you would like to see at a general level (or link to a ticket)

Ensure non-ascii form of names are present (e.g. umlauts, as in Gurbantünggüt desert)

Searching OLS for Gurbantünggüt desert returns no results

The term in GAZ is "Gurbantunggut Desert"

https://www.ebi.ac.uk/ols/ontologies/gaz/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FGAZ_00114666

Perhaps the OLS should be enhanced to better be able to search on umlauts

But either way the official name seems to have umlauts, GAZ should have them

I imagine there were issues with umlauts in obo-edit

We could look to correcting these automatically once wikidata xrefs are done, as WP/WD has the formal name: https://en.wikipedia.org/wiki/Gurbant%C3%BCngg%C3%BCt_Desert
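
A minimal Python sketch of the kind of diacritic-insensitive comparison that could flag such labels for correction. It assumes labels are available as plain strings; the helper names are hypothetical and not part of GAZ or OLS tooling.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks (e.g. the umlaut dots) removed."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_name_ignoring_diacritics(a: str, b: str) -> bool:
    """Compare two place names, ignoring case and diacritics."""
    return strip_diacritics(a).casefold() == strip_diacritics(b).casefold()


# The GAZ label and the formal name differ only by umlauts and case.
print(same_name_ignoring_diacritics("Gurbantünggüt desert", "Gurbantunggut Desert"))
```

A pair that matches under this comparison but differs as raw strings is a candidate for replacing the GAZ label with the formal name and keeping the ASCII form as a synonym.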

Random fact:

It is in this desert that the remotest point of land from any sea is located. According to some calculations, the precise point is at 46°16.8′N 86°40.2′E. It was pinpointed and reached on 27 June 1986 by British explorers Nicholas Crane and Richard Crane; the location was described as being in the Dzoosotoyn Elisen Desert. This position is over 2,600 kilometers (1,600 mi) from the nearest coastline.

It would be neat to have this point in GAZ but we have no use case for this...

Rename South Africa GAZ:00000553

I would like to request this term is renamed as
'Southern Africa' GAZ:00000553
to avoid confusion with the country commonly known as 'South Africa'.

If it's not possible to rename this, suggest adding Southern Africa as a synonym.

Thanks,
Varsha

Biogeographic regions

Hi @EnvironmentOntology/gaz-editors , branching from #20:

While GAZ was dormant, we incubated a few WWF ecoregions in ENVO. See EnvironmentOntology/envo#658

When you get to those modules, let us know and we'll obsolete our instances and point to GAZ instances.

As ENVO is meant to provide the classes for GAZ's instances, do let us know if you need more content ENVO side. You're also welcome to propose edits yourselves.

Proposal: convert GAZ to an instance-based representation

Currently GAZ is represented as all classes. E.g. Andorra is a class. Clearly this is not correct, but it's well-known that the reason for this choice is purely pragmatic - Michael would edit in Obo-Edit, which only handles classes.

I propose the following:

  • gaz.owl is converted to an instance-based representation
  • gaz.obo is converted to classes in the release process (instances are not well-supported in .obo format)

Any consumers of gaz.obo (which includes a number of relational database systems) should be unaffected, provided the conversion does not lose anything.

We would also need to check with all consumers of gaz.owl to make sure this does not break anything.

I think the OLS display would break somewhat, but we can just point OLS at gaz.obo

This will need considerable planning.

  • Will editing instances in Protege be as easy as editing classes? The current representation of GAZ is not just geared towards OE, it's an easy structure to browse (especially now Protege supports existential hierarchies directly)
  • We will need at least preliminary rdf:type / ClassAssertion axioms, I had a conversion at some point, will look this up...
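
The core of such a conversion might look like the following Python sketch. It is not the conversion mentioned above: it works on plain (subject, predicate, object) tuples rather than a real OWL API, it assumes every GAZ-prefixed ID denotes a place, and the IDs in the example are placeholders, not real GAZ or ENVO identifiers.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
OWL_CLASS = "owl:Class"
OWL_INDIVIDUAL = "owl:NamedIndividual"


def is_place(term_id: str) -> bool:
    """Assumption: every GAZ-prefixed ID denotes a place."""
    return term_id.startswith("GAZ:")


def classes_to_instances(triples):
    """Rewrite class-level triples about places as instance-level triples.

    Returns (converted, needs_review). Place-to-place subClassOf triples
    have no obvious instance-level reading, so they are set aside.
    """
    converted, needs_review = [], []
    for s, p, o in triples:
        if not is_place(s):
            converted.append((s, p, o))
        elif p == RDF_TYPE and o == OWL_CLASS:
            converted.append((s, RDF_TYPE, OWL_INDIVIDUAL))
        elif p == SUBCLASS_OF and is_place(o):
            needs_review.append((s, p, o))
        elif p == SUBCLASS_OF:
            # place SubClassOf <ENVO class> becomes place rdf:type <ENVO class>
            converted.append((s, RDF_TYPE, o))
        else:
            # flat relations such as located_in carry over unchanged
            converted.append((s, p, o))
    return converted, needs_review


example = [
    ("GAZ:andorra", RDF_TYPE, OWL_CLASS),
    ("GAZ:andorra", SUBCLASS_OF, "ENVO:country"),
    ("GAZ:andorra", "located_in", "GAZ:europe"),
]
converted, needs_review = classes_to_instances(example)
for triple in converted:
    print(triple)
```

Running the inverse mapping in the release process would be one way to check that the gaz.obo output is unchanged for existing consumers.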

Explanatory document for newbies

GAZ needs an explanatory document for newbies – how to get a country's 2nd level subdivisions, whether they are states/provinces/territories/regions etc. and the linkage between geopolitical levels of government entities using "located in". Same for municipalities, villages, ENVO human habitation etc… Unless there is some documentation about this that I don't know about?

NTR: North Sea coastal waters of Belgium

Suggest new term
"North Sea coastal waters of Belgium"

as sibling of
North Sea coastal waters of Norway GAZ:00144350
North Sea coastal waters of Denmark GAZ:00144351
North Sea coastal waters of the United Kingdom GAZ:00144349

Thanks,
Varsha

Decide how to modularize GAZ such that individual subsets can be managed in github

  • What source format? .obo is easy for diffing but this assumes we don't convert to instances. Dependency on #20
  • How do we modularize? We need a set of mutually exclusive, exhaustive categories. If this is not possible, we need an agreed-upon prioritization to determine which entity belongs in which module
  • How do we determine the initial conversion is not lossy?
    • robot diff doesn't scale
    • If we punt on #20 for now, then obo-level diffing is very easy and scalable, could be done at the ascii level even, I also have scripts here: https://github.com/cmungall/obo-scripts
    • for an instance representation, there will be no blank nodes so this makes RDF-level diffing easy
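
As an illustration of how simple obo-level diffing can be, here is a rough Python sketch that compares two OBO files by term ID. It is not one of the scripts linked above; it only looks at [Term] stanzas, treats each stanza as an unordered set of tag lines, and the function names are made up for this example.

```python
def parse_obo_terms(text: str) -> dict:
    """Map each [Term] id to the set of its other tag-value lines."""
    terms = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are tracked.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[3:].strip()
            terms[current] = set()
        elif in_term and current and line:
            terms[current].add(line)
    return terms


def diff_obo(old_text: str, new_text: str):
    """Return sorted lists of added, removed and changed term IDs."""
    old, new = parse_obo_terms(old_text), parse_obo_terms(new_text)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(t for t in old.keys() & new.keys() if old[t] != new[t])
    return added, removed, changed
```

For a lossless modularization, the union of all module files diffed against the original gaz.obo should yield three empty lists.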

GAZ is inconsistent

In an attempt to set up a uniform set of quality control checks across OBO ontologies, we noticed that GAZ is currently inconsistent. Due to its size, it's a bit hard to run the reasoner in Protege, so here is the explanation for the inconsistency:

Thing SubClassOf Nothing

Reason for inconsistency:

grassland area is ultimately classified as an immaterial entity:

undersea feature is ultimately classified as a material entity

Nothing can be both material and immaterial

Tualatin Mountains are instances of both of the above.

This seems to come from a bad interaction between ENVO and GAZ. Removing the type assertions on Tualatin Mountains may not solve the deeper modelling issue, but it could at least drastically reduce the error severity.
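
A lightweight check for this kind of clash, without running a full reasoner, could be sketched in Python as below. It assumes the asserted types and the superclass hierarchy have already been extracted into plain dictionaries; the hierarchy in the example is collapsed to one step and is only illustrative.

```python
def ancestors(cls, parents):
    """All superclasses of cls (including cls) via a child -> parents map."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def clashing_individuals(types, parents, disjoint_pairs):
    """Individuals whose asserted types fall under two disjoint classes."""
    clashes = {}
    for ind, asserted in types.items():
        supers = set()
        for cls in asserted:
            supers |= ancestors(cls, parents)
        hits = [p for p in disjoint_pairs if p[0] in supers and p[1] in supers]
        if hits:
            clashes[ind] = hits
    return clashes


# Simplified stand-in for the ENVO/BFO hierarchy described above.
parents = {
    "grassland area": ["immaterial entity"],
    "undersea feature": ["material entity"],
}
types = {"Tualatin Mountains": ["grassland area", "undersea feature"]}
disjoint = [("material entity", "immaterial entity")]
print(clashing_individuals(types, parents, disjoint))
```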

NTR: Ankeniheny-Zahamena Corridor

NTR: Ankeniheny-Zahamena Corridor
Defn: links Zahamena National Park (GAZ:00076565) and Andasibe-Mantadia National Park (GAZ:00005428)

Replace obsolete 'lava field' (ENVO:00000095)

Replace old ENVO:00000095 with new ENVO:01000437 (also lava field).

It's currently in use in the following country subsets (which I'll be pushing shortly):

  • Canada
  • Guatemala
  • Nicaragua
  • United States of America
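
A replacement like this can be scripted over the subset files. The Python sketch below assumes the files are read as plain OBO text and only swaps whole-ID occurrences; the function name is hypothetical.

```python
import re

OLD_ID = "ENVO:00000095"
NEW_ID = "ENVO:01000437"


def replace_envo_id(obo_text: str, old: str = OLD_ID, new: str = NEW_ID):
    """Replace whole-ID occurrences of old with new.

    Returns (new_text, number_of_replacements). The lookarounds prevent
    matching inside a longer identifier.
    """
    pattern = re.compile(r"(?<![\w:])" + re.escape(old) + r"(?!\d)")
    return pattern.subn(new, obo_text)
```

Reporting the replacement count per file makes it easy to confirm that only the four subsets listed above were touched.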

Evaluate coverage of GAZ with respect to other resources

We are currently aligning to wikidata (#3)

It would be useful to see a complete landscape picture of what is covered and not covered in existing resources, alongside any licensing restrictions.

This can help us with prioritizing curation resources - no need to redo what is already there in an existing open resource

geonames seems to be CC-BY, so would be no issue in re-using

They have a premium tier but I don't think we need it
http://www.geonames.org/products/premium-data.html

Cross-ticket on RDP: reusabledata/reusabledata#185

Geographic relations

Hi all, branching off of #21

GAZ will inevitably need some more geographically themed relations in RO, linked to issues such as
EnvironmentOntology/envo#80, EnvironmentOntology/envo#148

Many existing spatial relations coming in from biology and anatomy can be used if their definitions are generalised a bit or if more general superclasses are created.

@rctauber @cmungall @lschriml Shall we use this issue to log object properties that GAZ curators feel would be useful?

GAZ hasDbXref to ISO_3166 countries

GAZ offers up these database cross references for, e.g., Belgium:

"hasDbXref": [
  "ISO3166-1:BE",
  "ISO3166-2:BE",
  "ISO3166-1:BEL",
  "ISO3166-1:056"
]

But there are actually sub-categories of ISO3166-1, which is why what looks like a 1-many mapping is actually 1-1 once the subcategory is taken into account. I'm not sure exactly how to represent it, but perhaps something like:

hasDbXref 
  https://www.iso.org/obp/ui/#iso:code:3166:BE
  ISO_3166-1_alpha-2:BE
  ISO_3166-1_alpha-3:BEL
  ISO_3166-1_numeric:056
  ISO_3166-2:BE
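
The re-labelling proposed above could be derived mechanically from the shape of each code. Here is a small Python sketch; the target prefixes are simply the ones suggested in this ticket and the function name is hypothetical.

```python
import re


def qualify_iso_xref(xref: str) -> str:
    """Rewrite an ISO3166 xref so the prefix names the code subcategory."""
    prefix, _, code = xref.partition(":")
    if prefix == "ISO3166-1":
        if re.fullmatch(r"[A-Z]{2}", code):
            return f"ISO_3166-1_alpha-2:{code}"
        if re.fullmatch(r"[A-Z]{3}", code):
            return f"ISO_3166-1_alpha-3:{code}"
        if re.fullmatch(r"\d{3}", code):
            return f"ISO_3166-1_numeric:{code}"
    elif prefix == "ISO3166-2":
        return f"ISO_3166-2:{code}"
    return xref  # leave anything unrecognized untouched


belgium = ["ISO3166-1:BE", "ISO3166-2:BE", "ISO3166-1:BEL", "ISO3166-1:056"]
for x in belgium:
    print(qualify_iso_xref(x))
```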

More info at:
https://en.wikipedia.org/wiki/ISO_3166

Some related work: https://arxiv.org/pdf/0801.3908.pdf
https://www.slideshare.net/nichtich/encoding-changing-country-codes-in-rdf-with-iso-3166-and-skos

Transitioning from GAZ to Wikidata - is there a comprehensive query?

As a transition effort from GAZ to Wikidata, I will try to figure out a comprehensive Wikidata query that can retrieve all continents and countries of the world, along with their states/provinces/territories and cities/towns/villages. Before I start, has any GAZ curator already done this? I don't need the map from GAZ to Wikidata IDs - just the Wikidata hierarchy of things. [EDIT: this would be used to create a dynamic menu system to aid in data entry, providing picklists for users. It's not good UI just to dump users over in Wikidata to search for a town etc.]
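
As a possible starting point, the Python sketch below builds request URLs for the public Wikidata SPARQL endpoint without sending them. It is not a comprehensive query: it assumes countries are typed directly as instances of "country" (Q6256), which misses items typed only with a more specific class, and it walks only one level of "located in the administrative territorial entity" (P131). The function names are hypothetical.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Countries with their continent (P31 = instance of, P30 = continent).
COUNTRIES_QUERY = """
SELECT ?country ?countryLabel ?continent ?continentLabel WHERE {
  ?country wdt:P31 wd:Q6256 ;
           wdt:P30 ?continent .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def subdivisions_query(parent_qid: str) -> str:
    """Query for items directly located in (P131) the given item."""
    return f"""
SELECT ?place ?placeLabel WHERE {{
  ?place wdt:P131 wd:{parent_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


def request_url(query: str) -> str:
    """Build a GET URL asking the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


print(request_url(COUNTRIES_QUERY))
```

Calling subdivisions_query repeatedly, first with a country's ID and then with each returned subdivision, would give the level-by-level hierarchy a picklist needs.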

Identify Wikidata reconciliation strategies for GAZ.

GAZ does not seem to have many mappings to external identifiers (if any at all). This makes aligning with Wikidata particularly challenging.

To get all terms in GAZ covered in Wikidata, we would probably need to apply different strategies to see whether a term is already covered or not.

In the case where the label used in Wikidata exactly matches the term in GAZ, OpenRefine can be our friend. I used this tool (offered in, for example, PAWS) to align GAZ countries with Wikidata.

However, I continued with the terms on Suriname in GAZ. So far all terms do exist in Wikidata, but most with a different spelling variation. I will try to add all GAZ terms for that country manually.

So far, two strategies have been applied:

  1. Where the terms match exactly in Wikidata, we can rely on OpenRefine
  2. Where the terms exist, but with a difference in spelling, manual curation by a curator with local knowledge is required
  3. .......
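
The split between the first two strategies can be approximated with a short Python sketch. It is not what OpenRefine does internally: it uses the standard library's difflib to propose near matches for a curator to confirm, the 0.8 cutoff is an arbitrary assumption, and the labels in the example are illustrative only.

```python
import difflib


def reconcile(gaz_labels, wikidata_labels, cutoff=0.8):
    """Split GAZ labels into exact matches, near matches and unmatched.

    "Exact" here means equal after case folding. Near matches are only
    candidates and still need manual review.
    """
    wd_by_folded = {label.casefold(): label for label in wikidata_labels}
    exact, near, unmatched = {}, {}, []
    for label in gaz_labels:
        key = label.casefold()
        if key in wd_by_folded:
            exact[label] = wd_by_folded[key]
            continue
        candidates = difflib.get_close_matches(
            key, list(wd_by_folded), n=3, cutoff=cutoff
        )
        if candidates:
            near[label] = [wd_by_folded[c] for c in candidates]
        else:
            unmatched.append(label)
    return exact, near, unmatched


gaz = ["Paramaribo", "Nieuw Nickerie", "Atlantis"]
wikidata = ["Paramaribo", "Nieuw-Nickerie", "Brokopondo"]
exact, near, unmatched = reconcile(gaz, wikidata)
print(exact, near, unmatched)
```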

GAZ to INSDC country list mapping

In an effort to comply with INSDC reporting of sample location by country, we have created a mapping of GAZ country & region codes to their INSDC counterparts at http://www.insdc.org/country.html by using a hasDbXref:INSDC:country:[xyz] annotation on the appropriate GAZ item. Is the GAZ team interested in adding these annotations directly to GAZ? If so, we can provide the .owl import file containing the annotations, which could be easily merged.

The include file is at https://github.com/GenEpiO/genepio/blob/master/imports/gaz_insdc_mapping.owl
The INSDC list is based on https://unstats.un.org/unsd/methodology/m49/ but, sadly, in either case there are no region- or country-specific PURLs to offer.

do we need GAZ?

I don't understand why we need an ontology for this when other standards like geonames already exist. I view it more like the issues with making gene names available for axioms.

Coordinating and Prioritizing tasks for GAZ

I would like to plan a meeting of the GAZ group, to prioritize
tasks and to determine who will carry out the work, and take
responsibility for the various tickets.

Please indicate your availability in this ticket, so that we can coordinate a call for
early June.

Lynn: On travel May 15-29, fairly open schedule first two weeks of June.

Cheers,
Lynn

Request GAZ class "State (United States of America)"

GAZ has state classes for Mexico, Australia, etc. Can we get this for the US to provide easy menu selection? The Wikipedia definition seems fine:

label: "State (United States of America)"
definition: "In the United States, a state is a constituent political entity, of which there are currently 50. Bound together in a political union, each state holds governmental jurisdiction over a separate and defined geographic territory and shares its sovereignty with the federal government."
definition source: https://en.wikipedia.org/wiki/U.S._state

If helpful, a SPARQL query is in the "insert USAstate" query in https://github.com/GenEpiO/genepio/blob/master/src/ontology/gazetteer2.py , although I see a few American protectorates in there too, and it uses GenEpiO's "US state" class.

Also, there is one discrepancy in the Gazetteer right now:
It has adopted the practice of having an owl:NamedIndividual with the same ID as the owl:Class for a number of items (punning), but not for states in the USA.

Create unique names for entities

A long time ago I had a discussion with Michael about entity names. GAZ stood in contrast to everything else in OBO, which had largely unique names.

He convinced me that it was fine for many entities in GAZ to share the same name. However, it might be nice to create an OBO Foundry unique label for all entities.

An important note that will hopefully become stale soon: in previous versions of Protege, you got into massive trouble if your ontology had entities that shared the same rdfs:label. I think this is fixed in 5.5 or 5.6 but haven't tested. This should probably be in the editor docs when we get them up.
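
One way such unique labels could be generated is sketched below in Python. The scheme (append the parent place in parentheses, then fall back to the ID) is only an assumption for illustration, not an agreed GAZ convention, and the IDs in the example are placeholders.

```python
from collections import Counter


def unique_labels(entities):
    """entities: list of (id, name, parent_name). Returns {id: unique label}."""
    name_counts = Counter(name for _, name, _ in entities)
    labels = {}
    for ent_id, name, parent in entities:
        if name_counts[name] > 1 and parent:
            # Shared name: disambiguate with the containing place.
            labels[ent_id] = f"{name} ({parent})"
        else:
            labels[ent_id] = name
    # Anything still duplicated gets its ID appended as a last resort.
    label_counts = Counter(labels.values())
    for ent_id, label in labels.items():
        if label_counts[label] > 1:
            labels[ent_id] = f"{label} [{ent_id}]"
    return labels


places = [
    ("GAZ:x1", "Springfield", "Illinois"),
    ("GAZ:x2", "Springfield", "Massachusetts"),
    ("GAZ:x3", "Andorra", "Europe"),
]
print(unique_labels(places))
```

The ordinary rdfs:label would stay as it is; the generated string would go in a separate annotation.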
