This is an idea that I've been thinking about for a while. I discussed it with <a clas


Organisation identifiers (for discussion) about publicbodies HOT 10 OPEN

datasets commented on May 31, 2024

Organisation identifiers (for discussion)

from publicbodies.

Comments (10)

augusto-herrmann commented on May 31, 2024 1

Considering that government structure changes quite frequently in most countries, I think this project should have some instructions or guidelines on how to handle the merge, split, and transformation of public bodies.

We could take as an example the policy paper from OpenCorporates on How OpenCorporates should handle company number problems. There should be some identifiable parallels between how they deal with company data and how we deal with public organization data.

from publicbodies.

marians commented on May 31, 2024

Wouldn't it be great to think in URLs here? E.g., Oxfam could then be http://publicbodies.org/GB-CHC-202918 or even better http://publicbodies.org/GB/CHC/202918.

from publicbodies.

practicalparticipation commented on May 31, 2024

@marians In the IATI ORG ID standard we've designed it to work for legacy systems where URLs are not valid values of a database field, and to avoid being tied to any particular domain for resolving identifiers, but so that the pattern of identifiers can be very easily converted into URLs and resolved by any number of services.

See for example: http://opencirce.org/org/

So URL-compatible, string-based IDs have been the guiding principle.
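To illustrate the point about string IDs that convert easily into URLs, here is a minimal sketch. The function names and the `http://example.org/` base are assumptions made for illustration only; neither the IATI standard nor any resolver service is known to expose this exact API.

```python
from urllib.parse import quote


def org_id_to_url(org_id: str, base: str = "http://example.org/org/") -> str:
    """Append a plain string identifier to a resolver base URL.

    `base` is a placeholder: any resolver domain could be swapped in,
    which is the reason for keeping the ID itself domain-free.
    """
    return base.rstrip("/") + "/" + quote(org_id, safe="")


def org_id_to_path_url(org_id: str, base: str = "http://example.org/") -> str:
    """Variant that turns the first two hyphen-separated parts into path segments."""
    parts = org_id.split("-", 2)
    return base.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


print(org_id_to_url("GB-CHC-202918"))
print(org_id_to_path_url("GB-CHC-202918"))
```

The same identifier can be stored in a legacy database field as a plain string and still be resolved by whichever service a user prefers.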

from publicbodies.

marians commented on May 31, 2024

@practicalparticipation Thanks for the comment & info!

from publicbodies.

practicalparticipation commented on May 31, 2024

@markbrough Easy questions first:

*(3) How should the identifiers be identified as being created by Public Bodies - just a prefix like PB-?*

In the IATI ORG ID standard the namespace should really be:

MISC-PB-{ID}

At the moment the registry of namespaces for IATI is just a spreadsheet - but there's the goal of making this a shared list and getting some better services for managing it in future, including services that can help resolve namespace prefixes into URLs for getting more information on any named entity.

Now the trickier ones:

*(1) Does this sound sensible? Is it a good idea? Is there a better alternative?*

Obviously the best case is whenever official lists of bodies actually exist - but we know this is often not the case. Presumably the mapping element of this would mean that if an official list did become available, it could be mapped to these 'incubator ids'. If the service also provided a 'canonical ID' API that, when called with an ID, checked whether a better one had come along or whether the requested ID had been merged with another, and then returned a canonical ID, we would get to a far better place in terms of users being able to find when they are dealing with data about the same organisation.

I doubt we'll get many original govt publishers of data using these IDs, but the potential for them to harmonise how re-users of the data represent the information they have is interesting.

The risk of false positives and bad matches getting into data and leading to wrong conclusions 'downstream' is fairly big with this - so thinking about provenance or 'certainty' information that an API might return could be important.
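The 'canonical ID' lookup and the 'certainty' information mentioned in this answer could be sketched roughly as below. Everything here is hypothetical: the `SUPERSEDED_BY` table, the example IDs, the certainty values, and the `Resolution` type are invented for illustration and do not describe an existing service.

```python
from dataclasses import dataclass


@dataclass
class Resolution:
    canonical_id: str
    certainty: float  # lowest certainty of any mapping followed
    hops: int         # how many mappings were followed


# Hypothetical table: old ID -> (better ID, certainty of that mapping).
SUPERSEDED_BY = {
    "MISC-PB-0001": ("MISC-PB-0002", 0.9),  # two incubator IDs merged
    "MISC-PB-0002": ("XX-GOV-42", 0.8),     # an official list became available
}


def canonical_id(org_id: str, mapping=SUPERSEDED_BY) -> Resolution:
    """Follow merge/upgrade mappings until no better ID is known."""
    seen = {org_id}
    certainty, hops, current = 1.0, 0, org_id
    while current in mapping:
        nxt, step_certainty = mapping[current]
        if nxt in seen:
            raise ValueError(f"cycle in ID mappings at {nxt!r}")
        seen.add(nxt)
        certainty = min(certainty, step_certainty)
        current = nxt
        hops += 1
    return Resolution(current, certainty, hops)


print(canonical_id("MISC-PB-0001"))
```

Returning the weakest certainty along the chain is one simple design choice; a real service might instead expose the provenance of every step so downstream users can judge each mapping themselves.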

*(2) Will the fuzzy matching be accurate enough to be useful? Is it likely to assign organisations an incorrect code?*

I think this is going to be a challenge and a risk. When we get down to names of schools, health services etc. then real problems of name clashes are likely to occur. At the level of departments the problem is less likely.

It might be useful to think about other data that could be used in fuzzy matching, such as 'city of head office' or 'website address', to help firm up matches.
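As a rough sketch of how extra fields could firm up a name match, the example below combines a string-similarity score from Python's standard `difflib` with exact agreement on city and website. The field names, the weights, and the `match_score` function are assumptions for illustration, not a proposed algorithm for the project.

```python
from difflib import SequenceMatcher


def _norm(text: str) -> str:
    """Case-fold and collapse whitespace before comparing."""
    return " ".join(text.casefold().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity of two names in the range 0.0 to 1.0."""
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio()


def _same(a, b) -> bool:
    """True only when both values are present and equal after normalising."""
    return bool(a) and bool(b) and _norm(a) == _norm(b)


def match_score(query: dict, record: dict, weights=(0.6, 0.2, 0.2)) -> float:
    """Weighted score: name similarity plus bonuses for matching city and website."""
    w_name, w_city, w_site = weights  # illustrative weights, not tuned
    score = w_name * name_similarity(query["name"], record["name"])
    if _same(query.get("city"), record.get("city")):
        score += w_city
    if _same(query.get("website"), record.get("website")):
        score += w_site
    return score


query = {"name": "St Mary's School", "city": "Leeds"}
leeds = {"name": "St. Mary's School", "city": "Leeds"}
york = {"name": "St Mary's School", "city": "York"}
print(match_score(query, leeds) > match_score(query, york))
```

In this sketch, a record with a slightly different spelling but the same city outranks a record with an identical name in another city, which is the kind of name clash expected among schools and health services.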

from publicbodies.

bill-anderson commented on May 31, 2024

For IATI this issue is fast heading away from being a problem towards becoming a road crash. So my current twopence is:

We've spent the last year or so searching for a methodology that has both pragmatic logic and political traction. There's nothing substantial out there and a depressing lack of interest from a range of bodies whom one would think would need this as badly as IATI does.

We've always said that it is not IATI's business to curate such a methodology: it needs a wider home. But we're reaching a point where we've got to do something.

The way we are going to solve this problem is by thrashing as many ideas around as possible - so this is an excellent thread. Good stuff @markbrough !

I'm not convinced that a system based on the machine interpretation of spelling, however sophisticated the algorithm is, is going to be efficient enough.

Firstly, people don't spell very well.
Secondly, there are many languages.
Thirdly, will machine logic cope with the difference between a change in spelling and a change in name that represents a new entity? For example, there are currently 62 different ways in which governments around the world have named established ministries that start with the word "Agriculture" (see below). When a Ministry of Agriculture becomes the Ministry of Agriculture and Food Security, is this a change of name or a re-organisation of government departments?
And fourthly, once identifiers are in use it is very difficult to tidy up the list.

Here's another imperfect idea...

https://docs.google.com/spreadsheet/ccc?key=0AnWngmdQt3stdGNDVDB5SlZrWVNkd0w4a1FWX0xTY2c#gid=0

I've scraped the CIA Heads of States list and built a (tidied) list of names of current departments and added a code which is a mixture of the name and a counter (which allows new names to be added manually in at least some kind of logical order).

In IATI the Rwandan Ministry of Finance and Economic Planning would become something like

MISC-PB-RW-FI18

The problem with this coding is that the code is language-specific. Not a good idea for a global list.

With this approach the list is centrally curated and manual intervention would be required to create a new code. Is this a good or bad thing? While names of government departments may be maintained with relative ease, government agencies are a whole different ball game.

from publicbodies.

jpmckinney commented on May 31, 2024

The Sunlight Foundation proposes using a UUID (and possibly scoping the UUID to a country) and then using a reconciliation/ID resolution service to avoid duplicates: https://github.com/opencivicdata/opencivicdata/wiki/Entity-ID-Resolution-Service
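A toy sketch of the general idea (mint an opaque UUID, scope it to a country, and ask a resolution step before minting so that duplicates are avoided) might look like this. The `EntityRegistry` class and the `{country}-{uuid}` format are hypothetical and are not taken from the linked proposal; a real reconciliation service would presumably match far more loosely than this exact comparison of normalised names.

```python
import uuid


class EntityRegistry:
    """In-memory stand-in for an ID resolution service."""

    def __init__(self):
        self._by_key = {}

    @staticmethod
    def _key(country: str, name: str):
        # Normalise case and whitespace so trivial variants resolve together.
        return (country.upper(), " ".join(name.casefold().split()))

    def resolve_or_mint(self, country: str, name: str) -> str:
        """Return the existing ID for this entity, or mint a new random one."""
        key = self._key(country, name)
        if key not in self._by_key:
            # Hypothetical format: country code prefix plus a random UUID.
            self._by_key[key] = f"{country.upper()}-{uuid.uuid4()}"
        return self._by_key[key]


registry = EntityRegistry()
first = registry.resolve_or_mint("rw", "Ministry of  Finance")
second = registry.resolve_or_mint("RW", "ministry of finance")
print(first == second)
```

Because the UUID carries no meaning, a renamed ministry can keep its ID, which sidesteps the language-specific coding problem at the cost of needing the resolution service for every lookup.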

from publicbodies.

rufuspollock commented on May 31, 2024

Note connection here with #23 and discussion around keys ...

from publicbodies.

markbrough commented on May 31, 2024

Hey @augusto-herrmann thanks for bringing this thread back to life, I had forgotten about it :)

A couple of years ago @practicalparticipation was commissioned by IATI to write a discussion paper on this which is worth taking a look at. It explores a number of different approaches. There is a bunch of discussion on that paper here. My own view now is that we should be using (existing) government Charts of Accounts as the primary source for these codelists (rather than the approach I had set out above).

I know that this approach would be imperfect, but my argument is that it is at least a solid start to dealing with this problem. I haven't really seen anything to dissuade me from this view over the last couple of years.

from publicbodies.

markbrough commented on May 31, 2024

An update on this issue: we now have codes for 50 countries, based on country budgets or charts of accounts, extracted and published here:
https://gov-id-finder.codeforiati.org/

The source repository is here:
https://github.com/codeforIATI/gov-id-finder-data

According to the methodology detailed on the site, the organisation identifier for Ministry of Health and Social Welfare - Liberia is LR-COA-310.
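For illustration, an identifier of this shape can be split into its parts as sketched below. The `{country}-COA-{code}` pattern is an assumption inferred from the single example above; the methodology on the site is the authority, and the `GovOrgId`, `parse_org_id`, and `build_coa_id` names are hypothetical.

```python
from typing import NamedTuple


class GovOrgId(NamedTuple):
    country: str   # two-letter country code, e.g. "LR"
    register: str  # list the code comes from, e.g. "COA" (chart of accounts)
    code: str      # code within that list, kept as a string


def parse_org_id(org_id: str) -> GovOrgId:
    """Split an identifier such as 'LR-COA-310' into its three parts."""
    country, register, code = org_id.split("-", 2)  # ValueError if parts are missing
    if len(country) != 2 or not country.isalpha():
        raise ValueError(f"unexpected country prefix in {org_id!r}")
    return GovOrgId(country.upper(), register, code)


def build_coa_id(country: str, code) -> str:
    """Assemble an identifier from a country code and a chart-of-accounts code."""
    return f"{country.upper()}-COA-{code}"


print(parse_org_id("LR-COA-310"))
print(build_coa_id("lr", 310))
```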

from publicbodies.
