
Comments (10)

augusto-herrmann commented on May 31, 2024

Considering that government structure changes quite frequently in most countries, I think this project should have some instructions or guidelines on how to handle the merge, split, and transformation of public bodies.

We could take as an example the policy paper from OpenCorporates on How OpenCorporates should handle company number problems. There should be some identifiable parallels between how they deal with company data and how we deal with data on public organizations.

from publicbodies.

marians commented on May 31, 2024

Wouldn't it be great to think in URLs here? E.g., Oxfam could then be http://publicbodies.org/GB-CHC-202918 or, even better, http://publicbodies.org/GB/CHC/202918.


practicalparticipation commented on May 31, 2024

@marians In the IATI ORG ID standard we've designed identifiers to work in legacy systems where URLs are not valid values for a database field, and to avoid being tied to any particular domain for resolving them. At the same time, the identifier pattern can very easily be converted into URLs and resolved by any number of services.

See for example: http://opencirce.org/org/

So URL-compatible, string-based IDs have been the guiding principle.
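A minimal sketch of that principle, assuming hyphen-separated IDs such as the GB-CHC-202918 example earlier in the thread: the ID stays a plain string, and any resolver can turn it into a URL. The function name `id_to_url` and the `resolver.example` base address are hypothetical, not part of any standard.

```python
from urllib.parse import quote


def id_to_url(org_id: str, base: str, nested: bool = False) -> str:
    """Turn a plain string ID into a URL under a given resolver base address."""
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in org_id.split("-")]
    # Either keep the ID as one path segment or nest its parts as sub-paths.
    path = "/".join(parts) if nested else "-".join(parts)
    return f"{base.rstrip('/')}/{path}"


# The same ID resolved by two different (hypothetical) services.
print(id_to_url("GB-CHC-202918", "http://publicbodies.org"))
print(id_to_url("GB-CHC-202918", "http://publicbodies.org", nested=True))
print(id_to_url("GB-CHC-202918", "http://resolver.example/org/"))
```

Because the base address is a parameter, the ID itself never has to mention a domain.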


marians commented on May 31, 2024

@practicalparticipation Thanks for the comment & info!


practicalparticipation commented on May 31, 2024

@markbrough Easy questions first:

*(3) How should the identifiers be identified as being created by Public Bodies - just a prefix like PB-?*

In the IATI ORG ID standard the namespace should really be:

MISC-PB-{ID}

At the moment the registry of namespaces for IATI is just a spreadsheet - but there's the goal of making this a shared list and getting some better services for managing it in future, including services that can help resolve namespace prefixes into URLs for getting more information on any named entity.

Now the trickier ones:

*(1) Does this sound sensible? Is it a good idea? Is there a better alternative?*

Obviously the best case is when official lists of bodies actually exist, but we know this is often not the case. Presumably the mapping element would mean that if an official list did become available, it could be mapped to these 'incubator IDs'. If the service also provided a 'canonical ID' API that, when called with an ID, checked whether a better one had come along or whether the requested ID had been merged with another, and then returned the canonical ID, we would get to a far better place in terms of users being able to find out when they are dealing with data about the same organisation.

I doubt we'll get many original govt publishers of data using these IDs, but the potential for them to harmonise how re-users of the data represent the information they have is interesting.

The risk of false positives and bad matches getting into data and leading to wrong conclusions 'downstream' is fairly big with this - so thinking about provenance or 'certainty' information that an API might return could be important.
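A rough sketch of what such a 'canonical ID' lookup with provenance might return, assuming a simple in-memory registry. The `Record` fields, the `canonical_id` function, the example IDs and the idea of multiplying certainties along the chain are all illustrative assumptions, not an existing API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    org_id: str
    superseded_by: Optional[str] = None  # set when merged or when a better ID exists
    source: str = "unknown"              # provenance: where this record came from
    certainty: float = 1.0               # confidence in the link to superseded_by


def canonical_id(org_id: str, registry: Dict[str, Record]) -> Tuple[str, float, List[Tuple[str, str]]]:
    """Follow superseded_by links and return (canonical ID, certainty, provenance trail)."""
    seen = set()
    certainty = 1.0
    trail: List[Tuple[str, str]] = []
    current = org_id
    while True:
        if current in seen:
            raise ValueError(f"cycle in redirects at {current}")
        seen.add(current)
        record = registry[current]  # raises KeyError for unknown IDs
        trail.append((current, record.source))
        if record.superseded_by is None:
            return current, certainty, trail
        # Each hop can only lower the overall confidence in the mapping.
        certainty *= record.certainty
        current = record.superseded_by


# Hypothetical data: an incubator ID later mapped to an official-list ID.
registry = {
    "MISC-PB-0001": Record("MISC-PB-0001", superseded_by="XX-GOV-42",
                           source="fuzzy name match", certainty=0.9),
    "XX-GOV-42": Record("XX-GOV-42", source="official list"),
}
print(canonical_id("MISC-PB-0001", registry))
```

Returning the trail alongside the ID lets a downstream user see that the mapping rests on a fuzzy match rather than an official source, and decide whether to trust it.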

*(2) Will the fuzzy matching be accurate enough to be useful? Is it likely to assign organisations an incorrect code?*

I think this is going to be a challenge and a risk. When we get down to names of schools, health services etc. then real problems of name clashes are likely to occur. At the level of departments the problem is less likely.

It might be useful to think about other data that could be used in fuzzy matching, such as 'city of head office' or 'website address', to help firm up matches.
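As an illustration only, here is one way extra fields could firm up a name match, using `difflib` from the Python standard library. The field names, the 0.6/0.2/0.2 weights and the example school are arbitrary assumptions.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def _norm(text: str) -> str:
    """Lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


def _host(url: str) -> str:
    """Extract the host name, ignoring scheme and a leading 'www.'."""
    host = urlparse(url if "//" in url else "//" + url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def match_score(a: dict, b: dict) -> float:
    """Combine name similarity with agreement on city and website."""
    name = SequenceMatcher(None, _norm(a["name"]), _norm(b["name"])).ratio()
    score, weight = 0.6 * name, 0.6
    # Only count a field when both records actually have it.
    if a.get("city") and b.get("city"):
        score += 0.2 * (_norm(a["city"]) == _norm(b["city"]))
        weight += 0.2
    if a.get("website") and b.get("website"):
        score += 0.2 * (_host(a["website"]) == _host(b["website"]))
        weight += 0.2
    return score / weight


# Two schools with identical names: the city pulls the score down.
leeds = {"name": "St Mary's Primary School", "city": "Leeds"}
bristol = {"name": "St Mary's Primary School", "city": "Bristol"}
print(match_score({"name": leeds["name"]}, {"name": bristol["name"]}))
print(match_score(leeds, bristol))
```

With names alone the two schools are indistinguishable; adding the city is what exposes the clash.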


bill-anderson commented on May 31, 2024

For IATI this issue is fast heading away from being a problem towards becoming a road crash. So my current twopence is:

We've spent the last year or so searching for a methodology that has both pragmatic logic and political traction. There's nothing substantial out there and a depressing lack of interest from a range of bodies whom one would think would need this as badly as IATI does.

We've always said that it is not IATI's business to curate such a methodology: it needs a wider home. But we're reaching a point where we've got to do something.

The way we are going to solve this problem is by thrashing as many ideas around as possible - so this is an excellent thread. Good stuff @markbrough !

I'm not convinced that a system based on the machine interpretation of spelling, however sophisticated the algorithm is, is going to be efficient enough.

  • Firstly, people don't spell very well.
  • Secondly, there are many languages.
  • Thirdly, will machine logic cope with the difference between a change in spelling and a change in name that represents a new entity? For example, there are currently 62 different ways in which governments around the world have named established ministries that start with the word "Agriculture" (see below). When a Ministry of Agriculture becomes the Ministry of Agriculture and Food Security, is this a change of name or a re-organisation of government departments?
  • And fourthly, once identifiers are in use it is very difficult to tidy up the list.

Here's another imperfect idea...

https://docs.google.com/spreadsheet/ccc?key=0AnWngmdQt3stdGNDVDB5SlZrWVNkd0w4a1FWX0xTY2c#gid=0

I've scraped the CIA Heads of States list and built a (tidied) list of names of current departments and added a code which is a mixture of the name and a counter (which allows new names to be added manually in at least some kind of logical order).

In IATI the Rwandan Ministry of Finance and Economic Planning would become something like

MISC-PB-RW-FI18

The problem with this coding is that the code is language-specific, which is not a good idea for a global list.
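To make the shape of such a code concrete, here is a small sketch that parses it and allocates the next counter. It assumes the pattern is a two-letter country code, a two-letter name stem and a counter kept per country and stem. That reading of MISC-PB-RW-FI18 is a guess, and `parse_code` and `next_code` are hypothetical helpers.

```python
import re

# Assumed layout: MISC-PB-<country>-<stem><counter>, e.g. MISC-PB-RW-FI18.
CODE_RE = re.compile(
    r"^MISC-PB-(?P<country>[A-Z]{2})-(?P<stem>[A-Z]{2})(?P<counter>\d+)$"
)


def parse_code(code: str):
    """Split a code into (country, stem, counter) or raise ValueError."""
    m = CODE_RE.match(code)
    if m is None:
        raise ValueError(f"not a recognised code: {code}")
    return m["country"], m["stem"], int(m["counter"])


def next_code(country: str, stem: str, existing: list) -> str:
    """Allocate the next free counter for a country and name stem."""
    used = [n for c, s, n in map(parse_code, existing) if (c, s) == (country, stem)]
    return f"MISC-PB-{country}-{stem}{max(used, default=0) + 1:02d}"


print(parse_code("MISC-PB-RW-FI18"))
print(next_code("RW", "FI", ["MISC-PB-RW-FI18"]))
```

The stem is where the language dependence enters: "FI" only makes sense if the ministry's name is written in English.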

With this approach the list is centrally curated and manual intervention would be required to create a new code. Is this a good or bad thing? While names of government departments may be maintained with relative ease, government agencies are a whole different ball game.


jpmckinney commented on May 31, 2024

The Sunlight Foundation proposes using a UUID (and possibly scoping the UUID to a country) and then using a reconciliation/ID resolution service to avoid duplicates: https://github.com/opencivicdata/opencivicdata/wiki/Entity-ID-Resolution-Service
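The general pattern (opaque UUIDs plus a resolution step that returns an existing ID instead of minting a duplicate) could look roughly like the sketch below. This is not the design described in the linked wiki: the `IdResolver` class, the `country/uuid` layout and the name-normalisation rule are assumptions made purely for illustration.

```python
import uuid


class IdResolver:
    """Mint country-scoped UUIDs, reusing the ID when the entity is already known."""

    def __init__(self):
        self._by_key = {}

    @staticmethod
    def _key(country: str, name: str):
        # Deliberately naive normalisation; a real service would reconcile
        # candidates far more carefully than exact matching on a cleaned name.
        return country.upper(), " ".join(name.casefold().split())

    def resolve(self, country: str, name: str) -> str:
        key = self._key(country, name)
        if key not in self._by_key:
            self._by_key[key] = f"{key[0]}/{uuid.uuid4()}"
        return self._by_key[key]


resolver = IdResolver()
first = resolver.resolve("rw", "Ministry of Finance and Economic Planning")
again = resolver.resolve("RW", "ministry of  finance and economic planning")
print(first == again)
```

Because the UUID carries no meaning, the language problem of name-derived codes disappears, but all the matching difficulty moves into the resolution step.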


rufuspollock commented on May 31, 2024

Note connection here with #23 and discussion around keys ...


markbrough commented on May 31, 2024

Hey @augusto-herrmann thanks for bringing this thread back to life, I had forgotten about it :)

A couple of years ago @practicalparticipation was commissioned by IATI to write a discussion paper on this which is worth taking a look at. It explores a number of different approaches. There is a bunch of discussion on that paper here. My own view now is that we should be using (existing) government Charts of Accounts as the primary source for these codelists (rather than the approach I had set out above).

I know that this approach would be imperfect, but my argument is that it is at least a solid start to dealing with this problem. I haven't really seen anything over the last couple of years to dissuade me from this view.


markbrough commented on May 31, 2024

An update on this issue: we now have codes for 50 countries, based on country budgets or charts of accounts, extracted and published here:
https://gov-id-finder.codeforiati.org/

The source repository is here:
https://github.com/codeforIATI/gov-id-finder-data

According to the methodology detailed on the site, the organisation identifier for Ministry of Health and Social Welfare - Liberia is LR-COA-310

