Comments (5)

sshaw commented on June 15, 2024

Hi, thanks for bringing this to my attention.

I'm wondering if the library should erase punctuation and flatten to ASCII when comparing?

I think this will require a library to address the tricky cases, for example ß to ss. iconv can do this, but one nice thing about this gem is that it has no dependencies.
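A rough sketch of where the dependency-free route runs out, assuming Ruby 2.2 or later: the built-in `String#unicode_normalize` can strip most accents, but it leaves ß untouched because ß has no Unicode decomposition. `fold_accents` is a hypothetical helper name, not something in this gem.

```ruby
# Hypothetical helper, not part of normalize_country.
def fold_accents(name)
  # NFKD splits "Å" into "A" plus a combining ring;
  # \p{Mn} then removes the combining marks.
  name.unicode_normalize(:nfkd).gsub(/\p{Mn}/, "")
end

fold_accents("Åland Islands") # => "Aland Islands"
fold_accents("São Paulo")     # => "Sao Paulo"
fold_accents("Straße")        # => "Straße" (ß is unchanged, so "ss" needs its own rule)
```

Cases like ß would still need an explicit mapping or an alias.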

And come to think of it, the official name is also a bit weird, mixing "Republic of" (English) with D'Ivoire (French).

In the US this is common for names. For example, Gary Dell'Abate uses the ' (Italian) and Pedro Muñoz uses the ñ (Spanish). I also see US papers using São Paulo, Malmö, etc.

I see elsewhere in en.yml there are aliases. Perhaps that's a better solution, adding a lot of aliases?

That seems to be the best option. I can see this becoming unmanageable, but it seems we're far away from that.

Given your examples and similar existing cases, we should have aliases with the non-ASCII apostrophe and the "Palestine, State of" variants. But one question here: is this name part of a standard somewhere? I'm not sure how to apply this to the others. We have some already but not others. For example: "State of Israel" but not "Israel, State of".
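If the inverted forms were generated rather than typed in by hand, it might look something like the sketch below. The helper names (`uninvert`, `invert`, `name_variants`) are made up for illustration, and the rules are deliberately naive (for example, `invert` would also turn "Isle of Man" into "Man, Isle of"), so this is only a starting point for producing alias candidates.

```ruby
# Hypothetical helpers, not part of normalize_country.

# "Palestine, State of" -> "State of Palestine"
def uninvert(name)
  head, tail = name.split(/,\s*/, 2)
  return name if tail.nil? || tail.empty?
  "#{tail} #{head}"
end

# "State of Israel" -> "Israel, State of"
def invert(name)
  m = name.match(/\A(.+ of) (.+)\z/)
  m ? "#{m[2]}, #{m[1]}" : name
end

# Candidate aliases for one name, without duplicates.
def name_variants(name)
  [name, uninvert(name), invert(name)].uniq
end

name_variants("Palestine, State of") # => ["Palestine, State of", "State of Palestine"]
name_variants("State of Israel")     # => ["State of Israel", "Israel, State of"]
```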

Likewise punctuation as in Bosnia-Herzegovina, Guinea-Bissau

I don't think an em dash or en dash is appropriate here. Are there other Unicode dashes that should be covered?

Is there a use case for the name without a dash? I can see it both ways and don't have an issue having an alias without it.
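One possible way to cover the other Unicode dashes without listing them individually, sketched under the assumption that folding every dash to an ASCII hyphen is acceptable: Ruby regular expressions support the `\p{Pd}` (dash punctuation) property, which includes the Unicode hyphen, non-breaking hyphen, en dash and em dash. The helper names here are hypothetical.

```ruby
# Hypothetical helpers, not part of normalize_country.

# Replace any Unicode dash punctuation with an ASCII hyphen.
def normalize_dashes(name)
  name.gsub(/\p{Pd}/, "-")
end

# The hyphenated form plus a dashless form, for use as aliases.
def dash_variants(name)
  ascii = normalize_dashes(name)
  [ascii, ascii.tr("-", " ")].uniq
end

normalize_dashes("Guinea\u2010Bissau") # => "Guinea-Bissau"
dash_variants("Bosnia-Herzegovina")    # => ["Bosnia-Herzegovina", "Bosnia Herzegovina"]
```

The minus sign (U+2212) is a math symbol rather than dash punctuation, so it would not be matched by this pattern.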

or accents as in Åland Islands

Here I think it's fine to add an ASCII alias too

and just alternative spellings like Faeroes.

Yeah this should be an alias too.

from normalize_country.

sshaw commented on June 15, 2024

Check out master for some updates to this.

Is Palestine, State of part of a standard somewhere?

wu-lee commented on June 15, 2024

Thanks, will check, maybe I can remove some hacks!

The Carmen gem mentioned in the Readme for this project uses the Debian ISO-3166-1 data as a source, and I notice that data includes "Palestine, State of". I just happen to know where to find that. I've not gone to the ISO standard itself to check, which is presumably the most definitive source.

My current use case is to create a SKOS vocabulary of terms (in RDF) for the International Coop Association's database of members' locations and/or territories. Their data is notionally based on the ISO-3166-1 country code system, but they currently use English-language labels instead of IDs in their database, which we need to convert to country codes. Their particular set of labels includes "Palestine, State of" and "Côte d’Ivoire" with a non-ASCII apostrophe (the typographic ’). I'm not sure where these labels come from originally. I would hazard a guess that the curly apostrophe may have been automatically inserted by Word or Excel or something similar.
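For what it's worth, the kind of pre-processing this involves can be sketched as below. This is only an illustration: `ALPHA2_BY_LABEL` is a tiny made-up lookup table standing in for a real data source, and `alpha2_for` is a hypothetical helper, not this gem's API.

```ruby
# Characters that often stand in for an ASCII apostrophe:
# left/right single quotes, modifier apostrophe, backtick, acute accent.
TYPOGRAPHIC_APOSTROPHES = /[\u2018\u2019\u02BC`\u00B4]/

# Made-up sample table keyed by label with an ASCII apostrophe.
ALPHA2_BY_LABEL = {
  "Côte d'Ivoire"       => "CI",
  "Palestine, State of" => "PS"
}.freeze

# Fold apostrophe look-alikes to ', then look the label up.
def alpha2_for(label)
  ALPHA2_BY_LABEL[label.strip.gsub(TYPOGRAPHIC_APOSTROPHES, "'")]
end

alpha2_for("Côte d\u2019Ivoire")    # => "CI"
alpha2_for(" Palestine, State of ") # => "PS"
alpha2_for("Atlantis")              # => nil
```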

sshaw commented on June 15, 2024

The Carmen gem mentioned in the Readme for this project uses the Debian ISO-3166-1 data as a source, and I notice that data includes "Palestine, State of".

Thanks. At some point I will check that data to make sure it's included.

sshaw commented on June 15, 2024

Note to self: #9 (comment)
