sindresorhus / transliterate

Convert Unicode characters to Latin characters using transliteration

License: MIT License

JavaScript 99.21% TypeScript 0.79%
transliteration latinization deburr unicode unicode-converter string-conversion node-module npm-package

transliterate's Introduction

transliterate

Convert Unicode characters to Latin characters using transliteration

Useful for slugification and for other situations where you cannot use Unicode.

Install

$ npm install @sindresorhus/transliterate

Usage

import transliterate from '@sindresorhus/transliterate';

transliterate('Fußgängerübergänge');
//=> 'Fussgaengeruebergaenge'

transliterate('Я люблю единорогов');
//=> 'Ya lyublyu edinorogov'

transliterate('أنا أحب حيدات');
//=> 'ana ahb hydat'

transliterate('tôi yêu những chú kỳ lân');
//=> 'toi yeu nhung chu ky lan'
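
For the slugification use case mentioned in the introduction, a minimal sketch might look like the following. `slugify`, `toLatin`, and `standIn` are hypothetical names, not part of this package; in real use you would pass the imported `transliterate` function as `toLatin`. A trivial stand-in is used here so the snippet runs on its own.

```javascript
// Hypothetical helper, not part of @sindresorhus/transliterate.
// `toLatin` is any function that maps a Unicode string to Latin characters,
// for example the `transliterate` function imported from this package.
function slugify(text, toLatin) {
	return toLatin(text)
		.toLowerCase()
		// Collapse every run of non-alphanumeric characters into a single dash.
		.replace(/[^a-z0-9]+/g, '-')
		// Trim dashes left over at either end.
		.replace(/^-+|-+$/g, '');
}

// Stand-in transliterator for demonstration only: handles two German characters.
const standIn = string => string.replace(/ß/g, 'ss').replace(/ü/g, 'ue');

console.log(slugify('  Fuß & Tür!  ', standIn));
//=> 'fuss-tuer'
```

Taking the transliteration function as a parameter keeps the slug logic independent of which replacement table is used.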

API

transliterate(string, options?)

string

Type: string

String to transliterate.

options

Type: object

customReplacements

Type: Array<string[]>
Default: []

Add your own custom replacements.

The replacements are run on the original string before any other transformations.

This only overrides a default replacement if you set an item with the same key.

import transliterate from '@sindresorhus/transliterate';

transliterate('Я люблю единорогов', {
	customReplacements: [
		['единорогов', '🦄']
	]
});
//=> 'Ya lyublyu 🦄'
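
The ordering rule described above (custom replacements run on the original string first, and a custom item with the same key overrides the default) can be illustrated with a small self-contained sketch. This is not the package's actual implementation; `applyReplacements` and the two-entry `defaultReplacements` table are hypothetical stand-ins.

```javascript
// Illustrative only: a tiny stand-in for the built-in table.
const defaultReplacements = [
	['ä', 'ae'],
	['ß', 'ss'],
];

// Hypothetical helper: applies custom pairs first, then the remaining defaults.
function applyReplacements(string, customReplacements = []) {
	const customKeys = new Set(customReplacements.map(([key]) => key));
	const table = [
		...customReplacements,
		// Drop defaults whose key was overridden by a custom entry.
		...defaultReplacements.filter(([key]) => !customKeys.has(key)),
	];

	for (const [key, value] of table) {
		// split/join replaces every occurrence without regex or `$` special cases.
		string = string.split(key).join(value);
	}

	return string;
}

console.log(applyReplacements('Käse'));
//=> 'Kaese'
console.log(applyReplacements('Käse', [['ä', 'a']]));
//=> 'Kase'
console.log(applyReplacements('Käse & Brot', [['&', 'und']]));
//=> 'Kaese und Brot'
```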

Supported languages

Most major languages are supported.

This includes special handling for:

  • Arabic
  • Armenian
  • Czech
  • Danish
  • Dhivehi
  • Georgian
  • German (umlauts)
  • Greek
  • Hungarian
  • Latin
  • Latvian
  • Lithuanian
  • Macedonian
  • Pashto
  • Persian
  • Polish
  • Romanian
  • Russian
  • Serbian
  • Slovak
  • Swedish
  • Turkish
  • Ukrainian
  • Urdu
  • Vietnamese

However, Chinese is currently not supported.

transliterate's People

Contributors

alexxnb, dersimoezdag, hay, rhnorskov, richienb, silvandiepen, sindresorhus, vhpoet, yhdgms1

transliterate's Issues

Update list of supported languages to reflect which replacements are disabled by default

I wanted to use the Swedish replacements (actually Finnish, but for my purposes they are the same) and struggled for a while because I couldn't work out how to enable them. In the end I commented out the Latin and English entries, which conflicted with Swedish/Finnish.

If this is the intended way to change the transliteration, I suggest adding a note such as "disabled (commented out) by default in replacements.js" to the affected entries in the supported languages list in the readme. I can do this myself if the issue is this straightforward.

Add Sindhi language support

Please add Sindhi language support.
I am a native Sindhi speaker. Can I add support for it?

When Chinese characters are found, can we leave them as-is instead of clearing them out?

When Chinese characters are found, can we leave them as-is instead of clearing them out?

I'm asking because once they are cleared out, there is no way for me to apply a second filter to handle what this library cannot cover. If they are left as they are, I can apply another filter from a third-party library.

Thank you very much guys! It's a great library and keep up the great work!
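
One way to read this request: characters with no known mapping pass through unchanged so that a second tool can handle them. A minimal sketch of that behaviour follows, assuming a hypothetical `replaceKnown` helper and a tiny illustrative table; neither is part of this package.

```javascript
// Illustrative two-entry table, not the package's real data.
const table = new Map([
	['ü', 'ue'],
	['é', 'e'],
]);

// Hypothetical helper: replaces mapped characters and leaves everything else untouched.
function replaceKnown(string) {
	// Spreading iterates by code point, so characters outside the BMP are not split.
	return [...string].map(character => table.get(character) ?? character).join('');
}

console.log(replaceKnown('Müller 你好'));
//=> 'Mueller 你好'
```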

Language hint

Some languages have overlapping characters. To provide the most accurate result, we could accept a language hint and prefer that language when there's a conflict. You would still be able to use multiple languages in a string, but the hinted one would get priority. For example, passing sv-SE would prioritize the Swedish replacements.
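
A minimal sketch of how such a hint could work, assuming a hypothetical `transliterateWithHint` function and hypothetical per-locale override tables. None of this is an existing option of the package, and the mappings are an illustrative subset.

```javascript
// Shared defaults (illustrative subset): German-style umlaut handling.
const baseReplacements = new Map([
	['ä', 'ae'],
	['ö', 'oe'],
	['å', 'a'],
]);

// Hypothetical per-locale overrides that win when there is a conflict.
const localeOverrides = {
	'sv-SE': new Map([
		['ä', 'a'],
		['ö', 'o'],
	]),
};

function transliterateWithHint(string, locale) {
	// An unknown or missing locale falls back to the shared defaults only.
	const overrides = localeOverrides[locale] ?? new Map();

	return [...string]
		.map(character => overrides.get(character) ?? baseReplacements.get(character) ?? character)
		.join('');
}

console.log(transliterateWithHint('smörgåsbord'));
//=> 'smoergasbord'
console.log(transliterateWithHint('smörgåsbord', 'sv-SE'));
//=> 'smorgasbord'
```

Because the hint only changes lookup priority, characters from other languages in the same string still fall through to the shared table.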

Case mismatch compared with String.normalize("NFD") when decomposing diacritics into canonical unicode points

FWIW, I automated some comparative checks on this repository's mappings:
https://github.com/sindresorhus/transliterate/blob/master/replacements.js

...and I discovered the following case inconsistencies compared with String.normalize() and "NFD" (Canonical Decomposition):

  • [ 'Ş', 's' ] => S
  • [ 'Ğ', 'g' ] => G
  • [ 'İ', 'i' ] => I
  • [ 'Ķ', 'k' ] => K

To reproduce in the shell or JavaScript console:
node -e "console.log('Ş'.normalize('NFD').replace(/[\u0300-\u036f]/g, ''));"

Reference:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
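
The comparison described above could be automated along these lines. This is a sketch of the idea, not the script the reporter used; `stripDiacritics` and `caseMismatches` are names introduced here, and only the four reported pairs are checked.

```javascript
// The four pairs reported above, in the same [character, replacement] shape
// used by replacements.js.
const pairs = [
	['Ş', 's'],
	['Ğ', 'g'],
	['İ', 'i'],
	['Ķ', 'k'],
];

// Strip combining diacritical marks (U+0300 to U+036F) after canonical decomposition.
const stripDiacritics = string => string.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

// Keep pairs whose replacement differs from the NFD base letter only by case.
const caseMismatches = pairs.filter(([character, replacement]) => {
	const base = stripDiacritics(character);
	return base !== replacement && base.toLowerCase() === replacement.toLowerCase();
});

for (const [character, replacement] of caseMismatches) {
	console.log(`${character}: mapped to '${replacement}', NFD gives '${stripDiacritics(character)}'`);
}
```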

Swedish substitute list contains 1 extra letter

I was curious why Swedish was included but none of the other Scandinavian languages, and I spotted a letter I have never seen in Swedish: Ë. Wikipedia does not mention Swedish for this letter either (neither the English, Norwegian, nor Swedish version).

// Disabled as it conflicts with German and Latin.
// Swedish
// ['å', 'o'],
// ['Å', 'o'],
// ['ä', 'a'],
// ['Ä', 'A'],
// ['ë', 'e'], <-- does not exist in any Scandinavian language
// ['Ë', 'E'], <-- does not exist in any Scandinavian language
// ['ö', 'o'],
// ['Ö', 'O'],

I'm Danish, have lived in Sweden, and speak Swedish. I currently live in Norway and am picking up Norwegian as well.

If you want to support the rest of the Scandinavian languages, you are in luck: Norwegian and Danish share the same letters and the same transliteration to Latin letters, although we do not pronounce the Latin letters the same way.

Norwegian and Danish:

['æ', 'ae'],
['Æ', 'Ae'],
['ø', 'oe'],
['Ø', 'Oe'],
['å', 'aa'],
['Å', 'Aa'],

Error in Arabic transliteration? ("i" should be "a"?)

Shouldn't the i in fact be an a?

	['آ', 'a'],
	['أ', 'a'],

	['إ', 'i'],

	['ا', 'a'],

You can run the following tests in your shell or JavaScript console:

The base / non-accented "a":

["ا", "a"]

node -e "console.log('ا'.codePointAt(0).toString(16)); console.log('ا'.normalize('NFD').replace(/[\u0300-\u036f]/g, '').codePointAt(0).toString(16));"
=>

627
627

First diacritic variant:

["أ", "a"]

node -e "console.log('أ'.codePointAt(0).toString(16)); console.log('أ'.normalize('NFD').replace(/[\u0300-\u036f]/g, '').codePointAt(0).toString(16));"
=>

623
627

Second diacritic variant:

["آ", "a"]

node -e "console.log('آ'.codePointAt(0).toString(16)); console.log('آ'.normalize('NFD').replace(/[\u0300-\u036f]/g, '').codePointAt(0).toString(16));"
=>

622
627

Third diacritic variant:

["إ", "i"]

node -e "console.log('إ'.codePointAt(0).toString(16)); console.log('إ'.normalize('NFD').replace(/[\u0300-\u036f]/g, '').codePointAt(0).toString(16));"
=>

625
627

Why i, when the base code point is 627 (the same as for all the other variants), which maps to a?

https://github.com/sindresorhus/transliterate/blob/master/replacements.js
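
The four checks above can be condensed into one loop. This is a sketch rather than part of the original report; `letters` and `hex` are names introduced here. Note that the Arabic combining marks produced by NFD (U+0653 to U+0655) lie outside the U+0300 to U+036F range, so the regex in the commands above does not remove them; the base letter shows up only because `codePointAt(0)` reads the first code point.

```javascript
// Alef and its three hamza/madda variants from the issue above.
const letters = ['ا', 'أ', 'آ', 'إ'];

// Hex value of the first code point in a string.
const hex = string => string.codePointAt(0).toString(16);

for (const letter of letters) {
	// After NFD, the first code point is the base letter; any combining mark follows it.
	const decomposed = letter.normalize('NFD');
	console.log(`${hex(letter)} -> ${hex(decomposed)} (length ${decomposed.length})`);
}
```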
