
locale-index-of's People

Contributors

arty-name, dependabot[bot], snyk-bot

Forkers

ragaeeb

locale-index-of's Issues

Handling of surrogate pairs when Intl.Segmenter isn't available

Sorry to trouble you, again.

Currently, when Intl.Segmenter isn't available, the library falls back to the following code to segment the string:

locale-index-of/index.js

Lines 58 to 65 in cb7f400

*segment(string) {
  const { length } = string;
  // have to use that instead of `for segment of string` because we need index of chars, not code points
  for (let index = 0; index < length; index += 1) {
    const segment = string[index];
    yield { segment, index };
  }
}

I just learned, however, that this won't handle surrogate pairs properly. For example,

[...segmenter.segment('𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡')]

Expected:

[
  { segment: '𝟘', index: 0 },
  { segment: '𝟙', index: 2 },
  { segment: '𝟚', index: 4 },
  { segment: '𝟛', index: 6 },
  { segment: '𝟜', index: 8 },
  { segment: '𝟝', index: 10 },
  { segment: '𝟞', index: 12 },
  { segment: '𝟟', index: 14 },
  { segment: '𝟠', index: 16 },
  { segment: '𝟡', index: 18 }
]

Actual:

[
  { segment: '\ud835', index: 0 },
  { segment: '\udfd8', index: 1 },
  { segment: '\ud835', index: 2 },
  { segment: '\udfd9', index: 3 },
  { segment: '\ud835', index: 4 },
  { segment: '\udfda', index: 5 },
  { segment: '\ud835', index: 6 },
  { segment: '\udfdb', index: 7 },
  { segment: '\ud835', index: 8 },
  { segment: '\udfdc', index: 9 },
  { segment: '\ud835', index: 10 },
  { segment: '\udfdd', index: 11 },
  { segment: '\ud835', index: 12 },
  { segment: '\udfde', index: 13 },
  { segment: '\ud835', index: 14 },
  { segment: '\udfdf', index: 15 },
  { segment: '\ud835', index: 16 },
  { segment: '\udfe0', index: 17 },
  { segment: '\ud835', index: 18 },
  { segment: '\udfe1', index: 19 }
]

The length and index of the string refer to code units, not code points. The iterator is actually the preferred method in almost all cases, as one doesn't normally want to deal with code units:

const segmenter = {
  *segment(string) {
    let index = 0;
    for (const segment of string) {
      yield { segment, index };
      index += segment.length;
    }
  }
}

In the case of string searching, I think it's actually fine to split at code units. It should still be able to find a correct match.

But since it's a very easy fix I don't think there's any reason not to handle it properly, unless one wants to support IE, which doesn't support String.prototype[@@iterator]()...
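For environments without the string iterator, a surrogate-aware fallback can be sketched by checking code units by hand. This is only an illustration: `segmentByCodePoint` is a hypothetical name, not part of the library, and it returns an array rather than a generator so that it avoids generator syntax as well.

```javascript
// Hypothetical fallback (not part of locale-index-of): splits a string into
// code points without relying on String.prototype[Symbol.iterator].
function segmentByCodePoint(string) {
  var segments = [];
  var index = 0;
  while (index < string.length) {
    var first = string.charCodeAt(index);
    var size = 1;
    // A high surrogate (0xD800-0xDBFF) followed by a low surrogate
    // (0xDC00-0xDFFF) forms one code point spanning two code units.
    if (first >= 0xd800 && first <= 0xdbff && index + 1 < string.length) {
      var second = string.charCodeAt(index + 1);
      if (second >= 0xdc00 && second <= 0xdfff) {
        size = 2;
      }
    }
    segments.push({ segment: string.slice(index, index + size), index: index });
    index += size;
  }
  return segments;
}

// 'a' at index 0, '𝟘' at index 1 (two code units), 'b' at index 3
console.log(segmentByCodePoint('a𝟘b'));
```

A lone surrogate is kept as its own one-unit segment, so the indices still line up with the original string.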

An in-range update of tape is breaking the build 🚨

The devDependency tape was updated from 4.11.0 to 4.12.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

tape is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

Commits

The new version differs by 9 commits.

  • 42c84d6 v4.12.0
  • 3e0a341 [Deps] update is-regex, string.prototype.trim
  • ba7e2b2 [Dev Deps] update eslint
  • f3a5925 [Tests] use shared travis-ci configs
  • 6e94800 [Deps] update deep-equal, glob, object-inspect, resolve, string.prototype.trim
  • 8150c3b [Refactor] use is-regex instead of instanceof RegExp
  • 24487cb add tap-nyc to pretty-reporters
  • c283615 [New] when the error type is wrong, show the message and stack
  • 44cbbf5 [Tests] add a test for the wrong kind of error

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

`localeIndexOf` does not work on multiple-char letters

In some locales, e.g. sk-SK, there are letters that consist of more than one character: ch is a single letter in Slovak, so calling

const shouldNotFindAnything = localeIndexOf('ch', 'h', 'sk-SK');

should return -1. If we use e.g.

"ch".localeCompare("h", "sk-SK");

we get 1, which shows that collation in localeCompare works fine (for en it first compares "c" with "h" and returns -1).
Another test case that you can use to fix the problem is:

const shouldBeSeven = localeIndexOf('chodit hore', 'ho', 'sk-SK');

The first h, at position 1, should be ignored because it is part of ch. Expected result: 7. Current result: 1.

The cause appears to be Intl.Segmenter: with grapheme granularity it does not treat locale-specific multi-character letters as a single segment. A fix would require either filing an issue against the segmenter or using some other workaround function.

const segmenterSk = new Intl.Segmenter('sk-SK', { granularity: 'grapheme' });
Array.from(segmenterSk.segment("chh"));
// [
//   { segment: 'c', index: 0, input: 'chh' },
//   { segment: 'h', index: 1, input: 'chh' },
//   { segment: 'h', index: 2, input: 'chh' }
// ]
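One possible shape for such a workaround is a post-processing step that merges adjacent segments when they form a known digraph. The sketch below is an assumption, not the library's behavior: `mergeDigraphs` and `toSegments` are hypothetical helpers, and the digraph list has to be supplied by the caller because Intl does not expose one.

```javascript
// Hypothetical helper: merges adjacent segments that together form a digraph.
// `digraphs` is an assumed, caller-supplied list (e.g. ['ch'] for Slovak).
function mergeDigraphs(segments, digraphs) {
  const known = new Set(digraphs.map((d) => d.toLowerCase()));
  const merged = [];
  for (let i = 0; i < segments.length; i += 1) {
    const current = segments[i];
    const next = segments[i + 1];
    if (next && known.has((current.segment + next.segment).toLowerCase())) {
      merged.push({ segment: current.segment + next.segment, index: current.index });
      i += 1; // skip the segment that was absorbed into the digraph
    } else {
      merged.push({ segment: current.segment, index: current.index });
    }
  }
  return merged;
}

// Plain per-code-point segments, so the sketch does not depend on Intl.Segmenter.
function toSegments(string) {
  let index = 0;
  return Array.from(string, (segment) => {
    const entry = { segment, index };
    index += segment.length;
    return entry;
  });
}

const merged = mergeDigraphs(toSegments('chodit hore'), ['ch']);
// 'ch' is now a single segment at index 0; the first standalone 'h' is at index 7
console.log(merged.find((s) => s.segment === 'h').index);
```

The merge is greedy and matches case-insensitively with `toLowerCase()`, which is a simplification; a real fix would need a per-locale digraph table and a decision on how matches inside a digraph should be reported.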

Support decomposed forms of strings

It seems that it does not currently handle decomposed forms. Example:

const collator = new Intl.Collator('fr', { sensitivity: 'base' })
indexOf(collator, 'caf\u0065\u0301 au lait, caf\u00e9 au lait', 'cafe au lait')
// expected: 0
// actual: 15
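The mismatch is easier to see by comparing the two spellings directly. The snippet below only demonstrates the length difference; it assumes, without having checked the implementation, that the search compares slices whose length is taken from the substring.

```javascript
const decomposed = 'caf\u0065\u0301'; // "e" followed by U+0301 COMBINING ACUTE ACCENT
const precomposed = 'caf\u00e9'; // single code point U+00E9

console.log(decomposed.length); // 5
console.log(precomposed.length); // 4
console.log(decomposed === precomposed); // false
console.log(decomposed.normalize('NFC') === precomposed); // true
```

Normalizing both strings to NFC before searching would make the two spellings identical, but the returned index would then refer to the normalized string rather than the original input.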

A simple solution would be to use Intl.Segmenter to align the strings:

const indexOf_ = (segmenter, collator, string, substring) => {
    const substringLength = Array.from(segmenter.segment(substring)).length
    const segments = Array.from(segmenter.segment(string))

    for (let i = 0; i <= segments.length - substringLength; i++) {
        const potentialMatch = segments
            .slice(i, i + substringLength)
            .map(x => x.segment).join('')

        if (collator.compare(potentialMatch, substring) === 0) {
            return segments[i].index
        }
    }
    return -1
}

const segmenter = new Intl.Segmenter('fr', { granularity: 'grapheme' })
const collator = new Intl.Collator('fr', { sensitivity: 'base' })
indexOf_(segmenter, collator, 'caf\u0065\u0301 au lait, caf\u00e9 au lait', 'cafe au lait')
// => 0
