wicg / handwriting-recognition

Handwriting Recognition Web API Proposal

Home Page: https://wicg.github.io/handwriting-recognition/

License: Other

Makefile 100.00%

handwriting-recognition shipping-chromium

handwriting-recognition's Introduction

Handwriting Recognition API

Welcome to the WICG repository for the Handwriting Recognition API. This API aims to bring advanced handwriting recognition (e.g. the ones used by many handwriting input methods) capabilities to the Web Platform to enable seamless and integrated user experiences.

Contributing

Please read CONTRIBUTING.md.

Feedback and suggestions are welcome and important, in the form of GitHub issues or pull requests.

handwriting-recognition's Issues

Language fallbacks

https://github.com/WICG/handwriting-recognition/blob/main/explainer.md#recognition-hints

If there's no dedicated models for that language tag, the recognizer falls back to the macro language (zh-CN becomes zh). If the macro language is not supported, the recognizer fall back to the default language of the browser (i.e. navigator.language).

What if the browser default language is set to something that the recogniser cannot deal with?

I suspect that, like for hyphens in CSS, it must be required that a language be selected for this to work, and the selection would be from a list of languages supported by the recogniser.
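The fallback chain quoted above, plus the "nothing matches" case raised here, can be sketched as a small pure function. This is only an illustration: `resolveRecognizerLanguage` and its parameters are hypothetical names, not part of the proposal, and treating the primary language subtag as the macro language is an assumption.

```javascript
// Hypothetical helper: walks requested tag -> its primary subtag ->
// browser default -> its primary subtag, and returns null when none is supported.
function resolveRecognizerLanguage(requested, supported, browserDefault) {
  // Map lower-cased tags to the spelling used in the supported list.
  const byLowerCase = new Map(supported.map((tag) => [tag.toLowerCase(), tag]));
  const candidates = [];
  for (const tag of [requested, browserDefault]) {
    if (!tag) continue;
    const lower = tag.toLowerCase();
    candidates.push(lower);
    candidates.push(lower.split("-")[0]); // e.g. "zh-cn" -> "zh"
  }
  for (const candidate of candidates) {
    if (byLowerCase.has(candidate)) return byLowerCase.get(candidate);
  }
  // Nothing usable: the caller would have to ask the user to pick a language.
  return null;
}

console.log(resolveRecognizerLanguage("zh-CN", ["en", "zh"], "en-US")); // "zh"
console.log(resolveRecognizerLanguage("ta-IN", ["en", "zh"], "mn-MN")); // null
```

Returning `null` rather than silently picking a language makes the unsupported case explicit, which is the situation this issue asks about.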

Definition of grapheme cluster

https://github.com/WICG/handwriting-recognition/blob/main/explainer.md#recognition-hints

each string represents a grapheme cluster (a user-visible character)

A user can see all visible characters, but the grapheme (to which the grapheme cluster attempts an approximation) is a user-perceived unit of the orthography, and usually specific to a given editing operation.

You may do better here to indicate that each string contains a 'grapheme' (a user-perceived unit of the orthography), which may also correspond to a Unicode 'grapheme cluster'.

Note that Unicode grapheme clusters don't cover all user-perceived graphemes, esp. in many Brahmi-derived scripts.

It would certainly be useful to consider some character groupings as units, eg. Tamil கு (ku) since it's hard to separate the constituents. Whether it's necessary or desirable to treat Balinese ᬓ᭄ᬱᭀ as a single unit I'm not so sure.

Note however that the Balinese, like many complex scripts, will require recognised glyphs to be paired and reordered to compose the actual character sequence (the first and last glyphs above are a single unicode code point).
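For reference, Unicode grapheme clusters (as opposed to user-perceived graphemes) can be inspected with the standard `Intl.Segmenter` API. The sketch below only shows the Tamil case mentioned above; `graphemeClusters` is a name chosen for illustration, and the result for conjunct sequences in other scripts may depend on the Unicode data shipped with the engine.

```javascript
// Split text into Unicode extended grapheme clusters.
function graphemeClusters(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

const tamilKu = "\u0B95\u0BC1"; // Tamil letter KA followed by vowel sign U

console.log([...tamilKu].length);              // 2 code points
console.log(graphemeClusters(tamilKu).length); // 1 grapheme cluster
```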

hth

Consider using DOMHighResTimeStamp instead of DOMTimeStamp

We're considering our options RE DOMTimeStamp, and wondering what it's used for. It seems less well-defined than DOMHighResTimeStamp. Would y'all consider switching over?

See whatwg/webidl#2 for discussion.
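As a rough illustration of what the switch could look like on the page side: `performance.now()` already returns a `DOMHighResTimeStamp`, so point times could be taken from it directly. The recorder below is a hypothetical helper, not part of the proposal, and storing times relative to the first point is an assumption.

```javascript
// Hypothetical helper: stamps each point with a high-resolution time
// measured from the first point of the stroke.
function createStrokeRecorder(now = () => performance.now()) {
  const stroke = [];
  let start = null;
  return {
    stroke,
    addPoint(x, y) {
      const timestamp = now();
      if (start === null) start = timestamp;
      stroke.push({ x, y, t: timestamp - start });
    },
  };
}

const recorder = createStrokeRecorder();
recorder.addPoint(84, 34);
recorder.addPoint(93, 54);
console.log(recorder.stroke.length); // 2
```

The clock is injectable so the helper can be exercised with a fake time source.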

Incubation status

Hi! Just checking the status of this incubation, as there has been limited activity for the last few years.

Has the incubation stalled or is this something that is still being pursued?

If it's stalled then let us know and we can archive the repo (we can always unarchive it later if there is renewed interest).

Handling confusable characters

https://github.com/WICG/handwriting-recognition/blob/main/explainer.md#the-prediction-result

The prediction result may contain (if the implementation choose to support):

alternatives: A list of JavaScript objects, where each object has a text field. These are the next best predictions (alternatives), in decreasing confidence order. Up to a maximum of alternatives (given in hints) strings. For example, the first string is the second best prediction (the best being the prediction result).

I suspect that for some languages the may will be a must, since many orthographies use the same glyphs to represent more than one semantic, eg. some use letters as numbers, some have indistinguishable glyphs for different underlying code points or sequences (eg. par excellence Mongolian, but also many others), and some have glyph shapes that can be ambiguous if not drawn carefully because the differences are very small.

The model may not require alternatives for certain languages, but should not preclude them for others (which may not yet be supported).
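A minimal sketch of how a page might consume such a result while tolerating implementations that omit `alternatives`. The result shape (`text` plus an optional `alternatives` array of `{ text }` objects) is taken from the quoted explainer text; `rankedCandidates` is a hypothetical name.

```javascript
// Returns the best prediction followed by any alternatives,
// in decreasing confidence order.
function rankedCandidates(prediction) {
  const alternatives = prediction.alternatives ?? [];
  return [prediction.text, ...alternatives.map((alt) => alt.text)];
}

// Confusable shapes: lowercase L, digit one, uppercase I.
const ambiguous = { text: "l", alternatives: [{ text: "1" }, { text: "I" }] };
console.log(rankedCandidates(ambiguous));     // [ 'l', '1', 'I' ]
console.log(rankedCandidates({ text: "l" })); // [ 'l' ]
```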

Use proper BCP 47 language tags for Chinese

https://github.com/WICG/handwriting-recognition/blob/main/explainer.md

Languages are identified by IETF BCP 47 language tags (e.g. en, zh-CN). If there's no dedicated models for that language tag, the recognizer falls back to the macro language (zh-CN becomes zh).

zh-CN is presumably meant to indicate Simplified Chinese, which is also used in Singapore. That's why it is better to use zh-Hans as the language tag, rather than zh-CN (and zh-Hant, rather than zh-TW).

Please change the example.
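The relationship between region tags and script tags can be checked with the standard `Intl.Locale` API, whose `maximize()` method adds likely subtags. `likelyScript` is just an illustrative wrapper.

```javascript
// Returns the script subtag that the likely-subtags data associates with a tag.
function likelyScript(tag) {
  return new Intl.Locale(tag).maximize().script;
}

console.log(likelyScript("zh-CN")); // "Hans"
console.log(likelyScript("zh-SG")); // "Hans"
console.log(likelyScript("zh-TW")); // "Hant"
```

Both zh-CN and zh-SG resolve to the Hans script, which is why the script-based tag describes what a recognizer model needs more directly than a region-based one.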

Text segmentation will vary by language

https://github.com/WICG/handwriting-recognition/blob/main/explainer.md#the-prediction-result

segmentationResult: [TODO] Come up with a way to represent text segmentation.

Just a reminder that segmentation strategies can be very different across languages.

Some scripts don't separate words (at all), some do so with special wordspace characters, rather than spaces. Some don't have sentence punctuation, but separate phrases with gaps, or use punctuation in somewhat different ways, some not only don't separate words but also combine letters at the end+start of a word, etc.

So some flexibility will be needed, and it's really important to avoid the trap of relying on spaces to indicate segmentation boundaries.
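To illustrate the trap, the sketch below compares splitting on spaces with locale-aware word segmentation from the standard `Intl.Segmenter` API. `wordSegments` is a name chosen for illustration, and the exact Thai boundaries depend on the engine's dictionary data, so no particular split is claimed here.

```javascript
// Locale-aware word segmentation, keeping only word-like segments.
function wordSegments(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text))
    .filter((part) => part.isWordLike)
    .map((part) => part.segment);
}

const thai = "สวัสดีครับ"; // Thai greeting, written without spaces

// Splitting on spaces sees a single chunk, whatever the word structure is.
console.log(thai.split(" ").length); // 1

// The segmenter uses language-specific rules instead of spaces.
console.log(wordSegments(thai, "th"));
console.log(wordSegments("hello world", "en")); // [ 'hello', 'world' ]
```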

TypeScript Definitions

If it's helpful for further development, I've created TypeScript definition files based on the current WebIDLs here: https://github.com/christianliebel/handwriting-textarea/blob/main/handwriting-recognition.d.ts

Just wanted to bring this to your attention, feel free to close this issue right away. 😇

Consider an API with fewer mutable classes

Hey there,

From the perspective of idiomatic JavaScript, I was wondering why the API has so many classes and custom add/remove/clear methods, instead of using objects and arrays.

That is, instead of the example at https://github.com/WICG/handwriting-recognition/blob/main/explainer.md#perform-recognition, I would have expected something like this:

// Create a new stroke. It's a plain JS array.
const stroke = [];

// Add a point.
const point = { x: 84, y: 34, t: 959 };

// The point dictionary is added to the stroke array
stroke.push(point)

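// Modifying a point added to a stroke does have an effect,
// because the array holds a reference to the same object.
const newX = 90;    // Any new value; 90 is only illustrative.
point.x = newX;    // This does work.
stroke[0].x = newX;    // This also works.

// The point's value has changed.
stroke[0].x === 84    // => false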

// Point's time attribute is optional.
stroke.push({ x: 93, y: 54 })

// Create a new drawing. It's a JS array of strokes.
const drawing = [stroke];

// Add more points to the stroke.
stroke.push({ x: 93, y: 54, t: 1013 });

// Get predictions of the partial drawing.
// This will take into account both points that were added to the stroke.
await handwriting.getPrediction(drawing);

// The returned value is the same as for the original drawing.getPrediction() API.

// Add a new stroke.
const stroke2 = []
stroke2.push({x: 160, y: 39, t: 1761});
drawing.push(stroke2);

// Get all strokes:
drawing
// => [stroke, stroke2]

// Delete a previous stroke.
drawing.splice(0, 1);

// Get a new prediction.
await handwriting.getPrediction(drawing)

// No need to free up resources, since it's just a JS array of objects, which the GC handles normally.

I'm not sure why the current API has so many mutable classes and extra copies. All the actual work seems to happen in the async drawing.getPrediction() (or, in my version, handwriting.getPrediction(drawing)). So transforming that data into HandwritingStroke and HandwritingDrawing classes seems like extra work.

Are there implementation reasons why these mutable classes are necessary? If so, it'd be good to be clear about them in the explainer, and especially to explain why they are necessary in all implementations and not just a Chromium implementation limitation. I found https://github.com/WICG/handwriting-recognition/blob/main/explainer.md#alternative-api-design which maaaybe explains why a HandwritingDrawing class is necessary (although I had to read a lot between the lines; I'm guessing the idea is that you have some sort of side table mapping earlier versions of the drawing to recognition results, and then reuse those partial results in future calls to getPrediction()? And that's impossible to do based on normal JS objects---for some reason?). But I'm not sure it explains why the HandwritingStroke class is necessary.
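To make the side-table guess concrete, here is one way such a cache could look with plain arrays. This is purely a sketch of the speculation above, not how any implementation is known to work; `strokeCache`, `analyzeStroke`, and the `analyze` callback are hypothetical names.

```javascript
// Side table keyed by the stroke array itself; entries can be collected
// together with the stroke.
const strokeCache = new WeakMap();

// `analyze` stands in for whatever per-stroke work a recognizer would do.
function analyzeStroke(stroke, analyze) {
  const cached = strokeCache.get(stroke);
  // Reuse the earlier result only if no points were added or removed.
  if (cached && cached.pointCount === stroke.length) return cached.value;
  const value = analyze(stroke);
  strokeCache.set(stroke, { pointCount: stroke.length, value });
  return value;
}

let analyzeCalls = 0;
const countPoints = (s) => { analyzeCalls += 1; return s.length; };

const cachedStroke = [{ x: 84, y: 34, t: 959 }];
analyzeStroke(cachedStroke, countPoints); // computed
analyzeStroke(cachedStroke, countPoints); // served from the side table
cachedStroke.push({ x: 93, y: 54, t: 1013 });
analyzeStroke(cachedStroke, countPoints); // recomputed: length changed
cachedStroke[0].x = 90;
analyzeStroke(cachedStroke, countPoints); // stale: in-place edit not detected
console.log(analyzeCalls); // 2
```

The last call shows the weak spot: an in-place edit to a point leaves the length unchanged, so the cached result is reused. That may be one reason an implementation would want to copy points into objects it controls, though the explainer would need to confirm it.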

Dev interest in using Handwriting Recognition API

Hi web developers,

We would like to gauge interest in using this new Handwriting Recognition API, as described in the explainer.

If you are interested in potentially using this new API, please reply to this thread, including your affiliation to a company or organisation or app, if any.

Thanks!

Text direction needs to be taken into account

Not only will the recogniser need to take into account the language, but it will be unable to decipher the text unless it knows whether the glyphs it recognises proceed right-to-left, left-to-right, or vertically top-to-bottom with lines stacked LTR or RTL.

This includes orthographies that are generally written in one direction, but that have embedded text that runs in the opposite direction, and sometimes embedded text within that.

To some extent the recogniser will be able to apply the Unicode bidi algorithm to reverse engineer the logical character sequence, but in other bidirectional cases this will not be sufficient. Also it would probably be beneficial to indicate for the recogniser the overall scanning direction for the text being entered, for which it may be useful to apply a directional label, in a similar way to how one does this for language.
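A directional label of the kind suggested above could travel alongside the language tags. The sketch below is hypothetical throughout: the `direction` field, the `buildHints` helper, and the `languages` field name are assumptions, and the direction values are borrowed from CSS `direction` and `writing-mode` keywords.

```javascript
// Hypothetical set of overall scanning directions.
const KNOWN_DIRECTIONS = new Set(["ltr", "rtl", "vertical-rl", "vertical-lr"]);

// Builds a hints object that labels the overall direction of the input,
// in the same way a language tag labels its language.
function buildHints(languages, direction) {
  if (!KNOWN_DIRECTIONS.has(direction)) {
    throw new RangeError(`Unknown text direction: ${direction}`);
  }
  return { languages, direction };
}

console.log(buildHints(["ar"], "rtl"));         // { languages: [ 'ar' ], direction: 'rtl' }
console.log(buildHints(["ja"], "vertical-rl")); // { languages: [ 'ja' ], direction: 'vertical-rl' }
```

A single label only covers the base direction; embedded runs in the opposite direction would still need to be resolved by the recogniser.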