
hyphenation's Introduction

hyphenation


Configurable Knuth-Liang hyphenation using the UTF-8 encoded hyphenation patterns provided by hyph-utf8.

Usage:

>>> hyphenate english_US "supercalifragilisticexpialadocious"
["su","per","cal","ifrag","ilis","tic","ex","pi","al","ado","cious"]
>>> hyphenate english_US "hyphenation"
["hy","phen","ation"]
>>> hyphenate icelandic "va\240lahei\240avegavinnuverkf\230rageymslusk\250r"
["va\240la","hei\240a","vega","vinnu","verk","f\230ra","geymslu","sk\250r"]

Contact Information

Contributions and bug reports are welcome!

Please feel free to contact me through GitHub or on the #haskell IRC channel on irc.freenode.net.

-Edward Kmett

hyphenation's People

Contributors

albertov, dnnx, ekmett, peti, phadej, ryanglscott, vshabanov

hyphenation's Issues

Single-character syllables in German

The values for defaultLeftMin and defaultRightMin are both set to 2. How, then, do the following (wrong) hyphenations come about?

Ord-nungs-fort-schrit-t
Aa-chen-er-s
Aa-dor-f
Aal-fan-g

Several things about the German hyphenation don't seem right to me. I don't know whether the hyph-utf8 patterns are at fault.
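
To check outputs like the ones above mechanically, a small predicate can help. This is a minimal sketch, assuming the usual reading of the two limits: at least leftMin characters before the first break and at least rightMin characters after the last one. The name respectsMins is hypothetical and not part of the package.

```haskell
-- Hypothetical helper, not part of the hyphenation package.
-- True when the first fragment has at least leftMin characters and the
-- last fragment has at least rightMin characters.
respectsMins :: Int -> Int -> [String] -> Bool
respectsMins leftMin rightMin frags@(firstFrag : _ : _) =
  length firstFrag >= leftMin && length (last frags) >= rightMin
-- Zero or one fragment means no break was inserted, so nothing to check.
respectsMins _ _ _ = True
```

Applied to ["Aa","dor","f"] with both limits at 2, the predicate returns False, because the trailing fragment has only one character.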

embed flag should be default true

I was just bitten by #3 in haskell/haskell-language-server#1976, where the entire HLS 1.2.0 release is missing the hyphenation data and thus fails at runtime. CI doesn't catch this, because CI builds everything from source and therefore has the files available.

I can empathize with not wanting to bloat binaries, but default behavior that silently fails at deployment time with no early warnings seems like an exceptionally bad choice.

hyphenation does not respect non-breaking space

Non-breaking spaces are not respected by hyphenate. Example:

λ> hyphenate english_US "the\x00a0table"
["the\160table"]

(expected: ["the ta", "ble"]).

Other repl experiments to check how hyphenate behaves with multiword input:

λ> hyphenate english_US "organge dolphin"
["or","gange ","dol","phin"]

λ> hyphenate english_US "the table"
["the table"]
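
Until the library handles this itself, one possible workaround is to hyphenate each NBSP-separated chunk on its own and then glue the neighbouring fragments back together, so that no break can fall at the non-breaking space. The sketch below takes the hyphenator as a plain function argument (for example hyphenate english_US); hyphenateKeepingNbsp and its local helpers are hypothetical names, not part of the package.

```haskell
-- Hypothetical workaround, not part of the hyphenation package.
-- Hyphenate each chunk between non-breaking spaces (U+00A0) separately,
-- then join the fragments on either side of each NBSP into one fragment.
hyphenateKeepingNbsp :: (String -> [String]) -> String -> [String]
hyphenateKeepingNbsp hyph =
  foldr1 glue . map (orEmpty . hyph) . splitOnNbsp
  where
    -- Split on U+00A0 only; always yields at least one chunk.
    splitOnNbsp s = case break (== '\160') s of
      (chunk, [])       -> [chunk]
      (chunk, _ : rest) -> chunk : splitOnNbsp rest

    -- Guard against a hyphenator that returns [] for an empty chunk.
    orEmpty [] = [""]
    orEmpty fs = fs

    -- Fuse the last fragment on the left with the first one on the right.
    glue xs (y : ys) = init xs ++ [last xs ++ "\160" ++ y] ++ ys
    glue xs []       = xs
```

With a hyphenator that splits "table" into ["ta","ble"] and leaves "the" whole, the input "the\160table" comes back as ["the\160ta","ble"], which is the shape the report expects.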

hyphenation is painful to deploy

When I build an executable that links to hyphenation on one machine and deploy it to a different machine, I have to create a directory tree in production that mimics the path where the executable was built on the dev machine, and copy the data files there. Otherwise I get errors in production like:

/home/alberto/src/myproject/.cabal-sandbox/share/x86_64-linux-ghc-7.8.3/hyphenation-0.4/hyph-es.chr.txt: openFile: does not exist (No such file or directory)

Ideally I would like to deploy self-contained executables or at least have a configurable data directory.

Perhaps some Template Haskell could embed those data files in the compiled library itself?

Add hyphenate breaks at hyphens

I recently used hyphenation to add soft hyphens to gwern.net so I could enable fully-justified text on desktop Chrome/Chromium browsers. (Bizarrely, for many years now, Chrome has had hyphenation on mobile Android but not desktop, and the devs have dragged their feet with the excuse that they just can't figure out how to ship dictionary files for desktop browsers; I gave up waiting for them to fix it.)

A user reported that on Safari browsers, Safari would line-break a hyphen-separated word like "compile-time" but there would be two hyphens: the original hyphen and then the smaller line-breaking hyphen, presumably the soft hyphen. Other browsers correctly ignore the soft hyphen and show only the regular hyphen if they need to line-break things like "compile-time". But why was there a soft hyphen there to begin with?

It turns out that hyphenation inserts soft hyphens even at existing hyphens!

> H.hyphenate H.english_US "Compile-time"
["Com","pile-","time"]

I would expect instead a breaking like ["Com", "pile-time"]. There is no need to insert a soft hyphen at the existing hyphen, since that is where a justification algorithm would break anyway. It can only cause problems, and in the case of Safari, does.

My current workaround is a post-processing hack to string-replace any hyphen+soft-hyphen present: Data.Text.replace "-\173" "-" etc. But this does other hyphenation users no good.
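
The same workaround can also be applied to the fragment list before any soft hyphens are inserted. A minimal sketch, operating purely on the [String] that hyphenate returns; mergeAtHyphens is a hypothetical name and not part of the package:

```haskell
-- Hypothetical post-processing step, not part of the hyphenation package.
-- Any fragment that already ends in a hard hyphen is fused with the
-- fragment that follows it, so no extra break point is created there.
mergeAtHyphens :: [String] -> [String]
mergeAtHyphens (x : y : rest)
  | not (null x) && last x == '-' = mergeAtHyphens ((x ++ y) : rest)
mergeAtHyphens (x : rest) = x : mergeAtHyphens rest
mergeAtHyphens []         = []
```

Applied to the output above, ["Com","pile-","time"] becomes ["Com","pile-time"].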


On an additional note, it would be nice to have a utility function which takes a String/Text and returns it with soft hyphens inserted. My current implementation goes like this, and it's complex enough that I'm not convinced I'm doing it right:

T.pack $ unwords $ map (intercalate "\173" . H.hyphenate H.english_US{H.hyphenatorLeftMin=3}) $ words $ T.unpack s

(I don't know how much the lack of a native Text version hurts, but I'm sure it does my compile-times no good, anyway.)
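
For what it's worth, a variant of the one-liner above that keeps the original whitespace intact (words/unwords collapses runs of spaces and newlines) might look like the following. It is only a sketch on String: the hyphenator is passed in as a function, for example H.hyphenate H.english_US{H.hyphenatorLeftMin=3}, and softHyphenate is a hypothetical name, not an existing function in the package.

```haskell
import Data.Char (isSpace)
import Data.Function (on)
import Data.List (groupBy, intercalate)

-- Hypothetical utility, not part of the hyphenation package.
-- Splits the input into alternating runs of whitespace and non-whitespace,
-- copies whitespace runs through unchanged, and joins the fragments of
-- every other run with U+00AD (soft hyphen).
softHyphenate :: (String -> [String]) -> String -> String
softHyphenate hyph = concatMap go . groupBy ((==) `on` isSpace)
  where
    go run@(c : _)
      | isSpace c = run
    go run = intercalate "\173" (hyph run)
```

A Text wrapper would then be T.pack . softHyphenate hyph . T.unpack, at the cost of the same round trip through String.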

Words are hyphenated incorrectly

First of all, I'm not sure whether this is a problem with the library or with the data files from the TeX project. It seemed more reasonable to post the issue here first, and then to the hyph-utf8 mailing list if the data files turn out to be at fault.

Examples:

-- Names
hyphenate lithuanian "Darius"
["Da","rius"] -- correct
hyphenate lithuanian "Jonas"
["Jonas"] -- should be ["Jo","nas"]
hyphenate lithuanian "Auksė"
["Auks\279"] -- should be ["Auk","s\279"]
-- Nouns
hyphenate lithuanian "Bananas"
["Ba","nanas"] -- should be ["Ba","na","nas"]
hyphenate lithuanian "Stalas"
["Stalas"] -- should be ["Sta","las"]
-- Verbs
hyphenate lithuanian "Bėgti"
["B\279gti"] -- should be ["B\279g","ti"]
hyphenate lithuanian "Nebeprisikiškiakopūsteliaudavome"
["Ne","be","pri","si","ki\353","kia","ko","p\363s","te","liau","da","vome"]
-- should be ["Ne","be","pri","si","ki\353","kia","ko","p\363s","te","liau","da","vo","me"]
