typeset's Introduction

TeX line breaking algorithm in JavaScript

This is an implementation of the Knuth and Plass line breaking algorithm using JavaScript. The goal of this project is to optimally set justified text in the browser, and ultimately provide a library for various line breaking algorithms in JavaScript.

The paragraph below is set using a JavaScript implementation of the classic Knuth and Plass algorithm as used in TeX. The numbers on the right of each line are the stretching or shrinking ratio compared to the optimal line width. This example uses a default space of 1/3 em, with a stretchability and shrink-ability of 1/6 em and 1/9 em respectively.
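The ratio shown beside each line is the adjustment ratio: the fraction of the line's available stretch (positive) or shrink (negative) that is used to reach the target width. Below is a minimal sketch of that calculation. The function name and the 18 px em are assumptions chosen for illustration, not values taken from this project.

```javascript
// Adjustment ratio of a single line. All names here are illustrative;
// they are not taken from the typeset source.
function adjustmentRatio(naturalWidth, targetWidth, totalStretch, totalShrink) {
  if (naturalWidth === targetWidth) return 0;
  if (naturalWidth < targetWidth) {
    // The line is too short, so its spaces must stretch.
    return totalStretch > 0 ? (targetWidth - naturalWidth) / totalStretch : Infinity;
  }
  // The line is too long, so its spaces must shrink.
  return totalShrink > 0 ? (targetWidth - naturalWidth) / totalShrink : -Infinity;
}

// Space parameters from the text, assuming an em of 18 px.
const em = 18;
const space = { width: em / 3, stretch: em / 6, shrink: em / 9 }; // 6, 3, 2

// A line with 10 spaces whose natural width is 15 px short of a 400 px measure.
const ratio = adjustmentRatio(385, 400, 10 * space.stretch, 10 * space.shrink);
console.log(ratio); // 0.5
```

A ratio of 0 means the spaces keep their default width, and a ratio below -1 would mean shrinking the spaces past their limit.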

The following paragraph is set by a browser using text-align: justify. Notice the lines in the paragraph have, on average, greater inter-word spacing than the Knuth and Plass version, which is successful at minimizing the inter-word spacing over all lines.

The browser also ends up with ten lines instead of the nine lines found by the Knuth and Plass line breaking algorithm. This comparison might not be completely fair, since we don't know the default inter-word space used by the browser (nor its stretching and shrinking parameters). Experimental results, however, indicate that the values used in most browsers are either identical or very similar. The next section explains how the ratio values for the browser were calculated.

Measuring the quality of browser line breaks

Unfortunately there is no API to retrieve the positions of the line breaks the browser inserted, so we'll have to resort to some trickery. By wrapping each word in an invisible <span> element and retrieving its y position we can find out when a new line starts. If the y position of the current word is different from the previous word we know a new line has started. This way a paragraph is split up in several individual lines.
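The grouping step can be sketched as a small pure function. It assumes each word has already been measured into a `{ text, top }` record; in a browser, `top` could come from `getBoundingClientRect().top` on each wrapping `<span>`. The function and field names are hypothetical.

```javascript
// Group measured words into lines by comparing vertical positions.
// `words` is an array of { text, top } records (illustrative shape).
function splitIntoLines(words) {
  const lines = [];
  let previousTop = null;
  for (const word of words) {
    if (previousTop === null || word.top !== previousTop) {
      lines.push([]); // the y position changed, so a new line has started
    }
    lines[lines.length - 1].push(word.text);
    previousTop = word.top;
  }
  return lines;
}

const measured = [
  { text: "The", top: 0 }, { text: "quick", top: 0 },
  { text: "brown", top: 20 }, { text: "fox", top: 20 },
  { text: "jumps", top: 40 }
];
console.log(splitIntoLines(measured));
// [ [ 'The', 'quick' ], [ 'brown', 'fox' ], [ 'jumps' ] ]
```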

The ratios are then calculated by measuring the difference between the width of each line when text-align is set to justify and when it is set to left. This difference is then divided by the amount of stretchability of the line: i.e. the number of spaces multiplied by the stretch width for spaces. Although we don't know the actual stretchability we can use 1/6 em, just like the Knuth and Plass algorithm, if we only use it for comparison.
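As a sketch of this calculation, with hypothetical names and an assumed 18 px em (so a 1/6 em stretch is 3 px):

```javascript
// Ratio for one browser-set line: the extra width introduced by
// justification, divided by the line's assumed stretchability.
function browserRatio(justifiedWidth, leftWidth, spaceCount, stretchPerSpace) {
  const stretchability = spaceCount * stretchPerSpace;
  if (stretchability === 0) return 0; // a single-word line has nothing to stretch
  return (justifiedWidth - leftWidth) / stretchability;
}

const em = 18;                // assumed font size in pixels
const stretchPerSpace = em / 6; // 1/6 em, the same value used for Knuth and Plass

// A line that is 382 px wide when left aligned, 400 px when justified,
// and contains 8 spaces.
console.log(browserRatio(400, 382, 8, stretchPerSpace)); // 0.75
```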

Assisted browser line breaks

The line breaking algorithm can also be used to correct the line breaks made by the browser. The easiest way to do this is to split a text up into lines and adjust the CSS word-spacing property. Unfortunately, WebKit-based browsers do not support sub-pixel word-spacing. Alternatively, we can absolutely position each word or split the line into segments with integer word spacing. You can see the latter approach in action on the Flatland line breaking example.
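One way to arrive at integer word spacing is to spread the whole-pixel adjustment of a line over its gaps so that the values differ by at most one pixel. The sketch below illustrates that idea only; it is not the code used in the Flatland example, and the function name is made up.

```javascript
// Distribute a line's total spacing adjustment (in pixels) over its word
// gaps using integers only. Gaps that share a value can then be wrapped
// in one segment with a single word-spacing setting.
function integerWordSpacing(extraPixels, gapCount) {
  if (gapCount <= 0) return [];
  const total = Math.round(extraPixels);
  const base = Math.floor(total / gapCount);
  const remainder = total - base * gapCount; // always 0 <= remainder < gapCount
  const spacing = [];
  for (let i = 0; i < gapCount; i++) {
    spacing.push(i < remainder ? base + 1 : base);
  }
  return spacing;
}

console.log(integerWordSpacing(7, 5));  // [ 2, 2, 1, 1, 1 ]
console.log(integerWordSpacing(-7, 5)); // [ -1, -1, -1, -2, -2 ]
```

In both cases the values sum to the requested adjustment, so the line still ends exactly at the margin.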

Examples

The line breaking algorithm is not only capable of justifying text, it can perform all sorts of alignment with an appropriate selection of boxes, glue and penalties. It is also possible to give it varying line widths to flow text around illustrations, asides or quotes. Alternatively, varying line widths can be used to create interesting text shapes as demonstrated below.

Ragged right and centered alignment

The following example is set ragged right. Ragged right is not simply justified text with fixed width inter-word spacing. Instead the algorithm tries to minimize the amount of white space at the end of each line over the whole paragraph. It also attempts to reduce the number of words that are "sticking out" of the margin.
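The difference between alignments comes down to which nodes are placed between the words. The sketch below builds the node list for justified and for ragged-right text. The constructor names and node shapes are illustrative, and the ragged-right glue (a stretch of three space widths that is cancelled if no break occurs) follows the construction in the Knuth and Plass paper as best I can reproduce it, so treat the exact constants as an assumption.

```javascript
// Node constructors for the box/glue/penalty model (illustrative names).
const box = (width, value) => ({ type: "box", width, value });
const glue = (width, stretch, shrink) => ({ type: "glue", width, stretch, shrink });
const penalty = (width, cost, flagged) => ({ type: "penalty", width, penalty: cost, flagged });

// Turn words into nodes. `measure` returns the width of a string; it is
// passed in so the sketch runs without a browser.
function buildNodes(words, measure, align) {
  const space = measure(" ");
  const nodes = [];
  words.forEach((word, i) => {
    nodes.push(box(measure(word), word));
    if (i === words.length - 1) {
      // Finishing glue and a forced break end the paragraph.
      nodes.push(glue(0, Infinity, 0), penalty(0, -Infinity, 1));
    } else if (align === "justify") {
      // Space of 1/3 em that stretches by 1/6 em and shrinks by 1/9 em.
      nodes.push(glue(space, space / 2, space / 3));
    } else {
      // Ragged right: the stretch sits before the break point, and the
      // second glue cancels it when no break is taken there.
      nodes.push(glue(0, 3 * space, 0), penalty(0, 0, 0), glue(space, -3 * space, 0));
    }
  });
  return nodes;
}

const measure = (text) => text.length * 6; // stand-in for real font metrics
console.log(buildNodes(["Ragged", "right"], measure, "left"));
```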

Ragged left text can be achieved by using a ragged right text and aligning its line endings with the left border. The example below is set centered. Again this is not simply a centering of justified text, but instead an attempt at minimizing the line lengths over the whole paragraph.

Variable line width

By varying the line width for a paragraph it is possible to flow the text around illustrations, asides, quotes and such. The example below leaves a gap for an illustration by setting the line widths temporarily shorter and then reverting. You can also see that the algorithm chose to hyphenate certain words to achieve acceptable line breaking.

It is also possible to make some non-rectangular shapes, as shown in the examples below. In the first example, the text is laid out using an increasing line width and center aligning each line. This creates a triangular shape.

Using some basic math it is also possible to set text in circles or even arbitrary polygons. Below is an example of text set inside a circle.
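For the circle, the "basic math" is the chord length at each line's vertical position. Below is a minimal sketch with hypothetical names; it measures each chord at the vertical centre of the line, which is a simplification, since the corners of the first and last lines can then poke slightly outside the circle.

```javascript
// Line widths for text set inside a circle. Each line is treated as a
// horizontal chord measured at the line's vertical centre.
function circleLineWidths(radius, lineHeight) {
  const lineCount = Math.floor((2 * radius) / lineHeight);
  const widths = [];
  for (let i = 0; i < lineCount; i++) {
    const y = (i + 0.5) * lineHeight - radius; // offset from the circle's centre
    widths.push(2 * Math.sqrt(radius * radius - y * y));
  }
  return widths;
}

const widths = circleLineWidths(100, 20);
console.log(widths.map((w) => Math.round(w)));
```

The resulting array would serve as the list of per-line widths, with each line then centered to form the circle.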

To-do

The following are some extensions to the algorithm discussed in the original paper, which I intend to implement (at some point).

  • Hanging punctuation. The following quote from the original paper explains how to implement it using the box, glue and penalty model:

    Some people prefer to have the right edge of their text look ‘solid’, by setting periods, commas, and other punctuation marks (including inserted hyphens) in the right-hand margin. For example, this practice is occasionally used in contemporary advertising.

    It is easy to get inserted hyphens into the margin: We simply let the width of the corresponding penalty item be zero. And it is almost as easy to do the same for periods and other symbols, by putting every such character in a box of width zero and adding the actual symbol width to the glue that follows. If no break occurs at this glue, the accumulated width is the same as before; and if a break does occur, the line will be justified as if the period or other symbol were not present.

  • Compare quality against line-breaking implemented by Internet Explorer's text-justify CSS property.

  • Figure out how to deal with dynamic paragraphs (i.e. paragraphs being edited) as their ratios will change during editing and thus visibly move around.
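The hanging punctuation recipe quoted in the first item can be sketched as follows. This is only an illustration of the quoted idea, not an implementation from this project: the helper names, the set of punctuation marks and the monospace `measure` function are all assumptions.

```javascript
// Node constructors for the box/glue model (illustrative names).
const box = (width, value) => ({ type: "box", width, value });
const glue = (width, stretch, shrink) => ({ type: "glue", width, stretch, shrink });

// Emit the nodes for one word and the inter-word space that follows it.
// Trailing punctuation goes into a zero-width box and its real width is
// added to the glue, as the quoted passage describes.
function hangingWordNodes(word, measure, space) {
  const match = /^(.*?)([.,;:!?]+)$/.exec(word);
  if (!match) {
    return [box(measure(word), word), glue(space.width, space.stretch, space.shrink)];
  }
  const [, stem, mark] = match;
  const nodes = [];
  if (stem) nodes.push(box(measure(stem), stem));
  nodes.push(box(0, mark)); // the mark takes no width of its own
  nodes.push(glue(space.width + measure(mark), space.stretch, space.shrink));
  return nodes;
}

const measure = (text) => text.length * 6; // stand-in for real font metrics
const space = { width: 6, stretch: 3, shrink: 2 };
console.log(hangingWordNodes("end.", measure, space));
```

If no break occurs at the glue, the total width equals that of the word, its punctuation and a normal space; if a break does occur, the glue is discarded and the mark hangs in the margin.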

typeset's People

Contributors

bramstein, dominictarr, fdb, fregante, macdada, mlang

typeset's Issues

How to handle ul/li and blockquote indentation?

Hi,

Very interesting work.

I've tried to apply it to some documents and noticed that some blocks are not indented properly; in particular, the content of lists and blockquotes is shifted to the right.
Is it possible to have their width reduced so that the display becomes coherent?

Demos with vars

This is all great to see again.

I'm interested, during this round of typographic bar-raising, in demonstrating uses for variations in concert with, or as a superior substitute for current practice of more traditional typography.

So, in the case of justification, I'd like to employ use of the x-transparency of letters and the word space, leaving the letter spacing and width axis out of it. The demo enclosed, from illustrator, shows how I can (manually), lengthen lines, via the word space, and shorten the lines with the XTRA axis.
just demo.pdf

In the case of justifying text in circles, other non-columnar composition, or with variable column width down the page, in the 21st century version, I would like to demonstrate the use of both the above and per-line line spacing appropriate to the varying line lengths, with the descenders reacting via YTDE, growing longer for longer lines of text and shrinking for shorter lines of text, along with the line spacing.

Add some Documentation

I'm fascinated by the concept of this JavaScript typesetting, but after trying to understand it for about half an hour, I'm still not clear how to actually use this code. It would be nice if there were a jQuery plugin (or there is one and I just haven't found it yet...), a clear example, or some note on how to use the typeset library.

Thanks for all the work, though ;-)

Errors in 'article' example

In order to get the example/article to display correctly (none of the five "canvas" samples show up), I needed to make a couple of corrections in the align() function of index.html:

  1. In recently added line devicePixelRatio = window.devicePixelRatio || 1; the semicolon must be changed to a comma: devicePixelRatio = window.devicePixelRatio || 1, or else bad things happen. This is clearly a coding error, as the next line format, nodes, breaks; is clearly meant to be the end of the var statement.
  2. A few lines later, I had to comment out the entire line canvas.scale(devicePixelRatio, devicePixelRatio);. If I don't, there's no sign that it ever gets past this statement (window.alert("I just did the scale"); doesn't show up). devicePixelRatio has a value of "1", so it shouldn't be making any difference. I'm not sure what's going wrong here, but since the scale factor is 1 anyway, no harm done.

This was tried on the current Firefox browser. It works great with the two changes (all five canvas samples now show up). Does it display OK (without change) on other browsers? Incidentally, the same "article" works OK on the frobnitzem/typeset fork.

Add: Is there anything I can do or set in Javascript to flag the errors I reported above? It just silently glided past these errors without saying a word (just not working as expected). I find that most unsatisfactory. Flagging undeclared variables, flagging syntax errors, and reporting runtime errors is the bare minimum for any decent language!
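On the closing question: uncaught JavaScript errors are reported in the browser's developer console rather than on the page, and strict mode additionally turns an assignment to an undeclared variable into a ReferenceError instead of silently creating a global. Below is a minimal sketch with hypothetical names, loosely modelled on the `var` statement described above.

```javascript
// Strict mode makes assignments to undeclared variables throw.
function alignStrict() {
  "use strict";
  var devicePixelRatio = 1; // a stray semicolon ends the var statement here
  try {
    undeclaredNodes = []; // never declared, so strict mode throws
  } catch (error) {
    return error.name;
  }
  return "no error";
}

console.log(alignStrict()); // ReferenceError
```

Without the `"use strict"` directive (and outside of a module), the same assignment would quietly create a global variable.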

Something is either undefined or zero

Warning: this is a post by a complete idiot, meaning it probably has a dead simple answer.

jquery.hypher.js works fine on its own, but for whatever reason I can't get the version of hypher.js inside typeset's example to work. I keep getting the following errors when I copy your files verbatim:

Uncaught TypeError: Cannot set property 'en-us' of undefined (en-us.js: line 33)
Uncaught TypeError: Cannot read property 'patterns' of undefined (hypher.js: line 11)

I messed around trying to add the browser.pre and browser.post files from hypher.js' lib folder, but that created syntax problems of its own. I also tried using jquery.hypher.js instead of hypher.js inside typeset, with no luck.

Mis-alignment of right edge of text when using Typeset and Hypher

Hi Bram,

Looking at your Flatland example, I note that many lines include a space before the closing </span> tag, which leaves the right edge of the text slightly mis-aligned. I don't see this effect in your example without hyphenation, so I wonder if this is the cause.

Do you think it's possible to get perfect alignment including hyphenation?

BTW - this project is very promising and just what I am looking for - tx so much for your efforts.

Regards, Rob

Flatland example has a line that wraps badly

A screenshot demonstrating the problem

  • Windows 10
  • Hi-DPI (devicePixelRatio of 2)
  • Reproduced in Firefox Nightly, Edge and Chrome.
  • The font it selected is [Gentium Basic from Google Fonts](https://fonts.google.com/specimen/Gentium+Basic). It looks _ghastly_ at 16px, apparently. `` will get you the right font for attempting to reproduce the issue (make sure to adjust the body font-family if you have Minion Pro or Gentium available, of course).

If you are unable to reproduce it I’m quite willing to assist further in great detail. I have a project I believe this will work very well for and I want to help iron out any issues like this so people that use my software don’t run into this sort of thing ever.

How do I use this?

It's a little confusing how this is actually supposed to be used. The examples are anything but clear.

flatland sample error

script type="text/javascript" src="../../lib/hypher.js"
and not
script type="text/javascript" src="../../lib/Hypher.js"

line breaking with optional word breaking if needed?

Hello there,
I'm not sure if this issue is already answered somewhere.

I want a variant of the Knuth-Plass/TeX line breaking with optimal hyphenation insertion (word breaking).

Specifically, I have a paragraph of text that I need to format in a specified width with minimal lines but also optimal word breaks (meaning words can be broken/hyphenated if needed, but optimally, to preserve a visually and textually pleasing layout). I already have a greedy/local algorithm in place, but I would like to change to the TeX algorithm.

Is this possible in typeset, or do I need to add this (and maybe create a PR)?

Thank you

Incorrect flagged demerits value

In the Knuth-Plass article, the parameters used for typesetting Seminumerical Algorithms are given:

the consecutive-hyphens and adjacent-incompatibility demerits were α = γ = 3000

The default value used in this library, 100, isn't mentioned in the article.
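To see how much the choice of α matters, here is a sketch of the demerits of a single line. It assumes the demerits formula as given in the Knuth and Plass paper (with a constant of 1 added to the badness); the function and parameter names are illustrative and are not taken from this library.

```javascript
// Badness of a line with adjustment ratio r: 100 * |r|^3.
function badness(ratio) {
  return 100 * Math.pow(Math.abs(ratio), 3);
}

// Demerits of a line ending at a break with the given penalty.
// `consecutiveFlagged` is true when this break and the previous one are
// both flagged (for example, two hyphenated lines in a row).
function demerits(ratio, penalty, consecutiveFlagged, alpha) {
  const b = badness(ratio);
  let d;
  if (penalty >= 0) {
    d = Math.pow(1 + b + penalty, 2);
  } else if (penalty !== -Infinity) {
    d = Math.pow(1 + b, 2) - penalty * penalty;
  } else {
    d = Math.pow(1 + b, 2); // forced break
  }
  return consecutiveFlagged ? d + alpha : d;
}

// A perfectly fitting line that ends in a hyphen (penalty 50) after
// another hyphenated line, with the two values of alpha discussed above.
console.log(demerits(0, 50, true, 3000)); // 5601
console.log(demerits(0, 50, true, 100));  // 2701
```

With α = 100 the consecutive-hyphen term adds little compared to the squared penalty, whereas with α = 3000 it dominates.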

status?

Does this require the old jQuery 1.4x series, or can it run with the latest version? Also, I noticed that it uses your Hypher project... that project has a lot of new code --- can it work with the latest version of it?

Kerning

I was thinking of using this library in a project I'm working on to create PDF documents in the web browser. (Although, since I see in other issues that you're working on a new version, I may hold off for now).

I was thinking about how to deal with kerning in the box/glue/penalty model, and wondering if you'd considered this. Take the word 'suckers'. The combined width of 'suckers' will be less than the combined separate widths of 'suck' and 'ers', because of kerning between the k and the e.

I'm surprised to see that Knuth & Plass make no mention of kerning, so perhaps I've missed something obvious here. But it seems to me that a solution would be to insert negative-width, fixed-width glue after each hyphen-related penalty, which will be stripped at the start of a line, but not otherwise.

Does this seem right to you?

All the best,
George

Some questions

Hi,

This is not an issue; I apologize if this is an abuse of the issue tracker.
I'm currently using your work (great job!) in a small canvas text editor project.
I have some questions. I keep trying to find answers on my own, but in case you have some clues...

  • When the paragraph of your example has a line width of "80" it works if align is "center", but to get breaks calculated when align is "left" I have to increase the tolerance. Is this normal?
  • Does it make sense to change the algorithm slightly so that it adds a break even if there is no penalty when the length of a box node is greater than the row in which it should be displayed (i.e. non-hyphenable strings like "aaaaaaaaaaaaaa")?

Thanks in advance,
Nicolas.

Global functions could conflict with other libraries

Hi!

Your library is great and I already use it in a project. Unfortunately it has a little (?) drawback: it uses global JavaScript functions/objects, whose names might conflict with other libraries.

I'm talking about formatter, linebreak and LinkedList.

An easy solution is to move them into one namespace, which should make conflicts less likely.

I have already done the job for my projects and will send you a pull request, so you could use it if you want to. I can squash the commits into one, if you prefer to.

I moved formatter to Typeset.formatter, linebreak to Typeset.linebreak and LinkedList to Typeset.LinkedList.

If you have questions or issues with that, please contact me.
David

Do line lengths work?

While trying to figure out why I couldn't set more than two line lengths in the Perl version (Text::KnuthPlass), PhilterPaper/Text-KnuthPlass#7, I decided to try out the lineLengths setting in your Flatland.html example. As shipped, it is [], so I tried one, two, and three values (apparently in points, or possibly pixels). It seems to usually affect only the very first line, and usually not any other, although some lists of values seemed to have strange effects several paragraphs down, including loss of right-justification. Note that the first paragraph is very short, usually only two lines.

  1. Can anyone confirm that lineLengths actually work, and give a brief example (such as with Flatland.html)?
  2. Are lineLengths supposed to start over at each paragraph, or does it just keep going? If I give 5 lengths, it appears to continue into the second paragraph (as ragged right), and then the third paragraph and on are back to full width.

I want to make sure I'm using lineLengths correctly, and that the Javascript (typeset) code works, before I go through a lot of labor cross-checking against the Perl (and C) version I recently picked up support for. It would be very disappointing to find that I can only give two line lengths (e.g., indented and normal paragraph lines), and not be able to do image and aside insets, or non-rectangular paragraph shapes, as promised in the documentation.
