Comments (17)

jbezos commented on July 29, 2024

The problem is Amiri uses the Private Use Area. Once the box is built, there is no information about the direction of the glyphs. If the direction is set explicitly with \babelcharproperty (I don't know the exact range), the text is rendered correctly:

\documentclass{article}
\usepackage[bidi=basic]{babel}
\babelprovide[import=ar-DZ, main]{arabic}
\babelfont{rm}{Amiri}

% The following range is just a guess:
\babelcharproperty{980000}[990000]{direction}{al}

\begin{document}

\setbox0=\hbox{نص عربي}

\unhbox0

\end{document}

You may use this macro as a workaround, but I must investigate a better solution.

from babel.

u-fischer commented on July 29, 2024

@jbezos Marcel (@zauguin) is investigating bidi in luaotfload too. Perhaps some coordination is needed/useful, see e.g. latex3/luaotfload#82.

jbezos commented on July 29, 2024

Yes, I know (and this is great news), but I just got back from a trip. I still have to look at it.

khaledhosny commented on July 29, 2024

> The problem is Amiri uses the Private Use Area.

What you are seeing is the way luaotfload handles glyph indices that don’t directly map to Unicode code points; Amiri itself does not use any PUA.

The real problem here is trying to apply BiDi to output glyphs, which by definition cannot work, since BiDi needs the original characters in their original order. Either unboxed nodes need to carry the original direction information as well, so that the BiDi algorithm need not be run again on them (this needs changes to the engine, I think, and is probably not easy to implement), or unboxing should be avoided altogether, since it comes from a very simplistic view of modern text layout.

khaledhosny commented on July 29, 2024

(also excuse my ignorance, but why unbox a box instead of using it as is?)

davidcarlisle commented on July 29, 2024

@khaledhosny

> (also excuse my ignorance, but why unbox a box instead of using it as is?)

LaTeX (even with no packages loaded) does that quite a lot, e.g. list labels are set as

\hbox to\labelwidth {\unhbox\@tempboxa}%

which shows the typical case: you box up text to find its natural width, but then, having decided by some calculation which width to use for all of them, you unbox the texts, allowing any glue within them to stretch or shrink to hit the target width.

The other use, of course, is to allow line breaking. (Although there are possibly fewer cases where you need to box first but still allow lines to break.)
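
A minimal sketch of the measure-then-unbox pattern described above, using only TeX primitives. The names \labelbox and \targetwidth are hypothetical scratch registers made up for this illustration, and the 2pt widening is an arbitrary stand-in for "some calculation"; this is not the kernel's list-label code.

```latex
\documentclass{article}
\newbox\labelbox        % hypothetical scratch box for this sketch
\newdimen\targetwidth   % hypothetical register holding the chosen width
\begin{document}
% Step 1: box the text so its natural width can be measured.
\setbox\labelbox=\hbox{one two three four}
% Step 2: decide on a target width (here: natural width plus 2pt).
\targetwidth=\wd\labelbox
\advance\targetwidth by 2pt
% Step 3: unbox into a box of the target width, so the interword
% glue inside the text can stretch to fill it.
\noindent\hbox to\targetwidth{\unhbox\labelbox}
\end{document}
```

With bidi=basic active, step 3 is the point where already-shaped glyphs are seen again by the bidi code, which is the situation this issue is about.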

khaledhosny commented on July 29, 2024

Well, then this needs to be dealt with, as BiDi can’t be applied to the unboxed content in this case. BiDi would need to be applied at box creation time, and then the direction information somehow needs to be carried over when unboxed.

This still might not give the expected output, since if the unboxed content became part of a larger chunk of text (e.g. part of a paragraph) the result of applying BiDi algorithm can be different than when processed standalone.

Like I said, boxing and unboxing seem to stem from a simplistic view of text layout (understandable given its age) and should be avoided. For example, the width of a piece of text does not necessarily equal the sum of the widths of its parts cut at different places. Multiple factors can change the final glyphs (and thus the widths) depending on what context the text was in.

davidcarlisle commented on July 29, 2024

@khaledhosny yes I was not disagreeing with what you are saying but (as you note above) doing this right might require engine changes (or in luatex more lua code, I guess...)

> For example, the width of a piece of text does not necessarily equal the sum of the widths of its parts cut at different places.

Yes, that's true even with English, of course, but with the tools available in current TeX engines, if you want to implement a requirement like "set all the list labels to the width of the widest label", then boxing and unboxing is the only tool available.

similarly \makebox[2\width]{foo bar} to set some text to twice its natural width boxes it first (to work out \width) then unboxes it.

That said, it ought to be possible in the macro layer to (somehow) delimit text that has already been bidi processed and so mark directionality when unboxing. I think...
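
For reference, the \makebox case mentioned above can be tried with a document as small as the following. This is only an illustrative sketch; the \fbox is added here purely to make the resulting width visible and is not part of the original example.

```latex
\documentclass{article}
\begin{document}
% \makebox first boxes "foo bar" to obtain \width, then unboxes the
% text into a box twice as wide; \fbox just draws the outline.
\fbox{\makebox[2\width]{foo bar}}
\end{document}
```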

khaledhosny commented on July 29, 2024

Babel could set an attribute indicating the resolved direction of the node (and hope it gets carried over correctly during text layout) and use it to reconstruct the direction of the unboxed content.
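
A rough sketch of what such marking could look like in LuaLaTeX, under loudly stated assumptions: the attribute name demo_bidi_processed and the function mark_processed are invented for this illustration, babel does not define them, and no bidi processing is actually performed here. It only shows glyph nodes being tagged once, so that a later pass (for example after \unhbox) could recognise and skip them. Whether the attribute survives font shaping intact is exactly the open question raised above.

```latex
\documentclass{article}
\usepackage{luacode}
\begin{luacode*}
-- Hypothetical attribute; NOT something babel defines.
local processed = luatexbase.new_attribute("demo_bidi_processed")
local GLYPH = node.id("glyph")

-- Tag every glyph node that has not been tagged yet.
local function mark_processed(head)
  for n in node.traverse_id(GLYPH, head) do
    if not node.has_attribute(n, processed) then
      -- A real implementation would resolve directions here, and only
      -- for nodes reaching this branch (i.e. not yet processed).
      node.set_attribute(n, processed, 1)
    end
  end
  return head
end

-- Run when an hbox is packed and before a paragraph is broken into lines.
luatexbase.add_to_callback("hpack_filter", mark_processed,
  "demo: mark glyphs in hboxes")
luatexbase.add_to_callback("pre_linebreak_filter", mark_processed,
  "demo: mark glyphs in paragraphs")
\end{luacode*}
\begin{document}
\setbox0=\hbox{abc}% glyphs are tagged when the box is built
\unhbox0 % the same nodes reappear here, already carrying the tag
\end{document}
```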

jbezos commented on July 29, 2024

> What you are seeing is the way luaotfload handles glyph indices that don’t directly map to Unicode code points, Amiri itself does not use any PUA.

I see. Thank you. Anyway, luaotfload does, so the problem is still the PUA. I think this answers my question in #28 -- the font doesn't contain information about the bidi class of these glyphs.

The problem is that the bidi algorithm is applied twice: when the box is built (the right place) and again when it's unboxed. As the workaround with \babelcharproperty shows, with the proper bidi classes everything seems fine (babel assumes ‘L’ if unknown), so the solution, which I'm currently investigating, involves either another default (e.g., assume ‘AL’ if the script of the language is Arabic, a very crude guess) or somehow deactivating the bidi algorithm for glyphs in the PUA. I'm still not sure which solution actually works.

jbezos commented on July 29, 2024

Or a flag with a lua property meaning ‘already processed with bidi’.

khaledhosny commented on July 29, 2024

> > What you are seeing is the way luaotfload handles glyph indices that don’t directly map to Unicode code points, Amiri itself does not use any PUA.

> I see. Thank you. Anyway, luaotfload does, so the problem is still the PUA. I think this answers my question in #28 -- the font doesn't contain information about the bidi class of these glyphs.

BiDi class is a character property not a glyph one i.e. glyphs don’t have a direction, the underlying characters do (it doesn’t help that the boundary between characters and glyphs is blurry in LuaTeX).

jbezos commented on July 29, 2024

Yes, I know, but as far as Unicode is concerned, the initial, medial, final and isolated forms of a letter are the same character, and at this point of the internal process LuaTeX only sees the corresponding glyphs, which is what I was talking about. Anyway, this is just terminology.

khaledhosny commented on July 29, 2024

Not sure I follow, but if the code is getting glyphs instead of characters then it is being run at the wrong stage of text processing i.e. the BiDi algorithm should be applied before any font layout.

jbezos commented on July 29, 2024

And it's applied at the right place, before the font layout, so the box is correct. The problem is that the bidi algorithm is applied again when the box is unboxed, when characters have already been converted to glyphs in the internal processing.

seloumi commented on July 29, 2024

Thanks @jbezos, this solves the issue 👍

jbezos commented on July 29, 2024

But it's still a hack. One of the reasons the current bidi algorithm is named basic is that the handling of boxes, either boxed or unboxed (both have their own issues), is incomplete (even if it very often works).
