The problem is Amiri uses the Private Use Area. Once the box is built, there is no information about the direction of the glyphs. If the direction is set explicitly with \babelcharproperty (I don't know the exact range), the text is rendered correctly:
\documentclass{article}
\usepackage[bidi=basic]{babel}
\babelprovide[import=ar-DZ, main]{arabic}
\babelfont{rm}{Amiri}
% The following range is just a guess:
\babelcharproperty{980000}[990000]{direction}{al}
\begin{document}
\setbox0=\hbox{نص عربي}
\unhbox0
\end{document}
You may use this macro as a workaround, but I must investigate a better solution.
@jbezos Marcel (@zauguin) is investigating bidi in luaotfload too. Perhaps some coordination is needed/useful, see e.g. latex3/luaotfload#82.
Yes, I know (and this is great news), but I just got back from a trip. I still have to look at it.
The problem is Amiri uses the Private Use Area.
What you are seeing is the way luaotfload handles glyph indices that don’t directly map to Unicode code points; Amiri itself does not use any PUA.
The real problem here is trying to apply BiDi to output glyphs, which by definition cannot work, since BiDi needs the original characters in their original order. Either unboxed nodes need to carry the original direction information as well, so that the BiDi algorithm need not be run again on them (this needs changes to the engine, I think, and is probably not easy to implement), or unboxing should be avoided altogether, since it comes from a very simplistic view of modern text layout.
(also excuse my ignorance, but why unbox a box instead of using it as is?)
LaTeX (even with no packages loaded) does that quite a lot; e.g., list labels are set as
\hbox to\labelwidth {\unhbox\@tempboxa}%
which shows the typical case: you box up text to find its natural width, but then, having decided by some calculation which width to use for all of them, you unbox the texts, allowing any glue within them to stretch or shrink to hit the target width.
The other use, of course, is to allow line breaking (although there are possibly fewer cases where you need to box first but still allow lines to break).
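A minimal sketch of the box-then-unbox pattern described above, as a complete LaTeX document. It only illustrates the idea and is not the kernel's list code: the names \mylabelbox and \mytargetwidth are hypothetical, and the common width is simply taken from a sample "widest" label. The \hfil saved inside the box adds nothing to the natural width, but once the material is unboxed into the wider box it stretches, which is exactly the behaviour the \hbox to\labelwidth{\unhbox...} idiom relies on.

```latex
\documentclass{article}
\newsavebox{\mylabelbox}     % hypothetical: holds one measured label
\newlength{\mytargetwidth}   % hypothetical: common width for all labels
\begin{document}
% Box the label once; \hfil adds no natural width but can stretch later.
\sbox{\mylabelbox}{\hfil(iv)}
% Decide on the common width, here the width of a sample widest label.
\settowidth{\mytargetwidth}{(viii)}
% Unbox into a box of the chosen width: the \hfil inside the label
% stretches, so the label ends up flush right.
\noindent\hbox to\mytargetwidth{\unhbox\mylabelbox}\quad Item text.
\end{document}
```

The point for the bidi discussion is the last line: the glyphs in (iv) are re-read from the box register and placed into a new list, so anything that processes that new list sees them a second time.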
Well, then this needs to be dealt with, as BiDi can’t be applied to the unboxed content in this case. BiDi would need to be applied at box creation time, and then the direction information would somehow need to be carried over when unboxed.
This still might not give the expected output, since if the unboxed content became part of a larger chunk of text (e.g. part of a paragraph), the result of applying the BiDi algorithm could differ from that of processing it standalone.
Like I said, boxing and unboxing seem to stem from a simplistic view of text layout (understandable given its age) and should be avoided. For example, the width of a piece of text does not necessarily equal the sum of the widths of its parts cut at different places. Multiple factors can change the final glyphs (and thus the widths) depending on the context the text was in.
@khaledhosny yes, I was not disagreeing with what you are saying, but (as you note above) doing this right might require engine changes (or, in LuaTeX, more Lua code, I guess...).
For example, the width of a piece of text does not necessarily equal the sum of the widths of its parts cut at different places.
Yes, that's true even with English, of course, but with the tools available in current TeX engines, if you want to implement a requirement like "set all the list labels to the width of the widest label", then boxing and unboxing is the only tool available.
Similarly, \makebox[2\width]{foo bar}, which sets some text to twice its natural width, boxes it first (to work out \width) and then unboxes it.
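For comparison, here is roughly what \makebox[2\width]{foo bar} amounts to when written out with primitives. This is a simplified sketch rather than the kernel's definition, and \mytextbox is a hypothetical name. With the default c position the unboxed material is surrounded by \hss, so the text is centred; with the s position there is no \hss, and only the interword glue inside "foo bar" can stretch.

```latex
\documentclass{article}
\newsavebox{\mytextbox} % hypothetical name
\begin{document}
% Box the text first; its natural width is what \makebox calls \width.
\sbox{\mytextbox}{foo bar}
% Centred, like \makebox[2\width]{foo bar}: the \hss glue absorbs the
% extra space. The width is read before \unhbox empties the register.
\noindent\hbox to 2\wd\mytextbox{\hss\unhbox\mytextbox\hss}

% Spread, like \makebox[2\width][s]{foo bar}: only the interword glue
% stretches (expect an Underfull \hbox warning for such a large stretch).
\sbox{\mytextbox}{foo bar}
\noindent\hbox to 2\wd\mytextbox{\unhbox\mytextbox}
\end{document}
```

In both variants the text is typeset twice from TeX's point of view: once into the measuring box and once more when that box is unboxed.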
That said, it ought to be possible in the macro layer to (somehow) delimit text that has already been bidi processed and so mark directionality when unboxing. I think...
Babel could set an attribute indicating the resolved direction of the node (and hope it gets carried over correctly during text layout) and use it to reconstruct the direction of the unboxed content.
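A minimal LuaLaTeX sketch of the attribute idea, assuming only the generic ltluatex interface (\newattribute, \setattribute, luatexbase.registernumber, luatexbase.add_to_callback) and nothing that babel actually provides. The attribute \resolveddir, its value 2 standing for a resolved right-to-left direction, and the callback name are all hypothetical. The sketch marks glyphs when the box is built and uses a diagnostic callback to show that the mark is still on the glyph nodes after \unhbox, which is the information a second bidi pass would need. Whether such attributes also survive every later stage of font processing is the open question raised above.

```latex
% Compile with lualatex.
\documentclass{article}
\usepackage{luacode}
% Hypothetical attribute recording the direction resolved at box time.
\newattribute\resolveddir
\begin{luacode*}
  local glyph_id = node.id("glyph")
  local attr = luatexbase.registernumber("resolveddir")
  -- Diagnostic only: write each glyph's attribute value to the log
  -- when the paragraph reaches the line breaker.
  luatexbase.add_to_callback("pre_linebreak_filter", function(head)
    for n in node.traverse_id(glyph_id, head) do
      texio.write_nl("char " .. n.char .. " resolveddir "
        .. tostring(node.has_attribute(n, attr)))
    end
    return true
  end, "report_resolved_dir")
\end{luacode*}
\begin{document}
% Mark the material while the box is being built ...
\setbox0=\hbox{\setattribute\resolveddir{2}abc}
% ... and check in the log that the mark is still there after unboxing.
\noindent\unhbox0
\end{document}
```

A real implementation would read the attribute instead of re-running the bidi algorithm on glyphs that carry it; that part is not sketched here.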
What you are seeing is the way luaotfload handles glyph indices that don’t directly map to Unicode code points; Amiri itself does not use any PUA.
I see. Thank you. Anyway, luaotfload does, so the problem is still the PUA. I think this answers my question in #28: the font doesn't contain information about the bidi class of these glyphs.
The problem is that the bidi algorithm is applied twice: when the box is built (the right place) and when it's unboxed. As the workaround with \babelcharproperty shows, with the proper bidi classes everything seems fine (babel assumes ‘L’ if the class is unknown), so the solution, which I'm currently investigating, involves either another default (e.g., assume ‘AL’ if the script of the language is Arabic, a very crude guess) or somehow deactivating the bidi algorithm for glyphs in the PUA. I'm still not sure which solution actually works.
Or a flag with a lua property meaning ‘already processed with bidi’.
BiDi class is a character property, not a glyph one; i.e., glyphs don’t have a direction, the underlying characters do (it doesn’t help that the boundary between characters and glyphs is blurry in LuaTeX).
Yes, I know, but as far as Unicode is concerned, the initial, medial, final and isolated forms of a letter are the same character, and at this point of the internal process LuaTeX only sees the corresponding glyphs, which is what I was talking about. Anyway, this is just terminology.
Not sure I follow, but if the code is getting glyphs instead of characters, then it is being run at the wrong stage of text processing; i.e., the BiDi algorithm should be applied before any font layout.
And it is applied at the right place, before the font layout, so the box is correct. The problem is that the bidi algorithm is applied again when the box is unboxed, by which point the characters have already been converted to glyphs in the internal processing.
Thanks @jbezos, this solves the issue 👍
But it's still a hack. One of the reasons the current bidi algorithm is named basic is that the handling of boxes, both boxed and unboxed (each has its own issues), is incomplete (even if it very often works).