Comments (29)
Hi Marc, thanks for filing this. We would be prepared to offer, or contribute to, a bug bounty for this issue to be fixed.
from mpdf.
After quite some digging, I was able to add the configuration (CSS params) and even have the code in place to detect whether we'd create orphans or widdows – see here. Now my problem is that we do this detection during reflow, while we are already printing to the buffer, and I was simply not able to get it to put a page break before the block currently being processed: changes to margin and padding no longer had any effect (they only apply before flowing), and any break I inserted still emitted the buffered text before the page break. I am super close, just not familiar enough with mpdf...
If someone wants to give me a hint on which of the many things I have to tweak, and how, to make a page break before the block I am currently processing, I am sure I can get this done in no time.
from mpdf.
Thanks, great work! It may be that you've uncovered an aspect of the mPDF design which requires a rethink for the 7.0 branch. By the way, in the CSS spec http://www.w3.org/wiki/CSS/Properties/widows the word 'widows' is spelled with only one 'd'.
from mpdf.
Ha. right. Widows – one 'd'. Will fix that.
I was wondering about that in general. As the CSS spec states that the default value is '2' (if not specified), a bunch of previously rendered PDFs that contained widows and orphans will render differently once this has been implemented. One could consider that a breaking change. I was wondering if we maybe want to put the feature behind a command-line flag or something – but that only matters once it can be finished ;) .
from mpdf.
In general I don't think mPDF is used with command line flags. What I would suggest is that in the mPDF configuration file config.php there is a default widows and orphans setting of 1 which produces the same behaviour as the previous mPDF 6.0 release, i.e. no widows and orphans protection. This would therefore not be a breaking change for users. However there might be unintended consequences of changing the page/column break model.
What if when a widow or orphan is detected, the page break or column break is applied within the current block rather than before the current block? That might mean the block has to be at least as large as the number of lines specified in the widows and orphans settings.
from mpdf.
> What if when a widow or orphan is detected, the page break or column break is applied within the current block rather than before the current block? That might mean the block has to be at least as large as the number of lines specified in the widows and orphans settings.
Like, let's say, we have a paragraph of 7 lines and 6 would fit on one page, leaving a widow on the page after? Well, as of now, I am only able to tell that once we have already printed line 6, and as we can't go back and "unprint" lines at that point (it seems), it is still the same issue.
But yes, I was thinking of adding support for more complex behaviors (which might possibly be configuration options). So in this case, it would break after line 5 and print lines 6 and 7 on the second page (if configured to do so). Not actually that hard to do, once I am able to put a page break where I want 😝 .
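To make the 7-line example concrete, here is a minimal sketch of the split arithmetic. It is a standalone Python model, not mPDF code, and the function and parameter names are hypothetical. It assumes the total line count of the paragraph is already known, which is exactly the part that is hard to get inside mPDF.

```python
def split_paragraph(total_lines, lines_that_fit, orphans=2, widows=2):
    """Return (lines kept on the current page, lines moved to the next page).

    orphans: minimum number of lines left at the bottom of the current page.
    widows:  minimum number of lines carried to the top of the next page.
    """
    if lines_that_fit >= total_lines:
        return total_lines, 0  # the whole paragraph fits, no break needed
    # Keep as much as fits, but leave at least `widows` lines for the next page.
    keep = min(lines_that_fit, total_lines - widows)
    if keep < orphans:
        # Both limits cannot be met: move the whole paragraph instead.
        return 0, total_lines
    return keep, total_lines - keep


print(split_paragraph(7, 6))  # (5, 2)
```

With both limits set to 1 the function keeps everything that fits, for example `split_paragraph(7, 6, 1, 1)` gives `(6, 1)`, which matches the suggestion that a setting of 1 reproduces the unprotected behaviour.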
> What I would suggest is that in the mPDF configuration file config.php there is a default widows and orphans setting of 1 which produces the same behaviour as the previous mPDF 6.0 release, i.e. no widows and orphans protection.
In light of that, I'd actually suggest making the configuration option a number that defines how to deal with them (that also allows more complex behaviors later, like letter spacing ;) ). It would default to 0 (don't deal with widows or orphans), with 1 meaning "break before paragraph" and 2 meaning "smart break the paragraph if possible"...
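A rough sketch of how such a three-valued option could drive the decision, again as a standalone Python model rather than mPDF code. The enum, the function name and the single `limit` parameter (used for both widows and orphans) are assumptions made for illustration only.

```python
from enum import IntEnum


class BreakMode(IntEnum):
    IGNORE = 0        # break wherever the page ends, no protection
    BREAK_BEFORE = 1  # push the whole paragraph to the next page
    SMART = 2         # split inside the paragraph if the limit allows it


def lines_kept(mode, total, fit, limit=2):
    """Number of lines of a `total`-line paragraph left on the current page."""
    if fit >= total:
        return total  # everything fits
    if mode == BreakMode.IGNORE:
        return fit
    # A natural break after `fit` lines is a problem if either side is too short.
    violates = fit < limit or total - fit < limit
    if not violates:
        return fit
    if mode == BreakMode.BREAK_BEFORE:
        return 0
    # SMART: give up lines on this page so the next page gets `limit` lines.
    keep = min(fit, total - limit)
    return keep if keep >= limit else 0
```

For the 7-line paragraph with 6 lines fitting, the three modes return 6, 0 and 5 lines respectively.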
from mpdf.
Your proposed configuration option seems like a totally different setting to a CSS default, I can see the usefulness of it though.
That implies that each block cannot be fully committed to the document before it has passed a widows and orphans check, if that check is enabled.
There's an interesting comment at line 23599 of https://github.com/mpdf/mpdf/blob/development/mpdf.php as follows:
If page-break-inside:avoid section has broken to new page but fits on one side - then move
Maybe debugging page-break-inside:avoid would shed some light on how this is meant to be done.
from mpdf.
> Your proposed configuration option seems like a totally different setting to a CSS default, I can see the usefulness of it though.
Not entirely. The spec states that orphans and widows have to be taken care of whenever you break, but it also contains a "best breaking practice" section, which describes the desired behavior for how exactly to break. That would closely resemble what I outlined as 2. Breaking before the entire paragraph would still be fully according to spec though, as that paragraph is a "should", not a "must".
> If page-break-inside:avoid section has broken to new page but fits on one side - then move
That is one of the starting points I had. The thing is that here it is defined directly at parsing level that this paragraph can't be broken. The (ahem, rather hackish) way this is achieved is by forcing a page break before processing and then moving the block back up one page if there is still enough space left on that page. I was thinking of doing something similar, but that means we'd have to break before every paragraph and move each one back up after we are done processing, according to the rules of orphans and widows. While that could be considered a sufficient approach, it is a rather big change to the way processing works right now and I didn't want to do that. Also, it feels wrong – creating each paragraph on its own page and then moving it back once you know its size ....
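The "break first, move back if it fits" idea can be modelled in a few lines. This is only a sketch of the decision, written as standalone Python with hypothetical names. It does not reproduce how mPDF actually relocates buffered output, and it assumes the block height becomes known once the block has been laid out on the fresh page.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    page: int
    y: float


def place_unbreakable(page, y, block_height, page_height):
    """Place a block that must not be split across pages."""
    remaining = page_height - y
    # Step 1: tentatively start the block at the top of a fresh page.
    tentative = Placement(page + 1, 0.0)
    # Step 2: now that the height is known, move it back up if it fits.
    if block_height <= remaining:
        return Placement(page, y)
    return tentative
```

Doing this for every paragraph would mean every paragraph takes the detour through a fresh page, which is the cost described above.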
from mpdf.
Thank you Benjamin for giving this a go!
Everyone: progress can be followed here:
https://github.com/ligthyear/mpdf/commits/orphan-support
from mpdf.
thanks, @marclaporte .
But unless any new ideas on how to solve this come up in this conversation, I don't really have any way to continue ... so work is on standby until then ...
from mpdf.
Hi Ben, hi Marc, I have talked to the Booktype developer team about holding an online workshop to devise the best solution, would you have some time this week to participate?
In principle, as we know the starting position of a paragraph, the number of characters, the column width and the line height, do you think it would be possible to determine in advance which paragraphs are going to break over the column or page end? Or would it be simpler to create every paragraph on the next column or page and move it back up, as Ben mentions above? What I like about the latter solution is that we will be very sure about what the paragraph contains before we finalise its placement.
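As a rough illustration of the "determine in advance" option, the sketch below estimates the line count from the text, the column width and an assumed average character width, then compares it with the space left in the column. It is a standalone Python model with hypothetical names. Real layout depends on per-glyph widths, hyphenation and justification, so an estimate like this can be off by a line, which is the risk the second option avoids.

```python
import textwrap


def estimate_lines(text, column_width, avg_char_width):
    """Estimate wrapped line count, assuming every character is equally wide."""
    chars_per_line = max(1, int(column_width // avg_char_width))
    return len(textwrap.wrap(text, width=chars_per_line))


def crosses_boundary(text, y, column_bottom, column_width, line_height,
                     avg_char_width):
    """Return (will_break, lines_needed, lines_available) for one paragraph."""
    needed = estimate_lines(text, column_width, avg_char_width)
    available = int((column_bottom - y) // line_height)
    return needed > available, needed, available


paragraph = "word " * 40
print(crosses_boundary(paragraph, y=250, column_bottom=280,
                       column_width=100, line_height=5, avg_char_width=5))
```

The two counts returned here are the inputs a widows and orphans check would need before anything is written.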
from mpdf.
To all: There will be a conference call Thursday. Check time for your city:
http://www.timeanddate.com/worldclock/fixedtime.html?msg=mPDF%20Widows%20and%20Orphans&iso=20151112T15
The call will be here (or perhaps another solution)
https://meet.jit.si/mPDF
from mpdf.
It'll be 2am on my side of the world so I'll miss it (would love to read a summary of how the meeting went though).
from mpdf.
I'll be there, thanks for arranging!
from mpdf.
Hi everyone!
I did record the conference with intent (and permission) to upload for the community, but I didn't set my screen recorder properly, and the default setting is to record my voice, but not the others. I am sorry about this. I will make sure it works next time.
We were 4:
- Benjamin Kampmann (ligthyear)
- Daniel James, Booktype
- Aleksandar Erkalović, Booktype lead developer (aerkalov)
- Marc Laporte, Tiki Wiki CMS Groupware
There was a great explanation of the challenge to do this properly. If it was easy, it would already have been done.
As a follow-up:
- Aco and Benjamin have some things to try out.
- Daniel will do some research on how other software handles this
We'll do another such conference call in a few weeks to discuss community organisation and roadmap. Picking a time for this will be tricky given Jake is in Australia. Please see:
https://github.com/mpdf/mpdf/wiki/Community-Conference-Calls
from mpdf.
Great. Nice work @marclaporte!
from mpdf.
@ligthyear: any interesting news? Thanks!
from mpdf.
Hi Marc, I've been discussing the issue with Mark Lewis @thnkloud9 from our team, we will report back shortly.
from mpdf.
Hi @marclaporte. As @danielhjames mentioned, I've been looking into the issue and catching up on the progress so far. I've forked and started estimating a solution that does not involve adding extraneous pages or post processing. It will calculate the required and available space for each block, accounting for current line height, padding, borders, and the orphans and widows limit config vars, before printing any of the block elements... at least in theory so far ;-)
Still not certain of the level of effort at the moment, as this thing is a beast of a "party like it's 1999" PHP 4 hot mess. Although I am guessing I should have a PR to submit for review early next week.
from mpdf.
proposed solution uses:
function EstimateFlowingBlockWriteLines($s, $sOTLdata)
which just clones the current object ($this) and runs WriteFlowingBlock($s, $sOTLdata) on the cloned copy ($mock) to determine required lines before actually running them on the current object ($this).
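The clone-and-dry-run idea looks roughly like this in a toy model. This is standalone Python, not the PHP from the branch: `ToyWriter` and its methods are hypothetical stand-ins for the mPDF object and WriteFlowingBlock, and a deep copy is used here so the dry run cannot touch the committed state.

```python
import copy


class ToyWriter:
    """Minimal stand-in for a stateful writer that wraps text into lines."""

    def __init__(self, chars_per_line):
        self.chars_per_line = chars_per_line
        self.lines = []  # lines already committed to the document

    def write_flowing_block(self, text):
        """Greedy word wrap that appends the resulting lines to self.lines."""
        line = ""
        for word in text.split():
            candidate = f"{line} {word}".strip()
            if len(candidate) > self.chars_per_line and line:
                self.lines.append(line)
                line = word
            else:
                line = candidate
        if line:
            self.lines.append(line)

    def estimate_lines(self, text):
        """Run the write on a copy and report how many lines it produced."""
        mock = copy.deepcopy(self)
        before = len(mock.lines)
        mock.write_flowing_block(text)
        return len(mock.lines) - before


writer = ToyWriter(chars_per_line=20)
print(writer.estimate_lines("word " * 40), len(writer.lines))  # 10 0
```

The appeal of this design is that the estimate uses the very same code path as the real write. The cost is that every block is laid out twice, and the copy gets more expensive as the object's state grows.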
from mpdf.
Any news? Thanks!
from mpdf.
Hi guys,
There has been some progress here:
This PR fixes all the reported issues in examples/example02_CSS_styles.php. However, there are still several issues parsing the examples/amnesty/amnesty2014-report-english-litho-full.php file, which is a multi-column document. I've fixed many of these issues, but so far I have not been able to fix the issues on pages 38, 51, 55, 56, 71, 75, 84, 86, and several others, all of which appeared only after resolving other issues with multi-column widow and orphan support. I've added additional debug output to help with troubleshooting; however, I'm finding that I'm on a slippery slope of new issues related to multi-column documents, and I'm not sure I have the time to identify and fix all of these.
from mpdf.
I've only used multi-columns a handful of times and I always have to compromise on style and aesthetics because of bugs with it. It's one component of mPDF that needs a full overhaul.
from mpdf.
I've had no problems with styles in multi-column PDFs when using mPDF, they worked as expected. My only tip would be that if you want the baseline grid to align reliably between columns, any fonts larger or smaller than the body font size need to have a line height which is the same as or an exact multiple of the body font line height.
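The baseline-grid tip comes down to simple arithmetic: pick the smallest whole multiple of the body line height that still leaves room for the font. A minimal sketch in standalone Python follows. The function name and the `min_ratio` leading factor of 1.2 are assumptions for illustration, not mPDF settings.

```python
import math


def snapped_line_height(font_size, body_line_height, min_ratio=1.2):
    """Smallest multiple of body_line_height that is >= font_size * min_ratio."""
    wanted = font_size * min_ratio
    multiples = max(1, math.ceil(wanted / body_line_height))
    return multiples * body_line_height


print(snapped_line_height(24, 14))  # 42
print(snapped_line_height(8, 14))   # 14
```

With a 14pt body line height, a 24pt heading gets three grid rows and an 8pt footnote gets one, so text in neighbouring columns lands on the same baselines again after either element.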
from mpdf.
Where can I find a copy of examples/amnesty/amnesty2014-report-english-litho-full.php ?
Thanks!
from mpdf.
Hi Marc, you can find this file here: https://github.com/thnkloud9/mpdf/tree/85c674200c0b28586c9b3174af95999145114857/examples/amnesty
I made a pull request (now merged) to remove examples/amnesty/ from the development branch, as it's not appropriate to have such large example files in the mPDF source itself. This particular example does not use the correct custom fonts for the book and the CSS is not up to date, it's strictly for testing widows and orphans in a two-column layout, but it may give you a useful insight into how we use mPDF with Booktype.
We could use this separate repo for longform test materials, as long as they are under an appropriate copyright licence which enables redistribution: https://github.com/mpdf/mpdf-examples
from mpdf.
Thank you Daniel!
The Amnesty report is a fantastic showcase of the power of mPDF. That would be great for that new repo.
from mpdf.
The 2016 edition is Creative Commons licensed, so that is a possibility for a separate repo. I wouldn't like to mix it into the repo of a GPL licensed program like mPDF, that could get complicated :-)
from mpdf.
For the record, here is a branch to attempt to address this challenge:
https://github.com/thnkloud9/mpdf/commits/widows_and_orphans
But it did not reach a level at which it was felt it could be merged.
from mpdf.