Comments (9)

kba commented on July 4, 2024

OK, so a stress test of sorts, that should be doable.

from assets.

EEngl52 commented on July 4, 2024

@kba I guess this can be closed?

kba commented on July 4, 2024

I don't remember what I meant by this. I'll try to open more descriptive issues in the future 😬

bertsky commented on July 4, 2024

I think this was to have a realistic test case for performance issues with large METS. Large could mean many fileGrps, many files therein, or many pages – or any combination of these. This came up earlier when some change to the PAGE model (esp. the pageId lookup) severely degraded performance on my workspaces, to the point where it became unusable.

EEngl52 commented on July 4, 2024

probably sth like this? http://digital.slub-dresden.de/id336927223

bertsky commented on July 4, 2024

> probably sth like this? http://digital.slub-dresden.de/id336927223

well, 300 pages is not that much of a stretch. How about: http://digital.slub-dresden.de/id507244877-18920000

That would cover the many pages scenario. But how about many fileGrps? The METS from Kitodo.Presentation is rather small (just FULLTEXT, ORIGINAL and various JPEG qualities). All I can think of is an OCR-D workspace after running lots of different workflows with many steps.

bertsky commented on July 4, 2024

> That would cover the many pages scenario

Or rather: I could give you the METS built from https://github.com/bertsky/ocrd_publaynet – it contains 671407 pages in the training set and 56227 in the validation set.

EEngl52 commented on July 4, 2024

my example above is 1400 pages, nothing compared to your publaynet though

bertsky commented on July 4, 2024

> my example above is 1400 pages, nothing compared to your publaynet though

oh, right! Sorry, got confused. Yes, I do think the bible should be a test case. PubLayNet is an extreme (probably never used that way) – I actually recommend against having it included in the auto regression tests, as it's such a drag. (But it might help to have it somewhere ...)
