Comments (33)

Popolechien avatar Popolechien commented on June 14, 2024 1

perfect.

from mwoffliner.

ISNIT0 avatar ISNIT0 commented on June 14, 2024 1

@kelson42 @Popolechien
For review:

ISNIT0 avatar ISNIT0 commented on June 14, 2024 1

BM Full nopic: https://framadrop.org/r/cyk0sHthFk#vjOsZMdLvq9vqrulrpSOO/WUqSAlZ7ehMf6Zv36aVy0=

No, the current logic is to check each article for categories as it's downloaded. Then we only end up with categories that contain at least one article as per @kelson42's spec:

Mirror only categories that contain at least one article.

kelson42 avatar kelson42 commented on June 14, 2024

Yes, categories are not mirrored. This still needs to be done in mwoffliner and is probably the top priority.

kelson42 avatar kelson42 commented on June 14, 2024

@ISNIT0 This is currently the TOP priority topic for mwoffliner. It's not extremely complicated but needs a bit of work. Let me know if you are interested in having a look, so I can explain it to you.

ISNIT0 avatar ISNIT0 commented on June 14, 2024

@kelson42 I'm interested :) What's the best place to start looking?

kelson42 avatar kelson42 commented on June 14, 2024

@ISNIT0 Let's have a video call about that. Let me know when you have time.

tim-moody avatar tim-moody commented on June 14, 2024

Was 'probably the top priority' in Sept of 2016 yet still not implemented. Any time frame?

kelson42 avatar kelson42 commented on June 14, 2024

No, it is still the top priority, but there is nobody to work on this so far.

WikiDocJames avatar WikiDocJames commented on June 14, 2024

Thanks for pointing me to this. Hope to see it fixed sometime soon. Maybe a Google Summer of Code project for someone?

kelson42 avatar kelson42 commented on June 14, 2024

@WikiDocJames Maybe, even though this first GSoC we are managing is focused on Kiwix-Android. That said, if someone motivated and capable comes to me, I might consider mentoring it myself.

holta avatar holta commented on June 14, 2024

Further context: this issue directly affects schools in Haiti, which have made clear they would use Vikidia if its link ("84 super articles") were clickable in the top right, as seen in the current Vikidia ZIM here:
http://iiab.me:3000/vikidia_fr_all_novid_2018-03/

Current Vikidia ZIM downloaded from:
http://download.kiwix.org/zim/vikidia/vikidia_fr_all_2018-03.zim

The original (online) version at https://fr.vikidia.org works far better. The offline version (the ZIM file above) is extremely frustrating for educators and children as long as the most important link ("84 super articles") remains broken. In future, these essential materials should appear much as they do online here:
https://fr.vikidia.org/wiki/Cat%C3%A9gorie:Super_article

PS @kelson42 has clarified that he's hopeful this will be fixed before the end of 2018.

kelson42 avatar kelson42 commented on June 14, 2024

Things to do (the ones I can see):

  • Include the "Category" namespace in the namespaces scraped by default
  • Verify the category pages are scraped properly
  • Ensure that links from article pages to categories work properly
  • Ensure the category links at the bottom of the page are displayed properly
  • Ensure the list of articles is displayed offline as it is online (sorted alphabetically)
  • Ensure the category pagination works properly
  • Remove articles which are not mirrored from the category list of articles.
  • Mirror only categories that contain at least one article.
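
The last three items of the list above can be sketched roughly as follows. This is only an illustration: the `CategoryListing` shape and the `pruneCategoryListings` helper are hypothetical names, not mwoffliner code, and the set of mirrored article titles is assumed to be known up front.

```typescript
// Hypothetical shape; mwoffliner's real data structures may differ.
interface CategoryListing {
  title: string;
  members: string[]; // article titles the wiki reports for this category
}

// Keep only mirrored articles in each listing, sort them alphabetically,
// and drop categories that end up with no article at all.
function pruneCategoryListings(
  listings: CategoryListing[],
  mirrored: Set<string>,
): CategoryListing[] {
  return listings
    .map((listing) => ({
      title: listing.title,
      members: listing.members
        .filter((member) => mirrored.has(member))
        .sort((a, b) => a.localeCompare(b)),
    }))
    .filter((listing) => listing.members.length > 0);
}
```

Sorting with `localeCompare` is a simplification; it does not reproduce MediaWiki's sort keys.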

ISNIT0 avatar ISNIT0 commented on June 14, 2024

What is the best thing to do for an articleList selection? Keep all the many parent categories? Not keep categories? Keep only one level of categories? Something else?

kelson42 avatar kelson42 commented on June 14, 2024

@ISNIT0 Keep each category with at least one non-category child and merge all categories (to the top one) if there is only one sub-category.
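
One possible reading of this rule, sketched on a plain tree: a category with no articles and no surviving sub-categories is dropped, and a category with no articles and exactly one surviving sub-category absorbs that sub-category's content while keeping the top title. The `CategoryNode` type and `simplify` function are hypothetical, and real category graphs are not trees (they can share parents and contain cycles), so this is a simplification under those assumptions.

```typescript
// Hypothetical tree node; the real category graph may have shared parents and cycles.
interface CategoryNode {
  title: string;
  articles: string[];
  subcategories: CategoryNode[];
}

// Returns the simplified node, or null if the branch contains no article at all.
function simplify(node: CategoryNode): CategoryNode | null {
  // Simplify children first and drop branches that ended up empty.
  const subs = node.subcategories
    .map((child) => simplify(child))
    .filter((child): child is CategoryNode => child !== null);

  if (node.articles.length === 0 && subs.length === 0) {
    return null; // nothing to show under this category
  }

  const only = subs.length === 1 ? subs[0] : undefined;
  if (node.articles.length === 0 && only !== undefined) {
    // Merge the single sub-category into the top one (assumed interpretation).
    return { title: node.title, articles: only.articles, subcategories: only.subcategories };
  }

  return { title: node.title, articles: node.articles, subcategories: subs };
}
```

With this reading, a chain A → B → C where only C holds articles collapses to a single category titled A that lists C's articles.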

ISNIT0 avatar ISNIT0 commented on June 14, 2024

What about categories with media? e.g. https://commons.wikimedia.org/wiki/Category:Birds_in_art

ISNIT0 avatar ISNIT0 commented on June 14, 2024

There doesn't seem to be a way to get structured data about the order in which to show the sub-categories. It's not just alphabetical:
e.g. https://bm.wikipedia.org/wiki/Cat%C3%A9gorie:Lien_th%C3%A9matique_pour_cat%C3%A9gories
The single category is in the "G" namespace

and
https://en.wikipedia.org/wiki/Category:London
There are *, Β (Greek letter), Ι, Ξ, and Σ

Any suggestions here @Popolechien?

The query I'm currently using is this: https://bm.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtype=subcat&cmlimit=500&format=json&cmtitle=Cat%C3%A9gorie%3ALien_th%C3%A9matique_pour_cat%C3%A9gories
Which only gives back the article namespace, pageid, and title.
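
For illustration, the query above could be assembled like this. The `buildSubcategoryQuery` helper is a hypothetical name; the parameters are the ones in the URL above. The optional `cmprop` argument is an addition not used in the thread: MediaWiki's `categorymembers` module accepts values such as `sortkeyprefix`, but whether that is enough to reproduce the on-wiki grouping is an untested assumption.

```typescript
// Sketch: build a list=categorymembers query for the sub-categories of a category.
// `cmprop` is optional; e.g. ["ids", "title", "sortkeyprefix"] (assumption: may help with ordering).
function buildSubcategoryQuery(
  apiUrl: string,
  categoryTitle: string,
  cmprop: string[] = [],
): string {
  const params = new URLSearchParams({
    action: "query",
    list: "categorymembers",
    cmtype: "subcat",
    cmlimit: "500",
    format: "json",
    cmtitle: categoryTitle,
  });
  if (cmprop.length > 0) {
    params.set("cmprop", cmprop.join("|"));
  }
  return `${apiUrl}?${params.toString()}`;
}

// Usage, mirroring the query in the comment above:
const exampleQuery = buildSubcategoryQuery(
  "https://bm.wikipedia.org/w/api.php",
  "Catégorie:Lien thématique pour catégories",
);
```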

ISNIT0 avatar ISNIT0 commented on June 14, 2024

Progress:
I've added a work-in-progress --getCategories flag, which enables category scraping.
There are certainly issues with the current implementation, so it should not be used yet.

Each article has a Categories section added at the bottom with a list of links to category pages. Each category page has a Sub-categories section that links to sub-category pages.

TODO:

  • Display categories as on wikipedia.org
  • List pages within a category
  • Improve efficiency of category page scraping

ISNIT0 avatar ISNIT0 commented on June 14, 2024

It seems that to display the categories the same way MediaWiki does, we would need information that isn't available through the API. Instead, I'm just grouping them alphabetically, which is pretty close.
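
A minimal sketch of such alphabetical grouping, assuming the titles are passed without their namespace prefix. The `groupByFirstLetter` name is hypothetical and this is not necessarily how mwoffliner implements it.

```typescript
// Group titles under the upper-cased first character, in alphabetical order.
// This approximates MediaWiki's headings; it ignores custom sort keys.
function groupByFirstLetter(titles: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  const sorted = [...titles].sort((a, b) => a.localeCompare(b));
  for (const title of sorted) {
    // Array.from splits by code point, so non-BMP first characters stay intact.
    const first = Array.from(title)[0] ?? "";
    const key = first.toLocaleUpperCase();
    const bucket = groups.get(key) ?? [];
    bucket.push(title);
    groups.set(key, bucket);
  }
  return groups;
}
```

Because a `Map` preserves insertion order and the titles are sorted first, iterating the result yields the headings in alphabetical order.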

ISNIT0 avatar ISNIT0 commented on June 14, 2024

Progress so far:
https://framadrop.org/r/R1A5MJwaey#hxz6gNnGy7mFqv23Hf9SJNQSVL9JPS+pHOGQgVcyDvc=

Known issues:

  • Categories/Sub-Categories/Pages sections don't collapse/expand
    (resolved by implementing #677)
  • "Hidden" categories are not separated from normal categories
    (solved by not downloading "hidden" categories)
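
The thread does not say how hidden categories are skipped. One way to do it, shown here only as an assumption, is to ask the MediaWiki API for an article's categories with `clshow=!hidden`, so hidden ones are never returned. The `buildVisibleCategoriesQuery` helper is a hypothetical name.

```typescript
// Sketch: query an article's categories, excluding hidden ones server-side.
// Assumption: filtering via `clshow=!hidden`; the thread does not confirm this approach.
function buildVisibleCategoriesQuery(apiUrl: string, articleTitle: string): string {
  const params = new URLSearchParams({
    action: "query",
    prop: "categories",
    clshow: "!hidden",
    cllimit: "max",
    format: "json",
    titles: articleTitle,
  });
  return `${apiUrl}?${params.toString()}`;
}
```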

Popolechien avatar Popolechien commented on June 14, 2024

Yeah, the hidden categories not being weeded out is a real blocker. These are useless and take up quite some space. @ISNIT0 what's your plan about those?

ISNIT0 avatar ISNIT0 commented on June 14, 2024

@Popolechien I've just updated the comment above: we're now not scraping them at all. Is this okay?

Popolechien avatar Popolechien commented on June 14, 2024

Niiiice.
Am I right to understand that all categories within the categories will also be showcased (ie not only the categories in the articles themselves)?
Either way, good job!

kelson42 avatar kelson42 commented on June 14, 2024

I have tested with https://framadrop.org/r/D1EE0C6YxL#SwJO6719lYGfukNN1i71HHy1glAK4MaJTdKiifDHBlo=:

  • In Category pages, if an article is not in the selection it should not be listed (currently in black)
  • I think we should choose another namespace for category pages. The ZIM spec mentions "U"; see https://wiki.openzim.org/wiki/ZIM_file_format#Namespaces
  • Up categories should be migrated too, it is not the case in "Category:2010s in Austria".

ISNIT0 avatar ISNIT0 commented on June 14, 2024

@kelson42 What do you mean by "Up categories should be migrated too"?

kelson42 avatar kelson42 commented on June 14, 2024

@ISNIT0 I mean the categories' parent categories: the full ancestor tree should be downloaded (but of course in a simplified version).

ISNIT0 avatar ISNIT0 commented on June 14, 2024

@kelson42 You previously said:

Mirror only categories that contain at least one article.

samkellerhals avatar samkellerhals commented on June 14, 2024

@ISNIT0 @kelson42 @Popolechien thanks a lot for working on this. I think the addition of live category links will be a huge improvement! So far I've been working with Kirundi/Kinyarwanda/French ZIMs for use in refugee camps, and they also appeared with dead links on the index.html page. Are you thinking of applying these changes (active category links) to all ZIMs currently available for download via the Kiwix website?

kelson42 avatar kelson42 commented on June 14, 2024

@samkellerhals This is the goal; it might take a few more months to see it happening everywhere.

ISNIT0 avatar ISNIT0 commented on June 14, 2024

@kelson42 This is now doing the tree-shaking/graph simplification:
https://framadrop.org/r/dIaIeQVRtO#zZjY9W6s5P6ukctJPxU8GDvEQpzAUPdsqSKXbQohwII=

Because this is done using the top 100 articles, there is not a lot of shared categorisation, but Mantis is a good example

ISNIT0 avatar ISNIT0 commented on June 14, 2024

@kelson42 I'd like to move the namespacing item you mentioned into a separate ticket and add it to 2.0

I can see it causing lots of back-and-forth with routing edge-cases

kelson42 avatar kelson42 commented on June 14, 2024

@ISNIT0 From what I can see in the last file you proposed, https://framadrop.org/r/P1S5xi6PRm#A6fiUMsysQsdZzr72yXsT6i/QaYm/Dc97iJZZtYktVg=, this looks quite good :) That said, I was not able to check whether the pagination works. Do you have a demo ZIM for that?

kelson42 avatar kelson42 commented on June 14, 2024

AFAIK everything has now been implemented in 1.9, except #762 to be done in 2.0
