[io.jwpl] WikipediaStandardReaderBase does not fill in several fields in DocumentMetaData and uses wrong language codes (dkpro-core, CLOSED)

dkpro commented on July 28, 2024
[io.jwpl] WikipediaStandardReaderBase does not fill in several fields in DocumentMetaData and uses wrong language codes

from dkpro-core.

Comments (10)

reckart commented on July 28, 2024
I think it would be best if the documentUri were the actual Wikipedia page URL (if you
paste it into the browser, it should work). The collectionId might be the base URL of
the particular language version of Wikipedia.

Example:

baseUri: http://en.wikipedia.org/
documentUri: http://en.wikipedia.org/w/index.php?title=Main_Page&oldid=433914749

I think the URI should always contain the revision, not only when reading texts from
the RevisionMachine.
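
A minimal sketch of how such a URI could be assembled, assuming the language prefix, page title, and revision id are already known. The class and method names (`WikiUriSketch`, `baseUri`, `documentUri`) are hypothetical and are not part of dkpro-core or JWPL; only the URL layout is taken from the example above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of dkpro-core or JWPL.
class WikiUriSketch {

    // Base URL of one language edition, e.g. "en" -> "http://en.wikipedia.org/".
    static String baseUri(String langCode) {
        return "http://" + langCode + ".wikipedia.org/";
    }

    // Page URL that always carries the revision id, following the
    // index.php?title=...&oldid=... layout from the example above.
    static String documentUri(String langCode, String title, long revisionId) {
        // Wikipedia titles use underscores instead of spaces; the rest is
        // percent-encoded so the result can be pasted into a browser.
        String encodedTitle =
                URLEncoder.encode(title.replace(' ', '_'), StandardCharsets.UTF_8);
        return baseUri(langCode) + "w/index.php?title=" + encodedTitle
                + "&oldid=" + revisionId;
    }
}
```

Calling `WikiUriSketch.documentUri("en", "Main_Page", 433914749L)` reproduces the documentUri from the example.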

Original issue reported on code.google.com by richard.eckart on 2011-08-30 18:27:16

reckart commented on July 28, 2024
Nice idea.

I see one major issue:
We would then need to translate the language stored in the JWPL database (english,
simple_english, german, etc) into the prefix for the URL (en, de, etc).

I am not aware of any such translation table. We would have to build and maintain it
on our own :/
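
A hand-maintained table of this kind might look like the sketch below. The class name `WikiLanguageCodes` and the method `prefixFor` are hypothetical, and the three entries are only illustrative (the `simple` prefix for `simple_english` is an assumption based on the simple.wikipedia.org host name). Unknown languages yield an empty result rather than a guess.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup table, not the mapping actually shipped with dkpro-core.
class WikiLanguageCodes {

    // JWPL language name -> Wikipedia host prefix. Illustrative entries only;
    // further languages would have to be added by hand.
    private static final Map<String, String> JWPL_TO_PREFIX = Map.of(
            "english", "en",
            "simple_english", "simple",
            "german", "de");

    // Returns the URL prefix for a JWPL language name, or empty if unmapped.
    static Optional<String> prefixFor(String jwplLanguage) {
        String key = jwplLanguage.toLowerCase(Locale.ROOT);
        return Optional.ofNullable(JWPL_TO_PREFIX.get(key));
    }
}
```

Returning an `Optional` keeps the missing-mapping case explicit, so a reader can decide whether to skip the URI or fail.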

Original issue reported on code.google.com by torsten.zesch on 2011-09-14 07:52:01

reckart commented on July 28, 2024
We could use this list to map from a Wikipedia language String to the wiki:
http://meta.wikimedia.org/wiki/List_of_Wikipedias

We "just" have to parse the page :)

Original issue reported on code.google.com by oliver.ferschke on 2011-09-14 08:50:59

reckart commented on July 28, 2024
Here are the current language statistics (the source for the list on the static wiki page
mentioned before):
http://s23.org/wikistats/wikipedias_wiki.php

Original issue reported on code.google.com by oliver.ferschke on 2011-09-14 08:53:36

reckart commented on July 28, 2024
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2012-02-08 22:51:53

  • Labels added: Milestone-1.4.0

reckart commented on July 28, 2024
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2012-06-12 09:21:59

reckart commented on July 28, 2024
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2012-07-19 09:38:57

reckart commented on July 28, 2024
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2012-10-13 18:31:40

  • Labels added: DKPro-ASL

reckart commented on July 28, 2024
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2012-10-13 18:33:40

  • Labels added: Milestone-1.5.0
  • Labels removed: Milestone-1.4.0

reckart commented on July 28, 2024
I only provided a few language mappings.
Additional ones will have to be added as they are needed.

Original issue reported on code.google.com by torsten.zesch on 2013-01-20 15:05:59
