
wikidata / wikidata-toolkit-examples
49.0 49.0 23.0 129 KB

Examples showing how to use Wikidata Toolkit as a Maven library in your project

Home Page: https://www.mediawiki.org/wiki/Wikidata_Toolkit

License: Apache License 2.0

Java 100.00%

wikidata-toolkit-examples's Introduction

Wikidata

This Git repository contains all the scripts that evolved during Wikidata's first year to help with quality assurance (QA) of the project.

  • doc: Everything you need for a Doxygen setup for Wikibase.
  • htmlValidation: Everything you need for validating a running version of the Wikidata repository.
  • testcoverage: Everything you need for a test coverage setup for Wikibase.

wikidata-toolkit-examples's People

Contributors

guenthermi, mkroetzsch, tpt, wetneb

wikidata-toolkit-examples's Issues

JSON dump parsing error

Hello,

I cloned this project and ran the 'EntityStatisticsProcessor' class; this is the error that occurs:

********************************************************************
*** Wikidata Toolkit: EntityStatisticsProcessor
*** 
*** This program will download and process dumps from Wikidata.
*** It will print progress information and some simple statistics.
*** Results about property usage will be stored in a CSV file.
*** See source code for further details.
********************************************************************
2022-11-23 11:53:46 INFO  - Using download directory C:\_optimaize\wikidata\Wikidata-Toolkit-Examples\dumpfiles\wikidatawiki
2022-11-23 11:53:46 INFO  - Found 0 local dumps of type JSON: []
2022-11-23 11:53:49 INFO  - Found 360 online dumps of type JSON: [wikidatawiki-json-20221123, wikidatawiki-json-20221116, wikidatawiki-json-20221114, ..., wikidatawiki-json-20170925]
2022-11-23 11:53:49 INFO  - Downloading JSON dump file 20221123.json.gz from https://dumps.wikimedia.org/other/wikidata/20221123.json.gz ...

2022-11-23 11:55:35 ERROR - Error when reading JSON for entity: Missing type id when trying to resolve subtype of [simple type, class org.wikidata.wdtk.datamodel.implementation.FormDocumentImpl]: missing type id property 'type' (for POJO property 'forms')
 at [Source: (GZIPInputStream); line: 2, column: 1492] (through reference chain: org.wikidata.wdtk.datamodel.implementation.LexemeDocumentImpl["forms"]->java.util.ArrayList[0])
2022-11-23 11:55:35 WARN  - Entering recovery mode to parse rest of file. This might be slightly slower.
2022-11-23 11:55:35 WARN  - Skipping rest of current line: BA2239BB8","rank":"normal"}]}}],"pageid":54387040,"ns":146,"title":"Lexeme:L4","lastrevid":171059607[...]id":1710596079,"modified":"2022-08-22T19:28:34Z"},
2022-11-23 11:55:35 ERROR - Error when reading JSON for entity: Missing type id when trying to resolve subtype of [simple type, class org.wikidata.wdtk.datamodel.implementation.FormDocumentImpl]: missing type id property 'type' (for POJO property 'forms')
 at [Source: (String)"{"type":"lexeme","id":"L314","lemmas":{"ca":{"language":"ca","value":"pi"}},"lexicalCategory":"Q1084","language":"Q7026","claims":{"P5185":[{"mainsnak":{"snaktype":"value","property":"P5185","datavalue":{"value":{"entity-type":"item","numeric-id":1775415,"id":"Q1775415"},"type":"wikibase-entityid"},"datatype":"wikibase-item"},"type":"statement","id":"L314$45650151-4ed8-025d-2442-e36ef22e6a2a","rank":"normal"}]},"forms":[{"id":"L314-F1","representations":{"ca":{"language":"ca","value":"pis"}},"gr"[truncated 281 chars]; line: 1, column: 543] (through reference chain: org.wikidata.wdtk.datamodel.implementation.LexemeDocumentImpl["forms"]->java.util.ArrayList[0])
2022-11-23 11:55:35 ERROR - Problematic line was: {"type":"lexeme","id":"L314","lemmas":{"ca":{"lang...

The project uses Wikidata Toolkit version 0.11.0.

I tried newer versions (https://mvnrepository.com/artifact/org.wikidata.wdtk/wdtk-datamodel)
and got a different error:

2022-11-23 12:05:48 ERROR - Error when reading JSON for entity: Cannot deserialize value of type `java.util.ArrayList<org.wikidata.wdtk.datamodel.implementation.SenseDocumentImpl>` from Object value (token `JsonToken.START_OBJECT`)
 at [Source: (GZIPInputStream); line: 3, column: 674] (through reference chain: org.wikidata.wdtk.datamodel.implementation.LexemeDocumentImpl["senses"])
2022-11-23 12:05:48 WARN  - Entering recovery mode to parse rest of file. This might be slightly slower.
2022-11-23 12:05:48 WARN  - Skipping rest of current line: id":"L117$2bc66535-41a6-a4ca-3748-060ad3bbe56c","rank":"normal"}],"P1343":[{"mainsnak":{"snaktype":"[...]id":1742289833,"modified":"2022-10-03T18:45:08Z"},
2022-11-23 12:05:48 ERROR - Error when reading JSON for entity: Cannot deserialize value of type `java.util.ArrayList<org.wikidata.wdtk.datamodel.implementation.FormDocumentImpl>` from Object value (token `JsonToken.START_OBJECT`)
 at [Source: (String)"{"type":"lexeme","id":"L68","lemmas":{"fa":{"language":"fa","value":"\u062c\u0627\u0646\u0627\u0646"}},"lexicalCategory":"Q1084","language":"Q9168","claims":{},"forms":{},"senses":{},"pageid":54387656,"ns":146,"title":"Lexeme:L68","lastrevid":683797031,"modified":"2018-05-23T11:27:17Z"}"; line: 1, column: 169] (through reference chain: org.wikidata.wdtk.datamodel.implementation.LexemeDocumentImpl["forms"])
2022-11-23 12:05:48 ERROR - Problematic line was: {"type":"lexeme","id":"L68","lemmas":{"fa":{"langu...

I assume the data is downloaded from https://dumps.wikimedia.org/other/wikidata/ (this link appeared in the console).

What can I do to make this work?
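The second log shows lexeme L68 serialized with "forms":{} and "senses":{}, i.e. empty JSON objects where the deserializer expects arrays. A minimal, naive workaround sketch, assuming you read the dump lines yourself before handing them to the toolkit: the hypothetical helper `normalizeEmptyLists` (not part of Wikidata Toolkit) rewrites those two empty objects as empty arrays. It is a plain string replacement rather than a JSON-aware transformation, and it does not address the missing "type" field that version 0.11.0 complains about.

```java
/**
 * Hypothetical pre-processing step, not part of Wikidata Toolkit:
 * rewrites empty "forms"/"senses" objects in a lexeme JSON line to empty arrays.
 */
public class LexemeLineFixer {

    static String normalizeEmptyLists(String jsonLine) {
        // Naive textual replacement; assumes the compact serialization seen in the dump
        // (no whitespace between key, colon and braces).
        return jsonLine
                .replace("\"forms\":{}", "\"forms\":[]")
                .replace("\"senses\":{}", "\"senses\":[]");
    }

    public static void main(String[] args) {
        // Shortened version of the lexeme L68 line from the log.
        String line = "{\"type\":\"lexeme\",\"id\":\"L68\",\"claims\":{},\"forms\":{},\"senses\":{}}";
        System.out.println(normalizeEmptyLists(line));
        // prints {"type":"lexeme","id":"L68","claims":{},"forms":[],"senses":[]}
    }
}
```

Note that "claims":{} is left alone, since an empty claims object is valid in that position.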

RDF export error

I want to export RDF from a local JSON dump file, but when I run RdfSerializationExample.java with JDK 1.8, I get this error:
java.lang.NoSuchMethodError: java.nio.ByteBuffer.rewind()Ljava/nio/ByteBuffer

I did not edit RdfSerializationExample.java; I only modified ExampleHelpers.java.

When the dump file is set to sample-dump-20150815.json.gz, the program works. But when I change the dump file to wikidata-20180430-all.json.gz, I get the error: java.lang.NoSuchMethodError: java.nio.ByteBuffer.rewind()Ljava/nio/ByteBuffer
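Java 9 added covariant overrides such as `ByteBuffer.rewind()` returning `ByteBuffer`. Bytecode compiled against a Java 9+ class library without `javac --release 8` links to that override, which a Java 8 runtime does not have, and fails with exactly this kind of `NoSuchMethodError`. Whether that is the cause here is an assumption: the report does not show which class makes the call, or why only the larger dump triggers it. If the call is in code you compile yourself, the usual fixes are compiling with `--release 8` or calling the method through the `Buffer` supertype, as in this sketch (the class and method names are hypothetical):

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;

public class PortableRewind {

    /** Rewinds the buffer via the Buffer supertype so the call links on Java 8 and on 9+. */
    static ByteBuffer rewindPortably(ByteBuffer buffer) {
        // The cast makes javac emit a call to Buffer.rewind(), which exists on every
        // Java version, instead of the Java 9+ covariant ByteBuffer.rewind() override.
        ((Buffer) buffer).rewind();
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(4);
        buffer.put((byte) 1).put((byte) 2);
        System.out.println("position before rewind: " + buffer.position()); // 2
        rewindPortably(buffer);
        System.out.println("position after rewind: " + buffer.position());  // 0
    }
}
```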

RDF Exports

Hi,

I am trying to create an RDF export from a current Wikidata dump (20181105).

First I tried the toolkit client (v0.8.0), but I always got 31 triples, no matter which parameters I used.

Now I am using version 0.9.0 of the toolkit in Eclipse, but I get some warnings and errors.

One warning I encounter for several language codes is:
Unknown Wikimedia language code "inh". Using this code in RDF now, but this might be wrong.

For various properties I get these errors:
Count not export SomeValueSnak for property P1971: OWL range not known.
or
Could not fetch datatype of http://www.wikidata.org/entity/P883. Assuming type http://wikiba.se/ontology#String

Furthermore, I am trying to restrict the data to English and German using setLanguageFilter, but it has no effect. I added the following to RdfSerializationExample, but I get the same number of triples with or without it:

Set<String> languageSet = new HashSet<String>();
languageSet.add("en"); 
languageSet.add("de");
dumpProcessingController.setLanguageFilter(languageSet);
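For comparison, this is the effect a language filter is meant to have on an entity's terms: entries in languages outside the requested set are dropped. The sketch uses plain Java maps and a hypothetical helper `filterByLanguage`; it illustrates the idea only and does not reproduce Wikidata Toolkit's implementation. If such filtering were applied to a multilingual dump, the label triples for languages other than en and de would disappear, so an unchanged triple count suggests the filter is not reaching the RDF serializer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class LanguageFilterSketch {

    /** Hypothetical helper: keeps only the entries whose language code is in the allowed set. */
    static Map<String, String> filterByLanguage(Map<String, String> labelsByLanguage,
                                                Set<String> allowedLanguages) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : labelsByLanguage.entrySet()) {
            if (allowedLanguages.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("en", "Douglas Adams");
        labels.put("de", "Douglas Adams");
        labels.put("fr", "Douglas Adams");
        // Same language set as in the issue: English and German (Java 8 compatible).
        Set<String> languageSet = new HashSet<>(Arrays.asList("en", "de"));
        System.out.println(filterByLanguage(labels, languageSet));
        // prints {en=Douglas Adams, de=Douglas Adams}
    }
}
```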

FetchOnlineDataExample dies with NPE

*** It does not download any dump files.


*** Fetching data for one entity:
The current revision of the data for entity Q42 is 287075311
The English name for entity Q42 is Douglas Adams
*** Fetching data for several entities:
Exception in thread "main" java.lang.NullPointerException
at org.wikidata.wdtk.wikibaseapi.ApiConnection.fillCookies(ApiConnection.java:544)
at org.wikidata.wdtk.wikibaseapi.ApiConnection.sendRequest(ApiConnection.java:346)
at org.wikidata.wdtk.wikibaseapi.WbGetEntitiesAction.wbGetEntities(WbGetEntitiesAction.java:187)
at org.wikidata.wdtk.wikibaseapi.WbGetEntitiesAction.wbGetEntities(WbGetEntitiesAction.java:96)
at org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher.getEntityDocumentMap(WikibaseDataFetcher.java:254)
at org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher.getEntityDocuments(WikibaseDataFetcher.java:161)
at org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher.getEntityDocuments(WikibaseDataFetcher.java:141)
at FetchOnlineDataExample.main(FetchOnlineDataExample.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
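The trace ends in `ApiConnection.fillCookies`, i.e. while the client reads cookies from an HTTP response. The toolkit's code is not shown here, so the exact cause is not confirmed. One common way such an NPE arises is reading the `Set-Cookie` entry from the map returned by `HttpURLConnection.getHeaderFields()`, which yields `null` for a header the response does not contain. A null-safe version of that step, written as a hypothetical standalone helper `extractCookies`, might look like this:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CookieHeaderSketch {

    /**
     * Hypothetical helper: collects "name=value" pairs from Set-Cookie headers.
     * Returns an empty map instead of throwing when the header is absent.
     * Assumes the exact header name "Set-Cookie".
     */
    static Map<String, String> extractCookies(Map<String, List<String>> headerFields) {
        Map<String, String> cookies = new HashMap<>();
        List<String> setCookie = headerFields.get("Set-Cookie");
        if (setCookie == null) {
            return cookies; // the response carried no cookies; nothing to do
        }
        for (String header : setCookie) {
            // Keep only the leading "name=value"; attributes such as Path follow after ';'.
            String pair = header.split(";", 2)[0];
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        Map<String, List<String>> noCookies = new HashMap<>();
        System.out.println(extractCookies(noCookies)); // {}

        Map<String, List<String>> withCookie = new HashMap<>();
        withCookie.put("Set-Cookie", Collections.singletonList("session=abc123; Path=/; HttpOnly"));
        System.out.println(extractCookies(withCookie)); // {session=abc123}
    }
}
```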

FetchOnlineDataExample raises a JsonParseException

Hello, I cloned the repo and tried to run the FetchOnlineDataExample example without modifying anything.
Running the sample gives me the following error message:

Exception in thread "main" 2017-12-14 17:29:09 ERROR - Could not retrive data: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@7920ba90; line: 1, column: 2]
java.lang.NullPointerException
	at examples.FetchOnlineDataExample.main(FetchOnlineDataExample.java:45)

Basically, the call

EntityDocument q42 = wbdf.getEntityDocument("Q42");

returns null, and the subsequent

System.out.println("The current revision of the data for entity Q42 is "
						+ q42.getRevisionId());

throws a NullPointerException, which terminates the program.

I'm not sure what the issue could be, since I did nothing besides cloning the repository and running the example. Any ideas?
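The JsonParseException says the response starts with '<' (code 60), which usually means the server sent HTML or XML (for example an error page) instead of JSON; getEntityDocument then returns null, and the example dereferences it. A defensive pattern, sketched here with hypothetical names (`looksLikeJson`, `jsonBody`) and made-up response strings rather than the toolkit's API, is to check the payload before parsing and to hand the caller an `Optional` it must handle:

```java
import java.util.Optional;

public class ResponseGuardSketch {

    /** Hypothetical check: true if the first non-whitespace character opens a JSON object or array. */
    static boolean looksLikeJson(String body) {
        String trimmed = body.trim();
        return !trimmed.isEmpty() && (trimmed.charAt(0) == '{' || trimmed.charAt(0) == '[');
    }

    /** Returns the body only if it looks like JSON, so callers cannot dereference a missing result. */
    static Optional<String> jsonBody(String body) {
        return (body != null && looksLikeJson(body)) ? Optional.of(body) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up responses for illustration; not real Wikidata API output.
        String htmlError = "<!DOCTYPE html><html><body>Error</body></html>";
        String json = "{\"entities\":{}}";

        for (String response : new String[] {htmlError, json}) {
            Optional<String> body = jsonBody(response);
            if (body.isPresent()) {
                System.out.println("JSON received, safe to parse: " + body.get());
            } else {
                System.out.println("Response was not JSON; skipping instead of throwing an NPE");
            }
        }
    }
}
```

The same idea applies on the caller side of the original example: checking `q42` for null before calling `getRevisionId()` turns the crash into a reportable failure.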
