mlode's Issues

Ontos URIs produce too many redirects and HTTP status code 302 instead of 303

What steps will reproduce the problem?
curl -IL -H "Accept: application/rdf+xml" "http://www.ontosearch.com/2008/01/identification%23EID-3b79064eeb9930abe4da398cafc870ef"


What is the expected output? What do you see instead?
We expect 303 redirects, not 302. Also the number of redirects is very high. 

See the output below:
HTTP/1.1 302 Moved Temporarily
Date: Fri, 20 Jul 2012 15:51:55 GMT
Server: Apache-Coyote/1.1
Location: http://www.ontosearch.com/2008/01/identification/EID-3b79064eeb9930abe4da398cafc870ef
Connection: close
Content-Type: text/plain; charset=UTF-8

HTTP/1.1 302 Moved Temporarily
Date: Fri, 20 Jul 2012 15:51:55 GMT
Server: Apache-Coyote/1.1
Location: http://www.ontosearch.com/2008/01/rdf%23EID-3b79064eeb9930abe4da398cafc870ef
Connection: close
Content-Type: text/plain; charset=UTF-8

HTTP/1.1 302 Moved Temporarily
Date: Fri, 20 Jul 2012 15:51:55 GMT
Server: Apache-Coyote/1.1
Location: http://www.ontosearch.com/2008/01/rdf/EID-3b79064eeb9930abe4da398cafc870ef
Connection: close
Content-Type: text/plain; charset=UTF-8

HTTP/1.1 200 OK
Date: Fri, 20 Jul 2012 15:52:05 GMT
Server: Apache-Coyote/1.1
Content-Type: application/rdf+xml;charset=UTF-8
Content-Length: 106496
Connection: close
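
A compact way to watch just the status codes in the redirect chain (a sketch
reusing the curl call from above; once fixed, each intermediate hop should
report 303 and the final hop 200):

curl -sIL -H "Accept: application/rdf+xml" \
  "http://www.ontosearch.com/2008/01/identification%23EID-3b79064eeb9930abe4da398cafc870ef" \
  | grep '^HTTP/'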



Original issue reported on code.google.com by [email protected] on 20 Jul 2012 at 4:07

PHOIBLE

If you have a dataset you want to have reviewed or converted, enter the
information below.

1. (Required) Provide a link to a dataset at http://thedatahub.org. (If
the dataset is not yet there, please add your dataset to thedatahub first.)

I don't want to put the data in the datahub yet, so send me a message.


2. (Required) Who should we contact for questions?

Me.

Further questions (optional):
- Is the data in RDF already?

Yes, but it needs an expert to look at it.

- What should we do with the data set (convert to RDF, review quality of
RDF, fix RDF, interlink with other data sets, provide hosting)?

Create Linked Data.

Original issue reported on code.google.com by [email protected] on 2 Aug 2012 at 12:01

Add dataset IDS

Dataset can be found here:

http://lingweb.eva.mpg.de/ids/

I also have it in XML format with links to a central "concepticon". 


Original issue reported on code.google.com by [email protected] on 6 Aug 2012 at 1:17

Related terms converted to "collocations"

On http://wiktionary.dbpedia.org/page/dragon-English-Noun-6en "drake", "draco", 
"dragoon" etc. are listed as collocations, but on 
http://en.wiktionary.org/wiki/dragon they are listed as "related terms".

Wiktionary is right: these terms are not collocations (see
http://en.wikipedia.org/wiki/Collocation).

Original issue reported on code.google.com by [email protected] on 2 Aug 2012 at 6:04

Why use glottolog:geospan?

What steps will reproduce the problem?

curl -L -H "Accept: application/rdf+xml" http://glottolog.livingsources.org/resource/languoid/id/tuva1238 | grep 'glottolog:geospan'


What is the expected output? What do you see instead?

Output:
<glottolog:geospan> 179.16999999999999, 179.16999999999999, -8.5, -8.5 </glottolog:geospan>

Comment:
glottolog:geospan is a Glottolog-specific type and will NOT be understandable
by others. Why not use a well-defined type like GeoRSS
(http://dbpedia.org/page/GeoRSS)?
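
For illustration only, a sketch of how the same point could be expressed with
the W3C WGS84 vocabulary the dump already uses elsewhere, checked with rapper
(the subject URI is the languoid from the example above; whether geo: or GeoRSS
is the better fit is of course Glottolog's call):

cat > geo-example.ttl <<'EOF'
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<http://glottolog.livingsources.org/resource/languoid/id/tuva1238>
    geo:lat "-8.5" ;
    geo:long "179.17" .
EOF
# -c just counts the triples; parse errors would show up on stderr
rapper -i turtle -c geo-example.ttl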


How many triples are affected? (if less than 3-5% of the whole data set,
please set priority to _low_)

Please use labels and text to provide additional information.

Original issue reported on code.google.com by [email protected] on 26 Jul 2012 at 1:53

New languoid dump

What steps will reproduce the problem?
1. Download http://www.glottolog.org/downloadarea/languoids.n3.tgz
2. untar (yields 10 GB)
3. check RDF
sebastian_nordhoff@lingua182:~/rdf$ rapper -i turtle 0-10000languoids.n3 >/dev/null
rapper: Parsing URI file:///home/sebastian_nordhoff/rdf/0-10000languoids.n3 with parser turtle
rapper: Serializing with serializer ntriples
rapper: Parsing returned 1277081 triples


The dump is split into 12 subsets for reasons of size.
The dump is very large because of the recursive structure of the tree. It
currently states that Saxonian is a member of German and that Saxonian is a
member of West Germanic, even though the latter already follows from German
being a member of West Germanic.

By the same token, "Sächsisches Wörterbuch" is attached to all of Saxonian,
Germanic, West Germanic, and Indo-European.

Recursive searches in relational databases are already slow, so I imagine that
recursive searches in triple stores are even slower. This is why the dataset is
denormalized.
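
To get a quick per-subset triple count over the split files, a loop like the
following should do (a sketch; the file names are assumed to follow the
0-10000languoids.n3 pattern shown above):

for f in *languoids.n3; do
  printf '%s: ' "$f"
  # rapper prints its status lines to stderr; -c only counts, without serializing
  rapper -i turtle -c "$f" 2>&1 | grep 'Parsing returned'
done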


What is the expected output? What do you see instead?
to me, the RDF looks fine ;) but I suppose there will be some issues

How many triples are affected? (if less than 3-5% of the whole data set,
please set priority to _low_)
10^8

Please use labels and text to provide additional information.



Original issue reported on code.google.com by [email protected] on 2 Aug 2012 at 9:26

Glottolog references dump

What steps will reproduce the problem?
1. Download http://www.glottolog.org/downloadarea/references.rdf.zip
2. Use the tools at your disposal to tell me what needs to be improved
3. Release

What is the expected output? What do you see instead?
n/a

How many triples are affected? (if less than 3-5% of the whole data set,
please set priority to _low_)
100%

Please use labels and text to provide additional information.

Since numerical IDs are not allowed, I prefix them with an 'r', so
/resource/reference/id/r1234.rdf. The routing will be updated to resolve these
addresses in due course.
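
A minimal sketch of the checks one could run locally, using the same tools as
elsewhere in this tracker (the file name inside the zip is assumed to be
references.rdf):

wget http://www.glottolog.org/downloadarea/references.rdf.zip
unzip references.rdf.zip
# count triples; parse errors and warnings appear on stderr
rapper -i rdfxml -c references.rdf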


Original issue reported on code.google.com by [email protected] on 1 Aug 2012 at 12:56

SIL ISO-639-3 tables

ISO-639-3 codes are widely used, but not static. SIL's data is the best option 
for a canonical source; they provide the data for download, but it needs to be 
converted.

Caveat - the data has a closed licence.

Original issue reported on code.google.com by joregan on 3 Aug 2012 at 2:02

Product Scheme Classifications Catalogue

If you have a dataset you want to have reviewed or converted, enter the
information below.

1. http://thedatahub.org/en/dataset/pscs-catalogue

2. Jose María Alvarez Rodríguez ([email protected]) and José Emilio
Labra Gayo ([email protected])

Further questions (optional):

- Is the data in RDF already?

Yes and it is part of the Linked Data Cloud. 

- What should we do with the data set (convert to RDF, review quality of RDF,
fix RDF, interlink with other data sets, provide hosting)?

Maybe the best option would be to interlink with other datasets.

Original issue reported on code.google.com by [email protected] on 31 Jul 2012 at 9:00

Catalan WordNet

If you have a dataset you want to have reviewed or converted, enter the
information below.

1. (Required) Provide a link to a dataset at http://thedatahub.org. (If
the dataset is not yet there, please add your dataset to thedatahub first.)

http://thedatahub.org/dataset/catalan-wordnet

2. (Required) Who should we contact for questions?

Further questions (optional):
- Is the data in RDF already?

- What should we do with the data set (convert to RDF, review quality of
RDF, fix RDF, interlink with other data sets, provide hosting)?

Initial conversion here: https://github.com/jimregan/caWN-200611-rdf

The URIs are temporary, while I hunt for hosting; I'm looking for feedback, in 
particular for the lemon version.

Original issue reported on code.google.com by joregan on 5 Aug 2012 at 12:13

Finalize Wiktionary for Release

- update: http://thedatahub.org/dataset/wiktionary-dbpedia
- put FAQs and common error documentation on wiktionary.dbpedia.org. Ideally, you
would restructure the wiki into "Usage of data" and "walkthrough tutorial" ->
"how to create a new mapping" and "development"
- upload the jar, config XML and demo XML into this repo, folder wiktionary
- simplify http://downloads.dbpedia.org/wiktionary/ (a layout sketch follows this list):
  1. move all files into folders dumps/en, dumps/fr
  2. set a link to the latest: ln -s dumps/en/jjfjjsfj823438.tar.gz wiktionary-en-latest.tar.gz
- send the Virtuoso script for graph_groups to me
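
A sketch of the layout change (the per-language folder structure and the dump
file naming are assumptions; the archive name is the placeholder from the
example above):

# move existing dumps into per-language folders
mkdir -p dumps/en dumps/fr
mv wiktionary-en-*.tar.gz dumps/en/
mv wiktionary-fr-*.tar.gz dumps/fr/
# point a stable "latest" name at the newest English dump
ln -s dumps/en/jjfjjsfj823438.tar.gz wiktionary-en-latest.tar.gz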

Original issue reported on code.google.com by [email protected] on 17 Jul 2012 at 11:36

No pronunciation scheme on hasPronounciation

Resources such as http://wiktionary.dbpedia.org/page/cactus-English do not make
it clear which pronunciation scheme is being used. This can be fixed by making
the language tag "en-fonipa".

See http://www.iana.org/assignments/language-subtag-registry/

Minor, but also easy to fix ;)
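
For illustration, a sketch of what a pronunciation literal tagged with
en-fonipa could look like, checked with rapper (the subject and property URIs
here are made up for the example, not the ones the dataset actually uses):

cat > pron-example.nt <<'EOF'
<http://example.org/cactus-English> <http://example.org/hasPronunciation> "\u02C8k\u00E6k.t\u0259s"@en-fonipa .
EOF
rapper -i ntriples -c pron-example.nt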

Original issue reported on code.google.com by [email protected] on 2 Aug 2012 at 6:20

RDF object value written in HTML

What steps will reproduce the problem?
1. Use Ontosearch at http://www.ontosearch.com/  with keyword Barack Obama
2. Many Barack Obama as keywords will appear, just pick the first one.
3. Below, this EID will appear:
    http://www.ontosearch.com/2008/01/identification#EID-3b79064eeb9930abe4da398cafc870ef
4. Copy and paste the EID into the address bar; a file will be downloaded.
5. Open it with a text editor and copy the contents.
6. Go to http://www.w3.org/RDF/Validator/, paste it into the text area of
"Test by Direct Input", and press Parse RDF.
7. A group of errors appears.

What is the expected output? What do you see instead?
A group of errors appears with the description
"Relative URIs are not permitted in RDF",
which indicates that some RDF object values are written as HTML fragments
rather than as literals or URIs.
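
A command-line alternative to steps 4-6, assuming curl and rapper are available
(note that rapper resolves relative URIs against a base URI, so its report may
differ from the W3C validator's):

curl -s -H "Accept: application/rdf+xml" \
  "http://www.ontosearch.com/2008/01/identification%23EID-3b79064eeb9930abe4da398cafc870ef" \
  -o eid.rdf
# parse locally; syntax problems are reported on stderr
rapper -i rdfxml -c eid.rdf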

How many triples are affected? (if less than 3-5% of the whole data set,
please set priority to _low_)

Please use labels and text to provide additional information.
The attached files provide the RDF file and a file with the details of the error.

Original issue reported on code.google.com by [email protected] on 20 Jul 2012 at 11:17

Attachments:

Wortschatz

If you have a dataset you want to have reviewed or converted, enter the
information below.

1. (Required) Provide a link to a dataset at http://thedatahub.org. If the
dataset is not yet there, please add your dataset to thedatahub first.
No datahub entry yet: http://corpora.informatik.uni-leipzig.de/

2. (Required) Who should we contact for questions?
For now: [email protected], the maintainer of SPARQLIFY

Further questions (optional):
- Is the data in RDF already? yes/no
- What should we do with the data set (convert to RDF, review quality of
RDF, fix RDF, interlink with other data sets, provide hosting)?



Original issue reported on code.google.com by [email protected] on 19 Jul 2012 at 3:13

Add some example URIs to Glottolog datahub entry

Have a look at:
http://thedatahub.org/de/dataset/dbpedia

The sixth resource is:
Link to an example data item within the dataset (RDF/XML)

Could you do the same for Glottolog? 
This would help us debug it better. 

Original issue reported on code.google.com by [email protected] on 25 Jul 2012 at 1:52

doap:creator links

Pages such as 

http://wiktionary.dbpedia.org/page/cactus

have a link using the property http://usefulinc.com/ns/doap#creator, which
appears to be invalid: "#creator" is not an ID in the DOAP vocabulary. Perhaps
Dublin Core was the intended vocabulary?

Also, the link is to
http://de.wiktionary.org/w/index.php?title=cactus&action=history, which seems
very bizarre.

Original issue reported on code.google.com by [email protected] on 2 Aug 2012 at 6:13

Ontos News Portal

If you have a dataset you want to have reviewed or converted, enter the
information below.

1. (Required) Provide a link to a dataset at http://thedatahub.org. If the
dataset is not yet there, please add your dataset to thedatahub first.

http://thedatahub.org/dataset/ontos-news-portal

2. (Required) Who should we contact for questions?
Ontos AG (Christian Ehrlich) christian.ehrlich -at- ontos.com

Further questions (optional):
- Is the data in RDF already?

yes

- What should we do with the data set (convert to RDF, review quality of
RDF, fix RDF, interlink with other data sets, provide hosting)?

Check availability, run tests on the dump and report any strange things;
especially have a look at the '#' URIs and see whether they work for Linked Data.



Original issue reported on code.google.com by [email protected] on 16 Jul 2012 at 2:04

Why is xmlns:glottolog defined twice?

What steps will reproduce the problem?

curl -L -H "Accept: application/rdf+xml" http://glottolog.livingsources.org/resource/languoid/id/tuva1238 | grep 'glottolog='


What is the expected output? What do you see instead?

Output:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
xmlns:powder="http://www.w3.org/2007/05/powder-s#" 
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
xmlns:owl="http://www.w3.org/2002/07/owl#" 
xmlns:gold="http://purl.org/linguistics/gold/" 
xmlns:dcterms="http://purl.org/dc/terms/" 
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" 
xmlns:skos="http://www.w3.org/2004/02/skos/core#" 
xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" 
xmlns:foaf="http://xmlns.com/foaf/0.1/" 
xmlns:glottolog="http://glottolog.livingsources.org/ontologies/glottolog" 
xmlns:lexvo=" http://lexvo.org/ontology" 
xmlns:lingvoj="http://www.lingvoj.org/ontology">
    <owl:sameAs xmlns:glottolog="http://www.glottolog.org/ontologies/glottolog" rdf:resource="http://www.glottolog.org/resource/languoid/id/84411"/>

Defect:
xmlns:glottolog is defined twice, once on rdf:RDF and again on owl:sameAs, with
different namespace URIs.
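
A quick command-line check for the duplicate declaration (a sketch reusing the
curl call above):

curl -sL -H "Accept: application/rdf+xml" \
  http://glottolog.livingsources.org/resource/languoid/id/tuva1238 \
  | grep -o 'xmlns:glottolog="[^"]*"' | sort | uniq -c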




How many triples are affected? (if less than 3-5% of the whole data set,
please set priority to _low_)

Please use labels and text to provide additional information.

Original issue reported on code.google.com by [email protected] on 26 Jul 2012 at 1:32

Some Glottolog dataset URIs contain spaces

What steps will reproduce the problem?

1. First download the data set:
wget http://www.glottolog.org/downloadarea/languoids.rdf.zip
unzip languoids.rdf.zip

2. Download and install the VRP validation tool (see
file:///home/sherif/GlottologLangdoc/vrp3.0/HowToUse.html); for more details
see the wiki:
http://code.google.com/p/mlode/wiki/DebuggingLinkedData

3. Run the tool against the data set using:
java -jar vrp3.0.jar
There is a GUI where you can enter the input/output files; check the "Triple"
radio button.

What is the expected output? What do you see instead?

Error Example:

Semantic/Syntax error2012Not a valid URI!
Error called by: CUP$parser$actions.Invalid value for attribute 
http://www.w3.org/1999/02/22-rdf-syntax-ns#resource: 
http://en.wikipedia.org/wiki/Kuman (Russia) 
parser.URI_check
The URI http://en.wikipedia.org/wiki/Kuman (Russia)
must be http://en.wikipedia.org/wiki/Kuman%20(Russia),
with the space replaced by %20.

Error Description:

How many triples are affected? (if less than 3-5% of the whole data set,
please set priority to _low_)

There are 2161 such errors out of 9346647 triples (0.023% of the triples are
affected by this error type).
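
A rough cross-check that does not need the VRP GUI (a sketch; it only counts
rdf:resource attribute values containing a literal space):

grep -o 'rdf:resource="[^"]* [^"]*"' languoids.rdf | wc -l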

The whole log file is attached. 



Original issue reported on code.google.com by [email protected] on 23 Jul 2012 at 2:52

Attachments:

Create CKAN entry for Wortschatz

The data set is here: 
http://corpora.informatik.uni-leipzig.de/
Please create a datahub entry for it and tag it as "mlode" and "linguistics"

Original issue reported on code.google.com by [email protected] on 19 Jul 2012 at 3:17

A lot of white space

What steps will reproduce the problem?

There is a lot of white space; here is one example:
curl -L -H "Accept: application/rdf+xml" http://www.glottolog.org/resource/languoid/id/tuva1238.rdf | grep 'geo:'


What is the expected output? What do you see instead?

Output:
<geo:long> 179.17 </geo:long>
    <geo:lat> -8.5 </geo:lat>

Defect:
white space around the numbers

Correct output:
<geo:long>179.17</geo:long>
<geo:lat>-8.5</geo:lat>

How many triples are affected? (if less than 3-5% of the whole data set,
please set priority to _low_)

Please use labels and text to provide additional information.

Original issue reported on code.google.com by [email protected] on 31 Jul 2012 at 2:04

Content negotiation not working

curl -H "Accept: application/rdf+xml" http://wals.info/languoid/family/jabuti

redirects to 

http://127.0.0.1:6084/languoid/family/jabuti.rdf

instead of 

http://wals.info/languoid/family/jabuti.rdf
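
The faulty Location header can be seen directly with a HEAD request (a sketch):

curl -sI -H "Accept: application/rdf+xml" http://wals.info/languoid/family/jabuti \
  | grep -iE '^(HTTP/|Location:)'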

Original issue reported on code.google.com by [email protected] on 3 Aug 2012 at 9:38

Warning when converting languoids using rapper

What steps will reproduce the problem?
1. I tried using rapper to convert the data set from RDF/XML to N-Triples using
{{{
rapper -i rdfxml languoids.rdf -o ntriples > languoids.nt 2>languoidsWarning.log
}}}
2. I get some warnings.
3. I have attached the warning file (a way to summarise the warnings is
sketched below).
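
A quick way to group the warnings by message, since rapper tends to repeat the
same warning for many lines (a sketch):

sort languoidsWarning.log | uniq -c | sort -rn | head -n 20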

What is the expected output? What do you see instead?


Please use labels and text to provide additional information.

Original issue reported on code.google.com by [email protected] on 16 Jul 2012 at 1:03

Attachments:

Link to data categories

wiktionary.dbpedia.org does not seem to use any data categories; even more
weirdly, it uses its own vocabulary to indicate most categories, e.g.:

http://wiktionary.dbpedia.org/page/Dutch-English-Noun-1fr dc:language http://wiktionary.dbpedia.org/terms/English

Thus, dbpedia:English is both a word and a language!

http://dublincore.org/documents/dcmi-terms/#elements-language recommends using
RFC 4646-based IDs, e.g., "en", "de", "fr".
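
Purely as an illustration of that recommendation, here is what the triple might
look like with a plain RFC 4646 code, checked with rapper (the property used is
Dublin Core elements 1.1 language; whether dc: or dcterms: is preferred here is
left open):

cat > dc-language-example.nt <<'EOF'
<http://wiktionary.dbpedia.org/page/Dutch-English-Noun-1fr> <http://purl.org/dc/elements/1.1/language> "en" .
EOF
rapper -i ntriples -c dc-language-example.nt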

Similarly: http://wiktionary.dbpedia.org/page/English-English-Noun-1fr hasPoS 
http://wiktionary.dbpedia.org/terms/Noun 

Could you consider linking to ISOcat (http://www.isocat.org, e.g.,
http://www.isocat.org/datcat/DC-1333)? Guidelines for linking are here:
http://svn.aksw.org/papers/2012/LDL/ldl2012_proceedings/public/windhouwer.pdf

Original issue reported on code.google.com by [email protected] on 2 Aug 2012 at 6:49

GOLD


The General Ontology of Linguistic Description (GOLD) is in the LLOD image but 
was labeled FAIL this morning. It's OWL/RDF.

If you have a dataset you want to have reviewed or converted, enter the
information below.

1. (Required) Provide a link to a dataset at http://thedatahub.org. (If
the dataset is not yet there, please add your dataset to thedatahub first.)

The data is not yet in thedatahub. It's available here:

http://linguistics-ontology.org/version

2. (Required) Who should we contact for questions?

Me.

Further questions (optional):
- Is the data in RDF already?

Yes.

- What should we do with the data set (convert to RDF, review quality of
RDF, fix RDF, interlink with other data sets, provide hosting)?

Figure out why it's labelled FAIL. Figure out how it should be linked to the 
LLOD.



Original issue reported on code.google.com by [email protected] on 2 Aug 2012 at 11:55

Lexvo links in Glottolog

Glottolog contains a lot of triples like:
<http://glottolog.livingsources.org/resource/languoid/id/guaj1245> 
<http://www.w3.org/2002/07/owl#seeAlso> 
<http://www.lexvo.org/page/iso639-3/Guajiboan> .

'http://www.lexvo.org/page/iso639-3/Guajiboan' returns a 404 (presumably
because that's not an ISO 639-3 code; maybe related to Issue 9?), and OWL
doesn't have a 'seeAlso': presumably
'http://www.w3.org/2000/01/rdf-schema#seeAlso' was meant.
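
To see how widespread the owl#seeAlso usage is, a grep over the N-Triples
conversion used in the other Glottolog issues should do (a sketch; assumes
languoids.nt has been produced with rapper as described there):

grep -c '<http://www.w3.org/2002/07/owl#seeAlso>' languoids.nt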

Original issue reported on code.google.com by joregan on 2 Aug 2012 at 6:02

World Atlas of Language Structures (WALS)

1. (Required) Provide a link to a dataset at http://thedatahub.org. (If
the dataset is not yet there, please add your dataset to thedatahub first.)
http://thedatahub.org/dataset/wals

2. (Required) Who should we contact for questions?
Probably Sebastian Nordhoff?

Further questions (optional):
- Is the data in RDF already?
yes, you can attach .rdf to any resource, see: 
http://wals.info/languoid/family/jabuti
http://wals.info/languoid/family/jabuti.rdf

- What should we do with the data set (convert to RDF, review quality of
RDF, fix RDF, interlink with other data sets, provide hosting)?
review and interlink


Original issue reported on code.google.com by [email protected] on 3 Aug 2012 at 9:36

"boughen" is homonym "bow"!

Here: http://wiktionary.dbpedia.org/page/bow-English-Noun

Clearly false (homonyms are spelled the same), and I also can't find it on
Wiktionary:

https://www.google.de/search?aq=f&sugexp=chrome,mod=17&sourceid=chrome&ie=UTF-8&q=boughen+site%3Awiktionary.org#hl=en&newwindow=1&sclient=psy-ab&q=boughen+site:wiktionary.org&oq=boughen+site:wiktionary.org&gs_l=serp.3...84646.85317.1.85591.6.6.0.0.0.2.127.601.3j3.6.0...0.0...1c.kAkGi9CvQh4&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&fp=c53f52b43bdc3012&biw=1680&bih=949

Original issue reported on code.google.com by [email protected] on 2 Aug 2012 at 6:23

Inspection of http://www.glottolog.org/resource/languoid/id/khoi1249

What steps will reproduce the problem?
1. Go to the page http://www.glottolog.org/resource/languoid/id/khoi1249 in your
browser.
2. The following links are not working:

http://www.ethnologue.com/show_language.asp?code=Khoisan
http://www.sil.org/iso639-3/documentation.asp?id=Khoisan
http://en.wikipedia.org/wiki/ISO_639:Khoisan
http://www.language-archives.org/language/Khoisan
http://multitree.org/codes/Khoisan
http://www.llmap.org/maps/by-code/Khoisan.html
http://linguistlist.org/forms/langs/LLDescription.cfm?code=Khoisan
http://odin.linguistlist.org/igt_urls.php?lang=Khoisan
http://scriptsource.org/lang/Khoisan

Only http://wals.info/languoid/family/khoisan is working (a bulk status check
is sketched below, after point 5).

3. The XHTML is not valid:
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.glottolog.org%2Fresource%2Flanguoid%2Fid%2Fkhoi1249.xhtml&charset=%28detect+automatically%29&doctype=Inline&group=0

Worst of all, there are several <html> tags; see lines 50 and 51 in the
attachment.


4. Why is there an :80 in all links?
5. http://www.glottolog.org/resource/languoid/id/khoi1249.xhtml does not
contain a
  <link rel="alternate" type="application/rdf+xml" href="http://dbpedia.org/data/London.rdf" title="Structured Descriptor Document (RDF/XML format)" />
or a <title></title>.
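
A sketch of the bulk status check mentioned in point 2, assuming the links
listed above have been saved one per line in links.txt:

while read -r url; do
  printf '%s %s\n' "$(curl -s -o /dev/null -w '%{http_code}' "$url")" "$url"
done < links.txt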


Original issue reported on code.google.com by [email protected] on 18 Jul 2012 at 8:05

Attachments:

Glottolog/Langdoc

If you have a dataset you want to have reviewed or converted, enter the
information below.

1. (Required) Provide a link to a dataset at http://thedatahub.org. If the
dataset is not yet there, please add your dataset to thedatahub first.
http://thedatahub.org/de/dataset/glottolog-langdoc

2. (Required) Who should we contact for questions?
http://www.eva.mpg.de/lingua/staff/nordhoff/home.php

Further questions (optional):
- Is the data in RDF already?
yes

- What should we do with the data set (convert to RDF, review quality of
RDF, fix RDF, interlink with other data sets, provide hosting)?
review and send errors to Sebastian N.

Original issue reported on code.google.com by [email protected] on 16 Jul 2012 at 1:53

Ontos data set not crawlable via Linked Data

See the script in the repository, which crawls linked data:
http://code.google.com/p/mlode/wiki/DebuggingLinkedData#linked_data_crawler_test


./testLinkedData.sh "http://www.ontosearch.com/2008/01/identification#EID-3b79064eeb9930abe4da398cafc870ef" "http://www.ontosearch.com/2008/01/identification"

retrieves 3 triples only:
_:genid1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://88.198.67.99:8080/dereferencer/2008/01/rdf#identification> .
_:genid1 <http://purl.org/dc/elements/1.1/publisher> "Ontos AG" .
_:genid1 <http://purl.org/dc/terms/license> "http://creativecommons.org/licenses/by-nc/3.0/" .

./testLinkedData.sh "http://www.ontosearch.com/2008/01/identification%23EID-3b79064eeb9930abe4da398cafc870ef" "http://www.ontosearch.com/2008/01/identification"

The first retrieval works now, but crawling is impossible because of the #.

**************************
retrieving http://www.ontosearch.com/2008/01/identification#EID-32b509444fa7c3dd76ca2da5c223bfd6
**************************
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   797  100   797    0     0   5641      0 --:--:-- --:--:-- --:--:--  5641
rapper: Parsing URI file:///tmp/test/mlode/unix_debugging_scripts/test.rdf with parser guess
rapper: Serializing with serializer ntriples
rapper: Guessed parser name 'rdfxml'
rapper: Parsing returned 3 triples


recommended fix:
curl -H "Accept: XXX" "http://www.ontosearch.com/2008/01/identification%23EID-3b79064eeb9930abe4da398cafc870ef"

If XXX=application/rdf+xml, redirect with 303 to
"http://www.ontosearch.com/2008/01/rdf/EID-3b79064eeb9930abe4da398cafc870ef";
if XXX=text/html, redirect with 303 to
"http://www.ontosearch.com/2008/01/identification%23EID-3b79064eeb9930abe4da398cafc870ef".
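
A sketch for verifying whichever behaviour ends up deployed, exercising both
Accept values from the recommended fix:

for accept in application/rdf+xml text/html; do
  echo "== Accept: $accept"
  curl -sI -H "Accept: $accept" \
    "http://www.ontosearch.com/2008/01/identification%23EID-3b79064eeb9930abe4da398cafc870ef" \
    | grep -iE '^(HTTP/|Location:)'
done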




Original issue reported on code.google.com by [email protected] on 26 Jul 2012 at 10:00

Attachments:

LemonWordNet has no links

LemonWordNet has no links; strangely enough, not even to the W3C conversion 
(which it's based on). 

I've started on this.

Original issue reported on code.google.com by joregan on 2 Aug 2012 at 9:37

xmlns:lexvo contains white space

What steps will reproduce the problem?

curl -L -H "Accept: application/rdf+xml" http://glottolog.livingsources.org/resource/languoid/id/tuva1238 | grep 'xmlns:lexvo'


What is the expected output? What do you see instead?
Output:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
xmlns:powder="http://www.w3.org/2007/05/powder-s#" 
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
xmlns:owl="http://www.w3.org/2002/07/owl#" 
xmlns:gold="http://purl.org/linguistics/gold/" 
xmlns:dcterms="http://purl.org/dc/terms/" 
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" 
xmlns:skos="http://www.w3.org/2004/02/skos/core#" 
xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" 
xmlns:foaf="http://xmlns.com/foaf/0.1/" 
xmlns:glottolog="http://glottolog.livingsources.org/ontologies/glottolog" 
xmlns:lexvo=" http://lexvo.org/ontology" 
xmlns:lingvoj="http://www.lingvoj.org/ontology">

Error:
xmlns:lexvo=" http://lexvo.org/ontology" contains white space (a leading space
inside the namespace URI).


How many triples are affected? (if less than 3-5% of the whole data set,
please set priority to _low_)

Please use labels and text to provide additional information.

Original issue reported on code.google.com by [email protected] on 26 Jul 2012 at 1:16

xml:lang attributes contain whitespaces

What steps will reproduce the problem?

1.
wget http://www.glottolog.org/downloadarea/languoids.rdf.zip
unzip languoids.rdf.zip
rapper -i rdfxml languoids.rdf -o ntriples > languoids.nt
rapper -i ntriples -c languoids.nt 2> parseerror.log

2. Inspecting the "junk" lines with vim, I find "@south africa" or "@central 
african republic". They are converted from the xml:lang attribute of 
skos:altLabel.

You can check it yourself with

cat languoids.nt | grep @south


What is the expected output? What do you see instead?
xml:lang attribute tags should conform to BCP 47
(http://tools.ietf.org/rfc/bcp/bcp47.txt).

Whitespace is not permitted in a language tag.

That's why languoids.nt has 31 fewer triples than languoids.rdf.
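
The offending attribute values can also be listed straight from the RDF/XML
(a sketch):

grep -o 'xml:lang="[^"]* [^"]*"' languoids.rdf | sort | uniq -c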

Original issue reported on code.google.com by [email protected] on 16 Jul 2012 at 3:03

  • Merged into: #70

Attachments:

Find a way to keep track of people


Please think of a way on how we are going to keep track of people.

I would suggest having an OntoWiki and then using Exhibit, so you can browse 
everything.

It would be perfect to have some sort of visualisation of who helped to
convert/fix data and who will attend the conference.

Pictures are a must!

Also, you will need a way to keep track of who is on which mailing list. Maybe
with a Google spreadsheet?


Original issue reported on code.google.com by [email protected] on 16 Jul 2012 at 2:08

Wrong licence in LemonWordNet CKAN entry

As noted in: 
http://code.google.com/p/mlode/issues/detail?id=29

At the CKAN entry of http://thedatahub.org/en/dataset/lemonwordnet the licence 
field is incorrect: it reads 'Other (public domain)' where it should read 
'Other (attribution)'.

Please fix it; CKAN is a wiki.

Original issue reported on code.google.com by [email protected] on 2 Aug 2012 at 8:26

Add examples to CKAN entry of LemonWordnet

The entry of http://thedatahub.org/en/dataset/lemonwordnet
is a little bit sparse.
Could you add some Linked Data examples, as in
http://thedatahub.org/en/dataset/dbpedia?

What about a SPARQL endpoint?

Original issue reported on code.google.com by [email protected] on 2 Aug 2012 at 8:32

Unicode percent encoded URLs don't resolve

e.g., 

http://wiktionary.dbpedia.org/page/bow-English-Noun-2de

Links to 

http://wiktionary.dbpedia.org/page/n%C5%93ud-French

No data there, yet the page exists: http://en.wiktionary.org/wiki/n%C5%93ud

Related to this, no page on DBpedia Wiktionary returns a 404,

e.g.,
http://wiktionary.dbpedia.org/page/dafjkadjfklajfkdjsklfjasdkljfdsakjfkadjfa-French
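
A sketch for checking both cases from the command line (prints the status code
and the final URL for each):

for url in \
  "http://wiktionary.dbpedia.org/page/n%C5%93ud-French" \
  "http://wiktionary.dbpedia.org/page/dafjkadjfklajfkdjsklfjasdkljfdsakjfkadjfa-French"; do
  curl -s -o /dev/null -w '%{http_code} %{url_effective}\n' -L "$url"
done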

Original issue reported on code.google.com by [email protected] on 2 Aug 2012 at 6:31
