linq-to-wiki's Introduction

LinqToWiki

LinqToWiki is a library for accessing sites running MediaWiki (including Wikipedia) through the MediaWiki API from .NET languages like C# and VB.NET.

It can be used to do almost anything that can be done from the web interface and more, including editing articles, listing articles in categories and listing all kinds of links on a page. Querying the various lists available can be done using LINQ queries, which then get translated into efficient API requests.

The library is strongly typed, which means it should be hard to make invalid requests, and it also makes it easy to discover available methods and properties through IntelliSense.

Because the API can vary from wiki to wiki, it's necessary to configure the library through an automatically generated assembly.

Downloads

Usage

Simple example

For example, to edit the Sandbox on the English Wikipedia anonymously, you can use the following:

var wiki = new Wiki("TheNameOfMyBot/1.0 (http://website, myemail@site)", "en.wikipedia.org");

// get edit token, necessary to edit pages
var token = wiki.tokens(new[] { tokenstype.edit }).edittoken;

// create new section called "Hello" on the page "Wikipedia:Sandbox"
wiki.edit(
    token: token, title: "Wikipedia:Sandbox", section: "new", sectiontitle: "Hello", text: "Hello world!");

As you can see, in methods like this, you should use named parameters, because the edit() method has lots of them, and you probably don't need them all.
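To illustrate why named arguments help here, the sketch below uses a hypothetical local function, DescribeEdit, with several optional parameters. It is not part of LinqToWiki, and its parameter list is only loosely modelled on the call above; it just shows how C# lets you name the arguments you need and skip the rest.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a generated method with many optional parameters.
// It only collects the arguments that were actually supplied.
string DescribeEdit(
    string token = null, string title = null, string section = null,
    string sectiontitle = null, string text = null, string summary = null,
    bool minor = false, bool bot = false)
{
    var parts = new List<string>();
    if (title != null) parts.Add("title=" + title);
    if (section != null) parts.Add("section=" + section);
    if (sectiontitle != null) parts.Add("sectiontitle=" + sectiontitle);
    if (text != null) parts.Add("text=" + text);
    if (summary != null) parts.Add("summary=" + summary);
    if (minor) parts.Add("minor");
    if (bot) parts.Add("bot");
    if (token != null) parts.Add("token=" + token);
    return string.Join("&", parts);
}

// Named arguments can be given in any order; omitted ones keep their defaults.
string described = DescribeEdit(text: "Hello world!", title: "Wikipedia:Sandbox", section: "new");
Console.WriteLine(described);
```

Only the three parameters that were named end up in the result; everything else stays at its default.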

The code looks more convoluted than necessary (can't the library get the token for me?), but that's because it's all generated automatically.

Queries

Where LINQ to Wiki really shines, though, is queries. If you want to get the names of all pages in Category:Mammals of Indonesia, you can do:

var pages = (from cm in wiki.Query.categorymembers()
             where cm.title == "Category:Mammals of Indonesia"
             select cm.title)
            .ToEnumerable();
List of mammals of Indonesia
Mammals of Borneo
Agile gibbon
Andrew's Hill Rat
Anoa
…

The call to ToEnumerable() (or, alternatively, ToList()) is necessary, so that LINQ to Wiki methods don't get mixed up with LINQ to Objects methods, but the result is now an ordinary IEnumerable<string>.
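To give a rough idea of what such a query turns into, the sketch below builds the kind of MediaWiki API URL that a categorymembers query corresponds to. The helper BuildApiUrl is hypothetical, and the exact set of parameters the library sends (for example cmprop and format here) is an assumption; only the list=categorymembers and cmtitle parameters follow directly from the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds a MediaWiki API URL, escaping each parameter value.
string BuildApiUrl(string host, IEnumerable<KeyValuePair<string, string>> parameters)
{
    var query = string.Join("&",
        parameters.Select(p => p.Key + "=" + Uri.EscapeDataString(p.Value)));
    return "https://" + host + "/w/api.php?" + query;
}

// The "where cm.title == ..." condition maps to the cmtitle parameter,
// and "select cm.title" means only the title property is needed (assumed cmprop=title).
var categoryMembersUrl = BuildApiUrl("en.wikipedia.org", new[]
{
    new KeyValuePair<string, string>("action", "query"),
    new KeyValuePair<string, string>("list", "categorymembers"),
    new KeyValuePair<string, string>("cmtitle", "Category:Mammals of Indonesia"),
    new KeyValuePair<string, string>("cmprop", "title"),
    new KeyValuePair<string, string>("format", "xml"),
});
Console.WriteLine(categoryMembersUrl);
```

Opening a URL like this in a browser is a handy way to compare what the API returns with what the LINQ query gives you.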

Well, actually, you want the list sorted backwards (maybe you want to know whether there are any Indonesian mammals whose name starts with Z):

var pages = (from cm in wiki.Query.categorymembers()
             where cm.title == "Category:Mammals of Indonesia"
             orderby cm descending
             select cm.title)
            .ToEnumerable();
Wild water buffalo
Wild boar
Whitish Dwarf Squirrel
Whitehead's Woolly Bat
White-thighed surili
…

Hmm, no luck with the Z. Okay, can I get the first section of those articles? This is where things start to get more complicated. If you were using the API directly, you would have to use generators. LINQ to Wiki can handle that for you, but since generators are quite powerful, you have to do something like this:

var pages = (from cm in wiki.Query.categorymembers()
             where cm.title == "Category:Mammals of Indonesia"
             orderby cm descending
             select cm)
    .Pages
    .Select(
        page =>
        new
        {
            title = page.info.title,
            text = page.revisions()
                .Where(r => r.section == "0")
                .Select(r => r.value)
                .FirstOrDefault()
        })
    .ToEnumerable();
Wild water buffalo
{{About|the wild species|the domestic livestock varieties descended from it|water buffalo}}

{{Taxobox|…}}

The '''wild water buffalo''' (''Bubalus arnee''), also called '''Asian buffalo''' and '''Asiatic buffalo''',
is a large [[bovinae|bovine]] native to [[Southeast Asia]].  …

This deserves some explanation. When you use Pages to access more information about the pages in some list, you then call Select() to choose exactly what you want to know. In that Select(), you can use info for basic information about the page, like its name, its ID or whether you are watching it. Then there are several lists, including revisions(). You can again use LINQ methods to alter this part of the query. For example, I want only the first section (Where(r => r.section == "0")), I want to select the text of the revision (here called “value”, Select(r => r.value)) and I want it only for the first (latest) revision (FirstOrDefault()).
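For comparison, here is a sketch of what using a generator looks like at the API level. In the MediaWiki API, a list query becomes a generator by replacing list= with generator= and prefixing the list's parameters with "g"; page-level parameters such as prop=revisions then apply to every generated page. The helper ToGenerator is hypothetical, and the exact request LINQ to Wiki produces for the query above is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Turns the parameters of a list query into the equivalent generator parameters:
// "list=X" becomes "generator=X" and every other parameter gets a "g" prefix.
List<KeyValuePair<string, string>> ToGenerator(IEnumerable<KeyValuePair<string, string>> listParameters)
{
    return listParameters
        .Select(p => p.Key == "list"
            ? new KeyValuePair<string, string>("generator", p.Value)
            : new KeyValuePair<string, string>("g" + p.Key, p.Value))
        .ToList();
}

// The list part of the query: category members, sorted in descending order.
var listQuery = new[]
{
    new KeyValuePair<string, string>("list", "categorymembers"),
    new KeyValuePair<string, string>("cmtitle", "Category:Mammals of Indonesia"),
    new KeyValuePair<string, string>("cmdir", "desc"),
};

var generatorQuery = new List<KeyValuePair<string, string>>
{
    new KeyValuePair<string, string>("action", "query"),
};
generatorQuery.AddRange(ToGenerator(listQuery));

// Page-level parameters: the content of section 0 of each generated page's latest revision.
generatorQuery.Add(new KeyValuePair<string, string>("prop", "revisions"));
generatorQuery.Add(new KeyValuePair<string, string>("rvprop", "content"));
generatorQuery.Add(new KeyValuePair<string, string>("rvsection", "0"));

var generatorQueryString = string.Join("&",
    generatorQuery.Select(p => p.Key + "=" + Uri.EscapeDataString(p.Value)));
Console.WriteLine(generatorQueryString);
```

The split between the "g"-prefixed parameters and the rv parameters mirrors the split between the part of the query before Pages and the part inside Select().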

For examples of almost all methods in LINQ to Wiki, have a look at the LinqToWiki.Samples project.

Generating configuration for a wiki

To generate a configuration assembly for a certain wiki, you can use the linqtowiki-codegen command-line application (see the LinqToWiki.Codegen.App project). If you run it without parameters, it will show you basic usage, along with some examples:

Usage:    linqtowiki-codegen url-to-api [namespace [output-name]] [-d output-directory] [-p props-file-path]
Examples: linqtowiki-codegen en.wikipedia.org LinqToWiki.Enwiki linqtowiki-enwiki -d C:\Temp -p props-defaults-sample.xml
          linqtowiki-codegen https://en.wikipedia.org/w/api.php

The application retrieves information about the API from the API itself, using the URL you give as the first parameter. This requires information about the properties of API results, which was not previously available from the API and was added there because of this library. This was done quite recently (on 12 June 2012), so it's not available in the most recent public version of MediaWiki (1.19.1), but it is available in the version currently in use on Wikipedia (1.20wmf7).

If you don't have a recent enough version of MediaWiki, you can use a workaround: get the necessary information from a file. The file looks almost the same as an API response in XML format that would contain the information. There is a sample of the file available, which will most likely work for you out of the box.

You don't have to generate a separate assembly for each wiki if the methods you want to use look the same on all of them. In that case, don't forget to specify which wiki you want to use in the constructor of the Wiki class.

If you want to access multiple wikis with different configuration assemblies from one program, you can, if you generate each of them into a different namespace (the default namespace is LinqToWiki.Generated).

If you want to do something more complicated regarding generating the configuration assemblies (for example, create a bunch of C# files that you can modify by hand and then compile into a configuration assembly), you can use the LinqToWiki.Codegen library directly from your own application.

Developer documentation

If you want to modify this code (patches are welcome) or just have a look at the implementation, here is a short overview of the projects (more details are in the project directories):

  • LinqToWiki.Core – The core of the library. This project is referenced by all other projects and contains types necessary for accessing the API, processing LINQ expressions, etc.
  • LinqToWiki.Codegen – Handles generating code using Roslyn.
  • LinqToWiki.Codegen.App – The linqtowiki-codegen command-line application, see above.
  • LinqToWiki.Samples – Samples of code that uses this library.
  • LinqToWiki.ManuallyGenerated – A manually written configuration assembly. You could use this as a template for your own configuration assembly, but otherwise it's mostly useless.

linq-to-wiki's People

Contributors

svick

linq-to-wiki's Issues

case sensitive

The code below doesn't work if the title is compared against a lowercase string; it works only if the match is exact. Using an Equals method with ignore-case didn't work either.

where cm.title.ToLower() == "Category:Mammals of Indonesia".ToLower() throws an error.

var pages = (from cm in wiki.Query.categorymembers()
             where cm.title == "Category:Mammals of Indonesia"
             orderby cm descending
             select cm.title)
            .ToEnumerable();

Add support for HTTP authentication

Our wiki uses HTTP Basic Authentication, so it would be useful if the Wiki constructor could take a HttpBasicAuthenticator, or a username/password pair, as an optional argument.

Getting an edit token

This:
var token = wiki.tokens(new[] { tokenstype.edit }).edittoken;

throws: no items in sequence.

it creates this:
http://test1wiki.rauland-borg.net/mediawiki/api.php?action=tokens&type=edit

and that inputted into the browser gives me this:
"warnings": { "tokens": { "*": "action=tokens has been deprecated. Please use action=query&meta=tokens instead." } }, "tokens": { "edittoken": "ded92de33eec3c084822a721722057ce56a187ff+\\" } }

What "should" I be doing?

Not apparent if API is not installed, or a different error occurred

An XmlException is thrown when creating the Wiki object.
{"'>' is an unexpected token. The expected token is '"' or '''. Line 1, position 50."}

It's not readily apparent what sort of error this is. I suspect it is a case where API.php is not installed.

Examples do not work

I need to parse the content of Wikipedia pages and read lists from these pages. The examples that come close to parsing pages do not build, like the one on the home page which uses revisions().

Extracts support

Hi svick,

I saw a Stack Overflow question you answered back in June 2013 about grabbing the first paragraph of a wiki article (http://stackoverflow.com/a/17055961/1795862). You noted that at that time Linq-to-Wiki didn't support the extracts functionality of the API, and I was wondering if this is still the case in newer versions of the library.

If this is still the case, how would you suggest I go about grabbing a plain text representation of the "introductory paragraph" of an article (i.e. the paragraph before the table of contents on each page).

Thanks,

Adam

Error while downloading

I had an exception while running my code. Here is my stack trace:
Root element is missing.
StackTrace: at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Parse(String text, LoadOptions options)
at System.Xml.Linq.XDocument.Parse(String text)
at LinqToWiki.Download.Downloader.Download(IEnumerable`1 parameters) in c:\code\LinqToWiki\LinqToWiki.Core\Download\Downloader.cs:line 74
at LinqToWiki.Internals.QueryProcessor.Download(WikiInfo wiki, IEnumerable`1 processedParameters, IEnumerable`1 queryContinues) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\QueryProcessor.cs:line 162
at LinqToWiki.Internals.QueryProcessor.Download(WikiInfo wiki, IEnumerable`1 processedParameters, HttpQueryParameter queryContinue) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\QueryProcessor.cs:line 144
at LinqToWiki.Internals.QueryPageProcessor.d__91.MoveNext() in c:\code\LinqToWiki\LinqToWiki.Core\Internals\QueryPageProcessor.cs:line 44
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
at WikiOnWheels.Program.GetCurrentRevIdFromPageID(Int64 pid, String title) in Program.cs:line 218
at WikiOnWheels.Program.d__8.MoveNext() in Program.cs:line 191
Where is this problem? Is it something I can fix?

.NET Core

Is there interest in porting this project to .NET Core? I had a quick look and it seems it will require switching from RestSharp to an alternative that has been ported to .NET Core, like http://tmenier.github.io/Flurl/

Return totalhits of a query

var tothits = wiki.Query.search(title).ToEnumerable().Count();

How can I optimize this statement?
It's really slow for many results.
Thanks

Can't connect to private wiki

I have a private wiki where the users must log in to access the content.
When I try to do

var wiki=new Wiki("foo","http://mywiki.com");

I get the following exception:

LinqToWiki.ApiErrorException was unhandled
  Code=readapidenied
  HResult=-2146233088
  Message=You need read permission to use this module
  Source=LinqToWiki.Core
  StackTrace:
       at LinqToWiki.Internals.QueryProcessor.Download(WikiInfo wiki, IEnumerable`1 processedParameters, IEnumerable`1 queryContinues) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\QueryProcessor.cs:line 179
       at LinqToWiki.Internals.QueryProcessor.Download(WikiInfo wiki, IEnumerable`1 processedParameters, HttpQueryParameter queryContinue) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\QueryProcessor.cs:line 144
       at LinqToWiki.Internals.QueryProcessor`1.Download(IEnumerable`1 processedParameters, HttpQueryParameter queryContinue) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\QueryProcessor.cs:line 103
       at LinqToWiki.Internals.QueryProcessor`1.ExecuteSingle[TResult](QueryParameters`2 parameters) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\QueryProcessor.cs:line 76
       at LinqToWiki.Internals.NamespaceInfo.GetNamespaces(WikiInfo wiki) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\NamespaceInfo.cs:line 46
       at LinqToWiki.Internals.NamespaceInfo..ctor(WikiInfo wiki) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\NamespaceInfo.cs:line 29
       at LinqToWiki.Internals.WikiInfo..ctor(String userAgent, String baseUrl, String apiPath, IEnumerable`1 namespaces) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\WikiInfo.cs:line 70
       at LinqToWiki.Generated.Wiki..ctor(String userAgent, String baseUri, String apiPath) in c:\Users\Svick\AppData\Local\Temp\LinqToWiki\Wiki.cs:line 32

I also get this exception if I try to use linqtowiki-codegen.

There must be a way to give a username and password in the constructor, or it must not try to access the wiki before login if it does not have access.

XmlException on var w = new Wiki(...)

Hello,

I get an exception when calling the constructor of the Wiki class on Wikipedia. The code worked fine a couple of months ago and hasn't been changed since. Can you give me a hint about what's going on? Thank you very much for your great work!

Marc

var wiki = new Wiki("...", "en.wikipedia.org");

Throws an XmlException:

"'en' is an unexpected token. The expected token is '"' or '''. Line 2, position 12."

at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseAttributes()
at System.Xml.XmlTextReaderImpl.ParseElement()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Parse(String text, LoadOptions options)
at LinqToWiki.Download.Downloader.Download(IEnumerable`1 parameters) in c:\code\LinqToWiki\LinqToWiki.Core\Download\Downloader.cs:line 74
at LinqToWiki.Internals.QueryProcessor.Download(WikiInfo wiki, IEnumerable`1 processedParameters, IEnumerable`1 queryContinues) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\QueryProcessor.cs:line 156
at LinqToWiki.Internals.QueryProcessor`1.ExecuteSingle[TResult](QueryParameters`2 parameters) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\QueryProcessor.cs:line 76
at LinqToWiki.Internals.NamespaceInfo.GetNamespaces(WikiInfo wiki) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\NamespaceInfo.cs:line 38
at LinqToWiki.Internals.WikiInfo..ctor(String userAgent, String baseUrl, String apiPath, IEnumerable`1 namespaces) in c:\code\LinqToWiki\LinqToWiki.Core\Internals\WikiInfo.cs:line 70
at LinqToWiki.Generated.Wiki..ctor(String userAgent, String baseUri, String apiPath) in c:\Users\Svick\AppData\Local\Temp\LinqToWiki\Wiki.cs:line 32

Can you make LINQ-To-Wiki UWP compatible?

Hello,

When I try to add the library from NuGet, I get a bunch of compatibility errors:

LinqToWiki 1.5.0 is not compatible with UAP,Version=v10.0.
LinqToWiki.Core 1.3.0 is not compatible with UAP,Version=v10.0.
RestSharp 105.1.0 is not compatible with UAP,Version=v10.0.
Some packages are not compatible with UAP,Version=v10.0.
...

Can you make LINQ-To-Wiki UWP compatible?
I don't know if there is a RestSharp port to UWP.
I may check later.

Thank you for your work!
