
mediawiki_rust's People

Contributors

1-byte, enterprisey, erutuon, fenhl, legoktm, magnusmanske, moxian, oylenshpeegul, qedk, siddharthvp, v-gar, waldyrious


mediawiki_rust's Issues

Purpose of src/bin/main.rs?

What's the purpose of src/bin/main.rs? It looks like a script, but I'm not sure whether it's just for testing or whether it should be updated and maintained as part of this crate.

Unresolved import futures while compiling

Hi,

I have been using mediawiki for a small scraping tool, but an error recently came up inside the lib (it was working fine before).
I cloned the repo to see whether it was just my setup, but I get the same issue:

Compiling mediawiki v0.2.7 (E:\Projects\mediawiki_rust)
error[E0432]: unresolved import `futures`
  --> src\api.rs:25:5
   |
25 | use futures::{Stream, StreamExt};
   |     ^^^^^^^ use of undeclared crate or module `futures`

error[E0433]: failed to resolve: use of undeclared crate or module `futures`
   --> src\api.rs:364:9
    |
364 |         futures::stream::unfold(initial_query_state, |mut query_state| async move {
    |         ^^^^^^^ use of undeclared crate or module `futures`

error: aborting due to 2 previous errors

Some errors have detailed explanations: E0432, E0433.
For more information about an error, try `rustc --explain E0432`.
error: could not compile `mediawiki`

What I did was:

  • git clone
  • cargo run --package mediawiki --bin main (through rust-analyzer on visual studio code)
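
The error above says `futures` is referenced in src/api.rs but never declared as a dependency. A likely fix (this is an assumption about the intended crate and version, not a confirmed patch) is declaring it in Cargo.toml:

```toml
# Hypothetical fix: declare the missing crate under [dependencies].
# The "0.3" version is an assumption; it should match whatever the
# crate's other async dependencies (e.g. tokio/reqwest) expect.
[dependencies]
futures = "0.3"
```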

Future direction?

Hi Magnus,

First, thanks for working on this crate - it is a great foundation and all of my tools so far use it.

I started writing some more advanced bots last week using this crate as the base, and felt like I was missing things I've come to expect after using Pywikibot for so many years. IMO, right now the crate takes care of the basics: login, token handling, and simple objects for titles; getting page text and other properties like links, external links, and coordinates; editing; etc. But there are no high-level error types, no credential storage/handling, no automatic retries, no logging, and no other actions such as page moving, deletion, and protection. And then there's functionality like {{nobots}} handling, which is relevant solely to bots and not to other MW API consumers.

What do you see the scope of this crate as being? Are more bot-like functions and high-level types welcome contributions? Or would you see those go in a higher level "wikibot" crate that builds on top of this one? Something in the middle?

I don't want to step on any toes nor duplicate any work, but this is something I'd like to work on to make adopting Rust even easier.

Thanks!

P.S. I (along with @enterprisey) am starting a Wikimedia Rust developers user group, which you are definitely invited to join, that we hope will work on issues like this.

Structs or enums for API responses

Using serde_json::Value to represent API responses is pretty laborious. It requires lots of .as_object() or .as_str() calls followed by checking that the result is Some(_), or Option::map, or matching on variants of serde_json::Value, etc.

I propose creating custom structs or enums to represent responses. This makes accessing fields in the JSON as simple as accessing fields in the struct or enum. For instance, this example shows a struct that could be used in the Page::text method (and falls back to a serde_json::Value if the JSON fails to deserialize as RevisionsResponse, though that fallback may not be necessary):

use serde::Deserialize;
use serde_json::Value as JsonValue;
use std::collections::HashMap;
use url::Url;

#[derive(Debug, Deserialize)]
#[serde(untagged)]
enum FallibleDeserialization<T> {
    Success(T),
    Failure(JsonValue)
}

#[derive(Debug, Deserialize)]
#[allow(unused)]
struct RevisionsResponse {
    batchcomplete: bool,
    query: PagesQuery,
}

#[derive(Debug, Deserialize)]
struct PagesQuery {
    pages: Vec<Page>,
}

#[derive(Debug, Deserialize)]
struct Page {
    #[serde(rename = "pageid")]
    id: u32,
    #[serde(rename = "ns")]
    ns: i32,
    title: String,
    revisions: Vec<Revision>,
}

#[derive(Debug, Deserialize)]
struct Revision {
    slots: HashMap<String, RevisionSlot>,
}

#[derive(Debug, Deserialize)]
struct RevisionSlot {
    #[serde(rename = "contentmodel")]
    content_model: String,
    #[serde(rename = "contentformat")]
    content_format: String,
    content: String,
}

#[tokio::main]
async fn main() {
    let mut url: Url = Url::parse("https://en.wiktionary.org/w/api.php").unwrap();
    url.set_query(Some(&serde_urlencoded::to_string(&[
        ("action", "query"),
        ("prop", "revisions"),
        ("titles", "Template:link"),
        ("rvslots", "*"),
        ("rvprop", "content"),
        ("formatversion", "2"),
        ("format", "json"),
    ]).unwrap()));
    let response: FallibleDeserialization<RevisionsResponse> = reqwest::get(url).await.unwrap().json().await.unwrap();
    if let FallibleDeserialization::Success(response) = response {
        for Page { revisions, .. } in response.query.pages {
            for Revision { slots } in revisions {
                let slot = slots.get("main").or_else(|| slots.iter().next().map(|(_, slot)| slot));
                dbg!(slot);
            }
        }
    }
}

Dependencies in Cargo.toml:

reqwest = { version = "0.10", features = ["json"]}
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_urlencoded = "0.6"
tokio = { version = "0.2", features = ["rt-core", "macros"] }
url = "2.1"

To make this possible, the Api methods that currently decode the response as serde_json::Value (ultimately via Api::query_api_json) would need to be generic, so that they could instead deserialize into a more specific struct or enum (something like, in the example above, FallibleDeserialization<RevisionsResponse>).

And Api::get_query_api_json_limit would probably need some way to perform the function of Api::json_merge generically, for the various structs that it would return in place of serde_json::Value. For instance, it could be generic over a trait that has a merge method (maybe named MergeableResponse).
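
A minimal sketch of what that trait and a generic batch-collecting loop could look like. All names here (MergeableResponse, PagesResponse, collect_batches) are assumptions for illustration, not part of the crate today; the real version would merge typed continuation batches instead of serde_json::Value trees.

```rust
// Sketch of the proposed trait: each typed response knows how to merge
// a continuation batch into itself, replacing serde_json-based json_merge.
trait MergeableResponse {
    fn merge(&mut self, other: Self);
}

// Toy typed response standing in for something like RevisionsResponse.
#[derive(Debug)]
struct PagesResponse {
    pages: Vec<String>,
}

impl MergeableResponse for PagesResponse {
    fn merge(&mut self, other: Self) {
        // For list-like queries, merging means appending the new batch.
        self.pages.extend(other.pages);
    }
}

// A get_query_api_json_limit-style loop could then be generic over T.
fn collect_batches<T: MergeableResponse>(mut batches: impl Iterator<Item = T>) -> Option<T> {
    let mut acc = batches.next()?;
    for batch in batches {
        acc.merge(batch);
    }
    Some(acc)
}

fn main() {
    let batches = vec![
        PagesResponse { pages: vec!["A".into(), "B".into()] },
        PagesResponse { pages: vec!["C".into()] },
    ];
    let merged = collect_batches(batches.into_iter()).unwrap();
    assert_eq!(merged.pages, ["A", "B", "C"]);
    println!("{:?}", merged);
}
```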

Difficulties: Using the Deserialize derive macro will add to compile time. Also, it may require trial and error to figure out what the schema of the API responses actually is.

Page.edit_text() should have edit conflict and integrity protection

Integrity protection is as simple as filling in the md5 parameter with a hash of the text value.

For edit conflict protection, we need to pass the revision id and timestamp of the revision we obtained from .text(). My suggestion is to have Page keep track of the text/revision info (lazy loading it) so if text() was called, then edit_text() can pass it back for conflict detection. Lazy-loading text info would also unlock preloading from generators in the future, but that's another issue...
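
A sketch of the parameters edit_text() could send, assuming Page has cached the base revision info from a prior text() call. The parameter names (md5, baserevid, basetimestamp) are the MediaWiki action=edit parameters; the md5_hex helper is a stub here — a real implementation would hash the wikitext with an MD5 crate (e.g. md-5) and hex-encode the digest.

```rust
use std::collections::HashMap;

// Placeholder only: stubbed so this sketch stays dependency-free.
// A real implementation would compute the hex-encoded MD5 of `_text`.
fn md5_hex(_text: &str) -> String {
    "<md5-of-text>".to_string()
}

// Hypothetical helper building the action=edit parameter map.
// `base_revid` and `base_timestamp` would come from the revision
// that Page::text() loaded (lazy-loaded and cached on the Page).
fn edit_params(
    title: &str,
    text: &str,
    base_revid: u64,
    base_timestamp: &str,
) -> HashMap<String, String> {
    let mut params = HashMap::new();
    params.insert("action".into(), "edit".into());
    params.insert("title".into(), title.into());
    params.insert("text".into(), text.into());
    // Integrity protection: the server rejects the edit if the text
    // no longer matches this hash (e.g. corrupted in transit).
    params.insert("md5".into(), md5_hex(text));
    // Edit-conflict protection: the server detects intervening edits
    // made after the revision we read.
    params.insert("baserevid".into(), base_revid.to_string());
    params.insert("basetimestamp".into(), base_timestamp.into());
    params
}

fn main() {
    let p = edit_params("Sandbox", "Hello", 12345, "2020-01-01T00:00:00Z");
    println!("{:?}", p);
}
```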

Improve error handling

We currently have functions that return a Result that has an error of:

  • Box<dyn Error> (problematic because it's not thread-safe)
  • PageError
  • String
  • MediaWikiError
  • &str

The inconsistencies make it difficult to use ?, because you usually have to map the error to something else first. My proposal is to have one single mediawiki::Error type that all functions use. This type would have From implementations for reqwest::Error, serde_json::Error, and so on, plus variants like MissingPage and a generic UnknownAPIError. The thiserror crate should make writing it straightforward.

Clients then only need to implement From for one type, and only need to import one error type.
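
The proposal could look roughly like the following stdlib-only sketch. The variant names are assumptions; the real version would wrap reqwest::Error and serde_json::Error (std::io::Error stands in here so the example is self-contained), and thiserror's derive would generate the Display/Error boilerplate written out by hand below.

```rust
use std::fmt;

// Sketch of a unified mediawiki::Error. Variant names are hypothetical.
#[derive(Debug)]
enum Error {
    // A requested page does not exist.
    MissingPage(String),
    // Stand-in for the reqwest::Error / serde_json::Error variants.
    Io(std::io::Error),
    // Catch-all for API errors not modeled yet.
    UnknownApiError(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::MissingPage(title) => write!(f, "missing page: {}", title),
            Error::Io(e) => write!(f, "I/O error: {}", e),
            Error::UnknownApiError(msg) => write!(f, "unknown API error: {}", msg),
        }
    }
}

impl std::error::Error for Error {}

// From impls are what make `?` work uniformly across the crate:
// any function returning Result<_, Error> can use `?` on io results.
impl From<std::io::Error> for Error {
    fn from(e: std::io::Error) -> Self {
        Error::Io(e)
    }
}

fn main() {
    let e = Error::MissingPage("Foo".into());
    println!("{}", e);
}
```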

If that sounds good, then I can rework my existing PR into this direction. Or we can keep discussing :)

Can't build time dependency

  • rustc 1.43.1 (8d69840ab 2020-05-04)
  • cargo 1.43.0 (2cbe9048e 2020-05-03)

My code is more or less the sample code in the README, with the query parameters tweaked:

$ cargo build
   Compiling time v0.2.8
   Compiling user_agent v0.9.0
error: expected an item keyword
   --> /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/time-0.2.8/src/utc_offset.rs:366:13
    |
366 |             let tm = timestamp_to_tm(datetime.timestamp())?;
    |             ^^^

error: aborting due to previous error

error: could not compile `time`.

Apparently this is because of time-rs/time#233. If there's a way to work around it, that would be nice, but I understand if it's just a matter of waiting for other packages to update. I'm also a Rust noob, so it's totally possible I'm doing something wrong; help appreciated.

Offer async API

Hi,
have you thought about providing an async API? Since you are using reqwest, it should be easily possible by using reqwest::async::Client and returning futures instead of results.
