
nathaniel-daniel commented on September 27, 2024

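@Michael-Purtill ElementRef::wrap only succeeds if the node is an element; for any other kind of node it returns None. However, next_sibling returns the next node of any kind, which can be a text node (for example, the whitespace between two tags). To get the next element, call next_siblings, which returns an iterator, and take the first node that is an element.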

Here's a version that works:

use scraper::Html;
use scraper::Selector;

fn main() {
    let response = ureq::get("https://en.wiktionary.org/wiki/pes#Czech")
        .call()
        .expect("invalid request");
    let response_text = response.into_string().expect("invalid response");

    let doc = Html::parse_document(&response_text);

    // "#Czech" matches the span inside the heading, so step up to its parent <h2>.
    let h2_selector = Selector::parse("#Czech").expect("invalid selector");
    let h2 = doc
        .select(&h2_selector)
        .next()
        .expect("missing h2")
        .parent()
        .expect("missing parent");

    println!("{}\n", scraper::ElementRef::wrap(h2).expect("h2 is not an element").html());

    // Walk the following siblings and skip non-element nodes such as text.
    let element = h2
        .next_siblings()
        .find(|node| node.value().is_element())
        .expect("missing next_sibling");

    println!("{}", scraper::ElementRef::wrap(element).expect("element is not an element").html());
}

yields:

<h2><span id="Czech" class="mw-headline">Czech</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a title="Edit section: Czech" href="/w/index.php?title=pes&amp;action=edit&amp;section=16">edit</a><span class="mw-editsection-bracket">]</span></span></h2>

<div class="sister-wikipedia sister-project noprint floatright" style="border: 1px solid #aaa; font-size: 90%; background: #f9f9f9; width: 250px; padding: 4px; text-align: left;"><div style="float: left;"><div class="floatnone"><img alt="" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/6/63/Wikipedia-logo.png/66px-Wikipedia-logo.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/6/63/Wikipedia-logo.png/88px-Wikipedia-logo.png 2x" data-file-width="200" height="44" decoding="async" width="44" src="//upload.wikimedia.org/wikipedia/commons/thumb/6/63/Wikipedia-logo.png/44px-Wikipedia-logo.png" data-file-height="200"></div></div><div style="margin-left: 60px;">Czech <a title="Wikipedia" href="/wiki/Wikipedia">Wikipedia</a> has an article on:<div style="margin-left: 10px;"><b lang="cs" class="Latn"><a class="extiw" title="w:cs:pes" href="https://en.wikipedia.org/wiki/cs:pes">pes</a></b></div></div><span class="interProject"><a href="https://en.wikipedia.org/wiki/cs:pes" class="extiw" title="w:cs:pes">Wikipedia <sup>cs</sup></a></span></div>
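
To see why the filter is needed, here is a minimal sketch that uses only the standard library. The Node enum, next_element_sibling, and sample_children are hypothetical stand-ins for illustration, not scraper's real types; the point is only that whitespace between two tags shows up as a text node sitting between the elements.

```rust
// Hypothetical stand-in for a parsed DOM node; not scraper's real types.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Element(String), // tag name only, for illustration
}

// Return the first element among the siblings that follow `index`,
// skipping text nodes such as the whitespace between tags.
fn next_element_sibling(siblings: &[Node], index: usize) -> Option<&Node> {
    siblings
        .iter()
        .skip(index + 1)
        .find(|node| matches!(node, Node::Element(_)))
}

// Children of a parent as a parser might produce them for
// "<h2>..</h2>\n<div>..</div>": the newline becomes a text node.
fn sample_children() -> Vec<Node> {
    vec![
        Node::Element("h2".to_string()),
        Node::Text("\n".to_string()),
        Node::Element("div".to_string()),
    ]
}
```

In this model the immediate sibling of the h2 is the newline text node, and only the filtered search reaches the div. With scraper itself, the find-then-wrap pair in the example above can likely be collapsed into h2.next_siblings().find_map(scraper::ElementRef::wrap), since wrap already returns None for non-element nodes.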
