webbotparseR

webbotparseR parses search engine results that were scraped with the WebBot browser extension. A similar Python library is also available.

Installation

You can install the development version of webbotparseR like so:

# install.packages("remotes")  # only needed if remotes is not installed yet
remotes::install_github("schochastics/webbotparseR")

The package ships with an example HTML file from a Google text search for "climate change".

library(webbotparseR)
ex_file <- system.file("www.google.com_climatechange_text_2023-03-16_08_16_11.html", package = "webbotparseR")

Such search results can be parsed with parse_search_results(). The engine parameter specifies both the search engine and the search type, e.g. "google text".

output <- parse_search_results(path = ex_file, engine = "google text")
output
#> # A tibble: 10 × 10
#>    title link  text  image page  posit…¹ searc…² type  query date               
#>    <chr> <chr> <chr> <chr> <chr>   <int> <chr>   <chr> <chr> <dttm>             
#>  1 What… http… Clim… data… 1           1 www.go… text  clim… 2023-03-16 08:16:11
#>  2 Home… http… Vita… data… 1           2 www.go… text  clim… 2023-03-16 08:16:11
#>  3 Vita… http… “Cli… data… 1           3 www.go… text  clim… 2023-03-16 08:16:11
#>  4 Clim… http… In c… data… 1           4 www.go… text  clim… 2023-03-16 08:16:11
#>  5 IPCC… http… The … data… 1           5 www.go… text  clim… 2023-03-16 08:16:11
#>  6 Clim… http… Comp… data… 1           6 www.go… text  clim… 2023-03-16 08:16:11
#>  7 Clim… http… Clim… <NA>  1           7 www.go… text  clim… 2023-03-16 08:16:11
#>  8 UNFC… http… What… data… 1           8 www.go… text  clim… 2023-03-16 08:16:11
#>  9 Clim… http… Clim… data… 1           9 www.go… text  clim… 2023-03-16 08:16:11
#> 10 Caus… http… This… data… 1          10 www.go… text  clim… 2023-03-16 08:16:11
#> # … with abbreviated variable names ¹​position, ²​search_engine

Note that images are always returned base64-encoded, as data URIs.

output$image[1]
#> [1] "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAAABnRSTlMAAAAAAABupgeRAAAAMklEQVR4AWMAgYYG4hEdNJAHGoCIABvBJayhgcYaIAwaakCwydUA52MKYeeSCgZh4gMAXrJ9ASggqqAAAAAASUVORK5CYII="

The function base64_to_img() can be used to decode the image and save it in an appropriate format.
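The README does not document the arguments of base64_to_img(), so the following is only a base-R sketch of what decoding such a value involves, not the package's implementation. It assumes the standard `data:<mime>;base64,<payload>` layout visible in `output$image[1]` above; `decode_data_uri()` and `base64_decode()` are hypothetical helpers written for illustration.

```r
# Hypothetical helpers (NOT part of webbotparseR): decode a base64 data URI
# such as output$image[1] into raw bytes using only base R.

# Decode a base64 string into a raw vector.
base64_decode <- function(s) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  # Drop "=" padding and anything outside the base64 alphabet
  chars <- strsplit(gsub("[^A-Za-z0-9+/]", "", s), "")[[1]]
  vals <- match(chars, alphabet) - 1L  # 6-bit value of each character
  # Expand each value into 6 bits, most significant bit first
  bits <- as.integer(vapply(vals, function(v) (v %/% 2^(5:0)) %% 2, numeric(6)))
  # Keep only whole bytes; leftover bits are padding
  n_bytes <- length(bits) %/% 8
  bit_mat <- matrix(bits[seq_len(n_bytes * 8)], nrow = 8)
  as.raw(colSums(bit_mat * 2^(7:0)))
}

# Split "data:<mime>;base64,<payload>" into MIME type and decoded bytes.
decode_data_uri <- function(uri) {
  parts <- regmatches(uri, regexec("^data:([^;,]+);base64,(.*)$", uri))[[1]]
  if (length(parts) != 3) stop("not a base64 data URI")
  list(mime = parts[2], bytes = base64_decode(parts[3]))
}

# The first image from the example output above
uri <- "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAAABnRSTlMAAAAAAABupgeRAAAAMklEQVR4AWMAgYYG4hEdNJAHGoCIABvBJayhgcYaIAwaakCwydUA52MKYeeSCgZh4gMAXrJ9ASggqqAAAAAASUVORK5CYII="
img <- decode_data_uri(uri)
img$mime
#> [1] "image/png"

# Every PNG file starts with the same fixed 8-byte signature
png_sig <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))
identical(img$bytes[1:8], png_sig)
#> [1] TRUE

# Write the bytes to disk; the MIME type suggests the file extension
out <- tempfile(fileext = ".png")
writeBin(img$bytes, out)
```

The MIME type is what lets a decoder pick "an appropriate format" for the output file. For real use, base64_to_img() or an established decoder such as base64enc::base64decode() is the better choice; the hand-rolled version only makes the bit-level step visible.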

Contributors

schochastics, chainsawriot, wanlo