
ispras / web-scraper-chrome-extension


Web data extraction tool implemented as chrome extension

License: GNU Lesser General Public License v3.0

Languages: JavaScript 79.82%, CSS 6.82%, HTML 13.36%
Topics: webscraping, scraping, scraping-tool, javascript


Web Scraper

Web Scraper is a Chrome browser extension for extracting data from web pages. With it you create a plan (a sitemap) that describes how a website should be traversed and what data should be extracted. Web Scraper then navigates the site according to the sitemap and extracts the data. Scraped data can later be exported as CSV or JSON Lines.
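To make the sitemap idea concrete, here is a minimal JavaScript sketch of a sitemap as a tree of selectors, each naming its parent selector(s), plus a depth-first walk over that tree. The field names (`startUrl`, `selectors`, `parentSelectors`, `_root`) and the selector type names are assumptions loosely modelled on the upstream Web Scraper format, not a confirmed description of this extension's schema; `childSelectors` and `walk` are hypothetical helpers.

```javascript
// Hypothetical sitemap shape; field and type names are illustrative assumptions.
const sitemap = {
  _id: "example-shop",
  startUrl: ["https://example.com/products"],
  selectors: [
    { id: "product", type: "SelectorElement", parentSelectors: ["_root"], selector: "div.product", multiple: true },
    { id: "name", type: "SelectorText", parentSelectors: ["product"], selector: "h2", multiple: false },
    { id: "price", type: "SelectorText", parentSelectors: ["product"], selector: ".price", multiple: false },
    // A pagination link lists itself as a parent so it is followed on every page.
    { id: "next-page", type: "SelectorLink", parentSelectors: ["_root", "next-page"], selector: "a.next", multiple: false },
  ],
};

// Return the selectors whose parent list contains parentId.
function childSelectors(map, parentId) {
  return map.selectors.filter((s) => s.parentSelectors.includes(parentId));
}

// Depth-first walk of the selector tree starting at "_root".
// Each selector is listed once, which also stops self-referencing
// selectors (such as pagination) from recursing forever.
function walk(map, parentId = "_root", depth = 0, seen = new Set(), out = []) {
  for (const s of childSelectors(map, parentId)) {
    if (seen.has(s.id)) continue;
    seen.add(s.id);
    out.push("  ".repeat(depth) + s.id + " (" + s.type + ")");
    walk(map, s.id, depth + 1, seen, out);
  }
  return out;
}

console.log(walk(sitemap).join("\n"));
```

Keeping the sitemap as plain data separates "what to extract" from the code that performs navigation, which is also what makes sitemaps easy to export and attach to bug reports.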

Latest Version

Read about the installation process on the installation page.

Changelog

v0.3.6

  • Updated Table support (improved vertical table support and added complex headers and data rows)
  • Added sitemap export to and import from a file
  • Added Russian translation and i18n support, making it possible to add a translation for any language
  • Added REST API CRUD storage for sitemaps
  • Moved to the webpack bundler
  • Added ID hints from a predefined model
  • Added selectors for Constants and Documents
  • Refactored the data preview and added search in scraped data
  • Refactored the returned items model to JSON
  • Added saving in JSON Lines format
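JSON Lines stores one JSON object per line. A minimal sketch of writing and reading that format, assuming each scraped item is a plain object; `toJsonLines` and `fromJsonLines` are hypothetical helpers, not the extension's own export code.

```javascript
// Serialize scraped records as JSON Lines: one JSON object per line,
// newline-terminated. JSON.stringify (without indentation) escapes
// newlines inside strings, so each record stays on a single line.
function toJsonLines(records) {
  return records.map((r) => JSON.stringify(r) + "\n").join("");
}

// Parse JSON Lines back into records, skipping blank lines.
function fromJsonLines(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

const rows = [
  { name: "Widget", price: "9.99" },
  { name: "Gadget\nPro", price: "19.99" }, // embedded newline is escaped as \n
];
const jsonl = toJsonLines(rows);
console.log(jsonl);
```

Unlike a single JSON array, a JSON Lines file can be appended to and read record by record, which suits scrapes that produce data incrementally.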

v0.3

  • Enabled pasting of multiple start URLs (by @jwillmer)
  • Added scraping of dynamic table columns (by @jwillmer)
  • Added style extraction type (by @jwillmer)
  • Added text manipulation (trim, replace, prefix, suffix, remove HTML) (by @jwillmer)
  • Added image improvements to find images in div background (by @jwillmer)
  • Added support for vertical tables (by @jwillmer)
  • Added random delay function between requests (by @Euphorbium)
  • Start URL can now also be a local URL (by @3flex)
  • Added CSV export options (by @mohamnag)
  • Added Regex group for select (by @RuneHL)
  • JSON export/import of settings (by @haisi)
  • Added date and number patterns in URLs (by @codoff)
  • Added pagination selector limit (by @codoff)
  • Improved CSV export (by @haisi)
  • Added click limit option (by @panna-ahmed)
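The text manipulation options listed above (trim, replace, prefix, suffix, remove HTML) can be thought of as a small pipeline applied to each extracted string. The sketch below uses hypothetical option names and assumes a fixed order (remove HTML, trim, replace, prefix, suffix); the extension's actual option names and order may differ.

```javascript
// Hypothetical text-manipulation pipeline; option names and the order of
// application are assumptions made for this sketch.
function manipulateText(text, opts = {}) {
  let out = text;
  if (opts.removeHtml) {
    // Naive tag stripping: fine for simple markup, not a full HTML parser.
    out = out.replace(/<[^>]*>/g, "");
  }
  if (opts.trim) {
    out = out.trim();
  }
  if (opts.replace) {
    // Replace every match of the pattern (the global flag is added if missing).
    const { pattern, replacement } = opts.replace;
    const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
    out = out.replace(new RegExp(pattern.source, flags), replacement);
  }
  if (opts.prefix) out = opts.prefix + out;
  if (opts.suffix) out = out + opts.suffix;
  return out;
}

const raw = "  <b>Price:</b> 1,299 USD  ";
console.log(
  manipulateText(raw, {
    removeHtml: true,
    trim: true,
    replace: { pattern: /[^0-9]/, replacement: "" },
    prefix: "$",
  })
);
```

Applying HTML removal before trimming matters here: stripping tags can expose new leading or trailing whitespace.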

v0.2

  • Added Element click selector
  • Added Element scroll down selector
  • Added Link popup selector
  • Improved Table selector to work with any HTML markup
  • Added Image download
  • Added keyboard shortcuts when selecting elements
  • Added configurable delay before using a selector
  • Added configurable delay between page visits
  • Added configuration of multiple start URLs
  • Added form field validation
  • Fixed a lot of bugs
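The configurable delay between page visits (together with the random delay added in v0.3) amounts to sleeping between sequential requests. A minimal sketch, assuming a fixed base delay plus a uniformly random extra; `nextDelayMs`, `crawl`, and the `visit` callback are hypothetical names, and `visit` merely stands in for a real page load.

```javascript
// Wait before the next request: a fixed base delay plus an optional random
// extra in [0, randomMaxMs). `rng` is injectable so the result can be tested.
function nextDelayMs(baseMs, randomMaxMs = 0, rng = Math.random) {
  return baseMs + Math.floor(rng() * randomMaxMs);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit URLs one at a time, pausing between requests (not before the first).
async function crawl(urls, visit, { baseMs = 2000, randomMaxMs = 0 } = {}) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(nextDelayMs(baseMs, randomMaxMs));
    results.push(await visit(urls[i]));
  }
  return results;
}

crawl(["https://example.com/a", "https://example.com/b"], async (u) => u.length, {
  baseMs: 10,
  randomMaxMs: 20,
}).then((r) => console.log(r));
```

A random component makes the request rhythm less regular than a fixed interval, while the base delay keeps a guaranteed minimum gap between page loads.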

v0.1.3

  • Added Table selector
  • Added HTML selector
  • Added HTML attribute selector
  • Added data preview
  • Added ranged start URLs
  • Fixed a bug that prevented the selector tree from showing on some operating systems
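Ranged start URLs let one sitemap cover many numbered pages. The sketch below assumes the bracket syntax of the upstream Web Scraper (`[1-3]`, zero-padded `[001-003]`, and stepped `[0-20:10]`); whether this fork accepts exactly these forms is an assumption, and `expandStartUrl` is a hypothetical helper that handles only one range per URL.

```javascript
// Expand one bracketed numeric range in a start URL. The syntax handled
// here ([from-to], zero-padded bounds, optional :step) is an assumption.
function expandStartUrl(url) {
  const m = url.match(/\[(\d+)-(\d+)(?::(\d+))?\]/);
  if (!m) return [url];
  const [token, fromStr, toStr, stepStr] = m;
  const from = parseInt(fromStr, 10);
  const to = parseInt(toStr, 10);
  const step = stepStr ? parseInt(stepStr, 10) : 1;
  if (step <= 0) throw new Error("step must be positive");
  // Pad only when the lower bound is written with leading zeros.
  const width = fromStr.length > 1 && fromStr.startsWith("0") ? fromStr.length : 0;
  const urls = [];
  for (let n = from; n <= to; n += step) {
    const num = width ? String(n).padStart(width, "0") : String(n);
    urls.push(url.replace(token, num));
  }
  return urls;
}

console.log(expandStartUrl("https://example.com/page/[1-3]"));
console.log(expandStartUrl("https://example.com/item/[001-003]"));
console.log(expandStartUrl("https://example.com/offset/[0-20:10]"));
```

Expanding ranges up front yields a plain list of start URLs, so the rest of the scraper only ever deals with concrete addresses.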

Bugs

When submitting a bug please attach an exported sitemap if possible.

Development

Read the Development Instructions before you start.

License

LGPLv3
