octoparse's Introduction

Octoparse is a free, client-side web scraping application for Windows that turns unstructured or semi-structured data from websites into structured datasets without coding.

If you can use a web browser, you can use Octoparse. Each crawler run in Octoparse is driven by the extraction rule you configure. The rule tells Octoparse which website to open, where the data you want to crawl is located, what kind of data you want, and so on.

Octoparse simulates web browsing behavior such as opening a web page, logging into an account, entering text, and pointing at and clicking web elements. Users collect data simply by clicking the information they want in the built-in browser.
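
As a rough illustration of what an extraction rule captures (which page to open, where the data is, and what kind of data to take), here is a minimal Python sketch. The class names, field names, URL, and XPath strings are all hypothetical and chosen only for illustration; this is not Octoparse's actual rule format.

```python
from dataclasses import dataclass, field


@dataclass
class FieldRule:
    """One piece of data to extract (hypothetical structure)."""
    name: str   # column name in the resulting dataset
    xpath: str  # where the data lives on the page
    kind: str   # what to take: "text", "link", "image_url", ...


@dataclass
class ExtractionRule:
    """Which page to open and which fields to pull from it (hypothetical)."""
    start_url: str
    fields: list[FieldRule] = field(default_factory=list)

    def columns(self) -> list[str]:
        # The field names become the columns of the structured dataset.
        return [f.name for f in self.fields]


# Example rule: open a listing page and collect a title and a detail link.
rule = ExtractionRule(
    start_url="https://example.com/products",
    fields=[
        FieldRule("title", "//h2[@class='title']", "text"),
        FieldRule("detail_page", "//a[@class='more']/@href", "link"),
    ],
)
print(rule.columns())
```

In Octoparse itself, the equivalent information is produced by pointing and clicking rather than by writing code.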

Point-and-Click Interface

  • Simply point at and click the web data you want
  • Automatically extract all data that shares a similar layout
  • No coding required for most websites (98%)

Works with almost all websites, dynamic or static

  • Extract text, image URLs, links, etc.
  • Extract data from listing pages, sites with infinite scrolling, pagination, etc.
  • Extract data from dropdown menus
  • Extract data behind a login
  • Extract data loaded with AJAX, JavaScript, etc.

Extract data from websites precisely

  • Automatically generates XPath
  • Built-in XPath tool
  • Built-in RegEx tool
  • Extract data using cloud servers 24/7
  • Extract and store your data in the cloud platform
  • Automatic IP rotation to keep your IP address from being blacklisted
  • Scheduled extraction tasks
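
To show what an XPath expression and a regular expression each contribute to precise extraction, here is a small Python sketch using only the standard library. The sample markup, the XPath, and the pattern are made up for illustration, and the code says nothing about how Octoparse's built-in tools are implemented.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page fragment; a real page would be fetched and parsed as HTML.
PAGE = """
<html><body>
  <div class="item"><h2>Blue Kettle</h2><span class="price">Price: $24.99</span></div>
  <div class="item"><h2>Red Toaster</h2><span class="price">Price: $39.50</span></div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# The regular expression keeps only the numeric part of the price text.
PRICE_RE = re.compile(r"\$(\d+\.\d{2})")

rows = []
# The XPath selects every element that shares the same layout.
for item in root.findall(".//div[@class='item']"):
    title = item.findtext("h2")
    price_text = item.findtext("span[@class='price']") or ""
    match = PRICE_RE.search(price_text)
    rows.append({"title": title, "price": float(match.group(1)) if match else None})

for row in rows:
    print(row)
```

The XPath locates the elements, and the regular expression then refines the text found inside them.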

Export data in any format you like

Store the data Octoparse extracts on our cloud platform, or export it in any format you like:

  • API
  • CSV
  • Excel
  • HTML
  • TXT
  • Database (MySQL, SQL Server, Oracle)
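
For readers who post-process exported data, the following Python sketch shows how structured records map onto CSV text using the standard `csv` module. The records and the `to_csv` helper are hypothetical examples, not output from Octoparse or part of its API.

```python
import csv
import io

# Hypothetical structured records, one dict per extracted row.
records = [
    {"title": "Blue Kettle", "price": 24.99},
    {"title": "Red Toaster", "price": 39.5},
]


def to_csv(rows):
    """Serialize a non-empty list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records)
print(csv_text)
```

The same list of dicts could equally be written to a spreadsheet or inserted into a database table, since each dict key corresponds to one column.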
