Topic: scraping

Something interesting about scraping

👇 Here are 5523 public repositories matching this topic...

aapatre / automatic-udemy-course-enroller-get-paid-udemy-courses-for-free

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

User: aapatre

python scraping selenium python3 scraper

adbar / trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

User: adbar

Home Page: https://trafilatura.readthedocs.io

web-scraping text-extraction nlp html2text news text-mining crawler text-cleaning text-preprocessing article-extractor

alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

User: alirezamika

scraping scraper scrape webscraping crawler web-scraping ai artificial-intelligence python webautomation

altimis / scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

User: altimis

selenium-webdriver scraper scraping twitter tweets python following followers twitter-scraper scrape

ammeysaini / edu-mail-generator

Generate Free Edu Mail(s) within minutes

User: ammeysaini

selenium python edumail scraping scraping-websites install-webdriver python3 edu mail student-mail edu-account edu-generator selenium-python auto-install-webdriver

apify / crawlee

Crawlee: a web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Organization: apify

Home Page: https://crawlee.dev

web-scraping web-crawling npm headless-chrome puppeteer automation apify scraping crawling crawler

apify / fingerprint-suite

Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

Organization: apify

fingerprinting playwright puppeteer scraping typescript

claffin / cloudproxy

Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.

User: claffin

Home Page: https://cloudproxy.io/

cloud proxy proxy-server scraping

code4craft / webmagic

A scalable web crawler framework for Java.

User: code4craft

Home Page: http://webmagic.io/

crawler java scraping framework

damklis / dataengineeringproject

Example end-to-end data engineering project.

User: damklis

big-data scraping mongodb elasticsearch data-engineering kafka kafka-connect debezium django-rest-framework redis airflow minio s3 python data-pipeline hacktoberfest

datahenhq / till

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

Organization: datahenhq

Home Page: https://till.datahen.com

web-scraping man-in-the-middle proxy-server mitm scraping crawler scraper

elixir-crawly / crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

Organization: elixir-crawly

Home Page: https://hexdocs.pm/crawly

elixir erlang scraper scraping scraping-websites extract-data spider crawler crawling

emadehsan / thal

Getting started with Puppeteer and Chrome Headless for Web Scraping

User: emadehsan

Home Page: https://emadehsan.com

puppeteer chrome-headless nodejs scraping mongoose mongodb

fake-useragent / fake-useragent

Up-to-date, simple user-agent faker with a real-world database

Organization: fake-useragent

Home Page: https://pypi.python.org/pypi/fake-useragent

python python3 user agent fake faker scraping user-agent user-agent-spoofer useragent

geziyor / geziyor

Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

Organization: geziyor

go scraping scraper crawler spider

gocolly / colly

Elegant Scraper and Crawler Framework for Golang

Organization: gocolly

Home Page: https://go-colly.org/

golang scraper framework crawler scraping crawling spider go

holgerd77 / django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

User: holgerd77

Home Page: http://django-dynamic-scraper.readthedocs.io

python django scraper scraping scrapy spider webscraping

iawia002 / lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

User: iawia002

downloader video python python3 crawler scraper crawling scraping

iiab / iiab

Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspberry Pi!

Organization: iiab

Home Page: https://internet-in-a-box.org

learning hotspot library scraping raspberry-pi knowledge medical prisoners-rights human-rights curriculum-design

istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.

Organization: istresearch

Home Page: http://scrapy-cluster.readthedocs.io/

python scrapy kafka redis scraping distributed

jfilter / clean-text

🧹 Python package for text cleaning

User: jfilter

python natural-language-processing text-cleaning text-normalization text-preprocessing python-package nlp user-generated-content scraping

kevinzg / facebook-scraper

Scrape Facebook public pages without an API key

User: kevinzg

facebook scraping facebook-scraping facebook-scraper hacktoberfest

khuyentran1401 / data-science

Collection of useful data science topics along with articles, videos, and code

User: khuyentran1401

Home Page: https://khuyentran1401.github.io/Data-science/

data-science machine-learning natural-language-processing python data-visualization data-analysis articles artificial-intelligence time-series scraping

kuwala-io / kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Points of Interest from OpenStreetMap c) Google Popular Times

Organization: kuwala-io

Home Page: https://kuwala.io

data data-integration data-science open-data spatial-analysis elt kuwala open-source scraping dbt

leoncvlt / loconotion

📄 Python tool to turn Notion.so pages into lightweight, customizable static websites

User: leoncvlt

notion pyhton scraping static-site-generator

lorey / mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

User: lorey

Home Page: https://pypi.org/project/mlscraper/

scraping crawling html machine-learning extraction-engine scraper crawler crawler-python

lorien / awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

User: lorien

web-scraping captcha-bypass captcha-recaptcha crawling crawling-framework crawling-python crawling-tool scraping scraping-framework scraping-python scraping-tool webscraping crawler spider