Topic: crawler

Something interesting about crawlers

👇 Here are 6696 public repositories matching this topic...

adbar / trafilatura

crawler,Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

User: adbar

Home Page: https://trafilatura.readthedocs.io

article-extractor corpus corpus-builder corpus-tools crawler html-to-markdown html2text news news-aggregator news-crawler nlp readability rss-feed scraping tei text-cleaning text-extraction text-mining text-preprocessing web-scraping

alirezamika / autoscraper

crawler,A Smart, Automatic, Fast and Lightweight Web Scraper for Python

User: alirezamika

scraping scraper scrape webscraping crawler web-scraping ai artificial-intelligence python webautomation automation machine-learning

apify / crawlee

crawler,Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Organization: apify

Home Page: https://crawlee.dev

apify automation crawler crawling headless headless-chrome javascript nodejs npm playwright puppeteer scraper scraping typescript web-crawler web-crawling web-scraping

apify / crawlee-python

crawler,Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

Organization: apify

Home Page: https://crawlee.dev/python/

apify automation beautifulsoup crawler crawling headless headless-chrome pip playwright python

arachni / arachni

crawler,Web Application Security Scanner Framework

Organization: arachni

Home Page: http://www.arachni-scanner.com

arachni dom ruby audit detection security-audit analysis modular javascript scanners web-application vulnerability-detection crawler scanner hack hacking penetration-testing xss sql-injection

bda-research / node-crawler

crawler,Web Crawler/Spider for NodeJS + server-side jQuery ;-)

Organization: bda-research

crawler javascript spider extract-data cheerio jquery nodejs

binux / pyspider

crawler,A Powerful Spider (Web Crawler) System in Python.

User: binux

Home Page: http://docs.pyspider.org/

crawler python

boris-code / feapder

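crawler,🚀🚀🚀feapder is an easy-to-use, powerful Python crawler framework. It ships with four built-in spiders (AirSpider, Spider, TaskSpider, and BatchSpider) to cover different scenarios, and supports resumable crawling, monitoring and alerting, browser rendering, and deduplication of massive data sets. The companion crawler management system feaplat provides convenient deployment and scheduling.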

User: boris-code

Home Page: http://feapder.com

scrapy feapder spider crawler python feaplat

brucedone / awesome-crawler

crawler,A collection of awesome web crawlers and spiders in different languages

User: brucedone

web-crawler crawler web-scraper spider node-crawler scraper awesome

charlespikachu / decryptlogin

crawler,DecryptLogin: APIs for logging in to some websites using requests.

User: charlespikachu

Home Page: https://httpsgithubcomcharlespikachudecryptlogin.readthedocs.io/zh/latest/

requests login python3 spider crawler zhihu bilibili weibo taobao jingdong

chyroc / wechatsogou

crawler,A crawler interface for WeChat official accounts, based on Sogou WeChat search

User: chyroc

wechat sogou python crawler pypi scrapy

code4craft / webmagic

crawler,A scalable web crawler framework for Java.

User: code4craft

Home Page: http://webmagic.io/

crawler java scraping framework

codelucas / newspaper

crawler,newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

User: codelucas

Home Page: https://goo.gl/VX41yK

python news crawler crawling scraper news-aggregator

constverum / proxybroker

crawler,Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

User: constverum

Home Page: http://proxybroker.readthedocs.io

proxy anonymity privacy socks http-proxy crawler proxy-server anonymous proxy-checker proxy-list

crawlab-team / crawlab

crawler,Distributed web crawler admin platform for spider management, regardless of language or framework.

Organization: crawlab-team

Home Page: https://www.crawlab.cn

webcrawler scrapy crawlab spiders-management go scrapyd-ui spider crawler webspider web-crawler

dataabc / weibo-crawler

crawler,Sina Weibo crawler: uses Python to crawl Sina Weibo data and download Weibo images and videos

User: dataabc

weibo crawler weibo-spider

dedsecinside / torbot

crawler,Dark Web OSINT Tool

Organization: dedsecinside

tor tor-network deepweb dark-web psnappz crawler spider dedsec-inside python-web-crawler security-tools security python3 python algorithm osint go projects hacking torbot hacktoberfest

dotnetcore / dotnetspider

crawler,DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework

Organization: dotnetcore

crawler dotnetcore cross-platform csharp distributed

dropsdevopsorg / ecommercecrawlers

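crawler,Hands-on 🐍 crawlers for a variety of websites and e-commerce data 🕷. Includes 🕸: Taobao products, WeChat official accounts, Dianping, Qichacha, job sites, Xianyu, Ali tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movies, Baotu, Quanjing, Douban Music, a provincial drug administration, Sohu News, machine-learning text collection, FOFA asset collection, Autohome, the National Bureau of Statistics, Baidu keyword index counts, spider pan-directory, Toutiao, Douban film reviews, Ctrip, Xiaomi App Store, Anjuke, and Tujia homestays ❤️❤️❤️. WeChat crawler demo project: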

Organization: dropsdevopsorg

Home Page: http://wechat.doonsec.com/

python3 crawler baidu-tieba taobao-spider dazhong-spider douban-movie douban-music alitask baotu quanjing

elliotgao2 / toapi

crawler,Every web site provides APIs.

User: elliotgao2

Home Page: https://gaojiuli.github.io/toapi/

api crawler flask html json python spider toapi web

evil0ctal / douyin_tiktok_download_api

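crawler,🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls and online batch parsing and downloading.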

User: evil0ctal

Home Page: https://douyin.wtf

python pywebio tiktok douyin api scraper fastapi no-watermark online-parsing async

gocolly / colly

crawler,Elegant Scraper and Crawler Framework for Golang

Organization: gocolly

Home Page: https://go-colly.org/

golang scraper framework crawler scraping crawling spider go

guyueyingmu / avbook

crawler,AV movie management system; crawler for avmoo, javbus, and javlibrary; online AV film library and AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

User: guyueyingmu

javbus avmoo javlibrary spider crawler laravel scraper adult magnet-link magnet

hardkoded / puppeteer-sharp

crawler,Headless Chrome .NET API

Organization: hardkoded

Home Page: https://www.puppeteersharp.com

automation chrome chromium crawler crawling csharp e2e e2e-testing puppeteer webautomation

hiddenstrawberry / crawler_illegal_cases_in_china

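crawler,A collection of legal cases in China involving web crawlers. This project gathers news, materials, laws, and regulations on lawsuits and violations involving crawler developers in mainland China. It aims to help crawler practitioners working in mainland China understand the relevant laws and avoid crossing data-compliance red lines. [AD] Chinese knowledge graph portal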

User: hiddenstrawberry

Home Page: http://kgkg.kg

china law crawler

iawia002 / lux

crawler,👾 Fast and simple video download library and CLI tool written in Go

User: iawia002

bilibili crawler download downloader go golang iqiyi qq scraper tumblr video youku youtube

imwildcat / scylla

crawler,Intelligent proxy pool for Humans™ to extract content from the internet and build your own Large Language Models in this new AI era

User: imwildcat

crawler python3 scylla proxy-pool python

injetlee / python

crawler,Python scripts: simulated Zhihu login, crawlers, Excel operations, WeChat official accounts, remote power-on

User: injetlee

python crawler wechat excel

jae-jae / querylist

crawler,🕷️ The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.

User: jae-jae

Home Page: https://querylist.cc

querylist crawler spider scraper

jhao104 / proxy_pool

crawler,Python ProxyPool for web spider

User: jhao104

Home Page: https://jhao104.github.io/proxy_pool/

crawler http proxy redis spider

jumper2014 / lianjia-beike-spider

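crawler,Housing price crawler for Lianjia and Beike. Collects housing price data (residential communities, second-hand homes, rentals, new homes) for 21 major Chinese cities, including Beijing, Shanghai, Guangzhou, and Shenzhen. Stable, reliable, and fast! Supports storage as CSV, MySQL, MongoDB, Excel, and JSON; supports Python 2 and 3; displays data in charts; richly commented. Star to show support. For learning and reference only; do not use for commercial purposes, at your own risk.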

User: jumper2014

lianjia spider crawler beike house

kanasimi / work_crawler

crawler,Download comics and novels. A comic and novel download tool for: 腾讯漫画, 大角虫漫画, 有妖气, 咪咕, SF漫画, 哦漫画, 看漫画, 漫画柜, 汗汗酷漫, 動漫伊甸園, 快看漫画, 微博动漫, 733动漫网, 大古漫画网, 漫画DB, 無限動漫, 動漫狂, 卡推漫画, 动漫之家, 动漫屋, 古风漫画网, 36漫画网, 亲亲漫画网, 乙女漫画, webtoons, 咚漫, ニコニコ静画, ComicWalker, ヤングエースUP, モアイ, pixivコミック, サイコミ; アルファポリス, カクヨム, ハーメルン, 小説家になろう, 起点中文网, 八一中文网, 顶点小说, 落霞小说网, 努努书坊, 笔趣阁 → epub.

User: kanasimi

comic-downloader novel-downloader cejs downloader download-comic epub ebook comics webcomics manga-downloader

lorien / awesome-web-scraping

crawler,List of libraries, tools and APIs for web scraping and data processing.

User: lorien

web-scraping captcha-bypass captcha-recaptcha crawling crawling-framework crawling-python crawling-tool scraping scraping-framework scraping-python

madawei2699 / mygptreader

crawler,A community-driven way to read and chat with AI bots - powered by chatGPT.

User: madawei2699

Home Page: https://www.i365.tech/

chatgpt slack-bot prompt crawler embedding gpt-35-turbo scraper ai reader hot-news

mendableai / firecrawl

crawler,🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

Organization: mendableai

Home Page: https://firecrawl.dev

ai ai-scraping crawler data html-to-markdown llm markdown rag scraper scraping web-crawler

montferret / ferret

crawler,Declarative web scraping

Organization: montferret

Home Page: https://www.montferret.dev/

cdp chrome cli crawler crawling data-mining dsl go golang hacktoberfest library query-language scraper scraping scraping-websites tool

naibowang / easyspider

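crawler,A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual application for browser automation testing, data collection, and crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.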

User: naibowang

Home Page: https://www.easyspider.net

batch-processing batch-script code-free crawler data-collection frontend gui html input-parameters layman parameters robotics rpa scraper spider visual visualization visualprogramming web www

niespodd / browser-fingerprinting

crawler,Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

User: niespodd

Home Page: https://niespodd.github.io/browser-fingerprinting/

bot detection chromium stealth puppeteer scraper webscraping web automation chromium-browser

projectdiscovery / katana

crawler,A next-generation crawling and spidering framework.

Organization: projectdiscovery

crawler web-spider gocrawler spider-framework cli headless

qianlitp / crawlergo

crawler,A powerful browser crawler for web vulnerability scanners

User: qianlitp

arsenal blackhat chrome-devtools chromedp crawler crawlergo golang headless headless-chrome vulnerability-scanner web-vulnerability-scanners

rmax / scrapy-redis

crawler,Redis-based components for Scrapy.

User: rmax

Home Page: http://scrapy-redis.readthedocs.io

crawler distributed redis scrapy

s0md3v / photon

crawler,Incredibly fast crawler designed for OSINT.

User: s0md3v

crawler information-gathering osint python spider

scrapy / scrapy

crawler,Scrapy, a fast high-level web crawling & scraping framework for Python.

Organization: scrapy

Home Page: https://scrapy.org

crawler crawling framework hacktoberfest python scraping web-scraping web-scraping-python

shengqiangzhang / examples-of-web-crawlers

crawler,Some very interesting Python crawler examples that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, WeChat Read, Douban, and QQ.

User: shengqiangzhang

crawler spider taobao tmall example python selenium pyquery stock fund

spiderclub / haipproxy

crawler,💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

Organization: spiderclub

Home Page: https://spiderclub.github.io/haipproxy/

high-availability scrapy ipproxy distributed redis crawler scheduler spider

ssssssss-team / spider-flow

crawler,A next-generation crawler platform: define crawler workflows graphically and build crawlers without writing code.

Organization: ssssssss-team

Home Page: https://www.spiderflow.org

spider crawler jsoup xpath web-spider webspider webcrawler web-crawler spider-flow

tuhinshubhra / red_hawk

crawler,All-in-one tool for Information Gathering, Vulnerability Scanning and Crawling. A must-have tool for all penetration testers

User: tuhinshubhra

scanner crawler information-gathering admin-scanner backups-finder sql-vulnerability-scannig cms-detector cloudflare-detection geo-ip subdomain-scanner

wkunzhi / python3-spider

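crawler,Python crawlers in practice: simulated login to major websites, including but not limited to slider CAPTCHA verification, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️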

User: wkunzhi

scrapy python crawl crawler geek spider taobao dianping meituan selenium

yujiosaka / headless-chrome-crawler

crawler,Distributed crawler powered by Headless Chrome

User: yujiosaka

chrome chromium crawler crawling headless-chrome jquery promise puppeteer scraper scraping

zu1k / proxypool

crawler,Automatically crawls proxy nodes on the public internet, de-duplicates and tests for usability and then provides a list of nodes

User: zu1k

crawler proxypool