xuhao61 / communityscrapers

This project forked from stashapp/communityscrapers

This is a public repository containing scrapers created by the Stash Community.

License: GNU Affero General Public License v3.0

Shell 0.14% JavaScript 0.75% Ruby 1.24% Python 33.20% YAML 64.67%

communityscrapers's Introduction

CommunityScrapers

โ— Make sure to read ALL of the instructions here before requesting any help in the Discord channel. For a more user friendly step-by-step guide you can check out the Guide to Scraping โ—

When asking for help, do not forget to mention which version of Stash you are using, the scraper that is failing, the URL you are attempting to scrape, and your current Python version (but only if the scraper requires Python).

Note that some scrapers (notably ThePornDB for Movies and ThePornDB for JAV) require extra configuration. As of v0.24.0 this is not possible through the web interface so you will need to open these in a text editor and read the instructions to add the necessary fields, usually an API key or a cookie.

Installing scrapers

With the v0.24.0 release of Stash you no longer need to install scrapers manually: if you go to Settings > Metadata Providers you can find the scrapers from this repository in the Community (stable) feed and install them without ever needing to copy any files manually.

If you prefer to manage your scrapers manually, that is still supported, using the same steps as before. Manually installed scrapers and ones installed through Stash can both be used at the same time.

Installing scrapers (manually)

To download all of the scrapers at once you can clone the git repository. If you only need some of the scrapers they can be downloaded individually.

When downloading directly, click on the .yml file you want, make sure to click the Raw button, and then save the page as a file from the browser to preserve the correct format for the .yml file.

Any scraper file has to be stored in the path you've configured as your Scrapers Path in Settings > System > Application Paths, which is ~/.stash/scrapers by default. You may recognize ~/.stash as the folder where the config and database file are located.
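
To confirm which scraper files ended up in that folder, a small script can list them. This is only a sketch: it assumes the default ~/.stash/scrapers location described above, and the function name list_scraper_files is made up for illustration. Looking inside subfolders is a choice made for this sketch.

```python
from pathlib import Path

# Default location mentioned above; change it if you configured a
# different Scrapers Path in Stash.
DEFAULT_SCRAPERS_DIR = Path.home() / ".stash" / "scrapers"


def list_scraper_files(scrapers_dir: Path = DEFAULT_SCRAPERS_DIR) -> list[str]:
    """Return the relative paths of all .yml files under scrapers_dir, sorted."""
    if not scrapers_dir.is_dir():
        return []
    return sorted(
        path.relative_to(scrapers_dir).as_posix()
        for path in scrapers_dir.rglob("*.yml")
    )


if __name__ == "__main__":
    for name in list_scraper_files():
        print(name)
```

If the script prints nothing, either the folder is empty or your Scrapers Path points somewhere else.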

After manually updating the contents of the scrapers folder or editing a scraper file, you need to reload the scrapers (Scrape with... -> Reload scrapers) and then refresh the edit scene/performer page.

Some sites block content if the user agent is not valid. If you get some kind of blocked or denied message, make sure to configure the Scraping -> Scraper User Agent setting in Stash. Valid strings, e.g. for Firefox, can be found here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent/Firefox . Scrapers for those sites should have a comment mentioning this along with a tested and working user agent string.
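
Before changing the setting, it can help to check outside of Stash whether a site accepts a given user agent string. The sketch below uses only Python's standard library. The URL and the user agent string are placeholders chosen for illustration, and the script does not reproduce how Stash itself sends requests.

```python
import urllib.error
import urllib.request

# Placeholder values: substitute the page you are trying to scrape and a
# current user agent string taken from the page linked above.
URL = "https://example.com/scene/123"
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still informative.
        return err.code


if __name__ == "__main__":
    print(fetch_status(URL, USER_AGENT))
```

A 403 with one string and a 200 with another suggests that the site filters on the user agent.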

Scrapers with useCDP set to true require that you have properly configured the Chrome CDP path setting in Stash. If you decide to use a remote instance, the headless Chromium Docker image from https://hub.docker.com/r/chromedp/headless-shell/ is highly recommended.
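
To find out which of your installed scrapers need this setting, you can search the scraper files for the option. The sketch below does a plain text match rather than parsing the YAML, and it assumes the option is written on its own line as useCDP: true. The function name scrapers_using_cdp is hypothetical.

```python
import re
from pathlib import Path

# Matches a line such as "  useCDP: true", optionally followed by a comment.
USE_CDP = re.compile(r"^[ \t]*useCDP:[ \t]*true[ \t]*(#.*)?$", re.MULTILINE)


def scrapers_using_cdp(scrapers_dir: Path) -> list[str]:
    """Return the relative paths of .yml files that enable useCDP, sorted."""
    matches = []
    for path in scrapers_dir.rglob("*.yml"):
        text = path.read_text(encoding="utf-8", errors="replace")
        if USE_CDP.search(text):
            matches.append(path.relative_to(scrapers_dir).as_posix())
    return sorted(matches)


if __name__ == "__main__":
    # Default Scrapers Path; adjust if yours differs.
    for name in scrapers_using_cdp(Path.home() / ".stash" / "scrapers"):
        print(name)
```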

Python scrapers

Some scrapers require external programs to function, usually Python. All scrapers are tested with the newest stable release of Python, currently 3.12.2.

Depending on your operating system you may need to install both Python and the scrapers' dependencies before they will work. For Windows users we strongly recommend installing Python using the installers from python.org instead of through the Windows Store, and also installing it outside of the Users folder so it is accessible to the entire system: a commonly used option is C:\Python312.

After installing Python you can install the most commonly used dependencies by running the following command in a terminal window:

python -m pip install stashapp-tools requests cloudscraper beautifulsoup4 lxml

You may need to replace python with py in the command if you are running on Windows.
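
To check whether the installation worked, you can ask Python which of these packages it can find. This is a rough sketch: the import names on the right-hand side are assumptions based on how each package is usually imported (in particular stashapi for stashapp-tools), and missing_packages is a hypothetical helper name.

```python
import importlib.util

# Package name on PyPI -> module name used in "import" statements.
# The module names are assumptions based on each package's usual usage.
DEPENDENCIES = {
    "stashapp-tools": "stashapi",
    "requests": "requests",
    "cloudscraper": "cloudscraper",
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
}


def missing_packages(deps: dict[str, str]) -> list[str]:
    """Return the PyPI names whose import module cannot be found."""
    return [
        package
        for package, module in deps.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(DEPENDENCIES)
    if missing:
        print("Missing:", " ".join(missing))
    else:
        print("All common dependencies were found.")
```

Run it with the same interpreter that Stash uses, otherwise the result may not reflect what the scrapers see.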

If Stash does not detect your Python installation you can set the Python executable path in Settings > System > Application Paths. Note that this needs to point to the executable itself and not just the folder it is in.
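
If you are unsure where the executable is, Python can report its own location. The sketch below prints the full path of the interpreter that runs it, and that path is the kind of value the setting expects. The function name python_executable_path is made up for illustration.

```python
import sys


def python_executable_path() -> str:
    """Return the full path of the Python interpreter running this script."""
    return sys.executable


if __name__ == "__main__":
    print(python_executable_path())
```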

Scrapers

You can find a list of sites that currently have a scraper in SCRAPERS-LIST.md

💥 For most scrapers you have to provide the scene/performer URL

Stable build (>=v0.11.0)
Once you populate the URL field with an appropriate URL, the Scrape URL button becomes active.

Clicking on that button brings up a popup that lets you select which fields to update.

Some scrapers support the Scrape with... function, so you can use that instead of adding a URL. Scrape with... usually works with either the Title field or the filename, so make sure that they provide enough data for the scraper to work with.

A Query button is also available for scrapers that support that. Clicking the button allows you to edit the text that the scraper will use for your queries.

In case of errors or no results during scraping, make sure to check Stash's log section (Settings->Logs->Log Level Debug) for more info.

For more info, please check the scraping help section.

Contributing

Contributions are always welcome! Use the Scraping Configuration help section to get started and stop by the Discord #scrapers channel with any questions.

The last line of a scraper definition (.yml file) must be the last updated date, in the following format:
# Last Updated Month Day, Year
Month = Full month name (October)
Day = Day of month, with leading zero (04, 16)
Year = Full year (2020)
Example: # Last Updated October 04, 2020
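
The format above can be produced and checked with Python's datetime module. This is a minimal sketch, not part of the repository's validator: the function names are hypothetical, and it assumes the default English locale for month names.

```python
from datetime import date, datetime

PREFIX = "# Last Updated "
DATE_FORMAT = "%B %d, %Y"  # full month name, zero-padded day, four-digit year


def last_updated_line(day: date) -> str:
    """Build the last line of a scraper definition for the given date."""
    return PREFIX + day.strftime(DATE_FORMAT)


def is_valid_last_updated(line: str) -> bool:
    """Check that a line follows the '# Last Updated Month Day, Year' format."""
    if not line.startswith(PREFIX):
        return False
    text = line[len(PREFIX):]
    try:
        parsed = datetime.strptime(text, DATE_FORMAT)
    except ValueError:
        return False
    # strptime also accepts "October 4, 2020", so format the date again
    # and compare to enforce the leading zero and capitalization.
    return parsed.strftime(DATE_FORMAT) == text


if __name__ == "__main__":
    print(last_updated_line(date.today()))
```

For example, last_updated_line(date(2020, 10, 4)) gives the example line shown above.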

Validation

The scrapers in this repository can be validated against a schema and checked for common errors.

First, install the validator's dependencies - inside the ./validator folder, run: yarn.

Then, to run the validator, use node validate.js in the root of the repository.
Specific scrapers can be checked using: node validate.js scrapers/foo.yml scrapers/bar.yml
