poketo-cli's Introduction

poketo-node

Node library for scraping manga sites.

Provides a consistent API for scraping metadata and chapter images from 16+ manga sites. Makes it easy to build manga readers, downloaders, archival tools, and more.

For working examples, check out the Poketo manga reader, or the Poketo CLI!

People should be able to read content on the web in the way that works for them. Manga sites are often a special brand of bad. Each manga page loads a new web page, ads everywhere, yuck! Poketo opens up access to that content to make better tools on top.

🚧 This project is still v0.x.x and the API may change as more sites are added.

Install

npm install poketo --save

You can also use api.poketo.app, a hosted micro-service for this library.

Usage

import poketo from 'poketo';

poketo.getSeries('http://merakiscans.com/senryu-girl/').then(series => {
  console.log(series);
  //=> { id: 'meraki-scans:senryu-girl', title: 'Senryu Girl', chapters: [...], ... }
});

poketo.getChapter('http://merakiscans.com/senryu-girl/5/').then(chapter => {
  console.log(chapter);
  //=> { id: 'meraki-scans:senryu-girl:5', pages: [...], ... }
});

Full documentation of the API can be found below.

Supported Sites

Site                  Series Info  Chapter Images
Helvetica Scans       ✓            ✓
Hot Chocolate Scans   ✓            ✓
Jaimini’s Box         ✓            ✓
Kirei Cake            ✓            ✓
MangaHere             ✓            ✓ (slow)
MangaUpdates          ✓
Mangadex              ✓            ✓
MangaFox              ✓            ✓
Mangakakalot          ✓            ✓
Manganelo             ✓            ✓
MangaRock             ✓            ✓
MangaStream           ✓            ✓
Meraki Scans          ✓            ✓
Phoenix Serenade      ✓            ✓
Sense Scans           ✓            ✓
Sen Manga             ✓            ✓
Silent Sky Scans      ✓            ✓
Tuki Scans            ✓            ✓

If there's a site or group you'd like to see supported, make an issue!

API

Poketo exposes four methods:

poketo.getSeries(idOrUrl: string): Promise<Series>
poketo.getChapter(idOrUrl: string): Promise<Chapter>
poketo.getType(input: string): 'series' | 'chapter'
poketo.constructUrl(id: string): string

To understand what is returned for a Series or Chapter, check out the examples below.

Docs

Get series information

Use poketo.getSeries to get the ID, title, cover image, and chapter listing for a manga series.

poketo.getSeries('http://merakiscans.com/senryu-girl').then(series => {
  console.log(series);
});

// {
//   id: 'meraki-scans:senryu-girl',
//   slug: 'senryu-girl',
//   url: 'http://merakiscans.com/senryu-girl',
//   title: 'Senryu Girl',
//   coverImageUrl: 'http://merakiscans.com/.../senryu_200x0.jpg',
//   chapters: [
//     {
//       id: 'meraki-scans:senryu-girl:1',
//       slug: '1',
//       chapterNumber: '1',
//       volumeNumber: '1',
//       title: '5-7-5 Girl',
//       createdAt: 1522811950
//     },
//     ...
//   ],
//   updatedAt: 1522811950,
// }
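
As a rough sketch of working with the returned data, the helper below summarizes a Series object shaped like the example above. `summarizeSeries` is a hypothetical name, not part of Poketo, and the code assumes `createdAt` and `updatedAt` are Unix timestamps in seconds, which the sample values suggest but the docs do not state.

```javascript
// Hypothetical helper (not part of Poketo): summarize a Series object.
// Assumption: `createdAt` / `updatedAt` are Unix timestamps in seconds.
function summarizeSeries(series) {
  const toDate = seconds => new Date(seconds * 1000);

  // Pick the most recently created chapter without mutating the input.
  const latest = series.chapters.reduce(
    (newest, chapter) =>
      newest === null || chapter.createdAt > newest.createdAt ? chapter : newest,
    null
  );

  return {
    title: series.title,
    chapterCount: series.chapters.length,
    latestChapterId: latest ? latest.id : null,
    updatedAt: toDate(series.updatedAt).toISOString(),
  };
}

// Sample data copied from the example output above (trimmed).
const example = {
  id: 'meraki-scans:senryu-girl',
  title: 'Senryu Girl',
  chapters: [
    { id: 'meraki-scans:senryu-girl:1', chapterNumber: '1', createdAt: 1522811950 },
  ],
  updatedAt: 1522811950,
};

console.log(summarizeSeries(example));
```

In real use, the object passed in would be the value resolved by poketo.getSeries.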

Get pages for a chapter

Use poketo.getChapter to get a page listing for a chapter. This doesn't include any information about the chapter or series, just the page list.

Depending on the site, the page list will also include image dimensions.

poketo.getChapter('http://merakiscans.com/senryu-girl/5').then(chapter => {
  console.log(chapter);
});

// {
//  id: 'meraki-scans:senryu-girl:5',
//  slug: '5',
//  url: 'http://merakiscans.com/senryu-girl/5',
//  pages: [
//    { id: '01', url: 'http://merakiscans.com/image01.png', width: 800, height: 1200 },
//    { id: '02', url: 'http://merakiscans.com/image02.png', width: 800, height: 1200 },
//    ...
//  ]
// }
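
A downloader built on this page list needs a local filename for each image. The sketch below shows one possible approach. `planPageFiles` is a hypothetical helper, and it assumes every page has an `id` and an absolute `url`, as in the example output above.

```javascript
// Hypothetical helper (not part of Poketo): choose a local filename per page.
// Assumption: each page has an `id` and an absolute `url`.
function planPageFiles(chapter) {
  return chapter.pages.map(page => {
    // Use only the URL path so query strings do not leak into the extension.
    const { pathname } = new URL(page.url);
    const baseName = pathname.slice(pathname.lastIndexOf('/') + 1);
    const dot = baseName.lastIndexOf('.');
    const extension = dot === -1 ? '' : baseName.slice(dot);
    return { url: page.url, filename: `${page.id}${extension}` };
  });
}

// Sample data copied from the example output above (trimmed).
const sampleChapter = {
  id: 'meraki-scans:senryu-girl:5',
  pages: [
    { id: '01', url: 'http://merakiscans.com/image01.png', width: 800, height: 1200 },
    { id: '02', url: 'http://merakiscans.com/image02.png', width: 800, height: 1200 },
  ],
};

console.log(planPageFiles(sampleChapter));
```

Naming files by page `id` rather than by the remote filename keeps them in reading order, assuming the IDs sort the way the example suggests.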

Validate a URL or ID

If you have an arbitrary input, you can see if Poketo recognizes it by calling the poketo.getType method. It will return 'series' or 'chapter' if it's supported, or will throw a Poketo error if not.

poketo.getType('http://merakiscans.com/senryu-girl');
//=> 'series'
poketo.getType('http://merakiscans.com/senryu-girl/5');
//=> 'chapter'
poketo.getType('http://merakiscans.com/i/am/a/banana/yo');
//=> throws a `poketo.InvalidUrlError`
poketo.getType('http://google.com');
//=> throws a `poketo.UnsupportedSiteError`
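
Because getType throws on unrecognized input, callers that only need a yes/no answer may prefer a non-throwing wrapper. The sketch below is one way to write it. `detectType` is a hypothetical name, and the `poketo` object is passed in as an argument so the example makes no assumption about how the library is imported.

```javascript
// Hypothetical wrapper (not part of Poketo): returns 'series', 'chapter',
// or null instead of throwing.
function detectType(poketo, input) {
  try {
    return poketo.getType(input);
  } catch (err) {
    // getType throws for input it does not recognize, for example
    // InvalidUrlError or UnsupportedSiteError as shown above.
    return null;
  }
}

// Usage, given an imported `poketo`:
//   if (detectType(poketo, userInput) === 'chapter') { ... }
```

This version treats every thrown error as "not supported", which also hides unrelated failures. Checking the error type before returning null would be stricter.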

Get a URL from a Poketo ID

If you've stored a Poketo ID, you can get a URL back out by using the poketo.constructUrl method. The section below explains the difference between IDs and URLs.

poketo.constructUrl('meraki-scans:senryu-girl:5');
//=> http://merakiscans.com/senryu-girl/5

poketo.constructUrl('manga-stream:haikyuu:314/5286');
//=> https://readms.net/r/haikyuu/314/5286/1

What's the difference between an ID and a URL?

Poketo scrapes information from many sites. To identify which site, series (aka. manga), and chapter you're talking about, Poketo lets you provide information in two ways: a Poketo ID or a URL.

These IDs below are equivalent to the URLs on their right:

ID                                URL
mangadex:13127:311433          →  https://mangadex.org/chapter/311433/1
meraki-scans:senryu-girl:5     →  http://merakiscans.com/senryu-girl/5
manga-stream:haikyuu:314/5286  →  https://readms.net/r/haikyuu/314/5286/1

For the getSeries and getChapter methods, you can provide either a URL, or an ID, like so:

// Both lines return the same series
poketo.getSeries('http://merakiscans.com/senryu-girl/');
poketo.getSeries('meraki-scans:senryu-girl');

// Both lines return the same chapter
poketo.getChapter('https://mangadex.org/chapter/311433/1');
poketo.getChapter('mangadex:13127:311433');
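
The ID examples above all follow a `site:series` or `site:series:chapter` pattern. Assuming that pattern holds in general (the docs do not state it as a rule), a stored ID could be split into its parts as sketched below. `parsePoketoId` is a hypothetical helper, not a Poketo API.

```javascript
// Hypothetical helper (not part of Poketo): split a Poketo-style ID.
// Assumption: IDs look like 'site:series' or 'site:series:chapter',
// as in the examples above.
function parsePoketoId(id) {
  if (id.includes('://')) {
    throw new TypeError(`Expected a Poketo ID, got a URL: ${id}`);
  }
  const [site, series, ...rest] = id.split(':');
  if (!site || !series) {
    throw new TypeError(`Not a Poketo-style ID: ${id}`);
  }
  // Re-join the remainder in case a chapter slug contains a colon.
  const chapter = rest.length > 0 ? rest.join(':') : null;
  return { site, series, chapter, type: chapter === null ? 'series' : 'chapter' };
}

console.log(parsePoketoId('meraki-scans:senryu-girl'));
console.log(parsePoketoId('manga-stream:haikyuu:314/5286'));
```

For anything beyond display or grouping, poketo.getType and poketo.constructUrl remain the safer choice, since they reflect what the library actually supports.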

Why use an ID?

Poketo IDs have stronger guarantees they won't change.

It's not uncommon for a site to change their domain name or URL structure. For example, MangaDex once changed URLs for manga series from https://mangadex.org/manga/1234 to https://mangadex.org/title/1234. If that happens, your URL might break. But by using an ID, Poketo will know to do the right thing. This makes IDs a more robust way to store information about a series.

Of course, there are no true guarantees with scraping. Even an ID that works one day might break the next, but it's a slightly better guarantee.

Error Handling

Scraping isn't perfect. When using Poketo you'll inevitably run into an error, so we try to make what happened as clear as possible.

  • RequestError - unable to make a request to scrape the site
  • TimeoutError - tried to make a request, but the source site didn't respond in a reasonable time. The timeout defaults to 5 seconds.
  • HTTPError - tried to scrape the site, but the site returned an error (e.g. 404, 500)
  • LicenseError - tried to scrape the site, but the series/chapter is currently blocked, licensed, or was subjected to a DMCA takedown.
  • UnsupportedSiteError - the site you're trying to scrape from isn't supported. If you'd like to see it supported, make an issue!
  • UnsupportedOperationError - some sites don't support reading chapters. This error is thrown if you call poketo.getChapter for these sites.
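
Some of the errors listed above are worth retrying and others are not. The sketch below retries only on timeouts. `getChapterWithRetry` is a hypothetical helper, and it assumes `TimeoutError` is exposed as a property of `poketo`. The docs show that form for `poketo.InvalidUrlError` and `poketo.UnsupportedSiteError`, but not explicitly for the other error types.

```javascript
// Hypothetical helper (not part of Poketo): fetch a chapter, retrying
// only when the request timed out.
// Assumption: the error class is available as `poketo.TimeoutError`.
async function getChapterWithRetry(poketo, idOrUrl, maxAttempts = 3) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await poketo.getChapter(idOrUrl);
    } catch (err) {
      // Guard the instanceof check in case the class is not exposed.
      const isTimeout =
        typeof poketo.TimeoutError === 'function' &&
        err instanceof poketo.TimeoutError;
      if (!isTimeout || attempt >= maxAttempts) {
        throw err; // Not retryable, or out of attempts.
      }
    }
  }
}

// Usage, given an imported `poketo`:
//   getChapterWithRetry(poketo, 'meraki-scans:senryu-girl:5')
//     .then(chapter => console.log(chapter.pages.length))
//     .catch(err => console.error(err));
```

Errors such as LicenseError or UnsupportedSiteError are rethrown immediately here, since repeating the same request is unlikely to change the outcome.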

Contributing

Contributions are welcome! Poketo is meant to be built on top of. Feel free to propose ideas or changes that would make it work for your situation, whether it's a bug report, site request, or contributed code. Read more at CONTRIBUTING.md.

poketo-cli's People

Contributors

rosszurowski

poketo-cli's Issues

Allow specifying download path

Right now files can only be downloaded into the current directory. We could allow a flag to pass a different directory to download to.

Show download progress

Right now the CLI only shows the total number of items to download (45 chapters, 80 images, etc.) but gives no indication of where it is in that process. Let's add a way of showing progress, potentially using the visionmedia progress package.

Expose JS API

If someone wanted to programmatically use the CLI as a downloader, it'd be nice to expose an importable public API to help manage the downloading. It'd also help better separate concerns in this code, since a lot of downloading logic is contained in the CLI file right now.
