artskydj / comicsrss.com

RSS feeds for comics

Home Page: https://www.comicsrss.com

Languages: JavaScript 1.26%, HTML 98.52%, CSS 0.20%, Batchfile 0.02%
Topics: comic, comics, rss, gocomics, feed, rss-generator, arcamax, hacktoberfest

comicsrss.com's Introduction

comicsrss.com

Source code for the site generator and RSS feed generator for comicsrss.com.

All of the site's content also lives in this repository, since the site is hosted on GitHub Pages.

Support Me

If you'd like to help keep this site going, you can send me a few bucks using Patreon. I'd really appreciate it!

Technical Details

I have received many requests to add more comic series to the site. However, my time is limited. So if you want to help out, you can make a scraper!

To add comic series to Comics RSS, it helps to understand the basics of how it works.

Comics RSS has two parts: the scrapers and the site generator. Each scraper parses a different comic website and writes a temporary JSON file to disk. The site generator reads those temporary JSON files and writes static HTML/RSS files to disk.

How scrapers work

The scrapers make https requests to a website (for example, https://www.gocomics.com), parse the responses, and write temporary JSON files to the disk.

On a multi-comic site like https://www.gocomics.com, a scraper has to get the list of comic series (e.g. Agnes, Baby Blues, Calvin and Hobbes, etc). For example, the scraper might request and parse https://www.gocomics.com/comics/a-to-z.

Then, for each comic series, it fetches the most recent comic strip and walks backwards through the previous days' strips. When it finds a strip it has already seen, it moves on to the next comic series, until it has finished the whole website.

Finally, it writes the list of comic series, each with its list of strips, to a temporary JSON file on disk.
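
As a rough sketch of that flow (not the repository's actual code; fetchSeriesList, fetchStrip, previousDate, and the file path below are hypothetical stand-ins):

```js
const fs = require('fs')

// knownStripUrls is assumed to be a Set of strip URLs seen on a previous run
async function scrapeSite(knownStripUrls) {
	const seriesList = await fetchSeriesList() // e.g. parse https://www.gocomics.com/comics/a-to-z
	const results = []

	for (const series of seriesList) {
		const strips = []
		let date = new Date() // start at today and walk backwards

		while (true) {
			const strip = await fetchStrip(series.slug, date) // request and parse one day's page
			if (!strip || knownStripUrls.has(strip.url)) break // stop at a strip we've already seen
			strips.push(strip)
			date = previousDate(date)
		}

		results.push({ title: series.title, slug: series.slug, strips })
	}

	// Hand the scraped data to the site generator via a temporary JSON file
	fs.writeFileSync('tmp/gocomics.json', JSON.stringify(results, null, '\t'))
}
```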

How the site generator works

The site generator reads the temporary JSON files made by the scrapers into one big list of comic series, each with its list of comic strips. It then uses templates to generate an index.html file and one rss/{comic}.rss file per series.
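
Roughly speaking (again just a sketch; renderIndex and renderRssFeed are hypothetical template helpers, not the repository's real ones):

```js
const fs = require('fs')
const path = require('path')

function generateSite(tmpDir, outDir) {
	// Merge every scraper's temporary JSON file into one big list of series
	const allSeries = fs.readdirSync(tmpDir)
		.filter(file => file.endsWith('.json'))
		.flatMap(file => JSON.parse(fs.readFileSync(path.join(tmpDir, file), 'utf8')))

	// One index.html listing every series...
	fs.writeFileSync(path.join(outDir, 'index.html'), renderIndex(allSeries))

	// ...and one RSS feed per series
	for (const series of allSeries) {
		fs.writeFileSync(path.join(outDir, 'rss', series.slug + '.rss'), renderRssFeed(series))
	}
}
```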

When these updated/new files are committed and pushed to this repository, they get hosted on gh-pages, which is how you view the site today.

Run locally

  1. Fork and clone the repository
  2. Run these commands on your command line:
# in /comicsrss.com
npm install

cd _generator

# If you want to see all the options:
# node bin --help

# Re-generate the site with the cached scraped site data:
node bin --generate

# If you want to run the scrapers (takes a while) then run this:
# node bin --scrape --generate

# I have nginx serving up my whole code directory, so I can go to http://localhost:80/comicsrss.com/
# If you don't have anything similar set up, you can try:
cd ..
npx serve
# Then open http://localhost:3000 in your browser

Run your own auto-updating scraper and website using CircleCI

  1. Fork the repository
  2. Create a GitHub deploy key, then add it to GitHub and to CircleCI
  3. Change .circleci/config.yml from my username, email, and key fingerprint to your username, email, and key fingerprint
  4. Enable the repo in CircleCI
  5. I think that's it? Make a PR if you attempt the above steps and I missed something!

Scraper API

To create a scraper for a single-series website that shows multiple days' comic strips per web page, copy the code from dilbert.js and change it as needed.

To create a scraper for a multi-series website, copy the code from arcamax.js and change it as needed.

If you're not sure which to use, probably start from arcamax.js, or feel free to open a GitHub issue to discuss it with me.
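
For orientation only, a multi-series scraper boils down to something like the sketch below. The URL, regex, and returned shape are made up, and it assumes Node 18+ for the global fetch; the real interface is whatever arcamax.js exports, so copy that file rather than this.

```js
// Purely illustrative scraper skeleton; everything here is hypothetical
module.exports = async function scrapeExampleSite() {
	const response = await fetch('https://comics.example.com/a-to-z')
	const html = await response.text()

	// Pull the series links out of the index page (a real scraper would use a
	// proper HTML parser rather than a quick regex)
	const seriesList = [...html.matchAll(/<a href="\/comics\/([\w-]+)">([^<]+)<\/a>/g)]
		.map(([, slug, title]) => ({ slug, title, strips: [] }))

	// ...then fetch and parse each series' recent strips, as described above...
	return seriesList
}
```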

License

MIT

comicsrss.com's People

Contributors

artskydj, dependabot[bot], nnisarggada


comicsrss.com's Issues

Add feedly and rss buttons

<a href='http://cloud.feedly.com/#subscription%2Ffeed%2Fhttp%3A%2F%2Fwww.comicsrss.com%2Frss%2Fcalvinandhobbes.rss'  target='blank'><img id='feedlyFollow' src='http://s3.feedly.com/img/follows/feedly-follow-logo-black_2x.png' alt='follow us in feedly' width='28' height='28'></a>

A Feedly follow button like the one above (example button images omitted).

Also add an RSS button that copies the RSS link, with some sort of "Copied!" feedback text.

<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="128px" height="128px" id="RSSicon" viewBox="0 0 256 256">
<rect width="256" height="256" rx="50" ry="50" x="0" y="0" fill="#F49C52"/>
<circle cx="68" cy="189" r="24" fill="#FFF"/>
<path d="M160 213h-34a82 82 0 0 0 -82 -82v-34a116 116 0 0 1 116 116z" fill="#FFF"/>
<path d="M184 213A140 140 0 0 0 44 73 V 38a175 175 0 0 1 175 175z" fill="#FFF"/>
</svg>

Add feed preview icons

You might want to read a few comics instead of just blindly subscribing.

Add a button to the left of the rss button for going to the gocomics page for that comic.


One option is to use the gocomics logo, but I don't think that would be more clear. And there are probably IP issues with that. And if I ever support a site other than gocomics, I would have to use different icons, or something ugly like that. And I don't really like their logo. And I don't think it would work well at 24px square.


I'm thinking some sort of "eye" icon, like GitHub's notification "watching" icon (screenshot omitted).

The SVG is here:

<svg aria-hidden="true" class="octicon octicon-eye" height="16" version="1.1" viewBox="0 0 16 16" width="16">
	<path fill-rule="evenodd" d="M8.06 2C3 2 0 8 0 8s3 6 8.06 6C13 14 16 8 16 8s-3-6-7.94-6zM8 12c-2.2 0-4-1.78-4-4 0-2.2 1.8-4 4-4 2.22 0 4 1.8 4 4 0 2.22-1.78 4-4 4zm2-4c0 1.11-.89 2-2 2-1.11 0-2-.89-2-2 0-1.11.89-2 2-2 1.11 0 2 .89 2 2z"></path>
</svg>

but there might be IP issues...


Found a similar one on wikimedia:

<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24">
	<path d="M12 8c-5 0-11 6-11 6s6 6 11 6 11-6 11-6-6-6-11-6zm0 10c-2.2 0-4-1.8-4-4s1.8-4 4-4 4 1.8 4 4-1.8 4-4 4z"/>
	<circle cx="12" cy="14" r="2"/>
</svg>

I think this option is the best

Make the website decent

The website is just an auto-generated README.md.

Finding feeds

It's hard to find the feed you want from a list of 300+ links. Maybe there could be buttons like
A - B - C - D - E - F - G - H - I - J ... X - Y - Z that would send you to the right anchor.
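
As a sketch of what that could look like (not the site's actual code; it assumes each letter group in the list already has an anchor id like "c"):

```js
// Build an A–Z jump bar and put it at the top of the page
const nav = document.createElement('nav')
for (const letter of 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') {
	const link = document.createElement('a')
	link.href = '#' + letter.toLowerCase() // jumps to the anchor for that letter
	link.textContent = letter
	nav.appendChild(link)
	nav.appendChild(document.createTextNode(' '))
}
document.body.prepend(nav)
```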

Differ from README

The GitHub README should explain the project, not be an alternate place to get the feed links.

Links to me

The site should link to the GitHub repo and to my personal site.

Search

It would have to be JavaScript-based. Take each search word, and add the hidden class to any li element whose li.innerHTML.toLowerCase().indexOf(searchWord) === -1 (i.e. hide the items that don't match).
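
A minimal sketch of that filter (assuming a hidden CSS class that sets display: none; the selectors are guesses, not the site's actual markup):

```js
const searchInput = document.querySelector('input.search')

searchInput.addEventListener('input', () => {
	const words = searchInput.value.toLowerCase().split(/\s+/).filter(Boolean)

	for (const li of document.querySelectorAll('ul li')) {
		const text = li.textContent.toLowerCase()
		// Hide the item unless every search word appears somewhere in it
		const matches = words.every(word => text.indexOf(word) !== -1)
		li.classList.toggle('hidden', !matches)
	}
})
```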

Responsive

More buzzwords, and stuff

Buy a domain for this

This will also be a backwards-incompatible change. Maybe I can add a note to the end of the RSS feeds saying that I will be changing URLs.

Generate site from my VPS

This is surprisingly annoying to accomplish because of pushing to GitHub. I need a way to authenticate the VPS; generating a new SSH key is probably the easiest and best option.

The other part is getting a cron job working.

While setting this up, write a script so that if I ever move to another VPS, I can just run the script again.

Tests

This will take some work, since I didn't build this as well as I should've.

I want both unit tests and integration tests.

  • unit tests
    • each file in _generator, except index.js and generate....bat
  • integration tests:
    • Test against a locally-hosted version of the site with fewer files. Just delete most of sitemap.xml.
    • Assert that the output feed files match.
  • both:
    • Need to take a hostname upon initialization; certain places just assume www.gocomics.com.

gocomics.com is blocking requests

I accessed gocomics.com from my computer, then ran this job, and afterwards I was blocked from gocomics.com. A few hours later, I was no longer blocked. My guess is that they have rate limiting in place, or they are checking the User-Agent header.

To Do:

  1. Find a way around it
  2. Try to hit their servers less
    a. Maybe just keep the last 3 days comics instead of 5?
    b. Limit the request rate to avoid excessive traffic? (See the throttling sketch below.)
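
A sketch of what request throttling could look like (an assumption about one approach, not the repo's current behavior; assumes Node 18+ for the global fetch):

```js
const delay = ms => new Promise(resolve => setTimeout(resolve, ms))

async function fetchPolitely(urls, msBetweenRequests = 2000) {
	const pages = []
	for (const url of urls) {
		pages.push(await fetch(url).then(res => res.text())) // one request at a time
		await delay(msBetweenRequests) // then pause before the next one
	}
	return pages
}
```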

Fri, 26 May 2017 02:16:00 -0400

Auto packing the repository for optimum performance. You may also
run "git gc" manually. See "git help gc" for more information.
events.js:141
throw er; // Unhandled 'error' event
^

Error: socket hang up
at createHangUpError (_http_client.js:200:15)
at Socket.socketOnEnd (_http_client.js:292:23)
at emitNone (events.js:72:20)
at Socket.emit (events.js:166:7)
at endReadableNT (_stream_readable.js:913:12)
at nextTickCallbackWith2Args (node.js:442:9)
at process._tickCallback (node.js:356:17)

Don't throw when unable to parse comicImageUrl

A common issue is that gocomics.com does not have a comicImageUrl right away. This causes the generator to throw an error, which causes me to get an email about it, which causes a github issue to be opened. Of the 8 issues opened with the cron job, only 2 so far (#14, #19) have been unrelated to parsing the comicImageUrl.

  • Perhaps the feed just shouldn't be generated that time?
  • Maybe if the feed doesn't exist yet, then the issue can be ignored?
  • Maybe if it happens to fewer than 1 in 10 then it is ok?

Most likely it isn't actually the comicImageUrl, as it is just an invalid page. Note that comicImageUrl comes first in the validation: get-comic-pages.js, line 42.

Related #20, #18, #17, #16, #15, #13, #12 .
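
A sketch of the "skip it this run" option above (hypothetical names; parseComicImageUrl and the data shape are not the repository's actual code):

```js
function buildStripOrSkip(series, pageHtml) {
	try {
		return { series: series.slug, comicImageUrl: parseComicImageUrl(pageHtml) }
	} catch (err) {
		// gocomics.com sometimes has no comicImageUrl yet: log it and move on,
		// instead of failing the whole cron run and opening a GitHub issue
		console.warn('Skipping ' + series.slug + ' this run: ' + err.message)
		return null
	}
}
```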

Better search

  • An ampersand (&) and the word "and" should be treated as equivalent in a search.
  • If someone searches for "by", it should not match every single comic, like it does now.
  • Searching "en espanol" should pull up comics tagged "en Español".
  • The search should split the query into words and look up each word separately (see the sketch below).
    E.g. "Cow Mark" should get Cow and Boy Classics by Mark Leiknes and Lucky Cow by Mark Pett.
  • Partial words should continue to be OK if they're the beginning of a word.
    E.g. "Cow Ch" should get 2 Cows and a Chicken by Steve Skelton and CowTown by Charlie Podrebarac.
  • If you type in 4 search words and no comics match all 4, but some match 3 of them, pull those up?
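
A sketch of the word-splitting and accent-folding parts (an idea only, not the site's current search code):

```js
function matchesSearch(entryText, query) {
	const normalize = str => str
		.toLowerCase()
		.normalize('NFD').replace(/[\u0300-\u036f]/g, '') // "Español" -> "espanol"
		.replace(/&/g, ' and ')                           // treat "&" and "and" as equivalent
	const escape = word => word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

	const entry = normalize(entryText)
	// Every query word must match the beginning of some word in the entry
	return normalize(query).split(/\s+/).filter(Boolean)
		.every(word => new RegExp('\\b' + escape(word)).test(entry))
}

// e.g. matchesSearch('Cow and Boy Classics by Mark Leiknes', 'cow mark') === true
```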

Cron Daemon Thu, 15 Jun 2017 03:52:40 -0400

Thu, 15 Jun 2017 03:52:40 -0400
Auto packing the repository for optimum performance. You may also
run "git gc" manually. See "git help gc" for more information.
comic no longer exists

Add install script

Add an install script... I think it would look something like this:

(Should I explain the git setup in this file?)

cd ~
git clone git@github.com:ArtskydJ/comicsrss.com.git
crontab -l > ./crontab.txt
echo "[email protected]" >> ./crontab.txt
echo "# Runs at 1:15 CDT. It would work at 12:15, but I don't want to" >> ./crontab.txt
echo "# have to change it for DST. Not sure if I would have to or not..." >> ./crontab.txt
echo "15 2 * * * sh /root/comicsrss.com/_generator/generate-and-push.sh" >> ./crontab.txt
crontab ./crontab.txt

(Should I have quotes around the email address? Do I need the sh command, or can I just call the .sh file?)

Move rss files to a subdirectory

To do this without breaking everyone's feeds, I would need to have 301 redirects. Not sure if there's a way to do that with gh-pages.

Make the colors consistent

There are multiple grays on the page.

| selector | part | color | css rule |
| --- | --- | --- | --- |
| input.search | underline | #dddddd | border-bottom: 1px solid #ddd |
| input.search:focus | underline | #888888 | border-bottom: 1px solid #888 |
| input.search | placeholder | #757575 | Not sure if there is a css rule attached. |
| .icon-link | icon | #808080 | opacity: 0.5; bg-color: #fff; color: #000 |
| ul | underline | #808080 | border-top: 1px solid gray |
| li | underline | #808080 | border-bottom: 1px solid gray |

Make it consistent.

Fri, 9 Jun 2017 02:15:08 -0400

events.js:141
throw er; // Unhandled 'error' event
^

Error: read ECONNRESET
at exports._errnoException (util.js:870:11)
at TCP.onread (net.js:552:26)
