tasdikrahman / xkcd-dl

Download ALL xkcd comics uploaded to date. Ever!

Home Page: http://tasdikrahman.me/xkcd_dl

License: MIT License

Python 99.09% Makefile 0.91%
comics xkcd python

xkcd-dl's Introduction

Download each and every xkcd comic uploaded! Like ever!

Author

Tasdik Rahman

Features

  • Can download all the xkcd comics uploaded to date (1603 as I am writing this!).
  • Download individual xkcd comics and store them
  • Download ranges of xkcd comics and store them
  • Download the latest xkcd
  • Download the meta text inside each xkcd and store it
  • No duplicates in your XKCD database.
  • Stores each xkcd in a separate file, named after the title of the xkcd, in your home directory
  • Writes a description.txt for each xkcd, storing metadata like
    • date published
    • url value
    • a small description of that xkcd
    • the alt text on the comic
  • Written in uncomplicated Python.

Demo

[Image: usage demo]

Each comic is stored in its own folder with a description.txt placed in it. The file contains metadata like:

  • img-link
  • title
  • date-published
  • alt

Here's a little example for the same

[Image: xkcd_archive structure]
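The description.txt metadata described above could be read back into a dictionary with something like the following. This is only a sketch: it assumes the file holds one `key: value` pair per line, in the same shape as the metadata the tool prints to stdout, and the helper name `parse_description` is hypothetical rather than part of xkcd-dl.

```python
def parse_description(text):
    """Parse 'key: value' lines into a dict (assumed file layout)."""
    metadata = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    return metadata


# Sample content in the assumed layout.
sample = (
    "title : Pillar\n"
    "date-publised: 2006-1-1\n"
    "url: http://xkcd.com/32/\n"
    "alt: A comic by my brother Doug, redrawn and rewritten by me\n"
)
info = parse_description(sample)
print(info["title"], info["url"])
```

Splitting on the first colon only matters here because the url value itself contains colons.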

Usage

When running for the first time, run xkcd-dl --update-db

$ xkcd-dl --update-db
XKCD link database updated
Stored it in 'xkcd_dict.json'. You can start downloading your XKCD's!
Run 'xkcd-dl --help' for more options
$

--help

$ xkcd-dl --help
usage: xkcd-dl [-h] [-u] [-l] [-d XKCD_NUM | -a]
               [-r [DOWNLOAD_RANGE [DOWNLOAD_RANGE ...]]] [-v] [-P PATH]
               [-s XKCD_NUM]

Run `xkcd-dl --update-db` if running for the first time.

optional arguments:
  -h, --help            show this help message and exit
  -u, --update-db       Update the database
  -l, --download-latest
                        Download most recent comic
  -d XKCD_NUM, --download XKCD_NUM
                        Download specified comic by number
  -a, --download-all    Download all comics
  -r [DOWNLOAD_RANGE [DOWNLOAD_RANGE ...]], --download-range [DOWNLOAD_RANGE [DOWNLOAD_RANGE ...]]
                        Download specified range
  -v, --version         show program's version number and exit
  -P PATH, --path PATH  set path
  -s XKCD_NUM, --show XKCD_NUM
                        Show specified comic by number

--download-latest

This downloads the most recently uploaded xkcd comic and stores it under the user's home directory with a brief description.

$ xkcd-dl --download-latest
Downloading xkcd from 'http://imgs.xkcd.com/comics/flashlights.png' and storing it under '/home/tasdik/xkcd_archive/1603'
$

If the comic has already been downloaded, this does nothing.

This command will work even if you have not run --update-db yet.

--download=XKCDNUMBER

Downloads the particular XKCDNUMBER (given that it exists and has not been downloaded already) and stores it in the home directory.

$ xkcd-dl --download=143
Downloading xkcd from 'http://xkcd.com/143/' and storing it under '/home/tasdik/xkcd_archive/143'
$ xkcd-dl --download=1603
Downloading xkcd from 'http://xkcd.com/1603/' and storing it under '/home/tasdik/xkcd_archive/1603'
xkcd  number '1603' has already been downloaded!
$
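The "has already been downloaded" behaviour can be pictured as a check for an existing per-comic folder. The sketch below assumes one folder per comic number inside the archive directory, as the paths in the output above suggest; `needs_download` is a hypothetical helper, not a function from xkcd-dl.

```python
import os
import tempfile


def needs_download(archive_dir, number):
    """Return True when no folder for this comic number exists yet."""
    return not os.path.isdir(os.path.join(archive_dir, str(number)))


# Use a throwaway directory in place of ~/xkcd_archive.
with tempfile.TemporaryDirectory() as archive:
    os.makedirs(os.path.join(archive, "1603"))  # pretend 1603 was fetched
    fetch_1603 = needs_download(archive, 1603)
    fetch_143 = needs_download(archive, 143)

print(fetch_1603, fetch_143)
```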

--download-range <START> <END>

Takes two numbers and downloads all the xkcd comics between them, inclusive.

$ xkcd-dl --download-range 32 36
Downloading xkcd from 'http://xkcd.com/32/' and storing it under '/home/tasdik/xkcd_archive/32'
Downloading xkcd from 'http://xkcd.com/33/' and storing it under '/home/tasdik/xkcd_archive/33'
Downloading xkcd from 'http://xkcd.com/34/' and storing it under '/home/tasdik/xkcd_archive/34'
Downloading xkcd from 'http://xkcd.com/35/' and storing it under '/home/tasdik/xkcd_archive/35'
Downloading xkcd from 'http://xkcd.com/36/' and storing it under '/home/tasdik/xkcd_archive/36'
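The inclusive range shown above can be expanded as in this small sketch. `comic_numbers` is a hypothetical helper, and swapping reversed bounds is an assumption added for illustration, not documented behaviour of the tool.

```python
def comic_numbers(start, end):
    """Return every comic number from start to end, inclusive."""
    if start > end:
        start, end = end, start  # assumption: tolerate reversed bounds
    # range() excludes its stop value, hence end + 1.
    return list(range(start, end + 1))


print(comic_numbers(32, 36))
```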

--download-all

As the name suggests, this downloads all the xkcd comics uploaded to date and stores them under the user's home directory.

$ xkcd-dl --download-all
Downloading all xkcd's Till date!!
Downloading xkcd from 'http://xkcd.com/1466' and storing it under '/home/tasdik/xkcd_archive/1466'
Downloading xkcd from 'http://xkcd.com/381' and storing it under '/home/tasdik/xkcd_archive/381'
Downloading xkcd from 'http://xkcd.com/198' and storing it under '/home/tasdik/xkcd_archive/198'
Downloading xkcd from 'http://xkcd.com/512' and storing it under '/home/tasdik/xkcd_archive/512'
Downloading xkcd from 'http://xkcd.com/842' and storing it under '/home/tasdik/xkcd_archive/842'
Downloading xkcd from 'http://xkcd.com/920' and storing it under '/home/tasdik/xkcd_archive/920'
....
....

--path=PATH

To use a custom directory to store your xkcd_archive, you can append --path=./any/path/here to the end of any download method. Absolute and relative paths work, but the directory must already exist.

$ xkcd-dl --download=3 --path=comic
Downloading xkcd from 'http://xkcd.com/3/' and storing it under '/home/tasdik/comic/xkcd_archive/3'
$ xkcd-dl --download-range 54 56 --path=/home/tasdik/xkcd
Downloading xkcd from 'http://xkcd.com/54/' and storing it under '/home/tasdik/xkcd/xkcd_archive/54'
Downloading xkcd from 'http://xkcd.com/55/' and storing it under '/home/tasdik/xkcd/xkcd_archive/55'
Downloading xkcd from 'http://xkcd.com/56/' and storing it under '/home/tasdik/xkcd/xkcd_archive/56'
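Judging by the output above, the archive ends up in an xkcd_archive folder beneath the given path, and the path has to exist beforehand. A rough sketch of that resolution step follows; the function name `archive_dir_for` and the choice of raising ValueError are assumptions for illustration only.

```python
import os
import tempfile


def archive_dir_for(path):
    """Resolve a --path value to the xkcd_archive folder beneath it."""
    # Accept both relative and absolute paths.
    base = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(base):
        # Assumed error handling: the directory must already exist.
        raise ValueError("directory does not exist: " + base)
    return os.path.join(base, "xkcd_archive")


with tempfile.TemporaryDirectory() as existing:
    resolved = archive_dir_for(existing)

print(os.path.basename(resolved))
```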

--show XKCD_NUM

Opens the specified comic. Downloads it, if not downloaded already. Prints the alt text and metadata to stdout.

$ xkcd-dl --show 32
Downloading xkcd from 'http://xkcd.com/32/' and storing it under '/home/bk/Documents/xkcd-dl/xkcd_dl/xkcd_archive/32'
title : Pillar
date-publised: 2006-1-1
url: http://xkcd.com/32/
alt: A comic by my brother Doug, redrawn and rewritten by me
 
$ xkcd-dl -s 1000
Downloading xkcd from 'http://xkcd.com/1000/' and storing it under '/home/bk/Documents/xkcd-dl/xkcd_dl/xkcd_archive/1000'
xkcd  number '1000' has already been downloaded!
title : 1000 Comics
date-publised: 2012-1-6
url: http://xkcd.com/1000/
alt: Thank you for making me feel less alone.

Installation

Option 1: installing through pip (Suggested way)

pypi package link

$ pip3 install xkcd-dl

If you are behind a proxy

$ pip3 --proxy [username:password@]domain_name:port install xkcd-dl

Note: If you get command not found then $ sudo apt-get install python3-pip should fix that

Option 2: installing from source

$ git clone https://github.com/tasdikrahman/xkcd-dl.git
$ cd xkcd-dl/
$ pip3 install -r requirements.txt
$ python3 setup.py install

Upgrading

$ pip3 install -U xkcd-dl

Uninstalling

$ pip3 uninstall xkcd-dl

For Arch distributions

Here is the AUR link for you

Contributing

I hacked this up in one night, so it's a little messy up there. Feel free to contribute.

  1. Fork it.
  2. Create your feature branch (git checkout -b my-new-awesome-feature)
  3. Commit your changes (git commit -am 'Added <xyz> feature')
  4. Push to the branch (git push origin my-new-awesome-feature)
  5. Create new Pull Request

Contributors

Big shout out to

  • Ian C for fixing issue #2, which stopped the download if the title of a comic had a special character in it, and BlitzKraft for pointing it out.
  • BlitzKraft for adding the feature to download the alt-text from the xkcd and major clean ups!
  • Braden Best for pointing out the issues when installing from source, apart from his valuable input.

To-do

  • [x] add xkcd-dl --download-latest
  • [x] add xkcd-dl --download=XKCDNUMBER
  • [x] add xkcd-dl --download-all
  • [x] add xkcd-dl --download-range <START> <END>
  • [x] add path setting with [--path=/path/to/directory] option
  • [x] add exclude list to easily recognize and ignore dynamic comics i.e. comics without a default image.
  • [x] Remove redundant code in download_xkcd_number(), download_latest() and download_all() (Refactoring!!)
  • [x] Adding support to open a particular xkcd at the CLI itself. Implemented using xdg-open. Opens using your default image viewer.
  • [x] Add tests

Known Issues

  • There have been issues when installed from source if you are using python 2.* as discussed in #5. So using python3.* is suggested.
  • If you get command not found when installing, it may mean that you don't have pip3 installed. $ sudo apt-get install python3-pip should fix that. To check your version of pip:

$ pip3 --version
pip 1.5.6 from /usr/lib/python3/dist-packages (python 3.4)
$

  • Dynamic comics have to be added manually using the excludeList

Bugs

Please report the bugs at the issue tracker

OR

You can tweet me at @tasdikrahman if you can't get it to work. In fact, you should tweet me anyway.

Changelog

  • 0.1.2:

    bug: fixed relative import error in setup.py
    added support for gif files when renaming downloaded image (#38)

Motivation

xkcd-dl is inspired by an awesome package called youtube-dl written by Daniel Bolton (Much respect!)

How about you get to download all of the xkcd which have been uploaded till date? This does just that!

Now I don't know about you, but I just love reading xkcd's! Had a boring Sunday night looming over, thought why not create something like youtube-dl but for downloading xkcd's!

And hence xkcd-dl

Cheers to a crazy night!

Legal stuff

Built with ♥ by Tasdik Rahman (@tasdikrahman) and others. Released under the MIT License.

You can find a copy of the License at http://prodicus.mit-license.org/

Donation

If you have found my little bits of software of any use to you, you can help me pay my internet bills :)

  • PayPal
  • Instamojo
  • Gratipay
  • Patreon

xkcd-dl's People

Contributors

antonc42, blitzkraft, bradenbest, eternalfool, ianleeclark, kickball, lethargilistic, prodicus, rahulhp, taranjeet, tasdikrahman

xkcd-dl's Issues

Installed, but after that nothing works.

Ubuntu 20.04,
Python version 3.8.5
Installing message (pip3 install xkcd-dl):

Collecting xkcd-dl
  Downloading xkcd-dl-0.1.2.tar.gz (13 kB)
Collecting beautifulsoup4==4.4.1
  Downloading beautifulsoup4-4.4.1-py3-none-any.whl (81 kB)
     |████████████████████████████████| 81 kB 807 kB/s
Collecting python-magic==0.4.10
  Downloading python-magic-0.4.10.tar.gz (4.0 kB)
Collecting requests==2.8.1
  Downloading requests-2.8.1-py2.py3-none-any.whl (497 kB)
     |████████████████████████████████| 497 kB 2.3 MB/s
Building wheels for collected packages: xkcd-dl, python-magic
  Building wheel for xkcd-dl (setup.py) ... done
  Created wheel for xkcd-dl: filename=xkcd_dl-0.1.2-py3-none-any.whl size=9474 sha256=467c59d454051e43fd8a9a40dfd81030d1493b623abe3d7ffba3b835714da7d9
  Stored in directory: /home/bawse69/.cache/pip/wheels/5c/04/03/a4b0774b2d78efb7ecbc883568913b333c52d6d47f15d883e9
  Building wheel for python-magic (setup.py) ... done
  Created wheel for python-magic: filename=python_magic-0.4.10-py3-none-any.whl size=4214 sha256=d666fcb216df708acce1446a88a2f271508c1abab1c45692d439b775fc8bf654
  Stored in directory: /home/bawse69/.cache/pip/wheels/b4/4a/23/cc7d0113212a0146b392420e7968a909ca14377c0b3fd068a7
Successfully built xkcd-dl python-magic
Installing collected packages: beautifulsoup4, python-magic, requests, xkcd-dl
Successfully installed beautifulsoup4-4.4.1 python-magic-0.4.10 requests-2.8.1 xkcd-dl-0.1.2

When I do --update-db, message received is:

Traceback (most recent call last):
  File "/home/bawse69/.local/bin/xkcd-dl", line 5, in <module>
    from xkcd_dl.cli import main
  File "/home/bawse69/.local/lib/python3.8/site-packages/xkcd_dl/cli.py", line 21, in <module>
    from bs4 import BeautifulSoup as bs4
  File "/home/bawse69/.local/lib/python3.8/site-packages/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/home/bawse69/.local/lib/python3.8/site-packages/bs4/builder/__init__.py", line 314, in <module>
    from . import _html5lib
  File "/home/bawse69/.local/lib/python3.8/site-packages/bs4/builder/_html5lib.py", line 70, in <module>
    class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
AttributeError: module 'html5lib.treebuilders' has no attribute '_base'

Same traceback error is received when I do just xkcd-dl. Should I try python 2?

Adding tests to xkcd-dl

Here is my test plan.

  • All tests would run in /tmp/xkcd-dl-home/
  • update_dict() creates a new dict in the specified xkcd_dict_location. Check that the file is created and its size is > 0.
  • test_download_one(): would download a jpeg, png and gif xkcd comic and test that the specified folders are created with the images and description.
  • test_show_xkcd: show an xkcd comic that is not downloaded; show an xkcd comic that is already downloaded. Should this actually open a comic?
  • test_download_range(): download a certain range and check that the folders are created.
  • I did not get set_custom_path to work. What is its intended behaviour?
  • test_download_latest(): I think this might be a little redundant.

So do tell me if I’m missing some obvious things/or something else completely.
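To make the first item of the plan concrete, here is a sketch of a "file is created and size > 0" test using unittest and a temporary directory. `fake_update_dict` is a hypothetical stand-in written for this example; the real update_dict() and its signature are not shown here and may differ.

```python
import json
import os
import tempfile
import unittest


def fake_update_dict(xkcd_dict_location):
    """Hypothetical stand-in that writes a tiny link database."""
    with open(xkcd_dict_location, "w") as handle:
        json.dump({"32": "http://xkcd.com/32/"}, handle)


class UpdateDictTest(unittest.TestCase):
    def test_database_file_is_created_and_non_empty(self):
        # A temporary directory plays the role of /tmp/xkcd-dl-home/.
        with tempfile.TemporaryDirectory() as home:
            location = os.path.join(home, "xkcd_dict.json")
            fake_update_dict(location)
            self.assertTrue(os.path.isfile(location))
            self.assertGreater(os.path.getsize(location), 0)


# Run the suite programmatically so the script does not exit early.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(UpdateDictTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Using a fresh temporary directory per test keeps runs from seeing files left behind by earlier ones.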

Sanitizing titles in the comic

The title of xkcd #934, "Mac/PC", breaks the download process. Maybe replacing such characters with something safer (like an underscore) would solve it.

Except for that, I love this.
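A possible shape for the underscore replacement suggested above, as a sketch only. The set of characters treated as unsafe is an assumption (path separators plus characters that Windows rejects in file names), and `sanitize_title` is a hypothetical name, not the fix that was eventually merged.

```python
import re


def sanitize_title(title, replacement="_"):
    """Replace characters that are unsafe in file names."""
    # Assumed unsafe set: \ / : * ? " < > |
    return re.sub(r'[\\/:*?"<>|]', replacement, title).strip()


print(sanitize_title("Mac/PC"))
```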

Needs cleanup

As mentioned in #6, I tried to step into the code and immediately noticed 3 things:

  • The functions are absolutely huge and the code littered with comments.
  • The functions do way too much.
  • Multiple functions are repeating the same steps, making it impossible to change the download behavior without changing multiple functions.

The code needs some serious refactoring.

My suggestions are to:

  1. Start breaking the functions down into smaller functions
  2. Remove duplicate functionality (e.g. move all the code specific to downloading a page to download_page(xkcd_number), like the example referenced in the last comment in #6)
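The second suggestion might look roughly like this: every entry point delegates to one download_page helper, so download behaviour changes in a single place. The signatures below, including the injected `fetch` callable, are hypothetical and chosen so the sketch runs without network access.

```python
def download_page(xkcd_number, fetch):
    """Single place that knows how to download one comic."""
    return fetch("http://xkcd.com/{}/".format(xkcd_number))


def download_range(start, end, fetch):
    """Download an inclusive range by reusing download_page."""
    return [download_page(n, fetch) for n in range(start, end + 1)]


def download_latest(latest_number, fetch):
    """Download the newest comic by reusing download_page."""
    return download_page(latest_number, fetch)


# Stand-in fetcher that records URLs instead of touching the network.
requested = []


def record(url):
    requested.append(url)
    return url


download_range(32, 34, record)
download_latest(1603, record)
print(requested)
```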

Suggestions

So I used it, and have some suggestions to improve it.

1: instead of keeping the database in ./xkcd_dict.json, it should be in a dotfile (~/.xkcd_dict.json) to hide it in the home directory so it doesn't clutter it up and is in a consistent place. I believe python's method for that is os.getenv('HOME') or something. Also make it always look there (~/.xkcd_dict.json) so it can be run from anywhere.

2: add an option/config file where you can specify a save-directory, and if no directory is specified, save to the current directory. This is the functionality I expected, so I was thrown off when it saved to ~/xkcd_archive.

3: I like the tooltip comments on the site (when you hover over the image). It would be nice to have the downloader grab that as well.

If I get the time, I might try to add those in myself and do a PR. But I'm leaving this "issue" here in hopes that you can beat me to the punch.
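For the first suggestion, os.path.expanduser is one standard-library way to locate the home directory. A minimal sketch, where `default_db_path` is a hypothetical helper and the file name is the one proposed above:

```python
import os


def default_db_path():
    """Path of a hidden database file in the user's home directory."""
    # expanduser("~") resolves the home directory on the current platform.
    return os.path.join(os.path.expanduser("~"), ".xkcd_dict.json")


print(default_db_path())
```

Because the path no longer depends on the current working directory, the tool could then be run from anywhere.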

Does not work out of the box, if at all

$ python setup.py build
$ sudo python setup.py install
$ xkcd-dl
Traceback (most recent call last):
  File "/usr/local/bin/xkcd-dl", line 9, in <module>
    load_entry_point('xkcd-dl==0.0.5', 'console_scripts', 'xkcd-dl')()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 351, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2363, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2088, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/usr/local/lib/python2.7/dist-packages/xkcd_dl-0.0.5-py2.7.egg/bin/main.py", line 21, in <module>
    import urllib.request
ImportError: No module named request
$ sudo apt-get install python-urllib3 python-requests
Reading package lists... Done
Building dependency tree       
Reading state information... Done
python-requests is already the newest version.
python-urllib3 is already the newest version.

I don't believe this is how python is meant to function. When did something that was supposed to be simple and easy for rapid prototyping turn into this over-complicated setup.py egg nonsense, where even if you have a dependency properly installed, it still breaks because there's apparently no python equivalent to -I/path/to/h/file -L/path/to/library-binary? Couple this with the fact that there are both python 2 and 3 in active development with divided, separate communities, and you get a bunch of crap. This is why I stopped using Python. Sure, it's easy to program in, but it fails epically in the portability department. I have never had this much trouble with C.

Rant aside, this repo seems to be broken for the above reasons. Not to mention, upon inspecting urllib itself, there doesn't even seem to be a "request" object/module inside urllib, nor does there appear to be a urllib.request module of any fashion other than /usr/lib/python2.7/dist-packages/urllib3/request.py. Is that the package? Shouldn't the import line thus be "import urllib3.request"? What's going on here? How did you even manage to get this to function on your system? Cause I can't. The only way I can import urllib.request of any fashion is to go directly into urllib3's directory and type "import urllib3.request". That won't work anywhere else, and regular urllib doesn't have a request submodule, according to python. And symlinking the directory doesn't work, either.

This is so broken.

Opening the comic on the command line, options?

My go-to image viewer is feh, but my xdg-open default is eog. I looked at img2text but it seems unsuitable for stick figures.

This is not really helpful:
[Screenshot: 2016-04-11-064215_956x1076_scrot]

The comic in the picture is #1665, city talk pages.

Consider this issue an RFC on using feh as default and xdg-open if there is no feh.
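The proposed fallback order could be sketched with shutil.which, which returns the path of an executable on PATH or None. `pick_viewer` and its `which` parameter are hypothetical; the parameter exists only so the lookup can be simulated.

```python
import shutil


def pick_viewer(candidates=("feh", "xdg-open"), which=shutil.which):
    """Return the first candidate found on PATH, or None."""
    for name in candidates:
        if which(name):
            return name
    return None


def only_xdg(name):
    """Simulate a machine where only xdg-open is installed."""
    return "/usr/bin/xdg-open" if name == "xdg-open" else None


print(pick_viewer(which=only_xdg))
```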

404 in the xkcd_dl website

Don't know if this project is still up, but still just wanted to let you know that the website you provide shows a 404.

Here is a screenshot of that:
[Image: Screenshot (497)]

urllib.request issue

While fixing #2, I ran across an issue where if I install the script, I can't run xkcd-dl without getting a nag about request

xkcd-dl --download=934
Downloading xkcd from 'http://xkcd.com/934/' and storing it under '/home/tinyl_server/xkcd_archive/934'
Traceback (most recent call last):
  File "/usr/local/bin/xkcd-dl", line 9, in <module>
    load_entry_point('xkcd-dl==0.0.5', 'console_scripts', 'xkcd-dl')()
  File "/usr/local/lib/python2.7/dist-packages/xkcd_dl-0.0.5-py2.7.egg/bin/main.py", line 314, in main
    download_xkcd_number()
  File "/usr/local/lib/python2.7/dist-packages/xkcd_dl-0.0.5-py2.7.egg/bin/main.py", line 286, in download_xkcd_number
    urllib.request.urlretrieve(complete_img_url, file_name)
AttributeError: 'module' object has no attribute 'request'

I don't know if you're not getting this, but I am.

UNLESS, if I run python main.py --download=934 and I change import urllib.request to import urllib, then everything runs.

I want to submit a pull request for #2, but as it stands, I don't know if I'm going to be introducing more bugs into the program by doing so.
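For context, urllib.request only exists on Python 3; on Python 2 the same urlretrieve function is exposed directly on urllib, which matches the behaviour described above. One common way to support both is a guarded import, sketched here as an illustration rather than as the change the project adopted.

```python
try:
    # Python 3: urlretrieve lives in the urllib.request submodule.
    from urllib.request import urlretrieve
except ImportError:
    # Python 2: the same function is exposed directly on urllib.
    from urllib import urlretrieve

print(callable(urlretrieve))
```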

Crashes randomly

This tool randomly crashes with the error:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/bin/xkcd-dl", line 33, in <module>
    sys.exit(load_entry_point('xkcd-dl==0.1.2', 'console_scripts', 'xkcd-dl')())
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xkcd_dl/cli.py", line 271, in main
    download_xkcd_range(*args.download_range)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xkcd_dl/cli.py", line 64, in download_xkcd_range
    download_one(json_content, number)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xkcd_dl/cli.py", line 206, in download_one
    r = requests.get(complete_img_url, stream = True)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/sessions.py", line 454, in request
    prep = self.prepare_request(req)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/sessions.py", line 378, in prepare_request
    p.prepare(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/models.py", line 293, in prepare
    self.prepare_url(url, params)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/models.py", line 353, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'http:/2067/asset/challengers_footer.png': No schema supplied. Perhaps you meant http://http:/2067/asset/challengers_footer.png?
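The URL in the last line of the traceback has a scheme but no host, which is what requests rejects. A sketch of a guard that could be applied before calling requests.get, using only urllib.parse; `is_fetchable_url` is a hypothetical helper, and how the tool should react to a rejected URL (skip the comic, log it, and so on) is left open.

```python
from urllib.parse import urlparse


def is_fetchable_url(url):
    """True only for http(s) URLs that include a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_fetchable_url("http:/2067/asset/challengers_footer.png"))
print(is_fetchable_url("http://imgs.xkcd.com/comics/flashlights.png"))
```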
