
tasdikrahman / xkcd-dl


⏬ Download ALL the xkcd comics uploaded to date. Ever!

Home Page: http://tasdikrahman.me/xkcd_dl

License: MIT License

Python 99.09% Makefile 0.91%
comics xkcd python

xkcd-dl's Issues

urllib.request issue

While fixing #2, I ran across an issue: if I install the script, I can't run xkcd-dl without getting an error about urllib.request.

xkcd-dl --download=934
Downloading xkcd from 'http://xkcd.com/934/' and storing it under '/home/tinyl_server/xkcd_archive/934'
Traceback (most recent call last):
  File "/usr/local/bin/xkcd-dl", line 9, in <module>
    load_entry_point('xkcd-dl==0.0.5', 'console_scripts', 'xkcd-dl')()
  File "/usr/local/lib/python2.7/dist-packages/xkcd_dl-0.0.5-py2.7.egg/bin/main.py", line 314, in main
    download_xkcd_number()
  File "/usr/local/lib/python2.7/dist-packages/xkcd_dl-0.0.5-py2.7.egg/bin/main.py", line 286, in download_xkcd_number
    urllib.request.urlretrieve(complete_img_url, file_name)
AttributeError: 'module' object has no attribute 'request'

I don't know whether you're seeing this too, but I am.

It only runs if I use python main.py --download=934 directly and change import urllib.request to import urllib.

I want to submit a pull request for #2, but as it stands, I don't know if I'm going to be introducing more bugs into the program by doing so.
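A common way to make the download call work under both interpreters is a guarded import: urlretrieve lives in urllib.request on Python 3 and directly in urllib on Python 2. The sketch below assumes that is the only call affected; save_image is a hypothetical helper name, not a function from xkcd-dl.

```python
# Python 2/3 compatibility shim for urlretrieve (a sketch, not xkcd-dl's code).
try:
    # Python 3: urlretrieve lives in the urllib.request submodule.
    from urllib.request import urlretrieve
except ImportError:
    # Python 2: the same function is exposed directly on urllib.
    from urllib import urlretrieve


def save_image(complete_img_url, file_name):
    """Download complete_img_url to file_name and return the local path (hypothetical helper)."""
    path, _headers = urlretrieve(complete_img_url, file_name)
    return path
```

Trying the Python 3 location first means the fallback import only runs on Python 2.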

Suggestions

I've been using it and have some suggestions for improving it.

1: Instead of keeping the database in ./xkcd_dict.json, it should be a dotfile (~/.xkcd_dict.json) so it's hidden in the home directory, doesn't clutter it up, and is always in a consistent place. I believe Python's way to do that is os.path.expanduser('~') (or os.getenv('HOME'), without the $). It should also always look there (~/.xkcd_dict.json) so the script can be run from anywhere.

2: Add an option or config file where you can specify a save directory, and if no directory is specified, save to the current directory. That's the behaviour I expected, so I was thrown off when it saved to ~/xkcd_archive.

3: I like the tooltip comments on the site (the text shown when you hover over the image). It would be nice if the downloader grabbed those as well.
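A minimal sketch of the three suggestions, assuming the names dict_location, save_directory and title_text (all hypothetical, not xkcd-dl's API). For the tooltip, xkcd's JSON endpoint (https://xkcd.com/<n>/info.0.json) carries the hover text in its "alt" field, so the sketch reads it from an already-parsed response.

```python
import os

# Hypothetical names: xkcd-dl's real config handling may differ.
DICT_FILENAME = ".xkcd_dict.json"


def dict_location():
    """Return ~/.xkcd_dict.json, resolved the same way no matter where the script runs."""
    # os.path.expanduser resolves "~" portably; os.getenv would need the bare
    # name "HOME", not "$HOME".
    return os.path.join(os.path.expanduser("~"), DICT_FILENAME)


def save_directory(configured=None):
    """Use the configured directory if one is given, else the current working directory."""
    if configured:
        return os.path.abspath(os.path.expanduser(configured))
    return os.getcwd()


def title_text(comic_info):
    """Return the hover (title-text) caption from a parsed info.0.json dict."""
    # The xkcd JSON API stores the hover text under the "alt" key.
    return comic_info.get("alt", "")
```

Falling back to os.getcwd() matches the behaviour suggestion 2 expects when no directory is configured.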

If I get the time, I might try to add those in myself and do a PR. But I'm leaving this "issue" here in hopes that you can beat me to the punch.

Crashes randomly

This tool randomly crashes with the error:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/bin/xkcd-dl", line 33, in <module>
    sys.exit(load_entry_point('xkcd-dl==0.1.2', 'console_scripts', 'xkcd-dl')())
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xkcd_dl/cli.py", line 271, in main
    download_xkcd_range(*args.download_range)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xkcd_dl/cli.py", line 64, in download_xkcd_range
    download_one(json_content, number)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xkcd_dl/cli.py", line 206, in download_one
    r = requests.get(complete_img_url, stream = True)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/sessions.py", line 454, in request
    prep = self.prepare_request(req)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/sessions.py", line 378, in prepare_request
    p.prepare(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/models.py", line 293, in prepare
    self.prepare_url(url, params)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests/models.py", line 353, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'http:/2067/asset/challengers_footer.png': No schema supplied. Perhaps you meant http://http:/2067/asset/challengers_footer.png?
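The malformed URL 'http:/2067/asset/challengers_footer.png' suggests the image URL is built by prepending "http:" to the page's <img src>. That works for protocol-relative sources such as //imgs.xkcd.com/..., but not for a root-relative path like the one on #2067. A minimal sketch, assuming the page URL has the form https://xkcd.com/<number>/, resolves the source with urllib.parse.urljoin instead; resolve_img_url is a hypothetical name.

```python
from urllib.parse import urljoin


def resolve_img_url(comic_number, img_src):
    """Turn the <img src> found on a comic page into an absolute URL.

    urljoin handles protocol-relative sources ("//imgs.xkcd.com/..."),
    root-relative ones ("/2067/asset/..."), and already-absolute URLs.
    """
    page_url = "https://xkcd.com/{}/".format(comic_number)
    return urljoin(page_url, img_src)
```

For example, resolve_img_url(2067, "/2067/asset/challengers_footer.png") gives "https://xkcd.com/2067/asset/challengers_footer.png", a URL requests can fetch.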

404 in the xkcd_dl website

I don't know if this project is still maintained, but I wanted to let you know that the project website listed above shows a 404.

Here is a screenshot of that:
[Screenshot (497): the project website returning a 404]

Adding tests to xkcd-dl

Here is my test plan. All tests would run in /tmp/xkcd-dl-home/.

  • update_dict(): creates a new dict in the specified xkcd_dict_location. Check that the file is created and its size is > 0.
  • test_download_one(): download a JPEG, a PNG and a GIF xkcd comic and check that the specified folders are created with the images and descriptions.
  • test_show_xkcd: show a comic that has not been downloaded, and one that already has. Should this actually open a comic?
  • test_download_range(): download a given range and check that the folders are created.
  • set_custom_path: I did not get this to work. What is its intended behaviour?
  • test_download_latest(): I think this might be a little redundant.
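A minimal unittest sketch of the first item in the plan, with each test isolated in a fresh xkcd-dl-home directory under the system temp directory (/tmp on Linux). The update_dict below is a hypothetical stand-in that writes a placeholder dict; xkcd-dl's real function, and how it receives xkcd_dict_location, are assumptions to check against the code.

```python
import json
import os
import shutil
import tempfile
import unittest

TEST_HOME = os.path.join(tempfile.gettempdir(), "xkcd-dl-home")


def update_dict(xkcd_dict_location):
    """Hypothetical stand-in for xkcd-dl's update_dict(); replace with the real import."""
    with open(xkcd_dict_location, "w") as f:
        json.dump({"1": {}}, f)  # placeholder content, not the real dict format


class UpdateDictTest(unittest.TestCase):
    def setUp(self):
        # Every test starts from an empty xkcd-dl-home directory.
        shutil.rmtree(TEST_HOME, ignore_errors=True)
        os.makedirs(TEST_HOME)
        self.dict_location = os.path.join(TEST_HOME, "xkcd_dict.json")

    def tearDown(self):
        shutil.rmtree(TEST_HOME, ignore_errors=True)

    def test_update_dict_creates_nonempty_file(self):
        update_dict(self.dict_location)
        self.assertTrue(os.path.isfile(self.dict_location))
        self.assertGreater(os.path.getsize(self.dict_location), 0)
```

Run it with python -m unittest; the other items in the plan would follow the same setUp/tearDown pattern.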

Please tell me if I'm missing anything obvious, or something else entirely.

Does not work out of the box, if at all

$ python setup.py build
$ sudo python setup.py install
$ xkcd-dl
Traceback (most recent call last):
  File "/usr/local/bin/xkcd-dl", line 9, in <module>
    load_entry_point('xkcd-dl==0.0.5', 'console_scripts', 'xkcd-dl')()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 351, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2363, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2088, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/usr/local/lib/python2.7/dist-packages/xkcd_dl-0.0.5-py2.7.egg/bin/main.py", line 21, in <module>
    import urllib.request
ImportError: No module named request
$ sudo apt-get install python-urllib3 python-requests
Reading package lists... Done
Building dependency tree       
Reading state information... Done
python-requests is already the newest version.
python-urllib3 is already the newest version.

I don't believe this is how python is meant to function. When did something that was supposed to be simple and easy for rapid prototyping turn into this over-complicated setup.py egg nonsense, where even if you have a dependency properly installed, it still breaks because there's apparently no python equivalent to -I/path/to/h/file -L/path/to/library-binary? Couple this with the fact that there are both python 2 and 3 in active development with divided, separate communities, and you get a bunch of crap. This is why I stopped using Python. Sure, it's easy to program in, but it fails epically in the portability department. I have never had this much trouble with C.

Rant aside, this repo seems to be broken for the above reasons. Not to mention, upon inspecting urllib itself, there doesn't even seem to be a "request" object/module inside urllib, nor does there appear to be a urllib.request module of any fashion other than /usr/lib/python2.7/dist-packages/urllib3/request.py. Is that the package? Shouldn't the import line thus be "import urllib3.request"? What's going on here? How did you even manage to get this to function on your system? Cause I can't. The only way I can import urllib.request of any fashion is to go directly into urllib3's directory and type "import urllib3.request". That won't work anywhere else, and regular urllib doesn't have a request submodule, according to python. And symlinking the directory doesn't work, either.

This is so broken.
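For context: urllib.request is a Python 3 standard-library module (Python 2 split the same functionality across urllib and urllib2), and the traceback above comes from a python2.7 install, so that import cannot succeed there. One option is to fail early with a clear message. The sketch below is hypothetical (check_python_version and MIN_PYTHON are not from xkcd-dl, and the minimum version is an assumption) and would have to run before any import urllib.request; declaring python_requires in setup.py lets pip refuse the install on Python 2 instead.

```python
import sys

MIN_PYTHON = (3, 4)  # assumption: the oldest version the project wants to support


def check_python_version(version_info=sys.version_info):
    """Return an error message on an unsupported Python, else None."""
    if tuple(version_info[:2]) < MIN_PYTHON:
        # str.format rather than an f-string, so Python 2 can still parse this file.
        return "xkcd-dl needs Python {}.{}+ (found {}.{}); urllib.request is Python 3 only.".format(
            MIN_PYTHON[0], MIN_PYTHON[1], version_info[0], version_info[1])
    return None
```

At the top of the entry point this would be used as: message = check_python_version(); if message: sys.exit(message).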

Needs cleanup

As mentioned in #6, I tried to step into the code and immediately noticed 3 things:

  • The functions are absolutely huge and the code littered with comments.
  • The functions do way too much.
  • Multiple functions are repeating the same steps, making it impossible to change the download behavior without changing multiple functions.

The code needs some serious refactoring.

My suggestions are to:

  1. Start breaking the functions down into smaller functions
  2. Remove duplicate functionality (e.g. move all the code specific to downloading a page to download_page(xkcd_number), like the example referenced in the last comment in #6)
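One possible shape for suggestion 2, sketched under assumptions: comics are fetched through xkcd's JSON endpoint (https://xkcd.com/<n>/info.0.json), each comic gets its own folder, and the title text is written to a file named description.txt. Only download_page(xkcd_number) comes from the issue; info_url, fetch_bytes, comic_folder, download_range, the base_dir and fetch parameters, and the file layout are hypothetical.

```python
import json
import os
from urllib.parse import urljoin
from urllib.request import urlopen


def info_url(xkcd_number):
    """URL of xkcd's JSON metadata for one comic."""
    return "https://xkcd.com/{}/info.0.json".format(xkcd_number)


def fetch_bytes(url):
    """The only function that touches the network."""
    with urlopen(url) as response:
        return response.read()


def comic_folder(base_dir, xkcd_number):
    """One folder per comic, e.g. <base_dir>/934."""
    return os.path.join(base_dir, str(xkcd_number))


def download_page(xkcd_number, base_dir, fetch=fetch_bytes):
    """Download one comic's image and title text; every other command reuses this."""
    info = json.loads(fetch(info_url(xkcd_number)).decode("utf-8"))
    img_url = urljoin(info_url(xkcd_number), info["img"])
    folder = comic_folder(base_dir, xkcd_number)
    os.makedirs(folder, exist_ok=True)
    img_path = os.path.join(folder, os.path.basename(img_url))
    with open(img_path, "wb") as f:
        f.write(fetch(img_url))
    with open(os.path.join(folder, "description.txt"), "w") as f:
        f.write(info.get("alt", ""))
    return img_path


def download_range(start, end, base_dir, fetch=fetch_bytes):
    """A range download becomes a plain loop over download_page."""
    return [download_page(n, base_dir, fetch) for n in range(start, end + 1)]
```

Passing fetch in as a parameter keeps the network in one place, so changing the download behaviour touches one function and a test can substitute a fake that returns canned bytes.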

Installed, but after that nothing works.

Ubuntu 20.04, Python 3.8.5.
Install output (pip3 install xkcd-dl):

Collecting xkcd-dl
  Downloading xkcd-dl-0.1.2.tar.gz (13 kB)
Collecting beautifulsoup4==4.4.1
  Downloading beautifulsoup4-4.4.1-py3-none-any.whl (81 kB)
     |████████████████████████████████| 81 kB 807 kB/s
Collecting python-magic==0.4.10
  Downloading python-magic-0.4.10.tar.gz (4.0 kB)
Collecting requests==2.8.1
  Downloading requests-2.8.1-py2.py3-none-any.whl (497 kB)
     |████████████████████████████████| 497 kB 2.3 MB/s
Building wheels for collected packages: xkcd-dl, python-magic
  Building wheel for xkcd-dl (setup.py) ... done
  Created wheel for xkcd-dl: filename=xkcd_dl-0.1.2-py3-none-any.whl size=9474 sha256=467c59d454051e43fd8a9a40dfd81030d1493b623abe3d7ffba3b835714da7d9
  Stored in directory: /home/bawse69/.cache/pip/wheels/5c/04/03/a4b0774b2d78efb7ecbc883568913b333c52d6d47f15d883e9
  Building wheel for python-magic (setup.py) ... done
  Created wheel for python-magic: filename=python_magic-0.4.10-py3-none-any.whl size=4214 sha256=d666fcb216df708acce1446a88a2f271508c1abab1c45692d439b775fc8bf654
  Stored in directory: /home/bawse69/.cache/pip/wheels/b4/4a/23/cc7d0113212a0146b392420e7968a909ca14377c0b3fd068a7
Successfully built xkcd-dl python-magic
Installing collected packages: beautifulsoup4, python-magic, requests, xkcd-dl
Successfully installed beautifulsoup4-4.4.1 python-magic-0.4.10 requests-2.8.1 xkcd-dl-0.1.2

When I run xkcd-dl --update-db, I get:

Traceback (most recent call last):
  File "/home/bawse69/.local/bin/xkcd-dl", line 5, in <module>
    from xkcd_dl.cli import main
  File "/home/bawse69/.local/lib/python3.8/site-packages/xkcd_dl/cli.py", line 21, in <module>
    from bs4 import BeautifulSoup as bs4
  File "/home/bawse69/.local/lib/python3.8/site-packages/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/home/bawse69/.local/lib/python3.8/site-packages/bs4/builder/__init__.py", line 314, in <module>
    from . import _html5lib
  File "/home/bawse69/.local/lib/python3.8/site-packages/bs4/builder/_html5lib.py", line 70, in <module>
    class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
AttributeError: module 'html5lib.treebuilders' has no attribute '_base'

I get the same traceback when I run plain xkcd-dl. Should I try Python 2?

Sanitizing titles in the comic

The title of xkcd #934, "Mac/PC", breaks the download process because the slash is treated as a path separator. Maybe replacing such characters with something safer (like an underscore) would solve it.

Except for that, I love this.
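A minimal sketch of the underscore idea; safe_title is a hypothetical helper, and the character set it replaces (path separators, the characters Windows forbids in file names, and ASCII control characters) is an assumption about what is worth sanitizing.

```python
import re

# "/" and "\" (path separators), the other characters Windows rejects in
# file names, and ASCII control characters.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_title(title, replacement="_"):
    """Replace path-breaking characters so a comic title can be used as a folder name."""
    cleaned = _UNSAFE.sub(replacement, title).strip()
    # Avoid empty names and "." or "..", which refer to existing directories.
    return cleaned if cleaned not in ("", ".", "..") else replacement
```

With this, safe_title("Mac/PC") returns "Mac_PC", while titles without unsafe characters pass through unchanged.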

Opening the comic on the command line: options?

My go-to image viewer is feh, but my xdg-open default is eog. I looked at img2text, but it seems unsuitable for stick figures.

This is not really helpful:
[Screenshot: 2016-04-11-064215_956x1076_scrot]

The comic in the picture is #1665, city talk pages.

Consider this issue an RFC on using feh as default and xdg-open if there is no feh.
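A sketch of the proposed fallback, using shutil.which to probe PATH for feh before xdg-open; pick_viewer and open_comic are hypothetical names. The which parameter exists only so the selection logic can be tested without either program installed.

```python
import shutil
import subprocess

# Preference order proposed in this RFC: feh first, then the desktop default.
VIEWERS = ("feh", "xdg-open")


def pick_viewer(which=shutil.which):
    """Return the first available viewer command, or None if none is on PATH."""
    for viewer in VIEWERS:
        if which(viewer):
            return viewer
    return None


def open_comic(image_path, which=shutil.which):
    """Open image_path in feh if installed, otherwise hand it to xdg-open."""
    viewer = pick_viewer(which)
    if viewer is None:
        raise RuntimeError("neither feh nor xdg-open found on PATH")
    # Popen so the CLI returns immediately instead of waiting for the viewer to close.
    return subprocess.Popen([viewer, image_path])
```

Keeping the preference order in one tuple would also make it easy to expose as a config option later.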
