Podscrape

A Python module for crawling the iTunes Podcast directory.

The web pages are scraped to collect each podcast's iTunes url, which contains the podcast's id number. That id can be passed to Apple's Lookup API to get the podcast's actual feed url (Apple doesn't host any of these podcasts; it just puts up a fancy landing page for them on iTunes). Since I use Linux and don't use iTunes, I'd rather have the original feed urls.
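
As a rough illustration of the id-to-feed step described above, the following sketch pulls the id out of an iTunes podcast url and reads the feed url from a Lookup API response. It is not taken from the podscrape module itself: the function names and the sample url are hypothetical, and it assumes the Lookup API's JSON response carries a `results` list whose entries have a `feedUrl` field. It uses only the standard library to stay self-contained, although the project itself depends on requests.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

LOOKUP_ENDPOINT = "https://itunes.apple.com/lookup"

# iTunes podcast urls contain a path segment like "id123456789"
_ID_PATTERN = re.compile(r"/id(\d+)")


def extract_podcast_id(itunes_url):
    """Return the numeric id embedded in an iTunes podcast url, or None."""
    match = _ID_PATTERN.search(itunes_url)
    return match.group(1) if match else None


def build_lookup_url(podcast_id):
    """Build the Lookup API url for a single podcast id."""
    return LOOKUP_ENDPOINT + "?" + urlencode({"id": podcast_id})


def feed_url_from_lookup(payload):
    """Pull the feed url out of a decoded Lookup API response (assumed shape)."""
    for result in payload.get("results", []):
        if "feedUrl" in result:
            return result["feedUrl"]
    return None


def fetch_feed_url(itunes_url, timeout=10):
    """Network call: resolve an iTunes podcast url to its original feed url."""
    podcast_id = extract_podcast_id(itunes_url)
    if podcast_id is None:
        return None
    with urlopen(build_lookup_url(podcast_id), timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return feed_url_from_lookup(payload)
```

Only `fetch_feed_url` touches the network; the other helpers are pure functions, so they can be unit tested without hitting Apple's servers.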

A sample script is included in the root directory of the project. It starts from near the end of the list. If you want to run it from the beginning, the very first url is in a comment in the script. Keep Control+C handy; it will probably run for a while if you let it. I haven't tested what kind of rate limiting Apple does, or whether they have cutoffs, so consider that before running from the beginning.
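
Since Apple's rate limits are unknown, one cautious approach is to pause between requests and let Control+C end the run without losing what was already collected. The sketch below shows that idea only; it is not the included sample script, and `crawl`, `fetch`, and `delay_seconds` are hypothetical names chosen for illustration.

```python
import time


def crawl(urls, fetch, delay_seconds=1.0):
    """Fetch each url in turn, pausing between requests.

    `fetch` is any callable taking a url and returning a result (hypothetical;
    the sample script's real fetching logic is not shown here). Returns the
    (url, result) pairs collected before the run finished or was interrupted.
    """
    collected = []
    try:
        for index, url in enumerate(urls):
            if index > 0:
                time.sleep(delay_seconds)  # be gentle; Apple's limits are unknown
            collected.append((url, fetch(url)))
    except KeyboardInterrupt:
        # Control+C stops the crawl but keeps what was gathered so far
        pass
    return collected
```

Passing the fetch function in as an argument keeps the pacing logic separate from the scraping logic, so the delay can be tuned without touching the parser.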

Running the Unit Tests:

# Once nose is installed
$ nosetests

Dependencies:

  • beautifulsoup4: for parsing and extracting content from web pages
  • nose: for testing (though some of the tests are usable with unittest)
  • requests: for making http requests

Set up the Python environment

Install Virtualenv

Always use a virtualenv when you install Python package requirements. It saves a ton of pain with library versioning issues and prevents clutter in the system-wide package directory. It also lets you install Python packages through its own internal pip, which doesn't require superuser permissions.

If pip is neither installed nor available in your package manager's repositories, install distribute first. I use distribute instead of setuptools because distribute is more forward compatible with Python 3.

Install Distribute:

$ curl -O http://python-distribute.org/distribute_setup.py
$ python distribute_setup.py

Install Pip:

$ curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
$ [sudo] python get-pip.py

Install Virtualenv:

$ [sudo] pip install virtualenv

Set up the Virtualenv

Make a virtualenv for us to run in:

# This can go wherever, e.g. /usr/local/
# Creates a new folder './podscrape-env' to hold the environment
# If you're using setuptools instead of distribute, leave out the --distribute flag
$ cd /usr/local
$ virtualenv podscrape-env --distribute --no-site-packages

Activate the virtualenv:

$ source /usr/local/podscrape-env/bin/activate
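
To confirm the activation took effect before installing anything, a small check like the following can be run with the environment's Python. This is a sketch rather than part of the project; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter appears to be running inside a virtualenv."""
    # Older virtualenv releases set sys.real_prefix; the venv module and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("prefix:", sys.prefix)
```

If the environment is active, the printed prefix should point at the podscrape-env folder rather than the system Python.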

Install the requirements from the project:

$ cd /path/to/the/code
$ pip install -r requirements.txt
