Coder Social home page Coder Social logo

zembrodt / pymdb Goto Github PK

View Code? Open in Web Editor NEW
6.0 2.0 0.0 386 KB

Python package to both parse datsets provided by IMDb and scrape information from imdb.com

Home Page: https://pymdb.com

License: MIT License

Python 100.00%
imdb imdb-dataset imdb-api imdb-movies movies movies-api movie-database moviedb-api tvdb pymdb

pymdb's Introduction

PyMDb

PyPI Python Versions License Build Status

PyMDb is a package for both parsing the datasets provided by IMDb and scraping information from their web pages.

This package is able to gather information on people, titles, and companies provided by IMDb and is split into two separate modules: one for parsing the IMDb datasets, and one for scraping webpages on imdb.com.

Installation

The latest release of PyMDb can be installed from PyPI with:

pip install py-mdb

If downloading the source from GitHub, PyMDb requires the following packages:

Usage

>>> import pymdb
>>> from collections import defaultdict
>>>
>>> parser = pymdb.PyMDbParser(gunzip_files=True)
>>> genre_count = defaultdict(int)
>>> for title in parser.get_title_basics("path/to/files"):
...     for genre in title.genres:
...             genre_count[genre] += 1
...
>>> for genre in genre_count:
...     print(f"{genre}: {genre_count[genre]}")
...
Documentary: 600184
Short: 837912
Animation: 312227
    ...
Talk-Show: 584252
Reality-TV: 307037
Adult: 178493
>>>
>>> scraper = pymdb.PyMDbScraper(rate_limit=500)
>>> title = scraper.get_title("tt0076759")
>>> print(f"{title.display_title} came out in {title.release_date.year}!")
Star Wars: Episode IV - A New Hope came out in 1977!

Documentation

Full documentation can be found at the PyMDb Read the Docs page.

Disclaimer

PyMDb is still in a pre-release state and has only been tested with a small amount of data found on imdb.com. The web scraper portion of the code does have a rate limiter value you can customize, please be kind to IMDb. If any bugs or issues are found, please do not hesitate to create an issue or make a pull request on GitHub. Suggestions for features to be added to PyMDb in future releases are also welcome!

License

This project is licensed under the MIT License. Please see the LICENSE file for details.

pymdb's People

Contributors

zembrodt avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

pymdb's Issues

Import issues within the pymdb package

Stack trace:

File "C:\Users\Administrator\Documents\python_projects\django_test\update\view
s.py", line 5, in
from pymdb import PyMDbScraper
File "C:\Python\Python37\lib\site-packages\pymdb_init_.py", line 1, in
from .parser import PyMDbParser
File "C:\Python\Python37\lib\site-packages\pymdb\parser_init_.py", line 1,
in
from .pymdb_parser import PyMDbParser
File "C:\Python\Python37\lib\site-packages\pymdb\parser\pymdb_parser.py", line
7, in
from pymdb.models.name import (
File "C:\Python\Python37\lib\site-packages\pymdb\models_init_.py", line 1,
in
from .company import *
File "C:\Python\Python37\lib\site-packages\pymdb\models\company.py", line 6, i
n
from pymdb.utils import is_int
ImportError: cannot import name 'is_int' from 'pymdb.utils' (C:\Python\Python37
lib\site-packages\pymdb\utils_init_.py)

Add IMDb Search Requests

Add the ability to search IMDb with keywords.

IMDb's search works by sending a GET request to the server (for example: https://v2.sg.media-imdb.com/suggestion/r/robert.json) and receives a JSON response with the results to populate.

Use this JSON response to create a SearchResult Python object containing the ID, and potentially other details, of each result. May need to exclude certain results such as videos or trailers.

Results we care about should only be titles (tt) and people (nm).

Map credit job categories into key values

Map a credit job type, such as "Series Directed by" into an actual key such as "director". This will help align with the already hard coded "actor" category in get_full_cast.

Modify Generator Methods to Also Return Dictionaries

Modify certain generator methods, potentially for both PyMDbParser and PyMDbScraper, that return a dictionary of objects rather than a generator. Example would be get_full_credits returning a dictionary where each key is a unique job title and the values are an array of Credit objects.

get_title returns null display_title

Occasionally the get_title method returns a Title with a null display_title within unit tests, but other times will return the correct title.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.