cinemagoer / cinemagoer

Cinemagoer is a Python package useful to retrieve and manage the data of the IMDb (to which we are not affiliated in any way) movie database about movies, people, characters and companies

Home Page: https://cinemagoer.github.io/

License: GNU General Public License v2.0

Python 99.86% Makefile 0.14%
imdb movies actors cinema movie-database python database sql cast internet-movie-database

cinemagoer's Introduction

Cinemagoer (previously known as IMDbPY) is a Python package for retrieving and managing the data of the IMDb movie database about movies, people and companies.

This project and its authors are not affiliated in any way to Internet Movie Database Inc.; see the DISCLAIMER.txt file for details about data licenses.

Revamp notice

Starting in November 2017, many things were improved and simplified:

  • moved the package to Python 3 (compatible with Python 2.7)
  • removed dependencies: SQLObject, C compiler, BeautifulSoup
  • removed the "mobile" and "httpThin" parsers
  • introduced a test suite (please help with it!)

Main features

  • written in Python 3 (compatible with Python 2.7)
  • platform-independent
  • simple and complete API
  • released under the terms of the GPL 2 license

Cinemagoer powers many other software projects and has been used in various research papers. Curious about that?

Installation

Whenever possible, please use the latest version from the repository:

pip install git+https://github.com/cinemagoer/cinemagoer

But if you want, you can also install the latest release from PyPI:

pip install cinemagoer

Example

Here's an example that demonstrates how to use Cinemagoer:

from imdb import Cinemagoer

# create an instance of the Cinemagoer class
ia = Cinemagoer()

# get a movie
movie = ia.get_movie('0133093')

# print the names of the directors of the movie
print('Directors:')
for director in movie['directors']:
    print(director['name'])

# print the genres of the movie
print('Genres:')
for genre in movie['genres']:
    print(genre)

# search for a person name
people = ia.search_person('Mel Gibson')
for person in people:
    print(person.personID, person['name'])

Getting help

Please refer to the support page on the project homepage and to the online documentation on Read The Docs.

The sources are available on GitHub.

Contribute

Visit the CONTRIBUTOR_GUIDE.rst to learn how you can contribute to the Cinemagoer package.

License

Copyright (C) 2004-2022 Davide Alberani <da --> mimante.net> et al.

Cinemagoer is released under the GPL license, version 2 or later. Read the included LICENSE.txt file for details.

NOTE: For a list of persons who share the copyright over specific portions of code, see the CONTRIBUTORS.txt file.

NOTE: See also the recommendations in the DISCLAIMER.txt file.

cinemagoer's People

Contributors

aapjeisbaas, alberanid, alipphardt, arshamalh, cclauss, csweaver, darklow, deg3x, enriqueav, ethorne2, frleder, grbavacigla, jcea, jsynowiec, kostko, kostyafarber, maximshidlovski23, miigotu, philippe-cholet, salehdehqanpour, sandrotosi, sergiomaffeis, shobhitsinghal624, squigglezworth, sstifler, tadoran, tsaklidis, uyar, vlyalcin, werecatf

cinemagoer's Issues

imdbpy2sql.py never completes (OS X 10.7.5 Lion)

I've tried to run imdbpy2sql.py three times now and each time it gets to the stage "adding foreign keys" and stalls. I left it running for more than 24 hours then gave up. Running it again I can see it get to around 47/48 minutes CPU time in Activity Monitor (which is like a GUI version of the top command) then nothing happens. There's no disk activity and the memory usage is zero.

I set the computer never to sleep and ran the command through nohup in case that was a factor but the same thing happened.

As it's done most of the work except for the foreign keys, is there a way I can re-run it, skipping everything else and just doing the foreign keys?

Non-numeric season titles confuse the number of seasons

Some TV series have season titles which are not numeric. For example "Doctor Who 2005" (http://akas.imdb.com/title/tt0436992/combined) has a season titled "Unknown" which contains episodes that are not part of regular seasons. So the list of seasons for this series (as of May 2017) is [1,2,3,4,5,6,7,8,9,10,unknown]. IMDbPY only stores the number of seasons, in this case 11.

If one wants to get to the pages for the seasons, the link for the "unknown" season (http://akas.imdb.com/title/tt0436992/episodes?season=unknown) cannot be generated. Worse, one might think that there would be a page URL such as http://akas.imdb.com/title/tt0436992/episodes?season=11

A probable solution would be to store the titles of the seasons as a list of strings.

Another related problem is regarding the number of seasons. In this example, I would suggest that the correct number of seasons should be 10 (since there is no season 11). This value could be reported as the largest numeric value in the season title list.
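
The suggestion above can be sketched as follows. This is only an illustration and not IMDbPY's implementation: the function name count_numeric_seasons and the sample list are hypothetical, and it assumes the season titles are stored as a list of strings.

```python
def count_numeric_seasons(season_titles):
    """Return the largest numeric season title, or 0 if none is numeric."""
    numeric = [int(t) for t in season_titles if t.strip().isdecimal()]
    return max(numeric, default=0)


# Season titles as they might be stored for "Doctor Who" (2005):
# ten numbered seasons plus the non-numeric "unknown" season.
titles = [str(n) for n in range(1, 11)] + ["unknown"]
number_of_seasons = count_numeric_seasons(titles)
```

With the list kept as strings, the "unknown" season page can still be addressed by its title, while the reported number of seasons ignores it.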

Movie color info parsed incorrectly

In the movie combined details page, the color info is parsed incorrectly for some titles. For the title with id '0063650' (the movie "If....") the color info is reported as ':(Eastmancolor) (uncredited)' when it should have been 'Color::(Eastmancolor) (uncredited)'

NULL movie_id in movie_link file

 * LOADING CSV FILE /home/bbaumer/dumps/imdb/raw/movie_link.csv...
ERROR: unable to import CSV file /home/bbaumer/dumps/imdb/raw/movie_link.csv: null value in column "movie_id" violates not-null constraint
DETAIL:  Failing row contains (14300, null, 95357, 12).
CONTEXT:  COPY movie_link, line 14300: "14300,NULL,95357,12"

Using PostgreSQL and a recent copy of the data files.
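
One possible stopgap, sketched here, is to drop the rows whose movie_id is NULL before loading the CSV file. This does not explain why the NULL was written in the first place. The helper name drop_null_movie_ids is hypothetical, the column position is an assumption based on the failing row shown in the error, and the first and last sample rows are made up.

```python
import csv
import io


def drop_null_movie_ids(rows, movie_id_column=1):
    """Yield only the rows whose movie_id field is not the literal 'NULL'."""
    for row in rows:
        if row[movie_id_column] != "NULL":
            yield row


# Sample data: the middle row is the failing one from the error message,
# the other two rows are invented for illustration.
sample = io.StringIO(
    "14299,95001,95357,12\n"
    "14300,NULL,95357,12\n"
    "14301,95002,95358,5\n"
)
kept = list(drop_null_movie_ids(csv.reader(sample)))
```

The same generator could wrap a csv.reader over the real file and feed a csv.writer that produces a cleaned copy.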

Movie IMDb index in search results not parsed

Instead, the IMDb index becomes part of the title. For example, search for the title "blink". The result includes an item "Blink (IV)". This text is interpreted as the title of the movie, instead of the title being "Blink" with an added imdbIndex key whose value is "IV".
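
A minimal sketch of the expected behaviour, assuming the index is always a Roman numeral in trailing parentheses. The function split_imdb_index and the regular expression are hypothetical and are not taken from the IMDbPY parsers.

```python
import re

# Matches a trailing Roman-numeral index in parentheses, e.g. "Blink (IV)".
_INDEX_RE = re.compile(r"^(?P<title>.*?)\s*\((?P<index>[IVXLCDM]+)\)$")


def split_imdb_index(text):
    """Split 'Title (IV)' into a dict with 'title' and, if present, 'imdbIndex'."""
    text = text.strip()
    match = _INDEX_RE.match(text)
    if match is None:
        return {"title": text}
    return {"title": match.group("title"), "imdbIndex": match.group("index")}


result = split_imdb_index("Blink (IV)")
```

A real title that ends in a parenthesised Roman numeral would be misread by this sketch, so an actual parser would probably need to rely on the page markup rather than on the text alone.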

--local-infile with MySQL?

I'm trying to use imdbpy2sql. When running

imdbpy2sql.py --mysql-force-myisam -d ~/dumps/imdb/raw -u 'mysql://root:<password>@localhost/imdb' -c ~/dumps/imdb/raw

I get the following error:

loading CSV files into the database
 * LOADING CSV FILE /home/bbaumer/dumps/imdb/raw/complete_cast.csv...
ERROR: unable to import CSV file /home/bbaumer/dumps/imdb/raw/complete_cast.csv: (1148, 'The used command is not allowed with this MySQL version')

The problem is that the --local-infile flag on my client is not on. Now, if I was writing the command myself, I could just add --local-infile=1 and it should work. But since imdbpy2sql is generating the mysql command for me, I can't add that option.

Could you add that option? Or an option to pass-through additional arguments to mysql?

Or is there a better solution? Any help would be appreciated.

[BTW, I am working on a derivative R package. See (https://github.com/beanumber/imdb/issues/3)]

How to obtain the plot of a movie?

Hi,

My question may be stupid: I am trying to obtain the plots of several movies, but I didn't find the right API to use.

I tried the following code:
the_matrix = ia.get_movie('0133093')
print the_matrix.get('plot outline') # works for me
print the_matrix['plot'] # doesn't work for me

I am wondering what is the right way to get the plot outlines, summaries, and synopses?

Constraint errors

Hey !

With the new version, I'm having two errors when building the foreign keys :

ERROR caught exception creating a foreign key: Cannot add or update a child row: a foreign key constraint fails (`imdb`.<result 2 when explaining filename '#sql-dbf_47'>, CONSTRAINT `aka_title_movie_id_exists` FOREIGN KEY (`movie_id`) REFERENCES `title` (`id`))
ERROR caught exception creating a foreign key: Cannot add or update a child row: a foreign key constraint fails (`imdb`.<result 2 when explaining filename '#sql-dbf_4a'>, CONSTRAINT `movie_link_linked_movie_id_exists` FOREIGN KEY (`linked_movie_id`) REFERENCES `title` (`id`))

It seems that you are trying to insert AKAs and movie links for a title that doesn't exist.

Unicode error in actor names

Movie ID : 0060196

Line 90

for name in cast:
    print '      %s (%s)' % (name['name'], name.currentRole)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 11: ordinal not in range(128)
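
The traceback suggests that the name is being encoded with the ASCII codec at print time. The sketch below reproduces that failure and shows two ways around it. It is written in Python 3 syntax for illustration (the report above is from Python 2), and the sample name is hypothetical.

```python
name = "Mich\u00e8le"  # hypothetical name containing the character u'\xe8'

# Encoding to ASCII fails in the same way as the traceback above.
try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly to UTF-8 succeeds.
utf8_bytes = name.encode("utf-8")

# Replacing unencodable characters also succeeds, at the cost of losing them.
ascii_fallback = name.encode("ascii", errors="replace").decode("ascii")
```

In Python 2 the equivalent workaround would be to encode the unicode values explicitly, for example with .encode('utf-8'), before formatting them into the printed string.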

Allow local database update

A very interesting feature of this project is the possibility to download a local copy of the whole imdb database.

But it would be even more interesting to update this database without downloading it again, maybe by looking at the RSS feeds.

Duplicate persons in imdbpy2sql database, probably normal/canonical parser problem

I just noticed that the database contains two entries for the same person:

   id    |        name         | imdb_index | gender | name_pcode_cf | name_pcode_nf | surname_pcode |              md5sum
---------+---------------------+------------+--------+---------------+---------------+---------------+----------------------------------
 1520700 | Toro, Guillermo del |            | m      | T6246         | G4653         | T6            | 6fa07d318a205b82ef40a5c76db0974e
 2709933 | del Toro, Guillermo |            |        | D4362         | G4653         | D436          | 99b042e07b27b0321ca6e19f126c67f0

Could this be related to a bug in name parsing in the imdbpy2sql script?
One name appears in the normal form, the other in the canonical form.

First name has 100 movies
Second name has 69 movies

All of 169 movies belong to real Guillermo del Toro, which means that this must be one person instead of two.

In a few days I will update to the latest text files and see if anything changes, but I think the bug may be caused by the "del" particle in the middle of the name.
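
A rough way to spot such candidate duplicates is to convert every "Surname, Given" entry back to "Given Surname" and group the row ids by the result. This is only a sketch: normal_name is a hypothetical helper, not the conversion used by imdbpy2sql, and two different people can legitimately share a name, so the output is a list of candidates to check by hand.

```python
def normal_name(canonical):
    """Convert 'Surname, Given' to 'Given Surname'; leave other names unchanged."""
    surname, sep, given = canonical.partition(", ")
    return f"{given} {surname}" if sep else canonical


# The two rows from the table above, keyed by their database id.
rows = {1520700: "Toro, Guillermo del", 2709933: "del Toro, Guillermo"}

by_normal_name = {}
for person_id, name in rows.items():
    by_normal_name.setdefault(normal_name(name), []).append(person_id)

# Names that map to more than one database id are candidate duplicates.
duplicates = {n: ids for n, ids in by_normal_name.items() if len(ids) > 1}
```

Both entries collapse to the same normal-form name here, which is consistent with the particle being attached to the given name in one case and to the surname in the other.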

AKA list not populating correctly..

Let's use IMDb ID tt4005402 as an example. The original title of the movie was Colonia; it was later renamed to The Colony.

http://akas.imdb.com/title/tt4005402/?ref_=fn_al_tt_1

Full list of AKA's is http://akas.imdb.com/title/tt4005402/releaseinfo?ref_=tt_dt_dt#akas

However, if we do a search...

from imdb import IMDb
from pprint import pprint

ia = IMDb(accessSystem='http')
imdb_result = ia.search_movie('Colonia', results=5)

for res in imdb_result:
    pprint(res.data)

We only get the following results...

{'akas': [u'Colonia Dignidad'],
 'kind': u'movie',
 'title': u'The Colony',
 'year': 2015}
{'kind': u'tv series',
 'title': u'Colonia (2013) (TV Episode)  - Season 12 | Episode 3  - Espa\xf1oles en el mundo',
 'year': 2009}
{'kind': u'tv series',
 'title': u'Colonia (2011) (TV Episode)  - Season 2 | Episode 7  - Danni Lowinski',
 'year': 2010}
{'akas': [u'La colonia'],
 'kind': u'movie',
 'title': u'The Colony (I)',
 'year': 2013}
{'akas': [u'Colonia, brigada criminal'],
 'kind': u'tv series',
 'title': u'SOKO K\xf6ln',
 'year': 2003}

The main page, by contrast, shows the original title as an AKA, and it is also listed in the AKA section, whereas the AKA list in the search result has only one element.

If you then pull the movie via its IMDb ID with ia.get_movie(4005402), the title changes from The Colony to Colonia.

The AKA list then becomes:

[u'Colonia::France (imdb display title), International (imdb display title)',
 u'Colonia Dignidad - Es gibt kein Zur\xfcck::, Germany (imdb display title)',
 u'The Colony::UK (imdb display title)',
 u'\u0397 \u03b1\u03c0\u03bf\u03b9\u03ba\u03af\u03b1::Greece',
 u'\u041a\u043e\u043b\u043e\u043d\u0438\u044f \u0414\u0438\u0433\u043d\u0438\u0434\u0430\u0434::Russia',
 u'A kol\xf3nia::Hungary (imdb display title)',
 u'Amor e Revolu\xe7\xe3o::Brazil (imdb display title)',
 u'Colonia Dignidad::Chile (imdb display title)',
 u'Kolonija::Slovenia (imdb display title)']

imdbpy2sql shows an error while finishing with foreign keys

adding foreign keys (this may take a while)
ERROR caught exception creating a foreign key: insert or update on table "aka_title" violates foreign key constraint "movie_id_exists"
DETAIL:  Key (movie_id)=(0) is not present in table "title".

 # TIME createForeignKeys() : 1min, 28sec (wall) 0min, 0sec (user) 0min, 0sec (system)

Is it OK to receive such an error about foreign keys when the imdbpy2sql script finishes?
Did just one FK fail, or all of them?

Direct hit parsers can be removed from movie search pages

If I understand correctly what this feature does, it doesn't apply anymore. A search that results in one movie doesn't display the movie page. For example, searching for "Od+instituta+do+proizvodnje" displays a result page with only one movie in it. If that's really the case removing these parsers from the search pages would simplify the code.

codename: simplify

IMDbPY contains a lot of legacy code and needs some new features.
Let's fix it in the master branch. :-)

If you need the old version (supporting Python 2.7), look at the imdbpy-legacy branch.

  • remove the "mobile" parser
  • remove SQLObject support
  • remove the cutils C module (keep it, but make it optional and off by default)
  • move to Python 3: #27
    • http parser
    • sql parser
  • introduce support for the new data set: #60
  • introduce python-requests for queries (to support sessions): #87
  • introduce documentation about those changes

Optionally:

  • remove the BeautifulSoup dependency (python-lxml will be required)
  • if possible, re-introduce Python 2.7 compatibility

getIMDB __dict__ not being fully populated anymore

As an example, the following search does not populate attributes such as genre, year and cast.

This is for the imdb id 3315342

{'_Container__role': None,
 '_roleClass': <class 'imdb.Character.Character'>,
 '_roleIsPerson': False,
 'accessSystem': 'http',
 'charactersRefs': {},
 'current_info': ['main', 'plot'],
 'data': {'kind': u'movie',
          'plot': [u'In 2029 the mutant population has shrunken significantly and the X-Men have disbanded. Logan, whose power to self-heal is dwindling, has surrendered himself to alcohol and now earns a living as a chauffeur. He takes care of the ailing old Professor X whom he keeps hidden away. One day, a female stranger asks Logan to drive a girl named Laura to the Canadian border. At first he refuses, but the Professor has been waiting for a long time for her to appear. Laura possesses an extraordinary fighting prowess and is in many ways like Wolverine. She is pursued by sinister figures working for a powerful corporation; this is because her DNA contains the secret that connects her to Logan. A relentless pursuit begins - In this third cinematic outing featuring the Marvel comic book character Wolverine we see the superheroes beset by everyday problems. They are aging, ailing and struggling to survive financially. A decrepit Logan is forced to ask himself if he can or even wants to put his remaining powers to good use. It would appear that in the near-future, the times in which they were able put the world to rights with razor sharp claws and telepathic powers are now over.',
                   u"In the near future, a weary Logan cares for an ailing Professor X somewhere on the Mexican border. However, Logan's attempts to hide from the world and his legacy are upended when a young mutant arrives, pursued by dark forces."],
          'title': u'Help'},
 'infoset2keys': {'main': ['kind', 'title'], 'plot': ['plot']},
 'key2infoset': {'kind': 'main', 'plot': 'plot', 'title': 'main'},
 'keys_tomodify': {'alternate versions': None,
                   'business': None,
                   'crazy credits': None,
                   'dvd': None,
                   'faqs': None,
                   'goofs': None,
                   'laserdisc': None,
                   'news': None,
                   'plot': None,
                   'quotes': None,
                   'soundtrack': None,
                   'supplements': None,
                   'trivia': None,
                   'video review': None},
 'modFunct': <function modClearRefs at 0x7f5bb681acf8>,
 'movieID': '3315342',
 'myID': None,
 'myTitle': u'',
 'namesRefs': {},
 'notes': u'',
 'titlesRefs': {}}

pep8

Any Python repos should be formatted to adhere to pep8.
https://www.python.org/dev/peps/pep-0008/

Some of the formatting standards are commonly ignored, however, such as

  • E501: line too long - lines should be no more than 79 characters in length

Import of data into MySQL db via imdbpy2sql hanging

My import of the data into the MySQL database using imdbpy2sql.py is getting stuck at the following:

building database indexes (this may take a while)
# TIME createIndexes() : 31min, 51sec (wall) 0min, 0sec (user) 0min, 0sec (system)
adding foreign keys (this may take a while)

Have tried it twice, but it keeps hanging at this point.
Any suggestions how to tackle this and whether the databases can be used by aborting at this point?

-Saish

Person akas not collected from search page

The HTML markup for person akas has changed from <em> to <i> but I haven't changed it in the parser because there seems to be inconsistency between how movie akas and person akas are handled. For movies, the parser returns:

(imdb_id, {'title': ..., 'akas': [...]})

whereas for persons the parser returns:

(imdb_id, {'name': ...}, [list_of_akas?])

The akas are a part of the dict in the movie result and they are the third element of the tuple in the person result.
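
To make the inconsistency concrete, the sketch below folds the person-style tuple into the movie-style shape. The function normalize_person_result is hypothetical and is not part of the parser; the sample id, name and AKA are placeholders.

```python
def normalize_person_result(result):
    """Fold a (person_id, data, akas) tuple into the (id, data) shape used for movies."""
    person_id, data, *rest = result
    merged = dict(data)  # copy, so the caller's dict is not modified
    if rest and rest[0]:
        merged["akas"] = list(rest[0])
    return person_id, merged


# Placeholder values, shaped like the person result described above.
person_result = ("1234567", {"name": "Example Name"}, ["Example Alias"])
normalized = normalize_person_result(person_result)
```

If the parser itself were changed to return the merged shape, any code that unpacks three elements from person results would need to change as well, which may be one reason to settle on the format before touching the markup fix.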

Install without sql feature fails

repo freshly cloned,
OSX 10.11.12

$ python ./setup.py --without-sql install
Created locale for: ar bg de en es fr it tr.
Traceback (most recent call last):
  File "./setup.py", line 238, in <module>
    setuptools.setup(**params)
  File "/usr/local/Cellar/python/2.7.8_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/core.py", line 137, in setup
    ok = dist.parse_command_line()
  File "build/bdist.macosx-10.9-x86_64/egg/setuptools/dist.py", line 275, in parse_command_line
  File "build/bdist.macosx-10.9-x86_64/egg/setuptools/dist.py", line 371, in _finalize_features
  File "build/bdist.macosx-10.9-x86_64/egg/setuptools/dist.py", line 785, in include_in
  File "build/bdist.macosx-10.9-x86_64/egg/setuptools/dist.py", line 414, in include_feature
distutils.errors.DistutilsOptionError: access to SQL databases is required, but was excluded or is not available

Importing data has broken in the latest commit

Commit: 170251d

python imdbpy2sql.py --mysql-force-myisam -d ~/Downloads/imdb-data/ -u mysql://user@localhost:/imdb-data

Traceback (most recent call last):
  File "imdbpy2sql.py", line 3074, in <module>
    run()
  File "imdbpy2sql.py", line 2939, in run
    readMovieList()
  File "imdbpy2sql.py", line 1533, in readMovieList
    mid = CACHE_MID.addUnique(title, yearData)
  File "imdbpy2sql.py", line 1137, in addUnique
    else: return self.add(key, miscData)
  File "imdbpy2sql.py", line 1012, in add
    self[key] = c
  File "imdbpy2sql.py", line 921, in __setitem__
    self.flush()
  File "imdbpy2sql.py", line 975, in flush
    self.flush(quiet=quiet, _recursionLevel=_recursionLevel)
  File "imdbpy2sql.py", line 975, in flush
    self.flush(quiet=quiet, _recursionLevel=_recursionLevel)
  File "imdbpy2sql.py", line 975, in flush
    self.flush(quiet=quiet, _recursionLevel=_recursionLevel)
  File "imdbpy2sql.py", line 976, in flush
    self._tmpDict = secondHalf

update episodes - missing pilot

If you run the following commands:
import imdb
i = imdb.IMDb()
res = i.search_movie('blackadder')
r1 = res[0]
i.update(r1,'episodes')
print str(r1['episodes'])

You will get the following results:
{1: {1: <Movie id:0526541[http] title:_"The Black Adder" The Foretelling (1983)_>, 2: <Movie id:0526537[http] title:_"The Black Adder" Born to Be King (1983)_>, 3: <Movie id:0526539[http] title:_"The Black Adder" The Archbishop (1983)_>, 4: <Movie id:0526542[http] title:_"The Black Adder" The Queen of Spain's Beard (1983)_>, 5: <Movie id:0526543[http] title:_"The Black Adder" Witchsmeller Pursuivant (1983)_>, 6: <Movie id:0526540[http] title:_"The Black Adder" The Black Seal (1983)_>}}

which is missing the pilot episode ref: (http://www.imdb.com/title/tt0084988/episodes?season=1&ref_=tt_eps_sn_1)

Tested on latest version (cc63c25)

Mini series kind and years parsed incorrectly

The kind for mini series is reported as "tv series" (the code documentation suggests that it should be "tv mini series"). Also, the series years are not parsed. Data to test: Band of Brothers (id: 0185906). The "series years" key should be "2001-2001" but it is missing from the result.

Importing to Sql Server - text has encoding issues

Hi,

Thank you very much for a great tool.

I am using it to import latest IMDB csv files to local Sql Server Express 2014 database.
When I look at the data in the DB, I see text like this in title.title:

"A Próxima Vítima"
"Discriminación en el lenguaje"
...

In name.name I see,

"Aarseth, Øystein"
"Abati, Joël"
...

Looks like something with encoding. The command I use to import is:
python.exe imdbpy2sql.py -d C:\imdb-files -u "mssql://connection text" --ms-sqlserver

I am using Sql Server 2014 express.
What can I do to fix it?

Thank you,

Eric

search_movie "invalid syntax"

Hi,

Apologies for what is likely a case of user error.

I used "pip install imdbpy" to install the software. It completed without error.
I can run the following commands in ipython without error:

import imdb
ia = imdb.IMDb()

But when I try ia.search_movie('Jaws'), the result is always null ([ ]). This is true regardless of what movie I am searching for.

If I just import search_movie and try
search_movie 'Jaws'

I get the error, "Invalid syntax."

Can someone enlighten me?

Python3 compatibility

As Guido van Rossum said at this PyCon, everybody should start switching to Python 3.

Videogames are indicated as "movie" in person filmography

The kind attribute of an element in the filmography of a person has the value "movie" for elements that are not movies at all.
Here is a simple test with actor Elijah Wood and the movie and videogame with the same title The Lord of the Rings: The Return of the King.

import unittest
from imdb import IMDb


class TestMovieKind(unittest.TestCase):

    def runTest(self):
        ia = IMDb(loggingLevel='error')
        person = ia.get_person('0000704')  # Elijah Wood
        for movie in person.get('actor', []):
            # The video game should not be reported as a movie.
            if movie.getID() == '0387360':
                print movie['title'], '=> videogame for Xbox'
                self.assertNotEqual(movie['kind'], 'movie')
            # The film of the same title should be reported as a movie.
            if movie.getID() == '0167260':
                print movie['title'], '=> movie'
                self.assertEqual(movie['kind'], 'movie')


if __name__ == "__main__":
    unittest.main()

Timeout error IOError('socket error', timeout('timed out',))

When I try

ia = imdb.IMDb()
s_result = ia.search_movie(title)
first = s_result[0]
# get synopsis                                                                                                                             
ia.update(first, 'synopsis')

I get this error in many cases

CRITICAL [imdbpy] /usr/lib64/python2.7/site-packages/imdb/_exceptions.py:35: IMDbDataAccessError exception raised; args: ({'exception type': 'IOError', 'url': 'http://akas.imdb.com/title/tt0110647/combined', 'errcode': 'socket error', 'proxy': '', 'original exception': IOError('socket error', timeout('timed out',)), 'errmsg': 'timed out'},); kwds: {}

Any reason for this?
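
Timeouts like this are often transient, so one common workaround is to retry the call a few times with a growing pause. The sketch below is a generic helper, not something provided by IMDbPY: the name call_with_retries and its parameters are hypothetical, and it is written in Python 3 syntax.

```python
import time


def call_with_retries(func, attempts=3, delay=1.0, exceptions=(IOError,)):
    """Call func(), retrying on the given exceptions with a growing delay."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the last error propagate
            time.sleep(delay * attempt)
```

For the snippet above, the call could be wrapped as call_with_retries(lambda: ia.update(first, 'synopsis'), exceptions=(imdb.IMDbDataAccessError,)), on the assumption that this exception class, which appears in the log, is importable from the imdb package in the installed version.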

Python 3.X support

I am writing a Python 3 project, and I want to use this awesome plugin. Do you plan to support Python 3 in the future?

Kind entry for video games has inconsistent case with others

The kind for video games is given as "Video Game" whereas for other kinds the value is in lower case, as in "tv series" or "video movie".
Possible fix: Modify the _TITLE_KINDS dict in imdb/utils.py.
I'm not changing it since it might break compatibility with existing code.

movie.asXML Key Error: 'long imdb name'

I used imdbpy's .asXML method to generate a collection of XML documents that represent all the movies within the IMDB dataset (as of 3/2015). While this works for the vast majority of titles, I receive the following stack trace when attempting to get the XML representation of a title named "Los rosarios" (XML output from OMDB below). Presumably, this indicates that the long name for the director in question does not exist.

STARTING RUN WITH 8 PROCESSES
CREATING NODES FOR 3224547 MOVIES
ERROR: COULD NOT PARSE MOVIE WITH ID: 2680117
TRACEBACK:
Traceback (most recent call last):
  File "generate_xml_nodes.py", line 53, in create_nodes
    tree = ElementTree.XML(result.asXML(), parser)
  File "/usr/local/lib/python2.7/dist-packages/IMDbPY-5.1dev_20150313-py2.7-linux-x86_64.egg/imdb/utils.py", line 1452, in asXML
    value = self.getAsXML(key, _with_add_keys=_with_add_keys)
  File "/usr/local/lib/python2.7/dist-packages/IMDbPY-5.1dev_20150313-py2.7-linux-x86_64.egg/imdb/utils.py", line 1441, in getAsXML
    fullpath=tag))
  File "/usr/local/lib/python2.7/dist-packages/IMDbPY-5.1dev_20150313-py2.7-linux-x86_64.egg/imdb/utils.py", line 1088, in _seq2xml
    fullpath='%s.%s' % (fullpath, tagName))
  File "/usr/local/lib/python2.7/dist-packages/IMDbPY-5.1dev_20150313-py2.7-linux-x86_64.egg/imdb/utils.py", line 1102, in _seq2xml
    item.__class__.__name__.lower()))
  File "/usr/local/lib/python2.7/dist-packages/IMDbPY-5.1dev_20150313-py2.7-linux-x86_64.egg/imdb/utils.py", line 1117, in _seq2xml
    _l.extend(_tag4TON(seq))
  File "/usr/local/lib/python2.7/dist-packages/IMDbPY-5.1dev_20150313-py2.7-linux-x86_64.egg/imdb/utils.py", line 963, in _tag4TON
    crValue = cr['long imdb name']
  File "/usr/local/lib/python2.7/dist-packages/IMDbPY-5.1dev_20150313-py2.7-linux-x86_64.egg/imdb/utils.py", line 1472, in __getitem__
    rawData = self.data[key]
KeyError: 'long imdb name'

OMDB XML data for this movie:

<root response="True"><movie title="Los rosariazos" year="2007" rated="N/A" released="01 Sep 2007" runtime="N/A" genre="Documentary" director="Carlos López" writer="N/A" actors="Daiana Barrios, Pablo Bonel, Rafael Cao, Fernando Carazo" plot="N/A" language="Spanish" country="Argentina" awards="N/A" poster="N/A" metascore="N/A" imdbRating="N/A" imdbVotes="N/A" imdbID="tt1247280" type="movie"/></root>

Relevant code:

      localdb = imdb.IMDb('sql', uri='mysql://root:password@localhost/imdb')
      for mid in range(start, stop):
          moviefile = 'movie-%d' % mid
          moviefilepath = os.path.join(SAVE_PATH, moviefile + '.xml')
          if not os.path.isfile(moviefilepath):
              result = localdb.get_movie(mid)
              parser = etree.XMLParser(remove_blank_text=True, recover=True, huge_tree=True, encoding='latin1')
              try:
                  tree = ElementTree.XML(result.asXML(), parser)
                  imdbfile = open(moviefilepath, 'w')
                  imdbfile.write(etree.tostring(tree, pretty_print=True))
                  imdbfile.close()
              except:
                  print "ERROR: COULD NOT PARSE MOVIE WITH ID: %d" % mid
                  print "TRACEBACK:"
                  print traceback.format_exc()
