The tvgrabnlpy from tvgrabbers

Overlapping similar titles

While fixing the cache bug, I noticed that programs with the same name seem to overwrite each other. It's not urgent, but it is odd, so I'll keep an eye on it.

Incorrect data for channel 2BE

I noticed that data for Belgian channel 2BE is extremely unreliable (i.e. inaccurate). I've made several recordings where the recorded program is different from the one in its metadata (in other words: the program data was incorrect to begin with).
Another example is coming up this evening: on June 17th 2015 2BE will be broadcasting a movie called New Police Story at 20:35. Both 2BE's own website and teveblad.be agree on this. However, Myth thinks the movie will be Source Code.
I'm using Version: 2.1.6-p20150510-beta.

Wrong date offset for some channels

Some channels don't have the correct day. It looks like this only concerns channels where not all programs start on the current day.

For example, when requesting
http://www.tvgids.nl/json/lists/programs.php?channels=3&day=0
the JSON provides:
"51": {
"db_id": "12342149",
"titel": "Homeland",
"genre": "Serie/soap",
"soort": "Dramaserie",
"kijkwijzer": "gs3",
"artikel_id": null,
"datum_start": "2012-03-11 20:25:00",
"datum_end": "2012-03-11 21:20:00"
},

While the xml outputs:

<title lang="nl">Homeland</title> Serie/soap

By the way, some other observations from looking at the code:

The JSON provides an ISO-format datetime.
Only the hour and minute part is taken.
The function correct_times then corrects it back to a datetime object.
What is the purpose of that? Couldn't correct_times be skipped entirely, or simplified, if we just took the full datetime instead of only the time?
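
A minimal sketch of the simplification suggested above, assuming the JSON items always carry `datum_start` and `datum_end` in the format shown in the sample. The helper name `parse_program_times` is hypothetical and is not part of the grabber.

```python
from datetime import datetime

# Format of the datum_start / datum_end fields in the sample JSON above.
TVGIDS_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_program_times(item):
    """Return (start, end) as full datetime objects for one JSON item.

    Hypothetical helper: keeping the date part avoids having to guess
    the day again later for programs that start after midnight.
    """
    start = datetime.strptime(item["datum_start"], TVGIDS_FORMAT)
    end = datetime.strptime(item["datum_end"], TVGIDS_FORMAT)
    return start, end

item = {
    "titel": "Homeland",
    "datum_start": "2012-03-11 20:25:00",
    "datum_end": "2012-03-11 21:20:00",
}
start, end = parse_program_times(item)
```

With the full datetime available, the date no longer has to be reconstructed from the requested day offset.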

Looking for better source for the sbs group channels

Still looking for a source containing better (with season/episode) information for the sbs group channels (sbs6, net5, veronica and sbs9)

--configure is broken

--configure gives an empty configuration file. This is bad.

Error with some ids

Looking at the list in depth I spotted some issues resulting in channels not linking with other sources because of typos as well as some missing mappings:

tvgids.tv
bbc-first should be linked with 0-464
ziggo-sport should be linked with 0-466

horizon.tv
Comedy Central Family (672816167176) should be linked with 0-317
FOX Sports 6 (606274087106) should be linked with 1-fox-sports-6
Ziggo Sport (675503655063) should be linked with 0-466
RTL Lounge (672816167174) should be linked with 0-408

vpro.nl
comedycentral should be comedy_central
24kitchen should be 24_kitchen

nieuwsblad.be
bbc_1 should be bbc-1
bbc_2 should be bbc-2
prime-serie should be prime-series

The MTV sources should be split since MTV Vlaanderen has a different schedule than MTV NL:
http://www.mtv.be/schedule/
http://www.mtv.nl/programma
MTV NL is the following:
0-25
1-mtv
5-24443943006

MTV BE is the following:
6-69
8-mtv

The same for Nickelodeon
http://www.nickelodeon.nl/tv-gids
http://www.nickelodeon.be/tv-gids
Nickelodeon NL is the following:
0-89
1-nickelodeon
5-542836775318

Nickelodeon BE is the following:
6-73
8-nickelodeon

New blocking cookie popup on tvgids.nl

tvgids.nl has put a blocking popup in between, asking you to agree. I'm looking at ways to handle this, but for now I will create an automatic fallback to the JSON page, trying only once.

Wrong dates on teveblad.be and structural failures on the tvgids.nl detail pages

I have just noticed two other problems: teveblad.be structurally returns the wrong date, and the tvgids.nl detail pages fail structurally. I'll look into that.

Slightly changed format on the tvgids.nl detail pages

This is not fatal, but an update will follow soon.

Use other data sources

Might this be valuable as a data source?
http://www.rtl.nl/active/epg_data/dag_data/0
And: http://www.rtl.nl/active/epg_data/uitzending_data/771904638412019

Only create ~/.xmltv directory when required

The --configure option always creates a ~/.xmltv directory, even if the configuration file is in another directory. It shouldn't do that.
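
One way to get that behaviour, as a rough sketch: create only the directory that will actually hold the configuration file. The function name `ensure_config_dir` and the example file name are assumptions for illustration, not code from the grabber.

```python
import os
import tempfile

def ensure_config_dir(config_path):
    """Create the directory that will hold config_path, and nothing else.

    Hypothetical helper: ~/.xmltv would only be created when the
    configuration file itself lives there.
    """
    config_dir = os.path.dirname(os.path.abspath(config_path))
    if not os.path.isdir(config_dir):
        os.makedirs(config_dir)
    return config_dir

# Demonstration in a throwaway directory instead of the real home directory.
base = tempfile.mkdtemp()
created = ensure_config_dir(os.path.join(base, "custom", "tv_grab_nl.conf"))
```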

Changes to TVGids.tv?

For the past few days I've been getting some errors on TVGids.tv. The one thing they have in common is that they only appear on the Film1 and HBO channels, so they might be specific to movies:

2015-12-14 04:47:53 : Error processing tvgids.tv detailpage:http://www.tvgids.tv/tv/hollywood-banker/14702652
2015-12-14 04:47:53 : Traceback (most recent call last):
2015-12-14 04:47:53 :   File "tv_grab_nl.py", line 8482, in load_detailpage
2015-12-14 04:47:53 :     kw_val = d.find('div').get('class').strip()
2015-12-14 04:47:53 : AttributeError: 'NoneType' object has no attribute 'get'
2015-12-14 04:47:55 : Error processing tvgids.tv detailpage:http://www.tvgids.tv/tv/northpole/14702656
2015-12-14 04:47:55 : Traceback (most recent call last):
2015-12-14 04:47:55 :   File "tv_grab_nl.py", line 8482, in load_detailpage
2015-12-14 04:47:55 :     kw_val = d.find('div').get('class').strip()
2015-12-14 04:47:55 : AttributeError: 'NoneType' object has no attribute 'get'
2015-12-14 04:47:58 : Error processing tvgids.tv detailpage:http://www.tvgids.tv/tv/de-dolle-tweeling-3/14702662
2015-12-14 04:47:58 : Traceback (most recent call last):
2015-12-14 04:47:58 :   File "tv_grab_nl.py", line 8482, in load_detailpage
2015-12-14 04:47:58 :     kw_val = d.find('div').get('class').strip()
2015-12-14 04:47:58 : AttributeError: 'NoneType' object has no attribute 'get'
2015-12-14 04:48:01 : Error processing tvgids.tv detailpage:http://www.tvgids.tv/tv/the-birdcage/14702664
2015-12-14 04:48:01 : Traceback (most recent call last):
2015-12-14 04:48:01 :   File "tv_grab_nl.py", line 8482, in load_detailpage
2015-12-14 04:48:01 :     kw_val = d.find('div').get('class').strip()
2015-12-14 04:48:01 : AttributeError: 'NoneType' object has no attribute 'get'
2015-12-14 04:55:50 : Error processing tvgids.tv detailpage:http://www.tvgids.tv/tv/waar-is-het-paard-van-sinterklaas/14716775
2015-12-14 04:55:50 : Traceback (most recent call last):
2015-12-14 04:55:50 :   File "tv_grab_nl.py", line 8482, in load_detailpage
2015-12-14 04:55:50 :     kw_val = d.find('div').get('class').strip()
2015-12-14 04:55:50 : AttributeError: 'NoneType' object has no attribute 'get'
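
The traceback shows `find('div')` returning `None` for these pages. A defensive sketch follows, assuming `d` behaves like an `xml.etree.ElementTree` element; the helper name `get_div_class` and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET

def get_div_class(node):
    """Return the stripped class of the first child div, or None.

    Hypothetical helper: guards against pages where the expected
    div is missing, instead of raising AttributeError.
    """
    div = node.find('div')
    if div is None:
        return None
    value = (div.get('class') or '').strip()
    return value or None

# Made-up fragments: one with the div, one without.
with_div = ET.fromstring('<li><div class=" kijkwijzer-12 "/></li>')
without_div = ET.fromstring('<li><span>no rating</span></li>')

found = get_div_class(with_div)
missing = get_div_class(without_div)
```

Returning `None` lets the caller skip the rating for that program while still processing the rest of the detail page.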

100% CPU load while running: is that 'normal'?

Hi, is it normal that I am experiencing 100% CPU load (on only one of the 4 CPUs) while running? The CPU gets really hot for the duration of a run, which takes about an hour.

I plan to look into the cause of this myself. First, I converted the script to Python 3, to see if that would make any difference. It does not. :-) Next, I built in some diagnostics/debugging code, and I see that while fetching data for some 20 channels it creates 28 threads. It then seems to work on fetching data for 2 channels concurrently. Once one of these channels finishes, 'threading.active_count()' is reduced by one and a new channel starts being processed.
I will dig into this some more. If anyone has any pointers, then please share.

I am running on an Intel I5-4670 Haswell cpu, Fedora 22 Linux. Earlier I ran with python 2.7, currently python 3.4.

Unhandled exception in thread started by

The script no longer finishes.

Sometimes it's done and just hangs forever:

Detail statistics for 24 Kitchen (channel 29 of 29)
     0 cache hits
   168 without details in cache

And sometimes I get this error:

Unhandled exception in thread started by

But nothing more.

Gaps in schedule

Report 1:

I have the same problems as everyone else: many gaps in the program guide, shown as 'NO DATA' in MythTV.

Another user:

I still see (with the experimental version) that items occur up to 3 times.
I haven't dared to load it into MythTV yet, because it is currently busy recording manually scheduled items, so I have only looked at the XML.
On RTL4, for example, a number of programs still occur up to 3 times.
I did delete the cache file before grabbing.
So I suspect there are undoubtedly still some gaps in it (purely based on the fact that programs occur more than once).

Season/Episode disappearing at the last moment

It is not consistent, but regularly the season/episode is missing after recording, although it was present before, as there is a programID. I thought I had addressed this before in the cache query, but ...

tv_grab_nl.py fetch failed or timed out Help?

Hello gurus,

I need your help. I have been using tv_grab_nl.py on my Synology for a long time for the EPG from tvgids.nl. But for some time now it has not been working; I get these errors (fetch failed or timed out):

MediaStation> ./tv_grab_nl_py --output /root/.xmltv/tv_grab_file.xmltv

Now fetching NPO 1(xmltvid=1) (channel 1 of 121)
Deleting duplicate: Journaal
Deleting duplicate: Journaal
Deleting duplicate: Journaal

( 0%) 94: Studio sport [normal fetch] fetch failed or timed out 47: Jinek [normal fetch] fetch failed or timed out 98: De rijdende rechter [normal fetch] fetch failed or timed out 30: Journaal ^CTraceback (most recent call last):

What should I do to solve this?

Thanks in advance,

Non standard characters in XML

Hello,

I use bigscreen EPG to import the XML into Windows Media Center, and I get an error on the XML file. I have added a snippet:

[code]

<title lang="nl">Bluf</title>
Aflevering 5
Dramaserie: Ook Elise is in de ban van Julian. Of gebruikt ze hem alleen om Mark jaloers te maken? Tjé ten slotte komt in de Amsterdam ArenA de vrouw van zijn dromen tegen. "Ik weet niet wat ik in mijn vorige leven heb gedaan om dit te mogen meemaken, maar het was letterlijk een soort jongensdroom: strippers op schoot en voetballen in de ArenA. Ik krijg een eigen nachtclub en word voor het eerst verliefd", vertelt acteur Géza Weisz, die Tj�é speelt, in Grazia. "Het...
Drama
1 . 4 .

12+

Seks

Grof

[/code]

I suspect the letter é is the problem; could this be translated?

Thank you,
Roland de Leeuw

Unable to fetch from teveblad.be

Seems something changed on their website during the last week (I first installed tv_grab_nl_py last week), they're now referring to HLN (http://www.hln.be/hln/nl/929/Kanaal-TV/index.dhtml#aanbod). The links it is trying (http://www.teveblad.be/tv-gids/zenders et cetera) indeed give a 404, so I guess this is a quite fundamental change, which may or may not be related to the acquisition of teveblad.be by De Persgroep (see here: http://www.persgroep.be/nl/news/overdracht-van-titels-sanoma-naar-de-persgroep-goedgekeurd).

Of course it would be nice to have this fixed (if at all possible), but more importantly... I think it would be useful to have the ability to disable certain sources (--disable-teveblad, --disable-tvgidstv et cetera), in case of such eventualities? Or maybe there is some setting I overlooked. In any case... now it seems to just get stuck with this...

http://www.teveblad.be/tv-gids/2015-09-08/zenders/bbc2-nl
[repeats for each channel]
Cannot open url http://www.teveblad.be/tv-gids/zenders/: Not Found
teveblad channel info file: /home/tvheadend/teveblad_channels.html not found

update
I decided to see if copying in the 'teveblad channel info file' would help (even though it says it is not necessary anymore?). It seems to have stopped the script from getting stuck. Still, maybe it would be good to look into the above :)

Detection of Python version

I tested the latest grabber (2.2.8) this morning, and it exits on the following:

# check Python version
if sys.version_info[:3] < (2,7,9):
    sys.stderr.write("tv_grab_nl_py requires Pyton 2.7 or higher\n")
    sys.exit(2)

The error message should also mention the last digit of the version.
My Mythbuntu has 2.7.4 (not entirely sure) and the Ubuntu 14.04 here has 2.7.6.
So is such a new version really needed, or is a workaround possible?
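
A sketch of a check whose message always matches the version it tests for, by building the text from the same tuple. The `(2, 7, 9)` minimum is taken from the quoted code; the names `MIN_PYTHON` and `version_error` are hypothetical.

```python
import sys

# Minimum version taken from the quoted check; adjust in one place only.
MIN_PYTHON = (2, 7, 9)

def version_error(current, required=MIN_PYTHON):
    """Return an error message if current is too old, otherwise None."""
    if tuple(current[:3]) >= required:
        return None
    return "tv_grab_nl_py requires Python %s or higher (found %s)\n" % (
        ".".join(str(part) for part in required),
        ".".join(str(part) for part in current[:3]),
    )

message = version_error(sys.version_info)
if message is not None:
    sys.stderr.write(message)
    sys.exit(2)
```

Whether 2.7.9 is really required is a separate question that this sketch does not answer; it only keeps the message and the check in sync.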

Problems with version 2.1.2 and tvheadend reported

Problems have been reported. One about no cache being created and the other about no descriptions being fetched.
I'm awaiting details

tvgids.tv seems to store a subtitle as a genre for Discovery

I have noticed a whole flood of weird genres. I have traced them to Discovery on tvgids.tv. I'm not yet sure whether to do anything about it, or what.

Python 2.5 and 2.4 support

It seems there are still a few users with Python 2.5 and even 2.4. This is currently not supported.

Add version number to script

The script currently does not have a version number. This makes troubleshooting harder.

Automatically add a version number (date, hash id,...) if possible

Unicode fixes for stderr

  File "./tv_grab_nl.py", line 1425, in <module>
    sys.exit(main())
  File "./tv_grab_nl.py", line 1387, in main
    get_descriptions(programs, program_cache, nocattrans, quiet, slowdays)
  File "./tv_grab_nl.py", line 864, in get_descriptions
    sys.stderr.write('\n(%3.0f%%) %s: %s ' % (100*float(counter)/float(nprograms), i, programs[i]['name']))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 13: ordinal not in range(128)
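
The traceback comes from writing a non-ASCII program name to a stderr that only accepts ASCII. A minimal sketch of one workaround, written for Python 3 rather than the Python 2 of the original script: replace characters the stream cannot represent instead of raising. The helper name `write_safely` is an assumption.

```python
import io

def write_safely(stream, text):
    """Write text to stream, replacing characters it cannot encode.

    Hypothetical helper: unencodable characters become '?' so that a
    progress message can never abort the run.
    """
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    safe = text.encode(encoding, 'replace').decode(encoding)
    stream.write(safe)
    return safe

# Simulate an ASCII-only stderr with an in-memory stream.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding='ascii')
written = write_safely(ascii_stream, u'( 50%) 12: Caf\xe9 ')
```

The trade-off is that the log shows `?` in place of accented letters; the XML output itself is not affected by this helper.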

tv_grab_nl.py fetch failed or timed out Help?

Hello gurus,

I need your help. I have been using tv_grab_nl.py on my Synology for a long time for the EPG from tvgids.nl. But for some time now it has not been working; I get these errors (fetch failed or timed out):

MediaStation> ./tv_grab_nl_py --output /root/.xmltv/tv_grab_file.xmltv

Now fetching NPO 1(xmltvid=1) (channel 1 of 121)
Deleting duplicate: Journaal
Deleting duplicate: Journaal
Deleting duplicate: Journaal

( 0%) 94: Studio sport [normal fetch] [fetch failed or timed out]

( 1%) 47: Jinek [normal fetch] [fetch failed or timed out]

( 1%) 98: De rijdende rechter [normal fetch] [fetch failed or timed out]

( 2%) 30: Journaal ^CTraceback (most recent call last):

What should I do to solve this?

Thanks in advance,

npo.nl page name and possibly more changed

VPRO.nl source not working

I'm getting no data with the recently added vpro.nl source.

Error with new nieuwsblad.be source

I get the following when I try to run configure after you added the nieuwsblad.be source

Traceback (most recent call last):
  File "tv_grab_nl.py", line 11516, in get_channels
    strdata = self.get_page(self.get_url('base'))
  File "tv_grab_nl.py", line 11494, in get_url
    locale.setlocale(locale.LC_TIME, ('nl_NL', 'utf-8'))
  File "/usr/lib/python2.7/locale.py", line 579, in setlocale
    return _setlocale(category, locale)
Error: unsupported locale setting
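
The error means the `nl_NL` locale is not installed on that system. A sketch of a fallback that tries several locales and never raises; the function name `set_time_locale` and the candidate list are assumptions, not the grabber's actual code.

```python
import locale

def set_time_locale(candidates=(('nl_NL', 'utf-8'), 'nl_NL', 'C')):
    """Try each candidate for LC_TIME and return the first that works.

    Hypothetical helper: returns None when no candidate is supported,
    instead of letting locale.Error abort --configure.
    """
    for candidate in candidates:
        try:
            locale.setlocale(locale.LC_TIME, candidate)
            return candidate
        except locale.Error:
            continue
    return None

chosen = set_time_locale()
```

Note that falling back to `C` only avoids the crash. Any parsing that depends on Dutch month or day names would still need its own lookup table.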

UnicodeDecodeError with desc

Fast update is ok, but with full update:

Traceback (most recent call last):
  File "./tv_grab_nl.py", line 1425, in <module>
    sys.exit(main())
  File "./tv_grab_nl.py", line 1397, in main
    xml.extend(xmlefy_programs(programs, id, desc_len, compat, nocattrans))
  File "./tv_grab_nl.py", line 1073, in xmlefy_programs
    desc.append('%s ' % program[detail_row])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 20: ordinal not in range(128)

Environment:

CentOS 6.2
python 2.6.6

Retrieve program info using JSON or other sources

An awesome reader wrote:

Would it help you if the detail information (in get_descriptions) could also be retrieved via JSON? Then both the overview and the detail data would be available in the same way (and hopefully in the same character encoding), without having to parse HTML to find the right fields. I think that would be more stable and faster.

With an educated guess I found this URL: http://www.tvgids.nl/json/lists/program.php?id=12341766
The id to pass is the db_id.

tvgids.tv detail fetch fails

Fox only shows Garage Gold

Fox only shows Garage Gold instead of all the shows that are actually on.
I already tried removing all the files and did a complete re-run, which seemed to work: I saw multiple shows. Unfortunately, after a few days this went away and only Garage Gold is displayed again.

I'm running the latest stable 2.1.5, please let me know if I can upload any files that can be helpful.

Sanitize tvgids.nl input

Today I got an error on Familie 24. Very odd: I had fetched the latest update with git, and it contained this error:

Now fetching Familie 24(xmltvid=64) (channel 3 of 31)
Traceback (most recent call last):
  File "./tv_grab_nl.py", line 1408, in <module>
    sys.exit(main())
  File "./tv_grab_nl.py", line 1365, in main
    info = get_channel_all_days(id,  days, quiet)
  File "./tv_grab_nl.py", line 627, in get_channel_all_days
    for r in v:
TypeError: 'bool' object is not iterable
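
The traceback suggests that one of the values in the decoded response was `False` where a list of programs was expected. A sketch of a filter that skips such values; the response layout and the name `iter_program_lists` are assumptions made for illustration.

```python
def iter_program_lists(channel_data):
    """Yield (key, programs) pairs, skipping values that are not iterable.

    Hypothetical helper: a value such as False, None or a string is
    ignored instead of raising TypeError in the fetch loop.
    """
    if not isinstance(channel_data, dict):
        return
    for key, value in channel_data.items():
        if isinstance(value, (list, tuple)):
            yield key, list(value)
        elif isinstance(value, dict):
            yield key, list(value.values())
        # anything else (False, None, strings) is skipped

# Made-up response: channel "1" has no data and is reported as False.
response = {"0": [{"titel": "Journaal"}], "1": False}
usable = dict(iter_program_lists(response))
```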

The ids on humo.be are not stable

This breaks old configurations and the linking with other sources on new configurations

Thoughts on enhancements

A more properly named issue to continue #49 Error with some ids

Sanitize config file input

The script fails if a line in the config file does not start with a number (here it starts with the word "channel"):

Traceback (most recent call last):
  File "/usr/bin/tv_grab_nl", line 1408, in <module>
    sys.exit(main())
  File "/usr/bin/tv_grab_nl", line 1348, in main
    ikey = int(key)
ValueError: invalid literal for int() with base 10: 'channel'
2012-03-14 21:24:16.277 FillData, Error: xmltv returned error code 256
2012-03-14 21:24:16.278 Error in 1:1: unexpected end of file
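
A sketch of more forgiving parsing, assuming each channel line starts with a numeric id followed by a name. That layout, the sample lines and the function name `parse_channel_keys` are all assumptions.

```python
def parse_channel_keys(lines):
    """Return (keys, rejected) for the given config lines.

    Hypothetical helper: lines whose first field is not an integer
    are collected with their line number instead of raising ValueError.
    """
    keys, rejected = [], []
    for lineno, line in enumerate(lines, 1):
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue  # blank lines and comments are fine
        first_field = stripped.split(None, 1)[0]
        try:
            keys.append(int(first_field))
        except ValueError:
            rejected.append((lineno, stripped))
    return keys, rejected

keys, rejected = parse_channel_keys(
    ["1 NPO 1", "channel 2 NPO 2", "", "3 NPO 3"]
)
```

The rejected lines can then be reported as a warning, so that MythTV still gets valid XML for the channels that did parse.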

Python 3 support

Title says it all.

Documentation

Add pages from repository to wiki

Fix title_split (req'd for BBC3)

BBC3 has no genre or anything of that kind. title_split needs a check and a repair:

def title_split(program):
    """
    Some channels have the annoying habit of adding the subtitle to the title of a program.
    This function attempts to fix this, by splitting the name at a ': '.
    """
    # paulp
    # Some programs (BBC3 when this happened) have no genre. If none, then set to a default
    if program.get('genre') is None:
        program['genre'] = 'overige'

    # Leave programs alone that already have an episode title, and movies
    if ('titel aflevering' in program and program['titel aflevering'] != '') \
            or program['genre'].lower() in ['movies', 'film']:
        return

    # Split at the last ': ' into program name and episode title
    colonpos = program['name'].rfind(': ')
    if colonpos > 0:
        program['titel aflevering'] = program['name'][colonpos + 1:].strip()
        program['name'] = program['name'][0:colonpos].strip()

How to specify channels that do not have a counterpart at tvgids.nl?

I would like to use channel info from e.g. teveblad.be for channels (e.g. vijftv) that do not exist at tvgids.nl. I see hints that support for this has been added, but I cannot find these channels in the config file that is created with --configure. How do I tell the grabber to grab this data?

Crash; possible deadlock

I only began using this fork/continuation of the grabber a few weeks ago, and I used version 2.1.3 successfully for a couple of weeks. As of a few days ago no data was being grabbed, and I had to kill the grabber as it was using lots of CPU. Now the grabber isn't doing much of anything at all: it produces no output and also uses hardly any CPU.

According to strace the grabber seems to be indefinitely waiting for a lock to be released. This remains the case even after a reboot.

Limit to 3 days

It is in fact limited to 3 days; day=4 returns yesterday's data :-(

There is no point in fetching more than 4 days of data. TVGids returns
today for day 0, tomorrow for 1, the day after tomorrow for 2, the day after that for 3, and
today for 4 and higher. So requesting more than 4 days of data
yields nothing extra.
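
A small sketch of clamping the requested number of days to what the description above says is useful (day offsets 0 to 3). The constant `MAX_DAYS` and the function name `usable_day_offsets` are hypothetical, and the limit itself is taken from this report rather than verified.

```python
# Number of distinct days per the report above (offsets 0, 1, 2, 3).
MAX_DAYS = 4

def usable_day_offsets(requested_days, max_days=MAX_DAYS):
    """Return the day offsets worth requesting for requested_days."""
    count = max(0, min(requested_days, max_days))
    return list(range(count))

offsets = usable_day_offsets(7)
```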

Horizon IDs occasionally change

Occasionally I see that a Horizon ID has changed from an 11-digit 2444..... number to a 12-digit 6... number. I saw it with NPO 3 and now with tve. As long as Horizon does not deliver the xmltvid, you notice it because a separate channel line is created, and it is a simple adjustment in the code. Otherwise the xmltvid will also change, and if the channel was active it will probably become inactive. Luckily there are not many of those. For now all that can be done is to keep an eye on it.

Ignore old-style cache files

Warn and quit when an old-style cache file is found (perhaps the code should put the version in the cache!)

Timeout or other cause for process to hang

For the last few weeks, every other day the script hangs and has to be killed and restarted to finish the collection of channel data.
I am only collecting for about 25 channels, so the daily update should only take about an hour.

Often only one restart of the script on the same day is enough to finish, but today it is already taking up to 3 restarts and is still (actively) busy.
So perhaps it is waiting for a very long (indefinite?) time-out, or waiting to find something on the page which isn't there (e.g. error-page).

Errors when running --configure

I get the following error when running 'tv_grab_nl.py --configure':

File "/usr/bin/perlbin/vendor/tv_grab_nl.py", line 783
self.teveblad_genericnames = {"ochtend- en dagprogramma's",
(The problem lies at the comma, before the slash)

I'm using python 2.6:
Python 2.6.6 (r266:84292, Nov 18 2011, 05:12:23)
[GCC 4.5.1] on linux2

I tried v2.1.7 and v2.1.9-beta of the script.
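
A likely explanation, offered as an assumption rather than a confirmed diagnosis: the quoted line opens a set literal with braces, and that syntax is only accepted from Python 2.7 onward, so Python 2.6 stops at the first comma. The sketch below shows the equivalent form that also works on 2.6. Only the first entry comes from the quoted line; the second is a made-up placeholder.

```python
# set([...]) is accepted by Python 2.6, unlike the {...} set literal.
generic_names = set([
    "ochtend- en dagprogramma's",
    "hypothetical second entry",  # placeholder, not from the script
])

is_generic = "ochtend- en dagprogramma's" in generic_names
```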
