tvgrabbers / tvgrabnlpy
This version is deprecated; see tvgrabpyAPI.
Home Page: https://github.com/tvgrabbers/tvgrabpyAPI
License: GNU General Public License v2.0
While fixing the cache bug, I noticed that programs with the same name seem to overwrite each other. It's not urgent, but it is odd, so I'll keep an eye on it.
I noticed that data for Belgian channel 2BE is extremely unreliable (i.e. inaccurate). I've made several recordings where the recorded program is different from the one in its metadata (in other words: the program data was incorrect to begin with).
Another example is coming up this evening: on June 17th 2015 2BE will be broadcasting a movie called New Police Story at 20:35. Both 2BE's own website and teveblad.be agree on this. However, Myth thinks the movie will be Source Code.
I'm using Version: 2.1.6-p20150510-beta.
Some channels don't have the correct day. It looks like this only concerns channels where not all programs start on the current day.
For example, when requesting
http://www.tvgids.nl/json/lists/programs.php?channels=3&day=0
The JSON provides:
"51": {
    "db_id": "12342149",
    "titel": "Homeland",
    "genre": "Serie/soap",
    "soort": "Dramaserie",
    "kijkwijzer": "gs3",
    "artikel_id": null,
    "datum_start": "2012-03-11 20:25:00",
    "datum_end": "2012-03-11 21:20:00"
},
While the xml outputs:
btw some other observation looking at the code:
Still looking for a source containing better (with season/episode) information for the sbs group channels (sbs6, net5, veronica and sbs9)
--configure gives an empty configuration file. This is bad.
Looking at the list in depth I spotted some issues resulting in channels not linking with other sources because of typos as well as some missing mappings:
tvgids.tv
bbc-first should be linked with 0-464
ziggo-sport should be linked with 0-466
horizon.tv
Comedy Central Family (672816167176) should be linked with 0-317
FOX Sports 6 (606274087106) should be linked with 1-fox-sports-6
Ziggo Sport (675503655063) should be linked with 0-466
RTL Lounge (672816167174) should be linked with 0-408
vpro.nl
comedycentral should be comedy_central
24kitchen should be 24_kitchen
nieuwsblad.be
bbc_1 should be bbc-1
bbc_2 should be bbc-2
prime-serie should be prime-series
The MTV sources should be split since MTV Vlaanderen has a different schedule than MTV NL:
http://www.mtv.be/schedule/
http://www.mtv.nl/programma
MTV NL is the following:
0-25
1-mtv
5-24443943006
MTV BE is the following:
6-69
8-mtv
The same for Nickelodeon
http://www.nickelodeon.nl/tv-gids
http://www.nickelodeon.be/tv-gids
Nickelodeon NL is the following:
0-89
1-nickelodeon
5-542836775318
Nickelodeon BE is the following:
6-73
8-nickelodeon
tvgids.nl has put a blocking popup in between, asking you to agree. I'm looking at ways to handle this, but for now I will create an automatic fallback to the JSON page after trying only once.
Current result if there is no network:
Now fetching Nederland 1(xmltvid=1) (channel 1 of 3)
Traceback (most recent call last):
File "./tv_grab_nl.py", line 1430, in <module>
sys.exit(main())
File "./tv_grab_nl.py", line 1387, in main
info = get_channel_all_days(id, days, quiet)
File "./tv_grab_nl.py", line 589, in get_channel_all_days
response = opener.open(req)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 394, in open
response = self._open(req, data)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 412, in _open
'_open', req)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
result = func(*args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1199, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1174, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
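The traceback above shows the `URLError` raised by `opener.open(req)` propagating all the way up to `main()`. A minimal sketch of one way to contain it, written for Python 3 (`urllib.request` instead of the script's `urllib2`). The name `fetch_page` and its `opener` parameter are hypothetical and not taken from the script:

```python
import socket
import sys
import urllib.error
import urllib.request


def fetch_page(url, timeout=10, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the fetch fails.

    `opener` is injectable so the function can be exercised offline.
    """
    try:
        response = opener(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except (urllib.error.URLError, socket.timeout) as err:
        # Report the failure and let the caller decide whether to skip
        # the channel, retry later, or stop.
        sys.stderr.write("Cannot open url %s: %s\n" % (url, err))
        return None
```

A caller could then skip a channel when `fetch_page` returns `None` instead of aborting the whole run.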
I just noticed two other problems: teveblad.be structurally returns the wrong date, and there are structural failures on the tvgids.nl detail pages. I'll look into that.
This is not fatal, but an update will follow soon.
Could this perhaps be valuable as a data source?
http://www.rtl.nl/active/epg_data/dag_data/0
And: http://www.rtl.nl/active/epg_data/uitzending_data/771904638412019
The --configure option always creates a ~/.xmltv directory, even if the configuration file is in another directory. It shouldn't do that.
For the past few days I've been getting some errors on TVGids.tv, the one thing in common is that it only appears on the Film1 and HBO channels, so it might be specific to movies:
2015-12-14 04:47:53 : Error processing tvgids.tv detailpage:http://www.tvgids.tv/tv/hollywood-banker/14702652
2015-12-14 04:47:53 : Traceback (most recent call last):
2015-12-14 04:47:53 : File "tv_grab_nl.py", line 8482, in load_detailpage
2015-12-14 04:47:53 : kw_val = d.find('div').get('class').strip()
2015-12-14 04:47:53 : AttributeError: 'NoneType' object has no attribute 'get'
2015-12-14 04:47:55 : Error processing tvgids.tv detailpage:http://www.tvgids.tv/tv/northpole/14702656
2015-12-14 04:47:55 : Traceback (most recent call last):
2015-12-14 04:47:55 : File "tv_grab_nl.py", line 8482, in load_detailpage
2015-12-14 04:47:55 : kw_val = d.find('div').get('class').strip()
2015-12-14 04:47:55 : AttributeError: 'NoneType' object has no attribute 'get'
2015-12-14 04:47:58 : Error processing tvgids.tv detailpage:http://www.tvgids.tv/tv/de-dolle-tweeling-3/14702662
2015-12-14 04:47:58 : Traceback (most recent call last):
2015-12-14 04:47:58 : File "tv_grab_nl.py", line 8482, in load_detailpage
2015-12-14 04:47:58 : kw_val = d.find('div').get('class').strip()
2015-12-14 04:47:58 : AttributeError: 'NoneType' object has no attribute 'get'
2015-12-14 04:48:01 : Error processing tvgids.tv detailpage:http://www.tvgids.tv/tv/the-birdcage/14702664
2015-12-14 04:48:01 : Traceback (most recent call last):
2015-12-14 04:48:01 : File "tv_grab_nl.py", line 8482, in load_detailpage
2015-12-14 04:48:01 : kw_val = d.find('div').get('class').strip()
2015-12-14 04:48:01 : AttributeError: 'NoneType' object has no attribute 'get'
2015-12-14 04:55:50 : Error processing tvgids.tv detailpage:http://www.tvgids.tv/tv/waar-is-het-paard-van-sinterklaas/14716775
2015-12-14 04:55:50 : Traceback (most recent call last):
2015-12-14 04:55:50 : File "tv_grab_nl.py", line 8482, in load_detailpage
2015-12-14 04:55:50 : kw_val = d.find('div').get('class').strip()
2015-12-14 04:55:50 : AttributeError: 'NoneType' object has no attribute 'get'
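Every traceback above fails at the same expression: `d.find('div')` returns `None` and `.get('class')` is then called on it. A minimal sketch of a guard, assuming `d` behaves like an `xml.etree.ElementTree` element (the parser the script actually uses is not shown here). The helper name `extract_class` is hypothetical:

```python
import xml.etree.ElementTree as ET


def extract_class(d):
    """Return the stripped class attribute of the first <div> child, or None."""
    div = d.find('div')
    if div is None:
        # No <div> child on this detail page: nothing to extract.
        return None
    value = div.get('class')
    return value.strip() if value else None
```

With this shape, a detail page without the expected `<div>` yields `None` for `kw_val` rather than an `AttributeError`, and the caller can skip that field.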
Hi, is it normal that I am experiencing 100% CPU load (on only one of 4 CPUs) while running? The CPU gets really hot for the duration of a run, which takes about an hour.
I plan to look into the cause of this myself. First, I converted the script to Python 3 to see if that would make any difference. It does not. :-) Next, I built in some diagnostics/debugging code, and I see that in fetching data for some 20 channels it creates 28 threads. Then it seems to be fetching data for 2 channels concurrently. Once one of these channels finishes, 'threading.active_count()' is reduced by one and a new channel starts being processed.
I will dig into this some more. If anyone has any pointers, then please share.
I am running on an Intel I5-4670 Haswell cpu, Fedora 22 Linux. Earlier I ran with python 2.7, currently python 3.4.
The script is refusing to finish anymore.
Sometimes it's done and just hangs forever:
Detail statistics for 24 Kitchen (channel 29 of 29)
0 cache hits
168 without details in cache
And sometimes I get this error:
Unhandled exception in thread started by
But nothing more.
Report 1:
I have the same problems as everyone else: many gaps in the program guide ('NO DATA' in MythTV).
Another user:
With the experimental version I still see items appearing up to 3 times.
I have not yet dared to load it into MythTV, because it is currently busy recording manually scheduled items; I have only looked at the XML.
On RTL4, for example, a number of programs still appear up to 3 times.
I did delete the cache file before grabbing.
So I suspect there are undoubtedly still some gaps in it (purely based on the fact that programs appear more than once).
It is not consistent, but after recording, season/episode is regularly missing even though it was present before (there is a programID). I thought I had addressed this before in the cache query, but ...
Hello gurus,
I need your help. I have been using tv_grab_nl.py on my Synology for a long time for the EPG from tvgids.nl, but for some time now it no longer works. I get these errors (fetch failed or timed out):
MediaStation> ./tv_grab_nl_py --output /root/.xmltv/tv_grab_file.xmltv
Now fetching NPO 1(xmltvid=1) (channel 1 of 121)
Deleting duplicate: Journaal
Deleting duplicate: Journaal
Deleting duplicate: Journaal
( 0%) 94: Studio sport [normal fetch] fetch failed or timed out
47: Jinek [normal fetch] fetch failed or timed out
98: De rijdende rechter [normal fetch] fetch failed or timed out
30: Journaal ^CTraceback (most recent call last):
What should I do to fix this?
Thanks in advance,
Hello,
I use Big Screen EPG to parse the XML into Windows Media Center, and I get an error on the XML file. I have added a snippet:
[code]
<title lang="nl">Bluf</title>
Aflevering 5
Dramaserie: Ook Elise is in de ban van Julian. Of gebruikt ze hem alleen om Mark jaloers te maken? Tjé ten slotte komt in de Amsterdam ArenA de vrouw van zijn dromen tegen. "Ik weet niet wat ik in mijn vorige leven heb gedaan om dit te mogen meemaken, maar het was letterlijk een soort jongensdroom: strippers op schoot en voetballen in de ArenA. Ik krijg een eigen nachtclub en word voor het eerst verliefd", vertelt acteur Géza Weisz, die Tjé speelt, in Grazia. "Het...
Drama
1 . 4 .
12+
Seks
Grof
[/code]
I suspect the letter é is the problem; could it be converted?
Seems something changed on their website during the last week (I first installed tv_grab_nl_py last week), they're now referring to HLN (http://www.hln.be/hln/nl/929/Kanaal-TV/index.dhtml#aanbod). The links it is trying (http://www.teveblad.be/tv-gids/zenders et cetera) indeed give a 404, so I guess this is a quite fundamental change, which may or may not be related to the acquisition of teveblad.be by De Persgroep (see here: http://www.persgroep.be/nl/news/overdracht-van-titels-sanoma-naar-de-persgroep-goedgekeurd).
Of course it would be nice to have this fixed (if at all possible), but more importantly... I think it would be useful to have the ability to disable certain sources (--disable-teveblad, --disable-tvgidstv et cetera), in case of such eventualities? Or maybe there is some setting I overlooked. In any case... now it seems to just get stuck with this...
http://www.teveblad.be/tv-gids/2015-09-08/zenders/bbc2-nl
[repeats for each channel]
Cannot open url http://www.teveblad.be/tv-gids/zenders/: Not Found
teveblad channel info file: /home/tvheadend/teveblad_channels.html not found
update
I decided to see if copying in the 'teveblad channel info file' would help (even though it says it is not necessary anymore?). It seems to have stopped the script from getting stuck. Still, maybe it would be good to look into the above :)
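The request above for a way to switch off individual sources could be sketched as follows. This is only an illustration of the idea: the `--disable-source` flag and the `enabled_sources` helper are hypothetical, not options the grabber is known to have, and the source list only contains names mentioned on this page:

```python
import argparse

# Source names mentioned in these reports; the real script's list may differ.
KNOWN_SOURCES = ['tvgids.nl', 'tvgids.tv', 'teveblad.be', 'horizon.tv']


def enabled_sources(argv):
    """Return the sources left over after applying --disable-source flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--disable-source', action='append', default=[],
                        choices=KNOWN_SOURCES, metavar='NAME',
                        help='skip this source; may be given more than once')
    args = parser.parse_args(argv)
    return [s for s in KNOWN_SOURCES if s not in args.disable_source]
```

For example, `enabled_sources(['--disable-source', 'teveblad.be'])` would leave the other three sources active. A single repeatable flag is used here instead of one flag per source so that new sources need no new option.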
I tested the latest grabber (2.2.8) this morning; it exits on the following check:
# check Python version
if sys.version_info[:3] < (2,7,9):
    sys.stderr.write("tv_grab_nl_py requires Pyton 2.7 or higher\n")
    sys.exit(2)
The error message should also mention the last digit.
My Mythbuntu has 2.7.4 (not entirely sure) and the Ubuntu 14.04 here has 2.7.6.
So is such a new version really required, or is a workaround possible?
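One way to keep the message in step with the check is to build both from the same tuple. A minimal sketch. The names `MIN_VERSION` and `version_error` are hypothetical, and whether 2.7.9 is really required is a question this sketch does not answer:

```python
import sys

MIN_VERSION = (2, 7, 9)


def version_error(current=None, minimum=MIN_VERSION):
    """Return an error message if `current` is older than `minimum`, else None."""
    if current is None:
        current = tuple(sys.version_info[:3])
    if current < minimum:
        return "tv_grab_nl_py requires Python %s or higher (found %s)\n" % (
            '.'.join(map(str, minimum)), '.'.join(map(str, current)))
    return None
```

The caller would write the returned message to `sys.stderr` and exit with status 2, as the original check does.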
Problems have been reported. One about no cache being created and the other about no descriptions being fetched.
I'm awaiting details
I have noticed a whole flood of weird genres. I have traced them to Discovery on tvgids.tv. I'm not yet sure whether to do anything about it, or what.
It seems there are still a few users with Python 2.5 and even 2.4. This is currently not supported.
The script currently does not have a version number. This makes troubleshooting harder.
Automatically add a version number (date, hash id,...) if possible
File "./tv_grab_nl.py", line 1425, in <module>
sys.exit(main())
File "./tv_grab_nl.py", line 1387, in main
get_descriptions(programs, program_cache, nocattrans, quiet, slowdays)
File "./tv_grab_nl.py", line 864, in get_descriptions
sys.stderr.write('\n(%3.0f%%) %s: %s ' % (100*float(counter)/float(nprograms), i, programs[i]['name']))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 13: ordinal not in range(128)
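The traceback above comes from writing a program name containing `é` to a stream that only accepts ASCII. A minimal sketch of encoding explicitly and replacing what the stream cannot represent. The helper name `encode_for_stream` is hypothetical:

```python
def encode_for_stream(text, stream_encoding=None):
    """Encode text for a stream, substituting '?' for unencodable characters.

    Falls back to ASCII when the stream reports no encoding.
    """
    encoding = stream_encoding or 'ascii'
    return text.encode(encoding, 'replace')
```

The progress line would then be encoded with `encode_for_stream(line, getattr(sys.stderr, 'encoding', None))` before being written, so an ASCII-only terminal shows `?` instead of crashing.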
I'm getting no data with the recently added vpro.nl source.
I get the following when I try to run configure after you added the nieuwsblad.be source
Traceback (most recent call last):
File "tv_grab_nl.py", line 11516, in get_channels
strdata = self.get_page(self.get_url('base'))
File "tv_grab_nl.py", line 11494, in get_url
locale.setlocale(locale.LC_TIME, ('nl_NL', 'utf-8'))
File "/usr/lib/python2.7/locale.py", line 579, in setlocale
return _setlocale(category, locale)
Error: unsupported locale setting
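`locale.setlocale` raises `locale.Error` when the requested locale is not installed on the system, which is what the traceback above shows for `nl_NL`. A minimal sketch of trying a few spellings and continuing without a Dutch locale if none is available. The helper name `set_time_locale` and the candidate list are assumptions:

```python
import locale


def set_time_locale(candidates=('nl_NL.UTF-8', 'nl_NL.utf8', 'nl_NL')):
    """Try each locale name in turn; return the one that worked, or None."""
    for name in candidates:
        try:
            locale.setlocale(locale.LC_TIME, name)
            return name
        except locale.Error:
            # Not installed on this system; try the next spelling.
            continue
    return None
```

When this returns `None`, code that relies on Dutch month or day names needs another way to parse them, for example an explicit lookup table.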
Fast update is ok, but with full update:
Traceback (most recent call last):
File "./tv_grab_nl.py", line 1425, in <module>
sys.exit(main())
File "./tv_grab_nl.py", line 1397, in main
xml.extend(xmlefy_programs(programs, id, desc_len, compat, nocattrans))
File "./tv_grab_nl.py", line 1073, in xmlefy_programs
desc.append('%s ' % program[detail_row])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 20: ordinal not in range(128)
Environment:
An awesome reader wrote:
Would it perhaps help if you could also fetch the detail information (in get_descriptions) via JSON? Then both the overview and the detail data would be available in the same way (and hopefully in the same character encoding), without having to parse HTML to find the right fields. I think that would be more stable and faster.
With an educated guess I found this URL: http://www.tvgids.nl/json/lists/program.php?id=12341766
The id to pass is the db_id.
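A minimal sketch of the suggestion above: build the detail URL from a `db_id` and decode the response as JSON. The URL pattern is the one quoted in the report; the helper names are hypothetical, and the response is assumed to be UTF-8 encoded JSON, which has not been verified here:

```python
import json

DETAIL_URL = 'http://www.tvgids.nl/json/lists/program.php?id=%s'


def detail_url(db_id):
    """Return the JSON detail URL for a program's db_id."""
    return DETAIL_URL % db_id


def parse_detail(raw):
    """Decode a JSON response body (bytes, assumed UTF-8) into a Python object."""
    return json.loads(raw.decode('utf-8'))
```

The fields present in the decoded object would still need to be checked against a real response before mapping them onto the script's program records.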
Fox only shows Garage Gold instead of all the shows that are actually on.
I already tried to remove all the files and did a complete re-run which seemed to work, I saw multiple shows, but unfortunately after a few days this went away and only Garage Gold is displayed.
I'm running the latest stable 2.1.5, please let me know if I can upload any files that can be helpful.
Today I got an error on Familie 24. Very strange: I had fetched the latest update with git, and it contained this error:
Now fetching Familie 24(xmltvid=64) (channel 3 of 31)
Traceback (most recent call last):
File "./tv_grab_nl.py", line 1408, in <module>
sys.exit(main())
File "./tv_grab_nl.py", line 1365, in main
info = get_channel_all_days(id, days, quiet)
File "./tv_grab_nl.py", line 627, in get_channel_all_days
for r in v:
TypeError: 'bool' object is not iterable
This breaks old configurations and the linking with other sources on new configurations
A more properly named issue to continue #49 Error with some ids
script fails if a line in the config file does not start with a number (here: starts with the word "channel")
Traceback (most recent call last):
File "/usr/bin/tv_grab_nl", line 1408, in <module>
sys.exit(main())
File "/usr/bin/tv_grab_nl", line 1348, in main
ikey = int(key)
ValueError: invalid literal for int() with base 10: 'channel'
2012-03-14 21:24:16.277 FillData, Error: xmltv returned error code 256
2012-03-14 21:24:16.278 Error in 1:1: unexpected end of file
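The `ValueError` above comes from calling `int(key)` on a line that starts with the word "channel". A minimal sketch of a more tolerant parser, assuming (this is not confirmed by the report) that each channel line consists of a numeric id followed by whitespace and a name. The function name `parse_channel_lines` is hypothetical:

```python
def parse_channel_lines(lines):
    """Map numeric channel ids to names, skipping lines that do not fit."""
    channels = {}
    for line in lines:
        parts = line.strip().split(None, 1)
        if not parts:
            continue  # blank line
        try:
            ikey = int(parts[0])
        except ValueError:
            # e.g. a line starting with the word "channel"
            continue
        channels[ikey] = parts[1] if len(parts) > 1 else ''
    return channels
```

Silently skipping is one choice; printing a warning for each skipped line would make a malformed configuration easier to spot.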
Title says it all.
Add pages from repository to wiki
BBC3 has no genre or anything of that kind. title_split needs a check and a repair:
def title_split(program):
    """
    Some channels have the annoying habit of adding the subtitle to the title of a program.
    This function attempts to fix this, by splitting the name at a ': '.
    """
    #paulp
    # Some programs (BBC3 when this happened) have no genre. If none, then set to a default.
    # .get() also covers the case where the 'genre' key is missing altogether.
    if program.get('genre') is None:
        program['genre'] = 'overige'
    # Nothing to do if an episode title is already set or the program is a movie
    if ('titel aflevering' in program and program['titel aflevering'] != '') \
            or program['genre'].lower() in ['movies', 'film']:
        return
    colonpos = program['name'].rfind(': ')
    if colonpos > 0:
        program['titel aflevering'] = program['name'][colonpos + 1:].strip()
        program['name'] = program['name'][:colonpos].strip()
I would like to use channel info from e.g. teveblad.be for channels (e.g. vijftv) that do not exist at tvgids.nl. I see hints that support for this has been added, but I cannot find these channels in the config file that is created with --configure. How do I tell the grabber to grab this data?
I've only begun using this fork/continuation of the grabber a few weeks ago. I've successfully used version 2.1.3 for a couple of weeks. As of a few days ago no data was being grabbed and I've had to kill the grabber as it was using lots of CPU. Now the grabber isn't doing much of anything at all: it produces no output and also uses hardly any CPU.
According to strace the grabber seems to be indefinitely waiting for a lock to be released. This remains the case even after a reboot.
It is limited to 3 days, though; day=4 returns yesterday's data :-(
There is no point in fetching more than 4 days of data. TVGids returns
today for day 0, tomorrow for 1, the day after tomorrow for 2, the day
after that for 3, and today for 4 and higher. So requesting more than
4 days of data yields nothing extra.
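The observation above amounts to capping the requested number of days at 4. A minimal sketch. The names `MAX_DAYS` and `day_offsets` are hypothetical:

```python
# Per the report above: day=0..3 return distinct days, 4 and higher do not.
MAX_DAYS = 4


def day_offsets(requested_days):
    """Return the day offsets worth requesting, capped at MAX_DAYS."""
    return list(range(max(0, min(requested_days, MAX_DAYS))))
```

So a request for 7 days would only fetch offsets 0 through 3.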
Occasionally I see that a Horizon ID has changed from an 11-digit 2444..... number to a 12-digit 6... number. I saw it with NPO 3 and now with tve. As long as Horizon does not supply the xmltvid, you notice it because a separate channel line is created, and it is a simple adjustment in the code. Otherwise the xmltvid will also change, and an active channel will probably become inactive. Luckily there are not many of those. For now, all that can be done is to keep an eye on it.
Warn and quit when an old-style cache file is found (perhaps the code should put the version in the cache!)
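A minimal sketch of the idea of storing a version in the cache and refusing old-style files. It assumes the loaded cache is a dictionary, and the `'version'` key, `CACHE_VERSION` value and function names are all hypothetical:

```python
import sys

CACHE_VERSION = 2  # hypothetical number for the current cache format


def cache_is_current(cache):
    """True if the loaded cache carries the expected format version."""
    return isinstance(cache, dict) and cache.get('version') == CACHE_VERSION


def require_current_cache(cache):
    """Warn on stderr and return False when the cache is old-style."""
    if not cache_is_current(cache):
        sys.stderr.write('Old-style cache file found; please remove it and rerun.\n')
        return False
    return True
```

The caller would exit when `require_current_cache` returns `False`; an old cache without a `'version'` key is rejected in the same way as one with a different number.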
For the last few weeks, every other day the script hangs and has to be killed and restarted to finish the collection of channel data.
I am only collecting for about 25 channels, so the daily update should only take about an hour.
Often only one restart of the script on the same day is enough to finish, but today it is already taking up to 3 restarts and is still (actively) busy.
So perhaps it is waiting for a very long (indefinite?) time-out, or waiting to find something on the page which isn't there (e.g. error-page).
I get the following error when running 'tv_grab_nl.py --configure':
File "/usr/bin/perlbin/vendor/tv_grab_nl.py", line 783
self.teveblad_genericnames = {"ochtend- en dagprogramma's",
(The problem lies at the comma, before the slash)
I'm using python 2.6:
Python 2.6.6 (r266:84292, Nov 18 2011, 05:12:23)
[GCC 4.5.1] on linux2
I tried v2.1.7 and v2.1.9-beta of the script.
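The line in the error uses a set literal (`{...}`), which Python only accepts from version 2.7 onward, so Python 2.6 reports a syntax error there. A minimal sketch of the equivalent that also parses on 2.6. Only the first entry is visible in the error message, so the rest of the original set is left out here:

```python
# set([...]) parses on Python 2.6 as well as on 2.7 and 3.x;
# the {...} set literal needs Python 2.7 or newer.
teveblad_genericnames = set(["ochtend- en dagprogramma's"])
```

Membership tests such as `name in teveblad_genericnames` behave the same with either spelling.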