
Comments (56)

Narflex avatar Narflex commented on July 26, 2024

The subtitle handlers currently load everything when the player loads and then never look back to see if there's more information available. What's probably not that hard to do is, in something like sage.media.sub.SRTSubtitleHandler.loadSubtitlesFromFiles(MediaFile), detect whether that MediaFile is currently recording. If it is, don't close the file after you finish reading what's in it; instead, spin up a thread that regularly checks the file for more data and adds the additional subtitle entries (and then terminate the thread and close the file in the cleanup() method). However, this gets more problematic in the context of SageTVClient, because then you're not dealing with a local file. In that case, you'd need to keep checking with the SageTV server for more content and then download and parse it as that happens (although I'd just write it to a temp file, like it does already via getLoadableSubtitleFile, and keep appending to that so all the other logic stays universal).
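The "read what's complete, come back for the rest" half of this idea can be sketched independently of SageTV's classes. Below is a minimal incremental SRT parser (all names are hypothetical, not the actual sage.media.sub.SRTSubtitleHandler code) that consumes only entries terminated by a blank line, so a partially written entry at the tail of a still-growing file is left for the next poll:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; SageTV's real parser differs in structure.
public class SrtChunkParser {
    public static class Entry {
        public final long startMillis, endMillis;
        public final String text;
        Entry(long s, long e, String t) { startMillis = s; endMillis = e; text = t; }
    }

    // Parses only complete entries from the chunk read so far; a trailing
    // entry with no blank-line terminator yet is assumed to still be
    // mid-write and is left for the next poll of the growing file.
    public static List<Entry> parseComplete(String chunk) {
        List<Entry> out = new ArrayList<Entry>();
        String[] blocks = chunk.split("\n\\s*\n");
        int usable = chunk.endsWith("\n\n") ? blocks.length : blocks.length - 1;
        for (int i = 0; i < usable; i++) {
            String[] lines = blocks[i].split("\n");
            if (lines.length < 3) continue; // need index, timing, and text lines
            String[] times = lines[1].split("-->");
            StringBuilder text = new StringBuilder();
            for (int j = 2; j < lines.length; j++) {
                if (j > 2) text.append('\n');
                text.append(lines[j]);
            }
            out.add(new Entry(toMillis(times[0]), toMillis(times[1]), text.toString()));
        }
        return out;
    }

    // Converts an SRT "HH:MM:SS,mmm" timestamp to milliseconds.
    static long toMillis(String ts) {
        String[] p = ts.trim().split("[:,]");
        return Long.parseLong(p[0]) * 3600000L + Long.parseLong(p[1]) * 60000L
             + Long.parseLong(p[2]) * 1000L + Long.parseLong(p[3]);
    }
}
```

A monitoring thread would remember how many bytes it has already consumed, re-read from that offset on each pass, and feed new complete entries to the handler until cleanup() stops it.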

The multiple-segment recording problem is a separate issue. It's currently designed so that the subtitle information is part of the media format, and inside of that it contains the path to the subtitle file. Formats apply to the recording as a whole, not to each individual segment; that's where the disconnect lies. In MediaFile.checkForSubtitles() you can see it uses the first file as the base filename when checking for the external subtitle files. A less-than-ideal workaround is to pass a segment number into that function (and change the various things that call into it), which would work fine except that it would then update the file's format accordingly. The only real negative impact I can think of is that if multiple clients try to play this file at nearly the exact same time, they could end up reading each other's subtitle location (but this is probably better than what currently happens in this case). This is all a moot point if the first thing above is done. :)

from sagetv.

CraziFuzzy avatar CraziFuzzy commented on July 26, 2024

Is there a real reason to keep subtitle info IN the media format object? Seems to make more sense to load it during playback, in essentially the same way the comskip playback plugin reads in .EDL files (including the way it treats each segment separately).


Narflex avatar Narflex commented on July 26, 2024

It does actually redetect them before playback every time. There's also a special metadata attribute called VARIED_FORMAT which also causes the MiniPlayer to redetect the format for each individual segment in a multi-file recording (to fix an issue I had with PIDs changing mid-program on FIOS). These can likely be used to make this easier.

Jeff Kardatzke
Sent from my Android


enternoescape avatar enternoescape commented on July 26, 2024

The re-detection is what I observed. I was just starting to take a stab at this tonight. I feel like I'm missing something obvious, but what's the best way to determine whether a MediaFile is currently a live recording?


Narflex avatar Narflex commented on July 26, 2024

I believe there is a isRecording() call in it. :-)

Jeff Kardatzke
Sent from my Android


enternoescape avatar enternoescape commented on July 26, 2024

Yeah, that was too obvious. :) I interpreted that as just saying it's a recording, not that it was actively recording.

Yeah, that looks like it will be true for anything that matches MEDIA_MASK_TV.


enternoescape avatar enternoescape commented on July 26, 2024

I don't think that will have the desired effect. I don't want to poll a file for changes if the corresponding recording is not actively being written. I think isAnyLiveStream() is closer to what I'm looking for.


CraziFuzzy avatar CraziFuzzy commented on July 26, 2024

IsFileCurrentlyRecording(sage.MediaFile MediaFile) is probably more appropriate, but I think it's far better to check the actual .SRT file(s) on an interval instead (like the way .EDL files are monitored), whether the file is actively being recorded or not.


enternoescape avatar enternoescape commented on July 26, 2024

I guess the SRT file could be created after the recording has completed while you're watching it, though at least in the case of CCExtractor, it's so fast compared to Comskip that this would be a rare occurrence. SRT files are timing-sensitive, and that could result in somewhat aggressive polling of the file, which can create a surprising amount of stress on the file system, as I learned with OpenDCT. My thought right now is that instead of polling the file at fixed intervals, it's only polled once we have reached the last known subtitle, to see if we can load any more; if we can't, it falls back to timed polling.

It's important to know that if you start playback and the SRT file doesn't already exist, this monitoring will not even happen. That's partly why I don't think it's necessary to do this the same way Comskip does it.


enternoescape avatar enternoescape commented on July 26, 2024

I looked up IsFileCurrentlyRecording because it sounded familiar. It's part of the API, which I don't really need to abide by when writing core features. It basically just finds the MediaFile in question and checks that the file is in the database and isRecording(), which is basically what was suggested earlier. Knowing that, and knowing that just matching the mask MEDIA_MASK_TV is enough to make it return true, I now find it a little confusing that it does in fact return the correct value when a show is not recording. I would have assumed the mask doesn't change just because the recording stopped, but I guess that just means I don't understand the meaning of the mask.


stuckless avatar stuckless commented on July 26, 2024

@enternoescape - not 100% sure what you are after... but isFileCurrentlyRecording will only return true if the file is being recorded right now. It won't return true tomorrow, for example.

But if you are looking to figure out "when" a new recording is happening... you can use the event API system, and RecordingSegmentAdded, for example.

* RecordingSegmentAdded (fired when an active recording transitions to a new file but would not have fired a RecordingStarted event) - vars: MediaFile

There are a number of events in there that cover things like recording started, completed, etc, or even when a mediafile is being watched, paused, stopped, etc.
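As a rough illustration of reacting to that event, the sketch below flags a subtitle re-scan when a segment rolls over. It is standalone and does not implement the actual sage plugin interface; the (eventName, vars) callback shape and the "MediaFile" variable name follow the comment above but should be treated as assumptions:

```java
import java.util.Map;

// Standalone sketch; the callback shape mirrors SageTV's event API as
// described in the thread, but the exact signature is an assumption.
public class SegmentEventSketch {
    private Object currentSegmentFile;
    private boolean rescanQueued;

    public void sageEvent(String eventName, Map<String, Object> vars) {
        if ("RecordingSegmentAdded".equals(eventName)) {
            // The active recording rolled over to a new file: queue a
            // re-check for per-segment subtitle sidecar files.
            currentSegmentFile = vars.get("MediaFile");
            rescanQueued = true;
        }
    }

    public boolean isRescanQueued() { return rescanQueued; }
}
```

This event-driven approach avoids blind interval polling for the segment-transition case, though the growing-SRT-file case still needs its own monitoring.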


CraziFuzzy avatar CraziFuzzy commented on July 26, 2024

I think he was not looking to see if the file was a recording, but whether it was actively recording, because he's working off the assumption that .SRT files are only going to change or appear while the file is still recording.


enternoescape avatar enternoescape commented on July 26, 2024

@stuckless Thanks for the pointers.

I think I'll just write it so that if the subtitle file exists when you start, and the file is any kind of recording, active or not, it will monitor for changes. The reason I was so focused on live-only is that the experience is kind of bad if you actually need the subtitles to be able to watch the show; that means the SRT file needs to be written at the same time as the recording.

While this should be good enough for many people, the real solution will be integrating a closed captions extractor that will make the captions accessible just ahead of the actual video playback.


CraziFuzzy avatar CraziFuzzy commented on July 26, 2024

I mean, an ultimate solution would be to go with something more robust than .SRT anyway. SRT doesn't even include all the info that CC contains; CC has positioning information that is lost in the SRT format. Ultimately, we need STV logic that will read in a much more robust format (like .ASS) and render it out to the screen as STV widgets. That's a whole lot bigger project than just changing how already-supported formats are read in, though.


enternoescape avatar enternoescape commented on July 26, 2024

That's close to what I'm talking about, but if the core is doing all of the decoding, we don't really even need that information to be stored in a file. It could be decoded from the A/V container as it's needed. We'd just create a new SubtitleHandler.


enternoescape avatar enternoescape commented on July 26, 2024

@Narflex I often see Vector used in the source code. I can understand why some of those uses are probably there, but at least in SubtitleHandler we are already synchronizing externally, and then we use a Vector, which also synchronizes internally. I think an ArrayList is a better choice in this case. I would usually just leave something like this alone, but since the client will possibly need to keep downloading and parsing the file, depending on the number of subtitles, it could actually become a performance issue.

Also, if it's all the same to you, when it's relevant to what I'm working on, I would like to actually provide the type parameters on Map objects. It just looks cleaner in the IDE and will reduce human error.


enternoescape avatar enternoescape commented on July 26, 2024

I think I found a much nicer solution to the Windows client not having direct access to the file. I wrote a new class called MediaServerInputStream that basically lets you open a remote file as an InputStream via the MediaServer, with a little local buffering to keep small data requests from being a problem. The advantage is that I no longer need to copy the file locally and then read it. I'm sure this could help improve overall efficiency in other similar situations too.
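The core trick here can be sketched generically: wrap a "read N bytes at offset" remote primitive in an InputStream that fetches coarse blocks, so per-byte reads don't each become a network round trip. All names below are illustrative, not the actual MediaServerInputStream code:

```java
import java.io.InputStream;

// Illustrative only: RemoteSource stands in for a MediaServer request.
public class BufferedRemoteStream extends InputStream {
    public interface RemoteSource {
        // Reads up to len bytes at the given file offset into dst;
        // returns the number of bytes read, or -1 at end of file.
        int readAt(long offset, byte[] dst, int len);
    }

    private final RemoteSource src;
    private final byte[] buf;   // local cache so tiny reads stay local
    private int bufLen, bufPos;
    private long fileOffset;

    public BufferedRemoteStream(RemoteSource src, int bufSize) {
        this.src = src;
        this.buf = new byte[bufSize];
    }

    @Override
    public int read() {
        if (bufPos >= bufLen) {
            // One coarse network fetch refills the local buffer.
            bufLen = src.readAt(fileOffset, buf, buf.length);
            if (bufLen <= 0) return -1;
            fileOffset += bufLen;
            bufPos = 0;
        }
        return buf[bufPos++] & 0xFF;
    }
}
```

With this shape, anything that consumes an InputStream (including a BufferedReader with charset detection) works identically whether the backing file is local or remote.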


CraziFuzzy avatar CraziFuzzy commented on July 26, 2024

How big are the .SRT files in general? Are they too big to be brought in with GetFileAsString (which is server based)? (Not that I'm against an InputStream method being implemented, but I like to keep UI Related things based through the API, and done IN the UI).


CraziFuzzy avatar CraziFuzzy commented on July 26, 2024

Though, thinking about it, the absolute BEST thing might be to roll up a Subtitles class and have that read in .SRT, .SSA, .ASS, etc. on the server, and serve up an actual object with methods for the list of caption events (keyed by timestamp). Caching and new-file checking could be done on the server, so on the first request for the events, if the file isn't parsed and loaded, the server does that, and it checks the file timestamp on each access to see if new information is there to be brought in. This way, clients aren't dealing with files at all, just function calls.


enternoescape avatar enternoescape commented on July 26, 2024

GetFileAsString still pulls the entire file over the network. Unlike Comskip files, these files could add measurable overhead if we are constantly transferring the whole file on a live stream. Also unlike Comskip files, SRT files don't change anything that's already been written. And the entire file would need to be parsed every time, or at the very least you'd need to seek back to where you left off; either way it's inefficient in a way I'm sure someone would notice during playback. With an InputStream, I can read the file until it says there's nothing else to read, then come back later to see if there's more. This can also be reused for many things that would go a little faster if we parsed the data as it comes in instead of doing a complete download and then reading.

Subtitles to me are one of those special cases, since they are useless if they aren't on time. I'm sure Jeff can fill in the blanks for you on why it's done the way it currently is. I'm not against doing a complete rewrite if it will take weeks, but I'm not really interested in a month-plus project that has no visible benefit. If there's a reason other than convenience why the subtitles are processed by the client instead of the server, I'm probably going to leave that alone. I suspect it has something to do with timing; the local client is definitely going to be faster than the server at getting things out on time.

It's actually very atypical for a subtitle file to be growing as you're using it. I don't know of any programs that currently support this. It's probably not worth going much further than making this work well with what is available. I feel like closed captioning being extracted by the server might merit completely renovating this system, but given my limited time and the other things I want to do that will have a much more noteworthy impact on the SageTV experience, I'm likely going to get this to the point that it works very well and leave the kind of changes you're suggesting up for a later date.


enternoescape avatar enternoescape commented on July 26, 2024

@Narflex From what I can tell, if we go past the last subtitle, the value returned basically tells VideoFrame to come back in a week. I was trying to follow the logic there, but I can't tell with complete certainty what it does with this value. Will it actually not check again for that amount of time? If so, could I tell it the next subtitle is in 2 seconds when a new subtitle isn't available yet, or would that cause a problem?


enternoescape avatar enternoescape commented on July 26, 2024

It turns out 1 second polling is better than on demand. I underestimated how much time it can take to parse new entries. The good news is I'm right at live and all of the subtitles are loading just in time every time, so that's great.


enternoescape avatar enternoescape commented on July 26, 2024

The new MediaServerInputStream class I created is working terrifically too. I was a little concerned that it would have all kinds of little quirks, but it's working as well as direct access. I modified IOUtils.openReaderDetectCharset() to accept a boolean parameter, local, that tells it whether to return a BufferedReader backed by a FileInputStream or a MediaServerInputStream. This makes things effectively seamless between a local and a remote file.

One of the other performance issues I found was that the original code created a new StringBuffer for every single subtitle. I replaced that with a single local StringBuilder that we clear for each new subtitle entry. I also added some handling for when we can't be completely sure we got the entire subtitle, so it will go back and try one more time on the next poll.
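The allocation fix amounts to reusing one builder and resetting it with setLength(0) between entries, roughly like this (a hypothetical helper, not the handler's actual code):

```java
import java.util.List;

// Hypothetical helper illustrating the reuse pattern described above.
public class SubtitleTextBuilder {
    // One builder reused for every entry instead of a new StringBuffer each time.
    private final StringBuilder sb = new StringBuilder(256);

    public String joinLines(List<String> lines) {
        sb.setLength(0); // clears the previous entry's text, keeps the backing array
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) sb.append('\n');
            sb.append(lines.get(i));
        }
        return sb.toString();
    }
}
```

setLength(0) keeps the already-grown backing array, so steady-state parsing allocates only the result strings, not a buffer per entry.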

I think I figured out the VideoFrame come-back-in-a-week thing. From observing the logs, it looks like it actually will not come back for a week. When I'm doing this live, I now intercept that value when it's a week and change it to 1 second. The situations that usually lead to running out of subtitles are commercial breaks, since not all commercials have captions.
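The clamp described here is simple enough to sketch; the constants are illustrative, and the real logic lives in the handler/VideoFrame interaction:

```java
// Illustrative constants; not the actual SageTV values or class.
public class SubtitleWaitClamp {
    static final long WEEK_MILLIS = 7L * 24 * 60 * 60 * 1000;
    static final long LIVE_RECHECK_MILLIS = 1000;

    // The handler signals "no more subtitles" with a huge wait time; while
    // the recording is still live, cap it so VideoFrame re-checks soon.
    public static long clampWait(long waitMillis, boolean stillRecording) {
        if (stillRecording && waitMillis >= WEEK_MILLIS) return LIVE_RECHECK_MILLIS;
        return waitMillis;
    }
}
```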

@Narflex Now I'm getting down to the harder bits. I can see what you were talking about with just changing the format every time the segment changes, but that really does feel a little hacky. If I understand correctly, the race here is that two miniclients could load different segments at the same time?


enternoescape avatar enternoescape commented on July 26, 2024

@stuckless I did find something interesting when using the Android miniclient with live subtitles. Occasionally they clobber each other and often they aren't available in time. Reloading the video doesn't fix the clobbering. I can't recall if I've seen this before, but as you might guess it only happens when there's a lot being said. The Windows Client, Placeshifter, HD200 and HD300 don't have this issue. The Android miniclient appears to run a little closer to the edge of the stream, so that might be part of the problem.

I know there has been talk about having the server process the closed captions, but if the Android miniclient is running this far ahead, I think we'll need to find a way to hold it back a second or two. OpenDCT actually extracts the closed captions just before the video stream hits the disk, so it's falling behind even with this advantage. I'm working on further optimizing the loading performance, but I think I've reached a point where things are still improving, but the gains are getting smaller.


stuckless avatar stuckless commented on July 26, 2024

@enternoescape I'm not sure I fully understand the issue, but I'm open to suggestions on how to fix it :) Does this only happen for live TV streaming, or does it happen any time there are a lot of subtitles, even for pre-recorded content? I know the HD300 processes its own subtitles and sends them to SageTV for rendering... in the case of live TV, are you sure it's using the SRT file rather than processing its own subtitles?


enternoescape avatar enternoescape commented on July 26, 2024

@Narflex I feel a little silly now. I was looking at using the InputStream on all of the subtitle loading methods and I noticed that VobSubSubtitleHandler uses a RandomAccessFile which led me to notice the NetworkRandomFile class that does a lot of what I just did to create the InputStream.

I noticed that NetworkRandomFile doesn't allow writing. Is that just because it's unsafe, or for security purposes? From what I can tell there's nothing preventing you from opening a file using WRITE instead of READ. In my InputStream, I added a method to create an OutputStream that determines the correct remote offset (taking into account the offset in the local cache) before writing, and then effectively clears the local cache on any write, to ensure a future read returns the data that's actually on disk, not what's in the local cache. I also added a truncate command to the media server, so we have the full complement of what FastRandomFile is allowed to do.

I noticed NetworkRandomFile as I was writing an interface to abstract the methods of RandomAccessFile that FastRandomFile actually uses, so that I could transparently swap out RandomAccessFile for a class that opens files via the media server. I had pretty much finished creating my own implementation before I noticed your work; this is what happens when you go on vacation... :) I'm going to do some benchmarks just to see if there's any advantage between what I wrote and what you wrote.


enternoescape avatar enternoescape commented on July 26, 2024

@stuckless In all cases, I'm talking about SRT files. After more thorough analysis, the disruption does appear to be caused by the thread locking needed to add new subtitles. I'm working on making the locking as brief as possible, but it does appear to sometimes cause timing issues in the miniclient. I'm guessing the other clients cache the subtitles a little. The same subtitles don't always clobber each other. I also saw the timebar flicker at the same time as one of the clobber "events," so I guess the locking is gumming up the works, so to speak. It didn't matter if it was live or not.

I don't know if you have seen flickering involving the timebar or other elements before, but I think whatever is causing that is probably the biggest culprit. Either way, I guess this wasn't really the right place to bring it up. I'll dig a little deeper and probably end up creating an issue on the Android miniclient project.

Actually, I just tried this with the latest jar on Bintray for reference, and the flickering and clobbering still happen there, so I guess I'm just noticing an issue that already existed. Turning off subtitles stops the timebar from sometimes flickering in time with the subtitles. I think this can probably be lumped in with the flickering and animation issues some forum members have reported. It's hard not to appreciate how good the miniclient is, especially given how quickly it became usable to the general public.


enternoescape avatar enternoescape commented on July 26, 2024

I have things optimized to the point that I can load 89 subtitles in under 37 ms on an average computer. On a slower computer, I was able to parse 371 in 973 ms. That's fast enough to update the subtitles on demand like I wanted to do before. I observed that the average caption is on screen between 1.5 and 3 seconds, so I'll set the polling to happen every 5 seconds or whenever we have run out of subtitles. That appears to be fast enough to have no measurable impact on playback.
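The resulting polling policy (a fixed interval, but an immediate check once playback passes the last parsed subtitle) can be expressed as a small pure function. Only the 5-second interval comes from the text; the rest is a sketch:

```java
// Sketch of the polling policy described above; names are hypothetical.
public class SubtitlePollPolicy {
    static final long INTERVAL_MILLIS = 5000;

    // Returns how long (ms) to wait before the next check of the SRT file.
    public static long nextPollDelay(long nowMillis, long lastPollMillis,
                                     long lastSubtitleEndMillis, long playbackMillis) {
        if (playbackMillis >= lastSubtitleEndMillis) return 0; // ran out: poll now
        long elapsed = nowMillis - lastPollMillis;
        return Math.max(0, INTERVAL_MILLIS - elapsed);
    }
}
```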


Narflex avatar Narflex commented on July 26, 2024

@enternoescape Regarding Vector, feel free to change those to ArrayList in areas where they don't need synchronization. This was from habitual use of Vector in my earlier days. :) Also feel free to use typed collections; a lot of this code was written before those even existed, which is why it wasn't in there.

For reading files from the server, there's already NetworkChannelFile and NetworkRandomFile (the former supports use of NIO Buffer objects for reading). So you may just want to use those (they extend the FasterRandomFile class, which implements DataInput and DataOutput, so they should be usable for just about any case)... which, after reading more comments, I now see you have found. :) The only reason I never added write support was that I never had a need for it from those classes.

For changing the format in the MediaFile objects each time you load a different segment; your assumption is correct, you don't want to do that because it'll break other clients that might be playing a different segment at the same time.

Let me know if I've missed any outstanding questions.


enternoescape avatar enternoescape commented on July 26, 2024

I had a feeling these were just carried over from the older days of Java.

I don't know how much the JVM will optimize things for this situation, but the InputStream implementation I wrote is a lot lighter than DataInput partly because it doesn't implement all of those other methods that will never be used when reading lines of characters. I'd need to write a wrapper class over top of NetworkChannelFile or NetworkRandomFile, to make getting a BufferedReader from IOUtils.openReaderDetectCharset() a seamless experience.

Is there a technical reason why, in the current code, the subtitle files for the client are always copied locally and then read? Is there a performance penalty? For example, I see that for VobSub you could have used NetworkRandomFile to read the sub files, but instead you copy the entire file locally and parse it there.


Narflex avatar Narflex commented on July 26, 2024

I didn't think there would be a case for live subtitle files at the time I wrote it, so it just seemed easier to go the temp-file route via a copy. It's likely more performant to do it the other way, unless you are going to be skipping backwards many times and reloading the same data. But either way, the difference would be negligible compared to anything else going on in the system.

Jeff Kardatzke
Sent from my Android


enternoescape avatar enternoescape commented on July 26, 2024

I was just making sure I wasn't repeating history. You'll know better than me if something was tried and the results were worse than expected. I agree that it should be faster to parse the stream as it comes in instead of copying it over and then parsing. I also agree that in the grand scheme, it's probably not the biggest overall performance impact.

I would like to add the NetworkInputStream class I wrote to the "Network" classes, since it's very purpose-built as a lighter version of the bigger classes that extend FasterRandomFile. The class I wrote does buffering similar to FasterRandomFile to keep really small reads from making things slow. The problem is that InputStream is abstract, so there's no clean way to pull these all together without being a little clunky, so I'd rather make it its own class.


enternoescape avatar enternoescape commented on July 26, 2024

I was looking more at what I can get out of NetworkRandomFile, which should best suit what an InputStream needs. The DataInput interface is implemented in FasterRandomFile (not NetworkRandomFile), so the methods I actually need for an InputStream are not exposed at the right level, and we don't actually appear to get any buffering benefits from FasterRandomFile for the same reason. For example, I can get the method read(byte b[], int off, int len), but it won't be reading the network file; it will be reading the RandomAccessFile that never gets opened, because the implementation of the method is in FasterRandomFile.

I could write the missing methods into NetworkRandomFile, but I feel like this is the wrong way to get where we're going and will only lead to further duplication of efforts between local and remote files down the road. I know this is a rather core part of how SageTV does client communications, but I think this merits a cleanup. What I want to do is try to standardize these things a little better so we are not duplicating so much at the higher levels and will nearly eliminate any mistakes you can easily make when you extend a class in this manner.

The only difference when opening a file should be whether it is accessed locally or via the media server. The RandomAccessFile in FastRandomFile should be changed to an interface that can be swapped between the local and remote file communication protocols. I have deduced what's actually used from RandomAccessFile, and it's a very small number of methods. It's not really a huge ordeal to move these into an interface; I tried it and it works well. I know there are special transcoding methods in NetworkRandomFile, but my proposal doesn't require eliminating that class (though it becomes less necessary); we would just eliminate the methods involved in data transfers and have a way to communicate special "directions" like transcoding to the media server through the RandomAccessFile replacement.

I realize this looks like I'm fixing something that's not broken, but I really think this is one of those changes that will work out for the best in the long run.
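The proposed abstraction might look something like the following. The method set is an illustrative guess at what FastRandomFile needs, not the audited list from the source, and the in-memory implementation is only there to show that a non-RandomAccessFile backing drops in cleanly:

```java
import java.util.Arrays;

// Guessed minimal surface; the real audited method set may differ.
interface RandomFileSource {
    int read(byte[] b, int off, int len); // returns bytes read, -1 at EOF
    void write(byte[] b, int off, int len);
    void seek(long pos);
    long length();
    void close();
}

// Trivial in-memory backing, standing in for a media-server-backed or
// local-file implementation behind the same interface.
class InMemoryRandomFile implements RandomFileSource {
    private byte[] data = new byte[0];
    private int pos;

    public int read(byte[] b, int off, int len) {
        if (pos >= data.length) return -1;
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }

    public void write(byte[] b, int off, int len) {
        if (pos + len > data.length) data = Arrays.copyOf(data, pos + len);
        System.arraycopy(b, off, data, pos, len);
        pos += len;
    }

    public void seek(long p) { pos = (int) p; }
    public long length() { return data.length; }
    public void close() {}
}
```

Everything above the interface (buffering, charset detection, DataInput adapters) would then be written once and shared by the local and remote paths.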


Narflex avatar Narflex commented on July 26, 2024

Go for it. :-)

Jeff Kardatzke
Sent from my Android


enternoescape avatar enternoescape commented on July 26, 2024

I guess I didn't realize how unrelated to this issue my latest suggestion would be. :)

The more I look at this, the more I understand how complex the task of unifying how local and remote files are read in SageTV actually is. Part of the problem is Java: it added ways to use off-heap memory, and that causes a lot of people to duplicate their efforts, because in high-throughput cases you don't always know which approach is going to be faster, especially if you need to do many small reads from a byte buffer (and even more so if it's a direct byte buffer). I saw that you wrote a read and write buffer for NIO, which feels weird, but I think I understand why that was done.

So far, I wrote classes to handle opening local and remote files using IO, and a smaller separate set designed similarly for NIO that implement FileChannel and the interface SageFileChannel. I also wrote "wrapper" classes to add whatever is needed above the actual "file." For example, you can add a buffering layer to the local and remote files, similar to how an InputStream can be wrapped in a BufferedInputStream. In addition, I moved the encryption into its own class that can be used in a similar manner. Every layer is optimized to do what it does well, and it makes it trivial to add new functionality if needed. I also wrote classes that you can wrap these inside of to get the more standard Java abstract classes/interfaces like InputStream, OutputStream, DataInput, DataOutput, and FileChannel. I was careful with DataInput and DataOutput to keep the character/string-related methods exactly how you had them in FastRandomFile. The idea is that when we don't need anything special, it's wasteful to be loading an all-encompassing monolithic class.
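The layering described here is essentially the decorator pattern. A toy sketch of the idea (all names here are hypothetical, not the actual SageTV classes), with an XOR stand-in for the real encryption so the example stays self-contained:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// The base "source" abstraction each layer wraps.
interface ByteSource {
    int read(byte[] b, int off, int len) throws IOException;
    void close() throws IOException;
}

// A memory-backed source standing in for a local or remote file.
class MemorySource implements ByteSource {
    private final InputStream in;
    MemorySource(byte[] data) { in = new ByteArrayInputStream(data); }
    public int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
    }
    public void close() throws IOException { in.close(); }
}

// A trivial XOR "decryption" filter layer: it wraps any ByteSource, and
// because it is its own layer, it can be removed cleanly (unwrap) if the
// underlying file turns out not to be encrypted -- no reopen needed.
class XorFilterSource implements ByteSource {
    private final ByteSource in;
    private final byte key;
    XorFilterSource(ByteSource in, byte key) { this.in = in; this.key = key; }
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        for (int i = 0; i < n; i++) b[off + i] ^= key;
        return n;
    }
    public void close() throws IOException { in.close(); }
    ByteSource unwrap() { return in; }  // drop the layer, keep the file open
}
```

A buffering layer would wrap the same interface in exactly the same way, which is what makes each layer optional and independently testable.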

I have replaced all uses of FastRandomFile, FasterRandomFile, BlurayRandomFile, BlurayNetworkFile, NetworkRandomFile and NetworkChannelFile with the minimum that each one actually requires for a given situation and I really think it has driven down loading times a bit. I'm still testing, but this kind of thing is really just a matter of verifying that the math is correct and the outcome is what you expected. So far everything is working great.

The Wizard is working correctly as it reads the wiz.bin and re-writes it. Because the encrypted filter class is its own layer, when the file is determined not to be encrypted, it can cleanly remove the encryption layer without any need to reopen the file (I know that had this been a great cause for concern, it could have been done with FastRandomFile too, but not as easily). Separating encryption (which, from what I determined, isn't even used in open source) also means we have one less branch in every read and write method. I put some temporary logging in so I could see the random write cache misses and by how much they missed. I optimized the write caching for the Wizard so that it will dynamically do partial writes up to half of the write buffer size when it determines that the next random write would otherwise miss the cache. The performance is very good.
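For readers unfamiliar with write caching of this kind, here is a minimal sketch of the basic mechanism: writes that land inside the current buffered window are coalesced, and a write that would miss the window forces a flush. This is deliberately simplified (it omits the half-buffer partial-flush optimization described above, and it assumes each write fits in the buffer); the class name and structure are hypothetical, not the actual Wizard code:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Minimal write-cache sketch. Assumes len <= buf.length for each write.
class CachedWriter {
    private final RandomAccessFile raf;
    private final byte[] buf;
    private long bufStart = 0;  // file offset of buf[0]
    private int count = 0;      // dirty bytes currently in buf
    long misses = 0;            // diagnostic: how often a write missed the cache

    CachedWriter(RandomAccessFile raf, int bufSize) {
        this.raf = raf;
        this.buf = new byte[bufSize];
    }

    void write(long pos, byte[] b, int off, int len) throws IOException {
        // A miss: the write doesn't fall inside / append to the buffered window.
        if (count > 0 && (pos < bufStart || pos > bufStart + count
                || pos + len > bufStart + buf.length)) {
            misses++;
            flush();
        }
        if (count == 0) bufStart = pos;
        int rel = (int) (pos - bufStart);
        System.arraycopy(b, off, buf, rel, len);
        count = Math.max(count, rel + len);
    }

    void flush() throws IOException {
        if (count == 0) return;
        raf.seek(bufStart);
        raf.write(buf, 0, count);
        count = 0;
    }
}
```

The optimization described in the comment above would replace the unconditional `flush()` on a predicted miss with a flush of only the dirty prefix, up to half the buffer size.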


stuckless avatar stuckless commented on July 26, 2024

Sounds like this is a good candidate to unit test :) Unit testing SageTV
code can be a PITA because so many things end up calling into Sage.java,
and that in turn will attempt to load the native parts, etc. But these
classes seem like something that might be able to be unit tested fairly
easily.


enternoescape avatar enternoescape commented on July 26, 2024

I completely agree. I was actually thinking along those lines. I can't test the media server implementation with a unit test since that will cause all kinds of native code to load. Due to the design, I can, however, create a custom source to test that things are working correctly at all of the levels above. The encryption layer also needs native code because the keys it loads are populated by native code, but I could change it so that's not a requirement.


enternoescape avatar enternoescape commented on July 26, 2024

I will mention that these classes cover a lot of functionality, so it might take twice as long to write the tests as it took me to write the classes. I actually found a bug in FastRandomFile when writing a single byte that apparently doesn't happen often, or it would have been fixed. I made the same mistake when I was writing my own write buffer, and that's why I know the logic is in the wrong order. The error would be an out-of-bounds exception.


stuckless avatar stuckless commented on July 26, 2024

The test classes are just a suggestion on my part... I tend to like them,
especially when I refactor code. (I spent last weekend completely
rewriting the Phoenix filename scrapers, and because of the test cases I
was able to quickly find and fix my issues.)

I think if you can quickly write some test cases around it, then do it...
but if it becomes something that is going to take much more time to do,
then it's your call. I've been meaning to restructure the Sage core code to
be less dependent on the native parts (which would help in unit testing
many other areas)... but I didn't get far with that side project :(


enternoescape avatar enternoescape commented on July 26, 2024

I'm going to use TestNG since it's more flexible than JUnit and it's what I'm using for the more important parts of OpenDCT. I'll add the appropriate dependencies to Gradle and make sure the tests are a part of building Sage.jar. Thoughts? Concerns?

I agree that SageTV should be more sectioned up so that there aren't so many strange dependencies in some of the classes and we can avoid loading native when it's not really needed in all cases.
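As an illustration of the kind of check involved, here is a framework-agnostic round-trip sketch; under TestNG this body would simply become an `@Test` method in a parallel test source tree. The class and method names are hypothetical, and plain `RandomAccessFile` stands in for the new file-layer classes:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical round-trip check: write known bytes, reposition, read back.
// No native code, no Sage.java dependency -- exactly what makes these
// file-layer classes unit-testable.
class FileRoundTripTest {
    // Returns the bytes read back so a test framework can assert on them.
    static byte[] seekAndReadBack() throws IOException {
        File tmp = File.createTempFile("sage-io-test", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.write(new byte[]{1, 2, 3, 4, 5});
            raf.seek(2);                  // random reposition
            byte[] in = new byte[3];
            raf.readFully(in);
            return in;                    // expect {3, 4, 5}
        }
    }
}
```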


stuckless avatar stuckless commented on July 26, 2024

TestNG is fine by me... I've never used it... but I'm not doing anything in
SageTV core at the moment, so use whatever you want. Eventually I'll
probably pull in Mockito as well, since I use that a fair bit to unit test
stuff that is harder to decouple or abstract. My only suggestion is to put
the test code in a separate test src folder (normally mimicking the package
space of the original source file(s))... and yeah, having Gradle run the
tests during a build is a good thing.


Narflex avatar Narflex commented on July 26, 2024

@enternoescape One thing I wanted to correct you on is that the crypto key for the Wiz.bin file is not loaded from native code in the open source version. It's in the Sage.java class on line 968.

And the main reason there's all that NIO code in there is for performance (mainly for when running in standalone mode on the HD300; but it benefits PC class execution as well). If you use a FileChannel for reading and then a SocketChannel for your network connection; you can send between the two directly and the bytes may never even touch system memory and just get DMA'd across. Also, if you use direct byte buffers with that same type of setup; then you can avoid ever having to copy stuff up into Java heap memory and keep it all at the native layer (which is faster for when the native code is reading/writing into those buffers since it doesn't need to pin the java objects in memory and can just access the native arrays directly).
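The channel-to-channel transfer described here can be sketched as below. `FileChannel.transferTo` may move fewer bytes than requested, so it has to be looped; the example transfers file-to-file for self-containment, but the destination could equally be a `SocketChannel`. The helper name is illustrative:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

class ZeroCopy {
    // Send [pos, pos+count) from src to dest. On platforms that support it,
    // transferTo can DMA the bytes without copying through the Java heap.
    static long sendRange(FileChannel src, long pos, long count,
                          WritableByteChannel dest) throws IOException {
        long sent = 0;
        while (sent < count) {
            long n = src.transferTo(pos + sent, count - sent, dest);
            if (n <= 0) break;  // transferTo may send fewer bytes than asked
            sent += n;
        }
        return sent;
    }
}
```

Note that `transferTo` does not modify the source channel's position, which makes it convenient for serving ranges of a recording.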

And everything else sounds good...and I'm very curious about the bug you found in FastRandomFile...I'm surprised a bug survived in that class since it's been there since the beginning.


enternoescape avatar enternoescape commented on July 26, 2024

In OpenDCT I use direct NIO buffers for a lot of the native code, which, as you said, keeps the data from ever coming into the JVM. That alone cut down significantly on garbage collection, and it's faster. I was mostly just commenting on the NIO buffering, where I believe the expectation is that you work with a ByteBuffer directly. What you have is closer to the way RandomAccessFile works, which is certainly less awkward for processing large amounts of data. I wouldn't mind revamping the MediaServerRemuxer to accept and output to a direct ByteBuffer, but it's questionable whether it's worth it.

Never mind about that bug I spoke of; it was a bug for me because I did something different. The issue arose if you did a write from an array that ended exactly at the end of the write buffer's byte array; then, if you did a single-byte write, you would get an index out of bounds exception. After looking a second time, I don't think that would be a problem for your class. False alarm! :)


enternoescape avatar enternoescape commented on July 26, 2024

Also, the crypto key location was odd. I assumed it was native because when I looked for how it got there, I couldn't find it. :) Thanks for the correction.


enternoescape avatar enternoescape commented on July 26, 2024

I put FasterRandomFile and FastRandomFile through the random read test I wrote and found what I think is a bug in FasterRandomFile that you might already be aware of. ensureBuffer() calls read() instead of readFully(), which means it can return as few as 0 bytes even when we are not at the end of the file. Sure enough, it consistently threw an EOF exception even though it was positively not at the end of the file. I made a small change to use readFully() and the EOF exception went away. I know that could be optimized better, but the classes I'm writing will probably remove the need for them to exist.
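The distinction at issue is that `read()` may legally return a partial count, so a buffer refill has to loop the way `readFully` does and only treat -1 as end of file. A small sketch of that loop (the helper class is illustrative, not the actual fix):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

class ReadUtil {
    // Read exactly len bytes or throw. A single read() may return fewer
    // bytes than requested (even 0), so we loop; only -1 means EOF.
    static void readFully(RandomAccessFile raf, byte[] b, int off, int len)
            throws IOException {
        int done = 0;
        while (done < len) {
            int n = raf.read(b, off + done, len - done);
            if (n < 0) throw new EOFException("hit EOF after " + done + " bytes");
            done += n;  // n may be a partial count; keep looping
        }
    }
}
```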


Narflex avatar Narflex commented on July 26, 2024

That's very interesting. Historically, when reading from file sources, even
though read says that it may not return all the requested bytes, it always
does. readFully is only ever really needed when reading from network
sources. However, for correctness based on the API, readFully should of
course be used, since the API says that read may not return the requested
number of bytes.

I'm very curious how exactly you were running a test that enabled this to
occur, since in practice I've never seen a read call on a file that has the
data available to read return less than the number of requested bytes.

I looked at the API again, and it's interesting. In
RAF.read(byte[],int,int) it says this:

"Although RandomAccessFile is not a subclass of InputStream, this method
behaves in exactly the same way as the InputStream.read(byte[], int, int)
method of InputStream."

And then if you look at the method in InputStream, it says this:

" The default implementation of this method blocks until the requested
amount of input data len has been read, end of file is detected, or an
exception is thrown. Subclasses are encouraged to provide a more efficient
implementation of this method."

Which is saying it will behave in the same way as readFully. At the Java
layer though, RAF.readFully clearly will keep calling read() until it reads
everything and does not assume that the read method gets it all (likely
because an overriding class could change that behavior).

So now I'm very curious...I'd really like to see the test case you created
that caused read() to behave differently than readFully() in
RandomAccessFile. :)


enternoescape avatar enternoescape commented on July 26, 2024

I spotted the real problem. You're throwing an EOFException, and that's not actually in the contract for the calling method (read()). I usually take at least 4 hours before I post that I don't get something or that I found something interesting, but I feel like I'm doing a great job of reporting nothing. :)


enternoescape avatar enternoescape commented on July 26, 2024

The test does sequential reads while optionally skipping back or forward at random until it reaches the end of the file. The very last read is actually read(), which should return -1 but instead threw an EOF exception. Due to the way things were written, I incorrectly determined that the value returned was not -1 when in fact it should have been, but an exception was thrown instead. I know it sounds odd that you could confuse these two things, but the symptom was an index out of bounds exception, which should only happen if somehow the last read didn't report the end of the file. EOF was not always thrown, which led to the confusion.


enternoescape avatar enternoescape commented on July 26, 2024

I deliberately leave incorrect behavior unresolved in the test cases because if you account for it, the underlying issue might not make the test fail. Also, while I've noticed that unit tests are not the most accurate benchmark, I was able to prove that the new classes load faster and read/write as fast as, or in some cases faster than, the old ones. I was a little concerned that overall performance might not be the same.


enternoescape avatar enternoescape commented on July 26, 2024

@Narflex I was seeing messages starting with "Waiting to send message reply to client until it reaches quanta" for long periods of time when testing the client. I see that message comes from SageTVConnection.java. Is that related in some way to the file system? Any ideas on what I should be focusing on to address it or does it basically imply my server is slow?


enternoescape avatar enternoescape commented on July 26, 2024

You should also know this is only when I'm using the client from another computer. When I use the client on the same Windows server, I never see that message. Is there a distinction between 127.0.0.1 and other addresses?


Narflex avatar Narflex commented on July 26, 2024

You generally shouldn't be seeing those for a long period of time...they
may show up very shortly during times of heavy load. Let me explain what
those mean. :)

Before Google, when the server did an update on the database, in order to
ensure proper synchronization, it would lock the table, then apply the
update, then distribute the update to all clients, and then release the
lock. That way all DBs on all clients and the server were always
synchronized.

At Google, we ran into problems where one box might be having
network-related issues, or got hard powered off, and would then stall all
the other clients
because we had to wait for the network timeout to finish which is a couple
minutes on that socket (and lowering it below that introduced other
problems when there were large GCs, so we had to leave it where it was).

The solution was to allow distributing the DB transactions to clients after
the server released its DB lock. In order to make all this consistent,
there's a 'quanta' value that is stored which is incremented every time
there is any transactional state change on the server. Then, when a client
makes an API call to the server (which may rely on the DB state being
consistent, like if it starts a recording, that MediaFile addition needs to
be pushed to the client before it returns or the client won't have the
object in its DB to actually return), only at that point in time do we need
to ensure synchronization. So when this occurs, the server is waiting for
that particular client to reach the same quanta that the server was at when
the API call completed on the server before it sends the reply to that
client.

There's no special treatment of the 127.0.0.1 client in all of this, it's
just another client. If there's networking issues, then you may also see
this message until those issues resolve...or you may also see this message
repeatedly when a client goes offline in a way where the server doesn't see
the disconnect (like hard power off, or unplugging a network cable); but it
will clear up after the timeout when the server disconnects that client.
Also, if you're changing anything relating to the client/server behavior
that's done across SageTVConnection, then you may have introduced a bug
that's causing them to get out of sync.

Not sure if that'll help much at all or not..but if you give me more
details on when it happens, if it recovers fine, etc. then I may be able to
help you figure out more what's going on.


enternoescape avatar enternoescape commented on July 26, 2024

Thanks for the explanation. That sounds a lot like how new configurations are distributed in clusters. Fortunately, nothing in SageTVConnection was changed by me before I saw this problem. I had a situation where the logging of this actually appeared to slow things down enough that it didn't recover for a very long time. Increasing the modulus value for the logging helped it recover much faster, or it just missed more of the times it happened. I could see the first time the database is populated with EPG data being enough of an event to cause this. It's good to have this in the back of my head, as I can see how the overall performance of this piece could be affected by my changes.


Narflex avatar Narflex commented on July 26, 2024

You can also get a Java stack dump when it's in this state to see what the
threads are doing and figure out where something is blocked.


enternoescape avatar enternoescape commented on July 26, 2024

True. I still forget about how convenient Java makes things like that.


enternoescape avatar enternoescape commented on July 26, 2024

I decided that, at least for now, I'm just focusing on getting live SRT subtitles working as they currently are, without support for multiple files. This is partly because I also redid a large part of the way SageTV accesses files and added tests for io and nio. I also think getting closed captioning decoded by the server would be a greater benefit, and I might take a look at that eventually. If it really becomes a problem for a few people, we can revisit the multiple-file support issue, but that's technically a different issue, and in regard to full recordings, it mostly only affects people who have bigger problems.
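The "live" approach from the top of the thread (keep the file open while the recording is active and poll it for growth) can be sketched like this. The class and `onNewText` hook are hypothetical; in the real handler the appended text would be fed to the SRT entry parser and the thread would be terminated in cleanup():

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Tail a still-growing subtitle file: whenever the file length exceeds what
// we've already parsed, read the new bytes and hand them to the parser.
class SrtTailer implements Runnable {
    private final RandomAccessFile srt;
    private long consumed;          // bytes already parsed
    private volatile boolean stop;

    SrtTailer(RandomAccessFile srt, long alreadyLoaded) {
        this.srt = srt;
        this.consumed = alreadyLoaded;
    }

    public void run() {
        try {
            while (!stop) {
                long len = srt.length();
                if (len > consumed) {
                    byte[] chunk = new byte[(int) (len - consumed)];
                    srt.seek(consumed);
                    srt.readFully(chunk);
                    consumed = len;
                    onNewText(new String(chunk, StandardCharsets.UTF_8));
                }
                Thread.sleep(250);  // poll interval while recording
            }
        } catch (IOException | InterruptedException e) {
            // cleanup() would land here and close the file
        }
    }

    void shutdown() { stop = true; }

    // Placeholder hook: feed the appended text to the SRT entry parser.
    protected void onNewText(String text) { }
}
```

For SageTVClient, the same loop would poll the server for new subtitle content and append it to the local temp file, keeping the parsing logic universal.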

