alexpoint / subtitlesparser
Multi formats subtitles parser in C#
License: MIT License
Hi there, could you tell me if there's a way to get the result in seconds instead of milliseconds? For example, it converts 00:00:06,848 into 6848. I'd like to see 6,848. Is it possible? Thank you.
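A minimal sketch of one way to do this conversion on the caller's side, assuming the parsed time is an int number of milliseconds. The SubtitleTime class and its method names are hypothetical and not part of the library.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of SubtitlesParser.
public static class SubtitleTime
{
    // Converts a millisecond offset (e.g. 6848) to fractional seconds (6.848).
    public static double ToSeconds(int milliseconds)
    {
        return milliseconds / 1000.0;
    }

    // Formats the offset as seconds with three decimals and a comma
    // as the decimal separator, matching the SRT convention.
    public static string ToSecondsString(int milliseconds)
    {
        string text = ToSeconds(milliseconds).ToString("0.000", CultureInfo.InvariantCulture);
        return text.Replace('.', ',');
    }
}
```

With this helper, SubtitleTime.ToSecondsString(6848) gives "6,848".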
Do you have plans to support writing the WebVTT format?
Console.WriteLine is not an appropriate way to log debug information (example: https://github.com/AlexPoint/SubtitlesParser/blob/master/SubtitlesParser/Classes/Parsers/MicroDvdParser.cs#L77)
Not only does it add noise to console applications, it also crashes when this code is run in a Windows Service environment.
I've managed to build the project successfully targeting .NET Standard.
SubtitlesParser.1.4.7.nupkg.zip
Why do you target .NET Core App 2.1? It is more limiting.
Hey,
Are you going to publish the latest NuGet release?
The current one is out of date (it does not have the Writer classes).
Also, you have a null reference exception in SRT parser.
Line 84. Need to add item.PlaintextLines ??= new List<string>();
P.S. Great library.
There is a problem with upgrading Nuget package
Could not install package 'SubtitlesParser 1.5.1'. You are trying to install this package into a project that targets '.NETFramework,Version=v4.8', but the package does not contain any assembly references or content files that are compatible with that framework. For more information, contact the package author.
You need to limit the Split() to the number of columns from the header row and that should solve it.
Download a .srt file from http://ccsubs.com/ and all the times that are parsed are set to -1.
I expect it is because all of the caption items are rounded to the nearest second so they fail to parse correctly.
Any chance you'd consider dual-licensing under something more permissive, such as an MIT license? I came across this library and think it may be an ideal fit for a current need. Unfortunately, given the gray area around dynamic linkage of GPL libraries (IANAL) I'm afraid it is not something we'll be able to use considering our current license model.
Will certainly fork/contribute any modifications/extensions we distribute if this is something you're willing to do.
Hey, I was trying to install your library but apparently, it does not appear on Nuget... Is there any reason why?
Run this (with http://rg3.github.io/youtube-dl/) to get a TTML file that fails with a null reference.
youtube-dl.exe --write-info-json --write-auto-sub --sub-lang en --sub-format ttml -o %(id)s.mp4 -f mp4 https://www.youtube.com/watch?v=kYB8IZa5AuE
System.ArgumentNullException was unhandled by user code HResult=-2147467261 Message=Value cannot be null. Parameter name: ns ParamName=ns Source=System.Xml.Linq StackTrace: at System.Xml.Linq.XNamespace.op_Addition(XNamespace ns, String localName) at SubtitlesParser.Classes.Parsers.TtmlParser.ParseStream(Stream xmlStream, Encoding encoding)
Hi,
We would like to fork and extend this parser to include WebVTT parsing for a project at our company. However, without a selected license for this project, we're not able to do so. What is your intended license for this parser?
Thanks
Hello sir,
I used this parser in ASP.NET Core 2.1, but it doesn't work. I got an error on this line:
"using (var fileStream = File.OpenRead(pathToSrtFile))"
The error message was: IFormFile doesn't have a definition for "OpenRead".
I replaced OpenRead with OpenReadStream, but it still doesn't work.
Do you have any suggestions, sir?
The position data of the subtitle in a WebVTT file is lost during parsing. How can it be retained? For example:
00:00:02.377 --> 00:00:06.423 align:middle line:79% position:50% size:85%
Attached a sample WebVTT file -
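As an illustration of what retaining these settings could look like, here is a minimal sketch that splits a cue timing line into its start time, end time, and name:value settings. The VttCueLine class is hypothetical and is not how the library's parser works; it only shows that the settings can be recovered from the same line the timestamps come from.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of SubtitlesParser.
public static class VttCueLine
{
    // Splits a cue timing line into start, end, and a dictionary of cue settings.
    public static (string Start, string End, Dictionary<string, string> Settings) Parse(string line)
    {
        // Everything before "-->" is the start time; the rest holds the end time and settings.
        string[] halves = line.Split(new[] { "-->" }, 2, StringSplitOptions.None);
        if (halves.Length != 2)
            throw new FormatException("Not a cue timing line: " + line);

        string start = halves[0].Trim();
        string[] tokens = halves[1].Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length == 0)
            throw new FormatException("Missing end time: " + line);
        string end = tokens[0];

        // Remaining tokens are settings such as "align:middle" or "line:79%".
        var settings = new Dictionary<string, string>();
        for (int i = 1; i < tokens.Length; i++)
        {
            int colon = tokens[i].IndexOf(':');
            if (colon > 0)
                settings[tokens[i].Substring(0, colon)] = tokens[i].Substring(colon + 1);
        }
        return (start, end, settings);
    }
}
```

For the line quoted above, the Settings dictionary would hold align, line, position, and size, which a caller could store alongside the parsed item.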
Hello developer, I got the following error when I parsed the SRT file:
System.FormatException
Message : Parsing as srt returned no srt part.
StackTrace : at SubtitlesParser.Classes.Parsers.SrtParser.ParseStream(Stream srtStream, Encoding encoding)
I guessed the SRT file was invalid or corrupt, but when I use VLC player to play the video, VLC still shows the subtitles from this SRT file. So I think it's a valid SRT file.
It worked with the other SRT files I have, but not with this one. I attach the video and its SRT file for your reference. I also used another package called node srt-to-vtt to convert it, but I got a blank VTT file.
FYI, I used the tool called SubtitleEdit to create the SRT file for this video. Thanks in advance if there's any solution or workaround to fix this issue.
I found an issue: the SrtWriter class writes the StartTime and EndTime with only 2 fractional digits. As a result, the subtitle starts and stops slightly too early during playback. It should be 3 digits instead of 2, because most subtitles use 3 fractional digits.
For example, the original subtitle is:
00:00:06.704 --> 00:00:10.538
The ravenous swarm stretches
as far as the eye can see.
After using SrtWriter it becomes:
1
00:00:06,70 --> 00:00:10,53
The ravenous swarm stretches
as far as the eye can see.
I found this issue is from the following line:
SubtitlesParser/Classes/Writers/SrtWriter.cs
...
string formatTimecodeLine()
{
    TimeSpan start = TimeSpan.FromMilliseconds(subtitleItem.StartTime);
    TimeSpan end = TimeSpan.FromMilliseconds(subtitleItem.EndTime);
    return $"{start:hh\\:mm\\:ss\\,ff} --> {end:hh\\:mm\\:ss\\,ff}";
}
...
I fixed it by replacing return $"{start:hh\\:mm\\:ss\\,ff} --> {end:hh\\:mm\\:ss\\,ff}"; with return $"{start:hh\\:mm\\:ss\\,fff} --> {end:hh\\:mm\\:ss\\,fff}";.
So please update it if you think it's right.
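The difference between the two format strings can be checked in isolation. The sketch below uses only TimeSpan custom format specifiers; the SrtTimecode class is a hypothetical wrapper, not the library's writer.

```csharp
using System;

// Hypothetical wrapper used only to compare the two format strings.
public static class SrtTimecode
{
    // "fff" keeps milliseconds: three fractional digits.
    public static string FormatThreeDigits(int milliseconds)
    {
        return TimeSpan.FromMilliseconds(milliseconds).ToString(@"hh\:mm\:ss\,fff");
    }

    // "ff" keeps only hundredths of a second: two fractional digits, truncated.
    public static string FormatTwoDigits(int milliseconds)
    {
        return TimeSpan.FromMilliseconds(milliseconds).ToString(@"hh\:mm\:ss\,ff");
    }
}
```

For 6704 ms, FormatThreeDigits returns "00:00:06,704" while FormatTwoDigits returns "00:00:06,70", which matches the truncated output reported above.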
Thanks for this awesome plugin and please keep evolving it as you guys are doing amazing job. Cheers.
The WebVTT spec allows cue timestamps beyond 24 hours.
see: https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#Cue_timings
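One reason such timestamps can be awkward in .NET is that the "hh" specifier of TimeSpan formats only the hours component (0 to 23), not the total hours. The sketch below shows one possible way to parse and format timestamps with an unbounded hours field; the LongTimestamp class is hypothetical and assumes exactly three fractional digits, as WebVTT timestamps have.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of SubtitlesParser.
public static class LongTimestamp
{
    // Parses "hh:mm:ss.ttt" or "mm:ss.ttt", where hours may exceed 23.
    public static TimeSpan Parse(string text)
    {
        string[] parts = text.Trim().Split(':');
        if (parts.Length < 2 || parts.Length > 3)
            throw new FormatException("Unexpected timestamp: " + text);

        string[] secondParts = parts[parts.Length - 1].Split('.');
        if (secondParts.Length != 2)
            throw new FormatException("Missing fractional part: " + text);

        var inv = CultureInfo.InvariantCulture;
        long hours = parts.Length == 3 ? long.Parse(parts[0], inv) : 0;
        long minutes = long.Parse(parts[parts.Length - 2], inv);
        long seconds = long.Parse(secondParts[0], inv);
        long millis = long.Parse(secondParts[1], inv); // assumes three digits

        long totalSeconds = (hours * 60 + minutes) * 60 + seconds;
        return TimeSpan.FromTicks(totalSeconds * TimeSpan.TicksPerSecond
                                  + millis * TimeSpan.TicksPerMillisecond);
    }

    // Formats with the total hour count instead of the 0-23 hours component.
    public static string Format(TimeSpan t)
    {
        long totalHours = t.Ticks / TimeSpan.TicksPerHour; // integer math avoids rounding
        return string.Format(CultureInfo.InvariantCulture,
            "{0:00}:{1:00}:{2:00}.{3:000}", totalHours, t.Minutes, t.Seconds, t.Milliseconds);
    }
}
```

For example, LongTimestamp.Format(LongTimestamp.Parse("25:00:01.500")) round-trips to "25:00:01.500", whereas formatting the same TimeSpan with "hh" would show 01 in the hours field.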
Thanks for your great work!
I am using the universal parser, which is supposed to recognize the format.
var parser = new SubtitlesParser.Classes.Parsers.SubParser();
using (var fileStream = File.OpenRead(pathToSrtFile))
{
    var items = parser.ParseStream(fileStream);
}
I use as test a MicroDvd file
https://www.opensubtitles.org/fr/subtitleserve/sub/117068
However it is recognized as a SRT, and therefore I get the following error:
Stream is not a valid Srt format
My current workaround is to use the filename and GetMostLikelyFormat, which correctly guesses MicroDvd. However, since many formats use the .sub extension, I am not sure how robust this method is.
I would be happy with any help :-)
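An alternative to relying on the file extension is to look at the first lines of the content. The sketch below is only an illustration of that idea under simple assumptions (MicroDVD lines start with two frame numbers in braces, SRT timecode lines use a comma before the milliseconds); the FormatSniffer class and the returned names are hypothetical and do not reflect how SubParser decides.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical content-based guesser, not part of SubtitlesParser.
public static class FormatSniffer
{
    // MicroDVD: "{startFrame}{endFrame}text"
    private static readonly Regex MicroDvdLine = new Regex(@"^\{\d+\}\{\d+\}");

    // SRT: "00:00:06,704 --> 00:00:10,538"
    private static readonly Regex SrtTimecode =
        new Regex(@"^\d{2}:\d{2}:\d{2},\d{1,3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{1,3}");

    // Reads up to maxLines lines and returns "MicroDvd", "Srt", or "Unknown".
    public static string Guess(TextReader reader, int maxLines = 50)
    {
        for (int i = 0; i < maxLines; i++)
        {
            string line = reader.ReadLine();
            if (line == null) break; // end of input

            line = line.Trim();
            if (MicroDvdLine.IsMatch(line)) return "MicroDvd";
            if (SrtTimecode.IsMatch(line)) return "Srt";
        }
        return "Unknown";
    }
}
```

A caller could use such a guess to pick the parser to try first, and fall back to the extension-based guess when the result is "Unknown".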
Has the assembly published to NuGet been signed with snk? I see commits in the history indicating that an snk ref was added at one point...
YtXmlFormatParser.Parse causes System.ArgumentException: 'Stream is not in a valid Youtube XML format'
The code is as follows
List<SubtitlesParser.Classes.SubtitleItem> subtitleItems;
var ytSubtitlesParser = new SubtitlesParser.Classes.Parsers.YtXmlFormatParser();
using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(subtitles)))
{
    subtitleItems = ytSubtitlesParser.ParseStream(stream, Encoding.UTF8);
}
YouTube captions attached
yt-video-oPnDOxMXlUc.zip
I am using subliminal to download subtitles, and it seems to default to naming them .en.srt, but some of them are NOT SRT. I used your code to read them and it worked great, but I can't tell whether a file was SRT to start with, so I have to read and rewrite them all instead of rewriting only the few 'wrong ones'.
Thanks for this nuget, very useful for me. Just wondering if it can convert SRT to VTT file or vice versa? For example:
var parser = new SubtitlesParser.Classes.Parsers.SubParser();
List<SubtitlesParser.Classes.SubtitleItem> subtitleItems = null;
using (var fileStream = File.OpenRead(pathToSrtFile))
{
    subtitleItems = parser.ParseStream(fileStream);
}
Then how do I write out subtitleItems into srt or vtt files like srtFormat.srt or vttFormat.vtt?
At the moment I'm using the Node library named SrtToVtt.js to convert srt to vtt, but DotNet Core 3 marks it as Obsolete. So I'm looking for another approach to convert it. Thank you.
I tried to run the Test project but the files are missing.
Manually copying the files from Test/Content to Test/Content/bin solves the issue.
But this step should be done automatically.
The WriteStream method in SrtWriter might be called multiple times. I think the TextWriter writer should not be closed early in this method.
SubtitlesParser/SubtitlesParser/Classes/Writers/SrtWriter.cs
Lines 58 to 72 in 19ab8d3
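One common way to avoid closing the underlying stream is the leaveOpen parameter of the StreamWriter constructor. The sketch below is a standalone illustration; the LeaveOpenWriter class is hypothetical and does not reproduce the referenced SrtWriter code.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not the library's SrtWriter.
public static class LeaveOpenWriter
{
    // Writes the given lines to the stream and flushes, but leaves the stream open
    // so the caller can write to it again.
    public static void WriteLines(Stream stream, params string[] lines)
    {
        // leaveOpen: true means disposing the writer flushes it without closing the stream.
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
        }
    }
}
```

With this approach, calling WriteLines twice on the same MemoryStream appends both batches, and the caller stays responsible for disposing the stream.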
The tests seem to fail with the following output:
Parsing of file 20140120-074450_invalidSub: FAILURE
System.FormatException: Failed to parse as SubtitlesParser.Classes.SubtitlesFormat ---> System.ArgumentException: Stream is not in a valid Srt format
at SubtitlesParser.Classes.Parsers.SrtParser.ParseStream(Stream srtStream, Encoding encoding) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SrtParser.cs:line 104
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, Dictionary`2 subFormatDictionary) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 120
--- End of inner exception stack trace ---
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, Dictionary`2 subFormatDictionary) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 125
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, SubtitlesFormat subFormat) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 85
at Test.Program.Main(String[] args) in C:\Users\galdin\source\repos\SubtitlesParser\Test\Program.cs:line 26
----------------------
Parsing of file 4989_ES.srt: SUCCESS (1092 items - 0% corrupted)
----------------
----------------------
Parsing of file ccSubs_com_pope-francis-speaks-about-religious-liberty_en.srt: SUCCESS (25 items - 8% corrupted)
----------------
----------------------
Parsing of file Children.of.Men.2006.DVD5.720p.HDDVD.x264-REVEiLLE.srt: SUCCESS (985 items - 0% corrupted)
----------------
----------------------
Parsing of file cloudy.with.a.risk.of.meatballs.ttml: SUCCESS (1109 items - 0% corrupted)
----------------
----------------------
Parsing of file Example With Comments.vtt: SUCCESS (2 items - 0% corrupted)
----------------
----------------------
Parsing of file Fight Club_eng.sub: FAILURE
System.FormatException: Failed to parse as SubtitlesParser.Classes.SubtitlesFormat ---> System.ArgumentException: Stream is not in a valid MicroDVD format
at SubtitlesParser.Classes.Parsers.MicroDvdParser.ParseStream(Stream subStream, Encoding encoding) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\MicroDvdParser.cs:line 111
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, Dictionary`2 subFormatDictionary) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 120
--- End of inner exception stack trace ---
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, Dictionary`2 subFormatDictionary) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 125
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, SubtitlesFormat subFormat) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 85
at Test.Program.Main(String[] args) in C:\Users\galdin\source\repos\SubtitlesParser\Test\Program.cs:line 26
----------------------
Parsing of file Game of Thrones - 03x05 - Kissed by Fire.2HD.English.HI.C.orig.Addic7ed.com.srt: FAILURE
System.FormatException: Failed to parse as SubtitlesParser.Classes.SubtitlesFormat ---> System.ArgumentException: Stream is not in a valid Srt format
at SubtitlesParser.Classes.Parsers.SrtParser.ParseStream(Stream srtStream, Encoding encoding) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SrtParser.cs:line 104
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, Dictionary`2 subFormatDictionary) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 120
--- End of inner exception stack trace ---
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, Dictionary`2 subFormatDictionary) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 125
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, SubtitlesFormat subFormat) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 85
at Test.Program.Main(String[] args) in C:\Users\galdin\source\repos\SubtitlesParser\Test\Program.cs:line 26
----------------------
Parsing of file Gangs-of-New-York-DVD-Deadman-2.sub: SUCCESS (748 items - 0% corrupted)
----------------
----------------------
Parsing of file kYB8IZa5AuE.en.ttml: SUCCESS (169 items - 0% corrupted)
----------------
----------------------
Parsing of file No Captions - With comment block.vtt: SUCCESS (No items found!)
----------------------
Parsing of file Orange.is.the.new.black.s01e01.ttml: SUCCESS (813 items - 0% corrupted)
----------------
----------------------
Parsing of file Salvage.SRT: SUCCESS (474 items - 0% corrupted)
----------------
----------------------
Parsing of file The Mentalist - 3x11 - Episode 11.fr.srt: SUCCESS (786 items - 0% corrupted)
----------------
----------------------
Parsing of file timedtext.xml: FAILURE
System.FormatException: Failed to parse as SubtitlesParser.Classes.SubtitlesFormat ---> System.ArgumentException: Stream is not in a valid Srt format
at SubtitlesParser.Classes.Parsers.SrtParser.ParseStream(Stream srtStream, Encoding encoding) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SrtParser.cs:line 104
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, Dictionary`2 subFormatDictionary) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 120
--- End of inner exception stack trace ---
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, Dictionary`2 subFormatDictionary) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 125
at SubtitlesParser.Classes.Parsers.SubParser.ParseStream(Stream stream, Encoding encoding, SubtitlesFormat subFormat) in C:\Users\galdin\source\repos\SubtitlesParser\SubtitlesParser\Classes\Parsers\SubParser.cs:line 85
at Test.Program.Main(String[] args) in C:\Users\galdin\source\repos\SubtitlesParser\Test\Program.cs:line 26
----------------------
Parsing of file Wiedzmin.s01e01.DVDRip.DreamLair.srt: SUCCESS (233 items - 0% corrupted)
----------------
----------------------
Kind of want a nice convenient subtitle parser rather than rolling one myself, but Unity doesn't seem to support .net 4.0