Comments (2)
Here's a version that works with YouTube.
I added support for the default namespace and changed how the times are parsed.
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace SubtitlesParser.Classes.Parsers
{
    public class TtmlParser : ISubtitlesParser
    {
        public List<SubtitleItem> ParseStream(Stream xmlStream, Encoding encoding)
        {
            // rewind the stream
            xmlStream.Position = 0;
            var items = new List<SubtitleItem>();

            // parse xml stream
            var xElement = XElement.Load(xmlStream);

            // use the "tt" prefix when it is declared, otherwise fall back to the default namespace
            XNamespace tt = xElement.GetNamespaceOfPrefix("tt") ?? xElement.GetDefaultNamespace();

            foreach (var node in xElement.Descendants(tt + "p"))
            {
                try
                {
                    var reader = node.CreateReader();
                    reader.MoveToContent();

                    // strip a trailing "t" (tick suffix) before parsing; values are in milliseconds
                    var beginString = node.Attribute("begin").Value.Replace("t", "");
                    int startMs = ParseTimecode(beginString);
                    var endString = node.Attribute("end").Value.Replace("t", "");
                    int endMs = ParseTimecode(endString);

                    // remove the namespace prefix and declarations from the inner markup
                    var text = reader.ReadInnerXml()
                        .Replace("<tt:", "<")
                        .Replace("</tt:", "</")
                        .Replace(string.Format(@" xmlns:tt=""{0}""", tt), "")
                        .Replace(string.Format(@" xmlns=""{0}""", tt), "");

                    items.Add(new SubtitleItem()
                    {
                        StartTime = startMs,
                        EndTime = endMs,
                        Lines = new List<string>() { text }
                    });
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Exception raised when parsing xml node {0}: {1}", node, ex);
                }
            }

            if (items.Any())
            {
                return items;
            }

            throw new ArgumentException("Stream is not in a valid TTML format, or represents empty subtitles");
        }

        /// <summary>
        /// Takes a TTML timecode as a string and parses it into a number of milliseconds. A timecode reads as follows:
        /// 00:00:20.000
        /// </summary>
        /// <param name="s">The timecode to parse</param>
        /// <returns>The parsed timecode in milliseconds. If the parsing was unsuccessful, -1 is returned (subtitles should never show)</returns>
        private int ParseTimecode(string s)
        {
            TimeSpan result;
            // parse with the invariant culture so the result does not depend on the machine's locale
            if (TimeSpan.TryParse(s, CultureInfo.InvariantCulture, out result))
            {
                return (int)result.TotalMilliseconds;
            }

            return -1;
        }
    }
}
from subtitlesparser.
Indeed, the timespans were not parsed correctly.
It's been fixed here: fb0c432
Thanks for the code; don't hesitate to submit a pull request next time ;)
A new NuGet package (1.4.7) has been released with those changes.