Comments (2)

Bryan-Legend commented on September 14, 2024

Here's a version that works with YouTube.

I added support for the default namespace and changed how the times are parsed.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace SubtitlesParser.Classes.Parsers
{
    public class TtmlParser : ISubtitlesParser
    {
        public List<SubtitleItem> ParseStream(Stream xmlStream, Encoding encoding)
        {
            // rewind the stream
            xmlStream.Position = 0;
            var items = new List<SubtitleItem>();

            // parse xml stream (XElement.Load throws on failure, so no null check is needed)
            var xElement = XElement.Load(xmlStream);

            // use the "tt" prefix when it is declared, otherwise fall back to the default namespace
            XNamespace tt = xElement.GetNamespaceOfPrefix("tt") ?? xElement.GetDefaultNamespace();

            foreach (var node in xElement.Descendants(tt + "p"))
            {
                try
                {
                    var reader = node.CreateReader();
                    reader.MoveToContent();

                    // both values are in milliseconds (-1 when the timecode cannot be parsed)
                    var beginString = node.Attribute("begin").Value.Replace("t", "");
                    int startMs = ParseTimecode(beginString);
                    var endString = node.Attribute("end").Value.Replace("t", "");
                    int endMs = ParseTimecode(endString);

                    // strip the "tt:" prefix and the namespace declarations from the inner markup
                    var text = reader.ReadInnerXml()
                        .Replace("<tt:", "<")
                        .Replace("</tt:", "</")
                        .Replace(string.Format(@" xmlns:tt=""{0}""", tt), "")
                        .Replace(string.Format(@" xmlns=""{0}""", tt), "");

                    items.Add(new SubtitleItem()
                    {
                        StartTime = startMs,
                        EndTime = endMs,
                        Lines = new List<string>() { text }
                    });
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Exception raised when parsing xml node {0}: {1}", node, ex);
                }
            }

            if (items.Any())
            {
                return items;
            }
            else
            {
                throw new ArgumentException("Stream is not in a valid TTML format, or represents empty subtitles");
            }
        }

        /// <summary>
        /// Takes a timecode as a string and parses it into a number of milliseconds. A timecode reads as follows:
        /// 00:00:20.000
        /// </summary>
        /// <param name="s">The timecode to parse</param>
        /// <returns>The parsed timecode in milliseconds. If the parsing was unsuccessful, -1 is returned (subtitles should never show)</returns>
        private int ParseTimecode(string s)
        {
            TimeSpan result;

            // parse with the invariant culture so the result does not depend on the machine's locale
            if (TimeSpan.TryParse(s, CultureInfo.InvariantCulture, out result))
            {
                return (int)result.TotalMilliseconds;
            }
            else
            {
                return -1;
            }
        }
    }
}
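
To show what the default-namespace fallback does in isolation, here is a minimal standalone sketch. It assumes a hand-written sample document rather than a real YouTube file, and the `TtmlNamespaceDemo` class with its `ReadCues` and `ToMilliseconds` helpers is hypothetical (not part of SubtitlesParser); only standard `System.Xml.Linq` and `TimeSpan` calls are used.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class for illustration; not part of SubtitlesParser.
public static class TtmlNamespaceDemo
{
    // Returns (start ms, end ms, text) for every <p> element in the document.
    public static List<Tuple<int, int, string>> ReadCues(string xml)
    {
        var root = XElement.Parse(xml);

        // Prefer the "tt" prefix; fall back to the default namespace when it is not declared.
        XNamespace tt = root.GetNamespaceOfPrefix("tt") ?? root.GetDefaultNamespace();

        return root.Descendants(tt + "p")
            .Select(p => Tuple.Create(
                ToMilliseconds((string)p.Attribute("begin")),
                ToMilliseconds((string)p.Attribute("end")),
                p.Value))
            .ToList();
    }

    // Parses an hh:mm:ss.fff timecode into milliseconds, or returns -1 on failure.
    private static int ToMilliseconds(string timecode)
    {
        TimeSpan value;
        return TimeSpan.TryParse(timecode, CultureInfo.InvariantCulture, out value)
            ? (int)value.TotalMilliseconds
            : -1;
    }

    public static void Main()
    {
        // Assumed sample input: TTML declared as the default namespace, no "tt:" prefix.
        const string xml =
            "<tt xmlns=\"http://www.w3.org/ns/ttml\"><body><div>" +
            "<p begin=\"00:00:01.000\" end=\"00:00:02.500\">Hello</p>" +
            "</div></body></tt>";

        foreach (var cue in ReadCues(xml))
            Console.WriteLine("{0} -> {1}: {2}", cue.Item1, cue.Item2, cue.Item3);
        // prints "1000 -> 2500: Hello"
    }
}
```

Without the fallback, `GetNamespaceOfPrefix("tt")` returns null for a document like this one, so the lookup for `tt + "p"` has no namespace to work with.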

from subtitlesparser.

AlexPoint commented on September 14, 2024

Indeed, the timespans were not parsed correctly.
It's been fixed here: fb0c432
Thanks for the code; don't hesitate to do a pull request next time ;)
A new NuGet package (1.4.7) has been released with those changes.
