Coder Social home page Coder Social logo

Comments (10)

gettalong avatar gettalong commented on July 26, 2024

The handling of quotes is correct as per the kramdown spec. Other libraries like rubypants do the same as kramdown when given your test cases.

I recommend putting the underscores outside of the quotes.

from kramdown.

Vaguery avatar Vaguery commented on July 26, 2024

So when a quotation starts with an italic word, the italics should overlap the front quote? Not how typesetting works.

Just curious: why does it work correctly for single asterisks, but not for underscores?

"_Single underscores_ do not work."

"*Single asterisks* work as they should."

'__Double underscores__ do not work.'

'**Double underscores** do not work.'

from kramdown.

gettalong avatar gettalong commented on July 26, 2024

That's a good question! It should definitely work the same. rubypants shows the same, incorrect behavior...

Will have a look at it! And sorry for my dismissive behavior, should have investigated a bit more pedantically!

from kramdown.

Vaguery avatar Vaguery commented on July 26, 2024

It seems like a little edge case unless you've actually had first-hand experience trying to write a book in it, I know :)

I've already started writing a new parser, with specs and discrete acceptance tests for specified features.

Do you have any particular technical goals for kramdown, such as speed or ability to handle very large input files, or is accuracy of all the edge cases your main goal? Would it be a problem if I were to write a comprehensive suite of explicit unit and integration specs, rather than the large suite of fixtures you use now?

from kramdown.

gettalong avatar gettalong commented on July 26, 2024

The original goal was twofold:

  • to create a faster pure Ruby Markdown parser than Maruku and
  • to provide a syntax spec for the kramdown specific input so that the edge cases are all documented

Since the internal structure of the parsers and the converters doesn't really matter, I thought it would be better to provide test inputs and their respective outputs to ensure that changing the code doesn't invalidate the spec. Therefore the large suite of test cases.

You can use whatever you like and what seems best for you for testing your parser :)

from kramdown.

Vaguery avatar Vaguery commented on July 26, 2024

I'm just taking a little while to tease apart all the integration tests you have here into individual specs. A few "small changes" to the current parser broke a surprising variety of them in a lot of uninterpretable ways, so single-responsibility tests will help me step carefully.

Could you say a few words about the overall architecture? My problem visualizing the whole is that it feels as though each parser is applied in the order they appear in the @block_parsers (and then @span_parsers) list. Is that the case? They're being serially applied in that order to each component, recursively over each branch found?

So as a side effect of this serial ordering, I don't need to worry (for architectural reasons) about accidentally applying a smart_quotes parser to an ERB block? Because the "only things left" to parse will be "extra raw" text that should have the quotes curled, dashes replaced and so forth?

from kramdown.

gettalong avatar gettalong commented on July 26, 2024

The kramdown parser does its thing in multiple steps:

  • First the document is parsed for block level elements, the big building blocks.
  • Then the created document tree is updated by parsing raw_text elements with the span level parser and applying the IALs.
  • The last step is replacing abbreviations with the correct kramdown Element object.

The block and span parsers are applied in a serial kind of way, in the order they appear in @block_parser and @span_parsers.

Each parser defines a regexp that is used to identify the block or span level element. Since some parsers use similar regexps (e.g. setext header and paragraphs, both need a line of text) the order of the parsers is important. Also, by arranging the parsers that are not order-dependent in a most-commonly-used way it is possible to squeeze some more speed out of the kramdown parser.

In case of the block parsers, they are just tested on the current position of the document in the order they appear and the first one wins.

In case of the span parsers, the start regexps for all spans parsers are stitched together into one huge regexp. A match for this regexp is searched for and then the span parsers are tested in the order they appear in the @span_parsers array. This is done to avoid scanning each and every character.

Regarding an ERB block parser: This depends on how you create the element. If you create a custom :erb element with the ERB text in the #value accessor, you have nothing to worry about. Text will only be parsed by the span level parser if you create an element of type :raw_text with the ERB text!

from kramdown.

gettalong avatar gettalong commented on July 26, 2024

So, I have found a way to make your examples work without breaking my test suite :) So your presented use cases will work in the next release.

from kramdown.

Vaguery avatar Vaguery commented on July 26, 2024

Thank you! I worked on it for quite a while and was elbow-deep into a full parser rewrite before I got stuck. So I'm glad you found an incremental solution.

Do you have the sense that, especially as Markdown flavors get more popular, the ability to extend and revise the pattern-matching is starting to trump the speed of the implementations?

from kramdown.

gettalong avatar gettalong commented on July 26, 2024

Not really, no. There are also some fast implementations that are not regexp-based like the original Perl Markdown from Gruber but which are based on a PEG grammar that can more easily be modified.

I also have to admit that I don't immediately recognize what all the code of the kramdown parser does :) It takes me some time to understand some really complex parts (should have add more comments).

from kramdown.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.