Comments (10)
The handling of quotes is correct as per the kramdown spec. Other libraries like rubypants do the same as kramdown when given your test cases.
I recommend putting the underscores outside of the quotes.
from kramdown.
So when a quotation starts with an italic word, the italics should overlap the front quote? Not how typesetting works.
Just curious: why does it work correctly for single asterisks, but not for underscores?
"_Single underscores_ do not work."
"*Single asterisks* work as they should."
'__Double underscores__ do not work.'
'**Double underscores** do not work.'
from kramdown.
That's a good question! It should definitely work the same. rubypants shows the same, incorrect behavior...
Will have a look at it! And sorry for my dismissive behavior, should have investigated a bit more pedantically!
from kramdown.
It seems like a little edge case unless you've actually had first-hand experience trying to write a book in it, I know :)
I've already started writing a new parser, with specs and discrete acceptance tests for specified features.
Do you have any particular technical goals for kramdown, such as speed or ability to handle very large input files, or is accuracy of all the edge cases your main goal? Would it be a problem if I were to write a comprehensive suite of explicit unit and integration specs, rather than the large suite of fixtures you use now?
from kramdown.
The original goal was twofold:
- to create a faster pure Ruby Markdown parser than Maruku and
- to provide a syntax spec for the kramdown specific input so that the edge cases are all documented
Since the internal structure of the parsers and the converters doesn't really matter, I thought it would be better to provide test inputs and their respective outputs to ensure that changing the code doesn't invalidate the spec. Therefore the large suite of test cases.
You can use whatever you like and what seems best for you for testing your parser :)
from kramdown.
I'm just taking a little while to tease apart all the integration tests you have here into individual specs. A few "small changes" to the current parser broke a surprising variety of them in a lot of uninterpretable ways, so single-responsibility tests will help me step carefully.
Could you say a few words about the overall architecture? My problem visualizing the whole is that it feels as though each parser is applied in the order they appear in the @block_parsers
(and then @span_parsers
) list. Is that the case? They're being serially applied in that order to each component, recursively over each branch found?
So as a side effect of this serial ordering, I don't need to worry (for architectural reasons) about accidentally applying a smart_quotes
parser to an ERB block? Because the "only things left" to parse will be "extra raw" text that should have the quotes curled, dashes replaced and so forth?
from kramdown.
The kramdown parser does its thing in multiple steps:
- First the document is parsed for block level elements, the big building blocks.
- Then the created document tree is updated by parsing
raw_text
elements with the span level parser and applying the IALs. - The last step is replacing abbreviations with the correct kramdown Element object.
The block and span parsers are applied in a serial kind of way, in the order they appear in @block_parser
and @span_parsers
.
Each parser defines a regexp that is used to identify the block or span level element. Since some parsers use similar regexps (e.g. setext header and paragraphs, both need a line of text) the order of the parsers is important. Also, by arranging the parsers that are not order-dependent in a most-commonly-used way it is possible to squeeze some more speed out of the kramdown parser.
In case of the block parsers, they are just tested on the current position of the document in the order they appear and the first one wins.
In case of the span parsers, the start regexps for all spans parsers are stitched together into one huge regexp. A match for this regexp is searched for and then the span parsers are tested in the order they appear in the @span_parsers
array. This is done to avoid scanning each and every character.
Regarding an ERB block parser: This depends on how you create the element. If you create a custom :erb
element with the ERB text in the #value
accessor, you have nothing to worry about. Text will only be parsed by the span level parser if you create an element of type :raw_text
with the ERB text!
from kramdown.
So, I have found a way to make your examples work without breaking my test suite :) So your presented use cases will work in the next release.
from kramdown.
Thank you! I worked on it for quite a while and was elbow-deep into a full parser rewrite before I got stuck. So I'm glad you found an incremental solution.
Do you have the sense that, especially as Markdown flavors get more popular, the ability to extend and revise the pattern-matching is starting to trump the speed of the implementations?
from kramdown.
Not really, no. There are also some fast implementations that are not regexp-based like the original Perl Markdown from Gruber but which are based on a PEG grammar that can more easily be modified.
I also have to admit that I don't immediately recognize what all the code of the kramdown parser does :) It takes me some time to understand some really complex parts (should have add more comments).
from kramdown.
Related Issues (20)
- Fenced codeblocks column (not the list indentation) should determine the stripped whitespace from the code HOT 3
- handling `br` inside formatting tags HOT 2
- [feature] support inline notes
- Attribute name consisting of colon does not get parsed HOT 1
- Code block language not parsing in details/summary following math expression HOT 2
- Custom Parser Support HOT 1
- MD to HTML is adding spaces to code segments. HOT 4
- Add an extension for parsing Mermaid syntax to the libraries extending the functionality of kramdown HOT 1
- Could not find gem 'kramdown' in https://github.com/gettalong/kramdown.git HOT 5
- Pipe characters within a inline link is parsed as a table (rather than a link) HOT 1
- `markdown="0"` meaning HOT 1
- Select whether an HTML tag is parsed as block or span level HOT 2
- `time` should be span level by default HOT 1
- Void elements should not use trailing slashes. HOT 2
- Jekyll kramdown: How to disable generating styles in tables? HOT 1
- Rouge CSS not applied to code HOT 2
- Japanese chars combined with MD syntax are not converted HOT 5
- Can I remove the image parsing method? HOT 2
- parse failure on empty tables
- Odd behavior with nested table and mixing block and inline elements HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from kramdown.