Comments (4)
What I've tried
- rewriting to operate directly on Text instead of tokenizing first
- rewriting to operate directly on Text, using megaparsec instead of parsec, and using the fast parsers takeWhileP etc.
- rewriting to use ByteStrings instead of Texts in the Toks.
None of this achieved any speed improvement over the current version using [Tok]; indeed, in every case performance was worse.
Profiling reveals that block structure parsing is fast. Most of the time is taken up by tokenize and restOfLine (31%), and by inline parsing.
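For readers unfamiliar with the tokenize-first design, here is a minimal sketch of the two functions that dominate the profile. The `Tok` and `TokType` definitions are simplified, hypothetical stand-ins (the library's real types also carry source positions), and this `restOfLine` is a plain list function rather than a parser combinator, so treat it as an illustration of the idea only.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Char (isAlphaNum, isSpace)
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical, simplified token types (no source positions).
data TokType = Spaces | WordChars | LineEnd | Symbol Char
  deriving (Show, Eq)

data Tok = Tok
  { tokType     :: TokType
  , tokContents :: Text
  } deriving (Show, Eq)

-- Assign a character to a token class.
classify :: Char -> TokType
classify c
  | c == '\n' || c == '\r' = LineEnd
  | isSpace c              = Spaces
  | isAlphaNum c           = WordChars
  | otherwise              = Symbol c

-- Runs of spaces and runs of word characters become one token each;
-- every symbol and every line-end character is a token of its own.
tokenize :: Text -> [Tok]
tokenize = map toTok . T.groupBy sameRun
  where
    sameRun a b = case (classify a, classify b) of
      (Spaces, Spaces)       -> True
      (WordChars, WordChars) -> True
      _                      -> False
    -- T.groupBy never yields an empty chunk, so T.head is safe here.
    toTok t = Tok (classify (T.head t)) t

-- Tokens up to and including the next line end, plus the remainder.
restOfLine :: [Tok] -> ([Tok], [Tok])
restOfLine toks = case break ((== LineEnd) . tokType) toks of
  (line, le : rest) -> (line ++ [le], rest)
  (line, [])        -> (line, [])

main :: IO ()
main = do
  let toks = tokenize "Hello *world*\nbye"
  mapM_ print toks
  print (fst (restOfLine toks))
```

Even in this toy form, every input character is visited once by the tokenizer and every token again by the line scanner, which may be part of why these two functions rank highest in the profile.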
Instructions for profiling
make prof
Current results (March 12 2020):
1.8 parseChunks
2.1 pDelimChunk
2.2 Commonmark.Blocks.runInlineParser
2.5 blockContinues
2.6 Commonmark.Inlines.processBs
2.9 MAIN
3.9 block_starts
6.6 renderHtml
9.0 pSymbol
11.9 defaultInlineParser
17.5 Commonmark.Tokens.tokenize
32.6 restOfLine
from commonmark-hs.
For a 1.4MB file:
from commonmark-hs.
Benchmarks for different extensions:
extension | mean |
---|---|
-xautolinks | 310.8 ms (309.3 ms .. 311.3 ms) |
-xpipe_tables | 295.2 ms (293.2 ms .. 296.6 ms) |
-xstrikethrough | 267.9 ms (265.6 ms .. 269.1 ms) |
-xsuperscript | 267.8 ms (264.9 ms .. 269.5 ms) |
-xsubscript | 266.8 ms (263.6 ms .. 267.9 ms) |
-xsmart | 293.0 ms (292.0 ms .. 294.3 ms) |
-xmath | 287.4 ms (285.4 ms .. 290.7 ms) |
-xemoji | 281.6 ms (280.3 ms .. 282.8 ms) |
-xfootnotes | 291.3 ms (286.1 ms .. 293.3 ms) |
-xdefinition_lists | 272.6 ms (271.0 ms .. 275.4 ms) |
-xfancy_lists | 271.2 ms (269.3 ms .. 273.8 ms) |
-xattributes | 284.2 ms (283.4 ms .. 285.7 ms) |
-xraw_attribute | 280.7 ms (279.6 ms .. 281.6 ms) |
-xbracketed_spans | 268.5 ms (267.0 ms .. 269.4 ms) |
-xfenced_divs | 269.6 ms (267.5 ms .. 271.6 ms) |
-xauto_identifiers | 274.9 ms (273.0 ms .. 277.8 ms) |
-ximplicit_heading_references | 269.8 ms (268.2 ms .. 272.8 ms) |
-xall | 520.4 ms (515.5 ms .. 523.6 ms) |
One idea to explore: use ShortText from the text-short package instead of Text in Tok. The public API could still use Text. This should reduce the memory used by the tokens.
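A rough sketch of what that could look like, assuming a simplified, hypothetical `Tok` record (the field names and the `mkTok` helper are illustrative, not the library's actual definitions). Only `Data.Text.Short.fromText` and `toText` from text-short are used, and conversion happens at the boundary so callers still see `Text`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Short (ShortText)
import qualified Data.Text.Short as TS

-- Hypothetical token type that stores its contents as ShortText.
data Tok = Tok
  { tokLine     :: !Int
  , tokColumn   :: !Int
  , tokContents :: !ShortText
  } deriving (Show, Eq)

-- Boundary in: callers pass Text, the token keeps a compact copy.
mkTok :: Int -> Int -> Text -> Tok
mkTok ln col = Tok ln col . TS.fromText

-- Boundary out: reassemble Text for the public API.
untokenize :: [Tok] -> Text
untokenize = T.concat . map (TS.toText . tokContents)

main :: IO ()
main = do
  let toks = [mkTok 1 1 "Hello", mkTok 1 6 " ", mkTok 1 7 "world"]
  print (untokenize toks)
```

One trade-off to measure: as far as I understand, ShortText does not support zero-copy slicing, so each token copies its bytes out of the input instead of sharing the input buffer. Whether the smaller per-value overhead outweighs that copying is exactly what a benchmark would need to show.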