3bmd's People

Contributors

3b, archimag, hyotang666, ljanyst, m-n, mdbergmann, melisgl, puercopop, shinmera, svetlyak40wt, vseloved

3bmd's Issues

Escaping headings

I have a bit of a headache with the parse/print consistency of headings.

First, and this may be how markdown works, if there is no newline after the "heading", then it's parsed as :PLAIN:

CL-USER> (3bmd::parse-doc "x
#y
")
((:PARAGRAPH "x") (:HEADING :LEVEL 1 :CONTENTS ("y")))
NIL
T
CL-USER> (3bmd::parse-doc "x
#y")
((:PARAGRAPH "x") (:PLAIN "#" "y"))
NIL
T

When the latter is printed, an extra newline is inserted:

CL-USER> (3bmd:print-doc-to-stream (3bmd::parse-doc "x
#y") t :format :markdown)
x

#y
NIL

When the heading is escaped, the parse is good, but printing loses the escape:

CL-USER> (3bmd::parse-doc "x
\\#y
")
((:PLAIN "x" "
"
  "#" "y"))
NIL
T
CL-USER> (3bmd:print-doc-to-stream (3bmd::parse-doc "x
\\#y
") t)
x
#y
NIL

If this output is parsed again, then we get a :HEADING. Thus print/parse consistency is lost.

The quick fix would be to escape all # characters in print-md-escaped, but that produces unnecessarily cluttered output, which goes against the spirit of markdown. The right solution seems to be to escape only in column 0, but that's not easily and portably available.
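
A minimal sketch of the "escape only in column 0" idea, assuming the printed markdown is available as a complete string so that line starts can be found without asking the stream for its column. `escape-leading-hashes` is a hypothetical helper, not part of 3bmd, and it does not know about code blocks, where a leading # should stay untouched.

```lisp
(defun escape-leading-hashes (text)
  "Return TEXT with a backslash inserted before any # that starts a line."
  (with-output-to-string (out)
    (let ((at-line-start t))
      (loop for ch across text
            do (when (and at-line-start (char= ch #\#))
                 ;; Only a # in column 0 can start a heading here.
                 (write-char #\\ out))
               (write-char ch out)
               (setf at-line-start (char= ch #\Newline))))))
```

A # in the middle of a line is left alone, which keeps the output uncluttered.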

Undefined function: parse-string

Hello,

This is surprising, but why not:

(3bmd:parse-string "rst")

; in: 3BMD:PARSE-STRING "rst"
;     (3BMD:PARSE-STRING "rst")
; 
; caught STYLE-WARNING:
;   undefined function: 3BMD:PARSE-STRING

SLIME offers this symbol as a completion, and parse-string is an exported symbol, but neither grep nor my own eyes could find a function definition.

This is with the January Quicklisp dist.

regards

mailto randomness

When a mailto element is printed to html there is randomness injected into the encoding supposedly to make life more difficult for spammers:

(defun encode-email (text)
  (with-output-to-string (s)
    (loop for i across text
       for r = (random 1.0)
       do (cond
            ((< r 0.1) (write-char i s))
            ;; fixme: make this portable to non-unicode/ascii lisps?
            ((< r 0.6) (format s "&#x~x;" (char-code i)))
            (t (format s "&#~d;" (char-code i)))))))

Unfortunately, this has the side effect of introducing spurious diffs when the generated html is version controlled. Would a deterministic solution be acceptable?
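
One possible deterministic variant, as a sketch only: pick the encoding from the character's position and code instead of from `random`, so the same address always yields the same HTML. The name `encode-email-deterministic` and the mod-3 mixing rule are assumptions made for illustration, not something 3bmd provides.

```lisp
(defun encode-email-deterministic (text)
  "Encode TEXT like ENCODE-EMAIL, choosing each encoding from position and char code."
  (with-output-to-string (s)
    (loop for ch across text
          for i from 0
          do (case (mod (+ i (char-code ch)) 3)
               ;; literal character
               (0 (write-char ch s))
               ;; hexadecimal character reference
               (1 (format s "&#x~x;" (char-code ch)))
               ;; decimal character reference
               (t (format s "&#~d;" (char-code ch)))))))
```

The output still mixes the three forms, but repeated runs produce identical text and therefore no spurious diffs.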

Fails on ABCL-1.9.2

It fails as:

:info:build Caught UNBOUND-VARIABLE while processing --eval option "(asdf:operate (quote asdf:build-op) (quote 3bmd-tests))":
:info:build   The variable DEF-GRAMMAR-TEST is unbound.
:info:build Command failed: env XDG_CACHE_HOME=$HOME/.cache /opt/local/bin/abcl --noinit --batch --eval '(require "asdf")' --eval '(setf asdf:*central-registry* (list* (quote *default-pathname-defaults*) #p"/opt/local/var/macports/build/_Users_catap_src_macports-ports_lisp_cl-3bmd/cl-3bmd/work/build/system/" #p"/opt/local/share/common-lisp/system/" asdf:*central-registry*))' --eval '(asdf:operate (quote asdf:build-op) (quote 3bmd-tests))' 2>&1

SBCL, ECL, CLISP and CCL work.

support CommonMark?

3bmd is older than CommonMark, so it tries to implement the original markdown syntax with reference to behavior of other markdown processors where that was ambiguous. That strategy has all the problems that motivated CommonMark, and CommonMark seems popular enough now that not matching it is annoying and/or confusing to users (ex #45).

Unfortunately, it looks like it would be difficult or impossible to write a proper PEG/TDPL grammar for the entire CommonMark spec at once, so it would probably be hard to maintain compatibility with existing 3bmd extensions.

It probably wouldn't be too hard to write a new parser using something like the multiple-pass lines -> blocks -> inlines strategy suggested by the spec. The inlines pass might be able to reuse a lot of the 3bmd inline grammar, possibly with some limitations on the length of code span delimiters and similar. In that case, inline extensions might be usable without too many changes (I'd probably want to clean up the AST in the process though, so they would need updates to match that). Block elements would need to be rewritten though; I'm not sure if that pass would use esrap for parsing or if it would need something more complicated to handle the arbitrary indentation in lists/blockquotes. Possibly a hybrid with an esrap rule to detect the start of a block, and then let the block parse the following lines however it wants.
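
To make the first two passes concrete, here is a deliberately tiny sketch of lines -> blocks that only separates blocks at blank lines. It ignores containers such as lists and blockquotes, which is where the real difficulty lies, and all three function names are hypothetical.

```lisp
(defun split-lines (text)
  "Split TEXT at newlines into a list of strings."
  (loop with start = 0
        for end = (position #\Newline text :start start)
        collect (subseq text start end)
        while end
        do (setf start (1+ end))))

(defun blank-line-p (line)
  "True if LINE is empty or contains only spaces, tabs or carriage returns."
  (every (lambda (ch) (member ch '(#\Space #\Tab #\Return))) line))

(defun lines->blocks (lines)
  "Group consecutive non-blank LINES into blocks, each a list of lines."
  (let ((blocks '())
        (current '()))
    (flet ((flush ()
             (when current
               (push (nreverse current) blocks)
               (setf current '()))))
      (dolist (line lines)
        (if (blank-line-p line)
            (flush)
            (push line current)))
      (flush))
    (nreverse blocks)))
```

An inline pass would then run over the lines of each block separately.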

I don't have any current plans to work on such a thing though, since my current limited markdown needs are satisfied by 3bmd as it is and I have other things that are higher priority for now (unless someone has a pile of money to throw at a commonmark parser or something). It does seem interesting enough that I might try to at least do a proof-of-concept between other projects at some point, but will probably be a while if so.


some related links:
CommonDoc : probable replacement for the ad-hoc AST in 3bmd in a rewrite
commondoc-markdown : Project using 3bmd with CommonDoc, possibly supporting CommonMark in the future.
cl-cmark : CommonMark processor using FFI to libcmark

Clean up parse tree and add to public interface.

The current parse tree is mostly derived from the grammar rather than having any thought put into it.

Would be nicer to have a more logical parse tree as an officially supported part of the API, for people who want to modify it or add other output formats.

Code blocks in list items

Code blocks lose the indent when printed:

CL-USER> (let ((3bmd-code-blocks:*code-blocks* t))
           (3bmd-grammar:parse-doc "
- xxx

    ```
    0123456789
            89
    ```
"))
((:BULLET-LIST
  (:LIST-ITEM (:PARAGRAPH "xxx")
   (3BMD-CODE-BLOCKS::CODE-BLOCK :LANG "" :PARAMS NIL :CONTENT "0123456789
        89"))))
NIL
T
CL-USER> (let ((3bmd-code-blocks:*code-blocks* t))
           (3bmd:print-doc-to-stream * *standard-output* :format :markdown))
- xxx

    ```
0123456789
        89
```
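
One way to keep the block inside the list item, sketched without touching 3bmd's printer: prefix the fence lines and the content with the item's indentation before writing them out. `indent-lines` is a hypothetical helper, and the four-space prefix used with it is an assumption taken from the input above.

```lisp
(defun indent-lines (text prefix)
  "Return TEXT with PREFIX inserted at the start of every non-empty line."
  (with-output-to-string (out)
    (let ((at-line-start t))
      (loop for ch across text
            do (when (and at-line-start (char/= ch #\Newline))
                 ;; Empty lines stay empty so no trailing whitespace is added.
                 (write-string prefix out))
               (write-char ch out)
               (setf at-line-start (char= ch #\Newline))))))
```

Relative indentation inside the code is preserved because every line receives the same prefix.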

smart quotes and backslash

Running this:

(let ((3bmd-grammar:*smart-quotes* t))
  (3bmd:parse-string-and-print-to-stream "\\'" *standard-output*))

gives the error:

Cannot FUNCALL the SYMBOL-FUNCTION of special operator QUOTE.

make sure parser always returns something useful

The grammar should match all input, but in case of bugs it would be nice to (optionally?) catch parse errors and return something useful anyway.

  • first step would probably be to add a catch-all (* character) to the end of the doc rule, and add it to the blocks (maybe as an extra plain block?)
  • "incomplete parse" errors should probably be handled the same way (maybe not even bother with trying to catch extra in the doc grammar if this needs to be here anyway?)
  • "parse failed" should just return the original input?
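
The last bullet could look roughly like the sketch below. It assumes that parse failures are signalled as conditions inheriting from `error`, and `parse-with-fallback` is a hypothetical wrapper rather than an existing 3bmd entry point; the (:PLAIN ...) fallback shape is modelled on the parse trees shown in other issues.

```lisp
(defun parse-with-fallback (parser input)
  "Call PARSER on INPUT; if it signals an error, return INPUT as a single plain block."
  (handler-case (funcall parser input)
    (error ()
      ;; Fall back to the original text instead of propagating the failure.
      (list (list :plain input)))))
```

Passing the parser as an argument keeps the wrapper independent of which grammar entry point is used.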

One giant paragraph instead of separate paragraph tags

Hello!

Thank you for your amazing project. I am using it to write a static site generator and am running into an issue where it outputs a single paragraph tag for an entire string of text with newlines, instead of starting a new paragraph at the blank lines.

The input below:

This is a first post. I am excited to have this post in place. I am using a new blogging engine I wrote myself in Common Lisp.

This is me hoping the paragraph gets formatted properly.

Gives me the following output:

<p>This is a first post. I am excited to have this post in place. I am using a new blogging engine I wrote myself in Common Lisp.This is me hoping the paragraph gets formatted properly.</p>

Any assistance with this issue would be much appreciated. I am running with the latest build from quicklisp on SBCL for macOS.

Failed tests on clisp

I'm using clisp from https://gitlab.com/gnu-clisp/clisp/-/commit/66924971790e4cbee3d58f36e530caa0ad568e5f, and attempting to run the tests via MacPorts leads to a failure:

:info:test   INDENT-BY-TAB-SHOULD-BE-REPLACED-WITH-SPACES........................................................................[ OK ]
:info:test   BLANK-LINE-TEST2........................................................................[ OK ]
:info:test   BLANK-LINE-TEST1........................................................................[ OK ]
:info:test   NEWLINE........................................................................[ OK ]
:info:test   MULTIPLE-SPACES-MIXED-WITH-TABS........................................................................[ OK ]
:info:test   TAB-AS-SPACE........................................................................[ OK ]
:info:test   SPACE-TEST........................................................................[ OK ]
:info:test   EOF-TEST........................................................................[ OK ]
:info:test Test run had 1 failure:
:info:test   Failure 1: FAILED-ASSERTION when running 3BMD-TESTS::PARSE-LIST-WITH-CARRIAGE-RETURN
:info:test     Binary predicate (EQUALP X Y) failed.
:info:test     x: 3BMD-TESTS::RESULT => 
:info:test     ((:BULLET-LIST
:info:test       (:LIST-ITEM
:info:test        (:PLAIN "x"
:info:test         "
:info:test     "
:info:test         "y"
:info:test         "
:info:test     "
:info:test         "Not" " " "verbatim"))))
:info:test     y: 3BMD-TESTS::EXPECTED => 
:info:test     ((:BULLET-LIST
:info:test       (:LIST-ITEM
:info:test        (:PARAGRAPH
:info:test         "x
:info:test     y")
:info:test        (:PARAGRAPH "Not" " " "verbatim"))))
:info:test *** - tests failed

Escaping curly brackets

In commit 18a59d3, I changed print-md-escaped to escape the [] and {} characters. The former was necessary for print/parse consistency, while the latter wasn't because {} are not parsed specially (except for allowing them to be backslash escaped). However, in melisgl/mgl-pax#28, we find that escaping curly brackets makes outputting latex-in-markdown for pandoc a pain.

Do you think not escaping them would be correct?

Processing instructions are escaped instead of passed through

Out of the box 3bmd doesn't recognise that processing instructions are valid:

cl-user> (3bmd:parse-string-and-print-to-stream "<?this is a valid processing instruction?>" t)
<p>&lt;?this is a valid processing instruction?&gt;</p>

At least according to the CommonMark spec, processing instructions are allowed as blocks and inlines, and should be passed through verbatim.

Definition lists extension does not work because of error

There is no applicable method for the generic function #<STANDARD-GENERIC-FUNCTION 3BMD-EXT:PRINT-MD-TAGGED-ELEMENT (35)> when called with arguments (3BMD-DEFINITION-LISTS::DEFINITION-LIST #<SB-IMPL::STRING-OUTPUT-STREAM {666A6F3}> ((:TERMS ((3BMD-DEFINITION-LISTS::DEFINITION-TERM "test" " " "definition")) :DEFINITIONS ((3BMD-DEFINITION-LISTS::DEFINITION-LIST-ITEM (:PLAIN "The" " " "definition" " " "test")))) (:TERMS ((3BMD-DEFINITION-LISTS::DEFINITION-TERM "second" " " "item")) :DEFINITIONS ((3BMD-DEFINITION-LISTS::DEFINITION-LIST-ITEM (:PLAIN "Nother" " " "definition" " " "test")))))).

The code expects a PRINT-MD-TAGGED-ELEMENT method, but the extension defines PRINT-TAGGED-ELEMENT.

Pygments option to ext-code-blocks extension should probably be secured better

Currently the Pygments mode of ext-code-blocks passes user input to pygmentize for the language and options. The code tries to do so safely by avoiding going through a shell and rejecting the cssfile option, but it would probably be better to whitelist the allowed options and either whitelist the languages (possibly querying them from pygmentize on first use?) or at least restrict the characters allowed.
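
A rough sketch of the two checks, under stated assumptions: the option list below is only illustrative (the names are Pygments HTML formatter options, but which ones to allow is a policy decision), and neither function exists in ext-code-blocks.

```lisp
(defparameter *allowed-pygments-options* '("linenos" "linenostart" "hl_lines")
  "Illustrative whitelist; the real set would need to be chosen deliberately.")

(defun allowed-option-p (option-name)
  "True if OPTION-NAME is on the whitelist."
  (member option-name *allowed-pygments-options* :test #'string=))

(defun safe-language-name-p (name)
  "True if NAME is non-empty, does not start with -, and uses only a small ASCII set."
  (and (stringp name)
       (plusp (length name))
       ;; A leading dash could be read as a command-line option.
       (char/= (char name 0) #\-)
       (every (lambda (ch)
                (or (char<= #\a ch #\z)
                    (char<= #\A ch #\Z)
                    (char<= #\0 ch #\9)
                    (find ch "+_.#-")))
              name)))
```

With these in place, cssfile is rejected simply because it is not on the list, rather than through a special case.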

Memory usage on large inputs

I'm using the per-block implementation in parse-doc, but it's still fairly easy to run out of memory with large %blocks with something like this:

CL-USER> (time
          (let ((input (with-output-to-string (out)
                         (loop repeat 100000
                               do (format out "- ~A ~A ~A ~A~%"
                                          (random 1000000) (random 1000000)
                                          (random 1000000) (random 1000000))))))
            (3bmd-grammar::parse-doc input)
            (length input)))
Evaluation took:
  12.364 seconds of real time
  12.371129 seconds of total run time (11.750773 user, 0.620356 system)
  [ Run times consist of 5.771 seconds GC time, and 6.601 seconds non-GC time. ]
  100.06% CPU
  37,030,481,562 processor cycles
  15,570,202,368 bytes consed
  
2955662
CL-USER> (/ 15570202368 2955662.0)
5267.924

This example uses a bulleted list because it is probably the worst offender, but a large paragraph behaves similarly.

According to time, consing scales linearly with the number of repeats, which is good. Perhaps 5267 bytes per character is too high, but I suspect that the main problem is that maximum size of the working set also scales linearly.

figure out proper handling of lists with some blank lines

In lists with some items separated by blank lines, we currently treat all items as paragraphs.

markdown.pl only treats items directly before or after a blank line as paragraphs (2, 3, 4 in the example below).

GitHub treats every item from the one before the first blank line onward as a paragraph (2, 3, 4, 5, 6 in the example below).

See http://babelmark.bobtfish.net/ for a comparison of various other implementations; all 3 behaviours seem reasonably common.

test case:

* l0
* l1
* l2

* l3

* l4
* l5
* l6

test case as displayed by github:

  • l0

  • l1

  • l2

  • l3

  • l4

  • l5

  • l6
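
The three behaviours can be written down as a small model. This is a sketch for comparing the policies only, not parser code: `loose-items` and the policy keywords are made-up names, and GAPS lists each index I that has a blank line between item I and item I+1, so the test case above is 7 items with gaps (2 3).

```lisp
(defun loose-items (item-count gaps policy)
  "Return the indices of items that would be rendered as paragraphs under POLICY."
  (loop for i below item-count
        when (ecase policy
               ;; current 3bmd: any blank line makes every item a paragraph
               (:all (not (null gaps)))
               ;; markdown.pl: only items directly next to a blank line
               (:adjacent (or (member i gaps) (member (1- i) gaps)))
               ;; GitHub: everything from the item before the first blank line
               (:from-first-gap (and gaps (>= i (reduce #'min gaps)))))
          collect i))
```

For the test case, :adjacent gives items 2, 3, 4 and :from-first-gap gives items 2 through 6, matching the descriptions above.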

Code blocks highlighting

It seems that regular, indented code blocks are treated differently from unindented (marked with ```) code blocks in terms of highlighting? IMO, even though there's currently no way to set a language, the indented code blocks should also undergo highlighting; at least that's how other formatters, e.g. Stack Overflow, render them too.

`3bmd::ensure-paragraph` undefined

When trying to print as markdown I get an error that the function 3bmd::ensure-paragraph is undefined, and inspection shows that 3bmd::end-paragraph also is.

Hypothesis: these were renamed 3bmd::ensure-block and 3bmd::end-block and the one call site didn't get edited. Can you confirm, @melisgl?

:description

Would you please consider adding a :description option to your system definition of 3bmd, 3bmd-ext-code-blocks and 3bmd-ext-wiki-links?

:REFLINK :DEFINITION that looks like :EMPH

Is this not valid input?

(3bmd-grammar:parse-doc "[l][*x*]")
.. debugger invoked on SB-KERNEL:CASE-FAILURE:
..   :EMPH fell through ETYPECASE expression.
..   Wanted one of (STRING CHARACTER LIST).

"# foo" fails to parse

... with error "Incomplete parse, stopped at 6.". If I add a newline and some more text, it works.

We use Markdown because it's able to output meaningful HTML no matter how bad the input is; so more generally, it would be nice to have an option to just accept any input and never throw a parse error.

Accepting empty cells in table

3bmd-ext-tables ignores empty cells in tables. 3bmd doesn't render the following text correctly:

| a |   |
| - | - |
|   | b |

I expect it to render like the following:

a
b

I guess the cause is

(+ (and (! (or (and sp #\|) endline)) inline))

Thank you.

Code blocks nested in list items aren't supported

There are two problems:

  • The parsed code ends up as a sibling of the list item, even though it has the same indentation as the item's content.
  • It is parsed as inline code instead of a CODE-BLOCK.

40ANTS-DOC-TEST/UTILS-TEST> (let ((3bmd-code-blocks:*code-blocks* t))
                              (3bmd-grammar:parse-doc "
* Added a warning mechanism, which will issue such warnings on words which looks
  like a symbol, but when real symbol or reference is absent:

  ```
  WARNING: Unable to find symbol \"API\" mentioned in (CL-INFO:@INDEX SECTION)
  ```
"))
((:BULLET-LIST
  (:LIST-ITEM
   (:PLAIN "Added" " " "a" " " "warning" " " "mechanism," " " "which" " "
    "will" " " "issue" " " "such" " " "warnings" " " "on" " " "words" " "
    "which" " " "looks" "
"
    "  " "like" " " "a" " " "symbol," " " "but" " " "when" " " "real" " "
    "symbol" " " "or" " " "reference" " " "is" " " "absent:")))
 (:PLAIN "  "
  (:CODE "
  WARNING: Unable to find symbol \"API\" mentioned in (CL-INFO:@INDEX SECTION)
")))
NIL
T

When there is no indentation, the code block is parsed correctly:

40ANTS-DOC-TEST/UTILS-TEST> (let ((3bmd-code-blocks:*code-blocks* t))
                              (3bmd-grammar:parse-doc "
```
WARNING: Unable to find symbol \"API\" mentioned in (CL-INFO:@INDEX SECTION)
```
"))
((3BMD-CODE-BLOCKS::CODE-BLOCK :LANG "" :PARAMS NIL :CONTENT
  "WARNING: Unable to find symbol \"API\" mentioned in (CL-INFO:@INDEX SECTION)"))
NIL
T

add option for less-strict html blocks

The cl-mongo README.md embeds documentation created with documentation template, which doesn't close <p> tags. The embedded chunks are a <p> followed immediately by a <blockquote> in one HTML block, rather than separated into two as I understand the markdown docs to require.

Github parses these chunks as HTML blocks when rendering documentation, but doesn't seem to in issues.

Probably can get reasonable parsing by optionally allowing multiple html-block-in-tags in one html-block, and adding a variant of <p> that is closed by a html-block-in-tags (or possibly only a subset of block tags?) rather than </p>
