orgmode-parse's Introduction

Welcome!

orgmode-parse provides a top-level parser and a collection of attoparsec parser combinators for org-mode structured text.

You can find the package on Hackage.

What's Finished

We have built attoparsec parsers for org-mode document structures and metadata. The list below covers the syntax features, both those with a complete parsing implementation and those without; [-] marks a feature that is only partially implemented:

  • Headlines and Sections
  • Affiliated Keywords
  • [-] Greater Elements
    • Greater Blocks
    • Drawers
    • Dynamic Blocks
    • Footnote Definitions
    • Inlinetasks
    • Plain Lists and Items
      • Unordered lists
      • Numbered lists
      • Checkbox modified lists
    • Property Drawers
    • Tables
  • Elements
    • Babel Cell
    • Blocks
    • Clock, Diary Sexp and Planning
      • Scheduled and deadline timestamps (timestamp, range, duration, periodicity)
        • Active and inactive timestamps
      • Clock timestamps
    • Comments
    • Fixed Width Areas
    • Horizontal Rules
    • Keywords
    • LaTeX Environments
    • Node Properties
    • Paragraphs
    • Table Rows
  • Objects
    • [-] Entities and LaTeX Fragments
    • Export Snippets
    • Footnote References
    • Inline Babel Calls and Source Blocks
    • Line Breaks (\\)
    • Links
    • Macros
    • Targets and Radio Targets
    • Statistics Cookies
    • Table Cells
    • [-] Timestamps
    • Text Markup
      • Bold
      • Italic
      • Strikethrough
      • Underline
      • Superscript
      • Subscript
      • Code / monospaced
  • Position Annotated AST

Building

There are a few ways to build this library if you're developing a patch:

  • stack build && stack test, and
  • nix-build --no-out-link --attr orgmode-parse release.nix

You can also use the cabal environment provided by nix-shell for incremental development:

$ nix-shell
$ cabal build

Projects that use this package:

https://github.com/volhovm/orgstat

License

BSD3 Open Source Software License

orgmode-parse's People

Contributors

avnik, chrissound, imalsogreg, ixmatus, jakeisnt, nushio3, smurphy8, volhovm, zhujinxuan

orgmode-parse's Issues

Does not compile with recent version of GHC

Just dumping the output here:

    /tmp/stack17172/orgmode-parse-0.2.1/src/Data/OrgMode/Types.hs:97:32: error:
        • No instance for (Semigroup Properties)
            arising from the 'deriving' clause of a data type declaration
          Possible fix:
            use a standalone 'deriving instance' declaration,
              so you can specify the instance context yourself
        • When deriving the instance for (Monoid Properties)
       |
    97 |   deriving (Show, Eq, Generic, Monoid)
       |                                ^^^^^^
    
    /tmp/stack17172/orgmode-parse-0.2.1/src/Data/OrgMode/Types.hs:103:32: error:
        • No instance for (Semigroup Logbook)
            arising from the 'deriving' clause of a data type declaration
          Possible fix:
            use a standalone 'deriving instance' declaration,
              so you can specify the instance context yourself
        • When deriving the instance for (Monoid Logbook)
        |
    103 |   deriving (Show, Eq, Generic, Monoid)
        |                                ^^^^^^
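
Since GHC 8.4, Semigroup is a superclass of Monoid, so a type that derives Monoid also needs a Semigroup instance. A minimal sketch of one possible fix follows. The Properties type here is a hypothetical stand-in (a newtype over an association list), not the library's actual definition; only the deriving clause is the point.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

-- Hypothetical stand-in for the library's Properties type; the real one
-- wraps a different container. Only the deriving clause matters here.
newtype Properties = Properties [(String, String)]
  -- Semigroup is derived alongside Monoid: since GHC 8.4 it is a superclass.
  deriving (Show, Eq, Semigroup, Monoid)

main :: IO ()
main = print (Properties [("a", "1")] <> mempty <> Properties [("b", "2")])
```

Deriving both classes through the newtype reuses the wrapped container's instances, so the behaviour of mappend stays as it was before the superclass change.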
    

more general types

Wouldn't it be nicer to put the types into the module Data.OrgMode instead of Data.OrgMode.Parse? Non-parsing operations, such as searching and filtering, should be able to use those types as well.

(By the way, are those types used anywhere else already?)
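
To illustrate the kind of non-parsing operation meant here, the following is a rough sketch of a search over a heading tree. The Heading record is a trimmed-down, hypothetical stand-in that keeps only the title, tags and subHeadings fields; findHeadings and sampleDoc are invented for the example and are not part of the library.

```haskell
module Main where

-- Hypothetical, trimmed-down heading type: only three fields are kept.
data Heading = Heading
  { title       :: String
  , tags        :: [String]
  , subHeadings :: [Heading]
  } deriving (Show, Eq)

-- | All headings, at any depth, that satisfy the predicate, in document order.
findHeadings :: (Heading -> Bool) -> [Heading] -> [Heading]
findHeadings p = concatMap go
  where
    go h = [h | p h] ++ findHeadings p (subHeadings h)

-- Small sample tree used by main.
sampleDoc :: [Heading]
sampleDoc =
  [ Heading "Testing" ["work"]
      [ Heading "Nested" ["home"] []
      , Heading "Other" ["work"] []
      ]
  ]

main :: IO ()
main = print (map title (findHeadings (elem "work" . tags) sampleDoc))
```

Nothing in this function depends on a parser, which is the argument for keeping the types in a module of their own.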

Only partial parsing of documents

Any use of parseDocument with parse results in a partial parse.

To reproduce:
Apply parse (parseDocument []) to this text file:

* Heading

Feeding an empty string "" to the continuation in the resulting Partial yields the document as expected:


parseDoc :: Text -> Document
parseDoc text = case parse (parseDocument []) text of
                  Partial f -> case f "" of
                                Done _ doc -> doc
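
This matches attoparsec's incremental interface in general: parse returns Partial whenever the parser could still consume more input, and feeding an empty chunk signals end of input. A sketch of a total variant of the workaround, using attoparsec's feed and eitherResult, is shown below. It uses takeText as a stand-in parser so that the example does not depend on orgmode-parse; finish is a hypothetical helper name.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Attoparsec.Text (Parser, eitherResult, feed, parse, parseOnly, takeText)
import Data.Text (Text)

-- | Hypothetical helper: run a parser on a complete input. The empty chunk
-- fed at the end tells attoparsec that no more input will arrive.
finish :: Parser a -> Text -> Either String a
finish p input = eitherResult (feed (parse p input) "")

main :: IO ()
main = do
  -- takeText stands in for parseDocument []; it consumes all remaining input.
  print (finish takeText "* Heading\n")
  -- parseOnly signals end of input on its own, so it needs no extra step.
  print (parseOnly takeText "* Heading\n")
```

When the whole document is already in memory, parseOnly is usually the simpler choice.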

Subtree list items turn into headers

Bug in 0.2.0. Consider this org-file.

* Header1
** Header2
*** Header3
    :LOGBOOK:
    :END:
    * Item1
    * Item2
** Header4

When the LOGBOOK drawer is absent, the file is parsed correctly. When it is present (even if it's empty), we get:

* Header1
** Header2
*** Header3
    :LOGBOOK:
    :END:
    * Item1
    * Item2
** Item1
*** Header4

Idea for collaboration

Hello, I found your project from the worg tools list. First, sorry for the semi-spam nature of this issue. I had a notion for a project that the org community might find useful and I'm looking for feedback. Feel free to close this issue if it doesn't sound useful to you.

My idea is to start a list of org-mode snippets which can serve as a test bed for people developing tools. The idea is that having a separate collection of examples makes it easier for others in the community to benefit from the examples developed through communication with users.

Users could use these samples to try to construct minimal examples of issues they're having and/or contribute examples there which others could benefit from. Exactly how it will take shape is still up in the air.

These samples could also serve as a place to discuss ideas about how to develop the grammar itself. According to worg, the spec is still in draft state.

There's not much there at the moment. Mostly because I don't want to commit too early to what seems like it might be useful. I'll add more examples as I go.

If you like the concept and/or want to contribute and/or just want to offer feedback, I'd very much appreciate it.

Again, sorry for the spam.

Support LOGBOOK

It would be nice if orgmode-parse supported documents like this:

* Testing
:LOGBOOK:
CLOCK: [2015-10-05 Mon 17:13]--[2015-10-05 Mon 17:14] =>  0:01
:END:

 Some paragraph text, yada-yada.

Currently it behaves like this:

Prelude Data.Attoparsec.Text Data.Text Data.OrgMode.Parse> test <- readFile "/home/gleber/test.org"
test :: String
Prelude Data.Attoparsec.Text Data.Text Data.OrgMode.Parse> parseOnly (parseDocument []) (pack test)
Right (Document {documentText = "\n", documentHeadings = [Heading {level = Level 1, keyword = Nothing, priority = Nothing, title = "Testing", stats = Nothing, tags = [], section = Section {sectionPlannings = Plns (fromList []), sectionClocks = [], sectionProperties = fromList [], sectionParagraph = ":LOGBOOK:\nCLOCK: [2015-10-05 Mon 17:13]--[2015-10-05 Mon 17:14] =>  0:01\n:END:\n\n Some paragraph text, yada-yada.\n"}, subHeadings = []}]})
it :: Either String Document
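
As a rough illustration of the requested behaviour, the sketch below separates the body of a LOGBOOK drawer from the rest of a section's lines. It works on plain strings and does not use the library; splitLogbook, trim and sampleSection are hypothetical names, and a real implementation would live in the attoparsec parsers.

```haskell
module Main where

import Data.Char (isSpace)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = f . f
  where f = reverse . dropWhile isSpace

-- | Hypothetical helper: split the lines of a section into the body of the
-- first LOGBOOK drawer and the remaining lines. A missing or unterminated
-- drawer leaves the input untouched.
splitLogbook :: [String] -> ([String], [String])
splitLogbook ls =
  case break ((== ":LOGBOOK:") . trim) ls of
    (before, _ : rest) ->
      case break ((== ":END:") . trim) rest of
        (body, _ : after) -> (body, before ++ after)
        _                 -> ([], ls)
    _ -> ([], ls)

-- The section from the example document above, one line per element.
sampleSection :: [String]
sampleSection =
  [ ":LOGBOOK:"
  , "CLOCK: [2015-10-05 Mon 17:13]--[2015-10-05 Mon 17:14] =>  0:01"
  , ":END:"
  , ""
  , " Some paragraph text, yada-yada."
  ]

main :: IO ()
main = print (splitLogbook sampleSection)
```

The first component would feed the clock parser, and the second would become the section paragraph.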

Make bracketedDateTime a dateTime

The data type BracketedDateTime is almost identical to DateTime. These two data types and their associated parsers can be unified.

How can I help with filling out more of orgmode-parse?

I would love to help fill out more of your markup renderer in orgmode-parse.
Do you have an idea of what you want the markup types to look like?
I see in the readme it says this is being worked on. Any pointer on where the work is?

Thanks

Parsing a timestamp seems to only work if SCHEDULED or DEADLINE

I have been testing the most recent version of orgmode-parse (e9e034c at the moment) and it does not seem to parse timestamps that are not part of a SCHEDULED or DEADLINE item. For example, from the timestamps page:

http://orgmode.org/manual/Timestamps.html#Timestamps

* Meet Peter at the movies
  <2006-11-01 Wed 19:15>

* Discussion on climate change
  <2006-11-02 Thu 20:00-22:00>

parses into:

Right (Document {documentText = "", documentHeadings = [Heading {level = Level 1, keyword = Nothing, priority = Nothing, title = "Meet Peter at the movies", stats = Nothing, tags = [], section = Section {sectionPlannings = Plns (fromList []), sectionClocks = [], sectionProperties = fromList [], sectionParagraph = "  <2006-11-01 Wed 19:15>\n\n"}, subHeadings = []},Heading {level = Level 1, keyword = Nothing, priority = Nothing, title = "Discussion on climate change", stats = Nothing, tags = [], section = Section {sectionPlannings = Plns (fromList []), sectionClocks = [], sectionProperties = fromList [], sectionParagraph = "  <2006-11-02 Thu 20:00-22:00>\n"}, subHeadings = []}]})

However, if I change the first item to scheduled:

Right (Document {documentText = "", documentHeadings = [Heading {level = Level 1, keyword = Nothing, priority = Nothing, title = "Meet Peter at the movies", stats = Nothing, tags = [], section = Section {sectionPlannings = Plns (fromList [(SCHEDULED,Timestamp {tsTime = DateTime {yearMonthDay = YMD' (YearMonthDay {ymdYear = 2006, ymdMonth = 11, ymdDay = 1}), dayName = Just "Wed", hourMinute = Just (19,15), repeater = Nothing, delay = Nothing}, tsActive = True, tsEndTime = Nothing})]), sectionClocks = [], sectionProperties = fromList [], sectionParagraph = ""}, subHeadings = []},Heading {level = Level 1, keyword = Nothing, priority = Nothing, title = "Discussion on climate change", stats = Nothing, tags = [], section = Section {sectionPlannings = Plns (fromList []), sectionClocks = [], sectionProperties = fromList [], sectionParagraph = "  <2006-11-02 Thu 20:00-22:00>\n"}, subHeadings = []}]})

Is it possible to get a tsTime structure without making the item SCHEDULED?
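
For illustration only, here is a sketch of what parsing a bare active timestamp could look like, independent of any SCHEDULED or DEADLINE keyword. It uses ReadP from base rather than the library's attoparsec parsers, and the Stamp record is a hypothetical, simplified type, not the library's Timestamp. Time ranges, repeaters and delays are deliberately left out.

```haskell
module Main where

import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP

-- Hypothetical, simplified timestamp record (not the library's Timestamp type).
data Stamp = Stamp
  { stampDate :: (Int, Int, Int)    -- year, month, day
  , stampDay  :: String             -- day name, e.g. "Wed"
  , stampTime :: Maybe (Int, Int)   -- optional hour and minute
  } deriving (Show, Eq)

-- Exactly n decimal digits, read as an Int.
digits :: Int -> ReadP Int
digits n = read <$> count n (satisfy isDigit)

-- An active timestamp such as <2006-11-01 Wed 19:15>.
activeStamp :: ReadP Stamp
activeStamp = between (char '<') (char '>') $ do
  y <- digits 4 <* char '-'
  m <- digits 2 <* char '-'
  d <- digits 2 <* char ' '
  day <- munch1 isAlpha
  time <- option Nothing (Just <$> hourMinute)
  pure (Stamp (y, m, d) day time)
  where
    hourMinute = (,) <$> (char ' ' *> digits 2) <*> (char ':' *> digits 2)

-- Succeeds only when the line is a single timestamp plus surrounding whitespace.
parseStamp :: String -> Maybe Stamp
parseStamp s =
  case [r | (r, "") <- readP_to_S (skipSpaces *> activeStamp <* skipSpaces) s] of
    (r : _) -> Just r
    []      -> Nothing

main :: IO ()
main = do
  print (parseStamp "  <2006-11-01 Wed 19:15>")
  print (parseStamp "  <2006-11-02 Thu 20:00-22:00>")  -- time range: not handled here
```

The second call returns Nothing because this sketch has no rule for the 20:00-22:00 range.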

Markup parsers should only consider markup on word boundaries

Discovered while testing the HyperLink parser. The parser will incorrectly parse the following:

/[[https://orgmode.org/manual/Link-format.html][The Org Manual: Link format]]/

... as:

Right [Paragraph [Italic [Plain "[[https:"],Italic [Plain "orgmode.org"],Plain "manual",Italic [Plain "Link-format.html][The Org Manual: Link format]]"]]]

This should be easy to fix since formatting markup is only treated as such if the beginning sentinel character is preceded by whitespace and followed by a non-whitespace character.
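
The boundary rule described above can be sketched as a plain function over strings. This is an assumption-laden simplification: it treats only whitespace and the start of input as valid predecessors, exactly as stated in the text, and markupStarts is a hypothetical name rather than anything in the library.

```haskell
module Main where

import Data.Char (isSpace)

-- | Indices at which the marker may open markup: the marker must be preceded
-- by whitespace (or the start of input) and followed by a non-whitespace
-- character. The end of input counts as whitespace.
markupStarts :: Char -> String -> [Int]
markupStarts marker s =
  [ i
  | (i, (prev, c, next)) <- zip [0 ..] (zip3 (' ' : s) s (drop 1 s ++ " "))
  , c == marker
  , isSpace prev
  , not (isSpace next)
  ]

main :: IO ()
main = do
  -- Only the leading slash qualifies; the slashes inside the URL do not.
  print (markupStarts '/' "/[[https://orgmode.org/manual/Link-format.html][The Org Manual: Link format]]/")
  print (markupStarts '/' "see /this/ now")
  print (markupStarts '/' "a / b")
```

With this check in place, the slashes in https:// and in the URL path are never considered as italic openers, so the link stays intact inside the surrounding markup.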

Skip more spaces around syntactic terms

Hi,

A tricky issue in parser design is consuming the whitespace that the syntax allows. Today I stumbled upon it when I tried to parse a " SCHEDULED: <...>" line with leading spaces.

Here is my quick workaround: master...nushio3:issue-spaces, but it is surely too ad hoc.

Parnell, do you have any preferred style for this? Should every syntactic term absorb spaces before it, after it, or both?

Once we decide, we can identify where skipSpace is missing and insert it systematically.

Optionally, we can introduce a token parser combinator that creates space-absorbing parsers, like token from https://hackage.haskell.org/package/parsers-0.12.1.1/docs/Text-Parser-Token.html#v:token .
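
One possible shape for such a combinator, sketched with attoparsec only: every term absorbs the horizontal whitespace after it, and leading whitespace is skipped once at the start of the line. The names lexeme and scheduledLine are hypothetical, and the choice of trailing-space absorption is just one of the options raised above. The sketch skips spaces and tabs only, on the assumption that newlines stay significant in org-mode.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Attoparsec.Text
  (Parser, isEndOfLine, isHorizontalSpace, parseOnly, skipWhile, string, takeTill)
import Data.Text (Text)

-- | Hypothetical combinator: run a parser, then absorb trailing spaces and tabs.
lexeme :: Parser a -> Parser a
lexeme p = p <* skipWhile isHorizontalSpace

-- | Hypothetical example: a planning keyword followed by the rest of the line.
-- Leading whitespace is skipped once, before the first term.
scheduledLine :: Parser Text
scheduledLine =
  skipWhile isHorizontalSpace *> lexeme (string "SCHEDULED:") *> takeTill isEndOfLine

main :: IO ()
main = print (parseOnly scheduledLine "  SCHEDULED: <2015-10-05 Mon>")
```

Wrapping each term in lexeme keeps the whitespace policy in one place instead of scattering skipSpace calls through the parsers.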

Parse whole .org file?

Hello. I'm interested in using this to parse an entire .org file with multiple headers. Not necessarily using any more org features than what you've already implemented (timestamps, headers, and properties). Any plans to flesh out the parser for multiple headers? Or am I overlooking a trivial way that users can do this?
Thanks!

Build on nixpkgs/hackage is broken

When I'm trying to build the library from nixpkgs (for example, 88cd06d0f22cf7008e9480d804593978d5e0a45f), I'm getting this error:

Preprocessing library for orgmode-parse-0.3.0..
Building library for orgmode-parse-0.3.0..
[ 1 of 11] Compiling Data.OrgMode.Parse.Attoparsec.Util ( src/Data/OrgMode/Parse/Attoparsec/Util.hs, dist/build/Data/OrgMode/Parse/Attoparsec/Util.o )

src/Data/OrgMode/Parse/Attoparsec/Util.hs:26:1: error:
    Could not find module ‘Data.OrgMode.Parse.Attoparsec.Util.ParseLinesTill’
    Use -v to see a list of the files searched for.
   |
26 | import           Data.OrgMode.Parse.Attoparsec.Util.ParseLinesTill

However, when I build the project locally (after bumping the LTS; otherwise stack gives me error: attribute 'ghc843' missing), it builds fine. The error also occurs when I use orgmode-parse as a Hackage dependency in my stack project, but if I target the latest master commit on GitHub directly (ce152776307e3a019c2047e459a74eeab05566df), it works. Maybe something is wrong with the particular revision uploaded to Hackage?

Bold is incorrectly parsed as a header

I'll need to double-check, but I have a strong suspicion that

* My header
Text1
*TODO* Text2
Text3

will be parsed as two headers, with TODO* Text2 being the second one. In Emacs org-mode it would produce a single header with TODO in bold.
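
The distinction comes down to the character after the asterisks: in org-mode, a headline is one or more asterisks at the very start of a line, followed by a space. A minimal sketch of that check on plain strings follows; headlineLevel is a hypothetical helper, not the library's parser.

```haskell
module Main where

-- | Level of a headline, or Nothing if the line is not a headline.
-- A headline is one or more asterisks at the very start of the line,
-- followed by a space.
headlineLevel :: String -> Maybe Int
headlineLevel line =
  case span (== '*') line of
    (stars@(_ : _), ' ' : _) -> Just (length stars)
    _                        -> Nothing

main :: IO ()
main = mapM_ (print . headlineLevel)
  [ "* My header"    -- Just 1
  , "*TODO* Text2"   -- Nothing: the asterisk is followed by 'T'
  , "** Nested"      -- Just 2
  , "    * Item"     -- Nothing: the asterisk is not in the first column
  ]
```

A line such as *TODO* Text2 fails the check and can then be handed to the paragraph and markup parsers instead.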
