
parser-experiments's Introduction

Overview

Experiments with different ways to parse source code and generate an AST, comparing speed and memory use. All of the parsers below parse the Python 3.10 grammar.

module                      total allocated size   time    peak memory
pgen2                       369 KiB                0.08s   684 KiB
pegen + regex (tokenizer)   1767 KiB               0.32s   2015 KiB
pegen                       1234 KiB               0.38s   2281.9 KiB
xonsh-ply                   8240.6 KiB             0.65s   10333.5 KiB
lark (lalr-cached)          3753.7 KiB             0.74s   9307.3 KiB
parso                       3542.7 KiB             0.80s   3690.2 KiB
treesitter                  9137.0 KiB             1.56s   9708.7 KiB
libcst                      21817 KiB              6.5s    23024.4 KiB

Both pgen2 and pegen look good, especially pgen2 in terms of memory usage and speed. However, we can use pegen, since it is available as a separate PyPI package. We can also expect some stability from it, as Python may introduce more and more PEG-only changes.
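The README does not say how the figures in the table were collected. As a minimal sketch of one way to take comparable measurements, the snippet below uses the standard library's `tracemalloc` and `time.perf_counter`. The `measure` helper and the `SAMPLE` source are hypothetical, and `ast.parse` is only a stand-in for whichever parser is under test. Note that `tracemalloc` reports retained and peak memory, not a cumulative "total allocated" figure, so the numbers are not directly comparable to the table.

```python
import ast
import time
import tracemalloc


def measure(parse, source):
    """Return (seconds, retained_bytes, peak_bytes) for one call of parse(source).

    Hypothetical helper: `parse` is any callable that takes source text.
    """
    tracemalloc.start()
    try:
        start = time.perf_counter()
        tree = parse(source)  # hold a reference so the result counts as retained
        elapsed = time.perf_counter() - start
        retained, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del tree
    return elapsed, retained, peak


# Assumed sample input; a real benchmark would use a representative source file.
SAMPLE = "def add(a, b):\n    return a + b\n" * 200

elapsed, retained, peak = measure(ast.parse, SAMPLE)
print(f"ast.parse: {elapsed:.4f}s retained={retained / 1024:.1f}KiB peak={peak / 1024:.1f}KiB")
```

A large gap between the peak and retained values suggests temporary structures that are released after parsing; values that are close suggest most of the allocated memory stays alive.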

Conclusions

A. xonsh-ply

  • the existing parser is slow and uses more memory than pgen2 or pegen.
  • the ply codebase is a mess, though rply is good, and we could optimize it with some care

B. pegen

  • it follows the official CPython parser, hence it is future-proof
  • generates an AST that we can feed directly to the interpreter
  • has a large peak memory size, but that memory gets released and it ends up at an optimal size
    • when a regex is used to tokenize, the peak memory is 2015KiB
  • I found a PR that intends to use pegen in place of ply
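The regex tokenizer used in the pegen measurement is not shown here. As a rough illustration of the general idea only, a single compiled pattern with named groups can produce tokens lazily, which avoids building intermediate lists. Everything in this sketch (the `Token` type, `TOKEN_RE`, and `tokenize`) is hypothetical, and it covers only a tiny subset of Python's lexical grammar: no strings, comments, or indentation handling.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    start: int


# One alternation with named groups; the first alternative that matches wins.
TOKEN_RE = re.compile(
    r"""
    (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<OP>==|!=|<=|>=|[-+*/=()<>:,])
  | (?P<NEWLINE>\n)
  | (?P<SKIP>[ \t]+)
  | (?P<MISMATCH>.)
    """,
    re.VERBOSE,
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens one at a time, skipping spaces and tabs."""
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at offset {match.start()}"
            )
        yield Token(kind, match.group(), match.start())


for token in tokenize("x = 1 + 2\n"):
    print(token)
```

Because `tokenize` is a generator, a parser can pull tokens on demand instead of holding the full token list in memory.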

C. pgen2

  • it comes from the lib2to3 package of CPython, which will be removed in Python 3.13 or so, so it is not very future-proof
    • but black (the formatter) has forked it, so it may stick around for some time. We can refer to these packages if we decide to base our parser on it
  • but it uses very little memory and is the fastest of the tools tested here
  • Links

D. parso

  • it is a fork of pgen2
  • does some error recovery, hence the higher memory usage
  • we can pick some pieces from this project if we decide to use pgen2

E. treesitter

  • even with the Python bindings, it ended up using more memory.
  • it seems the memory is not freed, as the peak memory is about the same as the total allocated.

Step forward

  1. implement the completion-context parser in both pgen2 and pegen, and compare their
    1. development time
    2. performance
    3. memory usage
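For the performance part of that comparison, a small timing harness could look like the sketch below. It assumes each candidate is exposed as a callable that takes source text. The `compare` function is hypothetical, and the two candidates are standard-library stand-ins, not the pgen2 or pegen implementations.

```python
import ast
import timeit


def compare(candidates, source, repeat=3, number=5):
    """Return (name, best_seconds_per_call) pairs, fastest first."""
    rows = []
    for name, func in candidates.items():
        # Take the best of several runs to reduce noise from other processes.
        timings = timeit.repeat(lambda: func(source), repeat=repeat, number=number)
        rows.append((name, min(timings) / number))
    return sorted(rows, key=lambda row: row[1])


# Stand-in candidates; replace with the parsers being evaluated.
CANDIDATES = {
    "ast.parse": ast.parse,
    "compile(PyCF_ONLY_AST)": lambda src: compile(src, "<bench>", "exec", ast.PyCF_ONLY_AST),
}

SOURCE = "def add(a, b):\n    return a + b\n" * 100

for name, seconds in compare(CANDIDATES, SOURCE):
    print(f"{name:<26} {seconds * 1000:.3f} ms")
```

Using the minimum over repeated runs follows the usual `timeit` advice: higher timings mostly reflect interference rather than the code being measured.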

Links

Contributors

brettcannon, jnoortheen
