Coder Social home page Coder Social logo

Comments (7)

CrockAgile avatar CrockAgile commented on August 26, 2024 2

just to share some progress, I started looking into this today, and was able to reduce the bug test case to:

#[test]
fn parse_nested_empty() {
    let grammar: Grammar = "
    <start> ::= <a> <empty>
    <a> ::= 'a' <empty>
    <empty> ::= ''"
        .parse()
        .unwrap();

    let input = "a";

    let parses = parse(&grammar, input);
    assert_eq!(parses.count(), 1);
}

The bug seems to be caused by incorrectly de-duplicating Earley states. Because "empty" rules do not advance the input cursor, different predictions get treated as "duplicates".

I have a fix working locally, but it basically "solves" the problem by duplicating all parse attempts 😅

Next chance I get, I will brainstorm for a more realistic solution

from bnf.

nskobelevs avatar nskobelevs commented on August 26, 2024 2

Looks good here - thank you for addressing this!

from bnf.

amascolo avatar amascolo commented on August 26, 2024 1

@CrockAgile in #92 you reference this tutorial, which has a section on empty rules: https://loup-vaillant.fr/tutorials/earley-parsing/empty-rules – maybe there's something helpful there, where it discusses Aycock and Horspool's (2002) paper?

If you get any insights from fixing this issue, it might be worth including the findings in your previous write-up (thanks again for that): https://gist.github.com/CrockAgile/d8cb79607b6c6c49c06a97bd652dc219

Either way, we are in good company – as mentioned by MARPA's author:

Earley's original 1970 algorithm actually had a bug in its handling of nullables. There was an easy fix, but it made the algorithm slower. Since efficiency was already seen as the reason to prefer other parsers, this was a big deal.

from bnf.

CrockAgile avatar CrockAgile commented on August 26, 2024 1

Thank you for another example! I do have a working fix, based on the exact solution you linked. So glad we are on the same page.

But while implementing it, a surprising amount of refactoring was needed (allowing for Earley matches on "nullable productions", not just complete "States", if you are curious 😅 )

That refactor introduced a significant performance regression 😱

But I fixed that last night! So

TLDR: I should have it up for review tomorrow 🤞

from bnf.

shnewto avatar shnewto commented on August 26, 2024

ah thanks for raising this issue! I really appreciate the details for reproducing!

I'll take a look 😄

from bnf.

amascolo avatar amascolo commented on August 26, 2024

Any progress on a fix?

I may be hitting the same issue – here's another bare-bones example you can check against:

#[test]
fn test() {
    let grammar: Grammar = "
    <balanced> ::= <left> <balanced> <right>
                 | ''
    
    <left> ::= <opt-ws> '(' <opt-ws>
    <right> ::= <opt-ws> ')' <opt-ws>
    
    <opt-ws> ::= '' | <ws>
    <ws> ::= ' ' | ' ' <ws>
    "
    .parse()
    .unwrap();

    let input = "()";

    assert!(
        grammar.parse_input(input).next().is_some(),
        "can't parse: {input}"
    );
}

Greatly appreciate everybody's work so far!

Would love to use bnf as I don't see an established Rust library that supports Earley parsing.

However, this issue is a blocker for my use case 🙁

from bnf.

CrockAgile avatar CrockAgile commented on August 26, 2024

this fix should now be available under v0.4.3

but this was my first time publishing this crate, so let me know if something is broken 😅

from bnf.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.