
berkeleyparser's People

Contributors

slavpetrov, slavstudent

berkeleyparser's Issues

Unable to load model from jar resource

Currently the load method requires a file name. It could be minimally
refactored to also accept an input stream, e.g.:

  public static ParserData Load(InputStream inStream) {
    // try-with-resources closes the streams even if reading fails
    try (ObjectInputStream in = new ObjectInputStream(
             new GZIPInputStream(inStream))) { // compressed, serialized objects
      return (ParserData) in.readObject(); // read the mix of grammars
    } catch (IOException e) {
      System.out.println("IOException\n" + e);
      return null;
    } catch (ClassNotFoundException e) {
      System.out.println("Class not found!");
      return null;
    }
  }

  public static ParserData Load(String fileName) {
    try (FileInputStream fis = new FileInputStream(fileName)) { // load from file
      return Load(fis);
    } catch (IOException e) {
      System.out.println("IOException\n" + e);
      return null;
    }
  }


Original issue reported on code.google.com by [email protected] on 24 Jul 2009 at 4:36
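
A minimal, self-contained sketch of the idea in the report above: once loading works on an InputStream, the same code can read a gzip-compressed serialized object from a file, from memory, or from a jar resource. DemoModel, load, save, and the resource path in the comment are hypothetical stand-ins introduced only for illustration; they are not part of the Berkeley Parser API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class StreamLoadSketch {

  /** Hypothetical stand-in for ParserData, so the sketch compiles on its own. */
  static final class DemoModel implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;

    DemoModel(String name) {
      this.name = name;
    }
  }

  /** Reads one gzip-compressed serialized object from any stream, then closes it. */
  static Object load(InputStream inStream) throws IOException, ClassNotFoundException {
    try (InputStream raw = inStream;
         GZIPInputStream gzip = new GZIPInputStream(raw);
         ObjectInputStream in = new ObjectInputStream(gzip)) {
      return in.readObject();
    }
  }

  /** Serializes and gzips an object in memory, simulating a bundled model file. */
  static byte[] save(Serializable value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
      out.writeObject(value);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    // In an application the stream could instead come from the classpath, e.g.
    // SomeClass.class.getResourceAsStream("/models/model.gr") (hypothetical path).
    InputStream stream = new ByteArrayInputStream(save(new DemoModel("demo")));
    DemoModel model = (DemoModel) load(stream);
    System.out.println(model.name);
  }
}
```

The in-memory byte array only stands in for a resource stream; the loading code does not care where the bytes come from.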

Documentation of file formats

Where can I find documentation of the file formats used? Unfortunately, I can find none for the .gr files in the repo, nor for the file formats generated by WriteGrammarToTextFile, such as .grammar (as described in the README).

Though I can guess most of what's in the .grammar files, I'm still a bit puzzled. I ran the command

java -cp BerkeleyParser-1.7.jar edu/berkeley/nlp/PCFGLA/WriteGrammarToTextFile arb_sm5.gr arb_sm5

and, looking at the contents of arb_sm5.grammar, I'm wondering:

  • Does @ have a special meaning or is it just an ordinary character in names? Is there a difference between @.. and non-@ names?
  • Does the $_1/$_0-suffix have a special meaning?

(I also couldn't find any notes on the file format in the COLING-ACL 2006 and HLT-NAACL 2007 publications mentioned in the README.)

The reason I am asking is that I'm considering supporting .gr or .grammar input files in my own project, CoPaR.

Where is getSignature?

The README mentions a Perl script named "getSignature" for getting the signatures used in POS tagging of unknown words. This script is not in the repository, though. Where can I find it?

multiple spaces in input without -tokenize

What steps will reproduce the problem?

Put two or more spaces between words in the input.

Example: echo "a  b" | java -jar berkeleyParser.jar -gr eng_sm5.gr
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
    at java.lang.String.charAt(String.java:687)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.getSignature(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.getCachedSignature(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.score(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.initializeChart(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.doPreParses(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.getBestConstrainedParse(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.getBestConstrainedParse(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.BerkeleyParser.main(BerkeleyParser.java:190)


With only one space, the parser returns a parse tree:
echo "a b" | java -jar berkeleyParser2.jar -gr eng_sm5.gr 
( (NP (DT a) (X (SYM b))) )

If you run the parser with tokenization (-tokenize), it works fine.

Suggestion: track the line number in the input and show it when printing
the trace. This would make debugging easier.

Original issue reported on code.google.com by [email protected] on 11 Feb 2009 at 9:15
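
The trace above ends in charAt(0) on what appears to be an empty string, which is consistent with an empty token being produced between the two spaces. A minimal sketch of a pre-processing workaround, assuming the input is split on single spaces somewhere upstream (the parser's actual splitting code is not shown in the report, and WhitespaceTokens is a hypothetical helper, not part of the parser):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WhitespaceTokens {

  /** Splits on runs of whitespace and drops any empty tokens. */
  static List<String> split(String line) {
    List<String> tokens = new ArrayList<>();
    for (String token : line.trim().split("\\s+")) {
      if (!token.isEmpty()) {
        tokens.add(token);
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    // A naive single-space split keeps an empty token between the two spaces.
    System.out.println(Arrays.asList("a  b".split(" ")));
    // Splitting on whitespace runs yields only the two real words.
    System.out.println(split("a  b"));
  }
}
```

Joining the cleaned tokens with single spaces before piping the text to the parser should avoid the empty token without requiring -tokenize.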

License?

Hello! What is the license and copyright associated with this code?

Setting binarization type of a parser

What steps will reproduce the problem?

public Parser getParser(String grammarFile, Options opts) {
    double threshold = 1.0;
    ParserData pData = ParserData.Load(grammarFile);
    Grammar grammar = pData.getGrammar();
    Numberer.setNumberers(pData.getNumbs());
    Parser parser = new CoarseToFineMaxRuleParser(grammar, pData.getLexicon(),
        threshold, -1, opts.viterbi, opts.substates, opts.scores,
        opts.accurate, false, true, true);
    // parser.binarization = pData.getBinarization(); // HERE LIES THE ISSUE
    return parser;
}

What is the expected output? What do you see instead?

Since the 'binarization' attribute of the parser is package-level
protected, there seems to be no way of setting the binarization type.

Suggestion: create a setter for the binarization attribute.

Original issue reported on code.google.com by [email protected] on 21 Jul 2009 at 10:00
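
The suggested setter could look roughly like the sketch below. Binarization and SketchParser are hypothetical stand-ins written for this example, and the default value RIGHT is an assumption; the real parser classes and their binarization type may differ.

```java
final class BinarizationSetterSketch {

  /** Hypothetical stand-in for the parser's binarization type. */
  enum Binarization { LEFT, RIGHT }

  /** Hypothetical stand-in for a parser whose field is not publicly accessible. */
  static class SketchParser {
    // No access modifier: visible only inside the package (assumed default value).
    Binarization binarization = Binarization.RIGHT;

    /** Public setter, so callers outside the package can configure the parser. */
    public void setBinarization(Binarization binarization) {
      this.binarization = binarization;
    }

    public Binarization getBinarization() {
      return binarization;
    }
  }

  public static void main(String[] args) {
    SketchParser parser = new SketchParser();
    parser.setBinarization(Binarization.LEFT);
    System.out.println(parser.getBinarization());
  }
}
```

With such a setter in place, the commented-out line in the report would become a call like parser.setBinarization(pData.getBinarization()).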

No 5 split-merge cycle grammar for English in Downloads

The README advises using a grammar with 5 split-merge cycles when
parsing non-WSJ text. Since most of the text in the universe is not
from the WSJ, it would be very useful if this 5 split-merge grammar were
available in the downloads.

Cheers

Original issue reported on code.google.com by [email protected] on 23 Mar 2012 at 7:02

Failing Parse

Consider the sentence "May the odds be ever in your favor", parsed with version 1.7 from this repository:

    # May the odds be ever in your favor
    (())
    # may the odds be ever in your favor
    ( (S (VP (MD may) (NP (DT the) (NNS odds)) (VP (VB be) (PP (ADVP (RB ever)) (IN in) (NP (PRP$ your) (NN favor)))))) )
    # May the odds be ever in your
    (())
    # may the odds be ever in your
    ( (S (VP (MD may) (NP (DT the) (NNS odds)) (VP (VB be) (PP (ADVP (RB ever)) (IN in) (NP (PRP$ your)))))) )
    # May the odds be ever in
    (())
    # may the odds be ever in
    ( (S (VP (MD may) (NP (DT the) (NNS odds)) (VP (VB be) (ADJP (RB ever) (FW in))))) )
    # May the odds be ever
    (())
    # may the odds be ever
    ( (S (VP (MD may) (NP (DT the) (NNS odds)) (VP (VB be) (ADVP (RB ever))))) )
    # May the odds be
    (())
    # may the odds be
    ( (S (VP (MD may) (NP (DT the) (NNS odds)) (VP (VB be)))) )

And now things get interesting...

    # May the odds
    ( (FRAG (NP (NNP May)) (NP (DT the) (NNS odds))) )
    # may the odds
    ( (FRAG (X (MD may)) (NP (DT the) (NNS odds))) )
    # May the
    ( (FRAG (NP (NNP May)) (X (DT the))) )
    # may the
    ( (FRAG (X (MD may)) (NP (DT the))) )
    # May
    ( (NP (NNP May)) )
    # may
    ( (X (MD may)) )

The results are the same when a period is appended to the end of the sentence.
