slavpetrov / berkeleyparser


Automatically exported from code.google.com/p/berkeleyparser

License: GNU General Public License v2.0

Java 100.00%

berkeleyparser's Issues

Unable to load model from jar resource

Currently, ParserData.Load requires a file name, so a model bundled as a jar
resource cannot be loaded. The method could be minimally refactored to accept
an input stream as well, e.g.:
public static ParserData Load(InputStream inStream) {
    ParserData pData = null;
    try {
      GZIPInputStream gzis = new GZIPInputStream(inStream); // Compressed
      ObjectInputStream in = new ObjectInputStream(gzis); // Load objects
      pData = (ParserData)in.readObject(); // Read the mix of grammars
      in.close(); // And close the stream.
    } catch (IOException e) {
      System.out.println("IOException\n"+e);
      return null;
    } catch (ClassNotFoundException e) {
      System.out.println("Class not found!");
      return null;
    }
    return pData;
  }

  public static ParserData Load(String fileName) {
    FileInputStream fis = null;
    try {
      fis = new FileInputStream(fileName); // Load from file
      return Load(fis);
    } catch (IOException e) {
      System.out.println("IOException\n"+e);
      return null;
    }
    finally {
      try {
        if (fis != null) fis.close();
      }
      catch (IOException e) {
        e.printStackTrace();
      }
    }
  }
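
The stream-based overload proposed above could be exercised roughly as follows. This is a minimal, self-contained sketch: `FakeModel` is a hypothetical stand-in for `ParserData`, the `load` and `save` helpers are illustrative names, and the resource path in the comment is an assumption. It only relies on the gzip plus Java serialization layout shown in the snippet above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamLoadSketch {
    /** Hypothetical stand-in for ParserData; any Serializable class behaves the same way. */
    static class FakeModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        FakeModel(String name) { this.name = name; }
    }

    /** Reads one gzip-compressed serialized object from any stream (file, jar resource, ...). */
    static Object load(InputStream in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new GZIPInputStream(in))) {
            return ois.readObject();
        }
    }

    /** Writes an object in the same gzip + serialization layout; used for the demo only. */
    static byte[] save(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            oos.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] blob = save(new FakeModel("eng_sm5"));
        // In an application the stream would come from the classpath instead, e.g.
        // SomeClass.class.getResourceAsStream("/models/eng_sm5.gr") (path is an assumption).
        FakeModel model = (FakeModel) load(new ByteArrayInputStream(blob));
        System.out.println(model.name);
    }
}
```

Because the helper accepts any `InputStream`, the same code path serves files, jar resources, and in-memory buffers, and try-with-resources closes the stream even when deserialization fails.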

Original issue reported on code.google.com by [email protected] on 24 Jul 2009 at 4:36

Documentation of file formats

Where can I find documentation of the file formats used? Unfortunately, I can find documentation neither for the .gr files in the repo nor for the formats generated by WriteGrammarToTextFile, such as .grammar (as described in the README).

Although I can guess most of what is in the .grammar files, I am still a bit puzzled. I ran the command

java -cp BerkeleyParser-1.7.jar edu/berkeley/nlp/PCFGLA/WriteGrammarToTextFile arb_sm5.gr arb_sm5

and in the content of arb_sm5.grammar I'm wondering:

Does @ have a special meaning or is it just an ordinary character in names? Is there a difference between @.. and non-@ names?
Does the $_1/$_0-suffix have a special meaning?

(I also couldn't find any notes on the file format in the COLING-ACL 2006 and HLT-NAACL 2007 publications mentioned in the README.)

The reason I am asking is that I'm considering supporting .gr or .grammar input files in my own project, CoPaR.

Where is getSignature?

The README mentions a Perl script named "getSignature" that produces the signatures used for POS tagging of unknown words. This script is not in the repository, though. Where can I find it?

multiple spaces in input without -tokenize

What steps will reproduce the problem?

Have two spaces or more between words in input

example: echo "a  b" | java -jar berkeleyParser.jar -gr eng_sm5.gr
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
    at java.lang.String.charAt(String.java:687)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.getSignature(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.getCachedSignature(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.SophisticatedLexicon.score(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.initializeChart(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.doPreParses(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.getBestConstrainedParse(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser.getBestConstrainedParse(Unknown Source)
    at edu.berkeley.nlp.PCFGLA.BerkeleyParser.main(BerkeleyParser.java:190)


If there is only one space, one obtains a parse tree.
echo "a b" | java -jar berkeleyParser2.jar -gr eng_sm5.gr 
( (NP (DT a) (X (SYM b))) )

If you run the parser with tokenization (-tokenize), it works fine.

Suggestion: track the line number in the input and show it when printing
the trace. Makes debugging easier.
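
The trace is consistent with an empty token reaching `charAt(0)`. The sketch below shows how splitting on a single space produces such a token and how splitting on runs of whitespace avoids it. Whether the parser actually splits its input this way is an assumption, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WhitespaceSplitSketch {
    /** Splits on a single space; consecutive spaces yield empty tokens. */
    static List<String> naiveSplit(String line) {
        return Arrays.asList(line.split(" "));
    }

    /** Splits on runs of whitespace, so no returned token is ever empty. */
    static List<String> safeSplit(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(naiveSplit("a  b")); // prints [a, , b]
        System.out.println(safeSplit("a  b"));  // prints [a, b]
    }
}
```

Calling `"".charAt(0)` on the empty middle token of the naive split throws the same `StringIndexOutOfBoundsException` as in the report.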

Original issue reported on code.google.com by [email protected] on 11 Feb 2009 at 9:15

License?

Hello! What is the license and copyright associated with this code?

Setting binarization type of a parser

What steps will reproduce the problem?

public Parser getParser(String grammarFile, Options opts) {
    double threshold = 1.0;
    ParserData pData = ParserData.Load(grammarFile);
    Grammar grammar = pData.getGrammar();
    Numberer.setNumberers(pData.getNumbs());
    Parser parser = new CoarseToFineMaxRuleParser(grammar, pData.getLexicon(),
            threshold, -1, opts.viterbi, opts.substates, opts.scores,
            opts.accurate, false, true, true);
    // parser.binarization = pData.getBinarization(); // HERE LIES THE ISSUE
    return parser;
}

What is the expected output? What do you see instead?

Since the 'binarization' attribute of the parser is package-level
protected, there seems to be no way of setting the binarization type.

Suggestion: create a setter for the binarization attribute.
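
Until such a setter exists, one possible workaround is to set the field through reflection. The following is only a sketch under stated assumptions: `FakeParser` and the two-value `Binarization` enum are hypothetical stand-ins for the real classes, and `setField` is an illustrative helper. It walks the class hierarchy because the field may be declared in a superclass of the concrete parser.

```java
import java.lang.reflect.Field;

class BinarizationAccessSketch {
    /** Hypothetical stand-in for the parser's binarization enum. */
    enum Binarization { LEFT, RIGHT }

    /** Hypothetical stand-in for a parser whose field has no public setter. */
    static class FakeParser {
        Binarization binarization = Binarization.LEFT; // package-private, as in the report
    }

    /** Sets a non-public field by name, searching the class and its superclasses. */
    static void setField(Object target, String name, Object value)
            throws ReflectiveOperationException {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField(name);
                f.setAccessible(true);
                f.set(target, value);
                return;
            } catch (NoSuchFieldException e) {
                // Not declared on this class; try the superclass.
            }
        }
        throw new NoSuchFieldException(name);
    }

    public static void main(String[] args) throws Exception {
        FakeParser parser = new FakeParser();
        setField(parser, "binarization", Binarization.RIGHT);
        System.out.println(parser.binarization); // prints RIGHT
    }
}
```

Reflection bypasses the access check rather than fixing the API, so a public setter remains the cleaner long-term solution.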

Original issue reported on code.google.com by [email protected] on 21 Jul 2009 at 10:00

No 5 split-merge cycle grammar for English in Downloads

The README advises using a grammar with 5 split-merge cycles when parsing
non-WSJ text. Since most of the text in the universe is actually not from the
WSJ, it would be most useful if this 5 split-merge grammar were available in
the downloads.

Cheers

Original issue reported on code.google.com by [email protected] on 23 Mar 2012 at 7:02

Failing Parse

Consider the sentence, "May the odds be ever in your favor" and version 1.7 in this repository:

    # May the odds be ever in your favor
    (())
    # may the odds be ever in your favor
    ( (S (VP (MD may) (NP (DT the) (NNS odds)) (VP (VB be) (PP (ADVP (RB ever)) (IN in) (NP (PRP$ your) (NN favor)))))) )
    # May the odds be ever in your
    (())
    # may the odds be ever in your
    ( (S (VP (MD may) (NP (DT the) (NNS odds)) (VP (VB be) (PP (ADVP (RB ever)) (IN in) (NP (PRP$ your)))))) )
    # May the odds be ever in
    (())
    # may the odds be ever in
    ( (S (VP (MD may) (NP (DT the) (NNS odds)) (VP (VB be) (ADJP (RB ever) (FW in))))) )
    # May the odds be ever
    (())
    # may the odds be ever
    ( (S (VP (MD may) (NP (DT the) (NNS odds)) (VP (VB be) (ADVP (RB ever))))) )
    # May the odds be
    (())
    # may the odds be
    ( (S (VP (MD may) (NP (DT the) (NNS odds)) (VP (VB be)))) )

And now things get interesting...

    # May the odds
    ( (FRAG (NP (NNP May)) (NP (DT the) (NNS odds))) )
    # may the odds
    ( (FRAG (X (MD may)) (NP (DT the) (NNS odds))) )
    # May the
    ( (FRAG (NP (NNP May)) (X (DT the))) )
    # may the
    ( (FRAG (X (MD may)) (NP (DT the))) )
    # May
    ( (NP (NNP May)) )
    # may
    ( (X (MD may)) )

The results are the same when a period is appended to the end of the sentence.
