Comments (9)

saif-ellafi commented on May 9, 2024

Hello, thanks for reporting this. I'm looking into it, but I can't reproduce it exactly on my end. I copied and pasted your code (correcting the val data line...).

I agree the documentation should be updated, and I also think the default corpus should be used when none is provided. Running the pipeline, I get:

scala> pipeline.fit(data).transform(data).show()
java.lang.Exception: Empty corpus for POS
  at com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronApproach$.retrievePOSCorpus(PerceptronApproach.scala:249)
  at com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronApproach.train(PerceptronApproach.scala:84)
  at com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronApproach.train(PerceptronApproach.scala:22)
  at com.johnsnowlabs.nlp.AnnotatorApproach.fit(AnnotatorApproach.scala:28)
  at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:153)
  at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
  at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
  at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
  ... 54 elided

from spark-nlp.

saif-ellafi commented on May 9, 2024

The problem seems to be that, outside our test environment, reading a resource directory does not return the names of the files inside it. The code expects to get the list of file names in that directory.

scala> import scala.io.Source
scala> val s = getClass.getResourceAsStream("/anc-pos-corpus")
scala> val t = Source.fromInputStream(s)("UTF-8")
scala> t.isEmpty
res4: Boolean = true

whereas pointing to /anc-pos-corpus/1400.txt reads the file correctly.
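Listing a classpath directory tends to work when resources sit in an exploded directory (as in a test run) but not once they are packaged inside a JAR, where reading the directory as a stream does not yield a file listing. A minimal sketch, assuming Java 8+ with the standard zip file system provider, of listing a resource directory in both cases. ResourceListing, listDir and listResourceDir are hypothetical names for illustration; this is not the code of the fix in #49.

```scala
import java.net.URL
import java.nio.file.{FileSystemAlreadyExistsException, FileSystems, Files, Path, Paths}
import java.util.Collections
import scala.collection.mutable.ListBuffer

// Hypothetical helper, not part of spark-nlp.
object ResourceListing {
  // Collects the file names directly under `dir`, sorted for deterministic output.
  private def namesIn(dir: Path): List[String] = {
    val stream = Files.list(dir)
    try {
      val it = stream.iterator()
      val names = ListBuffer.empty[String]
      while (it.hasNext) names += it.next().getFileName.toString
      names.toList.sorted
    } finally stream.close()
  }

  // Lists a directory given its URL, whether it is an exploded
  // directory ("file:") or a directory inside a packaged JAR ("jar:").
  def listDir(url: URL): List[String] = url.getProtocol match {
    case "file" =>
      namesIn(Paths.get(url.toURI))
    case "jar" =>
      val uri = url.toURI
      // e.g. "file:/path/app.jar!/anc-pos-corpus" -> "/anc-pos-corpus"
      val spec = uri.getSchemeSpecificPart
      val inner = spec.substring(spec.indexOf("!/") + 1)
      // Open the JAR as a zip file system, reusing one that is already open.
      val (fs, opened) =
        try (FileSystems.newFileSystem(uri, Collections.emptyMap[String, AnyRef]()), true)
        catch { case _: FileSystemAlreadyExistsException => (FileSystems.getFileSystem(uri), false) }
      try namesIn(fs.getPath(inner))
      finally if (opened) fs.close()
    case other =>
      throw new UnsupportedOperationException(s"cannot list a '$other' URL")
  }

  // Classpath convenience wrapper; returns Nil when the directory is not on the classpath.
  def listResourceDir(dir: String): List[String] =
    Option(getClass.getResource(dir)).map(listDir).getOrElse(Nil)
}
```

For example, ResourceListing.listResourceDir("/anc-pos-corpus") would return the sorted file names (such as 1400.txt) when that directory is on the classpath, from either a build directory or a JAR.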

For now, a workaround is to point manually to src/main/resources/anc-pos-corpus using setCorpusPath(), but this should definitely work by default.

I will keep investigating.

saif-ellafi commented on May 9, 2024

#49

This fix should resolve reading the defaults from the provided resources.
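The defaulting behaviour requested in this thread (use the configured corpus path if one is set, otherwise fall back to the bundled anc-pos-corpus directory, and only fail when nothing is found) can be sketched as below. CorpusDefaults, resolveCorpus and DefaultCorpusDir are hypothetical names, and the directory-listing function is passed in as an assumption; this is an illustration, not the spark-nlp implementation.

```scala
// Hypothetical sketch of "use the default corpus when none is provided".
object CorpusDefaults {
  // Assumed default: the corpus directory bundled with the library's resources.
  val DefaultCorpusDir = "/anc-pos-corpus"

  // Uses the caller's corpus path when one is set (non-blank), otherwise the bundled
  // default, then lists it with `list`; throws "Empty corpus for POS" if nothing is found.
  def resolveCorpus(corpusPath: Option[String], list: String => List[String]): List[String] = {
    val dir = corpusPath.map(_.trim).filter(_.nonEmpty).getOrElse(DefaultCorpusDir)
    val files = list(dir)
    if (files.isEmpty) throw new Exception(s"Empty corpus for POS at $dir")
    files
  }
}
```

Passing the listing function in keeps the fallback rule separate from how resources are actually read, so the same rule applies whether the corpus comes from the file system or from a packaged JAR.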

maziyarpanahi commented on May 9, 2024

Hi @saifjsl,

Are your merged commits included in version 1.2.3 on Maven? In the latest version I still have to copy the anc-pos-corpus directory into my resources and point to it manually, as you mentioned here.

Many thanks.

saif-ellafi commented on May 9, 2024

Hey @maziyarpanahi, thanks.

No, we have not made a release since that issue. We are quite busy in December but will resume activity soon :)

maziyarpanahi commented on May 9, 2024

No problem at all mate! I was just making sure I’m not missing something :)
Thank you and have a great day.

aleksei-ai commented on May 9, 2024

The changes are in master, so I'm closing this issue.

rylanhalteman commented on May 9, 2024

Works for me; I haven't had time to retest this yet. Thanks for fixing it!

petervilleroy commented on May 9, 2024

Thanks for including this in your next release; defaulting to the standard corpus is a huge plus.
