Description: The documentation does not provide a clear way to run

Unclear documentation on how to properly use the POSTagger about spark-nlp HOT 9 CLOSED

rylanhalteman commented on May 9, 2024

Unclear documentation on how to properly use the POSTagger

from spark-nlp.

Comments (9)

saif-ellafi commented on May 9, 2024

Hello, thanks for reporting the issue. I am taking a look at it, but I can't seem to reproduce it on my end. I just copied and pasted your code (after correcting the val data line...)

I agree the documentation should be updated, and I also think the default corpus should be used when none is provided. This is what I get:

scala> pipeline.fit(data).transform(data).show()
java.lang.Exception: Empty corpus for POS
  at com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronApproach$.retrievePOSCorpus(PerceptronApproach.scala:249)
  at com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronApproach.train(PerceptronApproach.scala:84)
  at com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronApproach.train(PerceptronApproach.scala:22)
  at com.johnsnowlabs.nlp.AnnotatorApproach.fit(AnnotatorApproach.scala:28)
  at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:153)
  at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
  at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
  at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
  ... 54 elided

saif-ellafi commented on May 9, 2024

The problem seems to be that, outside our test environment, reading a directory as a resource does not return the file names inside it. We expect to get the list of file names inside the directory.

scala> import scala.io.Source
scala> val s = getClass.getResourceAsStream("/anc-pos-corpus")
scala> val t = Source.fromInputStream(s)("UTF-8")
scala> t.isEmpty
res4: Boolean = true

whereas pointing to /anc-pos-corpus/1400.txt reads correctly.
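
To illustrate the point about directories, here is a minimal sketch (not the fix made in this thread) that lists a corpus directory through the file system instead of through a resource stream. The names `CorpusListing` and `listCorpusFiles` are hypothetical, and the sketch assumes the corpus sits in a regular directory on disk, not inside a JAR.

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

object CorpusListing {
  /** Returns the names of the regular files directly under `dir`, sorted by name. */
  def listCorpusFiles(dir: Path): Seq[String] = {
    val entries = Files.list(dir) // lazily opened directory stream, must be closed
    try {
      entries.iterator().asScala
        .filter(p => Files.isRegularFile(p)) // skip subdirectories
        .map(_.getFileName.toString)
        .toList
        .sorted
    } finally {
      entries.close()
    }
  }
}
```

Calling `CorpusListing.listCorpusFiles(Paths.get("src/main/resources/anc-pos-corpus"))` from the project root would then return the individual file names, which can each be opened as a file or resource in turn.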

A workaround for now is to point manually to src/main/resources/anc-pos-corpus using setCorpusPath(), but this should definitely work by default.
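
As a rough sketch of that workaround: only `PerceptronApproach` and `setCorpusPath()` come from this thread. The column names are assumptions, as is the relative path being resolvable from the working directory, and the snippet needs spark-nlp on the classpath.

```scala
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronApproach

// Assumption: earlier pipeline stages produce "sentence" and "token" columns.
val posTagger = new PerceptronApproach()
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("pos")
  // Point explicitly at the corpus directory instead of relying on the default resource.
  .setCorpusPath("src/main/resources/anc-pos-corpus")
```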

Will keep investigating.

saif-ellafi commented on May 9, 2024

#49

This fix will resolve reading the defaults from the provided resources.

maziyarpanahi commented on May 9, 2024

Hi @saifjsl,

Are your merged commits included in version 1.2.3 on Maven? I ask because in the latest version I still have to copy the anc-pos-corpus directory to my resources and point to it manually, as you mentioned here.

Many thanks.

saif-ellafi commented on May 9, 2024

hey @maziyarpanahi thanks,

No, we have not made any release since that issue. We are quite busy in December but will resume activity soon :)

maziyarpanahi commented on May 9, 2024

No problem at all mate! I was just making sure I’m not missing something :)
Thank you and have a great day.

aleksei-ai commented on May 9, 2024

The changes are in master, so I'm closing this issue.

rylanhalteman commented on May 9, 2024

Works for me, I haven't had the time to retest this yet. Thanks for fixing!

petervilleroy commented on May 9, 2024

Thanks for including this in your next release. Defaulting to the standard corpus is a huge plus.
