println("clustering single vectors fails")
val singleVector = mymatrix.map { point =>
  try {
    val prediction = kModel.predict(point)
    (point.toString, prediction)
  } catch {
    case e: Error => println("unable to predict a single vector")
  }
}
println(s"singleVector.count():${singleVector.count()}")
println("clustering using multiple vectors, this runs OK")
val predictions = kModel.predict(mymatrix)
val multipleVector = predictions.zip(mymatrix).map(point => (point._2.toString, point._1))
println(s"multipleVector.count():${multipleVector.count()}")
2015/06/18 11:10:03:300 [ERROR] [Executor task launch worker-5] org.apache.spark.Logging$class.logError:96 - Exception in task 0.0 in stage 63.0 (TID 31500)
java.lang.StackOverflowError
at com.massivedatascience.divergence.SquaredEuclideanDistanceDivergence$.convexHomogeneous(BregmanDivergence.scala:144)
at com.massivedatascience.clusterer.NonSmoothedPointCenterFactory$class.toPoint(BregmanPointOps.scala:209)
at com.massivedatascience.clusterer.SquaredEuclideanPointOps$.toPoint(BregmanPointOps.scala:260)
at com.massivedatascience.clusterer.KMeansPredictor$class.predictWeighted(KMeansModel.scala:66)
at com.massivedatascience.clusterer.KMeansModel.predictWeighted(KMeansModel.scala:99)
This works with the MLlib KMeans implementation; however, switching to massive-kmeans produces the StackOverflowError shown above.
(You can switch between the MLlib and massivedatascience import statements in the Scala file to see the difference.)