Comments (16)

shuttie commented on May 12, 2024

Can you try using a full URL path for the model file instead of just metarank_model_movie.model? Something like file:///home/user/metarank.model? It seems like a bug; I will make a fix tomorrow.
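
For anyone trying this workaround, here is a minimal sketch of turning a relative path into an absolute file:// URL in the shell. The helper name to_file_url is hypothetical and not part of Metarank. It does no percent-encoding, so it assumes the path contains no spaces or other special characters.

```shell
# to_file_url: hypothetical helper, not part of Metarank.
# Prints an absolute file:// URL for the given path. The directory must
# exist, but the file itself does not have to.
to_file_url() {
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  # Strip a trailing slash so a file directly under / does not get "//".
  printf 'file://%s/%s\n' "${dir%/}" "$(basename "$1")"
}

MODEL_URL=$(to_file_url metarank_model_movie.model)
echo "$MODEL_URL"

# The value could then be passed in place of the bare file name, e.g.
#   --model "$MODEL_URL"
# in the inference command quoted elsewhere in this thread.
```

Whether a given Metarank release accepts this form for every file argument is not confirmed here; the suggestion above only mentions the model file.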

from metarank.

vgoloviznin commented on May 12, 2024

@laxmimerit we're working on the fix for your case

shuttie commented on May 12, 2024

It may sound ironic, but I've again fixed this new issue in https://github.com/metarank/metarank/releases/tag/0.2.5
Please, try again :)

laxmimerit commented on May 12, 2024

Solved!
Since this file is stored with Git LFS, I had to download events.jsonl.gz separately from here.
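
A quick way to spot this situation, as a rough sketch: a file that Git LFS has not fetched yet is a small text stub whose first line starts with "version https://git-lfs.github.com/spec/v1". The helper name is_lfs_pointer is hypothetical, and the check assumes events.jsonl.gz sits in the current directory.

```shell
# is_lfs_pointer: hypothetical helper. Succeeds if the file is still a
# Git LFS pointer stub rather than the real content.
is_lfs_pointer() {
  # Pointer files are tiny text files, so the first 200 bytes are enough.
  head -c 200 "$1" 2>/dev/null |
    grep -q '^version https://git-lfs.github.com/spec/v1'
}

if is_lfs_pointer events.jsonl.gz; then
  echo "events.jsonl.gz is an LFS pointer; run 'git lfs pull' or download it separately"
fi
```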

laxmimerit commented on May 12, 2024

Everything worked well as described in the tutorial until I got stuck at the inference step. When I run the inference command

java -cp metarank-assembly-0.2.2.jar ai.metarank.mode.inference.Inference --config config.yml --model metarank_model_movie.model --redis-host localhost --format json --savepoint-dir ./output/savepoint

it throws the following error:

02:19:54.447 INFO  a.m.mode.inference.InferenceCmdline$ - Port: 8080
02:19:54.451 INFO  a.m.mode.inference.InferenceCmdline$ - Model path: metarank_model_movie.model
scala.MatchError: config.yml (of class java.lang.String)
        at ai.metarank.mode.FileLoader$.load(FileLoader.scala:14)
        at ai.metarank.mode.inference.Inference$.$anonfun$run$3(Inference.scala:35)
        at apply @ ai.metarank.mode.inference.InferenceCmdline$.$anonfun$parse$4(InferenceCmdline.scala:131)
        at map @ ai.metarank.mode.inference.InferenceCmdline$.$anonfun$parse$4(InferenceCmdline.scala:131)
        at apply @ ai.metarank.mode.inference.InferenceCmdline$.$anonfun$parse$2(InferenceCmdline.scala:130)
        at flatMap @ ai.metarank.mode.inference.InferenceCmdline$.$anonfun$parse$2(InferenceCmdline.scala:130)
        at flatMap @ ai.metarank.mode.inference.InferenceCmdline$.parse(InferenceCmdline.scala:125)
        at flatMap @ ai.metarank.mode.inference.Inference$.$anonfun$run$2(Inference.scala:34)
        at apply @ ai.metarank.mode.inference.Inference$.run(Inference.scala:33)
        at flatMap @ ai.metarank.mode.inference.Inference$.run(Inference.scala:33)

What am I missing here?

vgoloviznin commented on May 12, 2024

Hey @laxmimerit, thanks for pointing this out. I've updated the docs a bit to include the full link in the bootstrapping part.

vgoloviznin commented on May 12, 2024

@laxmimerit as for the inference step, can you post your config.yml file, or did you use it directly from our repo?

laxmimerit commented on May 12, 2024

The config.yml file is as follows:

interactions:
  - name: click
    weight: 1.0
features:
  - name: popularity
    type: number
    scope: item
    source: metadata.popularity

  - name: vote_avg
    type: number
    scope: item
    source: metadata.vote_avg

  - name: vote_cnt
    type: number
    scope: item
    source: metadata.vote_cnt

  - name: budget
    type: number
    scope: item
    source: metadata.budget

  - name: release_date
    type: number
    scope: item
    source: metadata.release_date

  - name: runtime
    type: number
    scope: item
    source: metadata.runtime

  - name: title_length
    type: word_count
    source: metadata.title
    scope: item

  - name: genre
    type: string
    scope: item
    source: metadata.genres
    values:
      - drama
      - comedy
      - thriller
      - action
      - adventure
      - romance
      - crime
      - science fiction
      - fantasy
      - family
      - horror
      - mystery
      - animation
      - history
      - music

  - name: ctr
    type: rate
    top: click
    bottom: impression
    scope: item
    bucket: 24h
    periods: [7,30]

  - name: liked_genre
    type: interacted_with
    interaction: click
    field: metadata.genres
    scope: session
    count: 10
    duration: 24h

  - name: liked_actors
    type: interacted_with
    interaction: click
    field: metadata.actors
    scope: session
    count: 10
    duration: 24h

  - name: liked_tags
    type: interacted_with
    interaction: click
    field: metadata.tags
    scope: session
    count: 10
    duration: 24h

  - name: liked_director
    type: interacted_with
    interaction: click
    field: metadata.director
    scope: session
    count: 10
    duration: 24h

  - name: visitor_click_count
    type: interaction_count
    interaction: click
    scope: session

  - name: global_item_click_count
    type: interaction_count
    interaction: click
    scope: item

  - name: day_item_click_count
    type: window_count
    interaction: click
    scope: item
    bucket: 24h
    periods: [7,30]

laxmimerit commented on May 12, 2024

Yes. I had already tried giving the full path for all files, but the error is the same. In fact, I also tried to run the trained model available in the test resources folder.

shuttie commented on May 12, 2024

Should be fixed in https://github.com/metarank/metarank/releases/tag/0.2.3

Please reopen if you encounter any further issues.

laxmimerit commented on May 12, 2024

Hi,
After the previous fix, I now get a new error at the very first step, Data Bootstrapping:

14:00:05.492 INFO  o.a.f.r.t.slot.TaskSlotTableImpl - Activate slot 66896173a0f00d248c21b1915f638c66.
14:00:05.511 INFO  o.a.flink.runtime.taskmanager.Task - GroupReduce (reduce(OperatorSubtaskState)) (1/1)#0 (1fc0ed9b1b03ecea41769fe96191f62c) switched from CREATED to DEPLOYING.
14:00:05.512 INFO  o.a.flink.runtime.taskmanager.Task - Loading JAR files for task GroupReduce (reduce(OperatorSubtaskState)) (1/1)#0 (1fc0ed9b1b03ecea41769fe96191f62c) [DEPLOYING].
14:00:05.511 INFO  o.a.flink.runtime.taskmanager.Task - MapPartition (0af8aae8d1672968360cf5c5b0cfd272) (1/1)#0 (708a532c53f1aa6f5e7064e134333adb) switched from DEPLOYING to INITIALIZING.
14:00:05.512 INFO  o.a.flink.runtime.taskmanager.Task - MapPartition (0af8aae8d1672968360cf5c5b0cfd272) (1/1)#0 (708a532c53f1aa6f5e7064e134333adb) switched from INITIALIZING to RUNNING.
14:00:05.513 INFO  o.a.f.r.e.ExecutionGraph - MapPartition (0af8aae8d1672968360cf5c5b0cfd272) (1/1) (708a532c53f1aa6f5e7064e134333adb) switched from DEPLOYING to INITIALIZING.
14:00:05.513 INFO  o.a.f.r.e.ExecutionGraph - MapPartition (0af8aae8d1672968360cf5c5b0cfd272) (1/1) (708a532c53f1aa6f5e7064e134333adb) switched from INITIALIZING to RUNNING.
14:00:05.521 INFO  o.a.f.r.taskexecutor.TaskExecutor - Received task CHAIN Partition -> Map (Key Remover) (1/1)#0 (db09c8277e8799fdd97e29636c52520b), deploy into slot with allocation id 66896173a0f00d248c21b1915f638c66.
14:00:05.521 INFO  o.a.flink.runtime.taskmanager.Task - GroupReduce (reduce(OperatorSubtaskState)) (1/1)#0 (1fc0ed9b1b03ecea41769fe96191f62c) switched from DEPLOYING to INITIALIZING.
14:00:05.521 INFO  o.a.flink.runtime.taskmanager.Task - GroupReduce (reduce(OperatorSubtaskState)) (1/1)#0 (1fc0ed9b1b03ecea41769fe96191f62c) switched from INITIALIZING to RUNNING.
14:00:05.522 INFO  o.a.f.r.e.ExecutionGraph - GroupReduce (reduce(OperatorSubtaskState)) (1/1) (1fc0ed9b1b03ecea41769fe96191f62c) switched from DEPLOYING to INITIALIZING.
14:00:05.525 INFO  o.a.f.r.e.ExecutionGraph - GroupReduce (reduce(OperatorSubtaskState)) (1/1) (1fc0ed9b1b03ecea41769fe96191f62c) switched from INITIALIZING to RUNNING.
14:00:05.526 ERROR o.a.f.runtime.operators.BatchTask - Error in task code:  MapPartition (6d34697451896c6270f6053830e3820a) (1/1)
java.lang.NullPointerException: null
        at org.apache.flink.state.api.output.BoundedStreamTask.cleanUpInternal(BoundedStreamTask.java:120)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.runAndSuppressThrowable(StreamTask.java:1021)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.cleanUp(StreamTask.java:925)
        at org.apache.flink.state.api.output.BoundedOneInputStreamTaskRunner.mapPartition(BoundedOneInputStreamTaskRunner.java:89)
        at org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:113)
        at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:519)
        at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:360)
        at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
        at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
        at java.base/java.lang.Thread.run(Thread.java:829)

shuttie commented on May 12, 2024

It looks like the e2e test was missing set -e in bash, so when one of the subtasks failed, the test was still marked as successful. Should be fixed (again!) in https://github.com/metarank/metarank/releases/tag/0.2.4, please try the new build.
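
To illustrate the set -e behaviour in general terms (this is a generic sketch, not the actual Metarank e2e script): without set -e a shell script carries on after a failing command and reports the status of its last command, while with set -e it stops at the first failure. Here false stands in for a failing subtask.

```shell
# Without set -e: the failure is ignored, the later step runs,
# and the script as a whole exits with status 0.
plain_status=0
sh -c 'false; echo "later step ran"' || plain_status=$?

# With set -e: the script aborts at the failing command
# and exits with a non-zero status.
strict_status=0
sh -c 'set -e; false; echo "later step ran"' || strict_status=$?

echo "without set -e: exit $plain_status"
echo "with set -e:    exit $strict_status"
```

Only the first child script prints "later step ran", which is why a test runner that looks only at the final exit status can report success even though a step failed.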

laxmimerit commented on May 12, 2024

Hi,
Steps 1 and 2 passed successfully, but this time it got stuck at Step 3, Upload:

10:42:32.155 INFO  o.a.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:42907
10:42:32.161 INFO  o.a.f.r.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job (de51f989b310d16d5839b75752aa778c)
        at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$cancelJob$8(Dispatcher.java:533)
        at java.base/java.util.Optional.orElseGet(Optional.java:369)
        at org.apache.flink.runtime.dispatcher.Dispatcher.cancelJob(Dispatcher.java:530)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
        at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
        at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
        at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
        at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at akka.actor.Actor.aroundReceive(Actor.scala:537)
        at akka.actor.Actor.aroundReceive$(Actor.scala:535)
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
        at akka.actor.ActorCell.invoke(ActorCell.scala:548)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
        at akka.dispatch.Mailbox.run(Mailbox.scala:231)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
        at apply @ ai.metarank.mode.AsyncFlinkJob$.$anonfun$execute$3(AsyncFlinkJob.scala:24)
        at fromCompletableFuture @ ai.metarank.mode.AsyncFlinkJob$.execute(AsyncFlinkJob.scala:16)
        at map @ ai.metarank.mode.AsyncFlinkJob$.$anonfun$execute$3(AsyncFlinkJob.scala:24)
        at apply @ ai.metarank.mode.upload.Upload$.isFinished(Upload.scala:59)
        at fromCompletableFuture @ ai.metarank.mode.AsyncFlinkJob$.execute(AsyncFlinkJob.scala:16)
        at map @ ai.metarank.mode.upload.Upload$.isFinished(Upload.scala:61)
        at flatMap @ ai.metarank.mode.upload.Upload$.$anonfun$blockUntilFinished$1(Upload.scala:50)

laxmimerit commented on May 12, 2024

Hi,
Thanks for the quick fix. I successfully executed all steps from scratch and will be testing the feedback and rank APIs next week. Thanks.

vgoloviznin commented on May 12, 2024

Hopefully these steps will go much smoother :)
