
deepjavalibrary / djl


An Engine-Agnostic Deep Learning Framework in Java

Home Page: https://djl.ai

License: Apache License 2.0

Java 86.98% Python 1.72% Dockerfile 0.18% C 2.15% C++ 4.18% Shell 0.40% ANTLR 0.26% HTML 1.05% CSS 0.14% JavaScript 0.30% CMake 0.79% Batchfile 0.07% Rust 0.95% Scala 0.83%
deep-learning neural-network ai java mxnet machine-learning deep-neural-networks ml autograd djl

djl's People

Contributors

aksrajvanshi, anfee1, azizzayed, bryanktliu, carkham, chenkelmann, dandansamax, demq, ebamberg, enpasos, ewan0x79, frankfliu, goswamig, ivybazan, johndoll2023, keerthanvasist, kexinfeng, kimim, lanking520, markbookk, patins1, roywei, siddvenk, sindhuvahinis, stu1130, viclzhu, vrakesh, warthecatalyst, xyang16, zachgk

djl's Issues

Ability to load modules from TensorFlow Hub

Description

Please add the ability to load modules from TensorFlow Hub. For example, I'd want to be able to download, save and use their universal-sentence-encoder model.

LSTM example with training and inference

Description

Can you provide LSTM example code covering both training and inference, preferably using time series data?

Will this change the current api? How?

no

Who will benefit from this feature?

Anyone who wants to build an LSTM model using the API.

Reshape exception during NDArray.set()

Description

DJL throws an exception when setting multiple elements through a function combined with NDIndex slicing.
I expected the example below to apply the sigmoid function to the elements at indices 2 and 3 of each row.
Setting the same slice to a single number works, though:

        array.set(new NDIndex(":, 2:"), 2);

Error Message

ND: (2, 4) gpu(0) float32
[[7.4022, 9.2099, 0.3902, 9.6896],
 [9.2514, 4.4635, 6.6732, 1.0993],
]

Exception in thread "main" ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: Check failed: src.Size() == dst->Size() (4 vs. 0) : Cannot reshape array of size 4 into shape [2,0]
Stack trace:
  File "C:\source\mxnet-distro\mxnet-build\src\operator\numpy\np_matrix_op.cc", line 145

	at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1788)
	at ai.djl.mxnet.jna.JnaUtils.imperativeInvoke(JnaUtils.java:500)
	at ai.djl.mxnet.jna.FunctionInfo.invoke(FunctionInfo.java:82)
	at ai.djl.mxnet.jna.FunctionInfo.invoke(FunctionInfo.java:66)
	at ai.djl.mxnet.engine.MxNDManager.invoke(MxNDManager.java:329)
	at ai.djl.mxnet.engine.MxNDManager.invoke(MxNDManager.java:347)
	at ai.djl.mxnet.engine.MxNDArray.reshape(MxNDArray.java:1167)
	at ai.djl.mxnet.engine.MxNDArray.set(MxNDArray.java:348)
	at ai.djl.ndarray.NDArray.set(NDArray.java:472)
	at Main.main(Main.java:11)

Steps to reproduce

    public static void main(String[] args) {
        NDManager manager = NDManager.newBaseManager();
        NDArray array = manager.randomUniform(0, 10, new Shape(2, 4));
        System.out.println(array);
        array.set(new NDIndex(":, 2:"), arr -> arr.getNDArrayInternal().sigmoid());
        System.out.println(array);
    }
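
For reference, the result the reporter expects can be written out in plain Java without DJL. This is only a sketch of the intended semantics of the ":, 2:" slice (every row, columns from index 2 onward) on a float[][]; the class and method names are hypothetical and this is not how DJL implements NDArray.set.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

/** Plain-Java sketch (hypothetical, not DJL code) of the update the report expects. */
final class SliceUpdateSketch {

    /** Applies fn in place to every row, for columns startCol..end (the ":, startCol:" slice). */
    static void applyToColumnsFrom(float[][] data, int startCol, DoubleUnaryOperator fn) {
        for (float[] row : data) {
            for (int col = startCol; col < row.length; col++) {
                row[col] = (float) fn.applyAsDouble(row[col]);
            }
        }
    }

    /** Logistic sigmoid: 1 / (1 + e^-x). */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        // Same values as the array printed in the error message above.
        float[][] array = {
            {7.4022f, 9.2099f, 0.3902f, 9.6896f},
            {9.2514f, 4.4635f, 6.6732f, 1.0993f}
        };
        applyToColumnsFrom(array, 2, SliceUpdateSketch::sigmoid);
        // Columns 0 and 1 are unchanged; columns 2 and 3 now hold sigmoid values in (0, 1).
        System.out.println(Arrays.deepToString(array));
    }
}
```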

Training as listener

Description

Implement training as a listener, so that we can subscribe to training events. Each of the provided listener callbacks needs more information:

  • normalize/de-normalize class (separate feature request)
  • train data set
  • test data set
  • last best model
  • last best model score

In this way we can do whatever we want:

  • calculate predictions from the best model with our test data set
  • de-normalize the test data set and compare it with the de-normalized predicted data set
  • create our own evaluation of real vs. predicted data
  • create an evaluation on the train data
  • create an evaluation on the test data
  • on completion, receive the latest best model (with the best score), the same as in the onEpoch listener
  • in the completion listener, save the trained model for later use (transfer learning)
  • combined with an early-stop configuration, check which termination condition stopped training (separate feature request)

Will this change the current api? How?

Yes, training should be implemented as a listener, so we can subscribe to it.
The interface should have:
onStart
onEpoch
onCompletion
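
To make the request concrete, here is a minimal plain-Java sketch of such a listener. Only the three callback names come from the request above; the parameters, the TrainingLifecycleListener and ListenerDemo names, and the "higher score is better" rule are assumptions for illustration, and none of this is an existing DJL interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener built from the three callbacks named above. */
interface TrainingLifecycleListener {
    void onStart(int totalEpochs);

    void onEpoch(int epoch, double score, double bestScoreSoFar);

    void onCompletion(int bestEpoch, double bestScore);
}

/** Toy driver: "trains" by replaying precomputed epoch scores and notifies subscribers. */
final class ListenerDemo {
    private final List<TrainingLifecycleListener> listeners = new ArrayList<>();

    void subscribe(TrainingLifecycleListener listener) {
        listeners.add(listener);
    }

    /** Assumption of this sketch: a higher score is better. */
    void run(double[] epochScores) {
        for (TrainingLifecycleListener l : listeners) {
            l.onStart(epochScores.length);
        }
        int bestEpoch = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int epoch = 0; epoch < epochScores.length; epoch++) {
            if (epochScores[epoch] > bestScore) {
                bestScore = epochScores[epoch];
                bestEpoch = epoch;
            }
            for (TrainingLifecycleListener l : listeners) {
                l.onEpoch(epoch, epochScores[epoch], bestScore);
            }
        }
        for (TrainingLifecycleListener l : listeners) {
            l.onCompletion(bestEpoch, bestScore);
        }
    }

    public static void main(String[] args) {
        ListenerDemo demo = new ListenerDemo();
        demo.subscribe(new TrainingLifecycleListener() {
            @Override
            public void onStart(int totalEpochs) {
                System.out.println("start, epochs=" + totalEpochs);
            }

            @Override
            public void onEpoch(int epoch, double score, double bestScoreSoFar) {
                System.out.println("epoch " + epoch + " score=" + score + " best=" + bestScoreSoFar);
            }

            @Override
            public void onCompletion(int bestEpoch, double bestScore) {
                System.out.println("done, best epoch=" + bestEpoch + " score=" + bestScore);
            }
        });
        demo.run(new double[] {0.5, 0.8, 0.7});
    }
}
```

A real version would pass the data sets and the best model itself to the callbacks, as the list above asks; the sketch passes only scores to stay self-contained.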

Who will benefit from this feature?

Everybody: subscribe to the listener and all the information we need is available.

References

Reference implementation: https://github.com/eclipse/deeplearning4j/blob/b5f0ec072f3fd0da566e32f82c0e43ca36553f39/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/listener/EarlyStoppingListener.java
but we need more information inside each of the onStart, onEpoch and onCompletion methods (see above).

Remove no-op {@inheritDoc}

I noticed multiple instances where the code contains the following Javadoc: /** {@inheritDoc} */

This behavior is already the default. Try removing the Javadoc and regenerating the documentation and you will notice that nothing has changed.

The only time you need to use {@inheritDoc} is if you plan to augment the inherited Javadoc, which in the above case you do not.

I suggest removing such instances from the source-code.
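
A small illustration of the point, using hypothetical types (HasArea and Square are not from the DJL code base): the overriding method carries no doc comment at all, and the Javadoc tool copies the comment from the interface method it overrides.

```java
/** Something that can report its area. */
interface HasArea {
    /**
     * Returns the area of this shape.
     *
     * @return the area, in square units
     */
    double area();
}

/** A square with a fixed side length. */
final class Square implements HasArea {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    // No doc comment here: the generated documentation inherits the text of HasArea.area().
    @Override
    public double area() {
        return side * side;
    }
}
```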

Cannot build without testing

Description

To compile the artifact quickly, I sometimes want to skip the tests while building, but the Gradle task ':integration:jacocoTestReport' prevents this.

Expected Behavior

The project should be able to compile without running any tests.

Error Message

$ ./gradlew build -x test

> Configure project :basicdataset
GPU 0: GeForce RTX 2070 (UUID: GPU-ccda497c-7a55-7df7-49ad-b68e31743286)

> Configure project :examples
GPU 0: GeForce RTX 2070 (UUID: GPU-ccda497c-7a55-7df7-49ad-b68e31743286)

> Configure project :integration
GPU 0: GeForce RTX 2070 (UUID: GPU-ccda497c-7a55-7df7-49ad-b68e31743286)

> Configure project :mxnet:mxnet-engine
[WARN ] Header file has been changed in open source project: mxnet/c_api.h.

> Task :integration:jacocoTestReport FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':integration:jacocoTestReport'.
> Unable to read execution data file /home/peng/git-release/djl/integration/build/jacoco/test.exec

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.0/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 1m 4s
66 actionable tasks: 32 executed, 34 up-to-date

How to Reproduce?

Just run ./gradlew build -x test in the project directory.

Steps to reproduce

  1. ./gradlew build -x test

What have you tried to solve it?

Environment Info

Please provide the following information:

  • Operating System: Ubuntu 18.04 LTS
  • Hardware(Machine) Info:
  • CUDA version(if available): 10.0
  • Deep Java Library version: 0.2.0-SNAPSHOT
  • MXNet version: haven't installed

Good visualization

Description

A visual representation of the neural network: model score, parameter ratios, learning parameters, and so on.
Every aspect of the network's parameters that can be visualized over time (per iteration).

Will this change the current api? How?

No. It only presents, visually, the parameters the network already provides.

Who will benefit from this feature?

Everybody. With visualization you can quickly see overfitting, underfitting and general network behavior. It makes it easier to determine the right network parameters and the right network architecture.

References

A good example of visualization (web based, local or remote): https://deeplearning4j.konduit.ai/tuning-and-training/visualization

Clarity on what the parameters do for MxModelZoo.BERT_QA

Description

The "BBC Japan" example is great, but doesn't quite give enough info for the user to start experimenting on their own.

  1. How much text can go into the paragraph variable?
  2. Are there other datasets available beyond "book_corpus_wiki_en_uncased"?
  3. (most importantly) What is the magic "384" number in QAInput(question, paragraph, 384) - should it be adjusted up or down, and when?
  4. Is there a way to see any of the details (embeddings, or uncertainty, or answers that were below a confidence score), or is a CSV string the only output of a call to predict?

Will this change the current api? How?

No.

Who will benefit from this feature?

New users.

References

examples/docs/BERT_question_and_answer.md

TensorFlow support in DJL and NDArrays

Hi,

This is not really a feature request but I thought it would be easy to reach out to the DJL community here, so please close this issue whenever you wish.

I represent the official group responsible for maintaining and enhancing the support of TensorFlow on the JVM. We have just heard of your initiative and we are very excited about it. We would like to know if there is anything we can do to help with the integration of TensorFlow in DJL.

Also, we would like to open a discussion about NDArray standardization. There are already a few implementations of this interface available on the market (e.g. MXNet has one, DL4J has one, we have just created one and now AWS has one). To improve portability between various frameworks and libraries, we believe that such an interface should eventually end up in the JDK itself and that it would be a good candidate for a JSR/JEP. It would be interesting to see all parties actually involved in the development of a "NumPy equivalent" for Java agree on a common interface that could then be proposed to the Java community, and on top of which higher-level APIs can be built.

If you are interested, you can reach us directly on one of the following channels:

Google Group: [email protected]
Gitter: tensorflow/sig-jvm
GitHub: https://github.com/tensorflow/java

Thanks, hoping to hear from you soon,

Karl

Build failed cause mxnet.dll is not a valid Win32 application

Hello, guys. Could you help me?
When I tried to build the app, I got this error:
"Unable to load library 'C:\Users\Default.mxnet\cache\1.6.0-b-SNAPSHOT-20200120mkl-win-x86_64\mxnet.dll':
2020-01-23T03:20:04.060+0300 [ERROR] [system.err] %1 is not a valid Win32 application."

The build command is "./gradlew run -Dmain=ai.djl.examples.training.TrainMnist"

It's strange, because my environment is as shown below:

java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)

Microsoft Windows [Version 10.0.17763.864]
OS Name Microsoft Windows 10 Enterprise LTSC
Version 10.0.17763 Build 17763
System Type x64-based PC

Gradle 6.0.1
Build time: 2019-11-18 20:25:01 UTC
Revision: fad121066a68c4701acd362daf4287a7c309a0f5

Kotlin: 1.3.50
Groovy: 2.5.8
Ant: Apache Ant(TM) version 1.10.7 compiled on September 1 2019
JVM: 1.8.0_162 (Oracle Corporation 25.162-b12)
OS: Windows 10 10.0 amd64

Load libmxnet.so from jar

Description

Can you please allow loading the native library from a jar? I didn't find a way to do it. You could use NativeUtils for loading libraries from a jar and, for example, expose a system variable for specifying the path inside the jar.
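
A rough sketch of the usual technique, using only the JDK: copy the library from the classpath to a temporary file, then call System.load on it. It does not use the NativeUtils helper mentioned above; the JarNativeLoader class, its method names and the example resource path are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: load a native library that is packaged inside a jar. */
final class JarNativeLoader {

    /** Copies the stream to a temp file (deleted on JVM exit) and returns its path. */
    static Path extractToTemp(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("native-", suffix);
        tmp.toFile().deleteOnExit();
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /**
     * Loads a library from the classpath.
     * resourcePath is an assumed location inside the jar, e.g. "/native/libmxnet.so".
     */
    static void loadFromClasspath(String resourcePath, String suffix) throws IOException {
        try (InputStream in = JarNativeLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourcePath);
            }
            // System.load needs an absolute file system path, hence the temp copy.
            System.load(extractToTemp(in, suffix).toAbsolutePath().toString());
        }
    }
}
```

The path inside the jar could then come from a system property, which is what the request suggests.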

Will this change the current api? How?

No

Who will benefit from this feature?

End users

Feature Request: Add Mish Activation

Mish is a novel activation function proposed in this paper.
It has shown promising results so far and has been adopted in several packages including:

  • TensorFlow-Addons
  • SpaCy (Tok2Vec Layer)
  • Thinc - SpaCy's official NLP based ML library
  • Eclipse's deeplearning4j
  • Hasktorch
  • Echo AI
  • CNTKX - Extension of Microsoft's CNTK
  • FastAI-Dev
  • Darknet
  • Yolov3
  • BeeDNN - Library in C++
  • Gen-EfficientNet-PyTorch
  • dnet
  • ruby-dnn
  • blackcat-tensors
  • DL4S
  • HuggingFace Transformers
  • PAGI
  • OpenCV
  • Odin-AI
  • Mini DNN
  • Efficient Segmentation Networks

All benchmarks, analysis and links to official package implementations can be found in this repository

Mish was also recently used for a submission to the Stanford DAWN CIFAR-10 Training Time Benchmark, where it obtained 94% accuracy in just 10.7 seconds, which is the current best score on 4 GPUs and the second fastest overall. Additionally, Mish has been shown to improve the convergence rate by requiring fewer epochs.

Mish has also shown consistently improved ImageNet scores and is more robust.

Additional ImageNet benchmarks, along with network architectures and weights, are available in my repository.

[Figure: summary of vision-related results]

It would be nice to have Mish as an option within the activation function group.
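
For readers who have not seen it, Mish is defined as mish(x) = x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x). Below is a minimal scalar sketch in plain Java; the MishSketch class is hypothetical and is not a proposal for how DJL should expose the activation on NDArrays.

```java
/** Scalar sketch of the Mish activation: mish(x) = x * tanh(ln(1 + e^x)). */
final class MishSketch {

    /** Numerically stable softplus, ln(1 + e^x), that does not overflow for large |x|. */
    static double softplus(double x) {
        return Math.max(x, 0.0) + Math.log1p(Math.exp(-Math.abs(x)));
    }

    static double mish(double x) {
        return x * Math.tanh(softplus(x));
    }

    public static void main(String[] args) {
        for (double x : new double[] {-5.0, -1.0, 0.0, 1.0, 5.0}) {
            System.out.println("mish(" + x + ") = " + mish(x));
        }
    }
}
```

For large positive inputs the function approaches the identity, and for large negative inputs it approaches zero from below.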

[Figure: comparison of Mish with other conventional activation functions in a SEResNet-50 for CIFAR-10]

Possibly unintentional inclusion of logging framework

I'm newish to Maven, but I don't think this is what is normally done.
From mvn dependency:tree:

[INFO] +- ai.djl:examples:jar:0.2.1:compile
[INFO] |  +- commons-cli:commons-cli:jar:1.4:runtime
[INFO] |  +- org.apache.logging.log4j:log4j-slf4j-impl:jar:2.12.1:runtime
[INFO] |  |  +- org.apache.logging.log4j:log4j-api:jar:2.12.1:runtime
[INFO] |  |  \- org.apache.logging.log4j:log4j-core:jar:2.12.1:runtime

It produces warnings like:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/ME/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.12.1/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/ME/.m2/repository/org/slf4j/slf4j-simple/1.7.30/slf4j-simple-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

All the other ai.djl:* dependencies look ok.

Data normalization/de-normalization

Description

Create a normalizer class with two methods:

  • normalize - normalizes a number to fit the network model (range 0 to 1 or -1 to 1)
  • de-normalize - restores the original number once network training has finished (in the onEpoch or onComplete listeners)

The normalizer object must also have parameters: min and max (the minimum and maximum of our number range) and the target interval (0 to 1 or -1 to 1).

Example:
we have a range of real numbers that needs to be normalized: [1, 5, 7, 12, 16, 19, 23, 3, 6, 33]
The normalizer will have:

  • the target interval (0 to 1 or -1 to 1)
  • the minimum, which in this case is 1
  • the maximum, which in this case is 33

With this information we can normalize each number and prepare it for training and testing the model.
Each number that enters the network input as a normalized number will have its normalizer defined.
During training we can easily de-normalize every number and compare it with our test data set (which also needs to be de-normalized).
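
The description above amounts to min-max scaling. A minimal plain-Java sketch follows, assuming the mapping y = lo + (x - min) * (hi - lo) / (max - min) and its inverse; MinMaxNormalizer and its method names are hypothetical and are not part of DJL.

```java
/** Hypothetical min-max normalizer for a target interval [lo, hi]. */
final class MinMaxNormalizer {
    private final double min;
    private final double max;
    private final double lo;
    private final double hi;

    MinMaxNormalizer(double min, double max, double lo, double hi) {
        if (!(max > min) || !(hi > lo)) {
            throw new IllegalArgumentException("need max > min and hi > lo");
        }
        this.min = min;
        this.max = max;
        this.lo = lo;
        this.hi = hi;
    }

    /** Derives min and max from the data, as in the example above (min 1, max 33). */
    static MinMaxNormalizer fit(double[] values, double lo, double hi) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new MinMaxNormalizer(min, max, lo, hi);
    }

    /** Maps x from [min, max] to [lo, hi]. */
    double normalize(double x) {
        return lo + (x - min) * (hi - lo) / (max - min);
    }

    /** Inverse of normalize: maps y from [lo, hi] back to [min, max]. */
    double denormalize(double y) {
        return min + (y - lo) * (max - min) / (hi - lo);
    }

    public static void main(String[] args) {
        double[] data = {1, 5, 7, 12, 16, 19, 23, 3, 6, 33};
        MinMaxNormalizer unit = MinMaxNormalizer.fit(data, 0.0, 1.0);
        double scaled = unit.normalize(12);
        System.out.println(scaled);                   // prints 0.34375
        System.out.println(unit.denormalize(scaled)); // prints 12.0
    }
}
```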

Will this change the current api? How?

Yes. Normalization should be part of the data set. Each number in an INDArray should also carry a normalization object, so that each number entering the network input is normalized to 0 to 1 or -1 to 1. While the network is training and we use a listener, predicted numbers can then easily be de-normalized and compared with the de-normalized test data set.

Who will benefit from this feature?

Everybody. Working with normalized data sets becomes simpler when normalization and de-normalization are provided both for numbers entering the network input and while network training is in progress.

References

Example: https://github.com/eclipse/deeplearning4j/blob/b5f0ec072f3fd0da566e32f82c0e43ca36553f39/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/MultiDataNormalization.java
I think this one is not good enough; it is only simple normalization.

Inference in multi-gpu instances fail

Description

When I try to run inference on pre-trained embeddings while training NLP models, I see a NullPointerException, as Predictor is not designed to work with multiple devices.

Error Message

[INFO ] - Load MXNet Engine Version 1.7.0 in 0.181 ms.
[INFO ] - forward P50: 3.519 ms, P90: 3.519 ms
[INFO ] - training-metrics P50: 0.048 ms, P90: 0.048 ms
[INFO ] - backward P50: 1.721 ms, P90: 1.721 ms
Exception in thread "main" java.lang.NullPointerException
        at ai.djl.training.ParameterStore.getValue(ParameterStore.java:105)
        at ai.djl.nn.core.Embedding.opInputs(Embedding.java:257)
        at ai.djl.nn.core.Embedding.forward(Embedding.java:162)
        at ai.djl.nn.Block.forward(Block.java:118)
        at ai.djl.inference.Predictor.predict(Predictor.java:117)
        at ai.djl.inference.Predictor.batchPredict(Predictor.java:144)
        at ai.djl.inference.Predictor.predict(Predictor.java:112)
        at ai.djl.modality.nlp.embedding.ModelZooTextEmbedding.embedText(ModelZooTextEmbedding.java:57)
        at ai.djl.examples.training.TrainSentimentAnalysis$EmbeddingDataManager.getData(TrainSentimentAnalysis.java:277)
        at ai.djl.training.Trainer.trainBatch(Trainer.java:159)
        at ai.djl.examples.training.util.TrainingUtils.fit(TrainingUtils.java:36)
        at ai.djl.examples.training.TrainSentimentAnalysis.runExample(TrainSentimentAnalysis.java:133)
        at ai.djl.examples.training.TrainSentimentAnalysis.main(TrainSentimentAnalysis.java:89)
        Suppressed: java.lang.IllegalArgumentException: Metric name not found: step
                at ai.djl.metric.Metrics.percentile(Metrics.java:135)
                at ai.djl.training.listener.LoggingTrainingListener.onTrainingEnd(LoggingTrainingListener.java:167)
                at ai.djl.training.Trainer.lambda$close$5(Trainer.java:349)
                at java.util.ArrayList.forEach(ArrayList.java:1257)
                at ai.djl.training.Trainer.close(Trainer.java:349)
                at ai.djl.examples.training.TrainSentimentAnalysis.runExample(TrainSentimentAnalysis.java:159)
                ... 1 more

Bug when trying to load multiple engines in DJL

There are a few bugs in DJL right now when you try to use multiple engines.

For example, suppose we use the MXNet and TensorFlow engines together. If I set -Dai.djl.default_engine=MXNet and call a TfEngine or TfModelZoo method, MxEngine and MxModelZoo are actually returned.

  1. TfEngine.getInstance() returns the default engine instead of TfEngine.

  2. TfModelZoo.RESNET.loadModel() returns MxModel.RESNET if the default engine is MXNet and PyModel.RESNET if the default engine is PyTorch, even though the user explicitly asked for TfModelZoo.

  3. In Criteria.builder(), the .optEngine("TensorFlow") option is not used by ModelZoo when loading the model.

Right now the two ways to work with multiple engines are:

  1. manually switch engines by setting the system property ai.djl.default_engine back and forth
  2. use newInstance with an engineName, which returns the correct implementation for that engine:
    Model tfModel = Model.newInstance(Device.defaultDevice(), "TensorFlow")
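
The behaviour being asked for can be sketched with a toy registry in plain Java: a lookup by explicit name must return that engine, and the default is used only when no name is given. EngineRegistrySketch is hypothetical and does not mirror DJL's actual Engine class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry: an explicit engine name always wins over the default. */
final class EngineRegistrySketch<E> {
    private final Map<String, E> engines = new HashMap<>();
    private final String defaultName;

    EngineRegistrySketch(String defaultName) {
        this.defaultName = defaultName;
    }

    void register(String name, E engine) {
        engines.put(name, engine);
    }

    /** Used only when the caller did not ask for a specific engine. */
    E getDefault() {
        return get(defaultName);
    }

    /** Returns the engine registered under name, never silently the default. */
    E get(String name) {
        E engine = engines.get(name);
        if (engine == null) {
            throw new IllegalArgumentException("No engine registered under: " + name);
        }
        return engine;
    }

    public static void main(String[] args) {
        EngineRegistrySketch<String> registry = new EngineRegistrySketch<>("MXNet");
        registry.register("MXNet", "mxnet-engine");
        registry.register("TensorFlow", "tensorflow-engine");
        System.out.println(registry.get("TensorFlow")); // prints tensorflow-engine
        System.out.println(registry.getDefault());      // prints mxnet-engine
    }
}
```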

NDIndex does not support -1

There is a bug when trying to get "-1" from an NDIndex. It can be reproduced by changing the following test:

    @Test
    public void testGet() {
        try (NDManager manager = NDManager.newBaseManager()) {
            NDArray original = manager.create(new float[] {1f, 2f, 3f, 4f}, new Shape(2, 2));
            Assert.assertEquals(original.get(new NDIndex()), original);
            original.get("-1");

Workaround:
using get(":-1") works
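
Assuming the intended behaviour is NumPy-style negative indexing (an assumption; the report does not spell it out), "-1" on an axis of size n should resolve to n - 1. A plain-Java sketch of that resolution on an ordinary array, with a hypothetical helper name:

```java
import java.util.Arrays;

/** Hypothetical helper showing how a negative index maps onto an axis of a given size. */
final class NegativeIndexSketch {

    /** Resolves index against size: -1 is the last element, -size is the first. */
    static int resolve(int index, int size) {
        int resolved = index < 0 ? index + size : index;
        if (resolved < 0 || resolved >= size) {
            throw new IndexOutOfBoundsException(
                    "index " + index + " is out of bounds for axis of size " + size);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Same data as the test above: values 1..4 in a 2 x 2 layout.
        float[][] original = {{1f, 2f}, {3f, 4f}};
        float[] lastRow = original[resolve(-1, original.length)];
        System.out.println(Arrays.toString(lastRow)); // prints [3.0, 4.0]
    }
}
```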

Stack trace:

    [ERROR] - Test ai.djl.integration.tests.ndarray.NDArrayOtherOpTest.testGet FAILED
    [ERROR] -
    ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: Check failed: dshape[axes[i]] == 1 (0 vs. 1) : cannot select an axis to squeeze out which has size=0 not equal to one
    Stack trace:
      File "../src/operator/numpy/np_matrix_op.cc", line 438

        at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1788) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
        at ai.djl.mxnet.jna.JnaUtils.imperativeInvoke(JnaUtils.java:500) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
        at ai.djl.mxnet.jna.FunctionInfo.invoke(FunctionInfo.java:82) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
        at ai.djl.mxnet.jna.FunctionInfo.invoke(FunctionInfo.java:66) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
        at ai.djl.mxnet.engine.MxNDManager.invoke(MxNDManager.java:319) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
        at ai.djl.mxnet.engine.MxNDManager.invoke(MxNDManager.java:337) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
        at ai.djl.mxnet.engine.MxNDArray.squeeze(MxNDArray.java:1200) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
        at ai.djl.mxnet.engine.MxNDArray.get(MxNDArray.java:431) ~[mxnet-engine-0.6.0-SNAPSHOT.jar:?]
        at ai.djl.ndarray.NDArray.get(NDArray.java:500) ~[api-0.6.0-SNAPSHOT.jar:?]
        at ai.djl.integration.tests.ndarray.NDArrayOtherOpTest.testGet(NDArrayOtherOpTest.java:33) ~[main/:?]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_231]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_231]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
        at ai.djl.integration.IntegrationTest$TestClass.runTest(IntegrationTest.java:350) [main/:?]
        at ai.djl.integration.IntegrationTest.runTests(IntegrationTest.java:111) [main/:?]
        at ai.djl.integration.IntegrationTest.runTests(IntegrationTest.java:80) [main/:?]
        at ai.djl.integration.IntegrationTests.runIntegrationTests(IntegrationTests.java:23) [test/:?]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_231]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_231]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
        at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84) [testng-6.8.1.jar:?]
        at org.testng.internal.Invoker.invokeMethod(Invoker.java:714) [testng-6.8.1.jar:?]
        at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901) [testng-6.8.1.jar:?]
        at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231) [testng-6.8.1.jar:?]
        at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127) [testng-6.8.1.jar:?]
        at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111) [testng-6.8.1.jar:?]
        at org.testng.TestRunner.privateRun(TestRunner.java:767) [testng-6.8.1.jar:?]
        at org.testng.TestRunner.run(TestRunner.java:617) [testng-6.8.1.jar:?]
        at org.testng.SuiteRunner.runTest(SuiteRunner.java:334) [testng-6.8.1.jar:?]
        at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329) [testng-6.8.1.jar:?]
        at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291) [testng-6.8.1.jar:?]
        at org.testng.SuiteRunner.run(SuiteRunner.java:240) [testng-6.8.1.jar:?]
        at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) [testng-6.8.1.jar:?]
        at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) [testng-6.8.1.jar:?]
        at org.testng.TestNG.runSuitesSequentially(TestNG.java:1224) [testng-6.8.1.jar:?]
        at org.testng.TestNG.runSuitesLocally(TestNG.java:1149) [testng-6.8.1.jar:?]
        at org.testng.TestNG.run(TestNG.java:1057) [testng-6.8.1.jar:?]
        at org.gradle.api.internal.tasks.testing.testng.TestNGTestClassProcessor.runTests(TestNGTestClassProcessor.java:141) [gradle-testing-jvm-6.4.1.jar:6.4.1]
        at org.gradle.api.internal.tasks.testing.testng.TestNGTestClassProcessor.stop(TestNGTestClassProcessor.java:90) [gradle-testing-jvm-6.4.1.jar:6.4.1]
        at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:61) [gradle-testing-base-6.4.1.jar:6.4.1]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_231]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_231]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) [gradle-messaging-6.4.1.jar:6.4.1]
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) [gradle-messaging-6.4.1.jar:6.4.1]
        at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) [gradle-messaging-6.4.1.jar:6.4.1]
        at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) [gradle-messaging-6.4.1.jar:6.4.1]
        at com.sun.proxy.$Proxy2.stop(Unknown Source) [?:?]
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.stop(TestWorker.java:132) [gradle-testing-base-6.4.1.jar:6.4.1]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_231]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_231]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) [gradle-messaging-6.4.1.jar:6.4.1]
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) [gradle-messaging-6.4.1.jar:6.4.1]
        at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182) [gradle-messaging-6.4.1.jar:6.4.1]
        at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164) [gradle-messaging-6.4.1.jar:6.4.1]
        at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:413) [gradle-messaging-6.4.1.jar:6.4.1]
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64) [gradle-base-services-6.4.1.jar:6.4.1]
        at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48) [gradle-base-services-6.4.1.jar:6.4.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231]
        at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56) [gradle-base-services-6.4.1.jar:6.4.1]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]

ModelNotFound Exception while loading SSD Model

Description

I am attempting to reproduce this ObjectDetection example in Scala. However, when I specify the model criteria as illustrated in the example code, I get an error that the model is not found.

Expected Behavior

Model should be loaded and objects in image should be detected as in example

Error Message

See error message below:

[2019-12-15 12:38:57,736] [ERROR] [akka.actor.SupervisorStrategy] [HelloAkkaHttpServer-akka.actor.default-dispatcher-13] [akka://HelloAkkaHttpServer/user] - Model not found.
ai.djl.repository.zoo.ModelNotFoundException: Model not found.
        at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:71)
        at ai.djl.repository.zoo.ModelLoader.loadModel(ModelLoader.java:84)

How to Reproduce?

To reproduce:
build.sbt file

libraryDependencies += "ai.djl" % "api" % "0.2.0"
libraryDependencies += "ai.djl" % "model-zoo" % "0.2.0"

sbt import log

[error] (update) sbt.librarymanagement.ResolveException: Error downloading ai.djl.mxnet:basicdataset:0.2.0
[error]   Not found
[error]   Not found
[error]   not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/basicdataset/0.2.0/ivys/ivy.xml
[error]   not found: https://repo1.maven.org/maven2/ai/djl/mxnet/basicdataset/0.2.0/basicdataset-0.2.0.pom
[error] Error downloading ai.djl.mxnet:examples:0.2.0
[error]   Not found
[error]   Not found
[error]   not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/examples/0.2.0/ivys/ivy.xml
[error]   not found: https://repo1.maven.org/maven2/ai/djl/mxnet/examples/0.2.0/examples-0.2.0.pom
[error] Error downloading ai.djl.mxnet:mxnet-native-mkl:0.2.0
[error]   Not found
[error]   Not found
[error]   not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/mxnet-native-mkl/0.2.0/ivys/ivy.xml
[error]   not found: https://repo1.maven.org/maven2/ai/djl/mxnet/mxnet-native-mkl/0.2.0/mxnet-native-mkl-0.2.0.pom
[error] (ssExtractDependencies) sbt.librarymanagement.ResolveException: Error downloading ai.djl.mxnet:basicdataset:0.2.0
[error]   Not found
[error]   Not found
[error]   not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/basicdataset/0.2.0/ivys/ivy.xml
[error]   not found: https://repo1.maven.org/maven2/ai/djl/mxnet/basicdataset/0.2.0/basicdataset-0.2.0.pom
[error] Error downloading ai.djl.mxnet:examples:0.2.0
[error]   Not found
[error]   Not found
[error]   not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/examples/0.2.0/ivys/ivy.xml
[error]   not found: https://repo1.maven.org/maven2/ai/djl/mxnet/examples/0.2.0/examples-0.2.0.pom
[error] Error downloading ai.djl.mxnet:mxnet-native-mkl:0.2.0
[error]   Not found
[error]   Not found
[error]   not found: /Users/olalekanelesin/.ivy2/local/ai.djl.mxnet/mxnet-native-mkl/0.2.0/ivys/ivy.xml
[error]   not found: https://repo1.maven.org/maven2/ai/djl/mxnet/mxnet-native-mkl/0.2.0/mxnet-native-mkl-0.2.0.pom
[error] Total time: 3 s, completed Dec 15, 2019 1:07:13 PM

import scala.collection.JavaConverters._
val img = BufferedImageUtils.fromUrl(inputImageUrl)
val criteria = Map(
      "size" -> "512",
      "backbone" -> "resnet50",
      "flavor" -> "v1",
      "dataset" -> "voc"
).asJava

try {
      val model = ModelZoo.SSD.loadModel(criteria, new ProgressBar())
      val predictor = model.newPredictor()
      val detectedObjects: DetectedObjects = predictor.predict(img)
      detectedObjects
}

Steps to reproduce

(Paste the commands you ran that produced the error.)

What have you tried to solve it?

  1. Searched online, but found no similar errors.
  2. Looked for the model zoo, but found nothing.

Environment Info

Please provide the following information:

  • Operating System: macOS Mojave
  • Hardware(Machine) Info:
  • CUDA version(if available): none
  • Deep Java Library version:
  • MXNet version:

Running the examples in their own project (maven)

Description

Unclear which dependencies are required to run the examples, because the examples are hosted in the same module.

Expected Behavior

Example pom.xml that allows an example to run.

Error Message

Exception in thread "main" java.util.ServiceConfigurationError: ai.djl.engine.EngineProvider: Provider ai.djl.mxnet.engine.MxEngineProvider could not be instantiated
	at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:581)
	at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:803)
	at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:721)
	at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1394)
	at ai.djl.engine.Engine.initEngine(Engine.java:47)
	at ai.djl.engine.Engine.<clinit>(Engine.java:42)
	at ai.djl.Model.newInstance(Model.java:69)
	at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:96)
	at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:85)
	at ai.djl.repository.zoo.ModelLoader.loadModel(ModelLoader.java:84)
	at org.example.HelloKt.predictAnswer(Hello.kt:48)
	at org.example.HelloKt.main(Hello.kt:72)
	at org.example.HelloKt.main(Hello.kt)
Caused by: java.lang.UnsatisfiedLinkError: Unable to load library 'mxnet':
dlopen(libmxnet.dylib, 9): image not found
dlopen(libmxnet.dylib, 9): image not found
Native library (darwin/libmxnet.dylib) not found in resource path (/Users/MYNAME/Desktop/workspace/untitled/target/classes:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-jdk8/1.3.61/kotlin-stdlib-jdk8-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib/1.3.61/kotlin-stdlib-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-common/1.3.61/kotlin-stdlib-common-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/annotations/13.0/annotations-13.0.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-jdk7/1.3.61/kotlin-stdlib-jdk7-1.3.61.jar:/Users/MYNAME/.m2/repository/ai/djl/api/0.2.1/api-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/basicdataset/0.2.1/basicdataset-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/repository/0.2.1/repository-0.2.1.jar:/Users/MYNAME/.m2/repository/com/google/code/gson/gson/2.8.5/gson-2.8.5.jar:/Users/MYNAME/.m2/repository/ai/djl/model-zoo/0.2.1/model-zoo-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/mxnet/mxnet-model-zoo/0.2.1/mxnet-model-zoo-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/mxnet/mxnet-engine/0.2.1/mxnet-engine-0.2.1.jar:/Users/MYNAME/.m2/repository/net/java/dev/jna/jna/5.3.0/jna-5.3.0.jar:/Users/MYNAME/.m2/repository/org/slf4j/slf4j-api/1.7.30/slf4j-api-1.7.30.jar:/Users/MYNAME/.m2/repository/org/slf4j/slf4j-simple/1.7.30/slf4j-simple-1.7.30.jar)
	at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:302)
	at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:455)
	at com.sun.jna.Library$Handler.<init>(Library.java:192)
	at com.sun.jna.Native.load(Native.java:596)
	at com.sun.jna.Native.load(Native.java:570)
	at ai.djl.mxnet.jna.LibUtils.loadLibrary(LibUtils.java:80)
	at ai.djl.mxnet.jna.JnaUtils.<clinit>(JnaUtils.java:68)
	at ai.djl.mxnet.engine.MxEngine.<init>(MxEngine.java:36)
	at ai.djl.mxnet.engine.MxEngineProvider.<clinit>(MxEngineProvider.java:21)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:779)
	... 11 more
	Suppressed: java.lang.UnsatisfiedLinkError: dlopen(libmxnet.dylib, 9): image not found
		at com.sun.jna.Native.open(Native Method)
		at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:191)
		... 24 more
	Suppressed: java.lang.UnsatisfiedLinkError: dlopen(libmxnet.dylib, 9): image not found
		at com.sun.jna.Native.open(Native Method)
		at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:204)
		... 24 more
	Suppressed: java.io.IOException: Native library (darwin/libmxnet.dylib) not found in resource path (/Users/MYNAME/Desktop/workspace/untitled/target/classes:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-jdk8/1.3.61/kotlin-stdlib-jdk8-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib/1.3.61/kotlin-stdlib-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-common/1.3.61/kotlin-stdlib-common-1.3.61.jar:/Users/MYNAME/.m2/repository/org/jetbrains/annotations/13.0/annotations-13.0.jar:/Users/MYNAME/.m2/repository/org/jetbrains/kotlin/kotlin-stdlib-jdk7/1.3.61/kotlin-stdlib-jdk7-1.3.61.jar:/Users/MYNAME/.m2/repository/ai/djl/api/0.2.1/api-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/basicdataset/0.2.1/basicdataset-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/repository/0.2.1/repository-0.2.1.jar:/Users/MYNAME/.m2/repository/com/google/code/gson/gson/2.8.5/gson-2.8.5.jar:/Users/MYNAME/.m2/repository/ai/djl/model-zoo/0.2.1/model-zoo-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/mxnet/mxnet-model-zoo/0.2.1/mxnet-model-zoo-0.2.1.jar:/Users/MYNAME/.m2/repository/ai/djl/mxnet/mxnet-engine/0.2.1/mxnet-engine-0.2.1.jar:/Users/MYNAME/.m2/repository/net/java/dev/jna/jna/5.3.0/jna-5.3.0.jar:/Users/MYNAME/.m2/repository/org/slf4j/slf4j-api/1.7.30/slf4j-api-1.7.30.jar:/Users/MYNAME/.m2/repository/org/slf4j/slf4j-simple/1.7.30/slf4j-simple-1.7.30.jar)
		at com.sun.jna.Native.extractFromResourcePath(Native.java:1095)
		at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:276)
		... 24 more

How to Reproduce?

fun main() {
    val paragraph = ("""BBC Japan was a general entertainment Channel.
Which operated between December 2004 and April 2006.
It ceased operations after its Japanese distributor folded.""")
    val criteria = mapOf(
        "backbone" to "bert",
        "dataset" to "book_corpus_wiki_en_uncased"
    )
    arrayOf(
        "When did BBC Japan start broadcasting?",
        "When did BBC Japan stop broadcasting?"
    ).forEach { question ->
        val input = QAInput(question, paragraph, 384)
        println("Paragraph: ${input.paragraph}")
        println("Question: ${input.question}")
        MxModelZoo.BERT_QA.loadModel(criteria, ProgressBar()).use { model ->
            model.newPredictor().use { predictor ->
                println("Answer: ${predictor.predict(input)}")
            }
        }
    }
}

Steps to reproduce

pom.xml with


        <dependency>
            <groupId>ai.djl</groupId>
            <artifactId>api</artifactId>
            <version>0.2.1</version>
        </dependency>
        <dependency>
            <groupId>ai.djl</groupId>
            <artifactId>api</artifactId>
            <version>0.2.1</version>
        </dependency>
        <dependency>
            <groupId>ai.djl</groupId>
            <artifactId>basicdataset</artifactId>
            <version>0.2.1</version>
        </dependency>
        <dependency>
            <groupId>ai.djl</groupId>
            <artifactId>model-zoo</artifactId>
            <version>0.2.1</version>
        </dependency>
        <dependency>
            <groupId>ai.djl.mxnet</groupId>
            <artifactId>mxnet-model-zoo</artifactId>
            <version>0.2.1</version>
        </dependency>
        <dependency>
            <groupId>ai.djl.mxnet</groupId>
            <artifactId>mxnet-engine</artifactId>
            <version>0.2.1</version>
        </dependency>

        <dependency>
            <groupId>ai.djl.mxnet</groupId>
            <artifactId>mxnet-native-cu92mkl</artifactId>
            <version>1.6.0-b</version>
            <classifier>macosx-x86_64-gpu</classifier>
        </dependency>
        <dependency>
            <groupId>ai.djl.mxnet</groupId>
            <artifactId>mxnet-native-cu101mkl</artifactId>
            <version>1.6.0-b</version>
            <classifier>macosx-x86_64-gpu</classifier>
        </dependency>
        <dependency>
            <groupId>ai.djl.mxnet</groupId>
            <artifactId>mxnet-native-mkl</artifactId>
            <version>1.6.0-b</version>
            <classifier>macosx-x86_64-gpu</classifier>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.30</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>1.7.30</version>
        </dependency>

What have you tried to solve it?

It ran successfully from the checkout of this project. I'm trying to get it to run using released libs.

I tried various values in the <classifier>macosx-x86_64-gpu</classifier>
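The error message above reports that darwin/libmxnet.dylib was "not found in resource path". A minimal sketch for checking whether any jar or directory on the classpath provides that resource is shown here. The class name NativeLibCheck is hypothetical, and the sketch only covers the classpath lookup, not the other locations JNA tries.

```java
import java.net.URL;

/** Hypothetical helper: checks whether a native library resource is visible on the classpath. */
final class NativeLibCheck {

    /** Returns the URL of the resource, or null when nothing on the classpath provides it. */
    static URL locate(String resourcePath) {
        return ClassLoader.getSystemResource(resourcePath);
    }

    public static void main(String[] args) {
        // Default path is taken from the error message; pass another path as the first argument if needed.
        String path = args.length > 0 ? args[0] : "darwin/libmxnet.dylib";
        URL url = locate(path);
        System.out.println(
                url == null ? path + " is NOT on the classpath" : path + " found at " + url);
    }
}
```

Running this with the same classpath as the failing program shows whether the native dependency that was declared in the pom.xml actually contributes the library.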

Environment Info

Please provide the following information:

  • Operating System: OSX
  • Hardware(Machine) Info: Macbook pro
  • CUDA version(if available): ?
  • Deep Java Library version: ?
  • MXNet version: ?

Improve Documentation

The documentation in DJL was originally written with the expectation that users are reasonably familiar with deep learning. So, it does not go out of its way to define and explain some of the key concepts. To help users who are newer to deep learning, we created a documentation convention for what explanation is required to get a basic understanding of the relevant topics. We now need to update the existing javadocs to contain all the required information.

  • Blocks
    • Convolution (and Conv1D, Conv2D, Conv3D) - @jonathan016
    • Embedding Block
    • Linear Block
    • Activations (mostly in Activation)
      • Prelu block
    • BatchNorm - @jonathan016
    • Dropout - @jonathan016
    • Pooling (in Pool)
    • GRU
    • LSTM
    • RNN
  • Evaluators
  • Loss Functions
  • Initializers
  • Optimizers
  • Models
    • Mlp
    • ResnetV1
    • SingleShotDetection
  • Datasets
    • Captcha
    • Cifar10
    • Coco
    • ImageNet
    • Mnist
    • PikachuDetection
    • StanfordMovieReview
    • TatoebaEnglishFreshDataset

This issue is fairly big for a single person, so I want to set this up for multiple people to work on. Comment below if you are interested in helping with any of the documentation and which of the items above you want to work on. Also comment if you notice any other javadoc that does not match the convention. I will edit this description to keep it up to date as the documentation is updated.

Find saved model in subdirectory of downloaded .zip

Description

When I specify a model contained in a .zip such as:

-Dai.djl.repository.zoo.location=https://djl-tensorflow-javacpp.s3.amazonaws.com/tensorflow-models/covid-19/saved_model.zip

This is the output of the extraction:

% ls /Users/ermurphy/.djl.ai/cache/repo/model/undefined/ai/djl/localmodelzoo/9cd10ffd7f1adba3a00d0425403b69f7
saved_model

Notice that a subdirectory named saved_model is present. This is where the model files reside.

% ls /Users/ermurphy/.djl.ai/cache/repo/model/undefined/ai/djl/localmodelzoo/9cd10ffd7f1adba3a00d0425403b69f7/saved_model 
assets          saved_model.pb  variables

Here is the output when running the model from this .zip:

org.tensorflow.exceptions.TensorFlowException: Could not find SavedModel .pb or .pbtxt at supplied export directory path: /Users/ermurphy/.djl.ai/cache/repo/model/undefined/ai/djl/localmodelzoo/9cd10ffd7f1adba3a00d0425403b69f7

This results in an error because it's looking in the parent directory, and not the saved_model directory. Note that saved_model is missing from the path in the TensorFlowException.

Expected Behavior

There should be a way to specify a subdirectory for where the model files reside from a .zip. It cannot be expected that they will be in the parent directory. The DJL Java code might also search the subdirectories first to find the .pb before giving a path to TensorFlow.
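The subdirectory search suggested above could look roughly like the following sketch. It is not DJL code: the class and method names are hypothetical, and it assumes the model file is named saved_model.pb or saved_model.pbtxt, as the TensorFlow error message implies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical helper: finds the directory that actually holds the SavedModel files. */
final class SavedModelLocator {

    /**
     * Walks the extracted directory tree and returns the first directory that
     * contains saved_model.pb or saved_model.pbtxt, or empty if none exists.
     */
    static Optional<Path> findSavedModelDir(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.equals("saved_model.pb") || name.equals("saved_model.pbtxt");
                    })
                    .map(Path::getParent)
                    .findFirst();
        }
    }
}
```

The returned directory, rather than the extraction root, would then be the path handed to TensorFlow.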

Additional Info

Criteria when loading the model. This comes from the COVID example code.

Criteria<BufferedImage, Classifications> criteria = Criteria.builder()
                .setTypes(BufferedImage.class, Classifications.class).optTranslator(new MyTranslator())
                .optProgress(new ProgressBar()).build();

CocoDetection creates a wrong directory structure for Coco data

Description

CocoDetection creates the wrong directory structure when it downloads the COCO data:

/root/.djl.ai/cache/repo/dataset/cv/ai/djl/basicdataset/coco/1.0/
-- annotations
---- annotations
-- train2017
---- train2017
-- val2017
---- val2017

and

/root/.djl.ai/cache/repo/dataset/cv/ai/djl/basicdataset/coco/1.0/annotations/
-- annotations
---- captions_train2017.json
---- captions_val2017.json
---- instances_train2017.json
---- instances_val2017.json
---- person_keypoints_train2017.json
---- person_keypoints_val2017.json

Expected Behavior

The .json files and images should be in the upper directories.

Error Message

If we use CocoDataset to prepare data for a training pipeline similar to TrainPikachu, the following error appears:

root@e1414f287bc3:~/git/danhlephuoc/djl/examples# mvn exec:java -Dexec.mainClass="ai.djl.examples.training.TrainCoco"
[INFO] Scanning for projects...
[INFO]
[INFO] --------------------------< ai.djl:examples >---------------------------
[INFO] Building examples 0.6.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ examples ---
[WARNING]
java.nio.file.NoSuchFileException: /root/.djl.ai/cache/repo/dataset/cv/ai/djl/basicdataset/coco/1.0/annotations/instances_train2017.json
at sun.nio.fs.UnixException.translateToIOException (UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException (UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException (UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel (UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel (Files.java:361)
at java.nio.file.Files.newByteChannel (Files.java:407)
at java.nio.file.spi.FileSystemProvider.newInputStream (FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream (Files.java:152)
at java.nio.file.Files.newBufferedReader (Files.java:2784)
at java.nio.file.Files.newBufferedReader (Files.java:2816)
at ai.djl.basicdataset.CocoUtils.prepare (CocoUtils.java:54)
at ai.djl.basicdataset.CocoDetection.prepareData (CocoDetection.java:146)
at ai.djl.repository.dataset.ZooDataset.prepare (ZooDataset.java:104)
at ai.djl.examples.training.TrainCoco.getDataset (TrainCoco.java:147)
at ai.djl.examples.training.TrainCoco.runExample (TrainCoco.java:79)
at ai.djl.examples.training.TrainCoco.main (TrainCoco.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:748)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.145 s
[INFO] Finished at: 2020-05-31T07:57:03+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project examples: An exception occured while executing the Java class. /root/.djl.ai/cache/repo/dataset/cv/ai/djl/basicdataset/coco/1.0/annotations/instances_train2017.json -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

What have you tried to solve it?

I manually moved the .json files and .jpg files to the upper directories, and the error went away.
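The manual workaround can be scripted. The sketch below is only an illustration of that workaround, not a fix inside DJL, and the class name is hypothetical. It assumes the duplicated directory has the same name as its parent (for example annotations/annotations) and that no entry names collide with existing files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper: moves files out of a duplicated nested directory. */
final class NestedDirFlattener {

    /**
     * If dir contains a child directory with the same name as dir itself,
     * moves the child's entries up one level and deletes the emptied child.
     * Returns the number of entries moved (0 when there is nothing to flatten).
     */
    static int flatten(Path dir) throws IOException {
        Path nested = dir.resolve(dir.getFileName());
        if (!Files.isDirectory(nested)) {
            return 0;
        }
        List<Path> entries = new ArrayList<>();
        try (Stream<Path> children = Files.list(nested)) {
            children.forEach(entries::add);
        }
        for (Path entry : entries) {
            // Files.move fails with FileAlreadyExistsException instead of overwriting.
            Files.move(entry, dir.resolve(entry.getFileName()));
        }
        Files.delete(nested);
        return entries.size();
    }
}
```

Calling flatten once each for the annotations, train2017, and val2017 directories under the cache path reproduces the manual move described above.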

BERT Inference Demo using pytorch

Description

Could you provide a Jupyter notebook sample for a BERT inference demo using PyTorch, similar to the current BERT inference demo using MXNet?

Who will benefit from this feature?

People who have already pretrained or fine-tuned their own model using PyTorch can easily migrate to this project.

Thanks.

Potential ArrayIndexOutOfBoundsException

Description

There is a potential ArrayIndexOutOfBoundsException in the method latestMetric in the class ai.djl.metric.Metrics.
In the following code snippet, if list is empty (but not null), the statement return list.get(list.size() - 1); will throw an ArrayIndexOutOfBoundsException.

public Metric latestMetric(String name) {
    List<Metric> list = metrics.get(name);
    if (list == null) {
        throw new IllegalArgumentException("Could not find metric: " + name);
    }
    return list.get(list.size() - 1);
}

What have you tried to solve it?

Change if (list == null) to if (list == null || list.isEmpty()).
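A self-contained sketch of the proposed guard follows. It is not the actual ai.djl.metric.Metrics class: MetricsSketch, addMetric, and register are stand-ins, and Double values replace the Metric type so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in for the Metrics class, reduced to the part relevant to the bug. */
final class MetricsSketch {

    // Double is used in place of ai.djl.metric.Metric to keep the sketch self-contained.
    private final Map<String, List<Double>> metrics = new ConcurrentHashMap<>();

    void addMetric(String name, double value) {
        metrics.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Registers a name with an empty list, which is the state that triggers the bug. */
    void register(String name) {
        metrics.putIfAbsent(name, new ArrayList<>());
    }

    double latestMetric(String name) {
        List<Double> list = metrics.get(name);
        // The added isEmpty() check prevents list.get(-1) on an empty list.
        if (list == null || list.isEmpty()) {
            throw new IllegalArgumentException("Could not find metric: " + name);
        }
        return list.get(list.size() - 1);
    }
}
```

With the guard in place, an empty list produces the same IllegalArgumentException as a missing name instead of an index error.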

Training Visualization

Description

DJL should include a NN web or GUI visualization tool like DL4J to help optimize NN parameters and NN layers.

Will this change the current api? How?

This will be a new API.

Who will benefit from this feature?

Users who want to monitor training jobs and help determine why their model is not training successfully.

References

Training example of Instance Segmentation

Description

Is there any guide on how to train an instance segmentation model for DJL?

Will this change the current api? How?

Who will benefit from this feature?

References

  • list reference and related literature
  • list known implementations

Part of Speech Tagging Dataset

Description

This is a task to add at least one part of speech tagging dataset. These datasets help provide an example of an NLP token classification task, as well as having some use for training multi-purpose NLP models. A good example might be one from Universal Dependencies.

Multi-GPU training fails with CUDA illegal memory access for some examples

When running the training examples on multiple GPUs, I came across a CUDA illegal memory access in examples that use LSTM operators.

Error Message

        at ai.djl.examples.training.TrainSeq2Seq.runExample(TrainSeq2Seq.java:107)
        at ai.djl.examples.training.TrainSeq2Seq.main(TrainSeq2Seq.java:64)
        Suppressed: java.lang.IllegalArgumentException: Metric name not found: step
                at ai.djl.metric.Metrics.percentile(Metrics.java:135)
                at ai.djl.training.listener.LoggingTrainingListener.onTrainingEnd(LoggingTrainingListener.java:167)
                at ai.djl.training.Trainer.lambda$close$5(Trainer.java:348)
                at java.util.ArrayList.forEach(ArrayList.java:1257)
                at ai.djl.training.Trainer.close(Trainer.java:348)
                at ai.djl.examples.training.TrainSeq2Seq.runExample(TrainSeq2Seq.java:119)
                ... 1 more
        Suppressed: ai.djl.engine.EngineException: MXNet engine call failed: cuDNN: Check failed: e == CUDNN_STATUS_SUCCESS (4 vs. 0) : CUDNN_STATUS_INTERNAL_ERROR
Stack trace:
  File "src/operator/./rnn-inl.h", line 768

                at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1788)
                at ai.djl.mxnet.jna.JnaUtils.waitAll(JnaUtils.java:466)
                at ai.djl.mxnet.engine.MxModel.close(MxModel.java:161)
                at ai.djl.examples.training.TrainSeq2Seq.runExample(TrainSeq2Seq.java:122)
                ... 1 more
[18:23:19] src/resource.cc:230: Ignore CUDA Error [18:23:19] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./tensor_gpu-inl.h:73: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered


[18:23:19] src/resource.cc:279: Ignore CUDA Error [18:23:19] src/storage/./pooled_storage_manager.h:97: CUDA: an illegal memory access was encountered


[18:23:19] src/resource.cc:279: Ignore CUDA Error [18:23:19] src/storage/./pooled_storage_manager.h:97: CUDA: an illegal memory access was encountered


[18:23:19] src/engine/threaded_engine_perdevice.cc:275: Ignore CUDA Error [18:23:19] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:203: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered

How to Reproduce?

We can reproduce by running the TrainSeq2Seq example on a machine with more than 1 GPU.

Steps to reproduce

(Paste the commands you ran that produced the error.)

  1. ./gradlew :examples:run -Dmain=ai.djl.examples.training.TrainSeq2Seq

Multi array label Softmax Cross Entropy loss function

Description

We want to support multi-array labels and predictions. Currently, the API can handle only one label (the first element in the NDArray). An obvious example is multi-digit number recognition, where multiple digits are predicted from a single input. An example of this: https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/convolution/captcharecognition/MultiDigitNumberRecognition.java
There are also many other cases where we want multiple array labels, especially when dealing with purely numeric outputs and predictions.
There is a short thread about this (Python):
https://datascience.stackexchange.com/questions/23614/keras-multiple-softmax-in-last-layer-possible

Documentation for this is hard to find or nonexistent, and the same goes for examples. Everyone uses just a single array label with multiple classes (output neurons):
[0.2, 0.3, 0.6,...0.12] -> 0.6

We want multiple array labels, each with multiple classes (output neurons):
[0.2, 0.3, 0.6,...0.12] -> 0.6
[0.4, 0.2, 0.7,...0.88] -> 0.88
[0.11, 0.77, 0.55,...0.33] -> 0.77
:
:

Will this change the current api?

Hopefully this only requires a new loss function (class), SoftmaxCrossEntropyLossMulti, that handles multi-array labels: split the tensor into parts, compute softmax separately per part, and concatenate the parts at the end.
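The split-then-softmax idea can be sketched on plain arrays, without any DJL types. The class name and signature below are assumptions for illustration, and the sketch averages the per-part losses into one scalar instead of concatenating them, which is one possible reduction among several.

```java
/** Illustrative sketch: softmax cross-entropy computed independently per part of a flat logit vector. */
final class MultiSoftmaxCrossEntropy {

    /**
     * @param logits flat array of length parts * classesPerPart
     * @param labels one class index per part, each in [0, classesPerPart)
     * @return mean over all parts of the softmax cross-entropy loss
     */
    static double loss(double[] logits, int[] labels, int classesPerPart) {
        if (classesPerPart <= 0 || labels.length == 0
                || logits.length != labels.length * classesPerPart) {
            throw new IllegalArgumentException("logits must hold classesPerPart values per label");
        }
        double total = 0.0;
        for (int part = 0; part < labels.length; part++) {
            int label = labels[part];
            if (label < 0 || label >= classesPerPart) {
                throw new IllegalArgumentException("label out of range: " + label);
            }
            int offset = part * classesPerPart;
            // Subtract the maximum before exponentiating for numerical stability.
            double max = Double.NEGATIVE_INFINITY;
            for (int c = 0; c < classesPerPart; c++) {
                max = Math.max(max, logits[offset + c]);
            }
            double sumExp = 0.0;
            for (int c = 0; c < classesPerPart; c++) {
                sumExp += Math.exp(logits[offset + c] - max);
            }
            double logSumExp = max + Math.log(sumExp);
            // Cross-entropy for one part: -log(softmax(logits)[label]).
            total += logSumExp - logits[offset + label];
        }
        return total / labels.length;
    }
}
```

For a three-digit captcha with ten classes per digit, logits would have 30 entries and labels would hold the three digit values.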

References

Rename ModelZoo, ZooModel

I suggest renaming ModelZoo, ZooModel classes because:

  1. It's not intuitive when you should use one versus the other (the name doesn't make it obvious).
  2. If this is meant to model a collection of models, please just call it a ModelCollection. The use of the word "Zoo" doesn't add any useful meaning to first-time readers and in my case just caused confusion.

Just my 2 cents :)

Rename Block to LearnedFunction

This issue is my proposal to rename the Block class (https://github.com/awslabs/djl/blob/master/api/src/main/java/ai/djl/nn/Block.java) to LearnedFunction. I am hoping to collect feedback and have a discussion with the community about it.

Right now, we use Block as the main class for representing a neural network. We chose Block because it conveyed the idea of composability: that the various Blocks can combine like lego blocks. This addresses the question of how neural networks are built up from small differentiable functions (operators) into a full network.

My concern with Block is that it doesn't convey a sense of freedom. Blocks are more rigid and can only go together in relatively fixed ways. However, the ways Blocks can go together are not quite clear. Are SequentialBlock and ParallelBlock sufficient for everything you need? Can blocks have a variable number of children, or is it fixed? How do conditionals or loops fit into the analogy?

That is why I am thinking that LearnedFunction might be a clearer representation. It can do pretty much anything a function can do and any programmer should be aware of what functions do. This makes it clear you can do things like composition, call other functions, and use control flow.

It is also a clearer representation of what the Block class actually represents. The first two paragraphs of the Block javadoc, copied below, clearly show the ideology of a LearnedFunction:

A {@code Block} is a composable function that forms a neural network.

Blocks serve a purpose similar to functions that convert an input NDList to an output NDList. They can represent single operations, parts of a neural network, and even the whole neural network. What makes blocks special is that they contain a number of parameters that are used in their function and are trained during deep learning. As these parameters are trained, the functions represented by the blocks get more and more accurate. Each block consists of the following components:

There are also some concerns about this rename. First, the name of Block is used by other frameworks like Gluon (although new TF/Keras use layers, PT uses Module).

The other concern is that LearnedFunction is a more abstract concept than Block. Block, although not a perfectly accurate description, would be easier to understand. This could make it easier for new users to adapt to deep learning with DJL. Using a very abstract concept, on the other hand, would make it more difficult.

Please comment below if you have any other thoughts, ideas, or concerns regarding this. Also, add a reaction to the main description with thumbs up (+1) if you agree with the rename and a thumbs down (-1) if you think it is a bad idea.

MXNet engine truncates float values to integers on systems with a German locale

Description

Operations on MxNDArray that take a single Number argument truncate the argument's decimal places, which leads to erroneous calculation results. This seems to be due to the German locale setting of the host.

E.g.:
System.out.println(NDManager.newBaseManager().create(1.3).add(0.7));
prints:
ND: () gpu(0) float64
1.3

This seems to affect all math operations with a single Number argument like add, gt, gte, lt etc.

Expected Behavior

float and double values should be passed correctly so that mathematical operations yield correct results. The above line should print:
ND: () gpu(0) float64
2.0

Error Message

N/A

How to Reproduce?

An example of the error with the console output on a German system can be found here:
https://gist.github.com/chenkelmann/2bfa9627d79a9aaab34a46227d81aea5

Steps to reproduce

Run the main method in the above example on a Linux system with LC_NUMERIC=de_DE.UTF-8

What have you tried to solve it?

The problem can be circumvented by creating an NDArray with the argument instead of using the methods that take Number:
System.out.println(manager.create(1.3).add(manager.create(new double[]{0.7})));

Setting LC_NUMERIC="en_US.UTF-8" for the current process also fixes the issue (but this is very fragile, as the correct behavior of the code then depends on the current environment variables).
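The report does not identify where the truncation happens, so the following is only a general illustration of how the decimal separator depends on locale. It shows Java-side formatting; whether the conversion that goes wrong in this issue lives in DJL or in the native MXNet library is not confirmed here, and the class name is hypothetical.

```java
import java.util.Locale;

/** Illustration only: the decimal separator produced by formatting depends on the locale. */
final class LocaleNumberDemo {

    static String format(Locale locale, double value) {
        return String.format(locale, "%.1f", value);
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.GERMANY, 0.7)); // uses a decimal comma
        System.out.println(format(Locale.ROOT, 0.7));    // uses a decimal point
        System.out.println(Double.toString(0.7));        // locale-independent, decimal point
    }
}
```

A parser that expects one separator but receives the other may stop reading at the separator, which would be consistent with 0.7 being treated as 0 in the example above.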

Environment Info

Please run the command ./gradlew debugEnv from the root directory of DJL (if necessary, clone DJL first). It will output information about your system, environment, and installation that can help us debug your issue. Paste the output of the command below:

[INFO ] - ----------System Properties----------
[INFO ] - sun.cpu.isalist: 
[INFO ] - sun.desktop: gnome
[INFO ] - sun.io.unicode.encoding: UnicodeLittle
[INFO ] - sun.cpu.endian: little
[INFO ] - java.vendor.url.bug: http://bugreport.sun.com/bugreport/
[INFO ] - file.separator: /
[INFO ] - java.vendor: Private Build
[INFO ] - sun.boot.class.path: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/resources.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jsse.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jce.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/charsets.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jfr.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/classes
[INFO ] - java.ext.dirs: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext:/usr/java/packages/lib/ext
[INFO ] - java.version: 1.8.0_242
[INFO ] - java.vm.info: mixed mode
[INFO ] - awt.toolkit: sun.awt.X11.XToolkit
[INFO ] - org.apache.logging.log4j.assignedSequences: 8786
[INFO ] - user.language: en
[INFO ] - java.specification.vendor: Oracle Corporation
[INFO ] - sun.java.command: ai.djl.integration.util.DebugEnvironment
[INFO ] - java.home: /usr/lib/jvm/java-8-openjdk-amd64/jre
[INFO ] - sun.arch.data.model: 64
[INFO ] - java.vm.specification.version: 1.8
[INFO ] - java.class.path: /home/christoph/IdeaProjects/djl/integration/build/classes/java/main:/home/christoph/IdeaProjects/djl/integration/build/resources/main:/home/christoph/.gradle/caches/modules-2/files-2.1/commons-cli/commons-cli/1.4/c51c00206bb913cd8612b24abd9fa98ae89719b1/commons-cli-1.4.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.apache.logging.log4j/log4j-slf4j-impl/2.12.1/14973e22497adaf0196d481fb99c5dc2a0b58d41/log4j-slf4j-impl-2.12.1.jar:/home/christoph/IdeaProjects/djl/basicdataset/build/libs/basicdataset-0.5.0-SNAPSHOT.jar:/home/christoph/IdeaProjects/djl/model-zoo/build/libs/model-zoo-0.5.0-SNAPSHOT.jar:/home/christoph/IdeaProjects/djl/testing/build/libs/testing-0.5.0-SNAPSHOT.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.testng/testng/6.8.1/8aebea980eee079365df20f0cf7fcac900d50250/testng-6.8.1.jar:/home/christoph/IdeaProjects/djl/mxnet/mxnet-model-zoo/build/libs/mxnet-model-zoo-0.5.0-SNAPSHOT.jar:/home/christoph/IdeaProjects/djl/mxnet/mxnet-engine/build/libs/mxnet-engine-0.5.0-SNAPSHOT.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/ai.djl.mxnet/mxnet-native-auto/1.7.0-a-SNAPSHOT/a65beb2ad0ce1f49012bda3e5898979320278027/mxnet-native-auto-1.7.0-a-SNAPSHOT.jar:/home/christoph/IdeaProjects/djl/api/build/libs/api-0.5.0-SNAPSHOT.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-api/1.7.26/77100a62c2e6f04b53977b9f541044d7d722693d/slf4j-api-1.7.26.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.apache.logging.log4j/log4j-core/2.12.1/4382e93136c06bfb34ddfa0bb8a9fb4ea2f3df59/log4j-core-2.12.1.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.apache.logging.log4j/log4j-api/2.12.1/a55e6d987f50a515c9260b0451b4fa217dc539cb/log4j-api-2.12.1.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.beanshell/bsh/2.0b4/a05f0a0feefa8d8467ac80e16e7de071489f0d9c/bsh-2.0b4.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/com.beust/jcommander/1.27/58c9cbf0f1fa296f93c712f2cf46de50471920f9/jcommander-1.27.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/org.yaml/snakeyaml/1.6/a1e23e31c424d566ee27382e373d73a28fdabd88/snakeyaml-1.6.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/com.google.code.gson/gson/2.8.5/f645ed69d595b24d4cf8b3fbb64cc505bede8829/gson-2.8.5.jar:/home/christoph/.gradle/caches/modules-2/files-2.1/net.java.dev.jna/jna/5.3.0/4654d1da02e4173ba7b64f7166378847db55448a/jna-5.3.0.jar
[INFO ] - user.name: christoph
[INFO ] - file.encoding: UTF-8
[INFO ] - java.specification.version: 1.8
[INFO ] - java.awt.printerjob: sun.print.PSPrinterJob
[INFO ] - user.timezone: Europe/Berlin
[INFO ] - user.home: /home/christoph
[INFO ] - os.version: 5.3.0-46-generic
[INFO ] - sun.management.compiler: HotSpot 64-Bit Tiered Compilers
[INFO ] - java.specification.name: Java Platform API Specification
[INFO ] - java.class.version: 52.0
[INFO ] - java.library.path: /usr/local/cuda/lib64::/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
[INFO ] - sun.jnu.encoding: UTF-8
[INFO ] - os.name: Linux
[INFO ] - user.variant: 
[INFO ] - java.vm.specification.vendor: Oracle Corporation
[INFO ] - java.io.tmpdir: /tmp
[INFO ] - line.separator: 

[INFO ] - java.endorsed.dirs: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/endorsed
[INFO ] - os.arch: amd64
[INFO ] - java.awt.graphicsenv: sun.awt.X11GraphicsEnvironment
[INFO ] - java.runtime.version: 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
[INFO ] - java.vm.specification.name: Java Virtual Machine Specification
[INFO ] - user.dir: /home/christoph/IdeaProjects/djl/integration
[INFO ] - sun.java.launcher: SUN_STANDARD
[INFO ] - user.country: US
[INFO ] - sun.os.patch.level: unknown
[INFO ] - java.vm.name: OpenJDK 64-Bit Server VM
[INFO ] - file.encoding.pkg: sun.io
[INFO ] - path.separator: :
[INFO ] - java.vm.vendor: Private Build
[INFO ] - java.vendor.url: http://java.oracle.com/
[INFO ] - sun.boot.library.path: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64
[INFO ] - java.vm.version: 25.242-b08
[INFO ] - java.runtime.name: OpenJDK Runtime Environment
[INFO ] - 
[INFO ] - ----------Environment Variables----------
[INFO ] - PATH: /usr/local/cuda/bin:/home/christoph/.local/bin:/home/christoph/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
[INFO ] - LC_MEASUREMENT: de_DE.UTF-8
[INFO ] - XAUTHORITY: /home/christoph/.Xauthority
[INFO ] - LC_TELEPHONE: de_DE.UTF-8
[INFO ] - XDG_DATA_DIRS: /usr/share/cinnamon:/usr/share/gnome:/home/christoph/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
[INFO ] - GDMSESSION: cinnamon
[INFO ] - DBUS_SESSION_BUS_ADDRESS: unix:path=/run/user/1000/bus
[INFO ] - XDG_CURRENT_DESKTOP: X-Cinnamon
[INFO ] - SSH_AGENT_PID: 1493
[INFO ] - COLORTERM: truecolor
[INFO ] - LD_LIBRARY_PATH: /usr/local/cuda/lib64:
[INFO ] - LC_PAPER: de_DE.UTF-8
[INFO ] - SESSION_MANAGER: local/bishop:@/tmp/.ICE-unix/1428,unix/bishop:/tmp/.ICE-unix/1428
[INFO ] - LOGNAME: christoph
[INFO ] - PWD: /home/christoph/IdeaProjects/djl
[INFO ] - LANGUAGE: en_US
[INFO ] - GJS_DEBUG_TOPICS: JS ERROR;JS LOG
[INFO ] - SHELL: /bin/bash
[INFO ] - LESSOPEN: | /usr/bin/lesspipe %s
[INFO ] - LC_ADDRESS: de_DE.UTF-8
[INFO ] - OLDPWD: /home/christoph/IdeaProjects/djl
[INFO ] - GNOME_DESKTOP_SESSION_ID: this-is-deprecated
[INFO ] - GNOME_TERMINAL_SCREEN: /org/gnome/Terminal/screen/cdbd2b41_45b6_4c94_aa70_241e01b6353f
[INFO ] - GTK_MODULES: gail:atk-bridge
[INFO ] - XDG_SESSION_PATH: /org/freedesktop/DisplayManager/Session0
[INFO ] - XDG_SESSION_DESKTOP: cinnamon
[INFO ] - LS_COLORS: rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
[INFO ] - SHLVL: 1
[INFO ] - LC_IDENTIFICATION: de_DE.UTF-8
[INFO ] - LESSCLOSE: /usr/bin/lesspipe %s %s
[INFO ] - LC_MONETARY: de_DE.UTF-8
[INFO ] - TERM: xterm-256color
[INFO ] - XDG_CONFIG_DIRS: /etc/xdg/xdg-cinnamon:/etc/xdg
[INFO ] - GNOME_TERMINAL_SERVICE: :1.84
[INFO ] - LANG: en_US.UTF-8
[INFO ] - XDG_SEAT_PATH: /org/freedesktop/DisplayManager/Seat0
[INFO ] - XDG_SESSION_ID: c2
[INFO ] - XDG_SESSION_TYPE: x11
[INFO ] - DISPLAY: :0
[INFO ] - CINNAMON_VERSION: 4.4.8
[INFO ] - LC_NAME: de_DE.UTF-8
[INFO ] - _: ./gradlew
[INFO ] - GDM_LANG: en_US
[INFO ] - XDG_GREETER_DATA_DIR: /var/lib/lightdm-data/christoph
[INFO ] - GPG_AGENT_INFO: /run/user/1000/gnupg/S.gpg-agent:0:1
[INFO ] - DESKTOP_SESSION: cinnamon
[INFO ] - USER: christoph
[INFO ] - VTE_VERSION: 5202
[INFO ] - QT_ACCESSIBILITY: 1
[INFO ] - LC_NUMERIC: de_DE.UTF-8
[INFO ] - GJS_DEBUG_OUTPUT: stderr
[INFO ] - SSH_AUTH_SOCK: /run/user/1000/keyring/ssh
[INFO ] - XDG_SEAT: seat0
[INFO ] - GTK_OVERLAY_SCROLLING: 1
[INFO ] - QT_QPA_PLATFORMTHEME: qt5ct
[INFO ] - XDG_VTNR: 7
[INFO ] - XDG_RUNTIME_DIR: /run/user/1000
[INFO ] - HOME: /home/christoph
[INFO ] - 
[INFO ] - ----------Default Engine----------

(The output printed nothing about the Default Engine; it hung for several minutes after that, and I had to abort.)

The output of locale is:

    LANG=en_US.UTF-8
    LANGUAGE=en_US
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC=de_DE.UTF-8
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY=de_DE.UTF-8
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER=de_DE.UTF-8
    LC_NAME=de_DE.UTF-8
    LC_ADDRESS=de_DE.UTF-8
    LC_TELEPHONE=de_DE.UTF-8
    LC_MEASUREMENT=de_DE.UTF-8
    LC_IDENTIFICATION=de_DE.UTF-8
    LC_ALL=

The culprit is LC_NUMERIC. If set to en_US.UTF-8 the calculations yield the expected result.
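The report does not say where in the stack the locale setting takes effect. As a minimal illustration of the general failure mode only (this is not DJL or MXNet code, and the class name is hypothetical), the sketch below shows how the same number is written with a different decimal separator under a German and a US locale. Text written with one convention and read with the other no longer yields the original value.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical demo class; it only illustrates locale-dependent decimal separators.
class LocaleDecimalDemo {

    /** Formats a value with two decimals using the given locale's separators. */
    static String format(double value, Locale locale) {
        return String.format(locale, "%.2f", value);
    }

    /** Parses text using the given locale's number conventions. */
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        String german = format(0.05, Locale.GERMANY); // decimal comma
        String us = format(0.05, Locale.US);          // decimal point
        System.out.println("de_DE: " + german + ", en_US: " + us);

        // Round trip within one locale recovers the value.
        System.out.println("de_DE -> de_DE: " + parse(german, Locale.GERMANY));
        // Mixing conventions does not recover 0.05.
        System.out.println("de_DE -> en_US: " + parse(german, Locale.US));
    }
}
```

If a native component formats or parses floating-point values according to LC_NUMERIC, a mismatch of this kind is one plausible way for calculations to go wrong; whether that is what happens here is an assumption.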

Dynamic learning parameter - time period

Description

Allow the learning parameter to change dynamically over time while the network is learning:

  • Constant learning parameter at the start, then cycle mode: start with a constant learning parameter, then after N epochs start a cycle, like a saw wave, between the min. and max. learning parameter values (up and down).
  • It should be configurable: start with a static learning parameter and then cycle, or start with a cycle from the beginning, and so on.
  • Any combination of constant and cycle should be possible, for example a constant learning parameter for epochs 0-10, then a cycle of any type (11-20), then a constant parameter again (21-30), and so on.

Cycle (learning parameter alternation) mode should support multiple types, each with a configurable time interval and min. and max. learning parameter values, for example:

  • a saw wave (up and down /\ /\ /\ )
  • the first half of a saw wave (just up / / / )
  • the second half of a saw wave (just down \ \ \ )
  • any combination of these (/\ - / / - \ \ )
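The combination of constant and cyclic phases described above could be sketched as a piecewise schedule. This is only an illustration of the request, not an existing DJL API: the class `PiecewiseSchedule` and its methods `from`, `constant`, `triangle`, `rampUp` and `rampDown` are hypothetical names, and the schedule is assumed to be indexed by epoch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch: a schedule assembled from segments, each starting at a given epoch.
class PiecewiseSchedule {
    private final List<Integer> starts = new ArrayList<>();
    private final List<IntToDoubleFunction> parts = new ArrayList<>();

    /** Adds a segment active from startEpoch (inclusive); add segments in ascending order. */
    PiecewiseSchedule from(int startEpoch, IntToDoubleFunction part) {
        starts.add(startEpoch);
        parts.add(part);
        return this;
    }

    /** Returns the learning parameter for the given epoch. */
    double valueAt(int epoch) {
        if (starts.isEmpty() || epoch < starts.get(0)) {
            throw new IllegalArgumentException("No segment covers epoch " + epoch);
        }
        int idx = 0;
        for (int i = 0; i < starts.size(); i++) {
            if (starts.get(i) <= epoch) {
                idx = i; // last segment that has already started
            }
        }
        // Each segment sees epochs counted from its own start.
        return parts.get(idx).applyAsDouble(epoch - starts.get(idx));
    }

    /** Constant value. */
    static IntToDoubleFunction constant(double value) {
        return e -> value;
    }

    /** Saw wave /\ : rises from min to max in the first half of a period, then falls back. */
    static IntToDoubleFunction triangle(double min, double max, int period) {
        return e -> {
            double pos = (e % period) / (double) period;
            double frac = pos < 0.5 ? pos * 2 : (1 - pos) * 2;
            return min + (max - min) * frac;
        };
    }

    /** First half only / / : rises from min towards max, then jumps back to min. */
    static IntToDoubleFunction rampUp(double min, double max, int period) {
        return e -> min + (max - min) * ((e % period) / (double) period);
    }

    /** Second half only \ \ : falls from max towards min, then jumps back to max. */
    static IntToDoubleFunction rampDown(double min, double max, int period) {
        return e -> max - (max - min) * ((e % period) / (double) period);
    }

    public static void main(String[] args) {
        PiecewiseSchedule schedule = new PiecewiseSchedule()
                .from(0, constant(0.1))               // epochs 0-10: constant
                .from(11, triangle(0.01, 0.1, 10))    // epochs 11-20: one saw-wave cycle
                .from(21, constant(0.05));            // epochs 21+: constant again
        for (int epoch = 0; epoch <= 30; epoch++) {
            System.out.println(epoch + " -> " + schedule.valueAt(epoch));
        }
    }
}
```

Keeping each segment a plain function of the local epoch makes it easy to add further cycle shapes without touching the composition logic.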

Will this change the current api? How?

Yes. It adds a learning parameter that changes dynamically per iteration or epoch.

Who will benefit from this feature?

Everybody. This will enable networks to learn more quickly.

References

Here is an implementation of a dynamically changing learning parameter, a kind of saw wave over time:
https://github.com/eclipse/deeplearning4j/blob/b5f0ec072f3fd0da566e32f82c0e43ca36553f39/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/schedule/CycleSchedule.java
We want to extend this implementation as described above.

TensorFlow library not loading with GraalVM Native Image

Description

I am working on an implementation of the covid19-detection example code, but using Quarkus to serve the model and also support GraalVM Native Image.

The project is located here:
https://github.com/murphye/djl-demo/tree/master/covid19-detection-quarkus

The application runs fine on the JVM, but when running in native mode, the TensorFlow libraries are not being loaded (i.e. System.loadLibrary).

Important: I cannot find a reference to System.loadLibrary in the DJL code, and I do not understand how the TensorFlow libraries are actually loaded. If I understood the mechanism better, I could diagnose the problem more effectively. It does seem to be related to Bytedeco, which I am not familiar with.

Here is the code that I am running to demonstrate the issue:

        LibUtils.loadLibrary(); // Forcing the library to load to demo error
        System.out.println("Library Path: " + System.getProperty("org.bytedeco.javacpp.platform.preloadpath"));

        // See if TF loaded correctly or not. If not, expect java.lang.UnsatisfiedLinkError
        TfEngine.getInstance().debugEnvironment();

Error Message

Here is the output of the error when running this code. The TF libraries are downloaded and placed in /Users/ermurphy/.tensorflow/cache/2.1.0-a-SNAPSHOT-cpu-osx-x86_64 correctly.

__  ____  __  _____   ___  __ ____  ______ 
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
2020-05-11 19:22:10,136 INFO  [io.quarkus] (main) covid19-detection-quarkus 1.0-SNAPSHOT (powered by Quarkus 1.4.2.Final) started in 0.056s. Listening on: http://0.0.0.0:8080
2020-05-11 19:22:10,137 INFO  [io.quarkus] (main) Profile prod activated. 
2020-05-11 19:22:10,137 INFO  [io.quarkus] (main) Installed features: [cdi, resteasy, resteasy-jackson]
2020-05-11 19:22:23,584 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libstdc++.6.dylib ...
2020-05-11 19:22:24,752 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libjnitensorflow.dylib ...
2020-05-11 19:22:24,987 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libgcc_s.1.dylib ...
2020-05-11 19:22:25,203 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading THIRD_PARTY_TF_JNI_LICENSES ...
2020-05-11 19:22:25,448 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libtensorflow.2.dylib ...
2020-05-11 19:22:37,506 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libjnimklml.dylib ...
2020-05-11 19:22:37,656 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libiomp5.dylib ...
2020-05-11 19:22:38,182 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libmkldnn.0.dylib ...
2020-05-11 19:22:38,934 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading LICENSE ...
2020-05-11 19:22:39,038 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libmklml.dylib ...
2020-05-11 19:22:42,655 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libjnimkldnn.dylib ...
2020-05-11 19:22:42,833 INFO  [ai.djl.ten.eng.LibUtils] (executor-thread-1) Downloading libgomp.1.dylib ...
Library Path: /Users/ermurphy/.tensorflow/cache/2.1.0-a-SNAPSHOT-cpu-osx-x86_64
2020-05-11 19:22:42,979 INFO  [ai.djl.eng.Engine] (executor-thread-1) Engine name: TensorFlow
2020-05-11 19:22:42,980 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (executor-thread-1) HTTP Request to /predict failed, error id: 45121a3e-fbf8-4684-911c-4e9250ed8f41-1: java.lang.UnsatisfiedLinkError: org.tensorflow.internal.c_api.global.tensorflow.TF_Version()Lorg/bytedeco/javacpp/BytePointer; [symbol: Java_org_tensorflow_internal_c_1api_global_tensorflow_TF_1Version or Java_org_tensorflow_internal_c_1api_global_tensorflow_TF_1Version__]
        at com.oracle.svm.jni.access.JNINativeLinkage.getOrFindEntryPoint(JNINativeLinkage.java:145)
        at com.oracle.svm.jni.JNIGeneratedMethodSupport.nativeCallAddress(JNIGeneratedMethodSupport.java:57)
        at org.tensorflow.internal.c_api.global.tensorflow.TF_Version(tensorflow.java)
        at org.tensorflow.TensorFlow.version(TensorFlow.java:37)
        at ai.djl.tensorflow.engine.TfEngine.getVersion(TfEngine.java:64)
        at ai.djl.engine.Engine.debugEnvironment(Engine.java:171)
        at com.examples.ExampleService.<init>(ExampleService.java:42)

Expected Behavior

When running as a GraalVM Native Image executable, the libraries should be loaded and usable through the JNI bridge. I have proven this in the past with this PoC:
https://github.com/murphye/quarkus-tensorflow-inception/blob/master/src/main/java/io/quarkus/tensorflow/LoadTensorFlow.java

Next Steps

I need some guidance on how TensorFlow is loaded in DJL if it's not using System.loadLibrary as shown here:
https://github.com/murphye/quarkus-tensorflow-inception/blob/master/src/main/java/io/quarkus/tensorflow/LoadTensorFlow.java#L98

How else does the TensorFlow library get loaded, and how can I further diagnose the issue when running in Native mode?

Add CUDA 10.2 support

Description

I tried to run this demo to get familiar with djl, but could not run training on GPU because I have CUDA 10.2 installed:

Caused by: java.lang.UnsatisfiedLinkError: Unable to load library '/home/andrej/.mxnet/cache/1.6.0-a-20191127cu101mkl-linux-x86_64/libmxnet.so':
libcudart.so.10.1: cannot open shared object file: No such file or directory
libcudart.so.10.1: cannot open shared object file: No such file or directory
Native library (home/andrej/.mxnet/cache/1.6.0-a-20191127cu101mkl-linux-x86_64/libmxnet.so) not found in resource path (/home/andrej/projects/djl-demo/build/classes/java/main:/home/andrej/projects/djl-demo/build/resources/main:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl.mxnet/mxnet-model-zoo/0.2.0/355dfb3163430430f25c7e86267fc5c621276179/mxnet-model-zoo-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl.mxnet/mxnet-engine/0.2.0/7841bd1c3fc2f44fe76cb2e8a083dfead4de3f9a/mxnet-engine-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/net.java.dev.jna/jna/5.3.0/4654d1da02e4173ba7b64f7166378847db55448a/jna-5.3.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/org.apache.httpcomponents/httpcore/4.4.12/21ebaf6d532bc350ba95bd81938fa5f0e511c132/httpcore-4.4.12.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/commons-cli/commons-cli/1.4/c51c00206bb913cd8612b24abd9fa98ae89719b1/commons-cli-1.4.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-simple/1.7.29/82ae07f95088577987a15d90171de12b00d81847/slf4j-simple-1.7.29.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/org.apache.commons/commons-csv/1.7/cb5d05520f8fe1b409aaf29962e47dc5764f8f39/commons-csv-1.7.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl/basicdataset/0.2.0/fa73e42fb774b56f23a030a6c95159a1987d8110/basicdataset-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl/model-zoo/0.2.0/dbe300ddc19ec809002ed9a6214dac11e39a1055/model-zoo-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl/repository/0.2.0/266c3a327e89b82234c03a713f05067567c2e9dd/repository-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/com.google.code.gson/gson/2.8.5/f645ed69d595b24d4cf8b3fbb64cc505bede8829/gson-2.8.5.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl/api/0.2.0/c83672c1e7178830ea9c43b98603d5fa7737fd78/api-0.2.0.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/ai.djl.mxnet/mxnet-native-cu101mkl/1.6.0-a/c67432f4f6ba4273a13c3f9
efff52e5f2710c888/mxnet-native-cu101mkl-1.6.0-a-linux-x86_64.jar:/home/andrej/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-api/1.7.29/e56bf4473a4c6b71c7dd397a833dce86d1993d9d/slf4j-api-1.7.29.jar)
        at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:302)
        at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:455)
        at com.sun.jna.Library$Handler.<init>(Library.java:192)
        at com.sun.jna.Native.load(Native.java:596)
        at com.sun.jna.Native.load(Native.java:570)
        at ai.djl.mxnet.jna.LibUtils.loadLibrary(LibUtils.java:80)
        at ai.djl.mxnet.jna.JnaUtils.<clinit>(JnaUtils.java:68)
        at ai.djl.mxnet.engine.MxEngine.<init>(MxEngine.java:36)
        at ai.djl.mxnet.engine.MxEngineProvider.<clinit>(MxEngineProvider.java:21)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
        at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:779)
        ... 9 more
        Suppressed: java.lang.UnsatisfiedLinkError: libcudart.so.10.1: cannot open shared object file: No such file or directory
                at com.sun.jna.Native.open(Native Method)
                at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:191)
                ... 22 more

CUDA 10.2 seems to be mainstream at the moment and it's easier to find installation instructions for this version.

Rolling back to CUDA 10.1 might be troublesome and will repel some users from using this library.
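The stack trace shows the native library looking for `libcudart.so.10.1` specifically. As a hedged diagnostic sketch (not part of DJL; the class `CudaRuntimeProbe` and its method are hypothetical), the following lists which CUDA runtime versions are visible in the directories of a path list such as `LD_LIBRARY_PATH`, which can help confirm a version mismatch like the one above. It assumes the Linux naming scheme `libcudart.so.<major>.<minor>`.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports libcudart versions found in a list of directories.
class CudaRuntimeProbe {
    // Matches e.g. "libcudart.so.10.2" and captures "10.2".
    private static final Pattern CUDART = Pattern.compile("libcudart\\.so\\.(\\d+\\.\\d+)");

    /** Returns the distinct major.minor versions of libcudart found in the given path list. */
    static List<String> findVersions(String pathList) {
        List<String> versions = new ArrayList<>();
        if (pathList == null) {
            return versions;
        }
        for (String dir : pathList.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as a trailing separator
            }
            String[] names = new File(dir).list();
            if (names == null) {
                continue; // not a directory or not readable
            }
            for (String name : names) {
                Matcher m = CUDART.matcher(name);
                if (m.matches() && !versions.contains(m.group(1))) {
                    versions.add(m.group(1));
                }
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        System.out.println("libcudart versions on LD_LIBRARY_PATH: "
                + findVersions(System.getenv("LD_LIBRARY_PATH")));
    }
}
```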

Will this change the current api? How?

No

Who will benefit from this feature?

All potential users of DJL, including myself.

TrainWithOptimizers throws TrainingDivergedException

Description

TrainWithOptimizers throws TrainingDivergedException
Possibly caused by Metric name not found: epoch

Expected Behavior

Run most things in ai.djl.examples.training.* out of the box.

Error Message

mymac:examples me $ ./gradlew run -Dmain=ai.djl.examples.training.TrainWithOptimizers

> Task :run
[INFO ] - Running ExampleTrainingListener on: cpu(0).
[INFO ] - Load library 1.6.0 in 0.106 ms.
Training:      0% |█                                       | Accuracy: 0.19, SoftmaxCrossEntropyLoss: 4.72, speed: 8.18 images/sec
[INFO ] - Training: 1562 batches
[INFO ] - Validation: 312 batches
[INFO ] - train P50: 3914.031 ms, P90: 3914.031 ms
[INFO ] - forward P50: 20.862 ms, P90: 21.610 ms
[INFO ] - training-metrics P50: 3606.591 ms, P90: 5841.697 ms
[INFO ] - backward P50: 8.703 ms, P90: 11.023 ms
[INFO ] - step P50: 39.751 ms, P90: 39.751 ms
Exception in thread "main" ai.djl.TrainingDivergedException: The Loss became NaN, try reduce learning rate,add clipGradient option to your optimizer, check input data and loss calculation.
        at ai.djl.examples.training.util.ExampleTrainingListener.onTrainingBatch(ExampleTrainingListener.java:87)
        at ai.djl.mxnet.engine.MxTrainer.lambda$trainBatch$3(MxTrainer.java:142)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
        at ai.djl.mxnet.engine.MxTrainer.trainBatch(MxTrainer.java:142)
        at ai.djl.examples.training.util.TrainingUtils.fit(TrainingUtils.java:47)
        at ai.djl.examples.training.TrainWithOptimizers.runExample(TrainWithOptimizers.java:107)
        at ai.djl.examples.training.TrainWithOptimizers.main(TrainWithOptimizers.java:69)
        Suppressed: java.lang.IllegalArgumentException: Metric name not found: epoch
                at ai.djl.metric.Metrics.percentile(Metrics.java:135)
                at ai.djl.examples.training.util.ExampleTrainingListener.onTrainingEnd(ExampleTrainingListener.java:168)
                at ai.djl.mxnet.engine.MxTrainer.lambda$close$10(MxTrainer.java:344)
                at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
                at ai.djl.mxnet.engine.MxTrainer.close(MxTrainer.java:344)
                at ai.djl.examples.training.TrainWithOptimizers.runExample(TrainWithOptimizers.java:96)
                ... 1 more

> Task :run FAILED

FAILURE: Build failed with an exception.
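The exception message suggests reducing the learning rate or adding a clipGradient option. To make the second suggestion concrete, here is a minimal plain-Java sketch of gradient clipping by value together with a divergence check on the loss. It is an illustration only: it does not use the DJL optimizer API, the class and method names are hypothetical, and clipping by value is an assumption about what the option does.

```java
// Hypothetical sketch of the two ideas in the error message; not DJL code.
class GradientClipSketch {

    /** Returns a copy of the gradients with every entry limited to [-limit, limit]. */
    static float[] clip(float[] grads, float limit) {
        float[] out = new float[grads.length];
        for (int i = 0; i < grads.length; i++) {
            out[i] = Math.max(-limit, Math.min(limit, grads[i]));
        }
        return out;
    }

    /** A loss that is NaN or infinite indicates that training has diverged. */
    static boolean hasDiverged(float loss) {
        return Float.isNaN(loss) || Float.isInfinite(loss);
    }

    public static void main(String[] args) {
        float[] clipped = clip(new float[] {-5f, 0.5f, 7f}, 1f);
        System.out.println(java.util.Arrays.toString(clipped));
        System.out.println("diverged: " + hasDiverged(Float.NaN));
    }
}
```

Clipping bounds the size of each update step, which is why it is a common remedy when a loss grows until it becomes NaN.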

How to Reproduce?

Clone the repository, then start running the examples.

Steps to reproduce

  1. ./gradlew run -Dmain=ai.djl.examples.training.TrainWithOptimizers

What have you tried to solve it?

  1. Nothing

Environment Info

Please provide the following information:

  • Operating System: OSX
  • Hardware(Machine) Info: Macbook pro
  • CUDA version(if available): None
  • Deep Java Library version: ?
  • MXNet version: ?

Add a ND4J backend

Description

ND4J, the accelerated linear algebra backend that powers Eclipse DeepLearning4J, should have all the necessary features to become a proper backend for DJL.

Will this change the current api? How?

No API changes should be necessary for the basics. If ND4J provides any opportunity for API enhancements, we can discuss those in a separate issue.

Who will benefit from this feature?

DL4J supports more native architectures (including ARM64 (Raspberry Pi/iOS/Android) and OpenPower), so it will allow people with those devices to take advantage of DJL. DL4J also supports importing Keras, TensorFlow and ONNX models, so if we design this correctly we can expand the types of pre-trained models that users can use.

References

Early stopping configuration

Description

Early stopping configuration: specifies the configuration options for running training with early stopping.

  • Early stopping model saver: keeps only the latest best model and defines how the model is saved (to disk, to memory, etc.).
  • Termination conditions:
    1. Iteration termination condition: terminate after a given number of epochs.
    2. Score improvement termination condition: terminate training if the best model score does not improve for N epochs.
    3. Best expected score: terminate training once an expected score has been achieved.
    4. Time termination condition: terminate training after a certain amount of time.
    5. Other termination conditions, if they make sense.

Will this change the current api? How?

We can configure when model training stops, namely when one of the conditions above is met.
Early stopping should be implemented as a training listener: the early stopping configuration listens for any of the conditions above and terminates training.
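The termination conditions listed above could be combined along the following lines. This is a sketch under stated assumptions rather than a proposed DJL API: the class `EarlyStoppingSketch` and its method `onEpochEnd` are hypothetical, the score is assumed to be a validation loss where lower is better, and saving the best model is only marked by a comment.

```java
// Hypothetical sketch of an early stopping check called at the end of every epoch.
class EarlyStoppingSketch {
    private final int maxEpochs;     // condition 1: epoch limit
    private final int patience;      // condition 2: epochs allowed without improvement
    private final double targetLoss; // condition 3: expected score reached
    private final long maxMillis;    // condition 4: time limit
    private double bestLoss = Double.POSITIVE_INFINITY;
    private int bestEpoch = -1;

    EarlyStoppingSketch(int maxEpochs, int patience, double targetLoss, long maxMillis) {
        this.maxEpochs = maxEpochs;
        this.patience = patience;
        this.targetLoss = targetLoss;
        this.maxMillis = maxMillis;
    }

    /** Call once per epoch (0-based); returns true when training should stop. */
    boolean onEpochEnd(int epoch, double validationLoss, long elapsedMillis) {
        if (validationLoss < bestLoss) {
            bestLoss = validationLoss;
            bestEpoch = epoch; // a model saver would persist the best model here
        }
        return epoch + 1 >= maxEpochs
                || epoch - bestEpoch >= patience
                || bestLoss <= targetLoss
                || elapsedMillis >= maxMillis;
    }

    int bestEpoch() {
        return bestEpoch;
    }

    double bestLoss() {
        return bestLoss;
    }

    public static void main(String[] args) {
        EarlyStoppingSketch stopper = new EarlyStoppingSketch(100, 2, 0.0, Long.MAX_VALUE);
        double[] losses = {1.0, 0.8, 0.9, 0.85, 0.7};
        for (int epoch = 0; epoch < losses.length; epoch++) {
            if (stopper.onEpochEnd(epoch, losses[epoch], 0L)) {
                System.out.println("Stopping after epoch " + epoch
                        + ", best epoch was " + stopper.bestEpoch());
                break;
            }
        }
    }
}
```

With a patience of 2, the example stops after the second consecutive epoch that fails to beat the best loss, so the later improvement to 0.7 is never reached; choosing the patience is a trade-off between wasted epochs and stopping too soon.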

Who will benefit from this feature?

Everybody. We can easily configure when learning ends.

References

Reference implementation:
https://github.com/eclipse/deeplearning4j/blob/b5f0ec072f3fd0da566e32f82c0e43ca36553f39/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/EarlyStoppingConfiguration.java
There are other implementations in different NN frameworks.

Add direct API support to override arguments when loading zoo models

Description

A feature that allows passing argument overrides when loading models. Such overrides may, for example, allow changing the default threshold for translators or adjusting other parameters.

Will this change the current api? How?

Yes, the API will change, specifically, load methods.

For example at present the method signature is:

ai.djl.repository.zoo.ModelLoader#loadModel(Map<java.lang.String,java.lang.String> criteria);

It would be nice to get a flavor like:

ai.djl.repository.zoo.ModelLoader#loadModel(Map<java.lang.String,java.lang.String> criteria, Map<> argumentOverrides);

The above is a trivialized approach that may be too simplistic.
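One way the override semantics might work is a simple merge in which caller-supplied values win over the model's defaults. The sketch below illustrates only that merge step; it is not the ModelLoader API, the class `ArgumentOverrides` and the keys `threshold` and `width` are hypothetical, and the value type `Object` is an assumption since the proposed signature leaves the map's type parameters open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolving effective arguments from defaults plus overrides.
class ArgumentOverrides {

    /** Returns a new map with the defaults, replaced by overrides where a key is present in both. */
    static Map<String, Object> resolve(Map<String, Object> defaults, Map<String, Object> overrides) {
        Map<String, Object> merged = new HashMap<>(defaults);
        if (overrides != null) {
            merged.putAll(overrides); // an override wins over the default
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("threshold", 0.2); // hypothetical translator default
        defaults.put("width", 512);

        Map<String, Object> overrides = new HashMap<>();
        overrides.put("threshold", 0.5);

        System.out.println(resolve(defaults, overrides));
    }
}
```

Returning a new map rather than mutating the defaults keeps the stored model metadata unchanged, so the same model can be loaded again with different overrides.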

Who will benefit from this feature?

Library consumers who would like to load models and adjust default parameters.

Additional Info

This may be beneficial if customers want to externalize the configuration parameters needed to load models, predictors and translators, and to create generic configuration in IoC environments like Spring.

Steps for creating a custom dataset

I am trying out DJL and am really excited to finally try object detection in Java. What are the steps to label images for object detection?

I see that images need to be separated into train and test directories, and that index.file contains the coordinates of the annotated objects. What tool can be used to annotate images and generate index.file?

Gradle build of examples fails

Description

The main DJL project compiles with Gradle when tests are disabled (-x test). Unfortunately, I cannot build the examples folder with Gradle; it shows some compile errors. Based on my investigation, some DJL class files are not visible from the examples project.

Expected Behavior

Example repository should compile.

Error Message

Microsoft Windows [Version 10.0.18362.657]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\MyWorks\DJL-AI\djl\examples>gradlew jar
Found C:\MyWorks\DJL-AI\djl\examples\\gradle\wrapper\gradle-wrapper.jar

> Task :compileJava FAILED
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ActionRecognition.java:20: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
                            ^
  symbol:   class Criteria
  location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\benchmark\util\AbstractBenchmark.java:21: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
                            ^
  symbol:   class Criteria
  location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\BertQaInference.java:20: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
                            ^
  symbol:   class Criteria
  location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\InstanceSegmentation.java:21: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
                            ^
  symbol:   class Criteria
  location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ObjectDetection.java:21: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
                            ^
  symbol:   class Criteria
  location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\PoseEstimation.java:24: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
                            ^
  symbol:   class Criteria
  location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\TrainWithOptimizers.java:38: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
                            ^
  symbol:   class Criteria
  location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\transferlearning\TrainResnetWithCifar10.java:38: error: cannot find symbol
import ai.djl.repository.zoo.Criteria;
                            ^
  symbol:   class Criteria
  location: package ai.djl.repository.zoo
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ActionRecognition.java:54: error: cannot find symbol
        Criteria<BufferedImage, Classifications> criteria =
        ^
  symbol:   class Criteria
  location: class ActionRecognition
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ActionRecognition.java:55: error: cannot find symbol
                Criteria.builder()
                ^
  symbol:   variable Criteria
  location: class ActionRecognition
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\benchmark\util\AbstractBenchmark.java:191: error: package Criteria does not exist
        Criteria.Builder<I, O> builder =
                ^
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\benchmark\util\AbstractBenchmark.java:192: error: cannot find symbol
                Criteria.builder()
                ^
  symbol:   variable Criteria
  location: class AbstractBenchmark<I,O>
  where I,O are type-variables:
    I extends Object declared in class AbstractBenchmark
    O extends Object declared in class AbstractBenchmark
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\BertQaInference.java:64: error: cannot find symbol
        Criteria<QAInput, String> criteria =
        ^
  symbol:   class Criteria
  location: class BertQaInference
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\BertQaInference.java:65: error: cannot find symbol
                Criteria.builder()
                ^
  symbol:   variable Criteria
  location: class BertQaInference
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\InstanceSegmentation.java:58: error: cannot find symbol
        Criteria<BufferedImage, DetectedObjects> criteria =
        ^
  symbol:   class Criteria
  location: class InstanceSegmentation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\InstanceSegmentation.java:59: error: cannot find symbol
                Criteria.builder()
                ^
  symbol:   variable Criteria
  location: class InstanceSegmentation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ObjectDetection.java:58: error: cannot find symbol
        Criteria<BufferedImage, DetectedObjects> criteria =
        ^
  symbol:   class Criteria
  location: class ObjectDetection
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\ObjectDetection.java:59: error: cannot find symbol
                Criteria.builder()
                ^
  symbol:   variable Criteria
  location: class ObjectDetection
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\PoseEstimation.java:76: error: cannot find symbol
        Criteria<BufferedImage, DetectedObjects> criteria =
        ^
  symbol:   class Criteria
  location: class PoseEstimation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\PoseEstimation.java:77: error: cannot find symbol
                Criteria.builder()
                ^
  symbol:   variable Criteria
  location: class PoseEstimation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\PoseEstimation.java:114: error: cannot find symbol
        Criteria<BufferedImage, Joints> criteria =
        ^
  symbol:   class Criteria
  location: class PoseEstimation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\inference\PoseEstimation.java:115: error: cannot find symbol
                Criteria.builder()
                ^
  symbol:   variable Criteria
  location: class PoseEstimation
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\TrainWithOptimizers.java:128: error: package Criteria does not exist
        Criteria.Builder<BufferedImage, Classifications> builder =
                ^
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\TrainWithOptimizers.java:129: error: cannot find symbol
                Criteria.builder()
                ^
  symbol:   variable Criteria
  location: class TrainWithOptimizers
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\transferlearning\TrainResnetWithCifar10.java:123: error: package Criteria does not exist
        Criteria.Builder<BufferedImage, Classifications> builder =
                ^
C:\MyWorks\DJL-AI\djl\examples\src\main\java\ai\djl\examples\training\transferlearning\TrainResnetWithCifar10.java:124: error: cannot find symbol
                Criteria.builder()
                ^
  symbol:   variable Criteria
  location: class TrainResnetWithCifar10
26 errors

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 1s
1 actionable task: 1 executed

C:\MyWorks\DJL-AI\djl\examples> 

How to Reproduce?

Used the official version from GitHub.

Steps to reproduce

(Paste the commands you ran that produced the error.)

  1. Clone https://github.com/awslabs/djl
  2. Build with gradlew build -x test
  3. Enter into examples using cd examples
  4. Try to build with gradlew jar

What have you tried to solve it?

  1. Tried everything suggested in the provided comments; none of it worked.
  2. Tried building with Maven instead; that did not work either.

Environment Info

Please run the command ./gradlew debugEnv from the root directory of DJL (if necessary, clone DJL first). It will output information about your system, environment, and installation that can help us debug your issue. Paste the output of the command below:

Microsoft Windows [Version 10.0.18362.657]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\MyWorks\DJL-AI\djl>gradlew debugEnv
Found C:\MyWorks\DJL-AI\djl\\gradle\wrapper\gradle-wrapper.jar

> Configure project :mxnet:mxnet-engine
[WARN ] Header file has been changed in open source project: mxnet/c_api.h.
[WARN ] Header file has been changed in open source project: nnvm/c_api.h.

> Task :integration:debugEnv
[INFO ] - ----------System Properties----------
[INFO ] - sun.desktop: windows
[INFO ] - awt.toolkit: sun.awt.windows.WToolkit
[INFO ] - java.specification.version: 12
[INFO ] - sun.cpu.isalist: amd64
[INFO ] - sun.jnu.encoding: Cp1252
[INFO ] - java.class.path: C:\MyWorks\DJL-AI\djl\integration\build\classes\java\main;C:\MyWorks\DJL-AI\djl\integration\build\resources\main;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\commons-cli\commons-cli\1.4\c51c00206bb913cd8612b24abd9fa98ae89719b1\commons-cli-1.4.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.apache.logging.log4j\log4j-slf4j-impl\2.12.1\14973e22497adaf0196d481fb99c5dc2a0b58d41\log4j-slf4j-impl-2.12.1.jar;C:\MyWorks\DJL-AI\djl\basicdataset\build\libs\basicdataset-0.3.0-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\model-zoo\build\libs\model-zoo-0.3.0-SNAPSHOT.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.testng\testng\6.8.1\8aebea980eee079365df20f0cf7fcac900d50250\testng-6.8.1.jar;C:\MyWorks\DJL-AI\djl\mxnet\mxnet-model-zoo\build\libs\mxnet-model-zoo-0.3.0-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\mxnet\mxnet-engine\build\libs\mxnet-engine-0.3.0-SNAPSHOT.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\ai.djl.mxnet\mxnet-native-auto\1.6.0-c-SNAPSHOT\88086d340572c8452ce22c76b233e05974add594\mxnet-native-auto-1.6.0-c-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\pytorch\pytorch-model-zoo\build\libs\pytorch-model-zoo-0.3.0-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\pytorch\pytorch-engine\build\libs\pytorch-engine-0.3.0-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\repository\build\libs\repository-0.3.0-SNAPSHOT.jar;C:\MyWorks\DJL-AI\djl\api\build\libs\api-0.3.0-SNAPSHOT.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.slf4j\slf4j-api\1.7.26\77100a62c2e6f04b53977b9f541044d7d722693d\slf4j-api-1.7.26.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.apache.logging.log4j\log4j-core\2.12.1\4382e93136c06bfb34ddfa0bb8a9fb4ea2f3df59\log4j-core-2.12.1.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.apache.logging.log4j\log4j-api\2.12.1\a55e6d987f50a515c9260b0451b4fa217dc539cb\log4j-api-2.12.1.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.beanshell\bsh\2.0b4\a05f0a0feefa8d8467ac80e16e7d
e071489f0d9c\bsh-2.0b4.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\com.beust\jcommander\1.27\58c9cbf0f1fa296f93c712f2cf46de50471920f9\jcommander-1.27.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\org.yaml\snakeyaml\1.6\a1e23e31c424d566ee27382e373d73a28fdabd88\snakeyaml-1.6.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\com.google.code.gson\gson\2.8.5\f645ed69d595b24d4cf8b3fbb64cc505bede8829\gson-2.8.5.jar;C:\Users\MasudRahman\.gradle\caches\modules-2\files-2.1\net.java.dev.jna\jna\5.3.0\4654d1da02e4173ba7b64f7166378847db55448a\jna-5.3.0.jar
[INFO ] - java.vm.vendor: Oracle Corporation
[INFO ] - sun.arch.data.model: 64
[INFO ] - user.variant:
[INFO ] - java.vendor.url: https://java.oracle.com/
[INFO ] - user.timezone: America/Toronto
[INFO ] - java.vm.specification.version: 12
[INFO ] - os.name: Windows 10
[INFO ] - org.apache.logging.log4j.assignedSequences: 159
[INFO ] - user.country: CA
[INFO ] - sun.java.launcher: SUN_STANDARD
[INFO ] - sun.boot.library.path: C:\Program Files\Java\jdk-12.0.2\bin
[INFO ] - sun.java.command: ai.djl.integration.util.DebugEnvironment
[INFO ] - jdk.debug: release
[INFO ] - sun.cpu.endian: little
[INFO ] - user.home: C:\Users\MasudRahman
[INFO ] - user.language: en
[INFO ] - java.specification.vendor: Oracle Corporation
[INFO ] - java.version.date: 2019-07-16
[INFO ] - java.home: C:\Program Files\Java\jdk-12.0.2
[INFO ] - file.separator: \
[INFO ] - java.vm.compressedOopsMode: 32-bit
[INFO ] - line.separator:

[INFO ] - java.vm.specification.vendor: Oracle Corporation
[INFO ] - java.specification.name: Java Platform API Specification
[INFO ] - java.awt.graphicsenv: sun.awt.Win32GraphicsEnvironment
[INFO ] - user.script:
[INFO ] - sun.management.compiler: HotSpot 64-Bit Tiered Compilers
[INFO ] - java.runtime.version: 12.0.2+10
[INFO ] - user.name: MasudRahman
[INFO ] - path.separator: ;
[INFO ] - os.version: 10.0
[INFO ] - java.runtime.name: Java(TM) SE Runtime Environment
[INFO ] - file.encoding: windows-1252
[INFO ] - java.vm.name: Java HotSpot(TM) 64-Bit Server VM
[INFO ] - java.vendor.url.bug: https://bugreport.java.com/bugreport/
[INFO ] - java.io.tmpdir: C:\Users\MASUDR~1\AppData\Local\Temp\
[INFO ] - java.version: 12.0.2
[INFO ] - user.dir: C:\MyWorks\DJL-AI\djl\integration
[INFO ] - os.arch: amd64
[INFO ] - java.vm.specification.name: Java Virtual Machine Specification
[INFO ] - sun.os.patch.level:
[INFO ] - java.library.path: C:\Program Files\Java\jdk-12.0.2\bin;C:\windows\Sun\Java\bin;C:\windows\system32;C:\windows;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\libnvvp;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\windows\System32\OpenSSH\;C:\Program Files\MiKTeX 2.9\miktex\bin\x64\;C:\Program Files\Git\cmd;C:\Program Files\Java\jdk-12.0.2\bin;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\Nsight Compute 2019.4.0\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files (x86)\WiX Toolset v3.11\bin;C:\Program Files\dotnet\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\;C:\MyWorks\MySofts\apache-maven-3.6.3\bin;C:\MyWorks\DJL-AI\gradle-6.2\bin;C:\Users\MasudRahman\AppData\Local\Programs\Python\Python37\Scripts\;C:\Users\MasudRahman\AppData\Local\Programs\Python\Python37\;C:\Users\MasudRahman\AppData\Local\Microsoft\WindowsApps;C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.3\bin;;C:\Users\MasudRahman\AppData\Local\Programs\MiKTeX 2.9\miktex\bin\x64\;C:\Users\MasudRahman\.dotnet\tools;.
[INFO ] - java.vm.info: mixed mode, sharing
[INFO ] - java.vendor: Oracle Corporation
[INFO ] - java.vm.version: 12.0.2+10
[INFO ] - sun.io.unicode.encoding: UnicodeLittle
[INFO ] - java.class.version: 56.0
[INFO ] -
[INFO ] - ----------Environment Variables----------
[INFO ] - USERDOMAIN_ROAMINGPROFILE: LAPTOP-9GR27E2K
[INFO ] - PROCESSOR_LEVEL: 6
[INFO ] - RegionCode: NA
[INFO ] - SESSIONNAME: Console
[INFO ] - ALLUSERSPROFILE: C:\ProgramData
[INFO ] - PROCESSOR_ARCHITECTURE: AMD64
[INFO ] - PSModulePath: C:\Program Files\WindowsPowerShell\Modules;C:\windows\system32\WindowsPowerShell\v1.0\Modules
[INFO ] - SystemDrive: C:
[INFO ] - MOZ_PLUGIN_PATH: C:\Program Files (x86)\Foxit Software\Foxit Reader\plugins\
[INFO ] - DIRNAME: C:\MyWorks\DJL-AI\djl\
[INFO ] - USERNAME: MasudRahman
[INFO ] - CMD_LINE_ARGS: debugEnv
[INFO ] - ProgramFiles(x86): C:\Program Files (x86)
[INFO ] - APP_HOME: C:\MyWorks\DJL-AI\djl\
[INFO ] - CUDA_PATH_V10_1: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
[INFO ] - PATHEXT: .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
[INFO ] - DriverData: C:\Windows\System32\Drivers\DriverData
[INFO ] - OneDriveConsumer: C:\Users\MasudRahman\OneDrive
[INFO ] - platformcode: KV
[INFO ] - PyCharm Community Edition: C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.3\bin;
[INFO ] - ProgramData: C:\ProgramData
[INFO ] - ProgramW6432: C:\Program Files
[INFO ] - HOMEPATH: \Users\MasudRahman
[INFO ] - NVCUDASAMPLES10_1_ROOT: C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.1
[INFO ] - PROCESSOR_IDENTIFIER: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
[INFO ] - ProgramFiles: C:\Program Files
[INFO ] - PUBLIC: C:\Users\Public
[INFO ] - windir: C:\windows
[INFO ] - =::: ::\
[INFO ] - _SKIP: 2
[INFO ] - LOCALAPPDATA: C:\Users\MasudRahman\AppData\Local
[INFO ] - USERDOMAIN: LAPTOP-9GR27E2K
[INFO ] - LOGONSERVER: \\LAPTOP-9GR27E2K
[INFO ] - JAVA_HOME: C:\Program Files\Java\jdk-12.0.2
[INFO ] - PROMPT: $P$G
[INFO ] - OneDrive: C:\Users\MasudRahman\OneDrive
[INFO ] - =C:: C:\MyWorks\DJL-AI\djl
[INFO ] - APPDATA: C:\Users\MasudRahman\AppData\Roaming
[INFO ] - DOWNLOAD_URL: "https://raw.githubusercontent.com/gradle/gradle/master/gradle/wrapper/gradle-wrapper.jar"
[INFO ] - JAVA_EXE: C:\Program Files\Java\jdk-12.0.2/bin/java.exe
[INFO ] - NVTOOLSEXT_PATH: C:\Program Files\NVIDIA Corporation\NvToolsExt\
[INFO ] - CommonProgramFiles: C:\Program Files\Common Files
[INFO ] - Path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\libnvvp;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\windows\System32\OpenSSH\;C:\Program Files\MiKTeX 2.9\miktex\bin\x64\;C:\Program Files\Git\cmd;C:\Program Files\Java\jdk-12.0.2\bin;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\Nsight Compute 2019.4.0\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files (x86)\WiX Toolset v3.11\bin;C:\Program Files\dotnet\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\;C:\MyWorks\MySofts\apache-maven-3.6.3\bin;C:\MyWorks\DJL-AI\gradle-6.2\bin;C:\Users\MasudRahman\AppData\Local\Programs\Python\Python37\Scripts\;C:\Users\MasudRahman\AppData\Local\Programs\Python\Python37\;C:\Users\MasudRahman\AppData\Local\Microsoft\WindowsApps;C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.3\bin;;C:\Users\MasudRahman\AppData\Local\Programs\MiKTeX 2.9\miktex\bin\x64\;C:\Users\MasudRahman\.dotnet\tools
[INFO ] - OS: Windows_NT
[INFO ] - COMPUTERNAME: LAPTOP-9GR27E2K
[INFO ] - NVCUDASAMPLES_ROOT: C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.1
[INFO ] - CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
[INFO ] - OnlineServices: Online Services
[INFO ] - PROCESSOR_REVISION: 8e0c
[INFO ] - CLASSPATH: C:\MyWorks\DJL-AI\djl\\gradle\wrapper\gradle-wrapper.jar
[INFO ] - CommonProgramW6432: C:\Program Files\Common Files
[INFO ] - ComSpec: C:\windows\system32\cmd.exe
[INFO ] - APP_BASE_NAME: gradlew
[INFO ] - NVCUDASAMPLES9_0_ROOT: C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0
[INFO ] - SystemRoot: C:\windows
[INFO ] - TEMP: C:\Users\MASUDR~1\AppData\Local\Temp
[INFO ] - HOMEDRIVE: C:
[INFO ] - USERPROFILE: C:\Users\MasudRahman
[INFO ] - TMP: C:\Users\MASUDR~1\AppData\Local\Temp
[INFO ] - CUDA_PATH_V9_0: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
[INFO ] - CommonProgramFiles(x86): C:\Program Files (x86)\Common Files
[INFO ] - NUMBER_OF_PROCESSORS: 8
[INFO ] -
[INFO ] - ----------Default Engine----------
Exception in thread "main" java.util.ServiceConfigurationError: ai.djl.engine.EngineProvider: Provider ai.djl.mxnet.engine.MxEngineProvider could not be instantiated
        at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:583)
        at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:805)
        at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:723)
        at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1395)
        at ai.djl.engine.Engine.initEngine(Engine.java:46)
        at ai.djl.engine.Engine.<clinit>(Engine.java:41)
        at ai.djl.integration.util.DebugEnvironment.main(DebugEnvironment.java:51)
Caused by: java.lang.ExceptionInInitializerError
        at ai.djl.mxnet.engine.MxEngine.<init>(MxEngine.java:40)
        at ai.djl.mxnet.engine.MxEngineProvider.<clinit>(MxEngineProvider.java:21)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
        at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:781)
        ... 5 more
Caused by: java.lang.IllegalStateException: Failed to download MXNet native library
        at ai.djl.mxnet.jna.LibUtils.findLibraryInClasspath(LibUtils.java:134)
        at ai.djl.mxnet.jna.LibUtils.getLibName(LibUtils.java:76)
        at ai.djl.mxnet.jna.LibUtils.loadLibrary(LibUtils.java:67)
        at ai.djl.mxnet.jna.JnaUtils.<clinit>(JnaUtils.java:69)
        ... 13 more
Caused by: java.nio.file.FileAlreadyExistsException: C:\Users\MasudRahman\.mxnet\cache\1.6.0-c-SNAPSHOT-20200218mkl-win-x86_64
        at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:351)
        at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
        at java.base/java.nio.file.Files.move(Files.java:1424)
        at ai.djl.mxnet.jna.LibUtils.downloadMxnet(LibUtils.java:316)
        at ai.djl.mxnet.jna.LibUtils.findLibraryInClasspath(LibUtils.java:132)
        ... 16 more

> Task :integration:debugEnv FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':integration:debugEnv'.
> Process 'command 'C:\Program Files\Java\jdk-12.0.2\bin\java.exe'' finished with non-zero exit value 1

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.0.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 10s
29 actionable tasks: 2 executed, 27 up-to-date

C:\MyWorks\DJL-AI\djl> 
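
The innermost cause in the trace above is a java.nio.file.FileAlreadyExistsException thrown from Files.move while the MXNet native library cache directory was being put in place. The following is a minimal sketch of that JDK behavior only, not DJL's actual code; the class name CacheMoveSketch and the directory names are hypothetical. It shows that moving a directory onto a target that already exists fails, and that the move succeeds once the stale target has been removed:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CacheMoveSketch {

    /** Returns true if moving src onto dst fails because dst already exists. */
    static boolean moveFailsWhenTargetExists(Path src, Path dst) throws IOException {
        try {
            Files.move(src, dst); // no REPLACE_EXISTING option is passed
            return false;
        } catch (FileAlreadyExistsException e) {
            return true;
        }
    }

    /** Recursively deletes a directory tree, children before parents. */
    static void deleteTree(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts deeper paths before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Runs the whole scenario inside a throwaway temp directory. */
    static boolean demo() throws IOException {
        Path root = Files.createTempDirectory("cache-sketch");
        Path tmp = Files.createDirectory(root.resolve("tmp-download"));   // hypothetical name
        Path cache = Files.createDirectory(root.resolve("cache-dir"));    // hypothetical name
        Files.createFile(cache.resolve("leftover.txt"));

        // First attempt: the target directory is already there, so the move fails.
        boolean failed = moveFailsWhenTargetExists(tmp, cache);

        // Remove the stale target, then the same move goes through.
        deleteTree(cache);
        Files.move(tmp, cache);
        boolean moved = Files.isDirectory(cache) && !Files.exists(tmp);

        deleteTree(root);
        return failed && moved;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

If the same thing happened here, deleting the leftover directory named in the exception (under .mxnet\cache in the user's home directory) before re-running gradlew debugEnv may let the download complete. That is an inference from the trace, not something confirmed in this report.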

YOLO Models trained with Coco and Darknet53 return Cuda Memory error (MxNet engine)

Description

The ObjectDetection example with the pre-trained YOLO models (dataset=coco, backbone=darknet53) returns this error:

"MXNet engine call failed: CUDA: Check failed: e == cudaSuccess: an illegal memory access was encountered"

Expected Behavior

The ObjectDetection example should work with the different pre-trained YOLO models. Note that YOLO models trained on Pascal VOC work fine.

Error Message

[INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ examples ---
Loading: 100% |████████████████████████████████████████|
[11:32:04] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v1.6.0. Attempting to upgrade...
[11:32:04] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
model yolo
[11:32:13] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[WARNING]
ai.djl.engine.EngineException: MXNet engine call failed: CUDA: Check failed: e == cudaSuccess: an illegal memory access was encountered
Stack trace:
File "/codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h", line 81

at ai.djl.mxnet.jna.JnaUtils.checkCall (JnaUtils.java:1788)
at ai.djl.mxnet.jna.JnaUtils.syncCopyToCPU (JnaUtils.java:473)
at ai.djl.mxnet.engine.MxNDArray.toByteBuffer (MxNDArray.java:283)
at ai.djl.ndarray.NDArray.toIntArray (NDArray.java:279)
at ai.djl.modality.cv.translator.YoloTranslator.processOutput (YoloTranslator.java:40)
at ai.djl.modality.cv.translator.YoloTranslator.processOutput (YoloTranslator.java:26)
at ai.djl.inference.Predictor.processOutputs (Predictor.java:202)
at ai.djl.inference.Predictor.batchPredict (Predictor.java:160)
at ai.djl.inference.Predictor.predict (Predictor.java:112)
at ai.djl.examples.inference.ObjectDetectionBench.predict (ObjectDetectionBench.java:71)
at ai.djl.examples.inference.ObjectDetectionBench.main (ObjectDetectionBench.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:748)

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16.461 s
[INFO] Finished at: 2020-05-30T11:32:18+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project examples: An exception occured while executing the Java class. MXNet engine call failed: CUDA: Check failed: e == cudaSuccess: an illegal memory access was encountered
[ERROR] Stack trace:
[ERROR] File "/codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h", line 81
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[11:32:18] src/resource.cc:279: Ignore CUDA Error [11:32:18] src/storage/./pooled_storage_manager.h:97: CUDA: an illegal memory access was encountered

[[[[11:32:18] 11:32:18] src/engine/threaded_engine_perdevice.cc11:32:18src/engine/threaded_engine_perdevice.cc] src/engine/threaded_engine_perdevice.cc:27511:32:18:275:: 275: Ignore CUDA Error [11:32:18] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:203: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered

] src/engine/threaded_engine_perdevice.cc:275Ignore CUDA Error [11:32:18] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:203: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered

: : Ignore CUDA Error [11:32:18] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:203: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered

Ignore CUDA Error [11:32:18] /codebuild/output/src546137840/src/git-codecommit.us-west-2.amazonaws.com/v1/repos/AWS-MXNet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:203: Check failed: e == cudaSuccess: CUDA: an illegal memory access was encountered

terminate called after throwing an instance of 'dmlc::Error'
what(): [11:32:18] src/storage/./pooled_storage_manager.h:97: CUDA: an illegal memory access was encountered

Aborted (core dumped)

How to Reproduce?

  1. Change the Criteria configuration in the 'predict' method of ai.djl.examples.inference.ObjectDetection.java

from

.optFilter("backbone", "resnet50")

to

.optFilter("dataset", "coco")
.optFilter("imageSize", "416")
.optFilter("backbone", "darknet53")

  2. Run
    mvn exec:java -Dexec.mainClass="ai.djl.examples.inference.ObjectDetection"

Environment Info

Ubuntu, CUDA 10.2, GPU V100
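
The MXNet log above notes that the convolution performance tests can be disabled by setting the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0. Whether autotuning has anything to do with the illegal memory access is not established in this report. As a sketch for ruling it out, the reproduce command could be launched from Java with that variable set; the class name RerunWithoutAutotune is hypothetical, and mvn is assumed to be on the PATH and run from the examples directory:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class RerunWithoutAutotune {

    /** Builds the reproduce command with cuDNN autotuning disabled via the environment. */
    static ProcessBuilder buildCommand() {
        List<String> cmd = Arrays.asList(
                "mvn",
                "exec:java",
                "-Dexec.mainClass=ai.djl.examples.inference.ObjectDetection");
        ProcessBuilder pb = new ProcessBuilder(cmd);
        // Variable name and value are taken from the MXNet log message.
        pb.environment().put("MXNET_CUDNN_AUTOTUNE_DEFAULT", "0");
        // Forward the child's stdout/stderr to this process's console.
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        int exit = buildCommand().start().waitFor();
        System.out.println("exit code: " + exit);
    }
}
```

If the error still appears with autotuning disabled, that at least narrows the problem to the model or engine rather than the algorithm search.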
