This project demonstrates how Scala can be used effectively with the ML library TensorFlow.
TensorFlow is usually used from Python. It also has a Java API, but that API is not very powerful, so this project uses the excellent tensorflow_scala project from Emmanouil Antonios Platanios.
The code here is an approximate Scala translation of MNIST Python code. It trains and tests using the tensorflow_scala API on MNIST data. For a full explanation of the original Python code go here. I wrote a simple version for this project, not a CNN version like the one provided by Platanios.
The main code of this project is easy to browse: MNISTSimple.scala.
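For orientation, a "simple" (non-CNN) MNIST model is commonly a single-layer softmax classifier. The sketch below assumes that is roughly what MNISTSimple.scala computes, which this README does not state. It uses plain Scala collections rather than the tensorflow_scala API, and the names SoftmaxSketch, softmax and predict are hypothetical.

```scala
object SoftmaxSketch {
  // Numerically stable softmax: subtract the largest logit before exponentiating.
  def softmax(logits: Vector[Double]): Vector[Double] = {
    val m = logits.max
    val exps = logits.map(z => math.exp(z - m))
    val sum = exps.sum
    exps.map(_ / sum)
  }

  // Computes logits(j) = sum_i x(i) * w(i)(j) + b(j), then class probabilities.
  // x: flattened image pixels, w: one row of weights per pixel, b: one bias per class.
  def predict(x: Vector[Double],
              w: Vector[Vector[Double]],
              b: Vector[Double]): Vector[Double] = {
    val logits = b.indices.map { j =>
      x.indices.map(i => x(i) * w(i)(j)).sum + b(j)
    }.toVector
    softmax(logits)
  }
}
```

For MNIST, x would have 784 entries (28 x 28 pixels) and b would have 10 entries, one per digit. The predicted digit is the index of the largest probability.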
tensorflow_scala is packaged as fat jars in the lib directory. Version 0.4.2 is used. I modified the code slightly, fixing/removing outdated dependencies, allowing the saving of checkpoint state directly from external code, and adding an assembly.sbt file for fat jar generation.
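Because the jars sit in lib, sbt picks them up as unmanaged dependencies without any libraryDependencies entry. The fragment below is a minimal sketch of what a build.sbt for such a layout might contain. It is not the project's actual build file, and the project name is hypothetical.

```scala
// build.sbt (sketch; the project name is a placeholder)
name := "scala-tensorflow-example"

// Scala version this README reports testing with.
scalaVersion := "2.12.8"

// sbt treats jars under lib/ as unmanaged dependencies by default;
// this line only makes that default explicit.
unmanagedBase := baseDirectory.value / "lib"
```

Since lib is already sbt's default unmanaged directory, the last setting can be dropped. It is shown only to make the mechanism visible.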
- Clone or pull this project to a convenient folder on your local machine. macOS or Ubuntu should work.
- This project works with TensorFlow 1.x. If you have not set up TensorFlow before, you will need the libtensorflow.so and libtensorflow_framework.so dynamic libraries. For convenience, I have provided copies in the libs directory. You need to ensure both those libraries are in your LD_LIBRARY_PATH. For example, on a Mac you might move them to your /usr/local/lib/ directory and have this in your .bash_profile: export LD_LIBRARY_PATH=/usr/local/lib
- If you are working on a modern Mac, you will probably run into errors unless you disable SIP. There is some conflict between SIP and TensorFlow. Here are some instructions.
- You should have Scala installed. This code has been tested with Scala 2.12.8 and sbt 1.x.
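Before running, it can help to confirm that the two native libraries are actually visible on the search path. The following is a small standalone sketch, not part of this project, that looks for both file names in every directory listed in LD_LIBRARY_PATH. The object name NativeLibCheck and its methods are hypothetical.

```scala
import java.io.File

object NativeLibCheck {
  // The two library file names this README says TensorFlow needs.
  val required: Seq[String] = Seq("libtensorflow.so", "libtensorflow_framework.so")

  // Returns the required names that are not found in any directory of a
  // path-separator-delimited search path such as LD_LIBRARY_PATH.
  def missing(searchPath: String, names: Seq[String] = required): Seq[String] = {
    val dirs = searchPath.split(File.pathSeparator).filter(_.nonEmpty).map(new File(_))
    names.filterNot(n => dirs.exists(d => new File(d, n).isFile))
  }

  def main(args: Array[String]): Unit = {
    val path = sys.env.getOrElse("LD_LIBRARY_PATH", "")
    val absent = missing(path)
    if (absent.isEmpty) println("Both TensorFlow libraries were found.")
    else println(s"Not found on LD_LIBRARY_PATH: ${absent.mkString(", ")}")
  }
}
```

This only checks that the files exist. It does not prove that the JVM can load them, so errors such as the SIP conflict mentioned above can still occur.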
In the directory where you installed this project, run:
sbt run
After compilation you should see the main output, like this:
2020-07-01 16:39:10.133 [run-main-0] INFO MNIST Data Loader - Extracting images from file 'datasets/MNIST/train-images-idx3-ubyte.gz'.
2020-07-01 16:39:10.591 [run-main-0] INFO MNIST Data Loader - Extracting labels from file 'datasets/MNIST/train-labels-idx1-ubyte.gz'.
2020-07-01 16:39:10.598 [run-main-0] INFO MNIST Data Loader - Extracting images from file 'datasets/MNIST/t10k-images-idx3-ubyte.gz'.
2020-07-01 16:39:10.659 [run-main-0] INFO MNIST Data Loader - Extracting labels from file 'datasets/MNIST/t10k-labels-idx1-ubyte.gz'.
2020-07-01 16:39:10.661 [run-main-0] INFO MNIST Data Loader - Finished loading the MNIST dataset.
Processing batch 0 of 1000
Processing batch 100 of 1000
Processing batch 200 of 1000
Processing batch 300 of 1000
Processing batch 400 of 1000
Processing batch 500 of 1000
Processing batch 600 of 1000
Processing batch 700 of 1000
Processing batch 800 of 1000
Processing batch 900 of 1000
2020-07-01 16:42:01.428 [run-main-0] INFO Variables / Saver - Saving parameters to '/Users/james/dev/ScalaTensorflowPOC/model/model'.
2020-07-01 16:42:02.080 [run-main-0] INFO Variables / Saver - Saved parameters to '/Users/james/dev/ScalaTensorflowPOC/model/model'.
Accuracy: 0.9156
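The final line is the fraction of test images classified correctly. The standard MNIST t10k files hold 10,000 images, so if the whole test set is evaluated (an assumption, since this README does not say), 0.9156 corresponds to 9,156 correct predictions. A minimal sketch of that calculation in plain Scala, with the hypothetical names AccuracySketch and accuracy:

```scala
object AccuracySketch {
  // Fraction of positions where the predicted label equals the true label.
  def accuracy(predicted: Seq[Int], actual: Seq[Int]): Double = {
    require(predicted.length == actual.length && actual.nonEmpty,
      "label sequences must be non-empty and of equal length")
    predicted.zip(actual).count { case (p, a) => p == a }.toDouble / actual.length
  }
}
```

For example, AccuracySketch.accuracy(Seq(1, 2, 3, 4), Seq(1, 2, 0, 4)) gives 0.75, since three of the four labels match.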