Coder Social home page Coder Social logo

zhjunqin / io Goto Github PK

View Code? Open in Web Editor NEW

This project forked from tensorflow/io

0.0 1.0 0.0 7.12 MB

Datasets and filesystem extensions maintained by SIG-IO

License: Apache License 2.0

Shell 2.62% R 4.86% Dockerfile 0.23% Python 30.08% C++ 62.23%

io's Introduction

TensorFlow I/O

Travis-CI Build Status PyPI Status Badge CRAN_Status_Badge

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support.

At the moment TensorFlow I/O supports the following data sources:

  • tensorflow_io.ignite: Data source for Apache Ignite and Ignite File System (IGFS). Overview and usage guide here.
  • tensorflow_io.kafka: Apache Kafka stream-processing support.
  • tensorflow_io.kinesis: Amazon Kinesis data streams support.
  • tensorflow_io.hadoop: Hadoop SequenceFile format support.
  • tensorflow_io.arrow: Apache Arrow data format support. Usage guide here.
  • tensorflow_io.image: WebP and TIFF image format support.
  • tensorflow_io.libsvm: LIBSVM file format support.
  • tensorflow_io.video: Video file support with FFmpeg.
  • tensorflow_io.parquet: Apache Parquet data format support.
  • tensorflow_io.lmdb: LMDB file format support.
  • tensorflow_io.mnist: MNIST file format support.
  • tensorflow_io.pubsub: Google Cloud Pub/Sub support.

Installation

Python Package

The tensorflow-io Python package could be installed with pip directly:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

The use of tensorflow-io is straightforward with keras. Below is the example of Get Started with TensorFlow with data processing replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io.mnist as mnist_io

# Read MNIST into tf.data.Dataset
d_train = mnist_io.MNISTDataset(
    'train-images-idx3-ubyte.gz',
    'train-labels-idx1-ubyte.gz',
    compression_type="GZIP")

# By default image data is uint8 so conver to float32.
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y)).batch(1)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(d_train, epochs=5, steps_per_epoch=10000)

Note that in the above example, MNIST database files are assumed to have been downloaded and saved to the local directory. Compression files could be processed automatically with compression_type="GZIP".

R Package

Once the tensorflow-io Python package has beem successfully installed, you can then install the latest stable release of the R package via:

install.packages('tfio')

You can also install the development version from Github via:

if (!require("devtools")) install.packages("devtools")
devtools::install_github("tensorflow/io", subdir = "R-package")

Developing

Python

For Python development, a reference Dockerfile here can be used to build the TensorFlow I/O package (tensorflow-io) from source:

$ # Build and run the Docker image
$ docker build -f dev/Dockerfile -t tfio-dev .
$ docker run -it -rm --net=host -v ${PWD}:/v -w /v tfio-dev
$ # In Docker, configure will install TensorFlow or use existing install
$ ./configure.sh
$ # Build TensorFlow I/O C++
$ bazel build -s --verbose_failures //tensorflow_io/...
$ # Run tests with PyTest, note: some tests require launching additional containers to run (see below)
$ pytest tests/
$ # Build the TensorFlow I/O package
$ python setup.py bdist_wheel

A package file dist/tensorflow_io-*.whl will be generated after a build is successful.

NOTE: When working in the Python development container, an environment variable TFIO_DATAPATH is automatically set to point tensorflow-io to the shared C++ libraries built by Bazel to run pytest and build the bdist_wheel. Python setup.py can also accept --data [path] as an argument, for example python setup.py --data bazel-bin bdist_wheel.

Starting Test Containers

Some tests require launching a test container before running. In order to run all tests, execute the following commands:

$ bash -x -e tests/test_ignite/start_ignite.sh
$ bash -x -e tests/test_kafka/kafka_test.sh start kafka
$ bash -x -e tests/test_kinesis/kinesis_test.sh start kinesis

R

We provide a reference Dockerfile here for you so that you can use the R package directly for testing. You can build it via:

docker build -t tfio-r-dev -f R-package/scripts/Dockerfile .

Inside the container, you can start your R session, instantiate a SequenceFileDataset from an example Hadoop SequenceFile string.seq, and then use any transformation functions provided by tfdatasets package on the dataset like the following:

library(tfio)
dataset <- sequence_file_dataset("R-package/tests/testthat/testdata/string.seq") %>%
    dataset_repeat(2)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})

Community

License

Apache License 2.0

io's People

Contributors

artemmalykh avatar bryancutler avatar damienpontifex avatar dmitrievanthony avatar gunan avatar henrytansetiawan avatar jpienaar avatar jsimsa avatar knightxun avatar lc0 avatar martinwicke avatar meteorcloudy avatar mrry avatar rongjiecomputer avatar saeta avatar superbobry avatar suphoff avatar tensorflower-gardener avatar terrytangyuan avatar wookayin avatar xiejw avatar yifeif avatar yongtang avatar yupbank avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.