tensorcom's Introduction

Tensorcom

Status: alpha software release, APIs may still change

Tensorcom is a way of loading training data into deep learning frameworks quickly and portably. You can write a single data loading/augmentation pipeline and train one or more jobs in the same or different frameworks with it.

Both Keras and PyTorch can use the Python Connection object for input, but MessagePack and ZMQ libraries exist in all major languages, making it easy to write servers and input operators for any framework.

Tensorcom replaces the use of multiprocessing in Python for that purpose. Both use separate processes for loading and augmentation, but by making the processes and communications explicit, you gain some significant advantages:

  • the same augmentation pipeline can be used with different DL frameworks
  • augmentation processes can easily be run on multiple machines
  • output from a single augmentation pipeline can be shared by many training jobs
  • you can start up and test the augmentation pipeline before you start the DL jobs
  • DL frameworks wanting to use tensorcom only need a small library to handle input

Using tensorcom for training is very simple. First, start up a data server; for Imagenet, there are two example jobs. The serve-imagenet-dir program illustrates how to use the standard PyTorch Imagenet DataLoader to serve training data:

    $ serve-imagenet-dir -d /data/imagenet -b 64 zpub://127.0.0.1:7880

The server will give you information about the rate at which it serves image batches. Your training loop then becomes very simple:

    training = tensorcom.Connection("zsub://127.0.0.1:7880", epoch=1000000)
    for xs, ys in training:
        train_batch(xs, ys)

If you want multiple jobs for augmentation, start more publishers and address them with Bash-style brace notation: zpub://127.0.0.1:788{0..3} on the server side and zsub://127.0.0.1:788{0..3} on the client side.
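To make the brace notation concrete, here is a minimal sketch of how one numeric range could be expanded into individual URLs. The helper name expand_url is hypothetical and this is not tensorcom's own expansion code; it only handles a single ascending {a..b} range.

```python
import re

# Matches a single numeric range such as {0..3}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_url(url):
    """Expand one Bash-style {a..b} numeric range into a list of URLs.

    Hypothetical helper for illustration; URLs without a range are
    returned unchanged as a one-element list.
    """
    match = _RANGE.search(url)
    if match is None:
        return [url]
    lo, hi = int(match.group(1)), int(match.group(2))
    head, tail = url[:match.start()], url[match.end():]
    return [f"{head}{i}{tail}" for i in range(lo, hi + 1)]


for expanded in expand_url("zpub://127.0.0.1:788{0..3}"):
    print(expanded)
```

Each expanded URL corresponds to one publisher port, so the range above stands for four publishers on ports 7880 through 7883.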

Note that you can start up multiple training jobs connecting to the same server.

Command Line Tools

There are some command line programs to help with developing and debugging these jobs:

  • tensormon -- connect to a data server and monitor throughput
  • tensorshow -- show images from input batches
  • tensorstat -- compute statistics over input data samples

Examples

  • serve-imagenet-dir -- serve Imagenet data from a file system using PyTorch
  • serve-imagenet-shards -- serve Imagenet from shards using webloader
  • keras.ipynb -- simple example of using Keras with tensorcom
  • pytorch.ipynb -- simple example of using PyTorch with tensorcom

ZMQ URLs

There is no official standard for ZMQ URLs. This library uses the following notation:

Socket types:

  • zpush / zpull -- standard PUSH/PULL sockets
  • zrpush / zrpull -- reverse PUSH/PULL connections (PUSH socket is server / PULL socket connects)
  • zpub / zsub -- standard PUB/SUB sockets
  • zrpub / zrsub -- reverse PUB/SUB connections

The pub/sub servers allow the same augmentation pipeline to be shared by multiple learning jobs.

Default transport is TCP/IP, but you can choose IPC as in zpush+ipc://mypath.
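The notation can be illustrated with a small parser. This is a sketch, not the library's implementation: parse_zmq_url and the SCHEMES table are hypothetical names, and the bind/connect column is an assumption inferred from the descriptions above (the "r" variants swap which end acts as the server).

```python
from urllib.parse import urlparse

# scheme -> (ZMQ socket type, whether this end binds).
# Assumption: inferred from the socket type list; the "r" (reverse)
# variants swap which end binds and which end connects.
SCHEMES = {
    "zpush": ("PUSH", False),
    "zpull": ("PULL", True),
    "zrpush": ("PUSH", True),
    "zrpull": ("PULL", False),
    "zpub": ("PUB", True),
    "zsub": ("SUB", False),
    "zrpub": ("PUB", False),
    "zrsub": ("SUB", True),
}


def parse_zmq_url(url):
    """Split a URL such as zpush+ipc://mypath into its parts.

    Returns (socket_type, binds, endpoint), where endpoint is a plain
    ZMQ endpoint string such as tcp://127.0.0.1:7880.
    """
    parsed = urlparse(url)
    # The scheme is "<kind>" or "<kind>+<transport>".
    kind, _, transport = parsed.scheme.partition("+")
    if kind not in SCHEMES:
        raise ValueError(f"unknown socket scheme: {kind!r}")
    socket_type, binds = SCHEMES[kind]
    transport = transport or "tcp"  # TCP/IP is the default transport
    address = parsed.netloc + parsed.path
    return socket_type, binds, f"{transport}://{address}"


print(parse_zmq_url("zsub://127.0.0.1:7880"))
print(parse_zmq_url("zpush+ipc://mypath"))
```

The returned endpoint is in the form that ZMQ's own bind and connect calls accept, so the custom scheme only adds the socket type and the direction of the connection.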

Connection Objects

The major way of interacting with the library is through the Connection object. It simply gives you an iterator over training samples.

Encodings

Data is encoded in a simple binary tensor format; see codec.py for details. The same format can also be used for saving and loading lists of tensors from disk (extension: .ten). Data is aligned on 64-byte boundaries to allow easy memory mapping and direct use by CPUs and GPUs.
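The effect of 64-byte alignment can be sketched as follows. This is not the format defined in codec.py; pack_aligned and padding_for are hypothetical helpers that only show how padding keeps every chunk starting at a multiple of 64 bytes.

```python
ALIGNMENT = 64  # bytes


def padding_for(nbytes, alignment=ALIGNMENT):
    """Number of zero bytes needed to round nbytes up to the alignment."""
    return -nbytes % alignment


def pack_aligned(chunks, alignment=ALIGNMENT):
    """Concatenate byte chunks so that each one starts on an aligned offset.

    Returns the packed buffer and the starting offset of every chunk.
    Illustration only; the real tensor format also carries headers.
    """
    out = bytearray()
    offsets = []
    for chunk in chunks:
        offsets.append(len(out))
        out += chunk
        out += b"\0" * padding_for(len(chunk), alignment)
    return bytes(out), offsets


buffer, offsets = pack_aligned([b"a" * 10, b"b" * 64, b"c" * 100])
print(offsets, len(buffer))
```

Because every offset is a multiple of 64, a reader can memory-map the buffer and hand out views of each chunk without copying or realigning.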

tensorcom's People

Contributors

shoaibahmed, tmbdev, tmbnv

tensorcom's Issues

ACID compliant

Thanks for pointing me to this repo from webdatasets!

I have spent some time looking at zmq and tensorcom. Cool project! I am curious if there is a messaging pattern that will guarantee all elements get processed?

This works well if we have a training job, but what if we want to run inference over every element and be guaranteed that each one is served? Maybe REQ/REP? I don't think that has been implemented, and I think there might be a throughput cost associated with that.

Maybe there is another tool other than zmq you have looked at?

Cheers!

Unable to subscribe/publish to topics

While I can manually set up a topic to listen to in the subscriber, the check_acceptable_input_type blocks most simple headers I could check for. So, right now, I'm manually adding a bytestring first, then sending the encoded buffer with pure zmq:

    # Publisher script: send camera data
    from displayarray import read_updates
    import zmq
    from tensorcom.tenbin import encode_buffer

    ctx = zmq.Context()
    s = ctx.socket(zmq.PUB)
    s.bind("tcp://127.0.0.1:7880")
    for upd in read_updates(0, size=(9999, 9999)):
        if upd:
            # take the first frame from the update dict
            u = next(iter(upd.values()))[0]
            # frame 0 is the topic, frame 1 is the encoded tensor list
            s.send_multipart([b"topic", encode_buffer([u])])

    # Subscriber script: receive and display camera data
    import zmq
    from displayarray import display
    from tensorcom.tenbin import decode_buffer

    ctx = zmq.Context()
    s = ctx.socket(zmq.SUB)
    s.setsockopt(zmq.SUBSCRIBE, b"topic")
    s.connect("tcp://127.0.0.1:7880")

    d = display()
    while True:
        r = s.recv_multipart()
        # r[0] == b"topic"
        arr = decode_buffer(r[1])
        d.update(arr[0], '0')

Are there any plans for native support for topics? It's usually one of the more important things when working with pub-sub models.

Regarding multiple augmentation processes server and client

Taking serve-imagenet-shards as an example, I implemented a server for my own WebDataset. There are a few points I would like to highlight:

  • Using multiple workers in the dataloader and providing an address range such as zpub://0.0.0.0:788{0..4} results in a "daemonic processes are not allowed to have children" error. To circumvent this, I used from concurrent.futures import ProcessPoolExecutor as Pool instead of multiprocessing.Pool. This worked fine, though there is a risk of zombie processes on exit of the main script if the grandchild processes are still running.
  • On the client side, I am able to get the data from the address range. However, I am trying to run a multiprocess-based webdataset.WebLoader as below:
    import tensorcom
    import webdataset as wds


    def identity(x):
        """Return the argument."""
        return x


    dataset = wds.Processor(tensorcom.Connection("zsub://<server_name>:788{0..4}", converters="torch"), identity)
    dataloader = wds.WebLoader(dataset, num_workers=1, batch_size=None)
    # MyCollate is my own collation class (not shown here)
    collate_fn = MyCollate()
    dataloader = dataloader.unbatched().shuffle(1000).batched(batchsize=64, collation_fn=collate_fn, partial=False)

The above code hangs and is unable to get any data if I use num_workers > 0.

Is there a way to do this?
