Coder Social home page Coder Social logo

sutkarsh / cottoncandy Goto Github PK

View Code? Open in Web Editor NEW

This project forked from gallantlab/cottoncandy

0.0 3.0 0.0 1.57 MB

sugar for s3

Home Page: http://gallantlab.github.io/cottoncandy/

License: BSD 2-Clause "Simplified" License

Python 99.77% TeX 0.23%

cottoncandy's Introduction

Welcome to cottoncandy!

Build Status DOI

sugar for s3

https://gallantlab.github.io/cottoncandy

What is cottoncandy?

A python scientific library for storing and accessing numpy array data on S3. This is achieved by reading arrays from memory and downloading arrays directly into memory. This means that you don't have to download your array to disk, and then load it from disk into your python session.

This library relies heavily on boto3

Installation

Clone the repo from GitHub and do the usual python install from the command line

$ git clone https://github.com/gallantlab/cottoncandy.git
$ cd cottoncandy
$ sudo python setup.py install

A configuration file will be saved under ~/.config/cottoncandy/options.cfg. Upon installation cottoncandy will try to find your AWS keys and store them in this file. See the default file for more configuration options.

Object and bucket permissions are set to authenticated-read by default. If you wish to keep all your objects private, modify the configuration file and set default_acl = private. See AWS ACL overview for more information on S3 permissions.

Getting started

Setup the connection (endpoint, access and secret keys can be specified in the configuration file instead)::

>>> import cottoncandy as cc
>>> cci = cc.get_interface('my_bucket',
                           ACCESS_KEY='FAKEACCESSKEYTEXT',
                           SECRET_KEY='FAKESECRETKEYTEXT',
                           endpoint_url='https://s3.amazonaws.com')

Storing numpy arrays

>>> import numpy as np
>>> arr = np.random.randn(100)
>>> s3_response = cci.upload_raw_array('myarray', arr)
>>> arr_down = cci.download_raw_array('myarray')
>>> assert np.allclose(arr, arr_down)

Storing dask arrays

>>> arr = np.random.randn(100,600,1000)
>>> s3_response = cci.upload_dask_array('test_dim', arr, axis=-1)
>>> dask_object = cci.download_dask_array('test_dim')
>>> dask_object
dask.array<array, shape=(100, 600, 1000), dtype=float64, chunksize=(100, 600, 100)>
>>> dask_slice = dask_object[..., :200]
>>> dask_slice
dask.array<getitem..., shape=(100, 600, 1000), dtype=float64, chunksize=(100, 600, 100)>
>>> downloaded_data = np.asarray(dask_slice) # this downloads the array
>>> downloaded_data.shape
(100, 600, 200)

Command-line search

>>> cci.glob('/path/to/*/file01*.grp/image_data')
['/path/to/my/file01a.grp/image_data',
 '/path/to/my/file01b.grp/image_data',
 '/path/to/your/file01a.grp/image_data',
 '/path/to/your/file01b.grp/image_data']
>>> cci.glob('/path/to/my/file02*.grp/*')
['/path/to/my/file02a.grp/image_data',
 '/path/to/my/file02a.grp/text_data',
 '/path/to/my/file02b.grp/image_data',
 '/path/to/my/file02b.grp/text_data']

File system-like object browsing

>>> import cottoncandy as cc
>>> browser = cc.get_browser('my_bucket_name',
                             ACCESS_KEY='FAKEACCESSKEYTEXT',
                             SECRET_KEY='FAKESECRETKEYTEXT',
                             endpoint_url='https://s3.amazonaws.com')
>>> browser.sweet_project.sub<TAB>
browser.sweet_project.sub01_awesome_analysis_DOT_grp
browser.sweet_project.sub02_awesome_analysis_DOT_grp
>>> browser.sweet_project.sub01_awesome_analysis_DOT_grp
<cottoncandy-group <bucket:my_bucket_name> (sub01_awesome_analysis.grp: 3 keys)>
>>> browser.sweet_project.sub01_awesome_analysis_DOT_grp.result_model01
<cottoncandy-dataset <bucket:my_bucket_name [1.00MB:shape=(10000)]>

Google Drive backend (experimental)

cottoncandy can also use Google Drive as a back-end. This equires a client_secrets.json file in your ~/.config/cottoncandy folder and the pydrive package.

See the Google Drive setup instructions for more details.

>>> import cottoncandy as cc
>>> cci = cc.get_interface(backend='gdrive')

Encryption (highly experimental)

cottoncandyprovides a transparent encryption interface for AWS S3 and Google Drive. This requires the pycrypto package. This is HIGHLY EXPERIMENTAL.

>>> import cottoncandy as cc
>>> cci = cc.get_encrypted_interface('my_bucket_name',
                                      ACCESS_KEY='FAKEACCESSKEYTEXT',
                                      SECRET_KEY='FAKESECRETKEYTEXT',
                                      endpoint_url='https://s3.amazonaws.com')                               

Cite as

Nunez-Elizalde AO, Zhang T, Huth AG, Gao JS, Slivkoff, Lescroart MD, Deniz F, McNeil C, Gibboni R, Popham SF, Rokem A, Oliver MD and Gallant JL. cottoncandy: scientific python package for easy cloud storage. Zenodo. 2017. http://doi.org/10.5281/zenodo.1034342

cottoncandy's People

Contributors

alexhuth avatar anwarnunez avatar arokem avatar candytaco avatar cmcneil avatar fatmai avatar marklescroart avatar r-b-g-b avatar spopham avatar sslivkoff avatar sutkarsh avatar the-moliver avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.