
Zarrita

Here be dragons. Zarrita is a minimal, exploratory implementation of the Zarr version 3.0 core protocol. This is a technical spike only, not for production use.

This README contains a doctest suite to verify basic functionality. Both local and remote file systems are supported via fsspec.

Ensure blank slate:

>>> import shutil
>>> shutil.rmtree('test.zr3', ignore_errors=True)
 

Create a hierarchy

Create a new hierarchy stored on the local file system:

>>> import zarrita
>>> h = zarrita.create_hierarchy('test.zr3')
>>> h  # doctest: +ELLIPSIS
<Hierarchy at file://.../test.zr3>
>>> from sh import tree, cat
>>> tree('test.zr3', '-n', '--noreport')  # doctest: +NORMALIZE_WHITESPACE
test.zr3
└── zarr.json
>>> cat('test.zr3/zarr.json')
{
    "zarr_format": "https://purl.org/zarr/spec/protocol/core/3.0",
    "metadata_encoding": "application/json",
    "extensions": []
}

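The entry point document shown above is plain JSON, so it can be inspected without zarrita. The following is a minimal sketch that assumes only the three keys printed above; the helper name `check_entry_point` is hypothetical and not part of zarrita, and zarrita's own validation may differ.

```python
import json

# Text of zarr.json exactly as printed in the listing above.
ENTRY_POINT = """
{
    "zarr_format": "https://purl.org/zarr/spec/protocol/core/3.0",
    "metadata_encoding": "application/json",
    "extensions": []
}
"""


def check_entry_point(text):
    """Parse entry point metadata and return it if it resembles the document above.

    Hypothetical helper for illustration only.
    """
    meta = json.loads(text)
    if not meta["zarr_format"].endswith("/protocol/core/3.0"):
        raise ValueError("unsupported zarr_format: %r" % meta["zarr_format"])
    if meta["metadata_encoding"] != "application/json":
        raise ValueError("unsupported metadata_encoding")
    return meta


meta = check_entry_point(ENTRY_POINT)
print(meta["extensions"])  # []
```
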
Open a hierarchy

Access a previously created hierarchy:

>>> h = zarrita.get_hierarchy('test.zr3')
>>> h  # doctest: +ELLIPSIS
<Hierarchy at file://.../test.zr3>

Create an array

>>> from numcodecs import GZip
>>> compressor = GZip(level=1)
>>> attrs = {'question': 'life', 'answer': 42}
>>> a = h.create_array('/arthur/dent', shape=(5, 10), dtype='i4',
...                    chunk_shape=(2, 5), compressor=compressor, attrs=attrs)
>>> a
<Array /arthur/dent>
>>> a.path
'/arthur/dent'
>>> a.name
'dent'
>>> a.ndim
2
>>> a.shape
(5, 10)
>>> a.dtype
dtype('int32')
>>> a.chunk_shape
(2, 5)
>>> a.compressor
GZip(level=1)
>>> a.attrs
{'question': 'life', 'answer': 42}
>>> tree('test.zr3', '-n', '--noreport')  # doctest: +NORMALIZE_WHITESPACE
test.zr3
├── meta
│   └── root
│       └── arthur
│           └── dent.array
└── zarr.json
>>> cat('test.zr3/meta/root/arthur/dent.array')
{
    "shape": [
        5,
        10
    ],
    "data_type": "<i4",
    "chunk_grid": {
        "type": "regular",
        "chunk_shape": [
            2,
            5
        ]
    },
    "chunk_memory_layout": "C",
    "compressor": {
        "codec": "https://purl.org/zarr/spec/codec/gzip/1.0",
        "configuration": {
            "level": 1
        }
    },
    "fill_value": null,
    "extensions": [],
    "attributes": {
        "question": "life",
        "answer": 42
    }
}

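The array metadata above names a little-endian 32-bit integer type (`<i4`), C (row-major) chunk layout, and gzip at level 1. As a rough illustration of what those three settings imply for a single 2 x 5 chunk, here is a standard-library sketch. The function names and sample values are hypothetical, and the result is not claimed to be byte-for-byte identical to the files zarrita writes.

```python
import gzip
import struct

# One 2 x 5 chunk as nested rows (sample values chosen for illustration).
chunk = [[0, 1, 2, 3, 4],
         [10, 11, 12, 13, 14]]


def encode_chunk(rows, level=1):
    """Flatten rows in C (row-major) order, pack as little-endian int32, then gzip."""
    flat = [value for row in rows for value in row]
    raw = struct.pack("<%di" % len(flat), *flat)
    return gzip.compress(raw, compresslevel=level)


def decode_chunk(data, chunk_shape):
    """Inverse of encode_chunk: gunzip, unpack, and rebuild the rows."""
    n_rows, n_cols = chunk_shape
    flat = struct.unpack("<%di" % (n_rows * n_cols), gzip.decompress(data))
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


encoded = encode_chunk(chunk)
decoded = decode_chunk(encoded, (2, 5))
print(decoded == chunk)  # True
```
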
Create a group

>>> attrs = {'heart': 'gold', 'improbability': 'infinite'}
>>> g = h.create_group('/tricia/mcmillan', attrs=attrs)
>>> g
<Group /tricia/mcmillan>
>>> g.path
'/tricia/mcmillan'
>>> g.name
'mcmillan'
>>> g.attrs
{'heart': 'gold', 'improbability': 'infinite'}
>>> tree('test.zr3', '-n', '--noreport')  # doctest: +NORMALIZE_WHITESPACE
test.zr3
├── meta
│   └── root
│       ├── arthur
│       │   └── dent.array
│       └── tricia
│           └── mcmillan.group
└── zarr.json
>>> cat('test.zr3/meta/root/tricia/mcmillan.group')
{
    "extensions": [],
    "attributes": {
        "heart": "gold",
        "improbability": "infinite"
    }
}

Create nodes via groups

>>> h.root.create_group('marvin')
<Group /marvin>
>>> h.root['marvin'].create_group('paranoid')
<Group /marvin/paranoid>
>>> h.root['marvin'].create_array('android', shape=(42, 42), dtype=bool, chunk_shape=(2, 2))
<Array /marvin/android>
>>> tree('test.zr3', '-n', '--noreport')  # doctest: +NORMALIZE_WHITESPACE
test.zr3
├── meta
│   └── root
│       ├── arthur
│       │   └── dent.array
│       ├── marvin
│       │   ├── android.array
│       │   └── paranoid.group
│       ├── marvin.group
│       └── tricia
│           └── mcmillan.group
└── zarr.json

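The listings so far suggest a simple rule for where a node's metadata document is stored: `meta/root`, then the node path, then a `.array` or `.group` suffix. A small sketch of that rule follows. The function `meta_key` is hypothetical, and the root node is deliberately left out because the listings above do not show a key for it.

```python
def meta_key(path, node_type):
    """Return the metadata key for a non-root node, as seen in the tree listings.

    node_type is 'array' or 'group'. Hypothetical helper, not zarrita's API.
    """
    if node_type not in ("array", "group"):
        raise ValueError("node_type must be 'array' or 'group'")
    if not path.startswith("/") or path == "/":
        raise ValueError("expected an absolute, non-root path")
    return "meta/root" + path + "." + node_type


print(meta_key("/arthur/dent", "array"))      # meta/root/arthur/dent.array
print(meta_key("/tricia/mcmillan", "group"))  # meta/root/tricia/mcmillan.group
print(meta_key("/marvin", "group"))           # meta/root/marvin.group
```
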
Access an array

>>> a = h['/arthur/dent']
>>> a
<Array /arthur/dent>
>>> a.shape
(5, 10)
>>> a.dtype
dtype('int32')
>>> a.chunk_shape
(2, 5)
>>> a.compressor
GZip(level=1)
>>> a.attrs
{'question': 'life', 'answer': 42}

Access an explicit group

>>> g = h['/tricia/mcmillan']
>>> g
<Group /tricia/mcmillan>
>>> g.attrs
{'heart': 'gold', 'improbability': 'infinite'}

Access implicit groups

>>> h['/']
<Group / (implied)>
>>> h['/arthur']
<Group /arthur (implied)>
>>> h['/tricia']
<Group /tricia (implied)>

Access nodes via groups

>>> root = h['/']
>>> root
<Group / (implied)>
>>> arthur = root['arthur']
>>> arthur
<Group /arthur (implied)>
>>> arthur['dent']
<Array /arthur/dent>
>>> tricia = root['tricia']
>>> tricia
<Group /tricia (implied)>
>>> tricia['mcmillan']
<Group /tricia/mcmillan>

List group children

Explore the hierarchy top-down:

>>> h.list_children('/')  # doctest: +NORMALIZE_WHITESPACE
[{'name': 'marvin', 'type': 'explicit_group'}, 
 {'name': 'arthur', 'type': 'implicit_group'}, 
 {'name': 'tricia', 'type': 'implicit_group'}]
>>> h.list_children('/tricia')
[{'name': 'mcmillan', 'type': 'explicit_group'}]
>>> h.list_children('/tricia/mcmillan')
[]
>>> h.list_children('/arthur')
[{'name': 'dent', 'type': 'array'}]

Alternative way to explore the hierarchy:

>>> h.root
<Group / (implied)>
>>> h.root.list_children()  # doctest: +NORMALIZE_WHITESPACE
[{'name': 'marvin', 'type': 'explicit_group'}, 
 {'name': 'arthur', 'type': 'implicit_group'}, 
 {'name': 'tricia', 'type': 'implicit_group'}]
>>> h.root['tricia'].list_children()
[{'name': 'mcmillan', 'type': 'explicit_group'}]
>>> h.root['tricia']['mcmillan'].list_children()
[]
>>> h.root['arthur'].list_children()
[{'name': 'dent', 'type': 'array'}]

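The distinction between explicit and implicit groups can be derived from the metadata keys alone: a group is explicit when it has its own `.group` document, and implicit when it only appears as a prefix of deeper keys. The sketch below reproduces the classifications listed above from the keys seen in the tree listings. It is an illustration under that assumption, not zarrita's implementation, and it returns children sorted by name, which differs from the order zarrita prints.

```python
# Metadata keys taken from the tree listing of test.zr3.
META_KEYS = [
    "meta/root/arthur/dent.array",
    "meta/root/marvin/android.array",
    "meta/root/marvin/paranoid.group",
    "meta/root/marvin.group",
    "meta/root/tricia/mcmillan.group",
]


def classify_children(meta_keys, path):
    """Classify the direct children of `path` using only the metadata keys."""
    prefix = "meta/root" + ("" if path == "/" else path) + "/"
    children = {}
    for key in meta_keys:
        if not key.startswith(prefix):
            continue
        first, sep, _ = key[len(prefix):].partition("/")
        if sep:
            # The key lies deeper in the tree, so `first` is at least an
            # implicit group; an explicit entry, if any, takes precedence.
            children.setdefault(first, "implicit_group")
            continue
        name, _, suffix = first.rpartition(".")
        if suffix == "array":
            children[name] = "array"
        elif suffix == "group":
            children[name] = "explicit_group"
    return [{"name": n, "type": t} for n, t in sorted(children.items())]


print(classify_children(META_KEYS, "/arthur"))
# [{'name': 'dent', 'type': 'array'}]
```
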
Check existence of nodes in a hierarchy

>>> '/' in h
True
>>> '/arthur' in h
True
>>> '/arthur/dent' in h
True
>>> '/zaphod' in h
False
>>> '/zaphod/beeblebrox' in h
False
>>> '/tricia' in h
True
>>> '/tricia/mcmillan' in h
True

Check existence of children in a group

>>> 'arthur' in h.root
True
>>> 'tricia' in h.root
True
>>> 'zaphod' in h.root
False
>>> g = h.root['arthur']
>>> 'dent' in g
True
>>> g = h.root['tricia']
>>> 'mcmillan' in g
True
>>> 'beeblebrox' in g
False

Read and write array data

>>> import numpy as np
>>> a = h['/arthur/dent']
>>> a
<Array /arthur/dent>
>>> tree('test.zr3', '-n', '--noreport')  # doctest: +NORMALIZE_WHITESPACE
test.zr3
├── meta
│   └── root
│       ├── arthur
│       │   └── dent.array
│       ├── marvin
│       │   ├── android.array
│       │   └── paranoid.group
│       ├── marvin.group
│       └── tricia
│           └── mcmillan.group
└── zarr.json
>>> a[:, :]
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)
>>> a[...]
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)
>>> a[:]
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)
>>> a[0, :] = 42
>>> a[:]
array([[42, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0]], dtype=int32)
>>> tree('test.zr3', '-n', '--noreport')  # doctest: +NORMALIZE_WHITESPACE
test.zr3
├── data
│   └── arthur
│       └── dent
│           ├── 0.0
│           └── 0.1
├── meta
│   └── root
│       ├── arthur
│       │   └── dent.array
│       ├── marvin
│       │   ├── android.array
│       │   └── paranoid.group
│       ├── marvin.group
│       └── tricia
│           └── mcmillan.group
└── zarr.json
>>> a[:, 0] = 42
>>> a[:]
array([[42, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [42,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [42,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [42,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [42,  0,  0,  0,  0,  0,  0,  0,  0,  0]], dtype=int32)
>>> tree('test.zr3', '-n', '--noreport')  # doctest: +NORMALIZE_WHITESPACE
test.zr3
├── data
│   └── arthur
│       └── dent
│           ├── 0.0
│           ├── 0.1
│           ├── 1.0
│           └── 2.0
├── meta
│   └── root
│       ├── arthur
│       │   └── dent.array
│       ├── marvin
│       │   ├── android.array
│       │   └── paranoid.group
│       ├── marvin.group
│       └── tricia
│           └── mcmillan.group
└── zarr.json
>>> a[:] = 42
>>> a[:]
array([[42, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [42, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [42, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [42, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [42, 42, 42, 42, 42, 42, 42, 42, 42, 42]], dtype=int32)
>>> tree('test.zr3', '-n', '--noreport')  # doctest: +NORMALIZE_WHITESPACE
test.zr3
├── data
│   └── arthur
│       └── dent
│           ├── 0.0
│           ├── 0.1
│           ├── 1.0
│           ├── 1.1
│           ├── 2.0
│           └── 2.1
├── meta
│   └── root
│       ├── arthur
│       │   └── dent.array
│       ├── marvin
│       │   ├── android.array
│       │   └── paranoid.group
│       ├── marvin.group
│       └── tricia
│           └── mcmillan.group
└── zarr.json
>>> a[0, :] = np.arange(10)
>>> a[:]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [42, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [42, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [42, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [42, 42, 42, 42, 42, 42, 42, 42, 42, 42]], dtype=int32)
>>> a[:, 0] = np.arange(0, 50, 10)
>>> a[:]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [20, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [30, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [40, 42, 42, 42, 42, 42, 42, 42, 42, 42]], dtype=int32)
>>> a[:] = np.arange(50).reshape(5, 10)
>>> a[:]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]], dtype=int32)
>>> a[:, 0]
array([ 0, 10, 20, 30, 40], dtype=int32)
>>> a[:, 1]
array([ 1, 11, 21, 31, 41], dtype=int32)
>>> a[0, :]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
>>> a[1, :]
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19], dtype=int32)
>>> a[:, 0:7]
array([[ 0,  1,  2,  3,  4,  5,  6],
       [10, 11, 12, 13, 14, 15, 16],
       [20, 21, 22, 23, 24, 25, 26],
       [30, 31, 32, 33, 34, 35, 36],
       [40, 41, 42, 43, 44, 45, 46]], dtype=int32)
>>> a[0:3, :]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]], dtype=int32)
>>> a[0:3, 0:7]
array([[ 0,  1,  2,  3,  4,  5,  6],
       [10, 11, 12, 13, 14, 15, 16],
       [20, 21, 22, 23, 24, 25, 26]], dtype=int32)
>>> a[1:4, 2:7]
array([[12, 13, 14, 15, 16],
       [22, 23, 24, 25, 26],
       [32, 33, 34, 35, 36]], dtype=int32)

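The data listings above show chunk files appearing only once a write touches them: `a[0, :] = 42` creates `0.0` and `0.1`, and `a[:, 0] = 42` adds `1.0` and `2.0`. The sketch below shows the arithmetic that maps a selection onto a regular chunk grid. The function `chunk_keys` is hypothetical, and it takes explicit `(start, stop)` pairs rather than slices to keep the example short.

```python
from itertools import product


def chunk_keys(selection, chunk_shape):
    """List the chunk keys overlapped by a selection on a regular chunk grid.

    `selection` holds one (start, stop) pair per dimension, with start < stop.
    Keys follow the 'i.j' pattern seen in the data listings.
    """
    ranges = []
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size        # chunk containing the first selected index
        last = (stop - 1) // size    # chunk containing the last selected index
        ranges.append(range(first, last + 1))
    return [".".join(str(i) for i in index) for index in product(*ranges)]


# Array shape (5, 10) with chunk shape (2, 5), as in the example above.
print(chunk_keys(((0, 1), (0, 10)), (2, 5)))  # a[0, :] -> ['0.0', '0.1']
print(chunk_keys(((0, 5), (0, 1)), (2, 5)))   # a[:, 0] -> ['0.0', '1.0', '2.0']
print(chunk_keys(((0, 5), (0, 10)), (2, 5)))  # a[:] -> all six chunks
```

Because the last row of a (5, 10) array falls in a third, partially filled row of chunks, a full write produces 3 x 2 = 6 chunk files, matching the final listing.
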
Invalid node names

Creating a group at any of the following paths is expected to raise ValueError:

>>> bad_paths = '', '//', '/foo//bar', ' ', '.', '/../', '/foo/./bar', '/foo/../bar', '/ ', 'Καλημέρα'
>>> for p in bad_paths:
...     try:
...         h.create_group(p)
...     except ValueError:
...         pass

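One way to reject all of the paths above is to require an absolute path whose segments are non-empty, are not `.` or `..`, and use a restricted character set. The sketch below makes exactly those assumptions. The allowed characters in `NAME_RE` and the function `is_valid_path` are hypothetical, and the rule zarrita actually enforces may be stricter or looser.

```python
import re

# Assumed character set for a single node name; zarrita's rule may differ.
NAME_RE = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_path(path):
    """Return True if `path` is an absolute path made of acceptable node names."""
    if not path.startswith("/"):
        return False
    if path == "/":
        return True  # the root path; '/' in h is True in the examples above
    for name in path[1:].split("/"):
        if name in ("", ".", "..") or not NAME_RE.fullmatch(name):
            return False
    return True


bad_paths = ('', '//', '/foo//bar', ' ', '.', '/../', '/foo/./bar',
             '/foo/../bar', '/ ', 'Καλημέρα')
print(any(is_valid_path(p) for p in bad_paths))  # False
print(is_valid_path('/arthur/dent'))             # True
```
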
Use cloud storage

Read data previously copied to Google Cloud Storage (GCS):

>>> h = zarrita.get_hierarchy('gs://zarr-demo/v3/test.zr3', token='anon')
>>> h
<Hierarchy at gs://zarr-demo/v3/test.zr3>
>>> h.list_children('/')  # doctest: +NORMALIZE_WHITESPACE
[{'name': 'marvin', 'type': 'explicit_group'}, 
 {'name': 'arthur', 'type': 'implicit_group'},
 {'name': 'tricia', 'type': 'implicit_group'}]
>>> h.list_children('/arthur')
[{'name': 'dent', 'type': 'array'}]
>>> h.list_children('/tricia')
[{'name': 'mcmillan', 'type': 'explicit_group'}]
>>> h.list_children('/tricia/mcmillan')
[]
>>> h['/']
<Group / (implied)>
>>> h['/tricia']
<Group /tricia (implied)>
>>> g = h['/tricia/mcmillan']
>>> g
<Group /tricia/mcmillan>
>>> g.attrs
{'heart': 'gold', 'improbability': 'infinite'}
>>> a = h['/arthur/dent']
>>> a
<Array /arthur/dent>
>>> a.shape
(5, 10)
>>> a.dtype
dtype('int32')
>>> a.attrs
{'question': 'life', 'answer': 42}
>>> a[:]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]], dtype=int32)
