blz's People

Contributors

aterrel, esc, francescalted, grahamc, mrocklin, mwiebe, sdiehl, teoliphant

blz's Issues

Iterators in BLZ should be in their own class

Right now, the barray and btable objects in BLZ implement the iterator in the same class as the container, and this can create problems in different situations:

  1. len() cannot be shared between the iterator and the underlying object (e.g. nd.array(b.where(a<5)) uses len(b) to fill the object).

  2. two iterators cannot be run simultaneously (e.g. zip(b.where(a1 && a<6)))

Making the iterator an independent object will solve these issues.
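
To illustrate the second problem without depending on BLZ itself, here is a minimal pure-Python sketch. The class names (SharedIterContainer, ContainerIterator, SeparateIterContainer) are hypothetical and are not part of the BLZ API; they only contrast a container that acts as its own iterator with one that hands out an independent iterator object.

```python
class SharedIterContainer(object):
    """Container that is its own iterator (the problematic design)."""

    def __init__(self, values):
        self.values = list(values)
        self._pos = 0

    def __iter__(self):
        # Every new iteration resets the single cursor shared by all users.
        self._pos = 0
        return self

    def __next__(self):
        if self._pos >= len(self.values):
            raise StopIteration
        item = self.values[self._pos]
        self._pos += 1
        return item

    next = __next__  # Python 2 spelling


class ContainerIterator(object):
    """Independent iterator that keeps its own cursor."""

    def __init__(self, values):
        self._values = values
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._values):
            raise StopIteration
        item = self._values[self._pos]
        self._pos += 1
        return item

    next = __next__  # Python 2 spelling


class SeparateIterContainer(object):
    """Container that returns a fresh iterator on every iter() call."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        # len() now belongs to the container only, not to the iterator.
        return len(self.values)

    def __iter__(self):
        return ContainerIterator(self.values)


shared = SharedIterContainer([1, 2, 3])
separate = SeparateIterContainer([1, 2, 3])

# Both zip arguments share one cursor, so the elements get interleaved.
shared_pairs = list(zip(shared, shared))
# Each zip argument has its own cursor, so the pairs line up.
separate_pairs = list(zip(separate, separate))
```

With the shared cursor, the two zip arguments steal elements from each other; with a separate iterator object, each consumer advances independently and the container keeps its own len().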

Error copying compressed blz to a lz4 compressed container

import blz
from scipy import misc

# The test image can be downloaded from http://i.imgur.com/3afoJWq.jpg
img = misc.imread('3afoJWq.jpg')

blz_img = blz.barray(img)

# This error only happens with lz4 and clevel greater than 0
blz_img.copy(bparams=blz.bparams(clevel=1, shuffle=True, cname='lz4'),
             expectedlen=blz_img.size)

``repr`` of ``chunk`` class is broken

In [11]: a = blz.arange(1e5)

In [12]: c = a.chunks[0]

In [13]: repr(c)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-084143abbd63> in <module>()
----> 1 repr(c)

/home/esc/gw/blz/blz/blz_ext.so in blz.blz_ext.chunk.__repr__ (blz/blz_ext.c:6294)()

AttributeError: 'blz.blz_ext.chunk' object has no attribute 'shape'

I had a look at the code and I could probably fix it and submit a PR, but I don't understand the intention. What should shape be?

  def __repr__(self):
    """Represent the chunk as an string, with additional info."""
    cratio = self.nbytes / float(self.cbytes)
    fullrepr = "chunk(%s, %s)  nbytes: %d; cbytes: %d; ratio: %.2f\n%r" % \
        (self.shape, self.dtype, self.nbytes, self.cbytes, cratio, str(self))
    return fullrepr
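
One possible answer, offered only as a sketch: if a chunk is assumed to be a flat, one-dimensional block of items, its shape could be derived from nbytes and the item size. The FakeChunk class below is a hypothetical stand-in, not the real blz_ext.chunk, and the itemsize attribute is an assumption made for illustration.

```python
class FakeChunk(object):
    """Hypothetical stand-in for blz_ext.chunk; not the real class."""

    def __init__(self, dtype, itemsize, nbytes, cbytes):
        self.dtype = dtype        # dtype name, kept as a plain string here
        self.itemsize = itemsize  # bytes per item (assumed attribute)
        self.nbytes = nbytes      # uncompressed size in bytes
        self.cbytes = cbytes      # compressed size in bytes

    def __repr__(self):
        """Represent the chunk as a string, with additional info."""
        cratio = self.nbytes / float(self.cbytes)
        # Assumption: a chunk is flat, so its shape is (number of items,).
        # Fall back to that whenever no explicit shape attribute exists.
        shape = getattr(self, "shape", (self.nbytes // self.itemsize,))
        return "chunk(%s, %s)  nbytes: %d; cbytes: %d; ratio: %.2f" % (
            shape, self.dtype, self.nbytes, self.cbytes, cratio)


chunk_repr = repr(FakeChunk("float64", 8, 32768, 1024))
```

Whether the real chunk class should expose a shape at all, or simply drop it from the repr, is a design decision for the maintainers.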

trim() could return a new blz

trim() currently trims the blz object in place instead of returning a reference to a new one. I think it would be better if it returned a reference.

btable.delcol doesn't work on disk

import numpy as np
import blz

N = 100*1000
ct = blz.fromiter(((i,i*i) for i in xrange(N)),
       rootdir='error.blz',
       dtype="i4,f8",
       count=N)

new_col = np.linspace(0, 1, 100*1000)

ct.addcol(new_col, 'Name', dtype='i4')

ct.delcol('Name')
ct.flush()

ct.addcol(new_col, 'Name', dtype='i4')

Output:

RuntimeError: specified rootdir path 'error.blz/Name' already exists and creation mode is 'a'

A manual workaround is to simply delete that folder.
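
The manual workaround can be scripted. The helper below is a sketch using only the standard library; remove_stale_column_dir is a hypothetical name, not a BLZ function, and it assumes the column directory is rootdir/<column name>, as the error message above suggests. The demo runs against a throwaway temporary directory rather than a real btable.

```python
import os
import shutil
import tempfile


def remove_stale_column_dir(rootdir, name):
    """Delete rootdir/name if it is still on disk; return True if removed."""
    coldir = os.path.join(rootdir, name)
    if os.path.isdir(coldir):
        shutil.rmtree(coldir)
        return True
    return False


# Demo on a throwaway directory that mimics a leftover column folder.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "Name", "data"))
removed = remove_stale_column_dir(demo_root, "Name")
still_there = os.path.exists(os.path.join(demo_root, "Name"))
shutil.rmtree(demo_root)
```

In the example above, this would be called as remove_stale_column_dir('error.blz', 'Name') after delcol() and before the second addcol().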

Cython required for running tests

Cython should only be a build-time dependency. It makes no sense to import Cython when running the tests and display its version number, because this might not even be the version that was used to build the package.

Can't create large multidimensional array in BLZ

[Adapted from https://github.com/Blosc/bcolz/issues/25]

With Numpy I can do something like this:

foo = np.zeros([ 2 ] * 20)

And get an ndarray with the corresponding shape. I can then:

ac = blz.barray(foo)

To get a barray object. Great. But I'm playing around with blaze.blz because I want to use array sizes that are larger than could otherwise fit in memory, and [2] * 20 is an easy shape for Numpy to handle, so for me it's a baseline of sorts.

Looking to explore the capabilities of blaze.blz, I try to create the object directly, without the intermediate Numpy step:

ac = blz.zeros([2] * 20)

But I get an error:

In [11]: blz.zeros([2]*20)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-32ca191bdff9> in <module>()
----> 1 blz.zeros([2]*20)

/home/faltet/software/blz/blz/bfuncs.pyc in zeros(shape, dtype, **kwargs)
    239     """
    240     dtype = np.dtype(dtype)
--> 241     return fill(shape=shape, dflt=np.zeros((), dtype), dtype=dtype, **kwargs)
    242 
    243 

/home/faltet/software/blz/blz/bfuncs.pyc in fill(shape, dflt, dtype, **kwargs)
    203     # Then fill it
    204     # We need an array for the defaults so as to keep the atom info
--> 205     dflt = np.array(obj.dflt, dtype=dtype)
    206     # Making strides=(0,) below is a trick to create the array fast and
    207     # without memory consumption

ValueError: number of dimensions must be within [0, 32]

Which leads me to wonder: is something like this possible using blz?

Can't load btable data from disk

I created a large persistent btable in IPython:

 table = blz.btable(np.empty(0, dtype="u1,"*400), bparams=blz.bparams(clevel=9), expectedlen=6000000, rootdir="test_btable")

Then I appended 6 000 000 rows:

for _ in xrange(6000000):
    table.append(np.random.randint(0, 2, 400).tolist())

This gave the expected result: a btable of 6 000 000 rows and 400 cols. I then closed the IPython session and opened a new one to test the persistence. I attempted to reopen this btable using:

table = blz.open("test_btable")

This loads a btable with the correct dtype, although the table is empty and the clevel is set to 5 instead of 9.

btable((0,), [('f0', 'u1'), ... dtype is ok so I cut this ..., ('f399', 'u1')])
  nbytes: 0; cbytes: 25.00 MB; ratio: 0.00
  bparams := bparams(clevel=5, shuffle=True, cname=blosclz)
  rootdir := 'test_btable/'
[]

I know that the data was saved on disk, since the test_btable directory contains 400 fx folders and each of these folders contains a few data chunks.

I then attempted to load a barray directly:

array = blz.open("test_btable/f0/")

which resulted in:

barray((0,), uint8)
  nbytes: 0; cbytes: 64.00 KB; ratio: 0.00
  bparams := bparams(clevel=9, shuffle=True, cname=blosclz)
  rootdir := 'test_btable/f0'
[]

Here is the metadata for barray f0:

{"shape": [0], "nbytes": 0, "cbytes": 65536}
{"dtype": "uint8", "bparams": {"shuffle": true, "clevel": 9}, "chunklen": 65536, "dflt": 0, "expectedlen": 6000000}

And here is a partial ls -lah of the test_btable/f0/data directory:

total 6552
drwxr-xr-x  93 Alexandre  staff   3.1K Mar 19 05:11 ./
drwxr-xr-x   5 Alexandre  staff   170B Mar 19 00:21 ../
-rw-r--r--   1 Alexandre  staff    32K Mar 19 00:25 __0.blp
-rw-r--r--   1 Alexandre  staff    32K Mar 19 00:30 __1.blp
...
-rw-r--r--   1 Alexandre  staff    32K Mar 19 00:58 __9.blp
-rw-r--r--   1 Alexandre  staff    32K Mar 19 05:11 __90.blp

I am using blz 0.6.2-dev, OSX 10.9.2 and Python 2.7.5.

Best,
Alex

Cannot build against Numpy 1.10

When trying to build blz against Numpy 1.10, I get (on all platforms):

ERROR:: You need numpy 1.7 or greater to run blz!

I'm guessing there is a test for the Numpy version based on the first 3 characters of the version string only.
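
If that guess is right, the failure is easy to reproduce in plain Python. The sketch below does not show the actual blz check; prefix_check models the suspected behaviour, and tuple_check is one possible fix that compares numeric components. All three function names are hypothetical.

```python
import re


def version_tuple(version):
    """Turn '1.10.0' or '1.10.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def prefix_check(version, minimum="1.7"):
    """Suspected buggy check: compares the first 3 characters as strings."""
    return version[:3] >= minimum


def tuple_check(version, minimum="1.7"):
    """Numeric check: compares version components as integers."""
    return version_tuple(version) >= version_tuple(minimum)


buggy_result = prefix_check("1.10.0")   # '1.1' >= '1.7' is False
fixed_result = tuple_check("1.10.0")    # (1, 10, 0) >= (1, 7) is True
```

The string comparison rejects 1.10 because '1.1' sorts before '1.7', while the tuple comparison treats 10 as greater than 7.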

Reshape fails for large arrays

We are trying to use blz to store and reshape very large UInt16 arrays (5000x3e9).

The machine has 60GB of RAM if that is of importance.

Here is an example of code breaking:

In [1]: import blz
In [2]: size = int(1e6)
In [3]: b = blz.zeros(size).reshape((size/2, 2))
In [4]: b
Out[4]: 
barray((500000, 2), float64)
  nbytes: 7.63 MB; cbytes: 88.78 KB; ratio: 88.00
  bparams := bparams(clevel=5, shuffle=True, cname=blosclz)
[[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 ..., 
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]

In [5]: size = int(5e9)
In [6]: b = blz.zeros(size).reshape((size/2, 2))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-0214aa15e7ec> in <module>()
----> 1 b = blz.zeros(size).reshape((size/2, 2))

/data_ebs/anaconda/lib/python2.7/site-packages/blz/blz_ext.so in blz.blz_ext.barray.reshape (blz/blz_ext.c:18541)()

ValueError: total size of new array must be unchanged
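
One hypothesis, not confirmed by the traceback, is that the element count is computed in a 32-bit integer somewhere in the extension, so the product of the new shape wraps around and no longer matches the real size. The sketch below only demonstrates that wrap-around with ctypes; it does not reproduce the blz code path.

```python
import ctypes

# New shape from the failing example: (size / 2, 2) with size = 5e9.
rows, cols = 2500000000, 2

# Python integers are arbitrary precision, so this is the true size.
true_size = rows * cols

# Storing the same value in a signed 32-bit integer silently wraps around
# (ctypes integer types perform no overflow checking).
wrapped_size = ctypes.c_int32(true_size).value

# The working example (size = 1e6) fits comfortably in 32 bits.
small_fits = int(1e6) <= 2 ** 31 - 1
large_fits = true_size <= 2 ** 31 - 1
```

If the hypothesis holds, a size comparison done on the wrapped value would fail exactly as shown, even though the requested shape is consistent.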

test does not work on Python 2.6

Even though Python 2.6 is claimed to be supported, I get (on 64-bit Linux):

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
BLZ version:       0.6.0
NumPy version:     1.7.1
Blosc version:     1.3.1 ($Date:: 2014-01-15 #$)
Blosc compressors: ['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib']
Numexpr version:   2.2.2
Cython version:    0.19.2
Python version:    2.6.9 |Continuum Analytics, Inc.| (unknown, Oct 30 2013, 10:17:14) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-54)]
Platform:          linux2-x86_64
Byte-ordering:     little
Detected cores:    1
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Performing only a light (yet comprehensive) subset of the test suite.
If you want a more complete test, try passing the --heavy flag to this
script (or set the 'heavy' parameter in case you are using blz.test()
call).  The whole suite will take more than 30 seconds to complete on a
relatively modern CPU and around 300 MB of RAM and 500 MB of disk
[32-bit platforms will always run significantly more lightly].

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Traceback (most recent call last):
  File "/home/ilan/aroot/test-tmp_dir/run_test.py", line 92, in <module>
    blz.test()
  File "/opt/anaconda1anaconda2anaconda3/lib/python2.6/site-packages/blz/tests/test_all.py", line 92, in test
    unittest.TextTestRunner().run(suite())
  File "/opt/anaconda1anaconda2anaconda3/lib/python2.6/site-packages/blz/tests/test_all.py", line 26, in suite
    return unittest.TestLoader().discover(
AttributeError: 'TestLoader' object has no attribute 'discover'
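
TestLoader.discover() was added to unittest in Python 2.7, which is why it is missing here. A possible fallback, sketched below, is to use the unittest2 backport on older interpreters. The discover_suite helper is a hypothetical name, and relying on unittest2 being installed on Python 2.6 is an assumption, not something blz currently does.

```python
import shutil
import tempfile
import unittest


def discover_suite(start_dir, pattern="test*.py"):
    """Discover tests, falling back to unittest2 where discover() is missing."""
    loader = unittest.TestLoader()
    if hasattr(loader, "discover"):
        return loader.discover(start_dir, pattern=pattern)
    # Assumption: the unittest2 backport is installed on Python 2.6.
    import unittest2
    return unittest2.TestLoader().discover(start_dir, pattern=pattern)


# Demo: discovery on an empty throwaway directory finds no tests.
demo_dir = tempfile.mkdtemp()
suite = discover_suite(demo_dir)
n_tests = suite.countTestCases()
shutil.rmtree(demo_dir)
```

The alternative is to drop the Python 2.6 support claim, or to build the suite explicitly from the test modules instead of discovering them.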

BUG: btable.delcol() doesn't update the __rootdirs__ reference of the deleted column

When I try to delete a column from a btable by name, the column reference is not removed from the __rootdirs__ file in the btable directory. If I manually remove the reference, I can load the btable back from disk.

I am on Windows (I also confirmed the same error on Red Hat Linux). Here is some example code:

import blz
import numpy as np
blz.__version__
'0.6.2'

N = 100*1000
ct = blz.fromiter(((i,i*i) for i in xrange(N)), dtype="i4,f8", count=N,rootdir='test')
new_col = np.linspace(0, 1, 100*1000)
ct.addcol(new_col)
ct.delcol('f2')
ct.flush()

# Trying to open the btable from rootdir produces the error
a = blz.open(rootdir='test')

---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-6-2c1171fba903> in <module>()
----> 1 a=blz.open(rootdir='test')

C:\Users\RKJ837\AppData\Local\Continuum\Anaconda\lib\site-packages\blz\bfuncs.pyc in open(rootdir, mode)
     51     # distinguish between btable and barray
     52     if os.path.exists(os.path.join(rootdir, '__rootdirs__')):
---> 53         obj = btable(rootdir=rootdir, mode=mode)
     54     else:
     55         obj = barray(rootdir=rootdir, mode=mode)

C:\Users\RKJ837\AppData\Local\Continuum\Anaconda\lib\site-packages\blz\btable.pyc in __init__(self, columns, names, **kwargs)
    216             _new = True
    217         else:
--> 218             self.open_btable()
    219             _new = False
    220 

C:\Users\RKJ837\AppData\Local\Continuum\Anaconda\lib\site-packages\blz\btable.pyc in open_btable(self)
    311 
    312         # Open the btable by reading the metadata
--> 313         self.cols.read_meta_and_open()
    314 
    315         # Get the length out of the first column

C:\Users\RKJ837\AppData\Local\Continuum\Anaconda\lib\site-packages\blz\btable.pyc in read_meta_and_open(self)
     55             dir_ = os.path.basename(dir_)
     56             dir_ = os.path.join(self.rootdir, dir_)
---> 57             self._cols[str(name)] = barray(rootdir=dir_, mode=self.mode)
     58 
     59     def update_meta(self):

C:\Users\RKJ837\AppData\Local\Continuum\Anaconda\lib\site-packages\blz\blz_ext.pyd in blz.blz_ext.barray.__cinit__ (blz\blz_ext.c:11804)()

C:\Users\RKJ837\AppData\Local\Continuum\Anaconda\lib\site-packages\blz\blz_ext.pyd in blz.blz_ext.barray.read_meta (blz\blz_ext.c:15483)()

IOError: [Errno 2] No such file or directory: u'test\\f2\\meta\\sizes'
