tammojan commented on August 19, 2024

Thanks for the detailed report. I have tried to reproduce this (also with a large dataset, 5.6G), but the two methods yield the same result for me. Could it be that you ran out of memory?

from python-casacore.

cyriltasse commented on August 19, 2024

@tammojan thanks for the quick answer. The dataset is bigger (190 GB for DATA+CORRECTED_DATA): basically an uncompressed single LOFAR LBA subband (64 channels, 1 s bins, 7 hours of integration). The system has 256 GB of RAM, so enough to hold the array.

cyriltasse commented on August 19, 2024

To be sure it wasn't a type problem, I tested replacing the raw data values (which have levels between 10^6 and 10^8) with ones. Same behaviour...

This also means that the putcol method is not affected.

tammojan commented on August 19, 2024

Can you reproduce it with a smaller array (say half of your data), to make sure that it's not a memory problem?

cyriltasse commented on August 19, 2024

On a smaller MS

I've cut the MS to 42% of its original size and now I get a perfect match! But does that mean this is a memory problem? I'm not completely convinced.

[Figure: figure_1]

On the original MS

While loading the DATA array, memory consumption peaks at 82 GB (out of 256 GB).

cyriltasse commented on August 19, 2024

Original MS

d=t.getcol("DATA")

This is the memory consumption as a function of time (green is allocated to the process, gray is cache). It seems to have a plateau.

[Figure: memory consumption as a function of time, original MS]

tammojan commented on August 19, 2024

I think that numpy requires contiguous memory, so even if you have enough memory available, it may not be contiguous. I agree that giving zeroes in case of (possibly) memory problems isn't desirable. Some kind of other error would be more acceptable.

Until this is solved, you have at least convinced yourself that slicing from disk is safer for arrays larger than, say, 50 GB.
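
For reference, here is a minimal sketch of what "slicing from disk" can look like: the stride is passed to getcol, so only the selected rows are ever loaded. It assumes the positional order getcol(columnname, startrow, nrow, rowincr) with nrow=-1 meaning "all rows". FakeTable and strided_sample_from_disk are hypothetical names used only so the example runs without a Measurement Set; with a real table, pass t directly.

```python
import numpy as np


def strided_sample_from_disk(t, column, start, incr, chan, corr):
    """Read every `incr`-th row from the table, then keep one channel and correlation."""
    # Positional arguments: (columnname, startrow, nrow, rowincr); nrow=-1 means "all".
    rows = t.getcol(column, start, -1, incr)
    return rows[:, chan, corr]


class FakeTable:
    """Hypothetical in-memory stand-in for a casacore table (demo only)."""

    def __init__(self, data):
        self._data = data

    def getcol(self, columnname, startrow=0, nrow=-1, rowincr=1):
        # Assumed semantics: nrow counts the rows returned, -1 means "to the end".
        stop = None if nrow < 0 else startrow + nrow * rowincr
        return self._data[startrow:stop:rowincr].copy()


# Small synthetic column of shape (rows, channels, correlations).
data = np.arange(20 * 3 * 2, dtype=np.complex64).reshape(20, 3, 2)
t = FakeTable(data)

from_disk = strided_sample_from_disk(t, "DATA", 1, 7, 2, 0)
from_array = t.getcol("DATA")[1::7, 2, 0]  # the "load everything, then slice" route
print(np.array_equal(from_disk, from_array))
```

On a healthy read the two routes agree; the problem reported in this thread is that, for the very large MS, the second route returns zeros for part of the data.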

cyriltasse commented on August 19, 2024

On the smaller MS, where no misbehaviour is observed

no memory deallocation is seen at the end...

[Figure: memory consumption as a function of time, smaller MS]

cyriltasse commented on August 19, 2024

Somehow part of the allocated array gets deallocated. See the difference between the two previous plots at the end of the read. How can that be?

tammojan commented on August 19, 2024

Perhaps the deallocation happens at the end of the program, which may be negligible in the 'small' example?

cyriltasse commented on August 19, 2024

By eye, the dip after the bump (at about 2 min in the first plot and 1 min in the second) does happen in the first case and does not happen in the second.

cyriltasse commented on August 19, 2024

... and the size of the dip in the first case corresponds to the amount of zeroed (that is, missing) data in that case...

tammojan commented on August 19, 2024

Agreed. I'll have to ask @gervandiepen to comment on that. Thanks for your detailed report, and I hope you have enough input to at least continue your work (using the slicer from disk).

cyriltasse commented on August 19, 2024

@tammojan Thanks for looking at that issue. It's actually quite blocking for me. I'd need to modify the architecture of my software to use data chunks compatible with what casacore can or cannot do, which is a bit unclear right now. I'd prefer a bug fix of course, if possible.

cyriltasse commented on August 19, 2024

Any news from @gervandiepen ?

gervandiepen commented on August 19, 2024

Hi Cyril,

I've taken a look, but cannot pinpoint the problem.
The coming weeks I have little time because I have to finish SDP documents
and have a week leave.

I'll take a look again after March 5th.
Ger

On Mon, Feb 15, 2016 at 3:09 PM, cyriltasse wrote:

Any news from @gervandiepen ?

cyriltasse commented on August 19, 2024

Hey Ger,

Thanks very much, I think you're the only person in the observable universe that can have an idea on that weird thing. CountedPointer or so? Can you reproduce? If not let me know,

Cyril

tammojan commented on August 19, 2024

Hi @cyriltasse ,

Perhaps, as a workaround, you could try something like:

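    import numpy as np

    # Read one row to learn the per-row shape and dtype, then allocate the full array
    first_row = t.getcol("DATA", 0, 1)
    fulldata = np.empty((t.nrows(),) + first_row.shape[1:], dtype=first_row.dtype)

    # Fill the array in chunks
    chunksize = 100000  # must be an int: range() and slicing reject floats such as 1e5
    for current_row in range(0, t.nrows(), chunksize):
        # The last chunk may be shorter than chunksize
        nrow = min(chunksize, t.nrows() - current_row)
        fulldata[current_row:current_row + nrow] = t.getcol("DATA", current_row, nrow)

    incr = 2179
    dsel_FromArray = fulldata[1::incr, 10, 0]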

This shouldn't affect how you work with your data too much, should it?

I'm not sure this works, because the array fulldata will again be very large, maybe too large. But if it fails, this piece of code could help show whether the error is on the numpy side or on the casacore side.
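
One way to make that comparison concrete is to locate the first element where the sample taken from the in-memory array differs from the same sample read directly from disk, and map it back to a row number. The sketch below uses synthetic data in place of the real samples; first_mismatch is a hypothetical helper, not part of numpy or casacore, and it assumes both samples have the same shape and contain no NaNs (NaN never compares equal to itself).

```python
import numpy as np


def first_mismatch(from_array, from_disk):
    """Return the index of the first differing element, or None if the samples agree."""
    differing = np.flatnonzero(np.asarray(from_array) != np.asarray(from_disk))
    return int(differing[0]) if differing.size else None


# Synthetic example: the in-memory sample has lost its tail to zeros.
from_disk = np.arange(1, 11, dtype=np.complex64)
from_array = from_disk.copy()
from_array[6:] = 0

start, incr = 1, 2179  # same start row and stride as in the workaround above
idx = first_mismatch(from_array, from_disk)
first_bad_row = None if idx is None else start + idx * incr
print(idx, first_bad_row)
```

If the first bad row stays the same whatever the chunk size, that would point at the large allocation rather than at the individual reads, though that reading is only a guess.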

cyriltasse commented on August 19, 2024

@tammojan yep something like that should probably be all right. I'll try when that gets critical again. Although probably not very efficient in terms of I/O... But thanks for thinking of it!

tammojan commented on August 19, 2024

@cyriltasse I updated the code a bit, even in these 7 lines I made a mistake.. You can make the I/O more efficient by setting chunksize very large (at the extreme to something as large as the whole MS, but then I'm pretty sure you'll see zeros again).

tammojan commented on August 19, 2024

Hi @cyriltasse ,

You're probably looking for the following function (thanks @gervandiepen ):

    def getcolnp (self, columnname, nparray, startrow=0, nrow=-1, rowincr=1):
        """Get the contents of a column or part of it into the given numpy array.
        The numpy array has to be C-contiguous with a shape matching the
        shape of the column (part). Data type coercion will be done as needed.
        If the column contains arrays, they should all have the same shape.
        An exception is thrown if they differ in shape. In that case the
        method :func:`getvarcol` should be used instead.
        The column can be sliced by giving a start row (default 0), number of
        rows (default all), and row stride (default 1).
        """
        if not nparray.flags.c_contiguous  or  nparray.size == 0:
            raise ValueError("Argument 'nparray' has to be a contiguous numpy array")
        return self._getcolvh (columnname, startrow, nrow, rowincr, nparray)
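
A possible way to combine getcolnp with the chunked read suggested earlier is sketched below: allocate the output once with numpy, then let each call fill one contiguous block of rows in place, avoiding a temporary array per chunk. This is a sketch under assumptions, not a tested recipe for a real MS: read_column_in_chunks and FakeTable are hypothetical names, and FakeTable only imitates the getcol/getcolnp signatures quoted in this thread so that the example runs on its own.

```python
import numpy as np


def read_column_in_chunks(t, column, chunksize=100000):
    """Fill a preallocated array chunk by chunk using getcolnp."""
    first = t.getcol(column, 0, 1)  # one row, to learn the per-row shape and dtype
    out = np.empty((t.nrows(),) + first.shape[1:], dtype=first.dtype)
    for start in range(0, t.nrows(), chunksize):
        nrow = min(chunksize, t.nrows() - start)
        # A slice along the first axis of a C-contiguous array is itself
        # C-contiguous, so it passes the check in getcolnp.
        t.getcolnp(column, out[start:start + nrow], start, nrow)
    return out


class FakeTable:
    """Hypothetical in-memory stand-in for a casacore table (demo only)."""

    def __init__(self, data):
        self._data = data

    def nrows(self):
        return self._data.shape[0]

    def getcol(self, columnname, startrow=0, nrow=-1, rowincr=1):
        stop = None if nrow < 0 else startrow + nrow * rowincr
        return self._data[startrow:stop:rowincr].copy()

    def getcolnp(self, columnname, nparray, startrow=0, nrow=-1, rowincr=1):
        if not nparray.flags.c_contiguous or nparray.size == 0:
            raise ValueError("Argument 'nparray' has to be a contiguous numpy array")
        nparray[...] = self.getcol(columnname, startrow, nrow, rowincr)


data = np.arange(10 * 3 * 2, dtype=np.complex64).reshape(10, 3, 2)
result = read_column_in_chunks(FakeTable(data), "DATA", chunksize=4)
print(np.array_equal(result, data))
```

The chunk size trades I/O efficiency against the size of each individual read; the value above is only a placeholder.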

cyriltasse commented on August 19, 2024

Oh nice, I'll have a look!

tammojan commented on August 19, 2024

The docs on readthedocs weren't updated properly, so I moved them to github instead. The documentation for getcolnp is now at
http://casacore.github.io/python-casacore/casacore_tables.html#casacore.tables.table.getcolnp
(in case you had trouble with the markdown...)
