Comments (23)
Thanks for the detailed report. I have tried to reproduce this (also with a large dataset, 5.6G), but the two methods yield the same result for me. Could it be that you ran out of memory?
from python-casacore.
@tammojan thanks for the quick answer. The dataset is bigger (190 GB for DATA+CORRECTED_DATA); it is basically an uncompressed single LOFAR LBA subband (64 channels, 1 s bins, 7 hours of integration). The system has 256 GB of RAM, so enough to hold the array.
To be sure it wasn't connected to a type problem, I've tested replacing the raw data values, which have levels between 10^6 and 10^8, by ones. Same behaviour...
This also means that the putcol method is not affected.
Can you reproduce it with a smaller array (say half of your data), to make sure that it's not a memory problem?
On a smaller MS
I've cut the MS to 42% of its original size and now I get a perfect match! But does that mean this is a memory problem? I'm not completely convinced.
On the original MS
While loading the DATA array, the memory consumption peaks at 82 GB (out of 256 GB).
Original MS
d=t.getcol("DATA")
This is the memory consumption as a function of time (green is allocated to the process, gray is cache). It seems to have a plateau.
I think that numpy requires contiguous memory, so even if you have enough memory available, it may not be contiguous. I agree that giving zeroes in case of (possibly) memory problems isn't desirable. Some kind of other error would be more acceptable.
Until this is solved, you have at least convinced yourself that slicing from disk is safer for arrays larger than, say, 50 GB.
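To judge whether a full-column read is realistic, it can help to estimate how many bytes the resulting array needs in one contiguous block. A minimal sketch, assuming the column holds complex64 visibilities (8 bytes per value); the function name and the example dimensions are made up for illustration and are not taken from the dataset discussed here:

```python
def column_nbytes(nrows, nchan, ncorr, itemsize=8):
    """Bytes needed to hold a (nrows, nchan, ncorr) array in memory.

    itemsize=8 assumes complex64 values (two 4-byte floats).
    """
    return nrows * nchan * ncorr * itemsize


# Hypothetical dimensions, for illustration only
nbytes = column_nbytes(nrows=1_000_000, nchan=64, ncorr=4)
print(f"{nbytes / 1e9:.3f} GB")
```

Comparing such an estimate with the observed memory plateau gives a rough idea of how much of the column actually made it into memory.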
On the smaller MS, where no misbehaviour is observed
no memory deallocation is seen at the end...
Somehow part of the allocated array gets deallocated. See the difference between the two previous plots at the end of the read. How can that be?
Perhaps the deallocation happens at the end of the program, which may be negligible in the 'small' example?
As far as I can see, the dip after the bump does happen in the first plot (at about 2 min) and does not happen in the second (at about 1 min).
... and the size of the dip that does happen in the first case corresponds to the amount of zeroed (i.e. missing) data in that case...
Agreed. I'll have to ask @gervandiepen to comment on that. Thanks for your detailed report, and I hope you have enough input to at least continue your work (using the slicer from disk).
@tammojan Thanks for looking at that issue. It's actually quite a blocker for me. I'd need to modify the architecture of my software to use data chunks compatible with what casacore can and cannot do, which is a bit unclear right now. I'd prefer a bug fix of course, if possible.
Any news from @gervandiepen ?
Hi Cyril,
I've taken a look, but cannot pinpoint the problem.
In the coming weeks I have little time because I have to finish SDP documents
and have a week of leave.
I'll take a look again after March 5th.
Ger
On Mon, Feb 15, 2016 at 3:09 PM, cyriltasse wrote:
> Any news from @gervandiepen ?
Hey Ger,
Thanks very much, I think you're the only person in the observable universe that can have an idea on that weird thing. CountedPointer or so? Can you reproduce? If not let me know,
Cyril
Hi @cyriltasse ,
Perhaps, as a workaround, you could try something like:
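import numpy as np
# Allocate the full array; one row is read to learn the per-row shape and dtype
firstrow = t.getcol("DATA", 0, 1)
fulldata = np.empty((t.nrows(),) + firstrow.shape[1:], dtype=firstrow.dtype)
# Fill the array in chunks
chunksize = 100000  # must be an int: range() rejects a float such as 1e5
for current_row in range(0, t.nrows(), chunksize):
    # The last chunk may be shorter than chunksize
    nrow = min(chunksize, t.nrows() - current_row)
    fulldata[current_row:current_row + nrow] = t.getcol("DATA", current_row, nrow)
# Take the same strided selection as before, now from the in-memory array
incr = 2179
dsel_FromArray = fulldata[1::incr, 10, 0]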
This shouldn't affect how you work with your data too much, should it?
I'm not sure this works, because the array fulldata will again be very large, maybe too large. But if it is, this piece of code could help show whether the error is on the numpy side or on the casacore side.
@tammojan yep something like that should probably be all right. I'll try when that gets critical again. Although probably not very efficient in terms of I/O... But thanks for thinking of it!
@cyriltasse I updated the code a bit; even in these 7 lines I made a mistake. You can make the I/O more efficient by setting chunksize very large (at the extreme, as large as the whole MS, but then I'm pretty sure you'll see zeros again).
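To make the effect of chunksize concrete, the row ranges visited by a chunked read can be sketched with a small helper. The name chunk_bounds is hypothetical and not part of python-casacore; it only does the bookkeeping and clips the last chunk to the rows that remain:

```python
def chunk_bounds(nrows, chunksize):
    """Yield (startrow, nrow) pairs that cover rows 0..nrows-1 exactly once."""
    chunksize = int(chunksize)  # range() needs an integer step
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, nrows, chunksize):
        yield start, min(chunksize, nrows - start)


# Ten rows read four at a time: the last chunk is shorter
print(list(chunk_bounds(10, 4)))  # prints [(0, 4), (4, 4), (8, 2)]
```

A larger chunksize means fewer calls and fewer, larger reads; a chunksize equal to the number of rows degenerates to a single full-column read.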
Hi @cyriltasse ,
You're probably looking for the following function (thanks @gervandiepen ):
def getcolnp (self, columnname, nparray, startrow=0, nrow=-1, rowincr=1):
    """Get the contents of a column or part of it into the given numpy array.

    The numpy array has to be C-contiguous with a shape matching the
    shape of the column (part). Data type coercion will be done as needed.

    If the column contains arrays, they should all have the same shape.
    An exception is thrown if they differ in shape. In that case the
    method :func:`getvarcol` should be used instead.

    The column can be sliced by giving a start row (default 0), number of
    rows (default all), and row stride (default 1).

    """
    if not nparray.flags.c_contiguous or nparray.size == 0:
        raise ValueError("Argument 'nparray' has to be a contiguous numpy array")
    return self._getcolvh (columnname, startrow, nrow, rowincr, nparray)
Oh nice, I'll have a look!
The docs on readthedocs weren't updated properly, so I moved them to GitHub instead. The documentation for getcolnp is now at http://casacore.github.io/python-casacore/casacore_tables.html#casacore.tables.table.getcolnp (in case you had troubles with the markdown...)
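As a rough illustration of how getcolnp might be combined with a preallocated array, here is a sketch that fills the array chunk by chunk. The helper name fill_in_chunks is hypothetical, the code has not been run against a real MeasurementSet, and it assumes that getcolnp writes in place into a C-contiguous slice of a larger array; whether this avoids the zeros reported in this issue is not known.

```python
def fill_in_chunks(table, columnname, out, chunksize):
    """Fill the preallocated array `out` from a column, chunk by chunk.

    `table` is expected to provide nrows() and getcolnp() with the
    signature quoted above; `out` must be C-contiguous with nrows()
    entries along its first axis.
    """
    nrows = table.nrows()
    for start in range(0, nrows, chunksize):
        nrow = min(chunksize, nrows - start)
        # A slice along the first axis of a C-contiguous array is itself
        # C-contiguous, so it can be handed to getcolnp as the target.
        table.getcolnp(columnname, out[start:start + nrow], start, nrow)
    return out
```

With python-casacore this could be called as fill_in_chunks(t, "DATA", fulldata, 100000), where fulldata is a numpy array created beforehand with the column's full shape and dtype. Passing the whole array to a single t.getcolnp("DATA", fulldata) call is the simpler variant.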