vlas-sokolov / multicube
Tools for processing multiple guesses and velocity components in spectral cubes.
License: MIT License
The best_guess method is the bottleneck of the best-guess estimation. To allow efficient processing of moderate dimensionalities (read: realistic data cubes with several line-of-sight (LoS) components), the method needs to be refactored!
Problem summary: at every pixel, we need to find the synthetic spectrum with the best residuals among a large number of candidates. Why is this challenging? Assuming 1000 channels and 8-byte floats, if we are looking for a 4x4x4 grid for every LoS component, then with three components there are 4**9 = 262144 models, and just storing the grid of spectral models takes (np.prod([4, 4, 4]*3) * 1000 * 8) / (2**30) ~ 2 GB of RAM!
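The memory estimate above can be reproduced in a few lines. This is only the arithmetic from the issue written out; the variable names are illustrative, and note that `[4, 4, 4] * 3` is Python list repetition (nine grid axes of size 4), not elementwise multiplication.

```python
import numpy as np

n_chan = 1000                    # spectral channels per model
bytes_per_value = 8              # float64
grid_per_component = [4, 4, 4]   # grid points per parameter of one LoS component
n_components = 3

# List repetition gives nine axes of length 4, so 4**9 models in total.
n_models = int(np.prod(grid_per_component * n_components))
grid_bytes = n_models * n_chan * bytes_per_value

print(n_models, grid_bytes / 2**30)  # 262144 1.953125
```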
Current approach: try to broadcast as large an array as possible, then look for the best residuals on a single CPU. In a world with infinite RAM, best_guess would grab a slice of the memory pie the size of (Nmodels, Nchan, X, Y); for a modest map size of 300x200 and the example above, that's more than 100 TB of memory.
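A quick check of the full-broadcast figure, plus how many pixels would fit in a given memory budget if the models were broadcast against a chunk of pixels at once. The 4 GiB budget is a hypothetical number chosen for illustration; the other values come from the example above.

```python
n_models = 4 ** 9        # 262144 models, as in the example above
n_chan = 1000
ny, nx = 300, 200        # map size from the issue
bytes_per_value = 8      # float64

# Size of a fully broadcast (Nmodels, Nchan, X, Y) array.
full_bytes = n_models * n_chan * ny * nx * bytes_per_value
print(full_bytes / 2**40)  # about 114.4 TiB

# Hypothetical budget: how many pixels fit if each pixel needs its own
# (Nmodels, Nchan) slab of residuals?
budget_bytes = 4 * 2**30
bytes_per_pixel = n_models * n_chan * bytes_per_value
pixels_per_chunk = budget_bytes // bytes_per_pixel
print(pixels_per_chunk)  # 2
```

With a realistic budget only one or two pixels fit at a time, which is why the method ends up iterating pixel by pixel.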
What tends to happen? Needless to say, in most applications the method only has enough memory to iterate over individual pixels, and, since it was designed to bite off more than it can chew, it runs on one thread only.
What should be done? The memory slice holding the synthetic models should be shared with other jobs that iterate over individual pixels, and the best-matching model results should be collected back into the main thread. Additionally (and this is simpler), some parts of the current residual calculation can be sped up: dropping the numpy nan-methods and replacing the residual rms calculation with the sum of squares should already give a significant speedup.
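A minimal sketch of the proposed refactoring, under several assumptions: the models sit in one (Nmodels, Nchan) array, the cube is (Nchan, ny, nx) with NaNs already removed (which is what allows dropping the nan-methods), and the names `best_models` / `best_models_chunk` are hypothetical, not multicube's API. The sum of squared residuals is expanded as |m|^2 - 2 m·s + |s|^2; the |s|^2 term is the same for every model at a given pixel, so each chunk of pixels costs a single matrix product. Instead of separate processes, this swaps in a thread pool (`concurrent.futures.ThreadPoolExecutor`): threads see the same models array without copying it, and NumPy's BLAS-backed matrix product typically releases the GIL. A process-based version would need something like `multiprocessing.shared_memory` to avoid duplicating the ~2 GB grid.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def best_models_chunk(models, model_norms, spectra):
    """Best model index for each spectrum (column) in a chunk of pixels.

    models: (n_models, n_chan); model_norms: (n_models,) = sum(models**2, axis=1)
    spectra: (n_chan, n_pix), assumed NaN-free.
    """
    # scores[k, p] = |m_k|^2 - 2 m_k . s_p equals sum((m_k - s_p)**2) - |s_p|^2;
    # |s_p|^2 does not depend on k, so the argmin over k picks the best model.
    scores = model_norms[:, None] - 2.0 * (models @ spectra)
    return np.argmin(scores, axis=0)


def best_models(models, cube, chunk_size=1024, n_workers=4):
    """Best model index at every pixel of a (n_chan, ny, nx) cube."""
    n_chan, ny, nx = cube.shape
    spectra = cube.reshape(n_chan, ny * nx)  # one column per pixel
    model_norms = np.einsum("ij,ij->i", models, models)

    def run(start):
        # All threads read the same `models` array; nothing is copied per job.
        return best_models_chunk(models, model_norms,
                                 spectra[:, start:start + chunk_size])

    # pool.map returns results in submission order, so chunks line up.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        best = np.concatenate(list(pool.map(run, range(0, ny * nx, chunk_size))))
    return best.reshape(ny, nx)


# Usage on synthetic data: each pixel is a known model plus a little noise.
rng = np.random.default_rng(0)
models = rng.normal(size=(50, 100))
truth = rng.integers(0, 50, size=(6, 7))
cube = models[truth].transpose(2, 0, 1) + 0.01 * rng.normal(size=(100, 6, 7))
best = best_models(models, cube, chunk_size=5)
```

Each chunk allocates an (Nmodels, chunk_size) score array, so `chunk_size` should be chosen so that Nmodels × chunk_size × 8 bytes, times the number of workers, fits in memory.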
(... an alternative possibility, one that scales up into cluster computing, is to look into how the big data folks handle this kind of processing)
Dear @keflavich,
Thanks very much for your many efforts on this program; it is very useful to me. I am using it to fit a huge amount of data (~1 GB). However, after running your code for some time, the warning message below appears, and then I find the program consuming so much CPU that none is left for other work. Do you have any idea what causes this problem? Thanks a lot in advance.
Best regards,
@hongliliu
PS: warning message: WARNING: Selected model is best only for less than %5 of the cube, consider using the map of guesses. [multicube.subcube]
INFO: Overall best model: selected #600 [ 1.91 7.09 0.27] [multicube.subcube]
INFO: Best model @ highest SNR: #442 [ 7.15 5.64 0.62] [multicube.subcube]