Contributors: gheald, sarrvesh

cuffs's Issues

Helper script should allow 2D input FITS files

The helper script that creates 3D cubes does not accept 2D input FITS files (i.e. NAXIS=2). Several subtle parts of the script need to be adjusted to make this possible. I have created a modified version of the script; shall I upload it as a separate branch, Sarrvesh?

Allow the user to compile in float/double mode

The user should be able to compile the RM synthesis code in either float or double mode. Gaming GPUs slow down noticeably on double-precision math operations, while scientific GPUs handle double data types well, so the precision should be selectable at compile time.
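A minimal sketch of a compile-time precision switch, assuming a hypothetical `USE_DOUBLE` preprocessor flag (for example `-DUSE_DOUBLE` on the compiler command line) and a hypothetical `real_t` type used for all pixel buffers. cuFFS may organise this differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical build switch: -DUSE_DOUBLE selects double precision,
 * otherwise single precision is used (the faster choice on gaming GPUs). */
#ifdef USE_DOUBLE
typedef double real_t;
#define PRECISION_NAME "double"
#else
typedef float real_t;
#define PRECISION_NAME "float"
#endif

/* Buffer sizes are expressed in terms of real_t, so switching precision
 * changes host and device allocations consistently. */
static size_t cube_bytes(size_t nx, size_t ny, size_t nphi) {
    return nx * ny * nphi * sizeof(real_t);
}
```

Because every allocation and kernel argument goes through `real_t`, switching modes only requires rebuilding, not editing the code.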

Science test case

For simple Q and U cubes, run RM Synthesis and compare the results with the rmsynthesis code on dop254.

Need better error handling

A number of functions terminate execution by calling exit(). This should be avoided: every function must return control to main(), and main() alone decides whether the program terminates.
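A minimal sketch of the return-code pattern described above. The `status_t` values and the helper names are hypothetical; `run()` stands in for `main()`, which would simply `return run(...)` after freeing any resources.

```c
#include <stdio.h>

/* Hypothetical status codes; the real code may use different names. */
typedef enum {
    STATUS_OK = 0,
    STATUS_BAD_INPUT,
    STATUS_NO_DEVICE_MEMORY
} status_t;

/* Worker functions report failure to their caller instead of calling exit(). */
static status_t check_nphi(int nPhi) {
    if (nPhi <= 0) {
        fprintf(stderr, "ERROR: nPhi must be positive (got %d)\n", nPhi);
        return STATUS_BAD_INPUT;
    }
    return STATUS_OK;
}

static status_t check_memory(size_t needed, size_t available) {
    if (needed > available) {
        fprintf(stderr, "ERROR: Insufficient memory on device! Try reducing nPhi\n");
        return STATUS_NO_DEVICE_MEMORY;
    }
    return STATUS_OK;
}

/* Stand-in for main(): the only place that decides whether to stop. */
static int run(int nPhi, size_t needed, size_t available) {
    status_t s = check_nphi(nPhi);
    if (s != STATUS_OK) return (int)s;
    s = check_memory(needed, available);
    if (s != STATUS_OK) return (int)s;
    return 0;
}
```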

Support multiple GPUs

Multiple (dis)similar GPUs can be connected to a single host. The code should be able to detect multiple GPUs and distribute threads among all suitable devices.
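One simple way to distribute the work is an even split of work units (φ planes or lines of sight) into contiguous ranges, one per device. This is a sketch under that assumption; for dissimilar GPUs a split weighted by device capability may be preferable, and the function name is hypothetical.

```c
/* Split nItems work units into nDev contiguous ranges whose sizes differ by
 * at most one. Device d handles items [start[d], start[d] + count[d]). */
static void split_work(int nItems, int nDev, int *start, int *count) {
    if (nDev <= 0) return;               /* nothing to distribute to */
    int base = nItems / nDev;
    int extra = nItems % nDev;           /* first `extra` devices get one more */
    int offset = 0;
    for (int d = 0; d < nDev; ++d) {
        count[d] = base + (d < extra ? 1 : 0);
        start[d] = offset;
        offset += count[d];
    }
}
```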

Check out of bounds in each thread

To have an equal number of threads in every block, the number of threads launched can exceed the number of φ planes. To avoid out-of-bounds memory access, each thread should check whether its (zero-based) index is greater than or equal to the number of φ planes. If it is, that thread should return immediately without doing any work.
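The guard can be illustrated with a host-side C emulation of a 1-D launch. In a real CUDA kernel the body would start with `int idx = blockIdx.x * blockDim.x + threadIdx.x; if (idx >= nPhi) return;`; here the loop over blocks and threads plays that role, and `touched` is a hypothetical stand-in for the per-plane work.

```c
/* Number of blocks needed so that nBlocks * threadsPerBlock >= nPhi. */
static int blocks_for(int nPhi, int threadsPerBlock) {
    return (nPhi + threadsPerBlock - 1) / threadsPerBlock;
}

/* Emulate every (block, thread) pair of the launch and count the planes
 * actually processed. Surplus threads skip all work. */
static int emulate_launch(int nPhi, int threadsPerBlock, int *touched) {
    int nBlocks = blocks_for(nPhi, threadsPerBlock);
    int processed = 0;
    for (int b = 0; b < nBlocks; ++b) {
        for (int t = 0; t < threadsPerBlock; ++t) {
            int idx = b * threadsPerBlock + t;
            if (idx >= nPhi) continue;   /* kernel equivalent: return */
            touched[idx] += 1;           /* stand-in for computing plane idx */
            ++processed;
        }
    }
    return processed;
}
```

With nPhi = 100 and 32 threads per block, 4 blocks (128 threads) are launched and the last 28 threads do nothing.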

Avoid variable reuse in doRMSynthesis()

The variables size and nElements are reused multiple times in the function doRMSynthesis(). This is poor coding practice and invites hard-to-find bugs. The rest of this function also needs to be cleaned up.

Output cubes contain only zeros (on galaxy)

I've tried using the program on both MWA and ASKAP data, and in both cases the pixel values in the output cubes are all zeros. The program seems to complete successfully, and the cubes are valid FITS files, but they don't contain any useful data. I'm unsure whether this is due to the build I managed to make on galaxy or to the code itself. Have you ever seen something like this on other systems?

Support image masks

The code should be able to read in a FITS mask and run RM synthesis only on the selected pixels.
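A minimal sketch of the selection step, assuming the mask has already been read from FITS into an nx × ny plane (x fastest) and converted to 0/1 values. The resulting index list could then drive which lines of sight are sent to the GPU; the function name is hypothetical.

```c
#include <stddef.h>

/* Collect the linear indices of pixels where the mask is non-zero.
 * `selected` must have room for nx * ny entries. Returns the count. */
static size_t select_pixels(const unsigned char *mask, size_t nx, size_t ny,
                            size_t *selected) {
    size_t n = 0;
    for (size_t i = 0; i < nx * ny; ++i) {
        if (mask[i]) selected[n++] = i;
    }
    return n;
}
```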

Select GPU based on global memory size

While operating in single-device mode, select the device based on its global memory size. At the moment, the code chooses the first device in the list. (See also issue #9.)
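The selection itself is a simple arg-max. In CUDA the sizes would come from `cudaGetDeviceCount()` and `cudaGetDeviceProperties(&prop, i)` via `prop.totalGlobalMem`, followed by `cudaSetDevice(best)`; this sketch takes them as an array so it runs without a GPU, and `pick_device` is a hypothetical name.

```c
#include <stddef.h>

/* Return the index of the device with the largest global memory, or -1 if
 * there are no devices. Ties keep the lower index, so with identical cards
 * the behaviour matches the current "first device" choice. */
static int pick_device(const size_t *globalMem, int nDevices) {
    if (nDevices <= 0) return -1;
    int best = 0;
    for (int i = 1; i < nDevices; ++i) {
        if (globalMem[i] > globalMem[best]) best = i;
    }
    return best;
}
```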

Better documentation

Write better documentation for all functions. Also document the parameter list and return values for all functions.

Optimize thread and block size

In the current version, a single block with nPhi threads is launched by default. This is not necessarily the best option. Ideally, the block size should be chosen based on the number of registers available per multiprocessor (MP). A good understanding of the GPU hardware is needed to solve this problem.

Insufficient memory on device

A larger test case fails with the message:
"ERROR: Insufficient memory on device! Try reducing nPhi"

After a bit of inspection it appears that the global memory might be incorrectly identified. The size of the output Q/U cubes seems to be reported as exactly the same size as the global memory.
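To make such a check easier to debug, the required size can be computed entirely in `size_t` and printed next to the free memory reported by the device. This is a sketch, not a diagnosis of the bug above: `freeMem` is assumed to come from `cudaMemGetInfo(&freeMem, &totalMem)`, and `nCubes` (1 if Q and U are computed separately, 2 if together) is a hypothetical parameter.

```c
#include <stdio.h>
#include <stddef.h>

/* Bytes needed for the output cube(s), computed in size_t so that large
 * nx * ny * nPhi products cannot overflow a 32-bit int. */
static size_t output_bytes(size_t nx, size_t ny, size_t nPhi,
                           size_t nCubes, size_t elemSize) {
    return nCubes * nx * ny * nPhi * elemSize;
}

/* Printing both numbers makes a misidentified memory size easy to spot. */
static int fits_on_device(size_t needed, size_t freeMem) {
    printf("needed %zu bytes, free %zu bytes\n", needed, freeMem);
    return needed <= freeMem;
}
```

For a 2048 × 2048 image with 1000 φ planes, two float cubes need 33,554,432,000 bytes, well beyond what a 32-bit `int` can hold.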

Remove unwanted code

Remove the unwanted printf statements that accompany some fits_report_error calls.

Add to conda-forge

Hey @sarrvesh!

I'm not sure how much time you've got for cuFFS things these days, but it'd be great if we could get cuFFS onto conda-forge for easy installation. Right now installation seems to be the trickiest bit, so having an easy one-line option would be great.

Unfortunately, I have no experience with adding a package to conda-forge, but a quick look at the docs suggests it isn't too much work.

Running on large files

Hi @sarrvesh ! Thank you for making cuFFS.

I tried to run cuFFS on 600 FITS files of 2 GB each, but makeFitsCube.py couldn't stack the rotated cube required by cuFFS, because (I think) the RAM available on the HPC nodes (200 GB) is smaller than the cube size (1 TB).

So I have a few questions:

  1. Is there an efficient way to stack a 1TB rotated cube? (I wrote a few scripts myself, but they would take over a week to generate that cube...)

  2. Can cuFFS run on a 1TB cube with 200GB nodes? If not, my workaround would be making smaller sub-cubes.

Thanks!

Process each LoS separately

In the current version of the code, each input channel is processed separately and each φ axis is assigned to a GPU kernel. Because of the way GPU memory is accessed, this might not be the most efficient approach. An alternative is to process each line of sight separately. It is not clear whether this is faster, but it is worth trying.
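The two strategies differ in how a line of sight is laid out in memory. The sketch below gives the linear index of pixel (x, y), channel k in a channel-major cube (the usual FITS order, x fastest) and in a cube rotated so the frequency axis comes first; both helper names are hypothetical. Which layout is faster on a GPU depends on how threads are mapped to data (coalescing favours adjacent threads reading adjacent addresses), so this needs to be measured.

```c
#include <stddef.h>

/* Channel-major: consecutive x pixels are adjacent; successive channels of
 * one line of sight are nx * ny elements apart. */
static size_t idx_channel_major(size_t x, size_t y, size_t k,
                                size_t nx, size_t ny) {
    return (k * ny + y) * nx + x;
}

/* Frequency-first (rotated): all nChan channels of one pixel are contiguous. */
static size_t idx_los_major(size_t x, size_t y, size_t k,
                            size_t nx, size_t nChan) {
    return (y * nx + x) * nChan + k;
}
```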

Optimize QU computation

At the moment, Q(phi) and U(phi) are computed separately, so the input frequency channels have to be copied to the device twice. If the device memory is large enough to hold both the Q and U output cubes at the same time, the code should compute Q and U in a single GPU call. This should speed up the code somewhat.

Change data type

Treat all pixel values as float. Double precision is not required at least for now.

Reduce memory footprint

Since RM Synthesis works on one input image channel at a time, we can reduce the code's memory footprint by not reading in the entire Q and U cubes into memory.

Sums are not NaN safe

If the input array contains a NaN, the output voxel is filled with NaNs. It would be useful to handle NaNs in the input gracefully.
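A minimal sketch of a NaN-safe weighted sum: a NaN sample contributes neither to the sum nor to the total weight, so one bad channel no longer poisons the voxel. In RM synthesis the same rule would apply to a channel if either Q or U is NaN, with the normalisation recomputed from the remaining weights. The function name is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Weighted mean that skips NaN samples. Returns NAN only if no sample is
 * usable (all NaN, or all remaining weights zero). */
static double nan_safe_weighted_mean(const float *x, const float *w, size_t n) {
    double sum = 0.0, wsum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (isnan(x[i])) continue;   /* flagged sample: treated as weight 0 */
        sum += (double)w[i] * x[i];
        wsum += w[i];
    }
    return wsum > 0.0 ? sum / wsum : NAN;
}
```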

RMSF -- Not wide enough for CLEAN & Doesn't adjust to account for flagging

Currently, the output RMSF spans the same range as the specified Faraday depth range. For CLEAN, the RMSF must span twice the Faraday depth range to be CLEANed.

Also, implementations such as RM-Tools "correctly deal with isolated clumps of NaN-flagged voxels within the data-cube (unlikely in interferometric cubes, but possible in single-dish cubes)". Specifically, if a NaN is detected, the channel value and its weight are both set to zero. The ability to specify a weight per channel would be beneficial. Even better would be an output RMSF per pixel that correctly accounts for flagging that varies from voxel to voxel.

Journal publication

Publish the RM Synthesis (version 1.0 milestone) in Astronomy & Computing.

Reduce the memory footprint of fitsrotate

fitsrotate uses two memory maps to read in the fits cube, rotate, and write out the rotated cube. If we read the input cube channel-by-channel, the code can work with a single memory map.
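A minimal sketch of the channel-by-channel approach, assuming the rotated cube puts the frequency axis first. Each call copies one input plane (already read from the FITS file) into its strided positions in the output buffer, which could be the single memory map; `rotate_channel` is a hypothetical name.

```c
#include <stddef.h>

/* Scatter input channel k (an nx * ny plane, x fastest) into a
 * frequency-first output cube: out[(y * nx + x) * nChan + k]. Only one
 * input plane needs to be in memory at a time. */
static void rotate_channel(const float *plane, size_t nx, size_t ny,
                           size_t k, size_t nChan, float *out) {
    for (size_t y = 0; y < ny; ++y)
        for (size_t x = 0; x < nx; ++x)
            out[(y * nx + x) * nChan + k] = plane[y * nx + x];
}
```

Calling this once per channel, reading each plane just before the call, fills the whole rotated cube.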

Update readme

The current readme is seriously out of date. Update it with:

  • Installation instructions
  • Update dependencies?
  • Structure of the code
  • Add citation instructions: Link to the A&C publication and ascl.
  • Information about work in progress?
  • How to contribute?

Implement fast FITS cube rotation

At the moment, users have to rotate the input cubes using external programs like miriad. cuFFS should have a built-in executable that can rotate and derotate FITS cubes.
