realfastvla / realfast

Real-time interferometric data analysis for the VLA

Home Page: http://realfast.io

License: BSD 3-Clause "New" or "Revised" License

Python 89.03% Jupyter Notebook 10.58% Shell 0.39%
data-analysis cluster radio astronomy transient-astronomy distributed-computing

realfast's People

Contributors

caseyjlaw, demorest, xiggystardust

realfast's Issues

Define realfast servers

Nominal plan that needs to be finalized:

  • 32 servers
    • can host at least 2 GPUs per server.
    • 32 GB
    • dual 12-core CPUs
    • ~$5k
  • 1 head node
    • 512 GB
    • dual 4-core CPUs
    • ~$12k
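
As a rough cross-check, the totals implied by the nominal plan can be tallied. This sketch uses only the figures listed above. It assumes the 32 GB figure is per-server memory, and the source does not state whether the ~$5k server price includes GPUs.

```python
# Figures taken from the nominal plan above; prices are approximate (~).
N_SERVERS = 32
GPUS_PER_SERVER = 2          # "at least 2", so this is a lower bound
SERVER_MEM_GB = 32           # assumption: memory per server
SERVER_COST_USD = 5_000
HEAD_NODE_COST_USD = 12_000

total_gpus = N_SERVERS * GPUS_PER_SERVER
total_worker_mem_gb = N_SERVERS * SERVER_MEM_GB
total_cost_usd = N_SERVERS * SERVER_COST_USD + HEAD_NODE_COST_USD

print(f"GPUs (minimum): {total_gpus}")
print(f"Aggregate worker memory: {total_worker_mem_gb} GB")
print(f"Approximate server plus head node cost: ${total_cost_usd:,}")
```

The minimum GPU count agrees with the 64 GPUs mentioned in the "Define GPUs" issue.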

Open question:

  • how does architecture affect latency, capacity, diversity of modes, etc.?

Define format of candidate visibility data

We will generate visibility snippets to be archived. What metadata and format should they have?

Baseline plan is to write SDMs with sdmpy without dedispersion correction.

Is it adequate to modify the metadata only for the new time range?

systematic plan for mock transients

We have the ability to simulate transients, but do not regularly use it in real-time observing. It would be valuable to implement an end-to-end plan for adding, finding, and tracking mock transients as a regular check of data quality.
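
One way to track mock transients as a data-quality check is to compare the injected arrival times with the detections and report the recovered fraction. The sketch below assumes that matching on arrival time within a tolerance is sufficient; the function name and inputs are hypothetical and not part of realfast.

```python
def recovery_fraction(injected_times, detected_times, tolerance_s):
    """Fraction of injected mock transients with a detection within tolerance_s.

    All names here are illustrative; a real check would likely also match
    on DM and sky position.
    """
    if not injected_times:
        return float("nan")
    recovered = sum(
        any(abs(t_inj - t_det) <= tolerance_s for t_det in detected_times)
        for t_inj in injected_times
    )
    return recovered / len(injected_times)


# Example: three injections, one recovered within 10 ms.
fraction = recovery_fraction([10.0, 20.0, 30.0], [10.002, 29.5], 0.01)
print(f"recovered fraction: {fraction:.2f}")
```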

Design review R6: better schedule

R6: A more granular schedule with definite targets should be developed to track progress. This schedule shall be shared with affected parts of NRAO to avoid priority collisions.

  • Schedule needs to be integrated with other NRAO activities and priorities:
  • Identify periods of conflict for key personnel
  • More granular schedule for early identification of slippage

two scans can collide during queue_monitor cleanup

queue_monitor can parse the wrong mergepkl if two scans finish at the same time. In one case, the two scans were from different SBs, so the scans to archive were not correctly identified.
The offending code is:

for jobid in jobids:
...
if job == finishedjobs[-1]:
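
A possible direction for a fix is to key finished jobs by scheduling block and scan, rather than relying on which job happens to be last in the finished list. The sketch below is not the queue_monitor implementation: the tuple layout, function names, and the expected-count bookkeeping are all assumptions made for illustration.

```python
from collections import defaultdict


def group_finished_jobs(finished):
    """Group finished job ids by (sb_id, scan).

    `finished` is an iterable of (jobid, sb_id, scan) tuples. This layout
    is hypothetical and not the queue_monitor data model.
    """
    by_scan = defaultdict(list)
    for jobid, sb_id, scan in finished:
        by_scan[(sb_id, scan)].append(jobid)
    return dict(by_scan)


def scans_ready_for_cleanup(finished, expected_counts):
    """Return the (sb_id, scan) keys for which every expected job has finished."""
    grouped = group_finished_jobs(finished)
    return sorted(
        key for key, jobs in grouped.items()
        if len(jobs) == expected_counts.get(key)
    )


# Two scans from different SBs finishing together are kept separate.
finished = [(1, "SB_A", 3), (2, "SB_B", 7), (3, "SB_A", 3)]
expected = {("SB_A", 3): 2, ("SB_B", 7): 1}
print(scans_ready_for_cleanup(finished, expected))
```

Because each scan is identified by its own key, two scans that finish at the same moment cannot be confused with one another during cleanup.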

Define GPUs

Need to finalize nominal plan for purchasing 64 GPUs:

  • NVIDIA Titan
  • 6 GB global memory

Does choice of GPU affect/limit how we program for it?

Related question: is numba efficient for GPUs?

last calibrator scan bdf not removed?

During a science run for 17A-396, we noticed a bdf remaining after archiving completed. It was the last scan and a calibrator scan. The SDM downloaded from the NRAO archive included the bdf, so that seems to have worked properly.
It is likely a bug related to bdf cleanup within the realfast code.

Metadata for input SDM

CBE will write SDMs for realfast to process. The metadata in those SDMs is not well defined yet, as they are a new data product.

Design and Implement VLA API

The integration of realfast with the VLA observing system should be a prototype of a general system for third-party systems. As such, a general interface needs to be designed for systems like this.

Design review R5: ICDs for all interfaces

R5: Identify and document (ICD) all external interfaces to the realfast system. Major subsystems within realfast should consider similar documentation.

Definition of data formats:

  • Identification and enumeration of interfaces (e.g. XML documents TD consumes)
  • Clearly identified interfaces in the processing pipeline (e.g. replacing the TD with another algorithm)

OTF testing

Can realfast operate commensally with VLA OTF mode?

Potential issues:

  • phase stepping for VLASS is faster than DM sweep
  • the slew timescale is longer, but still fast!

Can internal portal be based on jupyterhub?

We have a working prototype that uses jupyterhub as a front end. Do we want to redevelop the front end for the internal portal, or further develop notebooks for use with jupyterhub?

periodicity imaging

Develop the algorithm and science behind the concept of periodicity imaging. Open questions:

  • what kinds of pulsars are accessible?
  • is there a new science case in those pulsars?
  • what algorithms are needed to access that science?
  • what tests can we run to prove that case?

If all goes well, write it up!

Purchase prototype hardware

Having "magic disk" and GPU hardware would make testing more effective and informative for the eventual cluster order.

OTF/VLASS commissioning

  • Does 0.45 s phase stepping hurt the realfast transient detection system?
  • Does OTF hurt transient detection?
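
To put the 0.45 s phase-stepping interval in context, it can be compared with the dispersive sweep across the band, using the standard cold-plasma delay, roughly 4.149 ms × DM × (ν_lo⁻² − ν_hi⁻²) with frequencies in GHz and DM in pc cm⁻³. The 2 to 4 GHz band edges and the DM values below are assumptions chosen for illustration.

```python
K_DM_MS = 4.148808  # dispersion constant, ms GHz^2 per (pc cm^-3)
PHASE_STEP_S = 0.45  # phase-stepping interval quoted in this issue


def dm_sweep_seconds(dm, f_lo_ghz, f_hi_ghz):
    """Dispersive delay between the band edges, in seconds."""
    return 1e-3 * K_DM_MS * dm * (f_lo_ghz ** -2 - f_hi_ghz ** -2)


# Assumed band edges (2-4 GHz) and illustrative DM values.
for dm in (100, 500, 1000, 3000):
    sweep = dm_sweep_seconds(dm, 2.0, 4.0)
    relation = "longer" if sweep > PHASE_STEP_S else "shorter"
    print(f"DM {dm:5d}: sweep {sweep:.3f} s ({relation} than one phase step)")
```

Under these assumptions, the sweep exceeds one phase step only for DMs above several hundred pc cm⁻³, so the impact would depend on the DM range searched.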

Prototype rfpipe on CBE

Would be nice to get some rudimentary version of rfpipe (using CPUs via dask distributed scheduler) on CBE for NRAO review.

need to be more resilient to missing bdfs

The end-of-scheduling-block process will trip up if a bdf never arrives. A few potential issues here:

  1. sdm timeout not working
  2. need to be able to know when bdfs will never arrive (correlator fail)
  3. be sure that we can determine the state of the sdm when bdfs are missing for any reason.
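
The timeout behaviour in item 1 could look something like the following minimal sketch, which polls for a file and gives up after a deadline. The function name and polling approach are assumptions and do not reflect the existing realfast code.

```python
import os
import time


def wait_for_bdf(path, timeout_s, poll_s=1.0):
    """Return True once `path` exists, or False if `timeout_s` elapses first.

    Hypothetical helper: a real implementation would also need a signal
    that the correlator failed and the bdf will never arrive.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(poll_s, remaining))
```

Returning False rather than raising lets the caller decide whether to archive the scans that did arrive or to flag the whole scheduling block.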

commensal test

Prototypes and new features coming together:

  • RDMA API
  • new worker (GPUs, nvme)
  • rfpipe
  • new software environment

A test would motivate integration and answer rudimentary questions about performance before making a big purchase.

Design review R4: commissioning plan

R4: Develop a commissioning plan and explicitly identify the tests which will mitigate the risks identified.

There are several areas of risk to VLA operations:

  • Averaging in the CBE rather than Baseline cards
  • Data rates through Correlator to CBE switch.
  • Data rates through Infiniband switches

Many limits will be empirically determined, so a robust process to identify, monitor, and organize results must be defined.

Connect realfast to archive

Need to define path to get output visibility cutouts into archive.

Currently, Paul's sdmpy library can create cutouts. Those products need to be ingested into the archive system.

rfpipe on GPU cluster

key questions:

  • latency?
  • GPU utilization?
  • is memory buffer adequate for typical VLA observing patterns?

Prepare project documents

NRAO PMD recommends some documents to support the realfast development process.

  1. Requirements and technical architecture (two docs, actually)
  2. Concept of operations ("how should system behave?")
  3. Operations plan (list activities, set budgets, future)

Candidate detection rate control

Candidate detections will create visibility data to be archived. However, many candidates will be bad and should be cut before archiving.

Initially, this will require human feedback ("Astronomer on duty", AOD) to remove bad candidates and reduce the archived data rate. It is expected that this can be done on a ~day cadence.

  • Observing control interface should be able to control candidate rate (e.g., set threshold for given SB)
  • Need a plan to go from duty astronomer to NRAO (autonomous) system
    • Does this require 15 min a day? If so, NRAO can handle via data analyst.
    • If longer, we need to reduce output somehow. ML classifier?
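
As an illustration of controlling the candidate rate through a threshold, the sketch below picks a cut that keeps at most a fixed number of candidates. It assumes candidates are ranked by a single detection statistic such as SNR; the function names are hypothetical.

```python
def snr_threshold(snrs, max_candidates):
    """Return a cut such that at most `max_candidates` values exceed it."""
    if max_candidates < 0:
        raise ValueError("max_candidates must be non-negative")
    ranked = sorted(snrs, reverse=True)
    if len(ranked) <= max_candidates:
        return float("-inf")  # nothing needs to be cut
    # Everything strictly above the (max_candidates + 1)-th value is kept.
    return ranked[max_candidates]


def apply_cut(snrs, threshold):
    """Keep only candidates strictly above the threshold."""
    return [s for s in snrs if s > threshold]


snrs = [6.1, 7.5, 12.0, 6.4, 9.3, 6.1]
cut = snr_threshold(snrs, max_candidates=3)
print(cut, apply_cut(snrs, cut))
```

Using a strict comparison means tied values at the cut are dropped, so the kept count never exceeds the requested maximum.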

Do we need a codec?

Data volumes and rates are large and may be limiting either in the correlator or within the realfast cluster. Compressing and decompressing the data may let us trade compute for data rate.
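
The compute versus data-rate trade can be measured with a small benchmark. The sketch below uses zlib purely as a stand-in codec and a synthetic, highly regular payload. Both are assumptions: real visibility data are noise-like and will likely compress far less well.

```python
import struct
import time
import zlib


def compression_tradeoff(payload, level):
    """Return (compression ratio, seconds spent compressing) for one zlib level."""
    start = time.perf_counter()
    packed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: the codec must be lossless for this comparison.
    assert zlib.decompress(packed) == payload
    return len(payload) / len(packed), elapsed


# Synthetic stand-in for a data buffer: 4096 little-endian float32 values.
payload = struct.pack("<4096f", *[float(i % 16) for i in range(4096)])

for level in (1, 6, 9):
    ratio, seconds = compression_tradeoff(payload, level)
    print(f"level {level}: ratio {ratio:.1f}, compress time {seconds * 1e3:.3f} ms")
```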
Prior art:
