askap-vast / vast-pipeline

This repository holds the code of the Radio Transient detection pipeline for the VAST project.

Home Page: https://vast-survey.org/vast-pipeline/

License: MIT License

Shell 0.06% Python 74.13% CSS 0.66% JavaScript 5.22% HTML 17.85% SCSS 1.13% Jinja 0.95%
astronomy astrophysics transient-astronomy radio-astronomy transients

vast-pipeline's Introduction

VAST Pipeline

This repository holds the code of the VAST Pipeline, a radio transient detection pipeline for the ASKAP survey science project, VAST.

Please read the Installation Instructions. If you have any questions or feedback, we welcome you to open an issue. If you are interested in contributing to the code, please read and follow the Contributing and Developing Guidelines.

If using this tool in your research, please cite Murphy et al. (2021).

Features

  • Code base in Python 3.8+
  • Source association/manipulation using Astropy 4+ and Pandas 1+ DataFrames
  • Association methods: basic (Astropy crossmatch), advanced (search within a fixed distance), De Ruiter
  • Flagging of "New Source" and "Related Source"
  • Forced Extraction (Monitor) backward and forward in time
  • Parallelization and scalability using the Dask API (Dask 2+)
  • Data exploration in a modern Django 3+ web app (Bootstrap 4)
  • Access to raw pipeline output data via .parquet and .arrow files
  • Pipeline interface from the command line (CLI) and via the web app
  • Web app backed by PostgreSQL 12+ with the Q3C plugin

Screenshots and Previews

VAST Pipeline Overview

Contributors

Acknowledgements

The VAST Pipeline development was supported by:

  • The Australian Research Council through grants FT150100099 and DP190100561.
  • The Sydney Informatics Hub (SIH), a core research facility at the University of Sydney.
  • Software support resources awarded under the Astronomy Data and Computing Services (ADACS) Merit Allocation Program. ADACS is funded from the Astronomy National Collaborative Research Infrastructure Strategy (NCRIS) allocation provided by the Australian Government and managed by Astronomy Australia Limited (AAL).
  • NSF grant AST-1816492.

We also acknowledge the LOFAR Transients Pipeline (TraP) (Swinbank et al. 2015), from which various concepts and design choices have been implemented in the VAST Pipeline.

The developers thank the creators of SB Admin 2 for making the dashboard template freely available.

vast-pipeline's People

Contributors

ajstewart, ddobie, dependabot[bot], dliptai, ellawang44, joshoewahp, marxide, mortisflux, shiblisaleheen, srggrs

vast-pipeline's Issues

Image file not found

Handle correctly the case where the image or selavy file paths are not found or are incorrect.

sorting catalogs by "datapoints" column gives AJAX error

When attempting to sort the catalog overview table by the "datapoints" column, the following error is given in the browser

DataTables warning: table id=dataTable - Ajax error. For more information about this error, please see http://datatables.net/tn/7

An accompanying exception is raised

2020-02-14 23:42:46,415 log ERROR Internal Server Error: /api/catalogs/
Traceback (most recent call last):
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/rest_framework/viewsets.py", line 114, in view
    return self.dispatch(request, *args, **kwargs)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/rest_framework/views.py", line 505, in dispatch
    response = self.handle_exception(exc)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/rest_framework/views.py", line 465, in handle_exception
    self.raise_uncaught_exception(exc)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/rest_framework/views.py", line 476, in raise_uncaught_exception
    raise exc
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/rest_framework/views.py", line 502, in dispatch
    response = handler(request, *args, **kwargs)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/rest_framework/mixins.py", line 40, in list
    page = self.paginate_queryset(queryset)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/rest_framework/generics.py", line 171, in paginate_queryset
    return self.paginator.paginate_queryset(queryset, self.request, view=self)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/rest_framework_datatables/pagination.py", line 68, in paginate_queryset
    return list(self.page)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/core/paginator.py", line 150, in __len__
    return len(self.object_list)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/db/models/query.py", line 256, in __len__
    self._fetch_all()
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1087, in execute_sql
    sql, params = self.as_sql()
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 474, in as_sql
    extra_select, order_by, group_by = self.pre_sql_setup()
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 55, in pre_sql_setup
    order_by = self.get_order_by()
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 330, in get_order_by
    field, self.query.get_meta(), default_order=asc))
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 704, in find_ordering_name
    field, targets, alias, joins, path, opts, transform_function = self._setup_joins(pieces, opts, alias)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 734, in _setup_joins
    field, targets, opts, joins, path, transform_function = self.query.setup_joins(pieces, opts, alias)
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/db/models/sql/query.py", line 1504, in setup_joins
    names[:pivot], opts, allow_many, fail_on_missing=True,
  File "//miniconda3/envs/vast-pipeline/lib/python3.6/site-packages/django/db/models/sql/query.py", line 1420, in names_to_path
    "Choices are: %s" % (name, ", ".join(available)))
django.core.exceptions.FieldError: Cannot resolve keyword 'sources' into field. Choices are: association, ave_dec, ave_flux_int, ave_flux_peak, ave_ra, comment, dataset, dataset_id, eta_int, eta_peak, id, max_flux_peak, name, new, source, v_int, v_peak
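
One possible direction for a fix (a sketch only; the viewset, serializer, and import paths here are assumptions, not the pipeline's actual code) is to expose "datapoints" as a real annotation on the queryset, so the DataTables ordering has a field it can resolve instead of the non-existent 'sources' keyword:

from django.db.models import Count
from rest_framework import viewsets

from pipeline.models import Catalog  # assumed import path
from pipeline.serializers import CatalogSerializer  # assumed import path

class CatalogViewSet(viewsets.ReadOnlyModelViewSet):
    serializer_class = CatalogSerializer

    def get_queryset(self):
        # Annotate the number of associated source measurements so the
        # "datapoints" column maps onto a sortable database expression.
        return Catalog.objects.annotate(datapoints=Count('source'))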

User Interface pages

Please check your email. You should find that you have been invited to a Dropbox folder containing files and instructions on how to create the visuals for the pages of the web app.

Let me know if you have any issues.

The design in there should be incorporated in #25.

Sensibly handle image and selavy catalogue matching in config

As selavy catalogues will be a main input, the pipeline needs to sensibly link images to selavy catalogues. Currently they are assigned by the index in the lists in the config files.

Options to consider:

  1. Dictionary definition? (This is messy for a config file).
  2. Assume a different extension to the image name, i.e. the catalogue has the same name as the image but with .fits replaced by .components.txt, and expand the config file around this (see the sketch below).
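
A minimal sketch of option 2 (the .components.txt suffix is the assumed convention from above):

import os

def selavy_path_from_image(image_path):
    # Assume the selavy catalogue sits alongside the image, with the
    # '.fits' extension replaced by '.components.txt'.
    base, _ = os.path.splitext(image_path)
    return base + '.components.txt'

# e.g. selavy_path_from_image('VAST_0012+00A.EPOCH01.I.fits')
#      -> 'VAST_0012+00A.EPOCH01.I.components.txt'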

change pipeline object names

After discussing with Tara and Adam, they proposed the following changes:

  • dataset -> pipeline run
  • catalog -> source
  • source -> measurement

This will definitely break things, but as long as you reset the db as per the development instructions, you should be fine.

Add forced photometry

When a source is detected in only some of the epochs, forced photometry (also called constrained fitting etc) should be used to determine robust constraints for each non-detection.

I have written a version of this code: https://github.com/dlakaplan/forced_phot.git

It assumes that ASKAPsoft images (including noise and background images) are available. Methods are present to do injection tests etc. It seems to work reasonably well and reasonably fast. For most sources it uses a deterministic linear algorithm:

  1. Extract a sub-image for each source.
  2. Create a beam kernel.
  3. Determine the flux density by:

F = sum(data * kernel / noise**2) / sum(kernel**2 / noise**2)

which doesn’t rely on any non-deterministic/non-linear fitting (so no convergence issues, etc). It seems to work fine for most sources, at least based on some testing with selavy catalogs and injected sources that I did.
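
A minimal sketch of the linear estimate above, assuming the sub-image, beam kernel, and noise map have already been cut out as matching 2D arrays (the error expression is the standard matched-filter one and is an assumption here, not taken from the forced_phot code):

import numpy as np

def forced_flux(data, kernel, noise):
    # F = sum(data * kernel / noise**2) / sum(kernel**2 / noise**2)
    w = kernel / noise**2
    flux = np.sum(data * w) / np.sum(kernel * w)
    # Assumed matched-filter uncertainty: 1 / sqrt(sum(kernel**2 / noise**2)).
    flux_err = 1.0 / np.sqrt(np.sum(kernel * w))
    return flux, flux_err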

However, if 2 or more sources are too close, as determined using a KDTree algorithm, it will do a simultaneous fit with a non-linear fitter. This will typically only happen for a small fraction (<1%) of the total sources. I tested this as a function of separation for sources with similar flux densities, and determined a reasonable threshold of 1.5*BMAJ. Inside that radius cluster fitting gives good results, but ignoring the neighbours fails. Note that this really needs to consider all sources, not just those that are interesting.

I ran this for all of the sources in a selavy catalog, and it takes ~10 s for 5000 sources, with some small fraction in clusters.

I didn’t do any explicit parallelization but I think it should be thread-safe.

I spent a little time updating the formatting (parameter documentation, blackening) but I would like to integrate it more with Sergio’s repository to use consistent standards for logging, error handling, etc. Any other standards or things to watch out for would be good too.

I’ve been working on testing. As I said, when I inject sources it performs very well. For comparison with the selavy catalog it does OK, but I believe that most of the errors are from non-Gaussian sources etc. For instance, the residuals correlate strongly with chi^2. If you have suggestions for other tests please let me know. If you can help with formal unit-testing, that would be good too.

Flag sources that have more than 1 match

It could be the case that 2 sources in one catalogue are crossmatched to a single source in another. These need to at least be flagged as such; we also need to decide how best to handle them - keep them in the same catalogue or start a separate one?

incorrect date is ingested from FITS images

The pipeline is setting the Image.time field to the value of the DATE keyword in the FITS image header. The FITS standard defines DATE as the timestamp when the FITS file was created. We want to use DATE-OBS instead.
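
A minimal sketch of reading the intended keyword with astropy (the file name is a placeholder):

from astropy.io import fits
from astropy.time import Time

with fits.open('image.fits') as hdul:
    header = hdul[0].header
    # DATE-OBS is the observation start time; DATE is only the file creation time.
    obs_time = Time(header['DATE-OBS'], format='isot')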

cleardataset command is broken

The cleardataset command appears to be out of sync with the current database models. Attempting to use this command raises the following error:

ImportError: cannot import name 'Cube'

This model doesn't exist. Easy fix, just delete the import. This reveals more errors e.g.

django.db.utils.ProgrammingError: relation "pipeline_flux" does not exist
LINE 1: DELETE FROM pipeline_flux

The Flux model and therefore the pipeline_flux table doesn't exist. Unfortunately, the fix isn't as simple as removing the offending SQL statement as it has other implications. I'll start investigating, but someone more familiar with the models may want to chime in.

investigate and fix nearest neighbour association

The currently implemented association algorithm is a simple nearest neighbour crossmatch. However, something is not quite right as the catalogs only end up with a maximum of 1 source measurement. One would expect a maximum of N source measurements where N is the number of epochs.

Source associated with Sources

There are cases where you want to know if the measurements belong to the same physical source, but the code will create 2 different source entities, so we need a related_sources column and a many-to-many relation from Source to itself.

Add Test to be run in CI/CD

I think that at this point in time, when people are starting to develop the code simultaneously, it would be wise to start developing tests, so that when we merge branches to master the code is automatically tested.

Maybe include in the repo some testing images to run against? @ajstewart @marxide

Perform large scale benchmark (high number of images)

Check performance of the current structure with a large scale test that could be representative of a future large scale survey.

Currently the VAST pilot survey consists of 113 images, but it is shallow, so there are approximately 2-3k sources per image. We have 2 epochs of this so far, with another 2 possibly in hand in 2-3 weeks.

This could be one good survey test.

Also from RACS we know the sky can be tiled in ~900 pointings, with deep images (e.g. the GW fields) having on the order of 30k detected sources. This will be the extreme end.

We could replicate the GW field epochs such that we have about 2 - 5k images to process through and see how long this takes?

What are others thoughts on this?

Allow survey import using csv file

Currently .vot tables are required. This ingestion could be faster with a direct CSV read (CSV files can also be downloaded directly from VizieR, and perhaps via astroquery).

fix var and eta metric calculations

Looks like there's a typo in the functions that calculate the variability metrics. I noticed it when running the pipeline over simulated data products. Fix incoming.

sources name convention

Make the source names follow a convention. This needs issue #2 closed and PR #9 done.

From slack, Adam:

The standard naming of what we get at the moment is:
image.i.SB9649.cont.taylor.0.restored.fits
Where the SB is the 'SBID' of the observation. However, for surveys like the VAST Pilot we have been creating our own combined mosaics, which I have started to name e.g. VAST_0012+00A.EPOCH01.I.fits. I have been starting to think we should standardise names, as the default is not that useful (even that pilot name doesn't actually say it's Pilot data). It's something that will take a bit of thought.
Saying this, all the information should be pulled from the header anyway, and this kind of information would be better standardised in there rather than in the filename. I'm very used to the filename being whatever; it's the header info that's important.

From slack, Martin:

I think there is good code to do the naming in vast in “make_source_name()”
For ASKAP that should give PREFIX_RAdegrees_hours_minutes_seconds+/-DECLINATIONdegrees_hours_minutes_seconds
And we name the fits files so we tend to try and keep these in a specific format so we can identify certain features. I am not sure what the standard is for ASKAP just yet.
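
A rough sketch of a name in the PREFIX_RA...DEC... form described above (the field widths and precision are assumptions, not the actual make_source_name() behaviour):

from astropy.coordinates import SkyCoord
import astropy.units as u

def make_source_name(ra_deg, dec_deg, prefix='VAST'):
    coord = SkyCoord(ra_deg, dec_deg, unit='deg')
    # RA as HHMMSS.SS and Dec as +/-DDMMSS.S, with no separators.
    ra_str = coord.ra.to_string(unit=u.hourangle, sep='', precision=2, pad=True)
    dec_str = coord.dec.to_string(sep='', precision=1, alwayssign=True, pad=True)
    return f'{prefix}_{ra_str}{dec_str}'

# e.g. make_source_name(12.966821, -24.519377) -> 'VAST_005152.04-243109.8'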

Light Curve plot not working on Safari

Works fine on Chrome, but not Safari.

I only attempted turning off my adblocking, but did not delve any deeper into built-in Safari content blockers.

[Screenshot: light curve plot in Safari, 2020-02-26]

Add support to read in and use selavy generated RMS and Background maps

The RMS map (or NoiseMap as it's called by selavy) is a default output from selavy that we should always get. Hence we can also require this as an input, to help in analysing the RMS of the image and to quickly obtain upper limits for sources without having to do forced extraction in every case (e.g. new source analysis).
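
A minimal sketch of pulling a sigma-based upper limit out of the noise map at a source position (assuming a 2D-compatible FITS layout; degenerate frequency/Stokes axes are squeezed away):

import numpy as np
from astropy.coordinates import SkyCoord
from astropy.io import fits
from astropy.wcs import WCS

def upper_limit(noise_map_path, ra_deg, dec_deg, sigma=5.0):
    with fits.open(noise_map_path) as hdul:
        wcs = WCS(hdul[0].header, naxis=2)
        data = np.squeeze(hdul[0].data)
    x, y = wcs.world_to_pixel(SkyCoord(ra_deg, dec_deg, unit='deg'))
    # The upper limit is simply sigma times the local rms value.
    return sigma * data[int(np.round(y)), int(np.round(x))]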

UI addons

extra functionality for:

  • login, logout
  • saving queries (e.g. on catalogs)
  • starring sources (e.g. have favorite ...)

fix this bug in Association table

The pipeline produces 3 entries in Association for the same source and catalog:

Name Date Image RA RA Error DEC DEC Error Flux (mJy) Error Flux (mJy) Peak Flux (mJy) Error Peak Flux (mJy)
SB9602_component_227a 2019 Aug 18 16:14:15 image.i.SB9602.cont.taylor.0.restored.cutout.2kpix.fits 12.966821 0.04 -24.519377 0.03 617 7 548 4
SB9602_component_227a 2019 Aug 18 16:14:15 image.i.SB9602.cont.taylor.0.restored.cutout.2kpix.fits 12.966821 0.04 -24.519377 0.03 617 7 548 4
SB9602_component_227a 2019 Aug 18 16:14:15 image.i.SB9602.cont.taylor.0.restored.cutout.2kpix.fits 12.966821 0.04 -24.519377 0.03 617 7 548 4
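
A quick way to look for these duplicates (a sketch; the model and field names are assumptions based on the table above) is to group associations by source and catalog and keep the groups with more than one row:

from django.db.models import Count

from pipeline.models import Association  # assumed import path

duplicates = (
    Association.objects
    .values('source_id', 'catalog_id')
    .annotate(n=Count('id'))
    .filter(n__gt=1)
)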

Implement "Sky regions" like done in TraP

This assigns a sky region to each image by analysing the image size and centre coordinates. So a sky region consists of centre coordinates and a size (in TraP's case, a radius).

See: https://tkp.readthedocs.io/en/latest/devref/database/schema.html#skyregion

Each running catalog is then able to have a sky region associated with it, which is a fast way of being able to say "which images do I expect to see this source in".

This information is good to have for post-processing, and it may also help with forced fitting (#23), since you would know exactly which image each source needs to be extracted from.
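
A minimal containment check along these lines (circular regions as in TraP; names and the example radius are placeholders):

from astropy.coordinates import SkyCoord
import astropy.units as u

def in_sky_region(source_coord, region_centre, region_radius_deg):
    # The source is expected in an image if its angular separation from the
    # image's sky region centre is within the region radius.
    return source_coord.separation(region_centre) <= region_radius_deg * u.deg

# e.g.
# centre = SkyCoord(313.448091831112, -6.2988997221061, unit='deg')
# in_sky_region(SkyCoord(314.0, -6.5, unit='deg'), centre, 4.0)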

storage location for pipeline data products

This occurred to me while updating .gitignore to ignore the pipeline data product directories pipeline-projects and reference-surveys, and I thought it warrants some discussion.

It doesn't feel right to store these outputs within the pipeline codebase as the default settings do and the installation instructions suggest. It's okay for now while we develop, but when we deploy we will probably want to change this. I'm fuzzy on the details, but I can imagine a scenario where we deploy to e.g. Pawsey Nimbus and the pipeline codebase lives on a small ephemeral storage allocation for the VM and our main data products are ingested from the larger attached network storage at Pawsey. We will likely want to store our pipeline products on that larger attached storage too.

I'm not sure what the best way to do this is. The first solution that comes to mind is to make use of Django's media files framework to deal with it. I think this is meant to define where user file uploads go, but it should work for this too. E.g. we could configure the MEDIA_ROOT setting and use that to define the pipeline-projects location:

# settings.py
import os

# Attached storage root for all pipeline data products (example path).
MEDIA_ROOT = "/path/to/vast/attached/storage/"
PROJECT_WORKING_DIR = os.path.join(MEDIA_ROOT, 'pipeline-projects')

add help comment on model fields

Add descriptions via the help_text argument on the Django model fields. The models are in the models.py file under each main folder.

See example below:

# file: MYFOLDER/models.py
from django.db import models

class Survey(models.Model):
    name = models.CharField(
        max_length=32,
        unique=True,
        help_text='PUT HERE THE DESCRIPTION OF THIS FIELD'
    )

...

class SurveySource(models.Model):
    name = models.CharField(
        max_length=100,
        unique=True,
        help_text='PUT HERE THE DESCRIPTION OF THIS FIELD'
    )
    ra = models.FloatField(
        help_text='PUT HERE THE DESCRIPTION OF THIS FIELD'
    )
    err_ra = models.FloatField(
        help_text='PUT HERE THE DESCRIPTION OF THIS FIELD'
    )

Date not found

Reported error:

CommandError: Processing error:
"Keyword 'DATE-OBS' not found."

for running on these images:

VAST_0331+00A.EPOCH01.I.fits
VAST_0331+00A.EPOCH02.I.fits
VAST_0331+00A.EPOCH04x.I.fits

Implement TraP source association methodology

To be added in addition to a fixed distance association (#14).

The method is explained in Section 4.4.1 of the TraP paper: https://arxiv.org/abs/1503.01526.

It involves calculating the 'de Ruiter radius' of sources and also taking into account the semi-major axis of the beam.

Currently this is all done in SQL in the TraP code, but it should be OK to translate to Python.
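
A sketch of the dimensionless de Ruiter radius as described in the TraP paper (inputs in degrees; the additional cut on the beam semi-major axis is omitted here):

import numpy as np

def de_ruiter_radius(ra1, dec1, ra2, dec2, sig_ra1, sig_dec1, sig_ra2, sig_dec2):
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sig_ra1, sig_dec1, sig_ra2, sig_dec2 = map(
        np.radians, (sig_ra1, sig_dec1, sig_ra2, sig_dec2)
    )
    # RA offset corrected for the convergence of meridians at the mean declination.
    dra = (ra1 - ra2) * np.cos(0.5 * (dec1 + dec2))
    ddec = dec1 - dec2
    return np.sqrt(
        dra**2 / (sig_ra1**2 + sig_ra2**2) + ddec**2 / (sig_dec1**2 + sig_dec2**2)
    )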

Force source reading if the image catalogue is changed

There could be instances where selavy has been run on an image with different settings. In this case you want to reload and ingest the source catalogues instead of using the catalogues already present (if any).

right source Django model

Need to find the right source model, with the correct field names (the names of the table columns in the db backend)

value too long error on catalogue ingest

I attempted to run the pipeline on two epochs of a VAST pilot field and encountered the following error:

2020-02-12 17:09:37,876 main INFO read image VAST_2053-06A.EPOCH01.I.fits
2020-02-12 17:09:37,879 main INFO Adding new frequency band: 887
2020-02-12 17:09:37,890 main INFO Found sky region 313.448091831112, -6.2988997221061
2020-02-12 17:09:37,892 main INFO Adding VAST_1_2 to sky region 313.448091831112, -6.2988997221061
2020-02-12 17:09:38,208 main INFO Processed sources dataframe of shape: (7839, 28)
CommandError: Processing error:
value too long for type character varying(32)

The first component name that the pipeline attempts to insert, VAST_2053-06A_SB9673_component_1000a, is 36 chars long. We probably need to either increase the max lengths for the name fields, or change the naming convention. I personally favour the former unless a significant performance argument can be made.
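
If we go with increasing the limit, the change is a field tweak on the model plus a migration (a sketch; the model name and the new length are assumptions, with 64 comfortably covering the 36-character example above):

from django.db import models

class Source(models.Model):
    # Widened from max_length=32 so selavy component names such as
    # 'VAST_2053-06A_SB9673_component_1000a' fit without truncation.
    name = models.CharField(max_length=64, unique=True)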

Fix bug with processing image

Processing the image image.i.SB10383.cont.RACS_test4_1.05_0000-12A.linmos.taylor.0.restored.fits gives the following error:

2020-01-31 02:26:11,409 main INFO Processed sources dataframe of shape: (6083, 28)
Traceback (most recent call last):
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.StringDataRightTruncation: value too long for type character varying(32)


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./manage.py", line 21, in <module>
    main()
  File "./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/Users/josh/Astro/askap-pipeline/pipeline/management/commands/runpipeline.py", line 75, in handle
    pipeline.process_pipeline(dataset)
  File "/Users/josh/Astro/askap-pipeline/pipeline/pipeline/main.py", line 102, in process_pipeline
    batch_size
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/models/query.py", line 474, in bulk_create
    ids = self._batched_insert(objs_without_pk, fields, batch_size, ignore_conflicts=ignore_conflicts)
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/models/query.py", line 1204, in _batched_insert
    ignore_conflicts=ignore_conflicts,
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/models/query.py", line 1186, in _insert
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1335, in execute_sql
    cursor.execute(sql, params)
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/backends/utils.py", line 99, in execute
    return super().execute(sql, params)
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/Users/josh/Astro/askap-pipeline/pipeline_env/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.DataError: value too long for type character varying(32)

Pipeline config is a mess

Configuration settings for both the pipeline and a single pipeline run are a mess, as all the flags used by Aegean and older pipelines were imported but not really used.

Forced fits strategy for non detections backwards and forwards in time

When a source either "disappears" or a new source is detected it is often vital to get information at the location of the source in images where it is not detected at that location.

Both VAST and TraP do this. VAST uses Aegean 'priorized' fitting (I think both backwards and forwards in time); it is an optional feature. TraP does forced fits using PySE, forward in time only, after a new source is detected. It is optional for how many epochs to continue force-fitting the detected source if not detected.

TraP has a separate 'monitor' feature where you can ask it to do forced fits at a location throughout the entire run to build a light curve. Only downside of this feature is that you have to run the TraP again.

The reason the options above are optional or have some optional element is that they are computationally expensive, especially when a typical TraP run of deep ASKAP images can produce thousands of new sources, most of which are rubbish. I think we can do better than the current 'new source analysis' that TraP does, which may help this issue, but that's a separate issue in itself.

Another fast option could perhaps be to fetch X-sigma upper limits using the RMS maps (which are produced by default by ASKAPsoft).

Flag Sources/Catalogs as "New"

It's good to flag sources that appear as 'new' so that they can be quickly checked. The problem currently is that there are a lot of them. TraP attempted to somewhat solve this problem by using a 'new source sigma margin', to try and determine whether the source is just at the detection threshold or is an actual transient source of interest. Essentially:

  • The previous best image (by minimum rms) is found.
  • The sigma of the new source in question is calculated assuming the source was in the location of the best rms of that best previous image and the worst rms in the same image.
  • The sigma is then checked against the detection sigma and the new source sigma margin: SigmaNSBest > (SigmaDet + SigmaMargin).
  • If it meets the above criteria then the source is added to the new source list.
  • If it also meets SigmaNSworst > (SigmaDet + SigmaMargin) then the source is flagged as a likely transient.

In the above form, with ASKAP images the 'best rms' is often just too low and the 'worst rms' too high, and with the RMS being quite variable over an ASKAP image, it is far from ideal.

This can be improved by performing the would-be rms checks at the actual location of the source in the rms maps, which would significantly help in finding transients quickly, directly from the pipeline, without the need for further analysis. Perhaps order importance by the highest previous sigma value for a new source.
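
A sketch of the sigma checks described above, evaluated against rms values taken at the source's own location in the rms maps (threshold defaults and names are placeholders):

def classify_new_source(flux_peak, rms_best, rms_worst, sigma_det=5.0, sigma_margin=3.0):
    # Would-be significance of the source had it been present in the
    # previous best image, at its own location.
    sigma_best = flux_peak / rms_best
    sigma_worst = flux_peak / rms_worst
    is_new = sigma_best > (sigma_det + sigma_margin)
    likely_transient = is_new and sigma_worst > (sigma_det + sigma_margin)
    return is_new, likely_transient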

Add standard variability metrics

We can add the variability metrics that we want; both the TraP and VAST metrics could be included.

Note that the selavy source errors are still not usable, so we are still using the local_rms column instead.
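
A sketch of the two standard TraP-style metrics, V (modulation index) and eta (weighted test of a constant-flux model); whether this matches the exact pipeline implementation is an assumption, and per the note above the flux errors would come from local_rms:

import numpy as np

def variability_metrics(flux, flux_err):
    flux = np.asarray(flux, dtype=float)
    w = 1.0 / np.asarray(flux_err, dtype=float) ** 2
    n = len(flux)
    # V: sample standard deviation over the mean flux.
    v = flux.std(ddof=1) / flux.mean()
    # eta: N/(N-1) * (<w f^2> - <w f>^2 / <w>), with weights w = 1/sigma^2.
    eta = (n / (n - 1)) * ((w * flux**2).mean() - (w * flux).mean() ** 2 / w.mean())
    return v, eta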

Fix Astropy association

need to do:

  • calculate average int flux
  • calculate average peak flux
  • calculate average max peak flux
  • average RA and Dec for every image iteration when generating the SkyCoord object
