loostrum / darc
Data Analysis of Real-time Candidates from ARTS
Home Page: https://loostrum.github.io/darc
License: Apache License 2.0
Sometimes the following error occurs:
File "/home/oostrum/python36/lib/python3.6/site-packages/darc/amber_clustering.py", line 293, in _check_triggers
**sys_params)
ValueError: not enough values to unpack (expected 7, got 6)
This suggests tools.get_triggers sometimes returns only 6 values instead of the expected 7. Perhaps this is related to the recent addition of the number of candidates per cluster to the returned values.
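Until the root cause is fixed, the caller could accept both signatures. A minimal sketch, assuming the newly added per-cluster candidate count is the last returned value (the real field names and order in DARC may differ):

```python
def unpack_triggers(values):
    """Accept both the old 6-value and new 7-value return of get_triggers.

    Assumption: the number of candidates per cluster, if present, is the
    last element; the other six values are passed through unchanged.
    """
    if len(values) == 7:
        *fields, ncand_per_cluster = values
    elif len(values) == 6:
        fields, ncand_per_cluster = list(values), None
    else:
        raise ValueError(f"expected 6 or 7 values from get_triggers, got {len(values)}")
    return fields, ncand_per_cluster
```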
DARC should try to load an external config file and use the internal file as a fallback.
This way, it is possible to keep sensitive information in a local file.
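The selection logic itself is simple; a sketch, where both paths are placeholders for the real DARC config locations:

```python
from pathlib import Path

def select_config(external: Path, internal: Path) -> Path:
    """Prefer a user-provided external config file if it exists,
    otherwise fall back to the packaged internal one."""
    return external if external.is_file() else internal
```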
In the email from the processor, 3 triggers are listed even when there are 0 AMBER candidates.
Apparently it is counting the 3 header lines.
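The fix would be to skip header lines when counting. A sketch, assuming (as in this example input) that header lines start with '#'; check the actual AMBER output format:

```python
def count_candidates(lines):
    """Count candidate lines, skipping empty lines and '#' headers."""
    return sum(
        1 for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )
```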
Instead of using ghostscript, PDF merging could be done in Python with e.g. pypdf3.
Add this to both the old and new emailer.
Plots attached to the email are quite big; their size could be reduced by using PNG instead of PDF.
Upon a config change, the current pipeline requires a restart of the affected service,
which interrupts running observations. Instead, each service could reload the config
at each observation start, to avoid deadlocks.
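A minimal sketch of the suggested pattern, re-reading the config at each observation start rather than only at service startup. JSON stands in for the real (YAML) config format here to keep the example stdlib-only; the class and method names are illustrative:

```python
import json
from pathlib import Path

class Service:
    """Service that picks up config changes at each observation start."""

    def __init__(self, config_file):
        self.config_file = Path(config_file)
        self.config = {}

    def start_observation(self):
        # reload the config here, so a config change on disk takes
        # effect at the next observation without a service restart
        self.config = json.loads(self.config_file.read_text())
```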
Something like darc --service dada_trigger --attr port_iquv should print the DADATrigger.port_iquv attribute.
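The lookup behind such a flag could be a simple registry plus getattr. Illustrative only: the registry, the DADATrigger stub, and its port value below are made up, not the real darc entry point:

```python
class DADATrigger:
    port_iquv = 30001  # made-up value for illustration

SERVICES = {"dada_trigger": DADATrigger}

def get_service_attr(service_name, attr):
    """Resolve a service attribute by name, as --attr would."""
    service = SERVICES[service_name]
    if not hasattr(service, attr):
        raise SystemExit(f"{service_name} has no attribute {attr}")
    return getattr(service, attr)
```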
Verify that -1 is only printed if there is an error, and 0 if there are no triggers.
Sometimes -1 seems to be printed when there are 0 triggers, perhaps due to a bug
when reading an empty string or list.
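A sketch of the suspected bug pattern: an empty string or list is falsy in Python, so a bare truthiness check cannot distinguish "no triggers" from "read failed". Distinguishing on None (or an exception) avoids this:

```python
def trigger_count(raw):
    """Return -1 on error, otherwise the number of triggers.

    A bare `if not raw` would wrongly report an error for an empty
    string or list; only None signals an actual read failure here.
    """
    if raw is None:
        return -1          # error case
    return len(raw)        # 0 when there are simply no triggers
```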
emailer.py is still part of ARTS-obs.
It should be converted to Python 3 and made part of OfflineProcessing (master).
Currently using the hardcoded 20190416freq_time.hdf5, which is the one that should be used.
This even happens when there are no triggers to process. There are no errors in the logs.
Recent example: arts003 processing 20210311/2021-03-11-07:46:27.FRB20190117A/, taskid=210311011.
During this observation, the data streams were down. The data folder is empty, grouped_pulses
contains only a header. This is all as expected.
Did the node miss/fail to process the stop_observation message? Relevant part of the processor log:
2021-03-11 07:46:23,834.INFO.processor: Starting observation with task ID 210311011
2021-03-11 07:46:23,923.INFO.processor: Processor initialized
2021-03-11 07:46:24,023.INFO.processor: Starting Processor
2021-03-11 07:46:24,025.INFO.processor: Starting observation
2021-03-11 07:46:25,328.INFO.processor: Observation started
2021-03-11 07:46:25,411.INFO.clustering: Starting clustering thread
2021-03-11 07:46:25,417.INFO.extractor: Starting extractor thread
2021-03-11 07:46:25,420.INFO.extractor: Starting extractor thread
2021-03-11 07:46:25,423.INFO.extractor: Starting extractor thread
2021-03-11 07:46:25,426.INFO.extractor: Starting extractor thread
2021-03-11 07:46:25,427.INFO.classifier: Starting classifier thread
2021-03-11 07:46:26,304.INFO.processor: Received header: ['beam_id', 'batch_id', 'sample_id', 'integration_step', 'compacted_integration_steps', 'time', 'DM_id', 'D
2021-03-11 07:46:26,304.INFO.processor: Received header: ['beam_id', 'batch_id', 'sample_id', 'integration_step', 'compacted_integration_steps', 'time', 'DM_id', 'D
2021-03-11 07:46:26,304.INFO.processor: Received header: ['beam_id', 'batch_id', 'sample_id', 'integration_step', 'compacted_integration_steps', 'time', 'DM_id', 'D
2021-03-11 07:46:26,304.INFO.processor: Only header received - Canceling processing
2021-03-11 08:55:11,951.INFO.processor: Starting observation
2021-03-11 08:55:12,037.INFO.processor: Observation parset not found in input config, looking for master parset
2021-03-11 08:55:12,703.INFO.processor: Starting observation with task ID 210311018
There is no stop observation message in the log!
Compare to a snippet of the same part for arts004:
2021-03-11 07:46:26,625.INFO.classifier: Starting classifier thread
2021-03-11 08:17:24,650.INFO.processor: Stopping observation
2021-03-11 08:17:24,650.INFO.processor: Observation parset not found in input config, looking for master parset
2021-03-11 08:17:24,654.INFO.processor: Stopping observation
2021-03-11 08:17:24,655.INFO.processor: Finishing observation
2021-03-11 08:17:27,717.INFO.clustering: Stopping clustering thread
2021-03-11 08:17:29,085.INFO.extractor: Stopping extractor thread
2021-03-11 08:17:30,330.INFO.extractor: Stopping extractor thread
2021-03-11 08:17:31,593.INFO.extractor: Stopping extractor thread
2021-03-11 08:17:32,996.INFO.extractor: Stopping extractor thread
2021-03-11 08:17:33,311.INFO.classifier: Stopping classifier thread
2021-03-11 08:17:33,748.INFO.processor: No post-classifier candidates found, skipping visualization for taskid 210311011
2021-03-11 08:17:34,149.INFO.processor: Observation finished: 210311011: 2021-03-11-07:46:27.FRB20190117A
2021-03-11 08:17:56,084.INFO.processor: Scavenging thread of taskid 210311011
2021-03-11 08:55:11,951.INFO.processor: Starting observation
The problem is indeed that stop_observation was not processed. There is also no stop_observation
in the logs of the other modules on arts003, so it is not processor-specific.
Not all subprocesses exit - check which ones are hanging and why
Perhaps it is enough to only do this for the processor, as it is the only service that can run multiple observations at the same time.
The offline processing module will be deprecated anyway, and it prints the commands it runs including the observation name, so it is not required there.
If a candidate has a higher S/N at DM=0 than at the detection DM by some amount, it should be ignored.
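The filter itself is a one-line comparison; a sketch, where the threshold ratio is a placeholder that would come from the config:

```python
def passes_dm0_filter(snr_detection, snr_dm0, max_ratio=1.0):
    """Reject a candidate whose S/N at DM=0 reaches max_ratio times
    its S/N at the detection DM (likely RFI rather than a real burst)."""
    return snr_dm0 < max_ratio * snr_detection
```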
Test vs real event default should be set in config.
darc --offline --parset foo start_observation should send the start_observation command only to the offline processing queue. This would allow observations to be (re)processed while a real-time observation is running.
Needs a way to read the AMBER triggers
Based on the field name (and reference frame), the pipeline could automatically detect drift scans and run the calibration tools. Field names are like <source_name>drift<startCB><endCB> or <source_name>drift<CB>.
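Detection could be a regex on the field name. A sketch, assuming two-digit CB numbers (an assumption; adjust the pattern if CB numbers can have another width):

```python
import re

# "<source>drift<startCB><endCB>" or "<source>drift<CB>",
# with CB numbers assumed to be two digits each
DRIFT_RE = re.compile(r"^(?P<source>.+)drift(?P<start>\d{2})(?P<end>\d{2})?$")

def parse_drift(field):
    """Return (source, start_cb, end_cb) for a drift-scan field name,
    or None if the field is not a drift scan."""
    m = DRIFT_RE.match(field)
    if m is None:
        return None
    start = int(m.group("start"))
    end = int(m.group("end")) if m.group("end") else start
    return m.group("source"), start, end
```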
Required for running them as Process instead of Thread
The new processor should not use a white background for missing/zero data in the visualization, but the same background as the old pipeline.
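In matplotlib this can be done by setting the colormap's "bad" colour, e.g. to the lowest colour of the map instead of the default transparent/white. The choice of viridis here is only an example, not necessarily what either pipeline uses:

```python
import numpy as np
from matplotlib import colormaps

# copy the colormap before modifying it, so the registered
# version stays untouched
cmap = colormaps["viridis"].copy()
# paint masked/NaN pixels like the lowest data value
cmap.set_bad(cmap(0.0))

# masked entries now render in cmap(0.0) instead of white
data = np.ma.masked_invalid([[1.0, float("nan")], [2.0, 3.0]])
```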
The candidates_to_visualize attribute of Classifier could be extracted by the parent process with a Pipe, see https://docs.python.org/3/library/multiprocessing.html
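A minimal sketch of that pattern: the child process sends its result back over one end of a Pipe and the parent receives it. The worker and its stand-in result are illustrative, not the real Classifier interface:

```python
import multiprocessing as mp

def classifier_worker(conn):
    """Stand-in for the Classifier process; sends its result to the parent."""
    candidates_to_visualize = ["cand_a", "cand_b"]  # placeholder result
    conn.send(candidates_to_visualize)
    conn.close()

def run():
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=classifier_worker, args=(child_conn,))
    proc.start()
    result = parent_conn.recv()  # receive before join to avoid blocking
    proc.join()
    return result
```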
Sometimes the processor thread does not exit, so it keeps showing on the processing web page. The observation itself does finish correctly. This may be caused by reaching the processing time limit.
arts001; stop of non-existing observation:
2020-12-02 16:46:18,570.ERROR.processor: Failed to stop observation: no such task ID 201202032
2020-12-02 16:46:18,571.ERROR.processor: Caught exception in main loop: <class 'KeyError'>: '201202032'
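The KeyError could be kept out of the main loop by handling an unknown task ID at the stop call itself. A sketch, where the observation dict and function names are illustrative:

```python
import logging

logger = logging.getLogger("processor")

def stop_observation(observations, taskid):
    """Stop an observation; log and return False for an unknown task ID
    instead of letting the KeyError reach the main loop."""
    obs = observations.pop(taskid, None)
    if obs is None:
        logger.error("Failed to stop observation: no such task ID %s", taskid)
        return False
    return True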
After a change to the yaml config, the end user should be able to load the new config without restarting the entire pipeline. Restarting services is ok and probably required
Create a status webpage showing the processing of each node / observation.
ProcessorManager on each node should create a .json file with processing status, to be picked up by a website
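Writing such a status file is a few lines; a sketch with made-up field names, to be matched to whatever the website expects:

```python
import json
import time
from pathlib import Path

def write_status(path, node, observations):
    """Dump this node's processing status to a JSON file that a
    status webpage can poll."""
    status = {
        "node": node,
        "updated": time.time(),          # unix timestamp of last update
        "observations": observations,    # e.g. {taskid: state}
    }
    Path(path).write_text(json.dumps(status, indent=2))
```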
amber_clustering corrects the arrival time from the top of the band to the centre of the band, but lofar_trigger assumes the arrival time refers to the top of the band. This should be fixed in lofar_trigger.
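The conversion between the two reference frequencies is the standard cold-plasma dispersion delay. A sketch; the frequencies in the test are placeholders, not the real ARTS band edges:

```python
# dispersion constant in MHz^2 s cm^3 pc^-1
KDM = 4.148808e3

def shift_to_top_of_band(t_centre, dm, f_centre, f_top):
    """Convert an arrival time referenced to the centre of the band
    to one referenced to the top of the band (frequencies in MHz).

    The pulse arrives earlier at the higher frequency, so the
    top-of-band time is earlier for any positive DM.
    """
    return t_centre - KDM * dm * (f_centre**-2 - f_top**-2)
```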
Add a module to trigger LOFAR directly, skipping the VOEvent system.