hisparc / publicdb Goto Github PK

View Code? Open in Web Editor NEW

8.0 14.0 4.0 12.25 MB

The HiSPARC Public Database

Home Page: https://data.hisparc.nl/

License: GNU General Public License v3.0

Makefile 0.10% Python 55.61% JavaScript 19.89% CSS 2.96% HTML 20.84% Shell 0.48% Dockerfile 0.12%

outreach derived-data cosmic-rays

publicdb's Introduction

HiSPARC Public Database README

Overview

The HiSPARC Public Database is a Django application which stores derived data and displays it for public use. It exposes an API to download raw detector data from the datastore, as well as an API to participate in analysis sessions.

To run and test the code, you can use Docker to setup a contained test environment.

Table of Contents

Important information regarding provisioning the production servers

When you first run ansible on a freshly-installed system, you're likely to run into an error like:

sudo: sorry, you must have a tty to run sudo

You can fix that by manually logging into the machine, and typing:

$ sudo visudo

And changing the line:

Defaults requiretty

to:

Defaults !requiretty

Also, lock the root account and the user account. First, make sure to add your public key to ~/.ssh/authorized_keys, with the mode of both the directory and the file set to 0600. First make sure to test logging in without a password!!! Only then, lock the accounts:

$ sudo passwd -l root
$ sudo passwd -l hisparc

The only way to get into the machine is via SSH, so don't lock yourself out! (Actually, there is another way. With console access, you can reboot in single user mode.)

Provisioning production servers

We use Ansible for all our provisioning needs. You can run it from the top repository directory. First install ansible and its requirements:

$ pip install ansible
$ ansible-galaxy install -r requirements.yml

At that location, there is a file called ansible.cfg which sets up a few config values. To run the playbook, issue:

$ ansible-playbook provisioning/playbook.yml

Beware, however, that this will run provisioning for all production servers. It is very useful to limit the hosts for which to run the provisioner, e.g.:

$ ansible-playbook provisioning/playbook.yml --limit publicdb

If you want to check first what the provisioner would like to change, without actually changing anything, use the -C option:

$ ansible-playbook provisioning/playbook.yml --limit publicdb --check

Running a provisioner from a remote location

The servers can not be accessed directly from outside the university network. However, if there is a jump server available, it can easily be configured to automatically be used by setting it in your ssh configuration.

Add this to your .ssh/config:

Host hisparc-data hisparc-raw
    ProxyJump notchpeak1.chpc.utah.edu

Running with Docker-compose

Install and start Docker, then in this directory do:

$ docker-compose up -d

If this is the first run you should now run database migrations:

$ docker-compose exec publicdb ./manage.py migrate

In order to populate the database you can use a dump of the production database:

$ docker-compose exec -T postgres pg_restore --username=hisparc --dbname=publicdb < publicdb_dump.sql

or create some fake data:

$ docker-compose exec publicdb ./manage.py createfakedata

To clean the database again to fill it with new fake data use:

$ docker-compose exec publicdb ./manage.py flush --noinput
$ docker-compose exec publicdb ./manage.py loaddata publicdb/histograms/fixtures/initial_generator_state.json
$ docker-compose exec publicdb ./manage.py loaddata publicdb/histograms/fixtures/initial_types.json

To run the tests use:

$ docker-compose exec publicdb make coveragetests

Hints for running a development publicdb server

In order to create a tiny copy of the datastore for development purposes, do:

$ python scripts/download_test_datastore.py

To generate the histograms for the downloaded data:

$ ./manage.py updatehistograms

publicdb's People

Contributors

Stargazers

Watchers

Forkers

tomkooij larryklean 153957 rajeevyadav

publicdb's Issues

jSparc: Calculation of student scores

The list of student scores does not show the best score by each student.
What it does show is unclear to me, some weighted mean?

Analysissession: 'Session created' mail has no link to jSparc

The 'HiSPARC Analysissession created' email only contains a link to the results page, not to the page where the sessions can be analysed: http://data.hisparc.nl/media/jsparc/shower.htm

Scrolling of the map is jerky with magic mouse/trackpad

Perhaps we should use what is discussed here:
openlayers/openlayers#483

Source button for GPS coords not shown

Also add GPS height graph.

Update Django to v1.4

We are running on Django v1.1 and want to move to v1.4

While making sure everything keeps working.

jSparc: unknown GPS positions

jSparc interfaces with the publicdb through the jSparc Django app. In views.py, get_coincidence, information from a station is inserted into the data, including the GPS position, at that moment in time. Sometimes, however, the GPS position is not available since there are no configuration messages sent before that date. In that case, better to return the future first-known position, instead of (lat, lon, alt) = (0., 0., 0.).

Add status information to stations and new station_status pages

When we implement a new status monitor for the station-software we will want to be able to view the results on data.hipsarc.nl

As in #2 we want to have status-indicators on the stations page next to each station, to show how well it is running at that moment.

When this status indicator is clicked we want to have a new station_status page which will show some of the status history.
We need to determine what data we want to show, and for what time ranges (perhaps interactive with flot).

API crashes for 'stations_with_data' if day not specified.

For a month or a year, it crashes::

$ curl http://data.hisparc.nl/api/stations/data/2013/5/

[Django debug output]

Color the HV values of the Channels in the sidebar

Give them (the header and/or value) the same color as their corresponding line in the graphs to easily see which belongs to which.

Add station health/data quality indicators to Stations page

It would be useful to be able to see the health (based on to be determined parameters) of each station on the stations page.

Has it uploaded data yesterday
What was the quality of that data
- Shift in MPV value
- Change in total number of events
- ..
Recent problems
..

Publicdb source download: CRLF / LF

Nice! The csv-files that can be downloaded by clicking on the source buttons on the data pages returns files with unix-style line breaks: LF. Windows doesn't get that, it wants CRLF. Fix that platform independently???

Allow for British Postalcodes

The Postalcodes in England have a slightly different format.
For example:
BS8 3JD

We should allow for more international postal codes.

Download processed data

The ubiquitous sparklines branch also includes code for downloading processed data as a csv-file. That is, event timestamps, pulse heights, and even particle arrival times can be downloaded using a crude form. Include this on the main site and clean it up.

This will finally enable students to quickly start analyzing shower direction using one station.

Sort order and grouping stations page

The sort order and the grouping of stations on the station page should be changed. Something like: Netherlands at the top, alphabetically sorted countries below that. Main clusters alphabetically sorted, with subclusters grouped with the parent cluster.

Add new views for a new 'live-display' page

Kaj (from Twente) proposed the creation of a new (live-display) program for the purpose of showing in real-time the events of a HiSPARC station.

For this we will probably need to implement new views in the public database which this program can use:

Trace of most recent event
Pulseheight histogram of last day (or last n events)
Angle reconstruction for stations with 4 detectors (for last event)
- Or one angle for stations with 2 detectors
GPS coordinates
- Which can then be shown on a map
- This can also be combined with the angle reconstruction to show how the showers looks on the map
Weather station information
Lightning detection
...

Hide unused/test stations in the Station List

We need to add a (boolean) column to the Station table which will determine if the station should be shown (or flagged) in the Station List.

This can be used to hide stations that were added but never made active or test stations.

Update PyTables code for PyTables 3

With the new vagrant deployment we get the latest version of PyTables (v3) this causes some warnings for the current code:

DeprecationWarnings:
openFile -> open_file
getNode -> get_node
createTable -> create_table
readCoordinates -> read_coordinates
etc..

We can try using the pt2to3 tool to update the source code.

Download station coincidences

It should be possible to ask http://data.hisparc.nl/ for a list of coincidences of some selected stations within a period of time. Think of a nice csv data format.

jSparc: Title name is case sensitive

The title is case-senstive because the title is compared to see if it equal to the title that was returned from the AnalysisSession.objects during the request. However, during the request the title is not case sensitive, so a wrong-cased title can be used for the request, returning a 'correctly'-cased title, causing it to fail during the sending of the result.

See b16cfa7, in views.py line 115.

Detector 3/4 in histograms for stations with only 2 detectors

In all of yesterday's histograms for the 2-detector stations there is a peak in the first bin colored green/blue (detector 3/4). This peak is as high as the total amount of events. An example:

Source http://data.hisparc.nl/show/stations/7/2013/5/14/ :

[[1, 14, ..., 2, 1],
 [0, 32, ..., 0, 2],
 [66255, 0, ..., 0, 0],
 [66255, 0, ..., 0, 0]]

The day before these stations did not have this: http://data.hisparc.nl/show/stations/7/2013/5/13/
This is after the update to use the ESD.

The ADC counts for the non-present detectors should be -1

> EDS_2013_5_14.root.hisparc.cluster_amsterdam.station_7.events.col('pulseheights').T
array([[208,  80, 274, ...,  81,  60, 157],
       [195, 280, 320, ...,  71, 542, 241],
       [ -1,  -1,  -1, ...,  -1,  -1,  -1],
       [ -1,  -1,  -1, ...,  -1,  -1,  -1]], dtype=int16)

That works correctly, the only processing then done is:

pulseheights *= .57
hist, bins = numpy.histogram(pulseheights, bins=numpy.linspace(0, 2000, 201))

I tried this manually on data from the EDS, but then correctly got 0 counts in those first bins for detector 3/4.

Since the multiplication with .57 still returns values smaller than 0 and the most left bin edge is 0, I do not see how these value's got to be counted in that first bin.

Other value's smaller than 0 do seem to be ignore.
We have one station with 2 detectors, but those are detectors 3 and 4.
It records -999 as ADC values for detector 1 and 2.
It does not show a black/red line in the first bin.

http://data.hisparc.nl/show/stations/10/2013/5/13/
http://data.hisparc.nl/show/stations/10/2013/5/14/

Create a Vagrant file/box for easy dev environment setup

We should create a Vagrant box to make setting up a development area easy for new people.
This box should use the same OS and setup as out server, so run the same version of mysql/django/python packages etc..

Better checking when to show a page, and when to show links to a page

This works, and there is data in 2011 for this station: http://data.hisparc.nl/django/show/stations/7002/2011/10/1/

However, this page does not work; http://data.hisparc.nl/django/show/stations/7002/2011/3/25/
Yet, other years link to this page. So we need more robust checking if it is ok to link to another page.

Move those 404 queries to separate functions?

KASCADE station cluster

On 7-1-2011 the KASCADE station (701) was changed from Cluster Karlsruhe to Cluster Amsterdam (subcluster Karlsruhe).
This can cause some problems when, for instance, downloading data.
When working on dates prior to or on 7-1-2011 the script will look for station 701 in the cluster_amsterdam group while it should (also) be looking in the cluster_karlsruhe group.

To fix this, we have renumbered station 701 to 70001 and moved it back to Karlsruhe.
Now station_701 nodes should be renamed to station_70001, and some need to be moved from the cluster_amsterdam group to the cluster_karlsruhe group.

Rights for /var/log/hisparc-update.log

The hisparc user has a cronjob set to run hisparc-update.py and log to /var/log/hisparc-update.log. This log is not yet created in the provisioning and hisparc has no rights to write to /var/log/.

So add an empty hisparc-update.log file and give ownership to hisparc.

Date navigation on station histograms page

If you are on a station histograms page, seeing the page for 'tomorrow' or 'yesterday' is not easy. The calendar navigation is nice, but not optimal. Certainly not for quickly walking through all data.

Pulse integral histogram

The Pulse integral histograms needs its x-axis to be fixed.
At the moment the x-values are in (mVsample) (1 sample = 2.5 ns)
They should be in (mVns).

Add link to latest HiSPARC Installer

Add a link that will always get the latest available installer.
Currently it is a manual operation to first upload a new installer to the server and the fix the links to the installer.

Make standard sessions for jSparc

Make 'default' analysis session sets for jSparc.

'Prevent' issues with requests
Easy starting, no waiting
Make new results for each new analysis of standard set

More meaningful event histogram

With the work by Richard Bartels we should be able to make a more meaningful 'Events Histogram'.
By cutting the events that we count by using the MPV peak we reduce the influence of the temperature of the PMT, High Voltage shifts and some other effects that can change the number of events detected, (while we have the same number of actual showers).

(When this change has been implemented and all histograms will be updated this should be combined with the histogram update mentioned in issue #1)

consolidate map scripts into external script

Most usages for the OpenLayers Maps now have all code in their html files, meaning that they don't share code and global changes are hard to implement.
A large portion of this code should be moved to external files.

Pulseheight unit: millivolt vs ADC counts

The pulseheight histogram on the station overview page is currently shown as number-of-events vs pulseheight-in-millivolt.

If we check the pulseheight most-probable-value via the station status page (nagios), the MPV is stated in units of ADC counts (without mentioning the unit).

This discrepancy in usage of units is confusing, even if the units are stated. Converting between units is a menial task. Therefore it is suggested to use only one unit everywhere for the pulseheight.

New stations not automatically added to 'Detector HiSPARC'

Found trying to analyse a jsparc session.

New detector stations are not automatically added to Inforecords -> 'Detector HiSPARC'
Even after they have uploaded their first config with GPS position
This information is used by the new stations on map view and also jsparc
This should perhaps be automatically updated whenever a new (different) config is found

Improve sidebar

html/css currently messes up the layout a bit for Status views. (Help link over nagios elements)

wrap entire sidebar in a div to float right?

status counts on stations page count those with 'warning' also as 'up'.

Definitions of status:
Up - Host is Ok
Warning - Host is Ok but a service is Critical
Down - Host is Down

All stations in Warning are also in Up.
This can be confusing, since we also show only 1 color indicator next to the station names.

GPS coordinates of 0, 0, 0 should be banned!

Add filter to ignore 0, 0, 0 GPS coordinates.
Use last known (good) coordinates.

Pulse integral (and heights) overflows

Keep record of overflows in pulse integral/pulseheight histograms.
Currently those events are simply ignored.

Find a nice way to show them (all in last bin?)

Looking at a coincidence in the site administration 'hangs' server

Looking at a coincidence causes a very long load of a list of all coincidences, this lookup causes the server to use 100%, hanging it for some time.

Analysis sessions space usage

The coincidences events use a lot of space (probably due to their traces)

$ du -h /var/lib/mysql/publicdb/*.MYD
779M    coincidences_event.MYD
659M    histograms_dailydataset.MYD
302M    histograms_dailyhistogram.MYD
<25M    [rest]

We should clean up events after they have been analysed or when a session has ended, since they will not be used again.

Except perhaps for the 'Get Example' option in jSparc, so there should always be some coincidences/events (make a couple of never ending 'ghost' sessions?)..

Recreate histograms for all data

To make sure all pulse integral histograms are correct.

Since the recent updates the pulse integral is now correctly calculated (* 2.5 to go from mVsample to mVns)

Fix all csrf problems

Some Django version added csrf protection.
This is required by jSparc (requesting analysis sessions) and also for logging in on the admin area.
But it should not be enabled when requesting an url to download data. Currently the jSparc and downloading work correctly, but the admin area does not.

Django 1.7 changes

Things to lookout for when upgrading to Django 1.7:

Requries Python 2.7, should not be an issue.
Schema migrations now handled by Django, so remove South and migrate the migrations to Django.
New method on custom Field subclasses
Initial data loading has changed, especially when migrating
Form validation changes (no need to return cleaned_data, etc)

Check this page for more changes:
https://docs.djangoproject.com/en/1.7/releases/1.7/

Weather plots x-axis

The weather plots claim to use UTC on the x-axis, but seem (according to the tick labels) to show data from 23h (previous day) until 23h (current day).
When downloading the source data (csv) for a plot the first timestamp is the correct date (i.e. 1330560001 = March 1, 2012, 00:00:01)
So somehow the conversion from timestamp goes wrong.

Moreover, the first label of the x-axis is the date, this should be 0h, like in the event histogram.
The number of ticks on the x axis should be make the same for those plots.

(Example http://data.hisparc.nl/show/stations/501/2012/3/1/ )

Num events in API incorrect

The num events in the API for day/month/year uses the num_events in Summaries.
These use the number of rows in the datastore files.
So duplicate events are not yet filtered from this number.
Number of events per hour are based on esd data, and are therefore correct.

possible solutions:

To get the number of events per day use sum of the histogram. This works but is probably slower.
Add a new field to the Summary: num_events_esd. To store the filtered event number.

Sparklines on stations page

It would be nice to have sparklines (jquery.sparkline) on the Stations page as extra indicator of how well a station is performing.

Area for station-logs

We want to add pages with accompanying forms that allow us to keep a log of changes/repairs/errors/problems with HiSPARC stations.

This will help us keep track of all problems, and for example know if somebody else has experienced the same problem before and why someone changed some settings, etc.

The forms should ask for the following info:

Name (perhaps linked to your account)
Station
Problem category (electronics, software, notes, etc..)
Log entry

And should also store the date of when this entry is submitted.

The Page which hows these logs should be able to show the log for one station or multiple/all stations.
And perhaps also include information from the station, when it submitted new configs or did a diagnostic check.

[develop] jSparc: 'session_hash' is not defined

Sending a coincidence analysis result back to jSparc (/publicdb/jsparc/result/) results into the following error message:

ERROR:django.request:Internal Server Error: /jsparc/result/

Traceback (most recent call last):
File "/usr/local/lib/python2.6/site-packages/django/core/handlers/base.py", line 111, in get_response
response = callback(request, _callback_args, *_callback_kwargs)
File "../django_publicdb/jsparc/views.py", line 115, in result
assert coincidence.session.hash == session_hash

NameError: global name 'session_hash' is not defined

Change plotting engine to javascript

Currently the server has to generate each plot (unless already in cache) when someone requests to see it.

The problems with this are:

it is costs the server some cpu time to calculate these images
the images take up storage on the server (or more cpu if they have to be regenerated every time)
sending the images takes bandwidth, and time on slow connections
it also means that if we want to make a change to some plot the server will have to dump the entire cache

If we use a Javascript library (like flot or jqplot):

we only need to send the javascript libraries once, they will be cached by browsers and reused on all pages
then we only need to send them the numbers (arrays) required to make the plots and the script/functions that determine how the plots must look.
the plots can be made interactive

At the moment I see one problem with this:

the plots can no longer be easily be downloaded, since they are no longer simple images (use screenshots..)

Django 1.6 changes

Things to watch out for when we upgrade to Django 1.6:

The BooleanField will use None as a default instead of False. This may break some scripts that save new models because None will not be an accepted value for a BooleanField. So set the defaults to False in the models and create migrations. Updated the models where necessary, no migrations needed. b31b50b
Replace import of update_wrapper, no longer included in Django because it is in the standard library of Python (1.6 had new minimum Python version). But it was not used, so removed the import. d32fda2
Addition of QuerySet.datetimes(). This changes QuerySet.dates(), which we use. date() now returns a date objects instead of a datetime objects. However, we only use days/months/years. So should be fine. Moreover, the fields on which .dates() is used are DateFields which is what they should be.
Changes in url reverse(), this needs to be fixed in inforecords/models.py:359 Apperantly this does not affect us.

I don't think I saw any other Backwards incompatible changes that affect us, but that will need to be tested of course.

https://docs.djangoproject.com/en/1.6/releases/1.6/

Filter 'no data' (-999) from temperature plots

Apparently when a weather station had bad (temperature) data it simply returns -999 (°C).
This data is currently simply plotted by the publicdb, this makes it harder to see the useful data around ~10 °C
http://data.hisparc.nl/django/show/stations/501/2012/3/1/

(See #1)

The right axis of the javascript plots is to thick

The border is 2 pixels thick but should be only one pixel thick, like the left axis.