cms-gem-daq-project / gem-light-dqm
GEM light DQM code
Linked to cms-gem-daq-project/ldqm-browser#13
Need to update the getConfig function:
2. The control sequence will detail which packet of 8 channels is non-zero (e.g. 1001 means the first and last are non-zero)
Changing the last 16 digits of line 1 from data to a control sequence is easy, but that control sequence is essential for keeping track of channels
I have implemented an array uint8_t packets[16] to hold the upcoming packets; this may not be the best way.
In order to correctly read off the CRC after all the packets, I needed to count the number of 1's in the control sequence. The method I used may not be optimal.
The variable position of the CRC means that a lot of empty space may be required at the end of a particular line in order to keep the VFAT payload one block of 64-bit lines.
At the moment changes are only being made to the VFAT read_fw, sw, tw methods in GEMAMC13EventFormat.h, but changes will also need to be made to their instances in the unpacker C++ file (read_fw now accepts two arguments, one to switch modes).
I am just starting this project; at the moment the changes look like they may only need to be applied locally, but that may change as I look into the surroundings of the unpacker file.
Below is a schematic that compares the two types of VFAT payloads
Latency scans show values of zero for all the OHs in the cosmic stand.
*.raw files copied to NAS and located in:
/data/bigdisk/GEM-Data-Taking/GE11_QC8/Cosmics/run000456/
unpacker (ferol) and LightDQM executed using runLDQM.sh script:
runLDQM.sh -i 1 -r 456
Normal termination, output file:
/data/bigdisk/GEM-Data-Taking/GE11_QC8/Cosmics/run000456/run000456_allLumi_index000000.analyzed.root
Analysis script shows a bug:
[gemuser@gem904qc8daq ~]$ anaXDAQLatency.py -s 6 --scanmin 50 --scanmax 75 /data/bigdisk/GEM-Data-Taking/GE11_QC8/Cosmics/run000456/run000456_allLumi_index000000.analyzed.root
Traceback (most recent call last):
File "/opt/cmsgemos/bin/anaXDAQLatency.py", line 50, in
if 'gemType' not in inFile.latTree.GetListOfBranches():
NameError: name 'inFile' is not defined
[gemuser@gem904qc8daq ~]$
Fix hardcoded in local copy:
cd /home/gemuser/fsimone
python anaXDAQLatency.py -s 6 --scanmin 50 --scanmax 75 /data/bigdisk/GEM-Data-Taking/GE11_QC8/Cosmics/run000456/run000456_allLumi_index000000.analyzed.root -r 456
Output plots are empty
[gemuser@gem904qc8daq ~]$ anaXDAQLatency.py -s 6 --scanmin 50 --scanmax 75 /data/bigdisk/GEM-Data-Taking/GE11_QC8/Cosmics/run000456/run000456_allLumi_index000000.analyzed.root
I've noticed there's a long processing time for runs which somewhat impedes rapid turn-around from data-taking to results. For example run000029 finished yesterday at 21h41:
-rw-r-----. 1 GE11COSMICS gemuser 7.2M Nov 27 21:41 run000029_LatencyScan_TIF_2016-11-27_chunk_691.dat
-rw-r-----. 1 GE11COSMICS gemuser 3.6M Nov 27 21:41 run000029_LatencyScan_TIF_2016-11-27_chunk_692.dat
-rw-r-----. 1 GE11COSMICS gemuser 3.6M Nov 27 21:41 run000029_LatencyScan_TIF_2016-11-27_chunk_693.dat
However this morning it is still processing ~11 hours after the fact (current time is 8h21):
[GE11COSMICS@cosmicstandtif ~]$ ls -lhtr $GEM_DATA_DIR/run000029* | grep -v "chunk"
-rw-r--r--. 1 GE11COSMICS GE11COSMICS 11M Nov 28 08:21 /data/bigdisk/GEM-Data-Taking/GE11_QC8//run000029_LatencyScan_TIF_2016-11-27.analyzed.root
-rw-r--r--. 1 GE11COSMICS GE11COSMICS 98M Nov 28 08:21 /data/bigdisk/GEM-Data-Taking/GE11_QC8//run000029_LatencyScan_TIF_2016-11-27.raw.root
-rw-r--r--. 1 GE11COSMICS GE11COSMICS 790M Nov 28 08:21 /data/bigdisk/GEM-Data-Taking/GE11_QC8//run000029_LatencyScan_TIF_2016-11-27.dat
This is additionally troublesome if runs are taken together. Right now run000028 has finished but hasn't started processing yet:
[GE11COSMICS@cosmicstandtif ~]$ ls -lhtr $GEM_DATA_DIR/run000028*
...
...
...
-rw-r-----. 1 GE11COSMICS gemuser 495K Nov 27 21:07 /data/bigdisk/GEM-Data-Taking/GE11_QC8//run000028_LatencyScan_TIF_2016-11-27_chunk_555.dat
[GE11COSMICS@cosmicstandtif ~]$ ls -lhtr $GEM_DATA_DIR/run000028* | grep -v "chunk"
[GE11COSMICS@cosmicstandtif ~]$
Discussed briefly with Jared: maybe multiple file I/O operations are slowing things down? Could we see an improvement by either increasing the number of events in a "chunk" file, or by having the daemon merge X chunk files together and then start processing from this new "merged" set of files? Alternatively, for LS2 should we invest in a multicore machine dedicated to this purpose (say n_core >= 16)?
In the meantime I will process desired runs by hand (cat'ing together the "chunk" files).
There should be no hardcoded DB details! They should be passed as parameters.
A typo has been introduced in the unpacker breaking the unpacking of ferol-saved files
run the unpacker over ferol files produced on integration stand
Patch:
Uncomment
Need to update mapping based on the chamber type (should be done after implementation of DB changes)
@jsturdy @cbravo135
Need to create a daemon which runs a script that takes the raw format from xdaq (*.dat) for all shelves, slots and links, unpacks the data, and for each ohkey() converts it into a gemTreeStructure format.
The idea is for a new executable, gemTreeTranslator, to use the GEMUnpacker.cc in a similar fashion to gemTreeReader and create a set of TFiles (one per AMC/OH pair).
It will fill the following branches:
Brian mentions that "The rest of the parameters could be filled with dummy values or obtained from a DB query of the conditions/configuration DB at runtime"
Does not currently exist.
Should be done within approximately 1 week.
Trying to analyze latency data on cosmicstandtif.cern.ch
Package was already installed on cosmicstandtif.
Following the compilation instructions found in Section 2.2.2 of the "GEM_Light_DQM_Reference_Guide.pdf"
Then launched the DAEMON following the instructions in Section 2.4.1 of the same manual.
Actual command was:
[GE11COSMICS@cosmicstandtif ~]$ python2.7 manage.py runserver cosmicstandtif.cern.ch:5017 >& log-lqdm.log &
Attaching log file showing errors.
log-ldqm.txt
Also tried to run manually:
[GE11COSMICS@cosmicstandtif LightDQM]$ /home/GE11COSMICS/Software/DAQ/gem-light-dqm/gemtreewriter/bin/linux/x86_64_slc6/unpacker /tmp/run000180_LatencyScan_TIF_2016-11-11_chunk_768.dat sdram
/home/GE11COSMICS/Software/DAQ/gem-light-dqm/gemtreewriter/bin/linux/x86_64_slc6/unpacker: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by /home/GE11COSMICS/Software/DAQ/gem-light-dqm/gemtreewriter/bin/linux/x86_64_slc6/unpacker)
The current version of gem-light-dqm consumes all available RAM plus a very large swap space, in total amounting to around 13 GB. The initial suspicion that we have a memory leak somewhere appears to be incorrect. In fact, such large memory consumption is not a bug, but a "feature".
According to R. Brun, the size of a histogram in memory equals:
sizeof(TObject) + sizeof(Title) + sizeof(Name) + sizeof(Type_t)*(n_bins+2)
where TObject is the histogram type (e.g. TH1D) and Type_t is its data type (e.g. Double_t for TH1D).
I have calculated the memory size for the VFAT histograms and got the following:
root [5] s = sizeof(TH1D) + (4096+2)*sizeof(Double_t)+sizeof("BC")+sizeof("Bunch Crossing Number")
(int) 33809
root [8] int vfatH_size=0
(int) 0
root [10] vfatH_size +=s
(int) 33809
root [11] s = sizeof(TH1F)+sizeof("n_hits_per_event")+sizeof("n_hits_per_event")+sizeof(Float_t)*131
(int) 1558
root [12] vfatH_size +=s
(int) 35367
root [13] s = sizeof(TH1F)+sizeof("EC")+sizeof("Event Counter")+sizeof(Float_t)*257
(int) 2045
root [14] vfatH_size +=s
(int) 37412
root [15] s = sizeof(TH1F)+sizeof("Header")+sizeof("Header")+sizeof(Float_t)*34
(int) 1150
root [16] vfatH_size +=s
(int) 38562
root [17] s = sizeof(TH1F)+sizeof("SlotN")+sizeof("Slot Number")+sizeof(Float_t)*26
(int) 1122
root [18] vfatH_size +=s
(int) 39684
root [19] s = sizeof(TH1F)+sizeof("FiredChannels")+sizeof("FiredChannels")+sizeof(Float_t)*130
(int) 1548
root [20] vfatH_size +=s
(int) 41232
root [21] s = sizeof(TH1F)+sizeof("FiredStrips")+sizeof("FiredStrips")+sizeof(Float_t)*130
(int) 1544
root [22] vfatH_size +=s
(int) 42776
root [23] s = sizeof(TH1F)+sizeof("crc")+sizeof("check sum value")+sizeof(Float_t)*65537
(int) 263168
root [24] vfatH_size +=s
(int) 305944
root [25] s = sizeof(TH1F)+sizeof("crc_calc")+sizeof("check sum value recalculated")+sizeof(Float_t)*65537
(int) 263186
root [26] vfatH_size +=s
(int) 569130
root [27] s = sizeof(TH1F)+sizeof("crc_difference")+sizeof("difference between crc and crc_calc")+sizeof(Float_t)*65537
(int) 263199
root [28] vfatH_size +=s
(int) 832329
root [29] s = sizeof(TH1D)+sizeof("latencyScan")+sizeof("Latency Scan")+sizeof(Double_t)*258
(int) 3089
root [30] vfatH_size +=s
(int) 835418
root [31] s = sizeof(TH1D)+sizeof("latencyBXdiffScan")+sizeof("Latency Scan BX subtracted")+sizeof(Double_t)*4354
(int) 35877
root [32] vfatH_size +=s
(int) 871295
root [33] s = sizeof(TH2F)+sizeof("latencyScan2D")+sizeof("Latency Scan: Chan Vs Latency")+sizeof(Float_t)*1026*130
(int) 534596
root [34] vfatH_size +=s
(int) 1405891
root [35] s = sizeof(TH2D)+sizeof("latencyScanBX2D")+sizeof("Latency Scan vs BX")+sizeof(Double_t)*258*4098
(int) 8459339
root [36] vfatH_size +=s
(int) 9865230
root [37] s = sizeof(TH2F)+sizeof("latencyScanBX2D_extraHighOcc")+sizeof("Latency Scan vs BX when number of fired channels is greater than 100")+sizeof(Float_t)*258*4098
(int) 4230266
root [38] vfatH_size +=s
(int) 14095496
root [39] s = sizeof(TH1F)+sizeof("thresholdScan")+sizeof("Threshold Scan")+sizeof(Float_t)*258
(int) 2061
root [40] vfatH_size +=s
(int) 14097557
root [41] s = sizeof(TH2F)+sizeof("thresholdScan2D")+sizeof("Threshold Scan")+sizeof(Float_t)*258*130
(int) 135223
root [42] vfatH_size +=s
(int) 14232780
root [43] s = sizeof(TH1F)+sizeof("thresholdScanXXX")+sizeof("Threshold Scan XXX")+sizeof(Float_t)*258
(int) 2068
root [44] vfatH_size += s*128
(int) 14497484
which gives ~14.5 MB per single VFAT.
For the current system of 3 AMCs, assuming all links are active, this gives ~14.5 MB x 24 VFATs x 12 OHs x 3 AMCs = ~12.5 GB of memory required - and this doesn't account for the GEB, AMC and AMC13 histograms.
The application should not consume all available RAM
Consumes about 7GB of RAM + 6GB of swap.
Just run the tool and look at the output of the top command.
Such huge memory consumption pushes the limits of our QC8 PC and leads to crashes in the dqm code. This is caused by improper application of the light dqm tool, which was initially developed as an expert tool for debugging small systems. As a short-term solution I propose to strip down the number of histograms, mainly the VFAT ones: two-dimensional latency scans and per-channel threshold scan distributions, plus some others. This will allow running the tool with fairly modest RAM consumption (3-4% of total RAM) and make the application much more stable. Of course, a central online DQM has to be implemented in the near future.
develop
Due to the hardcoded dependence on gcc 4.9.3, the DQM code was not building on cosmicstandtif
I installed 4.9.3 and it failed due to some include errors
We need to ensure that the environment on each machine is set up identically, but I don't know what's missing at the moment
These have been listed by their order in the 3-bit registry
2. If we have a packet with absolutely no hit on any channel, then SZP and SZD activate
a. SZP will not send any information except a Header (no EC, BC, or CRC - nothing)
b. SZD will send the entire packet except the data.
3. The hardest to implement will be DT (sequential zero suppression)
c. This actively partitions the channels into 16 sections of 8 channels. This has been described in the last issue.
2. In a short period, I will implement the other two choices into the code.
At the moment there is no part of the code which verifies which mode is being used (it is set manually), and I imagine the structure of the AMC13 payload may need to be changed in order to give me this information.
That said, at the beginning of each VFAT payload there is a 3-bit unused space - perfect for the required 3-bit number telling us which mode is being used.
There should be some simple way to install binaries in PATH or add the correct binary locations to PATH.
Trying to call either the unpacker or the dqm outside of the directory:
$BUILD_HOME/gem-light-dqm/gemtreewriter/src/dic/
Results in an error:
$ dqm /data/bigdisk/GEM-Data-Taking/GE11_QC8/Cosmics/run000061_LatencyScan_CERNQC8_2018-07-19_chunk_0.raw.root
--==DQM Main==--
Error in <TCling::RegisterModule>: cannot find dictionary module EventDict_rdict.pcm
$ unpacker run000061_LatencyScan_CERNQC8_2018-07-19_chunk_0.dat sdram
[GEMUnpacker]: ---> Main()
Error in <TCling::RegisterModule>: cannot find dictionary module EventDict_rdict.pcm
Some environment variable is not set correctly?
Should not need to call from the directory:
$BUILD_HOME/gem-light-dqm/gemtreewriter/src/dic/
Currently it is necessary to call from the above directory.
According to the lightDQM manual, no instructions are needed to set up any environment for locating this file.
release-v3
bash
Cannot produce the analyzed.root file. Some database issue generates a seg fault and crashes the program.
Should be able to produce the analyzed.root file.
[gemuser@gem904qc8daq dic]$ dqm /data/bigdisk/GEM-Data-Taking/GE11_QC8/Cosmics/run000061_LatencyScan_CERNQC8_2018-07-19_chunk_0.raw.root
--==DQM Main==--
Error connecting to database '' : Can't connect to MySQL server on 'gem904daq01.cern.ch' (111)
*** Break *** segmentation violation
===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0 0x00007f3601f181c9 in waitpid () from /lib64/libc.so.6
#1 0x00007f3601e95e52 in do_system () from /lib64/libc.so.6
#2 0x00007f3601e96201 in system () from /lib64/libc.so.6
#3 0x00007f36080f24b2 in TUnixSystem::StackTrace() () from /usr/lib64/root/libCore.so.6.12
#4 0x00007f36080f4e2c in TUnixSystem::DispatchSignals(ESignals) () from /usr/lib64/root/libCore.so.6.12
#5 <signal handler called>
#6 0x00007f3603674e07 in mysql_send_query () from /usr/lib64/mysql/libmysqlclient.so.18
#7 0x00007f3603674e41 in mysql_real_query () from /usr/lib64/mysql/libmysqlclient.so.18
#8 0x0000000000406786 in simpleDBQuery(st_mysql*, std::string) ()
#9 0x00000000004073e8 in getConfig(std::string) ()
#10 0x000000000040879d in main ()
===========================================================
The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum.
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6 0x00007f3603674e07 in mysql_send_query () from /usr/lib64/mysql/libmysqlclient.so.18
#7 0x00007f3603674e41 in mysql_real_query () from /usr/lib64/mysql/libmysqlclient.so.18
#8 0x0000000000406786 in simpleDBQuery(st_mysql*, std::string) ()
#9 0x00000000004073e8 in getConfig(std::string) ()
#10 0x000000000040879d in main ()
===========================================================
cd $BUILD_HOME/gem-light-dqm/gemtreewriter/src/dic/
dqm $DATA_PATH/Cosmics/run000061_LatencyScan_CERNQC8_2018-07-19_chunk_0.raw.root
Cannot analyze data taken.
When trying to compile via:
source compile.sh
It complains that the command g++-4.9.3 is not found on gem904qc8daq. Is there a particular reason version 4.9.3 is required?
The settings.sh file also seems to reference:
cd $GEM_DQM_ROOT_BASE
source bin/thisroot.sh
cd -
export LD_LIBRARY_PATH=/usr/local/gcc/4.9.3/lib64:$LD_LIBRARY_PATH
export PATH=$PATH:$BUILD_HOME/gem-light-dqm/dqm-root/bin/linux/x86_64_slc6:$BUILD_HOME/gem-light-dqm/gemtreewriter/bin/linux/x86_64_slc6
There is no such directory /usr/local/gcc on the machine.
The light dqm manual I have (GEM_Light_DQM_Reference_Guide.pdf) does not specify installation requirements or what $GEM_DQM_ROOT_BASE should be (I guess the location of a system install of ROOT...?).
Compilation seems to work by just using the g++ version 4.8.5 on the machine.
release-v3
bash
Sometimes we receive VFAT blocks with a non-existing ChipID. In this case the slot position can't be recovered and channel-to-strip maps can't be assigned. The code should provide protection against a SegFault in such cases.