Comments (4)
After some preliminary tests, it turns out that if a smile file is further compressed into a zip, it consumes around ten times less space. In practical terms, the DAQAggregator for cdaq could possibly write out around a GB per day (and the minidaqs would add another GB or so).
The simplest solution would be to zip every serialized snapshot individually and unzip it before deserialization. Retroactively zipping/unzipping collections of snapshots (e.g. per day or per hour), on the other hand, would cause disproportionately large delays when a single snapshot is requested, for example in the go-back-in-time use case of DAQView, and it would probably not save more space anyway.
This approach would require adding a zip unit to the serializer and an unzip unit to the deserializer. Before going for this approach, it would also be a good idea to check whether the real-time applications can afford the latency introduced by the extra steps.
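The zip/unzip units described above could be a thin pair of helpers around `java.util.zip`. The following is a minimal sketch only; the class and method names are illustrative, not taken from DAQAggregator:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for wrapping serialized smile bytes in a zip archive.
public class SnapshotZip {

    // Compress the serialized smile bytes into a single-entry zip archive.
    public static byte[] zip(byte[] smileBytes, String entryName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(smileBytes);
            zos.closeEntry();
        }
        return out.toByteArray();
    }

    // Extract the first (and only) entry back into smile bytes.
    public static byte[] unzip(byte[] zipBytes) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            if (zis.getNextEntry() == null) {
                throw new IOException("empty zip archive");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = zis.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

The serializer would call `zip` right after producing the smile bytes, and the deserializer would call `unzip` right before parsing them.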
from daqaggregator.
How about the following scheme: the snapshot is written uncompressed, i.e. no latency is added for the live view. After some time, e.g. 24 hours or a week, a cron job zips the snapshots. The deserializer would need to look for both kinds of files: if no unzipped snapshot is available, it tries to find the corresponding zipped one and decompresses it. This way you introduce no latency for newer snapshots, which are more likely to be requested, and save space in long-term storage.
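The lookup fallback in this scheme could be sketched as below. The directory layout, file extensions and method names are assumptions for illustration only:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical resolver for the hybrid scheme: recent snapshots are plain
// .smile files, older ones have been replaced by .smile.zip by a cron job.
public class SnapshotLookup {

    // Prefer the uncompressed file (no unzip latency for recent snapshots),
    // fall back to the zipped one; return null if neither exists.
    public static Path resolve(Path dir, String baseName) {
        Path plain = dir.resolve(baseName + ".smile");
        if (Files.exists(plain)) {
            return plain;
        }
        Path zipped = dir.resolve(baseName + ".smile.zip");
        return Files.exists(zipped) ? zipped : null;
    }
}
```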
Yes, this sounds like a nice hybrid scheme and it keeps compression on individual snapshot level, which is the important part for DAQView replays.
Both kinds of files could actually be stored in the existing time-based directory structure, and the deserializer would just need to inspect the filename extension and apply whichever procedure applies (deserialize, or unzip then deserialize). The lookup could probably stay unchanged. The only drawback I see is maintaining the extra cron script, but the advantages clearly outweigh it.
If we only implement the simple solution of zipping/unzipping every snapshot individually, there will not be a significant delay.
I have tested the times on my PC over an hour of last Saturday (~1000 files), during an ongoing run with around 3/4 of the partitions in and running. Based on this I assume there was a large variety of values within each snapshot, which is usually the case during normal runs, so the task's difficulty was realistic.
Overall, the time to read a smile file, zip it, write it, read the zipped file, unzip it and write it again as smile (4 I/Os, 1 compression, 1 decompression) was estimated at less than 20 ms. This should be fine for real-time monitoring. A further micro-optimized implementation could possibly save a few more milliseconds by pipelining smile to zip on the fly, without doing all the I/O performed during the test.
There was not much deviation in time, because there was not much deviation in snapshot sizes either. Snapshots in .smile were around 369 kB, while their .zip counterparts were 57 kB (so zipping saves ~85% in space).
The snapshot directories need not be changed at all; they could simply contain zip files once the implementation goes into production. For backwards compatibility, the deserializer should always check whether a file is actually a zip before applying the unzip step.
The utility libraries to implement this already come with Java (java.util.zip); there is no need for an external library.
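The backwards-compatibility check mentioned above can rely on the fact that every zip archive begins with the local-file-header signature "PK\3\4" (bytes 0x50 0x4B 0x03 0x04). A minimal sketch (class name is illustrative):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical detector: decide whether a stored snapshot is a zip archive
// or a plain smile file by checking the zip magic bytes at the start.
public class ZipDetector {

    public static boolean isZip(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] sig = in.readNBytes(4);
            return sig.length == 4
                    && sig[0] == 0x50   // 'P'
                    && sig[1] == 0x4B   // 'K'
                    && sig[2] == 0x03
                    && sig[3] == 0x04;
        }
    }
}
```

Anything that fails this check would be handed to the smile deserializer directly, so pre-existing uncompressed snapshots keep working unchanged.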