Beacon is an end-to-end I/O resource monitoring and diagnosis system, for the 40960-node Sunway TaihuLight supercomputer, current ranked world No.3. Beacon simultaneously collects and correlates I/O tracing/profiling data from all the compute nodes, forwarding nodes, storage nodes and metadata servers. With mechanisms such as aggressive online+offline trace compression and distributed caching/storage, it delivers scalable, low-overhead, and sustainable I/O diagnosis under production use.
With its deployment on TaihuLight for around 18 months, it has successfully helped center administrators identify obscure design or configuration flaws, system anomaly occurrences, I/O performance interference, and resource under- or over-provisioning problems. Several of the exposed problems have already been fixed, with others being currently addressed. In addition, Beacon can be adopted by other platforms. Beacon's building blocks, such as operation log collection and compression, scheduler-assisted per-application data correlation and analysis, history-based anomaly identification, automatic I/O mode detection, and built-in interference analysis, can all be performed on other supercomputers.
This is joint work among 5 institutes, Shandong University, Tsinghua University, Qatar Computing Research institute, Emory University, and National Supercomputing Center in Wuxi.
- NSDI19, End-to-end I/O Monitoring on a Leading Supercomputer, Bin Yang, Xu Ji, Xiaosong Ma, Xiyang Wang, Tianyu Zhang, Xiupeng Zhu, Nosayba El-Sayed, Haidong Lan, Yibo Yang, Jidong Zhai, Weiguo Liu, and Wei Xue, PDF
- FAST19, Automatic, Application-Aware I/O Forwarding Resource Allocation, Xu Ji, Bin Yang, Tianyu Zhang, Xiaosong Ma, Xiupeng Zhu, Xiyang Wang, Nosayba El-Sayed, Jidong Zhai, Weiguo Liu, and Wei Xue, PDF
We are now cleaning up our codes and gradually open source Beacon code/Data collected on Sunway TaihuLight, including monitoring and analysis methods.
This part helps users to deploy Beacon on their clusters.
- Linux OS
- Python 2.7 or above
- C
- Elasticserach-1.5 or above
- Redis-3.0 or above
- Logstash-1.5 or above
Prepare monitoring nodes (Beacon's client will collect data on these nodes), storage nodes (Beacon's server will store data on these nodes), and a dedicated visualization node (Beacon's web server will run on this node)
- Deploy monitoring daemons on monitoring nodes
- Deploy Elasticsearch + Redis + Logstash on storage nodes
- Deploy web server on the dedicated visualization node
Before run Beacon's client (monitoring daemons) on monitoring nodes, we should configure Beacon's storage server first. Below will show how to configure Beacon's sotrage server after we install Elasticsearch, redis, logstash successfully.
- Use logstash to collect messages from monitoring daemons
input {
file {
type => "lala test"
path => "/root/testfile"
}
file {
type => "sys messages"
path => "/var/log/messages"
}
tcp {
port => "9999"
codec => "plain"
}
stdin {
}
}
output {
stdout {
codec=>rubydebug
}
}
output {
redis {
host => 'localhost'
data_type => 'list'
port => '6379'
key => 'logstash:redis'
}
}
- Use logstash to extract message from Redis and store it to the Elasticsearch
input {
redis {
host => 'localhost'
data_type => 'list'
port => "6379"
key => 'logstash:redis'
type => 'redis-input'
}
}
output {
stdout {
codec=>rubydebug
}
}
output {
elasticsearch {
host=>localhost
cluster=> "elasticsearch_cluster"
}
}
- Use Redis to cache messages
pidfile /var/run/redis.pid
port 6379
timeout 0
loglevel verbose
logfile /var/log/redis.log
dbfilename dump.rdb
dir /root/ELK/redis/db/
## vm-swap-file /tmp/redis.swap
- Use Elasticsearch to store data
cluster.name: elasticsearchzhengji
# default configuration
We can nearly use the default configuration. However, remember to set the same cluster name and ensure these backend nodes are in the same network segment.
Our code will be opened source in /sys, including monitoring module, analysis module and web interface. For more detailed information, please read README in sys directory.
The directory /data is used to store released data, and we plan to release data gradually.
Step to obtain the data:
- We put released data on cloud
- We share the link here
- Anyone can obtian these data by access the link here with fetchCode
8pja
- New released data on our cloud!!!(30 days since from 2019-07-01)
The released dataset is a sample 14-day (start from 2018-01-01, the total data size is about 50GB compressed) multi-level I/O monitoring dataset from the TaihuLight Supercomputer, currently the world's 3rd largest.
Original data format are shown below:
index-name || data-type || id || score || message || @version || @timestamp || host
After data cleaning:
message || timestamp || host
In order to release data, we perform a mapping strategy:
- Every file or directory will be instead by a hash value.
- Every User will be instead by "Userxxx"
Example:
Original:
[2018-09-10 14:16:52] T OPEN() /User_storage/job1/file1/file2/file3/file4/file5 => 0x1200bd3f0
After mapping:
[2018-09-10 14:16:52] T OPEN() /User146/6596814368836924247/-1160749754054947605/-8481035609384531935/2230746621555036977/756880090362066628/-1752974055252976644 => 0x1200bd3f0
By the way, for open opearation
: There may be two same open opearation in the trace. e.g:
[2018-09-10 14:16:52] T OPEN() /User_storage/job1/file1/file2/file3/file4/file5 => 0x1200bd3f0
[2018-09-10 14:16:52] T OPEN() /User_storage/job1/file1/file2/file3/file4/file5
We can find that only one open operation has the file descriptor. The reason for this phenomenon is that one operation is the request initiator without file descriptor, and the other operation has alreadly received the request comletation signal with file descriptor.
- ES_COMP (Data collected by Beacon from compute nodes node by node)
- ES_FWD1 (Data collected by Beacon from default forwarding nodes)
- ES_FWD2 (Data collected by Beacon from rest forwarding nodes)
- ES_MDS (Data collected by Beacon from MDS)
- ES_Latency (Data collected by Beacon from forwarding nodes, on LWFS servers, including queue length and latency)
- ES_OST1 (Data collected by Beacon from default storage nodes)
- ES_OST2 (Data collected by Beacon from rest storage nodes)
- Job
(Data collected by Beacon from applications running on the TaihuLight
# PS: we are still applying for open sourece this part.
)
More data will be released, if you are interested, please follow us.
Thanks for checking this project out! I hope you find it useful. Of course, there's always room for improvement. Feel free to open an issue so we can make Beacon better, stronger, faster.
Also, if you have any questions,contact us:
Email: [email protected].