Loghub

Loghub maintains a collection of system logs that are freely accessible for AI-driven log analytics research. Some of the logs are production data released from previous studies, while others were collected from real systems in our lab environment. Wherever possible, the logs are NOT sanitized, anonymized, or modified in any way. These log datasets are freely available for research or academic work.

🤗 We proudly announce that the loghub datasets have been downloaded by more than 450 organizations from both industry and academia.

Logs currently available

🔗 Get raw logs via hyperlinks in the Download column.

| Dataset | Description | Labeled | Time Span | #Lines | Raw Size | Download |
| --- | --- | --- | --- | --- | --- | --- |
| 📂 Distributed systems | | | | | | |
| HDFS_v1 | Hadoop distributed file system log | ✔️ | 38.7 hours | 11,175,629 | 1.47GB | 🔗 |
| HDFS_v2 | Hadoop distributed file system log | | N.A. | 71,118,073 | 16.06GB | 🔗 |
| HDFS_v3 | Instrumented HDFS trace log (TraceBench) | ✔️ | N.A. | 14,778,079 | 2.96GB | 🔗 |
| Hadoop | Hadoop mapreduce job log | ✔️ | N.A. | 394,308 | 48.61MB | 🔗 |
| Spark | Spark job log | | N.A. | 33,236,604 | 2.75GB | 🔗 |
| Zookeeper | ZooKeeper service log | | 26.7 days | 74,380 | 9.95MB | 🔗 |
| OpenStack | OpenStack infrastructure log | ✔️ | N.A. | 207,820 | 58.61MB | 🔗 |
| 📂 Super computers | | | | | | |
| BGL | Blue Gene/L supercomputer log | ✔️ | 214.7 days | 4,747,963 | 708.76MB | 🔗 |
| HPC | High performance cluster log | | N.A. | 433,489 | 32.00MB | 🔗 |
| Thunderbird | Thunderbird supercomputer log | ✔️ | 244 days | 211,212,192 | 29.60GB | 🔗 |
| 📂 Operating systems | | | | | | |
| Windows | Windows event log | | 226.7 days | 114,608,388 | 26.09GB | 🔗 |
| Linux | Linux system log | | 263.9 days | 25,567 | 2.25MB | 🔗 |
| Mac | Mac OS log | | 7.0 days | 117,283 | 16.09MB | 🔗 |
| 📂 Mobile systems | | | | | | |
| Android_v1 | Android framework log | | N.A. | 1,555,005 | 183.37MB | 🔗 |
| Android_v2 | Android framework log | | N.A. | 30,348,042 | 3.38GB | 🔗 |
| HealthApp | Health app log | | 10.5 days | 253,395 | 22.44MB | 🔗 |
| 📂 Server applications | | | | | | |
| Apache | Apache web server error log | | 263.9 days | 56,481 | 4.90MB | 🔗 |
| OpenSSH | OpenSSH server log | | 28.4 days | 655,146 | 70.02MB | 🔗 |
| 📂 Standalone software | | | | | | |
| Proxifier | Proxifier software log | | N.A. | 21,329 | 2.42MB | 🔗 |
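
Several of the raw logs listed above are tens of gigabytes, so it is usually safer to stream them than to load them into memory. The following is a minimal sketch, not part of loghub itself, of reading a downloaded log line by line and counting its lines, for example to compare against the #Lines column. The function names and the UTF-8 default are assumptions made for illustration; because the logs are not sanitized, undecodable bytes are replaced rather than treated as errors.

```python
from pathlib import Path
from typing import Iterator


def iter_log_lines(path: Path, encoding: str = "utf-8") -> Iterator[str]:
    """Yield log lines one at a time without loading the whole file."""
    # errors="replace" substitutes U+FFFD for bytes that do not decode,
    # so a single malformed byte does not abort a multi-gigabyte scan.
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")


def count_lines(path: Path) -> int:
    """Count the lines in a raw log file by streaming through it."""
    return sum(1 for _ in iter_log_lines(path))
```

A call such as `count_lines(Path("HDFS.log"))` would then return the line count of a local copy; the file name here is hypothetical and depends on how the downloaded archive is unpacked.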

🔥 Citation

Please cite the following paper if you use the loghub datasets in your research.

Publications using loghub datasets

| Publication | Paper Title |
| --- | --- |
| DSN'07 | Adam J. Oliner, Jon Stearley. What Supercomputers Say: A Study of Five System Logs. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2007. |
| SOSP'09 | Wei Xu, Ling Huang, Armando Fox, David A. Patterson, Michael I. Jordan. Detecting Large-Scale System Problems by Mining Console Logs. ACM Symposium on Operating Systems Principles (SOSP), 2009. |
| KDD'09 | Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios. Clustering Event Logs Using Iterative Partitioning. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009. |
| ISSRE'16 | Shilin He, Jieming Zhu, Pinjia He, Michael R. Lyu. Experience Report: System Log Analysis for Anomaly Detection. IEEE International Symposium on Software Reliability Engineering (ISSRE), 2016. |
| DSN'16 | Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. An Evaluation Study on Log Parsing and Its Use in Log Mining. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2016. |
| ICSE'16 | Qingwei Lin, Hongyu Zhang, Jian-Guang Lou, Yu Zhang, Xuewei Chen. Log Clustering Based Problem Identification for Online Service Systems. International Conference on Software Engineering (ICSE), 2016. |
| ICWS'17 | Pinjia He, Jieming Zhu, Zibin Zheng, Michael R. Lyu. Drain: An Online Log Parsing Approach with Fixed Depth Tree. IEEE International Conference on Web Services (ICWS), 2017. |
| CCS'17 | Min Du, Feifei Li, Guineng Zheng, Vivek Srikumar. DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning. ACM Conference on Computer and Communications Security (CCS), 2017. |
| TDSC'18 | Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. Towards Automated Log Parsing for Large-Scale Log Data Analysis. IEEE Transactions on Dependable and Secure Computing (TDSC), 2018. |
| TKDE'18 | Min Du, Feifei Li. Spell: Online Streaming Parsing of Large Unstructured System Logs. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018. |
| ASE'19 | Jinyang Liu, Jieming Zhu, Shilin He, Pinjia He, Zibin Zheng, Michael R. Lyu. Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression. IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019. |
| ICSE'19 | Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. Tools and Benchmarks for Automated Log Parsing. International Conference on Software Engineering (ICSE), 2019. |
| ICSE'22 | Zanis Ali Khan, Donghwan Shin, Domenico Bianculli, Lionel Briand. Guidelines for Assessing the Accuracy of Log Message Template Identification Techniques. International Conference on Software Engineering (ICSE), 2022. |
| ICSE'23 | Van-Hoang Le, Hongyu Zhang. Log Parsing with Prompt-based Few-shot Learning. International Conference on Software Engineering (ICSE), 2023. |
| ICSE'23 | Zhenhao Li, Chuan Luo, Tse-Hsun Chen, Weiyi Shang, Shilin He, Qingwei Lin, Dongmei Zhang. Did We Miss Something Important? Studying and Exploring Variable-Aware Log Abstraction. International Conference on Software Engineering (ICSE), 2023. |
| ICSE'23 | Yintong Huo, Yuxin Su, Cheryl Lee, Michael R. Lyu. SemParser: A Semantic Parser for Log Analysis. International Conference on Software Engineering (ICSE), 2023. |
| WWW'23 | Liming Wang, Hong Xie, Ye Li, Jian Tan, John C.S. Lui. Interactive Log Parsing via Light-weight User Feedback. ACM Web Conference (WWW), 2023. |
| TSC'23 | Siyu Yu, Pinjia He, Ningjiang Chen, Yifan Wu. Brain: Log Parsing with Bidirectional Parallel Tree. IEEE Transactions on Services Computing (TSC), 2023. |

💡 If you use loghub datasets in your paper, please feel free to open a PR to add your paper to the table.

Discussion

You are welcome to join our WeChat group for questions and discussion. Alternatively, you can open a discussion here.

🌈 License

The datasets are freely available for research or academic work. For any usage or distribution of the datasets, please refer to the loghub repository URL https://github.com/logpai/loghub and cite the loghub paper where applicable.
