lukeshirnia / out-of-memory
Out Of Memory Analyzer can be used to obtain and summarize "out of memory" issues logged by a Linux kernel (when oom-killer is invoked)
License: Apache License 2.0
Fix date objects so that incidents sort chronologically rather than alphabetically, as they do now
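A minimal sketch of the chronological sort, assuming the timestamps look like the usual syslog prefix (`Aug 14 03:12:45`) and that the year is supplied separately, since that prefix does not carry one. The function name `parse_syslog_timestamp` and the sample timestamps are hypothetical and not taken from the script.

```python
from datetime import datetime


def parse_syslog_timestamp(stamp, year):
    """Parse a syslog-style timestamp such as 'Aug 14 03:12:45'.

    The year is an assumption passed in by the caller, because the
    syslog prefix itself does not include one.
    """
    # Parse the year together with the rest so that Feb 29 is accepted.
    return datetime.strptime("%d %s" % (year, stamp), "%Y %b %d %H:%M:%S")


def sort_chronologically(stamps, year):
    """Return the timestamps ordered by date rather than by string value."""
    return sorted(stamps, key=lambda s: parse_syslog_timestamp(s, year))


# Hypothetical sample data: a plain sorted() would put "Aug" first.
stamps = ["Aug 14 03:12:45", "Jan  3 10:00:00", "Feb 10 23:59:59"]
ordered = sort_chronologically(stamps, 2017)
```

Sorting on the parsed `datetime` instead of the raw string is what removes the alphabetical ordering; logs that span a year boundary would need extra handling that this sketch does not attempt.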
Add functionality to report the total number of processes recorded when the system OOMs
dmesg reports OOM issues but the script reports nothing
This is because dmesg is reporting very old OOM incidents.
The system has rotated and purged all logs since that report, so there is nothing left in the log files.
Grab the oldest date (first line of the oldest compressed file) and print a message explaining that the dmesg entry is old and that there have been no incidents since $date
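The idea above could be sketched roughly as follows, assuming Python 3 and syslog-style lines whose first three fields are month, day and time. The helper names `first_log_line` and `no_incidents_since_message` are hypothetical, as is the exact message wording.

```python
import gzip


def first_log_line(path):
    """Return the first line of a log file, transparently handling .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\n")


def no_incidents_since_message(oldest_line):
    """Build the explanatory message from the oldest line still on disk."""
    # Assumption: the first three whitespace-separated fields are the timestamp.
    stamp = " ".join(oldest_line.split()[:3])
    return "No OOM incidents logged since %s" % stamp
```

The caller would pass the oldest rotated file (for example the earliest `messages-*.gz`) to `first_log_line` and print the resulting message when dmesg shows an incident that the log files no longer contain.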
[root@lga-db ~]# monkey.py -o -- -q
Downloading oom tool ...
----------------------------------------
 _____ _____ _____
|     |     |     |
|  |  |  |  | | | |
|_____|_____|_|_|_|
Out Of Memory Analyser
Disclaimer:
If system OOMs too viciously, there may be nothing logged!
Do NOT take this script as FACT, investigate further
----------------------------------------
Checking other logs, select an option:
Option: 1 /var/log/messages - Occurrences: 1
/var/log/messages-20170806.gz - Occurrences: 0
Option: 2 /var/log/messages-20170814.gz - Occurrences: 1
Option: 3 /var/log/messages-20170820.gz - Occurrences: 1
/var/log/messages-20170827.gz - Occurrences: 0
Which file should we check next?
Select an option number between 1 and 3: ^C
Traceback (most recent call last):
File "/home/rack/monkeys/monkey-2517a72ed6.py", line 1479, in <module>
main()
File "/home/rack/monkeys/monkey-2517a72ed6.py", line 1437, in main
external_script_action(opt, args)
File "/home/rack/monkeys/monkey-2517a72ed6.py", line 1402, in external_script_action
p.communicate(script)
File "/usr/lib64/python2.7/subprocess.py", line 797, in communicate
self.wait()
File "/usr/lib64/python2.7/subprocess.py", line 1376, in wait
pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
File "/usr/lib64/python2.7/subprocess.py", line 478, in _eintr_retry_call
return func(*args)
KeyboardInterrupt
Currently the script does not run when the log file is larger than 300 MB.
Add an option such as --override to bypass this limit if the device has a significant amount of RAM.
Note: The limit was put in place for small devices. Maybe add a check for free RAM and base the maximum file size on that.
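One way the RAM-based limit might look, as a sketch only: it reads `MemAvailable` from `/proc/meminfo` text (falling back to `MemFree` on older kernels that lack it) and allows a fraction of that as the maximum file size. The 25% fraction, the 300 MB floor, and the function names are assumptions made for illustration; the `override` argument stands in for the proposed --override option.

```python
DEFAULT_LIMIT_BYTES = 300 * 1024 * 1024  # the existing 300 MB limit


def parse_mem_available_kb(meminfo_text):
    """Return MemAvailable (or MemFree as a fallback) in kB, or None."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values.get("MemAvailable", values.get("MemFree"))


def size_limit_bytes(meminfo_text, override=False, fraction=0.25):
    """Return the maximum log size to process, or None for no limit."""
    if override:
        return None  # caller asked to bypass the limit entirely
    available_kb = parse_mem_available_kb(meminfo_text)
    if available_kb is None:
        return DEFAULT_LIMIT_BYTES
    # Assumption: never go below the existing default, scale up with free RAM.
    return max(DEFAULT_LIMIT_BYTES, int(available_kb * 1024 * fraction))
```

In use, the text would come from reading `/proc/meminfo`, and the result would be compared with `os.path.getsize()` of the chosen log before processing it.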
Here's the output I'm getting on Fedora 25:
Python 2.7.13
~/out-of-memory $ python oom-investigate.py
----------------------------------------
 _____ _____ _____
|     |     |     |
|  |  |  |  | | | |
|_____|_____|_|_|_|
Out Of Memory Analyser
Disclaimer:
If system OOMs too viciously, there may be nothing logged!
Do NOT take this script as FACT, investigate further
----------------------------------------
Unsupported OS
Error:
'NoneType' object has no attribute 'endswith'
----------------------------------------
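The report does not show where the error originates. One plausible reading is that OS detection found no log path and `.endswith` was then called on `None`. Under that assumption, a guard could be sketched like this; the candidate paths and function names are hypothetical and not taken from the script.

```python
import os

# Hypothetical candidates; the real script may look elsewhere.
CANDIDATE_LOGS = ("/var/log/messages", "/var/log/syslog")


def find_system_log(candidates=CANDIDATE_LOGS, exists=os.path.exists):
    """Return the first log path that exists, or None if none is found."""
    for path in candidates:
        if exists(path):
            return path
    return None


def is_compressed(path):
    """Safe replacement for a bare path.endswith('.gz') when path may be None."""
    return path is not None and path.endswith(".gz")
```

With a check like this, an unsupported OS would result in a clean "Unsupported OS" message instead of the `'NoneType' object has no attribute 'endswith'` error.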
If there are many OOM instances, only 3 of them can be shown.
Implement an option to view more or fewer, and to select a date range.
When there are fewer than 3 OOM occurrences, the script either repeats one date's output or produces an error.
"Note: Only Showing: 3 of the 94 occurences
Showing the 1st, 2nd and last"
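A small sketch of how the occurrences to display could be selected without repeating entries or failing when there are fewer than 3. The function name `occurrences_to_show` and the `limit` parameter are hypothetical; the "first ones plus the last" rule mirrors the note quoted above.

```python
def occurrences_to_show(total, limit=3):
    """Return zero-based indexes of the occurrences to display.

    Shows everything when there are `limit` or fewer occurrences,
    otherwise the first (limit - 1) and the last one.
    """
    if total <= 0 or limit <= 0:
        return []
    if total <= limit:
        return list(range(total))
    return list(range(limit - 1)) + [total - 1]
```

For example, `occurrences_to_show(94)` gives `[0, 1, 93]`, while `occurrences_to_show(2)` gives `[0, 1]` with no duplicate. A user-supplied `limit` would cover the "view more or fewer" request.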
Doesn't provide information when there is an OOM incident on CentOS/RHEL 5
Although that system does not log in the same manner, more information could be provided.
Update the script to provide a little more information
Fedora 28 (and soon RHEL 8) will require journalctl compatibility, as they do not log to /var/log/messages as before.
Add this functionality
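A rough sketch of reading kernel messages through journalctl, assuming Python 3 and a systemd host. The flags `-k`, `--no-pager` and `-o short-iso` are standard journalctl options; note that `-k` implies the current boot only. The marker strings and function names are assumptions, since the exact kernel wording varies between versions.

```python
import subprocess

# -k: kernel messages (current boot), short-iso: ISO 8601 timestamps
JOURNAL_CMD = ["journalctl", "-k", "--no-pager", "-o", "short-iso"]

# Assumed markers; kernel wording differs between versions.
OOM_MARKERS = ("invoked oom-killer", "Out of memory")


def filter_oom_lines(lines):
    """Keep only the lines that look like oom-killer activity."""
    return [line for line in lines if any(m in line for m in OOM_MARKERS)]


def read_kernel_journal():
    """Run journalctl and return its output as a list of lines."""
    output = subprocess.check_output(JOURNAL_CMD)
    return output.decode("utf-8", "replace").splitlines()
```

On a host without `/var/log/messages`, the script could call `filter_oom_lines(read_kernel_journal())` and feed the result into the same summary code used for file-based logs.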
Need to add the ability to scan all logs, including compressed, with date and time (occurrences)
Need to add command line help to the script
Add a flag (such as --killed) that can be used to find and display OOM incidents where a specific service was killed. Maybe accept a list for multiple killed services.
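A possible shape for the flag, sketched with `argparse` (which also provides `-h`/`--help` automatically). The regular expression is an assumption about the kernel's "Killed process" line, whose format differs between kernel versions; the function names are hypothetical.

```python
import argparse
import re

# Assumed formats: "Killed process 1234 (mysqld)" and the older
# "Killed process 1234, UID 27, (mysqld)".
KILLED_RE = re.compile(r"Killed process \d+(?:, UID \d+,)? \(([^)]+)\)")


def build_parser():
    """Create a parser with a --killed option that accepts several services."""
    parser = argparse.ArgumentParser(
        description="Summarise OOM incidents logged by the kernel")
    parser.add_argument(
        "--killed", nargs="+", metavar="SERVICE", default=[],
        help="only show incidents where one of these services was killed")
    return parser


def killed_matches(line, services):
    """Return True if the line reports that one of `services` was killed."""
    match = KILLED_RE.search(line)
    return bool(match) and match.group(1) in services
```

Invoked as `--killed mysqld httpd`, the parser yields `["mysqld", "httpd"]`, and `killed_matches` can then be used to filter log lines before an incident is displayed.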
Investigate implementing different return codes depending on what the script finds
Example:
Run script with the following output:
This allows automation tools to implement different "time saved" values depending on the script's return code
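As an illustration only, the mapping from result to return code might look like the sketch below. The specific values (0, 1, 2) and names are hypothetical; the issue does not say which codes should be used.

```python
# Hypothetical exit codes; the issue does not define actual values.
EXIT_NO_OOM = 0       # ran successfully, no OOM incidents found
EXIT_OOM_FOUND = 1    # ran successfully, at least one incident found
EXIT_UNSUPPORTED = 2  # unsupported OS or no usable log source


def exit_code_for(occurrences, supported=True):
    """Map the outcome of a run to a process exit code."""
    if not supported:
        return EXIT_UNSUPPORTED
    return EXIT_OOM_FOUND if occurrences > 0 else EXIT_NO_OOM
```

The script would finish with `sys.exit(exit_code_for(count))`, and an automation tool could branch on the code. Since many tools treat any non-zero code as a failure, the choice of values would need to be agreed with whatever consumes them.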
-q, --quick option doesn't report on the main log file
File "<stdin>", line 80
with open("/proc/meminfo", "r") as meminfo:
^
SyntaxError: invalid syntax
Note: RHEL 5 and Python 2.4.x are EOL. I will not go out of my way to accommodate EOL operating systems and Python versions.