Comments (28)
Hi,
Cisco was on my list; it is more a lack of devices.
If you don't mind, you could use this project (https://github.com/DMTF/Redfish-Mockup-Creator) and send me the mockup as a tar file, and I will have a look.
But it will contain IP addresses, serial numbers and account data which you might want to anonymize.
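A minimal sketch of how such a mockup could be anonymized before sending, assuming the mockup consists of JSON documents. The key list and the `redact` helper are hypothetical and are not part of Redfish-Mockup-Creator or check_redfish:

```python
import json

# Hypothetical list of keys to blank out; extend it for your environment.
SENSITIVE_KEYS = {"SerialNumber", "UserName", "Password", "IPv4Addresses", "HostName"}


def redact(node, placeholder="REDACTED"):
    """Return a copy of parsed JSON with the values of sensitive keys replaced."""
    if isinstance(node, dict):
        return {
            key: placeholder if key in SENSITIVE_KEYS else redact(value, placeholder)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [redact(item, placeholder) for item in node]
    # Scalars (strings, numbers, booleans, None) are returned unchanged.
    return node


sample = json.loads('{"Name": "PSU1", "SerialNumber": "ABC", "Oem": [{"SerialNumber": "XYZ"}]}')
cleaned = redact(sample)
```

The function only matches on key names, so addresses or serial numbers embedded in URLs or free-text fields would still need a manual check.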
from check_redfish.
Do you want the tar file attached to this issue?
from check_redfish.
just send it to my email address.
from check_redfish.
can you please test the latest "next-release" branch?
from check_redfish.
All checks work as expected, except power:
[CRITICAL]: Power supply 1 (700-014160-0000) status is: None
[CRITICAL]: Power supply 2 (700-014160-0000) status is: None|'ps_1'=213 'ps_2'=198 'voltage_PSU1_VOUT'=12.0 'voltage_PSU2_VOUT'=12.0 'voltage_P12V'=11.89 'voltage_P3V_BAT_SCALED'=3.04
from check_redfish.
That sounds great so far. Thank you for testing.
The issue with the power supplies is that this instance doesn't report a Health status:
'PowerSupplies': [
    {'@odata.id': '/redfish/v1/Chassis/1/Power#/PowerSupplies/PSU1',
     'FirmwareVersion': '1100101',
     'InputRanges': [{'InputType': 'AC',
                      'MaximumFrequencyHz': 63,
                      'MaximumVoltage': 264,
                      'MinimumFrequencyHz': 47,
                      'MinimumVoltage': 90,
                      'OutputWattage': 770}],
     'LastPowerOutputWatts': '212',
     'LineInputVoltage': '223',
     'LineInputVoltageType': 'AC',
     'Manufacturer': 'Cisco Systems Inc',
     'MemberID': 1,
     'Model': '700-014160-0000',
     'Name': 'PSU1',
     'PartNumber': '341-0591-04',
     'PowerSupplyType': 'AC',
     'SerialNumber': 'ABC',
     'SparePartNumber': '341-0591-04',
     'Status': {'state': 'Enabled'}},
    {'@odata.id': '/redfish/v1/Chassis/1/Power#/PowerSupplies/PSU2',
     'FirmwareVersion': '1100101',
     'InputRanges': [{'InputType': 'AC',
                      'MaximumFrequencyHz': 63,
                      'MaximumVoltage': 264,
                      'MinimumFrequencyHz': 47,
                      'MinimumVoltage': 90,
                      'OutputWattage': 770}],
     'LastPowerOutputWatts': '194',
     'LineInputVoltage': '223',
     'LineInputVoltageType': 'AC',
     'Manufacturer': 'Cisco Systems Inc',
     'MemberID': 2,
     'Model': '700-014160-0000',
     'Name': 'PSU2',
     'PartNumber': '341-0591-04',
     'PowerSupplyType': 'AC',
     'SerialNumber': 'ABD',
     'SparePartNumber': '341-0591-04',
     'Status': {'state': 'Enabled'}}
]
According to the Redfish standard, it should be reported like this:
{'Health': 'OK', 'HealthRollup': 'OK', 'State': 'Enabled'}
Source: DSP0268_2019.4_0.pdf, page 45
I don't know if all attributes are mandatory, but a "Health" attribute would be quite helpful in this case.
All other vendors are able to report a "Health" status.
Any idea on how to treat this situation?
- Just report it as OK?
- Assume that Enabled means OK and if state is not enabled then report CRITICAL?
Would you be able to test this in your environment? Just run the check, then pull out a power supply and run the check again. And maybe unplug the power as well and see what exactly gets reported?
Option "-v" would be very helpful in this case.
Thank you very much.
from check_redfish.
Actually that's exactly what happens for storage components. State == "Enabled" -> OK
Testing would still be great.
Thank you
from check_redfish.
I'll be able to test it on Wednesday, I'll let you know what I find.
from check_redfish.
If the power cable is disconnected, the CIMC returns
"Status": {
"state": "Disabled"
}
If the PSU is removed, it's simply not present in the output (I removed PSU1; only PSU2 was in the mockup).
I think it's safe to assume 'Enabled' means okay, anything else is bad.
from check_redfish.
Thank you so much for testing.
I will change the code accordingly.
state == Enabled -> OK
state != Enabled -> CRITICAL
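A minimal sketch of that rule, not the actual check_redfish code. The function name `power_supply_state` is hypothetical, and the lowercase-key handling is an assumption based on the CIMC output above, which reports 'state' rather than the standard 'State':

```python
def power_supply_state(status):
    """Map a Redfish Status object to a plugin state string."""
    status = status or {}
    # Normalize key case: this CIMC reports 'state' instead of 'State'.
    normalized = {str(key).lower(): value for key, value in status.items()}

    health = normalized.get("health")
    if health is not None:
        # Prefer Health when the BMC reports it (OK, Warning or Critical).
        health = str(health).upper()
        return health if health in ("OK", "WARNING", "CRITICAL") else "UNKNOWN"

    # No Health reported: fall back to State as discussed above.
    return "OK" if normalized.get("state") == "Enabled" else "CRITICAL"
```

With the data from this thread, `{'state': 'Enabled'}` maps to OK and `{'state': 'Disabled'}` maps to CRITICAL. A removed PSU is not covered, since it does not appear in the output at all.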
But if the power supply is unplugged, does "PowerRedundancy" complain?
Do you still have the mockups?
from check_redfish.
Unfortunately I only have the mockup from when one PSU was missing.
Looking at the original mockup I sent to you, there is no mention of PowerRedundancy
I plan on going onsite again Friday, let me know if we need more testing done.
from check_redfish.
The only difference in the output from the MockUp-tool is the change of state.
from check_redfish.
Thank you so much for testing.
Makes me curious if a removed power supply is properly indicated.
I will add the change hopefully by next week.
from check_redfish.
Looking in the CIMC, I see a warning if a PSU is removed:
PSU1_STATUS: Power Supply 1 missing: reseat or replace PS 1
But the overall status of the system is green.
from check_redfish.
I need to revisit this one - We just discovered that the latest firmware update from Cisco breaks the script, the checks it broke are 'bmc' and 'firmware' (we do not use the latter)
The bmc check returns '[CRITICAL]: None (Firmware: None)'
The firmware check just throws a Python error
New mockup?
from check_redfish.
tisk, tisk, tisk, CISCO.
yes please. a MockUp would be great.
Thank you
from check_redfish.
I added the changes (and quite a few more) to next-release. Can you test if it's working now?
from check_redfish.
--firmware works, as do mem, temp, fan and nic; the rest (except mel and sel) fail with a similar-sounding error (the AttributeError is the same, line numbers and functions differ):
Traceback (most recent call last):
File "/home/user/check_redfish/check_redfish.py", line 3232, in <module>
if "bmc" in args.requested_query: get_bmc_info()
File "/home/user/check_redfish/check_redfish.py", line 2793, in get_bmc_info
get_bmc_info_generic(manager)
File "/home/user/check_redfish/check_redfish.py", line 2955, in get_bmc_info_generic
status = manager_response.get("Status").get("Health").upper()
AttributeError: 'NoneType' object has no attribute 'get'
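The traceback comes from chaining `.get()` calls on a response that has no "Status" key. A minimal sketch of a defensive lookup follows; `get_nested` is a hypothetical helper and not part of check_redfish, and the sample `manager_response` is made up for illustration:

```python
def get_nested(data, *keys, default=None):
    """Walk nested dicts and return default as soon as a level is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


# Made-up response without a "Status" object, as in the failing case.
manager_response = {"Name": "CIMC"}
health = get_nested(manager_response, "Status", "Health")
status = health.upper() if isinstance(health, str) else "UNKNOWN"
```

Whether a missing Health should map to UNKNOWN or to something else is a design decision for the plugin; the sketch only shows how to avoid the exception.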
mel and sel fail with
--sel
[UNKNOWN]: No log services discoverd in /redfish/v1/Managers/CIMC/LogServices that match System
[UNKNOWN]: No log services discoverd in /redfish/v1/Managers/2/LogServices that match System
--mel
[UNKNOWN]: No log services discoverd in /redfish/v1/Managers/CIMC/LogServices that match Manager
[UNKNOWN]: No log services discoverd in /redfish/v1/Managers/2/LogServices that match Manager
from check_redfish.
Thank you for testing.
I'm not sure if you have the latest version from branch next-release.
check_redfish.py now only has 220 lines and this is the last one: https://github.com/bb-Ricardo/check_redfish/blob/next-release/check_redfish.py#L220
Please pull this branch again and try it out: https://github.com/bb-Ricardo/check_redfish/tree/next-release
from check_redfish.
You are absolutely right, brainfart on my part.
--info
[OK]: Type: Cisco Systems Inc UCSC-C240-M5SX (CPU: 2, MEM: 768GB) - BIOS: C240M5.4.1.1b.0.0124200238 - Serial: <serial> - Power: On - Name: <hostname>
--firmware
[OK]: Found 3 firmware entries. Use '--detailed' option to display them.
--storage
[OK]: All storage controllers (Storage controller FX3S), volumes and disk drives are in good condition
--proc
[OK]: All processors (2) are in good condition
--memory
[OK]: All memory modules (Total 768GB) are in good condition
--power
[OK]: All power supplies (2) are in good condition and 4 Voltages are OK|'ps_1'=246 'ps_2'=240 'voltage_PSU1_VOUT'=12.1 'voltage_PSU2_VOUT'=12.1 'voltage_P12V'=11.774 'voltage_P3V_BAT_SCALED'=2.995
--temp
[OK]: All temp sensors (22) are in good condition|'temp_VIC_SLOT2_TEMP'=42.0;;90 'temp_TEMP_SENS_FRONT'=26.0;;45 'temp_DDR4_P1_A1_TMP'=31.0;;85 'temp_DDR4_P1_B1_TMP'=32.0;;85 'temp_DDR4_P1_C1_TMP'=31.0;;85 'temp_DDR4_P1_D1_TMP'=33.0;;85 'temp_DDR4_P1_E1_TMP'=33.0;;85 'temp_DDR4_P1_F1_TMP'=33.0;;85 'temp_DDR4_P2_G1_TMP'=35.0;;85 'temp_DDR4_P2_H1_TMP'=35.0;;85 'temp_DDR4_P2_J1_TMP'=34.0;;85 'temp_DDR4_P2_K1_TMP'=40.0;;85 'temp_DDR4_P2_L1_TMP'=40.0;;85 'temp_DDR4_P2_M1_TMP'=40.0;;85 'temp_P1_TEMP_SENS'=43.5;;104 'temp_P2_TEMP_SENS'=45.5;;104 'temp_PSU1_TEMP'=31.0;;65 'temp_PSU2_TEMP'=29.0;;65 'temp_PCH_TEMP_SENS'=33.0;;85 'temp_RISER1_TEMP'=28.0;;70 'temp_RISER2_INLET_TMP'=37.0;;70 'temp_RISER1_INLET_TMP'=32.0;;70
--fan
[OK]: All fans (10) are in good condition|'Fan_MOD1_FAN1_SPEED'=15150;; 'Fan_MOD1_FAN1_SPEED'=15150;; 'Fan_MOD2_FAN1_SPEED'=16160;; 'Fan_MOD2_FAN2_SPEED'=15680;; 'Fan_MOD3_FAN2_SPEED'=14700;; 'Fan_MOD3_FAN1_SPEED'=15150;; 'Fan_MOD4_FAN2_SPEED'=15680;; 'Fan_MOD4_FAN2_SPEED'=15680;; 'Fan_MOD5_FAN1_SPEED'=15150;; 'Fan_MOD6_FAN2_SPEED'=15680;;
--nic
[OK]: All network interfaces (6) are in good condition
--bmc
[OK]: UCSC-C240-M5SX (Firmware: 4.1(1d)) and all nics are in 'OK' state.
--sel
[OK]: Found 50 OK System Event Log entries. Most recent notable: [OK]: 2020-03-25 10:20:27 CET: BIOS_POST_CMPLT: Presence sensor, Device Inserted / Device Present was asserted
--mel
[OK]: Found 100 OK Manager Event Log entries. Most recent notable: [OK]: 2020 Mar 28 12:53:49 CET: Session close (user:user@domain (LDAP), ip:10.10.10.10, id:3199, type:xmlapi)
Excellent work, once again. Thank you.
from check_redfish.
Great, this looks good.
Thank you for all this input.
from check_redfish.
Apparently I didn't test enough :( - some checks fail on UCS C220 with firmware 4.1:
--storage
Traceback (most recent call last):
File "/home/user/check_redfish/check_redfish.py", line 211, in <module>
if any(x in args.requested_query for x in ['storage', 'all']): get_storage(plugin)
File "/home/user/check_redfish/cr_module/storage.py", line 29, in get_storage
get_storage_generic(plugin_object, system)
File "/home/user/check_redfish/cr_module/storage.py", line 806, in get_storage_generic
get_volumes(controller_response.get("Volumes").get("@odata.id"))
AttributeError: 'NoneType' object has no attribute 'get'
--proc
Traceback (most recent call last):
File "/home/user/check_redfish/check_redfish.py", line 208, in <module>
if any(x in args.requested_query for x in ['proc', 'all']): get_system_data(plugin, "procs")
File "/home/user/check_redfish/check_redfish.py", line 176, in get_system_data
get_single_system_procs(plugin_object, system)
File "/home/user/check_redfish/cr_module/proc.py", line 50, in get_single_system_procs
for proc in processors_response.get("Members") or processors_response.get(system_response_proc_key):
TypeError: 'NoneType' object is not iterable
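This TypeError occurs when both lookups in the `for` statement return None, so the loop has nothing to iterate over. A minimal sketch of a fallback to an empty list; `iter_members` is a hypothetical helper and not part of check_redfish:

```python
def iter_members(response, fallback_key=None):
    """Return the member list of a Redfish collection, or [] if it is absent."""
    response = response or {}
    members = response.get("Members")
    if members is None and fallback_key is not None:
        # Some responses list entries under a different key.
        members = response.get(fallback_key)
    # Anything that is not a list is treated as "no members".
    return members if isinstance(members, list) else []


# A response without any member data yields an empty loop instead of an error.
for proc in iter_members({"Name": "Processors Collection"}, "Processors"):
    print(proc)
```

An empty result would still need to be reported by the plugin, for example as UNKNOWN, rather than silently treated as healthy.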
--nic
[UNKNOWN]: No network interface data returned for API URL '/redfish/v1/Systems/<serial>/EthernetInterfaces/'
Another mockup?
from check_redfish.
you got more/different machines?
I hope this project doesn't turn into a support hell of different vendors with different BMC versions, each working differently than the ones before. So far Cisco is at the top of the list.
Yes, Mockup would be great.
Thank you
from check_redfish.
I know, and it bugs me that Cisco is this difficult (there was no difference between C220 and C240 before the firmware upgrade).
I will email the mockup, let me know if there's something we can do to support your work.
from check_redfish.
Can you try it again please?
from check_redfish.
--nic and --proc return 'unknown' due to the data format error you have identified. I'll open a TAC case.
--storage returns warning on both C220 and C240 now (C240 worked before the latest pull)
[WARNING]: Physical Drive SD card (NA / None / None) 0GiB status: WARNING
from check_redfish.
Sorry to reopen this, but the storage issue remains; the output on all servers is the same, regardless of the actual storage configuration (we have servers with FlexFlash, and with disk/SSD with and without hardware RAID).
All of them output: [WARNING]: Physical Drive SD card (NA / None / None) 0GiB status: WARNING
from check_redfish.
The output is based on the data returned from the CIMC.
Just run the plugin with the -v CLI option and you will see the JSON data returned.
In this case only CISCO is able to fix this problem.
from check_redfish.