Comments (27)
Thank you for testing it. I will check it out.
Edit: I found the problem. This will cause a much bigger change than I anticipated. But in the end we will be able to support multiple chassis, systems and managers in every server/blade center.
from check_redfish.
It looks like the new BMC reports 2 chassis for some reason:
{
"Members": [
{
"@odata.id": "/redfish/v1/Chassis/1"
},
{
"@odata.id": "/redfish/v1/Chassis/3"
}
],
"@odata.type": "#ChassisCollection.ChassisCollection",
"@odata.id": "/redfish/v1/Chassis",
"Name": "ChassisCollection",
"@odata.etag": "\"234145c889472ae2565\"",
"Members@odata.count": 2,
"Description": "A collection of Chassis resource instances."
}
Chassis 1 reports all info, but chassis 3 output looks like so:
{
"SerialNumber": "xxxxxxxx",
"Id": "3",
"Name": "Backplane",
"@odata.id": "/redfish/v1/Chassis/3",
"SKU": "01GV280",
"Oem": {
"Lenovo": {
"PRODUCT_ID": "0000",
"VPD_ID": "0070",
"Entity_ID": "0f",
"Device_ID": "51",
"POS_ID": "006a"
}
},
"@odata.type": "#Chassis.v1_10_0.Chassis",
"ChassisType": "Enclosure",
"PartNumber": "SC57A01986",
"@odata.etag": "\"32915858356a2a24fc8\"",
"Manufacturer": "LNVO",
"Description": "This resource is used to represent a chassis or other physical enclosure for a Redfish implementation."
}
Looking at the output, this is a backplane / enclosure resource, which might provide more info in later versions, but for now there is not much data here.
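For anyone reproducing this, here is a minimal sketch of pulling the member URIs out of a chassis collection like the one above. The helper name `chassis_member_uris` is made up for illustration and is not part of check_redfish; the input is the already decoded JSON document.

```python
def chassis_member_uris(collection):
    """Return the @odata.id of every member in a Redfish collection document."""
    return [
        member["@odata.id"]
        for member in collection.get("Members", [])
        if "@odata.id" in member
    ]


# Trimmed copy of the collection posted above.
collection = {
    "Members": [
        {"@odata.id": "/redfish/v1/Chassis/1"},
        {"@odata.id": "/redfish/v1/Chassis/3"},
    ],
    "Members@odata.count": 2,
}

for uri in chassis_member_uris(collection):
    print(uri)
```

Each returned URI would then be queried on its own, so a check sees both the RackMount chassis and the Backplane enclosure.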
from check_redfish.
Looking at the changelog I found this regarding the chassis: Added the Redfish support of Enclosure "Chassis" object on blade and dense systems.
from check_redfish.
Oh wow, interesting. I might have to adapt the plugin.
- Do all other checks work?
- And does chassis 1 contain all the necessary data?
from check_redfish.
I overlooked that the temperature check is also reporting the same problem.
Yes, chassis 1 contains all necessary data. ChassisType of chassis 1 is RackMount. Maybe a quick workaround would be to check if ChassisType is RackMount, but I'm not sure if that causes problems for other types of servers.
from check_redfish.
I would take another approach. I would collect data from every chassis and only complain if no data for temperature or any other component is returned at all. As long as one chassis returns data, the check should report as green.
Will have a look at it.
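A rough sketch of that idea, assuming the Thermal documents of all chassis have already been fetched and decoded. The function name `summarize_fans` and the dict-based input are assumptions for illustration, not the plugin's actual code, and per-fan health evaluation is left out on purpose.

```python
def summarize_fans(thermal_by_chassis):
    """Aggregate fan entries over all chassis.

    thermal_by_chassis maps a Thermal URI to its decoded JSON document
    (or None if the request returned nothing).
    Returns a (status, message) tuple.
    """
    fans = []
    for thermal in thermal_by_chassis.values():
        # A chassis without a "Fans" list simply contributes nothing.
        fans.extend((thermal or {}).get("Fans") or [])

    if not fans:
        # Only complain when no chassis returned any fan data at all.
        urls = ", ".join(f"'{uri}'" for uri in thermal_by_chassis)
        return "UNKNOWN", f"No fan data returned for API URLs {urls}"

    return "OK", f"Found {len(fans)} fans"


status, message = summarize_fans({
    "/redfish/v1/Chassis/1/Thermal": {"Fans": [{"Name": "Fan 1A Tach"}]},
    "/redfish/v1/Chassis/3/Thermal": {},  # backplane: no fan data
})
print(f"[{status}]: {message}")
```

With this rule the empty backplane chassis no longer turns the result UNKNOWN as long as chassis 1 reports fans.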
from check_redfish.
That is probably a better approach.
In case the backplane endpoint starts returning temperature / power / fans (blade or dense systems where there is a separate storage chassis under the server bay), your solution will cover that.
from check_redfish.
Looking at Lenovo SD530, it might be that this reports chassis 1 as main chassis (power supply and some temperatures) and 2,3,4,5 as each node (fans, power, temperature)...
from check_redfish.
I would be highly interested in mockups to add to my testing environment. Would be great if you could provide some. Also makes coding against it much easier.
from check_redfish.
I can't provide mockups for SD530 as we don't have them. As for SR6x0 and BMC/XCC 5.40, the mockup is above (for chassis 3).
I would also love to test any fixes you make.
from check_redfish.
So, it took me quite a while, but I have now tried to take care of this issue.
Can you check out next-release and see if this fixes your issue?
from check_redfish.
Power supply checks work OK:
[OK]: All power supplies (2) are in good condition and Power redundancy 1 status is: Enabled|'ps_1'=99 'ps_2'=99
Temperature checks work OK:
[OK]: |'temp_Ambient_Temp'=26.0;43;47 'temp_CPU1_Temp'=48.0 'temp_CPU1_DTS'=-49.0 'temp_DIMM_3_Temp'=39.0 'temp_DIMM_4_Temp'=39.0 'temp_DIMM_5_Temp'=39.0 'temp_DIMM_6_Temp'=39.0 'temp_DIMM_7_Temp'=37.0 'temp_DIMM_8_Temp'=37.0 'temp_DIMM_9_Temp'=36.0 'temp_DIMM_10_Temp'=36.0 'temp_PCH_Temp'=64.0 'temp_Exhaust_Temp'=48.0
Fans:
[UNKNOWN]: Request error: No fan data returned for API URL '/redfish/v1/Chassis/1/Thermal', No fan data returned for API URL '/redfish/v1/Chassis/3/Thermal'
Chassis/1/ json output: https://pastebin.com/2UqTtiAg
Chassis/3/ json output: https://pastebin.com/6sgN2F81
from check_redfish.
I ran the fans check again and now it works, but the output is different.
Check on old version:
[OK]: All fans (10) are in good condition|'Fan_Fan_1A_Tach'=5460;; 'Fan_Fan_1B_Tach'=5340;; 'Fan_Fan_2A_Tach'=5376;; 'Fan_Fan_2B_Tach'=5251;; 'Fan_Fan_3A_Tach'=5376;; 'Fan_Fan_3B_Tach'=5162;; 'Fan_Fan_4A_Tach'=5460;; 'Fan_Fan_4B_Tach'=5251;; 'Fan_Fan_5A_Tach'=5208;; 'Fan_Fan_5B_Tach'=5162;;
Check on new version:
[OK]: |'Fan_Fan_1A_Tach'=5124;; 'Fan_Fan_1B_Tach'=4895;; 'Fan_Fan_2A_Tach'=5208;; 'Fan_Fan_2B_Tach'=4895;; 'Fan_Fan_3A_Tach'=5040;; 'Fan_Fan_3B_Tach'=4984;; 'Fan_Fan_4A_Tach'=5040;; 'Fan_Fan_4B_Tach'=4895;; 'Fan_Fan_5A_Tach'=5040;; 'Fan_Fan_5B_Tach'=4806;;
from check_redfish.
That sounds great! Can't wait to test it out :)
from check_redfish.
Hey @matejzero,
It took quite a while, but I finally finished the change. Can you please test the 'next-release' branch and let me know if this works for you?
Thank you.
from check_redfish.
I can confirm the new version works on Lenovo SR630/SR650 with XCC firmware versions 5.42 (latest) and 4.80 (pre-latest), apart from no mel/sel logs, but that doesn't work on latest release either:
[UNKNOWN]: No log services discovered where name matches 'Manager'
[UNKNOWN]: No log services discovered where name matches 'System'
All checks are green on Dell R6515 (iDrac 4.10.10.10 and 4.30.30.30) and R640 (iDrac 4.10.10.10), but I get some errors on a R740 (iDrac 4.22.00.53) that weren't present in latest release:
- storage check
New version:[CRITICAL]: PERC H730P Mini status: OK
Old version:[OK]: All storage controllers (PERC H730P Mini PERC H730P Mini, C620 Series Chipset Family SSATA Controller [AHCI mode] C620 Series Chipset Family SSATA Controller [AHCI mode], C620 Series Chipset Family SATA Controller [AHCI mode] C620 Series Chipset Family SATA Controller [AHCI mode], PERC H730P Mini), volumes and disk drives are in good condition
- info check
New version:[CRITICAL]: Type: Dell Inc. PowerEdge R740 (CPU: 1, MEM: 512GB) - BIOS: 2.9.4 - Serial: xxxx - Power: On - Name: NOT SET - 1 health sensor in 'CRITICAL' state, 34 health sensors are in 'OK' state
Old version:[OK]: Type: Dell Inc. PowerEdge R740 (CPU: 1, MEM: 512GB) - BIOS: 2.9.4 - Serial: xxxx - Power: On - Name: NOT SET
I only have one R740 to test, but iDrac is reporting the system is all green. I looked through the verbose info output to see if any HealthState is reported as Critical, but everything is OK or Unknown. Let me know how I can help debug the issue further to make it simpler for you.
Thanks for fixing the check so far!
from check_redfish.
I can confirm the new version works on Lenovo SR630/SR650 with XCC firmware versions 5.42 (latest) and 4.80 (pre-latest), apart from no mel/sel logs, but that doesn't work on latest release either:
[UNKNOWN]: No log services discovered where name matches 'Manager'
[UNKNOWN]: No log services discovered where name matches 'System'
If you could provide me with a mockup, I can check and integrate this as well.
All checks are green on Dell R6515 (iDrac 4.10.10.10 and 4.30.30.30) and R640 (iDrac 4.10.10.10), but I get some errors on a R740 (iDrac 4.22.00.53) that weren't present in latest release:
- storage check
New version:[CRITICAL]: PERC H730P Mini status: OK
Old version:[OK]: All storage controllers (PERC H730P Mini PERC H730P Mini, C620 Series Chipset Family SSATA Controller [AHCI mode] C620 Series Chipset Family SSATA Controller [AHCI mode], C620 Series Chipset Family SATA Controller [AHCI mode] C620 Series Chipset Family SATA Controller [AHCI mode], PERC H730P Mini), volumes and disk drives are in good condition
This seems to be a bug.
- info check
New version:[CRITICAL]: Type: Dell Inc. PowerEdge R740 (CPU: 1, MEM: 512GB) - BIOS: 2.9.4 - Serial: xxxx - Power: On - Name: NOT SET - 1 health sensor in 'CRITICAL' state, 34 health sensors are in 'OK' state
Old version:[OK]: Type: Dell Inc. PowerEdge R740 (CPU: 1, MEM: 512GB) - BIOS: 2.9.4 - Serial: xxxx - Power: On - Name: NOT SET
There seems to be one component not filtered properly.
Can you please run both commands with the --detailed option and post the output here?
Thank you.
from check_redfish.
I'll try and get the mockup for logs, but I need to find out which endpoint URI the script is calling to collect the document. If you can give me the URI (so that I won't need to look through verbose output), I'll be able to generate it quicker.
Detailed output of storage check:
[CRITICAL]: PERC H730P Mini status: OK
[OK]: PERC H730P Mini PERC H730P Mini (FW: 25.5.7.0005) status is: OK
[OK]: Physical Drive Solid State Disk 0:1:0 (MTFDDAK480TDC / SSD / SATA) 479.56GiB status: OK
[OK]: Physical Drive Solid State Disk 0:1:1 (MTFDDAK480TDC / SSD / SATA) 479.56GiB status: OK
[OK]: Logical Drive VD_0 (VD_0) 480GiB (Mirrored) status: OK
[OK]: StorageEnclosure BP14G+ 0:1 (Power: On) status: OK
[OK]: C620 Series Chipset Family SSATA Controller [AHCI mode] C620 Series Chipset Family SSATA Controller [AHCI mode] (FW: None) status is: None
[OK]: C620 Series Chipset Family SATA Controller [AHCI mode] C620 Series Chipset Family SATA Controller [AHCI mode] (FW: None) status is: None
[OK]: MICRON Solid State Disk 0:1:0 MTFDDAK480TDC (size: 479.56 GiB) status: OK
[OK]: MICRON Solid State Disk 0:1:1 MTFDDAK480TDC (size: 479.56 GiB) status: OK
[OK]: DELL Backplane 1 on Connector 0 of Integrated RAID Controller 1 BP14G+ 0:1 status: OK
info check:
[CRITICAL]: Type: Dell Inc. PowerEdge R740 (CPU: 1, MEM: 512GB) - BIOS: 2.9.4 - Serial: xxxx - Power: On - Name: NOT SET
[CRITICAL]: Sensor "CPU2 Status": Unknown (Enabled/Unknown)
[OK]: Sensor "CPU1 FIVR PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 Status": OK (Enabled/Good)
[OK]: Sensor "CPU1 VCCIO PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 VCORE PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 VSA PG": OK (Enabled/Good)
[OK]: Sensor "DIMM SLOT A10": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A11": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A2": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A4": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A5": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A7": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A8": OK (Enabled/Presence Detected)
[OK]: Sensor "System Board 1.8V SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board 2.5V SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board 3.3V B PG": OK (Enabled/Good)
[OK]: Sensor "System Board 5V SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board BP0 PG": OK (Enabled/Good)
[OK]: Sensor "System Board BP1 PG": OK (Enabled/Good)
[OK]: Sensor "System Board BP2 PG": OK (Enabled/Good)
[OK]: Sensor "System Board CMOS Battery": OK (Enabled/Good)
[OK]: Sensor "System Board DIMM PG": OK (Enabled/Good)
[OK]: Sensor "System Board Intrusion": OK (Enabled/No Breach)
[OK]: Sensor "System Board NDC PG": OK (Enabled/Good)
[OK]: Sensor "System Board PS1 PG FAIL": OK (Enabled/Good)
[OK]: Sensor "System Board PS2 PG FAIL": OK (Enabled/Good)
[OK]: Sensor "System Board PVNN SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board VSB11 SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board VSBM SW PG": OK (Enabled/Good)
I can see in the info output that sensor CPU2 Status reports Enabled/Unknown. This server supports 2 CPUs, but only 1 is installed.
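One possible way to keep a sensor for an empty socket from turning the whole check CRITICAL is sketched below. This is only an illustration of the idea, not necessarily how the plugin handles it; `sensor_status` and the `ignore_unknown` switch are hypothetical names.

```python
# Health values that map directly to a plugin status.
HEALTH_TO_STATUS = {
    "OK": "OK",
    "WARNING": "WARNING",
    "CRITICAL": "CRITICAL",
}


def sensor_status(health, ignore_unknown=True):
    """Map a sensor health string to a plugin status.

    Anything that is not OK/Warning/Critical (for example "Unknown" for an
    unpopulated CPU socket) is treated as OK when ignore_unknown is set,
    otherwise as UNKNOWN. It is never escalated to CRITICAL.
    """
    key = (health or "Unknown").upper()
    if key in HEALTH_TO_STATUS:
        return HEALTH_TO_STATUS[key]
    return "OK" if ignore_unknown else "UNKNOWN"


for name, health in [("CPU1 Status", "OK"), ("CPU2 Status", "Unknown")]:
    print(f'[{sensor_status(health)}]: Sensor "{name}": {health}')
```

Whether an Unknown sensor should count as OK or UNKNOWN is a judgment call; the switch is there so either behaviour can be tried against the R740 output above.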
from check_redfish.
There is a Redfish mockup generator on GitHub. I usually use that one: https://github.com/DMTF/Redfish-Mockup-Creator
from check_redfish.
Could I send the mockup to you via email? I don't want to post it here because it includes serial numbers.
from check_redfish.
Absolutely
from check_redfish.
I can't seem to find your email on GitHub. Could you send me an email at xxx at yyy, and I'll send you the link to the mockup files?
from check_redfish.
You can find it here: https://github.com/bb-Ricardo/check_redfish/blob/master/check_redfish.py#L21
from check_redfish.
I saw you made some commits. Checks on R740 now pass without a problem.
from check_redfish.
Thank you for testing. I just pushed another commit.
Now the logs on Lenovo systems should work again. I also added controller cache battery info for newer Dell and Lenovo systems.
from check_redfish.
I tested the latest version on SR630/SR650 and Dell R6515, R640 and R740, and all works OK!
We also have a lot of SR635 servers, but they are too slow to query at the moment. Just querying the base Redfish URI can take between 3s and 30s+, so I need to do more testing before I try this check on them.
Anyway, I think all issues are now fixed and you can close this.
Thank you very much for these fixes!
from check_redfish.
Great and thank you for all the testing.
from check_redfish.