
Comments (27)

bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

Thank you for testing it. Will check it out.

Edit: I found the problem. This will cause a much bigger change than I anticipated. But in the end we will be able to support multiple chassis, systems and managers in every server/blade center.

from check_redfish.

matejzero avatar matejzero commented on June 2, 2024

It looks like the new BMC reports 2 chassis for some reason:

{
    "Members": [
        {
            "@odata.id": "/redfish/v1/Chassis/1"
        },
        {
            "@odata.id": "/redfish/v1/Chassis/3"
        }
    ],
    "@odata.type": "#ChassisCollection.ChassisCollection",
    "@odata.id": "/redfish/v1/Chassis",
    "Name": "ChassisCollection",
    "@odata.etag": "\"234145c889472ae2565\"",
    "Members@odata.count": 2,
    "Description": "A collection of Chassis resource instances."
}

Chassis 1 reports all info, but chassis 3 output looks like so:

{
    "SerialNumber": "xxxxxxxx",
    "Id": "3",
    "Name": "Backplane",
    "@odata.id": "/redfish/v1/Chassis/3",
    "SKU": "01GV280",
    "Oem": {
        "Lenovo": {
            "PRODUCT_ID": "0000",
            "VPD_ID": "0070",
            "Entity_ID": "0f",
            "Device_ID": "51",
            "POS_ID": "006a"
        }
    },
    "@odata.type": "#Chassis.v1_10_0.Chassis",
    "ChassisType": "Enclosure",
    "PartNumber": "SC57A01986",
    "@odata.etag": "\"32915858356a2a24fc8\"",
    "Manufacturer": "LNVO",
    "Description": "This resource is used to represent a chassis or other physical enclosure for a Redfish implementation."
}

Looking at the output, this is a backplane / enclosure resource, which might provide more info in later versions, but for now there is not much data here.
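For anyone following along, here is a minimal sketch of how the member URIs can be pulled out of a collection like the one above. The sample is trimmed to the fields used here, and `chassis_uris` is a hypothetical helper name, not a function from check_redfish.

```python
import json

# Chassis collection from the BMC, trimmed to the fields this sketch uses.
collection_json = """
{
    "Members": [
        {"@odata.id": "/redfish/v1/Chassis/1"},
        {"@odata.id": "/redfish/v1/Chassis/3"}
    ],
    "@odata.id": "/redfish/v1/Chassis",
    "Name": "ChassisCollection"
}
"""


def chassis_uris(collection):
    """Return the @odata.id of every member of a Redfish collection (hypothetical helper)."""
    return [m["@odata.id"] for m in collection.get("Members", []) if "@odata.id" in m]


uris = chassis_uris(json.loads(collection_json))
print(uris)
```

Each returned URI would then be fetched on its own, which is where chassis 1 and chassis 3 start to differ.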


matejzero avatar matejzero commented on June 2, 2024

Looking at the changelog I found this regarding the chassis: Added the Redfish support of Enclosure "Chassis" object on blade and dense systems.


bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

oh wow. interesting. I might have to adapt the plugin.

  • all other checks work?
  • and chassis 1 contains all the necessary data?


matejzero avatar matejzero commented on June 2, 2024

I overlooked that the temperature check is also reporting the same problem.

Yes, chassis 1 contains all necessary data. ChassisType of chassis 1 is RackMount. Maybe a quick workaround would be to check if ChassisType is RackMount, but I'm not sure if that causes problems for other types of servers.
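A rough sketch of that workaround, just to illustrate the idea and its downside. The function name `rackmount_only` and the sample dicts are made up for illustration; only the `ChassisType` values are taken from Redfish.

```python
def rackmount_only(chassis_list):
    """Keep only chassis resources whose ChassisType is RackMount (hypothetical filter)."""
    return [c for c in chassis_list if c.get("ChassisType") == "RackMount"]


# Illustrative data modelled on the two chassis reported by the BMC.
sr6x0 = [
    {"Id": "1", "ChassisType": "RackMount"},
    {"Id": "3", "ChassisType": "Enclosure", "Name": "Backplane"},
]
selected = rackmount_only(sr6x0)
print([c["Id"] for c in selected])

# A server that reports a different type (for example Blade) would be
# filtered out completely, which is the risk mentioned above.
blade = [{"Id": "1", "ChassisType": "Blade"}]
print(rackmount_only(blade))
```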


bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

I would take another approach. I would collect data from every chassis and only complain if no data for temperature (or any other check) is returned at all. As long as one chassis returns data, it should report as green.

Will have a look at it.
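Roughly what I have in mind, as a sketch only and not the final plugin code. The function names and the sample data are placeholders; `Temperatures` and `ReadingCelsius` are the property names from the Redfish Thermal resource, and health evaluation of the individual readings is left out here.

```python
def collect_temperatures(thermal_by_url):
    """Gather temperature entries from all chassis; remember which URLs returned none."""
    readings, empty_urls = [], []
    for url, thermal in thermal_by_url.items():
        found = thermal.get("Temperatures") or []
        if found:
            readings.extend(found)
        else:
            empty_urls.append(url)
    return readings, empty_urls


def temperature_status(thermal_by_url):
    """Report UNKNOWN only if no chassis at all returned temperature data."""
    readings, empty_urls = collect_temperatures(thermal_by_url)
    if not readings:
        urls = ", ".join(f"'{u}'" for u in empty_urls)
        return "UNKNOWN", f"No temperature data returned for {urls}"
    return "OK", f"{len(readings)} temperature readings collected"


# Illustrative input: chassis 1 has data, chassis 3 (the backplane) has none.
thermal_by_url = {
    "/redfish/v1/Chassis/1/Thermal": {
        "Temperatures": [{"Name": "Ambient Temp", "ReadingCelsius": 26.0}]
    },
    "/redfish/v1/Chassis/3/Thermal": {},
}
print(temperature_status(thermal_by_url))
```

With this shape, an empty backplane chassis no longer turns the whole check UNKNOWN, and it would be picked up automatically if it starts reporting sensors later.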


matejzero avatar matejzero commented on June 2, 2024

That is probably a better approach.

In case the backplane endpoint starts returning temperature / power / fan data (on blade or dense systems where there is a separate storage chassis under the server bay), your solution will cover that.


matejzero avatar matejzero commented on June 2, 2024

Looking at Lenovo SD530, it might be that this reports chassis 1 as main chassis (power supply and some temperatures) and 2,3,4,5 as each node (fans, power, temperature)...


bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

I would be highly interested in mockups to add to my testing environment. Would be great if you could provide some. Also makes coding against it much easier.


matejzero avatar matejzero commented on June 2, 2024

I can't provide mockups for SD530 as we don't have them. As for SR6x0 and BMC/XCC 5.40, the mockup is above (for chassis 3).

I would also love to test any fixes you make.


bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

So, it took me quite long, but I have now tried to take care of this issue.

Can you check out next-release and see if this fixes your issue?


matejzero avatar matejzero commented on June 2, 2024

Power supply checks work OK:
[OK]: All power supplies (2) are in good condition and Power redundancy 1 status is: Enabled|'ps_1'=99 'ps_2'=99

Temperature checks work OK:
[OK]: |'temp_Ambient_Temp'=26.0;43;47 'temp_CPU1_Temp'=48.0 'temp_CPU1_DTS'=-49.0 'temp_DIMM_3_Temp'=39.0 'temp_DIMM_4_Temp'=39.0 'temp_DIMM_5_Temp'=39.0 'temp_DIMM_6_Temp'=39.0 'temp_DIMM_7_Temp'=37.0 'temp_DIMM_8_Temp'=37.0 'temp_DIMM_9_Temp'=36.0 'temp_DIMM_10_Temp'=36.0 'temp_PCH_Temp'=64.0 'temp_Exhaust_Temp'=48.0

Fans:
[UNKNOWN]: Request error: No fan data returned for API URL '/redfish/v1/Chassis/1/Thermal', No fan data returned for API URL '/redfish/v1/Chassis/3/Thermal'

Chassis/1/ json output: https://pastebin.com/2UqTtiAg
Chassis/3/ json output: https://pastebin.com/6sgN2F81


matejzero avatar matejzero commented on June 2, 2024

I ran the fans check again and now it works, but the output is different.

Check on old version:
[OK]: All fans (10) are in good condition|'Fan_Fan_1A_Tach'=5460;; 'Fan_Fan_1B_Tach'=5340;; 'Fan_Fan_2A_Tach'=5376;; 'Fan_Fan_2B_Tach'=5251;; 'Fan_Fan_3A_Tach'=5376;; 'Fan_Fan_3B_Tach'=5162;; 'Fan_Fan_4A_Tach'=5460;; 'Fan_Fan_4B_Tach'=5251;; 'Fan_Fan_5A_Tach'=5208;; 'Fan_Fan_5B_Tach'=5162;;

Check on new version:
[OK]: |'Fan_Fan_1A_Tach'=5124;; 'Fan_Fan_1B_Tach'=4895;; 'Fan_Fan_2A_Tach'=5208;; 'Fan_Fan_2B_Tach'=4895;; 'Fan_Fan_3A_Tach'=5040;; 'Fan_Fan_3B_Tach'=4984;; 'Fan_Fan_4A_Tach'=5040;; 'Fan_Fan_4B_Tach'=4895;; 'Fan_Fan_5A_Tach'=5040;; 'Fan_Fan_5B_Tach'=4806;;
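To make the difference easier to see, here is a small sketch that splits a plugin output line into status, summary text and performance data. `parse_plugin_line` is a helper written just for this comparison, and the sample lines are the two outputs above with the performance data cut down to the first two fans.

```python
import re

# Matches 'label'=value in the performance data part (thresholds after ';' are ignored).
PERF_RE = re.compile(r"'([^']+)'=([^;\s]+)")


def parse_plugin_line(line):
    """Split '[STATUS]: summary|perfdata' into (status, summary, {label: value})."""
    head, _, perf = line.partition("|")
    status, _, summary = head.partition(":")
    values = {label: float(value) for label, value in PERF_RE.findall(perf)}
    return status.strip().strip("[]"), summary.strip(), values


# Outputs from the old and new version, perfdata truncated to two fans.
old_line = "[OK]: All fans (10) are in good condition|'Fan_Fan_1A_Tach'=5460;; 'Fan_Fan_1B_Tach'=5340;;"
new_line = "[OK]: |'Fan_Fan_1A_Tach'=5124;; 'Fan_Fan_1B_Tach'=4895;;"

old_status, old_summary, old_values = parse_plugin_line(old_line)
new_status, new_summary, new_values = parse_plugin_line(new_line)

print(repr(old_summary), repr(new_summary))
print(sorted(old_values) == sorted(new_values))
```

So the status and the performance data labels are the same in both versions; only the summary text is missing in the new one.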


matejzero avatar matejzero commented on June 2, 2024

That sounds great!! Can't wait to test it out :)


bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

Hey @matejzero,

It took quite a while, but I finally finished the change. Can you please test the 'next-release' branch and let me know if this works for you?

Thank you.


matejzero avatar matejzero commented on June 2, 2024

I can confirm the new version works on Lenovo SR630/SR650 with XCC firmware versions 5.42 (latest) and 4.80 (pre-latest), apart from no mel/sel logs, but that doesn't work on latest release either:

  • [UNKNOWN]: No log services discovered where name matches 'Manager'
  • [UNKNOWN]: No log services discovered where name matches 'System'

All checks are green on Dell R6515 (iDrac 4.10.10.10 and 4.30.30.30) and R640 (iDrac 4.10.10.10), but I get some errors on an R740 (iDrac 4.22.00.53) that weren't present in the latest release:

  • storage check
    New version: [CRITICAL]: PERC H730P Mini status: OK
    Old version: [OK]: All storage controllers (PERC H730P Mini PERC H730P Mini, C620 Series Chipset Family SSATA Controller [AHCI mode] C620 Series Chipset Family SSATA Controller [AHCI mode], C620 Series Chipset Family SATA Controller [AHCI mode] C620 Series Chipset Family SATA Controller [AHCI mode], PERC H730P Mini), volumes and disk drives are in good condition

  • info check
    New version: [CRITICAL]: Type: Dell Inc. PowerEdge R740 (CPU: 1, MEM: 512GB) - BIOS: 2.9.4 - Serial: xxxx - Power: On - Name: NOT SET - 1 health sensor in 'CRITICAL' state, 34 health sensors are in 'OK' state
    Old version: [OK]: Type: Dell Inc. PowerEdge R740 (CPU: 1, MEM: 512GB) - BIOS: 2.9.4 - Serial: xxxx - Power: On - Name: NOT SET

I only have one R740 to test, but iDrac is reporting the system is all green. I looked through the verbose info output for any HealthState reported as Critical, but everything is OK or Unknown. Let me know how I can help debug the issue further to make it simpler for you.

Thanks for fixing the check so far!


bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

> I can confirm the new version works on Lenovo SR630/SR650 with XCC firmware versions 5.42 (latest) and 4.80 (pre-latest), apart from no mel/sel logs, but that doesn't work on latest release either:
>
>   • [UNKNOWN]: No log services discovered where name matches 'Manager'
>   • [UNKNOWN]: No log services discovered where name matches 'System'

If you could provide me with a mockup, I can check and integrate this as well.

> All checks are green on Dell R6515 (iDrac 4.10.10.10 and 4.30.30.30) and R640 (iDrac 4.10.10.10), but I get some errors on an R740 (iDrac 4.22.00.53) that weren't present in the latest release:
>
>   • storage check
>     New version: [CRITICAL]: PERC H730P Mini status: OK
>     Old version: [OK]: All storage controllers (PERC H730P Mini PERC H730P Mini, C620 Series Chipset Family SSATA Controller [AHCI mode] C620 Series Chipset Family SSATA Controller [AHCI mode], C620 Series Chipset Family SATA Controller [AHCI mode] C620 Series Chipset Family SATA Controller [AHCI mode], PERC H730P Mini), volumes and disk drives are in good condition

This seems to be a bug.

>   • info check
>     New version: [CRITICAL]: Type: Dell Inc. PowerEdge R740 (CPU: 1, MEM: 512GB) - BIOS: 2.9.4 - Serial: xxxx - Power: On - Name: NOT SET - 1 health sensor in 'CRITICAL' state, 34 health sensors are in 'OK' state
>     Old version: [OK]: Type: Dell Inc. PowerEdge R740 (CPU: 1, MEM: 512GB) - BIOS: 2.9.4 - Serial: xxxx - Power: On - Name: NOT SET

There seems to be one component not filtered properly.

Can you please run both commands with the --detailed option and post the output here?

Thank you.


matejzero avatar matejzero commented on June 2, 2024

I'll try and get the mockup for logs, but I need to find out which endpoint URI the script is calling to collect the document. If you can give me the URI (so that I won't need to look through verbose output), I'll be able to generate it quicker.

Detailed output of storage check:

[CRITICAL]: PERC H730P Mini status: OK
[OK]: PERC H730P Mini PERC H730P Mini (FW: 25.5.7.0005) status is: OK
[OK]: Physical Drive Solid State Disk 0:1:0 (MTFDDAK480TDC / SSD / SATA) 479.56GiB status: OK
[OK]: Physical Drive Solid State Disk 0:1:1 (MTFDDAK480TDC / SSD / SATA) 479.56GiB status: OK
[OK]: Logical Drive VD_0 (VD_0) 480GiB (Mirrored) status: OK
[OK]: StorageEnclosure BP14G+ 0:1 (Power: On) status: OK
[OK]: C620 Series Chipset Family SSATA Controller [AHCI mode] C620 Series Chipset Family SSATA Controller [AHCI mode] (FW: None) status is: None
[OK]: C620 Series Chipset Family SATA Controller [AHCI mode] C620 Series Chipset Family SATA Controller [AHCI mode] (FW: None) status is: None
[OK]: MICRON Solid State Disk 0:1:0 MTFDDAK480TDC (size: 479.56 GiB) status: OK
[OK]: MICRON Solid State Disk 0:1:1 MTFDDAK480TDC (size: 479.56 GiB) status: OK
[OK]: DELL Backplane 1 on Connector 0 of Integrated RAID Controller 1 BP14G+ 0:1 status: OK

info check:

[CRITICAL]: Type: Dell Inc. PowerEdge R740 (CPU: 1, MEM: 512GB) - BIOS: 2.9.4 - Serial: xxxx - Power: On - Name: NOT SET
[CRITICAL]: Sensor "CPU2 Status": Unknown (Enabled/Unknown)
[OK]: Sensor "CPU1 FIVR PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM012 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VDDQ PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VPP PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 MEM345 VTT PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 Status": OK (Enabled/Good)
[OK]: Sensor "CPU1 VCCIO PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 VCORE PG": OK (Enabled/Good)
[OK]: Sensor "CPU1 VSA PG": OK (Enabled/Good)
[OK]: Sensor "DIMM SLOT A10": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A11": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A2": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A4": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A5": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A7": OK (Enabled/Presence Detected)
[OK]: Sensor "DIMM SLOT A8": OK (Enabled/Presence Detected)
[OK]: Sensor "System Board 1.8V SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board 2.5V SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board 3.3V B PG": OK (Enabled/Good)
[OK]: Sensor "System Board 5V SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board BP0 PG": OK (Enabled/Good)
[OK]: Sensor "System Board BP1 PG": OK (Enabled/Good)
[OK]: Sensor "System Board BP2 PG": OK (Enabled/Good)
[OK]: Sensor "System Board CMOS Battery": OK (Enabled/Good)
[OK]: Sensor "System Board DIMM PG": OK (Enabled/Good)
[OK]: Sensor "System Board Intrusion": OK (Enabled/No Breach)
[OK]: Sensor "System Board NDC PG": OK (Enabled/Good)
[OK]: Sensor "System Board PS1 PG FAIL": OK (Enabled/Good)
[OK]: Sensor "System Board PS2 PG FAIL": OK (Enabled/Good)
[OK]: Sensor "System Board PVNN SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board VSB11 SW PG": OK (Enabled/Good)
[OK]: Sensor "System Board VSBM SW PG": OK (Enabled/Good)

In the info output I can see Enabled/Unknown for the sensor CPU2 Status. This server supports 2 CPUs, but only 1 is installed.
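My guess is that a health value of Unknown ends up being counted as critical. A small sketch of what I mean, purely as an assumption about the behaviour: the mapping table, `sensor_state` and its `unknown_as` parameter are invented for illustration and are not taken from the plugin code.

```python
from collections import Counter

# Standard Redfish health values mapped to plugin states (hypothetical table).
HEALTH_TO_STATE = {"OK": "OK", "Warning": "WARNING", "Critical": "CRITICAL"}


def sensor_state(health, unknown_as="UNKNOWN"):
    """Map a sensor health string to a plugin state; anything unrecognised becomes unknown_as."""
    return HEALTH_TO_STATE.get(health, unknown_as)


# Three of the sensors from the detailed output, with their reported health.
sensors = [
    ("CPU1 Status", "OK"),
    ("CPU2 Status", "Unknown"),
    ("System Board CMOS Battery", "OK"),
]

# Treating unrecognised health as CRITICAL reproduces the result seen above.
strict = Counter(sensor_state(h, unknown_as="CRITICAL") for _, h in sensors)
# Treating it as UNKNOWN keeps the empty CPU socket from going critical.
lenient = Counter(sensor_state(h) for _, h in sensors)

print(dict(strict))
print(dict(lenient))
```

Whether an empty socket should be UNKNOWN, OK or skipped entirely is a separate decision; the sketch only shows where the CRITICAL could be coming from.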


bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

There is a Redfish mockup generator on GitHub. I usually use that one: https://github.com/DMTF/Redfish-Mockup-Creator


matejzero avatar matejzero commented on June 2, 2024

Could I send the mockup to you via email? I don't want to post it here because serial numbers are included in the mockup.


bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

Absolutely


matejzero avatar matejzero commented on June 2, 2024

I can't seem to find your email on GitHub. Could you send an email to xxx at yyy? I'll then send you the link to the mockup files.


bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

You can find it here: https://github.com/bb-Ricardo/check_redfish/blob/master/check_redfish.py#L21


matejzero avatar matejzero commented on June 2, 2024

I saw you made some commits. Checks on R740 now pass without a problem.


bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

Thank you for testing. I just pushed another commit.

Now the logs on Lenovo systems should work again. I also added controller cache battery info for newer Dell and Lenovo systems.


matejzero avatar matejzero commented on June 2, 2024

I tested the latest version on SR630/SR650 and Dell R6515, R640 and R740, and all works OK!

We also have a lot of SR635 servers, but they are too slow to query at the moment: just querying the base Redfish URI can take anywhere from 3s to 30s+, so I need to do more testing before I try this check on them.
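For that testing I will probably use something like the sketch below to measure how long a request takes over a few attempts. `time_call` is a throwaway helper, and the `fake_fetch` stand-in only sleeps so the example runs without a BMC; in practice the callable would do the actual HTTPS request to the base Redfish URI.

```python
import time


def time_call(fetch, attempts=3):
    """Call fetch() several times and return (fastest, slowest) duration in seconds."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return min(durations), max(durations)


def fake_fetch():
    """Stand-in for a real request to the BMC; just waits 10 ms."""
    time.sleep(0.01)


fastest, slowest = time_call(fake_fetch, attempts=3)
print(f"fastest: {fastest:.3f}s, slowest: {slowest:.3f}s")
```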

Anyway, I think all issues are now fixed and you can close this.

Thank you very much for these fixes!


bb-Ricardo avatar bb-Ricardo commented on June 2, 2024

Great and thank you for all the testing.

