Cisco CIMC support (check_redfish issue, closed)

bb-ricardo avatar bb-ricardo commented on June 10, 2024
Cisco CIMC support

from check_redfish.

Comments (28)

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

Hi,

Cisco was on my list; the problem is more a lack of devices.

If you don't mind, you could use this project (https://github.com/DMTF/Redfish-Mockup-Creator) and send me the mockup as a tar file, and I will have a look.

But it will contain IP addresses, serial numbers and account data which you might want to anonymize.
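
One way to anonymize a mockup before sending it is sketched below. It assumes the mockup is a directory tree of JSON files; the key names in `SENSITIVE_KEYS`, the placeholder values and the function names are hypothetical choices for illustration, not part of Redfish-Mockup-Creator or check_redfish. The IPv4 pattern is deliberately simple and may also match four-part version strings, so review the result by hand.

```python
import json
import re
from pathlib import Path

# Hypothetical choices for this sketch: which keys to blank and what to write instead.
SENSITIVE_KEYS = {"SerialNumber", "UserName", "Password"}
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PLACEHOLDER = "REDACTED"


def anonymize(value, key=None):
    """Recursively redact sensitive keys and IPv4-looking strings."""
    if isinstance(value, dict):
        return {k: anonymize(v, k) for k, v in value.items()}
    if isinstance(value, list):
        # List items inherit the key of the list they belong to.
        return [anonymize(item, key) for item in value]
    if key in SENSITIVE_KEYS and value is not None:
        return PLACEHOLDER
    if isinstance(value, str):
        return IPV4_PATTERN.sub("0.0.0.0", value)
    return value


def anonymize_mockup(mockup_dir):
    """Rewrite every JSON file below mockup_dir in place; return how many files changed."""
    changed = 0
    for path in Path(mockup_dir).rglob("*.json"):
        original = json.loads(path.read_text(encoding="utf-8"))
        cleaned = anonymize(original)
        if cleaned != original:
            path.write_text(json.dumps(cleaned, indent=4), encoding="utf-8")
            changed += 1
    return changed
```

Run `anonymize_mockup("path/to/mockup")` on a copy of the mockup directory, then check a few files manually before creating the tar file.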

ajoergensen avatar ajoergensen commented on June 10, 2024

Do you want the tar file attached to this issue?

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

Just send it to my email address.

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

Can you please test the latest "next-release" branch?

ajoergensen avatar ajoergensen commented on June 10, 2024

All checks work as expected, except power:

[CRITICAL]: Power supply 1 (700-014160-0000) status is: None
[CRITICAL]: Power supply 2 (700-014160-0000) status is: None|'ps_1'=213 'ps_2'=198 'voltage_PSU1_VOUT'=12.0 'voltage_PSU2_VOUT'=12.0 'voltage_P12V'=11.89 'voltage_P3V_BAT_SCALED'=3.04

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

That sounds great so far. Thank you for testing.

The issue with the power supplies is that this instance doesn't report a Health status:

'PowerSupplies': [
    {'@odata.id': '/redfish/v1/Chassis/1/Power#/PowerSupplies/PSU1',
     'FirmwareVersion': '1100101',
     'InputRanges': [{'InputType': 'AC',
                      'MaximumFrequencyHz': 63,
                      'MaximumVoltage': 264,
                      'MinimumFrequencyHz': 47,
                      'MinimumVoltage': 90,
                      'OutputWattage': 770}],
     'LastPowerOutputWatts': '212',
     'LineInputVoltage': '223',
     'LineInputVoltageType': 'AC',
     'Manufacturer': 'Cisco Systems Inc',
     'MemberID': 1,
     'Model': '700-014160-0000',
     'Name': 'PSU1',
     'PartNumber': '341-0591-04',
     'PowerSupplyType': 'AC',
     'SerialNumber': 'ABC',
     'SparePartNumber': '341-0591-04',
     'Status': {'state': 'Enabled'}},
    {'@odata.id': '/redfish/v1/Chassis/1/Power#/PowerSupplies/PSU2',
     'FirmwareVersion': '1100101',
     'InputRanges': [{'InputType': 'AC',
                      'MaximumFrequencyHz': 63,
                      'MaximumVoltage': 264,
                      'MinimumFrequencyHz': 47,
                      'MinimumVoltage': 90,
                      'OutputWattage': 770}],
     'LastPowerOutputWatts': '194',
     'LineInputVoltage': '223',
     'LineInputVoltageType': 'AC',
     'Manufacturer': 'Cisco Systems Inc',
     'MemberID': 2,
     'Model': '700-014160-0000',
     'Name': 'PSU2',
     'PartNumber': '341-0591-04',
     'PowerSupplyType': 'AC',
     'SerialNumber': 'ABD',
     'SparePartNumber': '341-0591-04',
     'Status': {'state': 'Enabled'}}
]

According to the Redfish standard, it should be reported like this:

 {'Health': 'OK', 'HealthRollup': 'OK', 'State': 'Enabled'}

Source: DSP0268_2019.4_0.pdf, page 45

I don't know if all attributes are mandatory, but a "Health" attribute would be quite helpful in this case.
All other vendors are able to report a "Health" status.

Any idea on how to treat this situation?

  • Just report it as OK?
  • Assume that Enabled means OK and if state is not enabled then report CRITICAL?

Would you be able to test this in your environment? Just run the check, then pull out a power supply and run the check again. And maybe also unplug the power cable and see what exactly gets reported?

Option "-v" would be very helpful in this case.

Thank you very much.

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

Actually that's exactly what happens for storage components. State == "Enabled" -> OK

Testing would still be great.

Thank you

ajoergensen avatar ajoergensen commented on June 10, 2024

I'll be able to test it on Wednesday and will let you know what I find.

ajoergensen avatar ajoergensen commented on June 10, 2024

If the power cable is disconnected, the CIMC returns:

"Status": {
    "state": "Disabled"
}

If the PSU is removed, it's simply not present in the output (I removed PSU1; only PSU2 was in the mockup).

I think it's safe to assume 'Enabled' means okay and anything else is bad.

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

Thank you so much for testing.

I will change the code accordingly.

state == Enabled -> OK
state != Enabled -> CRITICAL
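
The mapping above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that went into check_redfish: the function name `power_supply_status` is hypothetical, and the lookup is made case-insensitive because the CIMC output in this thread uses the key 'state' while the Redfish standard spells it 'State'.

```python
def power_supply_status(status):
    """Map a Redfish 'Status' object to a Nagios-style state string."""
    # Normalise key case: the CIMC in this thread reports 'state',
    # the Redfish standard uses 'State' / 'Health'.
    normalized = {str(k).lower(): v for k, v in (status or {}).items()}

    health = normalized.get("health")
    if health:
        # Prefer an explicit Health value when the BMC provides one.
        return {"OK": "OK", "WARNING": "WARNING"}.get(str(health).upper(), "CRITICAL")

    # Fallback discussed in this thread: Enabled -> OK, anything else -> CRITICAL.
    return "OK" if normalized.get("state") == "Enabled" else "CRITICAL"
```

Note that this only covers power supplies that appear in the response. As reported above, a removed PSU is missing from the output entirely, so it cannot be detected this way.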

But if the power supply is unplugged, does "PowerRedundancy" complain?

Do you still have the mockups?

ajoergensen avatar ajoergensen commented on June 10, 2024

Unfortunately I only have the mockup from when one PSU was missing.

Looking at the original mockup I sent to you, there is no mention of PowerRedundancy

I plan on going onsite again Friday, let me know if we need more testing done.

ajoergensen avatar ajoergensen commented on June 10, 2024

The only difference in the output from the MockUp-tool is the change of state.

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

Thank you so much for testing.

Makes me curious if a removed power supply is properly indicated.

I will add the change hopefully by next week.

ajoergensen avatar ajoergensen commented on June 10, 2024

Looking in the CIMC, I see a warning if a PSU is removed:

PSU1_STATUS: Power Supply 1 missing: reseat or replace PS 1

But the overall status of the system is green.

ajoergensen avatar ajoergensen commented on June 10, 2024

I need to revisit this one. We just discovered that the latest firmware update from Cisco breaks the script; the checks it broke are 'bmc' and 'firmware' (we do not use the latter).

The bmc check returns '[CRITICAL]: None (Firmware: None)'.
The firmware check just throws a Python error.

New mockup?

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

tisk, tisk, tisk, CISCO.

Yes please. A mockup would be great.

Thank you

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

I added the changes (and quite a few more) to next-release. Can you test if it's working now?

ajoergensen avatar ajoergensen commented on June 10, 2024

--firmware works, as do mem, temp, fan and nic; the rest (except mel and sel) fail with a similar-sounding error (an AttributeError; line numbers and functions differ):

Traceback (most recent call last):
  File "/home/user/check_redfish/check_redfish.py", line 3232, in <module>
    if "bmc"        in args.requested_query: get_bmc_info()
  File "/home/user/check_redfish/check_redfish.py", line 2793, in get_bmc_info
    get_bmc_info_generic(manager)
  File "/home/user/check_redfish/check_redfish.py", line 2955, in get_bmc_info_generic
    status = manager_response.get("Status").get("Health").upper()
AttributeError: 'NoneType' object has no attribute 'get'

mel and sel fail with:
--sel

[UNKNOWN]: No log services discoverd in /redfish/v1/Managers/CIMC/LogServices that match System
[UNKNOWN]: No log services discoverd in /redfish/v1/Managers/2/LogServices that match System

--mel

[UNKNOWN]: No log services discoverd in /redfish/v1/Managers/CIMC/LogServices that match Manager
[UNKNOWN]: No log services discoverd in /redfish/v1/Managers/2/LogServices that match Manager
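
The AttributeError in the bmc traceback above comes from chaining `.get("Status").get("Health")` when the response has no "Status" object, so the second `.get()` is called on None. A defensive lookup could look roughly like this; the helper name `get_health` is hypothetical and this is a sketch, not the fix that was applied in check_redfish.

```python
def get_health(response):
    """Return the upper-cased Health value, or None if it is not reported."""
    # 'response' may be None or may lack 'Status'; fall back to empty dicts
    # so the chained lookups never run on None.
    status = (response or {}).get("Status") or {}
    health = status.get("Health")
    return health.upper() if isinstance(health, str) else None
```

The caller can then decide how to report a missing Health value, for example as UNKNOWN, instead of crashing.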

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

Thank you for testing.

I'm not sure if you have the latest version from branch next-release.

check_redfish.py now only has 220 lines and this is the last one: https://github.com/bb-Ricardo/check_redfish/blob/next-release/check_redfish.py#L220

Please pull this branch again and try it out: https://github.com/bb-Ricardo/check_redfish/tree/next-release

ajoergensen avatar ajoergensen commented on June 10, 2024

You are absolutely right, brainfart on my part.

--info
[OK]: Type: Cisco Systems Inc UCSC-C240-M5SX (CPU: 2, MEM: 768GB) - BIOS: C240M5.4.1.1b.0.0124200238 - Serial: <serial> - Power: On - Name: <hostname>
--firmware
[OK]: Found 3 firmware entries. Use '--detailed' option to display them.
--storage
[OK]: All storage controllers (Storage controller FX3S), volumes and disk drives are in good condition
--proc
[OK]: All processors (2) are in good condition
--memory
[OK]: All memory modules (Total 768GB) are in good condition
--power
[OK]: All power supplies (2) are in good condition and 4 Voltages are OK|'ps_1'=246 'ps_2'=240 'voltage_PSU1_VOUT'=12.1 'voltage_PSU2_VOUT'=12.1 'voltage_P12V'=11.774 'voltage_P3V_BAT_SCALED'=2.995
--temp
[OK]: All temp sensors (22) are in good condition|'temp_VIC_SLOT2_TEMP'=42.0;;90 'temp_TEMP_SENS_FRONT'=26.0;;45 'temp_DDR4_P1_A1_TMP'=31.0;;85 'temp_DDR4_P1_B1_TMP'=32.0;;85 'temp_DDR4_P1_C1_TMP'=31.0;;85 'temp_DDR4_P1_D1_TMP'=33.0;;85 'temp_DDR4_P1_E1_TMP'=33.0;;85 'temp_DDR4_P1_F1_TMP'=33.0;;85 'temp_DDR4_P2_G1_TMP'=35.0;;85 'temp_DDR4_P2_H1_TMP'=35.0;;85 'temp_DDR4_P2_J1_TMP'=34.0;;85 'temp_DDR4_P2_K1_TMP'=40.0;;85 'temp_DDR4_P2_L1_TMP'=40.0;;85 'temp_DDR4_P2_M1_TMP'=40.0;;85 'temp_P1_TEMP_SENS'=43.5;;104 'temp_P2_TEMP_SENS'=45.5;;104 'temp_PSU1_TEMP'=31.0;;65 'temp_PSU2_TEMP'=29.0;;65 'temp_PCH_TEMP_SENS'=33.0;;85 'temp_RISER1_TEMP'=28.0;;70 'temp_RISER2_INLET_TMP'=37.0;;70 'temp_RISER1_INLET_TMP'=32.0;;70
--fan
[OK]: All fans (10) are in good condition|'Fan_MOD1_FAN1_SPEED'=15150;; 'Fan_MOD1_FAN1_SPEED'=15150;; 'Fan_MOD2_FAN1_SPEED'=16160;; 'Fan_MOD2_FAN2_SPEED'=15680;; 'Fan_MOD3_FAN2_SPEED'=14700;; 'Fan_MOD3_FAN1_SPEED'=15150;; 'Fan_MOD4_FAN2_SPEED'=15680;; 'Fan_MOD4_FAN2_SPEED'=15680;; 'Fan_MOD5_FAN1_SPEED'=15150;; 'Fan_MOD6_FAN2_SPEED'=15680;;
--nic
[OK]: All network interfaces (6) are in good condition
--bmc
[OK]: UCSC-C240-M5SX (Firmware: 4.1(1d)) and all nics are in 'OK' state.
--sel
[OK]: Found 50 OK System Event Log entries. Most recent notable: [OK]: 2020-03-25 10:20:27 CET: BIOS_POST_CMPLT: Presence sensor, Device Inserted / Device Present was asserted
--mel
[OK]: Found 100 OK Manager Event Log entries. Most recent notable: [OK]: 2020 Mar 28 12:53:49 CET: Session close (user:user@domain (LDAP), ip:10.10.10.10, id:3199, type:xmlapi)

Excellent work, once again. Thank you.

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

Great, this looks good.
Thank you for all this input.

ajoergensen avatar ajoergensen commented on June 10, 2024

Apparently I didn't test enough :( - some checks fail on UCS C220 with firmware 4.1:

--storage
Traceback (most recent call last):
  File "/home/user/check_redfish/check_redfish.py", line 211, in <module>
    if any(x in args.requested_query for x in ['storage', 'all']):  get_storage(plugin)
  File "/home/user/check_redfish/cr_module/storage.py", line 29, in get_storage
    get_storage_generic(plugin_object, system)
  File "/home/user/check_redfish/cr_module/storage.py", line 806, in get_storage_generic
    get_volumes(controller_response.get("Volumes").get("@odata.id"))
AttributeError: 'NoneType' object has no attribute 'get'
--proc
Traceback (most recent call last):
  File "/home/user/check_redfish/check_redfish.py", line 208, in <module>
    if any(x in args.requested_query for x in ['proc', 'all']):     get_system_data(plugin, "procs")
  File "/home/user/check_redfish/check_redfish.py", line 176, in get_system_data
    get_single_system_procs(plugin_object, system)
  File "/home/user/check_redfish/cr_module/proc.py", line 50, in get_single_system_procs
    for proc in processors_response.get("Members") or processors_response.get(system_response_proc_key):
TypeError: 'NoneType' object is not iterable
--nic
[UNKNOWN]: No network interface data returned for API URL '/redfish/v1/Systems/<serial>/EthernetInterfaces/'

Another mockup?
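
The TypeError in the --proc traceback above occurs because both `.get()` calls return None, so the `for` loop tries to iterate over None. A minimal sketch of a guard is shown below; the helper name `members_or_empty` and its `fallback_key` parameter are hypothetical and not taken from check_redfish.

```python
def members_or_empty(response, fallback_key=None):
    """Return the list under 'Members' (or fallback_key), or [] if neither holds a list."""
    response = response or {}
    members = response.get("Members")
    if not members and fallback_key is not None:
        members = response.get(fallback_key)
    # Anything that is not a list (None, a string, ...) is treated as "no members".
    return members if isinstance(members, list) else []
```

With an empty list as the fallback, the loop simply runs zero times and the check can report that no data was returned.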

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

You've got more/different machines?

I hope this project doesn't turn into a support hell of different vendors with different versions of BMCs, each working differently than the ones before. So far Cisco is at the top of the list.

Yes, Mockup would be great.

Thank you

ajoergensen avatar ajoergensen commented on June 10, 2024

I know, and it bugs me that Cisco is this difficult (there was no difference between C220 and C240 before the firmware upgrade).

I will email the mockup, let me know if there's something we can do to support your work.

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

Can you try it again please?

ajoergensen avatar ajoergensen commented on June 10, 2024

--nic and --proc return 'unknown' due to the data format error you have identified. I'll open a TAC case.

--storage returns a warning on both C220 and C240 now (C240 worked before the latest pull):
[WARNING]: Physical Drive SD card (NA / None / None) 0GiB status: WARNING

ajoergensen avatar ajoergensen commented on June 10, 2024

Sorry to reopen this, but the storage issue remains; the output on all servers is the same, regardless of the actual storage configuration (we have servers with FlexFlash, and with disk/SSD with and without hardware RAID).

All of them output: [WARNING]: Physical Drive SD card (NA / None / None) 0GiB status: WARNING

bb-Ricardo avatar bb-Ricardo commented on June 10, 2024

The output is based on the data returned from the CIMC.

Just run the plugin with the -v cli option and you will see the JSON data returned.

In this case only CISCO is able to fix this problem.
