
smfc's People

Contributors

emansom, fcladera, petersulyok, smtdev

smfc's Issues

Add BMC cold reset instructions to README

I was pulling my hair out trying to figure out why it wasn't working on my X11SSL-F. It only worked after performing a cold BMC reset using ipmicfg -r.

This necessity should be made clearer in the README.

Supermicro X9 compatibility

Based on this forum topic, we can assume that Super Micro X9 motherboards behave differently when setting the fan level through IPMI. The current working hypothesis is that smfc does not work properly on X9 motherboards.

This issue was created to collect all testing and correction efforts related to this problem.
Thanks to @matthuska for bringing this to my attention.

Super Micro X10DRU-i+ zones

Hi,

I suspect that this motherboard has more than two zones.
You can experiment with this command:

# ipmitool raw 0x30 0x70 0x66 0x01 zone 50

Substitute a value for the zone parameter (e.g. 0, 1, 2, 3, 4) and report back the output of the command.
This command sets the fan level to 50% in the specified zone. By changing the zone parameter you can discover the potential zones on your motherboard. Of course, having fans connected makes this experiment easier.
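
The probing procedure described above could be scripted roughly as follows. This is only a sketch: build_fan_level_command and probe_commands are hypothetical helper names (not part of smfc), and the raw bytes are copied from the command quoted in this issue. The code only builds the argument lists; actually executing them (as root) is left to the reader.

```python
from typing import Iterable, List


def build_fan_level_command(zone: int, level: int, ipmitool: str = "ipmitool") -> List[str]:
    """Build the argument list that sets one IPMI zone to a fan level in percent."""
    if not 0 <= zone <= 255:
        raise ValueError("zone must fit in one byte")
    if not 0 <= level <= 100:
        raise ValueError("level must be between 0 and 100")
    # Same bytes as the quoted command: raw 0x30 0x70 0x66 0x01 <zone> <level>
    return [ipmitool, "raw", "0x30", "0x70", "0x66", "0x01",
            "0x{:02x}".format(zone), str(level)]


def probe_commands(zones: Iterable[int] = range(5), level: int = 50) -> List[List[str]]:
    """Return one command per candidate zone, so each zone can be tried in turn."""
    return [build_fan_level_command(zone, level) for zone in zones]
```

Printing " ".join(cmd) for each entry of probe_commands() gives the command lines to try one by one while watching which fans react.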

X10DRG

Originally posted in another thread, then realized probably better to not clutter that motherboard's thread with X10DRG data.

Anyway, I did some testing and the X10DRG has FOUR zones, not 2.
So instead of zones 0x00 and 0x01, you use 0x00 - 0x03. They're all "CPU zones".

To make things work correctly, you'll need to edit the set_ipmi_fan_level.sh file and make sure it handles all four zones for CPU calls. After you've tested the .sh files in the ipmi folder and everything is working correctly, update the actual smfc.py file before running install.sh, specifically the set_fan_level function, so that it handles all four zones the same way as in testing.

Obviously every config is different, so I wanted to share these notes for anyone else running a X10DRG mobo, as it took a few hours to figure this all out and why some of the fans weren't working!!

Thanks to @petersulyok for his work on this, it helped me immensely as I was completely lost by the ipmi documentation.

Feature Request: dracut integration

During boot and reboot cycles the fans will blast at 100% as SMFC isn't running yet.

This can be rather annoying, especially when the server has rebooted itself (e.g. watchdog after kernel panic) and it's waiting on a LUKS key phrase.

Implementing a dracut module would resolve this.

Dual CPU Support

Hello again,

Is it possible to enable dual CPU support and change fan speed based on the hottest one?

Current path:

/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input

Additional path:

/sys/devices/platform/coretemp.1/hwmon/hwmon*/temp1_input

Thanks!
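
One way the request above could be approached is to read every coretemp file and take the maximum. The sketch below assumes the two paths quoted in this issue; read_celsius and hottest_cpu_temp are hypothetical names, not functions of smfc.

```python
import glob
from typing import Iterable, List


def read_celsius(pattern: str) -> List[float]:
    """Read all files matching a glob pattern; hwmon files hold millidegrees Celsius."""
    temps = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="ascii") as handle:
            temps.append(int(handle.read().strip()) / 1000.0)
    return temps


def hottest_cpu_temp(patterns: Iterable[str]) -> float:
    """Return the highest temperature found across all given hwmon patterns."""
    temps = [temp for pattern in patterns for temp in read_celsius(pattern)]
    if not temps:
        raise FileNotFoundError("no temperature file matched the given patterns")
    return max(temps)
```

It would be called with the two patterns from this issue, e.g. hottest_cpu_temp(["/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input", "/sys/devices/platform/coretemp.1/hwmon/hwmon*/temp1_input"]).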

Request support for remote IPMI

It seems this script has to be installed directly on the host.
But what if the host runs Proxmox and I want to keep it clean, so the script should be installed in a VM instead?
Would it be possible to add remote IPMI support?
I think the command is:
ipmitool -U ipmi_user_name -P ipmi_password -H ipmi_ip
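
As a rough illustration of how remote access might be wired in, the sketch below builds the ipmitool prefix either for local use or for a remote BMC. ipmitool_base is a hypothetical helper; the -H, -U and -P options come from the command above, while the choice of the lanplus interface (-I lanplus) is an assumption that may not suit every BMC.

```python
from typing import List, Optional


def ipmitool_base(host: Optional[str] = None,
                  user: Optional[str] = None,
                  password: Optional[str] = None,
                  ipmitool: str = "/usr/bin/ipmitool") -> List[str]:
    """Return the ipmitool prefix for local access or for a remote BMC."""
    if host is None:
        # Local access through the kernel IPMI driver, no credentials needed.
        return [ipmitool]
    if user is None or password is None:
        raise ValueError("remote access needs both user and password")
    # Assumption: the BMC accepts IPMI v2.0 over LAN (lanplus).
    return [ipmitool, "-I", "lanplus", "-H", host, "-U", user, "-P", password]
```

Any raw command would then be appended to this prefix. Note that a password passed with -P is visible in the process list of the machine running the command.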

X12 Support?

Does this program support X12 motherboards, or will it in the future?

Model type:
SYS-420GP-TNR X12 4U 10GPU ICE LAKE GEN4 PCIE SYSTEM

Set all fans to maximum speed on exception

Hi,

I would suggest setting all fans to maximum speed if an unhandled exception occurs.
I had the service running on my server, replaced one of my HDDs with a new one, and started the resilvering process.
By sheer luck I had my IPMI view open and saw that my CPU was at 90 °C.
The service had raised an exception because the config still listed my old HDD in the /dev/by-id entries.

I think there should be a wrapper around the whole program that sets all fans to maximum speed if an unhandled exception occurs.
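
The wrapper suggested above might look something like this. It is a minimal sketch: run_with_failsafe is a hypothetical name, and both callables are placeholders for whatever the service loop and the "set all fans to 100%" routine actually are.

```python
import logging
from typing import Callable


def run_with_failsafe(main_loop: Callable[[], None],
                      set_full_speed: Callable[[], None]) -> int:
    """Run the service loop; on any unhandled exception force the fans to full speed."""
    try:
        main_loop()
        return 0
    except Exception:
        # Log the traceback first, then fall back to the safe state.
        logging.exception("unhandled exception, forcing all fans to full speed")
        set_full_speed()
        return 1
```

The return value can be passed to sys.exit() so that systemd still sees the failure while the fans are left in a safe state.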

Number of HDDs parameter

This is rather minor, but why does "count" parameter for the HD zone exist? The program will complain if the number of drives listed under hd_names doesn't match this, so clearly it can count the number of drives directly from the hd_names array. Is it intended to be a sanity check for the user?

Feature Request: GPU temperature/activity bias

In workstation configurations inside tower cases, GPU-heavy workloads with low CPU load can lead to scenarios where the top case fans do not run at sufficient CFM to draw the hot air upwards.

This happens when the CPU temperature is low while the GPU temperature is not.

The GPU (with a blower-style fan) then recycles its own pocket of hot air instead of being helped by the case fans.

To combat this, a bias of sorts could be introduced that influences the curve based on GPU temperature and/or activity.

Document behavior with ULNA

When pairing Noctua fans with Ultra Low Noise Adapters, the default lower threshold of 35% isn't enough.

This threshold needs to be upped to 45% to not upset the firmware, when using ULNA.

Describing this in the README will aid others in debugging and getting their Supermicro systems whisper quiet.

Version 2.4.0 breaks HD zone loading?

It seems the latest refactor of config option handling broke the HD zone?
If using 2.4.0, the CPU zone works fine, while the HD zone configuration does not appear to initialize at all - it won't report errors in the config parameters. If I set enable=0 in the CPU zone, and enable=1 in the HD zone, the program says that no zones are enabled and exits.
After reverting to 2.3.1, both zones work as expected.

Feature Request: summer and winter time of the year fan modes

In some regions of the world, temperature differences between the summer and winter season can be vast.

It would be nice if SMFC could detect the current season (based on system timezone, and possibly some third party library), and switch temperature and fan targets.

This would aid in achieving the highest possible performance and availability all year round.

smfc is not running on Python 3.6

During the execution of github workflow on different Python versions (i.e. 3.6, 3.7, 3.8, 3.9 and 3.10) it turned out the unit tests are failing on Python 3.6 with the following message:

Run pytest
============================= test session starts ==============================
platform linux -- Python 3.6.15, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/runner/work/smfc/smfc, configfile: pytest.ini
collected 23 items

test/test_01_log.py ....                                                 [ 17%]
test/test_02_ipmi.py F....                                               [ 39%]
test/test_03_fancontroller.py .....                                      [ 60%]
test/test_04_cpuzone.py ..                                               [ 69%]
test/test_05_hdzone.py ....F.                                            [ 95%]
test/test_06_main.py F                                                   [100%]

=================================== FAILURES ===================================
________________________ IpmiTestCase.test_get_fan_mode ________________________

self = <test_02_ipmi.IpmiTestCase testMethod=test_get_fan_mode>

    def test_get_fan_mode(self) -> None:
        """This is a unit test for function Ipmi.get_fan_mode()"""
    
        # Test saving valid parameters.
>       self.pt_gfm_p1(Ipmi.STANDARD_MODE, 'ipmi get_fan_mode 1')

test/test_02_ipmi.py:149: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test/test_02_ipmi.py:109: in pt_gfm_p1
    fm = my_ipmi.get_fan_mode()
src/smfc.py:199: in get_fan_mode
    check=False, capture_output=True, text=True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, timeout = None, check = False
popenargs = (['/tmp/tmpndgelrov/tmpjkdku5i1.sh', 'raw', '0x30', '0x45', '0x00'],)
kwargs = {'capture_output': True, 'text': True}

    def run(*popenargs, input=None, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        The other arguments are the same as for the Popen constructor.
    
        If universal_newlines=True is passed, the "input" argument must be a
        string and stdout/stderr in the returned object will be strings rather than
        bytes.
        """
        if input is not None:
            if 'stdin' in kwargs:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
>       with Popen(*popenargs, **kwargs) as process:
E       TypeError: __init__() got an unexpected keyword argument 'capture_output'

/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/subprocess.py:423: TypeError
___________________________ HdZoneTestCase.test_init ___________________________

self = <test_05_hdzone.HdZoneTestCase testMethod=test_init>

    def test_init(self) -> None:
        """This is a unit test for function HdZone.__init__()"""
        my_td = TestData()
    
        # Test valid parameters (hd=1 case is not tested because it turns off standby guard).
>       self.pt_init_p1(2, FanController.CALC_MIN, 4, 2, 2, 32, 48, 35, 100, 2, my_td.get_hd_2(), 'hz init 1')

test/test_05_hdzone.py:198: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test/test_05_hdzone.py:60: in pt_init_p1
    my_hdzone = HdZone(my_log, my_ipmi, my_config)
src/smfc.py:682: in __init__
    raise e
src/smfc.py:680: in __init__
    n = self.check_standby_state()
src/smfc.py:774: in check_standby_state
    check=False, capture_output=True, text=True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, timeout = None, check = False
popenargs = (['/tmp/tmppm3w63rr/tmpxlfmy7nc.sh', '-i', '-n', 'standby', '/tmp/tmppm3w63rr/dev/disk/by-id/ata-HD_HD1100XOI-D842B22F'],)
kwargs = {'capture_output': True, 'text': True}

    def run(*popenargs, input=None, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        The other arguments are the same as for the Popen constructor.
    
        If universal_newlines=True is passed, the "input" argument must be a
        string and stdout/stderr in the returned object will be strings rather than
        bytes.
        """
        if input is not None:
            if 'stdin' in kwargs:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
>       with Popen(*popenargs, **kwargs) as process:
E       TypeError: __init__() got an unexpected keyword argument 'capture_output'

/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/subprocess.py:423: TypeError
____________________________ MainTestCase.test_main ____________________________

self = <test_06_main.MainTestCase testMethod=test_main>

    def test_main(self) -> None:
        """This is a unit test for function main()"""
    
        # Test standard exits (0, 2).
        self.pt_main_n1('-h', 0, 'smfc main 1')
        self.pt_main_n1('-v', 0, 'smfc main 2')
        # Test exits for invalid command line parameters.
        self.pt_main_n1('-l 4', 2, 'smfc main 3')
        self.pt_main_n1('-o 3', 2, 'smfc main 4')
        self.pt_main_n1('-o 1 -l 5', 2, 'smfc main 5')
        self.pt_main_n1('-o 5 -l 1', 2, 'smfc main 6')
    
        # Test exits (5) at Log() init skipped (cannot be reproduced because of the parsing of
        # the command-line arguments parsing).
    
        # Test exits(6) at configuration file loading.
        self.pt_main_n1('-o 0 -l 3 -c &.txt', 6, 'smfc main 7')
        self.pt_main_n1('-o 0 -l 3 -c ./nonexistent_folder/nonexistent_config_file.conf', 6, 'smfc main 8')
    
        # Test exits(7) at Ipmi() init.
        self.pt_main_n2('NON-EXIST', 0, 0, 7, 'smfc main 9')
        self.pt_main_n2('GOOD', -1, 0, 7, 'smfc main 10')
        self.pt_main_n2('GOOD', 0, -1, 7, 'smfc main 11')
>       self.pt_main_n2('BAD', 0, 0, 7, 'smfc main 12')

test/test_06_main.py:162: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test/test_06_main.py:66: in pt_main_n2
    smfc.main()
src/smfc.py:880: in main
    old_mode = my_ipmi.get_fan_mode()
src/smfc.py:199: in get_fan_mode
    check=False, capture_output=True, text=True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def run(*popenargs, input=None, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        The other arguments are the same as for the Popen constructor.
    
        If universal_newlines=True is passed, the "input" argument must be a
        string and stdout/stderr in the returned object will be strings rather than
        bytes.
        """
        if input is not None:
            if 'stdin' in kwargs:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
>       with Popen(*popenargs, **kwargs) as process:
E       TypeError: __init__() got an unexpected keyword argument 'capture_output'

/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/subprocess.py:423: TypeError
=========================== short test summary info ============================
FAILED test/test_02_ipmi.py::IpmiTestCase::test_get_fan_mode - TypeError: __i...
FAILED test/test_05_hdzone.py::HdZoneTestCase::test_init - TypeError: __init_...
FAILED test/test_06_main.py::MainTestCase::test_main - TypeError: __init__() ...
========================= 3 failed, 20 passed in 1.72s =========================
Error: Process completed with exit code 1.
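
The TypeError in the log above arises because the capture_output and text arguments of subprocess.run() were only added in Python 3.7. A call that behaves the same way on Python 3.6 can be written with the older arguments, roughly as sketched here (run_captured is a hypothetical helper name, not the fix actually applied in smfc):

```python
import subprocess
from typing import List


def run_captured(args: List[str]) -> subprocess.CompletedProcess:
    """Run a command and capture stdout/stderr as text on Python 3.6 and newer."""
    return subprocess.run(
        args,
        check=False,
        stdout=subprocess.PIPE,      # replaces capture_output=True
        stderr=subprocess.PIPE,
        universal_newlines=True,     # replaces text=True
    )
```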

Add note for higher RPM variance tolerance for redux line fans

I've moved my X11SSL-F board to another case and equipped it with lower-quality NF-P12 redux-1700 PWM fans (I previously used NF-A12x25). On some boots the IPMI complained about too-low and too-high RPM errors.

I had configured the fans to their correct RPM values (with an additional +10% on the upper and -10% on the lower bound, to account for the tolerance defined in Noctua's own specs) using a locally modified set_ipmi_threshold.sh (adapted to the specs of the fan on each FAN header, verified by visual inspection).

I have now altered the IPMI upper and lower fan values to allow a higher 25% variance tolerance, and so far no error has appeared in the SEL. This leads me to believe that the redux line of Noctua fans needs a higher variance tolerance.
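
The arithmetic behind such tolerance bands is simple enough to sketch. The function name rpm_thresholds is hypothetical, and the RPM figures used when calling it should come from the fan's data sheet; none are assumed here.

```python
from typing import Tuple


def rpm_thresholds(min_rpm: int, max_rpm: int, tolerance_pct: int) -> Tuple[int, int]:
    """Widen a fan's rated RPM range by a percentage on both ends."""
    if not 0 <= tolerance_pct < 100:
        raise ValueError("tolerance_pct must be between 0 and 99")
    # Integer arithmetic avoids floating-point rounding surprises.
    lower = min_rpm * (100 - tolerance_pct) // 100
    upper = max_rpm * (100 + tolerance_pct) // 100
    return lower, upper
```

For a fan rated at, say, 400 to 1700 rpm (illustrative numbers only), a 10% tolerance gives (360, 1870) and a 25% tolerance gives (300, 2125).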

H11SSL-i fan problem on proxmox

@Xyz00777 reported a problem in the SMFC hardware compatibility issue (#19):

I'm trying to get it working on my H11SSL-i with an ASPEED AST2500 on a Proxmox install.
Because I'm not sure which fans are connected to which PWM header, I tried setting the lower limit to 500 and the upper limit to 2000 for every fan in the config:

# This script must be executed by root.
if [ "$EUID" -ne 0 ]
then
    echo "ERROR: Please run as root"
    exit -1
fi

# Setup of the lower threshold limits of the fans (Noctua NF-F12 PWM rotation speed 300-1500 rpm).
# Edit the list of fans here (FAN1, FAN2, FAN4, FANA, FANB)!
for i in 1 2 3 5 A B;
do
    # Edit the lower threshold values here (0, 100, 200)!
    ipmitool sensor thresh FAN${i} lower 500 500 500 500 500 500
done

# Setup of the upper threshold limits of the fans (Noctua NF-F12 PWM rotation speed 300-1500 rpm).
# Edit the list of fans here (FAN1, FAN2, FAN4, FANA, FANB)!
for i in 1 2 3 5 A B;
do
    # Edit the upper threshold values here (1600, 1700, 1800)!
    ipmitool sensor thresh FAN${i} upper 2000 2000 2000 2000 2000 2000
done

I have an Iceberg Thermal IceGALE Xtra (500-2500 rpm) and a Noctua NH-U9 TR4-SP3 (400-2000 rpm).

After I loaded the modules and ran install.sh, I started the service and got this journalctl log; the service crashed and left the fans at 100% speed:

May 31 03:07:18 ds9 systemd[1]: Started smfc.service - Super Micro Fan Control.
May 31 03:07:18 ds9 smfc.service[11931]: Logging module was initialized with:
May 31 03:07:18 ds9 smfc.service[11931]:    log_level = 3
May 31 03:07:18 ds9 smfc.service[11931]:    log_output = 2
May 31 03:07:18 ds9 smfc.service[11931]: Command line arguments:
May 31 03:07:18 ds9 smfc.service[11931]:    original arguments: /opt/smfc/smfc.py -c /opt/smfc/smfc.conf -l 3
May 31 03:07:18 ds9 smfc.service[11931]:    parsed config file = /opt/smfc/smfc.conf
May 31 03:07:18 ds9 smfc.service[11931]:    parsed log level = 3
May 31 03:07:18 ds9 smfc.service[11931]:    parsed log output = 2
May 31 03:07:19 ds9 smfc.service[11931]: Ipmi module was initialized with:
May 31 03:07:19 ds9 smfc.service[11931]:    command = /usr/bin/ipmitool
May 31 03:07:19 ds9 smfc.service[11931]:    fan_mode_delay = 10
May 31 03:07:19 ds9 smfc.service[11931]:    fan_level_delay = 2
May 31 03:07:19 ds9 smfc.service[11931]:    swapped_zones = False
May 31 03:07:29 ds9 smfc.py[11931]: Traceback (most recent call last):
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 1150, in <module>
May 31 03:07:29 ds9 smfc.py[11931]:     service.run()
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 1119, in run
May 31 03:07:29 ds9 smfc.py[11931]:     self.cpu_zone = CpuZone(self.log, self.ipmi, self.config)
May 31 03:07:29 ds9 smfc.py[11931]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 600, in __init__
May 31 03:07:29 ds9 smfc.py[11931]:     super().__init__(
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 395, in __init__
May 31 03:07:29 ds9 smfc.py[11931]:     self.build_hwmon_path(hwmon_path)
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 632, in build_hwmon_path
May 31 03:07:29 ds9 smfc.py[11931]:     raise ValueError(self.ERROR_MSG_FILE_IO.format(path))
May 31 03:07:29 ds9 smfc.py[11931]: ValueError: Cannot read file (/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input).
May 31 03:07:33 ds9 smfc.service[11931]: smfc terminated: all fans are switched back to the 100% speed.
May 31 03:07:33 ds9 systemd[1]: smfc.service: Main process exited, code=exited, status=1/FAILURE
May 31 03:07:33 ds9 systemd[1]: smfc.service: Failed with result 'exit-code'.

Please help, I don't want my fans to spin up every ~10 sec for 5 sec :(
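
The traceback above shows smfc failing to read a coretemp path, which is the Intel CPU temperature driver. On AMD platforms the kernel usually exposes CPU temperatures through the k10temp driver instead; whether that is the cause on this board is an assumption. A quick way to check which drivers are present is to list the hwmon names, as in this sketch (find_hwmon_by_driver is a hypothetical helper):

```python
import glob
import os
from typing import Dict, List


def find_hwmon_by_driver(hwmon_root: str = "/sys/class/hwmon") -> Dict[str, List[str]]:
    """Map each hwmon driver name (e.g. 'coretemp', 'k10temp') to its directories."""
    found: Dict[str, List[str]] = {}
    for name_file in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*", "name"))):
        with open(name_file, "r", encoding="ascii") as handle:
            driver = handle.read().strip()
        found.setdefault(driver, []).append(os.path.dirname(name_file))
    return found
```

If the result contains no "coretemp" key, the default CPU path cannot work, and hwmon_path in the CPU zone would need to point at a file that actually exists.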

Feature Request: interactive setup CLI

To address #8 and #12, and to streamline setting up zones and configuring hd_names, an interactive setup CLI would help.

This setup CLI would then prompt a few questions:

$ smfc-config
f) assign fan to zone
u) set ULNA state of fan
c) create zone
z) configure zone
> c
Name of zone:
> CPU
Zone 'CPU' created.
f) assign fan to zone
u) set ULNA state of fan
c) create zone
z) configure zone
> c
Name of zone:
> Case
Zone 'Case' created.
f) assign fan to zone
u) set ULNA state of fan
c) create zone
z) configure zone
> f
Which fan?
Fans available: [FAN1, FAN2]
> FAN1
Which zone to assign to this fan?
Zones available: [CPU, Case]
> CPU
Fan 'FAN1' assigned to zone 'CPU'
f) assign fan to zone
u) set ULNA state of fan
c) create zone
z) configure zone
> f
Which fan?
Fans available: [FAN1, FAN2]
> FAN2
Which zone to assign to this fan?
Zones available: [CPU, Case]
> Case
Fan 'FAN2' assigned to zone 'Case'
f) assign fan to zone
u) set ULNA state of fan
c) create zone
z) configure zone
> u
Ultra Low Noise Adapter state
Which fan to configure?
Fans available: [FAN1, FAN2]
> FAN1
Fan 'FAN1' selected.
Currently ULNA state is disabled.
Enable this if you are using an ULNA adapter.
Toggle state to enabled? Y/n
> y
ULNA state enabled for fan 'FAN1'.
f) assign fan to zone
u) set ULNA state of fan
c) create zone
z) configure zone
> u
Ultra Low Noise Adapter state
Which fan to configure?
Fans available: [FAN1, FAN2]
> FAN2
Fan 'FAN2' selected.
Currently ULNA state is disabled.
Enable this if you are using an ULNA adapter.
Toggle state to enabled? Y/n
> y
ULNA state enabled for fan 'FAN2'.

SMFC hardware compatibility

This issue collects compatibility feedback.
Please leave a comment here with the name of your Super Micro motherboard if you either:

  1. successfully ran SMFC

or

  2. tried and failed because of a compatibility issue

Thanks for your feedback.

Distro packaging for SMFC

I'm in the process of creating a .spec file here, to be used in conjunction with tito for RPM release management.

This will allow publishing to Copr for easy package testing (currently testing here), and eventual upstreaming to EPEL for wider package availability to the EL community at large.

Once the .spec file conforms to all Fedora and EL packaging rules, I'll open a PR here to merge it in.

However, one hurdle I ran into while testing SMFC on EL is that the default kernel shipped with RHEL 9, Rocky Linux 9 and CentOS Stream 9 does not have the drivetemp kernel module built in. This can be worked around successfully with hddtemp, but SMFC should not be broken right after package installation.

Furthermore, having to edit the hd_names variable before SMFC is functional is tedious and makes it harder to reach a working state out of the box.

It would help the out-of-the-box experience for EL users to have SMFC automatically discover HDDs when e.g. hd_names is set to auto and to have SMFC fall-back/default to hddtemp for temperature.
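
The automatic discovery proposed above could be prototyped along these lines. This is a sketch only: discover_disks is a hypothetical function, the "auto" value for hd_names does not exist in smfc as far as this page shows, and restricting the search to the ata- prefix is an assumption that would miss SAS or NVMe devices.

```python
import glob
import os
import re
from typing import List

# Partition links end in -part<N>, e.g. ata-MODEL_SERIAL-part1
PARTITION_SUFFIX = re.compile(r"-part\d+$")


def discover_disks(by_id_dir: str = "/dev/disk/by-id", prefix: str = "ata-") -> List[str]:
    """List whole-disk entries under /dev/disk/by-id, skipping partition links."""
    disks = []
    for path in sorted(glob.glob(os.path.join(by_id_dir, prefix + "*"))):
        if PARTITION_SUFFIX.search(os.path.basename(path)):
            continue
        disks.append(path)
    return disks
```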

Use HDD Highest Temp

Hi there,

A mate and I have been playing with this script, and it works well, thanks for your effort!

We were wondering though, if it would be possible to change fan speed based on the drive with the highest temp instead of the average temp?

Often there are one or two drives that run hotter than the rest, but they get lost in the 'average' and end up staying above the threshold.

Thanks
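
To illustrate the difference between the two approaches, here is a small sketch with a selectable calculation method. The 0/1/2 numbering mirrors the temp_calc comment that appears in the configuration file quoted elsewhere on this page; the function name and the sample temperatures are made up for the example.

```python
from statistics import mean
from typing import Sequence

CALC_MIN, CALC_AVG, CALC_MAX = 0, 1, 2


def zone_temperature(temps: Sequence[float], method: int) -> float:
    """Reduce the per-drive temperatures to the single value driving the fans."""
    if not temps:
        raise ValueError("at least one temperature is required")
    if method == CALC_MIN:
        return float(min(temps))
    if method == CALC_AVG:
        return float(mean(temps))
    if method == CALC_MAX:
        return float(max(temps))
    raise ValueError("unknown calculation method: {}".format(method))
```

With drives at 34, 35, 35 and 47 degrees, the average is 37.75 while the maximum is 47.0, which shows how a single hot drive disappears in the average.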

Feature Request: autoconfiguration of IPMI fan upper and lower bounds

Currently, setting up the correct lower and upper values can be somewhat error-prone, as #28 demonstrated. It would help if SMFC configured the IPMI fan upper and lower bounds for us instead.

How I imagine this could work is the following:

  1. Having some way of tying the corresponding Noctua fan model to a specific FAN header (FAN1, FAN2, FANA etc.) via the configuration file.
  2. The released version of SMFC already contains a definition file where the upper and lower bounds for specific Noctua models are defined within.
  3. This definition file is then used to look up the upper and lower bounds for the given Noctua model found in the configuration file.
  4. Lookup failures will result in SMFC refusing to start and logging the error.
  5. If the upper and lower bounds found within the definition file don't match with the IPMI data, SMFC will correct this and print a warning/info message that a cold reset of the BMC should be performed.

The definition file could be generated by crawling Noctua websites using e.g. htmlq

The configuration section syntax could look something like the following:

[Fan models]
FAN1 = "Noctua NF-A12x15 PWM"
FAN3 = "Noctua NF-P12 redux-1700 PWM"
FANA = "Noctua NF-P12 redux-1700 PWM"
FAN4 = "Noctua NF-R8 redux-1800 PWM"

Just a quick draft. Feel free to share a better syntax and/or implementation idea.
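
The lookup described in the steps above might be prototyped as follows. Everything here is illustrative: load_fan_bounds is a hypothetical function, the [Fan models] section follows the draft syntax above, and the definition table has to be supplied by the caller because no real fan specifications are assumed.

```python
import configparser
from typing import Dict, Tuple

Bounds = Tuple[int, int]  # (lower rpm, upper rpm)


def load_fan_bounds(config_text: str, definitions: Dict[str, Bounds]) -> Dict[str, Bounds]:
    """Resolve each FAN header in [Fan models] to its RPM bounds."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep header names such as FAN1 or FANA in upper case
    parser.read_string(config_text)
    bounds = {}
    for header, model in parser["Fan models"].items():
        model = model.strip().strip('"')  # the draft syntax quotes the model names
        if model not in definitions:
            # Step 4 of the proposal: refuse to continue on a lookup failure.
            raise KeyError("no definition for fan model {!r} on {}".format(model, header))
        bounds[header] = definitions[model]
    return bounds
```

Comparing the returned bounds with the values currently stored in the BMC (step 5) would be a separate step on top of this.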

Error starting daemon

This is on latest Debian 11 Bullseye. Any idea?

I get the following error:

# /opt/smfc/smfc.py -c /opt/smfc/smfc.conf -l 3 -o 1
DEBUG: Logging module was initialized with:
DEBUG:    log_level = 3
DEBUG:    log_output = 1
DEBUG: Command line arguments:
DEBUG:    original arguments: /opt/smfc/smfc.py -c /opt/smfc/smfc.conf -l 3 -o 1
DEBUG:    parsed config file = /opt/smfc/smfc.conf
DEBUG:    parsed log level = 3
DEBUG:    parsed log output = 1
DEBUG: Configuration file (/opt/smfc/smfc.conf) loaded
DEBUG: Ipmi module was initialized with :
DEBUG:    command = /usr/bin/ipmitool
DEBUG:    fan_mode_delay = 10
DEBUG:    fan_level_delay = 2
ERROR: invalid literal for int() with base 10: ''.

smfc.conf:

# vim:isfname-==
#
#   smfc.conf
#   smfc service configuration parameters
#


[Ipmi]
# Path for ipmitool (str, default=/usr/bin/ipmitool)
command=/usr/bin/ipmitool
# Delay time after changing IPMI fan mode (int, seconds, default=10)
fan_mode_delay=10
# Delay time after changing IPMI fan level (int, seconds, default=2)
fan_level_delay=2

[CPU zone]
# Fan controller enabled (bool, default=0)
enabled=1
# Number of CPUs (int, default=1)
count=1
# Calculation method for CPU temperatures (int, [0-minimum, 1-average, 2-maximum], default=1)
temp_calc=1
# Discrete steps in mapping of temperatures to fan level (int, default=6)
steps=6
# Threshold in temperature change before the fan controller reacts (float, C, default=3.0)
sensitivity=3.0
# Polling time interval for reading temperature (int, sec, default=2)
polling=2
# Minimum CPU temperature (float, C, default=30.0)
min_temp=30.0
# Maximum CPU temperature (float, C, default=60.0)
max_temp=65.0
# Minimum CPU fan level (int, %, default=35)
min_level=10
# Maximum CPU fan level (int, %, default=100)
max_level=100
# Optional parameter, it will be generated automatically (can be used for testing and in special cases).
# Path for CPU sys/hwmon/coretemp file(s) (str multi-line list, default=/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input)
# hwmon_path=/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input
#            /sys/devices/platform/coretemp.1/hwmon/hwmon*/temp1_input


[HD zone]
# Fan controller enabled (bool, default=0)
enabled=1
# Number of HDs (int, default=1)
count=20
# Calculation of HD temperatures (int, [0-minimum, 1-average, 2-maximum], default=1)
temp_calc=1
# Discrete steps in mapping of temperatures to fan level (int, default=4)
steps=4
# Threshold in temperature change before the fan controller reacts (float, C, default=2.0)
sensitivity=2.0
# Polling interval for reading temperature (int, sec, default=10)
polling=10
# Minimum HD temperature (float, C, default=32.0)
min_temp=10.0
# Maximum HD temperature (float, C, default=46.0)
max_temp=46.0
# Minimum HD fan level (int, %, default=35)
min_level=10
# Maximum HD fan level (int, %, default=100)
max_level=100
# Names of the HDs (str multi-line list, default=)
# These names MUST BE specified in '/dev/disk/by-id/...'' form!
# See /dev/disk/by-id for a list
hd_names=/dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA15MZ6H
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA18AGTZ
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA18YT0D
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA18ZHAD
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA19LB36
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA19VNKB
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA19XF88
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA19Z568
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA19Z6RH
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA1E0VC6
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA1E116P
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA1E3V2G
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA1FB9FQ
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA1FECM9
	 /dev/disk/by-id/ta-ST8000VN0022-2EL112_ZA1FH7AX
	 /dev/disk/by-id/ta-ST8000VN004-2M2101_WKD04YD5
	 /dev/disk/by-id/ta-ST8000VN004-2M2101_WKD19NFM
	 /dev/disk/by-id/ta-ST8000VN004-2M2101_WKD1F7WW
	 /dev/disk/by-id/ta-Samsung_SSD_860_EVO_1TB_S3Z9NB0K731782M
	 /dev/disk/by-id/ta-Samsung_SSD_860_EVO_1TB_S3Z9NB0K731786Y
# Optional parameter, it will be generated automatically (can be used for testing and in special cases).
# Path for HD sys/hwmon/drivetemp file(s) (str multi-line list, default=/sys/class/scsi_disk/0:0:0:0/device/hwmon/hwmon*/temp1_input)
# hwmon_path=/sys/class/scsi_disk/0:0:0:0/device/hwmon/hwmon*/temp1_input
#            /sys/class/scsi_disk/1:0:0:0/device/hwmon/hwmon*/temp1_input
# Standby guard feature for RAID arrays (bool, default=0)
standby_guard_enabled=0
# Number of HDs already in STANDBY state before the full RAID array will be forced to it (int, default=1)
standby_hd_limit=1
# Path for 'smartctl' command (str, default=/usr/sbin/smartctl)
smartctl_path=/usr/sbin/smartctl

Enumeration inconsistency in smfc configuration

The issue: the smfc v1.2 configuration defines the list of hard disks and the corresponding list of temperature files in the sys/hwmon system in the following way:

hd_names=/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
hwmon_path=/sys/class/scsi_device/0:0:0:0/device/hwmon/hwmon*/temp1_input
           /sys/class/scsi_device/1:0:0:0/device/hwmon/hwmon*/temp1_input
           /sys/class/scsi_device/2:0:0:0/device/hwmon/hwmon*/temp1_input
           /sys/class/scsi_device/3:0:0:0/device/hwmon/hwmon*/temp1_input
           /sys/class/scsi_device/4:0:0:0/device/hwmon/hwmon*/temp1_input
           /sys/class/scsi_device/5:0:0:0/device/hwmon/hwmon*/temp1_input
           /sys/class/scsi_device/6:0:0:0/device/hwmon/hwmon*/temp1_input
           /sys/class/scsi_device/7:0:0:0/device/hwmon/hwmon*/temp1_input

Unfortunately, both lists are created and enumerated independently by the Linux kernel at boot time, so an identical order and proper pairing are not guaranteed (and the order could change after a reboot). This problem has different consequences in different hardware configurations. Here are two examples:

  1. There are 8 HDs used in a RAID array and the OS is booted from an NVMe SSD. In this case smfc works properly (i.e. reading the HD temperatures and the Standby Guard feature both work) despite the inconsistency of the lists above.
  2. There are 5 HDs installed; four of them are organized in a RAID array and the OS is booted from the fifth HD. In this case the inconsistency of the lists causes real problems: temperature reading and the Standby Guard feature can mix up HDs inside and outside the RAID array.
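
One way to avoid the pairing problem is to derive each temperature file from the disk's stable `/dev/disk/by-id/...` name rather than keeping two separately ordered lists. The sketch below is an illustration only, not smfc's implementation: the helper name `hwmon_path_for_disk` is hypothetical, and it assumes the drivetemp hwmon directory is reachable under `/sys/block/<name>/device/hwmon/`.

```python
import glob
import os
from typing import Optional


def hwmon_path_for_disk(by_id_path: str, sys_block: str = "/sys/block") -> Optional[str]:
    """Return the temp1_input file belonging to one disk, or None if absent.

    by_id_path is a stable name such as /dev/disk/by-id/...; sys_block is
    a parameter only so the function can be tried against a fake tree.
    """
    # Follow the symlink to the kernel name assigned at this boot (e.g. "sda").
    kernel_name = os.path.basename(os.path.realpath(by_id_path))
    pattern = os.path.join(
        sys_block, kernel_name, "device", "hwmon", "hwmon*", "temp1_input"
    )
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None
```

Because the lookup starts from the by-id name on every run, the disk and its temperature file stay paired even if the kernel enumerates the devices in a different order after a reboot.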

Supermicro boards incompatible with ULNA

Previously I mentioned that fans with a ULNA (Ultra Low Noise Adapter) require a higher minimum level to not upset the IPMI.

After a few months of running in this configuration, I've had two fans (one NF-A12x15 and one NF-A12x25) die on me.

This leads me to the conclusion that, with the ULNA cables, the Supermicro board is sending either too high or too low a voltage to the fans.

A cautionary warning of incompatibility with ULNA cables should probably be added to the README, as well as a comment inside the configuration files.

Allow swapping of CPU and HDD zones

Would it be possible to add a config setting to swap the CPU and HDD/IO zones?

My current setup is to have the CPU fan on FAN A, and then case fans hooked up to FAN 1-4 to keep the hard drives cool. This is a swap of what is suggested by Supermicro, where FAN A+B are for IO, and FAN 1-4 are for the CPU, but in reading forums I think it is a common setup for people who are doing manual fan control using userspace scripts such as smfc.
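
A swap like this could be handled at the point where the fan level command is built. The sketch below is a hypothetical illustration, not smfc's code: it reuses the `ipmitool raw 0x30 0x70 0x66 0x01 <zone> <level>` form quoted elsewhere on this page, and the numbering (CPU zone 0, HD zone 1), the flag name `swapped_zones` and the function name are assumptions.

```python
from typing import List

CPU_ZONE = 0  # assumed IPMI zone number for the CPU fans
HD_ZONE = 1   # assumed IPMI zone number for the HD/IO fans


def fan_level_command(zone: int, level: int, swapped_zones: bool = False,
                      ipmitool: str = "/usr/bin/ipmitool") -> List[str]:
    """Build the argument list that sets a fan level (in percent) for a zone."""
    if zone not in (CPU_ZONE, HD_ZONE):
        raise ValueError(f"invalid zone: {zone}")
    if not 0 <= level <= 100:
        raise ValueError(f"invalid level: {level}")
    if swapped_zones:
        # Exchange the two zones: 0 becomes 1 and 1 becomes 0.
        zone = 1 - zone
    return [ipmitool, "raw", "0x30", "0x70", "0x66", "0x01", str(zone), str(level)]
```

The returned list could then be handed to `subprocess.run`. Keeping the swap in one place means the rest of the controller logic can keep thinking in terms of "CPU zone" and "HD zone".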

Feature Request: PCI/GPU/TPU/DPU zone

Currently there are two zones: one for processors and one for storage.

However, some servers (and retrofitted servers to desktop workstations) have PCIe devices with temperature sensors on them (e.g. AMD GPUs).

Creating another zone for special PCIe devices would help give these devices extra airflow.

It might also be useful to have an option that boosts all zones by a certain percentage above a certain threshold, given that some high-end PCIe devices often use more power than the rest of the system combined.

Cannot read hwmon*/temp1_input file

Hi,

I'm trying to get the smfc tool working on my homelab (X10DRi) but got stuck with the hwmon paths.
It seems that the CPU zone is initialized correctly but the HD zone can't find the temp1_input files anywhere.

I did some digging and indeed, all my /sys/class/scsi_disk/* folders don't have a hwmon directory in them.
Any idea what I'm doing wrong here?

DEBUG output

root@mars:/opt/smfc# ./smfc.py -o 0 -l 4
CONFIG: Logging module was initialized with:
CONFIG:    log_level = 4
CONFIG:    log_output = 0
CONFIG: Command line arguments:
CONFIG:    original arguments: ./smfc.py -o 0 -l 4
CONFIG:    parsed config file = smfc.conf
CONFIG:    parsed log level = 4
CONFIG:    parsed log output = 0
DEBUG: Configuration file (smfc.conf) loaded
CONFIG: Ipmi module was initialized with:
CONFIG:    command = /usr/bin/ipmitool
CONFIG:    fan_mode_delay = 10
CONFIG:    fan_level_delay = 2
CONFIG:    swapped_zones = True
DEBUG: Old IPMI fan mode = FULL_MODE
DEBUG: CPU zone fan controller enabled
CONFIG: CPU zone fan controller was initialized with:
CONFIG:    ipmi zone = 0
CONFIG:    count = 2
CONFIG:    temp_calc = 1
CONFIG:    steps = 6
CONFIG:    sensitivity = 3.0
CONFIG:    polling = 2.0
CONFIG:    min_temp = 30.0
CONFIG:    max_temp = 60.0
CONFIG:    min_level = 35
CONFIG:    max_level = 100
CONFIG:    hwmon_path = ['/sys/devices/platform/coretemp.0/hwmon/hwmon4/temp1_input', '/sys/devices/platform/coretemp.1/hwmon/hwmon5/temp1_input']
CONFIG:    Temperature to level mapping:
CONFIG:    0. [T:30.0C - L:35%]
CONFIG:    1. [T:35.0C - L:45%]
CONFIG:    2. [T:40.0C - L:56%]
CONFIG:    3. [T:45.0C - L:67%]
CONFIG:    4. [T:50.0C - L:78%]
CONFIG:    5. [T:55.0C - L:89%]
CONFIG:    6. [T:60.0C - L:100%]
DEBUG: HD zone fan controller enabled
Traceback (most recent call last):
  File "/opt/smfc/./smfc.py", line 975, in <module>
    main()
  File "/opt/smfc/./smfc.py", line 951, in main
    my_hd_zone = HdZone(my_log, my_ipmi, my_config)
  File "/opt/smfc/./smfc.py", line 703, in __init__
    super().__init__(
  File "/opt/smfc/./smfc.py", line 390, in __init__
    self.build_hwmon_path(hwmon_path)
  File "/opt/smfc/./smfc.py", line 786, in build_hwmon_path
    raise ValueError(self.ERROR_MSG_FILE_IO.format(path))
ValueError: Cannot read file (/sys/class/scsi_disk/0:0:13:0/device/hwmon/hwmon*/temp1_input).
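
As an aside, the "Temperature to level mapping" table in the log above can be reproduced with a few lines of arithmetic. This is a sketch inferred from the logged numbers only: the function name `build_mapping` is hypothetical, and the truncation of fractional fan levels is an assumption that happens to match the values shown, not something confirmed from the smfc source.

```python
from typing import List, Tuple


def build_mapping(steps: int, min_temp: float, max_temp: float,
                  min_level: int, max_level: int) -> List[Tuple[float, int]]:
    """Return (temperature, fan level) pairs for steps + 1 evenly spaced points."""
    mapping = []
    for i in range(steps + 1):
        temp = min_temp + i * (max_temp - min_temp) / steps
        # Fractional levels are truncated (assumption based on the log output).
        level = int(min_level + i * (max_level - min_level) / steps)
        mapping.append((temp, level))
    return mapping


# Values taken from the CPU zone section of the log above.
for index, (temp, level) in enumerate(build_mapping(6, 30.0, 60.0, 35, 100)):
    print(f"{index}. [T:{temp:.1f}C - L:{level}%]")
```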

Config file

[Ipmi]
# Path for ipmitool (str, default=/usr/bin/ipmitool)
command=/usr/bin/ipmitool
# Delay time after changing IPMI fan mode (int, seconds, default=10)
fan_mode_delay=10
# Delay time after changing IPMI fan level (int, seconds, default=2)
fan_level_delay=2
# CPU and HD zones are swapped (bool, default=0).
swapped_zones=1

[CPU zone]
# Fan controller enabled (bool, default=0)
enabled=1
# Number of CPUs (int, default=1)
count=2
# Calculation method for CPU temperatures (int, [0-minimum, 1-average, 2-maximum], default=1)
temp_calc=1
# Discrete steps in mapping of temperatures to fan level (int, default=6)
steps=6
# Threshold in temperature change before the fan controller reacts (float, C, default=3.0)
sensitivity=3.0
# Polling time interval for reading temperature (int, sec, default=2)
polling=2
# Minimum CPU temperature (float, C, default=30.0)
min_temp=30.0
# Maximum CPU temperature (float, C, default=60.0)
max_temp=60.0
# Minimum CPU fan level (int, %, default=35)
min_level=35
# Maximum CPU fan level (int, %, default=100)
max_level=100
# Optional parameter, it will be generated automatically (can be used for testing and in special cases).
# Path for CPU sys/hwmon/coretemp file(s) (str multi-line list, default=/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input)
# hwmon_path=/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input
#            /sys/devices/platform/coretemp.1/hwmon/hwmon*/temp1_input


[HD zone]
# Fan controller enabled (bool, default=0)
enabled=1
# Number of HDs (int, default=1)
count=23
# Calculation of HD temperatures (int, [0-minimum, 1-average, 2-maximum], default=1)
temp_calc=1
# Discrete steps in mapping of temperatures to fan level (int, default=4)
steps=4
# Threshold in temperature change before the fan controller reacts (float, C, default=2.0)
sensitivity=2.0
# Polling interval for reading temperature (int, sec, default=10)
polling=10
# Minimum HD temperature (float, C, default=32.0)
min_temp=32.0
# Maximum HD temperature (float, C, default=46.0)
max_temp=46.0
# Minimum HD fan level (int, %, default=35)
min_level=35
# Maximum HD fan level (int, %, default=100)
max_level=100
# Names of the HDs (str multi-line list, default=)
# These names MUST BE specified in '/dev/disk/by-id/...' form!
hd_names=/dev/disk/by-id/scsi-SATA_Samsung_SSD_870_S6PUNX0T715310D
         /dev/disk/by-id/scsi-SATA_SATA_SSD_67F407531F2400139578
         /dev/disk/by-id/scsi-SATA_SATA_SSD_96D70754012400149905
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ0XPDH0000C915756F
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1CRWT0000C9206GHS
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1D3FM0000C9206HJW
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1F6HL0000C920JKGJ
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1F6MV0000C920N6EE
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1KB860000C850L5S0
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1LZG30000C9247N7F
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ245PF0000C843F5T0
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ24AAS0000C920N7L6
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2DCZ60000C925CLFV
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2E8CC0000C922FXGV
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2G2330000C9201HJX
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2JDXL0000C93432Y5
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2JSFW0000G84101EM
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2KASF0000C9355QHB
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2KZVZ0000C9342YAG
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2LPKD0000C9362E9U
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2M09R0000C9362FJX
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ443GZ0000C006EFT5
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ4CGJ50000C008J0BA
# Optional parameter, it will be generated automatically (can be used for testing and in special cases).
# Path for HD sys/hwmon/drivetemp file(s) (str multi-line list, default=/sys/class/scsi_disk/0:0:0:0/device/hwmon/hwmon*/temp1_input)
# hwmon_path=/sys/class/scsi_disk/0:0:0:0/device/hwmon/hwmon*/temp1_input
#            /sys/class/scsi_disk/1:0:0:0/device/hwmon/hwmon*/temp1_input

/sys/class/scsi_disk content

root@mars:/sys/class/scsi_disk# ls
0:0:0:0  0:0:10:0  0:0:12:0  0:0:14:0  0:0:16:0  0:0:18:0  0:0:2:0  0:0:4:0  0:0:6:0  0:0:8:0  10:0:0:0  9:0:0:0
0:0:1:0  0:0:11:0  0:0:13:0  0:0:15:0  0:0:17:0  0:0:19:0  0:0:3:0  0:0:5:0  0:0:7:0  0:0:9:0  5:0:0:0
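
To see quickly which of these entries expose a temperature file at all, a small diagnostic like the one below may help. It is a sketch, not part of smfc: the function name `hwmon_by_scsi_disk` is hypothetical, and it only checks for the same `device/hwmon/hwmon*/temp1_input` pattern that appears in the error message. An empty list for an entry means no hwmon driver (such as drivetemp) has attached to that disk; the reason for that would still need to be found separately.

```python
import glob
import os
from typing import Dict, List


def hwmon_by_scsi_disk(scsi_disk_root: str = "/sys/class/scsi_disk") -> Dict[str, List[str]]:
    """Map each SCSI address (e.g. '0:0:13:0') to the temp1_input files found."""
    result: Dict[str, List[str]] = {}
    for entry in sorted(os.listdir(scsi_disk_root)):
        pattern = os.path.join(
            scsi_disk_root, entry, "device", "hwmon", "hwmon*", "temp1_input"
        )
        result[entry] = sorted(glob.glob(pattern))
    return result


if __name__ == "__main__":
    root = "/sys/class/scsi_disk"
    if os.path.isdir(root):
        for address, files in hwmon_by_scsi_disk(root).items():
            print(address, files if files else "no hwmon temperature file")
```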
