Comments (5)

hnrose commented on June 17, 2024

I thought that the current OpenSM spec file supports the old SysV daemon management framework (RHEL 6.x).

Which distributions are of interest?

from opensm.

bdrung commented on June 17, 2024

All recent distributions (Debian, Ubuntu, Fedora, etc.) would benefit from a systemd service file. To quote lintian's reasoning:

The specified init.d script has no equivalent systemd service.

Whilst systemd has a SysV init.d script compatibility mode, providing native systemd support has many advantages such as being able to specify security hardening features.

jamespharvey20 commented on June 17, 2024

I'm the maintainer for Arch Linux's AUR opensm (and other InfiniBand) packages. (To be clear, AUR packages are maintained by any user who adopts the packages - InfiniBand packages are not part of Arch's official repositories.)

When I ran into the lack of a systemd service file, I copied (with credit) the opensm.service file that Fedora ships. I would really like to see it, or a version of it, included here as well. Fedora also notes a timing bug that intermittently causes a signal 15 failure on start, and works around it with a separate launch script. I have no idea whether this intermittent timing bug still exists.

Fedora also uses this separate script to allow multiple instances of opensm to run, on multiple ports.

opensm.service

[Unit]
Description=Starts the OpenSM InfiniBand fabric Subnet Manager
Documentation=man:opensm
DefaultDependencies=false
Before=network.target remote-fs-pre.target
Requires=rdma.service
After=rdma.service

[Service]
Type=forking
ExecStart=/usr/libexec/opensm-launch

[Install]
WantedBy=network.target

opensm.launch

#!/bin/bash
#
# Launch the necessary OpenSM daemons for systemd
#
# sysconfig: /etc/sysconfig/opensm
# config: /etc/rdma/opensm.conf
#

shopt -s nullglob

prog=/usr/sbin/opensm
[ -f /etc/sysconfig/opensm ] && . /etc/sysconfig/opensm

[ -n "$PRIORITY" ] && prio="-p $PRIORITY"

if [ -z "$GUIDS" ]; then
   CONFIGS=""
   CONFIG_CNT=0
   for conf in /etc/rdma/opensm.conf.[0-9]*; do
      CONFIGS="$CONFIGS $conf"
      let CONFIG_CNT++
   done
else
   GUID_CNT=0
   for guid in $GUIDS; do
      let GUID_CNT++
   done
fi
# Start opensm
if [ -n "$GUIDS" ]; then
   SUBNET_COUNT=0
   for guid in $GUIDS; do
      SUBNET_PREFIX=`printf "0xfe800000000000%02d" $SUBNET_COUNT`
      (while true; do $prog $prio -g $guid --subnet_prefix $SUBNET_PREFIX; sleep 30; done) &
      let SUBNET_COUNT++
   done
elif [ -n "$CONFIGS" ]; then
   for config in $CONFIGS; do
      (while true; do $prog $prio -F $config; sleep 30; done) &
   done
else
   (while true; do $prog $prio; sleep 30; done) &
fi
exit 0

I just tried running opensm on multiple interfaces for the first time myself, and found that their method of giving each opensm instance a unique --subnet_prefix is broken, because opensm no longer accepts this option. Running two instances of opensm -g <different GUIDS> appears to work, but I assume that at some point in the past opensm might have complained if multiple instances were running on the same subnet prefix.
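
As a rough sketch of what the GUIDS branch could look like without --subnet_prefix, the function below only prints the command line for each GUID instead of starting anything. The helper name build_opensm_cmds is hypothetical and not part of Fedora's script; it assumes only the -p and -g options already used above.

```shell
#!/bin/bash
# Dry-run sketch: print one opensm command line per GUID.
# build_opensm_cmds is a hypothetical helper, not part of Fedora's script.
build_opensm_cmds() {
   local prog=$1 priority=$2 guids=$3
   local guid
   # $guids is intentionally unquoted so it splits on whitespace,
   # the same way the GUIDS variable is consumed in opensm.launch.
   for guid in $guids; do
      if [ -n "$priority" ]; then
         echo "$prog -p $priority -g $guid"
      else
         echo "$prog -g $guid"
      fi
   done
}

# Example with the sample GUIDs from opensm.sysconfig and no PRIORITY set.
build_opensm_cmds /usr/sbin/opensm "" "0x0002c90300048ca1 0x0002c90300048ca2"
```

Each printed line could then be wrapped in the same (while true; do ...; sleep 30; done) & loop that the original script uses.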

If you do not want multiple interface support, opensm.launch can be simplified to:

#!/bin/bash

(while true; do /usr/bin/opensm; sleep 30; done) &
exit 0

opensm.sysconfig

# Problem #1: Multiple IB fabrics needing a subnet manager
#
# In the event that a machine has more than one IB subnet attached,
# and that machine is an opensm server, by default, opensm will
# only attach to one port and will not manage the fabric on the
# other port.  There are two ways to solve this problem:
#
# 1) Start opensm on multiple machines and configure it to manage
#    different fabrics on each machine
# 2) Configure opensm to start multiple instances on a single
#    machine
#
# Both solutions to this problem require non-standard configurations.
# In other words, you would normally have to modify /etc/rdma/opensm.conf
# and once you do that, the file will no longer be updated for new
# options when opensm is upgraded.  In an effort to allow people to
# have more than one subnet managed by opensm without having to modify
# the system default opensm.conf file, we have enabled two methods
# for modifying the default opensm config items needed to enable
# multiple fabric management.
#
# Method #1: Create multiple opensm.conf files in non-standard locations
#   Copy /etc/rdma/opensm.conf to /etc/rdma/opensm.conf.<number>
#     (do this once for each instance you want started)
#   Edit each copy of the opensm.conf file to reflect the necessary changes
#     for a multiple instance startup.  If you need to manage more than
#     one fabric, you will have to change the guid option in each file
#     to specify the guid of the specific port you want opensm attached
#     to.
#
# The advantage to method #1 is that, on the off chance you want to do
# really special custom things on different ports, like have different
# QoS settings depending on which port you are attached to, you have the
# freedom to edit any and all settings for each instance without those
# changes affecting other instances or being lost when opensm upgrades.
#
# Method #2: Specify multiple GUIDS variable entries in this file
#   Uncomment the below GUIDS variable and enter each guid you need to attach
#     to into the list.  If using this method you need to enter each
#     guid into the list as we won't attach to any default ports, only
#     those specified in the list.
#
#GUIDS="0x0002c90300048ca1 0x0002c90300048ca2"
#
# The obvious advantage to method #2 is that it's simple and doesn't
# clutter up your file system, but it is far more limited in what you
# can do.  If you enable method #2, then even if you create the files
# referenced in method #1, they will be ignored.
#
# Problem #2: Activating a backup subnet manager
#
# The default priority of opensm is set so that it wants to be the
# primary subnet manager.  This is great when you are only running
# opensm on one server, but if you want to have a non-primary opensm
# instance for failover, then you have to manually edit the opensm.conf
# file like for problem #1.  This carries with it all the problems
# listed above.  If you wish to enable opensm as a non-primary manager,
# then you can uncomment the PRIORITY variable below and set it to
# some number between 0 and 15, where 15 is the highest priority and
# the primary manager, with 0 being the lowest backup server.  This method
# will work with the GUIDS option above, and also with the multiple
# config files in method #1 above.  However, only a single priority is
# supported here.  If you wanted more than one priority (say this machine
# is the primary on the first fabric, and second on the second fabric,
# while the other opensm server is primary on the second fabric and
# second on the primary), then the only way to do that is to use method #1
# above and individually edit the config files.  If you edit the config
# files to set the priority and then also set the priority here, then
# this setting will override the config files and render that particular
# edit useless.
#
#PRIORITY=15
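
The precedence described in these comments (GUIDS wins over numbered config files, which win over the default config) can be sketched as a small function. The name select_mode and the idea of passing the config directory as an argument are assumptions made for illustration; the launch script above inlines this logic rather than calling a helper.

```shell
#!/bin/bash
# Sketch of the selection order described in opensm.sysconfig.
# select_mode is a hypothetical helper used only to illustrate precedence.
select_mode() {
   local guids=$1 confdir=$2
   local conf
   # Method #2: a non-empty GUIDS list overrides everything else.
   if [ -n "$guids" ]; then
      echo "guids"
      return
   fi
   # Method #1: numbered copies such as opensm.conf.1, opensm.conf.2, ...
   for conf in "$confdir"/opensm.conf.[0-9]*; do
      # Guard against an unmatched glob being left as a literal string.
      if [ -e "$conf" ]; then
         echo "configs"
         return
      fi
   done
   # Neither method is in use: a single instance with the default config.
   echo "default"
}

select_mode "$GUIDS" /etc/rdma
```

PRIORITY is deliberately left out of this sketch, since per the comments above it applies to whichever mode is selected.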

HonggangLI commented on June 17, 2024

There is no signal 15 failure for Fedora. Please see explanation in this bug page.
https://bugzilla.redhat.com/show_bug.cgi?id=1663785

jamespharvey20 commented on June 17, 2024

Yeah, I was given bad info about that. The link from HonggangLI discusses how the script is written that way so opensm stays running, as it (at least in the past) exits in certain situations, such as a cable being unplugged. (The link is well worth a read.) If that's still opensm's native behavior, I think it would be nice if it were changed. I don't think anyone would want it to exit in situations like that. It's of course different, but that would be like having dhcpd exit whenever a client unplugged.
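
For the single-instance case, one possible alternative to the while/sleep loop is to let systemd do the restarting. The sketch below just prints a hypothetical unit file; it is not what Fedora ships. It assumes opensm stays in the foreground when started without -B/--daemon, and it picks RestartSec=30 only to mirror the sleep 30 in the launch script. The emit_opensm_unit name and the multi-user.target choice are also assumptions.

```shell
#!/bin/bash
# Sketch: print a hypothetical unit that relies on systemd's own restart
# handling instead of a shell loop. Not Fedora's unit; single instance only.
emit_opensm_unit() {
   local prog=${1:-/usr/sbin/opensm}
   cat <<EOF
[Unit]
Description=OpenSM InfiniBand fabric Subnet Manager (sketch)

[Service]
# Assumes opensm runs in the foreground when -B/--daemon is not given.
Type=simple
ExecStart=$prog
# Restart whenever the process exits, after a 30 second pause.
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target
EOF
}

emit_opensm_unit /usr/sbin/opensm
```

This does not cover the multiple-instance setup from opensm.sysconfig, which is the other reason the launch script exists.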
