opensm's People

Contributors

alexminchiu, bdrung, bvanassche, chu11, cyberang3l, cyrilleverrier, dledford, ezahavi, gregoire-phil, hnrose, honggang-li, itaibaz, jdomke, jecavil, jf6b, jgunthorpe, kleindaniel7, kmahesh85, meier, mismail-asal, nmorey, ornechemia, rolandd, roymenczer, rpears0n, shefty, snimrod, stanfordlightfoot, tamirronen, weiny2

opensm's Issues

non-POSIX variable name in Makefile.am

autoreconf 2.69-11 (on Debian unstable) produces some warnings:

Makefile.am:17: warning: wildcard scripts/*: non-POSIX variable name
Makefile.am:17: (probably a GNU make extension)
complib/Makefile.am:20: warning: shell grep LIBVERSION= $(srcdir: non-POSIX variable name
complib/Makefile.am:20: (probably a GNU make extension)
complib/Makefile.am: installing 'config/depcomp'
libopensm/Makefile.am:20: warning: shell grep LIBVERSION= $(srcdir: non-POSIX variable name
libopensm/Makefile.am:20: (probably a GNU make extension)
libvendor/Makefile.am:22: warning: shell grep LIBVERSION= $(srcdir: non-POSIX variable name
libvendor/Makefile.am:22: (probably a GNU make extension)
opensm/Makefile.am:47: warning: ':='-style assignments are not portable
configure.ac: installing 'config/ylwrap'
osmeventplugin/Makefile.am:21: warning: shell grep LIBVERSION= $(srcdir: non-POSIX variable name
osmeventplugin/Makefile.am:21: (probably a GNU make extension)

opensm has errors from man

The opensm man page provokes warnings or errors from man:

(unstable)root@host:~# LC_ALL=C.UTF-8 MANROFFSEQ='' MANWIDTH=80 man --warnings -E UTF-8 -l -Tutf8 -Z /usr/share/man/man8/opensm.8.gz >/dev/null
<standard input>:584: warning [p 9, 1.0i]: cannot adjust line
<standard input>:748: warning [p 11, 6.0i]: cannot adjust line
<standard input>:749: warning [p 11, 6.3i]: cannot adjust line

"cannot adjust" and "can't break" warnings indicate trouble with paragraph filling, usually related to long lines. Adjustment can be helped by left-justifying, and breaks can be helped with hyphenation; see "Manipulating Filling and Adjusting" and "Manipulating Hyphenation" in the groff manual (see info groff).

Lintian is also stricter about declaring man page preprocessors.

To test this for yourself you can use the following command:

 LC_ALL=en_US.UTF-8 MANROFFSEQ='' MANWIDTH=80 \
        man --warnings -E UTF-8 -l -Tutf8 -Z <file> >/dev/null

opensm 3.3.21 crash on Debian systems

Dear Developers

We are experiencing occasional crashes of opensm (version 3.3.21) on Debian hosts. Our developers suspect that the problem lies in the LASH routing algorithm. Could you please advise on a fix?

From the crash report:

Signal: 11
SourcePackage: opensm
Stacktrace:
#0 0x00005636f04835a9 in get_next_switch (p_lash=0x1, link=, sw=0) at osm_ucast_lash.c:337
No locals.
#1 generate_cdg_for_sp (p_lash=p_lash@entry=0x5636f0eef0b0, sw=sw@entry=0, dest_switch=dest_switch@entry=1, lane=lane@entry=0) at osm_ucast_lash.c:337
num_switches = 13
switches = 0x7f501c2a7d20
cdg_vertex_matrix = 0x7f501c2d0e00
next_switch =
output_link =
j =
exists =
v =
prev = 0x0

#2 0x00005636f0484aa8 in lash_core (p_lash=) at osm_ucast_lash.c:842
lanes_needed = 1
k =
dest_switch = 1
output_link =
cycle_found2 =
num_switches =
switches =
output_link2 =

Please advise here with a possible solution, help much appreciated.

Thank you in advance

Best Regards

libopensm version

The opensm-3.3.23 release is based on d35a20f. Commit 16df3de bumps the version of libopensm from 9:0:0 to 10:0:1.

However, the package opensm-libs-3.3.23 built from https://github.com/linux-rdma/opensm/releases/download/3.3.23/opensm-3.3.23.tar.gz includes /usr/lib64/libopensm.so.9.1.0. The version in the shared library name does not match the one in the file libopensm/libopensm.ver.

This version mismatch will introduce serious RPM package and library API dependency issues.

(master)]$ cat libopensm/libopensm.ver
# In this file we track the current API version
# of the opensm common interface (and libraries)
# The version is built of the following
# tree numbers:
# API_REV:RUNNING_REV:AGE
# API_REV - advance on any added API
# RUNNING_REV - advance any change to the vendor files
# AGE - number of backward versions the API still supports
LIBVERSION=10:0:1
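
For reference, here is a small sketch of how a current:revision:age triple turns into a file name suffix on Linux, assuming the LIBVERSION value is handed to libtool's -version-info option (an assumption about the build, not something shown in this report). The function name is made up for illustration. Under that assumption the suffix is (current - age).(age).(revision), so the two numbering schemes are not expected to look alike.

```python
def libtool_file_suffix(version_info: str) -> str:
    """Map a libtool 'current:revision:age' triple to the Linux
    shared-library file suffix '(current - age).age.revision'."""
    current, revision, age = (int(part) for part in version_info.split(":"))
    if age > current:
        raise ValueError("age must not exceed current")
    return f"{current - age}.{age}.{revision}"


if __name__ == "__main__":
    print(libtool_file_suffix("9:0:0"))   # 9.0.0
    print(libtool_file_suffix("10:0:1"))  # 9.1.0
```

If the assumption holds, 10:0:1 would produce libopensm.so.9.1.0, the file name reported above.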

opensm high load issue

We deployed a large HPC cluster with more than 1,000 nodes in the same subnet. We encountered high load on opensm when one switch went down. After capturing the traffic on the opensm master node with ibdump, we found a large volume of path record requests, which may be what causes the high load.

My questions are:

  1. What scenarios cause a path record request?
  2. Does each ARP request cause one path record request?

Question about the release .tar.gz files

Hello,

This is a minor question about the download release .tar.gz files.

When downloading release files from the GitHub opensm release URL:
https://github.com/linux-rdma/opensm/releases

The file names seem to have the project name prepended to the expected file name and version number. The expected release file name follows the previous OFA release names, e.g. opensm-3.3.20.tar.gz

For example, downloading a .tar.gz file for opensm-3.3.20 results in a file named opensm-opensm-3.3.20.tar.gz

The doubled project name also appears in the directories inside the .tar.gz archive.
For example, extracting the 3.3.20 release .tar.gz file results in this directory naming:
opensm-opensm-3.3.20/NEWS
opensm-opensm-3.3.20/README

Is this doubled project name intentional, and will it be used in future releases?

Thank you for your help.

Unresolved symbols in shared libraries

The shared objects have some unresolved symbols that any program linking against them must resolve. This isn't a good idea: when you introduce new dependencies, packages previously linked against the old version will break because they don't know about the new dependency. Maybe you could link each library against all the libraries it needs, so that programs don't have to link against libraries they don't use directly.

dpkg-shlibdeps: warning: symbol osm_log_v2 used by debian/libosmvendor4/usr/lib/x86_64-linux-gnu/libosmvendor.so.4.0.3 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_mad_pool_get used by debian/libosmvendor4/usr/lib/x86_64-linux-gnu/libosmvendor.so.4.0.3 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_mad_pool_put used by debian/libosmvendor4/usr/lib/x86_64-linux-gnu/libosmvendor.so.4.0.3 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_log used by debian/libosmvendor4/usr/lib/x86_64-linux-gnu/libosmvendor.so.4.0.3 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_dump_smp_dr_path used by debian/libosmvendor4/usr/lib/x86_64-linux-gnu/libosmvendor.so.4.0.3 found in none of the libraries
dpkg-shlibdeps: warning: symbol cl_atomic_spinlock used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_vendor_get used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol cl_spinlock_init used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol cl_spinlock_acquire used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol cl_get_time_stamp used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol cl_spinlock_release used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_vendor_put used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries

git tags are not consistent

I noticed that the latest git tag is 3.3.21.
This is not consistent with the other git tags in this repo, such as opensm-3.3.20.

I wrote a script that relies on the convention that all tags in the opensm repo start with opensm- to fetch the correct version.
Would you be kind enough to add the following git tag: opensm-3.3.21

Thank you.
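
Until the tags are consistent, a script could accept both spellings. The sketch below is only an illustration of that idea, not the script mentioned above; the function names are made up, and it assumes tags are either X.Y.Z or opensm-X.Y.Z.

```python
import re

# Accept both "opensm-3.3.20" and "3.3.21" style tags.
_TAG_RE = re.compile(r"^(?:opensm-)?(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Return the version as a tuple of ints, or None if the tag does not match."""
    match = _TAG_RE.match(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def latest_tag(tags):
    """Return the tag with the highest version, ignoring unrelated tags."""
    versioned = [(parse_tag(tag), tag) for tag in tags]
    versioned = [(version, tag) for version, tag in versioned if version is not None]
    if not versioned:
        raise ValueError("no release tags found")
    return max(versioned)[1]


if __name__ == "__main__":
    print(latest_tag(["opensm-3.3.19", "opensm-3.3.20", "3.3.21"]))
```

Comparing the parsed tuples rather than the raw strings keeps the ordering numeric, so 3.3.9 sorts before 3.3.10.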

OpenSM: Add information about limitations

Please add some information on the limitations of OpenSM. From what I know, these are:

  1. No SR-IOV support
  2. Multicast sweeps may cause multicast micro-loops which can cause the SM to fail.
  3. No Multicast support for ConnectX5 and 6.

These are issues fixed in the Mellanox OFED OpenSM.

Please provide systemd service files

It would be nice if opensm came with systemd service files. Otherwise each distribution has to create its own service files, and they might diverge.

partitions.conf uses magic values for rate and mtu which are not documented

An experienced IB user knows they need to add mtu=5 to their partitions.conf in order to enable an MTU size of 4096. What 5 means is buried in the specification and/or include files.

This patch has two related changes to the partitions.conf doc.
First, make tables of the magic values.
Second, change the example so that one can find it by searching for 4096, as it's not useful for the example to show the default of 2K.
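
To illustrate what such a table encodes for mtu, here is a minimal sketch. The code-to-size mapping is the MTU enumeration as I understand it from the InfiniBand specification, and the names IB_MTU_BYTES and mtu_code_for are made up for illustration; the rate codes are not covered here and should be taken from the spec or include files.

```python
# InfiniBand MTU enumeration: partitions.conf code -> MTU size in bytes.
IB_MTU_BYTES = {
    1: 256,
    2: 512,
    3: 1024,
    4: 2048,  # the 2K default mentioned above
    5: 4096,
}


def mtu_code_for(size_bytes):
    """Return the enumeration code to write as mtu=<code> for a given MTU size."""
    for code, size in IB_MTU_BYTES.items():
        if size == size_bytes:
            return code
    raise ValueError(f"{size_bytes} is not a valid IB MTU size")


if __name__ == "__main__":
    print(f"mtu={mtu_code_for(4096)}")  # mtu=5
```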

libosmcomp.so calls exit

The shared library libosmcomp.so calls the C library exit() or _exit() functions.

In the case of an error, the library should instead return an appropriate error code to the calling program which can then determine how to handle the error, including performing any required clean-up.
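
The requested pattern, sketched in Python for brevity rather than in the library's own C. Every name here (Status, pool_init, run) is hypothetical and does not correspond to the libosmcomp API; the point is only that the library routine reports failure and the caller decides what to do.

```python
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical status codes returned by the library."""
    SUCCESS = 0
    INVALID_PARAMETER = 1


def pool_init(object_count):
    """Hypothetical library routine: report the error, never terminate the process."""
    if object_count <= 0:
        return Status.INVALID_PARAMETER
    return Status.SUCCESS


def run(object_count):
    """Caller: owns clean-up and chooses the process exit code."""
    status = pool_init(object_count)
    if status is not Status.SUCCESS:
        # Release any resources the caller holds here, then report failure.
        return int(status)
    return 0


if __name__ == "__main__":
    raise SystemExit(run(16))
```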

Remove structure packing where not needed

[jgunthorpe wrote:]
The reason you can't take the address of a packed member is that it is not aligned. It is simply an error and you shouldn't ever do it: it will crash at runtime on ARM. If the member is actually aligned, then don't use packed; use the proper aligned attribute instead to tell the compiler what is happening, and it won't complain.

osm_vendor_ibumad.c:409:41: error: taking address of packed member 'trans_id' of
    class or structure '_ib_mad' may result in an unaligned pointer value
    [-Werror,-Waddress-of-packed-member]
    if (!(p_req_madw = get_madw(p_vend, &p_mad->trans_id,
                                        ^~~~~~~~~~~~~~~
osm_vendor_ibumad.c:437:35: error: taking address of packed member 'trans_id' of
    class or structure '_ib_mad' may result in an unaligned pointer value
    [-Werror,-Waddress-of-packed-member]
    p_req_madw = get_madw(p_vend, &p_mad->trans_id,
                                  ^~~~~~~~~~~~~~~
osm_vendor_ibumad.c:1211:22: error: taking address of packed member 'trans_id'
    of class or structure '_ib_mad' may result in an unaligned pointer value
    [-Werror,-Waddress-of-packed-member]
    get_madw(p_vend, &p_mad->trans_id,
                     ^~~~~~~~~~~~~~~
3 errors generated.

The above occurred in libvendor/osm_vendor_ibumad.c where p_mad->trans_id was being accessed and p_mad is a pointer to ib_mad_t.

This is mainly an issue in ib_types.h.
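
To make the alignment point concrete, here is a small sketch using Python's ctypes to compare a packed and a naturally laid out structure. The two structures are hypothetical and deliberately tiny; they are not the real ib_mad_t layout. The sketch only shows why a packed member may sit at an address that is not a multiple of its alignment.

```python
import ctypes

FIELDS = [("base_ver", ctypes.c_uint8), ("trans_id", ctypes.c_uint64)]


class PackedHeader(ctypes.Structure):
    """Hypothetical structure with 1-byte packing (no padding inserted)."""
    _pack_ = 1
    _layout_ = "ms"  # newer Python versions ask for an explicit layout with _pack_
    _fields_ = FIELDS


class NaturalHeader(ctypes.Structure):
    """Same fields, laid out with the platform's natural alignment."""
    _fields_ = FIELDS


ALIGNMENT = ctypes.alignment(ctypes.c_uint64)
packed_offset = PackedHeader.trans_id.offset
natural_offset = NaturalHeader.trans_id.offset

if __name__ == "__main__":
    print("packed offset aligned: ", packed_offset % ALIGNMENT == 0)
    print("natural offset aligned:", natural_offset % ALIGNMENT == 0)
```

In the packed layout trans_id lands at offset 1, so a pointer to it is misaligned; in the natural layout the compiler pads the structure so the member is aligned and taking its address is safe.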

New release

Could we get a new OpenSM release?
The last one is over 2 years old, and now that the project has moved to GitHub, we can finally get a new one :)
