linux-rdma / opensm
License: Other
autoreconf 2.69-11 (on Debian unstable) produces some warnings:
Makefile.am:17: warning: wildcard scripts/*: non-POSIX variable name
Makefile.am:17: (probably a GNU make extension)
complib/Makefile.am:20: warning: shell grep LIBVERSION= $(srcdir: non-POSIX variable name
complib/Makefile.am:20: (probably a GNU make extension)
complib/Makefile.am: installing 'config/depcomp'
libopensm/Makefile.am:20: warning: shell grep LIBVERSION= $(srcdir: non-POSIX variable name
libopensm/Makefile.am:20: (probably a GNU make extension)
libvendor/Makefile.am:22: warning: shell grep LIBVERSION= $(srcdir: non-POSIX variable name
libvendor/Makefile.am:22: (probably a GNU make extension)
opensm/Makefile.am:47: warning: ':='-style assignments are not portable
configure.ac: installing 'config/ylwrap'
osmeventplugin/Makefile.am:21: warning: shell grep LIBVERSION= $(srcdir: non-POSIX variable name
osmeventplugin/Makefile.am:21: (probably a GNU make extension)
The opensm man page provokes warnings or errors from man:
(unstable)root@host:~# LC_ALL=C.UTF-8 MANROFFSEQ='' MANWIDTH=80 man --warnings -E UTF-8 -l -Tutf8 -Z /usr/share/man/man8/opensm.8.gz >/dev/null
<standard input>:584: warning [p 9, 1.0i]: cannot adjust line
<standard input>:748: warning [p 11, 6.0i]: cannot adjust line
<standard input>:749: warning [p 11, 6.3i]: cannot adjust line
"cannot adjust" and "can't break" warnings point to trouble with paragraph filling, usually caused by long lines. Adjustment can be helped by left-justifying the text, and breaks can be helped with hyphenation; see "Manipulating Filling and Adjusting" and "Manipulating Hyphenation" in the groff manual (see info groff).
Lintian is also stricter about declaring manpage preprocessors.
To test this for yourself you can use the following command:
LC_ALL=en_US.UTF-8 MANROFFSEQ='' MANWIDTH=80 \
man --warnings -E UTF-8 -l -Tutf8 -Z <file> >/dev/null
Dear Developers
We are seeing opensm (version 3.3.21) crash from time to time on Debian hosts. Our developers suspect the LASH algorithm is involved. Could you advise on a fix?
From the crash report:
Signal: 11
SourcePackage: opensm
Stacktrace:
#0 0x00005636f04835a9 in get_next_switch (p_lash=0x1, link=, sw=0) at osm_ucast_lash.c:337
No locals.
#1 generate_cdg_for_sp (p_lash=p_lash@entry=0x5636f0eef0b0, sw=sw@entry=0, dest_switch=dest_switch@entry=1, lane=lane@entry=0) at osm_ucast_lash.c:337
num_switches = 13
switches = 0x7f501c2a7d20
cdg_vertex_matrix = 0x7f501c2d0e00
next_switch =
output_link =
j =
exists =
v =
prev = 0x0
#2 0x00005636f0484aa8 in lash_core (p_lash=) at osm_ucast_lash.c:842
lanes_needed = 1
k =
dest_switch = 1
output_link =
cycle_found2 =
num_switches =
switches =
output_link2 =
Please advise here with a possible solution, help much appreciated.
Thank you in advance
Best Regards
The opensm-3.3.23 release is based on d35a20f. Commit 16df3de bumps the version of libopensm from 9:0:0 to 10:0:1.
However, the package opensm-libs-3.3.23 built from https://github.com/linux-rdma/opensm/releases/download/3.3.23/opensm-3.3.23.tar.gz includes /usr/lib64/libopensm.so.9.1.0. The version in the shared library name does not match the one in libopensm/libopensm.ver.
This version mismatch will cause serious RPM packaging and library API dependency problems.
(master)]$ cat libopensm/libopensm.ver
LIBVERSION=10:0:1
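For reference, here is a minimal sketch of how a libtool version triple turns into a file name suffix on Linux. It assumes LIBVERSION is a libtool `current:revision:age` value passed to `-version-info`, which the report does not state explicitly. The names `lt_version` and `lt_linux_suffix` are made up for this illustration. Under that assumption libtool names the file `<current - age>.<age>.<revision>`, so 10:0:1 would give the suffix 9.1.0 and 9:0:0 would give 9.0.0.

```c
#include <assert.h>
#include <stdio.h>

/* A libtool -version-info triple (hypothetical helper type). */
struct lt_version {
    unsigned current;
    unsigned revision;
    unsigned age;
};

/*
 * Write the suffix libtool derives on Linux, <current - age>.<age>.<revision>,
 * into buf. Returns the value returned by snprintf().
 */
int lt_linux_suffix(struct lt_version v, char *buf, size_t len)
{
    assert(v.age <= v.current); /* age may not exceed current */
    return snprintf(buf, len, "%u.%u.%u",
                    v.current - v.age, v.age, v.revision);
}
```

Calling `lt_linux_suffix` with `{10, 0, 1}` fills the buffer with `9.1.0`. Whether that is the intended outcome for this release is for the maintainers to confirm.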
We deployed a large HPC cluster with more than 1,000 nodes in the same subnet. We ran into high load on opensm when one switch went down. After capturing traffic with ibdump on the opensm master node, we found a large volume of path record requests, which may be what causes the high load.
My question is:
Hello,
This is a minor question about the downloadable release .tar.gz files.
When downloading release files from the GitHub opensm release URL:
https://github.com/linux-rdma/opensm/releases
the file names have the project name prepended to the expected file name and version number. The expected name follows the earlier OFA release names, e.g. opensm-3.3.20.tar.gz
For example, downloading the .tar.gz file for opensm-3.3.20 results in a file named opensm-opensm-3.3.20.tar.gz
The doubled project name also shows up in the directory names inside the .tar.gz file.
For example, extracting the 3.3.20 release .tar.gz file produces these paths:
opensm-opensm-3.3.20/NEWS
opensm-opensm-3.3.20/README
Is this doubled project name intentional, and will it be used in future releases?
Thank you for your help.
The shared objects have unresolved symbols that any program linking against them must resolve. This is a bad idea: when a library gains a new dependency, packages linked against the old version break because they do not know about it. Could each library link against everything it needs, so that programs do not have to link against libraries they do not use themselves?
dpkg-shlibdeps: warning: symbol osm_log_v2 used by debian/libosmvendor4/usr/lib/x86_64-linux-gnu/libosmvendor.so.4.0.3 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_mad_pool_get used by debian/libosmvendor4/usr/lib/x86_64-linux-gnu/libosmvendor.so.4.0.3 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_mad_pool_put used by debian/libosmvendor4/usr/lib/x86_64-linux-gnu/libosmvendor.so.4.0.3 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_log used by debian/libosmvendor4/usr/lib/x86_64-linux-gnu/libosmvendor.so.4.0.3 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_dump_smp_dr_path used by debian/libosmvendor4/usr/lib/x86_64-linux-gnu/libosmvendor.so.4.0.3 found in none of the libraries
dpkg-shlibdeps: warning: symbol cl_atomic_spinlock used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_vendor_get used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol cl_spinlock_init used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol cl_spinlock_acquire used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol cl_get_time_stamp used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol cl_spinlock_release used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
dpkg-shlibdeps: warning: symbol osm_vendor_put used by debian/libopensm8/usr/lib/x86_64-linux-gnu/libopensm.so.8.0.0 found in none of the libraries
I noticed that the latest git tag is 3.3.21.
This is not consistent with the other git tags in this repo, such as opensm-3.3.20.
I wrote a script that relies on the convention that all tags in the opensm repo start with opensm-
to fetch the correct version.
Would you be kind enough to add the following git tag: opensm-3.3.21
Thank you.
Please add some information on the limitations of OpenSM. From what I know these are
These are issues fixed in the Mellanox OFED OpenSM.
It would be nice if opensm came with systemd service files. Otherwise each distribution has to create its own service files, and they might diverge.
An experienced IB user knows to add mtu=5 to partitions.conf in order to enable an MTU size of 4096. What 5 means is buried in the specification and/or the include files.
This patch makes two related changes to the partitions.conf doc.
First, it adds tables of the magic values.
Second, it changes the example so that one can find it by searching for 4096, as it is not useful for the example to show the default 2K.
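The encoding behind mtu=5 is the one the InfiniBand specification uses for MTU fields: codes 1 through 5 stand for 256, 512, 1024, 2048 and 4096 bytes. A small sketch of that mapping follows. The function names are invented for illustration and are not OpenSM APIs.

```c
#include <assert.h>

/* Encoded MTU value (1..5) to bytes; returns 0 for any other code. */
unsigned mtu_code_to_bytes(unsigned code)
{
    if (code < 1 || code > 5)
        return 0;
    return 128u << code; /* 1 -> 256, 2 -> 512, ..., 5 -> 4096 */
}

/* Bytes to encoded MTU value; returns 0 if no code matches exactly. */
unsigned mtu_bytes_to_code(unsigned bytes)
{
    unsigned code;

    for (code = 1; code <= 5; code++)
        if (mtu_code_to_bytes(code) == bytes)
            return code;
    return 0;
}

/* Sanity checks for the mapping, callable from a test program. */
void mtu_self_check(void)
{
    assert(mtu_bytes_to_code(4096) == 5); /* the mtu=5 case */
    assert(mtu_code_to_bytes(4) == 2048); /* the 2K default mentioned above */
}
```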
The shared library libosmcomp.so calls the C library exit() or _exit() functions.
In the case of an error, the library should instead return an appropriate error code to the calling program which can then determine how to handle the error, including performing any required clean-up.
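As a rough sketch of the pattern being requested, a library routine can report failure through its return value and leave the decision to the caller. Everything below (`demo_status`, `demo_pool`, `demo_pool_init`, `demo_pool_destroy`) is hypothetical and is not taken from libosmcomp.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical status codes; not the actual complib values. */
enum demo_status {
    DEMO_SUCCESS = 0,
    DEMO_INSUFFICIENT_MEMORY,
    DEMO_INVALID_PARAMETER
};

struct demo_pool {
    void *buf;
    size_t size;
};

/* Allocate the pool; on failure return a status instead of calling exit(). */
enum demo_status demo_pool_init(struct demo_pool *pool, size_t size)
{
    if (pool == NULL || size == 0)
        return DEMO_INVALID_PARAMETER;

    pool->buf = malloc(size);
    if (pool->buf == NULL)
        return DEMO_INSUFFICIENT_MEMORY; /* caller decides how to clean up */

    pool->size = size;
    return DEMO_SUCCESS;
}

/* Release the pool and reset it so a second destroy is harmless. */
void demo_pool_destroy(struct demo_pool *pool)
{
    assert(pool != NULL);
    free(pool->buf);
    pool->buf = NULL;
    pool->size = 0;
}
```

A calling program would check the result of `demo_pool_init` and, on anything other than `DEMO_SUCCESS`, run its own clean-up before deciding whether to terminate.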
[jgunthorpe wrote:]
The reason you can't take the address of a packed member is because it is not aligned, it is simply an error and you shouldn't ever do it - it will crash at runtime on ARM. If the member is actually aligned then don't use packed, but use the proper attribute aligned to tell the compiler what is happening and it won't complain.
osm_vendor_ibumad.c:409:41: error: taking address of packed member 'trans_id' of
      class or structure '_ib_mad' may result in an unaligned pointer value
      [-Werror,-Waddress-of-packed-member]
        if (!(p_req_madw = get_madw(p_vend, &p_mad->trans_id,
                                            ^~~~~~~~~~~~~~~
osm_vendor_ibumad.c:437:35: error: taking address of packed member 'trans_id' of
      class or structure '_ib_mad' may result in an unaligned pointer value
      [-Werror,-Waddress-of-packed-member]
        p_req_madw = get_madw(p_vend, &p_mad->trans_id,
                                      ^~~~~~~~~~~~~~~
osm_vendor_ibumad.c:1211:22: error: taking address of packed member 'trans_id'
      of class or structure '_ib_mad' may result in an unaligned pointer value
      [-Werror,-Waddress-of-packed-member]
        get_madw(p_vend, &p_mad->trans_id,
                         ^~~~~~~~~~~~~~~
3 errors generated.
The above occurred in libvendor/osm_vendor_ibumad.c, where p_mad->trans_id is accessed and p_mad is a pointer to ib_mad_t.
This is mainly an issue in ib_types.h.
Could we get a new OpenSM release?
The last one is over 2 years old, and now that the project has moved to GitHub, we can finally get a new one :)
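One way to avoid the diagnostic without changing the structure layout is to read the packed field by value into a local variable and pass the address of that local instead. The sketch below uses a made-up packed structure and made-up functions (`demo_mad`, `demo_read_tid`, `demo_lookup`). It is not the real ib_mad_t layout or the get_madw() signature, and it is not necessarily the fix the maintainers would choose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed header; not the real ib_mad_t layout. */
struct demo_mad {
    uint8_t base_ver;
    uint64_t trans_id; /* lands at offset 1 because of packing */
} __attribute__((packed));

/* Hypothetical consumer that expects a properly aligned pointer. */
uint64_t demo_read_tid(const uint64_t *p_tid)
{
    return *p_tid;
}

uint64_t demo_lookup(const struct demo_mad *p_mad)
{
    uint64_t tid;

    assert(p_mad != NULL);
    /*
     * Read the member by value: the compiler knows the struct is packed
     * and emits an access that is safe for the misaligned field.
     * Passing &p_mad->trans_id directly is what triggers the warning.
     */
    tid = p_mad->trans_id;
    return demo_read_tid(&tid);
}
```

The cost is one extra copy per call. If a field is in fact always aligned, the alternative described in the quoted comment is to drop `packed` and state the alignment explicitly.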