reference-lapack / lapack Goto Github PK

View Code? Open in Web Editor NEW

1.4K 64.0 408.0 30.41 MB

LAPACK development repository

License: Other

CMake 0.33% Makefile 0.26% Fortran 76.67% C 21.45% TeX 0.11% C++ 1.15% Python 0.02%

linear-algebra lapack blas lapacke linear-equations eigenvalues eigenvectors matrix-factorization singular-values svd

lapack's Introduction

LAPACK

VERSION 1.0 : February 29, 1992
VERSION 1.0a : June 30, 1992
VERSION 1.0b : October 31, 1992
VERSION 1.1 : March 31, 1993
VERSION 2.0 : September 30, 1994
VERSION 3.0 : June 30, 1999
VERSION 3.0 + update : October 31, 1999
VERSION 3.0 + update : May 31, 2000
VERSION 3.1 : November 2006
VERSION 3.1.1 : February 2007
VERSION 3.2 : November 2008
VERSION 3.2.1 : April 2009
VERSION 3.2.2 : June 2010
VERSION 3.3.0 : November 2010
VERSION 3.3.1 : April 2011
VERSION 3.4.0 : November 2011
VERSION 3.4.1 : April 2012
VERSION 3.4.2 : September 2012
VERSION 3.5.0 : November 2013
VERSION 3.6.0 : November 2015
VERSION 3.6.1 : June 2016
VERSION 3.7.0 : December 2016
VERSION 3.7.1 : June 2017
VERSION 3.8.0 : November 2017
VERSION 3.9.0 : November 2019
VERSION 3.9.1 : April 2021
VERSION 3.10.0 : June 2021
VERSION 3.10.1 : April 2022
VERSION 3.11.0 : November 2022
VERSION 3.12.0 : November 2023

LAPACK is a library of Fortran subroutines for solving the most commonly occurring problems in numerical linear algebra.

LAPACK is a freely-available software package. It can be included in commercial software packages (and has been). We only ask that that proper credit be given to the authors, for example by citing the LAPACK Users' Guide. The license used for the software is the modified BSD license.

Like all software, it is copyrighted. It is not trademarked, but we do ask the following: if you modify the source for these routines we ask that you change the name of the routine and comment the changes made to the original.

We will gladly answer any questions regarding the software. If a modification is done, however, it is the responsibility of the person who modified the routine to provide support.

LAPACK is available from GitHub. LAPACK releases are also available on netlib.

The distribution contains (1) the Fortran source for LAPACK, and (2) its testing programs. It also contains (3) the Fortran reference implementation of the Basic Linear Algebra Subprograms (the Level 1, 2, and 3 BLAS) needed by LAPACK. However this code is intended for use only if there is no other implementation of the BLAS already available on your machine; the efficiency of LAPACK depends very much on the efficiency of the BLAS. It also contains (4) CBLAS, a C interface to the BLAS, and (5) LAPACKE, a C interface to LAPACK.

Installation

LAPACK can be installed with make. The configuration must be set in the make.inc file. A make.inc.example for a Linux machine running GNU compilers is given in the main directory. Some specific make.inc are also available in the INSTALL directory.
LAPACK includes also the CMake build. You will need to have CMake installed on your machine. CMake will allow an easy installation on a Windows Machine. An example CMake build to install the LAPACK library under $HOME/.local/lapack/ is:
```
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/.local/lapack ..
cmake --build . -j --target install
```
LAPACK can be built and installed using vcpkg dependency manager:
```
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh  # ./bootstrap-vcpkg.bat for Windows
./vcpkg integrate install
./vcpkg install lapack
```
The lapack port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.

User Support

LAPACK has been thoroughly tested, on many different types of computers. The LAPACK project supports the package in the sense that reports of errors or poor performance will gain immediate attention from the developers. Such reports, descriptions of interesting applications, and other comments should be sent by email to the LAPACK team.

A list of known problems, bugs, and compiler errors for LAPACK is maintained on netlib. Please see as well the GitHub issue tracker.

For further information on LAPACK please read our FAQ and Users' Guide. A user forum and specific information for running LAPACK under Windows. is also available to help you with the LAPACK library.

Testing

LAPACK includes a thorough test suite. We recommend that, after compilation, you run the test suite.

For complete information on the LAPACK Testing please consult LAPACK Working Note 41 "Installation Guide for LAPACK".

LAPACKE

LAPACK now includes the LAPACKE package. LAPACKE is a Standard C language API for LAPACK that was born from a collaboration of the LAPACK and INTEL Math Kernel Library teams.

lapack's People

Contributors

Stargazers

Watchers

Forkers

turboencabulator hjmjohnson vladimir-ch mikeaddison93 echeresh antonio-rojas sfabbro zerothi christoph-conrads khaledto apocalypsetank advanpix mgates3 sydha oamarques sirpkt tristonc hadim shyamalschandra iyamazaki kandatz whxloveyrh sc08d30 karturov dushyanthasd svillemot jschueller conansherry nickkolok evanchueng 0x56794e yongjianxu midasc kbergstr charwliu xreki tmijieux 1aq miroi jeffhammond caomw nareshkamboju effzeh wk1984 sivasatvik thielema gzmkl beachdweller walkoncross jflopezfernandez hunter-packages wqyun wuqiangcpp flyahead joypoint zhufenggnss jedibill2010 tkymakimoto gitnetlib velraman cncae raovgarimella andreasnoack taylerablake isuruf yangyang14641 grseb9s chkob luzpaz pysuxing timmoon10 acidburn0zzz besser82 tarciov daseven marcomaggi albertziegenhagel zacbayhan magnusbrattlof mithlesh4257 albert-github alichten ocsobservatory geocug jinfree malcolmroberts drwells calvinzel amigalemming saidctb heemansh kinetictheory benbrands skosukhin longbiscuit haidarazzam phcerdan jcfr caitlynclark xiaohr

lapack's Issues

[C,Z]GEBD2 should be checked in /TESTING/EIG/[c,z]errbd.f

[D,S]GEBD2 routines are being checked in /TESTING/EIG/[d,s]errbd.f.
For example:
$ less /TESTING/EIG/derrbd.f
. . . . .
91 * .. External Subroutines ..
92 EXTERNAL CHKXER, DBDSDC, DBDSQR, DBDSVDX, DGEBD2,
93 $ DGEBRD, DORGBR, DORMBR
. . . . .
143 *
144 * DGEBD2
145 *
146 SRNAMT = 'DGEBD2'
147 INFOT = 1
148 CALL DGEBD2( -1, 0, A, 1, D, E, TQ, TP, W, INFO )
149 CALL CHKXER( 'DGEBD2', INFOT, NOUT, LERR, OK )
150 INFOT = 2
151 CALL DGEBD2( 0, -1, A, 1, D, E, TQ, TP, W, INFO )
152 CALL CHKXER( 'DGEBD2', INFOT, NOUT, LERR, OK )
153 INFOT = 4
154 CALL DGEBD2( 2, 1, A, 1, D, E, TQ, TP, W, INFO )
155 CALL CHKXER( 'DGEBD2', INFOT, NOUT, LERR, OK )
156 NT = NT + 3
Also
*> SERRBD tests the error exits for SGEBD2, SGEBRD, SORGBR, SORMBR,
*> SBDSQR, SBDSDC and SBDSVDX
It seems, the same should be provided for CGEBD2 and ZGEBD2.

LAPACKE build problem with mingw

Over at OpenBLAS we received a report OpenMathLib/OpenBLAS#1056 stating that mingw fails to build LAPACKE as the list of object files in LAPACKE/src/Makefile is too long for the ar on that platform to handle. Would you consider a PR to split the object list as it is currently done in https://github.com/xianyi/OpenBLAS/blob/develop/lapack-netlib/LAPACKE/src/Makefile ?

Bug in zheevd

There is an issue with the routine zheevd when compiled with Intel fortran using the optimization -O2 or -O3.

The problem occurs at line 285 of dlaed4:

DELTA( J ) = ( D( J )-D( I ) ) - TAU

which evaluates to zero because of the compiler not respecting the parenthesis. This results in a division-by-zero later on. Including -assume protect_parens fixes the problem.

Here is a simple program which produces the error:

program test
implicit none

integer, parameter :: n=27
integer i,j
integer liwork,lrwork,lwork,info
integer, allocatable :: iwork(:)
real(8), allocatable :: w(:),rwork(:)
complex(8), allocatable :: a(:,:),work(:)
liwork=5*n+3
lrwork=2*n**2+5*n+1
lwork=n**2+2*n
allocate(w(n),a(n,n))
allocate(iwork(liwork),rwork(lrwork),work(lwork))

a(:,:)=0.d0
do i=1,n
  a(i,i)=1.d0
  do j=i+1,n
    a(i,j)=cos(dble(i))
  end do
end do

call zheevd('V','U',n,a,n,w,work,lwork,rwork,lrwork,iwork,liwork,info)

print *,'info',info

end program

I'm not sure whether this is a compiler, LAPACK or user issue. I've added -assume protect_parens to our code (elk.sourceforge.net) and made zheev as the default eigenvalue solver for the moment.

xLASWP: bug for negative INCX

In DLASWP, INCX is the increment for IPIV and can be positive or negative. If negative, the swaps are applied from K2 down to K1, otherwise from K1 to K2. From the documentation it is understood that the same elements of IPIV are accessed in both cases.

However, the implementation behaves differently. When INCX>0, the first accessed element is K1, then K1+INCX, ... until K1+(K2-K1)*INCX. When INCX<0, the first accessed element is

1+(1-K2)*INCX

when in fact it should be

K1+(K1-K2)*INCX

It seems that the 1 is missing a K in front. All the calls to DLASWP with INCX=-1 luckily use K1=1 in LAPACK.

Remove SVN references from git repository.

Testing should use git repository.

http://my.cdash.org/index.php?project=LAPACK

From E-mail correspondence

We are transitioning to GIT. I did a grep of SVN in the LAPACK directory, and it seems to me that CMAKE is still using the old SVN repository. I think it would be more appropriate if CMAKE would use GIT. By any chance, is it something with which you could help us?

It is for the CTEST Dashboard. http://my.cdash.org/index.php?project=LAPACK&date=2016-07-25
We used it from time to time, especially around release time.

LAPACK does not have any exceptional shift strategy for QZ

LAPACK current shift strategy is known to fail on various pencils. Here are six known pencils, (A1,B1), (A2,B2), (A3,B3) where LAPACK fails

Pencil (A1,B1) has been sent to us by Zbigniew Leyk, Senior Lecturer, Department of Computer Science, Texas A & M University on Jan 30 2007. See: http://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=317

Pencil (A2,B2) has been sent to us by Daniel Kressner in 2005 who quote Vasile Sima.

A3 = [
 1  1  0;
 0  0 -1;
 1  0  0
];
 
B3 = [
-1  0 -1;
 0 -1  0;
-1  0 -1
];

Pencil (A3,B3) has been sent to us by Patrick Alken from University of Colorado at Boulder on Apr 24 2007. See: http://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=382

A4 = [
 1     1e-9  0;
 1e-9  1     0;
 0     0     1
];
 
B4 = [
 1  0  0;
 0  1  0;
 0  0  1
];

Pencil (A4,B4) has been sent to us by Michael Baudin from the Scilab project on Tu Oct 28th 2008. This one pencil make ZGGEV fail. DGGEV handles it correctly. http://bugzilla.scilab.org/show_bug.cgi?id=3652

See as well:
http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=13&t=4357

And probably also:
http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=13&t=4855

The fixes

a-- Mathworks' patch

Bobby Cheng sent the lapackers the Mathworks patch to fix LAPACK QZ. This patch works fine on the pencils (A1,B1) and (A2,B2). But fails on pencil (A3,B3). As observed by Jim Demmel, matlab qz works fine with pencil (A3,B3).

Date: Thu, 15 Sep 2005 20:35:00 +0200 (CEST)
From: Daniel Kressner
Subject: Re: [Lapackers] Mathworks modif on LAPACK

Hi,

below are some comments on Mathworks' modifications to DGEBAL and DHGEQZ.

1)Change SCLFAC from 8 to 2 to be the same as LINPACK.

File:
cgebal.f
zgebal.f
sgebal.f
dgebal.f

I support the point of view taken by Mathworks. Changing SCLFAC from 8 to 2 may result in a few extra iterations of the balancing algorithm, but it may also give a balanced matrix of slightly smaller norm and consequently one or two more accurate digits in the computed eigenvalues. I am not aware that the execution time needed by balancing could be a major concern when solving nonsymmetric eigenvalues.

3)Change to implicit shift and exceptional shift calculations.
Some problem were not converging.

Files:
zhgeqz.f (lines 602-672)
chgeqz.f (lines 602-672)
shgeqz.f (lines 657-667, 819-821)
dhgeqz.f (lines 657-667, 819-821)

As far as I can see, Mathworks applied four changes to *HGEQZ (not all of them are marked in the code):

(1) Exceptional shift strategy

As also pointed out by Vasile Sima, the exceptional shift strategy for the QZ algorithm currently implemented in DHGEQZ seems to be less effective than the one for the QR algorithm. For example, it fails for the matrix pair

      [ 0 0 0 1 ]
      [ 1 0 0 0 ]
 A =  [ 0 1 0 0 ],  B = identity.
      [ 0 0 1 0 ]

LAPACK's exceptional shift is zero (this is the same as the standard shift -> no convergence). Mathworks' exceptional shift is one (much better).

(2) Choice of single real shift

DHGEQZ applies a single shift QZ algorithm if the computed shifts are real. This seems to be a relict of Ward's combination shift QZ algorithm, which has no meaningful purpose anymore (applying two single shift QZ iterations is more expensive than applying one double shift QZ iteration, even in terms of flops).

If two real shifts are computed, Mathworks' change has the effect that always the shift closer to the last diagonal element of A\B is used. If I remember correctly, this is in the spirit of what Wilkinson proposed for symmetric matrices. I support this change and expect that it can lead to slightly faster convergence.

(3) Mathworks included B22 = -B22 on line 820 (in DHGEQZ)

This is a bug fix and I highly recommend to include it. It seems that DHGEQZ computes wrong complex conjugate eigenvalue pairs if DLASV2 computes negative diagonal elements (I am not sure whether this is an unlikely event in the context of the QZ algorithm).

(4) Mathworks uses the workspace to store the number of iterations needed for each eigenvalue.

This is probably used for debugging and should not be included in LAPACK.

With best regards,
Daniel

b-- David Day's patch

I found in the lapackers archive a patch from David Day.

From: David Day (Sandia)
Date: Tue Sep 7 09:49:40 2004
Subject: [Lapackers] Re: [Fwd: Convergence failure of LAPACK's zggevx]

Dear Jim Demmel:

The reported problem does involve the Exceptional shifts in QZ. Changing the QZ exceptional shift to something similar to what I have advixed for real QR and real QZ does fix this particular bug report. In my version of zhgeqz.f lines 603 read

           ELSE
 *
 *           Exceptional shift.  Chosen for no particularly good reason.
 *
              ESHIFT = ESHIFT + DCONJG( ( ASCALE*A( ILAST-1, ILAST ) ) /
       $               ( BSCALE*B( ILAST-1, ILAST-1 ) ) )

After these lines, I added

              TEMP =
       $      ABS(ASCALE*A(ILAST-1,ILAST-2)/(BSCALE*B(ILAST-1,ILAST-1)))+
       $      ABS(ASCALE*A(ILAST  ,ILAST-1)/(BSCALE*B(ILAST-2,ILAST-2)))
              TEMP2 =
       $      ABS(ASCALE*A(ILAST-1,ILAST-2)/(BSCALE*B(ILAST-2,ILAST-2)))+
       $      ABS(ASCALE*A(ILAST  ,ILAST-1)/(BSCALE*B(ILAST-1,ILAST-1)))
              ESHIFT = DCMPLX(MIN(TEMP, TEMP2),0)

This changes the value of the shift, now set on line 616
SHIFT = ESHIFT>

Thank you for including me in this discussion.
--David M. Day

c-- Patrick Alken's patch

Date: Wed, 2 May 2007 16:14:55 -0600
From: Patrick Alken
Subject: Re: [Lapack] dggev fails on non-singular system

Hello,

I have done some testing and I have found that the following
patch fixes all the 3x3 and 4x4 matrix systems that I found failures
on:

Replace (in dhgeqz.f):

      629             IF( ( DBLE( MAXIT )*SAFMIN )*ABS( H( ILAST-1, ILAST) ).LT.
      630      $          ABS( T( ILAST-1, ILAST-1 ) ) ) THEN
      631                ESHIFT = ESHIFT + H( ILAST-1, ILAST ) /
      632      $                  T( ILAST-1, ILAST-1 )
      633             ELSE
      634                ESHIFT = ESHIFT + ONE / ( SAFMIN*DBLE( MAXIT ) )
      635             END IF

with:

      629             IF( ( DBLE( MAXIT )*SAFMIN )*ABS( H( ILAST, ILAST-1) ).LT.
      630      $          ABS( T( ILAST-1, ILAST-1 ) ) ) THEN
      631                ESHIFT = ESHIFT + 0.736 * H( ILAST, ILAST-1 ) /
      632      $                  T( ILAST-1, ILAST-1 )
      633             ELSE
      634                ESHIFT = ESHIFT + ONE / ( SAFMIN*DBLE( MAXIT ) )
      635             END IF

The idea here is the following: in the previous code, its certainly
possible for H(ILAST-1,ILAST) to equal 0, in which case the test
will pass but ESHIFT will not be modified at all. I have just
replaced it with H(ILAST,ILAST-1) which is guaranteed not to be
zero, since its a subdiagonal element and the previous tests will
search for zeros on the subdiagonal. The 0.736 coefficient is
arbitrary - without it, I still found a small number of convergence
failures on 4x4 integer systems.

Patrick Alken

xLABAD

This is related to the subroutines xLABAD.

SLABAD takes as input the values computed by SLAMCH for underflow and
overflow, and returns the square root of each of these values if the
log of LARGE is sufficiently large. This subroutine is intended to
identify machines with a large exponent range, such as the Crays, and
redefine the underflow and overflow limits to be the square roots of
the values computed by SLAMCH. This subroutine is needed because
SLAMCH does not compensate for poor arithmetic in the upper half of
the exponent range, as is found on a Cray.

Is that still relevant on current Cray machines?

I would assume all machines are using IEEE and such a routine is obsolete by now.

Travis CI MacOS environment broken

All of my recent pull requests are failing to build on Mac with this error

No CMAKE_Fortran_COMPILER could be found.

Did the environment change? I see this in build 99:

$ export CXX=g++
$ export CC=gcc
$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix

and this in build 100:

$ export CXX=g++
$ export CC=gcc
$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

?LAQR2: incorrect documentation about updated block of Z

The documentation for ?LAQR2 says:

*>          accumulated into Z(ILOZ:IHIZ,ILO:IHI) from the right.

It should say (as in ?DLAQR3):

*>          accumulated into Z(ILOZ:IHIZ,ILOZ:IHIZ) from the right.

Size of WORK - LDWORK

In clarfb.f the name of the parameter for the size of WORK is LDWORK.
LDWORK has been changed to LWORK everywhere else. It seems, to follow the internal convention,
the size of WORK should be named LWORK too.

Help text for DORCSD2BY1

DORCSD2BY1:
The help text for DORCSD2BY1 (and similarly for SORCSD2BY1, CUNCSD2BY1, ZUNCSD2BY1) does not tell me exactly how to build the output matrices from LAPACK's output. It gives the following formula:

                            [  I  0  0 ]
                            [  0  C  0 ]
      [ X11 ]   [ U1 |    ] [  0  0  0 ]
  X = [-----] = [---------] [----------] V1**T .
      [ X21 ]   [    | U2 ] [  0  0  0 ]
                            [  0  S  0 ]
                            [  0  0  I ]

and defines that X is M-by-Q, X11 is P-by-Q, and thus X12 is (M-P)-by-Q, and U1, U2, and V1 are P-by-P, (M-P)-by-(M-P), and Q-by-Q, respectively. The help text also defines that the matrices C and S are R-by-R, with R = MIN(P,M-P,Q,M-Q). But the sizes of the other blocks in the formula are not defined.

After quite some experimentation, I think that the top left matrix I should have size T-by-T, with T = max(Q+P-M, 0), and the bottom right matrix I should have size K-by-K, with K = max(Q-P, 0). The sizes of the other blocks can be derived from these two blocks, but finding these two formulas took me quite a long time on my own, and should really be documented directly.

Version number across all files?

What do the version numbers in each file correspond to? Do they have a meaning or are they meant to be updated at each release?

LAPACKE_?larfb_work: bad work matrix layout

LAPACKE_?larfb_work passes work and ldwork to the Fortran routine unmodified even if the matrix layout is row major. This is extremely surprising bordering a bug because it means that the work matrix has to be column major while all the other matrices are treated as row major. Interestingly, LAPACKE_?larfb follows the game and the work matrix it allocates (due to ldwork it uses) is column major.

I might be wrong but the only reasonable solution I see is not to allocate work in LAPACKE_?larfb and allocate it in LAPACKE_?larfb_work instead while ignoring work and ldwork parameters passed to it.

It seems that LAPACKE_?tprfb_work and LAPACKE_?trsna_work suffer from the same problem.

Bug in estimation of test error in [c,d,s,z]gebal with usage SNRM2/SCNRM2

There is the following difference in /LAPACK/[s,d]gebal.f between 3.6.0 and 3.5.0:
In LAPACK-3.5.0, C and R are calculated as follows:

       DO 200 I = K, L
         C = ZERO
         R = ZERO

          DO 150 J = K, L
             IF( J.EQ.I )
     $         GO TO 150
             C = C + ABS( A( J, I ) )
             R = R + ABS( A( I, J ) )
150 CONTINUE
. . . . . .
200 CONTINUE

I.e. | Aii | is not added to C and R.

In LAPACK-3.6.0, C and R are calculated as follows:

DO 200 I = K, L

C = SNRM2( L-K+1, A( K, I ), 1 )
R = SNRM2( L-K+1, A( I, K ), LDA )
. . . .
200 CONTINUE

See /BLAS1/snrm2.f

_ SNRM2 := sqrt( x'_x )
Naturally, there will be another result, because SNRM2 is the euclidean norm and Aii elements are included in calculations.

The same situation is in the case of LAPACK/cgebal.f
In LAPACK-3.5.0, to calculate C and R (estimation of norm) they use a special function
* .. Statement Function definitions ..
CABS1( CDUM ) = ABS( REAL( CDUM ) ) + ABS( AIMAG( CDUM ) )
In LAPACK-3.5.0, C and R are calculated as follows:

      DO 200 I = K, L
         C = ZERO
         R = ZERO

         DO 150 J = K, L
            IF( J.EQ.I )
     $         GO TO 150
            C = C + CABS1( A( J, I ) )
            R = R + CABS1( A( I, J ) )
150    CONTINUE
. . . . .
200 CONTINUE
Also   CABS1( Aii ) are not added to C and R.
In LAPACK-3.6.0, C and R are calculated as follows:
      DO 200 I = K, L

C = SCNRM2( L-K+1, A( K, I ), 1 )
R = SCNRM2( L-K+1, A( I , K ), LDA )
. . . . .
200 CONTINUE

See /BLAS1/scnrm2.f.
* SCNRM2 returns the euclidean norm of a vector via the function
* name, so that

_ SCNRM2 := sqrt( conjg( x' )_x )

SCNRM2 should work correctly, because it is successfully used by a lot ot other routines.
Any case, Aii are used in calculations of C and R.

Why have you changed the working algorithm?
I think, balancing works correctly, because those changes were connected only with estimations of
test error.

The difference in results, for example, in cbal.out
In LAPACK-3.5.0:
.. test output of CGEBAL ..
value of largest test error            =    0.000E+00
example number where info is not zero =    0
example number where ILO or IHI wrong =    0
example number having largest error    =    0
number of examples where info is not 0 =    0
total number of examples tested        =   13

In LAPACK-3.6.0:
.. test output of CGEBAL ..
value of largest test error            =    0.875E+00
example number where info is not zero =    0
example number where ILO or IHI wrong =    0
example number having largest error    =    6
number of examples where info is not 0 =    0
total number of examples tested        =   13

This is definitely because Aii are not calculated in one case (in LAPACK-3.5.0) and are calculated in another (in LAPACK-3.6.0).
I did the following experiment:
I used cgebal.f from LAPACK-3.5.0, but commented the IF condition:
      DO 200 I = K, L
         C = ZERO
         R = ZERO
*
         DO 150 J = K, L
*            IF( J.EQ.I )
*     $         GO TO 150
            C = C + CABS1( A( J, I ) )
            R = R + CABS1( A( I, J ) )
150    CONTINUE
. . . . .
200 CONTINUE

And i've received the same result, as in the case of LAPACK-3.5.0.

.......
.. test output of CGEBAL ..
value of largest test error            =    0.875E+00
example number where info is not zero =    0
example number where ILO or IHI wrong =    0
example number having largest error    =    6
number of examples where info is not 0 =    0
total number of examples tested        =   13

If that is a bug and get fixed in 3.6, should the test also be changed? Or is the "value of largest test error = 0.875E+00" acceptable as passed?

xSYEQUB, xHEEQUB is a Misnomer

The difference between xyyEQU and xyyEQUB is usually that the latter avoids round-off errors by returning scaling matrices who entries are powers of the floating point radix and the xGEEQUB documentation explains the difference very well. The xPOEQU and xPOEQUB obey the same naming rule but the documentation does not highlight the difference so you have to look at the source code to understand the difference. Seeing that xSYEQUB, xHEEQUB do not return radix powers, the routines are misnomers.

I am currently preparing a patch for the xPOEQUB documentation that highlights the difference to xPOEQU.

The Sca/LAPACK Program Style guide forbids breaking backward compatibility (see here) but clearly, it is a misnomer. In face of the broken documentation of xSYEQUB, xHEEQUB, I wonder if anyone is actually using this routine. In addition, these routines are not implemented in Intel MKL so I propose to ignore the backwards compatibility issue and just start rounding the entries of the scaling matrix to radix powers.

Tipos in comments in /SRC/cbbcsd.f

In /SRC/cbbcsd.f
Instead of
276 *> RWORK is REAL array, dimension (MAX(1,LWORK))
277 *> On exit, if INFO = 0, WORK(1) returns the optimal LWORK.
Should be:
276 *> RWORK is REAL array, dimension (MAX(1,LRWORK))
277 *> On exit, if INFO = 0, RWORK(1) returns the optimal LRWORK.
The same for /SRC/zbbcsd.f

CBLAS tests failing with GCC compiler.

These tests are failing on GCC compilers 4.9.1
version 4.9.1 20140922 (Red Hat 4.9.1-10) (GCC)

https://travis-ci.org/Reference-LAPACK/lapack/jobs/147765341

 13 - CBLAS-xscblat1 (Failed)
 14 - CBLAS-xscblat2 (Failed)
 15 - CBLAS-xscblat3 (Failed)
 16 - CBLAS-xdcblat1 (Failed)
 17 - CBLAS-xdcblat2 (Failed)
 18 - CBLAS-xdcblat3 (Failed)
 19 - CBLAS-xccblat1 (Failed)
 20 - CBLAS-xccblat2 (Failed)
 21 - CBLAS-xccblat3 (Failed)
 22 - CBLAS-xzcblat1 (Failed)
 23 - CBLAS-xzcblat2 (Failed)
 24 - CBLAS-xzcblat3 (Failed)

I verified these failures on a RHEL6 machine using These tests are failing on GCC compilers 4.9.1
version 4.9.1 20140922 (Red Hat 4.9.1-10) (GCC).

There is unused var JDUM described in /TESTING/EIG/serrgg.f

$less /TESTING/EIG/serrgg.f
. . . . .
80 INTEGER DUMMYK, DUMMYL, I, IFST, ILO, IHI, ILST, INFO,
81 $ J, M, NCYCLE, NT, SDIM, LWORK, JDUM

Should be:
80 INTEGER DUMMYK, DUMMYL, I, IFST, ILO, IHI, ILST, INFO,
81 $ J, M, NCYCLE, NT, SDIM, LWORK

LAPACKE_?larfb_work: wrong direction

Apart from #37, there is another bug in LAPACKE_?larfb_work. When transposing the input matrices, in the last, fourth, case when storev is 'r', the check for direct should compare it to 'b', not to 'f'.

Routine naming conventions: rook codes

The newly added rook3 codes which is based on BLAS-level 3 are named *_rk while the older rook routines are named *_rook.

As I suggested in #82, I think they should be named more similarly (here is a suggestion which may not at all reflect their intend and actual code):

rename _rook to _rook2, and rename _rk to rook3?
rename _rook to _rookkb2, and rename _rk to rookkb3? (rook-Bunch-Kaufman)
rename _rook to _rkkb2, and rename _rk to rkkb3? (rook-Bunch-Kaufman)
rename _rk to rook3

'exit_level_0' unreferenced label in *syconv

This label exit_level_0: in *syconv is not used, is there anything missing?

?TREXC: documented requirement on LDQ too strict

The documentation for LDQ says that

LDQ >= max(1,N).

irrespective of the value of COMPQ. The code actually takes COMPQ into account:

IF( LDQ.LT.1 .OR. ( WANTQ .AND. LDQ.LT.MAX( 1, N ) ) )

It seems that the documentation should be updated to:

LDQ >= 1, and if COMPQ = 'V', LDQ >= max(1,N).

Inconsistent calls to [cz]gemm from [cz]trevc3.f?

Hi,

I'm trying to apply f2c[*] to the latest LAPACK release, and I get:

ctrevc3.f:
ctrevc3:
Warning on line 596 of ctrevc3.f: inconsistent calling sequences for cgemm,

arg 6: here real variable, previously complex variable.

arg 6 is ALPHA, which is COMPLEX, so I think this is a valid warning, and the actual parameter ONE should be CONE.

Is my analysis correct?

Similar for ztrevc3.f.

[*] Yeah, I know that f2c cannot translate all of the latest LAPACK, but for the platform I'm targeting that's the best I have.

Kees van Reeuwijk

Unused dummy arguments in *LARRV

The VU argument in *LARRV is currently not being used to bound the eigenvalue calculation.

Is this omitted on purpose?

Furthermore, there is no check whether VL < VU as prescribed in the documentation.

Compiler warnings in tests: Rank mismatch

Compiler warnings are generated for:
~/lsrc/lapack-bld> gfortran --version
GNU Fortran (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
Copyright (C) 2013 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

[ 8%] /user/iibi/johnsonhj/lsrc/lapack/CBLAS/testing/c_dblat1.f:214.48:

           CALL STEST1(DNRM2TEST(N,SX,INCX),STEMP,STEMP,SFAC)       
                                            1

Warning: Rank mismatch in argument 'strue1' at (1) (scalar and rank-1)
/user/iibi/johnsonhj/lsrc/lapack/CBLAS/testing/c_dblat1.f:218.48:

           CALL STEST1(DASUMTEST(N,SX,INCX),STEMP,STEMP,SFAC)       
                                            1

Warning: Rank mismatch in argument 'strue1' at (1) (scalar and rank-1)

[ 8%] Building C object CBLAS/testing/CMakeFiles/xscblat2.dir/c_sblas2.c.o
/user/iibi/johnsonhj/lsrc/lapack/CBLAS/testing/c_sblat1.f:214.48:

           CALL STEST1(SNRM2TEST(N,SX,INCX),STEMP,STEMP,SFAC)       
                                            1

Warning: Rank mismatch in argument 'strue1' at (1) (scalar and rank-1)
/user/iibi/johnsonhj/lsrc/lapack/CBLAS/testing/c_sblat1.f:218.48:

           CALL STEST1(SASUMTEST(N,SX,INCX),STEMP,STEMP,SFAC)       
                                            1

Warning: Rank mismatch in argument 'strue1' at (1) (scalar and rank-1)

Many bugs and incorrect documentation in {s,d}lasd{1,2,3}, {s,d}lasd{6,7,8}, and {s,d}laed2

The documentation for the deflation process in {s,d}lasd2 claims that the deflated singular values are sorted in increasing order, but they are in fact more-or-less sorted in decreasing order, with this constraint broken in certain cases when there was a small update vector entry deflation before a deflation of index JPREV due to a small diagonal difference deflation. As can be seen from {s,d}lasd1's call to {s,d}lamrg, the deflated portion of D is claimed to be in decreasing order, which turns out not to be necessarily true for the mentioned reason. I would therefore recommend calling DLASRT rather than DLAMRG to avoid breaking the assumption that the two subsets are already sorted.

Also, there is no d(j)-d(0) <= tol check within the initial deflation loop starting at line 425 of dlasd2. One could easily fix this by checking if d(j) is less than the tolerance rather than after checking if |z(j)| is, but this requires a Givens rotation to mix the first and j'th columns of U (and V), which destroys the fact that the first column of U2 would only have a single nonzero entry (in general, said column would become dense), so the packing logic would need to be changed to incorporate a rank-one update rather than copying a single row of the secular left singular vectors into the updated left singular vectors.

Further, the writes to DSIGMA during the deflation loop in {s,d}lasd2 are superfluous given the subsequent filling of DSIGMA at line 555 of dlasd2.

Further, no absolute value should be required on line 457 of dlasd2 (and the analogue in slasd2) since the diagonal entries are already sorted.

Further, the check on line 568 of dlasd2 of whether abs(DSIGMA(2)) <= tol/2 is strange for two reasons:

DSIGMA(2) >= 0, so the absolute value is superfluous
If DSIGMA(2) < tol, then it must have been deflated due to the small diagonal difference condition on line 457 . But then DSIGMA(2) is essentially the largest singular value due to the fact that the deflated components in DSIGMA are more or less stored in descending order. But this appears to be a contradiction given the definition of the deflation tolerance unless the matrix was of a sufficiently small dimension for the perturbations of a descending order to significantly break this argument (e.g., a pathological 3x3 matrix). Either way, some sort of comment is warranted for this check given the many subtle errors in the current implementation. Fixing the above-mentioned missing deflation criterion for the d(j) <= tol might obsolete this check.

Also, the documentation for U and V in {s,d}lasd3 incorrectly suggests that only the last N-K vectors are relevant, when, on exit, they provide the singular vectors for the merged bidiagonal problem.

Excessive size of man pages on 3.6.1

Dear Development Team ,
is it possible to have a manpage file that includes
only the manual and not also all the Doxygen
extra stuff as on previous release ?

$ ls -sh manpages-3.6.0.tgz
1.4M manpages-3.6.0.tgz

$ ls -sh manpages-3.6.1.tgz
9.6M manpages-3.6.1.tgz

All the additional Doxygen stuff does not belong to a normal man page:

...
Definition at line 152 of file zupmtr&.f&.
.PP
.nf
152 *
153 * -- LAPACK computational routine (version 3&.4&.0) --
154 * -- LAPACK is a software package provided by Univ&. of Tennessee, --
155 * -- Univ&. of California Berkeley, Univ&. of Colorado Denver and NAG Ltd&.&.--
156 * November 2011

[cut]

Bug in the comments to CTREVC3 / ZTREVC3

In /SRC/ctrevc3.f
instead of
21 * SUBROUTINE CTREVC3( SIDE, HOWMNY, SELECT, N, T, LDT, VL, LDVL,
22 * VR, LDVR, MM, M, WORK, LWORK, RWORK, INFO )
should be
21 * SUBROUTINE CTREVC3( SIDE, HOWMNY, SELECT, N, T, LDT, VL, LDVL,
22 * VR, LDVR, MM, M, WORK, LWORK, RWORK, LRWORK, INFO )
The same in the case of /SRC/ztrevc3.f

ZTREVC3

The double real constant ONE is passed into the routine ZGEMM at line 596 of ZTREVC3.

ZGEMM expects a double complex argument. This should be changed to CONE.

Inconsistent documentation of length of work array of sgesvj

The documentation of 'work' in sgesvj states that it should be of size max(4,M+N), whereas de documentation of 'lwork' in sgesvj states that it should be 'length of WORK, WORK >= MAX(6,M+N)'.

First of all, this is bad phrasing: WORK >= MAX(6,M+N) is an incorrect expression; clearly len(WORK) >= MAX(6, M+N) is intended. Second, this contradicts the documentation of 'work'. max(6, M+N) seems to be correct, because the documentation of 'work' mentions that the first six values of 'work' have meaning on exit.

Suggestion of changing at 'make.inc.example'.

Just a suggestion of changing at 'make.inc.example' file, to replace the line:

MAKEDEPRECATED = Yes

BUILD_DEPRECATED = Yes

With 'MAKEDEPRECATED = Yes' the deprecated routines are not build.

DSTEMR

DSTEMR may return INFO>0. There was a bug in LAPACK 3.5.0 that was troublesome: a matrix of dimension 100 (a sort of perturbation of the identity matrix, provided by the developers of the R package), caused a disastrous infinite loop in one of the subroutines in the calling tree of DSTEMR (symmetric tridiagonal eigensolver based on the MRRR algorighm). That was because of NaNs that the safeguard mechanisms implemented therein failed to deal with. As a fix, the value of a logical flag that was related to the infinite loop was changed, starting with LAPACK 3.6.0. With that change the risk of an infinite loop has disappeared. However, we found out that that change to the logical variable could lead to unexpected consequences: DSTEMR may return INFO>0 (indicating an internal error in the auxiliary subroutine DLARRV) for some matrices that are not particularly “pathologic.” We are in the process of redesigning a few of the underpinnings of DSTEMR (including DLARRV) to make it more robust.

101 unused symbols

Hi,

Since I see that you're interested in unused symbols in the sources, here is a list of exactly 101 of them. I think some of them have been covered by other issues, but I don't think all of them have.

unused.txt

Kees van Reeuwijk

Travis CI on MacOS: binary `jq` missing

Checking the why the Travis build #99.2 of a patch failed only on MacOS, I noticed that the binary jq is missing and consequently, the environment variable BRANCH is always empty:

$ export BRANCH=$(if [ "$TRAVIS_PULL_REQUEST" == "false" ]; then echo $TRAVIS_BRANCH; else echo `curl -s $PR | jq -r .head.ref`; fi)

/Users/travis/build.sh: line 45: jq: command not found

(23) Failed writing body

LAPACKE: bad transpose in ?laswp_work

For row-major matrices the function uses lda_t = max(1,lda) which is wrong, it should use the number of rows. The number of rows is however not readily available, only the lower bound can be extracted from ipiv. The lower bound should be enough for the purpose of the function (to do the matrix copy), and it seems to me that it is actually the only way to implement this function at all.

As a side note, an issue with the number of rows that is not directly available in lapacke_?laswp has arisen before. In 834c77d it was handled by omitting the NaN check.

[DS]TREXC: when can a 2x2 block split into two 1x1 blocks?

A large part of [DS]TREXC code deals with the case when a 2x2 block splits into two 1x1 blocks but it is not clear to me when that can happen.

On entry, T is assumed to be block upper triangular with 2x2 diagonal blocks in Schur canonical form. That is, any 2x2 block corresponds to a pair of complex conjugate eigenvalues. When can such a block split into two 1x1 blocks, that is, into a 2x2 block that has two real eigenvalues? One possible explanation is that the code is prepared to handle the case when a 2x2 block is not in Schur canonical form and can have two real eigenvalues which is revealed after the block passes through Dlaexc. Is that so or am I missing something?

Issues with WORK1 in ?GEQR and ?GELQ

I'm testing new QR routines and it seems that operations with WORK1 array won't be obvious for users. I see two possible issues here:

There is a convention in LAPACK that optimal size of workspace is returned in the first element of workspace but in the routines ?GEQR and ?GELQ it is returned in the second element for WORK1 and it can be misleading for users. It would be better to return optimal size of WORK1 in the first element, minimal size in the second element, the type of algorithm in the third etc (but only the optimal size may be returned - see the second issue).
Usually user passes one-element work array for workspace query. But in ?GEQR and ?GELQ its size should be at least 5 elements and it also may cause issues. So I think the better option here is to fill only the first element in workspace query - optimum size of WORK1 and fill other elements (row block size, column block size, algorithm type, ...) only inside computation call.

?TREXC: unavoidable error return when N=0

The documentation and the code indicate that N=0 is allowed. However, in that case the error checking for IFST (and ILST) leads to an unavoidable failure:

      ELSE IF( IFST.LT.1 .OR. IFST.GT.N ) THEN
         INFO = -7
      ELSE IF( ILST.LT.1 .OR. ILST.GT.N ) THEN
         INFO = -8
      END IF

Add tags for releases

I think it would be nice to have the lapack repo track the releases of lapack, consistently for future easy backtracks.

I have tracked the 3.6.0 and 3.6.1 versions down to:
v3.6.0: 5944165
v3.6.1: 5efe359

To confirm I have downloaded lapack-3.6.0.tgz and lapack-3.6.1.tgz and performed:

diff -r git-lapack/ lapack-3.6.X

and there where no differences.

I had difficulties from 3.5.0 due to inconsistent history. For instance I get the minimal diff:
using:
5eb7207
but there is one patch missing (introduced in: 1039d78)
But there are commits in between, so an exact hash is not available.

I haven't tried any versions prior to 3.5.0.

PS. I have added (annotated) tags on my fork:
https://github.com/zerothi/lapack/releases
but one cannot create PR for equivalent branches (assigning tags does not change history).

I hope you will consider adding the versions.

Error in SGESVDX in Lapack3.6.1 (also in 3.6.0)

There looks like a bug in SGESVDX in Lapack3.6.1 (also in 3.6.0):

At the beginning, SGESVDX check the NORM of A and then calls SLASCL to modify the values of the elements if they are extreme (line 506-514). At the end calls SLASCL to adjust back the values of elements of S if A has been modified (line 817-814). The problem is that, in many cases, S is not changed even A is modified. That means that the modification in A does not get transfered into S. There should be no adjustment need for S. And this unnecessary adjustment causes overflow or underflow in many cases. An exampleis:

JOBU='V', JOBVT='V', RANGE='V', N=1, and A and S have some big numbers. A gets modifiled but S does not change (look at the call to SBDSVDX). Calling to SLASCL at the end causes overflow.

Workaround: check if S is changed before calling to SLASCL at the end. This may not be real fix.

Incorrect respond in the case when an input value of the 15th parameter is incorrect

$ less sgejsv.
.

  SUBROUTINE SGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,

                     M, N, A, LDA, SVA, U, LDU, V, LDV,

                     WORK, LWORK, IWORK, INFO )

LDV is the 15th argument.
If the input value of LDV is incorrect, there should be a message that the
15th argument is incorrect, but there is a message about the 14th parameter,
because
564 ELSE IF ( RSVEC .AND. ( LDV .LT. N ) ) THEN
565 INFO = - 14
Should be:
564 ELSE IF ( RSVEC .AND. ( LDV .LT. N ) ) THEN
565 INFO = - 15
The same for sgejsv.f and dgejsv.f - incorrect, but correct for cgejsv.f and zgejsv.f.

possible issue dggsvd3 function

In a very simplified way, I created the following function in C that uses the Generalized SVD function dggsvd3.
[U,V,Q,C,S] = my_gsvd_in_C(A,B), where the inputs are:

A =    [[ 1.,  1.,  2.],
       [ 0.,  1.,  3.],
       [ 0.,  4.,  3.]]
B =   [[ 0.2,  1.4,  2. ],
       [ 0.4,  1. ,  3.2],
       [ 0.3,  4.5,  3.3]]

The outputs that I get are the following:

U =    [[-0.45904757,  0.22321234, -0.8599137 ],
       [-0.88154988, -0.23451243,  0.40972397],
       [-0.11020501,  0.94613961,  0.30442518]]

V =   [[-0.38233328,  0.1538968 ,  0.91111856],
       [-0.92047714, -0.149751  , -0.36096602],
       [-0.0808894 ,  0.97667313, -0.19891328]]

C = [ 0.69954115,  0.64582169,  0.99967773]

S = [ 0.71459232,  0.76348828,  0.02538585]

Q =  [[-0.09327252, -0.55839886,  0.82431241],
       [ 0.70757717,  0.54528323,  0.44944494],
       [ 0.70045328, -0.6251855 , -0.34425035]]

My problem is with the output of Q.
This output is completely different from the one from MATLAB.
Calling the function gsvd from MATLAB, with the same inputs, I have the following value for Q

Q =
    0.3456   -0.6562    0.8602
    5.8426   -2.5466   -0.7678
    3.9969   -5.5656   -0.4228

I don't know what is going on. I can send my C code if needed.
Sorry for any duplicate message, I posted the same issue in the LAPACK forum.

Thanks in advance for any answer

A bug in SGESVDX

There looks like a bug in SGESVDX in Lapack3.6.1 (also in 3.6.0):

JOBU='V', JOBVT='V', RANGE='V', N=1, and A and S have some big numbers. A gets modifiled but S does not change (look at the call to SBDSVDX). Calling to SLASCL at the end causes overflow.

Workaround: check if S is changed before calling to SLASCL at the end. This may not be real fix.

Thanks,
Shandong Lao

ZGELSD

ZGELSD returns INFO>0 for a matrix of dimension 23068-by-23066, resulting from DLASD4 (root finder) exceeding its maximum number of iterations. The problem is not in DLASD4 per se, but in the merging/sorting/deflation of singular values of (two) submatrices. The subroutine that does that, DLASD7, is failing to do the sorting that is later required by DLASD4. The issue in this case is that the 23068-by-23066 matrix has a very large number (1000’s) of singular values extremely close to each other and the merging/sorting/deflation step is not working as intended. The problem is under investigation; it might be related to the issue #34 created by Jack Poulson on Aug 7, 2016.

Obs. ZGELSD computes the minimum-norm solution to a real linear least squares problem
minimize 2-norm(| b - A*x |)
using the singular value decomposition (SVD), which is computed by a divide-and-conquer algorithm.

Bug in stgsna.f / dtgsna.f, which moves to incorrect SVD calculations.

$ cat t.f
      PROGRAM BUG_STGSNA
      IMPLICIT NONE
      INTEGER  INFO, LWORK, M, MM, IWORK(20), i, j
      REAL     A(2,2), B(2,2), RWORK( 40 )
      REAL     VL(2,4), VR(2,4), S(2), DIF(2)
      COMPLEX  C( 2, 2 ), U( 2, 2 ), VT( 2, 2 )
      COMPLEX  WORK( 20 )
      LOGICAL  SELECT(2)
      C(1,1) = (1.0,1.0)
      C(2,1) = 1.0 
      C(1,2) = (-1.0,1.0)  
      C(2,2) = - 1.0  
      LWORK  = 40
      CALL CGESVD('A','A',2,2,C,2,S,U,2,VT,2,WORK,LWORK,RWORK,INFO)  
      SELECT(1) = .TRUE.
      SELECT(2) = .TRUE. 
      A(1,1) = 1.0
      A(1,2) = 1.0
      A(2,1) = -A(1,2)
      A(2,2) = A(1,1)
      B(1,1) = 1.0
      B(1,2) = 0.0
      B(2,1) = 0.0
      B(2,2) = 1.0
      PRINT 10
      PRINT 11,((A(i,j),j=1,2),i=1,2)
      PRINT 12
      PRINT 11,((B(i,j),j=1,2),i=1,2)
      PRINT 13, S(2)
      MM = 2
      CALL STGEVC('B','S',SELECT,2,A,2,B,2,VL,2,VR,2,
     &      MM,M,RWORK,IWORK,INFO) 
      CALL STGSNA('B','S',SELECT,2,A,2,B,2,VL,2,
     &VR,2,S,DIF,MM,M,RWORK,LWORK,IWORK,INFO)
      PRINT 14, DIF(2)
  10  FORMAT(1X,'INPUT matrix A')
  11  FORMAT(1X,2F5.0)
  12  FORMAT(1X,'INPUT matrix B')
  13  FORMAT(1X,'Correct DIF(after CGESVD) ',F5.2)
  14  FORMAT(1X,'DIF after STGSNA          ',F5.2)
      END

The result is:

 INPUT matrix A
    1.   1.
   -1.   1.
 INPUT matrix B
    1.   0.
    0.   1.
 Correct DIF(after CGESVD)  0.87
 DIF after STGSNA           0.62

Instead of the correct result:

 INPUT matrix A
    1.   1.
   -1.   1.
 INPUT matrix B
    1.   0.
    0.   1.
 Correct DIF(after CGESVD)  0.87
 DIF after STGSNA           0.87

Solution: correct in stgsna.f and dtgsna.f: (lines 637-639).
Instead of

637       ROOT1 = C1 + SQRT( C1*C1-4.0*C2 )
638       ROOT2 = C2 / ROOT1
639       ROOT1 = ROOT1 / TWO

Should be:

637       ROOT1 = C1 + SQRT( C1*C1-FOUR*C2 )
638       ROOT1 = ROOT1 / TWO
639       ROOT2 = C2 / ROOT1

Git repository not listed on netlib web pages

The online web pages located at:

http://www.netlib.org/lapack/#_browse_download_lapack_routines_with_on_line_documentation_browser

http://www.netlib.org/blas/#_svn_access

Are linking to the SVN repository.

I did a google search with the following query
site:http://www.netlib.org/ "icl.cs.utk.edu/svn/lapack-dev/lapack/trunk"

Hans

line duplication in latest 3.7.0 causes failure of shared lib creation

hello,

while creating the shared version of latest lapack (3.7.0), I get:

liblapack.a(ztplqt.o): In function ztplqt_': ztplqt.f:(.text+0x0): multiple definition of ztplqt_'
liblapack.a(ztplqt.o):ztplqt.f:(.text+0x0): first defined here
liblapack.a(ztplqt2.o): In function ztplqt2_': ztplqt2.f:(.text+0x0): multiple definition of ztplqt2_'
liblapack.a(ztplqt2.o):ztplqt2.f:(.text+0x0): first defined here
liblapack.a(ztpmlqt.o): In function ztpmlqt_': ztpmlqt.f:(.text+0x0): multiple definition of ztpmlqt_'
liblapack.a(ztpmlqt.o):ztpmlqt.f:(.text+0x0): first defined here

Actually, I can find a duplicate line in both SRC/Makefile and
SRC/CMakeLists.txt:

SRC/Makefile:
...
ztplqt.o ztplqt2.o ztpmlqt.o \ <===== DUP
zgelqt.o zgelqt3.o zgemlqt.o
zgetsls.o zgeqr.o zlatsqr.o zlamtsqr.o zgemqr.o
zgelq.o zlaswlq.o zlamswlq.o zgemlq.o
ztplqt.o ztplqt2.o ztpmlqt.o \ <===== DUP
...

SRC/CMakeLists.txt:
...
ztplqt.f ztplqt2.f ztpmlqt.f <===== DUP
zgelqt.f zgelqt3.f zgemlqt.f
zgetsls.f zgeqr.f zlatsqr.f zlamtsqr.f zgemqr.f
zgelq.f zlaswlq.f zlamswlq.f zgemlq.f
ztplqt.f ztplqt2.f ztpmlqt.f <===== DUP
...

Removing the line duplication fixes everything for me.

ciao
gabriele

LAPACKE_WITH_TMG is not tested on TRAVIS dashboard

The LAPACKE_WITH_TMG option is not tested on the TRAVIS dashboard.

To minimize surprises after code changes, this should be added to patch tests.

This work can not be completed until pull request #16 is completed.

ppc64le - Segfault during test: `./xeigtstz < nep.in > znep.out 2>&1`

Hi, I built lapack with xlf on a ppc64le (POWER8) machine. I used the options in INSTALL\make.inc.XLF, although for my BLASLIB, I switched to ../../librefblas.a.

During make, I encountered the following error:

** zunt01   === End of Compilation 1 ===
"zunt01.f", 1500-036 (I) The NOSTRICT option (default at OPT(3)) has the potential to alter the semantics of a program.  Please refer to documentation on the STRICT/NOSTRICT option for more information.
1501-510  Compilation successful for file zunt01.f.
xlf -O3 -qfixed -qnosave -c zunt03.f -o zunt03.o
** zunt03   === End of Compilation 1 ===
"zunt03.f", 1500-036 (I) The NOSTRICT option (default at OPT(3)) has the potential to alter the semantics of a program.  Please refer to documentation on the STRICT/NOSTRICT option for more information.
1501-510  Compilation successful for file zunt03.f.
\
          xlf -qnosave -o xeigtstz \
          zchkee.o zbdt01.o zbdt02.o zbdt03.o zbdt05.o zchkbb.o zchkbd.o zchkbk.o zchkbl.o zchkec.o zchkgg.o zchkgk.o zchkgl.o zchkhb.o zchkhs.o zchkst.o zckcsd.o zckglm.o zckgqr.o zckgsv.o zcklse.o zcsdts.o zdrges.o zdrgev.o zdrges3.o zdrgev3.o zdrgsx.o zdrgvx.o zdrvbd.o zdrves.o zdrvev.o zdrvsg.o zdrvst.o zdrvsx.o zdrvvx.o zerrbd.o zerrec.o zerred.o zerrgg.o zerrhs.o zerrst.o zget02.o zget10.o zget22.o zget23.o zget24.o zget35.o zget36.o zget37.o zget38.o zget51.o zget52.o zget54.o zglmts.o zgqrts.o zgrqts.o zgsvts3.o zhbt21.o zhet21.o zhet22.o zhpt21.o zhst01.o zlarfy.o zlarhs.o zlatm4.o zlctes.o zlctsx.o zlsets.o zsbmv.o zsgt01.o zslect.o zstt21.o zstt22.o zunt01.o zunt03.o dlafts.o dlahd2.o dlasum.o dlatb9.o dstech.o dstect.o dsvdch.o dsvdct.o dsxt1.o alahdg.o alasum.o alasvm.o alareq.o ilaenv.o xerbla.o xlaenv.o chkxer.o ../../libtmglib.a \
          ../../liblapack.a ../../librefblas.a && mv xeigtstz ../xeigtstz
make[2]: Leaving directory `/home/u0017592/projects/lapack/TESTING/EIG'
NEP: Testing Nonsymmetric Eigenvalue Problem routines
./xeigtstz < nep.in > znep.out 2>&1
/bin/sh: line 1: 29872 Segmentation fault      ./xeigtstz < nep.in > znep.out 2>&1
make[1]: *** [znep.out] Error 139
make[1]: Leaving directory `/home/u0017592/projects/lapack/TESTING'
make: *** [lapack_testing] Error 2

Any idea what the problem might be? Is reference lapack tested on ppc64le architecture?

LAPACKE_?trexc_work: ldq must be always >= n

In row-major case, LAPACKE_?trexc_work requires ldq >= n even when compq is not 'v', that is, when the matrix Q is not referenced. The ldq parameter is always used conditionally only if compq is 'v', so the ldq >= n check should be also done only in that case.

In ?TREXC the corresponding check is:

IF( LDQ.LT.1 .OR. ( WANTQ .AND. LDQ.LT.MAX( 1, N ) ) ) THEN
   INFO = -6