

Home Page: https://dcache.org


dcache's Introduction

dCache

dCache is a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods. Depending on the Persistency Model, dCache provides methods for exchanging data with backend (tertiary) storage systems as well as space management, pool attraction, dataset replication, hot-spot determination and recovery from disk or node failures. Connected to a tertiary storage system, the cache simulates unlimited direct-access storage space. Data exchanges to and from the underlying HSM are performed automatically and invisibly to the user. Besides HEP-specific protocols, data in dCache can be accessed via NFSv4.1 (pNFS), FTP and WebDAV.
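
Since these are standard protocols, ordinary clients work against a dCache door. A minimal sketch, assuming a hypothetical host and path and dCache's default WebDAV port 2880 (adjust to your installation and credentials):

    # Read a file through the WebDAV door with curl (host, path and proxy location are placeholders).
    curl --capath /etc/grid-security/certificates \
         --cert /tmp/x509up_u$(id -u) --key /tmp/x509up_u$(id -u) \
         -o file.dat \
         "https://dcache.example.org:2880/data/experiment/file.dat"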


Documentation

The dCache book

User Guide

Getting Started

The file BUILDING.md describes how to compile dCache code and build various packages.

The file also describes how to create the system-test deployment, which provides a quick and easy way to get a working dCache. Running system-test requires no special privileges and all the generated files reside within the code-base.
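
As a rough sketch, the build itself is a standard Maven build; the exact modules and profiles (including those for system-test) are documented in BUILDING.md, so treat the commands below as illustrative only:

    # Illustrative only - see BUILDING.md for the supported invocation.
    git clone https://github.com/dcache/dcache.git
    cd dcache
    mvn clean package -DskipTests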

There are also packages of stable releases at https://www.dcache.org/downloads/.

License

The project is licensed under AGPL v3. Some parts are licensed under BSD and LGPL; see the source code for details.

For more info, check the official dCache.ORG web page.

Contributors

dCache is a joint effort between Deutsches Elektronen-Synchrotron DESY, Fermi National Accelerator Laboratory and Nordic DataGrid Facility.

How to contribute

dCache follows the Linux kernel model, in which Git is not only the source repository but also the means of tracking contributions and copyrights.

Each submitted patch must have a "Signed-off-by" line. Patches without this line will not be accepted.

The sign-off is a simple line at the end of the explanation for the patch, which certifies that you wrote it or otherwise have the right to pass it on as an open-source patch. The rules are pretty simple: if you can certify the following:


    Developer's Certificate of Origin 1.1

    By making a contribution to this project, I certify that:

    (a) The contribution was created in whole or in part by me and I
         have the right to submit it under the open source license
         indicated in the file; or

    (b) The contribution is based upon previous work that, to the best
        of my knowledge, is covered under an appropriate open source
        license and I have the right under that license to submit that
        work with modifications, whether created in whole or in part
        by me, under the same open source license (unless I am
        permitted to submit under a different license), as indicated
        in the file; or

    (c) The contribution was provided directly to me by some other
        person who certified (a), (b) or (c) and I have not modified
        it.

    (d) I understand and agree that this project and the contribution
        are public and that a record of the contribution (including all
        personal information I submit with it, including my sign-off) is
        maintained indefinitely and may be redistributed consistent with
        this project or the open source license(s) involved.

then you just add a line (which git commit -s does for you) saying:

Signed-off-by: Random J Developer <random@developer.example.org>

using your real name (sorry, no pseudonyms or anonymous contributions).
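
For example, git can add the trailer automatically from your configured user.name and user.email. A small sketch (the commit message itself is hypothetical):

    # Commit with an automatic Signed-off-by trailer:
    git commit -s -m "pool: fix repository I/O test"

    # Verify the trailer was added:
    git log -1 --format=%B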

We use an adapted version of the Google style guide for Java, available in the root of this project as an IntelliJ configuration. The reformatting involves optimizing imports (reordering) and applying all syntactic settings, but it does not include rearranging code (fields, methods, classes) or cleaning up existing code. Apply the reformatting to the changed code before submitting a patch.

dcache's People

Contributors

ak100, alrossi, antjep, anupash, calestyo, dcache-ci, dependabot[bot], dmitrylitvintsev, gbehrmann, hwittlich, ineele, ivanugrin, jobegrabber, jstarek, kofemann, krishna-veni, kschwank, lemora, lukasmansour, marcelmu, marisanest, martbhell, mksahakyan, onnozweers, paulmillar, stefaus, svemeyer, vgaronne, vingar, vladimirvil


dcache's Issues

tab expansion: the "Display all possibilities" offer is missing a newline

Hey.

When expanding a set that has a lot of members (e.g. hitting tab after the lcg-lrz-dc below, which would match all our pools), dCache prints the
"Display all 341 possibilities? (y or n)" offer.

However, there's a newline missing, so the input line is cluttered up:
dCache admin > \c lcg-lrz-dcDisplay all 341 possibilities? (y or n)

instead it should be:
dCache admin > \c lcg-lrz-dc
Display all 341 possibilities? (y or n)
and editing should continue in the actual line (not below), so the cursor has to be moved back afterwards.

Cheers,
Chris.

context/transfers.txt gone

It seems the context/transfers.txt file from the web admin interface is gone somewhere between 2.11 and 2.13...

dCache xrootd door sha2 compliance

Hi,

I use this command to test the issue:
xrdcp -d 1 -f root://uct2-s5.mwt2.org:1096/pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/mc11_7TeV/f1/3d/NTUP_HSG2.01388734._000063.root.1 /dev/null

When using a SHA-1 compliant x509 proxy everything works:
sec_PM: Loading gsi protocol object from libXrdSecgsi.so
140320 15:38:50 9347 secgsi_InitOpts: *** ------------------------------------------------------------ ***
140320 15:38:50 9347 secgsi_InitOpts: Mode: client
140320 15:38:50 9347 secgsi_InitOpts: Debug: 1
140320 15:38:50 9347 secgsi_InitOpts: CA dir: /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/etc/grid-security-emi/certificates
140320 15:38:50 9347 secgsi_InitOpts: CA verification level: 1
140320 15:38:50 9347 secgsi_InitOpts: CRL dir: /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/etc/grid-security-emi/certificates
140320 15:38:50 9347 secgsi_InitOpts: CRL extension: .r0
140320 15:38:50 9347 secgsi_InitOpts: CRL check level: 1
140320 15:38:50 9347 secgsi_InitOpts: CRL refresh time: 86400
140320 15:38:50 9347 secgsi_InitOpts: Certificate: /home/ivukotic/.globus/usercert.pem
140320 15:38:50 9347 secgsi_InitOpts: Key: /home/ivukotic/.globus/userkey.pem
140320 15:38:50 9347 secgsi_InitOpts: Proxy file: /home/ivukotic/x509_Proxy_old_cert
140320 15:38:50 9347 secgsi_InitOpts: Proxy validity: 12:00
140320 15:38:50 9347 secgsi_InitOpts: Proxy dep length: 0
140320 15:38:50 9347 secgsi_InitOpts: Proxy bits: 512
140320 15:38:50 9347 secgsi_InitOpts: Proxy sign option: 1
140320 15:38:50 9347 secgsi_InitOpts: Proxy delegation option: 0
140320 15:38:50 9347 secgsi_InitOpts: Allowed server names: uct2-s5.uchicago.edu|uct2-s6.uchicago.edu|uct2-s20.uchicago.edu
140320 15:38:50 9347 secgsi_InitOpts: Crypto modules: ssl
140320 15:38:50 9347 secgsi_InitOpts: Ciphers: aes-128-cbc:bf-cbc:des-ede3-cbc
140320 15:38:50 9347 secgsi_InitOpts: MDigests: sha1:md5
140320 15:38:50 9347 secgsi_InitOpts: *** ------------------------------------------------------------ ***
sec_PM: Using gsi protocol, args='v:10200,c:ssl,ca:c7a717ce'
140320 15:38:50 9347 cryptossl_X509::IsCA: certificate has 6 extensions
140320 15:38:50 9347 secgsi_VerifyCA: Warning: CA certificate not self-signed and integrity not checked: assuming OK (c7a717ce.0)
140320 15:38:50 9347 cryptossl_X509::IsCA: certificate has 4 extensions
140320 15:38:50 9347 cryptossl_X509::IsCA: certificate has 9 extensions
140320 15:38:50 9347 cryptossl_X509::IsCA: certificate has 9 extensions
140320 15:38:51 9347 Xrd: Open: Access to server granted.

With the SHA-2 compliant x509 proxy it does not work:
sec_PM: Loading gsi protocol object from libXrdSecgsi.so
140320 15:36:20 7157 secgsi_InitOpts: *** ------------------------------------------------------------ ***
140320 15:36:20 7157 secgsi_InitOpts: Mode: client
140320 15:36:20 7157 secgsi_InitOpts: Debug: 1
140320 15:36:20 7157 secgsi_InitOpts: CA dir: /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/etc/grid-security-emi/certificates
140320 15:36:20 7157 secgsi_InitOpts: CA verification level: 1
140320 15:36:20 7157 secgsi_InitOpts: CRL dir: /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/etc/grid-security-emi/certificates
140320 15:36:20 7157 secgsi_InitOpts: CRL extension: .r0
140320 15:36:20 7157 secgsi_InitOpts: CRL check level: 1
140320 15:36:20 7157 secgsi_InitOpts: CRL refresh time: 86400
140320 15:36:20 7157 secgsi_InitOpts: Certificate: /home/ivukotic/.globus/usercert.pem
140320 15:36:20 7157 secgsi_InitOpts: Key: /home/ivukotic/.globus/userkey.pem
140320 15:36:20 7157 secgsi_InitOpts: Proxy file: /home/ivukotic/x509_Proxy_new_cert
140320 15:36:20 7157 secgsi_InitOpts: Proxy validity: 12:00
140320 15:36:20 7157 secgsi_InitOpts: Proxy dep length: 0
140320 15:36:20 7157 secgsi_InitOpts: Proxy bits: 512
140320 15:36:20 7157 secgsi_InitOpts: Proxy sign option: 1
140320 15:36:20 7157 secgsi_InitOpts: Proxy delegation option: 0
140320 15:36:20 7157 secgsi_InitOpts: Allowed server names: uct2-s5.uchicago.edu|uct2-s6.uchicago.edu|uct2-s20.uchicago.edu
140320 15:36:20 7157 secgsi_InitOpts: Crypto modules: ssl
140320 15:36:20 7157 secgsi_InitOpts: Ciphers: aes-128-cbc:bf-cbc:des-ede3-cbc
140320 15:36:20 7157 secgsi_InitOpts: MDigests: sha1:md5
140320 15:36:20 7157 secgsi_InitOpts: *** ------------------------------------------------------------ ***
sec_PM: Using gsi protocol, args='v:10200,c:ssl,ca:c7a717ce'
140320 15:36:20 7157 cryptossl_X509::IsCA: certificate has 6 extensions
140320 15:36:20 7157 secgsi_VerifyCA: Warning: CA certificate not self-signed and integrity not checked: assuming OK (c7a717ce.0)
140320 15:36:20 7157 cryptossl_X509::IsCA: certificate has 4 extensions
140320 15:36:20 7157 cryptossl_X509::IsCA: certificate has 9 extensions
140320 15:36:20 7157 cryptossl_X509::IsCA: certificate has 9 extensions
140320 15:36:20 7157 Xrd: CheckErrorStatus: Server [uct2-s5.mwt2.org:1096] declared: Internal server error (null)(error code: 3012)
140320 15:36:20 7157 Xrd: DoAuthentication: Internal server error (null)
sec_Client: protocol request for host uct2-s5.mwt2.org token=''
XrdSec: No authentication protocols are available.
140320 15:36:20 7157 Xrd: Open: Authentication failure: Internal server error (null): unable to get protocol object
xrdcp: Copy from uct2-s5.mwt2.org failed on open!
xrdcp: Internal server error (null): unable to get protocol object.

In the log file I see:
20 Mar 2014 15:35:56 (Xrootd-uct2-s5) [] xrootd server error while processing org.dcache.xrootd.protocol.messages.AuthenticationRequest@78167a1d (please report this to support@dcache.org)
java.lang.NullPointerException: null

srm logs: Illegal state transition from Canceled to Failed

Our SRM logs periodically shows entries like these:

27 May 2015 13:27:03 (SRM) [YFY:126:srm2:ls:-1686872496:-1686872421] Illegal State Transition : Illegal state transition from Canceled to Failed

Such errors indicate bugs and should not happen.

systemd-ify dCache

Hey.

I've brought this up several times before at meetings, and I think it's good to have it in the records and a "bug" where ideas and stuff like that can be collected.

As the title says, there should be support for systemd in dCache (i.e. with respect to service start/stop/etc.).
But such support shouldn't be just a "dumb" conversion of a sysvinit init script to systemd - as you perhaps know, systemd provides many features that dCache in particular could benefit from.

  1. generator units
    The idea would be to have a unit generator which dynamically creates units based on the respective node's dcache.conf and layout. So one would have, for example, a dcache.service which is more or less the same as what our current init script does - start it and all domains get started, stop it and all domains get stopped, and so on.

But in addition there would be the dynamically generated per-domain units - one dcache@<domainName>.service instance per configured domain - which of course affect just the respective domain (see the unit-file sketch after this list).

As you can see this would at least in parts replace the "dcache" script.
It could also speed things up, like parallel start/stop of the domains.

It could also be explored whether this could be used to make sensible dependencies or orderings.

  2. watchdog functionality
    handle restarting of crashed domains at the systemd level, and not the dCache level

  3. maybe even socket-based activation

  4. usage of jailing functionality
    e.g. restricting domains to only /var and other locations they actually need
    or setting resource usage limits
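
A minimal sketch of what one such generated per-domain unit could look like (hypothetical; it assumes the existing "dcache start <domain>" / "dcache stop <domain>" commands and glosses over generator details):

    # dcache@.service - hypothetical template unit, instantiated once per configured domain
    [Unit]
    Description=dCache domain %i
    PartOf=dcache.service
    After=network.target

    [Service]
    Type=forking
    ExecStart=/usr/bin/dcache start %i
    ExecStop=/usr/bin/dcache stop %i
    # Ties in with the watchdog idea above: let systemd restart a crashed domain.
    Restart=on-failure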

Cheers,
Chris.

Messages dropped, despite queue containing sufficient capacity

There's a problem that results in log messages like the following:

24 May 2015 12:35:40 (DCap-gsi-prometheus) [] Dropping message: Task Delivery-of-MessageEvent(source=<CM: S=[>DCap-gsi-prometheus@dCacheDomain];D=[>SRM-prometheus@dCacheDomain];C=dmg.cells.services.login.LoginBrokerInfo;O=<1432463740975:343310>;LO=<1432463740975:343309>>) rejected from java.util.concurrent.ThreadPoolExecutor@74c6add[Running, pool size = 100, active threads = 0, queued tasks = 1, completed tasks = 51660]

This shows a message is dropped despite the message queue having sufficient capacity to hold it.

This exact message is triggered by commit c8fa056, but the effect is likely present earlier.

The effect is intermittent, with some messages being delivered without any problem.

Since commit c8fa056, the dropped messages include those that are the replacement for loginbroker: the LoginBrokerInfo topic. Currently the srm and info services are the subscribers of the LoginBrokerInfo topic.

The failure to deliver messages means that doors disappear from srm and info and reappear once a message is successfully delivered. There are two knock-on effects:

  • doors will disappear and reappear from info-provider,
  • transfers initiated using SRM will fail if the door(s) to support the transfer appear absent to the srm door.

status of #8032 ?

Hi,

I wanted to check what the final solution of issue #8032 (which was discussed over mail) was, but I can't find which bug-reporting system you are using.
Anyhow, that issue concerned interference between the gridFTP door and the xrootd4j plugin, which was preventing us from running the two doors on the same machine.

Cheers,
Ilija

xrootd CDC/session missing for some operations in door

With prometheus (current tip of master) I see the following in the access log file:

level=INFO ts=2015-06-25T02:37:05.216+0200 event=org.dcache.xrootd.connection.start session=door:xrootd:AAUZTNKcCyA [...]
level=INFO ts=2015-06-25T02:37:07.896+0200 event=org.dcache.xrootd.request request=protocol response=ok
level=INFO ts=2015-06-25T02:37:07.983+0200 event=org.dcache.xrootd.login session=door:xrootd:AAUZTNKcCyA [...]
level=INFO ts=2015-06-25T02:37:07.987+0200 event=org.dcache.xrootd.request request=open path=VOs/desy/generated/SET_testfi[...]
level=INFO ts=2015-06-25T02:37:48.270+0200 event=org.dcache.xrootd.request request=open path=VOs/desy/generated/SET_testfil[...]
level=INFO ts=2015-06-25T02:37:48.279+0200 event=org.dcache.xrootd.connection.end session=door:xrootd:AAUZTNKcCyA

Note that the protocol and open requests have no session value.

Looking at the prometheus history, I see that "event=org.dcache.xrootd.request" entries have never included the session in the access log.

The session comes directly from the CDC, suggesting the responses for these requests are being processed on a thread that hasn't had the correct CDC restored.

pinboard date/time

Hi.

According to the release notes:

Timestamps in the in-memory pinboard log used to follow a 12 hour format.
The format has now been localized, meaning it will follow the conventions of the current locale.

However this seems not to be the case (2.13.4).

E.g. I have:
dcache.log.format.pinboard=%d{yyyy-MM-dd HH:mm:ssXXX} [%t] [%X{org.dcache.ndc}] %m

And the pinboard looks like:
dCache admin > show pinboard
3:39:49 PM 2015-07-21 15:39:49+02:00 [pool-3-thread-1] [door:xrootd_lcg-lrz-dc62:AAUbYsnAtWA xrootd_lcg-lrz-dc62 PoolMgrSelectReadPool 0000CECB36DC19DA48EC9D4D6E8A5D67DA0D] AskIfAvailable found the object

while the locale is set as follows:
root@lcg-lrz-dcache:# date +%X
15:57:14
root@lcg-lrz-dcache:
# date +%x
2015-07-21
root@lcg-lrz-dcache:~# date +%c
2015-07-21 15:57:16

So no format would match the actually used "3:39:49 PM".

Moreover, the above IMHO shows quite clearly what I've already reported earlier:
There should simply be no hard-coded time format in the pinboard, i.e. the "3:39:49 PM" should go away; instead, a dcache.log.format.pinboard default value containing the date/time should be set, as I did above.

Cheers,
Chris.

xrootd logs stack-trace on malformed request

Part of DESY testing involves running nmap on open ports. For prometheus, this includes the xrootd door. The nmap invocation has it attempt to identify the kind of endpoint by sending content and watching the response.

When receiving badly formatted input, the xrootd door logs a stack trace. The following is an example of this:

24 Jun 2015 15:12:04 (xrootd) [door:xrootd:AAUZQ0DV9CE] Uncaught exception in thread xrootd-net-0
io.netty.handler.codec.DecoderException: java.lang.IllegalStateException
        at     io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:358) ~[netty-codec-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:230) ~[netty-codec-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at org.dcache.xrootd.door.NettyXrootdServer$SessionHandler.channelRead(NettyXrootdServer.java:282) [dcache-xrootd-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-transport-4.0.28.Final.jar:4.0.28.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.28.Final.jar:4.0.28.Final]
        at org.dcache.util.CDCThreadFactory$1.run(CDCThreadFactory.java:53) [dcache-core-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: java.lang.IllegalStateException: null
        at com.google.common.base.Preconditions.checkState(Preconditions.java:158) ~[guava-18.0.jar:na]
        at org.dcache.xrootd.protocol.messages.AbstractXrootdRequest.<init>(AbstractXrootdRequest.java:38) ~[xrootd4j-2.1.0.jar:2.1.0]
        at org.dcache.xrootd.protocol.messages.HandshakeRequest.<init>(HandshakeRequest.java:30) ~[xrootd4j-2.1.0.jar:2.1.0]
        at org.dcache.xrootd.core.XrootdDecoder.decode(XrootdDecoder.java:60) ~[xrootd4j-2.1.0.jar:2.1.0]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:327) ~[netty-codec-4.0.28.Final.jar:4.0.28.Final]
        ... 18 common frames omitted

make poolmanager.conf output reproducible

Hi.

Every time poolmanager.conf is saved, it (or rather its respective subparts) seems to be arbitrarily re-ordered, e.g. the created pools and so on.

This is kinda annoying when e.g. tracking the conf file with git, as diffs (that should only show the actual changes) are completely cluttered up.

It would be nice if poolmanager.conf could be written in the same order (preferably a version sort) for each of the respective subparts.

Also, I think one could/should drop the date/time information in the comments about when the file was created.
This is already available in the file's mtime.

Cheers,
Chris.

Setting ACLs with chimera command line utility fails with NoClassDefFoundError

chimera:/disk# setfacl test USER:501:+r:fo
Exception in thread "main" java.lang.NoClassDefFoundError: org/dcache/acl/parser/ACEParser
at org.dcache.chimera.cli.Shell$SetFaclCommand.call(Shell.java:759)
at org.dcache.chimera.cli.Shell$SetFaclCommand.call(Shell.java:657)
at org.dcache.util.cli.AnnotatedCommandExecutor.execute(AnnotatedCommandExecutor.java:130)
at org.dcache.util.cli.CommandInterpreter$CommandEntry.execute(CommandInterpreter.java:260)
at org.dcache.util.cli.CommandInterpreter.doExecute(CommandInterpreter.java:169)
at org.dcache.util.cli.CommandInterpreter.command(CommandInterpreter.java:155)
at org.dcache.util.cli.ShellApplication.execute(ShellApplication.java:149)
at org.dcache.util.cli.ShellApplication.console(ShellApplication.java:191)
at org.dcache.util.cli.ShellApplication.start(ShellApplication.java:84)
at org.dcache.chimera.cli.Shell.main(Shell.java:83)
Caused by: java.lang.ClassNotFoundException: org.dcache.acl.parser.ACEParser
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more

Pool migration command fails with 'No targets' if destination already contains the file

The following affects at least 2.12. When copying files like this:

migration copy -target=pool -- pool_read

the command fails and sleeps whenever it runs into a file that already exists on the target pool:

12:10:05 [4] 000007A86FED654D4664A0FE65CC968195AE: No targets

The expected behaviour is that the file is skipped/updated.

In case multiple targets are available, the migration module will create superfluous replicas instead of producing an error.

Implement kXR_query config "sitename" response

Currently, dCache implements a handful of responses to the kXR_query config request (such as readv_iov_mx). See http://xrootd.org/doc/dev4/XRdv300.htm#_Toc372751551 for a reference.

CMS is starting to use the value of the "sitename" response to identify the sites. This way, when a site causes an error, instead of saying an error occurred at 202.122.33.118, we can say "Beijing-LCG2".

Having the human-readable site name in the exception message allows users and operators to better triage tickets before they get to the experts.

I was thinking about tying this to the value of "info-provider.site-unique-id" if set. Would there be interest in accepting such a patch?
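
For reference, a client can already retrieve such configuration values with xrdfs; a sketch against a hypothetical door (the sitename value would only be returned once the server implements it):

    # Ask the xrootd door for its advertised "sitename" (hypothetical host and port):
    xrdfs root://dcache-door.example.org:1094 query config sitename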

loadConfig.sh in 2.6.18 Solaris package broken

Line 114 of /usr/share/dcache/lib/loadConfig.sh includes a "hostname -s" invocation, which sets the hostname to "-s" on Solaris systems (Solaris hostname treats any argument as a new hostname to set).

Server: SunOS xxxxxxxxxxxx 5.10 Generic_147441-19 i86pc i386 i86pc
dCache version 2.6.18
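
For what it's worth, a more portable way to derive the short hostname would avoid "hostname -s" altogether; a hedged sketch, not the actual loadConfig.sh change:

    # Take the node name and strip any domain part, instead of "hostname -s"
    # (which Solaris interprets as setting the hostname to "-s").
    host=$(uname -n)
    host=${host%%.*}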

debian packages dependencies - add descriptions for the Recommends/Suggests

Hi.

The Debian packages have several dependencies, including some packages which are just recommended/suggested:
--\ Suggests (1)
  --- ruby
--\ Recommends (2)
  --- libxml2-utils
  --- python-psycopg2

IMHO, the package descriptions should contain information about what these are actually used for, so that one can decide whether they're needed on the specific system or not.

Cheers,
Chris.

httpd somehow gets into a bad state and doesn't update transfers.txt

Hi.

Somehow the httpd got into a bad state, with error messages like these:
2015-07-22 17:57:08 Launching /usr/bin/java -server -Xmx2048M -XX:MaxDirectMemorySize=512m -Dsun.net.inetaddr.ttl=1800 -Dorg.globus.tcp.port.range=60000,64999 -Dorg.dcache.dcap.port=0 -Dorg.
2015-07-22 17:57:11+02:00 (httpdPoolCollector) [] Failed to update topology map: Failed to deliver String message 1437580631914:121 to [>PoolManager@local]: Route for >PoolManager@local< n
2015-07-24 12:08:09+02:00 (httpd) [] Uncaught exception in thread Active Transfers Collector
java.lang.IllegalArgumentException: Multiple entries with same key: dcap.plain_lcg-lrz-dc61--AAUbhtA9W7g@dcap_lcg-lrz-dc61#35=150994948;dcap.plain_lcg-lrz-dc61--AAUbhtA9W7g
at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:150) ~[guava-18.0.jar:na]
at com.google.common.collect.RegularImmutableMap.checkNoConflictInBucket(RegularImmutableMap.java:104) ~[guava-18.0.jar:na]
at com.google.common.collect.RegularImmutableMap.(RegularImmutableMap.java:70) ~[guava-18.0.jar:na]
at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:254) ~[guava-18.0.jar:na]
at com.google.common.collect.Maps.uniqueIndex(Maps.java:1166) ~[guava-18.0.jar:na]
at com.google.common.collect.Maps.uniqueIndex(Maps.java:1140) ~[guava-18.0.jar:na]
at org.dcache.util.TransferCollector.getTransfers(TransferCollector.java:147) ~[dcache-core-2.13.4.jar:2.13.4]
at org.dcache.util.TransferCollector.lambda$null$54(TransferCollector.java:119) ~[dcache-core-2.13.4.jar:2.13.4]
at org.dcache.util.TransferCollector$$Lambda$51/1140826663.apply(Unknown Source) ~[na:na]
at com.google.common.util.concurrent.Futures$2.apply(Futures.java:760) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:906) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$ChainingListenableFuture$1.run(Futures.java:918) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$ImmediateFuture.addListener(Futures.java:106) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:914) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$CombinedFuture.setOneValue(Futures.java:1764) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$CombinedFuture.access$400(Futures.java:1608) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$CombinedFuture$2.run(Futures.java:1686) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$FallbackFuture$1.onSuccess(Futures.java:462) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1319) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) ~[guava-18.0.jar:na]
at org.dcache.cells.CellStub$CallbackFuture.set(CellStub.java:651) ~[dcache-core-2.13.4.jar:2.13.4]
at org.dcache.cells.CellStub$CallbackFuture.answerArrived(CellStub.java:673) ~[dcache-core-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellNucleus$CallbackTask.run(CellNucleus.java:1053) ~[cells-2.13.4.jar:2.13.4]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299) ~[guava-18.0.jar:na]
at dmg.cells.nucleus.CellNucleus.addToEventQueue(CellNucleus.java:836) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellGlue.deliverLocally(CellGlue.java:624) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellGlue.sendMessage(CellGlue.java:539) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellGlue.sendMessage(CellGlue.java:504) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellNucleus.sendMessage(CellNucleus.java:363) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellAdapter.sendMessage(CellAdapter.java:434) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.network.LocationMgrTunnel.receive(LocationMgrTunnel.java:147) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.network.LocationMgrTunnel.run(LocationMgrTunnel.java:165) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellNucleus$5.run(CellNucleus.java:744) ~[cells-2.13.4.jar:2.13.4]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_51]
2015-07-24 12:08:19+02:00 (httpdTransferObserver) [] java.lang.IllegalArgumentException: Multiple entries with same key: dcap.plain_lcg-lrz-dc61--AAUbhtA9W7g@dcap_lcg-lrz-dc61#35=15
java.lang.IllegalArgumentException: Multiple entries with same key: dcap.plain_lcg-lrz-dc61--AAUbhtA9W7g@dcap_lcg-lrz-dc61#35=150994948;dcap.plain_lcg-lrz-dc61--AAUbhtA9W7g
at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:150) ~[guava-18.0.jar:na]
at com.google.common.collect.RegularImmutableMap.checkNoConflictInBucket(RegularImmutableMap.java:104) ~[guava-18.0.jar:na]
at com.google.common.collect.RegularImmutableMap.(RegularImmutableMap.java:70) ~[guava-18.0.jar:na]
at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:254) ~[guava-18.0.jar:na]
at com.google.common.collect.Maps.uniqueIndex(Maps.java:1166) ~[guava-18.0.jar:na]
at com.google.common.collect.Maps.uniqueIndex(Maps.java:1140) ~[guava-18.0.jar:na]
at org.dcache.util.TransferCollector.getTransfers(TransferCollector.java:147) ~[dcache-core-2.13.4.jar:2.13.4]
at diskCacheV111.cells.TransferObserverV1.collectDataSequentially(TransferObserverV1.java:420) ~[dcache-core-2.13.4.jar:2.13.4]
at diskCacheV111.cells.TransferObserverV1.run(TransferObserverV1.java:385) ~[dcache-core-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellNucleus$5.run(CellNucleus.java:744) [cells-2.13.4.jar:2.13.4]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
2015-07-24 12:09:52+02:00 (httpdTransferObserver) [] java.lang.IllegalArgumentException: Multiple entries with same key: dcap.plain_lcg-lrz-dc55--AAUbijs0tOE@dcap_lcg-lrz-dc55#1=150
java.lang.IllegalArgumentException: Multiple entries with same key: dcap.plain_lcg-lrz-dc55--AAUbijs0tOE@dcap_lcg-lrz-dc55#1=150994982;dcap.plain_lcg-lrz-dc55--AAUbijs0tOE@
at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:150) ~[guava-18.0.jar:na]
at com.google.common.collect.RegularImmutableMap.checkNoConflictInBucket(RegularImmutableMap.java:104) ~[guava-18.0.jar:na]
at com.google.common.collect.RegularImmutableMap.(RegularImmutableMap.java:70) ~[guava-18.0.jar:na]
at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:254) ~[guava-18.0.jar:na]
at com.google.common.collect.Maps.uniqueIndex(Maps.java:1166) ~[guava-18.0.jar:na]
at com.google.common.collect.Maps.uniqueIndex(Maps.java:1140) ~[guava-18.0.jar:na]
at org.dcache.util.TransferCollector.getTransfers(TransferCollector.java:147) ~[dcache-core-2.13.4.jar:2.13.4]
at diskCacheV111.cells.TransferObserverV1.collectDataSequentially(TransferObserverV1.java:420) ~[dcache-core-2.13.4.jar:2.13.4]
at diskCacheV111.cells.TransferObserverV1.run(TransferObserverV1.java:385) ~[dcache-core-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellNucleus$5.run(CellNucleus.java:744) [cells-2.13.4.jar:2.13.4]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]

A consequence is that the transfers.txt file is no longer updated but kept in its previous state.

Immediately after restarting httpd, the above errors occur again:
2015-07-27 16:27:27 Launching /usr/bin/java -server -Xmx2048M -XX:MaxDirectMemorySize=512m -Dsun.net.inetaddr.ttl=1800 -Dorg.globus.tcp.port.range=60000,64999 -Dorg.dcache.dcap.port=0 -Dorg.
2015-07-27 16:27:32+02:00 (httpd) [] Uncaught exception in thread Active Transfers Collector
java.lang.IllegalArgumentException: Multiple entries with same key: dcap.plain_lcg-lrz-dc55--AAUbyC3vPYA@dcap_lcg-lrz-dc55#19=150994996;dcap.plain_lcg-lrz-dc55--AAUbyC3vPYA
at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:150) ~[guava-18.0.jar:na]
at com.google.common.collect.RegularImmutableMap.checkNoConflictInBucket(RegularImmutableMap.java:104) ~[guava-18.0.jar:na]
at com.google.common.collect.RegularImmutableMap.(RegularImmutableMap.java:70) ~[guava-18.0.jar:na]
at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:254) ~[guava-18.0.jar:na]
at com.google.common.collect.Maps.uniqueIndex(Maps.java:1166) ~[guava-18.0.jar:na]
at com.google.common.collect.Maps.uniqueIndex(Maps.java:1140) ~[guava-18.0.jar:na]
at org.dcache.util.TransferCollector.getTransfers(TransferCollector.java:147) ~[dcache-core-2.13.4.jar:2.13.4]
at org.dcache.util.TransferCollector.lambda$null$54(TransferCollector.java:119) ~[dcache-core-2.13.4.jar:2.13.4]
at org.dcache.util.TransferCollector$$Lambda$64/1164875777.apply(Unknown Source) ~[na:na]
at com.google.common.util.concurrent.Futures$2.apply(Futures.java:760) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:906) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$ChainingListenableFuture$1.run(Futures.java:918) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$ImmediateFuture.addListener(Futures.java:106) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:914) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$CombinedFuture.setOneValue(Futures.java:1764) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$CombinedFuture.access$400(Futures.java:1608) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$CombinedFuture$2.run(Futures.java:1686) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$FallbackFuture$1.onSuccess(Futures.java:462) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1319) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185) ~[guava-18.0.jar:na]
at org.dcache.cells.CellStub$CallbackFuture.set(CellStub.java:651) ~[dcache-core-2.13.4.jar:2.13.4]
at org.dcache.cells.CellStub$CallbackFuture.answerArrived(CellStub.java:673) ~[dcache-core-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellNucleus$CallbackTask.run(CellNucleus.java:1053) ~[cells-2.13.4.jar:2.13.4]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299) ~[guava-18.0.jar:na]
at dmg.cells.nucleus.CellNucleus.addToEventQueue(CellNucleus.java:836) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellGlue.deliverLocally(CellGlue.java:624) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellGlue.sendMessage(CellGlue.java:539) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellGlue.sendMessage(CellGlue.java:504) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellNucleus.sendMessage(CellNucleus.java:363) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellAdapter.sendMessage(CellAdapter.java:434) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.network.LocationMgrTunnel.receive(LocationMgrTunnel.java:147) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.network.LocationMgrTunnel.run(LocationMgrTunnel.java:165) ~[cells-2.13.4.jar:2.13.4]
at dmg.cells.nucleus.CellNucleus$5.run(CellNucleus.java:744) ~[cells-2.13.4.jar:2.13.4]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_51]

Any ideas?

Cheers,
Chris

(dcache 2.13.4)

Random pool selection in migration module ignores full pools with deletable files

Since 2.10.8, a migration task like

migration copy -sticky -pins=move -permanent -verify -target=pgroup -select=random -smode=removable -concurrency=5 -- pgroup-cms-analysis

will not transfer files to a full pool; not even if the pool has deletable files. If all pools are full, errors such as:

Most recent errors:
20:01:46 [742570] 0000849E0BD667C4449B817B9DB3F5221BCD: No targets
20:01:46 [742569] 0000CB673F3955AC472F8309F515D55143EC: No targets
20:02:06 [742577] 0000F1A0CB69BDD14E3B8B8E5ACA812EEA18: No targets
20:02:06 [742578] 000033D6E2A2868745F890A19EF3603AA140: No targets
20:02:06 [742579] 0000C37034DE360F4316B01E1C40E1DEFC92: No targets
20:02:06 [742581] 0000FD3387698EB04F3A914B612DEF720EA8: No targets

are logged even when files on the target pools could have been garbage collected.

SRM container requests do not terminate when all file requests are cancelled

I've seen the following on prometheus:

[prometheus] (SRM-prometheus@dCacheDomain) admin > ls -l "-2147476536
Get id:-2147476536 files:1 state:Pending by:ops
   Submitted: 2015-06-26 11:42:51.720 (3 hours ago)
   Expires: 2015-06-26 15:42:51.720 (7 minutes in the future)
   History:
      2015-06-26 11:42:51.720 Pending: Request created (3 hours, so far)

   file id:-2147476535 state:Canceled
      SURL: srm://prometheus.desy.de:8443/srm/managerv2?SFN=/VOs/ops/testfile-put-1435311757-69cea5623e37.txt
      Pinned: true
      Pin id: 207
      TURL: gsiftp://prometheus.desy.de:2811//VOs/ops/testfile-put-1435311757-69cea5623e37.txt
      History:
         2015-06-26 11:42:51.721 Pending: Request created (13 ms)
         2015-06-26 11:42:51.734 TQueued: Request enqueued. (1 ms)
         2015-06-26 11:42:51.735 PriorityTQueued: Waiting for thread. (0 ms)
         2015-06-26 11:42:51.735 Running: Processing request (0 ms)
         2015-06-26 11:42:51.735 AsyncWait: Pinning file. (258 ms)
         2015-06-26 11:42:51.993 PriorityTQueued: Waiting for thread. (1 ms)
         2015-06-26 11:42:51.994 Running: Processing request (7 ms)
         2015-06-26 11:42:52.001 RQueued: Putting on a "Ready" Queue. (0 ms)
         2015-06-26 11:42:52.001 Ready: Execution succeeded. (18 s)
         2015-06-26 11:43:10.753 Canceled: Download aborted because the file was deleted by another request.

There are similar tests that continue to exist until they expire. This is despite the single file-request they contain being cancelled.

storage group names with '/'-sign break statistics service

Dear dCache support,

we created some storage groups with a pattern like this (from poolmanager.conf):

psu create unit -store  ifh:magic15/testosm:testgroup@osm

This seems to break the statistics service (dCache 2.10.31), as it does not seem to properly escape the '/'. Excerpt from httpdDomain (the statistics service runs inside that domain here):

 [puck] /var/log/dcache % tail -n 12 httpdDomain.log
 09 Jun 2015 14:05:52 (PoolStatistics) [] Can't prepare Html directory: /var/lib/dcache/statistics/2015/06/09
 java.io.FileNotFoundException: /var/lib/dcache/statistics/2015/06/09/class-ifh:magic15/testosm:testgroup@osm.html (No such file or directory)
    at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_79]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221) ~[na:1.7.0_79]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:171) ~[na:1.7.0_79]
    at java.io.FileWriter.<init>(FileWriter.java:90) ~[na:1.7.0_79]
    at diskCacheV111.services.PoolStatisticsV0.prepareDailyHtml(PoolStatisticsV0.java:1266) [dcache-core-2.10.31.jar:2.10.31]
    at diskCacheV111.services.PoolStatisticsV0.prepareDailyHtmlFiles(PoolStatisticsV0.java:1126) [dcache-core-2.10.31.jar:2.10.31]
    at diskCacheV111.services.PoolStatisticsV0.access$1100(PoolStatisticsV0.java:119) [dcache-core-2.10.31.jar:2.10.31]
    at diskCacheV111.services.PoolStatisticsV0$HourlyRunner.run(PoolStatisticsV0.java:511) [dcache-core-2.10.31.jar:2.10.31]
    at dmg.cells.nucleus.CellNucleus$5.run(CellNucleus.java:742) [cells-2.10.31.jar:2.10.31]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]

Cheers,
Andreas

Usage with Ruby

Hey there,

I'm looking to give dCache a try and see how it compares against Riak and Redis for a rails application. I wasn't able to find a ruby wrapper on Github.

I was able to find a gem called dCache, but it doesn't link back to this repository, nor does it explicitly state that it's a Ruby client for dCache. It's also quite old.

NFS uploads to space tokens fail with `null group` error

When uploading using NFS to a directory with a WriteToken tag, the upload still fails with a null group error:

srm-dcachetest05Domain.log:08 Jun 2015 15:38:33 (SrmSpaceManager) [door:nfsv41@namespaceDomain2:1433792313701 nfsv41 PoolAcceptFile 0000EC7109158D014B0981066AF1F59807CF] Message processing failed: null group

The cause is that NFS uploads do not have a user name or DN, and thus the space manager is unable to generate an owner for the file entry it creates in the space reservation. This is fixed in 2.13 by allowing GIDs as owners; however, there is no good reason why earlier versions could not create such files as unowned (the space owner of files does not appear to be used anyway).

HSM interface needs a way to skip logging of some error codes

Some HSM systems rely on the fact that the pool will retry flushing to tape, and they return a non-zero error code as long as the backend tape migration is not ready. As every failed attempt to flush a file gets logged in billing, the billing files fill up with such error messages, which are in fact 'OK'.

An option in the HSM interface to skip logging of certain error codes would decrease the amount of data sent to billing.

Request: option to limit the number of getfilerequests per user

The SRM has this setting, I assume to limit the number of running get requests per user, with a default value of 100:
srm.request.get.max-by-same-user=${srm.request.max-by-same-user}

Wouldn't it be nice to have something similar to limit the waiting or total get requests per user, so that one uninformed user can't DoS all other users by overloading our SRM with TURLs?

References:
https://ggus.eu/?mode=ticket_info&ticket_id=112413
https://ggus.eu/?mode=ticket_info&ticket_id=114478

Thanks!
Onno

SRM gets slower the more requests are queued

The time it takes to schedule a request in the SRM gets longer the more requests are queued, to the point where having 30000 bring-online requests in the queue means it takes a significant amount of time (half an hour or more) to start the 10000 concurrent stages we allow. SARA has reported the same problem for transfer requests.

To reproduce, submit a 5 digit number of bring-online requests concurrently and observe how slowly requests are processed by running info in the SRM.

This was observed with 2.12 but likely affects all earlier versions too.

"possible bug found" from pool when disk went offline {2.10}{pool}

Hi,

Perhaps these errors from dCache are of interest (the likely fix is known: replace the disk and restart the server):

This happened on an HP DL360 G7 with 4 x D2600 disk shelves behind a P411 controller in SL6.

dcache version 2.10.25

java-1.7.0-openjdk-devel-1.7.0.79-2.5.5.1.el6_6.x86_64
java-1.7.0-openjdk-1.7.0.79-2.5.5.1.el6_6.x86_64

A disk went into predictive failure and then the disk (logical drive) was offlined:

Jun 22 14:24:03 pacific08 kernel: hpsa 0000:06:00.0: Abort request on C1:B0:T0:L0
Jun 22 14:24:03 pacific08 kernel: hpsa 0000:06:00.0: ABORT REQUEST on C1:B0:T0:L0 Tag:0x00000000:00000090 Command:0x88 SN:0x4c317141  REQUEST SUCCEEDED.
[snip]
Jun 22 22:51:03 pacific08 kernel: hpsa 0000:06:00.0: ABORT REQUEST on C1:B0:T0:L0 Tag:0x00000000:000001f0 Command:0x88 SN:0x4cf98c6f  REQUEST SUCCEEDED.
Jun 22 22:51:33 pacific08 kernel: hpsa 0000:06:00.0: ABORT REQUEST on C1:B0:T0:L0 Tag:0x00000000:000001f0 Command:0x88 SN:0x4cf98c6f  FAILED. Aborted command has not completed after 30 seconds.
Jun 22 22:51:33 pacific08 kernel: hpsa 0000:06:00.0: resetting device 1:0:0:0
Jun 22 22:51:34 pacific08 kernel: hpsa 0000:06:00.0: device is ready.
Jun 22 22:51:34 pacific08 kernel: sd 1:0:0:0: [sdb] Medium access timeout failure. Offlining disk!
Jun 22 22:51:34 pacific08 kernel: sd 1:0:0:0: Device offlined - not ready after error recovery
Jun 22 22:51:34 pacific08 kernel: sd 1:0:0:0: [sdb]  Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
Jun 22 22:51:34 pacific08 kernel: sd 1:0:0:0: [sdb] CDB: Read(16): 88 00 00 00 00 02 93 5d e3 d8 00 00 00 60 00 00
Jun 22 22:51:34 pacific08 kernel: end_request: I/O error, dev sdb, sector 11062338520
Jun 22 22:51:34 pacific08 kernel: sd 1:0:0:0: rejecting I/O to offline device
Jun 22 22:51:34 pacific08 kernel: sd 1:0:0:0: [sdb] killing request

Then we saw this from dCache logs:

22 Jun 2015 22:51:35 (csc_fi_29) [] xrootd I/O error: java.io.IOException: Input/output error
22 Jun 2015 22:51:35 (csc_fi_29) [] xrootd I/O error: java.io.IOException: Input/output error
22 Jun 2015 22:51:35 (csc_fi_29) [] xrootd I/O error: java.io.IOException: Input/output error
22 Jun 2015 22:51:35 (csc_fi_29) [] xrootd I/O error: java.io.IOException: Input/output error
22 Jun 2015 22:51:35 (csc_fi_29) [] xrootd I/O error: java.io.IOException: Input/output error
22 Jun 2015 22:51:35 (csc_fi_29) [] xrootd I/O error: java.io.IOException: Input/output error
22 Jun 2015 22:51:35 (csc_fi_29) [] xrootd I/O error: java.io.IOException: Input/output error
22 Jun 2015 22:51:35 (csc_fi_29) [] xrootd I/O error: java.io.IOException: Input/output error
22 Jun 2015 22:51:35 (csc_fi_29) [] xrootd I/O error: java.io.IOException: Input/output error
22 Jun 2015 22:51:35 (csc_fi_29) [] Could not create /srv/dcache/pool1/csc_fi_29/meta/.repository_is_ok
22 Jun 2015 22:51:35 (csc_fi_29) [] Fault occured in repository: I/O test failed
22 Jun 2015 22:51:35 (csc_fi_29) [] Pool mode changed to disabled(fetch,store,stage,p2p-client,p2p-server): Pool disabled: I/O test failed
22 Jun 2015 22:51:35 (csc_fi_29) [door:Xrootd-madhatter@xrootd-madhatterDomain:AAABThy2emc Xrootd-madhatter PoolDeliverFile 000045B1C515108D46A489C671F7A403CC42] PoolIoFileMessage request rejected due to disabled(fetch,store,stage,p2p-client,p2p-server)
22 Jun 2015 22:51:35 (csc_fi_29) [] FILE OPEN RESPONSE  connId: -120714739    path :pnfs/csc.fi/data/cms/store/mc/Phys14DR/QCD_Pt-170to300_Tune4C_13TeV_pythia8/GEN-SIM-RAW/AVE20BX25_tsg_castor_PHYS14_25_V3-v1/30000/54F82F5D-E2A8-E411-83F8-0025907DC9DC.root,  fileCounter: 90869
22 Jun 2015 22:51:35 (csc_fi_29) [] FILE OPEN RESPONSE  connId: 2143435051    path :pnfs/csc.fi/data/cms/store/mc/Phys14DR/QCD_Pt-170to300_Tune4C_13TeV_pythia8/GEN-SIM-RAW/AVE20BX25_tsg_castor_PHYS14_25_V3-v1/30000/D096402A-CCA8-E411-8466-002590AC505C.root,  fileCounter: 90870
22 Jun 2015 22:51:35 (csc_fi_29) [door:GFTP-pacific01-225012@ftp-pacific01Domain:1435000104292 GFTP-pacific01-225012 PoolAcceptFile 00002083AFC458F843A3AE49131BB2D2B53A] Marking pool entry as BROKEN
22 Jun 2015 22:51:35 (csc_fi_29) [door:GFTP-madhatter-302655@ftp-madhatterDomain:1435002663989 GFTP-madhatter-302655 PoolAcceptFile 0000CE4911CA397A4A3680FE70A1DA7E396A] Possible bug found: No transaction is active
java.lang.IllegalStateException: No transaction is active
        at com.sleepycat.collections.CurrentTransaction.abortTransaction(CurrentTransaction.java:316) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.StoredContainer.handleException(StoredContainer.java:456) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.StoredContainer.putKeyValue(StoredContainer.java:327) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.StoredMap.put(StoredMap.java:279) ~[je-6.1.5.jar:6.1.5]
        at org.dcache.pool.repository.meta.db.CacheRepositoryEntryImpl.setFileAttributes(CacheRepositoryEntryImpl.java:168) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.repository.v5.CacheRepositoryV5.createEntry(CacheRepositoryV5.java:457) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.classic.PoolV4.createMover(PoolV4.java:745) [dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.classic.PoolV4.ioFile(PoolV4.java:791) [dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.classic.PoolV4.messageArrived(PoolV4.java:1038) [dcache-core-2.10.25.jar:2.10.25]
        at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_79]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_79]
        at org.dcache.cells.CellMessageDispatcher$LongReceiver.deliver(CellMessageDispatcher.java:305) [dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.cells.CellMessageDispatcher.call(CellMessageDispatcher.java:202) [dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.cells.AbstractCell.messageArrived(AbstractCell.java:802) [dcache-core-2.10.25.jar:2.10.25]
        at dmg.cells.nucleus.CellAdapter.messageArrived(CellAdapter.java:860) [cells-2.10.25.jar:2.10.25]
        at dmg.cells.nucleus.CellNucleus$DeliverMessageTask.innerRun(CellNucleus.java:1151) [cells-2.10.25.jar:2.10.25]
        at dmg.cells.nucleus.CellNucleus$AbstractNucleusTask.run(CellNucleus.java:1052) [cells-2.10.25.jar:2.10.25]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
22 Jun 2015 22:51:35 (csc_fi_29) [door:GFTP-pacific01-225121@ftp-pacific01Domain:1435001988125 GFTP-pacific01-225121 PoolAcceptFile 0000C4BED27E55174A2A8614B8409F585B9D] Marking pool entry as BROKEN
22 Jun 2015 22:51:35 (csc_fi_29) [door:GFTP-pacific01-225012@ftp-pacific01Domain:1435000104292 GFTP-pacific01-225012 PoolAcceptFile 00002083AFC458F843A3AE49131BB2D2B53A] Transfer failed in post-processing. Please report this bug to support@dcache.org.
com.sleepycat.je.LogWriteException: (JE 6.1.5) Environment must be closed, caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception: (JE 6.1.5) /srv/dcache/pool1/csc_fi_29/meta IOException during fsync LOG_WRITE: IOException on write, log is likely incomplete. Environment is invalid and must be closed.
        at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1610) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.Environment.checkEnv(Environment.java:2231) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.Database.checkEnv(Database.java:2220) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.Database.openCursor(Database.java:823) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.CurrentTransaction.openCursor(CurrentTransaction.java:415) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.MyRangeCursor.openCursor(MyRangeCursor.java:53) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.MyRangeCursor.(MyRangeCursor.java:29) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.DataCursor.init(DataCursor.java:170) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.DataCursor.(DataCursor.java:58) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.StoredContainer.getValue(StoredContainer.java:300) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.StoredMap.get(StoredMap.java:240) ~[je-6.1.5.jar:6.1.5]
        at org.dcache.pool.repository.meta.db.CacheRepositoryEntryImpl.getStorageInfo(CacheRepositoryEntryImpl.java:148) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.repository.meta.db.CacheRepositoryEntryImpl.getFileAttributes(CacheRepositoryEntryImpl.java:156) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.repository.v5.CacheEntryImpl.<init>(CacheEntryImpl.java:36) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.repository.v5.CacheRepositoryV5.setState(CacheRepositoryV5.java:837) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.repository.v5.WriteHandleImpl.fail(WriteHandleImpl.java:405) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.repository.v5.WriteHandleImpl.close(WriteHandleImpl.java:418) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.classic.DefaultPostTransferService$1.run(DefaultPostTransferService.java:108) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.util.FireAndForgetTask.run(FireAndForgetTask.java:31) [dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.util.CDCExecutorServiceDecorator$1.run(CDCExecutorServiceDecorator.java:104) [dcache-core-2.10.25.jar:2.10.25]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception: (JE 6.1.5) /srv/dcache/pool1/csc_fi_29/meta IOException during fsync LOG_WRITE: IOException on write, log is likely incomplete. Environment is invalid and must be closed.
        at com.sleepycat.je.log.FileManager.syncLogEnd(FileManager.java:2018) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.log.FSyncManager.executeFSync(FSyncManager.java:393) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.log.FSyncManager.sync(FSyncManager.java:343) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.log.LogManager.multiLog(LogManager.java:329) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.log.LogManager.log(LogManager.java:273) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.txn.Txn.logCommitEntry(Txn.java:1004) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.txn.Txn.commit(Txn.java:761) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.txn.Txn.commit(Txn.java:620) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.Transaction.commit(Transaction.java:317) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.CurrentTransaction.commitTransaction(CurrentTransaction.java:280) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.StoredContainer.commitAutoCommit(StoredContainer.java:448) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.StoredContainer.putKeyValue(StoredContainer.java:323) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.collections.StoredMap.put(StoredMap.java:279) ~[je-6.1.5.jar:6.1.5]
        at org.dcache.pool.repository.meta.db.CacheRepositoryEntryImpl.setFileAttributes(CacheRepositoryEntryImpl.java:168) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.repository.v5.CacheRepositoryV5.createEntry(CacheRepositoryV5.java:457) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.classic.PoolV4.createMover(PoolV4.java:745) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.classic.PoolV4.ioFile(PoolV4.java:791) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.pool.classic.PoolV4.messageArrived(PoolV4.java:1038) ~[dcache-core-2.10.25.jar:2.10.25]
        at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_79]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_79]
        at org.dcache.cells.CellMessageDispatcher$LongReceiver.deliver(CellMessageDispatcher.java:305) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.cells.CellMessageDispatcher.call(CellMessageDispatcher.java:202) ~[dcache-core-2.10.25.jar:2.10.25]
        at org.dcache.cells.AbstractCell.messageArrived(AbstractCell.java:802) ~[dcache-core-2.10.25.jar:2.10.25]
        at dmg.cells.nucleus.CellAdapter.messageArrived(CellAdapter.java:860) ~[cells-2.10.25.jar:2.10.25]
        at dmg.cells.nucleus.CellNucleus$DeliverMessageTask.innerRun(CellNucleus.java:1151) ~[cells-2.10.25.jar:2.10.25]
        at dmg.cells.nucleus.CellNucleus$AbstractNucleusTask.run(CellNucleus.java:1052) ~[cells-2.10.25.jar:2.10.25]
        ... 3 common frames omitted
Caused by: java.io.IOException: Input/output error
        at sun.nio.ch.FileDispatcherImpl.force0(Native Method) ~[na:1.7.0_79]
        at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:77) ~[na:1.7.0_79]
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:377) ~[na:1.7.0_79]
        at com.sleepycat.je.log.FileManager$LogEndFileDescriptor.force(FileManager.java:3011) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.log.FileManager$LogEndFileDescriptor.access$500(FileManager.java:2678) ~[je-6.1.5.jar:6.1.5]
        at com.sleepycat.je.log.FileManager.syncLogEnd(FileManager.java:2016) ~[je-6.1.5.jar:6.1.5]
        ... 29 common frames omitted

dCache xrootd door problem with the new client

The xrootd 3.3.6 version of xrdcopy does not work with dCache xrootd doors:

~ >xrdcopy -d 1 -f root://uct2-s5.mwt2.org:1096//atlas/rucio/user/flegger:NTUP_SMWZ.00986521._000007.MWT2.root.1 /dev/null 12:45
[ 92%][==============================================> ] [3G/3G] [2014-03-20 12:53:38 -0500][Error ][XRootD ] [192.170.227.109:24031] Handling error while processing kXR_read (handle: 0x0000, offset: 3758096384, size: 16777216): [ERROR] Operation expired.
[2014-03-20 12:53:38 -0500][Error ][XRootD ] [192.170.227.109:24031] Unable to get the response to request kXR_read (handle: 0x0000, offset: 3758096384, size: 16777216)
[2014-03-20 12:53:38 -0500][Error ][File ] [0x60a880@root://uct2-s5.mwt2.org:1096//atlas/rucio/user/flegger:NTUP_SMWZ.00986521._000007.MWT2.root.1] Fatal file state error. Message kXR_read (handle: 0x0000, offset: 3758096384, size: 16777216) returned with [ERROR] Operation expired
[2014-03-20 12:53:38 -0500][Error ][XRootDTransport ] Message 0x651682d0, stream [0, 0] is a response that we're no longer interested in (timed out)
[2014-03-20 12:58:38 -0500][Error ][XRootD ] [192.170.227.109:24031] Handling error while processing kXR_close (handle: 0x0000): [ERROR] Operation expired.
[2014-03-20 12:58:38 -0500][Error ][XRootD ] [192.170.227.109:24031] Unable to get the response to request kXR_close (handle: 0x0000)
[ 99%][=================================================>] [3G/3G]
Run: [ERROR] Operation expired

org.springframework.jdbc.CannotGetJdbcConnectionException

This happened during restarts of dCache (on several servers):

dCache 2.10.34
openjdk
SL 6.6

The domain this logfile is from runs these services:

[srm-madhatterDomain]
[srm-madhatterDomain/srm]
[srm-madhatterDomain/spacemanager]
[srm-madhatterDomain/transfermanagers]

2015-06-30 09:03:44 Process terminated

2015-06-30 09:32:24 Launching /usr/bin/java -server -Xmx2048m -XX:MaxDirectMemorySize=512m -Dsun.net.inetaddr.ttl=1800 -Dorg.globus.tcp.port.range=20000,25000 -Dorg.dcache.dcap.port=0 -Dorg.dcache.net.tcp.portrange=20000:25000 -Dorg.globus.jglobus.delegation.cache.lifetime=30000 -Dorg.globus.jglobus.crl.cache.lifetime=60000 -Djava.security.krb5.realm= -Djava.security.krb5.kdc= -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.auth.login.config=/etc/dcache/jgss.conf -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dcache/srm-madhatterDomain-oom.hprof -javaagent:/usr/share/dcache/classes/aspectjweaver-1.8.1.jar -Djava.awt.headless=true -DwantLog4jSetup=n -Ddcache.home=/usr/share/dcache -Ddcache.paths.defaults=/usr/share/dcache/defaults org.dcache.boot.BootLoader start srm-madhatterDomain
INFO  - dcache.conf:47: Property my.log.format.syslog is not a standard property
INFO  - dcache.conf:48: Property my.log.host.syslog is not a standard property
INFO  - dcache.conf:49: Property my.log.level.syslog is not a standard property
30 Jun 2015 09:33:55 (SRM-madhatter) [10.11.3.145:36769] Problem while establishing secure connection: java.net.SocketTimeoutException: Read timed out
30 Jun 2015 09:34:27 (c-dCacheDomain-101) [] Sleeping 18 seconds
30 Jun 2015 09:34:49 (lm) [] Problem in 'whereIs' request : java.io.IOException: Request timed out
30 Jun 2015 09:34:50 (c-dCacheDomain-101) [] Failed to connect to dCacheDomain: Timeout querying location manager
30 Jun 2015 09:34:50 (c-dCacheDomain-101) [] Sleeping 18 seconds
30 Jun 2015 09:35:03 (SRM-madhatter) [10.11.3.135:44312] Problem while establishing secure connection: java.net.SocketTimeoutException: Read timed out
30 Jun 2015 09:35:13 (lm) [] Problem in 'whereIs' request : java.io.IOException: Request timed out
30 Jun 2015 09:35:13 (SRM-madhatter) [10.11.3.107:60585] Problem while establishing secure connection: java.net.SocketTimeoutException: Read timed out
30 Jun 2015 09:35:14 (c-dCacheDomain-101) [] Failed to connect to dCacheDomain: Timeout querying location manager
30 Jun 2015 09:35:14 (c-dCacheDomain-101) [] Sleeping 20 seconds
30 Jun 2015 09:35:29 (SRM-madhatter) [10.11.3.107:60586] Problem while establishing secure connection: java.net.SocketTimeoutException: Read timed out
30 Jun 2015 09:35:38 (lm) [] Problem in 'whereIs' request : java.io.IOException: Request timed out
30 Jun 2015 09:35:39 (c-dCacheDomain-101) [] Failed to connect to dCacheDomain: Timeout querying location manager
30 Jun 2015 09:35:39 (c-dCacheDomain-101) [] Sleeping 5 seconds
30 Jun 2015 09:35:45 (lm) [] Problem in 'whereIs' request : java.io.IOException: Network is unreachable
30 Jun 2015 09:35:48 (SRM-madhatter) [10.12.100.13:56702] Client disconnected before SRM response was sent.
30 Jun 2015 09:35:48 (SRM-madhatter) [109.105.124.161:37557] Client disconnected before SRM response was sent.
30 Jun 2015 09:35:48 (SRM-madhatter) [10.12.100.19:34071 E2Q:91:srm2:prepareToPut] Request was interrupted
30 Jun 2015 09:35:48 (SRM-madhatter) [10.12.100.14:38414] Client disconnected before SRM response was sent.
30 Jun 2015 09:35:48 (SRM-madhatter) [10.12.100.19:34071] Client disconnected before SRM response was sent.
30 Jun 2015 09:35:48 (SRM-madhatter) [10.12.100.2:42858] Client disconnected before SRM response was sent.
30 Jun 2015 09:35:48 (SRM-madhatter) [188.185.227.41:48550 E2Q:106:srm2:ls] Please report this failure to [email protected]
org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: Interrupted during connection acquisition
        at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80) ~[spring-jdbc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:630) ~[spring-jdbc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:695) ~[spring-jdbc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:727) ~[spring-jdbc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:737) ~[spring-jdbc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:787) ~[spring-jdbc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
        at org.dcache.srm.request.sql.DatabaseRequestCredentialStorage.getRequestCredentialByCondition(DatabaseRequestCredentialStorage.java:207) ~[srm-server-2.10.34.jar:2.10.34]
        at org.dcache.srm.request.sql.DatabaseRequestCredentialStorage.getRequestCredential(DatabaseRequestCredentialStorage.java:244) ~[srm-server-2.10.34.jar:2.10.34]
        at org.dcache.srm.request.RequestCredential.newRequestCredential(RequestCredential.java:144) ~[srm-server-2.10.34.jar:2.10.34]
        at org.dcache.srm.server.SrmAuthorizer.getRequestCredential(SrmAuthorizer.java:200) ~[srm-server-2.10.34.jar:2.10.34]
        at org.dcache.srm.server.SRMServerV2.dispatch(SRMServerV2.java:263) [srm-server-2.10.34.jar:2.10.34]
        at org.dcache.srm.server.SRMServerV2.handleRequest(SRMServerV2.java:239) [srm-server-2.10.34.jar:2.10.34]
        at org.dcache.srm.server.SRMServerV2.srmLs(SRMServerV2.java:453) [srm-server-2.10.34.jar:2.10.34]
        at sun.reflect.GeneratedMethodAccessor214.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_79]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_79]
        at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:397) [axis-1.4.jar:na]
        at org.dcache.srm.util.LoggingRPCProvider.invokeMethod(LoggingRPCProvider.java:51) [srm-server-2.10.34.jar:2.10.34]
        at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:186) [axis-1.4.jar:na]
        at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:323) [axis-1.4.jar:na]
        at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32) [axis-1.4.jar:na]
        at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118) [axis-1.4.jar:na]
        at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83) [axis-1.4.jar:na]
        at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:454) [axis-1.4.jar:na]
        at org.apache.axis.server.AxisServer.invoke(AxisServer.java:281) [axis-1.4.jar:na]
        at org.apache.axis.transport.http.AxisServlet.doPost(AxisServlet.java:699) [axis-1.4.jar:na]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:755) [javax.servlet-3.0.0.v201112011016.jar:na]
        at org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:327) [axis-1.4.jar:na]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848) [javax.servlet-3.0.0.v201112011016.jar:na]
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) [jetty-servlet-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503) [jetty-servlet-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) [jetty-security-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429) [jetty-servlet-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.Server.handle(Server.java:370) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865) [jetty-http-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) [jetty-http-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) [jetty-server-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.dcache.util.JettyGSIConnector$GsiConnection.run(JettyGSIConnector.java:572) [dcache-core-2.10.34.jar:2.10.34]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) [jetty-util-8.1.15.v20140411.jar:8.1.15.v20140411]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) [jetty-util-8.1.15.v20140411.jar:8.1.15.v20140411]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
Caused by: java.sql.SQLException: Interrupted during connection acquisition
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:199) ~[HikariCP-java6-2.0.1.jar:na]
        at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:93) ~[HikariCP-java6-2.0.1.jar:na]
        at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111) ~[spring-jdbc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
        at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77) ~[spring-jdbc-4.0.5.RELEASE.jar:4.0.5.RELEASE]
        ... 53 common frames omitted
Caused by: java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedLongSynchronizer.doAcquireSharedNanos(AbstractQueuedLongSynchronizer.java:815) ~[na:1.7.0_79]
        at java.util.concurrent.locks.AbstractQueuedLongSynchronizer.tryAcquireSharedNanos(AbstractQueuedLongSynchronizer.java:1103) ~[na:1.7.0_79]
        at com.zaxxer.hikari.util.ConcurrentBag.borrow(ConcurrentBag.java:146) ~[HikariCP-java6-2.0.1.jar:na]
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:169) ~[HikariCP-java6-2.0.1.jar:na]
        ... 56 common frames omitted
30 Jun 2015 09:35:48 (SRM-madhatter) [188.185.227.41:48550] Client disconnected before SRM response was sent.
30 Jun 2015 09:35:48 (SRM-madhatter) [188.185.227.41:48550] Client disconnected before SRM response was sent.
30 Jun 2015 09:35:48 (SRM-madhatter) [] 1 threads could not be stopped
30 Jun 2015 09:35:48 (SRM-madhatter) [10.12.100.21:49727 E2Q:84:srm2:ls:-2073365550:-2073365549] Persistence of request -2073365549 failed, queue is too long.
30 Jun 2015 09:35:48 (SRM-madhatter) [10.12.100.19:34068 E2Q:85:srm2:ls:-2073365548:-2073365547] Persistence of request -2073365547 failed, queue is too long.
30 Jun 2015 09:35:49 (System) [] Timeout waiting for SRM-madhatter
30 Jun 2015 09:35:49 (System) [] Thread: Timer-0 [AD-] (5) TIMED_WAITING
30 Jun 2015 09:35:49 (System) []     java.lang.Object.wait(Native Method)
30 Jun 2015 09:35:49 (System) []     java.util.TimerThread.mainLoop(Timer.java:552)
30 Jun 2015 09:35:49 (System) []     java.util.TimerThread.run(Timer.java:505)
30 Jun 2015 09:35:49 (System) [] Thread: Job expiration [AD-] (5) TIMED_WAITING
30 Jun 2015 09:35:49 (System) []     java.lang.Object.wait(Native Method)
30 Jun 2015 09:35:49 (System) []     java.util.TimerThread.mainLoop(Timer.java:552)
30 Jun 2015 09:35:49 (System) []     java.util.TimerThread.run(Timer.java:505)
30 Jun 2015 09:35:49 (System) [] Thread: jetty-srm-56 [A-I] (5) RUNNABLE
30 Jun 2015 09:35:49 (System) []     java.net.SocketInputStream.socketRead0(Native Method)
30 Jun 2015 09:35:49 (System) []     java.net.SocketInputStream.read(SocketInputStream.java:152)
30 Jun 2015 09:35:49 (System) []     java.net.SocketInputStream.read(SocketInputStream.java:122)
30 Jun 2015 09:35:49 (System) []     org.globus.gsi.gssapi.SSLUtil.read(SSLUtil.java:41)
30 Jun 2015 09:35:49 (System) []     org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readToken(GSIGssInputStream.java:70)
30 Jun 2015 09:35:49 (System) []     org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:60)
30 Jun 2015 09:35:49 (System) []     org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:64)
30 Jun 2015 09:35:49 (System) []     org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:126)
30 Jun 2015 09:35:49 (System) []     org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:146)
30 Jun 2015 09:35:49 (System) []     org.dcache.util.JettyGSIConnector$GsiConnection.run(JettyGSIConnector.java:566)
30 Jun 2015 09:35:49 (System) []     org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
30 Jun 2015 09:35:49 (System) []     org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
30 Jun 2015 09:35:49 (System) []     java.lang.Thread.run(Thread.java:745)
30 Jun 2015 09:35:49 (System) [] Thread: Timer-7 [AD-] (5) TIMED_WAITING
30 Jun 2015 09:35:49 (System) []     java.lang.Object.wait(Native Method)
30 Jun 2015 09:35:49 (System) []     java.util.TimerThread.mainLoop(Timer.java:552)
30 Jun 2015 09:35:49 (System) []     java.util.TimerThread.run(Timer.java:505)
30 Jun 2015 09:35:49 (System) [] Thread: Timer-8 [AD-] (5) TIMED_WAITING
30 Jun 2015 09:35:49 (System) []     java.lang.Object.wait(Native Method)
30 Jun 2015 09:35:49 (System) []     java.util.TimerThread.mainLoop(Timer.java:552)
30 Jun 2015 09:35:49 (System) []     java.util.TimerThread.run(Timer.java:505)

2015-06-30 09:43:09 Launching /usr/bin/java -server -Xmx2048m

logging level of p2p replication messages

Hello,

Glad to see dCache is using Github issues!

We always see messages such as:

30 May 2015 22:12:25 (PoolManager) [DCap-basilisk01-<unknown>-931793 PoolMgrSelectReadPool 0000D865FE02EA944154B637C552F1FB4B78] [p2p] CacheException(rc=10018;msg=P2P denied: already too many copies (2))
30 May 2015 22:12:27 (PoolManager) [DCap-basilisk01-<unknown>-931790 PoolMgrSelectReadPool 00007EF6BB403CDC48F0AF0B5C712CEA6649] [p2p] CacheException(rc=10018;msg=P2P denied: already too many copies (2))

We have -max-copies=2 for replication, so I guess this always happens when a file would be replicated to another pool based on popularity, but isn't because the maximum number of copies already exists.
So this isn't really an error message; it should be suppressed and only printed at debug level.
Actually one could debate whether "I was going to replicate because A, but I won't because B" should be logged at all, since doing nothing is exactly the expected behaviour if the max number of copies is reached.

Similarly, we used to get a high volume of messages like
[p2p] CacheException(rc=10018;msg=P2P denied: No pool candidates available/configured/left for p2p or file already everywhere)

This happened when max-copies was still 2, but there were fewer replication pools available.
It seems that debug level would be appropriate for this too.

Do you think these messages can be suppressed during normal operation?
They don't generally provide useful information or indicate a problem.

This is observed in v 2.6; apologies if it is already changed in v 2.10.

debian packages dependencies - missing dependencies

Hi.

The Debian packages are missing at least a dependency on a JRE (or JDK?).
Unless it's already known that it really doesn't work with specific JREs, it should probably use Debian's default:

default-jre | java-runtime

or if the headless version is enough:

default-jre-headless | java-runtime-headless

But if some Java versions are known not to work (as with the newer dCache releases), the following virtual packages may be used as well: java2-runtime, java5-runtime, java6-runtime, java7-runtime, java8-runtime. E.g. for dCache 2.12, which requires Java 8, this would be just java8-runtime (respectively java8-runtime-headless), since at least as of now default-jre is only OpenJDK 8 on most Debian arches.

If there are specific JREs known not to work, then you really must list the ones that do work, e.g.

openjdk-8-jre | gcj-5-jre

or whatever...
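
As a sketch only (whether the headless variant is sufficient, and which exact alternatives dCache actually supports, are assumptions the packagers would have to confirm), the control file entry could then look like:

Depends: default-jre-headless | java-runtime-headless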

Cheers,
Chris.

can someone check whether DH / ECC is still broken in current JDK versions?

Hey.

For some time now you have disabled DH key exchange and ECC by default, with the following reasons given:

DISABLE_BROKEN_DH

Diffie-Hellman is broken in Java 1.7u6 and forward. If DISABLE_BROKEN_DH is
included, dCache will disable all cipher families involving Diffie-Hellman on
those versions of Java. Depending on the client, this may result in less
secure SSL/TLS/GSI connections.

plus

Elliptic Curve ciphers are broken in Java 1.7 on Linux. The problem is that the
JRE successfully negotiates the use of cipher variants not supported by libnss3.
If this option is specified, dCache will disable all cipher families involving
Elliptic Curve ciphers.

As you already noted, this is not the best for security; especially since ECC is disabled as well, there will be neither DH nor ECDH and thus no TLS with perfect forward secrecy.

I did a very quick Google search (without going into much depth) and there were some hints implying that this could have been fixed in current Java versions.

Could someone check up on this?
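
For reference, a quick way to see which (EC)DHE cipher suites the installed JRE enables by default (a sketch; jrunscript ships with the JDK, and the suite names follow standard JSSE naming):

jrunscript -e 'var c = Packages.javax.net.ssl.SSLContext.getDefault().getDefaultSSLParameters().getCipherSuites(); for (var i = 0; i < c.length; i++) print(c[i])' | grep -E 'ECDHE|_DHE_'

This only shows what the JRE offers, not whether the negotiated suites actually work against libnss3-based clients, so an end-to-end transfer test would still be needed.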

Thanks,
Chris.

Admin shell timeout handling broken in 2.12

In dcache-2.12.11, the set timeout function in the admin interface does not work.

[dcache-core-photon.desy.de] (local) admin > set timeout 6000
Timeout = 6000
[dcache-core-photon.desy.de] (local) admin > cd dcache-photon14-03
[dcache-core-photon.desy.de] (dcache-photon14-03) admin > rep ls -l=e
Request to [>dcache-photon14-03@local] timed out.
(After 30 sec)

dcache check-config not working properly on solaris

Solaris 10 x86
dcache 2.6.20
-bash-3.00# java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) Server VM (build 24.51-b03, mixed mode)

dcache check-config returns the following output:

-bash-3.00# dcache check-config
grep: illegal option -- E
Usage: grep -hblcnsviw pattern file . . .
grep: illegal option -- E
Usage: grep -hblcnsviw pattern file . . .
grep: illegal option -- E
Usage: grep -hblcnsviw pattern file . . .
No problems found.

The cause is probably the printpi function in /usr/share/dcache/lib/utils.sh; Solaris' default /usr/bin/grep does not support -E.
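
A minimal portability sketch (assuming the script only needs an ERE-capable grep; the paths below are standard on Solaris 10 but should be verified):

# pick an ERE-capable grep once, then use $GREP_E in place of "grep -E"
if echo x | grep -E x >/dev/null 2>&1; then
    GREP_E="grep -E"
elif [ -x /usr/xpg4/bin/grep ]; then
    GREP_E="/usr/xpg4/bin/grep -E"
else
    GREP_E="egrep"
fi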

Pool logs stack trace on HSM remove errors using script driver

When using the HSM script driver, if the HSM script returns a non-zero error code on remove callouts, the pool logs a stack trace, e.g.

23 maj 2015 12:57:37 (pool_sm) [cleaner PoolRemoveFilesFromHSM 460d212a-b6c2-4ed1-93c4-0c85519f21aa] Remove of hsm://osm/?store=test&group=tape&bfid=0000DC03F5FD35BA4D9FAF7CDBFF18823B73 failed with {}.
diskCacheV111.util.CacheException: HSM script failed (script reported: 2: Need 'hsmBase' ... 
    at diskCacheV111.util.HsmRunSystem.execute(HsmRunSystem.java:49) ~[dcache-core-2.13.0-SNAPSHOT.jar:2.13.0-SNAPSHOT]
    at org.dcache.pool.nearline.script.ScriptNearlineStorage.remove(ScriptNearlineStorage.java:152) ~[dcache-core-2.13.0-SNAPSHOT.jar:2.13.0-SNAPSHOT]
    at org.dcache.pool.nearline.AbstractBlockingNearlineStorage$3.call(AbstractBlockingNearlineStorage.java:131) ~[dcache-core-2.13.0-SNAPSHOT.jar:2.13.0-SNAPSHOT]
    at org.dcache.pool.nearline.AbstractBlockingNearlineStorage$3.call(AbstractBlockingNearlineStorage.java:125) ~[dcache-core-2.13.0-SNAPSHOT.jar:2.13.0-SNAPSHOT]
    at org.dcache.pool.nearline.AbstractBlockingNearlineStorage$Task.processRequest(AbstractBlockingNearlineStorage.java:251) [dcache-core-2.13.0-SNAPSHOT.jar:2.13.0-SNAPSHOT]
    at org.dcache.pool.nearline.AbstractBlockingNearlineStorage$Task.run(AbstractBlockingNearlineStorage.java:235) [dcache-core-2.13.0-SNAPSHOT.jar:2.13.0-SNAPSHOT]
    at org.dcache.util.CDCExecutorServiceDecorator$WrappedRunnable.run(CDCExecutorServiceDecorator.java:153) [dcache-core-2.13.0-SNAPSHOT.jar:2.13.0-SNAPSHOT]
    at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:217) [dcache-core-2.13.0-SNAPSHOT.jar:2.13.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_31]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_31]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31]

migration module marks copy as sticky, even though -tmode=cached

Hi.

When doing e.g.:

\sl 000079F134611A6E451985473D4048A5F71C rep ls -l -a 000079F134611A6E451985473D4048A5F71C
lcg-lrz-dc02_3:
    000079F134611A6E451985473D4048A5F71C <C-------X--L(0)[0]> 2813763 si={atlas:LocalGroupDisk}

and then:

migration copy -pnfsid=000079F134611A6E451985473D4048A5F71C -smode=same -tmode=cached -target=pgroup replicas

From the documentation of -tmode=cached, I'd expect the new replica to be cached only, not sticky.
However:

\sl 000079F134611A6E451985473D4048A5F71C rep ls -l -a 000079F134611A6E451985473D4048A5F71C
lcg-lrz-dc29_2:
    000079F134611A6E451985473D4048A5F71C <C-------X--L(0)[0]> 2813763 si={atlas:LocalGroupDisk}
lcg-lrz-dc02_3:
    000079F134611A6E451985473D4048A5F71C <C-------X--L(0)[0]> 2813763 si={atlas:LocalGroupDisk}

dCache 2.13.4

Chris.

debian package doesn't purge cleanly

Hey.

dCache creates some files in /etc/dcache, e.g.:
/etc/dcache/
└── admin
    ├── host_key
    ├── host_key.pub
    ├── server_key
    ├── server_key.pub
    ├── ssh_host_dsa_key
    └── ssh_host_dsa_key.pub

These are neither conffiles of the package nor cleaned up in the maintainer scripts, so one gets errors and leftovers on package purge:
Removing dcache (2.13.4-1) ...
Purging configuration files for dcache (2.13.4-1) ...
dpkg: warning: while removing dcache, directory '/etc/dcache/admin' not empty so not removed
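
A sketch of what the package's postrm could do on purge (file names taken from the tree above; whether it is safe to delete them unconditionally is an assumption the packagers would need to confirm):

#!/bin/sh
# debian/postrm fragment (sketch)
set -e
if [ "$1" = "purge" ]; then
    rm -f /etc/dcache/admin/host_key /etc/dcache/admin/host_key.pub \
          /etc/dcache/admin/server_key /etc/dcache/admin/server_key.pub \
          /etc/dcache/admin/ssh_host_dsa_key /etc/dcache/admin/ssh_host_dsa_key.pub
    rmdir /etc/dcache/admin 2>/dev/null || true
fi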

Cheers,
Chris.

xrootd internal timeout not logged by door

DESY periodic testing of prometheus failed with the following:

 > /usr/bin/xrdcp -d 1 -f root://prometheus.desy.de:1094/VOs/desy/generated/SET_testfile -
Thu Jun 25 02:37:05 CEST 2015
[2015-06-25 02:37:48.271670 +0200][Error  ][XRootD            ] [prometheus.desy.de:1094] Handling error while processing kXR_open (file: VOs/desy/generated/SET_testfile, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ): [ERROR] Error response.

[0B/0B][100%][==================================================][0B/s]  
Run: [ERROR] Server responded with an error: [3012] Internal timeout

The corresponding access log file shows:

level=INFO ts=2015-06-25T02:37:05.216+0200 event=org.dcache.xrootd.connection.start session=door:xrootd:AAUZTNKcCyA socket.remote=131.169.223.91:34727 socket.local=131.169.5.149:1094
level=INFO ts=2015-06-25T02:37:07.896+0200 event=org.dcache.xrootd.request request=protocol response=ok
level=INFO ts=2015-06-25T02:37:07.983+0200 event=org.dcache.xrootd.login session=door:xrootd:AAUZTNKcCyA user.mapped=nobody
level=INFO ts=2015-06-25T02:37:07.987+0200 event=org.dcache.xrootd.request request=open path=VOs/desy/generated/SET_testfile options=0x450 response=waitresp
level=INFO ts=2015-06-25T02:37:48.270+0200 event=org.dcache.xrootd.request request=open path=VOs/desy/generated/SET_testfile options=0x450 response=5008
level=INFO ts=2015-06-25T02:37:48.279+0200 event=org.dcache.xrootd.connection.end session=door:xrootd:AAUZTNKcCyA

The pool log files show:

25 Jun 2015 02:38:14 (pool2) [door:xrootd:AAUZTNKcCyA xrootd PoolDeliverFile 0000057FFB93E60D4385866E0185F704B75A] Transfer was forcefully killed

However, nothing is logged by the xrootd door for this time, despite there being some internal issue.

dcap with gsi authn, anonymous.operations=readonly

Hello,

According to http://www.dcache.org/manuals/upgrade-2.10/upgrade-2.6-to-2.10.html#Services , a plain dcap service should be replaced with:

dcap.authn.protocol = gsi
dcap.authz.anonymous-operations = READONLY

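For orientation, a sketch of where these properties would sit in the layout file (the domain and door names below are placeholders, not taken from the upgrade guide):

[dcapDomain]
[dcapDomain/dcap]
dcap.authn.protocol = gsi
dcap.authz.anonymous-operations = READONLY
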
I tried this using dcache-2.10.31-1.noarch but dccp failed:
$ dccp dcap://charon03.westgrid.ca:32125//pnfs/westgrid-test.uvic.ca/data/atlas/atlasscratchdisk/user.rptaylor/NTUP_TOP.366711._000015.root.1 /dev/null
Error ( POLLIN) (with data) on control line [3]
Failed to create a control line
Error ( POLLIN) (with data) on control line [3]
Failed to create a control line
Failed open file in the dCache.
Can't open source file : Server rejected "hello"
System error: Input/output error

And there was a stack trace:
05 Jun 2015 14:35:24 (DCap-gsi-charon03) [] Exception (ITE) in secure protocol : {}
java.lang.IllegalArgumentException: String length must be a multiple of four.
        at javatunnel.Base64.base64ToByteArray(Base64.java:140) ~[javatunnel-2.10.31.jar:2.10.31]
        at javatunnel.Base64.base64ToByteArray(Base64.java:120) ~[javatunnel-2.10.31.jar:2.10.31]
        at javatunnel.TunnelConverter.decode(TunnelConverter.java:58) ~[javatunnel-2.10.31.jar:2.10.31]
        at javatunnel.GssTunnel.decode(GssTunnel.java:104) ~[javatunnel-2.10.31.jar:2.10.31]
        at javatunnel.GssTunnel.verify(GssTunnel.java:195) ~[javatunnel-2.10.31.jar:2.10.31]
        at javatunnel.GsiTunnel.verify(GsiTunnel.java:93) ~[javatunnel-2.10.31.jar:2.10.31]
        at javatunnel.TunnelSocket.verify(TunnelSocket.java:170) ~[javatunnel-2.10.31.jar:2.10.31]
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_80]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_80]
        at dmg.protocols.telnet.TelnetStreamEngine.<init>(TelnetStreamEngine.java:111) ~[cells-2.10.31.jar:2.10.31]
        at dmg.cells.services.login.StreamEngineFactory.newStreamEngine(StreamEngineFactory.java:49) ~[cells-2.10.31.jar:2.10.31]
        at dmg.cells.services.login.LoginManager$RunEngineThread.run(LoginManager.java:863) ~[cells-2.10.31.jar:2.10.31]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at dmg.cells.nucleus.CellNucleus$5.run(CellNucleus.java:742) [cells-2.10.31.jar:2.10.31]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
05 Jun 2015 14:35:24 (DCap-gsi-charon03) [] Exception (ITE) in secure protocol : {}
java.lang.IllegalArgumentException: String length must be a multiple of four.
        (followed by the same stack trace as above)

cannot overwrite directory tags using NFS dot commands

master, 2cda82d

[root@otfrid ~]# mkdir -p /pnfs/fs/usr/test/arossi
[root@otfrid ~]# cd /pnfs/fs/usr/test/arossi
[root@otfrid arossi]# echo "NEARLINE" > ".(tag)(AccessLatency)"
[root@otfrid arossi]# cat ".(tag)(AccessLatency)"
NEARLINE
[root@otfrid arossi]# mkdir online
[root@otfrid arossi]# cd online
[root@otfrid online]# echo "ONLINE" > ".(tag)(AccessLatency)"
bash: .(tag)(AccessLatency): Remote I/O error

The reported error:
17 Jun 2015 08:04:45 (NFS-fndcatemp2) [] Unhandled exception:
org.dcache.chimera.IOHimeraFsException: ERROR: duplicate key value violates unique constraint "t_tags_pkey"
Detail: Key (ipnfsid, itagname)=(000015CD7F81D3604BB68B680A28CB98006C, AccessLatency) already exists.
    at org.dcache.chimera.JdbcFs.createTag(JdbcFs.java:1752) ~[chimera-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
    at org.dcache.chimera.JdbcFs.createFile(JdbcFs.java:326) ~[chimera-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
    at org.dcache.nfs.vfs.ChimeraVfs.create(ChimeraVfs.java:97) ~[nfs4j-core-0.10.4.jar:0.10.4]
    at org.dcache.nfs.vfs.VfsCache.create(VfsCache.java:155) ~[nfs4j-core-0.10.4.jar:0.10.4]
    at org.dcache.nfs.vfs.PseudoFs.create(PseudoFs.java:152) ~[nfs4j-core-0.10.4.jar:0.10.4]
    at org.dcache.nfs.v4.OperationOPEN.process(OperationOPEN.java:161) ~[nfs4j-core-0.10.4.jar:0.10.4]
    at org.dcache.nfs.v4.NFSServerV41.NFSPROC4_COMPOUND_4(NFSServerV41.java:149) ~[nfs4j-core-0.10.4.jar:0.10.4]
    at org.dcache.nfs.v4.xdr.nfs4_prot_NFS4_PROGRAM_ServerStub.dispatchOncRpcCall(nfs4_prot_NFS4_PROGRAM_ServerStub.java:48) [nfs4j-core-0.10.4.jar:0.10.4]
    at org.dcache.xdr.RpcDispatcher$1.run(RpcDispatcher.java:85) [oncrpc4j-core-2.3.3.jar:na]
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:591) [grizzly-framework-2.3.21.jar:2.3.21]
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:571) [grizzly-framework-2.3.21.jar:2.3.21]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_25]
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "t_tags_pkey"
   Detail: Key (ipnfsid, itagname)=(000015CD7F81D3604BB68B680A28CB98006C, AccessLatency) already exists.
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270) ~[postgresql-9.4-1201-jdbc41.jar:9.4]
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998) ~[postgresql-9.4-1201-jdbc41.jar:9.4]
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255) ~[postgresql-9.4-1201-jdbc41.jar:9.4]
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:570) ~[postgresql-9.4-1201-jdbc41.jar:9.4]
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:420) ~[postgresql-9.4-1201-jdbc41.jar:9.4]
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:366) ~[postgresql-9.4-1201-jdbc41.jar:9.4]
    at com.zaxxer.hikari.proxy.PreparedStatementProxy.executeUpdate(PreparedStatementProxy.java:61) ~[HikariCP-2.3.7.jar:na]
    at com.zaxxer.hikari.proxy.PreparedStatementJavassistProxy.executeUpdate(PreparedStatementJavassistProxy.java) ~[HikariCP-2.3.7.jar:na]
    at org.dcache.chimera.FsSqlDriver.assignTagToDir(FsSqlDriver.java:1553) ~[chimera-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
    at org.dcache.chimera.FsSqlDriver.createTag(FsSqlDriver.java:1444) ~[chimera-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
    at org.dcache.chimera.JdbcFs.createTag(JdbcFs.java:1747) ~[chimera-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
    ... 11 common frames omitted

bring back update link groups -blocking

Hi.

AFAIU, the -blocking option was dropped from update link groups.
This is a bit unfortunate: when scripting, one may want the results of, e.g., a subsequent ls link groups to be current, and when update link groups doesn't block, this may not be the case, right?

Could the option come back?

Cheers,
Chris.

In web admin interface, max stores and max restores columns are zero instead of the real value

In dCache 2.10, when we look at the web admin interface pool request queues, for instance here:
http://dcmain.grid.sara.nl:2288/webadmin/queueinfo
... we always see zeroes in the 'max stores' and 'max restores' columns, whatever the value in the pool is. The 'act' and 'que' columns seem to have the correct values.

The old style web interface has the same issue:
http://dcmain.grid.sara.nl:2288/queueInfo

In 2.6 this worked as expected.

ACLs are not inherited when uploading with SRM

When uploading with the SRM protocol, the file inherits the ACLs from the upload directory (if any) rather than from the parent directory.

The expected result is that a file being uploaded inherits the ACLs from its parent directory, just like it does when any other protocol is used.

HSM script arguments split where file name contains spaces

When dCache calls our HSM script to put a file whose name contains spaces, the -si argument is split at these spaces into separate arguments.

We have a loop like this to parse arguments:

while [ $# -gt 0 ] ; do
  # log "Processing argument: $1"
  # do some things
  shift 1
done

Then these are the arguments that are received. Please note the file name "onnotest met spaties":

08/13/15-11:55:38.020624330 13167 dmfcp.sh[2471][v14.11g,main,DEBUG]: : Processing argument: 'put'
08/13/15-11:55:38.024871098 13167 dmfcp.sh[2471][v14.11g,main,DEBUG]: : Processing argument: '000029BB27BF0B6249F086806662F7DD1C83'
08/13/15-11:55:38.028737311 13167 dmfcp.sh[2471][v14.11g,main,DEBUG]: : Processing argument: '/space/atlas/tape/pool/data/000029BB27BF0B6249F086806662F7DD1C83'
08/13/15-11:55:38.032784219 13167 dmfcp.sh[2471][v14.11g,main,DEBUG]: : Processing argument: '-si=size=7;new=true;stored=false;sClass=atlas:tape;cClass=-;hsm=osm;accessLatency=NEARLINE;retentionPolicy=CUSTODIAL;path=/pnfs/grid.sara.nl/data/atlas/onnotest'
08/13/15-11:55:38.048670680 13167 dmfcp.sh[2471][v14.11g,main,DEBUG]: : Processing argument: 'met'
08/13/15-11:55:38.052595239 13167 dmfcp.sh[2471][v14.11g,main,DEBUG]: : Processing argument: 'spaties;uid=36487;gid=31306;StoreName=atlas;LinkGroupId=1;flag-c=1:0ad2027a;store=atlas;group=tape;bfid=;'
08/13/15-11:55:38.056554410 13167 dmfcp.sh[2471][v14.11g,main,DEBUG]: : Processing argument: '-hsmBase=/cxfs/TIER1SC/tape_storage'
08/13/15-11:55:38.071549532 13167 dmfcp.sh[2471][v14.11g,main,DEBUG]: : Processing argument: '-pnfs=/pnfs/ftpBase'

This looks like a bug to me: it seems that dCache calls the HSM script without quotes around the arguments. Is that so? Could it be fixed?
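
For illustration only (this is a shell sketch of the symptom, not dCache's actual invocation code; the variable name below is a placeholder): when the storage-info value is expanded unquoted, the shell splits it at the spaces in the path, which reproduces exactly the argument list logged above, whereas a quoted expansion keeps it as a single -si argument:

si='size=7;...;path=/pnfs/grid.sara.nl/data/atlas/onnotest met spaties;uid=36487;...'
printf 'arg: %s\n' "-si=$si"   # quoted: one argument, spaces preserved
printf 'arg: %s\n' -si=$si     # unquoted: split into three arguments at the spaces

Whether dCache builds the command as a single string that is later split on whitespace, or passes a proper argument vector, is something the developers would need to confirm.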

Kind regards
Onno
