I am trying to run the implicit refinement in parallel on the cluster using the following script:
#!/bin/bash
# Batch script for mpirun job on cbio cluster.
#
#
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=24:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify queue
#PBS -q gpu
#
# nodes: number of 8-core nodes
# ppn: how many cores per node to use (1 through 8)
# (you are always charged for the entire node)
#PBS -l nodes=4:ppn=4:gpus=4:shared
#
# export all my environment variables to the job
##PBS -V
#
# job name (default = name of script file)
#PBS -N implicit-refinement
#
# specify email for notifications
#PBS -M [email protected]
cd /cbio/jclab/home/albaness/ensembler/BRAF
module load cuda/6.5
build_mpirun_configfile --mpitype conda ensembler refine_implicit
mpirun -configfile configfile
The job produces the following output:
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4KSQ_B in implicit solvent for 100.0 ps (MPI rank: 2, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4MNE_G in implicit solvent for 100.0 ps (MPI rank: 7, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4MNE_C in implicit solvent for 100.0 ps (MPI rank: 5, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4MBJ_A in implicit solvent for 100.0 ps (MPI rank: 3, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4KSQ_A in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 0)
-------------------------------------------------------------------------
/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py:300: UserWarning: = ERROR start: MPI rank 7 hostname gpu-2-15.local gpuid 0 =
Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
Traceback (most recent call last):
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 288, in refine_implicit_md
simulate_implicit_md()
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 110, in simulate_implicit_md
modeller.addHydrogens(forcefield, pH=ph, variants=reference_variants)
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/app/modeller.py", line 853, in addHydrogens
context = Context(system, VerletIntegrator(0.0))
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 15050, in __init__
this = _openmm.new_Context(*args)
Exception: Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
= ERROR end: MPI rank 7 hostname gpu-2-15.local gpuid 0
mpistate.rank, socket.gethostname(), gpuid, e, trbk
/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py:300: UserWarning: = ERROR start: MPI rank 5 hostname gpu-2-15.local gpuid 0 =
Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
Traceback (most recent call last):
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 288, in refine_implicit_md
simulate_implicit_md()
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 110, in simulate_implicit_md
modeller.addHydrogens(forcefield, pH=ph, variants=reference_variants)
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/app/modeller.py", line 853, in addHydrogens
context = Context(system, VerletIntegrator(0.0))
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 15050, in __init__
this = _openmm.new_Context(*args)
Exception: Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
= ERROR end: MPI rank 5 hostname gpu-2-15.local gpuid 0
mpistate.rank, socket.gethostname(), gpuid, e, trbk
/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py:300: UserWarning: = ERROR start: MPI rank 2 hostname gpu-2-12.local gpuid 0 =
Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
Traceback (most recent call last):
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 288, in refine_implicit_md
simulate_implicit_md()
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 110, in simulate_implicit_md
modeller.addHydrogens(forcefield, pH=ph, variants=reference_variants)
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/app/modeller.py", line 853, in addHydrogens
context = Context(system, VerletIntegrator(0.0))
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 15050, in __init__
this = _openmm.new_Context(*args)
Exception: Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
= ERROR end: MPI rank 2 hostname gpu-2-12.local gpuid 0
mpistate.rank, socket.gethostname(), gpuid, e, trbk
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_5CT7_B in implicit solvent for 100.0 ps (MPI rank: 2, GPU ID: 0)
-------------------------------------------------------------------------
/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py:300: UserWarning: = ERROR start: MPI rank 1 hostname gpu-2-12.local gpuid 0 =
Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
Traceback (most recent call last):
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 288, in refine_implicit_md
simulate_implicit_md()
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 110, in simulate_implicit_md
modeller.addHydrogens(forcefield, pH=ph, variants=reference_variants)
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/app/modeller.py", line 853, in addHydrogens
context = Context(system, VerletIntegrator(0.0))
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 15050, in __init__
this = _openmm.new_Context(*args)
Exception: Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
= ERROR end: MPI rank 1 hostname gpu-2-12.local gpuid 0
mpistate.rank, socket.gethostname(), gpuid, e, trbk
/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py:300: UserWarning: = ERROR start: MPI rank 3 hostname gpu-2-12.local gpuid 0 =
Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
Traceback (most recent call last):
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 288, in refine_implicit_md
simulate_implicit_md()
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 110, in simulate_implicit_md
modeller.addHydrogens(forcefield, pH=ph, variants=reference_variants)
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/app/modeller.py", line 853, in addHydrogens
context = Context(system, VerletIntegrator(0.0))
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 15050, in __init__
this = _openmm.new_Context(*args)
Exception: Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
= ERROR end: MPI rank 3 hostname gpu-2-12.local gpuid 0
mpistate.rank, socket.gethostname(), gpuid, e, trbk
/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py:300: UserWarning: = ERROR start: MPI rank 2 hostname gpu-2-12.local gpuid 0 =
Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
Traceback (most recent call last):
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 288, in refine_implicit_md
simulate_implicit_md()
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 110, in simulate_implicit_md
modeller.addHydrogens(forcefield, pH=ph, variants=reference_variants)
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/app/modeller.py", line 857, in addHydrogens
LocalEnergyMinimizer.minimize(context, 1.0, 50)
File "/cbio/jclab/home/albaness/miniconda/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 12223, in minimize
return _openmm.LocalEnergyMinimizer_minimize(*args)
Exception: Error launching CUDA compiler: 256
<built-in>:0:0: fatal error: when writing output to : Bad file descriptor
compilation terminated.
= ERROR end: MPI rank 2 hostname gpu-2-12.local gpuid 0
mpistate.rank, socket.gethostname(), gpuid, e, trbk
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_3II5_A in implicit solvent for 100.0 ps (MPI rank: 12, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_3D4Q_A in implicit solvent for 100.0 ps (MPI rank: 8, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_1UWH_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_2FB8_A in implicit solvent for 100.0 ps (MPI rank: 4, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_3PPJ_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_3PRF_A in implicit solvent for 100.0 ps (MPI rank: 4, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_3PSB_A in implicit solvent for 100.0 ps (MPI rank: 8, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_3Q4C_A in implicit solvent for 100.0 ps (MPI rank: 12, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_3SKC_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_3TV6_A in implicit solvent for 100.0 ps (MPI rank: 4, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4E26_A in implicit solvent for 100.0 ps (MPI rank: 8, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4EHE_A in implicit solvent for 100.0 ps (MPI rank: 12, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4FC0_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4G9C_A in implicit solvent for 100.0 ps (MPI rank: 4, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4KSP_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4MBJ_B in implicit solvent for 100.0 ps (MPI rank: 4, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4H58_A in implicit solvent for 100.0 ps (MPI rank: 8, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4JVG_C in implicit solvent for 100.0 ps (MPI rank: 12, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating BRAF_HUMAN_D0 => BRAF_HUMAN_D0_4PP7_B in implicit solvent for 100.0 ps (MPI rank: 12, GPU ID: 0)
-------------------------------------------------------------------------
Done.
Compute mode is already set to DEFAULT for GPU 0000:84:00.0.
All done.
Compute mode is already set to DEFAULT for GPU 0000:83:00.0.
All done.
Compute mode is already set to DEFAULT for GPU 0000:04:00.0.
All done.
Compute mode is already set to DEFAULT for GPU 0000:03:00.0.
All done.
Is this a problem with my clusterutils setup? The CUDA compiler errors seem to occur on only some of the nodes (gpu-2-12 and gpu-2-15 in the output above).
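To check whether the failures really cluster on particular nodes, the error markers in the log can be tallied by hostname and MPI rank. The following is a minimal sketch, not part of ensembler or clusterutils: the function names and the `SAMPLE_LOG` string are hypothetical, and the regular expression assumes the `= ERROR start: MPI rank ... hostname ... gpuid ... =` line format exactly as it appears in the output above.

```python
import re
from collections import Counter

# Pattern for the "ERROR start" marker as it appears in the job output above.
# Assumption: the marker format is stable across ensembler versions.
ERROR_START = re.compile(
    r"= ERROR start: MPI rank (?P<rank>\d+) "
    r"hostname (?P<host>\S+) gpuid (?P<gpuid>\d+) ="
)


def tally_errors_by_host(log_text):
    """Count "ERROR start" markers per hostname."""
    counts = Counter()
    for match in ERROR_START.finditer(log_text):
        counts[match.group("host")] += 1
    return counts


def failing_ranks_by_host(log_text):
    """Map each hostname to the sorted list of distinct MPI ranks that failed."""
    ranks = {}
    for match in ERROR_START.finditer(log_text):
        ranks.setdefault(match.group("host"), set()).add(int(match.group("rank")))
    return {host: sorted(found) for host, found in ranks.items()}


# Hypothetical sample: three marker lines shortened from the output above.
SAMPLE_LOG = (
    "refinement.py:300: UserWarning: = ERROR start: MPI rank 7 hostname gpu-2-15.local gpuid 0 =\n"
    "refinement.py:300: UserWarning: = ERROR start: MPI rank 5 hostname gpu-2-15.local gpuid 0 =\n"
    "refinement.py:300: UserWarning: = ERROR start: MPI rank 2 hostname gpu-2-12.local gpuid 0 =\n"
)

for host, count in sorted(tally_errors_by_host(SAMPLE_LOG).items()):
    print(host, count, failing_ranks_by_host(SAMPLE_LOG)[host])
```

Running the same functions on the full job log (read from the PBS output file) would show which hostnames and which ranks report errors, which may help distinguish a node-specific CUDA problem from one tied to particular ranks on each node.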