Comments (7)
Potentially it's related. The problem I am seeing is that the symlink is pointing to itself. I don't know if this bug is coming from Nextflow or from GCSFuse.
% gcloud storage ls --full gs://[REDACTED]-scratch/nextflow-work/sidb-scratch-test/c1/e8b37f991cc3ab2a636e6af8e663e0/.command.sh
gs://[REDACTED]-scratch/nextflow-work/sidb-scratch-test/c1/e8b37f991cc3ab2a636e6af8e663e0/.command.sh:
Creation Time: 2024-07-17T18:39:45Z
Update Time: 2024-07-17T18:39:45Z
Storage Class Update Time: 2024-07-17T18:39:45Z
Storage Class: STANDARD
Content-Length: 0
Content-Type: text/plain; charset=utf-8
Additional Properties:
{
"gcsfuse_symlink_target": "/mnt/disks/[REDACTED]-scratch/nextflow-work/sidb-scratch-test/c1/e8b37f991cc3ab2a636e6af8e663e0/.command.sh"
}
Hash (CRC32C): AAAAAA==
Hash (MD5): 1B2M2Y8AsgTpgAmY7PhCfg==
ETag: CMaAmsrcrocDEAE=
Generation: 1721241585483846
Metageneration: 1
ACL: []
TOTAL: 1 objects, 0 bytes (0B)
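The listing above shows an object whose gcsfuse_symlink_target is its own path as seen through the mount, assuming the bucket is mounted at /mnt/disks/[REDACTED]-scratch. A minimal sketch of how such a self-referential link could be detected from the mounted side is given below. The helper name is hypothetical and is not part of Nextflow or GCSFuse.

```python
import os


def is_self_referential_symlink(path: str) -> bool:
    """Return True if `path` is a symlink whose target is the link itself.

    Hypothetical helper for illustration only; it compares the link's own
    absolute path with the target stored in the link.
    """
    if not os.path.islink(path):
        return False
    link = os.path.abspath(path)
    target = os.readlink(path)
    # A relative target is interpreted relative to the link's directory.
    if not os.path.isabs(target):
        target = os.path.join(os.path.dirname(link), target)
    return os.path.normpath(target) == link
```

Calling it on the mounted .command.sh path would be expected to return True for the object shown above, since the stored target equals the link's own path.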
from nextflow.
The storage volume is needed precisely so that it can be used as temporary space when using scratch = true. Setting scratch = false will cause the task to work directly in the bucket via gcsfuse, resulting in the error you are experiencing.
Setting scratch = false is only supported when using the Fusion file system; see here for details.
from nextflow.
By storage volume, I meant GCSFuse. The mounting solution is called "Cloud Storage Volume" in Google Cloud Batch.
The work directory bucket is already mounted through GCSFuse, so I assumed that it is OK for Nextflow to work directly in the mounted directory, and I was surprised that it did not work.
I don't see how Fusion and GCSFuse need to be different. They are both Fuse file systems. The documentation for Fusion also says that it enables the work directory to be the mounted cloud directory, foregoing the need for a scratch space.
from nextflow.
They are both Fuse file systems
That's the same as saying all cars are equal because they have four wheels.
from nextflow.
I thought gcsfuse always worked without scratch storage, but now I see that the google batch executor sets scratch to true by default:
I wonder if this error is the same as #4845
from nextflow.
I tried to look into what scratch means in the context of google-batch. It seems like the stage process simply symlinks files from gcsfuse, so that part is actually equivalent to scratchless behavior. Upon exit, the unstage process copies files from the current directory to the gcsfuse paths. I suppose the main difference, then, is that with scratch enabled the output files only start getting written out at the end of the whole process, whereas with scratch disabled they start getting written out as soon as they are closed.
A major difference with Fusion would also be the automatic use of local SSDs for /tmp. And of course, Fusion could be more optimized than gcsfuse.
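To make the stage/unstage behavior described above concrete, here is a rough sketch under stated assumptions. It is not Nextflow's actual implementation; the function name, its parameters, and the use of a Python callable as the task are all hypothetical. It symlinks inputs from the mounted work directory into a local scratch directory, runs the task there, and copies the declared outputs back only after the task has finished.

```python
import os
import shutil
import tempfile


def run_with_scratch(work_dir, input_names, output_names, task):
    """Illustrative only: stage by symlink, run in scratch, unstage by copy."""
    scratch = tempfile.mkdtemp(prefix="scratch-")
    try:
        # Stage: inputs are symlinked, not copied, so reads still go to work_dir.
        for name in input_names:
            os.symlink(os.path.join(work_dir, name), os.path.join(scratch, name))
        # Run the task with the scratch directory as its working location.
        task(scratch)
        # Unstage: outputs reach work_dir only once the task has completed.
        for name in output_names:
            shutil.copy(os.path.join(scratch, name), os.path.join(work_dir, name))
    finally:
        # Removes the scratch directory and its symlinks, not the link targets.
        shutil.rmtree(scratch)
```

In this sketch the only write traffic to the mounted directory happens in the final copy loop, which matches the difference noted above: without scratch, each output would be written through the mount as the task closes it.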
from nextflow.
I thought gcsfuse always worked without scratch storage, but now I see that the google batch executor sets scratch to true by default
@bentsherman Sent #5256 for the error I encountered. I included some commentary as to what it means to have a scratch dir vs not when using Google Batch.
from nextflow.