
Comments (7)

siddharthab commented on September 18, 2024

Potentially it's related. The problem I am seeing is that the symlink is pointing to itself. I don't know if this bug is coming from Nextflow or from GCSFuse.

% gcloud storage ls --full gs://[REDACTED]-scratch/nextflow-work/sidb-scratch-test/c1/e8b37f991cc3ab2a636e6af8e663e0/.command.sh
gs://[REDACTED]-scratch/nextflow-work/sidb-scratch-test/c1/e8b37f991cc3ab2a636e6af8e663e0/.command.sh:
  Creation Time:               2024-07-17T18:39:45Z
  Update Time:                 2024-07-17T18:39:45Z
  Storage Class Update Time:   2024-07-17T18:39:45Z
  Storage Class:               STANDARD
  Content-Length:              0
  Content-Type:                text/plain; charset=utf-8
  Additional Properties:
  {
    "gcsfuse_symlink_target": "/mnt/disks/[REDACTED]-scratch/nextflow-work/sidb-scratch-test/c1/e8b37f991cc3ab2a636e6af8e663e0/.command.sh"
  }
  Hash (CRC32C):               AAAAAA==
  Hash (MD5):                  1B2M2Y8AsgTpgAmY7PhCfg==
  ETag:                        CMaAmsrcrocDEAE=
  Generation:                  1721241585483846
  Metageneration:              1
  ACL:                         []
TOTAL: 1 objects, 0 bytes (0B)
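
The listing above can also be checked mechanically. The following is a minimal Python sketch, assuming that `gs://BUCKET` is mounted at `/mnt/disks/BUCKET` (as the paths in the listing suggest). The function name `is_self_referential` and the `mount_root` parameter are hypothetical and are not part of Nextflow or gcsfuse.

```python
import posixpath


def is_self_referential(object_url: str, symlink_target: str,
                        mount_root: str = "/mnt/disks") -> bool:
    """Return True if a gcsfuse symlink object points back at itself.

    Assumption: gs://BUCKET is mounted at <mount_root>/BUCKET, which
    matches the paths in the listing but may differ in other setups.
    """
    if not object_url.startswith("gs://"):
        raise ValueError("expected a gs:// URL")
    bucket, _, key = object_url[len("gs://"):].partition("/")
    # Path at which the object itself is visible through the mount.
    mounted_path = posixpath.join(mount_root, bucket, key)
    return posixpath.normpath(symlink_target) == posixpath.normpath(mounted_path)
```

Passing the object URL and the value of `gcsfuse_symlink_target` from the listing would return `True` under this assumption, which is the self-pointing case described above.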

from nextflow.

pditommaso commented on September 18, 2024

The storage volume is needed precisely so that it can be used as temporary space when scratch = true. Setting scratch = false causes the task to work directly in the bucket via gcsfuse, which results in the error you are experiencing.

Setting scratch = false is only supported when using the Fusion file system; see here for details.


siddharthab commented on September 18, 2024

By storage volume, I meant GCSFuse. The mounting solution is called "Cloud Storage Volume" in Google Cloud Batch.

The work directory bucket is already mounted through GCSFuse, so I assumed it was OK for Nextflow to work directly in the mounted directory, and I was surprised that it did not work.

I don't see how Fusion and GCSFuse need to be different. They are both Fuse file systems. The documentation for Fusion also says that it enables the work directory to be the mounted cloud directory, forgoing the need for a scratch space.


pditommaso commented on September 18, 2024

> They are both Fuse file systems

That's the same as saying all cars are equal because they have four wheels.


bentsherman commented on September 18, 2024

I thought gcsfuse always worked without scratch storage, but now I see that the google batch executor sets scratch to true by default:

// enable use of local scratch dir
if( scratch==null )
    scratch = true

I wonder if this error is the same as #4845


siddharthab commented on September 18, 2024

I tried to look into what scratch means in the context of google-batch. It seems like the stage process simply symlinks files from gcsfuse, so that's actually equivalent to scratchless behavior. Upon exit, the unstage process will copy files from the current directory to the gcsfuse paths. I suppose the main difference, then, is that with scratch enabled, all output files start getting written out at the end of the whole process, whereas with scratch disabled, the output files start getting written out as soon as they are closed.
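
To make the stage/unstage description above concrete, here is a rough Python model of it. This is only an illustration of the behavior as described in this comment, not Nextflow's actual implementation: `run_with_scratch` is a hypothetical name, and `work_dir` stands in for the gcsfuse-mounted task directory.

```python
import os
import shutil
import tempfile


def run_with_scratch(work_dir: str, inputs: list[str], task) -> None:
    """Model of scratch = true: symlink inputs, run locally, copy outputs at exit."""
    scratch_dir = tempfile.mkdtemp(prefix="nxf-scratch-")
    try:
        # Stage: inputs are symlinked rather than copied, so reads still
        # go to work_dir (the mounted bucket in the real setup).
        for name in inputs:
            os.symlink(os.path.join(work_dir, name),
                       os.path.join(scratch_dir, name))
        # The task writes its outputs into the local scratch directory.
        task(scratch_dir)
        # Unstage: only now are regular output files copied to work_dir.
        for name in os.listdir(scratch_dir):
            src = os.path.join(scratch_dir, name)
            if os.path.isfile(src) and not os.path.islink(src):
                shutil.copy(src, os.path.join(work_dir, name))
    finally:
        # Removes the scratch directory and its symlinks, not their targets.
        shutil.rmtree(scratch_dir)
```

In this model, nothing appears in `work_dir` until `task` has returned. Without scratch, the task would write into `work_dir` directly and each file would be uploaded when it is closed.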

A major difference with Fusion would also be the automatic use of local SSDs for /tmp. And of course, Fusion could be more optimized than gcsfuse.


siddharthab commented on September 18, 2024

> I thought gcsfuse always worked without scratch storage, but now I see that the google batch executor sets scratch to true by default

@bentsherman Sent #5256 for the error I encountered. I included some commentary as to what it means to have a scratch dir vs not when using Google Batch.

