Comments (2)
Need to figure out shared file storage for all the learner pods (required by many distributed learning methods) and a way to store the model results for our users.
So for this shared file storage, would the PVC work satisfy it? Or do you explicitly need NFS under the covers?
from ffdl.
Many distributed learning methods require shared file storage to sync with the other workers. Currently all our workers mount the same input and result bucket, so that requirement is satisfied. However, with VCK, which pulls the data to a HostPath, each K8s node has its own path for the input and result directory. So we need to figure out a shared place to store the result files and any other files that need to be shared among all the workers.
With the PVC work, this could definitely be solved for the NFS use case, because the storage is mounted via a PV. However, for S3 or Pachyderm using VCK we still have the same issue, since VCK technically creates replicas in the HostPath for the files (which can come from multiple sources) that you want to cache.
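To make the NFS case above concrete, here is a minimal sketch of a ReadWriteMany PersistentVolume backed by NFS, plus the PersistentVolumeClaim that all learner pods would mount. All names, the server address, the export path, and the sizes are hypothetical placeholders, not values from FfDL itself:

```yaml
# Hypothetical NFS-backed PV; server and path are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: learner-shared-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany        # every learner pod can read and write the same volume
  nfs:
    server: 10.0.0.10      # placeholder NFS server address
    path: /exports/results # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: learner-shared-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
```

Because every learner pod mounts the same claim, result files land on shared storage no matter which node a pod is scheduled on; with VCK's per-node HostPath replicas that property is lost, which is exactly the issue described above.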
Related Issues (20)
- FfDL v0.1.1 model training error HOT 4
- FfDL CLI output is not properly machine parsable
- [Documentation] Update IBM Cloud CLI instructions in /etc/converter/train-deploy-wml.md
- dind-port-forward.sh -> invalid resource name ? HOT 5
- Grafana charts shows no data points HOT 1
- Unable to mount volumes for pod Learner HOT 8
- Learner pod stuck at training step 100 using custom image with TF Object Detection HOT 5
- FfDL/demos/fashion-mnist-adversarial/README.md references internal repository HOT 1
- how to use pytorch and caffe built by ourselves? HOT 2
- kubectl get pods :lcm ContainerCreating,prometheus trainer and trainingdata STATUS CrashLoopBackOff HOT 26
- tiller-deploy is in status CrashLoopBackOff HOT 2
- Confused about manifest.yml HOT 2
- learner pod failed HOT 19
- caffe training speed is very slow HOT 4
- pytorch training issue: insufficient shared memory HOT 2
- distributed training questions HOT 2
- why pytorch distributed training on two servers is slower than training on one server HOT 21
- .travis.yml: The 'sudo' tag is now deprecated in Travis CI
- ssh permission denied when deploying FfDL on public cloud
- fail to install