
recup's Introduction

Metadata Collection from Dask Workloads:

Dask Workflow Example:

The image processing notebook from the Dask examples contains an image processing pipeline. In this version, we use Dask's distributed scheduler rather than the default scheduler of plain Dask. The notebook has been edited for the purpose of this work: Metadata Collection from Dask Distributed Workflows.

The screenshot video in this repo shows the dashboard of a Dask cluster running the image processing notebook. It shows how the workflow progresses, how the tasks are distributed over threads, the occupancy of the workers, and so on.

More documentation on performance profiling and the Dask dashboard is available here.

Dask Metadata Map

A typical Dask workload exposes many kinds of metadata. They can be organized into layers (job level, Dask configuration, task graph, task, runtime …), as presented in the following map.

Fig. Metadata Map in Dask Workloads.

The map has two dimensions: time, represented by steps (1-5), and abstraction level, represented by columns (A-E).

Columns [A-B-C] show the evolution of a typical Dask workflow: which metadata is available at each step, and where it is located.

  1. Step 1: Configuring and launching the Dask cluster
  2. Step 2: Task creation and submission, which happen at the client level
    • Usually, the client is a Python script describing the workflow.
    • The metadata can be retrieved from the script; it is static and lazy (tasks are created, then submitted to the cluster to be run later). NB: The task graph may be optimized by Dask; see Phases of a computation.
  3. Step 3: The population of internal Dask structures:
    • There are several classes in the scheduler, and each of them keeps the state of a given entity (TaskState, WorkerState, ClientState …)
    • Here we have both static and dynamic data:
      • Static: tasks, and their dependencies
      • Dynamic: the task transitions and their evolution (this happens at runtime):
        • where a task is running
        • the story of a task (its transitions)
  4. Steps 4-5: Task execution:
    • Step 4: Task reception from the scheduler
      • Metadata available from the worker data structures:
        • TaskState to track task progress
        • WorkerState …
    • Step 5: Metadata from Darshan/Yappi reports:
      • Yappi reports (which take asyncio coroutines into account)
      • Darshan reports
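Steps 2 and 3 distinguish static metadata (tasks and their dependencies, known before execution) from dynamic metadata (task transitions, known only at runtime). The dependency-free sketch below illustrates both halves under stated assumptions. For the static part, it walks a graph written in Dask's low-level graph format: a dict mapping keys to tasks, where a task is a tuple whose first element is a callable and whose remaining elements may reference other keys. For the dynamic part, it assumes a simplified, hypothetical transition record `(key, start, finish, timestamp)`; the exact format of Dask's own scheduler logs differs and varies between versions. The sketch does not connect to a cluster; in a real deployment, the equivalent structures would have to be read from the scheduler (for example through `Client.run_on_scheduler`).

```python
from collections import defaultdict
from operator import add, mul


def find_dependencies(graph):
    """Static metadata: map each key of a Dask-style graph to its dependencies.

    A task is a tuple whose first element is callable; any argument that is
    itself a key of the graph counts as a dependency. Lists and nested tasks
    are searched recursively.
    """
    def walk(arg, found):
        if isinstance(arg, tuple) and arg and callable(arg[0]):
            for sub in arg[1:]:          # nested task: inspect its arguments
                walk(sub, found)
        elif isinstance(arg, list):
            for sub in arg:
                walk(sub, found)
        else:
            try:
                if arg in graph:         # plain reference to another key
                    found.add(arg)
            except TypeError:            # unhashable literals cannot be keys
                pass
        return found

    return {key: walk(task, set()) for key, task in graph.items()}


def task_stories(transitions):
    """Dynamic metadata: group transition records into per-task stories.

    ASSUMPTION: each record is a simplified (key, start, finish, timestamp)
    tuple chosen for illustration, not Dask's actual log format.
    Returns {key: [(start, finish, timestamp), ...]} ordered by time.
    """
    stories = defaultdict(list)
    for key, start, finish, timestamp in transitions:
        stories[key].append((start, finish, timestamp))
    return {key: sorted(events, key=lambda e: e[2])
            for key, events in stories.items()}


if __name__ == "__main__":
    # Static: a tiny graph in Dask's low-level format computing (1 + 2) * 10.
    graph = {"x": 1, "y": (add, "x", 2), "z": (mul, "y", 10)}
    print(find_dependencies(graph))

    # Dynamic: hypothetical transition records, deliberately out of order.
    log = [
        ("y", "waiting", "processing", 0.3),
        ("x", "released", "waiting", 0.0),
        ("y", "released", "waiting", 0.1),
        ("x", "waiting", "processing", 0.1),
        ("x", "processing", "memory", 0.2),
        ("y", "processing", "memory", 0.4),
    ]
    stories = task_stories(log)
    # The finish state of the latest transition is the task's current state.
    print({key: events[-1][1] for key, events in stories.items()})
```

Keeping the two functions separate mirrors the static/dynamic split: the dependency map can be computed once at submission time, while the stories grow as the workflow runs.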

Columns [C-D-E] represent a categorization of the metadata and its location:

  • E: Higher-level categories (Job level, Workflow system level, System level)
  • D: Lower-level categories (job, config, client, scheduler, worker, process, thread, coroutine)
  • C: Locations
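The two dimensions of the map can be combined in a small data structure: each metadata item is tagged with a step (1-5) and a lower-level category from column D, and a parent table rolls the category up to a higher-level category from column E. This is a dependency-free, hypothetical sketch: the category names come from the lists above, but the D-to-E assignment in `PARENT` is an assumed example, not read from the map, and should be adjusted to match the figure. The example items follow the steps listed earlier; tagging the Yappi report as `coroutine` is likewise an assumption.

```python
from collections import defaultdict

# Column E (higher level) and column D (lower level), as listed above.
HIGHER = ("Job level", "Workflow system level", "System level")
LOWER = ("job", "config", "client", "scheduler", "worker",
         "process", "thread", "coroutine")

# ASSUMPTION: an illustrative D -> E assignment, not read from the map.
PARENT = {
    "job": "Job level",
    "config": "Workflow system level",
    "client": "Workflow system level",
    "scheduler": "Workflow system level",
    "worker": "Workflow system level",
    "process": "System level",
    "thread": "System level",
    "coroutine": "System level",
}
# Sanity check: every D category is mapped, and only to a listed E category.
assert set(PARENT) == set(LOWER) and set(PARENT.values()) <= set(HIGHER)


def place(items, parent=PARENT):
    """Group item names by (step, higher-level category).

    `items` holds (name, step, lower_category) triples with step in 1-5.
    An unknown lower-level category raises KeyError.
    """
    grid = defaultdict(list)
    for name, step, lower in items:
        if not 1 <= step <= 5:
            raise ValueError(f"step must be between 1 and 5, got {step}")
        grid[(step, parent[lower])].append(name)
    return dict(grid)


if __name__ == "__main__":
    # Illustrative items, one per step of the workflow described above.
    items = [
        ("cluster configuration", 1, "config"),
        ("task graph", 2, "client"),
        ("scheduler TaskState transitions", 3, "scheduler"),
        ("worker TaskState", 4, "worker"),
        ("Yappi report", 5, "coroutine"),
    ]
    for cell, names in sorted(place(items).items()):
        print(cell, names)
```

Passing the parent table as a parameter keeps the grouping logic independent of any particular reading of the figure.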

recup's People

Contributors

gueroudjiamal
