temp's People

Contributors

ytgui

temp's Issues

Glow 3 - Implementation

Placeholder implementation

include/glow/Graph/Nodes.h

// Placeholder nodes are unbound-storage. The content tensors are attached to
// this node at runtime. Placeholders are used as inputs and output nodes to
// the network.
class Placeholder : public Storage {
  // Specifies if the placeholder is trainable.
  bool isTrainable_;
};
  • Node
    • DivNode
    • LogNode
    • Storage
      • Constant
      • Placeholder
    • XXNode
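  • As the hierarchy shows, Placeholder is a special kind of Storage node.
  • To some extent Glow resembles TensorFlow: a number of ops (Nodes) make up a Graph, and all later operations are carried out on that Graph.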
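
The hierarchy above can be mimicked in a few lines of Python. This is an illustrative sketch only, not Glow's API: the class names mirror the Glow nodes, while `Graph`, `add`, and `placeholders` are hypothetical helpers introduced here.

```python
class Node:
    """Base class for every operation in the graph."""
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)


class DivNode(Node):
    """Element-wise division of two inputs (no computation in this sketch)."""


class Storage(Node):
    """A node that holds data instead of computing it."""


class Constant(Storage):
    """Storage whose payload is fixed when the graph is built."""
    def __init__(self, name, payload):
        super().__init__(name)
        self.payload = payload


class Placeholder(Storage):
    """Storage whose tensor is attached only at run time."""
    def __init__(self, name, is_trainable=False):
        super().__init__(name)
        self.is_trainable = is_trainable


class Graph:
    """Hypothetical container: a graph is just a list of nodes."""
    def __init__(self):
        self.nodes = []

    def add(self, node):
        self.nodes.append(node)
        return node

    def placeholders(self):
        return [n for n in self.nodes if isinstance(n, Placeholder)]


graph = Graph()
x = graph.add(Placeholder("input"))
two = graph.add(Constant("two", 2.0))
half = graph.add(DivNode("half", inputs=[x, two]))
```

Because Placeholder and Constant share the Storage base class, code that walks the graph can treat both as data sources and still tell them apart with `isinstance`.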

Tensor implementation

include/glow/Base/Tensor.h

// A class that represents a contiguous n-dimensional array (a tensor).
class Tensor final {
private:
  // A pointer to the tensor data.
  char *data_{nullptr};

  // The type of the tensor.
  Type type_;
};
  • Tensor::getType() returns the Type instance of the current Tensor
const Type &getType() const { return type_; }
  • Tensor::getElementType() returns the element kind of the current Tensor, e.g. FloatTy, Int64ITy
ElemKind getElementType() const { return type_.getElementType(); }
  • Bounds checking
// \returns True if the coordinate is within the array.
bool isInBounds(llvm::ArrayRef<size_t> indices) const {
  assert(type_.numSizes_ == indices.size() && "Invalid number of indices");
  for (size_t i = 0u, e = indices.size(); i < e; i++) {
    if (indices[i] >= type_.sizes_[i]) {
      return false;
    }
  }
  return true;
}
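
The accessors and the bounds check above can be sketched in Python as follows. The names `Type`, `Tensor`, `get_type`, `get_element_type`, and `is_in_bounds` are Python stand-ins chosen for this sketch, not Glow bindings, and the element kind is a plain string rather than an enum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Type:
    """Element kind plus the size of every dimension."""
    elem_kind: str   # e.g. "FloatTy"
    sizes: tuple     # e.g. (2, 3)


class Tensor:
    def __init__(self, elem_kind, sizes):
        self._type = Type(elem_kind, tuple(sizes))

    def get_type(self):
        return self._type

    def get_element_type(self):
        return self._type.elem_kind

    def is_in_bounds(self, indices):
        """Return True if every index lies in [0, size) of its dimension."""
        if len(indices) != len(self._type.sizes):
            raise ValueError("Invalid number of indices")
        return all(0 <= i < s for i, s in zip(indices, self._type.sizes))


t = Tensor("FloatTy", [2, 3])
```

With this tensor, `t.is_in_bounds([1, 2])` is true while `t.is_in_bounds([2, 0])` is false, because the first dimension only has indices 0 and 1.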
  • Initializing the Tensor dimensions

Glow 2 - Framework

Folder Structure

  • glow
    • Base: IO, Image, Tensor, Train, Type ...
    • Backends: Backend, BackendUtils, CompiledFunction, LayoutConverter
    • CodeGen: MemoryAllocator
    • Converter: FunctionConverter, TypeAToTypeBFunctionConverter
    • ExecutionEngine: ExecutionEngine
    • Graph: Context, Graph, Node, Grad, Hook, NodeValue
    • IR: GraphScheduler, IR, IRGen, IRUtils, Instrs
    • Importer: ProtobufLoader, Caffe2ModelLoader, ONNXModelLoader, ONNXIFIModelLoader
    • Optimizer: GraphOptimizer, IROptimizer, Quantization, Lower
    • Quantization: Quantization, Serialization
    • Support: ThreadPool, Random
    • Testing

torch.nn.ConvTranspose2d

https://zhuanlan.zhihu.com/p/48501100

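  • Deconvolution, more precisely called transposed convolution, reverses the shape transformation of a convolution. Its main difference from fixed unpooling or upsampling is that it has parameters to be learned, just like an ordinary convolution. In theory it can reproduce unpooling and upsampling, provided the kernel parameters are set appropriately.
  • Deconvolution only restores the size of the matrix X; it cannot restore the value of each element of X.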
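
The point that only the size is restored, not the values, can be checked with a 1-D sketch in plain Python. The helper names `conv1d` and `conv_transpose1d` are defined here for illustration (no padding, no bias, single channel); they are not the torch.nn API.

```python
def conv1d(x, kernel, stride=1):
    """Valid (no padding) 1-D cross-correlation."""
    k = len(kernel)
    n_out = (len(x) - k) // stride + 1
    return [sum(x[i * stride + j] * kernel[j] for j in range(k))
            for i in range(n_out)]


def conv_transpose1d(y, kernel, stride=1):
    """1-D transposed convolution: each input value scatters a scaled kernel."""
    k = len(kernel)
    out = [0.0] * ((len(y) - 1) * stride + k)
    for i, value in enumerate(y):
        for j in range(k):
            out[i * stride + j] += value * kernel[j]
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]

y = conv1d(x, kernel)                  # length 5 -> 3
x_back = conv_transpose1d(y, kernel)   # length 3 -> 5, values differ from x
```

Here `y` is `[-2.0, -2.0, -2.0]` and `x_back` is `[-2.0, -2.0, 0.0, 2.0, 2.0]`: the length matches `x` again, but the original values are gone.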

torch.nn.LeakyReLU

  • Leaky version of a Rectified Linear Unit.
  • Leaky ReLUs allow a small, non-zero gradient when the unit is not active.

Parametric ReLUs take this idea further by making the coefficient of leakage into a parameter that is learned along with the other neural network parameters.
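
A minimal sketch of both ideas in plain Python, for a single scalar input. The function names are chosen for this sketch; 0.01 is the default `negative_slope` of torch.nn.LeakyReLU, and the gradient at exactly zero is taken to be the slope by convention.

```python
def leaky_relu(x, negative_slope=0.01):
    """Identity for positive x, a small linear slope otherwise."""
    return x if x > 0 else negative_slope * x


def leaky_relu_grad(x, negative_slope=0.01):
    """Derivative with respect to x: never zero, even for inactive units."""
    return 1.0 if x > 0 else negative_slope


def prelu_slope_grad(x):
    """Derivative of PReLU with respect to its learned slope."""
    return 0.0 if x > 0 else x
```

`prelu_slope_grad` is what makes the slope trainable: for negative inputs the output depends linearly on the slope, so the slope receives a gradient equal to the input.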

weight init

    # Assumes: import numpy as np; import torch; import torch.nn as nn
    def weight_init(self):
        def trig_init(out_channels, in_channels, size_kernel=3):
            n_kernels = out_channels
            # Arrange the kernels on an (h_array, w_array) grid; only the
            # channel counts listed in this table are supported.
            table = {16: (4, 4), 32: (4, 8), 64: (8, 8), 96: (8, 12),  128: (8, 16), 192: (12, 16),
                     256: (16, 16), 288: (16, 18), 384: (16, 24), 512: (16, 32), 1024: (32, 32)}
            h_array, w_array = table[n_kernels]
            # Sample cos(r) on a grid large enough to hold every kernel patch.
            xx = np.linspace(-np.pi, np.pi, w_array * size_kernel, dtype=np.float32)
            yy = np.linspace(-np.pi, np.pi, h_array * size_kernel, dtype=np.float32)
            xx, yy = np.meshgrid(xx, yy)
            zz = np.cos(np.sqrt(xx ** 2 + yy ** 2))
            # Cut the surface into size_kernel x size_kernel patches, one per
            # output channel, and broadcast each patch over all input channels.
            param = np.zeros(shape=[n_kernels, in_channels, size_kernel, size_kernel], dtype=np.float32)
            for y in range(h_array):
                for x in range(w_array):
                    i = y * w_array + x
                    left, right = x * size_kernel, (x + 1) * size_kernel
                    top, bottom = y * size_kernel, (y + 1) * size_kernel
                    param[i, :, :, :] = zz[top:bottom, left:right]
            # Scale down so the initial weights stay small.
            return param * 1e-3

        # Replace the weights of every square-kernel Conv2d layer.
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                assert m.weight.dtype == torch.float32
                n, c, h, w = m.weight.shape
                assert h == w
                new_weight = trig_init(out_channels=n, in_channels=c, size_kernel=h)
                assert new_weight.shape == m.weight.shape
                m.weight.data = torch.from_numpy(new_weight)

Vulkan 1: Demo

Steps

  1. Instance and device selection
    • Vulkan Instance: The instance is the connection between your application and the Vulkan library and creating it involves specifying some details about your application to the driver.
    • Validation Layers: Validation layers are optional components that hook into Vulkan function calls to apply additional operations, do checking, tracking and logging.
    • Physical Device: Look for and select a graphics card in the system that supports the features we need. We can select any number of graphics cards and use them simultaneously.
    • Logical Device: Set up a logical device to interface with the physical device, similar to the instance creation. You can create multiple logical devices from the same physical device if you have varying requirements.
  2. Window surface and swap chain
    • Surface: Establish the connection between Vulkan and the window system to present results to the screen.
    • Swap Chain: Own the buffers we will render to before we visualize them on the screen.
  3. Image views and framebuffers
    • Image Views: To use any VkImage we have to create a VkImageView object. It describes how to access the image and which part of the image to access.
    • Framebuffers: A framebuffer object references all of the VkImageView objects that represent the attachments.
  4. Render passes
    • Before we can finish creating the pipeline, we need to tell Vulkan about the framebuffer attachments that will be used while rendering. We need to specify how many color and depth buffers there will be, how many samples to use for each of them and how their contents should be handled throughout the rendering operations.
  5. Graphics pipeline
  6. Command pools and command buffers
  7. Main loop
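
The ordering of these steps follows from which objects each one needs. The sketch below models that in Python with the standard-library `graphlib` module (Python 3.9+). It makes no Vulkan calls, and the dependency table is an assumption read off the steps above (for example, the render pass is listed after the swap chain because it typically uses the swap chain's image format).

```python
from graphlib import TopologicalSorter

# Each object maps to the objects that must already exist when it is created.
DEPENDS_ON = {
    "instance": [],
    "physical device": ["instance"],
    "logical device": ["physical device"],
    "surface": ["instance"],
    "swap chain": ["logical device", "surface"],
    "image views": ["swap chain"],
    "render pass": ["logical device", "swap chain"],
    "framebuffers": ["image views", "render pass"],
    "graphics pipeline": ["render pass"],
    "command pool": ["logical device"],
    # Recording draw commands references the framebuffers and the pipeline.
    "command buffers": ["command pool", "framebuffers", "graphics pipeline"],
}

# Any order produced here creates every object after its dependencies.
creation_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

Several valid orders exist, but every one of them starts with the instance and ends with the command buffers, which matches the step list.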

Procedure

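  1. Create a VkInstance. Instances are independent of one another, so each can enable its own features or be configured for different hardware.
  2. Use the VkInstance to check device availability and extensions, then select a VkPhysicalDevice.
  3. Create a VkDevice from the VkPhysicalDevice. It serves as the handle for operating the GPU (similar to a GL context).
  4. Use the VkDevice to create the VkImage and VkBuffer resources:
    • Before creating a VkImage, its usage must be specified: color attachment, sampled image, or image load/store.
    • Creating a VkBuffer is somewhat simpler: only the size and usage need to be specified.
    • A VkImage cannot be used directly, so a VkImageView is required. A VkBuffer can be used directly, but using it as a texture buffer requires a VkBufferView.
  5. vkAllocateMemory(...) allocates memory for the resources created in the previous step; vkMapMemory() / vkUnmapMemory() are used to update the data.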

Note

  • Overview

    • OpenGL was designed when graphics hardware was limited to fixed-function rendering. As graphics card architectures matured, new functionality had to be integrated into the existing legacy API, which forces the driver to do a lot of guesswork about the programmer's intent.
  • Subpass

    • A single render pass can consist of multiple subpasses.
    • Subpasses are subsequent rendering operations that depend on the contents of framebuffers in previous passes.
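  • VkInstanceCreateInfo must be properly zero-initialized, e.g. VkInstanceCreateInfo createInfo = {}; otherwise its unset fields hold indeterminate values, which is undefined behavior, and vkCreateInstance may crash.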

Ref

https://github.com/Overv/VulkanTutorial/tree/master/code
https://vulkan-tutorial.com/Overview#page_What_it_takes_to_draw_a_triangle
https://github.com/DsoTsin/AndroidDev/blob/master/Vulkan%20in%2030%20Minutes.md

Glow 1 - Basic Steps

Glow

  • A machine learning compiler and execution engine for various hardware targets.
  • A backend for high-level machine learning frameworks.
  • State of the art compiler optimizations and code generation of neural network graphs.

How

  • Lowers a traditional neural network dataflow graph into a two-phase strongly-typed IR.
  • High-level IR -> domain-specific optimizations (possibly tailored to particular network structures and operations).
  • Lower-level instruction-based address-only IR -> memory-related optimizations (instruction scheduling, static memory allocation and copy elimination).
  • Lowest level -> machine-specific code generation (take advantage of specialized hardware features).

Feature

  • Support a large number of input operators.
  • Support a large number of hardware targets.
  • Eliminate the need to implement all operators on all targets.
  • Reduce the input space to focus on a small number of linear algebra primitives.
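
A toy illustration of why lowering removes the need to implement every operator on every target. Glow's documentation describes lowering a fully connected layer into a matrix multiplication followed by a broadcast add; the tuple encoding, the node names as strings, and the `lower`, `value`, and `evaluate` helpers below are assumptions made for this sketch, not Glow's API.

```python
def lower(node):
    """Rewrite one high-level node into primitive nodes; pass others through."""
    op, *args = node
    if op == "FullyConnected":
        x, weights, bias = args
        return ("BatchedAdd", ("MatMul", x, weights), bias)
    return node


def matmul(a, b):
    """Plain nested-list matrix multiplication."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def value(arg):
    """Operands are either nested nodes (tuples) or literal data (lists)."""
    return evaluate(arg) if isinstance(arg, tuple) else arg


def evaluate(node):
    """A toy backend that only implements the two primitives."""
    op, *args = node
    if op == "MatMul":
        return matmul(value(args[0]), value(args[1]))
    if op == "BatchedAdd":
        batch, bias = value(args[0]), value(args[1])
        return [[v + b for v, b in zip(row, bias)] for row in batch]
    raise NotImplementedError(op)


fc = ("FullyConnected", [[1.0, 2.0]], [[3.0], [4.0]], [0.5])
lowered = lower(fc)
result = evaluate(lowered)
```

The toy backend has no FullyConnected kernel at all, yet it can run the lowered graph, which is the sense in which lowering shrinks the set of operators a backend must support.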

Burnikel Ziegler Division
