`Placeholder` Implementation

include/glow/Graph/Nodes.h

// Placeholder nodes are unbound-storage. The content tensors are attached to
// this node at runtime. Placeholders are used as inputs and output nodes to
// the network.
class Placeholder : public Storage {
  // Specifies if the placeholder is trainable.
  bool isTrainable_;
};

Node
- DivNode
- LogNode
- Storage
  - Constant
  - Placeholder
- Other nodes (XXNode)
Hence, Placeholder is a special kind of Storage node.
In some ways Glow resembles TensorFlow: a number of Ops (Nodes) form a Graph, and all subsequent steps operate on that Graph.

`Tensor` Implementation

include/glow/Base/Tensor.h

// A class that represents a contiguous n-dimensional array (a tensor).
class Tensor final {
private:
  // A pointer to the tensor data.
  char *data_{nullptr};

  // The type of the tensor.
  Type type_;
};

Tensor::getType() returns the Type instance of the current Tensor.

const Type &getType() const { return type_; }

Tensor::getElementType() returns the element type of the current Tensor, e.g. FloatTy or Int64ITy.

ElemKind getElementType() const { return type_.getElementType(); }

Bounds checking

// \returns True if the coordinate is within the array.
bool isInBounds(llvm::ArrayRef<size_t> indices) const {
  assert(type_.numSizes_ == indices.size() && "Invalid number of indices");
  for (size_t i = 0u, e = indices.size(); i < e; i++) {
    if (indices[i] >= type_.sizes_[i]) {
      return false;
    }
  }
  return true;
}
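Because the Tensor is a contiguous n-dimensional array, every in-bounds index tuple maps to a single offset into `data_`. The following is a minimal Python sketch that assumes a row-major layout, where the last dimension varies fastest. `is_in_bounds` and `flat_offset` are hypothetical helpers written for illustration, not Glow APIs. Python ints can be negative, so the sketch also rejects negative indices; the C++ version does not need to because it uses `size_t`.

```python
from typing import Sequence


def is_in_bounds(sizes: Sequence[int], indices: Sequence[int]) -> bool:
    # Mirrors Tensor::isInBounds: one index per dimension, each below its size.
    assert len(sizes) == len(indices), "Invalid number of indices"
    return all(0 <= i < s for i, s in zip(indices, sizes))


def flat_offset(sizes: Sequence[int], indices: Sequence[int]) -> int:
    # Row-major (assumed): offset = sum(indices[d] * stride[d]), where
    # stride[d] is the product of all sizes after dimension d.
    if not is_in_bounds(sizes, indices):
        raise IndexError(f"{tuple(indices)} out of bounds for shape {tuple(sizes)}")
    offset, stride = 0, 1
    for d in reversed(range(len(sizes))):
        offset += indices[d] * stride
        stride *= sizes[d]
    return offset
```

For example, in a 2x3x4 tensor, element (1, 2, 3) lands at offset 1*12 + 2*4 + 3 = 23.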

Initializing the Tensor size

LLVM 5 - Backend and TableGen

https://llvm.org/docs/WritingAnLLVMBackend.html

Computer Architecture

https://blog.csdn.net/prike/article/details/70210328
https://yq.aliyun.com/ziliao/436511
https://www.seas.upenn.edu/~cis565/fbo.htm

dump SelectionDAG and SelectionDAGISel

DL Compiler News

from 0.5 to 2.0 of AI Chips
https://zhuanlan.zhihu.com/p/57808378

argument of IR
https://mp.weixin.qq.com/s?__biz=MzI3MDQ2MjA3OA==&mid=2247484317&idx=1&sn=70ddd439dc33f3e8a30579244695ce65&chksm=ead1fe8cdda6779afe88fe7a8dd2693d0aae9425dfe6a74b96fb63e78afb1c6d6786a30f2c1e&scene=21#wechat_redirect

Parallelism and Optimization

OpenCL Demo

LLVM 4 - Passes

https://llvm.org/docs/Passes.html
Print the names of all passes run at -O3
Read an IR pass
Read a DAG pass (is there such a thing as a DAG pass?)

Graph DB Neo4J

ThreadPool with timeout

https://stackoverflow.com/questions/26063877/python-multiprocessing-module-join-processes-with-timeout
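The linked answer uses multiprocessing. As a swapped-in alternative, here is a minimal sketch built on the standard-library `concurrent.futures.ThreadPoolExecutor`, where each result is waited for with a timeout. `map_with_timeout` and `slow_double` are hypothetical names. One caveat: a Python thread cannot be forcibly stopped, so a timed-out task keeps running in the background until it finishes. When stuck work really has to be killed, processes are needed, as in the linked answer.

```python
import concurrent.futures as cf
import time
from typing import Any, Callable, List, Optional, Sequence, Tuple


def map_with_timeout(fn: Callable[..., Any], arg_tuples: Sequence[Tuple],
                     timeout: float, max_workers: int = 4) -> List[Optional[Any]]:
    """Run fn(*args) for every tuple on a thread pool.

    Results are collected in submission order. Each wait is bounded by
    `timeout` seconds, and a task that has not finished by then yields None.
    """
    results: List[Optional[Any]] = []
    pool = cf.ThreadPoolExecutor(max_workers=max_workers)
    try:
        futures = [pool.submit(fn, *args) for args in arg_tuples]
        for fut in futures:
            try:
                results.append(fut.result(timeout=timeout))
            except cf.TimeoutError:
                fut.cancel()  # only succeeds if the task has not started yet
                results.append(None)
    finally:
        # Do not block on stragglers; running threads finish in the background.
        pool.shutdown(wait=False)
    return results


def slow_double(x: int, delay: float) -> int:
    # Example task: sleep for `delay` seconds, then return 2 * x.
    time.sleep(delay)
    return 2 * x
```

For example, `map_with_timeout(slow_double, [(1, 0.0), (2, 0.0), (3, 1.0)], timeout=0.2)` gives `[2, 4, None]`.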

async memcpy

LLVM 1 - Concepts

http://www.llvmpy.org/llvmpy-doc/dev/doc/llvm_concepts.html

Understand the OpenCL / CUDA memory architecture

Vim or Emacs GDB frontend

https://stackoverflow.com/questions/38803783/how-to-automatically-refresh-gdb-in-tui-mode
C-x a - Enter / leave TUI mode
C-x s - Toggle SingleKey mode
C-x 1 / 2 - Switch to a layout with one / two windows

BiFrost

Skia Library

https://skia.org/

Issues backup

Glow 2 - Framework

Folder Structure

glow
- Base: IO, Image, Tensor, Train, Type ...
- Backends: Backend, BackendUtils, CompiledFunction, LayoutConverter
- CodeGen: MemoryAllocator
- Converter: FunctionConverter, TypeAToTypeBFunctionConverter
- ExecutionEngine: ExecutionEngine
- Graph: Context, Graph, Node, Grad, Hook, NodeValue
- IR: GraphScheduler, IR, IRGen, IRUtils, Instrs
- Importer: ProtobufLoader, Caffe2ModelLoader, ONNXModelLoader, ONNXIFIModelLoader
- Optimizer: GraphOptimizer, IROptimizer, Quantization, Lower
- Quantization: Quantization, Serialization
- Support: ThreadPool, Random
- Testing

Graphics Render Modes

MLPerf

https://mlperf.org/

TVM / VTA / TinyFlow

Summary of the first TVM and Deep Learning Compiler Conference:
https://zhuanlan.zhihu.com/p/55860793

TVM Conference
https://sampl.cs.washington.edu/tvmconf/#about-tvmconf

OpenCL News

SYCL

https://www.khronos.org/sycl/
#46

POCL

http://portablecl.org/
https://github.com/pocl/pocl

Other

https://mathema.tician.de/the-state-of-opencl-for-scientific-computing-in-2018/
https://streamhpc.com/blog/2017-05-16/khronos-releases-opencl-2-2-spir-v-1-2

STL Common

torch.nn.ConvTranspose2d

https://zhuanlan.zhihu.com/p/48501100

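Deconvolution, also called transposed convolution, is the reverse process of convolution. Its biggest difference from unpooling and upsampling is that deconvolution has learnable parameters (as convolution does). In theory, with suitably chosen kernel parameters, deconvolution can implement both unpooling and upsampling.
Deconvolution only restores the size of the matrix X. It cannot recover the values of the individual elements of X.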
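A minimal pure-Python sketch of a single-channel transposed convolution. It covers only the scatter form with a square kernel, no padding, and no dilation; `conv_transpose2d` and `conv_transpose_out_size` are hypothetical helpers, not PyTorch functions. Each input element adds `x[i][j] * w` onto an output window, and overlapping windows are summed. The output-size formula follows the `torch.nn.ConvTranspose2d` documentation.

```python
from typing import List

Matrix = List[List[float]]


def conv_transpose_out_size(h_in: int, kernel: int, stride: int = 1, padding: int = 0,
                            dilation: int = 1, output_padding: int = 0) -> int:
    # H_out = (H_in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1
    return (h_in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def conv_transpose2d(x: Matrix, w: Matrix, stride: int = 1) -> Matrix:
    """Single-channel transposed convolution with a square kernel w.

    Input element x[i][j] scatters x[i][j] * w onto the output window that
    starts at (i * stride, j * stride); overlapping contributions are summed.
    """
    k = len(w)
    h_out = conv_transpose_out_size(len(x), k, stride)
    w_out = conv_transpose_out_size(len(x[0]), k, stride)
    out = [[0.0] * w_out for _ in range(h_out)]
    for i, row in enumerate(x):
        for j, v in enumerate(row):
            for di in range(k):
                for dj in range(k):
                    out[i * stride + di][j * stride + dj] += v * w[di][dj]
    return out
```

With stride 2 and an all-ones 2x2 kernel, the input `[[1, 2], [3, 4]]` becomes `[[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]`. That is nearest-neighbour upsampling, which shows how a suitable kernel reproduces upsampling. It restores a shape, not the values of some original convolution input.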

torch.nn.LeakyReLU

Leaky version of a Rectified Linear Unit.
Leaky ReLUs allow a small, non-zero gradient when the unit is not active.

Parametric ReLUs take this idea further by making the coefficient of leakage into a parameter that is learned along with the other neural network parameters.
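A minimal sketch of the scalar forms. `negative_slope=0.01` matches the default of `torch.nn.LeakyReLU`. PReLU uses the same forward formula, but its slope `a` is a learned parameter, so it also needs a gradient with respect to `a`. The function names are hypothetical, and treating x = 0 as inactive in the gradient is a convention chosen for this sketch.

```python
def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    # Identity when active, a small fixed slope when not.
    return x if x >= 0 else negative_slope * x


def leaky_relu_grad(x: float, negative_slope: float = 0.01) -> float:
    # d/dx is 1 on the active side and the small, non-zero slope otherwise,
    # so inactive units still receive gradient.
    return 1.0 if x > 0 else negative_slope


def prelu(x: float, a: float) -> float:
    # Same shape as Leaky ReLU, but `a` is learned with the other parameters.
    return leaky_relu(x, a)


def prelu_grad_a(x: float) -> float:
    # d/da of prelu(x, a): x on the inactive side, 0 on the active side.
    return x if x < 0 else 0.0
```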

SYCL

https://www.khronos.org/sycl/
https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedWithSYCLCompiler.md
https://www.khronos.org/assets/uploads/apis/2015-sycl-page3.jpg
https://www.codeplay.com/portal/sycl-tutorial-1-the-vector-addition
https://www.khronos.org/news/press/khronos-releases-sycl-1.2-final-specification-c-single-source-heterogeneous
https://www.khronos.org/news/press/khronos-releases-opencl-2.2-provisional-spec-opencl-c-kernel-language

Inline ASM

https://software.intel.com/en-us/articles/introduction-to-x64-assembly
https://cs.brown.edu/courses/cs033/docs/guides/x64_cheatsheet.pdf
https://www.cs.uaf.edu/2017/fall/cs301/reference/x86_64.html
https://www.jianshu.com/p/10e8a7b4f980
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

Android Spv Compute Shader

LLVM 6 - Intrinsic

https://zhuanlan.zhihu.com/p/53659330
https://llvm.org/docs/ExtendingLLVM.html

CodeGenAndEmitDAG and FastISel

weight init

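import numpy as np
import torch
import torch.nn as nn


def trig_init(out_channels, in_channels, size_kernel=3):
    # Map the number of kernels to an (h_array, w_array) grid of tiles with
    # h_array * w_array == out_channels; other channel counts raise KeyError.
    table = {16: (4, 4), 32: (4, 8), 64: (8, 8), 96: (8, 12), 128: (8, 16), 192: (12, 16),
             256: (16, 16), 288: (16, 18), 384: (16, 24), 512: (16, 32), 1024: (32, 32)}
    h_array, w_array = table[out_channels]
    # Sample the radial pattern cos(sqrt(x^2 + y^2)) over [-pi, pi] on a grid
    # large enough to hold h_array x w_array kernels.
    xx = np.linspace(-np.pi, np.pi, w_array * size_kernel, dtype=np.float32)
    yy = np.linspace(-np.pi, np.pi, h_array * size_kernel, dtype=np.float32)
    xx, yy = np.meshgrid(xx, yy)
    zz = np.cos(np.sqrt(xx ** 2 + yy ** 2))
    # Cut the pattern into size_kernel x size_kernel tiles: kernel i gets
    # tile i (row-major), broadcast across all input channels.
    param = np.zeros(shape=[out_channels, in_channels, size_kernel, size_kernel], dtype=np.float32)
    for y in range(h_array):
        for x in range(w_array):
            i = y * w_array + x
            left, right = x * size_kernel, (x + 1) * size_kernel
            top, bottom = y * size_kernel, (y + 1) * size_kernel
            param[i, :, :, :] = zz[top:bottom, left:right]
    # Scale down so the initial weights stay small.
    return param * 1e-3


def weight_init(model):
    # Re-initialize every Conv2d in `model`; call as weight_init(model).
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            assert m.weight.dtype == torch.float32
            n, c, h, w = m.weight.shape
            assert h == w, "only square kernels are supported"
            new_weight = trig_init(out_channels=n, in_channels=c, size_kernel=h)
            assert new_weight.shape == m.weight.shape
            # Copy in place so the weight keeps its device (e.g. GPU) and Parameter.
            with torch.no_grad():
                m.weight.copy_(torch.from_numpy(new_weight))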

NVDLA

http://nvdla.org/
http://nvdla.org/primer.html
https://github.com/nvdla
https://github.com/nvdla/hw

Vulkan 1: Demo

Steps

Instance and device selection
- Vulkan Instance: The instance is the connection between your application and the Vulkan library and creating it involves specifying some details about your application to the driver.
- Validation Layers: Validation layers are optional components that hook into Vulkan function calls to apply additional operations, do checking, tracking and logging.
- Physical Device: Look for and select a graphics card in the system that supports the features we need. We can select any number of graphics cards and use them simultaneously.
- Logical Device: Set up a logical device to interface with the physical device, similar to instance creation. You can also create multiple logical devices from the same physical device if you have varying requirements.
Window surface and swap chain
- Surface: Establish the connection between Vulkan and the window system to present results to the screen.
- Swap Chain: Own the buffers we will render to before we visualize them on the screen.
Image views and framebuffers
- Image Views: To use any VkImage we have to create a VkImageView object. It describes how to access the image and which part of the image to access.
- Framebuffers: A framebuffer object references all of the VkImageView objects that represent the attachments.
Render passes
- Before we can finish creating the pipeline, we need to tell Vulkan about the framebuffer attachments that will be used while rendering. We need to specify how many color and depth buffers there will be, how many samples to use for each of them and how their contents should be handled throughout the rendering operations.
Graphics pipeline
Command pools and command buffers
Main loop

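Steps

Create a VkInstance. Instances are independent of each other, so each can enable its own features or configure different hardware parameters.
Use the VkInstance to check device availability and extensions, then select a VkPhysicalDevice from the enumerated devices.
Create a VkDevice from the VkPhysicalDevice. It serves as the handle for operating the GPU (similar to a GL context).
Use the VkDevice to create the VkImage and VkBuffer resources:
- Before creating a VkImage, specify its usage: color attachment, sampled image, or image load / store.
- Creating a VkBuffer is a bit simpler: only the size and usage need to be specified.
- A VkImage cannot be used directly, so a VkImageView is needed. A VkBuffer can be used directly, but a VkBufferView is required to use it as a TextureBuffer (texel buffer).
vkAllocateMemory(...) allocates memory for the resources created in the previous step. The memory is bound to them with vkBindBufferMemory() / vkBindImageMemory(), and vkMapMemory() / vkUnmapMemory() are used to update the data.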

Note

Overview
- OpenGL was designed when hardware was limited to fixed-function rendering. As graphics card architectures matured, new functionality had to be integrated into the existing legacy API, which forces the driver to do a lot of guesswork about the programmer's intent.
Subpass
- A single render pass can consist of multiple subpasses.
- Subpasses are subsequent rendering operations that depend on the contents of framebuffers in previous passes.
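VkInstanceCreateInfo must be properly initialized, e.g. VkInstanceCreateInfo createInfo = {};. Otherwise its fields hold indeterminate values (UB) and vkCreateInstance may crash.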

Ref

https://github.com/Overv/VulkanTutorial/tree/master/code
https://vulkan-tutorial.com/Overview#page_What_it_takes_to_draw_a_triangle
https://github.com/DsoTsin/AndroidDev/blob/master/Vulkan%20in%2030%20Minutes.md

BPP

https://github.com/wwei10/ga-bpp
https://github.com/YacineGACI/BinPacking-GeneticAlgorithm
https://github.com/pnvasko/GeneticBinPacker
https://github.com/scottndiku/Binpacking-GA
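The linked repositories attack the bin packing problem with genetic algorithms. As a swapped-in comparison point, not a GA, here is a minimal sketch of the classic First-Fit Decreasing greedy heuristic. It could serve as a baseline against which a GA's bin counts are measured. `first_fit_decreasing` is a hypothetical name.

```python
from typing import List, Sequence


def first_fit_decreasing(items: Sequence[int], capacity: int) -> List[List[int]]:
    """Pack items into bins of `capacity`: sort items largest first, then put
    each one into the first bin that still has room, opening a new bin if none does."""
    bins: List[List[int]] = []
    loads: List[int] = []
    for item in sorted(items, reverse=True):
        if item > capacity:
            raise ValueError(f"item {item} exceeds bin capacity {capacity}")
        for b, load in enumerate(loads):
            if load + item <= capacity:
                bins[b].append(item)
                loads[b] += item
                break
        else:  # no existing bin fits
            bins.append([item])
            loads.append(item)
    return bins
```

For example, `first_fit_decreasing([4, 8, 1, 4, 2, 1], 10)` packs into two full bins: `[[8, 2], [4, 4, 1, 1]]`.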

Perspective Transform

RocksDB Transaction

Glow 1 - Basic Steps

Glow

A machine learning compiler and execution engine for various hardware targets.
A backend for high-level machine learning frameworks.
State of the art compiler optimizations and code generation of neural network graphs.

How

Lowers a traditional neural network dataflow graph into a two-phase strongly-typed IR.
High-level IR -> domain-specific optimizations (maybe for special network structure and operation).
Lower-level instruction-based address-only IR -> memory-related optimizations (instruction scheduling, static memory allocation and copy elimination).
Lowest level -> machine-specific code generation (take advantage of specialized hardware features).
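To illustrate the idea behind static memory allocation in the low-level IR, here is a toy sketch. It assumes each buffer's lifetime is known as an inclusive range of instruction indices, which is a hypothetical input format. Buffers that are never live at the same time may share an offset, so the total can be much smaller than the sum of all buffer sizes. This greedy placement (largest first, lowest free gap) is an assumption made for illustration and is not Glow's MemoryAllocator.

```python
from typing import Dict, List, Tuple

# Hypothetical input: name -> (size_in_bytes, first_instr, last_instr), inclusive lifetimes.
Buffers = Dict[str, Tuple[int, int, int]]


def allocate_static(buffers: Buffers) -> Tuple[Dict[str, int], int]:
    """Assign a fixed offset to every buffer so that buffers with overlapping
    lifetimes never overlap in memory. Returns (offsets, total_bytes)."""
    offsets: Dict[str, int] = {}
    placed: List[Tuple[int, int, int, int]] = []  # (offset, size, first, last)
    # Place larger buffers first; ties broken by name for determinism.
    for name, (size, first, last) in sorted(buffers.items(), key=lambda kv: (-kv[1][0], kv[0])):
        # Memory ranges held by already-placed buffers that are live at the same time.
        busy = sorted((o, o + s) for o, s, f, l in placed if not (l < first or last < f))
        offset = 0
        for start, end in busy:
            if offset + size <= start:
                break  # fits in the gap before this range
            offset = max(offset, end)
        offsets[name] = offset
        placed.append((offset, size, first, last))
    total = max((o + s for o, s, _, _ in placed), default=0)
    return offsets, total
```

For example, buffers `a` (100 bytes, live 0..2), `b` (50 bytes, live 1..3) and `c` (100 bytes, live 3..4) fit in 150 bytes, because `a` and `c` can share offset 0. Giving each buffer its own region would take 250 bytes.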

Feature

Support a large number of input operators.
Support a large number of hardware targets.
Eliminate the need to implement all operators on all targets.
Reduce the input space to focus on a small number of linear algebra primitives.
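To illustrate reducing the input space to a few primitives, here is a minimal sketch of lowering. A high-level "fully connected + ReLU" operator is expressed only through a matrix multiply, a broadcast bias add and an element-wise max, so a backend that implements those primitives gets the composite operator for free. All names are hypothetical and pure-Python lists stand in for tensors; this is not Glow's Lower.cpp API.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Primitive 1: plain matrix multiply.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def batched_add(m: Matrix, bias: List[float]) -> Matrix:
    # Primitive 2: add the same bias row to every row of the batch.
    return [[v + bb for v, bb in zip(row, bias)] for row in m]


def relu(m: Matrix) -> Matrix:
    # Primitive 3: element-wise max(x, 0).
    return [[max(v, 0.0) for v in row] for row in m]


def fully_connected_relu(x: Matrix, w: Matrix, bias: List[float]) -> Matrix:
    # The "high-level operator", written purely in terms of the primitives above.
    return relu(batched_add(matmul(x, w), bias))
```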

Conway's Game of Life

https://stackoverflow.com/questions/40485/optimizing-conways-game-of-life
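One common optimization, not necessarily the approach in the linked thread, is to store only the live cells in a set and count neighbours just around them. The cost then scales with the population instead of the grid area, and the grid becomes unbounded. A minimal sketch, with `life_step` as a hypothetical name:

```python
from collections import Counter
from typing import Set, Tuple

Cell = Tuple[int, int]


def life_step(live: Set[Cell]) -> Set[Cell]:
    """Advance Conway's Game of Life by one generation on an unbounded grid."""
    # Each live cell contributes 1 to the count of each of its 8 neighbours.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth with exactly 3 neighbours; survival with 2 or 3.
    return {cell for cell, n in neighbour_counts.items()
            if n == 3 or (n == 2 and cell in live)}
```

A horizontal "blinker" `{(0, 1), (1, 1), (2, 1)}` becomes the vertical `{(1, 0), (1, 1), (1, 2)}` and flips back on the next step.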

sRGB to linear RGB
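The conversion is applied per colour channel, with values normalized to [0, 1] (e.g. an 8-bit value divided by 255). Alpha is left unchanged. A minimal sketch using the piecewise sRGB transfer function and its inverse, with the constants 12.92, 0.04045, 0.055 and 2.4 from the sRGB standard (IEC 61966-2-1). The function names are hypothetical.

```python
def srgb_to_linear(c: float) -> float:
    """Decode one sRGB channel value in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92  # linear segment near black
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Encode one linear channel value in [0, 1] back to sRGB."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1 / 2.4) - 0.055
```

Mid-grey 0.5 in sRGB is only about 0.214 in linear light. This is why blending or filtering should be done on linear values.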

Burnikel Ziegler Division

https://github.com/wyndavies/LongInteger/blob/master/LongIntegers/LongInteger.cpp#L1519
https://github.com/wyndavies/LongInteger/blob/master/LongIntegers/LongInteger.cpp#L2987
https://github.com/python/cpython/blob/master/Objects/longobject.c#L1700
https://github.com/python/cpython/blob/master/Objects/longobject.c#L2770
https://www.geeksforgeeks.org/restoring-division-algorithm-unsigned-integer/
https://www.geeksforgeeks.org/non-restoring-division-unsigned-integer/
https://stackoverflow.com/questions/32744423/big-integer-division-with-operands-aproximately-of-the-same-size
https://gmplib.org/~tege/division-paper.pdf
https://github.com/holiman/uint256/issues/5
https://golang.org/src/math/big/int.go
https://golang.org/src/math/big/nat.go (natural number)
https://golang.org/src/math/big/arith.go#L254 (impl)
https://github.com/igraph/igraph/blob/master/src/bignum.c#L1409
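The following is a compact sketch of the Burnikel-Ziegler recursive division: the 2n/1n and 3n/2n steps, driven by schoolbook division over base-2**n digits. Below a size threshold it falls back to Python's built-in `divmod`. `DIV_LIMIT`, `bz_divmod` and the helper names are hypothetical choices for this sketch. Python ints already provide fast division, so this is meant for study, not speed.

```python
DIV_LIMIT = 64  # hypothetical threshold: small enough quotients use built-in divmod


def _div2n1n(a: int, b: int, n: int) -> tuple:
    """Divide a by b, where b has exactly n bits and a < b * 2**n. Returns (q, r)."""
    if a.bit_length() - n <= DIV_LIMIT:
        return divmod(a, b)
    pad = n & 1
    if pad:  # make n even by scaling both operands by 2
        a, b, n = a << 1, b << 1, n + 1
    half = n >> 1
    mask = (1 << half) - 1
    b1, b2 = b >> half, b & mask
    # Two 3n/2n steps, each producing half of the quotient bits.
    q1, r = _div3n2n(a >> n, (a >> half) & mask, b, b1, b2, half)
    q2, r = _div3n2n(r, a & mask, b, b1, b2, half)
    if pad:
        r >>= 1  # undo the scaling of the remainder
    return (q1 << half) | q2, r


def _div3n2n(a12: int, a3: int, b: int, b1: int, b2: int, n: int) -> tuple:
    """Divide (a12 * 2**n + a3) by b = b1 * 2**n + b2, where b1 has exactly
    n bits and the dividend is below b * 2**n (so the quotient fits in n bits)."""
    if a12 >> n == b1:
        # Top digit equals b1: use the estimate 2**n - 1 directly.
        q, r = (1 << n) - 1, a12 - (b1 << n) + b1
    else:
        q, r = _div2n1n(a12, b1, n)
    r = ((r << n) | a3) - q * b2
    while r < 0:  # estimate too large; b is normalized, so this runs at most twice
        q -= 1
        r += b
    return q, r


def bz_divmod(a: int, b: int) -> tuple:
    """divmod(a, b) for a >= 0 and b > 0: schoolbook division over base-2**n
    digits (n = bit length of b), with each digit step done by _div2n1n."""
    if a < 0 or b <= 0:
        raise ValueError("requires a >= 0 and b > 0")
    n = b.bit_length()
    mask = (1 << n) - 1
    digits = []
    while a:  # little-endian base-2**n digits of a
        digits.append(a & mask)
        a >>= n
    q, r = 0, 0
    for d in reversed(digits):
        qd, r = _div2n1n((r << n) | d, b, n)
        q = (q << n) | qd
    return q, r
```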

LLVM 3 - TargetMachine

control flow obfuscation

torch.nn.parallel

https://pytorch.org/docs/stable/nn.html#dataparallel-layers-multi-gpu-distributed
https://pytorch.org/docs/stable/nn.html#dataparallel-functions-multi-gpu-distributed

CMake dependencies

Rendering Pipeline

Rendering Pipeline Overview
Vulkan Introduction

Specs

shader storage object

Overview

Buffer Objects
- Buffer Object Streaming
- Vertex Buffer Objects
- Pixel Buffer Objects
- Shader Storage Buffer Objects
- Uniform Buffer Objects
Vertex Array Objects
Textures

Reference

https://www.khronos.org/opengl/wiki/Shader_Storage_Buffer_Object
https://www.opengl.org/discussion_boards/showthread.php/199803-UBO-vs-SSBO-for-large-array-of-local-to-world-transformation-matrices

ytgui / temp
