temp's Issues
LLVM 2 - Tricks
pack GDBGUI
Eclipse
ARM NEON
Glow 3 - Implementation
Placeholder
Implementation
include/glow/Graph/Nodes.h
// Placeholder nodes are unbound-storage. The content tensors are attached to
// this node at runtime. Placeholders are used as inputs and output nodes to
// the network.
class Placeholder : public Storage {
  // Specifies if the placeholder is trainable.
  bool isTrainable_;
};
- Node
  - DivNode
  - LogNode
  - Storage
    - Constant
    - Placeholder
  - XXNode
- As this shows, Placeholder is a special kind of Storage node.
- In some respects Glow resembles TensorFlow: a number of ops (Nodes) form a Graph, and all subsequent operations are carried out on that Graph.
Tensor
Implementation
include/glow/Base/Tensor.h
// A class that represents a contiguous n-dimensional array (a tensor).
class Tensor final {
private:
  // A pointer to the tensor data.
  char *data_{nullptr};
  // The type of the tensor.
  Type type_;
};
Tensor::getType()
Returns the Type instance of the current Tensor.
const Type &getType() const { return type_; }
Tensor::getElementType()
Returns the element type of the current Tensor, for example FloatTy or Int64ITy.
ElemKind getElementType() const { return type_.getElementType(); }
- Bounds checking
// \returns True if the coordinate is within the array.
bool isInBounds(llvm::ArrayRef<size_t> indices) const {
  assert(type_.numSizes_ == indices.size() && "Invalid number of indices");
  for (size_t i = 0u, e = indices.size(); i < e; i++) {
    if (indices[i] >= type_.sizes_[i]) {
      return false;
    }
  }
  return true;
}
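The same check can be sketched in Python to make the logic easy to try out. This is only an illustration of the idea, not Glow's API: the function name `is_in_bounds` and the plain list of dimension sizes are assumptions. Unlike the C++ version, which uses unsigned `size_t`, the sketch also has to reject negative indices explicitly.

```python
from typing import Sequence


def is_in_bounds(sizes: Sequence[int], indices: Sequence[int]) -> bool:
    """Return True if every index is valid for its dimension (hypothetical helper)."""
    # One index is required per dimension, as in the C++ assert.
    assert len(sizes) == len(indices), "Invalid number of indices"
    # Each index must be strictly smaller than the size of its dimension.
    return all(0 <= i < s for i, s in zip(indices, sizes))


# A 2 x 3 tensor: [1, 2] is the last element, [1, 3] is past the end.
inside = is_in_bounds([2, 3], [1, 2])
outside = is_in_bounds([2, 3], [1, 3])
```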
- Initialize the Tensor dimensions
LLVM 5 - Backend and TableGen
Computer Architecture
dump SelectionDAG and SelectionDAGISel
DL Compiler News
Parallelism and Optimization
OpenCL Demo
LLVM 4 - Passes
- https://llvm.org/docs/Passes.html
- Print the names of all passes run at -O3
- Read an IR pass
- Read a DAG pass (is there such a thing as a DAG pass?)
Graph DB Neo4J
ThreadPool with timeout
async memcpy
LLVM 1 - Conceptions
Understand cl / cuda memory architecture
vim or emacs gdb frontend
https://stackoverflow.com/questions/38803783/how-to-automatically-refresh-gdb-in-tui-mode
C-x a - Enter TUI mode
C-x s - Single-key mode
C-x 1 / 2 - Number of windows
BiFrost
Skia Library
Issues backup
Glow 2 - Framework
Folder Structure
glow
- Base: IO, Image, Tensor, Train, Type ...
- Backends: Backend, BackendUtils, CompiledFunction, LayoutConverter
- CodeGen: MemoryAllocator
- Converter: FunctionConverter, TypeAToTypeBFunctionConverter
- ExecutionEngine: ExecutionEngine
- Graph: Context, Graph, Node, Grad, Hook, NodeValue
- IR: GraphScheduler, IR, IRGen, IRUtils, Instrs
- Importer: ProtobufLoader, Caffe2ModelLoader, ONNXModelLoader, ONNXIFIModelLoader
- Optimizer: GraphOptimizer, IROptimizer, Quantization, Lower
- Quantization: Quantization, Serialization
- Support: ThreadPool, Random
- Testing
Graphics Render Modes
MLPerf
TVM / VTA / TinyFlow
Summary of the first TVM and Deep Learning Compiler Conference:
https://zhuanlan.zhihu.com/p/55860793
TVM Conference
https://sampl.cs.washington.edu/tvmconf/#about-tvmconf
OpenCL News
STL Common
torch.nn.ConvTranspose2d
https://zhuanlan.zhihu.com/p/48501100
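- Deconvolution, also called transposed convolution, is the reverse of convolution in terms of shape. Its main difference from unpooling and upsampling is that it has parameters to learn (as convolution does). In theory, deconvolution can implement unpooling and upsampling, provided the kernel parameters are set appropriately.
- Deconvolution only restores the size of the matrix X; it cannot restore the value of each element of X.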
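The point that only the size is restored can be checked with a small 1-D sketch in pure Python. The helper names `conv1d` and `conv_transpose1d` are made up for this illustration and are not PyTorch functions; padding and dilation are assumed to be zero and one, so the output length of the transposed convolution is `(len(y) - 1) * stride + len(k)`.

```python
def conv1d(x, k, stride=1):
    """Valid cross-correlation of the list x with the kernel k."""
    n = (len(x) - len(k)) // stride + 1
    return [sum(x[i * stride + j] * k[j] for j in range(len(k))) for i in range(n)]


def conv_transpose1d(y, k, stride=1):
    """Transposed convolution: scatter each input value, scaled by the kernel."""
    out = [0.0] * ((len(y) - 1) * stride + len(k))
    for i, v in enumerate(y):
        for j, w in enumerate(k):
            out[i * stride + j] += v * w
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
k = [1.0, 0.0, -1.0]

y = conv1d(x, k)                 # shorter than x: 5 - 3 + 1 = 3 elements
x_back = conv_transpose1d(y, k)  # same length as x again

same_size = len(x_back) == len(x)
same_values = x_back == x
```

With these inputs `same_size` is true while `same_values` is false: applying the transposed convolution with the same kernel brings back the shape of X, not its contents.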
torch.nn.LeakyReLU
- Leaky version of a Rectified Linear Unit.
- Leaky ReLUs allow a small, non-zero gradient when the unit is not active.
Parametric ReLUs take this idea further by making the coefficient of leakage into a parameter that is learned along with the other neural network parameters.
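As a minimal sketch of the definition above, in plain Python rather than with `torch.nn.LeakyReLU` itself: the function names are chosen for this example, and the default slope of 0.01 matches the documented default `negative_slope` of `torch.nn.LeakyReLU`.

```python
def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Identity for non-negative inputs, a small linear slope otherwise."""
    return x if x >= 0 else negative_slope * x


def leaky_relu_grad(x: float, negative_slope: float = 0.01) -> float:
    """Derivative with respect to x; non-zero even when the unit is inactive."""
    return 1.0 if x >= 0 else negative_slope


active = leaky_relu(2.0)          # passes through unchanged
inactive = leaky_relu(-2.0, 0.1)  # scaled by the slope instead of clamped to 0
```

A parametric ReLU would use the same forward formula, with `negative_slope` treated as a learned parameter instead of a fixed constant.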
SYCL
https://www.khronos.org/sycl/
https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedWithSYCLCompiler.md
https://www.khronos.org/assets/uploads/apis/2015-sycl-page3.jpg
https://www.codeplay.com/portal/sycl-tutorial-1-the-vector-addition
https://www.khronos.org/news/press/khronos-releases-sycl-1.2-final-specification-c-single-source-heterogeneous
https://www.khronos.org/news/press/khronos-releases-opencl-2.2-provisional-spec-opencl-c-kernel-language
Inline ASM
Android Spv Compute Shader
LLVM 6 - Intrinsic
CodeGenAndEmitDAG and FastISel
weight init
import numpy as np
import torch
from torch import nn

def weight_init(self):
    def trig_init(out_channels, in_channels, size_kernel=3):
        n_kernels = out_channels
        # Grid layout (rows, columns) used to tile the kernels, keyed by kernel count.
        table = {16: (4, 4), 32: (4, 8), 64: (8, 8), 96: (8, 12), 128: (8, 16), 192: (12, 16),
                 256: (16, 16), 288: (16, 18), 384: (16, 24), 512: (16, 32), 1024: (32, 32)}
        h_array, w_array = table[n_kernels]
        # Sample cos(r) on a grid covering [-pi, pi] in both directions.
        xx = np.linspace(-np.pi, np.pi, w_array * size_kernel, dtype=np.float32)
        yy = np.linspace(-np.pi, np.pi, h_array * size_kernel, dtype=np.float32)
        xx, yy = np.meshgrid(xx, yy)
        zz = np.cos(np.sqrt(xx ** 2 + yy ** 2))
        # Cut the surface into size_kernel x size_kernel patches, one per output channel.
        param = np.zeros(shape=[n_kernels, in_channels, size_kernel, size_kernel], dtype=np.float32)
        for y in range(h_array):
            for x in range(w_array):
                i = y * w_array + x
                left, right = x * size_kernel, (x + 1) * size_kernel
                top, bottom = y * size_kernel, (y + 1) * size_kernel
                param[i, :, :, :] = zz[top:bottom, left:right]
        # Scale down to small initial weights.
        return param * 1e-3

    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            assert m.weight.dtype == torch.float32
            n, c, h, w = m.weight.shape
            assert h == w
            new_weight = trig_init(out_channels=n, in_channels=c, size_kernel=h)
            assert new_weight.shape == m.weight.shape
            m.weight.data = torch.from_numpy(new_weight)
NVDLA
Vulkan 1: Demo
Steps
- Instance and device selection
- Vulkan Instance: The instance is the connection between your application and the Vulkan library and creating it involves specifying some details about your application to the driver.
- Validation Layers: Validation layers are optional components that hook into Vulkan function calls to apply additional operations, do checking, tracking and logging.
- Physical Device: Look for and select a graphics card in the system that supports the features we need. We can select any number of graphics cards and use them simultaneously.
- Logical Device: Set up a logical device to interface with the physical device, similar to the instance creation. You can create multiple logical devices from the same physical device if you have varying requirements.
- Window surface and swap chain
- Surface: Establish the connection between Vulkan and the window system to present results to the screen.
- Swap Chain: Own the buffers we will render to before we visualize them on the screen.
- Image views and framebuffers
- Image Views: To use any VkImage we have to create a VkImageView object. It describes how to access the image and which part of the image to access.
- Framebuffers: A framebuffer object references all of the VkImageView objects that represent the attachments.
- Render passes
- Before we can finish creating the pipeline, we need to tell Vulkan about the framebuffer attachments that will be used while rendering. We need to specify how many color and depth buffers there will be, how many samples to use for each of them and how their contents should be handled throughout the rendering operations.
- Graphics pipeline
- Command pools and command buffers
- Main loop
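Steps
- Create a VkInstance. Instances are independent of each other, so each one can enable its own features or configure different hardware parameters.
- Use the VkInstance to check device availability and extensions, then obtain a VkPhysicalDevice from it.
- Create a VkDevice from the VkPhysicalDevice; it serves as the handle for operating the GPU (similar to a GL context).
- Use the VkDevice to create the VkImage and VkBuffer resources:
- Before creating a VkImage, its usage must be specified: color attachment, sampled image, or image load / store.
- Creating a VkBuffer is somewhat simpler: only its size and usage need to be specified.
- A VkImage cannot be used directly, so a VkImageView is required. A VkBuffer can be used directly, but using it as a texture buffer requires a VkBufferView.
- vkAllocateMemory(...) allocates memory for the resources created in the previous step; vkMapMemory() / vkUnmapMemory() are used to update the data.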
Note
-
Overview
- When OpenGL was designed, hardware was limited to fixed-function rendering. As graphics card architectures matured, new functionality had to be integrated with the existing legacy API, which forces the driver to do a lot of guesswork about the programmer's intent.
Subpass
- A single render pass can consist of multiple subpasses.
- Subpasses are subsequent rendering operations that depend on the contents of framebuffers in previous passes.
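- The create-info struct must be initialized correctly, for example with VkInstanceCreateInfo createInfo = {}; otherwise its fields hold indeterminate values, and passing it to vkCreateInstance is undefined behavior and will crash.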
Ref
https://github.com/Overv/VulkanTutorial/tree/master/code
https://vulkan-tutorial.com/Overview#page_What_it_takes_to_draw_a_triangle
https://github.com/DsoTsin/AndroidDev/blob/master/Vulkan%20in%2030%20Minutes.md
BPP
Perspective Transform
RocksDB Transaction
Glow 1 - Basic Steps
Glow
- A machine learning compiler and execution engine for various hardware targets.
- A backend for high-level machine learning frameworks.
- State of the art compiler optimizations and code generation of neural network graphs.
How
- Lowers a traditional neural network dataflow graph into a two-phase strongly-typed IR.
- High-level IR -> domain-specific optimizations (maybe for special network structure and operation).
- Lower-level instruction-based address-only IR -> memory-related optimizations (instruction scheduling, static memory allocation and copy elimination).
- Lowest level -> machine-specific code generation (take advantage of specialized hardware features).
Feature
- Support a high number of input operators.
- Support a large number of hardware targets.
- Eliminate the need to implement all operators on all targets.
- Reduce the input space to focus on a small number of linear algebra primitives.
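The last two points can be illustrated with a toy lowering pass. This is a sketch of the idea only, not Glow's API: the tuple-based node encoding and the `lower` and `run` helpers are hypothetical. A high-level FullyConnected node is rewritten into a matrix multiplication followed by a bias add, so a backend that implements only those two primitives can still execute it.

```python
def lower(node):
    """Rewrite a high-level node into primitive nodes; keep everything else."""
    if node[0] == "FullyConnected":
        _, x, w, b = node
        return ("Add", ("MatMul", x, w), b)
    return node


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add_bias(m, bias):
    return [[v + bias[j] for j, v in enumerate(row)] for row in m]


def run(node, env):
    """A minimal 'backend' that only knows the two primitives."""
    if isinstance(node, str):  # a named input tensor
        return env[node]
    op, *args = node
    vals = [run(a, env) for a in args]
    if op == "MatMul":
        return matmul(*vals)
    if op == "Add":
        return add_bias(*vals)
    raise NotImplementedError(op)


env = {"x": [[1, 2]], "w": [[1, 0], [0, 1]], "b": [10, 20]}
lowered = lower(("FullyConnected", "x", "w", "b"))
result = run(lowered, env)
```

Here `run` never needs a FullyConnected case: supporting a new high-level operator only requires a lowering rule, not a new implementation on every target.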
Conway's Game of Life
sRGB to linear RGB
Burnikel Ziegler Division
https://github.com/wyndavies/LongInteger/blob/master/LongIntegers/LongInteger.cpp#L1519
https://github.com/wyndavies/LongInteger/blob/master/LongIntegers/LongInteger.cpp#L2987
https://github.com/python/cpython/blob/master/Objects/longobject.c#L1700
https://github.com/python/cpython/blob/master/Objects/longobject.c#L2770
https://www.geeksforgeeks.org/restoring-division-algorithm-unsigned-integer/
https://www.geeksforgeeks.org/non-restoring-division-unsigned-integer/
https://stackoverflow.com/questions/32744423/big-integer-division-with-operands-aproximately-of-the-same-size
https://gmplib.org/~tege/division-paper.pdf
https://github.com/holiman/uint256/issues/5
https://golang.org/src/math/big/int.go
https://golang.org/src/math/big/nat.go (natural number)
https://golang.org/src/math/big/arith.go#L254 (impl)
https://github.com/igraph/igraph/blob/master/src/bignum.c#L1409
LLVM 3 - TargetMachine
control flow obfuscation
torch.nn.parallel
CMake dependencies
Rendering Pipeline
Specs
shader storage object
Overview
- Buffer Objects
- Buffer Object Streaming
- Vertex Buffer Objects
- Pixel Buffer Objects
- Shader Storage Buffer Objects
- Uniform Buffer Objects
- Vertex Array Objects
- Textures
Reference
https://www.khronos.org/opengl/wiki/Shader_Storage_Buffer_Object
https://www.opengl.org/discussion_boards/showthread.php/199803-UBO-vs-SSBO-for-large-array-of-local-to-world-transformation-matrices