
hardware_introduction's Introduction

What scientists must know about hardware to write fast code

This document is hosted at https://viralinstruction.com/posts/hardware/

It is written as a Pluto notebook. If you can, I recommend running the code in a Pluto notebook so you can play around with it and learn. Alternatively, you can read the HTML file in your browser.

PRs are welcome.

This notebook covers:

  • Why you must limit your disk reads and writes
  • What a CPU cache is, and how to use it effectively
  • Memory alignment
  • How to read assembly code and why you must do it
  • Why you should reduce allocations
  • Why immutable data structures are usually fastest
  • SIMD vectorization
  • Struct of arrays vs array of structs
  • Specialized CPU instructions
  • Function inlining
  • Loop unrolling
  • Branch prediction
  • The effects of memory dependencies in the CPU pipeline
  • Multithreading
  • Why GPUs are fast at some things and slow at others

hardware_introduction's People

Contributors

colindaven, digital-carver, jakobnissen, rfourquet, vsoch

hardware_introduction's Issues

Missing data file

The Julia notebook loads the file "/Users/jakobnissen/Downloads/test.jgi.abundance.dat" in the second cell. Currently this causes an error for me as I do not have this file! Perhaps this file could be included and this line amended?

A few unclear points

A few things that I don't really understand and that might need rewording:

  • "And you must strive to do your computation when only reading in your file once" (I can't make sense of this)
  • "to move a single integer of data" (what is an integer of data?)
  • "Data from the stack can only be accessed from the end": to be pedantic, data can only be added to or removed from the end, but reading elements that are not at the end can be fine. It might not be worth bothering with this detail.
  • "[the compiler needs to know for certain that:] The compiler can predict exactly when it needs to access the program so it can reach it by simply popping the stack" (I'm not sure what "compiler accesses the program" means)

Also, there is an N - N\%4 somewhere; the \ seems to be spurious.
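
For reference, a minimal sketch of what the expression computes once the backslash is dropped. It assumes the term is meant to round N down to a multiple of 4, which is the usual reason for it when a loop handles four elements per iteration. The helper name is hypothetical and not taken from the notebook:

```julia
# Hypothetical helper: largest multiple of 4 that does not exceed N
# (for non-negative N; % is the remainder operator in Julia).
round_down_to_4(N::Integer) = N - N % 4

for N in (7, 8, 10)
    println(N, " -> ", round_down_to_4(N))
end
```

Elements from index round_down_to_4(N) + 1 up to N would then be handled by a separate remainder loop.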

A few suggestions

These are a few suggestions where I either don't know the solution, or felt they were too subjective to include in the PR, so I'll leave these here up to your judgment.


"Size of $T: " in Pluto unfortunately prints Size of Main.var"workspace#3".AlignmentTest: which makes the output a bit hard to read. It would be nice if there was a way to avoid the Main.var ... prefix in the output.

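One possible way around this, as a sketch only: interpolate nameof(T) instead of T. nameof returns the bare type name as a Symbol, without the enclosing module. The struct below is a hypothetical stand-in, since I have not copied the notebook's actual AlignmentTest definition, and size_label is an illustrative name:

```julia
# Hypothetical stand-in for the notebook's AlignmentTest type;
# the real fields may differ.
struct AlignmentTest
    a::UInt32
    b::UInt16
    c::UInt8
end

# Build the label from nameof(T), which drops the module prefix
# that Pluto adds (Main.var"workspace#3"...).
function size_label(T::DataType)
    return "Size of $(nameof(T)): $(sizeof(T))"
end

println(size_label(AlignmentTest))
```

I have not checked whether this suits every place the notebook prints a type, so treat it as a starting point.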

The text says

There is no register that can do 128-bit additions.

followed in the next paragraph by

modern CPUs contain specialized 256-bit registers (or 128-bit in older CPUs, or 512-bit in the brand new ones)

I'm not sure of the exact phrasing to use (perhaps there is established terminology for the usual "non-specialized" registers that I don't know of), but maybe the first sentence could be changed to something like "Traditionally, registers on the CPU cannot do 128-bit additions."


In the "SIMD needs a loop where loop order doesn't matter" section, the first code block doesn't use print, and instead relies on Pluto's default cell output to show the value. Since the rest of the notebook shows output below the cell, it may be unintuitive for non-Pluto users to look for the output above the cell in this one case. So perhaps change that to

begin
    x = eps(1.0) * 0.4
    print(1.0 + (x + x) == (1.0 + x) + x)
end

or

begin
    x = eps(1.0) * 0.4
    (1.0 + (x + x) == (1.0 + x) + x) |> print
end

(whichever seems more readable/easy-to-understand)
