
cse221notes's Introduction

cse221notes

  1. paper summaries
  2. lecture notes
  3. part of my notes is in the issues

Caveats:

Reading the original papers will give you a better understanding of why these systems were designed the way they are.

cse221notes's People

Contributors

byshen


cse221notes's Issues

soft updates & RIO file cache

metadata consistency/reliability

FFS synchronized writes => LFS solves the problem but may cause consistency issues.

opportunity/benefit => cache metadata writes, (async) update every 30s

update dependence
update sequence/order

roll back & roll forward: a new update in the file cache may depend on an update that has not yet reached disk, so before writing a block, undo (roll back) the dependent updates to that portion of the block, write the updates in order, then roll forward to the most recent state (like a time machine on that portion of the block). The block is locked to prevent applications from seeing the rolled-back state.

constraints/rules:

  • no dangling pointers => e.g. "create": update the inode first, then the directory block pointer; "delete" is the reverse; this may create dependencies between blocks
  • never reuse a resource until all old pointers to it are removed
  • never reset the last pointer to a resource until the new one is on disk

=> track dependencies at the granularity of a pointer: keep all changes to the pointer, not the entire inode/directory block.
Roll back to a point where no cyclic dependency exists, then apply a batch of updates together.
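The ordering rules above can be sketched as a tiny dependency-tracked write-back cache (the names and structure here are illustrative, not the actual soft updates implementation):

```python
# Sketch of soft-updates-style ordering (hypothetical, simplified):
# a cached metadata block may only be written once every block it
# depends on has already reached disk.

class Block:
    def __init__(self, name):
        self.name = name
        self.depends_on = []   # blocks that must hit disk first
        self.on_disk = False

def flush(block, disk_log):
    """Write `block`, first flushing everything it depends on."""
    for dep in block.depends_on:
        if not dep.on_disk:
            flush(dep, disk_log)
    block.on_disk = True
    disk_log.append(block.name)

# "create": the new inode must be on disk before the directory
# entry that points to it (no dangling pointers).
inode = Block("inode")
dirent = Block("dirent")
dirent.depends_on.append(inode)

log = []
flush(dirent, log)
print(log)   # the inode is written before the directory entry
```

"delete" is the reverse: the directory entry is cleared on disk before the inode is freed, satisfying the never-reuse rule.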

fsync: all pending updates for that file must go to disk before the application can continue.

lottery scheduling

Simple and well suited to the problem it solves: randomized decisions that achieve good performance statistically.

Goal: proportional scheduling to the execution rate of computations. (statistically)

Tickets -> currencies. => the representation.
A thread's tickets decide how much CPU it gets and how long it waits.

Priority inversion: threads waiting for a lock donate their tickets to the thread holding the lock, giving it a higher effective priority.

Can be used for CPU, network, memory, disk, etc. Uniformity.
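A minimal sketch of a lottery draw, assuming a flat list of (client, tickets) pairs rather than the paper's full currency hierarchy:

```python
import random

def lottery_pick(clients, rng=random):
    """Pick a client with probability proportional to its ticket count."""
    total = sum(t for _, t in clients)
    winner = rng.randrange(total)        # draw a winning ticket number
    for name, tickets in clients:
        if winner < tickets:
            return name
        winner -= tickets

# Two clients with a 3:1 ticket ratio should win ~75% / ~25% of draws.
random.seed(0)
clients = [("A", 75), ("B", 25)]
wins = {"A": 0, "B": 0}
for _ in range(10_000):
    wins[lottery_pick(clients)] += 1
print(wins)  # roughly 7500 / 2500
```

The proportionality only holds statistically, over many draws, which is exactly the paper's "probabilistically fair" claim.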

A priority-based scheduler is more likely to cause blocking; a randomized scheduler avoids this statistically.

Why is it not used?
It is hard to express how many tickets a thread should have.

implementing RPC

Client diagram for how RPC works

Why RPC?

  • well-defined semantics (programmers do not need to take care of programming communications); familiarity with the semantics is a big advantage
  • principle -> make it as much like a local call as possible! Just like remote library calls.

How RPC works? (see the diagram)

  • Calls in the client are registered in the run time as libraries
  • the server stub just takes care of buffer reads & writes in the runtime
  • RPC Runtime does all the communication
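The flow in the diagram can be sketched with JSON standing in for the marshaled packet format and a direct function call standing in for the network (all names here are illustrative):

```python
import json

# The real server-side procedure (the "callee").
def add(a, b):
    return a + b

SERVER_PROCS = {"add": add}   # registered with the server's runtime

def server_dispatch(packet):
    """Server-side runtime: unpack the call, run it, pack the result."""
    msg = json.loads(packet)
    result = SERVER_PROCS[msg["proc"]](*msg["args"])
    return json.dumps({"result": result})

def rpc_add(a, b):
    """Client stub: marshals arguments, 'sends' them, unmarshals the
    reply. Here the network is just a function call; a real runtime
    handles packets, retransmission, and bindings."""
    packet = json.dumps({"proc": "add", "args": [a, b]})
    reply = server_dispatch(packet)
    return json.loads(reply)["result"]

print(rpc_add(2, 3))  # looks exactly like a local call
```

The point of the stub is visible here: the caller never touches the wire format, which is what "as much like local" means in practice.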

What do we do when the types (passing as the parameters) are complex?

  • need to generate the code automatically for both client/server stubs, for every application
  • e.g. strings/arrays/pointer-based data structures need to be linearized into the buffer; the data structures are then recovered on the server side. => higher overhead.
  • e.g. function pointers? Some systems need to migrate the code and recompile it... then you need distributed garbage collection... too complex, so no one uses it. (better to keep data and code local)
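A sketch of linearizing a pointer-based structure, assuming a singly linked list and JSON as a stand-in wire format (the real systems used formats like XDR):

```python
import json

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def linearize(head):
    """Flatten a pointer-based list into a buffer the runtime can send.
    Pointers cannot cross the wire, so only the values go."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return json.dumps(out)

def rebuild(buf):
    """Server side: recover an equivalent structure from the buffer."""
    head = None
    for value in reversed(json.loads(buf)):
        head = Node(value, head)
    return head

lst = Node(1, Node(2, Node(3)))
copy = rebuild(linearize(lst))
print(copy.value, copy.next.value, copy.next.next.value)  # 1 2 3
```

The walk-and-copy on one side plus the rebuild on the other is exactly the "higher overhead" the notes mention.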

Procedural-based languages => object-oriented languages

  • things to consider: do we need to copy the object's address space, or just migrate the objects?

how to support multiple languages (like gRPC)

  • network data representation (XDR): agreement on how data on the network is interpreted
  • compiler support (same rules for different compilers)

Bindings => client-server bindings before communications.

Optimizations:

  • process: Use of idle processes in caller and callee machines to reduce process creation and process swaps.
  • communications: design its own specialized communication protocols. (1) simple calls: fit into one packet. (2) complex calls: (not common)
  • connections: avoid the cost of establishing and terminating connections by just sending requests; stateless servers (an influence on NFS server design); minimize the impact of many idle clients.

microkernel and exokernel

OS-MK(VARIANTS)-EXOKERNEL
L4
push as much as possible to user level.

  • low performance penalty of the microkernel: L4Linux ≈ Linux (<10% overhead)

Benefits

  • portability - multiple architectures
  • extensibility (benefit of user level) + specialization (e.g. pipes: less generality, less overhead)

Why is a microkernel slow?

  • The app uses IPC to trap into the mk, which then sends a message to the OS server (4 domain crossings). (A monolithic OS colocated with the mk has less overhead: only 2 crossings.)

MK components

  • IPC
  • Thread <-> address spaces (managing page tables)
    • context switch - changing address spaces causes lots of TLB flushes (L4 will try to use physical addresses as much as possible, hence fewer TLB flushes; think about having many address spaces)
  • Memory
    • one additional step: when the OS wants to update a page table entry, it traps into the L4 kernel (which runs at the higher privilege level)
    • this means that L4 actually manages all address spaces
  • Events
    • normally a syscall/exception traps into the OS
    • now the OS is at user level: the app traps into the mk, the mk encapsulates the event into an IPC message, then invokes an OS handler
  • CPU -> Sched
    • just normal time sharing among threads

Mach (OS server) for comparison


ExoKernel

  • expose h/w
  • management

Need three functionalities

  • track resource ownership
    • simple, track in the exokernel
  • protection
    • secure bindings (applications can securely bind to machine resources and handle events.)
    • implemented by hardware mechanisms, software caching, and downloading application code.
    • Examples, PTE/TLB, packet filters
  • resource revocation (abort protocol)
  • memory

    • the libOS wants to update the TLB; the exokernel checks that the TLB entry is owned by the app
    • s/w TLB caching
    • software-managed TLBs (the exo mechanism); with hardware-managed TLBs like x86, this does not work
  • network

    • packet filters
  • disk

  • want to minimize the kernel

  • each app has its own libOS; the OS has only the features that the app needs

Micro v.s. Exo Difference?

  • M: uses IPC to communicate. Easy to share files.
  • Exo: just library calls. Each application has its own file system, so it is hard to share files. No kernel crossing, no IPC, much faster to call.

Linux scalability to many cores

Many cores

  • new OS design (Corey exo)
  • wait, maybe monolithic is ok
  • what are the fundamental limitations? (1. OS implementation, 2. OS interface, 3. a fundamental problem)

1. locks on shared data structures

more cores -> more waiting (RCU does not work well with more copies)

2. Waiting on shared memory

cache coherence (RCU does not solve this problem; serialization still happens in one place)

  • sloppy counters (1. distribute the counts so each core updates only its own counter and does not synchronize with other cores; 2. computing the global sum is expensive, but that is acceptable)
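A sketch of the sloppy-counter idea (the structure is illustrative; the real kernel version also pads each slot to its own cache line):

```python
class SloppyCounter:
    """Per-core counts: each core updates only its own slot, so the
    fast path needs no cross-core synchronization. Reading the global
    value is expensive (it sums every slot) but is assumed to be rare."""
    def __init__(self, ncores):
        self.local = [0] * ncores

    def inc(self, core, n=1):
        self.local[core] += n      # fast: touches only this core's slot

    def total(self):
        return sum(self.local)     # slow: reads every core's slot

c = SloppyCounter(4)
for core in range(4):
    c.inc(core, 10)
print(c.total())  # 40
```

The trade is exactly the one in the bullet: cheap updates, expensive (but rare) global reads.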

cache line aliasing - false sharing

  • cache coherence works at the granularity of a cache line; some data structures are not actually shared but sit on the same cache line, so different cores ping-pong the line between their caches, causing more cache misses.

3. limitation for finite resources

more contention

  • cache capacity - memory bus/PCI bandwidth

4. not enough tasks but use more cores

more idleness

GFS

GFS -> HDFS

Requirements:

  • large scale (1k - 100k nodes)
  • components always failing
  • very large files
  • read dominated
  • writes append, many writers - (record append)
  • bandwidth > latency

scheduler activations

  • "classic" - e.g., ls: a short, small process
  • "one kernel thread per user thread" - e.g. the JVM: many user thread stacks, each corresponding to one kernel thread stack, but all these kernel stacks are in the same kernel address space (this is not efficient because the OS does not know which kernel thread is optimal to run next; that is actually determined by the corresponding user thread)
  • user-level threads (libraries) - e.g. pthreads
  • "M - N": M user-level threads multiplexed onto N kernel threads (on N cores); just put the M user threads into a ready queue, and kernel threads pick them up when they are available to run. (This still has the drawback of a kernel thread being blocked by a user thread the OS cannot see.)
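The M:N model can be sketched as kernel threads draining a ready queue of user threads (round-robin here just for determinism; all names are illustrative):

```python
from collections import deque

def run_mn(user_threads, n_kernel):
    """Assign M ready user threads to N kernel threads round-robin.
    A real system does this dynamically as kernel threads free up;
    this static version just shows the multiplexing."""
    ready = deque(user_threads)            # M user threads, ready to run
    schedule = {k: [] for k in range(n_kernel)}
    k = 0
    while ready:
        schedule[k].append(ready.popleft())  # next free kernel thread
        k = (k + 1) % n_kernel
    return schedule

sched = run_mn(["u0", "u1", "u2", "u3", "u4"], 2)
print(sched)  # 5 user threads shared across 2 kernel threads
```

Scheduler activations fix the blocking drawback by having the kernel notify the user scheduler (via an upcall) whenever one of these kernel threads blocks.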

User:

  • Lightweight => fast => just procedure calls
  • invisible to the OS, which only knows the kernel thread they run in => bad scheduling (e.g. one thread reading from disk blocks all the threads)

Kernel

  • system calls => more expensive to cross domains
  • obvious
  • physical parallelism => each processor has one kernel thread to run on

IX- a dataplane OS

RPCs are widely used in datacenters(DC)

  • multiple languages
  • efficient network

question: the hardware is really fast, so why is communication slow? - too MUCH overhead in the communication path.
(Linux puts TCP/IP inside the OS, so communication between apps must cross domains. => some designs move TCP/IP to the user/app level, achieving ZERO-COPY but losing protection.)

Goals

  • high throughput - batching !
  • low latency - batching, run to completion
  • protection - root ring 0
  • many connections
  • elastic resources

designs:
(1) Run-to-completion: improves data-cache locality; removes scheduling unpredictability. Just do not stop the queues on the NICs.
(2) Batching: as large as the cache size, to improve locality; enables prefetching.
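A sketch of run-to-completion batching, with an in-memory list standing in for the NIC receive queue and the batch bound standing in for IX's cache-sized limit:

```python
BATCH = 4  # bounded batch size (IX bounds batches to limit latency)

def run_to_completion(rx_queue, handler):
    """Pull a bounded batch of packets, then run each packet all the
    way through (protocol processing + app handler) before touching
    the next one -- no intermediate queueing or rescheduling."""
    done = []
    while rx_queue:
        batch, rx_queue = rx_queue[:BATCH], rx_queue[BATCH:]
        for pkt in batch:
            done.append(handler(pkt))  # packet runs to completion here
    return done

out = run_to_completion([1, 2, 3, 4, 5], lambda p: p * 2)
print(out)
```

Processing a whole batch back-to-back is what keeps the relevant code and data hot in cache; finishing each packet before the next is what removes scheduling unpredictability.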

protection:

  • libix/apps on ring 3
  • control path still on ring 0
  • tcp/ip and the IX dataplane on ring 0 (non-root). But they share the same address space as the apps (like an exokernel). Thus efficient data sharing and protection are both achieved.

Berkeley FFS & log-structured FS

cylinder groups are the tracks across cylinders. (Moving across the surface is faster than a long seek => the new layout used to organize the FS in FFS.)

Question: the difference between seek and move between tracks?

Problems & solutions:
see table here.
Notes
cylinder group => an inode and its data are in the same group.
superblocks are at fixed locations and point to the inodes; the inodes themselves are no longer at fixed locations.
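The placement idea in arithmetic form (the numbers are illustrative, not FFS's actual parameters):

```python
# FFS-style placement sketch: inodes are spread across cylinder groups
# instead of sitting in one fixed region at the front of the disk, so
# an inode and its data blocks can live in the same group and be read
# without a long seek between them.
INODES_PER_GROUP = 128  # illustrative; FFS computes this from geometry

def group_of_inode(inum):
    """Which cylinder group holds a given inode number."""
    return inum // INODES_PER_GROUP

print(group_of_inode(5))     # 0: an early inode, in the first group
print(group_of_inode(300))   # 2: lives in a different cylinder group
```

Because inode locations are now computed rather than fixed, the per-group bookkeeping (the replicated superblock area) is what records where everything lives.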
