mmcgrana / services-engineering
A reading list for services engineering, with a focus on cloud infrastructure services
Crew Resource Management: a Positive Change for the Fire Service
Best article-length resource I've been able to find so far, probably can replace the current Wikipedia link.
Raft is an attempt at a consensus algorithm that is easy to understand (compared with Paxos). https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
E.g. posts from Amazon, GitHub, Heroku.
Resilience Engineering in Practice (Hollnagel et al.)
Basic material on capacity planning.
Best suggestion so far is 'The Art of Capacity Planning' as discussed in #21.
Automatic Management of Partitioned, Replicated Search Services (Leibert et al.)
Nice short, practical paper on managing a replicated search service in production.
Video is here, need to find the slides though.
https://zvzzt.wordpress.com/2012/08/16/a-note-on-uptime/
(candidate resource)
Highly Available Transactions: Virtues and Limitations (Bailis et al.)
A very recent but excellent paper.
The CAP FAQ (Robinson)
From #40.
The Art of Scalability (Abbott and Fisher)
Recommended by @pyr.
http://odbms.org/download/dean-keynote-ladis2009.pdf (Design, Lessons, and Advice from Building Distributed Systems at Google) leads to a 404 page.
Online, Asynchronous Schema Changes in F1 (Rae et al.)
http://carlos.bueno.org/optimization/mature-optimization.pdf
Internal performance optimization manual for Facebook.
Human Error (Reason)
Reminded of this while reading the Dynamo paper. It's a basic topic, but we can probably find a good short post that discusses it.
Intro material on hot compatibility and relation to distribution + gradual rollouts.
Web Operations (Allspaw and Robbins)
Hello, here is an educational article and post-mortem document:
http://pythonsweetness.tumblr.com/post/64740079543/how-to-lose-172-222-a-second-for-45-minutes
In Search of Certainty (Burgess)
The dotScale conference, which closed yesterday, is about exactly that. In my view it is a nice addition to the conference list, and the talks there are independently curated (not sponsored). http://dotscale.eu
http://en.wikipedia.org/wiki/CAP_theorem
I don't have any recommendations for specific papers currently, but I think it's an important concept for engineers to learn!
Security Engineering (Anderson)
A brief history of Consensus, 2PC and Transaction Commit (Mc Keown)
From #40.
Recommended in this reading list.
Dynamo is easy to understand and a good intro to distributed, eventually consistent databases. http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
The Spark work is really interesting and there are some good papers on it:
http://people.csail.mit.edu/matei/papers/2010/hotcloud_spark.pdf
https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
Paxos Made Live - An Engineering Perspective (Tushar Chandra)
From #40.
Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services (Gilbert and Lynch)
The original CAP proof.
Firedrills, failure simulations, chaos monkeys, and before/after failure testing.
Everything should have an explicit limit, even if very high; backpressure everywhere; etc.
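A minimal Python sketch of the explicit-limit idea, assuming a single in-process work queue. The names `make_submitter` and `max_pending` are hypothetical and chosen only for illustration. The point is that the queue has a stated bound, and a full queue is reported to the caller rather than allowed to grow without limit.

```python
import queue


def make_submitter(max_pending):
    """Build a bounded queue and a submit function that applies backpressure.

    max_pending is an explicit limit (hypothetical name); it may be large,
    but it is never unbounded.
    """
    pending = queue.Queue(maxsize=max_pending)

    def submit(job):
        # Refuse new work instead of blocking or growing without bound.
        # Returning False is the backpressure signal: the caller can retry
        # later, shed load, or pass the refusal further upstream.
        try:
            pending.put_nowait(job)
            return True
        except queue.Full:
            return False

    return pending, submit
```

A caller would treat a `False` return as a signal to slow down or drop the request, which is one simple way to push backpressure toward the source of the load.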
Some good URLs around this that I know of:
Kafka: A Distributed Messaging System for Log Processing (Kreps et al.)
The Log: What every software engineer should know about real-time data's unifying abstraction
Some good paper links at the bottom of that post too.
Recommended in this reading list.
Maybe this turns out to be a better fit than #4.
How well does this content fit a knowledge map structure? The list is great but it's also large-ish and growing. Having a logical starting point (perhaps per high-level topic) might be interesting.
Solving the problem: how does one use this list to build a reading list? What order should things be learned/considered? What papers/books build on ideas in earlier ones?
Perhaps the answer is just reading things in roughly chronological order, in which case publication dates might make sense in the README.
Could be a good survey / practical resource on operable apps.
I think I've read this but it was a while ago, so I need to review.
I'm not sure this fits here, but Appendix F from the Challenger explosion investigation was a goldmine of engineering principles, and of ways things can go wrong, that I could apply to software.
Included:
I won't be surprised to see it not fit, but it's an interesting read nonetheless.
Convergent and Commutative Replicated Data Types (Shapiro et al.)
From #40.
General paper- or book-length resources on security engineering practices.
by Marius Eriksen from Twitter
Available from: http://monkey.org/~marius/funsrv.pdf
Abstract:
Building server software in a large-scale setting, where systems exhibit a high degree of concurrency and environmental variability, is a challenging task to even the most experienced programmer. Efficiency, safety, and robustness are paramount—goals which have traditionally conflicted with modularity, reusability, and flexibility.
We describe three abstractions which combine to present a powerful programming model for building safe, modular, and efficient server software: Composable futures are used to relate concurrent, asynchronous actions; services and filters are specialized functions used for the modular composition of our complex server software.
Finally, we discuss our experiences using these abstractions and techniques throughout Twitter’s serving infrastructure.
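To make the abstract's vocabulary concrete, here is a rough Python sketch of services and filters as plain functions. It is not the paper's API or Twitter's implementation: the type aliases `Service` and `Filter`, and the helpers `echo`, `with_timeout`, `with_prefix`, and `compose`, are all hypothetical names, and `asyncio` coroutines stand in for composable futures.

```python
import asyncio
from typing import Awaitable, Callable

# Assumption: a service is an async function from a request to a reply.
Service = Callable[[str], Awaitable[str]]
# Assumption: a filter takes one service and returns a wrapped service.
Filter = Callable[[Service], Service]


async def echo(request: str) -> str:
    """A trivial service: reply with the upper-cased request."""
    return request.upper()


def with_timeout(seconds: float) -> Filter:
    """Filter that fails the call if the wrapped service takes too long."""
    def apply(service: Service) -> Service:
        async def wrapped(request: str) -> str:
            return await asyncio.wait_for(service(request), timeout=seconds)
        return wrapped
    return apply


def with_prefix(prefix: str) -> Filter:
    """Filter that decorates the reply of the wrapped service."""
    def apply(service: Service) -> Service:
        async def wrapped(request: str) -> str:
            return prefix + await service(request)
        return wrapped
    return apply


def compose(*filters: Filter) -> Filter:
    """Combine filters so that the first one listed is the outermost."""
    def apply(service: Service) -> Service:
        for f in reversed(filters):
            service = f(service)
        return service
    return apply


handler = compose(with_timeout(1.0), with_prefix("reply: "))(echo)

if __name__ == "__main__":
    print(asyncio.run(handler("ping")))  # prints "reply: PING"
```

Because a filter has the same shape regardless of what it wraps, concerns such as timeouts can be written once and stacked onto any service, which is the modularity the abstract describes.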
Recommended by @mfine
Managing the Unexpected (Weick and Sutcliffe)
Sources of Power: How People Make Decisions - recommended by @statik.
Have heard good things about this book.
Also "one, some, many, all".