hwayne / awesome-cold-showers

For when people get too hyped up about things

License: Other

programming

awesome-cold-showers's Introduction

Awesome Cold Showers

It's great when people get excited about things, but sometimes they get a little too excited. This is an awesome (rigorous and respectful) and curated (I read every suggestion and make judgement calls) list of cold showers on overhyped topics. This does not mean the enthusiasm is bad or wrong: we're just reminding people to stay grounded. Feel free to submit your favorites!

Verification Techniques (PDF)

Hype: "Formal Verification is a great way to write software. We should prove all of our code correct."
Shower: Extensive literature review showing that formal methods are hard to learn, extremely expensive to apply, and often miss critical bugs.
Caveats: Written in 2000 and doesn't cover modern tools/techniques, such as TLA+ or dependent typing.
Notes: Part of Peter Gutmann's thesis, "The Design and Verification of a Cryptographic Security Architecture". The whole thesis can be found here.

Static vs Dynamic Typing: a literature review

Hype: "Static Typing reduces bugs."
Shower: A review of all the available literature (up to 2014), showing that the solid research is inconclusive, while the conclusive research had methodological issues.
Caveats: Does not cover other possible benefits of static typing, like documentation. Does not address research on gradual type systems, like mypy or TypeScript.

Scalability! but at what COST?

Hype: "We need big data systems to handle big data."
Shower: Benchmarking cutting-edge graph-processing algorithms running on 128-core clusters against a single-threaded 2014 MacBook Pro. The laptop consistently wins, sometimes by an order of magnitude.
Caveats: McSherry is really good at optimizing his algorithms and has skills the average data scientist does not. Big data systems might be better if you have ad-hoc queries and don't want to take the time to optimize them.
Notes: "If you are going to use a big data system for yourself, see if it is faster than your laptop. If you are going to build a big data system for others, see that it is faster than my laptop."

Web Framework Benchmarks

Hype: Anything about performance or scalability of various languages/web frameworks/databases.
Shower: Actual hard data of various combinations of solutions under various tasks.
Caveats: Raw data you have to interpret yourself. A complete dump of the raw data for your own analysis was previously unavailable, but it can now be found here.
Notes: Continually updating with new benchmarks. All implementations are public and you can improve them with a PR.

Agile Methods: The Good, the Hype and the Ugly (Video)

Hype: "We should develop software using Agile."
Shower: Review of all the different styles of Agile and how some of the practices (particularly replacing requirements with user stories and the lack of proper specification) are harmful in the long run.
Caveats: While Meyer calls out some problems, overall he's very positive about Agile and recommends it as a good (but imperfect) methodology.
Notes: Starts at 3:30. There's a follow-up video where he answers audience questions.

An Empirical Study on the Correctness of Formally Verified Systems (PDF)

Hype: "If I formally verify my code, I don't need to test it!"
Shower: Researchers looked at three formally verified systems, and found critical correctness bugs in all three. The bugs were from "a wide range of mismatched assumptions" and caused servers to crash or produce wrong data.
Caveats: Most bugs were at the system boundaries; none were found in the implemented protocols. Formally verified systems, while not perfect, were considerably less buggy than unverified systems.
Notes: Systems were verified with Coq and Z3. Further discussion at The Morning Paper.

Fixing Faults in C and Java Source Code: Abbreviated vs. Full-word Identifier Names (PDF)

Hype: "Identifiers should be self-documenting! Use full names, not abbreviations."
Shower: Researchers had programmers fix bugs in a codebase, either where all of the identifiers were abbreviated or where all of the identifiers were full words. They found no difference in time taken or quality of debugging.
Caveats: Only applies to fixing bugs. Otherwise watertight. This is honestly one of the most rigorous and comprehensive papers I've ever read.
Notes: Includes ethnography on how programmers debug abbreviated code. Link is to the author preprint.

An Eye Tracking Study on camelCase and under_score Identifier Styles (PDF)

Hype: camelCase is easier to read than under_score. So it is a best practice to use camelCase in variable names, function names, and other identifiers.
Shower: Several research papers have addressed this question. But when eye-tracking software was used to test the claim, two conclusions emerged: (1) developers are equally accurate regardless of style, but (2) the under_score style can be processed faster and more easily.
Caveats: The study's sample size was small (15 people), and "Subjects were historically trained mostly in the underscore identifier style and were all programmers." In the study, subjects were presented terms in isolation (not in blocks of code). Thus, as the study notes, there could be variance due to context.
Notes: "The interaction of Experience with Style indicates that novices benefit twice as much with respect to time, with the underscore style." This paper purports to remedy difficulties in an earlier paper entitled "To CamelCase or Under_score", which concluded that with training, camelCase is more accurately processed. Finally, neither paper seems to have analyzed whether native language comes into play (e.g. whether it is easier for non-native English speakers to understand camelCase versus under_score).

Microservices - Please, don't

Hype: "Microservices! Microservices!"
Shower: Presents five fallacies of "why microservices solve problems monoliths have" and shows how either monoliths don't actually have those problems or that microservices make the problem even worse.
Caveats: Abstract arguments and experience, no case studies or examples.

VM Warmup Blows Hot and Cold

Hype: Your favourite programming language has been updated. The new version makes impressive performance improvement claims.
Shower: Benchmarking modern programming languages under near-ideal circumstances, just for longer than before, suggests that we have not been benchmarking language implementations as accurately as we might wish. Many benchmarks slow down over time. Some never stabilise. Many benchmarking experiments will not be repeatable due to non-determinism. Warmup time is important, but is usually either ignored, or reported inaccurately.
Caveats: Only evaluates the x86_64 architecture, and for only two operating systems (Linux and OpenBSD). Experiment conducted in 2017 (prior to Meltdown patching). Evaluates mainly JITted language implementations (although C benchmarks were included).
Notes: The experiment and the benchmark runner are published under an open source license. Start here.

Scaling SQLite to 4M QPS on a Single Server

Hype: "Scaling out is better than scaling up. Cloud is more scalable than bare metal."
Shower: Expensify found that running a single bare-metal server was both faster and cheaper than using an x1e.32xlarge EC2 instance. By using one server, they could avoid sharding their data.
Caveats: Does not cover whether scaling out bare metal has the same advantages over scaling out EC2 (assuming you can afford sharding). Can't really compare how much cheaper the bare metal is because they don't list the cost. I'm guessing their servers are $100k each? No basis for that guess though.

Understanding Real-World Concurrency Bugs in Go (PDF)

Hype: Compared to other languages, Go's concurrency system of goroutines and channels is easier to understand, easier to use, and is less prone to bugs and memory leaks.
Shower: According to an empirical study by Tu et al., there are plenty of concurrency bugs caused by the difficulty of understanding and following the concurrency features and patterns provided by Go.
Caveats: This study is specific to Go. Though other languages provide similar facilities, they are not covered in this paper. Also, the types of bugs seen with channels and shared memory are different. Channels lead to more blocking bugs (deadlocks, dangling channels) while shared memory leads to more nonblocking bugs (race conditions, dirty reads).
Notes: "We studied six popular Go software including Docker, Kubernetes, and gRPC. We analyzed 171 concurrency bugs in total, with more than half of them caused by non-traditional, Go-specific problems"

Plug

You can find my general ravings on my website or twitter.


awesome-cold-showers's Issues

Ontologies

Fortunately it's not mainstream (so not hype?), but the amount of money that goes into teams to foster this initiative is not something to ignore.

https://people.well.com/user/doctorow/metacrap.htm

I won't write a PR, so this issue is just to drop a link in case someone steps up to do it.

Cold shower on GPT and LLM

Stochastic Parrots is the obvious one, but there are others that are more pointed about specific flaws in, like, ChatGPT or GPT-4.

a bunch of cold showers found in one place

This book starts with guidance on how to be critical of software productivity research, and then goes on to present a bunch of studies where folks have tried to prove or disprove the benefits of different productivity enhancing ideas:

http://shop.oreilly.com/product/9780596808303.do

Review "An Empirical Study on the correctness of formally verified distributed systems"

https://blog.acolyer.org/2017/05/29/an-empirical-study-on-the-correctness-of-formally-verified-distributed-systems/

This is something I have domain knowledge in so can handle myself, mostly listing so I don't forget about it

Reconsider "Verification Techniques" as a cold shower

I've taken a look at the paper (the chapter), and unfortunately it's not a great paper: https://dev.to/gabrielfallen/a-cold-shower-for-a-cold-shower-237d

In my view, it didn't provide an "extensive literature review" even at the time, and even less so now, 20 years later. And I don't see how it supports the claim that "formal methods are hard to learn, extremely expensive to apply, and often miss critical bugs".

Besides, I don't see anybody actually claiming "Formal Verification is a great way to write software. We should prove all of our code correct.", and it looks like nobody ever did. Thus it doesn't look like we need a cold shower on this one at all...

Why Do Keynote Speakers Keep Suggesting That Improving Security Is Possible?

This seems perfect to add to the list:

Q: Why Do Keynote Speakers Keep Suggesting That Improving Security Is Possible?
A: Because Keynote Speakers Make Bad Life Decisions and Are Poor Role Models

https://www.youtube.com/watch?v=ajGX7odA87k

It's an excellent cold shower on the promises of generalised AI and Machine Learning and how they should not be let anywhere near either the internet or critical infrastructure, let alone both. It's also hilarious and really accessible.

One for `Microservices - Please, don't`

Would you name "integrating different runtimes" a fallacy?

Add caveats to Scalability section (memory, storage)

I read the Scalability entry, and it's a good post. I'd add a couple more caveats (discussed briefly in the article). Not all "big data" scalability problems are built around scaling out the number of CPU cores; I've worked in "big data" scaling on Spark before and often built out clusters for 10,000-100,000 times the dataset size of the one on McSherry's laptop. The calculus for these sorts of systems starts to tip back towards "the cluster's better" fairly quickly when you're also dealing with bus and memory bounds (do you have enough memory to hold the data you need in-memory, plus room to receive shuffles? Do you have a local network/NICs that are adequate to run those shuffles in reasonable time? Do you have enough striped fast storage?)

I'd add the 1G (still fairly large, sure) dataset size to the Shower part and explain that this is heavily a warning against overengineering and premature optimization.

Cold shower for DRY principle

https://news.ycombinator.com/item?id=23739596

Trawl The Morning Paper

https://blog.acolyer.org/

Several hundred papers along with extensive analysis. Across the entire spectrum of CS, so most are not cold showers.

Papers found should link to the actual paper in the title and have an additional note:

Further discussion at [The Morning Paper](link to acolyer's post)

That Expensify thing on SQLite

Internet going off for the night so getting this note iiiiiiiin

Would like to see a section on monorepo as well.

I think the monorepo hype can result in bad architectural decisions if not properly evaluated.

N-Version Programming

http://sunnyday.mit.edu/papers.html#misc

Turn the "hype" into each section's header

This is just food for thought, sorry for creating an issue.

When reading the formatted markdown, the headers stick out. Since the headers map 1:1 to the papers' names, they may not be the most obvious representation of the hype.

There's also no room in the current format for associating multiple papers (with different approaches) to the single hype, if that's something that would be useful. I.e. possibly multiple sets of {shower,caveat,paper} per hype.

The Unreasonable Ineffectiveness of Machine Learning in Computer Systems Research

I posted this comment with this article. Someone was getting too hyped up about a certain topic :) Let me know if I should submit a pull request or if you want to discuss further.

https://news.ycombinator.com/item?id=16036133

https://www.sigarch.org/the-unreasonable-ineffectiveness-of-machine-learning-in-computer-systems-research/

Review “Virtual machine warmup blows hot and cold”

https://dl.acm.org/citation.cfm?doid=3152284.3133876 (open access, PDF)

Programming language makes you more productive

Hype: Programming language X makes you more productive

Shower: An experiment with more than 600 professional programmers shows that (apart from assembly) programming language makes no difference.

Caveat: Was done in the 80s with Fortran, Cobol, C, Pascal.

Unfortunately, the book is not freely accessible. Maybe someone knows a paper version? Maybe even a more recent study?

AWS is notoriously expensive compared to GCP

Regarding the Cold Shower for "Scaling SQLite to 4M QPS on a Single Server", AWS is notoriously expensive compared to GCP, so I'm not very impressed with the claim and a bare metal VS GCP comparison would be much more relevant. (And I'm not sure bare metal would come out (significantly) ahead, then.)

"LATOZA, T. D., VENOLIA, G., AND DELINE, R. 2006. Maintaining mental models: a study of developer work habits. In Proc. of International Conference on Software Engineering. ACM, 492–501."

Found it as a cite in a different article, might potentially be a cold shower on documentation hype? Haven't looked to see if it's freely accessible.
