awesome-cold-showers's Introduction

Awesome Cold Showers

It's great when people get excited about things, but sometimes they get a little too excited. This is an awesome (rigorous and respectful) and curated (I read every suggestion and make judgement calls) list of cold showers on overhyped topics. This does not mean the enthusiasm is bad or wrong: we're just reminding people to stay grounded. Feel free to submit your favorites!

  • Hype: "Formal Verification is a great way to write software. We should prove all of our code correct."

  • Shower: Extensive literature review showing that formal methods are hard to learn, extremely expensive to apply, and often miss critical bugs.

  • Caveats: Written in 2000 and doesn't cover modern tools/techniques, such as TLA+ or dependent typing.

  • Notes: Part of Peter Gutmann's thesis, "The Design and Verification of a Cryptographic Security Architecture". The whole thesis can be found here.

  • Hype: "Static Typing reduces bugs."

  • Shower: A review of all the available literature (up to 2014), showing that the solid research is inconclusive, while the conclusive research had methodological issues.

  • Caveats: Does not cover other possible benefits of static typing, like documentation. Does not address research on gradual type systems, like mypy or TypeScript.

  • Hype: "We need big data systems to handle big data."

  • Shower: Benchmarking cutting-edge graph-processing algorithms running on 128-core clusters against a single-threaded 2014 MacBook Pro. The laptop consistently wins, sometimes by an order of magnitude.

  • Caveats: McSherry is really good at optimizing his algorithms and has skills the average data scientist does not. Big data systems might be better if you have ad-hoc queries and don't want to take the time to optimize them.

  • Notes: "If you are going to use a big data system for yourself, see if it is faster than your laptop. If you are going to build a big data system for others, see that it is faster than my laptop."
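For a sense of what "single-threaded on a laptop" can look like, here is a minimal Python sketch of union-find connected components, the kind of simple sequential graph algorithm the post pits against distributed systems. The function name `connected_components` and the edge-list input format are illustrative assumptions; this is not McSherry's code.

```python
from typing import Iterable, List, Tuple


def connected_components(num_nodes: int, edges: Iterable[Tuple[int, int]]) -> List[int]:
    """Label every node with the smallest node id in its connected component."""
    parent = list(range(num_nodes))  # each node starts as its own root

    def find(x: int) -> int:
        # Walk to the root, halving the path as we go so later finds are cheaper.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # One sequential pass over the edge list; no partitioning or message passing.
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller one, so each root
            # is always the minimum id in its component.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    return [find(x) for x in range(num_nodes)]


# Usage: two components, {0, 1, 2} and {3, 4}.
print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3]
```

Each edge is handled with a few cheap pointer-chasing steps and there is no coordination or data shuffling between machines, which is one reason a careful single-threaded implementation can be hard to beat.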

  • Hype: Anything about performance or scalability of various languages/web frameworks/databases.

  • Shower: Actual hard data of various combinations of solutions under various tasks.

  • Caveats: Raw data you have to interpret yourself. Originally did not provide a complete dump of the raw data for your own analysis, but the raw data can now be found here.

  • Notes: Continually updating with new benchmarks. All implementations are public and you can improve them with a PR.

  • Hype: "We should develop software using Agile."

  • Shower: Review of all the different styles of Agile and how some of the practices (particularly replacing requirements with user stories and the lack of proper specification) are harmful in the long run.

  • Caveats: While Meyer calls out some problems, overall he's very positive about Agile and recommends it as a good (but imperfect) methodology.

  • Notes: Starts at 3:30. There's a followup video where he answers audience questions.

  • Hype: "If I formally verify my code, I don't need to test it!"

  • Shower: Researchers looked at three formally verified systems, and found critical correctness bugs in all three. The bugs were from "a wide range of mismatched assumptions" and caused servers to crash or produce wrong data.

  • Caveats: Most bugs were at the system boundaries; none were found in the implemented protocols. Formally verified systems, while not perfect, were considerably less buggy than unverified systems.

  • Notes: Systems were verified with Coq and Z3. Further discussion at The Morning Paper.

  • Hype: "Identifiers should be self-documenting! Use full names, not abbreviations."

  • Shower: Researchers had programmers fix bugs in a codebase where either all of the identifiers were abbreviated or all of the identifiers were full words. They found no difference in time taken or quality of debugging.

  • Caveats: Only applies to fixing bugs. Otherwise watertight. This is honestly one of the most rigorous and comprehensive papers I've ever read.

  • Notes: Includes ethnography on how programmers debug abbreviated code. Link is to the author preprint.

  • Hype: camelCase is easier to read than under_score. So it is a best practice to use camelCase in variable names, function names, and other identifiers.

  • Shower: Several research papers have examined this claim. When eye-tracking software was used to test it, two conclusions emerged: (1) developers are equally accurate regardless of style, but (2) the under_score style is processed more quickly and easily.

  • Caveats: The study's sample size was small (15 people), and "Subjects were historically trained mostly in the underscore identifier style and were all programmers." In the study, subjects were presented terms in isolation (not in blocks of code). Thus, as the study notes, there could be variance due to context.

  • Notes: "The interaction of Experience with Style indicates that novices benefit twice as much with respect to time, with the underscore style." This paper purports to remedy difficulties in an earlier paper entitled "To CamelCase or Under_score", which concluded that, with training, camelCase is processed more accurately. Finally, neither paper seems to have analyzed whether native language comes into play (e.g. whether it is easier for non-native English speakers to understand camelCase versus under_score).

  • Hype: "Microservices! Microservices!"

  • Shower: Presents five fallacies of "why microservices solve problems monoliths have" and shows how either monoliths don't actually have those problems or that microservices make the problem even worse.

  • Caveats: Abstract arguments and experience, no case studies or examples.

  • Hype: Your favourite programming language has been updated. The new version makes impressive performance improvement claims.

  • Shower: Benchmarking modern programming languages under near-ideal circumstances, just for longer than before, suggests that we have not been benchmarking language implementations as accurately as we might wish. Many benchmarks slow down over time. Some never stabilise. Many benchmarking experiments will not be repeatable due to non-determinism. Warmup time is important, but is usually either ignored, or reported inaccurately.

  • Caveats: Only evaluates the x86_64 architecture, and only on two operating systems (Linux and OpenBSD). Experiment conducted in 2017 (prior to Meltdown patching). Evaluates mainly JIT-compiled language implementations (although C benchmarks were included).

  • Notes: The experiment and the benchmark runner are published under an open source license. Start here.
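Since this entry's point is that in-process iteration timings can warm up, slow down, or never settle, here is a minimal Python sketch for recording such timings and crudely labeling them. The names `time_iterations` and `classify`, the window size, and the 5% tolerance are assumptions for illustration only; the study relied on a far more rigorous statistical analysis, and this heuristic is not its method.

```python
import statistics
import time
from typing import Callable, List


def time_iterations(fn: Callable[[], None], iterations: int) -> List[float]:
    """Run fn repeatedly in one process, recording each iteration's wall-clock time."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times


def classify(times: List[float], window: int = 10, tol: float = 0.05) -> str:
    """Crude heuristic: compare the first and last `window` iterations."""
    if len(times) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    head = statistics.median(times[:window])
    tail_slice = times[-window:]
    tail = statistics.median(tail_slice)
    # If the final window still varies by more than tol around its median,
    # treat the benchmark as never having stabilised.
    if max(abs(t - tail) for t in tail_slice) > tol * tail:
        return "no steady state"
    if tail < head * (1 - tol):
        return "warmup"    # iterations got faster over time
    if tail > head * (1 + tol):
        return "slowdown"  # iterations got slower over time
    return "flat"
```

For example, `classify(time_iterations(my_benchmark, 200))` would label a hypothetical `my_benchmark` function. A "slowdown" or "no steady state" result is exactly the kind of outcome the entry says is often missed when only a few iterations are measured.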

  • Hype: "Scaling out is better than scaling up. Cloud is more scalable than bare metal."

  • Shower: Expensify found that running a single bare-metal server was both faster and cheaper than using an x1e.32xlarge EC2 instance. By using one server, they could avoid sharding their data.

  • Caveats: Does not cover whether scaling out on bare metal has the same advantages over scaling out on EC2 (assuming you can afford sharding). Can't really compare how much cheaper the bare metal is, because they don't list the cost. I'm guessing their servers are $100k each? No basis for that guess, though.

  • Hype: Compared to other languages, Go's concurrency system of goroutines and channels is easier to understand, easier to use, and is less prone to bugs and memory leaks.

  • Shower: According to an empirical study by Tu et al., many concurrency bugs stem from the difficulty of understanding and correctly following the concurrency features and patterns that Go provides.

  • Caveats: This study is specific to Go. Though other languages provide similar facilities, they are not covered in this study. Also, the types of bugs seen with channels and with shared memory differ: channels lead to more blocking bugs (deadlocks, dangling channels), while shared memory leads to more nonblocking bugs (race conditions, dirty reads).

  • Notes: "We studied six popular Go software including Docker, Kubernetes, and gRPC. We analyzed 171 concurrency bugs in total, with more than half of them caused by non-traditional, Go-specific problems"

Plug

You can find my general ravings on my website or twitter.

awesome-cold-showers's People

Contributors

ahnberg, ggorlen, hwayne, pitmonticone, technosophos, vext01

awesome-cold-showers's Issues

Ontologies

Fortunately it's not mainstream (so not hype?), but the amount of money that goes into teams to foster this initiative is not something to ignore.

https://people.well.com/user/doctorow/metacrap.htm

I won't write a PR, so this issue is just to drop a link in case someone steps up to do it.

Cold shower on GPT and LLM

"Stochastic Parrots" is the obvious one, but there are others that are more pointed about specific flaws in systems like ChatGPT or GPT-4.

Reconsider "Verification Techniques" as a cold shower

I've taken a look at the paper (the chapter), and unfortunately it's not a great paper: https://dev.to/gabrielfallen/a-cold-shower-for-a-cold-shower-237d

In my view, it didn't provide an "extensive literature review" even at the time, and still less than that now, 20 years later. And I don't see how it supports the claim that "formal methods are hard to learn, extremely expensive to apply, and often miss critical bugs".

Besides, I don't see anybody actually claiming "Formal Verification is a great way to write software. We should prove all of our code correct.", and it looks like nobody ever did. Thus it doesn't look like we need a cold shower on this one at all...

Why Do Keynote Speakers Keep Suggesting That Improving Security Is Possible?

This seems perfect to add to the list:

Q: Why Do Keynote Speakers Keep Suggesting That Improving Security Is Possible?
A: Because Keynote Speakers Make Bad Life Decisions and Are Poor Role Models

https://www.youtube.com/watch?v=ajGX7odA87k

It's an excellent cold shower on the promises of generalised AI and Machine Learning and how they should not be let anywhere near either the internet or critical infrastructure, let alone both. It's also hilarious and really accessible.

Add caveats to Scalability section (memory, storage)

I read the Scalability entry, and it's a good post. I'd add a couple more caveats (discussed briefly in the article). Not all "big data" scalability problems are built around scaling out the number of CPU cores; I've worked in "big data" scaling on Spark before and often built out clusters for 10,000-100,000 times the dataset size of the one on McSherry's laptop. The calculus for these sorts of systems starts to tip back towards "the cluster's better" fairly quickly when you're also dealing with bus and memory bounds (do you have enough memory to hold the data you need in-memory, plus room to receive shuffles? Do you have a local network/NICs that are adequate to run those shuffles in reasonable time? Do you have enough striped fast storage?)

I'd add the 1G (still fairly large, sure) dataset size to the Shower part and explain that this is heavily a warning against overengineering and premature optimization.

Trawl The Morning Paper

https://blog.acolyer.org/

Several hundred papers along with extensive analysis. Across the entire spectrum of CS, so most are not cold showers.

Papers found should link to the actual paper in the title and have an additional note:

Further discussion at [The Morning Paper](link to acolyer's post)

Turn the "hype" into each section's header

This is just food for thought, sorry for creating an issue.

When reading the formatted markdown, the headers stick out. Since the headers map 1:1 to the papers' names, they may not be "the" obvious representation of the hype.

There's also no room in the current format for associating multiple papers (with different approaches) to the single hype, if that's something that would be useful. I.e. possibly multiple sets of {shower,caveat,paper} per hype.

Programming language makes you more productive

Hype: Programming language X makes you more productive

Shower: An experiment with more than 600 professional programmers shows that (apart from assembly) programming language makes no difference.

Caveat: Was done in the 80s with Fortran, Cobol, C, Pascal.

Unfortunately, the book is not freely accessible. Maybe someone knows a paper version? Maybe even a more recent study?

AWS is notoriously expensive compared to GCP

Regarding the cold shower for "Scaling SQLite to 4M QPS on a Single Server": AWS is notoriously expensive compared to GCP, so I'm not very impressed with the claim, and a bare metal vs. GCP comparison would be much more relevant. (And I'm not sure bare metal would come out significantly ahead then.)

Find "Maintaining Mental Models"

"LATOZA, T. D., VENOLIA, G., AND DELINE, R. 2006. Maintaining mental models: a study of developer work habits. In Proc. of International Conference on Software Engineering. ACM, 492–501."

Found it as a cite in a different article, might potentially be a cold shower on documentation hype? Haven't looked to see if it's freely accessible.