The standard way for any upgrade is to stop/update/start one node at a time across the cluster. There shouldn't be a need to do it by adding nodes unless you're changing storage backends.
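A stop/update/start rollout can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a Riak tool. The `stop`, `upgrade`, `start` and `ready` callbacks are hypothetical hooks into your own automation. For example, `ready` might wrap `riak-admin wait-for-service riak_kv` and check that `riak-admin transfers` reports nothing pending. The property that matters is that the rollout halts at the first node that fails to come back, so only that one node is out of service.

```python
import time
from typing import Callable, List


def rolling_upgrade(nodes: List[str],
                    stop: Callable[[str], None],
                    upgrade: Callable[[str], None],
                    start: Callable[[str], None],
                    ready: Callable[[str], bool],
                    poll_seconds: float = 10.0,
                    max_polls: int = 360) -> List[str]:
    """Upgrade nodes strictly one at a time.

    stop/upgrade/start/ready are hypothetical hooks into your own tooling;
    `ready` should return True only once the node is serving again and
    handoffs have settled.
    """
    upgraded = []
    for node in nodes:
        stop(node)
        upgrade(node)
        start(node)
        # Wait for this node before touching the next one.
        for _ in range(max_polls):
            if ready(node):
                break
            time.sleep(poll_seconds)
        else:
            # Halt here: only this node is affected, the rest are untouched.
            raise RuntimeError(f"{node} did not become ready; halting rollout")
        upgraded.append(node)
    return upgraded
```

If an upgrade goes wrong, the exception leaves one node down and the remaining nodes still on the old version. This is the failure mode the later comments worry about, but it is bounded to a single node.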
Whichever way you go, though, I wouldn't expect out-of-memory (OOM) issues. This looks like something going unexpectedly wrong, as if you had triggered a bug. Can you share some information about your cluster?
- How many nodes
- Ring size
- Storage backend
- Number of clusters replicating
- Replication version used
- AAE version used
- Approximate key count
- Approximate mean object size
- Precise version migrating from and to
- Operating system
- Physical configuration of each node (CPU, memory, storage type)
It would also be useful to know:
- Are the OOM issues on all nodes, or just the updated nodes?
- If you run `riak admin top` (3.0) or `riak-admin top` (2.9) sorted by memory, which processes are hogging memory?
from riak.
Here you go! I got most of this info from our DevOps team; let me know if there's more I can get.
- How many nodes: 5
- Ring size: 128
- Storage backend: multi
- Number of clusters replicating: 5-6
- Replication version used: not sure
- AAE version used: not sure
- Approximate key count: not sure how to get this either, but maybe half a billion or more; we do around 100k puts daily
- Approximate mean object size: not sure how to get this; if I had to guess, I'd say mostly under 1 KB, except one bucket full of 300 KB blobs
- Precise version migrating from and to: 2.9 -> 3.0.10
- Operating system: Debian 9 on 2.9, Debian 10 on 3.0.10
- Physical configuration of each node (CPU, memory, storage type): 16 CPUs, 72 GB RAM, 5 TB SSD data disk
- Are the OOM issues on all nodes, or just the updated nodes? All nodes OOM and crash.
- If you run `riak admin top` (3.0) or `riak-admin top` (2.9) sorted by memory, which processes are hogging memory? This causes a severe outage, so we did not run these commands and cannot induce it again to run them.
This did not happen in a staging cluster built from clones of the 5 prod nodes, where we added 5 new nodes one at a time and removed the old ones one at a time. Same data and specs; the only difference is that prod was taking traffic during the crash.
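For a rough sense of scale, the figures in this reply can be turned into a per-node estimate. The sketch rests on several assumptions. Every bucket is assumed to use Riak's default `n_val` of 3, and 1 KiB is taken as the upper bound for the small objects. Backend, AAE, metadata and sibling overhead are ignored. The 300 KB blob bucket is left out because its key count isn't known. The inputs are the guesses given above, not measurements.

```python
GIB = 1024 ** 3


def replicated_bytes_per_node(keys: int, mean_object_bytes: int,
                              n_val: int, nodes: int) -> float:
    """Raw object bytes per node: no backend, AAE, metadata or sibling overhead."""
    return keys * mean_object_bytes * n_val / nodes


# Inputs are rough guesses from this thread; n_val=3 assumes Riak's default.
per_node = replicated_bytes_per_node(keys=500_000_000, mean_object_bytes=1024,
                                     n_val=3, nodes=5)
print(f"~{per_node / GIB:.0f} GiB per node")  # small objects only, blob bucket excluded
```

On these assumptions, the small objects come to roughly 286 GiB per node, comfortably under the 5 TB data disks. The blob bucket could change that considerably.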
I don't understand this. There's no obvious reason for this behaviour.
Adding a node and then removing a node is much more expensive than stop/update/start, though I wouldn't immediately expect it to blow up in terms of memory. Is there a reason you're doing the update this way rather than simply stop/update/start?
There have been problems in Riak with leveled backends and excessive memory use. You can have a leveled backend if you enable tictac_aae, or if you set one of your backends to leveled in the multi backend. Is leveled in play here?
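One way to answer "is leveled in play here?" is to scan each node's `riak.conf` for the two routes mentioned above. The sketch below is hypothetical tooling, not part of Riak. It assumes the backend is set through `storage_backend` or `multi_backend.<name>.storage_backend`, and that Tictac AAE is switched on with `tictacaae_active = active`. Check these key names against your version's configuration docs before relying on it.

```python
from typing import List


def leveled_signals(conf_text: str) -> List[str]:
    """Return riak.conf settings that suggest leveled is in use.

    Assumed key names (verify against your Riak version's config docs):
      storage_backend / multi_backend.<name>.storage_backend = leveled
      tictacaae_active = active
    """
    hits = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.endswith("storage_backend") and value == "leveled":
            hits.append(line)  # a backend (or multi-backend member) is leveled
        elif key == "tictacaae_active" and value == "active":
            hits.append(line)  # per the thread, enabling tictac_aae brings in leveled
    return hits


# Hypothetical example config, not taken from the reporter's cluster.
sample = """\
storage_backend = multi
multi_backend.bitcask_mult.storage_backend = bitcask
multi_backend.leveled_mult.storage_backend = leveled
## tictacaae_active = active
tictacaae_active = passive
"""
print(leveled_signals(sample))
```

Running it over the real `riak.conf` from every node gives a quick yes/no, rather than relying on memory of how the cluster was configured.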
From our DevOps:
I do not believe we are using leveled. We're upgrading by adding nodes to the cluster mainly because, if we stop and upgrade one of our five nodes and the upgrade fails, we've lost a node and have to take it out of the load balancer, so we would take a performance hit and possibly an outage.
We're going to attempt a stop/upgrade/start in a test cluster though!
This is now fixed! Thanks for the support. We took a hybrid approach with the following steps, and it was successful:
- remove the old Riak node being replaced from the load balancer
- spin up a new Debian 10 machine with the new Riak
- join the new node to the cluster (staged)
- run the replace command from the old node to the new node (staged)
- run commit so the old node transfers directly to the new one, while both are out of the load balancer
- once done, add the new node to the load balancer and turn off the old one
Since it was a one-to-one transfer, this seems to have prevented the OOM.
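The hybrid procedure can be written down as a runbook. This sketch only builds the list of steps and executes nothing. It assumes the `cluster join` / `cluster replace` / `cluster plan` / `cluster commit` sequence that, as I understand it, Riak documents for replacing a node, run on the new node. The prefix is `riak admin` on 3.0 and `riak-admin` on 2.9. The load-balancer steps and node names are hypothetical placeholders.

```python
from typing import List, Tuple


def replace_node_runbook(old_node: str, new_node: str, seed_node: str,
                         admin: str = "riak admin") -> List[Tuple[str, str]]:
    """Build (where, what) steps for a staged one-for-one node replacement.

    Nothing is executed. `admin` is "riak admin" on 3.0 and "riak-admin" on 2.9.
    Node names and the load-balancer steps are hypothetical placeholders.
    """
    return [
        ("lb", f"remove {old_node} from the load balancer"),
        # new_node is assumed to be already running the new Riak release.
        (new_node, f"{admin} cluster join {seed_node}"),                # stage the join
        (new_node, f"{admin} cluster replace {old_node} {new_node}"),   # stage the replace
        (new_node, f"{admin} cluster plan"),                            # review before committing
        (new_node, f"{admin} cluster commit"),                          # old -> new handoff starts
        (new_node, f"{admin} transfers"),                               # repeat until none remain
        ("lb", f"add {new_node} to the load balancer"),
        (old_node, "shut down the old node"),
    ]


for where, what in replace_node_runbook("riak@old-1", "riak@new-1", "riak@node-2"):
    print(f"[{where}] {what}")
```

Keeping both nodes out of the load balancer until `transfers` is clear matches the order described above. Client traffic never lands on a node that is mid-handoff.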
The two other approaches we tried:
- adding the new node to the cluster while it was in the load balancer: failed 100% of the time
- adding the new node to the cluster while it was out of the load balancer: failed 50% of the time (1 worked, 1 did not)
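To see why a one-to-one replace behaves differently from a join, here is a toy model of partition ownership for the 128-partition, 5-node ring described earlier. It is not Riak's claim algorithm, which may also shuffle partitions between existing nodes. It simply assumes partitions are spread as evenly as possible and counts the handoffs that spread implies.

```python
from typing import Dict, List


def even_share(ring_size: int, nodes: List[str]) -> Dict[str, int]:
    """Toy model: spread partitions as evenly as possible (NOT Riak's claim algorithm)."""
    base, extra = divmod(ring_size, len(nodes))
    # The first `extra` nodes each take one additional partition.
    return {node: base + (1 if i < extra else 0) for i, node in enumerate(nodes)}


def join_handoffs(ring_size: int, nodes: List[str], new_node: str) -> Dict[str, int]:
    """Partitions each existing node hands to `new_node` when it joins."""
    before = even_share(ring_size, nodes)
    after = even_share(ring_size, nodes + [new_node])
    return {n: before[n] - after[n] for n in nodes if before[n] > after[n]}


def replace_handoffs(ring_size: int, nodes: List[str], old_node: str) -> Dict[str, int]:
    """With a replace, only `old_node` hands off, and all of it goes to one target."""
    return {old_node: even_share(ring_size, nodes)[old_node]}


nodes = [f"n{i}" for i in range(1, 6)]
join = join_handoffs(128, nodes, "n6")
replace = replace_handoffs(128, nodes, "n5")
print(f"join: {len(join)} sources, {sum(join.values())} partitions")
print(f"replace: {len(replace)} source, {sum(replace.values())} partitions")
```

In this model, a join pulls 21 partitions from all five existing nodes at once. Removing the old node afterwards then triggers a second rebalance. A replace is a single stream of 25 partitions from one node to one node. That fits the observation above, though the model alone does not explain the OOM.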