Comments (4)
Given that these are hinted handoffs, I think it would be expected that they are handoffs from secondary partitions (i.e. fallback vnodes that were temporarily created to maintain n_val during an outage).
There's been a lot of work done in the last few versions of Riak to try and improve handoff reliability, as there were a lot of problems with handoff timeouts, particularly when handoffs are occurring during busy periods or vnodes are particularly large.
In your version, the first thing is probably to reduce the riak_core handoff_acksync_threshold across your cluster. This reduces the number of batches between acknowledgements.
There may also be value in increasing the riak_core handoff_timeout across the cluster.
There may also be value in increasing the riak_core handoff_receive_vnode_timeout.
These changes can all be made via riak attach and application:set_env (which will take effect for the next handoff). You can also add the settings to advanced.config (which will take effect after the node is restarted).
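As a rough sketch of the runtime route, something like the following could be entered at the riak attach console on each node. The numeric values are illustrative assumptions only, not recommendations, and the timeouts are assumed to be in milliseconds; check the current values on your version before changing anything.

```erlang
%% Inspect the current values first. application:get_env/2 returns
%% {ok, Value}, or undefined if the key has not been set explicitly.
application:get_env(riak_core, handoff_acksync_threshold).
application:get_env(riak_core, handoff_timeout).
application:get_env(riak_core, handoff_receive_vnode_timeout).

%% Illustrative values (assumptions): fewer batches between
%% acknowledgements, and longer timeouts.
application:set_env(riak_core, handoff_acksync_threshold, 5).
application:set_env(riak_core, handoff_timeout, 120000).
application:set_env(riak_core, handoff_receive_vnode_timeout, 120000).
```

The equivalent advanced.config fragment, using the same illustrative values, might look like this:

```erlang
%% advanced.config: read at node start, so a restart is needed.
[
 {riak_core, [
   {handoff_acksync_threshold, 5},
   {handoff_timeout, 120000},
   {handoff_receive_vnode_timeout, 120000}
 ]}
].
```

Values set with application:set_env are held in memory only and are lost when the node restarts, so anything that needs to survive a restart belongs in advanced.config.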
Finally, if you have increased the riak_core handoff_concurrency from the default setting, there may be value in reducing back to the default again.
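A minimal sketch of checking and lowering the concurrency from riak attach is below. It assumes riak_core_handoff_manager:set_concurrency/1 is present in your version, and it assumes the default is 2; please confirm both for your release before relying on them.

```erlang
%% Show the handoff concurrency currently configured on this node.
application:get_env(riak_core, handoff_concurrency).

%% Lower it again. The value 2 is an assumed default, not a
%% confirmed one for every Riak version.
riak_core_handoff_manager:set_concurrency(2).
```

This only affects the node it is run on, so it would need repeating on each node in the cluster.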
Monitoring of these handoffs has been improved in recent versions, as working out what exactly is going wrong in older Riak versions is hard. When a handoff fails, it starts to re-send all the data from the beginning, so if the fallback vnodes were created as part of an extended outage (and are quite large) then continuous failures are possible.
If you are confident that all the data is sufficiently covered in your cluster (due to other replicas and anti-entropy mechanisms), in the worst case scenario you can stop each node in turn and manually delete the fallback vnodes. Obviously though, it would be more sustainable to find a configuration which will work for future handoffs.
from riak.
Thanks Martin, I'll try these config changes and see how it goes. Will keep you updated.
from riak.
I made some changes via riak attach and application:set_env, and then restarted Riak.
That kicked off the transfer again, but now I'm seeing a different error in the Riak error log:
2023-05-03 01:34:09.787 [error] <0.304.0>@riak_core_ring:check_tainted:263 Error: riak_core_ring/ring_ready called on tainted ring
2023-05-03 01:34:09.787 [error] <0.304.0>@riak_core_ring:check_tainted:263 Error: riak_core_ring/ring_ready called on tainted ring
The transfer seems to be in progress, but I don't understand how to fix this riak_core_ring:check_tainted error
I need your help again, thanks
from riak.
I don't know really. I believe the tainted flag was added, so that before a read-only cache of the ring is exported (using mochiglobal), it is marked as tainted so that it can be confirmed that such a cached ring is never mistakenly used as the version to make an updated ring - i.e. some code updates the ring from get_raw_ring not get_my_ring.
So the tainted state and the error messages are a check to make sure this never happens. But clearly, in some rare circumstance, it can. Because of this, the unset_tainted function was added so that this could be fixed from remote_console ... but that isn't available in older versions of Riak.
If the error logs don't go away, there might be another method to clear this status. I don't think it will work, but perhaps riak_core_ring_manager:force_update/0 might be worth a shot. Alternatively, you could compile a new version of the riak_core_ring module with the exported unset_tainted function added, hot code load it, and then use the function to unset the flag.
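For what it's worth, a sketch of what could be tried from riak attach is below. It only uses force_update/0 as mentioned above plus standard Erlang introspection calls; it does not show the patched module itself, since the exact shape of unset_tainted in newer versions would need checking in the riak_core source.

```erlang
%% Attempt the ring manager update; this may well have no effect
%% on the tainted flag.
riak_core_ring_manager:force_update().

%% Find which beam file is currently loaded for riak_core_ring,
%% which is useful to know before attempting any hot code load.
code:which(riak_core_ring).

%% List any exported unset_tainted functions in the loaded module.
%% An empty list means the function is not available on this node.
[FA || {unset_tainted, _} = FA <- riak_core_ring:module_info(exports)].
```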
from riak.