Comments (5)
I'll think about it. Removal of overgoal chunks, and also system rebalancing, could be suspended for a short (configurable) period after a new chunkserver connects (or after an HDD with chunks is added). The question is: how long should it be suspended? An hour?
The other question is: is it really necessary? When a disk goes offline, it is a good idea to attach it in "mark for removal" mode, because this usually means the disk is broken and should be replaced ASAP. In what scenario would a disk that went offline be attached again as a "normal" disk?
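To make the suspension idea concrete, here is a minimal sketch of such a grace-period check. It is not MooseFS code: the setting name `OVERGOAL_GRACE_SECONDS` and the function `may_remove_overgoal` are hypothetical, and the one-hour default is just the value floated above.

```python
import time

# Hypothetical setting: seconds to wait after a chunkserver (re)connects
# before overgoal copies may be deleted. This is NOT an actual MooseFS option.
OVERGOAL_GRACE_SECONDS = 3600  # the one-hour value floated in the comment


def may_remove_overgoal(connect_times, now=None, grace=OVERGOAL_GRACE_SECONDS):
    """Return True only if every chunkserver holding a copy of the chunk
    has been connected for at least `grace` seconds.

    connect_times: iterable of Unix timestamps, one per chunkserver.
    """
    if now is None:
        now = time.time()
    return all(now - connected_at >= grace for connected_at in connect_times)
```

With this shape, a chunkserver that has just reconnected blocks deletion of extra copies until the grace period has elapsed, which is the churn reduction being discussed.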
from moosefs.
On Mon, Apr 25, 2016 at 1:43 AM, Jakub Kruszona-Zawadzki <
[email protected]> wrote:
I'll think about it. Removal of overgoal chunks and also system rebalance
might be suspended for short (configurable) period of time after connecting
of new chunkserver (or after adding HDD with chunks). The question is: How
long should it be suspended? Hour?
The other question is. Is it really necessary? When disk went offline it
is good idea to attach it in "mark for removal" mode because usually this
means that such disk is broken and should be replaced asap. In what
scenario disk that went offline will be attached again as "normal" disk?
I admit my use case is not the norm for Moosefs. I have master + two chunk
servers in the house and a chunkserver + metadata mirror in an outbuilding.
The chunk servers are on cubieboards. The connection to the outbuilding was
via wireless and it was somewhat unreliable. The purpose was to ensure that
if something bad happened to the house (break-in, fire, etc.) the data would
be saved in the outbuilding. My hope was that using a topology file and
setting a goal of three on really important files would result in safe
storage with acceptable performance. My goals for performance are very
modest but even so the remote chunkserver over wireless impacted
performance severely. I ran a network cable and performance is now ok.
My goal with Moosefs is cheap, trustworthy storage with low stress and
burden. Moose has excelled in this for me for quite a few years now. When
disks die, replacing them is trivial, mostly automatic and overall a very
low burden to me. When a disk dies in a btrfs or raid based system it is
relatively complicated to recover. This quality of Moosefs is absolutely
wonderful.
I think removing chunks could possibly be driven by a free space parameter.
I have ~5.5T of raw space of which I'm only using 2.5T. In my ideal world
I would set a free space threshold under which Moosefs would not bother to
remove chunks. This threshold would be distributed across all disks. For
example I'd set a free space threshold of 1.5T and each chunkserver would
trigger disk cleanup only if there was less than 500G of space available. I
suspect this might help performance in some situations as I've seen Moosefs
get very slow when re-balancing.
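The arithmetic behind that example can be sketched as follows. This only illustrates the suggestion and is not an existing MooseFS feature; the function names are hypothetical, and an even split across three chunkservers is assumed.

```python
GB = 10**9  # decimal gigabyte, used only for this illustration


def per_server_threshold(total_threshold_bytes, server_count):
    """Split a cluster-wide free-space threshold evenly across chunkservers."""
    if server_count <= 0:
        raise ValueError("server_count must be positive")
    return total_threshold_bytes // server_count


def should_clean_up(free_bytes, total_threshold_bytes, server_count):
    """True when a chunkserver's free space is below its share of the threshold."""
    return free_bytes < per_server_threshold(total_threshold_bytes, server_count)
```

For the numbers above, `per_server_threshold(1500 * GB, 3)` gives `500 * GB`, so a chunkserver with 600G free would leave its extra chunks alone and one with 400G free would start cleaning up.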
Anyhow, this is a very low priority suggestion that I thought I'd just
mention. I'm very satisfied with Moosefs and grateful for its being made
available via open source. Thanks much for that.
My goal with Moosefs is cheap, trustworthy storage with low stress and
burden. Moose has excelled in this for me for quite a few years now. When
disks die, replacing them is trivial, mostly automatic and overall a very
low burden to me. When a disk dies in a btrfs or raid based system it is
relatively complicated to recover. This quality of Moosefs is absolutely
wonderful.
I'd just like to reiterate this myself. When I started with MooseFS over a year ago, there was a bit of a learning curve and some setup, but it was time well spent. Once up and running, MooseFS was very hands-off, and it continues to amaze me how it just sits there and works without needing constant attention or a sysadmin to tweak it all day. It truly amazes me how many storage admins and sysadmins these days still mess with RAID5, RAID6, or RAID anything, really. The "era of Ceph storage" seems to finally be here (even if Ceph isn't the software we've all chosen here, it just seems to be the trendy one all the enterprises are talking and writing articles about).
It's just sooo much easier to handle the inevitable failure (that will happen) when it does happen by letting a filesystem automatically move and rebalance your data without your intervention than by trying to rebuild disks and hoping you don't lose another one in the resilvering process. I too use MooseFS at home as a backup for just about all my stuff (I also keep an offsite second backup of my truly critical data). I also use MFS as the datastore for my home ESXi box, and I'm testing the possibility of using it at work as well.
Thanks guys for all your hard work! And keep it up. :)
I'll think about it. Removal of overgoal chunks and also system rebalance might be suspended for short (configurable) period of time after connecting of new chunkserver (or after adding HDD with chunks). The question is: How long should it be suspended? Hour?
If there were a way to track the "health" of chunkserver/disk pairs in terms of current uptime and the number of recent disconnects and I/O errors in the last 1h/12h/24h (pick any reasonable timeframe here) against admin-specified thresholds, perhaps the decision of which cutoff to pick could be easier to automate.
A preferred alternative to embedding all such logic in mfschunkserver and mfsmaster would be an out-of-band "health" checking capability. A periodic poll of an SQLite database, an external executable, or a web API call that provides the caller with sets of operational parameters for maintenance modes and predicted failures, along with TTLs for how long any such overrides are to be considered valid, would enable automation and customization for a variety of deployments. The value of this would extend beyond deciding whether to remove chunks lazily, rapidly, or at the default speed.
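As a rough sketch of what consuming such out-of-band overrides might look like, assuming the poll returns records with a removal mode, an issue time, and a TTL. Every name here (`Override`, `effective_mode`, the mode strings) is hypothetical and not part of MooseFS.

```python
import time
from dataclasses import dataclass


@dataclass
class Override:
    """One out-of-band hint for a chunkserver; all field names are hypothetical."""
    chunkserver: str
    removal_mode: str  # "lazy", "default" or "rapid"
    issued_at: float   # Unix timestamp when the hint was produced
    ttl: float         # seconds the hint stays valid


def effective_mode(overrides, chunkserver, now=None):
    """Return the newest unexpired removal mode for a chunkserver,
    falling back to "default" when no valid override exists."""
    if now is None:
        now = time.time()
    valid = [
        o for o in overrides
        if o.chunkserver == chunkserver and now - o.issued_at < o.ttl
    ]
    if not valid:
        return "default"
    return max(valid, key=lambda o: o.issued_at).removal_mode
```

Because expired overrides are simply ignored, a health checker that stops responding would leave the system on its default behavior rather than stuck in a stale maintenance mode.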
With all that said, hats off to Moosefs for an already great solution.
Original idea from #8:
I think an option to make removing extra chunks more lazy would be good.
Scenario is that a disk goes offline and gets replicated then comes online and gets removed, then it happens again. All this churn would be reduced by more lazy removal of extra chunks.
Another scenario: an entire datacenter or rack's worth of chunkservers disappears due to an unplanned outage, the remaining locations bring replicas up to the defined levels, and everything continues as normal. When the datacenter reappears, the master rapidly deletes known-good copies, not knowing that the outage resulted from a cooling failure and that dozens of disks in the originally offlined location will start returning I/O errors the next time filesystem activity or a scan hits chunks stored on them. This couples tightly with the trust model for .chunkdb proposed in #165. This is not a hypothetical scenario, btw.