Comments (36)
This happened after the upgrade to 3.56, which we then downgraded to 3.55, since 3.56 seems to have major bugs.
from seaweedfs.
We can find multiple instances of, for example, ec.00 on different nodes, with different timestamps. It looks like the balancing logic is not working properly.
First guess is this is caused by ec.balance or ec.rebuild. We guess that because it happened just overnight: some files that are now not accessible anymore were accessible just yesterday.
Also, on the master overview page, ErasureCodingShards was showing a negative number for one node.
Have there been significant changes to the EC code since 3.43?
There were some benign-looking error-handling changes.
@chrislusf See email for some in-depth details that I can't post here in this issue.
Here is my last mail, also for the issue tracker:
"
So, it took a lot of trial and error, but I was able to recover the file.
This is what I did to make it work:
mv /mnt/d3/weed/redacted-2_1859.ec02 /root
What I noticed is that all volume-1859 files were named redacted-1_1859, except for this single one, which was named redacted-2. I moved this 0-byte file (redacted-2) to a different location, and immediately the file download and verify worked again.
"
Now the question is: how can this happen? Is it normal that the same volumeId exists for multiple collections, @chrislusf?
After moving the file back, it is again not accessible. Hmm.
So it's mostly not working and only sporadically worked for a moment. Now the error changed to ReadEcShardIntervals: too few shards given.
Made it work again by getting rid of all 0-byte files of volume 1859.
They keep being created again and again, though. So this is the issue, at least for this file. I still have to check what is going on with the other things, @chrislusf.
The error volumeId 1859 not found in fs.verify remains, though.
So, summary:
There are 0-byte .ec** files being created. There are duplicates of .ec** files. There are sometimes even duplicates of the same volumeId with a different collection.
How this happened? No idea yet.
Found another instance of this pattern (with 5 different collections):
-rw-r--r-- 1 root root 0 Sep 22 07:06 /mnt/d4/weed/b1_1857.ec03
-rw-r--r-- 1 root root 0 Sep 22 09:10 /mnt/d4/weed/b2_1857.ec03
-rw-r--r-- 1 root root 0 Sep 22 11:13 /mnt/d4/weed/b3_1857.ec03
-rw-r--r-- 1 root root 0 Sep 22 18:37 /mnt/d1/weed/b4_1857.ec03
-rw-r--r-- 1 root root 0 Sep 22 15:05 /mnt/d1/weed/b5_1857.ec03
Not sure yet if this is something expected, though, @chrislusf.
What is the output of volume.list?
There should not be different collections with the same volume id 1857. I do not understand why it happened.
> volume.list -volumeId 1857
Topology volumeSizeLimit:1024 MB hdd(volume:3976/107208 active:3958 free:103232 remote:0)
DataCenter dc1 hdd(volume:3976/107208 active:3958 free:103232 remote:0)
Rack rack1 hdd(volume:3976/107208 active:3958 free:103232 remote:0)
DataNode dt64:8080 hdd(volume:536/14487 active:534 free:13951 remote:0)
Disk hdd(volume:536/14487 active:534 free:13951 remote:0)
ec volume id:1857 collection:buck1 shards:[2 9]
Disk hdd total size:0 file_count:0
DataNode dt64:8080 total size:0 file_count:0
DataNode dt65:8080 hdd(volume:537/14488 active:535 free:13951 remote:0)
Disk hdd(volume:537/14488 active:535 free:13951 remote:0)
ec volume id:1857 collection:buck1 shards:[1 8]
Disk hdd total size:0 file_count:0
DataNode dt65:8080 total size:0 file_count:0
DataNode dt66:8080 hdd(volume:540/14492 active:539 free:13952 remote:0)
Disk hdd(volume:540/14492 active:539 free:13952 remote:0)
ec volume id:1857 collection:buck1 shards:[0 7]
Disk hdd total size:0 file_count:0
DataNode dt66:8080 total size:0 file_count:0
DataNode dt67:8080 hdd(volume:537/14491 active:534 free:13954 remote:0)
Disk hdd(volume:537/14491 active:534 free:13954 remote:0)
ec volume id:1857 collection:buck1 shards:[10]
Disk hdd total size:0 file_count:0
DataNode dt67:8080 total size:0 file_count:0
DataNode dt68:8080 hdd(volume:539/14492 active:537 free:13953 remote:0)
Disk hdd(volume:539/14492 active:537 free:13953 remote:0)
ec volume id:1857 collection:buck1 shards:[6 13]
Disk hdd total size:0 file_count:0
DataNode dt68:8080 total size:0 file_count:0
DataNode dt69:8080 hdd(volume:537/14482 active:532 free:13945 remote:0)
Disk hdd(volume:537/14482 active:532 free:13945 remote:0)
ec volume id:1857 collection:buck1 shards:[6 13]
Disk hdd total size:0 file_count:0
DataNode dt69:8080 total size:0 file_count:0
DataNode dt70:8080 hdd(volume:533/14391 active:530 free:13858 remote:0)
Disk hdd(volume:533/14391 active:530 free:13858 remote:0)
ec volume id:1857 collection:buck1 shards:[4]
Disk hdd total size:0 file_count:0
DataNode dt70:8080 total size:0 file_count:0
DataNode dt71:8080 hdd(volume:217/5885 active:217 free:5668 remote:0)
Disk hdd(volume:217/5885 active:217 free:5668 remote:0)
ec volume id:1857 collection:buck1 shards:[3 5 11 12]
Disk hdd total size:0 file_count:0
DataNode dt71:8080 total size:0 file_count:0
Rack rack1 total size:0 file_count:0
DataCenter dc1 total size:0 file_count:0
total size:0 file_count:0
In the volume list, none of these weird ones show up.
This cluster is running two filers with leveldb. Is the synchronization between them enforced? I opened a ticket earlier where we had already noticed that this synchronization does not seem to actually work in all cases; we had keys that existed only on one filer.
We were already thinking of switching to CockroachDB as a backend for the filer to guarantee HA, but have yet to test whether scaling works with multiple filers accessing the same CockroachDB.
I do not see this in the volume.list output; the entries all have the same bucket, buck1.
-rw-r--r-- 1 root root 0 Sep 22 07:06 /mnt/d4/weed/b1_1857.ec03
-rw-r--r-- 1 root root 0 Sep 22 09:10 /mnt/d4/weed/b2_1857.ec03
-rw-r--r-- 1 root root 0 Sep 22 11:13 /mnt/d4/weed/b3_1857.ec03
-rw-r--r-- 1 root root 0 Sep 22 18:37 /mnt/d1/weed/b4_1857.ec03
-rw-r--r-- 1 root root 0 Sep 22 15:05 /mnt/d1/weed/b5_1857.ec03
buck1 is the replacement I made for the original name, and equals b1 in that list.
As you state correctly, it is not in the list. That doesn't change the fact that the other output is generated by find /mnt/*/weed | grep 1857.ec03 | xargs sudo ls -alh
and causes major problems.
The volume id should never be reused in other collections. I do not understand how it happened.
I am currently studying the code to understand what is happening. The most interesting thing about the random files from other collections is that only the actual collection has its vif, ecx, and ecj files. The random ones do not:
-rw-r--r-- 1 root root 0 Sep 21 19:21 /mnt/d2/weed/b1_1857.ecj
-rw-r--r-- 1 root root 4.4K Sep 21 19:21 /mnt/d2/weed/b1_1857.ecx
-rwxr-xr-x 1 root root 78 Sep 21 19:21 /mnt/d2/weed/b1_1857.vif
I don't have a good guess yet, but it's either correlated to the multiple filers (although they should be syncing; as said just before, we already had issues with that not being true and items randomly missing, but I thought that had already been fixed, because I didn't see it appear again in 3.43), or something else.
Who decides the volumeIds, the filer or the master?
"find /mnt/*/weed | grep _1857.* | xargs sudo ls -alh"
-rw-r--r-- 1 root root 0 Sep 22 18:37 /mnt/d1/weed/b2_1857.ec03
-rw-r--r-- 1 root root 0 Sep 22 15:05 /mnt/d1/weed/b3_1857.ec03
-rw-r--r-- 1 root root 104M Sep 21 19:21 /mnt/d2/weed/b1_1857.ec06
-rw-r--r-- 1 root root 104M Sep 21 19:21 /mnt/d2/weed/b1_1857.ec13
-rw-r--r-- 1 root root 0 Sep 21 19:21 /mnt/d2/weed/b1_1857.ecj
-rw-r--r-- 1 root root 4.4K Sep 21 19:21 /mnt/d2/weed/b1_1857.ecx
-rwxr-xr-x 1 root root 78 Sep 21 19:21 /mnt/d2/weed/b1_1857.vif
-rw-r--r-- 1 root root 0 Sep 22 07:06 /mnt/d4/weed/b3_1857.ec03
-rw-r--r-- 1 root root 0 Sep 22 09:10 /mnt/d4/weed/b4_1857.ec03
-rw-r--r-- 1 root root 0 Sep 22 11:13 /mnt/d4/weed/b5_1857.ec03
Looking at this again: according to their timestamps, all of these extra volumeIds on foreign collections definitely got created WAY later, so the volumeId had already existed for quite some time (multiple hours).
The next question, of course, is why SeaweedFS gets troubled by those foreign-collection files. I have not yet found the point where the shards get read in; it would be really helpful if you could point me there quickly.
Btw, this cluster was running fine for a long time on 3.43. Troubles started after the upgrade to 3.56 and the subsequent downgrade to 3.55, forced by a complete lockup caused by a bug in 3.56 (which you seem to have fixed already but not released yet).
The system is designed to have only unique volume ids.
> the system is designed to have only unique volume ids.

I guessed that, so I can pinpoint that this is happening in 3.55, according to the timestamps. And nothing special happened.
The targets that come to mind that could cause this are:
ec.encode -fullPercent=95 -quietFor=1h
ec.rebuild -force
ec.balance -force
In 3.24 there was still a bug that caused ec.encode -fullPercent=95 -quietFor=1h
to not address collections correctly on its own, so we had explicit ec.encode -collection=b1 -fullPercent=95 -quietFor=1h
commands to work around this for a few buckets. In 3.55 this now works as expected. The extra collections on that volumeId actually contain collections that we did not erasure code before, so it can't be any of the explicit encode commands, only the general one. Otherwise only the balancing and rebuilding are left.
I am currently trying to work through the logic again.
The biggest question for me is why a collection_1857.ec03 and a parallel collection2_1857.ec03 are even a problem for the system. So far I have only found logic which explicitly builds the name, and nothing that filters only on the ending _volumeId.ecxx.
One more piece of info: it looks like these extra files were always created on the node that was used for the reconstruction action at that moment.
Ok, the 0-byte files could come from here
or
More likely the first one, as it looks.
Ok, so 0-byte files can be created if there is already a 0-byte file existing:
seaweedfs/weed/storage/erasure_coding/ec_encoder.go
Lines 259 to 261 in 76a6285
This return causes the whole rebuild procedure to cancel without writing anything to the new files, but the new 0-byte files are not cleaned up either.
So the possible scenario for the very first 0-byte file might be an unfortunate exit or crash of the volume server. I am not sure if a crash is necessary, or if an exit would be enough. I will try to reproduce this with all this information; I might be barking up the wrong tree, though.
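To make the scenario concrete, here is a toy Node sketch of that pattern (my own illustration, not the actual Go code; all names are mine): the output files are created up front, and the early return on an unusable input leaves them behind as 0-byte files:

```javascript
import fs from 'fs'
import path from 'path'

// Toy model of the failure mode: the rebuild opens -- and thereby creates --
// all output shard files first, then aborts when an input turns out to be
// unusable (e.g. already 0 bytes), without cleaning up the empty outputs.
function rebuildShards(dir, base, inputSize) {
  const outputs = []
  for (let i = 0; i < 3; i++) {                    // real EC has 14 shards; 3 for brevity
    const p = path.join(dir, `${base}.ec0${i}`)
    fs.writeFileSync(p, '')                        // created empty, like a freshly opened file
    outputs.push(p)
  }
  if (inputSize === 0) {
    return { ok: false, leftovers: outputs }       // early return, no cleanup
  }
  for (const p of outputs) fs.writeFileSync(p, 'shard-data')
  return { ok: true, leftovers: [] }
}
```

So each failed rebuild seeds the next one with more 0-byte inputs, which would explain why the files keep coming back.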
That is not yet an explanation for the foreign collections at all, though.
I wrote a short script (needs npm i glob) to identify these oddities and also the collection duplicates:

```javascript
import { globStream } from 'glob'
import fs from 'fs/promises'

const hashmap = {}   // volumeId -> first collection+volume name seen
const originals = {} // volumeId -> path of the first non-empty shard
const delay = {}     // volumeId -> matching shards held back until a mismatch shows up
const pt = '/mnt/d1/weed'.length // all mount paths share this prefix length

const gS = globStream(['/mnt/*/weed/*.ec[0-9][0-9]'])
gS.on('data', async path => {
  const vol = path.substring(path.lastIndexOf('_') + 1, path.length - 5) // e.g. "1857"
  const fin = path.substring(pt, path.length - 5)                        // e.g. "/b1_1857"
  const { size } = await fs.stat(path)
  if (!size) console.log(`0 byte ${path}`)
  if (!hashmap[vol] && size) {
    hashmap[vol] = fin
    originals[vol] = path
  } else if (hashmap[vol] && hashmap[vol] !== fin) {
    // a second collection showed up for the same volume id
    if (originals[vol]) {
      console.log(`original ${originals[vol]}`)
      delete originals[vol]
    }
    console.log(`duplicate ${path}`, size)
    delay[vol]?.forEach(x => console.log(`duplicate ${x.path}`, x.size))
    delete delay[vol]
  } else if (size) { // was `!hashmap[vol] && size`, which is unreachable after the first branch
    if (!delay[vol]) delay[vol] = []
    delay[vol].push({ path, size })
  }
})
```
With that, I also found instances where the actual collection came later than the foreign collection:
-rw-r--r-- 1 root root 103M Sep 22 21:11 /mnt/d3/weed/b1_1904.ec04
-rw-r--r-- 1 root root 0 Sep 22 04:09 /mnt/d4/weed/b2_1904.ec04
The only explanation I can think of is that req.Collection is actually not a stable reference but could change in the worst case. Nothing else makes much sense yet, but it is happening; I just don't know why yet.
I even found one example where, for the same ec shard, there is an instance without a collection name plus only foreign collections, but not the actual one; in the other cases the ec shard of the actual collection was always there.
-rw-r--r-- 1 root root 103M Sep 21 20:08 /mnt/d1/weed/b1_1944.ec08
-rw-r--r-- 1 root root 0 Sep 22 04:45 /mnt/d4/weed/b2_1944.ec02
-rw-r--r-- 1 root root 0 Sep 22 05:38 /mnt/d4/weed/1944.ec02
The only good thing out of all this is that I am getting to know the codebase in depth :p
Something I suspected already: the info gets extracted simply by the underscore, which makes sense of course:
seaweedfs/weed/storage/disk_location.go
Lines 109 to 116 in 76a6285
And regardless of the collection, they get added to the same array:
seaweedfs/weed/storage/disk_location_ec.go
Lines 160 to 173 in 76a6285
That is why those foreign collections make trouble.
I still have no idea how this happened, though. My guess is that the streaming change you reverted is maybe related to this. We had 3.56 running for one day; then it completely locked up, marked all volumes non-writable, and we downgraded to 3.55. So maybe those files were generated by a bug in 3.56 and are causing trouble now in 3.55.
So the second piece of logic should probably also check that the collection is not something else, and ignore these files (auto-delete is probably only a good idea if it is a 0-byte file).
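A sketch of what that check could look like (my own function names, not actual SeaweedFS code): parse the shard file name by its underscore, as the loader does, then drop shards whose collection does not match the volume's known collection:

```javascript
// Parse a shard file name like "b1_1857.ec03" into its parts.
// Mirrors the underscore-based parsing described above; names are mine.
function parseShardName(name) {
  const m = /^(?:(.*)_)?(\d+)\.ec(\d\d)$/.exec(name)
  if (m === null) return null
  return { collection: m[1] ?? '', volumeId: Number(m[2]), shard: Number(m[3]) }
}

// Keep only shards whose collection matches the volume's known collection,
// so a stray b2_1857.ec03 can no longer shadow buck1's volume 1857.
function filterShards(names, expectedCollection) {
  return names.filter(n => {
    const p = parseShardName(n)
    return p !== null && p.collection === expectedCollection
  })
}
```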
I think I might have found the issue, @chrislusf:
seaweedfs/weed/server/volume_grpc_copy.go
Line 218 in 23f334d
This is called without knowing yet whether the file exists or not, and it creates the file with whatever collection name reached this function.
It is called by
seaweedfs/weed/shell/command_ec_common.go
Line 72 in 23f334d
which earlier gets called by
seaweedfs/weed/shell/command_ec_balance.go
Line 278 in 23f334d
called by
seaweedfs/weed/shell/command_ec_balance.go
Line 472 in 23f334d
called by
seaweedfs/weed/shell/command_ec_balance.go
Line 359 in 23f334d
At no step is the collection retrieved from the shard itself, nor are the shards filtered at any step by the collection passed as a parameter; the initial collection name is what reaches the final copy command in the end. I might have missed something, but I have checked it multiple times now.
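To illustrate the suspected pattern with a toy model (all names are mine, not the actual Go code): the collection that the balance command started with is what ends up in the destination file name, regardless of the shard's real collection:

```javascript
// Toy model of the call chain above. The balance step threads its *own*
// collection parameter down to the copy step, which builds the destination
// file name from it; the shard's real collection is never consulted.
function copyShard(destDir, collection, volumeId, shardId) {
  const nn = String(shardId).padStart(2, '0')
  return `${destDir}/${collection}_${volumeId}.ec${nn}`   // caller-supplied collection
}

function balanceMove(shard, balanceCollection, destDir) {
  // shard.collection is ignored here -- the suspected bug pattern
  return copyShard(destDir, balanceCollection, shard.volumeId, shard.shardId)
}
```

With a volume that really belongs to buck1, a balance pass started for collection b2 would produce exactly the kind of foreign-collection name seen in the listings above.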
Let me know if I missed something and am wrong, @chrislusf; I am on the road again for the next few hours.
The call graph is correct.
Ok, then this is the issue.