After running the command, I got the following results. I wonder what is wrong here?
./preprocess.sh -s iid --sf 1.0 -k 0 -t sample -tf 0.8
DATASET: shakespeare
424 users
1992135 samples (total)
4698.43 samples per user (mean)
num_samples (std): 10122.73
num_samples (std/mean): 2.15
num_samples (skewness): 6.69
num_sam num_users
0 250
2000 66
4000 18
6000 16
8000 18
10000 13
12000 11
14000 2
16000 5
18000 3
from leaf.
The Project Gutenberg EBook we use to extract the Shakespeare data has changed. I just updated the relevant pre-processing script to point to a similar version of the file, but the statistics have indeed changed (they will be updated in a new version of the preprint we are working on). Right now, running the same command as @chaoyanghe, I am getting:
####################################
DATASET: shakespeare
1129 users
4226158 samples (total)
3743.28 samples per user (mean)
num_samples (std): 6212.26
num_samples (std/mean): 1.66
num_samples (skewness): 3.35
num_sam num_users
0 705
2000 126
4000 72
6000 56
8000 38
10000 33
12000 31
14000 16
16000 8
18000 11
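For readers comparing their own output against the numbers above, here is a minimal sketch of how summary statistics of this kind can be computed from a list of per-user sample counts. It is not the LEAF stats script: the function name `summarize_counts`, the population (divide-by-n) form of the standard deviation and skewness, and the fixed-width binning are all assumptions made for illustration, so small differences from the real script's output are possible.

```python
import math
from collections import Counter


def summarize_counts(counts, bin_width=2000):
    """Summarize per-user sample counts (hypothetical helper, not part of LEAF)."""
    if not counts:
        raise ValueError("counts must not be empty")

    n = len(counts)
    total = sum(counts)
    mean = total / n

    # Population variance and standard deviation (divide by n, an assumption).
    variance = sum((c - mean) ** 2 for c in counts) / n
    std = math.sqrt(variance)

    # Fisher-Pearson skewness: third central moment divided by std cubed.
    third_moment = sum((c - mean) ** 3 for c in counts) / n
    skewness = third_moment / std ** 3 if std > 0 else 0.0

    # Histogram keyed by the lower edge of each fixed-width bin.
    histogram = Counter((c // bin_width) * bin_width for c in counts)

    return {
        "users": n,
        "total": total,
        "mean": mean,
        "std": std,
        "std_over_mean": std / mean if mean else float("nan"),
        "skewness": skewness,
        "histogram": dict(sorted(histogram.items())),
    }
```

Running this on the per-user counts from two different preprocessing runs and comparing the resulting dictionaries is one way to check whether both runs started from the same source data.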
@scaldas Hi, thanks for your reply. I had been waiting a long time for it.
I also found that my FEMNIST statistics do not match yours:
(venv) (base) chaoyanghe-hostname:femnist chaoyanghe$ sh stats.sh
####################################
DATASET: femnist
3500 users
791913 samples (total)
226.26 samples per user (mean)
num_samples (std): 89.12
num_samples (std/mean): 0.39
num_samples (skewness): 0.77
num_sam num_users
0 1
20 4
40 11
60 5
80 15
100 65
120 122
140 392
160 1237
180 322
200 44
220 52
240 87
260 92
280 116
300 157
320 156
340 181
360 166
380 147
400 87
420 36
440 3
460 1
480 0
Could you also help check the reason? Since I will cite your paper, I need to be able to state that we use the same dataset.
@chaoyanghe I will look into this, but if your work is time-sensitive, consider using the FEMNIST version hosted at TensorFlow Federated (they call it EMNIST). They host their own (slightly different) version and thus don't have the problem of mutating sources (which I believe is the issue here as well).
https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/emnist
We will look into hosting our own version of the datasets in the future.
@scaldas I just tried to generate a fresh FEMNIST dataset and I am only getting 1900 users instead of the previous 3500. Was that dataset changed as well?
@Enehta Unfortunately, at this time we only host preprocessing scripts for data that is hosted elsewhere. If that data mutates, the output of our scripts changes as well. We are actively working on solving this by hosting the datasets ourselves. In the meantime, consider using the FEMNIST version hosted at TensorFlow Federated (they call it EMNIST). They host their own (slightly different) version and thus don't have the problem of mutating sources.
https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/emnist
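Until a stable copy of the data is available, one way to notice a mutated source early is to record a checksum of the downloaded raw file and verify it before preprocessing. The following is only a sketch of that idea and is not part of LEAF: the helper names `sha256_of` and `verify_source` are hypothetical, and the expected digest has to be recorded by you from a download you trust.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_source(path, expected_sha256):
    """Raise ValueError if the file at `path` does not match the recorded digest.

    `expected_sha256` is a digest you saved earlier (an assumption of this
    sketch); the comparison ignores letter case.
    """
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"{path} has changed: expected {expected_sha256}, got {actual}"
        )
    return actual
```

Publishing the digest alongside reported statistics lets others confirm that they preprocessed the same raw file before comparing user and sample counts.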
Interestingly, I found 3500 users and a total of 803267 samples in the FEMNIST dataset.
####################################
DATASET: femnist
3500 users
803267 samples (total)
229.50 samples per user (mean)
num_samples (std): 89.03
num_samples (std/mean): 0.39
num_samples (skewness): 0.71