

yanfangli1986 commented on July 29, 2024

After running the following command, I got the results below. What could be wrong here?
./preprocess.sh -s iid --sf 1.0 -k 0 -t sample -tf 0.8
DATASET: shakespeare
424 users
1992135 samples (total)
4698.43 samples per user (mean)
num_samples (std): 10122.73
num_samples (std/mean): 2.15
num_samples (skewness): 6.69

num_sam num_users
0 250
2000 66
4000 18
6000 16
8000 18
10000 13
12000 11
14000 2
16000 5
18000 3

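Each line of this summary can be derived from the list of per-user sample counts. The sketch below is a minimal Python illustration for comparing such numbers, not LEAF's own stats script: the function name `summarize` and its `bin_width` parameter are hypothetical, and it assumes a population standard deviation and the biased (Fisher-Pearson) skewness estimator, which may differ from what LEAF computes. The bin widths in this thread appear to be 2000 samples for Shakespeare and 20 for FEMNIST. Note also that the printed histogram seems to stop at a cutoff (the Shakespeare bins above add up to 402 of the 424 users), whereas this sketch keeps every bin.

```python
import math
from collections import Counter


def summarize(num_samples, bin_width):
    """Summary statistics over per-user sample counts (hypothetical helper).

    Assumes a non-empty list with a positive mean. Uses the population
    standard deviation and the biased skewness estimator; LEAF's stats
    script may compute these differently.
    """
    n = len(num_samples)
    total = sum(num_samples)
    mean = total / n
    # Population variance: divide by n, not n - 1.
    var = sum((x - mean) ** 2 for x in num_samples) / n
    std = math.sqrt(var)
    # Biased skewness: third central moment divided by std cubed.
    if std > 0:
        skew = (sum((x - mean) ** 3 for x in num_samples) / n) / std ** 3
    else:
        skew = 0.0
    # Histogram keyed by the lower edge of each fixed-width bin.
    hist = Counter((x // bin_width) * bin_width for x in num_samples)
    return {
        "users": n,
        "total": total,
        "mean": mean,
        "std": std,
        "std/mean": std / mean,
        "skewness": skew,
        "histogram": dict(sorted(hist.items())),
    }
```

For example, `summarize([100, 2500, 2600, 9000], bin_width=2000)` reports 4 users, 14200 samples, a mean of 3550, a positive skewness (one user with many samples stretches the right tail) and the histogram `{0: 1, 2000: 2, 8000: 1}`. The mean is simply total divided by users, which is why 1992135 / 424 gives the 4698.43 shown above.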
from leaf.

scaldas commented on July 29, 2024

The Project Gutenberg EBook we use to extract the Shakespeare data has changed. I just updated the relevant pre-processing script to point to a similar version of the file, but the statistics have indeed changed (they will be updated in a new version of the preprint we are working on). Right now, running the same command as @chaoyanghe, I am getting:

####################################
DATASET: shakespeare
1129 users
4226158 samples (total)
3743.28 samples per user (mean)
num_samples (std): 6212.26
num_samples (std/mean): 1.66
num_samples (skewness): 3.35

num_sam num_users
0 705
2000 126
4000 72
6000 56
8000 38
10000 33
12000 31
14000 16
16000 8
18000 11

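To tell whether a fresh download differs from an earlier one, the headline numbers of two runs can be compared mechanically. This is a minimal Python sketch assuming the output format shown in this thread (lines such as "424 users" and "num_samples (std): 10122.73"); `parse_stats` and `diff_stats` are hypothetical helpers, not part of LEAF, and other output formats are not handled.

```python
import re

# Regexes for the headline lines, based on the output format seen in this thread.
PATTERNS = {
    "users": r"^(\d+) users\s*$",
    "total": r"^(\d+) samples \(total\)\s*$",
    "mean": r"^(-?[\d.]+) samples per user \(mean\)\s*$",
    "std": r"^num_samples \(std\): (-?[\d.]+)\s*$",
    "std/mean": r"^num_samples \(std/mean\): (-?[\d.]+)\s*$",
    "skewness": r"^num_samples \(skewness\): (-?[\d.]+)\s*$",
}


def parse_stats(text):
    """Extract the headline numbers from a stats summary as floats."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            stats[key] = float(match.group(1))
    return stats


def diff_stats(old, new):
    """Return {key: (old_value, new_value)} for every number that changed."""
    return {
        key: (old.get(key), new.get(key))
        for key in old.keys() | new.keys()
        if old.get(key) != new.get(key)
    }
```

Fed the two Shakespeare summaries in this thread, `diff_stats` reports every headline value as changed, for instance `users` going from 424.0 to 1129.0, which is consistent with the upstream source file having changed rather than with a random fluctuation.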

chaoyanghe commented on July 29, 2024

@scaldas Hi, thanks for your reply. I have been waiting for it for a long time...

I also found that my FEMNIST statistics do not match yours:
(venv) (base) chaoyanghe-hostname:femnist chaoyanghe$ sh stats.sh
####################################
DATASET: femnist
3500 users
791913 samples (total)
226.26 samples per user (mean)
num_samples (std): 89.12
num_samples (std/mean): 0.39
num_samples (skewness): 0.77

num_sam num_users
0 1
20 4
40 11
60 5
80 15
100 65
120 122
140 392
160 1237
180 322
200 44
220 52
240 87
260 92
280 116
300 157
320 156
340 181
360 166
380 147
400 87
420 36
440 3
460 1
480 0

Could you also help check the reason? Since I will cite your paper, I need to be able to state that we use the same dataset.


scaldas commented on July 29, 2024

@chaoyanghe I will look into this, but if your work is time-sensitive, consider using the FEMNIST version hosted at TensorFlow Federated (they call it EMNIST). They host their own (slightly different) version and thus don't have the problem of mutating sources (which I believe is the issue here as well).

https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/emnist

We will look into hosting our own version of the datasets in the future.
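Until the datasets are hosted in a fixed form, one way to at least detect a mutated upstream source is to record a checksum of the raw download the first time the pipeline runs and verify it on later runs. The sketch below is a hedged Python illustration of that idea, not something LEAF provides: `sha256_of` and `check_pinned` are hypothetical helpers, and the expected hash is whatever you record from your own first download (no official hash is published in this thread).

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large raw downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_pinned(path, expected_sha256):
    """Raise ValueError if the raw file differs from the version you pinned."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(
            f"{path}: expected sha256 {expected_sha256}, got {actual}; "
            "the upstream source may have changed"
        )
```

A mismatch does not tell you what changed, only that the statistics from an earlier run are no longer guaranteed to be reproducible from the new download.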


Enehta commented on July 29, 2024

@scaldas I just tried to get a fresh FEMNIST dataset and I am only getting 1900 users instead of the 3500 I got before. Was that dataset changed as well?


scaldas commented on July 29, 2024

@Enehta Unfortunately, at the moment we only host preprocessing scripts for data that is hosted elsewhere. If that data changes, the output of our scripts changes too. We are actively working on solving this by hosting the datasets ourselves. In the meantime, consider using the FEMNIST version hosted at TensorFlow Federated (they call it EMNIST). They host their own (slightly different) version and thus don't have the problem of mutating sources.

https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/emnist


future-xy commented on July 29, 2024

Interestingly, I found that the FEMNIST dataset has 3500 users and a total of 803267 samples.
####################################
DATASET: femnist
3500 users
803267 samples (total)
229.50 samples per user (mean)
num_samples (std): 89.03
num_samples (std/mean): 0.39
num_samples (skewness): 0.71
