google-research / mixmatch (License: Apache License 2.0)
When using mixmatch.py to train on a dataset (e.g. the cifar10 README example), the summary generated by eval_stats seems to freeze the program after a few epochs. The problem seems to appear at the start of evaluating the valid subset, during this loop:
for subset in ('train_labeled', 'valid', 'test'):
    images, labels = self.tmp.cache[subset]
    predicted = []
    # classify the cached subset in fixed-size chunks
    for x in range(0, images.shape[0], batch):
        p = self.session.run(
            classify_op,
            feed_dict={
                self.ops.x: images[x:x + batch],
                **(feed_extra or {})
            })
        predicted.append(p)
    predicted = np.concatenate(predicted, axis=0)
    # top-1 accuracy on this subset, in percent
    accuracies.append((predicted.argmax(1) == labels).mean() * 100)
However, training does continue from the correct epoch when I interrupt and run the command again.
(I am using TensorFlow 1.14 on a Titan X GPU.)
Dear Authors
From my understanding, lines 113-130 in create_split.py are indeed writing all the samples that are not in label (the set of samples chosen for the labeled set) to the -unlabel.tfrecord. I am wondering why you separated this part into two pieces: the 'else: ...' branch (lines 113-120) and the 'for remain in class_data' loop (lines 126-130). Did I miss anything?
Thanks a lot!
I found that the scripts in the runs/ directory use only one sample for validation, e.g., here.
The paper also says that the median error rate of the last 20 checkpoints is used when reporting, which means that no validation is required. Is this the reason why the code uses only one validation sample?
Am I correct? If not, how many validation samples are used?
Many thanks for your code!
I get a ValueError when I run:
python pseudo_label.py --train_dir experiments/compare --dataset=cifar10.1@250-1 --wd=0.02 --smoothing=0.01 --consistency_weight=1
Traceback (most recent call last):
  File "pseudo_label.py", line 130, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "pseudo_label.py", line 93, in main
    dataset = data.DATASETS[FLAGS.dataset]()
  File "/mnt/SSD/ssl/mixmatch/libml/data.py", line 171, in create
    train_labeled=fn(train_labeled).map(augment[0], para),
  File "/mnt/SSD/ssl/mixmatch/libml/data.py", line 67, in memoize
    dataset = dataset.prefetch(16)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1777, in prefetch
    return DatasetV1Adapter(super(DatasetV1, self).prefetch(buffer_size))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 721, in prefetch
    return PrefetchDataset(self, buffer_size)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 3503, in __init__
    **flat_structure(self))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_dataset_ops.py", line 4297, in prefetch_dataset
    slack_period=slack_period, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 366, in _apply_op_helper
    g = ops._get_graph_from_inputs(_Flatten(keywords.values()))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 5885, in _get_graph_from_inputs
    _assert_same_graph(original_graph_element, graph_element)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 5821, in _assert_same_graph
    (item, original_item))
ValueError: Tensor("buffer_size:0", shape=(), dtype=int64) must be from the same graph as Tensor("ParallelMapDataset:0", shape=(), dtype=variant).
Hi authors, thanks very much for the amazing work. I wonder what the purpose of the interleave function in mixmatch.py is. I tried removing it, but the performance became very bad. I can't find any description of it in the GitHub repo or the papers.
(Lines 71 to 78 in 9096b68.)
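For context, here is a small NumPy toy of how I understand interleave (my own paraphrase of libml/layers.py, not the exact code): it swaps slices between the labeled batch and the unlabeled batches, so that batches[0] (as far as I can tell, the only batch whose forward pass contributes batch-norm updates) contains a representative mix of labeled and unlabeled rows.

import numpy as np

def interleave(xy, batch):
    # paraphrase: split every batch into nu+1 groups, then swap the i-th
    # group of the first batch with the i-th group of batch i
    nu = len(xy) - 1
    offsets = np.linspace(0, batch, nu + 2).astype(int)
    xy = [[v[offsets[p]:offsets[p + 1]] for p in range(nu + 1)] for v in xy]
    for i in range(1, nu + 1):
        xy[0][i], xy[i][i] = xy[i][i], xy[0][i]
    return [np.concatenate(v, axis=0) for v in xy]

x = np.zeros(6)                        # labeled batch, marked 0
u1, u2 = np.ones(6), 2 * np.ones(6)    # two unlabeled batches, marked 1 and 2
mixed = interleave([x, u1, u2], batch=6)
print(mixed[0])                        # [0. 0. 1. 1. 2. 2.] -- a mix of all three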
Dear authors
Thanks for your great work!
My question is: what is the expected/proper behavior of the consistency loss over the course of training? Is it supposed to go down smoothly like the labeled loss term, or is it supposed to oscillate around a certain level during training?
Thanks!
In the Mean Teacher paper, the teacher model's weights are an EMA of the student model's weights. In your implementation of Mean Teacher, do the student model and the teacher model have exactly the same weights?
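To make the question precise, this is the relation I mean (a plain NumPy sketch, not the repo's code):

import numpy as np

decay = 0.999
student = np.zeros(4)
teacher = student.copy()
rng = np.random.RandomState(0)
for step in range(1000):
    student = student + rng.randn(4) * 0.01            # stand-in for an SGD update
    teacher = decay * teacher + (1 - decay) * student  # EMA tracks the student
print(np.allclose(student, teacher))                   # False: the weights differ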
Hi,
In section 3.5 of your paper:
In all experiments, we linearly ramp up λ_u to its maximum value over the first 16,000 steps of training as is common practice [44].
But the implementation seems to ramp up lambda_u over 1024 epochs (1024 * 1024 steps).
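To be concrete, here is my paraphrase of the two schedules (hypothetical helper functions, not the repo's code):

def lambda_u_paper(step, lambda_max=75.0):
    return lambda_max * min(1.0, step / 16000.0)           # Sec. 3.5: 16,000 steps

def lambda_u_code(step, lambda_max=75.0):
    return lambda_max * min(1.0, step / (1024 * 1024.0))   # my reading: 1024 * 1024 steps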
I have 4 GPUs, and I noticed that libml/utils.py has some functions (para_list, para_mean, etc.) for parallel training, but they seem not to be used. I'm not familiar with multi-GPU training; how can I use these functions, and should I modify mixmatch.py?
UDA (https://github.com/google-research/uda) can achieve good accuracy with only 20 training examples on text classification, but I find it hard to reproduce that result on my own dataset.
So I want to know why UDA and MixMatch work, and what the most important factors are for reproducing the results.
@david-berthelot Thank you very much.
Dear Authors
I'm wondering whether there is an easy way to save the train accuracies and test accuracies (each a list of accuracies) to disk, maybe as a pkl or npy file, using your codebase. I saw that the train and test accuracy of each checkpoint are calculated in the eval_stats function, and that eval_stats is called by the add_summaries function. I know we can use TensorBoard to monitor training progress, but I'm not sure how to save those accuracies to disk (I'm also not sure which function calls add_summaries, and when).
Sorry, I'm relatively new to TensorFlow. I would really appreciate it if you could point me in the right direction.
Many thanks!
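In case it clarifies what I am after, this is the kind of thing I have tried; tf.train.summary_iterator is a real TF1 API, but the event-file path and the tag matching below are my guesses:

import glob
import numpy as np
import tensorflow as tf

accuracies = {'train_labeled': [], 'valid': [], 'test': []}
for event_file in sorted(glob.glob('experiments/example/tf/events.out.tfevents.*')):
    for event in tf.train.summary_iterator(event_file):
        for value in event.summary.value:
            for subset in accuracies:
                if value.tag.endswith(subset):            # guessed tag naming
                    accuracies[subset].append(value.simple_value)
np.save('accuracies.npy', accuracies)                     # the dict is pickled by numpy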
Hi,
I am going through this codebase carefully, and I find that the wide resnet you used is slightly different from the model proposed in the paper. In your model, the last two stages do not include the bn-relu in the residual blocks as in the paper (which keeps the shortcut clean). What is the reason for using this structure? Does it help boost the semi-supervised performance?
Could you please tell me the exact architecture for CIFAR-100? The paper states that the width of the wide resnet was increased. What is the increased width of the new architecture? For the 28x2 architecture, the number of filters goes from 16 to 32 to 64 to 128. What do you mean by "it has 135 filters per layer"? Does it mean every convolution layer has 135 filters?
Thanks a lot!
Hi
I'm just wondering why your wide resnet backbone doesn't use dropout. Didn't the original wide resnet paper show that dropout can be useful?
Thanks
I recently read the amazing follow-up paper ReMixMatch. The arXiv version points to an empty GitHub repo. I am wondering when the code will be made available. Is it possible that we can access it soon, so that I can use it for my ICML submission?
Hi, I read your paper thoroughly and think it is interesting.
Could you tell me why you chose to use the Beta distribution?
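For reference, this is the usage I am asking about, as I understand it from the paper (a NumPy sketch; the repo itself samples via tf.distributions.Beta):

import numpy as np

alpha = 0.75                         # the paper's beta hyperparameter for CIFAR-10
lam = np.random.beta(alpha, alpha)   # symmetric Beta draw on [0, 1]
lam = max(lam, 1.0 - lam)            # MixMatch's tweak: stay closer to the first input
x1 = np.random.randn(32, 32, 3)      # e.g. a labeled image
x2 = np.random.randn(32, 32, 3)      # e.g. an unlabeled image
mixed = lam * x1 + (1.0 - lam) * x2  # MixUp of inputs (labels are mixed the same way)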
Hello, I was very inspired by reading your paper, but when I run the code, I find that some files are missing. Could you send me the complete code? Appreciate it!
Does the "-map.json" file have any purpose in the training process at all? I see that it stores the ids of images that are used as unlabeled data, but cannot find any usage of it anywhere in the repository.
Can you also explain why you read from the "-label.tfrecord" and "-unlabel.tfrecord" twice (for train_labeled, train_unlabeled, eval_labeled, eval_unlabeled respectively) in libml/data.py? Is this because you fetch a sample of images for training and then later fetch the same set of samples to compute evaluation metrics?
Hi,
Just to make sure: given that you achieved state-of-the-art performance on STL-10, I want to confirm whether or not you used the unlabeled data in your current best result with all 5000 labels.
Thanks a lot!
Hi, I do not know what the "post_ops" in the following code is for. Could you help explain it?
batches = layers.interleave([x] + y, batch)
skip_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # update ops that already existed
logits = [classifier(batches[0], training=True)]       # registers new batch-norm update ops
post_ops = [v for v in tf.get_collection(tf.GraphKeys.UPDATE_OPS) if v not in skip_ops]  # keep only the newly registered ops
When training on our own datasets, how should the train_kimg, report_kimg, and save_kimg flags change from the default values? Is train_kimg meant to represent the total number of images in our dataset, both labeled and unlabeled?
The help message for train_kimg says 'Training duration in kibi-samples', but I don't quite understand what kibi-samples are.
Thank you for your help!
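If I understand the prefix correctly, 'kibi' just means 2^10 = 1024, so the flag counts blocks of 1024 training samples. A quick sanity check of my reading:

train_kimg = 1024               # e.g.: 1024 kibi-samples
samples_seen = train_kimg << 10 # 1,048,576 samples processed over the whole run
batch = 64
steps = samples_seen // batch   # 16,384 optimizer steps at batch size 64
print(samples_seen, steps)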
Hi,
Thanks for the excellent paper and for generously sharing this code base. I am trying to learn some details from this repo; please allow me to ask a few questions.
Would you please explain why you used an MSE loss rather than the usual cross-entropy (or a KL-divergence loss) to train on the unlabeled data? What is the philosophy behind this choice?
By the way, I noticed other issues discussing interleaving the labeled and unlabeled batches. Do you think it would work to first concatenate the labeled and unlabeled batches into one single big batch and then run the forward pass on that big batch? That way, we could compute the labeled loss on the first batch-size samples and the unlabeled loss on the remaining samples.
What I mean is roughly like this:
inten = concat([x, u1, u2])                            # one big forward pass
logits = model(inten)
loss_x = cross_entropy(logits[:batchsize], label_x)    # labeled part
loss_u = mse(logits[batchsize:], label_u)              # unlabeled part
loss = loss_x + lam * loss_u
...
Would you please tell me why you used the interleave method rather than the method above?
Hi,
Thank you for the amazing paper and for sharing the code! I wondered whether you tried the MixMatch method on other models (e.g. DNN121) and other datasets. I implemented this method on CIFAR-10 using DNN121, but the result is much worse than with WideResNet. I also implemented it on medical images, but the result is worse than the Pi model and Mean Teacher.
I'm sure this is probably due to different hyperparameters. I tried different learning rates and lambda_u values, but they didn't improve the result much. If possible, could you provide some tips on how to tune the parameters for different models?
Thank you for your help in advance!
train_op = tf.train.AdamOptimizer(lr).minimize(loss_xe + w_match * loss_l2u, colocate_gradients_with_ops=True)
with tf.control_dependencies([train_op]):
    train_op = tf.group(*post_ops)  # post_ops can only run after the Adam step
When you define train_op like this, won't the optimizer step be ignored and only the post_ops be run during training? How does this work?
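To check my understanding, I wrote this small TF1 toy of the same pattern; running the grouped op executes both updates, with the first one forced to run first:

import tensorflow as tf

v = tf.Variable(0.0)
inc = tf.assign_add(v, 1.0)                  # stands in for the optimizer step
with tf.control_dependencies([inc]):
    both = tf.group(tf.assign_add(v, 10.0))  # stands in for the post_ops group
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(both)
    print(sess.run(v))                       # 11.0: both ran, and inc ran first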
Dear authors
You mentioned that model performance is evaluated using an EMA of the model parameters during training. So if we save a checkpoint during training and, after training is done, load that checkpoint and evaluate the model, wouldn't that be the performance of just one single checkpoint rather than of the EMA of the model parameters?
How can we deploy the model in a real-world setting, i.e. save a model checkpoint and then load the saved checkpoint to do inference on unseen data?
Thank you!
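For the deployment part, this is the kind of loading I have in mind, assuming the checkpoint contains the EMA shadow variables created by tf.train.ExponentialMovingAverage (the dense layer and the checkpoint path are placeholders, not the repo's model):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4])
logits = tf.layers.dense(x, 2)                       # stand-in for the real network
ema = tf.train.ExponentialMovingAverage(0.999)
ema_op = ema.apply(tf.trainable_variables())         # training would run this op
saver = tf.train.Saver(ema.variables_to_restore())   # maps EMA values onto the weights
with tf.Session() as sess:
    saver.restore(sess, 'experiments/example/model.ckpt-1024')  # placeholder path
    # sess.run(logits, ...) would now use the EMA weights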
Hello, can I use this for multi-label classification? If so, what should I pay attention to during label guessing? For multi-label classification, a sigmoid output with a sigmoid cross-entropy loss is generally used. In this case, can your loss function be changed accordingly?
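To make the question concrete, this is the substitution I am thinking of (a TF1 sketch of my own, not the repo's code):

import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 10])
labels = tf.placeholder(tf.float32, [None, 10])    # multi-hot targets
loss_x = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
probs = tf.nn.sigmoid(logits)                      # independent per-class probabilities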
ModuleNotFoundError: No module named 'libml'
Hi, in mixmatch, inappropriate dependency versioning constraints can introduce risks.
Below are the dependencies and version constraints that the project is using:
absl-py
easydict
cython
numpy
tensorflow-gpu==1.14.0
tqdm
The version constraint == introduces a risk of dependency conflicts because the dependency scope is too strict.
Version constraints with no upper bound (or *) introduce a risk of missing-API errors, because the latest version of a dependency may remove some APIs.
After further analysis, in this project the version constraint of the dependency tqdm can be changed to >=4.42.0,<=4.64.0.
This modification reduces dependency conflicts as much as possible while allowing versions as recent as possible without causing API errors in the project.
The tqdm methods invoked by the current project are tqdm.tqdm, tqdm.trange, and tqdm.trange.write (the original report also enumerated every other library call in the project).
@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.
Dear Authors
Thanks for your inspiring work. I would like to try MixMatch or FixMatch on higher-resolution images. I think your current implementation is already compatible with images of any resolution, as long as they are RGB images. Is that correct? Do you have any other comments or suggestions on adapting MixMatch or FixMatch to higher-resolution images?
Thanks a lot!
Hi, first of all thank you for this great work, and for providing the implementation.
I really liked the plots in the paper, and I was wondering what package / settings were used to make them.
Please feel free to close this issue if this is not the right place to ask such questions.
Thanks.
In the experiments mentioned in the paper, MixMatch is trained with a varying number of labelled examples (250 to 4000), and we see that the error rate is very close to that of a fully supervised model trained on the complete dataset (50,000 labelled examples).
However, there is no mention of the error rates of fully supervised models trained on fewer labelled examples, i.e. a comparison, for example, between the error rate of MixMatch (trained with 250 labelled examples and all other examples as unlabelled) and that of a fully supervised model using only those 250 labelled examples.
This comparison would help determine whether the unlabelled data is actually adding any information.
From a practical standpoint, if a fully supervised model trained on only 250 labelled examples achieved an error rate almost equal to the one achieved by MixMatch, we could simply use the fully supervised model.
I would highly appreciate it if such a comparison, if done, could be made available.
Thanks!
I'm currently trying MixMatch on my tabular data, but I don't know what would be a good augmentation method. Could you please give me some advice? Thank you!
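One thing I have been considering myself (my own idea, not something from the repo) is small Gaussian jitter on standardized features:

import numpy as np

def augment_row(row, sigma=0.1, rng=np.random):
    # assumes features were standardized to zero mean / unit variance
    return row + rng.randn(*row.shape) * sigma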
Does this implementation use pre-trained weights from any dataset to achieve the results shown in the paper?
I know that Oliver et al., 2018 ("Realistic Evaluation of Deep Semi-Supervised Learning Algorithms"), which is cited by your paper, claim that transfer learning or using pre-trained weights may actually lead to better results than semi-supervised learning.
I'm curious whether this implementation uses transfer learning, or has a mechanism for doing so. If it did, it would let us see whether combining MixMatch with transfer learning brings even more performance improvements, and how well transfer learning alone performs.
Dear Authors,
Thanks for your very inspiring work.
I saw that in the implementation details section of your paper "MixMatch: A Holistic Approach to Semi-Supervised Learning", you said that "we checkpoint every 2^16 training samples and report the median error rate of the last 20 checkpoints. This simplifies the analysis at a potential cost to accuracy...". Since you already mention that reporting the median may come at a cost to accuracy, I am wondering whether you have an estimate of how large that cost is, and whether you have any suggestions or rules of thumb for choosing the total number of training steps. With a poorly chosen total number of training steps, the last 20 checkpoints could fall in a region where the model is already overfitting and the performance on the test set is much worse than it could be.
Thanks!
# Download datasets
CUDA_VISIBLE_DEVICES= ./scripts/create_datasets.py
I used this command and got the following output:
./scripts/create_datasets.py: line 16: Script to download all datasets and create .tfrecord files.
: command not found
./scripts/create_datasets.py: line 18: import: command not found
./scripts/create_datasets.py: line 19: import: command not found
./scripts/create_datasets.py: line 20: import: command not found
./scripts/create_datasets.py: line 21: import: command not found
./scripts/create_datasets.py: line 22: import: command not found
./scripts/create_datasets.py: line 23: import: command not found
from: can't read /var/mail/urllib
from: can't read /var/mail/easydict
from: can't read /var/mail/libml.data
./scripts/create_datasets.py: line 28: import: command not found
./scripts/create_datasets.py: line 29: import: command not found
./scripts/create_datasets.py: line 30: import: command not found
from: can't read /var/mail/tqdm
./scripts/create_datasets.py: line 33: URLS: command not found
./scripts/create_datasets.py: line 34: svhn:: command not found
./scripts/create_datasets.py: line 35: cifar10:: command not found
./scripts/create_datasets.py: line 36: cifar100:: command not found
./scripts/create_datasets.py: line 37: stl10:: command not found
./scripts/create_datasets.py: line 38: syntax error near unexpected token `}'
./scripts/create_datasets.py: line 38: `}'
Should there be something after the "=", instead of a space followed by the Python script?
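Judging from the "import: command not found" lines, the script is being interpreted by the shell rather than by Python, so naming the interpreter explicitly might avoid the problem (assuming python3 is on the PATH and the repo root is the working directory):

CUDA_VISIBLE_DEVICES= python3 ./scripts/create_datasets.py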
Do you have plans to make this repository compatible with a custom dataset, and if not, which files would need to be modified to do so?
Additionally, were the tfrecords generated using the standard generation scripts present in the tensorflow library (e.g. generate_cifar10_tfrecords.py from tensorflow/models/blob/master/tutorials/image/cifar10_estimator/)?
Generally speaking, I noticed that the mechanisms for loading the dataset (libml/data_pair.py, libml/data.py), augmenting the dataset, and performing evaluations would have to be changed to accommodate a different dataset.
According to my understanding:
Label guessing followed by sharpening can produce erroneous predictions at the beginning, which means the unlabelled loss term is not useful during the first few iterations; but the labelled loss term still leads to meaningful updates of the model weights, which eventually produce better guessed labels, so the unlabelled loss term also becomes useful after a few iterations.
Can we first train using only the labelled loss term, and then introduce the unlabelled loss term and train again? (A sketch of the schedule I have in mind is below.)
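A sketch of the schedule I am describing (my own pseudocode; the paper's linear ramp-up of lambda_u is arguably a smoother version of the same idea):

def unlabeled_weight(step, warmup_steps=16000, w_max=75.0):
    if step < warmup_steps:
        return 0.0   # phase 1: labelled loss only, while guessed labels are noisy
    return w_max     # phase 2: unlabelled loss switched on
# total_loss = loss_labelled + unlabeled_weight(step) * loss_unlabelled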
Hello,
I'm trying to run your project on Google Colab.
I ran your example line after the setup :
CUDA_VISIBLE_DEVICES=0 python mixmatch.py --filters=32 --dataset=cifar10.3@250-5000 --w_match=75 --beta=0.75
I can see on stdout that there are 1024 epochs, each taking about 5 minutes, at a speed of about 180 images/sec. That leads to a total experiment time of about 85 hours.
Is it normal for the experiment to take that long on a Colab (K80) GPU?