
fastai / fastai_dev


fast.ai early development experiments

License: Apache License 2.0

Jupyter Notebook 96.93% Python 1.42% Makefile 0.01% Smarty 0.01% C++ 0.09% Cuda 0.02% Swift 1.51% C 0.02% CMake 0.01% Shell 0.01%

fastai_dev's Introduction

fastai_dev

This repo is used for fastai development. If you're looking for version 2 of the fastai library, go here.

fastai_dev's People

Contributors

algal, amaarora, artste, bearpelican, bgogul, brettkoonce, dan-zheng, dependabot[bot], fredmonroe, gnperdue, insop, jekbradbury, jph00, kdorichev, lattner, madaan, marcrasi, misza222, mshseek, muellerzr, pcuenca, pschuh, radekosmulski, rxwei, saadorj, saeta, sgugger, stas00, vvmnnnkv, yang-zhang


fastai_dev's Issues

Callbacks - "begin_epoch" always called 2x...

Create a callback and put a print statement in "begin_epoch"; you'll see it is always called twice.
This breaks any attempt at tracking progress as a percentage of the overall run (epochs × batches per epoch).
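A minimal way to reproduce (a sketch; assumes a working learn from any of the tutorial notebooks, and the Callback class as defined in dev/local/learner.py):

class CountBeginEpoch(Callback):
    def begin_fit(self):   self.calls = 0
    def begin_epoch(self): self.calls += 1; print('begin_epoch call', self.calls)

learn.add_cb(CountBeginEpoch())
learn.fit(1)  # prints 'begin_epoch call' twice for a single epoch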

Probably wrong info in a sentence - Beginner's Tutorial

Hi

Here in labels.csv, it says that we have to add 'train' as a prefix to the filenames, but I can already see that the file names in the CSV have the train prefix. I am a beginner newly reading these materials, so I'm not sure if I am missing something; hence this issue.

Issue: [screenshot of the tutorial sentence omitted]

My analysis:

import os

def print_N_lines(f, N):
    "Print the first N lines of file f"
    c = 0
    with open(f) as infile:
        for line in infile:
            print(line.replace('\n',''))
            c += 1
            if c == N:
                break

print_N_lines(os.path.join(planet, 'labels.csv'), 5)  # sneak peek at the labels data; `planet` is the dataset directory

Output:

image_name,tags
train_31112,clear primary
train_4300,partly_cloudy primary water
train_39539,clear primary water
train_12498,agriculture clear primary road

Bug in nb_002c and nan loss in 006b_pascal training

Hi,
I was going through the pascal notebook and got the error below while trying to execute the show method.

~/my-work/fastai_dev/dev_nb/nb_002c.py in _perspective_warp(c, targ_pts)
     37 def _perspective_warp(c:FlowField, targ_pts:Points):
     38     "Apply warp to `targ_pts` from `_orig_pts` to `c` `FlowField`"
---> 39     return apply_perspective(c, find_coeffs(_orig_pts, targ_pts))
     40 
     41 @TfmCoord

~/my-work/fastai_dev/dev_nb/nb_002c.py in find_coeffs(orig_pts, targ_pts)
     20     B = FloatTensor(orig_pts).view(8)
     21     #The 8 scalars we seek are solution of AX = B
---> 22     return torch.gesv(B,A)[0][:,0]
     23 
     24 def apply_perspective(coords:FlowField, coeffs:Points)->FlowField:

AttributeError: module 'torch' has no attribute 'gesv'

I quickly went through the PyTorch code, where they state that .gesv was removed in version 1.2.0 and advise using .solve instead.
Please change torch.gesv(B,A)[0][:,0] to torch.solve(B.reshape(-1,1),A)[0][:,0], as I don't know how to make a PR to fastai. :grimacing: Sorry.
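For reference, here are the relevant lines of find_coeffs with the suggested change applied (a sketch for PyTorch >= 1.2; torch.solve expects B as a column matrix):

B = FloatTensor(orig_pts).view(8)
# The 8 scalars we seek are the solution of AX = B.
# torch.gesv was removed in PyTorch 1.2; torch.solve takes B as a column:
return torch.solve(B.reshape(-1,1), A)[0][:,0]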

The second thing is that I am getting a nan loss in learner.lr_find() as well as while training, and I don't know the reason. I will try to debug it further; any help on how to do that would be appreciated.

part2v3: CPU-only still supported or GPU is now mandatory?

Would you be open to a PR allowing CPU-only users to run the code with no changes, or is that out of scope and a GPU is a prerequisite?

/content/exp/nb_08.py in <module>()
    205 _m = tensor([0.47, 0.48, 0.45])
    206 _s = tensor([0.29, 0.28, 0.30])
--> 207 norm_imagenette = partial(normalize_chan, mean=_m.cuda(), std=_s.cuda())
    208 
    209 import math

/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py in _lazy_init()
    160             "Cannot re-initialize CUDA in forked subprocess. " + msg)
    161     _check_driver()
--> 162     torch._C._cuda_init()
    163     _cudart = _load_cudart()
    164     _cudart.cudaGetErrorName.restype = ctypes.c_char_p

RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:51
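A device-agnostic variant of the offending line might look like this (a sketch; normalize_chan, _m and _s are from the nb_08 context shown above):

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
norm_imagenette = partial(normalize_chan, mean=_m.to(device), std=_s.to(device))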

learn.freeze_to and learn.fit_one_cycle accept params that presuppose a model state that is not correct

The core of the issue is 1) the lack of a mechanism to easily gauge the state of the model (number of param groups, which param groups are unfrozen), and 2) methods accepting params without complaint that are in some way incompatible with the model.

I don't know what the correct fix is here; this is more about sharing ideas and observations. Maybe 1) is not needed at all, but there were times even with fastai 1.x where I would have appreciated a mechanism that would print out whether param groups were frozen or not. But I think this is at best a nice-to-have: once everything works as intended and blows up when there are issues, I don't think it will add that much value.

2) is a bigger issue. Right now, even if I have just a single param group, I can call learn.freeze_to(-2). If I don't realize that the cutting of the model didn't go as planned (as is the case right now, where the cutting doesn't seem to work), I will never be informed of the problem (I can still most likely infer it from the training time, etc., but that requires deeper understanding and paying attention).

The same goes for learn.fit. Right now I can call learn.fit([1, 1, ...]) with an arbitrarily long list of lrs and the method will not complain, regardless of how many param groups there are.

Calling learn.freeze_to with an argument that is incompatible with the model should probably raise. With learn.fit and the lrs I am not sure how to handle this; the two options that come to mind are accepting a single lr, and with multiple lrs raising either when len(lrs) != len(param_groups) or when len(lrs) != len(trainable_param_groups).
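A sketch of the kind of guard being proposed (names hypothetical, not existing fastai API):

def check_lrs(lrs, param_groups):
    "Raise if lrs can't be matched one-to-one (or broadcast) to the param groups."
    if len(lrs) not in (1, len(param_groups)):
        raise ValueError(f"got {len(lrs)} lrs for {len(param_groups)} param groups")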

defaults.device assignment crashes on machine without CUDA

"dev/local/core.py" line 297 derived from "dev/01_core.ipynb":
defaults.device = torch.cuda.current_device() if torch.cuda.is_available else torch.device('cpu')

torch.cuda.is_available is missing parentheses, i.e., torch.cuda.is_available().
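The corrected line would be:

defaults.device = torch.cuda.current_device() if torch.cuda.is_available() else torch.device('cpu')

(Note that torch.cuda.current_device() returns an int index; wrapping it in torch.device may be preferable, but the missing parentheses are the actual bug.)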

This leads to an error when importing from local.core on a machine without CUDA:
from local.core import *

Error message:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-1-72bc20042ecf> in <module>
      2 from local.imports import *
      3 from local.test import *
----> 4 from local.core import *
      5 from local.notebook.showdoc import show_doc

~/Downloads/fastai_docs/dev/local/core.py in <module>
    295     return apply(lambda x: x.float() if x.dtype not in [torch.int64, torch.int32, torch.int16] else x, b)
    296 
--> 297 defaults.device = torch.cuda.current_device() if torch.cuda.is_available else torch.device('cpu')
    298 
    299 def to_device(b, device=defaults.device):

~/anaconda3/envs/fastai-cpu-v2-test/lib/python3.7/site-packages/torch/cuda/__init__.py in current_device()
    339 def current_device():
    340     r"""Returns the index of a currently selected device."""
--> 341     _lazy_init()
    342     return torch._C._cuda_getDevice()
    343 

~/anaconda3/envs/fastai-cpu-v2-test/lib/python3.7/site-packages/torch/cuda/__init__.py in _lazy_init()
    159         raise RuntimeError(
    160             "Cannot re-initialize CUDA in forked subprocess. " + msg)
--> 161     _check_driver()
    162     torch._C._cuda_init()
    163     _cudart = _load_cudart()

~/anaconda3/envs/fastai-cpu-v2-test/lib/python3.7/site-packages/torch/cuda/__init__.py in _check_driver()
     73 def _check_driver():
     74     if not hasattr(torch._C, '_cuda_isDriverSufficient'):
---> 75         raise AssertionError("Torch not compiled with CUDA enabled")
     76     if not torch._C._cuda_isDriverSufficient():
     77         if torch._C._cuda_getDriverVersion() == 0:

AssertionError: Torch not compiled with CUDA enabled

If an issue for such a small thing is not desired, please tell me.

Kind regards
Michael

Can't setup batch transforms from TfmdDL

Transform.setup() is called before the TfmdDL super constructor here.
As a result, the transform won't have access to the data. See for example:

class foo(Transform):
    def setup(self, items):
        n = len(items)  # `items` has no data here: setup runs before the data is attached

test_ds = DataSource(L(range(10)))
test_dl = TfmdDL(test_ds, bs=2, after_batch=[foo])
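The ordering problem can be illustrated in plain Python, independent of the fastai classes (a minimal sketch, not the actual implementation):

class Base:
    def __init__(self, data):
        self.setup()        # hook runs first...
        self.data = data    # ...the data is only attached afterwards

class Child(Base):
    def setup(self):
        # mirrors how the transform's setup() can't see the items yet
        print(hasattr(self, 'data'))  # prints False

Child([1, 2, 3])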

issues with get_preds

  1. Calling get_preds on a learner that hasn't done any training causes problems with non-existent (uninitialized) variables.

So, for example:

channels = 1
num_cat = 7
opt_func = partial(Adam, lr=3e-3, wd=0.01)
model = xresnet.xresnet18(c_in=channels, c_out=num_cat)
cb_funcs = [partial(MixedPrecision, clip=0.1)]
metrics = [accuracy_multi]
test_learn = Learner(test_dbunch, model, opt_func=opt_func, loss_func=BCEWithLogitsLossFlat(),
                     cb_funcs=cb_funcs, metrics=metrics)

test_learn = test_learn.load('test')

preds, targets = test_learn.get_preds(ds_idx=0)

will throw exceptions about missing variables on the learner unless I do something like this instead:

test_learn.n_epoch = 1
test_learn.epoch = 0
test_learn.smooth_loss = 0.
preds, targets = test_learn.get_preds(ds_idx=0)

The first two exceptions are tied to the progress callback, I believe, because in training we go through _do_begin_fit and with get_preds we don't.

  2. Calling get_preds with a dataset that isn't evenly divisible by the batch size throws an exception on the last partial batch.

Wikitext-2 integration test failing due to LMDataLoader setting unsettable TfmdDL `items` property

LMDataLoader expects to be able to set items; however, items is defined in TfmdDL as a property with no setter, which raises 'AttributeError: can't be called' when creating a text databunch (e.g. in the wikitext-2 integration test).

Error thrown in 35_tutorial_wikitext.ipynb:

dbch = dsrc.databunch(bs=bs, seq_len=sl, after_batch=Cuda)

31_text_data.ipynb:

class LMDataLoader(TfmdDL):
    def __init__(self, dataset, lens=None, cache=2, bs=64, seq_len=72, num_workers=0, **kwargs):
        super().__init__(dataset=dataset, bs=bs, num_workers=num_workers, **kwargs)
        self.items = ReindexCollection([(o[0] if isinstance(o, tuple) else o)
                                        for o in dataset], cache=cache)

05_data_core.ipynb:

class TfmdDL(DataLoader):
    @property
    def items(self): return self.tls[0].items

Removing the items property from TfmdDL and re-exporting the notebook fixed the immediate issue, but I'm too new to this library to determine whether this is the right fix. I'd be happy to submit a PR if it is.

Not sure whether this is relevant, but I also noticed that the items property references a tls attribute, which doesn't seem to exist on a DataLoader but does exist on a DataSource; perhaps something got mixed up here?
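One possible fix (an untested sketch, not necessarily the right one) is to keep the tls-backed default but allow subclasses to override the collection:

class TfmdDL(DataLoader):
    @property
    def items(self):
        return self._items if hasattr(self, '_items') else self.tls[0].items
    @items.setter
    def items(self, v): self._items = v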

Would be nice to have imports at the top of every page/example

Thanks for the detailed docs!

While it's mentioned on the intro page that one typically imports * from fastai and from one of the applications, e.g. fastai.vision, it is easy to miss, especially if you've arrived at the docs for a particular section by following a link.

It would be nice to have a couple of imports before every example, so one can easily follow along. I could create a PR, but I wanted to bounce the idea around first, hence the issue.
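For instance, each vision example could be preceded by something like:

from fastai import *
from fastai.vision import *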

build separate package for the export of notebooks

I'm working on a project at my company and I'd like to use the same system of notebook exports that you use for fastai_dev (the showcase you gave in the walk-through was "software pretty").
I could do it just for my project, or take steps to make it modular and perhaps add features.

Would you be interested in the second option? If so, could we work out the specs for a Minimum Viable Package?

Comments linking to inaccessible forum pages

There are parts of the notebooks that link to forum pages that are inaccessible to students, such as this comment in notebook 4 (callbacks) in Swift:

/// A wrapper class to hold the loss function, to work around
/// https://forums.fast.ai/t/fix-ad-crash-in-learner/42970.

This makes it very difficult to determine why "weird" decisions were made, which is fairly important when trying to understand how the underlying code works.

Sadly I cannot fix this myself as I do not have access to this part of the forum.

Show how to use broadcasting effectively

We want to show students how to use broadcasting to write concise code. However, currently in notebook 01 the matmul-with-broadcasting section takes 28ms, whereas the Swift loop and the PyTorch broadcasting version both take ~0.25ms.

If we can't show how to write concise, performant code, then we aren't giving people a way to actually write their own layer logic, which means they can't really take advantage of S4TF at all; they'd only be able to glue together pre-written layers, which isn't terribly useful.

If fixing this requires XLA then this and implementing performant RunningBatchNorm are likely to go together, so tagging @jekbradbury here.

@saeta provided this information:

Unless I'm confusing two issues, we've done an investigation into this. The issue is that we've written our APIs to facilitate GPE, but this results in lots of small eager ops that have to hit the old TF core, resulting in relatively inefficient execution. This is of course a temporary issue while we migrate to MLIR + the new runtime. In the meantime, we're exploring using XLA to optimize at a function level: https://bugs.swift.org/browse/TF-407

One thing I don't understand here is why PyTorch (which is also "eager") can do this 100x faster than S4TF, despite not using any XLA/JIT-type functionality. Is that an underlying TF issue, or something we can fix at our level?
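For reference, the PyTorch comparison point is essentially the broadcasting matmul from the Python version of notebook 01 (a sketch from memory, not the exact cell):

import torch

def matmul(a, b):
    ar,ac = a.shape
    br,bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        # broadcast row i of `a` against all of `b`, then reduce over the shared dim
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c

a,b = torch.randn(5,3), torch.randn(3,4)
assert torch.allclose(matmul(a,b), a@b, atol=1e-5)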

running get_preds on a Learner with loaded weights (without training)

When I load the weights and try to run get_preds as follows:

learn.load('phase-3');
preds, targs = learn.get_preds(0)

I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-29-82ad0b2529ce> in <module>
      1 learn.load('phase-3');
----> 2 preds, targs = learn.get_preds(0)

~/work/fastai_dev/dev/local/learner.py in get_preds(self, ds_idx, with_loss)
    229         cb = GatherPredsCallback(with_loss=with_loss)
    230         with self.no_logging(), self.added_cbs(cb), self.loss_not_reduced():
--> 231             self(['begin_fit', 'begin_epoch', 'begin_validate'])
    232             self.all_batches()
    233             self(['after_validate', 'after_epoch', 'after_fit'])

~/work/fastai_dev/dev/local/learner.py in __call__(self, event_name)
    238     def __call__(self, event_name):
    239         "Call `event_name` (one or a list) for all callbacks"
--> 240         for e in L(event_name): self._call_one(e)
    241 
    242     def _call_one(self, event_name):

~/work/fastai_dev/dev/local/learner.py in _call_one(self, event_name)
    242     def _call_one(self, event_name):
    243         assert hasattr(event, event_name)
--> 244         [cb(event_name) for cb in sort_by_run(self.cbs)]
    245 
    246     @contextmanager

~/work/fastai_dev/dev/local/learner.py in <listcomp>(.0)
    242     def _call_one(self, event_name):
    243         assert hasattr(event, event_name)
--> 244         [cb(event_name) for cb in sort_by_run(self.cbs)]
    245 
    246     @contextmanager

~/work/fastai_dev/dev/local/learner.py in __call__(self, event_name)
     21 class Callback():
     22     "Basic class handling tweaks of the training loop by changing a `Learner` in various events"
---> 23     def __call__(self, event_name): getattr(self, event_name, noop)()
     24     def __repr__(self): return self.__class__.__name__
     25     def __getattr__(self, k):

~/work/fastai_dev/dev/local/callback/progress.py in begin_fit(self)
     20     def begin_fit(self):
     21         assert hasattr(self.learn, 'recorder')
---> 22         self.mbar = master_bar(list(range(self.n_epoch)))
     23         self.mbar.on_iter_begin()
     24         self.old_logger,self.learn.logger = self.logger,self._write_stats

~/work/fastai_dev/dev/local/learner.py in __getattr__(self, k)
     26         if k=='learn': raise AttributeError
     27         if not hasattr(self,'learn'): raise AttributeError
---> 28         return getattr(self.learn, k)
     29 
     30     @property

AttributeError: 'Learner' object has no attribute 'n_epoch'

learn.validate() blows up with Precision() and Recall() in metrics

If I have a trained model and I add new metrics to it as follows:

learn.metrics += [Precision(), Recall()]

and then run

learn.validate()

I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-26-631604a2e07b> in <module>
----> 1 learn.validate()

~/work/fastai_dev/dev/local/learner.py in validate(self, dl, cbs)
    228             self(['begin_fit', 'begin_epoch', 'begin_validate'])
    229             self.all_batches()
--> 230             self(['after_validate', 'after_epoch', 'after_fit'])
    231         return self.recorder.values[-1]
    232 

~/work/fastai_dev/dev/local/learner.py in __call__(self, event_name)
    245     def __call__(self, event_name):
    246         "Call `event_name` (one or a list) for all callbacks"
--> 247         for e in L(event_name): self._call_one(e)
    248 
    249     def _call_one(self, event_name):

~/work/fastai_dev/dev/local/learner.py in _call_one(self, event_name)
    249     def _call_one(self, event_name):
    250         assert hasattr(event, event_name)
--> 251         [cb(event_name) for cb in sort_by_run(self.cbs)]
    252 
    253     @contextmanager

~/work/fastai_dev/dev/local/learner.py in <listcomp>(.0)
    249     def _call_one(self, event_name):
    250         assert hasattr(event, event_name)
--> 251         [cb(event_name) for cb in sort_by_run(self.cbs)]
    252 
    253     @contextmanager

~/work/fastai_dev/dev/local/learner.py in __call__(self, event_name)
     21 class Callback():
     22     "Basic class handling tweaks of the training loop by changing a `Learner` in various events"
---> 23     def __call__(self, event_name): getattr(self, event_name, noop)()
     24     def __repr__(self): return self.__class__.__name__
     25     def __getattr__(self, k):

~/work/fastai_dev/dev/local/learner.py in after_validate(self)
    404     def after_train   (self): self.log += [_maybe_item(m.value) for m in self._train_mets]
    405     def begin_validate(self): [m.reset() for m in self._valid_mets]
--> 406     def after_validate(self): self.log += [_maybe_item(m.value) for m in self._valid_mets]
    407 
    408     def after_cancel_train(self):    self.cancel_train = True

~/work/fastai_dev/dev/local/learner.py in <listcomp>(.0)
    404     def after_train   (self): self.log += [_maybe_item(m.value) for m in self._train_mets]
    405     def begin_validate(self): [m.reset() for m in self._valid_mets]
--> 406     def after_validate(self): self.log += [_maybe_item(m.value) for m in self._valid_mets]
    407 
    408     def after_cancel_train(self):    self.cancel_train = True

~/work/fastai_dev/dev/local/learner.py in _maybe_item(t)
    366 
    367 def _maybe_item(t):
--> 368     return t.item() if t.numel()==1 else t
    369 
    370 class Recorder(Callback):

AttributeError: 'numpy.float64' object has no attribute 'numel'

I can 'fix' this using the following:
[screenshot of the workaround omitted]
but this is obviously not the fix we want 🙂
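One duct-tape workaround (a guess at the spirit of the screenshot, not a proper fix) is to make _maybe_item tolerate plain numpy scalars:

def _maybe_item(t):
    # numpy.float64 has no `numel`; only call .item() on one-element tensors
    return t.item() if hasattr(t, 'numel') and t.numel()==1 else t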

`2250]))` instead of link

In the vision.data documentation there are some references that appear as 2250])) (or with other numbers). It seems to be some automatically generated code. How can I fix this and submit a PR?

Is there a way to select different loss functions for different targets?

If I understand correctly, there's no way in the library to select different loss functions for different targets.

For my use case, I have two types of labels:

  1. Logits from a teacher model, available only for the training set
  2. Ground-truth labels, available only for the validation set

I need to apply MSE loss to the logits during training steps, and cross entropy (+accuracy) to the ground-truth labels as metrics on the validation set.

For now, I solved the problem with this snippet:

class TargetLoss():
    def __init__(self, loss_func, target_index, name=None):
        store_attr(self, "loss_func,target_index,name")
    def __call__(self, x, *targets, **kwargs):
        return self.loss_func(x, targets[self.target_index], **kwargs)
    @property
    def __name__(self):
        return self.name
    
train_loss = TargetLoss(MSELossFlat(), 1)
valid_loss = TargetLoss(CrossEntropyLossFlat(), 0, name="valid_loss")
valid_accuracy = TargetLoss(accuracy, 0, name="valid_accuracy")
learn = Learner(dbch, model, loss_func=train_loss, opt_func=opt_func, cb_funcs=cb_funcs, metrics=[valid_loss, valid_accuracy])

But this way I get two columns named valid_loss: the first is the one computed by the learner, which in my case is applied to the wrong targets, and the second is the one I specified. It also doesn't implement decodes and activation.

Do you think there should be an in-library mechanism for handling these situations? Or is there already a way and I didn't realize?

class XResNet not working with mnist

The following line fails:

  • return act_fn(self.convs(x) + self.idconv(self.pool(x)))

because the outputs of self.pool(x) and self.convs(x) do not match in shape.

I suggest this fix in ResBlock.__init__:

  • change self.pool = noop if stride==1 else nn.AvgPool2d(2)
  • to self.pool = noop if stride==1 else nn.AvgPool2d(2, ceil_mode=True)

ResBlock.forward could also use an in-place add_, although I cannot measure the difference for xresnet18:

  • act_fn(self.convs(x).add_(self.idconv(self.pool(x))))
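A quick standalone sanity check of the shape mismatch and the proposed ceil_mode fix:

import torch, torch.nn as nn

x = torch.randn(1, 8, 7, 7)  # odd spatial size, as occurs partway through an MNIST-sized XResNet
conv = nn.Conv2d(8, 8, 3, stride=2, padding=1)
print(conv(x).shape)                              # torch.Size([1, 8, 4, 4])
print(nn.AvgPool2d(2)(x).shape)                   # torch.Size([1, 8, 3, 3]) -- mismatch
print(nn.AvgPool2d(2, ceil_mode=True)(x).shape)   # torch.Size([1, 8, 4, 4]) -- matches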

Issue with 07_batchnorm.ipynb

Hi, I get an error while running the class below, at the line x.var((0,2,3), keepdim=True):
var(): argument 'dim' (position 1) must be int, not tuple

class BatchNorm(nn.Module):
    def __init__(self, nf, mom=0.1, eps=1e-5):
        super().__init__()
        # NB: pytorch bn mom is opposite of what you'd expect
        self.mom,self.eps = mom,eps
        self.mults = nn.Parameter(torch.ones (nf,1,1))
        self.adds  = nn.Parameter(torch.zeros(nf,1,1))
        self.register_buffer('vars',  torch.ones(1,nf,1,1))
        self.register_buffer('means', torch.zeros(1,nf,1,1))

    def update_stats(self, x):
        print(x.size())
        m = x.mean((0,2,3), keepdim=True)
        print(m.size())
        v = x.var((0,2,3), keepdim=True) # This raises: var() argument 'dim' must be int, not tuple
        self.means.lerp_(m, self.mom)
        self.vars.lerp_ (v, self.mom)
        return m,v

    def forward(self, x):
        if self.training:
            with torch.no_grad(): m,v = self.update_stats(x)
        else: m,v = self.means,self.vars
        x = (x-m) / (v+self.eps).sqrt()
        return x*self.mults + self.adds
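If upgrading PyTorch isn't an option (on the version used here, Tensor.mean accepts a tuple of dims but Tensor.var does not; support was added later), an equivalent workaround inside update_stats is to compute the variance from the mean:

m = x.mean((0,2,3), keepdim=True)
# same as x.var((0,2,3), keepdim=True, unbiased=False); note the original
# call would use the unbiased default
v = ((x - m)**2).mean((0,2,3), keepdim=True)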

Import Error on import fastai2.basics

Hello!

I just installed fastai v2 on a Kaggle kernel, following the instructions from Jeremy Howard's notebook.

Running import fastai2.basics results in the traceback below. I'm fairly inexperienced with Python, but this seems like an issue within fastai to me.


ImportError                               Traceback (most recent call last)
<ipython-input> in <module>
----> 1 import fastai2.basics

/opt/conda/lib/python3.6/site-packages/fastai2/basics.py in <module>
----> 1 from .data.all import *
      2 from .optimizer import *
      3 from .learner import *
      4 from .metrics import *

/opt/conda/lib/python3.6/site-packages/fastai2/data/all.py in <module>
----> 1 from ..torch_basics import *
      2 from .core import *
      3 from .load import *
      4 from .external import *
      5 from .transforms import *

/opt/conda/lib/python3.6/site-packages/fastai2/torch_basics.py in <module>
      1 from .core.all import *
----> 2 from .torch_imports import *
      3 from .torch_core import *
      4 from .layers import *

/opt/conda/lib/python3.6/site-packages/fastai2/torch_imports.py in <module>
----> 1 import torch
      2 from torch import as_tensor,Tensor,ByteTensor,LongTensor,FloatTensor,HalfTensor,DoubleTensor
      3 import torch.nn as nn
      4 import torch.nn.functional as F
      5 from torch.utils.data import SequentialSampler,RandomSampler,Sampler,BatchSampler

/opt/conda/lib/python3.6/site-packages/torch/__init__.py in <module>
     79 del _dl_flags
     80
---> 81 from torch._C import *
     82
     83 __all__ += [name for name in dir(_C)

ImportError: libshm.so: cannot open shared object file: No such file or directory

problem with small train datasets?

Hi. When training an imdb sentiment model using the code from:
https://github.com/fastai/fastai_docs/blob/master/docs_src/text.ipynb
but trimming train.csv to the first 10 lines, I get the following error:
.../lib/python3.7/site-packages/fastprogress/fastprogress.py:89: UserWarning: You generator is empty.
warn("You generator is empty.")
...
File ".../lib/python3.7/site-packages/fastprogress/fastprogress.py", line 213, in on_update
filled_len = int(self.length * val // self.total)
ZeroDivisionError: integer division or modulo by zero

Source code links not working for properties

See the last few cells in notebook 05_data_core, where we attempt to show_doc DataBunch properties train_dl, valid_dl, train_ds and valid_ds. The documentation summary works fine, but the source code links come up empty.

Those properties are added via add_props, which in turn creates them as property instances. The function get_name, defined in 91_notebook_export, is unable to derive a symbolic name for the desired property.

I haven't found a way to introspect the property name in order to find the notebook where the symbol is defined. As an alternative, we could add something like the following to obtain the name of the object the property has been added to:

# export
def get_name(obj):
    "Get the name of `obj`"
    if hasattr(obj, '__name__'):       return obj.__name__
    elif getattr(obj, '_name', False): return obj._name
    elif hasattr(obj,'__origin__'):    return str(obj.__origin__).split('.')[-1] #for types
    elif type(obj) == property:        return obj.fget.func.__qualname__.split('.')[0]
    else:                              return str(obj).split('.')[-1]

If we were to invoke get_name(DataBunch.train_dl), get_name would return DataBunch. This is not semantically correct, but it would suffice to make the rest of the infrastructure point to the source code for DataBunch, which is the same result we get for other fields.

Is this something worth pursuing? If so, is there a way to introspect the property name instead of the parent object's name, or should we be satisfied with the latter?

(This is similar to the issue addressed in #170, but the root cause is different).

Windows compatibility + typos

Hi,
I am not sure if I should submit a PR for 1) Windows compatibility issues and/or 2) small typos in the notebooks?
For example:

  1. In notebook2script.py,
    main_dic = json.load(open(fname,'r'))
    breaks on Windows and could be replaced with
    main_dic = json.load(open(fname,'r', encoding="utf-8"))
  2. At the bottom of notebook 02_fully_connected.ipynb, python is missing from the command in the last cell.

_get_files doesn't return files in a deterministic order across OSes

_get_files, in local/data/transforms.py, doesn't return files in a deterministic order across OSes.

This is an issue when getting files, then splitting using a fixed seed. For example, in 08_pets_tutorial.ipynb (I added the seed parameter):

items = get_image_files(source)
split_idx = RandomSplitter(seed=42)(items)

In this case, two users on different OSes would get the same split_idx but different train/validation sets.

It would be straightforward for a user to correct this by sorting items before passing the list into the splitter, but I wouldn't expect many people to know to do this.
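Until that's fixed, a user-side workaround (sketch) is to sort by path before splitting:

items = get_image_files(source)
items = sorted(items, key=str)  # deterministic order across OSes
split_idx = RandomSplitter(seed=42)(items)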
