
modern-computer-vision-with-pytorch's Issues

opt <-> optimizer parameter

The function

def train_batch(x, y, model, opt, loss_fn):
    prediction = model(x)[0]
    batch_loss = loss_fn(prediction, y)
    batch_loss.backward()
    optimizer.step()
    optimizer.zero_grad()

has the parameter opt but then calls optimizer, which is not passed in, so it must be relying on a global variable.
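A minimal fix (a sketch, assuming the rest of the training loop is unchanged) is to use the parameter consistently:

def train_batch(x, y, model, opt, loss_fn):
    prediction = model(x)[0]
    batch_loss = loss_fn(prediction, y)
    batch_loss.backward()
    opt.step()        # use the opt parameter instead of the global optimizer
    opt.zero_grad()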

torchsummary has been renamed to torchinfo

All lines containing torchsummary throughout the book have to be changed to torchinfo, i.e.:

from torchinfo import summary

Furthermore, all lines containing summary have to be edited to the new syntax:
summary(model, (32,3,64,64))

Issue with Data_augmentation_with_CNN: 'Tensor' object has no attribute 'deepcopy'

The problem seems to lie with this line:

if self.aug: ims=self.aug.augment_images(images=ims)

The issue is that we are passing a tuple of tensors; from my understanding, they must be NumPy arrays to work.

Here is the error that I get:

in
3 for epoch in range(5):
4 print(epoch)
----> 5 for ix, batch in enumerate(iter(trn_dl)):
6 x, y = batch
7 batch_loss = train_batch(x, y, model, optimizer, loss_fn)

E:\Programs\anaconda3\envs\deep-learning\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
519 if self._sampler_iter is None:
520 self._reset()
--> 521 data = self._next_data()
522 self._num_yielded += 1
523 if self._dataset_kind == _DatasetKind.Iterable and \

E:\Programs\anaconda3\envs\deep-learning\lib\site-packages\torch\utils\data\dataloader.py in _next_data(self)
559 def _next_data(self):
560 index = self._next_index() # may raise StopIteration
--> 561 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
562 if self._pin_memory:
563 data = _utils.pin_memory.pin_memory(data)

E:\Programs\anaconda3\envs\deep-learning\lib\site-packages\torch\utils\data\_utils\fetch.py in fetch(self, possibly_batched_index)
50 else:
51 data = self.dataset[possibly_batched_index]
---> 52 return self.collate_fn(data)

in collate_fn(self, batch)
14
15
---> 16 if self.aug: ims=self.aug.augment_images(images=ims)
17 ims = torch.tensor(ims)[:,None,:,:].to(device)/255.
18 classes = torch.tensor(classes).to(device)

E:\Programs\anaconda3\envs\deep-learning\lib\site-packages\imgaug\augmenters\meta.py in augment_images(self, images, parents, hooks)
823 UnnormalizedBatch(images=images),
824 parents=parents,
--> 825 hooks=hooks
826 ).images_aug
827

E:\Programs\anaconda3\envs\deep-learning\lib\site-packages\imgaug\augmenters\meta.py in augment_batch_(self, batch, parents, hooks)
595 batch_unnorm = batch
596 batch_norm = batch.to_normalized_batch()
--> 597 batch_inaug = batch_norm.to_batch_in_augmentation()
598 elif isinstance(batch, Batch):
599 batch_norm = batch

E:\Programs\anaconda3\envs\deep-learning\lib\site-packages\imgaug\augmentables\batches.py in to_batch_in_augmentation(self)
449
450 return _BatchInAugmentation(
--> 451 images=_copy(self.images_unaug),
452 heatmaps=_copy(self.heatmaps_unaug),
453 segmentation_maps=_copy(self.segmentation_maps_unaug),

E:\Programs\anaconda3\envs\deep-learning\lib\site-packages\imgaug\augmentables\batches.py in _copy(var)
445 # TODO first check here if _aug is set and if it is then use that?
446 if var is not None:
--> 447 return utils.copy_augmentables(var)
448 return var
449

E:\Programs\anaconda3\envs\deep-learning\lib\site-packages\imgaug\augmentables\utils.py in copy_augmentables(augmentables)
17 result.append(np.copy(augmentable))
18 else:
---> 19 result.append(augmentable.deepcopy())
20 return result
21

AttributeError: 'Tensor' object has no attribute 'deepcopy'
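A quick workaround (a sketch; see also the fuller fix proposed under "Chap 4 (Data_augmentation_with_CNN) -- Error while trying to augment data" below) is to convert the tensors to NumPy arrays before augmenting:

import numpy as np

# inside collate_fn: imgaug expects NumPy arrays, not torch tensors
if self.aug:
    ims = self.aug.augment_images(images=np.stack([im.cpu().numpy() for im in ims]))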

Chapter 13 (CycleGAN)

I encountered the following error while testing CycleGAN.ipynb of Chapter 13.

NameError                                 Traceback (most recent call last)
<ipython-input-18-af85a4b0ad99> in <cell line: 1>()
----> 1 trn_ds = CycleGANDataset('apples_train', 'oranges_train')
      2 val_ds = CycleGANDataset('apples_test', 'oranges_test')
      3 
      4 trn_dl = DataLoader(trn_ds, batch_size=1, shuffle=True, collate_fn=trn_ds.collate_fn)
      5 val_dl = DataLoader(val_ds, batch_size=5, shuffle=True, collate_fn=val_ds.collate_fn)

<ipython-input-17-eca7eea9b41b> in __init__(self, apples, oranges)
      1 class CycleGANDataset(Dataset):
      2     def __init__(self, apples, oranges):
----> 3         self.apples = Glob(apples)
      4         self.oranges = Glob(oranges)

NameError: name 'Glob' is not defined

I could not find a source file in which Glob() is defined. Where does Glob() come from?
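For reference, Glob presumably comes from the torch_snippets utility package that the book's notebooks import, so a likely fix (an assumption) is to run the import before defining the dataset:

from torch_snippets import *  # torch_snippets exposes Glob among its helpers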

Error in Chapter 12 (Face Generation using Conditional GAN)

The code:

log = Report(n_epochs)
for epoch in range(n_epochs):
    N = len(dataloader)
    for bx, (images, labels) in enumerate(dataloader):
        real_data, real_labels = images.to(device), labels.to(device)
        fake_labels = torch.LongTensor(np.random.randint(0, 2, len(real_data))).to(device)
        fake_data = generator(noise(len(real_data)), fake_labels)
        fake_data = fake_data.detach()
        d_loss = discriminator_train_step(real_data, real_labels, fake_data, fake_labels)
        fake_labels = torch.LongTensor(np.random.randint(0, 2, len(real_data))).to(device)
        fake_data = generator(noise(len(real_data)), fake_labels).to(device)
        g_loss = generator_train_step(fake_data, fake_labels)
        pos = epoch + (1+bx)/N
        log.record(pos, d_loss=d_loss.detach(), g_loss=g_loss.detach(), end='\r')
    log.report_avgs(epoch+1)
    with torch.no_grad():
        fake = generator(fixed_noise, fixed_fake_labels).detach().cpu()
        imgs = vutils.make_grid(fake, padding=2, normalize=True).permute(1,2,0)
        img_list.append(imgs)
        show(imgs, sz=10)

displays the following error:


TypeError Traceback (most recent call last)
/tmp/ipykernel_60416/3886870486.py in
2 for epoch in range(n_epochs):
3 N = len(dataloader)
----> 4 for bx, (images, labels) in enumerate(dataloader):
5 real_data, real_labels = images.to(device), labels.to(device)
6 fake_labels = torch.LongTensor(np.random.randint(0, 2, len(real_data))).to(device)

~/miniconda3/envs/c2-vision/lib/python3.9/site-packages/torch/utils/data/dataloader.py in __next__(self)
519 if self._sampler_iter is None:
520 self._reset()
--> 521 data = self._next_data()
522 self._num_yielded += 1
523 if self._dataset_kind == _DatasetKind.Iterable and \

~/miniconda3/envs/c2-vision/lib/python3.9/site-packages/torch/utils/data/dataloader.py in _next_data(self)
1201 else:
1202 del self._task_info[idx]
-> 1203 return self._process_data(data)
1204
1205 def _try_put_index(self):

~/miniconda3/envs/c2-vision/lib/python3.9/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
1227 self._try_put_index()
1228 if isinstance(data, ExceptionWrapper):
-> 1229 data.reraise()
1230 return data
1231

~/miniconda3/envs/c2-vision/lib/python3.9/site-packages/torch/_utils.py in reraise(self)
432 # instantiate since we don't know how to
433 raise RuntimeError(msg) from None
--> 434 raise exception
435
436

TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/james/miniconda3/envs/c2-vision/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/james/miniconda3/envs/c2-vision/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/james/miniconda3/envs/c2-vision/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/tmp/ipykernel_60416/1994190152.py", line 13, in getitem
gender = np.where('female' in image_path,1,0)
TypeError: argument of type 'PosixPath' is not iterable
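A likely fix (a sketch, assuming image_path is a pathlib.Path as the error suggests) is to convert the path to a string before the substring test:

gender = np.where('female' in str(image_path), 1, 0)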

Chap 4 (Data_augmentation_with_CNN) -- Error while trying to augment data

class FMNISTDataset(Dataset):
    def __init__(self, x, y, aug=None):
        self.x, self.y = x, y
        self.aug = aug
    def __getitem__(self, ix):
        x, y = self.x[ix], self.y[ix]
        return x, y
    def __len__(self): return len(self.x)

    def collate_fn(self, batch):
        'logic to modify a batch of images'
        ims, classes = list(zip(*batch))
        # transform a batch of images at once
        if self.aug: ims=self.aug.augment_images(images=ims)
        ims = torch.tensor(ims)[:,None,:,:].to(device)/255.
        classes = torch.tensor(classes).to(device)
        return ims, classes

In the code of the FMNISTDataset, in collate_fn we're trying to augment ims using aug.augment_images.

While running it locally on a newer version than the one used in the book, I got an issue on the line ims=self.aug.augment_images(images=ims) (I also tried it in the notebook and got the same error).

It's due to the fact that ims is a tuple of tensors, and inside augment_images, .copy() is called on each image, because the function expects a list of np.ndarray.

To fix it, we need to stack the tensors into a single tensor and then convert it with a to_numpy() helper, like this:

import numpy as np
import torch
import imgaug
from torch.utils.data import Dataset

def to_numpy(x: torch.Tensor) -> np.ndarray:
    return x.cpu().detach().numpy()
 

class FMNISTDataset(Dataset):
    def __init__(
        self,
        x: torch.Tensor,
        y: torch.Tensor,
        aug: imgaug.augmenters.Augmenter | None = None,
    ) -> None:
        self.x, self.y = x, y
        self.aug = aug

    def __getitem__(self, ix: int) -> tuple[torch.Tensor, torch.Tensor]:
        x, y = self.x[ix], self.y[ix]
        return x, y

    def __len__(self) -> int:
        return len(self.x)

    def collate_fn(
        self, batch: list[tuple[torch.Tensor, torch.Tensor]]
    ) -> tuple[torch.Tensor, torch.Tensor]:
        ims, classes = zip(*batch)
        ims = torch.stack(ims)
        if self.aug:
            ims = self.aug.augment_images(images=to_numpy(ims))
        ims = torch.tensor(ims)[:, None, :, :] / 255.0
        classes = torch.tensor(classes)
        return ims, classes

With this change it should work well, and the torch.stack(ims) and to_numpy(ims) calls don't take long to process.

Chapter7: Training R-CNN Notebook

There is a missing parameter (epsilon) when calculating the IoUs of all candidates using the extract_iou function (code line 8).

for ix, (im, bbs, labels, fpath) in enumerate(ds):
    if(ix==N):
        break
    H, W, _ = im.shape
    candidates = extract_candidates(im)
    candidates = np.array([(x,y,x+w,y+h) for x,y,w,h in candidates])
    ious, rois, clss, deltas = [], [], [], []
    ious = np.array([[extract_iou(candidate, _bb_) for candidate in candidates] for _bb_ in bbs]).T
    for jx, candidate in enumerate(candidates):
        cx,cy,cX,cY = candidate
        candidate_ious = ious[jx]
        best_iou_at = np.argmax(candidate_ious)
        best_iou = candidate_ious[best_iou_at]
        best_bb = _x,_y,_X,_Y = bbs[best_iou_at]
        if best_iou > 0.3: clss.append(labels[best_iou_at])
        else : clss.append('background')
        delta = np.array([_x-cx, _y-cy, _X-cX, _Y-cY]) / np.array([W,H,W,H])
        deltas.append(delta)
        rois.append(candidate / np.array([W,H,W,H]))
    FPATHS.append(fpath)
    IOUS.append(ious)
    ROIS.append(rois)
    CLSS.append(clss)
    DELTAS.append(deltas)
    GTBBS.append(bbs)
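If the notebook defines extract_iou without a default value for epsilon (an assumption based on this report), the call could pass it explicitly, for example:

ious = np.array([[extract_iou(candidate, _bb_, epsilon=1e-5) for candidate in candidates] for _bb_ in bbs]).T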

Chap4 (CNN_working_details.ipynb) -- Squeeze on model output

On the cell to train the model

def train_batch(x, y, model, opt, loss_fn):
    model.train()
    prediction = model(x)
    batch_loss = loss_fn(prediction, y)
    batch_loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return batch_loss.item()

The prediction has shape [1, 1], while we pass

y_train = torch.tensor([0, 1])

so we get a size mismatch and the code doesn't run.

Error Trigger

Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 1])) is deprecated. Please ensure they have the same size.

To fix it, either reshape the target:

y_train = torch.tensor([0, 1]).to(device).float().unsqueeze(1)

or squeeze the prediction inside train_batch:
def train_batch(x, y, model, opt, loss_fn):
    model.train()
    prediction = model(x)
    batch_loss = loss_fn(prediction.squeeze(0), y)
    batch_loss.backward()
    opt.step()
    opt.zero_grad()
    return batch_loss.item()

Possible Error in Answer Key of Chapter 1

For question 6 in Chapter 1 (How does the weight update of all the weights across layers happen during back-propagation?), the answer key states the answer to be "It happens using the formula dW = W - alpha * (dW/dL)" (Appendix, p. 1). However, from my understanding, the answer should be W = W - alpha * (dL/dW). This is because each weight is adjusted by a small amount (alpha * dL/dW), and each weight is adjusted according to the effect its change has on the loss (dL/dW).
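In symbols, the standard gradient-descent update being described is:

W \leftarrow W - \alpha \, \frac{\partial L}{\partial W}

that is, each weight takes a small step against the gradient of the loss with respect to that weight.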

data augmentation in FasterRCNN

Hi @sizhky ,

Firstly, thanks for this wonderful book. Currently, I'm on Chapter 8, Object Detection. I would like to know how I can implement data augmentation in the code below. Could you please provide an example of implementing data augmentation with the FasterRCNN model?

from torch_snippets import *
from PIL import Image
import glob, numpy as np, cv2, warnings,random
warnings.filterwarnings('ignore')
 
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # benchmark must stay off for deterministic runs
    
seed_everything(42)
 
IMAGE_ROOT = 'images'
DF_RAW = pd.read_csv('train_labels.csv')
DF_RAW['image_id'] = DF_RAW['filename'].apply(lambda x: x.split('.')[0])
DF_RAW['labels'] = DF_RAW['class'].apply(lambda x: 1 if x=='car' else 0)
 
 
label2target = {l:t+1 for t,l in enumerate(DF_RAW['class'].unique())}
label2target['background'] = 0
target2label = {t:l for l,t in label2target.items()}
background_class = label2target['background']
num_classes = len(label2target)
 
 
 
def preprocess_image(img):
    img = torch.tensor(img).permute(2,0,1)
    return img.to(device).float()
 
 
class OpenDataset(torch.utils.data.Dataset):
    def __init__(self, df, image_folder=IMAGE_ROOT):
        self.root = image_folder
        self.df = df
        self.unique_images = df['image_id'].unique()
    def __len__(self): return len(self.unique_images)
    def __getitem__(self, ix):
        image_id = self.unique_images[ix]
        image_path = f'{self.root}/{image_id}.jpg'
        img = Image.open(image_path).convert("RGB")
        img = np.array(img)/255
        df = self.df.copy()
        df = df[df['image_id'] == image_id]
        boxes = df[['xmin','ymin','xmax','ymax']].values
        classes = df['class'].values
        target = {}
        target["boxes"] = torch.Tensor(boxes).float()
        target["labels"] = torch.Tensor([label2target[i] for i in classes]).long()
        img = preprocess_image(img)
        return img, target
    def collate_fn(self, batch):
        return tuple(zip(*batch)) 
 
 
 
 
from sklearn.model_selection import train_test_split
trn_ids, val_ids = train_test_split(DF_RAW['image_id'].unique(), test_size=0.1, random_state=99)
trn_df, val_df = DF_RAW[DF_RAW['image_id'].isin(trn_ids)], DF_RAW[DF_RAW['image_id'].isin(val_ids)]
print(len(trn_df), len(val_df))
 
train_ds = OpenDataset(trn_df)
test_ds = OpenDataset(val_df)
 
train_loader = DataLoader(train_ds, batch_size=2, collate_fn=train_ds.collate_fn, drop_last=True,shuffle=True)
test_loader = DataLoader(test_ds, batch_size=2, collate_fn=test_ds.collate_fn, drop_last=True,shuffle=False)
 
 
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
 
device = 'cuda' if torch.cuda.is_available() else 'cpu'
 
def get_model():
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model
 
 
# Defining training and validation functions for a single batch
def train_batch(inputs, model, optimizer):
    model.train()
    input, targets = inputs
    input = list(image.to(device) for image in input)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    optimizer.zero_grad()
    losses = model(input, targets)
    loss = sum(loss for loss in losses.values())
    loss.backward()
    optimizer.step()
    return loss, losses
 
@torch.no_grad() # this will disable gradient computation in the function below
def validate_batch(inputs, model):
    model.train() # to obtain losses, the model needs to be in train mode only
    # note that we are not defining the model's forward method here,
    # and hence need to work per the way the model class is defined
    input, targets = inputs
    input = list(image.to(device) for image in input)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
 
    optimizer.zero_grad()
    losses = model(input, targets)
    loss = sum(loss for loss in losses.values())
    return loss, losses
 
 
model = get_model().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
                            momentum=0.9, weight_decay=0.0005)
n_epochs = 5
log = Report(n_epochs)
 
 
for epoch in range(n_epochs):
    _n = len(train_loader)
    for ix, inputs in enumerate(train_loader):
        loss, losses = train_batch(inputs, model, optimizer)
        loc_loss, regr_loss, loss_objectness, loss_rpn_box_reg = \
            [losses[k] for k in ['loss_classifier','loss_box_reg','loss_objectness','loss_rpn_box_reg']]
        pos = (epoch + (ix+1)/_n)
        log.record(pos, trn_loss=loss.item(), trn_loc_loss=loc_loss.item(), 
                   trn_regr_loss=regr_loss.item(), trn_objectness_loss=loss_objectness.item(),
                   trn_rpn_box_reg_loss=loss_rpn_box_reg.item(), end='\r')
 
    _n = len(test_loader)
    for ix,inputs in enumerate(test_loader):
        loss, losses = validate_batch(inputs, model)
        loc_loss, regr_loss, loss_objectness, loss_rpn_box_reg = \
          [losses[k] for k in ['loss_classifier','loss_box_reg','loss_objectness','loss_rpn_box_reg']]
        pos = (epoch + (ix+1)/_n)
        log.record(pos, val_loss=loss.item(), val_loc_loss=loc_loss.item(), 
                  val_regr_loss=regr_loss.item(), val_objectness_loss=loss_objectness.item(),
                  val_rpn_box_reg_loss=loss_rpn_box_reg.item(), end='\r')
    if (epoch+1)%(n_epochs//5)==0: log.report_avgs(epoch+1)
 
 
log.plot_epochs(['trn_loss','val_loss'])
 
 
from torchvision.ops import nms
def decode_output(output):
    'convert tensors to numpy arrays'
    bbs = output['boxes'].cpu().detach().numpy().astype(np.uint16)
    labels = np.array([target2label[i] for i in output['labels'].cpu().detach().numpy()])
    confs = output['scores'].cpu().detach().numpy()
    ixs = nms(torch.tensor(bbs.astype(np.float32)), torch.tensor(confs), 0.05)
    bbs, confs, labels = [tensor[ixs] for tensor in [bbs, confs, labels]]
 
    if len(ixs) == 1:
        bbs, confs, labels = [np.array([tensor]) for tensor in [bbs, confs, labels]]
    return bbs.tolist(), confs.tolist(), labels.tolist()
 
 
model.eval()
for ix, (images, targets) in enumerate(test_loader):
    if ix==6: break
    images = [im for im in images]
    outputs = model(images)
    for ix, output in enumerate(outputs):
        bbs, confs, labels = decode_output(output)
        info = [f'{l}@{c:.2f}' for l,c in zip(labels, confs)]
        print(info)
        show(images[ix].cpu().permute(1,2,0), bbs=bbs, texts=labels, sz=5)
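Not from the book, but one common approach (a sketch, assuming the albumentations package is installed and that boxes are in pixel (xmin, ymin, xmax, ymax) format) is to apply a bounding-box-aware transform inside __getitem__, before preprocess_image:

import albumentations as A

train_tfms = A.Compose(
    [A.HorizontalFlip(p=0.5), A.RandomBrightnessContrast(p=0.2)],
    bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']),
)

# inside OpenDataset.__getitem__, before preprocess_image(img):
# transformed = train_tfms(image=img, bboxes=boxes, labels=list(classes))
# img = transformed['image']
# boxes = transformed['bboxes']
# classes = transformed['labels']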

Chapter 4 (Classifying images using deep CNNs)

In the Training cell I am getting this error


RuntimeError: Calculated padded input size per channel: (1 x 1). Kernel size: (3 x 3). Kernel size can't be greater than actual input size

Training

model, loss_fn, optimizer = get_model()
accuracies, losses =[], []
epochs = 5
for epoch in range(epochs):
    epoch_losses, epoch_accuracies = [], []
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        loss = train_batch(x, y, model, loss_fn, optimizer)
        epoch_losses.append(loss)
    epoch_loss = np.array(epoch_losses).mean()
    # accuracy check
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        acc = accuracy_fn(x, y, model)
        epoch_accuracies.append(acc)
    epoch_acc = np.mean(epoch_accuracies)
    accuracies.append(epoch_acc)
    losses.append(epoch_loss)

CNN model architecture

# defining the CNN architecture
def get_model():
    model = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=3),
        nn.MaxPool2d(2),
        nn.ReLU(),
        nn.Conv2d(64, 128, kernel_size=3),
        nn.MaxPool2d(2),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(3200, 256),
        nn.ReLU(),
        nn.Linear(256, 10)
    ).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=1e-3)
    return model, loss_fn, optimizer

The Training and Accuracy functions

# creating a training function
def train_batch(x, y, model, loss_fn, opt):
    model.train()
    prediction = model(x)
    batch_loss = loss_fn(prediction, y)
    batch_loss.backward()
    opt.step()
    opt.zero_grad()
    return batch_loss.item()

@torch.no_grad()
def accuracy_fn(x, y, model):
    model.eval()
    prediction = model(x)
    max_val, argmaxes = prediction.max(-1)
    is_correct = argmaxes == y
    is_correct = is_correct.cpu().numpy().tolist()
    return is_correct

I have no idea what must have gone wrong. I think the images were reduced to a size smaller than the 3x3 filter. How do I fix this?
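A likely cause (an assumption, since the DataLoader code isn't shown) is that the images reach the model without the expected N x C x H x W shape, so after the first conv/pool stage the spatial size drops below the 3x3 kernel. Reshaping the batch, e.g. in the Dataset, should fix it:

x = x.view(-1, 1, 28, 28).float()  # nn.Conv2d expects N x C x H x W (28x28 FashionMNIST assumed)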

cannot import name 'Field' from 'torchtext.data'

While working on:

from torchtext.data import Field
from pycocotools.coco import COCO
from collections import defaultdict

captions = Field(sequential=False, init_token='', eos_token='')
all_captions = data[data['train']]['caption'].tolist()
all_tokens = [[w.lower() for w in c.split()] for c in all_captions]
all_tokens = [w for sublist in all_tokens for w in sublist]
captions.build_vocab(all_tokens)

i got error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
[<ipython-input-17-700040b5c201>](https://localhost:8080/#) in <cell line: 1>()
----> 1 from torchtext.data import Field
      2 from pycocotools.coco import COCO
      3 from collections import defaultdict
      4 
      5 captions = Field(sequential=False, init_token='', eos_token='')

ImportError: cannot import name 'Field' from 'torchtext.data' (/usr/local/lib/python3.10/dist-packages/torchtext/data/__init__.py)

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.
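For reference, Field was moved to torchtext.legacy.data in torchtext 0.9 and removed entirely in torchtext 0.12, so a likely workaround (assuming an older torchtext is acceptable) is:

from torchtext.legacy.data import Field  # torchtext 0.9-0.11; on <= 0.8 the original import works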

requirements file

The code in Chapter 9 has some dependency issues. Would you mind posting a requirements.txt file for some of the chapters?

Chapter 9 - UNET

Hello,

First Thanks for this book. It is a great learning and reference tool.

I am trying to apply the UNET exercise to my own data. My data masks contain only two integer values, [0, 1].

I have prepared the data as highlighted in the chapter content. I have made one change, which is using BCELoss instead of cross-entropy loss.

I have also changed out_channels=2 (representing the 2 values listed above).

When I run training, I get the following error:

ValueError: Using a target size (torch.Size([4, 512, 512])) that is different to the input size (torch.Size([4, 2, 512, 512])) is deprecated. Please ensure they have the same size.

Any help addressing this would be appreciated.
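A likely resolution (a sketch, assuming the masks hold integer class indices): BCELoss expects targets with exactly the same shape as the predictions, so for a 2-channel output keep nn.CrossEntropyLoss with [N, H, W] integer masks, or switch to a 1-channel output with nn.BCEWithLogitsLoss and [N, 1, H, W] float masks:

import torch.nn as nn

# Option A: out_channels=2 + cross-entropy (targets are class indices)
ce = nn.CrossEntropyLoss()
loss = ce(preds, masks.long())                  # preds: [N, 2, H, W], masks: [N, H, W]

# Option B: out_channels=1 + BCE-with-logits (targets are floats)
bce = nn.BCEWithLogitsLoss()
loss = bce(preds, masks.float().unsqueeze(1))   # preds: [N, 1, H, W]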

(Chapter 1) Is this the code for backpropagation?

from copy import deepcopy
import numpy as np

def update_weights(inputs, outputs, weights, lr):
    original_weights = deepcopy(weights)
    temp_weights = deepcopy(weights)
    updated_weights = deepcopy(weights)
    original_loss = feed_forward(inputs, outputs, \
                                 original_weights)
    for i, layer in enumerate(original_weights):
        for index, weight in np.ndenumerate(layer):
            temp_weights = deepcopy(weights)
            temp_weights[i][index] += 0.0001
            _loss_plus = feed_forward(inputs, outputs, \
                                      temp_weights)
            grad = (_loss_plus - original_loss)/(0.0001)
            updated_weights[i][index] -= grad*lr
    return updated_weights, original_loss

losses = []
for epoch in range(100):
    W, loss = update_weights(x,y,W,0.01)
    losses.append(loss)

bugNum1

The loss function is used inside val_loss without being passed as a parameter to the val_loss function.

Chapter3: Train the model over n epochs

Using the enumerate function in each for loop inside the main loop of training the model over n epochs leaves an unused variable ix.

for example (Varying_loss_optimizer notebook) :

for epoch in range(10):
    print(epoch)
    train_epoch_losses, train_epoch_accuracies = [], []
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        batch_loss = train_batch(x, y, model, optimizer, loss_fn)
        train_epoch_losses.append(batch_loss)
    train_epoch_loss = np.array(train_epoch_losses).mean()

    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        is_correct = accuracy(x, y, model)
        train_epoch_accuracies.extend(is_correct)
    train_epoch_accuracy = np.mean(train_epoch_accuracies)
    for ix, batch in enumerate(iter(val_dl)):
        x, y = batch
        val_is_correct = accuracy(x, y, model)
        validation_loss = val_loss(x, y, model)
    val_epoch_accuracy = np.mean(val_is_correct)
    train_losses.append(train_epoch_loss)
    train_accuracies.append(train_epoch_accuracy)
    val_losses.append(validation_loss)
    val_accuracies.append(val_epoch_accuracy)
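A minimal cleanup (a sketch) is to iterate over the loader directly wherever the index isn't used:

for batch in trn_dl:
    x, y = batch
    batch_loss = train_batch(x, y, model, optimizer, loss_fn)
    train_epoch_losses.append(batch_loss)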

training FRCNN- requires_grad

In the book it says we freeze the parameters, but requires_grad is set to True. Is that correct? Why? Shouldn't it be set to False, like in R-CNN?

class FRCNN(nn.Module):
    def __init__(self):
        super().__init__()
        rawnet = torchvision.models.vgg16_bn(pretrained=True)
        for param in rawnet.features.parameters():
            param.requires_grad = True
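If the intent is indeed to freeze the backbone, as the issue suggests, the flag would presumably be:

for param in rawnet.features.parameters():
    param.requires_grad = False  # freeze the pretrained feature extractor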

Chap4 (CNN_working_details.ipynb) -- Sumprod (Device: CPU) X & cnn_w (Device: GPU ("cuda" or "mps"))

In the code

sumprod = torch.zeros((h_im - h_conv + 1, w_im - w_conv + 1))

for i in range(h_im - h_conv + 1):
    for j in range(w_im - w_conv + 1):
        img_subset = X_train[0, 0, i:(i+3), j:(j+3)]
        model_filter = cnn_w.reshape(3,3)
        val = torch.sum(img_subset*model_filter) + cnn_b
        sumprod[i,j] = val

sumprod is on the CPU, while X_train and model_filter (which comes from the model's cnn_w) are both initialized on the GPU.

So when the assignment happens, there is an issue, and sumprod ends up looking like this:

tensor([[val, 0],
        [0, 0]])

To fix it, just create sumprod on the same device:

sumprod = torch.zeros((h_im - h_conv + 1, w_im - w_conv + 1)).to(device)

Broken Code in Chapter 9

Mask R-CNN notebooks Instance_Segmentation.ipynb and predicting_multiple_instances_of_multiple_classes.ipynb won't run. They give an error at the following lines:

_annots = stems(annots)
trn_items, val_items = train_test_split(_annots, random_state=2)

RecursionError Traceback (most recent call last)

in ()
7 annots.append(ann)
8 from sklearn.model_selection import train_test_split
----> 9 _annots = stems(annots)
10 trn_items, val_items = train_test_split(_annots, random_state=2)

7 frames

... last 1 frames repeated, from the frame below ...

/usr/lib/python3.7/glob.py in _iglob(pathname, recursive, dironly)
69 else:
70 glob_in_dir = _glob0
---> 71 for dirname in dirs:
72 for name in glob_in_dir(dirname, basename, dironly):
73 yield os.path.join(dirname, name)

RecursionError: maximum recursion depth exceeded while calling a Python object

I tried to increase recursion depth by adding the following lines:

import sys
sys.setrecursionlimit(10000000)

But it leads to another error:

--
ValueError Traceback (most recent call last)
/tmp/ipykernel_2241122/3562555350.py in
1 from sklearn.model_selection import train_test_split
2 _annots = stems(annots)
----> 3 trn_items, val_items = train_test_split(_annots, random_state=2)

~/miniconda3/envs/c2-vision/lib/python3.9/site-packages/sklearn/model_selection/_split.py in train_test_split(test_size, train_size, random_state, shuffle, stratify, *arrays)
2420
2421 n_samples = _num_samples(arrays[0])
-> 2422 n_train, n_test = _validate_shuffle_split(
2423 n_samples, test_size, train_size, default_test_size=0.25
2424 )

~/miniconda3/envs/c2-vision/lib/python3.9/site-packages/sklearn/model_selection/_split.py in _validate_shuffle_split(n_samples, test_size, train_size, default_test_size)
2096
2097 if n_train == 0:
-> 2098 raise ValueError(
2099 "With n_samples={}, test_size={} and train_size={}, the "
2100 "resulting train set will be empty. Adjust any of the "

ValueError: With n_samples=0, test_size=0.25 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.

Looks like a bug within torch_snippets.

UPDATE: I ran into the same error in crowd_counting.ipynb in Chapter 10.

Chapter 5: age_gender_prediction notebook

  1. Unused declared variable in the training loop over n epochs:

                 _n = len(train_loader)

  2. No need for using enumerate in the training loop over n epochs.

  3. Fix the ax index for the 'Validation Age Mean-Absolute-Error' plot:

                 epochs = np.arange(1,len(val_gender_accuracies)+1)
                 fig,ax = plt.subplots(1,2,figsize=(10,5))
                 ax = ax.flat
                 ax[0].plot(epochs, val_gender_accuracies, 'bo')
                 ax[1].plot(epochs, val_age_maes, 'r')
                 ax[0].set_xlabel('Epochs')
                 ax[1].set_xlabel('Epochs')
                 ax[0].set_ylabel('Accuracy')
                 ax[1].set_ylabel('MAE')
                 ax[0].set_title('Validation Gender Accuracy')
                 ax[0].set_title('Validation Age Mean-Absolute-Error')
                 plt.show()
    
     correction:
    
                   ax[1].set_title('Validation Age Mean-Absolute-Error')
    

Error encountered in Chapter15 Object_detection_with_DETR

At the line:

!python main.py --coco_path ../open-images-bus-trucks/\
  --epochs 10 --lr=1e-4 --batch_size=2 --num_workers=4\
  --output_dir="outputs" --resume="detr-r50_no-class-head.pth"

The following error shows up:

Traceback (most recent call last):
File "/home/cmn/torch/fconda/detr/main.py", line 13, in <module>
import datasets
File "/home/cmn/torch/fconda/detr/datasets/__init__.py", line 5, in <module>
from .coco import build as build_coco
File "/home/cmn/torch/fconda/detr/datasets/coco.py", line 14, in <module>
import datasets.transforms as T
File "/home/cmn/torch/fconda/detr/datasets/transforms.py", line 13, in <module>
from util.misc import interpolate
File "/home/cmn/torch/fconda/detr/util/misc.py", line 22, in <module>
from torchvision.ops import _new_empty_tensor
ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops' (/home/miniconda3/envs/c2-vision/lib/python3.9/site-packages/torchvision/ops/__init__.py)
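For reference, newer versions of the upstream DETR repo guard this import behind a torchvision version check, so a likely fix (an assumption, based on the upstream code) is to apply the same guard in util/misc.py:

# util/misc.py -- only import the private helper on old torchvision
import torchvision
if float(torchvision.__version__[:3]) < 0.7:
    from torchvision.ops import _new_empty_tensor
    from torchvision.ops.misc import _output_size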

Chapter - 7 RCNN

In the following code, I don't really understand why the candidates are converted, and why the deltas and rois are scaled by the width and height (see the marked lines):

FPATHS, GTBBS, CLSS, DELTAS, ROIS, IOUS = [], [], [], [], [], []
N = 500
for ix, (im, bbs, labels, fpath) in enumerate(ds):
  if(ix==N):
    break

  H, W, _ = im.shape
  candidates = extract_candidates(im)
  candidates = np.array([(x,y,x+w,y+h) for x,y,w,h in candidates])       # <-- this line
  ious, rois, clss, deltas = [], [], [], []
  ious = np.array([[extract_iou(candidate, _bb_) for candidate in candidates] for _bb_ in bbs]).T

  for jx, candidate in enumerate(candidates):
    cx,cy,cX,cY = candidate
    candidate_ious = ious[jx]
    best_iou_at = np.argmax(candidate_ious)
    best_iou = candidate_ious[best_iou_at]
    best_bb = _x,_y,_X,_Y = bbs[best_iou_at]
    if best_iou > 0.3: 
      clss.append(labels[best_iou_at])
    else: 
      clss.append('background')
    delta = np.array([_x-cx, _y-cy, _X-cX, _Y-cY]) / np.array([W,H,W,H])       # <-- this line
    deltas.append(delta)
    rois.append(candidate / np.array([W,H,W,H]))        # <-- this line
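For reference, a likely explanation: extract_candidates returns boxes as (x, y, w, h), so the first marked line converts them to corner format (x1, y1, x2, y2); the deltas and rois are then divided by [W, H, W, H] to normalize them to the [0, 1] range, so they stay meaningful after the region crops are resized to the network's fixed input size.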

IoU threshold value clarification

Hi @sizhky , have a doubt in this function.

from torchvision.ops import nms
def decode_output(output):
    'convert tensors to numpy arrays'
    bbs = output['boxes'].cpu().detach().numpy().astype(np.uint16)
    labels = np.array([target2label[i] for i in output['labels'].cpu().detach().numpy()])
    confs = output['scores'].cpu().detach().numpy()
    ixs = nms(torch.tensor(bbs.astype(np.float32)), torch.tensor(confs), 0.05)
    bbs, confs, labels = [tensor[ixs] for tensor in [bbs, confs, labels]]
 
    if len(ixs) == 1:
        bbs, confs, labels = [np.array([tensor]) for tensor in [bbs, confs, labels]]
    return bbs.tolist(), confs.tolist(), labels.tolist()

Here, in this function you have set the IoU threshold to 0.05. Does that mean you are keeping predicted bboxes that overlap by more than 5%?

or

is it 1 - 0.05 = 0.95, i.e., 95% overlap?

ixs = nms(torch.tensor(bbs.astype(np.float32)), torch.tensor(confs), 0.05)
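For reference, torchvision's nms interprets the third argument as iou_threshold: any box whose IoU with an already-kept, higher-scoring box exceeds this value is suppressed. So 0.05 is aggressive suppression (boxes overlapping a kept box by more than 5% IoU are discarded), not a 95% overlap requirement:

from torchvision.ops import nms

# keep = indices of retained boxes; a box is dropped if its IoU with a
# higher-scoring kept box exceeds 0.05
keep = nms(torch.tensor(bbs.astype(np.float32)), torch.tensor(confs), iou_threshold=0.05)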

Order changes while running inference

Hi @sizhky

After building the model, when I tried to run inference, the decode_output function changes the order of the labels and bboxes, which eventually impacts the mAP score.

Could you please correct it?

Facing error in Customizing StyleGAN2 Chapter13

At line

!python stylegan-encoder/align_images.py stylegan-encoder/raw_images/ stylegan-encoder/aligned_images/
!mv stylegan-encoder/aligned_images/* ./MyImage.jpg

the following error appears

Traceback (most recent call last):
File "stylegan-encoder/align_images.py", line 4, in <module>
from keras.utils import get_file
ImportError: cannot import name 'get_file' from 'keras.utils' (/usr/local/lib/python3.7/dist-packages/keras/utils/__init__.py)
mv: cannot stat 'stylegan-encoder/aligned_images/*': No such file or directory

Also at the next line:

from PIL import Image
img = Image.open('MyImage.jpg')
show(np.array(img), sz=4, title='original')

!python encode_image.py ./MyImage.jpg\
    pred_dlatents_myImage.npy\
    --use_latent_finder true\
    --image_to_latent_path ./trained_models/image_to_latent.pt

pred_dlatents = np.load('pred_dlatents_myImage.npy')
pred_dlatent = torch.from_numpy(pred_dlatents).float().cuda()
pred_image = latent2image(pred_dlatent)
show(pred_image, sz=4, title='synthesized')

this error shows up:

FileNotFoundError Traceback (most recent call last)

in ()
1 from PIL import Image
----> 2 img = Image.open('MyImage.jpg')
3 show(np.array(img), sz=4, title='original')
4
5 get_ipython().system('python encode_image.py ./MyImage.jpg pred_dlatents_myImage.npy --use_latent_finder true --image_to_latent_path ./trained_models/image_to_latent.pt')

/usr/local/lib/python3.7/dist-packages/PIL/Image.py in open(fp, mode)
2841
2842 if filename:
-> 2843 fp = builtins.open(filename, "rb")
2844 exclusive_fp = True
2845

FileNotFoundError: [Errno 2] No such file or directory: 'MyImage.jpg'
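For reference, the second error is likely just a consequence of the first: align_images.py crashed on the import, so no aligned image was ever written and MyImage.jpg does not exist. A possible fix for the import (an assumption, depending on the installed Keras/TensorFlow version) is to change the first import in align_images.py:

# stylegan-encoder/align_images.py
from tensorflow.keras.utils import get_file  # instead of: from keras.utils import get_file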

Chapter09/Semantic_Segmentation_with_U_Net.ipynb

log = Report(n_epochs)
for ex in range(n_epochs):
    N = len(trn_dl)
    for bx, data in enumerate(trn_dl):
        loss, acc = train_batch(model, data, optimizer, criterion)
        log.record(ex+(bx+1)/N, trn_loss=loss, trn_acc=acc, end='\r')

    N = len(val_dl)
    for bx, data in enumerate(val_dl):
        loss, acc = validate_batch(model, data, criterion)
        log.record(ex+(bx+1)/N, val_loss=loss, val_acc=acc, end='\r')
        
    log.report_avgs(ex+1)

RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of size: : [4, 224, 224, 3]

in UnetLoss(preds, targets)
      1 ce = nn.CrossEntropyLoss()
      2 def UnetLoss(preds, targets):
----> 3     ce_loss = ce(preds, targets)
      4     acc = (torch.max(preds, 1)[1] == targets).float().mean()
      5     return ce_loss, acc
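A likely cause (an assumption: the masks are being loaded as 3-channel RGB images) is that nn.CrossEntropyLoss expects integer class maps of shape [N, H, W], not [N, H, W, 3]. Reducing the mask to one channel in the dataset should fix it:

# sketch: inside the dataset's __getitem__, assuming all three mask channels
# carry the same class index
mask = mask[..., 0]                      # [H, W, 3] -> [H, W]
mask = torch.from_numpy(mask).long()     # CrossEntropyLoss wants long targets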

Error encountered in style transfer notebook (ch11)

The following error shows up after completing the notebook:


RuntimeError Traceback (most recent call last)
/tmp/ipykernel_151841/1024981148.py in
----> 1 out_img = postprocess(opt_img[0]).permute(1,2,0)
2 show(out_img)

~/miniconda3/envs/c2-vision/lib/python3.9/site-packages/torchvision/transforms/transforms.py in __call__(self, img)
59 def __call__(self, img):
60 for t in self.transforms:
---> 61 img = t(img)
62 return img
63

~/miniconda3/envs/c2-vision/lib/python3.9/site-packages/torchvision/transforms/transforms.py in __call__(self, img)
435
436 def __call__(self, img):
--> 437 return self.lambd(img)
438
439 def __repr__(self):

/tmp/ipykernel_151841/2609962387.py in <lambda>(x)
6 ])
7 postprocess = T.Compose([
----> 8 T.Lambda(lambda x: x.mul_(1./255)),
9 T.Normalize(mean=[-0.485/0.229, -0.456/0.224, -0.406/0.225], std=[1/0.229, 1/0.224, 1/0.225]),
10 ])

RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.
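A likely fix (a sketch) is to detach the optimized image before postprocessing, or to make the Lambda non-in-place:

out_img = postprocess(opt_img[0].detach()).permute(1,2,0)  # detach the leaf tensor first
# or, in the postprocess pipeline:
# T.Lambda(lambda x: x.mul(1./255))  # out-of-place mul instead of mul_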

chapter 4, Image_augmentation.ipynb

With imgaug==0.4.0, in Chapter 4's Image_augmentation.ipynb, all code like plt.imshow(aug.augment_image(tr_images[0])) will raise the error: %d format: a number is required, not str. Therefore, it should be corrected to plt.imshow(aug.augment_image(tr_images[0].cpu().detach().numpy())).

Chapter 3 Scaling_the_dataset.ipynb

Hello! This section of the book describes the effect of the sigmoid function on data scaling, but the code uses ReLU(). Could you explain that again?

Binary cross-entropy function p.30

Is the function for binary cross-entropy on p. 30 correct? Should there be an np.sum in there? It looks like it should just be np.mean, to correspond to the formula given on p. 24.
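For reference, the usual batch-averaged form of binary cross-entropy is (a sketch, not the book's exact code):

import numpy as np

def binary_cross_entropy(p, y):
    # mean over the batch, matching the formula on p. 24
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))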

Code for mask rcnn is not working

The notebook Instance_Segmentation.ipynb fails when training the model (which is quite expected).
The error is: "TypeError: Expected input images to be of floating type (in range [0, 1]), but found type torch.uint8 instead"

You load the image from file into a PIL Image and then use a custom transform that converts it to a uint8 tensor, but then you feed it to the preloaded model, which expects floats (scaled to [0, 1] according to the model's normalization), so it's obvious why this error occurs.

How did it pass QA before publishing?
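A likely fix (a sketch, assuming the transform currently yields a uint8 tensor) is to scale the image to floats before feeding the model:

img = img.float() / 255.  # uint8 [0, 255] -> float32 [0, 1]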

Chapter 11

Is there a way to save attacked images after having performed adversarial attack?

I tried that:

img_mod = modified_images[0]
Image.fromarray(np.array(img_mod)).save("test.jpg")
img = Image.open("test.jpg")
predict_on_image(torch.tensor(np.array(img)))

And the prediction should be "lemon" but I still get "African elephant".

Then I tried saving as ".png" instead of ".jpg":

img_mod = modified_images[0]
Image.fromarray(np.array(img_mod)).save("test.png")
img = Image.open("test.png")
predict_on_image(torch.tensor(np.array(img)))

And this time I correctly get the label "lemon".

Can you explain why?
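For reference, a likely explanation: JPEG compression is lossy, and the small adversarial perturbations are largely destroyed when the pixels are re-quantized, so the model falls back to its original prediction; PNG is lossless, so the perturbed pixels, and therefore the attack, survive the round trip exactly.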

Missing code in one cell - Chapter 11

One cell in the Autoencoders notebook of Chapter 11 (simple_auto_encoder_with_different_latent_size) seems to be missing. It is the cell before the one where the train_aec function is defined. The code is in the book but not in the Jupyter notebook:

for _ in range(3):
    ix = np.random.randint(len(val_ds))
    im, _ = val_ds[ix]
    _im = model(im[None])[0]
    fig, ax = plt.subplots(1,2,figsize=(3,3)) 
    show(im[0], ax=ax[0], title='input')
    show(_im[0], ax=ax[1], title='prediction')
    plt.tight_layout()
    plt.show()
