
RuntimeError: Given groups=1, weight of size [48, 37, 11], expected input[8, 691, 18] to have 37 channels, but got 691 channels instead (issue on transformer, open, 6 comments)

Abdelsater commented on June 11, 2024
RuntimeError: Given groups=1, weight of size [48, 37, 11], expected input[8, 691, 18] to have 37 channels, but got 691 channels instead
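(For context: this error comes from a 1-D convolution. PyTorch's nn.Conv1d expects input of shape (batch, channels, length), and a weight of size [48, 37, 11] means 37 input channels, 48 output channels and kernel size 11. The minimal reproduction below is illustrative, not code from the repo.)

import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=37, out_channels=48, kernel_size=11)
ok = torch.zeros(8, 37, 672)   # (batch, channels, length): works
print(conv(ok).shape)          # torch.Size([8, 48, 662])
bad = torch.zeros(8, 691, 18)  # 691 on the channel axis: raises the RuntimeError above
conv(bad)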


Comments (6)

maxjcohen commented on June 11, 2024

Hi, as you can see from what was printed before the error, your input vector has shape (8, 18, 691), whereas you would like an input of shape (batch_size, time_length, d_input). My guess is that you tried to combine R with Z along the time axis in your data preprocessing, which would explain the size 691 = 672 + 19 of your current input vector.

I suggest double-checking your data processing function, and concatenating Z and R on the d_input dimension, so as to obtain an input vector of shape (7500, 18+19=37, 672). Note that you may need to broadcast R.
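(A minimal sketch of that concatenation, assuming, as the shapes in this thread suggest, that Z holds 18 observed series of length 672 and R holds 19 static features per sample; only the array names come from the thread.)

import numpy as np

m, K = 7500, 672                            # samples, time-series length
Z = np.zeros((m, 18, K), dtype=np.float32)  # observed time series
R = np.zeros((m, 19), dtype=np.float32)     # static features, one value per sample

# Broadcast R along the time axis, then stack it with Z on the feature axis.
R_time = np.repeat(R[:, :, np.newaxis], K, axis=2)  # (7500, 19, 672)
inputs = np.concatenate([Z, R_time], axis=1)        # (7500, 37, 672)
print(inputs.shape)                                 # (7500, 37, 672)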


Abdelsater commented on June 11, 2024

Thank you for your clarification, but I am still not quite sure how to address it. This is how the function converts the dataset from csv to npz:

import json
from os import path

import numpy as np
import pandas as pd

TIME_SERIES_LENGTH = 672  # 672 in this dataset (see shapes below)

def csv2npz(dataset_x_path, dataset_y_path, output_path, filename, labels_path='labels.json'):
    """Load input dataset from csv and create x_train tensor."""
    # Load dataset as csv
    x = pd.read_csv(dataset_x_path)
    y = pd.read_csv(dataset_y_path)

    # Load labels, file can be found in challenge description
    with open(labels_path, "r") as stream_json:
        labels = json.load(stream_json)

    m = x.shape[0]
    K = TIME_SERIES_LENGTH  # Can be found through csv

    # Create R and Z
    R = x[labels["R"]].values
    R = np.tile(R, 672)
    R = R.astype(np.float32)

    X = y[[f"{var_name}_{i}" for var_name in labels["X"]
           for i in range(K)]]
    X = X.values.reshape((m, -1, K))
    X = X.astype(np.float32)

    Z = x[[f"{var_name}_{i}" for var_name in labels["Z"]
           for i in range(K)]]
    Z = Z.values.reshape((m, -1, K))
#     Z = Z.transpose((0, 2, 1))
    Z = Z.astype(np.float32)

    np.savez(path.join(output_path, filename), R=R, X=X, Z=Z)
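(One detail worth flagging in the function above: np.tile(R, 672) flattens R from (m, 19) to (m, 19*672) = (m, 12768), which matches the R shape reported further down. A sketch of a variant that keeps R three-dimensional instead, assuming 19 static features and K = 672 as the shapes in this thread suggest:)

# Keep R as (samples, features, time) instead of flattening it with np.tile:
R = x[labels["R"]].values.astype(np.float32)   # (m, 19)
R = np.repeat(R[:, :, np.newaxis], K, axis=2)  # (m, 19, 672)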

My input and output dimensions after reading the csv files look like the following:

d_input : 37
d_output : 8

Could you please point out the problem, or edit the code directly? Thank you again for sharing the code and for your support in troubleshooting.


maxjcohen commented on June 11, 2024

My input and output dimensions after reading the csv files look like the following:

d_input : 37
d_output : 8

This seems good to me, so the problem probably isn't in the csv2npz function, but rather in your dataloader and how it handles the R and Z variables. Sorry for the late answer; please tell me if that helps.
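(A quick way to check this, sketched here with the dataloader names from the notebook further down: pull one batch and print its shape before it ever reaches the network.)

# Inspect one batch before training; the network expects d_input = 37 features.
x, y = next(iter(dataloader_train))
print(x.shape)  # expected (batch, time, d_input) = (8, 672, 37) per the earlier comment
print(y.shape)  # should end in d_output = 8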


Abdelsater commented on June 11, 2024

As I stated before, my npz dataset dimensions look like this:

[('R', (7500, 12768), dtype('float32')), ('X', (7500, 8, 672), dtype('float32')), ('Z', (7500, 18, 672), dtype('float32'))]

The benchmark notebook that I am trying to use is taken from your repo on GitHub; the relevant code, including the dataloader, looks like this:

import datetime

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from tqdm import tqdm
import seaborn as sns

from tst.loss import OZELoss

from src.benchmark import BiGRU, ConvGru
from src.dataset import OzeDataset
from src.utils import compute_loss
from src.visualization import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sample
# Training parameters
DATASET_PATH = 'Output-Dataset.npz'
BATCH_SIZE = 8
NUM_WORKERS = 4
LR = 1e-4
EPOCHS = 30

# Model parameters
d_model = 48 # Latent dim
N = 2 # Number of layers
dropout = 0.2 # Dropout rate

d_input = 37 # From dataset
d_output = 8 # From dataset

# Config
sns.set()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")
# prints: Using device cuda:0

# Training
# Load dataset
ozeDataset = OzeDataset(DATASET_PATH)

dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (5500, 1000, 1000))
dataloader_train = DataLoader(dataset_train,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=NUM_WORKERS,
                              pin_memory=False
                             )

dataloader_val = DataLoader(dataset_val,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            num_workers=NUM_WORKERS
                           )

dataloader_test = DataLoader(dataset_test,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=NUM_WORKERS
                            )
# Load network: ConvGru with Adam optimizer and OZE loss function
net = ConvGru(d_input, d_model, d_output, N, dropout=dropout, bidirectional=True).to(device)
optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = OZELoss(alpha=0.3)
# Train
model_save_path = f'models/model_LSTM_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
val_loss_best = np.inf

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
hist_loss_val = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, (x, y) in enumerate(dataloader_train):
            optimizer.zero_grad()

            # Propagate input
            netout = net(x.to(device))

            # Compute loss
            loss = loss_function(y.to(device), netout)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
            pbar.update(x.shape[0])
        
        train_loss = running_loss/len(dataloader_train)
        val_loss = compute_loss(net, dataloader_val, loss_function, device).item()
        pbar.set_postfix({'loss': train_loss, 'val_loss': val_loss})
        
        hist_loss[idx_epoch] = train_loss
        hist_loss_val[idx_epoch] = val_loss
        
        if val_loss < val_loss_best:
            val_loss_best = val_loss
            torch.save(net.state_dict(), model_save_path)
        
plt.plot(hist_loss, 'o-', label='train')
plt.plot(hist_loss_val, 'o-', label='val')
plt.legend()
print(f"model exported to {model_save_path} with loss {val_loss_best:5f}")'

The error that I am getting is the following:

RuntimeError: Given groups=1, weight of size [48, 37, 11], expected input[8, 13440, 18] to have 37 channels, but got 13440 channels instead
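(The npz stores R flattened to (7500, 12768), which is likely why the channel axis ends up wrong here. Below is a hedged sketch of a Dataset that rebuilds the (time, d_input) layout from the saved arrays; it is illustrative only, and the real OzeDataset implementation may differ.)

import numpy as np
import torch
from torch.utils.data import Dataset

class NpzOzeDataset(Dataset):
    """Illustrative dataset: rebuilds (672, 37) inputs from the npz above."""

    def __init__(self, npz_path, K=672):
        data = np.load(npz_path)
        m = data["Z"].shape[0]
        Z = data["Z"]                                # (m, 18, 672)
        R = data["R"].reshape(m, K, -1)              # (m, 672, 19), undoing np.tile
        R = R.transpose(0, 2, 1)                     # (m, 19, 672)
        x = np.concatenate([Z, R], axis=1)           # (m, 37, 672)
        # (batch, time, d_input) layout, per the maintainer's note above
        self._x = torch.from_numpy(x.transpose(0, 2, 1).copy())          # (m, 672, 37)
        self._y = torch.from_numpy(data["X"].transpose(0, 2, 1).copy())  # (m, 672, 8)

    def __len__(self):
        return len(self._x)

    def __getitem__(self, idx):
        return self._x[idx], self._y[idx]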

This error is giving me a hard time, since I had already tried several transformations before. But since you confirmed the same input and output dimensions, how can we make this work? By the way, I tried the original benchmark using the csv files directly and it worked; the code looks like this:

import datetime

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from tqdm import tqdm
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from pathlib import Path
import sys
import psutil

from src.dataset import OzeDataset, OzeEvaluationDataset, OzeNPZDataset
from src.utils import npz_check, compute_loss, csv2npz
from src.model import BenchmarkLSTM
BATCH_SIZE = 100
# NUM_WORKERS = psutil.cpu_count() # Use this to get number of logical processing units
NUM_WORKERS = psutil.cpu_count(logical=False) # Use this to get number of physical Cores
LR = 1e-2
EPOCHS = 30
HIDDEN_DIM = 100
NUM_LAYERS = 3

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")

#dataset = OzeNPZDataset(dataset_path=npz_check(Path('datasets'), 'dataset'), labels_path="labels.json")
dataset = OzeDataset(dataset_x_path="Datasets/x_train_LsAZgHU.csv", dataset_y_path="Datasets/y_train_EFo1WyE.csv", labels_path="labels.json")
#K = dataset.time_series_length
K= 672

# More info about memory pinning here: https://pytorch.org/docs/stable/data.html#memory-pinning
is_cuda = device == torch.device("cuda:0")
num_workers = 0 if is_cuda else NUM_WORKERS
dataloader = DataLoader(dataset,
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        pin_memory=is_cuda,
                        num_workers=num_workers)

m, M = dataloader.dataset.m, dataloader.dataset.M

d_input = dataset.get_x_shape()[2]  # From dataset
print('d_input : {}'.format(d_input))
d_output = dataset.get_y_shape()[2]  # From dataset
print('d_output : {}'.format(d_output))
# Load benchmark network with Adam optimizer and MSE loss function
net = BenchmarkLSTM(input_dim=d_input, hidden_dim=HIDDEN_DIM, output_dim=d_output, num_layers=NUM_LAYERS).to(device)
loss_function = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=LR)

model_save_path = f'model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'

def fit():
    """
    Fits selected network
    """
    loss_best = np.inf
    # Prepare loss history
    hist_loss = np.zeros(EPOCHS)
    for idx_epoch in range(EPOCHS):
        running_loss = 0
        with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
            for idx_batch, (inp, out) in enumerate(dataloader):
                optimizer.zero_grad()

                # Propagate input
                net_out = net(inp.to(device))

                # Compute loss
                loss = loss_function(out.to(device), net_out)

                # Backpropagate loss
                loss.backward()

                # Update weights
                optimizer.step()

                running_loss += loss.item()
                pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
                pbar.update(inp.shape[0])

            train_loss = running_loss/len(dataloader)
            pbar.set_postfix({'loss': train_loss})

            hist_loss[idx_epoch] = train_loss

            if train_loss < loss_best:
                loss_best = train_loss
                torch.save(net.state_dict(), model_save_path)
    print(f"\nmodel exported to {model_save_path} with loss {loss_best:5f}")
    return hist_loss

try:
    hist_loss = fit()
except RuntimeError as err:
    if str(err).startswith('CUDA out of memory.'):
        print('\nSwitching device to cpu to workaround CUDA out of memory problem.')
        device = torch.device("cpu")
        net = net.to(device)
        dataloader = DataLoader(dataset,
                                batch_size=BATCH_SIZE,
                                shuffle=True,
                                pin_memory=False,
                                num_workers=NUM_WORKERS)
        hist_loss = fit()
    else:
        sys.exit()

plt.plot(hist_loss, 'o-', label='train')
plt.legend()

Thank you for debugging this with me. My goal is to re-run your experiment so that I can build my own transformer in the end, so understanding your experiment will help me a lot.

