Eltwise binary EQ should produce a boolean or integer tensor as output about tt-metal HOT 8 CLOSED

davorchap commented on June 12, 2024

Eltwise binary EQ should produce a boolean or integer tensor as output

from tt-metal.

Comments (8)

razorback3 commented on June 12, 2024

Important note: Most other operations do not support int tensor inputs yet. Typecast also does not support int -> float yet. This means the returned tensor has very limited use.

Request: Please share details on the background behind this request. How do you expect to use eq results next? Why is it important for it to be int in your case? Thank you!

Hi @ayerofieiev-tt, thanks for the question.
As a first step of the training milestone for the collaboration of Moreh and TT, we are trying to train the MNIST minimal model.

MNIST minimal code

# from https://github.com/pytorch/examples/blob/master/mnist/main.py
# pylint: disable=C,R,W
from __future__ import print_function

import argparse
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torchvision import datasets
from torchvision import transforms

start_time = 0


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 10)
        # self.fc2 = nn.Linear(200, 10)
    def forward(self, x):
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        # x = F.relu(x)
        # x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output


def train(args, model, device, train_loader, optimizer, epoch):
    global start_time
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
            if args.dry_run:
                break


def test(model, device, test_loader):
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print(
        '\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            test_loss, correct, len(test_loader.dataset),
            100. * correct / len(test_loader.dataset)))


def main():
    global start_time
    start_time = time.time()

    # Training settings
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--batch-size',
                        type=int,
                        default=128,
                        metavar='N',
                        help='input batch size for training (default: 128)')
    parser.add_argument('--test-batch-size',
                        type=int,
                        default=1000,
                        metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs',
                        type=int,
                        default=14,
                        metavar='N',
                        help='number of epochs to train (default: 14)')
    parser.add_argument('--lr',
                        type=float,
                        default=0.1,
                        metavar='LR',
                        help='learning rate (default: 0.1)')
    parser.add_argument('--gamma',
                        type=float,
                        default=0.7,
                        metavar='M',
                        help='Learning rate step gamma (default: 0.7)')
    parser.add_argument('--no-cuda',
                        action='store_true',
                        default=False,
                        help='disables CUDA training')
    parser.add_argument('--dry-run',
                        action='store_true',
                        default=False,
                        help='quickly check a single pass')
    parser.add_argument('--seed',
                        type=int,
                        default=1,
                        metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument(
        '--log-interval',
        type=int,
        default=1,
        metavar='N',
        help='how many batches to wait before logging training status')
    parser.add_argument('--save-model',
                        action='store_true',
                        default=False,
                        help='For Saving the current Model')
    args = parser.parse_args()
    use_cuda = not args.no_cuda and torch.cuda.is_available()
    torch.manual_seed(args.seed)

    device = "cuda" if use_cuda else "cpu"  # fall back to CPU when CUDA is unavailable

    kwargs = {'batch_size': args.batch_size}
    if use_cuda:
        kwargs.update({'num_workers': 1, 'pin_memory': True, 'shuffle': True},)
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.1307,), (0.3081,))])
    dataset1 = datasets.MNIST('./data',
                              train=True,
                              download=True,
                              transform=transform)
    dataset2 = datasets.MNIST('./data', train=False, transform=transform)
    train_loader = torch.utils.data.DataLoader(dataset1, **kwargs)
    test_loader = torch.utils.data.DataLoader(dataset2, **kwargs)
    model = Net()
    model = model.to(device)
    optimizer = optim.SGD(model.parameters(), lr=args.lr)
    scheduler = StepLR(optimizer, step_size=1, gamma=args.gamma)

    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)
        test(model, device, test_loader)
        scheduler.step()
    if args.save_model:
        torch.save(model.state_dict(), "mnist_cnn.pt")


if __name__ == '__main__':
    main()

In the above code, there is a line: correct += pred.eq(target.view_as(pred)).sum().item().
Here, the output of binary eq is used for sum. Yes, I agree that there is no sum implementation that receives integer input. Once the integer output of binary eq is supported, we should add support for integer input for sum op, too.
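
To illustrate why the output type of eq matters for this line, the accuracy count can be mimicked in plain Python: eq yields a 0/1 mask and sum counts the matches. This is only a sketch on lists rather than tensors; the helper names `eq` and `count_correct` are made up for illustration and are not PyTorch or ttnn APIs.

```python
def eq(pred, target):
    """Element-wise equality returning an integer 0/1 mask (hypothetical helper)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return [int(p == t) for p, t in zip(pred, target)]


def count_correct(pred, target):
    """Rough mirror of pred.eq(target).sum().item(): sum the 0/1 mask."""
    return sum(eq(pred, target))


# Illustrative values only, not taken from a real MNIST run.
pred = [7, 2, 1, 0, 4]
target = [7, 2, 1, 9, 4]
mask = eq(pred, target)
correct = count_correct(pred, target)
print(mask, correct)
```

The sum only gives a meaningful count if every "true" element of the mask is exactly 1, which is why an integer or boolean result is the natural fit here.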

As a caveat, our goal is to train the model in PyTorch code without any code modification.

cc. @sangwon-chae , @dongjin-na

ayerofieiev-tt commented on June 12, 2024

@razorback3 thank you for the details! Aligned now.

@rdjogoTT just merged the support for BFLOAT16 --> UINT16 typecast.
I am updating my PR which adds support for cast, output_tensor and queue_id.
Should be finished in a couple hours.

davorchap commented on June 12, 2024

EQ is currently producing a float; the output should be integer or boolean.

eyonland commented on June 12, 2024

It looks like this may be dependent on #4858.

ayerofieiev-tt commented on June 12, 2024

So far we (@eyonland + @KalaivaniMCW) have tried:

1. Calling set_dtype on the output tensor after the operation (a change by @KalaivaniMCW)
2. Setting the output dtype before running the operation
3. Adding a typecast from bfloat16 to uint16

Calling set_dtype on a tensor

This is just a wrong approach (see 225c25a)

from @KalaivaniMCW

I tried to change the dtype of the bf16 result after computation with result.set_dtype(DataType::UINT16); I could get partial output that passes the PCC check. The only issue is that, going from bf16 to uint16, the 0s and 1s become 0 => 0, 1 => 16256. It passes PCC because the torch test checks with output.bool(). But

answer from @tt-aho

"set_dtype when output is bf16 is not the correct approach. this is just the tensor’s dtype attribute so that when it is read out, the bits are interpreted as u16 instead of bf16. There is no conversion being done"
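
The difference between relabeling the dtype and actually converting the values can be reproduced on the host with plain Python. The sketch below assumes bfloat16 is the upper 16 bits of an IEEE-754 float32 (plain truncation, which is exact for 0.0 and 1.0); the helper names are illustrative only and are not tt-metal APIs.

```python
import struct


def bf16_bits(value):
    """Return the 16-bit pattern of `value` as bfloat16 (float32 truncated to its top 16 bits)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    return bits32 >> 16


def reinterpret_as_uint16(value):
    """What relabeling the dtype amounts to: same bits, read as an unsigned integer."""
    return bf16_bits(value)


def convert_to_uint16(value):
    """What a real typecast is expected to do: convert the numeric value."""
    return int(value) & 0xFFFF


for v in (0.0, 1.0):
    print(v, reinterpret_as_uint16(v), convert_to_uint16(v))

# A truthiness-based comparison cannot tell the two results apart.
same_as_bool = bool(reinterpret_as_uint16(1.0)) == bool(convert_to_uint16(1.0))
print(same_as_bool)
```

Under this assumption, 1.0 has the bit pattern 0x3F80, which reads as 16256 when interpreted as uint16. That matches the 0 => 0, 1 => 16256 mapping reported above, and it also shows why a check based on output.bool() still passes.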

Passing output dtype

Tried passing the proper type to the EltwiseBinary config. This fails tests; I need to collect details.
I wish setting the output dtype just worked, but it fails post-commit tests.
I'd like to better understand at which layer the failure occurs. I guess composite ops rely on the dtype of intermediate tensors.

Typecast

This could have been an inefficient but working interim solution, but it does not work.
See #9029
Typecast produces bad results, and this seems to be a bug in the bfloat16 -> uint16 conversion.
CC @rdjogoTT

Here are the results when we add a typecast from bfloat16 to uint16:
A and B are rand-filled tensors, with the first 3 elements set to 1 in both so we can see both true and false cases.

E150 (Grayskull)

Expected 1.0 to become 1, not 31747.
Produced 0, where 1 was expected.

Input A 
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
[ 0.08203,  0.94531,  ...,  0.00000,  0.00000],
...,
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)


Input B 
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
[ 0.76953,  0.99219,  ...,  0.00000,  0.00000],
...,
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)

Output (BFLOAT16)
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
[ 0.00000,  0.00000,  ...,  1.00000,  1.00000],
...,
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)

Output (after conversion to UINT16)
ttnn.Tensor([
[31747, 31747,  ..., 31747, 31747],
[    0,     0,  ..., 31747, 31747],
...,
[    0,     0,  ...,     0,     0],
[    0,     0,  ...,     0,     0]], shape=Shape([5[32], 5[32]]), dtype=DataType::UINT16, layout=Layout::TILE)

N150 (Wormhole)

Expected 1.0 to become 1, not 127. Otherwise the output looks correct.

Input A 
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
[ 0.08203,  0.94531,  ...,  0.00000,  0.00000],
...,
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)

Input B 
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
[ 0.76953,  0.99219,  ...,  0.00000,  0.00000],
...,
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)

Output (BFLOAT16)
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
[ 0.00000,  0.00000,  ...,  1.00000,  1.00000],
...,
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)

Output (after conversion to UINT16)
ttnn.Tensor([
[  127,   127,  ...,   127,   127],
[    0,     0,  ...,   127,   127],
...,
[  127,   127,  ...,   127,   127],
[  127,   127,  ...,   127,   127]], shape=Shape([5[32], 5[32]]), dtype=DataType::UINT16, layout=Layout::TILE)

KalaivaniMCW commented on June 12, 2024

Adding to @ayerofieiev-tt's comment, I tried the following approaches:

- Typecasting bfloat16 to uint gives random values in the output
- Passing output_dtype as uint also gives wrong values
- Sending an output_tensor of uint dtype gave wrong values at some indices
- Setting the dtype of the bf16 output to uint16 using set_dtype worked (partially), but it cannot be used because it is not the right way to do it, and it would also affect all the EQ-dependent ops and tests in main.

I added all the test results (tt out vs. torch out) for the cases I tried in https://docs.google.com/spreadsheets/d/1KBbtKhSc3znjhEkNUerSivvcEpnRAtKEygXGGpvvbms/edit?usp=sharing-

ayerofieiev-tt commented on June 12, 2024

I have a change that produces a proper result by adding an optional typecast in the eltwise_binary kernel.
Before I merge the change:

- I need to pick up uint16, which @rdjogoTT likely lands on Jan 4
- We need to decide whether EQ should support both the bfloat16 and uint16 data types

https://github.com/tenstorrent/tt-metal/pull/9071/files

ayerofieiev-tt commented on June 12, 2024

With PR #9071, ttnn's binary operations, including eq, start to respect a passed output_dtype or output_tensor.

From Python, the calls look like this.

A default call returns bfloat16:

result = ttnn.eq(input_tensor_a, input_tensor_b)

But if you pass a dtype, the result is correctly formed with 0s and 1s:

result = ttnn.eq(input_tensor_a, input_tensor_b, dtype=ttnn.DataType.UINT32)

The same works if you call ttnn::eq from C++.

Important note
Most other operations do not support int tensor inputs yet.
Typecast also does not support int -> float yet.
This means the returned tensor has very limited use.

Request
Please share details on the background behind this request.
How do you expect to use eq results next? Why is it important for it to be int in your case?
Thank you!
