
Comments (8)

razorback3 commented on June 12, 2024

Important note: Most other operations do not support int tensor inputs yet. Typecast also does not support int -> float yet. This means the returned tensor has very limited use.

Request: Please share details on the background behind this request. How do you expect to use eq results next? Why is it important for it to be int in your case? Thank you!

Hi @ayerofieiev-tt, thanks for the question.
As a first step of the training milestone for the collaboration of Moreh and TT, we are trying to train the MNIST minimal model.

MNIST minimal code
# from https://github.com/pytorch/examples/blob/master/mnist/main.py
# pylint: disable=C,R,W
from __future__ import print_function

import argparse
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torchvision import datasets
from torchvision import transforms

start_time = 0


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 10)
        # self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        # x = F.relu(x)
        # x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output


def train(args, model, device, train_loader, optimizer, epoch):
    global start_time
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
            if args.dry_run:
                break


def test(model, device, test_loader):
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print(
        '\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            test_loss, correct, len(test_loader.dataset),
            100. * correct / len(test_loader.dataset)))


def main():
    global start_time
    start_time = time.time()

    # Training settings
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--batch-size',
                        type=int,
                        default=128,
                        metavar='N',
                        help='input batch size for training (default: 128)')
    parser.add_argument('--test-batch-size',
                        type=int,
                        default=1000,
                        metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs',
                        type=int,
                        default=14,
                        metavar='N',
                        help='number of epochs to train (default: 14)')
    parser.add_argument('--lr',
                        type=float,
                        default=0.1,
                        metavar='LR',
                        help='learning rate (default: 0.1)')
    parser.add_argument('--gamma',
                        type=float,
                        default=0.7,
                        metavar='M',
                        help='Learning rate step gamma (default: 0.7)')
    parser.add_argument('--no-cuda',
                        action='store_true',
                        default=False,
                        help='disables CUDA training')
    parser.add_argument('--dry-run',
                        action='store_true',
                        default=False,
                        help='quickly check a single pass')
    parser.add_argument('--seed',
                        type=int,
                        default=1,
                        metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument(
        '--log-interval',
        type=int,
        default=1,
        metavar='N',
        help='how many batches to wait before logging training status')
    parser.add_argument('--save-model',
                        action='store_true',
                        default=False,
                        help='For Saving the current Model')
    args = parser.parse_args()
    use_cuda = not args.no_cuda and torch.cuda.is_available()
    torch.manual_seed(args.seed)

    device = "cuda" if not args.no_cuda else "cpu"

    kwargs = {'batch_size': args.batch_size}
    if use_cuda:
        kwargs.update({'num_workers': 1, 'pin_memory': True, 'shuffle': True},)
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.1307,), (0.3081,))])
    dataset1 = datasets.MNIST('./data',
                              train=True,
                              download=True,
                              transform=transform)
    dataset2 = datasets.MNIST('./data', train=False, transform=transform)
    train_loader = torch.utils.data.DataLoader(dataset1, **kwargs)
    test_loader = torch.utils.data.DataLoader(dataset2, **kwargs)
    model = Net()
    model = model.to(device)
    optimizer = optim.SGD(model.parameters(), lr=args.lr)
    scheduler = StepLR(optimizer, step_size=1, gamma=args.gamma)

    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)
        test(model, device, test_loader)
        scheduler.step()
    if args.save_model:
        torch.save(model.state_dict(), "mnist_cnn.pt")


if __name__ == '__main__':
    main()

The code above contains the line: correct += pred.eq(target.view_as(pred)).sum().item().
Here the output of the binary eq is fed into sum. I agree that no sum implementation currently accepts integer input. Once binary eq supports integer output, we should add integer input support to the sum op as well.

Note that our goal is to train the model with the PyTorch code as-is, without any modification.
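
For reference, here is a minimal sketch of what that line does in stock PyTorch on CPU. The logits and targets are made-up values for illustration only; the point is that eq yields a boolean mask and sum promotes it to an integer count.

```python
import torch

# Toy logits for 4 samples over 3 classes (made-up values for illustration).
output = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 3.0],
                       [0.9, 0.8, 0.7]])
target = torch.tensor([1, 0, 1, 0])

pred = output.argmax(dim=1, keepdim=True)  # shape (4, 1), dtype int64
mask = pred.eq(target.view_as(pred))       # shape (4, 1), dtype bool
correct = mask.sum().item()                # summing a bool tensor gives int64

print(mask.dtype, mask.sum().dtype, correct)
```

So an unmodified script expects the eq -> sum chain to behave like an integer count of matches, which is why the output dtype of eq matters here.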

cc. @sangwon-chae , @dongjin-na

from tt-metal.

ayerofieiev-tt commented on June 12, 2024

@razorback3 thank you for the details! We are aligned now.

@rdjogoTT just merged support for the BFLOAT16 --> UINT16 typecast.
I am updating my PR, which adds support for cast, output_tensor and queue_id.
It should be finished in a couple of hours.


davorchap commented on June 12, 2024

eq is producing a float; it should produce an integer or boolean.


eyonland commented on June 12, 2024

It looks like this may be dependent on #4858.


ayerofieiev-tt commented on June 12, 2024

So far we (@eyonland + @KalaivaniMCW) tried:

  • @KalaivaniMCW has a change calling set_dtype on output tensor after the operation
  • Setting output dtype before running the operation
  • Adding typecast from bfloat16 to uint16

Calling set_dtype on a tensor

This is just a wrong approach (see 225c25a)

from @KalaivaniMCW

I tried to change the dtype of the bf16 result after computation with result.set_dtype(DataType::UINT16); and got partial output that passes the PCC check. The only issue is that when going from bf16 to uint16, the 0s and 1s map as 0 => 0, 1 => 16256. It passes PCC because the torch test checks with output.bool(). But

answer from @tt-aho

"set_dtype when output is bf16 is not the correct approach. this is just the tensor’s dtype attribute so that when it is read out, the bits are interpreted as u16 instead of bf16. There is no conversion being done"
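
To illustrate the difference between reinterpreting bits and converting values, here is a small host-side sketch in plain Python. It is not tt-metal code; the helper names are hypothetical, and bfloat16 is modelled as the upper 16 bits of an IEEE float32 (truncation, which is exact for 0.0 and 1.0).

```python
import struct


def bf16_bits(value: float) -> int:
    """16-bit pattern of `value` as bfloat16 (upper half of float32, truncated)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    return bits32 >> 16


def reinterpret_as_uint16(value: float) -> int:
    """What relabelling the dtype does: same bits, read as an unsigned integer."""
    return bf16_bits(value)


def convert_to_uint16(value: float) -> int:
    """What a real typecast should do: convert the numeric value."""
    return int(value) & 0xFFFF


for v in (0.0, 1.0):
    print(v, reinterpret_as_uint16(v), convert_to_uint16(v))
```

The bit pattern of 1.0 in bfloat16 is 0x3F80, which is 16256 in decimal. That matches the 1 => 16256 mapping reported above and explains why a check based on output.bool() still passes: any nonzero value counts as True.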

Passing output dtype

Tried passing the proper type to the EltwiseBinary config. It fails tests; I need to collect details.
I wish setting the output dtype just worked, but it fails post-commit tests.
I'd like to better understand at what layer the failure occurs. My guess is that composite ops rely on the dtype of intermediate tensors.

Typecast

This could have been an inefficient but workable interim solution, but it does not work either.
See #9029
Typecast produces bad results, and this seems to be a bug in the bfloat16 -> uint16 conversion.
CC @rdjogoTT

Here are the results when we add a typecast from bfloat16 to uint16:
A and B are rand-filled tensors, with the first 3 elements set to 1 in both so we can see both true and false cases.

E150 (Grayskull)

Expected 1.0 to become 1, not 31747.
Produced 0, where 1 was expected.

Input A 
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
[ 0.08203,  0.94531,  ...,  0.00000,  0.00000],
...,
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)


Input B 
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
[ 0.76953,  0.99219,  ...,  0.00000,  0.00000],
...,
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)

Output BFLOAT16
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
[ 0.00000,  0.00000,  ...,  1.00000,  1.00000],
...,
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)

Output UINT16 (after typecast)
ttnn.Tensor([
[31747, 31747,  ..., 31747, 31747],
[    0,     0,  ..., 31747, 31747],
...,
[    0,     0,  ...,     0,     0],
[    0,     0,  ...,     0,     0]], shape=Shape([5[32], 5[32]]), dtype=DataType::UINT16, layout=Layout::TILE)

N150 (Wormhole)

Expected 1.0 to become 1, not 127. Otherwise the output looks correct.

Input A 
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
[ 0.08203,  0.94531,  ...,  0.00000,  0.00000],
...,
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)

Input B 
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
[ 0.76953,  0.99219,  ...,  0.00000,  0.00000],
...,
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
[ 0.00000,  0.00000,  ...,  0.00000,  0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)

Output BFLOAT16
ttnn.Tensor([
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
[ 0.00000,  0.00000,  ...,  1.00000,  1.00000],
...,
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
[ 1.00000,  1.00000,  ...,  1.00000,  1.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)

Output UINT16 
ttnn.Tensor([
[  127,   127,  ...,   127,   127],
[    0,     0,  ...,   127,   127],
...,
[  127,   127,  ...,   127,   127],
[  127,   127,  ...,   127,   127]], shape=Shape([5[32], 5[32]]), dtype=DataType::UINT16, layout=Layout::TILE)
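
As a debugging aid, the integers seen in place of 1 can be split into bfloat16 fields (1 sign bit, 8 exponent bits, 7 mantissa bits). This is only a plain-Python sketch with a hypothetical helper name; it does not touch the device and does not by itself establish the root cause.

```python
def bf16_fields(bits: int):
    """Split a 16-bit pattern into bfloat16 (sign, biased exponent, mantissa)."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 7) & 0xFF
    mantissa = bits & 0x7F
    return sign, exponent, mantissa


# Values reported in this thread where 1 was expected.
observed = [
    ("set_dtype relabel", 16256),
    ("E150 typecast", 31747),
    ("N150 typecast", 127),
]

for label, value in observed:
    print(f"{label}: {value} = {value:#06x} -> {bf16_fields(value)}")
```

16256 decodes to exponent 127 and mantissa 0, which is exactly 1.0 in bfloat16. The N150 value 127 happens to equal the biased exponent of 1.0, which may be a hint worth checking, though the thread does not confirm it. The E150 value 31747 (0x7c03) shows no such obvious relation.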


KalaivaniMCW commented on June 12, 2024

Adding to @ayerofieiev-tt's comment,
I tried the following approaches:

  • Typecasting Bfloat16 to Uint gives random values in the output
  • Passing output_dtype as Uint also gives wrong values
  • Sending an output_tensor of Uint dtype gave wrong values at some indices
  • Setting the dtype of the bf16 output to uint16 using set_dtype worked (partially), but we cannot use it because it is not the right way to do it, and it would also affect all the EQ-dependent ops and tests in main.

I added the test results (tt out vs torch out) for all the cases I tried in https://docs.google.com/spreadsheets/d/1KBbtKhSc3znjhEkNUerSivvcEpnRAtKEygXGGpvvbms/edit?usp=sharing-


ayerofieiev-tt commented on June 12, 2024

I have a change that produces a proper result by adding an optional typecast in the eltwise_binary kernel.
Before I merge the change:

  • I need to pick up uint16, which @rdjogoTT likely lands on Jan 4
  • We need to decide if EQ should support both bfloat16 and uint16 data types

https://github.com/tenstorrent/tt-metal/pull/9071/files


ayerofieiev-tt commented on June 12, 2024

With PR #9071, ttnn's binary operations, including eq, start to respect the passed output_dtype or output_tensor.

From Python, a default call like this returns bfloat16:

result = ttnn.eq(input_tensor_a, input_tensor_b)

But if you pass a dtype, the result is correctly formed with 0s and 1s in that dtype:

result = ttnn.eq(input_tensor_a, input_tensor_b, dtype=ttnn.DataType.UINT32)

The same works if you call ttnn::eq from C++.

Important note
Most other operations do not support int tensor inputs yet.
Typecast also does not support int -> float yet.
This means the returned tensor has very limited use.

Request
Please share details on the background behind this request.
How do you expect to use eq results next? Why is it important for it to be int in your case?
Thank you!

