Comments (8)
Important note
Most other operations do not support int tensor inputs yet.
Typecast also does not support int -> float yet.
This means the returned tensor is of very limited use.
Request
Please share details on the background behind this request.
How do you expect to use the eq results next? Why is it important for them to be int in your case?
Thank you!
Hi @ayerofieiev-tt, thanks for the question.
As a first step toward the training milestone of the Moreh and TT collaboration, we are trying to train a minimal MNIST model.
MNIST minimal code
```python
# from https://github.com/pytorch/examples/blob/master/mnist/main.py
# pylint: disable=C,R,W
from __future__ import print_function
import argparse
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torchvision import datasets
from torchvision import transforms

start_time = 0


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 10)
        # self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        # x = F.relu(x)
        # x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output


def train(args, model, device, train_loader, optimizer, epoch):
    global start_time
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
            if args.dry_run:
                break


def test(model, device, test_loader):
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print(
        '\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            test_loss, correct, len(test_loader.dataset),
            100. * correct / len(test_loader.dataset)))


def main():
    global start_time
    start_time = time.time()
    # Training settings
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--batch-size',
                        type=int,
                        default=128,
                        metavar='N',
                        help='input batch size for training (default: 128)')
    parser.add_argument('--test-batch-size',
                        type=int,
                        default=1000,
                        metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs',
                        type=int,
                        default=14,
                        metavar='N',
                        help='number of epochs to train (default: 14)')
    parser.add_argument('--lr',
                        type=float,
                        default=0.1,
                        metavar='LR',
                        help='learning rate (default: 0.1)')
    parser.add_argument('--gamma',
                        type=float,
                        default=0.7,
                        metavar='M',
                        help='Learning rate step gamma (default: 0.7)')
    parser.add_argument('--no-cuda',
                        action='store_true',
                        default=False,
                        help='disables CUDA training')
    parser.add_argument('--dry-run',
                        action='store_true',
                        default=False,
                        help='quickly check a single pass')
    parser.add_argument('--seed',
                        type=int,
                        default=1,
                        metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument(
        '--log-interval',
        type=int,
        default=1,
        metavar='N',
        help='how many batches to wait before logging training status')
    parser.add_argument('--save-model',
                        action='store_true',
                        default=False,
                        help='For Saving the current Model')
    args = parser.parse_args()
    use_cuda = not args.no_cuda and torch.cuda.is_available()

    torch.manual_seed(args.seed)

    device = "cuda" if not args.no_cuda else "cpu"

    kwargs = {'batch_size': args.batch_size}
    if use_cuda:
        kwargs.update({'num_workers': 1, 'pin_memory': True, 'shuffle': True},)

    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.1307,), (0.3081,))])
    dataset1 = datasets.MNIST('./data',
                              train=True,
                              download=True,
                              transform=transform)
    dataset2 = datasets.MNIST('./data', train=False, transform=transform)
    train_loader = torch.utils.data.DataLoader(dataset1, **kwargs)
    test_loader = torch.utils.data.DataLoader(dataset2, **kwargs)

    model = Net()
    model = model.to(device)
    optimizer = optim.SGD(model.parameters(), lr=args.lr)

    scheduler = StepLR(optimizer, step_size=1, gamma=args.gamma)
    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)
        test(model, device, test_loader)
        scheduler.step()

    if args.save_model:
        torch.save(model.state_dict(), "mnist_cnn.pt")


if __name__ == '__main__':
    main()
```
In the code above, there is this line: `correct += pred.eq(target.view_as(pred)).sum().item()`.
Here, the output of the binary eq is passed to sum. Yes, I agree that there is currently no sum implementation that accepts integer input. Once integer output for binary eq is supported, we should add integer-input support to the sum op, too.
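One reason an integer result matters for a counting line like this (an illustration, not something stated in the thread): bfloat16 keeps only 8 significant bits, so every integer up to 256 is exact but 257 is not, and a count held in bfloat16 stops growing at 256. The sketch below emulates bfloat16 rounding in plain Python with `struct`. `to_bfloat16`, `count_matches` and `count_matches_bf16` are hypothetical helpers, not PyTorch or ttnn APIs, and nothing here describes how ttnn's sum actually accumulates.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round half to even).

    bfloat16 is the upper 16 bits of an IEEE-754 float32.
    NaN/inf handling is omitted for brevity.
    """
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    # Round to nearest even on the 16 low bits that get dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack('<f', struct.pack('<I', bits))[0]


def count_matches(pred, target):
    """Integer count of equal elements, like pred.eq(target).sum().item()."""
    mask = [1 if p == t else 0 for p, t in zip(pred, target)]  # eq as 0/1 ints
    return sum(mask)


def count_matches_bf16(pred, target):
    """Same count, but every partial sum is rounded to bfloat16."""
    total = 0.0
    for p, t in zip(pred, target):
        total = to_bfloat16(total + (1.0 if p == t else 0.0))
    return total


pred = [3] * 300
target = [3] * 300
print(count_matches(pred, target))       # 300
print(count_matches_bf16(pred, target))  # 256.0
```

In the MNIST script each `.sum()` covers a single batch (128 samples with the default settings), so this limit would only matter for larger reductions; the sketch just shows why a 0/1 integer result is the more robust contract for counting.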
As a caveat, our goal is to train the model from the PyTorch code without any modification.
cc @sangwon-chae, @dongjin-na
from tt-metal.
@razorback3 thank you for the details! We are aligned now.
@rdjogoTT just merged support for the BFLOAT16 --> UINT16 typecast.
I am updating my PR, which adds support for cast, output_tensor and queue_id.
It should be finished in a couple of hours.
from tt-metal.
eq is producing a float; the result should be integer or boolean.
from tt-metal.
It looks like this may be dependent on #4858.
from tt-metal.
So far, we (@eyonland and @KalaivaniMCW) have tried:
- @KalaivaniMCW has a change that calls set_dtype on the output tensor after the operation
- Setting the output dtype before running the operation
- Adding a typecast from bfloat16 to uint16
Calling set_dtype on a tensor
This is simply the wrong approach (see 225c25a).
from @KalaivaniMCW
I tried changing the dtype of the bf16 result after computation with result.set_dtype(DataType::UINT16); and got partial output that passes the PCC check. The only issue is that, going from bf16 to uint16, the 0s and 1s map as 0 => 0 and 1 => 16256. It passes PCC because the torch test compares with output.bool(). But
answer from @tt-aho
"set_dtype when output is bf16 is not the correct approach. This is just the tensor's dtype attribute, so when it is read out, the bits are interpreted as u16 instead of bf16. There is no conversion being done."
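The 16256 above is not random: it is 0x3F80, the bit pattern of 1.0 in bfloat16, read back as an unsigned 16-bit integer. A minimal sketch, assuming bfloat16 is the upper half of an IEEE-754 float32 (which is how the format is defined), contrasting a reinterpretation of the bits (what set_dtype amounts to, per the answer above) with a value conversion (what a typecast is expected to do). The helper names are hypothetical, not ttnn functions.

```python
import struct


def bf16_bits(x: float) -> int:
    """Raw 16-bit pattern of x in bfloat16 (upper half of its float32 bits).

    Truncates instead of rounding, which is exact for 0.0 and 1.0.
    """
    return struct.unpack('<I', struct.pack('<f', x))[0] >> 16


def reinterpret_as_uint16(x: float) -> int:
    """Like set_dtype: keep the bits, change only how they are read."""
    return bf16_bits(x)


def convert_to_uint16(x: float) -> int:
    """A value conversion (what a typecast should do): 1.0 -> 1.

    Rounds to the nearest integer and clamps to the uint16 range.
    """
    return max(0, min(0xFFFF, round(x)))


for v in (0.0, 1.0):
    print(v, reinterpret_as_uint16(v), convert_to_uint16(v))
# 0.0 0 0
# 1.0 16256 1
```

Because 0.0 has an all-zero bit pattern, both paths agree on false values, which is why the reinterpreted tensor still passes a check based on output.bool().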
Passing output dtype
Tried passing the proper type to the EltwiseBinary config. Tests fail; details still need to be collected.
I wish setting the output dtype just worked, but it fails the post-commit tests.
I'd like to better understand at which layer the failure occurs. My guess is that composite ops rely on the dtype of intermediate tensors.
Typecast
This could have been an inefficient but working interim solution, but it does not work.
See #9029
Typecast produces bad results, and this seems to be a bug in the bfloat16 -> uint16 conversion.
CC @rdjogoTT
Here are the results when we add a typecast from bfloat16 to uint16.
A and B are randomly filled tensors, with the first 3 elements set to 1 in both so that both true and false cases are visible.
E150 (Grayskull)
Expected 1.0 to become 1, not 31747.
Produced 0 where 1 was expected.
Input A
ttnn.Tensor([
[ 1.00000, 1.00000, ..., 0.00000, 0.00000],
[ 0.08203, 0.94531, ..., 0.00000, 0.00000],
...,
[ 0.00000, 0.00000, ..., 0.00000, 0.00000],
[ 0.00000, 0.00000, ..., 0.00000, 0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
Input B
ttnn.Tensor([
[ 1.00000, 1.00000, ..., 0.00000, 0.00000],
[ 0.76953, 0.99219, ..., 0.00000, 0.00000],
...,
[ 0.00000, 0.00000, ..., 0.00000, 0.00000],
[ 0.00000, 0.00000, ..., 0.00000, 0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
Output (BFLOAT16)
ttnn.Tensor([
[ 1.00000, 1.00000, ..., 1.00000, 1.00000],
[ 0.00000, 0.00000, ..., 1.00000, 1.00000],
...,
[ 1.00000, 1.00000, ..., 1.00000, 1.00000],
[ 1.00000, 1.00000, ..., 1.00000, 1.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
Output (after conversion to UINT16)
ttnn.Tensor([
[31747, 31747, ..., 31747, 31747],
[ 0, 0, ..., 31747, 31747],
...,
[ 0, 0, ..., 0, 0],
[ 0, 0, ..., 0, 0]], shape=Shape([5[32], 5[32]]), dtype=DataType::UINT16, layout=Layout::TILE)
N150 (Wormhole)
Expected 1.0 to become 1, not 127. Otherwise, the output looks correct.
Input A
ttnn.Tensor([
[ 1.00000, 1.00000, ..., 0.00000, 0.00000],
[ 0.08203, 0.94531, ..., 0.00000, 0.00000],
...,
[ 0.00000, 0.00000, ..., 0.00000, 0.00000],
[ 0.00000, 0.00000, ..., 0.00000, 0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
Input B
ttnn.Tensor([
[ 1.00000, 1.00000, ..., 0.00000, 0.00000],
[ 0.76953, 0.99219, ..., 0.00000, 0.00000],
...,
[ 0.00000, 0.00000, ..., 0.00000, 0.00000],
[ 0.00000, 0.00000, ..., 0.00000, 0.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
Output (BFLOAT16)
ttnn.Tensor([
[ 1.00000, 1.00000, ..., 1.00000, 1.00000],
[ 0.00000, 0.00000, ..., 1.00000, 1.00000],
...,
[ 1.00000, 1.00000, ..., 1.00000, 1.00000],
[ 1.00000, 1.00000, ..., 1.00000, 1.00000]], shape=Shape([5[32], 5[32]]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
Output (after conversion to UINT16)
ttnn.Tensor([
[ 127, 127, ..., 127, 127],
[ 0, 0, ..., 127, 127],
...,
[ 127, 127, ..., 127, 127],
[ 127, 127, ..., 127, 127]], shape=Shape([5[32], 5[32]]), dtype=DataType::UINT16, layout=Layout::TILE)
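A side observation, offered without a diagnosis: the 127 reported on N150 equals the biased exponent field of bfloat16 1.0 (bit pattern 0x3F80 is sign 0, exponent 127, mantissa 0). The thread does not establish whether the Wormhole typecast is actually returning that field. The sketch below uses a hypothetical helper, `bf16_fields`, in plain Python and only decodes the three fields so the numbers can be checked.

```python
import struct


def bf16_fields(x: float):
    """Split the bfloat16 encoding of x into (sign, biased exponent, mantissa).

    bfloat16 layout: 1 sign bit, 8 exponent bits (bias 127), 7 mantissa bits.
    Truncates the float32 encoding, which is exact for values like 1.0.
    """
    bits = struct.unpack('<I', struct.pack('<f', x))[0] >> 16
    sign = bits >> 15
    exponent = (bits >> 7) & 0xFF
    mantissa = bits & 0x7F
    return sign, exponent, mantissa


print(bf16_fields(1.0))  # (0, 127, 0)
```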
from tt-metal.
Adding to @ayerofieiev-tt's comment, I tried the following approaches:
- Typecasting bfloat16 to uint gives random values in the output.
- Passing output_dtype as uint also gives wrong values.
- Sending an output_tensor of uint dtype gave wrong values at some indices.
- Setting the dtype of the bf16 output to uint16 with set_dtype worked (partially), but we cannot use it: it is not the right way to do it, and it would also affect all EQ-dependent ops and tests in main.
I added all the test results (tt out vs torch out) for the cases I tried in https://docs.google.com/spreadsheets/d/1KBbtKhSc3znjhEkNUerSivvcEpnRAtKEygXGGpvvbms/edit?usp=sharing-
from tt-metal.
I have a change that produces correct results by adding an optional typecast to the eltwise_binary kernel.
Before I merge the change:
- I need to pick up uint16 support, which @rdjogoTT will likely land on Jan 4
- We need to decide whether EQ should support both bfloat16 and uint16 data types
https://github.com/tenstorrent/tt-metal/pull/9071/files
from tt-metal.
With this PR, ttnn's binary operations, including eq, now respect a passed output_dtype or output_tensor.
#9071
So if you use Python, the call looks like this.
A default call like this returns bfloat16:
result = ttnn.eq(input_tensor_a, input_tensor_b)
But if you pass a dtype, the result will be correctly formed with 0 and 1.
result = ttnn.eq(input_tensor_a, input_tensor_b, dtype=ttnn.DataType.UINT32)
The same works if you call ttnn::eq from C++.
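To pin down the expected semantics in a backend-independent way, here is a tiny pure-Python reference. It is a sketch, not the ttnn implementation: `eq_reference` and its string `dtype` values are hypothetical stand-ins for `ttnn.eq(..., dtype=...)`, mirroring only the behavior described above (a float 0.0/1.0 result by default, integer 0 and 1 when an integer dtype is requested). The input values are taken from the tensor dumps earlier in the thread.

```python
from typing import List, Optional


def eq_reference(a: List[float], b: List[float], dtype: Optional[str] = None):
    """Hypothetical reference for elementwise eq with an optional output dtype.

    The comparison is first computed as 1.0/0.0 (as for a bfloat16 output).
    When an integer dtype is requested, the result is converted by value
    (1.0 -> 1), not by reinterpreting its bits.
    """
    result = [1.0 if x == y else 0.0 for x, y in zip(a, b)]
    if dtype in ("uint16", "uint32"):
        return [int(v) for v in result]  # value conversion: 1.0 -> 1, 0.0 -> 0
    return result


a = [1.0, 1.0, 0.08203]
b = [1.0, 1.0, 0.76953]
print(eq_reference(a, b))                  # [1.0, 1.0, 0.0]
print(eq_reference(a, b, dtype="uint32"))  # [1, 1, 0]
```

The key point is that the integer path converts values rather than reinterpreting bits, which is what distinguishes it from the set_dtype experiments earlier in the thread.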
from tt-metal.