Comments (5)
from cutlass.
Tensor cores do not support true fp32; only tf32 can run on them. Do you want to convert fp32 to tf32 before the computation?
Do you want to support fprop, wgrad, or something else?
The inline PTX in scale_bias_relu_transform.h is hard-coded for fp16x2, not for fp32. You don't have to write inline PTX; plain CUDA is enough. Something like:
if (input != special_nan) {  // A special NaN marks out-of-bound data; for fp16 the special NaN is 0x7eff.
  float res = input > float(0) ? input : input * leaky_alpha;
}
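To make the snippet above concrete for fp32, here is a minimal sketch, not the CUTLASS implementation. Note that a NaN never compares equal to anything by value, so the sentinel check has to compare bit patterns. The names `kSpecialNanBits`, `float_bits`, `float_from_bits` and `leaky_relu_or_zero` are hypothetical, the fp32 sentinel value 0x7fffffff is an assumption (the thread only gives 0x7eff for fp16), and returning zero for out-of-bound elements is also an assumption about the desired behavior.

```cpp
#include <stdint.h>
#include <string.h>

// Compiles as plain C++ and as CUDA (the macro is empty outside nvcc).
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical fp32 sentinel bit pattern (an assumption, not a CUTLASS constant).
constexpr uint32_t kSpecialNanBits = 0x7fffffffu;

// Reinterpret a float as its raw 32-bit pattern.
HOST_DEVICE inline uint32_t float_bits(float x) {
  uint32_t bits;
  memcpy(&bits, &x, sizeof(bits));
  return bits;
}

// Build a float from a raw 32-bit pattern.
HOST_DEVICE inline float float_from_bits(uint32_t bits) {
  float x;
  memcpy(&x, &bits, sizeof(x));
  return x;
}

// Leaky ReLU that treats the sentinel as out-of-bound data.
// Assumption: out-of-bound elements should contribute zero.
HOST_DEVICE inline float leaky_relu_or_zero(float input, float leaky_alpha) {
  if (float_bits(input) == kSpecialNanBits) {
    return 0.0f;  // compared by bits, because NaN != NaN by value
  }
  return input > 0.0f ? input : input * leaky_alpha;
}
```

An ordinary NaN that is not the sentinel still propagates through this function, which helps tell real numerical problems apart from out-of-bound markers.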
from cutlass.
@hwu36 I implemented it the same way, defining leaky_alpha as float(0.1), but I am getting NaNs at the output. Do I need to change 'MmaElements' and 'MmaCols' in scale_bias_relu_transform.h when converting to floats?
from cutlass.
Yes. You'd better dump the values of the matrix, bias, and scale first to check that every thread owns the right data. You can initialize a small matrix with 1, 2, 3, 4, ... to do that.
Mainloop fusion is the most difficult kind of fusion. If possible, do the fusion in the previous kernel's epilogue instead, which is easier and performs better.
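A rough sketch of that debugging trick, assuming a row-major matrix: fill it with 1, 2, 3, ... so that each value encodes its own position, then decode whatever value a thread holds back into (row, col). All names here (`make_sequential_matrix`, `decode_position`, `report_owned_value`) are hypothetical helpers for illustration, not CUTLASS APIs.

```cpp
#include <stdio.h>
#include <cstddef>
#include <vector>

// Compiles as plain C++ and as CUDA (the macro is empty outside nvcc).
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

struct Position {
  int row;
  int col;
};

// Fill a row-major rows x cols matrix with 1, 2, 3, ...
// (floats represent integers exactly up to 2^24, so keep the matrix small).
inline std::vector<float> make_sequential_matrix(int rows, int cols) {
  std::vector<float> m(static_cast<std::size_t>(rows) * cols);
  for (std::size_t i = 0; i < m.size(); ++i) {
    m[i] = static_cast<float>(i + 1);
  }
  return m;
}

// Recover the (row, col) a sequential value came from.
HOST_DEVICE inline Position decode_position(float value, int cols) {
  int linear = static_cast<int>(value) - 1;
  return Position{linear / cols, linear % cols};
}

#ifdef __CUDACC__
// Call from inside a kernel on any element a thread holds in registers.
__device__ inline void report_owned_value(float value, int cols) {
  Position p = decode_position(value, cols);
  printf("block %d thread %d owns (%d, %d) = %.0f\n",
         blockIdx.x, threadIdx.x, p.row, p.col, value);
}
#endif
```

Comparing the printed positions against the layout you expect for each thread shows quickly whether the fragment sizes match the fp32 data.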
from cutlass.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
from cutlass.