
Comments (5)

thakkarV commented on September 18, 2024

@hwu36

from cutlass.

hwu36 commented on September 18, 2024

True fp32 is not supported by Tensor Cores; only tf32 can use them. Do you want to convert fp32 to tf32 before the computation?
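
To judge whether converting to tf32 is acceptable, it may help to see what the conversion does to each value. The host-side C++ sketch below emulates it, assuming round-to-nearest with ties away from zero (the rounding named by PTX cvt.rna.tf32.f32). round_to_tf32 is a hypothetical helper, not CUTLASS's converter.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical host-side emulation of fp32 -> tf32 rounding (a sketch, not CUTLASS's converter).
// tf32 keeps fp32's sign bit and 8-bit exponent but only the top 10 of the 23 mantissa bits,
// so the low 13 bits are rounded away and zeroed. The result is still held in a float.
float round_to_tf32(float x) {
  if (!std::isfinite(x)) return x;  // leave Inf and NaN untouched
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits += 0x1000u;      // add half of the dropped range: round to nearest, ties away from zero
  bits &= 0xFFFFE000u;  // clear the 13 low mantissa bits
  float out;
  std::memcpy(&out, &bits, sizeof out);
  return out;
}
```

With this rounding, 1 + 2^-12 collapses to 1.0, so only about three significant decimal digits of each operand survive before the multiply.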

Do you want to support fprop or wgrad or anything else?

The inline PTX in scale_bias_relu_transform.h is hard-coded for fp16x2, not fp32. You don't have to write inline PTX; plain CUDA works. Something like:

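// special_nan holds the marker's bit pattern; compare raw bits, since input != NaN is always true for floats.
float res = 0.f;  // assumed: out-of-bound elements become zero, like padding
if (__float_as_uint(input) != special_nan) {  // we use a special nan to mark out of bound data.  we use 0x7eff for fp16 special nan.
  res = input > float(0) ? input : input * leaky_alpha;
}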
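
One thing to watch when porting this check: written as a float comparison, input != special_nan is always true, because a NaN compares unequal to everything, itself included. Marked elements would then reach the activation and come out as NaN. The host-side C++ sketch below detects the marker by bit pattern instead. The fp32 marker value kSpecialNanBits and the zero written for marked elements are assumptions; the thread only specifies 0x7eff for fp16.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical fp32 marker for out-of-bound elements: a quiet NaN with a payload.
// The thread only gives 0x7eff for fp16; this fp32 pattern is an assumption.
constexpr std::uint32_t kSpecialNanBits = 0x7fefffffu;

inline std::uint32_t float_bits(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  return u;
}

inline float bits_float(std::uint32_t u) {
  float x;
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// Leaky ReLU that skips marked elements. The marker is matched by its bit pattern,
// because any float equality test involving a NaN is false.
// Assumption: marked (out-of-bound) elements are written as 0, like zero padding.
inline float leaky_relu_masked(float x, float leaky_alpha) {
  if (float_bits(x) == kSpecialNanBits) return 0.0f;
  return x > 0.0f ? x : x * leaky_alpha;
}
```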

satyabhagavan commented on September 18, 2024

@hwu36 I implemented it the same way, defining leaky_alpha as float(0.1), but I am getting NaNs in the output. Do I need to change MmaElements and MmaCols in scale_bias_relu_transform.h when converting to floats?

hwu36 commented on September 18, 2024

Yes. You should first dump the values of the matrix, bias, and scale to check that every thread owns the right data. Initializing a small matrix with 1, 2, 3, 4, ... makes this easy.
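
A minimal C++ sketch of that debugging setup, assuming a row-major matrix: fill it with 1, 2, 3, ... and map any dumped value back to the (row, col) it came from, so a wrong value immediately shows which element a thread actually loaded. Both helper names are hypothetical. The decode is exact only while every value is an integer the type represents exactly (up to 2^24 for fp32, 2048 for fp16), so keep the matrix small.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fill a rows x cols row-major matrix with 1, 2, 3, ...
// so that every value in a dump identifies exactly one source element.
std::vector<float> make_sequential_matrix(std::size_t rows, std::size_t cols) {
  std::vector<float> m(rows * cols);
  for (std::size_t i = 0; i < m.size(); ++i) m[i] = static_cast<float>(i + 1);
  return m;
}

struct Coord {
  std::size_t row, col;
};

// Hypothetical helper: invert the fill, mapping a dumped value back to its (row, col).
// Exact only while values stay within fp32's exact-integer range (up to 2^24).
Coord decode_sequential_value(float v, std::size_t cols) {
  std::size_t idx = static_cast<std::size_t>(v) - 1;
  return {idx / cols, idx % cols};
}
```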

Mainloop fusion is the most difficult kind. If possible, do the fusion in the epilogue of the previous kernel instead, which is easier and performs better.
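
For comparison, here is a host-side C++ reference of the same scale, bias, and leaky ReLU applied where the previous kernel writes its output, assuming a row-major [rows x channels] output with per-channel scale and bias. Since the epilogue only touches outputs the producer actually computes, no out-of-bound NaN marker should be needed (an assumption about this setup). The function name is hypothetical; in a real kernel this would be the epilogue's elementwise operation on each accumulator before the store.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference for an epilogue-fused per-channel scale + bias + leaky ReLU.
// acc is a row-major [rows x channels] accumulator buffer; scale and bias have one entry per channel.
std::vector<float> epilogue_scale_bias_leaky_relu(const std::vector<float>& acc,
                                                  const std::vector<float>& scale,
                                                  const std::vector<float>& bias,
                                                  std::size_t channels,
                                                  float leaky_alpha) {
  std::vector<float> out(acc.size());
  for (std::size_t i = 0; i < acc.size(); ++i) {
    std::size_t c = i % channels;  // channel is the fastest-varying (column) index
    float v = scale[c] * acc[i] + bias[c];
    out[i] = v > 0.0f ? v : v * leaky_alpha;  // activation applied before the store
  }
  return out;
}
```

The next kernel then reads already-activated data, so its mainloop needs no transform at all.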

github-actions commented on September 18, 2024

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
