
Comments (6)

gstoner avatar gstoner commented on July 27, 2024

from tensile.

tingxingdong avatar tingxingdong commented on July 27, 2024

You do not need to differentiate between Fiji and Polaris. Both are the same GFX803 architecture, while Vega is GFX900. The kernel that is optimal for Fiji should also be optimal for Polaris, which reduces the amount of work for you.
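The point about shared architectures can be sketched as a lookup in which tuning results are keyed by architecture name rather than by product name, so Fiji and Polaris resolve to the same entry. This is only an illustration: the dictionary names, the function name, and the kernel labels are hypothetical and are not part of Tensile.

```python
# Map product families to the GFX architecture named in this thread.
# Only these three entries come from the discussion.
DEVICE_TO_ARCH = {
    "Fiji": "gfx803",
    "Polaris": "gfx803",
    "Vega": "gfx900",
}

# Hypothetical store of tuned kernels, keyed by architecture (labels are made up).
TUNED_KERNELS = {
    "gfx803": "sgemm_tuned_for_gfx803",
    "gfx900": "sgemm_tuned_for_gfx900",
}


def kernel_for_device(device: str) -> str:
    """Return the tuned kernel label for a device by way of its architecture."""
    arch = DEVICE_TO_ARCH[device]
    return TUNED_KERNELS[arch]
```

With this layout, a kernel tuned once for gfx803 is returned for both Fiji and Polaris, while Vega gets its own entry.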


paolodalberto avatar paolodalberto commented on July 27, 2024

Thank you for the reply. I was not clear, so let me rework my question.

Background: I am looking for sGEMM, dGEMM, cGEMM, and zGEMM: classic GEMMs without batching (I may add batched versions in the future). My library is currently built using C and OpenCL 1.2, and I call the *GEMM routines from clBLAS, with code generated for Fiji (GFX803). clBLAS provides a clean and well-known interface. I manage the data movement and then link the library to my application.

My understanding: Tensile creates a library of methods by empirical search, so different devices may have different winners even though they share the same architecture. At least, I think you explore a space and select winners. In that case, I am willing to explore each device.
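The understanding stated above, that candidates are timed and the fastest one is kept, can be sketched as follows. This is a generic illustration of empirical selection, not Tensile's actual search; `select_winner`, its parameters, and the candidate callables are hypothetical names.

```python
import time
from typing import Callable, Dict, Optional


def select_winner(
    candidates: Dict[str, Callable[[], None]],
    repeats: int = 3,
    timer: Callable[[], float] = time.perf_counter,
) -> str:
    """Time each candidate and return the name of the fastest one.

    Each candidate is run `repeats` times and its best time is kept,
    which reduces the effect of one-off noise on the measurement.
    """
    best_name: Optional[str] = None
    best_time = float("inf")
    for name, run in candidates.items():
        elapsed = float("inf")
        for _ in range(repeats):
            start = timer()
            run()
            elapsed = min(elapsed, timer() - start)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    if best_name is None:
        raise ValueError("no candidates given")
    return best_name
```

Running the same selection on each device would yield one winner per device, which is the per-device exploration described above.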

gstoner: Note that the AMDGPU-PRO 18.10 driver cannot support the GCN assembly-based kernels, since it does not use the native LLVM compiler, which has the GCN ISA support.

I recently installed Radeon™ Software for Linux® Driver Version 17.50 for Ubuntu 16.04.3. I could run the experiments and create a client using Tensile.py. I do not understand the statement above. Are you saying that I cannot create the library and use it outside the experiments created by Tensile (HIP only)?

Note: As long as I can use OpenCL to call the final result, I will be very happy. I will be able to reuse my OpenCL code and work with other devices that are not GPUs. But if the only way is to use ROCm and HIP, I will work on introducing a new interface for the new requirements. Either way, it is moving forward.

I am asking to learn how to create a self-contained s, d, c, z GEMM library for a device in such a way that I can link it to an application written in C using an OpenCL interface. I would rather customize the call for the device a priori or at run time. Looking at Client.cpp and Client.h in 4_LibraryClient, I completely miss the methods that are called to execute the computational kernel, although I can follow most of the data preparation (maybe because I already know how to do it).

I hope this time I have clarified my request and expressed my ignorance. Would you mind adding a tutorial to address my request? For example, after we build the sgemm and its libtensile.a, what is the interface to call the OpenCL sgemm function, if there is one?

Note: clBLAS used to have sample code for sgemm in C and C++. The code was clear (not short), but everything was there to understand how to reuse it in a different scenario. This would help me use other OpenCL implementations for devices that are not GPUs.

Please do not hesitate to contact me directly if you would like me to do anything in particular.


gstoner avatar gstoner commented on July 27, 2024


paolodalberto avatar paolodalberto commented on July 27, 2024

17.50 is still shipping with ROCm support.
Legacy and ROCm, yes.

What about a tutorial? Is it worth asking?


paolodalberto avatar paolodalberto commented on July 27, 2024

The answer is no, so let us move on (no mixed devices).
Next will be rocBLAS, then. I installed it and ran the first sgemm.

sgemm example
NT: m, n, k, lda, ldb, ldc = 1023, 1024, 1025, 1023, 1024, 1023
PASS: max_relative_error = 1.17549e-38
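For readers unfamiliar with the parameters in the output above, here is a plain reference sketch of an "NT" GEMM with leading dimensions, assuming the usual column-major BLAS convention: A is m x k with lda >= m, B is stored as n x k with ldb >= n, and C is m x n with ldc >= m. That assumption is consistent with the reported values (lda = m, ldb = n, ldc = m). The function names are hypothetical, and the error formula is an assumption, since the thread does not show how the rocBLAS example computes `max_relative_error`.

```python
from typing import List


def gemm_nt(m: int, n: int, k: int, alpha: float,
            a: List[float], lda: int,
            b: List[float], ldb: int,
            beta: float, c: List[float], ldc: int) -> None:
    """Compute C = alpha * A * B^T + beta * C in place (column-major).

    Element (i, p) of A lives at a[i + p * lda]; element (j, p) of the
    stored B lives at b[j + p * ldb]; element (i, j) of C at c[i + j * ldc].
    """
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i + p * lda] * b[j + p * ldb]
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]


def max_relative_error(result: List[float], reference: List[float]) -> float:
    """Largest |got - want| / |want|, using 1 as the divisor when want is 0."""
    worst = 0.0
    for got, want in zip(result, reference):
        denom = abs(want) if want != 0.0 else 1.0
        worst = max(worst, abs(got - want) / denom)
    return worst


# Small usage example: m=2, n=3, k=2, with tight leading dimensions.
a = [1.0, 2.0, 3.0, 4.0]            # A = [[1, 3], [2, 4]]
b = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0]  # stored B = [[1, 0], [0, 1], [1, 1]]
c = [0.0] * 6
gemm_nt(2, 3, 2, 1.0, a, 2, b, 3, 0.0, c, 2)
```

A device result would be compared against such a reference with `max_relative_error`, and the check passes when the value stays below a chosen tolerance.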

Can I customize rocBLAS per device? (Tensile does that.)
Are the z and c GEMMs available?

