Comments (6)
from tensile.
You do not need to differentiate between Fiji and Polaris. Both are the same GFX803 architecture, while Vega is GFX900. A kernel that is optimal for Fiji should also be optimal for Polaris, which reduces the amount of work you have to do.
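As a minimal sketch of that idea, the C fragment below groups device families by architecture so that one tuned kernel set can be reused for all families sharing an architecture. The table contents come from this thread only; the names `arch_entry`, `arch_for_family`, and `can_share_kernels` are hypothetical and are not part of Tensile.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table: device family -> GFX architecture.
 * Only the three families named in this thread are listed. */
struct arch_entry {
    const char *family;
    const char *gfx;
};

static const struct arch_entry ARCH_TABLE[] = {
    { "Fiji",    "gfx803" },
    { "Polaris", "gfx803" },
    { "Vega",    "gfx900" },
};

/* Return the architecture string for a family, or NULL if it is unknown. */
const char *arch_for_family(const char *family)
{
    size_t n = sizeof ARCH_TABLE / sizeof ARCH_TABLE[0];
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(ARCH_TABLE[i].family, family) == 0)
            return ARCH_TABLE[i].gfx;
    }
    return NULL;
}

/* Two families can share one tuned kernel set when both are known
 * and map to the same architecture. Returns 1 or 0. */
int can_share_kernels(const char *family_a, const char *family_b)
{
    const char *a = arch_for_family(family_a);
    const char *b = arch_for_family(family_b);
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}
```

With this table, `can_share_kernels("Fiji", "Polaris")` is true and `can_share_kernels("Fiji", "Vega")` is false, so only two tuning runs would be needed for the three families.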
from tensile.
Thank you for the reply. I was not clear. Let me rework my question.
Background: I am looking for sGEMM, dGEMM, cGEMM, and zGEMM: classic GEMMs without batching (I may add batched versions in the future). My library is written in C and OpenCL 1.2, and I currently call the *GEMM routines from clBLAS, with code generated for Fiji (GFX803). clBLAS provides a clean, well-known interface. I manage the data movement and then link the library into my application.
My understanding: Tensile creates a library of methods by an empirical search, so different devices may have different winners even though they share the same architecture. At least, I think you explore a space and select winners. In that case, I am willing to explore each device.
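To make "explore a space and select winners" concrete, here is a small CPU-only sketch in C: two candidate matrix-multiply kernels are timed and the fastest one is selected. This is an illustration of the general idea under my own assumptions, not Tensile's actual search; the names `gemm_ijk`, `gemm_ikj`, `time_kernel`, and `select_winner` are hypothetical.

```c
#include <stddef.h>
#include <time.h>

/* A candidate kernel computes C = A * B for n x n row-major matrices. */
typedef void (*gemm_kernel)(size_t n, const float *a, const float *b, float *c);

/* Candidate 0: i-j-k loop order. */
void gemm_ijk(size_t n, const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; ++k)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
}

/* Candidate 1: i-k-j loop order (different memory access pattern). */
void gemm_ikj(size_t n, const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < n * n; ++i)
        c[i] = 0.0f;
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < n; ++k) {
            float aik = a[i * n + k];
            for (size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}

/* Average CPU time in seconds for one call, measured over `reps` runs. */
double time_kernel(gemm_kernel f, size_t n, const float *a, const float *b,
                   float *c, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        f(n, a, b, c);
    return (double)(clock() - start) / CLOCKS_PER_SEC / reps;
}

/* Return the index of the smallest measured time (the "winner"). */
size_t select_winner(const double *seconds, size_t count)
{
    size_t best = 0;
    for (size_t i = 1; i < count; ++i)
        if (seconds[i] < seconds[best])
            best = i;
    return best;
}
```

A caller would fill an array with `time_kernel` results for each candidate on the target device and problem size, then store the index returned by `select_winner`. Because the timings are measured, two devices of the same architecture could in principle end up with different winners, which is the concern raised above.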
gstoner: Note that the AMDGPU-PRO driver, as of the 18.10 driver, cannot support the GCN assembly-based kernels, since it does not use the native LLVM compiler, which has the GCN ISA support.
I recently installed Radeon™ Software for Linux® Driver Version 17.50 for Ubuntu 16.04.3. I was able to run the experiments and create a client using Tensile.py. I do not understand the statement above. Are you saying that I cannot create the library and use it outside of the experiments created by Tensile (HIP only)?
Note: As long as I can use OpenCL to call the final result, I will be very happy. I will be able to reuse my OpenCL code and work with other devices that are not GPUs. But if the only way is to use ROCm and HIP, I will work to introduce a new interface for the new requirements. Either way, it is moving forward.
I am asking to learn how to create a self-contained s, d, c, z GEMM library for a device, in such a way that I can link it to an application written in C using an OpenCL interface. I would rather customize the call for the device, either a priori or at run time. Looking at Client.cpp and Client.h in 4_LibraryClient, I can follow most of the data preparation (maybe because I already know how to do it), but I completely miss which methods are called to execute the computational kernel.
I hope this time I have clarified my request and expressed my ignorance. Would you mind adding a tutorial to address my request? For example, after we build sgemm and its libtensile.a, what is the interface for calling the OpenCL sgemm function, if there is one?
Note: clBLAS used to have sample code for sgemm in C and C++. The code was clear (not short), but everything was there to understand how to reuse it in a different scenario. Something similar would help me use other OpenCL implementations for other devices that are not GPUs.
Please, do not hesitate to contact me directly if you wish to ask me to do anything in particular.
from tensile.
17.50 still shipping with ROCm support.
legacy and rocm yep.
What about a tutorial? Is it worth asking?
from tensile.
The answer is no. So let us move on (no mixed devices).
Next will be rocBLAS, then. I installed it and ran the first sgemm.
sgemm example
NT: m, n, k, lda, ldb, ldc = 1023, 1024, 1025, 1023, 1024, 1023
PASS: max_relative_error = 1.17549e-38
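For readers unfamiliar with the `NT` case and the leading dimensions in the output above, here is a reference sketch in plain C, assuming the usual column-major BLAS convention: A is m x k and not transposed (lda >= m), B is stored as n x k and used transposed (ldb >= n), and C is m x n (ldc >= m). The printed values 1023, 1024, 1023 are consistent with that reading. The functions `sgemm_nt_ref` and `max_relative_error` are hypothetical helpers written for this post; in particular, the error metric here is my own assumption and may differ from how the rocBLAS example computes it.

```c
#include <assert.h>
#include <stddef.h>

/* Reference column-major SGEMM, "NT" case: C = alpha * A * B^T + beta * C.
 * A is m x k (lda >= m), B is n x k (ldb >= n), C is m x n (ldc >= m). */
void sgemm_nt_ref(size_t m, size_t n, size_t k, float alpha,
                  const float *a, size_t lda,
                  const float *b, size_t ldb,
                  float beta, float *c, size_t ldc)
{
    assert(lda >= m && ldb >= n && ldc >= m);
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            /* A(i,p) is a[i + p*lda]; B^T(p,j) = B(j,p) is b[j + p*ldb]. */
            for (size_t p = 0; p < k; ++p)
                acc += a[i + p * lda] * b[j + p * ldb];
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
}

/* Largest |got - ref| / |ref| over an m x n column-major matrix.
 * Where ref is exactly zero, the absolute error is used instead. */
float max_relative_error(size_t m, size_t n, const float *ref,
                         const float *got, size_t ldc)
{
    float worst = 0.0f;
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            float r = ref[i + j * ldc];
            float d = got[i + j * ldc] - r;
            float abs_d = d < 0.0f ? -d : d;
            float abs_r = r < 0.0f ? -r : r;
            float err = abs_r > 0.0f ? abs_d / abs_r : abs_d;
            if (err > worst)
                worst = err;
        }
    return worst;
}
```

A result computed on the GPU could be copied back to the host and compared against `sgemm_nt_ref` with `max_relative_error` to reproduce a PASS/FAIL check like the one printed above.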
Can I customize rocBLAS per device? (Tensile does that.)
Are the z and c GEMMs available?
from tensile.