Reproducer: <div class="snippet-clipboard-content notranslate position-relative ov

Hey <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

Thanks <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-u

Thanks for the very clear explanation! Hey <a class="us

Not completely implemented `CIRGenTypes::getRecordTypeName` about clangir HOT 10 OPEN

ChuanqiXu9 commented on July 17, 2024

Not completely implemented `CIRGenTypes::getRecordTypeName`

from clangir.

Comments (10)

sitio-couto commented on July 17, 2024 1

Hey @ChuanqiXu9! I'm unsure if I understood the core issue here, but let me try to answer your questions.

How can we get the attributes from the name of the type?
If you want to get the type of a template argument, there is no current way to do this from CIR. You'll need to query the AST. If you want the type of the struct members, you can use the getMembers function defined in StructType.
It sounds like an ODR violation different StructType with the same name. What's the reason for the design?
This is not the design. We ensure unique names for struct types of the same kind. The function you pointed out, creates a unique name for each template instance. Also, when lowering to LLVM Dialect, we ensure the Kind attribute is added as a prefix in the LLVM struct name through the getPrefixedName method.
we don't print the name for other template arguments? Or did we simply not implement it?
We simply haven't implemented it. Keep in mind that the header you are including (#include <memory>) has several unimplemented features besides the one responsible for the error you found.
Also do we have back compatibility requirement for CIR?
I'm not sure I understood this question. Do you mean if CIR aims to compile all legacy C/C++ code? If so, yes.
Regarding mangled struct names.
AFAIK, this is a part of ABI-related codegen/lowering that we do not yet address. Using the mangled name as a unique name instead of the approach described in item 2 would be one possibility. We simply haven't addressed this yet.
What is causing this error?
We haven't implemented the approach described in item 2 for certain template arguments (packed arguments for example template<ypename ...Args>), so it falls on the default clause. To fix this, you can add the required logic to handle the missing case statement to create the unique name.

Hope this helps!

from clangir.

ChuanqiXu9 commented on July 17, 2024 1

Thanks for the clear response. Yeah, it is find to extract the NYI things into a configurable function. At least it won't cause a lots of scaring crashes.

from clangir.

Lancern commented on July 17, 2024

The important difference between CIR and LLVM IR here is how they distinguish different struct types.

In LLVM IR, different llvm::StructType objects represent different struct types. The name of a llvm::StructType is more like a "comment" that makes the assembly more readable. Thus in the original CodeGen we just have to make a "rough" name that does not have to be precise.

On contrary, in CIR, two different mlir::cir::StructType objects can represent the same struct type as long as they have the same name and same kind (see here for how CIR compares two mlir::cir::StructType objects). Thus in CIR we have to invent a precise "mangled" name for each struct type, as the mangled struct names will be used for equality comparison.

from clangir.

ChuanqiXu9 commented on July 17, 2024

Thanks for explanation. It sounds like an ODR violation to me if we have two different StructType with the same name. What's the reason for the design?

And what's the concern that we don't print the name for other template arguments? Or did we simply not implement it? Also do we have back compatibility requirement for CIR? If not, and I assuming we don't have other consumers for CIR outside the LLVM repo, it looks like it is not so hard to mangle the name.

from clangir.

Lancern commented on July 17, 2024

It sounds like an ODR violation to me if we have two different StructType with the same name. What's the reason for the design?

Are you asking about why we need to additionally compare the struct kind besides the mangled name, which can already uniquely identify a struct type? I'm not quite sure about this, but the git history shows the relevant part is commited in e776625 . Maybe we can ask the author about this :)

And what's the concern that we don't print the name for other template arguments? Or did we simply not implement it?

We simply has not implemented it yet. "NYI" represents "not yet implemented" and in CIR by convention we use llvm_unreachable("NYI") to make clang crash on unimplemented features.

Also do we have back compatibility requirement for CIR?

We'd better confirm with @bcardosolopes about this :)

If not, and I assuming we don't have other consumers for CIR outside the LLVM repo, it looks like it is not so hard to mangle the name.

Agree. IMHO if the name is merely used for distinguishing different struct types we can have easier ways to build the mangled name (with an increasing counter, for example).

from clangir.

ChuanqiXu9 commented on July 17, 2024

Are you asking about why we need to additionally compare the struct kind besides the mangled name, which can already uniquely identify a struct type?

I mean, in a valid C++ program, it is not allowed to have different struct definitions with the same name. It violates the ODR (One Definition Rule).

from clangir.

bcardosolopes commented on July 17, 2024

I'll take a more careful look at this tomorrow and reply to all comments. In the meantime, @sitio-couto worked a bunch on this and might have feedback here.

from clangir.

bcardosolopes commented on July 17, 2024

Thanks @sitio-couto for the great explanation!

from clangir.

ChuanqiXu9 commented on July 17, 2024

Thanks for the very clear explanation!

Hey @ChuanqiXu9! I'm unsure if I understood the core issue here, but let me try to answer your questions.

How can we get the attributes from the name of the type?
If you want to get the type of a template argument, there is no current way to do this from CIR. You'll need to query the AST. If you want the type of the struct members, you can use the getMembers function defined in StructType.

It sounds like an ODR violation different StructType with the same name. What's the reason for the design?
This is not the design. We ensure unique names for struct types of the same kind. The function you pointed out, creates a unique name for each template instance. Also, when lowering to LLVM Dialect, we ensure the Kind attribute is added as a prefix in the LLVM struct name through the getPrefixedName method.

we don't print the name for other template arguments? Or did we simply not implement it?
We simply haven't implemented it. Keep in mind that the header you are including (#include <memory>) has several unimplemented features besides the one responsible for the error you found.

Also do we have back compatibility requirement for CIR?
I'm not sure I understood this question. Do you mean if CIR aims to compile all legacy C/C++ code? If so, yes.

I mean, if the CIR produced by clang version X, can be consumed by clang version Y, given Y > X. If yes, it has backward compatibility. The examples have backward compatibility may be the DWARF information and the Itanium CXX ABI. The examples don't have backward compatibility may be the LLVM IR and C++ Modules.

Regarding mangled struct names.
AFAIK, this is a part of ABI-related codegen/lowering that we do not yet address. Using the mangled name as a unique name instead of the approach described in item 2 would be one possibility. We simply haven't addressed this yet.

Interesting. I didn't thought the mangled name is different from the unique name.

BTW, it sounds slightly odd to me that we mangle the struct names due to ABI reasons in CIR level. Since I think CIR is a C++ specific IR and so we should preserve more C++ information as much as possible. Then we can mangle it in the phase of Lowering.

What is causing this error?
We haven't implemented the approach described in item 2 for certain template arguments (packed arguments for example template<ypename ...Args>), so it falls on the default clause. To fix this, you can add the required logic to handle the missing case statement to create the unique name.

Yeah, I did that locally and passes this error. (And meet more errors, as expected : )) I am still in the phase experiencing CIR.

BTW, since I'd like to proposing analysis only CIR to make it usable quickly. I find the existing NYI mechanism is way too strict for analysis only purpose. It makes sense to boil out if we're going to generating an executable and we're not done. But if we're only going to analysis something, it looks not necessary to boil out as long as the missing piece won't cause the analysis pass to produce incorrect results.

For example, whether or not we generate the initializer for dynamic TLS shouldn't matter if we only want to analysis the code, so I'd make such changes:

    if (D->getTLSKind()) {
+     if (!isAnalysisOnly() && 
           D->getTLSKind() == VarDecl::TLS_Dynamic)
        llvm_unreachable("NYI");
      setTLSMode(GV, *D);
    }

What do you think about this? @bcardosolopes @sitio-couto @Lancern

Hope this helps!

You are welcome!

from clangir.

bcardosolopes commented on July 17, 2024

I mean, if the CIR produced by clang version X, can be consumed by clang version Y, given Y > X. If yes, it has backward compatibility. The examples have backward compatibility may be the DWARF information and the Itanium CXX ABI. The examples don't have backward compatibility may be the LLVM IR and C++ Modules.

We have no plans for backward compatibility yet, it's too early for that. It's a possibility once CIR is mature enough, but right now it's evolving fast and won't bring any benefits to any current users.

BTW, it sounds slightly odd to me that we mangle the struct names due to ABI reasons in CIR level.

Turns out mangling is an integral part of how C++ works, and pretty much how a lot of things are encoded (e.g. multiple compiler syntethized default ctors, etc). Using mangled names greatly improve symbol lookup in MLIR and make it convenient to understand common post ABI decisions that are usually implicitly encoded. Using "real" names instead of mangling poses some interesting challenge because in some way or another we would have to reinvent part of the wheel to deal with namespaces, and other things that would differentiate one symbol from the other - it's not trivial as one might think.

It doesn't feel odd to me, but you are free to have opinions.

Since I think CIR is a C++ specific IR and so we should preserve more C++ information as much as possible. Then we can mangle it in the phase of Lowering.

Backreferences to the AST could be available whenever someone wants more "raw" information.

BTW, since I'd like to proposing analysis only CIR to make it usable quickly.

Defining "usable quickly" is also tricky, the project is at a state that unless you put effort to implement the things you need, it won't be useful.

I find the existing NYI mechanism is way too strict for analysis only purpose. It makes sense to boil out if we're going to generating an executable and we're not done.

This is also not as simple as it seems. One example is when we don't yet generate a CIR operation for a given set of AST nodes, a given analysis could be completely wrong because we are basically deleting code it should be looking at. My experience with linting at Meta tells me that inaccuracy is the main cause people turn checks off or start ignoring them. I rather we be more conservative in the short term, while aiming at more reliability in the future.

But if we're only going to analysis something, it looks not necessary to boil out as long as the missing piece won't cause the analysis pass to produce incorrect results. ... For example, whether or not we generate the initializer for dynamic TLS shouldn't matter if we only want to analysis the code

How do you even know if the missing pieces are not going to produce incorrect results? On a case basis looks like shortcut to me. It also makes an analysis hard to debug later, because we don't even know where to start debugging.

On the TLS case, it might not be relevant to you, but it's possible someone is writing a thread-like analysis that could benefit from this information. We are not interested in providing half baked tools to the community, we'd prefer that interested users put their time to make things work properly instead of taking shortcuts.

What do you think about this?

I don't feel comfortable with this approach. I'd find reasonable here would be to abstract all NYI into a helper function that as part of its implementation could optionally return early and never assert. However, I'd be opposed to turning that on for analysis-only because it will be basically bad PR for the quality of analysis results we can give. I rather we take longer to have something usable than having something usable but wrong that might scare people away from trying it again. Nothing should prevent you from turning it ON in your downstream version of clangir though.

from clangir.

Not completely implemented `CIRGenTypes::getRecordTypeName` about clangir HOT 10 OPEN

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent