hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

License: Other

C++ 99.89% Batchfile 0.01% Shell 0.09% C 0.01%

cppfront's Issues

[BUG] CPP2_UFCS macros don't work with template functions/methods with multiple template arguments

The CPP2_UFCS macros are based on variadic macros. That means that every comma , in a macro invocation separates two arguments of the macro (unless it is surrounded by parentheses ()).

As a result, the following Cpp2 code (mixed mode):

#include <string>

template <auto from, auto to>
auto substr(const std::string& input) -> std::string {
    return input.substr(from, to-from);
}

main: () -> int = {
    test_string: std::string = "Unfortunatelly macros have limitations";
    test_string.substr<3,6>();
}

compiles to (skipping boilerplate):

[[nodiscard]] auto main() -> int{
    std::string test_string { "Unfortunatelly macros have limitations" }; 
    CPP2_UFCS_0(substr<3,6>, test_string);
}

The preprocessor reads this call to CPP2_UFCS_0 as CPP2_UFCS_0((substr<3),(6>), (test_string)); (parentheses added to emphasize the problem). That does not compile and ends with an error:

clang++ -std=c++20 -Iinclude ../tests/ufcs-method-limitations.cpp 
tests/ufcs-method-limitations.cpp2:10:30: error: too many arguments provided to function-like macro invocation
    CPP2_UFCS_0(substr<3,6>, test_string);
                             ^
include/cpp2util.h:487:9: note: macro 'CPP2_UFCS_0' defined here
#define CPP2_UFCS_0(FUNCNAME,PARAM1) \
        ^
tests/ufcs-method-limitations.cpp2:10:5: error: use of undeclared identifier 'CPP2_UFCS_0'
    CPP2_UFCS_0(substr<3,6>, test_string);
    ^
2 errors generated.

A similar error will happen for the CPP2_UFCS version - that reminds me why we hate macros so much.

Normally, in such cases, we would add parentheses to tell the preprocessor where the arguments are. Doing that here gives:

CPP2_UFCS_0((substr<3,6>), test_string);

But that also fails to compile, this time with the error expected unqualified-id:

clang++ -std=c++20 -Iinclude ../tests/ufcs-method-limitations.cpp 
tests/ufcs-method-limitations.cpp2:10:17: error: expected unqualified-id
    CPP2_UFCS_0((substr<3,6>), test_string);
                ^
tests/ufcs-method-limitations.cpp2:10:17: error: expected unqualified-id
error: constexpr if condition is not a constant expression
tests/ufcs-method-limitations.cpp2:10:5: note: in instantiation of function template specialization 'main()::(anonymous class)::operator()<std::string &>' requested here
    CPP2_UFCS_0((substr<3,6>), test_string);
    ^
include/cpp2util.h:494:2: note: expanded from macro 'CPP2_UFCS_0'
}(PARAM1)
 ^
3 errors generated.

Unless there is a trick to make that macro work again we will need to come up with some other idea to solve it.
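To make the argument splitting concrete, here is a small sketch. COUNT_ARGS, CALL_WITH and substr_between are names made up for this illustration, not part of cpp2util.h. The first half shows how many arguments the preprocessor sees; the second half shows one possible trick, taking the callee as the trailing variadic argument so its commas are glued back together. It only covers the non-member call, so it is not a full replacement for CPP2_UFCS_0.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper macros: report how many arguments the preprocessor sees (1 to 3).
#define COUNT_ARGS_IMPL(A1, A2, A3, N, ...) N
#define COUNT_ARGS(...) COUNT_ARGS_IMPL(__VA_ARGS__, 3, 2, 1, 0)

// The comma inside <3,6> splits the function name into two macro arguments...
static_assert(COUNT_ARGS(substr<3,6>, test_string) == 3);
// ...while extra parentheses keep it together as one argument.
static_assert(COUNT_ARGS((substr<3,6>), test_string) == 2);

// Hypothetical workaround: take the callee last, so its commas land in
// __VA_ARGS__ and are pasted back in unchanged.
#define CALL_WITH(ARG, ...) __VA_ARGS__(ARG)

template <auto from, auto to>
auto substr_between(const std::string& input) -> std::string {
    return input.substr(from, to - from);
}

inline std::string demo() {
    std::string test_string = "abcdefgh";
    // Expands to substr_between<3,6>(test_string).
    return CALL_WITH(test_string, substr_between<3,6>);
}
```

The member-call branch is the hard part: the macro would still need to spell obj.NAME<...>() for a dependent obj, which is where the original macro runs into trouble.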

I found this bug while trying to update the function call chaining PR (#18). A working UFCS will probably not be possible without semantic analysis on the cppfront side.

The same will be needed to allow unified "." member/scope selection (e.g., std.swap(x,y)): it is impossible to tell whether std is a namespace by using requires-expressions or decltype hacks (I already tried that).

[BUG] In mixed mode cppfront can interfere with string done with raw string literal

Cppfront recognizes a line as a Cpp2 line even though it is in the middle of a raw string literal.

Cpp2 code:

auto a = R"(
i:int = 42;
)";

After compiling with cppfront we get:

// ----- Cpp2 support -----
#include "cpp2util.h"

auto a = R"(

)";

//=== Cpp2 definitions ==========================================================

int i { 42 };

https://godbolt.org/z/qMnerz8cM

Expected: cppfront should treat raw string literals the same way it treats /* */ comments. It needs to check whether a line is in the middle of a raw string literal (just as it already checks whether a line is in the middle of a comment).
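A minimal sketch of such a check, assuming a line-by-line scan. raw_string_tracker is a made-up name and this is not how cppfront is implemented. It deliberately ignores comments, ordinary string literals and encoding prefixes, so it only illustrates the state that has to be carried from one line to the next.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: remembers whether the scan is inside R"delim( ... )delim".
struct raw_string_tracker {
    // The text that ends the current raw string literal, e.g. )delim"
    std::optional<std::string> closing;

    // Feed one source line. Returns true if the line STARTS inside a raw
    // string literal, i.e. it must not be treated as a Cpp2 line.
    bool feed(std::string_view line) {
        const bool started_inside = closing.has_value();
        std::size_t pos = 0;
        while (pos < line.size()) {
            if (closing) {
                // Look for the end of the literal we are currently in.
                const auto end = line.find(*closing, pos);
                if (end == std::string_view::npos) { break; }
                pos = end + closing->size();
                closing.reset();
            } else {
                // Look for the start of the next raw string literal.
                const auto start = line.find("R\"", pos);
                if (start == std::string_view::npos) { break; }
                const auto paren = line.find('(', start + 2);
                if (paren == std::string_view::npos) { break; }
                // The delimiter is whatever sits between R" and (.
                closing = ")" + std::string(line.substr(start + 2, paren - start - 2)) + "\"";
                pos = paren + 1;
            }
        }
        return started_inside;
    }
};
```

Fed the three lines of the example above, the tracker reports the i:int = 42; line as being inside the literal.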

[BUG] does not bounds check iterators

The following code segfaults because it uses while first <= last { instead of while first < last {:

To reproduce:

main: () -> int
= {
    words: std::vector<std::string> = ( "decorated", "hello", "world" );

    first: std::vector<std::string>::iterator = words.begin();
    last : std::vector<std::string>::iterator = words.end();

    while first <= last {
        print_and_decorate(first*);
        first++;
    }
}

print_and_decorate: (thing:_) =
    std::cout << ">> " << thing << "\n";

 $ cppfront  pure2-bounds-safety-iterator.cpp2 -p -s
pure2-bounds-safety-iterator.cpp2... ok (all Cpp2, passes safety checks)

 $ g++ -std=c++20 pure2-bounds-safety-iterator.cpp -o pure2-bounds-safety-iterator
 $ ./pure2-bounds-safety-iterator 
>> decorated
>> hello
>> world
Segmentation fault (core dumped)

I wanted cppfront to reject it, but it passed and then segfaulted at run time.
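For comparison, this is roughly what the safe version looks like in today's C++, plus a made-up checked_deref helper that turns the bad dereference into an exception instead of undefined behavior. Both are only a sketch of what a run-time check could do, not something cppfront generates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: refuses to dereference the container's end iterator.
template <typename Container, typename Iterator>
decltype(auto) checked_deref(Container& container, Iterator it) {
    if (it == container.end()) {
        throw std::out_of_range("dereferenced end iterator");
    }
    return *it;
}

inline std::size_t count_visited(std::vector<std::string>& words) {
    std::size_t visited = 0;
    // Half-open range: stop as soon as first reaches end(), never dereference it.
    for (auto first = words.begin(); first != words.end(); ++first) {
        (void)checked_deref(words, first);
        ++visited;
    }
    return visited;
}
```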

Pointless experiment

Sorry to be that guy, but the whole "experiment" with a new C++ syntax is rather pointless.
We already have that: it is called Rust, and it does everything and more than what cppfront is trying to poorly mimic and imitate.
Also, one of the main selling points of Rust, explicit lifetimes checked by the compiler, is simply not gonna happen in C++. So basically Herb is trying to create/promote knock-off technology that is already outdated...

Discussion: Optional declarations and syntax sugar

So based on what I understand about cppfront and Cpp2 as it stands, there's a union keyword that effectively infers / uses an std::variant wherever it's used, and it's invisible to the end-user-- I think that's great and a good choice.

However, it looks like when using optionals, you still have to declare them with the original C++ syntax… I think there's a missed opportunity here, on ground that is well trodden in C# (via "Nullables"), JavaScript / TypeScript, and Swift.

If you look at Swift's Optionals, declarations look like this:

var a: Int	// int type
var b: Int?	// Optional Int type
var c: Dog?	// Optional class type

And accessing values looks like this (the last two lines are called "optional chaining" in most languages):

print(b ?? 7)		// prints the value of "b" if it has one, otherwise prints "7"
let h = c?.height	// accessing member variables from an optional class
c?.bark()		// calls "bark" method on c if c is not null, does nothing otherwise
d?.data?.dog?.bark()	// Similar to above, but you have a hierarchy of optional members

Swift also provides the concept of "unwrapping" an optional; I don't want to muck up the above with this, but it would be negligent of me not to mention it:

if var c = c {	// try to "unwrap" the optional into a non-optional type
	c.bark()	// optional was successfully "unwrapped", c is now a non-optional type, and "bark" method can be called
}

In my opinion, Swift is a very "Optional first" type of language, meaning you often want to see if something "can be" an optional before you even consider a non-optional type, and I think this kind of thinking might also benefit C++… and maybe a good vector to do this would be through Cpp2 / cppfront.

If nothing else, the syntax sugar of using a declaration of type? to imply std::optional<type>, and support for optional chaining (c?.bark() instead of writing if (c) { c->bark(); }), would be tremendous in my opinion.

One other thing to consider about Optionals in general (as well as how they currently stand in Cpp2) is that the absence of a value is typically represented as null or nil or (in the case of C++) nullopt. I do agree with (and totally support) the idea of eliminating NULL in Cpp2, but I think the track you'll find here is similar to Swift, where nil (their version of null) only exists to support the "absence of a value in an optional."

Whether or not you wanted to go down a similar "unwrap" path would be up to you (and dependent on if you even wanted to go down this "syntax sugar" idea or not), but I think there's lots to learn from other languages that do optionals that C++ can't do without considerable effort that are worth considering here.

tl;dr:

- Consider treating optional types like you treat union -> std::variant
- Learn from other languages (C#, TypeScript, Swift), and provide syntax sugar:
  - For declaration: b: Int?
  - For chaining: c?.bark()
  - For defaults: b ?? 5
- Consider having a "null" but only for optional types (nullopt_t is okay, but maybe there's a better long-term naming solution)
- Consider advanced ideas like optional unwrapping from Swift
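To show what the sugar would be standing in for, here is a sketch of the same operations written by hand against std::optional. chain and height_or are names invented for this example. value_or plays the role of ??, and chain approximates ?. for a single step. C++23 adds transform and and_then member functions on std::optional that cover similar ground.

```cpp
#include <cassert>
#include <optional>
#include <string>

struct Dog {
    int height = 0;
    std::string bark() const { return "woof"; }
};

// Hypothetical helper approximating `opt?.member`:
// applies f only when a value is present, otherwise yields an empty optional.
template <typename T, typename F>
auto chain(const std::optional<T>& opt, F f) -> std::optional<decltype(f(*opt))> {
    if (opt) {
        return f(*opt);
    }
    return std::nullopt;
}

inline int height_or(const std::optional<Dog>& c, int fallback) {
    // Roughly `c?.height ?? fallback` in Swift.
    return chain(c, [](const Dog& d) { return d.height; }).value_or(fallback);
}
```

Every step of a longer chain like d?.data?.dog?.bark() needs another nested call in this style, which is the verbosity the proposed syntax would remove.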

Add operator<< for standard containers.

It is a very trivial task which I have run into every day for more than 5 years. In C++ we have to write something like

std::vector<int> v;
...
for (auto x : v) {
  std::cout << x << ' ';
}

If we could replace this with

std::cout << v;

it would also reduce language complexity. This is a very easy feature which exists in every modern programming language. But C++ does not have it :(
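For reference, this is the kind of overload people write by hand today. It is only a sketch for std::vector with a fixed space separator. to_text is a made-up helper that captures the output so it can be checked.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// User-provided overload: prints each element followed by a space.
template <typename T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& v) {
    for (const auto& x : v) {
        os << x << ' ';
    }
    return os;
}

// Hypothetical helper that captures the output in a string.
inline std::string to_text(const std::vector<int>& v) {
    std::ostringstream out;
    out << v;
    return out.str();
}
```

Part of the reason this is not in the standard is that there is no single obvious format (separator, brackets, nested containers), so any built-in version would have to pick one.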

trouble trying out vector iteration

I was curious to see if cpp2 improvements included iterator invalidation safety so I tried the following:

#include <vector>

main: () -> int = {
    v : std::vector = (1, 2, 3, 4, 5);

    i : int = 0;
    for (auto& item : v) {
        if i > 3 {
            v.resize(100);
        }
        i += 1;
    }
}

I got the for loop syntax from other regression tests but cppfront doesn't accept it. It appears to be complaining about imbalanced parens but AFAICT they are balanced. Errors:

~/cppfront/regression-tests$ ../cppfront resize-invalidation-check.cpp2  && g++-10 -fconcepts -std=c++2a -I../include resize-invalidation-check.cpp
resize-invalidation-check.cpp2...
resize-invalidation-check.cpp2(7,21): error: unexpected text - expression-list is not terminated by ) (at ':')
resize-invalidation-check.cpp2(7,23): error: expected valid range expression after 'for' (at 'v')
resize-invalidation-check.cpp2(7,24): error: expected ; at end of statement (at ')')
resize-invalidation-check.cpp2(7,24): error: invalid statement in compound-statement (at ')')
resize-invalidation-check.cpp2(7,24): error: ill-formed initializer (at ')')
resize-invalidation-check.cpp2(3,1): error: unexpected text at end of Cpp2 code section (at 'main')
resize-invalidation-check.cpp2(2,0): error: parse failed for section starting here

resize-invalidation-check.cpp2: In function ‘int main()’:
resize-invalidation-check.cpp2:12:15: error: request for member ‘resize’ in ‘v’, which is of non-class type ‘int’
   12 |     }
      |               ^

Also not sure what prevents call to resize... in generated code v is shadowed, not sure if this is incidental or some deliberate attempt to prevent what I'm doing?

parsing '*' fails due to a confusion between postfix and infix operator

The parser eagerly turns * into postfix operators, which is not always correct. Consider the example below:

foo : (a:int) -> auto = { return a*2; }

It fails to compile in cppfront with an error message (missing ';'). The error goes away when adding a space in front of the * because the parser then does not recognize it as a postfix operator. But insisting on whitespace in front of a binary operator seems questionable; it is a significant deviation from regular C++.

In general the postfix * and & operators are problematic, as they create an ambiguity with binary * and &. Resolving that requires either an LR(2) parser or a lot of lexer magic to recognize these as postfix operators. Having them as prefix operators would be easier.

There are also other ambiguities in the grammar, some of them probably fixable (e.g., template-argument -> expression | id-expression, but if the expression is an identifier the parser cannot distinguish these cases), some of them probably unfixable without major changes to the language (e.g., template syntax, where the parser cannot parse a<b,c>d correctly without knowing whether a is a template or not). I am not sure if it makes sense to open bugs for all of them. If you want I can report them, of course, but I do not want to spam the issue tracker with problems if you only consider the syntax experimental anyway. (On the other hand it is probably useful to know whether the suggested syntax works or not.)
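As an illustration of the "lexer magic" option, here is a sketch of a lookahead heuristic that does not depend on whitespace before the *. classify_star is a made-up name and this is not what cppfront does. It guesses from the first character after the *: if that character can start an operand, the * is read as binary.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

enum class star_kind { postfix, binary };

// Hypothetical heuristic: guess how to read a '*' from the text that follows it.
inline star_kind classify_star(std::string_view rest) {
    std::size_t i = 0;
    while (i < rest.size() && std::isspace(static_cast<unsigned char>(rest[i]))) {
        ++i;
    }
    if (i == rest.size()) {
        return star_kind::postfix;  // nothing follows, so it cannot be binary
    }
    const char c = rest[i];
    // An identifier, literal or parenthesis can start a right-hand operand.
    const bool starts_operand = std::isalnum(static_cast<unsigned char>(c)) != 0
                             || c == '_' || c == '(' || c == '"' || c == '\'';
    return starts_operand ? star_kind::binary : star_kind::postfix;
}
```

This reads a*2 as a multiplication and first*; as a dereference, but it still guesses in cases like f*(x) or a* -b, which is exactly the ambiguity described above.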

[BUG] Global variable treated as local variable by initialization safety guarantee rule

The initialization safety check reports a global variable as a local variable and says that it needs to be initialized before a local variable in the function.

i : int;

main: () -> int = {
    j : int;
    j = 42;
    i = 12;
}

Compile it with cppfront and we get an error:

cppfront % build/external/cppfront external/tests/global_variable.cpp2
external/tests/global_variable.cpp2...
global_variable.cpp2(5,5): error: local variable i must be initialized before j (local variables must be initialized in the order they are declared)
  ==> program violates initialization safety guarantee - see previous errors

Expected behavior: i is a global variable and (according to the error message) should not be considered in checking the order of initialization of the local variables.

[Wiki] Text unintentionally interpreted as HTML should be escaped

From Design-note:-Unambiguous-parsing.md:

-"if it can be an <earlier production>, it is.")
+"if it can be an \<earlier production>, it is.")

Currently, it looks like this:

[BUG] UFCS Not Working with Pointer Dereferencing

I was experimenting a little bit and discovered that UFCS does not work on a pointer that is being dereferenced, for example:

main : () -> int = {
	life := 42;
	p := life&;

	life.someFunction();
	p*.someFunction();  // Not UFCSed
}

someFunction : (x : int) -> int = {
	// Do something useful...
}

Will emit the following Syntax1 code:

// ----- Cpp2 support -----
#define CPP2_USE_SOURCE_LOCATION Yes
#include "cpp2util.h"


#line 1 "helloworld.cpp2"
[[nodiscard]] auto main() -> int;
#line 9 "helloworld.cpp2"
[[nodiscard]] auto someFunction(cpp2::in<int> x) -> int;

//=== Cpp2 definitions ==========================================================

#line 1 "helloworld.cpp2"
[[nodiscard]] auto main() -> int{
 auto life { 42 }; 
 auto p { &life }; 

 CPP2_UFCS_0(someFunction, life);
  *p .someFunction();  // Fails to compile!!
}

[[nodiscard]] auto someFunction(cpp2::in<int> x) -> int{
 // Do something useful...
}

Which fails to compile since we are trying to call someFunction as a member of p instead of running it through the UFCS system. I'm not entirely sure if this is intended behavior or not... but either way it is a little unintuitive.

How do you forward non-last-use argument?

How do you forward a non-last-use argument? It has to be done when destructuring aggregates. std::apply does this variadically.

#include <string>

void g(auto&& x);

struct pair_t {
  std::string x, y;
};

f: (forward pair : pair_t) = {
  g(pair.x);  // Not a forward. How do I forward?
  g(pair.y);  // OK
}
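For comparison, in today's C++ the usual answer is to forward the object and then access the member, because member access on a forwarded object keeps its value category. This is only a sketch of that idiom. The received() recorder is made up so the example can be checked, and g is given a body for the same reason.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct pair_t {
    std::string x, y;
};

// Hypothetical recorder so the example can be checked.
inline std::vector<std::string>& received() {
    static std::vector<std::string> items;
    return items;
}

template <typename T>
void g(T&& x) {
    // Copies from lvalues, moves from rvalues.
    received().push_back(std::forward<T>(x));
}

template <typename Pair>
void f(Pair&& pair) {
    // An rvalue pair yields rvalue members, an lvalue pair yields lvalue members.
    // Each call only touches its own member, so forwarding twice is fine here.
    g(std::forward<Pair>(pair).x);
    g(std::forward<Pair>(pair).y);
}
```

The open question for Cpp2 is how to spell the first call, since pair is not at its last use there.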

[DISCUSSION] type casting and correctness

Something I haven't seen addressed or talked about is how cppfront or the new cpp2 syntax will handle type casting. When I write code, I try to be very explicit and specific with my types, and that includes adding a ton of casts. I also do things such as turning on -pedantic and -Werror to catch any type casts that I might have missed.

Is there any proposal to make casting easier and simpler?

E.g., I've never understood why I might get a warning/error from a compiler if I do this:

int32_t a = 0;
int64_t b = a;

I can understand the cast from 64 -> 32 bits, since there is a loss in precision. But not from going 32 -> 64 bits since there is a growth in the precision.

Talking about the syntax for the moment, I've also had my share of colleagues who don't like writing out static_cast<> because they think "It's too long to type out when just a simple C-style cast suffices." Is there a syntactical change that could be made to make proper casting easier? E.g. reduce static_cast<> to scast<>?

I'm also not sure if the names of static_cast, dynamic_cast, reinterpret_cast and const_cast are that great, as when I first was learning C++ a decade+ ago I had trouble trying to understand the meaning of the assigned names.
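To make the 64 -> 32 case concrete, here is a sketch of a checked cast in the spirit of gsl::narrow from the Guidelines Support Library. checked_narrow is my own name and only meant for integer types. It does the static_cast and throws if the value did not survive the round trip.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: a static_cast that throws when the value changes.
template <typename To, typename From>
To checked_narrow(From value) {
    const To result = static_cast<To>(value);
    // The value must convert back unchanged...
    const bool changed = static_cast<From>(result) != value;
    // ...and must not silently change sign (signed/unsigned mixes).
    const bool sign_flipped = (result < To{}) != (value < From{});
    if (changed || sign_flipped) {
        throw std::runtime_error("narrowing lost information");
    }
    return result;
}
```

The widening direction (int32_t to int64_t) needs no helper at all, since the implicit conversion cannot lose information.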

[BUG] Short function syntax with multiple return values generates bad code.

Having a single expression function that uses multiple return values syntax compiles to bad cpp1 code.

fun: () -> (i:int) = i = 42;

Steps

tests % cppfront ufcs.cpp2 
ufcs.cpp2... ok (all Cpp2, passes safety checks)

tests % clang++ -std=c++20 -I../cppfront/include ufcs.cpp
tests/ufcs.cpp2:24:47: error: use of undeclared identifier 'i'
[[nodiscard]] auto fun() -> fun__ret { return i.construct(42); }
                                              ^
1 error generated.

Expected result

It should fail on cppfront or compile to correct code like:

[[nodiscard]] auto fun() -> fun__ret{
  cpp2::deferred_init<int> i;
  i.construct(42);
  return  { std::move(i.value()) }; 
}

Actual result/error

cppfront generates bad code that fails to compile:

[[nodiscard]] auto fun() -> fun__ret { return i.construct(42); }

Compiler version

tests % clang++ --version
Apple clang version 14.0.0 (clang-1400.0.29.102)
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Potential design flaw with unified function calls?

I really liked the fprintf demo in the cppcon talk, but either I'm missing something (more likely) or it simply cannot work (sad).
In cpp1+cpp2 mode, let's assume we have this code:

struct S
{
    int a;
};

get: (s: S) -> int = s.a;

Now we can call both:

a := get(s)
b := s.get()

So far so good. But let's say that S is actually defined in some library's header file, and the author decides it would be a good idea to add a member callable, be it a member function, lambda, std::function, function pointer, or whatever, that is also called get.

In cpp1, that would be totally fine. In cpp2 however, it seems to me that it would break everything. Even if you add a rule saying that a member callable takes precedence over the one we defined in the first bit of code, that would still mean that b's assignment now calls a different function, without us changing any of our code, only because of a library update that would be a-OK in cpp1. Backwards-compatibility broken, silently, and unexpected behavior that will be very fun to debug, I'm sure.

But it gets worse: if we define get as a template instead:

get<T>: (t: T) -> int = ...;

Part of the design goals, as far as I understand, is making cpp2 more toolable. But without concepts or something, Intellisense would probably show this as a member function of every single variable in your code.

Oh, and what about more basic stuff, like factorial? Cause this syntax looks kinda weird imo:

fact: (n: int) -> int = n ? (n * fact(n - 1)) : 1;

main: () -> int = {
    std::cout << 5.fact();  //do I really want Intellisense to recommend every function that takes an integer whenever I type an integer?
}

I don't know, maybe I'm missing something, but it seems like something to think about for sure.

regression test: pure2-type-safety segfaults on linux gcc11

Hi, thanks for sharing this exciting experiment! So far, this rekindles my joy of writing c++ :D

I'm facing some issues with gcc11 and some regression tests:

For sanity, hello world works:

hlindstrom ~/devel/cppfront (main)$ cppfront regression-tests/pure2-hello.cpp2 
regression-tests/pure2-hello.cpp2... ok (all Cpp2, passes safety checks)

hlindstrom ~/devel/cppfront (main)$ g++-11 -I/$HOME/devel/cppfront/include regression-tests/pure2-hello.cpp -std=c++20
hlindstrom ~/devel/cppfront (main)$ ./a.out 
Hello [world]

However, pure2-type-safety-1 emits:

hlindstrom ~/devel/cppfront (main)$ g++-11 -I/$HOME/devel/cppfront/include regression-tests/pure2-type-safety-1.cpp -std=c++20
pure2-type-safety-1.cpp2: In function ‘void print(cpp2::in<std::__cxx11::basic_string<char> >, cpp2::in<bool>)’:
pure2-type-safety-1.cpp2:34:23: error: ‘setw’ is not a member of ‘std’
In file included from regression-tests/pure2-type-safety-1.cpp:2:
//home/hlindstrom/devel/cppfront/include/cpp2util.h: In instantiation of ‘void cpp2::deferred_init<T>::construct(auto:45&& ...) [with auto:45 = {const char (&)[5]}; T = const char*]’:
pure2-type-safety-1.cpp2:32:28:   required from here
//home/hlindstrom/devel/cppfront/include/cpp2util.h:392:82: warning: placement new constructing an object of type ‘const char*’ and size ‘8’ in a region of type ‘std::aligned_storage<8, 8>’ and size ‘1’ [-Wplacement-new=]
  392 |     auto construct     (auto&& ...args) -> void { Default.expects(!init);  new (&data) T(std::forward<decltype(args)>(args)...);  init = true; }
      |                                                                                  ^~~~
//home/hlindstrom/devel/cppfront/include/cpp2util.h:380:49: note: ‘cpp2::deferred_init<const char*>::data’ declared here
  380 |     std::aligned_storage<sizeof(T), alignof(T)> data;
      |                                                 ^~~~
//home/hlindstrom/devel/cppfront/include/cpp2util.h: In instantiation of ‘void cpp2::deferred_init<T>::construct(auto:45&& ...) [with auto:45 = {const char (&)[6]}; T = const char*]’:
pure2-type-safety-1.cpp2:33:26:   required from here
//home/hlindstrom/devel/cppfront/include/cpp2util.h:392:82: warning: placement new constructing an object of type ‘const char*’ and size ‘8’ in a region of type ‘std::aligned_storage<8, 8>’ and size ‘1’ [-Wplacement-new=]
  392 |     auto construct     (auto&& ...args) -> void { Default.expects(!init);  new (&data) T(std::forward<decltype(args)>(args)...);  init = true; }
      |                                                                                  ^~~~
//home/hlindstrom/devel/cppfront/include/cpp2util.h:380:49: note: ‘cpp2::deferred_init<const char*>::data’ declared here
  380 |     std::aligned_storage<sizeof(T), alignof(T)> data;
      |                                                 ^~~~

Adding

#include <iomanip>

to the generated cpp file fixes the error but keeps the warnings.
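The warnings look like they come from declaring the member as std::aligned_storage<sizeof(T), alignof(T)> itself. That is a trait class whose nested ::type is the actual storage, so the object itself is an empty struct (hence "size '1'" in the message). A sketch of a storage member that is really sizeof(T) bytes, using a plain aligned byte array, follows. deferred_init_sketch is my own simplified class, not the real cpp2::deferred_init.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Sketch only: not the real cpp2::deferred_init.
template <typename T>
class deferred_init_sketch {
    alignas(T) std::byte data[sizeof(T)];  // raw storage large enough for one T
    bool init = false;

public:
    deferred_init_sketch() = default;
    deferred_init_sketch(const deferred_init_sketch&) = delete;
    deferred_init_sketch& operator=(const deferred_init_sketch&) = delete;
    ~deferred_init_sketch() { if (init) { value().~T(); } }

    // Constructs the T in place; may be called at most once.
    template <typename... Args>
    void construct(Args&&... args) {
        assert(!init);
        new (data) T(std::forward<Args>(args)...);
        init = true;
    }

    T& value() {
        assert(init);
        return *std::launder(reinterpret_cast<T*>(data));
    }
};

static_assert(sizeof(deferred_init_sketch<const char*>) >= sizeof(const char*));
```

Writing std::aligned_storage_t<sizeof(T), alignof(T)> instead would also give storage of the right size.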

Running the test segfaults:

Reading symbols from a.out...
(gdb) r
Starting program: /home/hlindstrom/devel/cppfront/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
                              d is int? false
*** stack smashing detected ***: terminated

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348137920) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) threads 
Undefined command: "threads".  Try "help".
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737348137920) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737348137920) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737348137920, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7b71476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7b577f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7bb86f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7d0a943 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7c6576a in __GI___fortify_fail (msg=msg@entry=0x7ffff7d0a92b "stack smashing detected") at ./debug/fortify_fail.c:26
#7  0x00007ffff7c65736 in __stack_chk_fail () at ./debug/stack_chk_fail.c:24
#8  0x0000555555557840 in print (msg=..., b=false) at /home/hlindstrom/devel/cppfront/pure2-type-safety-1.cpp2:35
#9  0x0000555555558189 in test_generic<double> (x=@0x7fffffffd708: 3.1400000000000001) at /home/hlindstrom/devel/cppfront/pure2-type-safety-1.cpp2:27
#10 0x0000555555557627 in main () at /home/hlindstrom/devel/cppfront/pure2-type-safety-1.cpp2:8

Discussion: Allowing "unsafe" features in explicit contexts

While I do understand and support the drive to avoid using unsafe features such as raw unions, pointer arithmetic, unsafe casts, etc., it is sometimes necessary to use these features, for example, when interfacing with low-level code or hardware, or simply when implementing a higher-level structure (e.g. a custom memory allocator or a hash table). For example, you may want to store trivial CPU-specific types in a union for intrinsics. While there are ways to replace unions and pointer arithmetic with variant and friends, in these low-level, tightly controlled situations they either are not applicable or require more overhead than necessary.

For these situations, you would have to implement your low-level code in syntax1, then use it in your syntax2 code. This might become very tedious and complicated, especially when you need to use the unsafe code in a header: you'd have to split your implementation into syntax1/syntax2 headers and would end up with something akin to my_table.hpp2 and my_table.hpp, with the user only being allowed to include the hpp2 file. Which, I think you'd agree, would just complicate design and maintenance.

As such, why not enable the developer to explicitly allow unsafe code in certain situations, such as via an unsafe keyword applied to structures, functions, variables or code blocks? Code and data tagged as unsafe would then be allowed to use unions, pointer arithmetic, and other low-level unsafe features, without having to separate safe/unsafe code among multiple files.

This way you can implement your embedded binary, hardware driver, custom heap allocator and whatever else you need in syntax2 files, while using "modern" and safe code, and only drop down to an unsafe context in very few situations with surgical precision.

[BUG] cpp2::deferred_init does not accept initialisation by external function

Local variables can be initialized by external functions that take an out argument. cppfront treats this as a violation of initialization safety and reports an error:

error: local variable i is used before it was initialized

fun: () -> (i:int) = {
    init(i);
    return;
}

init: (out i : int) = {
    i = 42;
}

After trying to compile that with cppfront we get:

cppfront % build/external/cppfront external/tests/defered.cpp2
external/tests/defered.cpp2...
defered.cpp2(2,10): error: local variable i is used before it was initialized
  ==> program violates initialization safety guarantee - see previous errors

I expect cppfront to accept a call to a function with an out argument as a proper initialization of a deferred_init variable (assuming there is a check that the out argument is assigned inside the function).

[BUG] Correct #line directives to point to cpp2 source when output option is used

After the output option was introduced, the #line directives point to the output file path instead of the cpp2 source path.

Cpp2 source file

i : int = 42;

Steps

1. build/external/cppfront external/tests/easy.cpp2 -o build/easy.cpp

build/easy.cpp content

// ----- Cpp2 support -----
#include "cpp2util.h"


#line 1 "build/easy.cpp2"

//=== Cpp2 definitions ==========================================================

#line 1 "build/easy.cpp2"
    int i { 42 };

Error & expected behaviour

The #line directive points to the file build/easy.cpp2, which does not exist. The expected path is external/tests/easy.cpp2.

Expected file content:

// ----- Cpp2 support -----
#include "cpp2util.h"


#line 1 "external/tests/easy.cpp2"

//=== Cpp2 definitions ==========================================================

#line 1 "external/tests/easy.cpp2"
    int i { 42 };

How to build with Mac Clang 12?

Hi,

I tried to build this awesome project, but got a disappointing error. Would you be able to help me?

cppfront 
❯ clang source/cppfront.cpp -std=c++20 -o cppfront
In file included from source/cppfront.cpp:18:
In file included from source/sema.h:21:
source/parse.h:150:5: error: use of class template 'String' requires template arguments; argument deduction not allowed in template parameter
    String   Name,
    ^~~~~~
source/common.h:211:8: note: template is declared here
struct String
       ^
In file included from source/cppfront.cpp:18:
In file included from source/sema.h:21:
source/parse.h:201:64: error: value of type 'const char [6]' is not implicitly convertible to 'int'
using is_as_expression_node          = binary_expression_node< "is-as"          , prefix_expression_node         >;
                                                               ^~~~~~~
source/parse.h:202:83: error: unknown type name 'is_as_expression_node'; did you mean 'id_expression_node'?
using multiplicative_expression_node = binary_expression_node< "multiplicative" , is_as_expression_node          >;
                                                                                  ^~~~~~~~~~~~~~~~~~~~~
                                                                                  id_expression_node
source/parse.h:110:8: note: 'id_expression_node' declared here
struct id_expression_node;
       ^
source/parse.h:202:64: error: value of type 'const char [15]' is not implicitly convertible to 'int'
using multiplicative_expression_node = binary_expression_node< "multiplicative" , is_as_expression_node          >;
                                                               ^~~~~~~~~~~~~~~~
source/parse.h:203:83: error: use of undeclared identifier 'multiplicative_expression_node'
using additive_expression_node       = binary_expression_node< "additive"       , multiplicative_expression_node >;
                                                                                  ^
source/parse.h:204:83: error: use of undeclared identifier 'additive_expression_node'
using shift_expression_node          = binary_expression_node< "shift"          , additive_expression_node       >;
                                                                                  ^
source/parse.h:205:83: error: unknown type name 'shift_expression_node'; did you mean 'id_expression_node'?
using compare_expression_node        = binary_expression_node< "compare"        , shift_expression_node          >;
                                                                                  ^~~~~~~~~~~~~~~~~~~~~
                                                                                  id_expression_node
source/parse.h:110:8: note: 'id_expression_node' declared here
struct id_expression_node;
       ^
source/parse.h:205:64: error: value of type 'const char [8]' is not implicitly convertible to 'int'
using compare_expression_node        = binary_expression_node< "compare"        , shift_expression_node          >;
                                                               ^~~~~~~~~
source/parse.h:206:83: error: use of undeclared identifier 'compare_expression_node'
using relational_expression_node     = binary_expression_node< "relational"     , compare_expression_node        >;
                                                                                  ^
source/parse.h:207:83: error: use of undeclared identifier 'relational_expression_node'
using equality_expression_node       = binary_expression_node< "equality"       , relational_expression_node     >;
                                                                                  ^
source/parse.h:208:83: error: use of undeclared identifier 'equality_expression_node'; did you mean 'binary_expression_node'?
using bit_and_expression_node        = binary_expression_node< "bit-and"        , equality_expression_node       >;
                                                                                  ^~~~~~~~~~~~~~~~~~~~~~~~
                                                                                  binary_expression_node
source/parse.h:153:8: note: 'binary_expression_node' declared here
struct binary_expression_node
       ^
source/parse.h:208:83: error: use of undeclared identifier 'equality_expression_node'
using bit_and_expression_node        = binary_expression_node< "bit-and"        , equality_expression_node       >;
                                                                                  ^
source/parse.h:209:83: error: use of undeclared identifier 'bit_and_expression_node'; did you mean 'binary_expression_node'?
using bit_xor_expression_node        = binary_expression_node< "bit-xor"        , bit_and_expression_node        >;
                                                                                  ^~~~~~~~~~~~~~~~~~~~~~~
                                                                                  binary_expression_node
source/parse.h:153:8: note: 'binary_expression_node' declared here
struct binary_expression_node
       ^
source/parse.h:209:83: error: use of undeclared identifier 'bit_and_expression_node'
using bit_xor_expression_node        = binary_expression_node< "bit-xor"        , bit_and_expression_node        >;
                                                                                  ^
source/parse.h:210:83: error: use of undeclared identifier 'bit_xor_expression_node'; did you mean 'binary_expression_node'?
using bit_or_expression_node         = binary_expression_node< "bit-or"         , bit_xor_expression_node        >;
                                                                                  ^~~~~~~~~~~~~~~~~~~~~~~
                                                                                  binary_expression_node
source/parse.h:153:8: note: 'binary_expression_node' declared here
struct binary_expression_node
       ^
source/parse.h:210:83: error: use of undeclared identifier 'bit_xor_expression_node'
using bit_or_expression_node         = binary_expression_node< "bit-or"         , bit_xor_expression_node        >;
                                                                                  ^
source/parse.h:211:83: error: use of undeclared identifier 'bit_or_expression_node'; did you mean 'binary_expression_node'?
using logical_and_expression_node    = binary_expression_node< "logical-and"    , bit_or_expression_node         >;
                                                                                  ^~~~~~~~~~~~~~~~~~~~~~
                                                                                  binary_expression_node
source/parse.h:153:8: note: 'binary_expression_node' declared here
struct binary_expression_node
       ^
source/parse.h:211:83: error: use of undeclared identifier 'bit_or_expression_node'
using logical_and_expression_node    = binary_expression_node< "logical-and"    , bit_or_expression_node         >;
                                                                                  ^
source/parse.h:212:83: error: use of undeclared identifier 'logical_and_expression_node'
using logical_or_expression_node     = binary_expression_node< "logical-or"     , logical_and_expression_node    >;
                                                                                  ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.

clang

❯ clang --version
Apple clang version 12.0.5 (clang-1205.0.22.11)
Target: x86_64-apple-darwin20.4.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

[BUG] parens are not preserved when used for calculating argument value

Parens disappear when used inside function call to calculate value of argument.

Given cpp2 code:

f: (x:_) = {}

main: () -> int = {
    i := 2;
    f((i+(i+1)*2)/2);
}

When compiled with cppfront we get (I am skipping the boilerplate)

[[nodiscard]] auto main() -> int{
    auto i { 2 }; 
    f( i + i + 1 * 2 / 2);
}

All parens inside the function call are gone, and (i+(i+1)*2)/2 is not equal to i + i + 1 * 2 / 2.

The expectation is that all parens used to calculate arguments survive cppfront compilation.
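To make the difference concrete, here is a small C++ check of the two expressions from the report. The function names are made up for illustration; only the expressions themselves come from the issue.

```cpp
// Expression as written in the Cpp2 source, with the parentheses intact.
constexpr int with_parens(int i) { return (i + (i + 1) * 2) / 2; }

// Expression as emitted once the parentheses have been dropped.
constexpr int without_parens(int i) { return i + i + 1 * 2 / 2; }

// For i == 2 the two expressions evaluate to different values.
static_assert(with_parens(2) != without_parens(2),
              "dropping the parentheses changes the result");
```

With i = 2, as in the report, the first form evaluates (2 + 6) / 2 and the second evaluates 2 + 2 + 1.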

[SUGGESTION] omit return where it is clear from the definition

On slide 74 in https://github.com/CppCon/CppCon2022/blob/main/Presentations/CppCon-2022-Sutter.pdf we have the example of initialization safety by naming the return values already in the definition

f:() -> (i: int, s: std::string) = {
  i = 1;
  s = "x";

  return;
}

We basically know this function returns i and s - and we guarantee initialization in the code with Cpp2 anyway.

Can't we just omit the empty return; then? What is the value of having to write return?

[question] ABI?

Not sure if you want these types of discussions here...but here goes nothing. I really like a lot about this effort. However, unless I'm missing something, it does not address the problem of ABI breaks in C++. Now, fixing many problems is better than fixing no problems...but it seems like a solution that will (paraphrasing) "keep C++ relevant for the next 30 years" needs to solve that problem. Will cpp2 introduce anything to address that...or does it ignore it?

Discussion: Defer Control Flow

The C stdio UFCS example got me thinking about a defer control flow construct, which would run the statement or block that follows it when the current scope is exited. This would allow the C stdio example to be rewritten as:

main: () -> int = {
    s: std::string = "Fred";
    myfile := fopen("xyzzy", "w");
    defer myfile.fclose(); // Run at the end of the scope
    myfile.fprintf( "Hello %s with UFCS!", s.c_str() );
}

Which doesn't do much in this example, but in more complicated cases where error handling gets involved it could become useful. I know RAII wrappers are the "idiomatic" C++ solution to this problem, but there are times (especially while prototyping) where writing a wrapper is unnecessarily clunky. In the past I have used some macro and lambda wizardry along with some scope guard libraries to implement almost this exact functionality in Syntax1. Thus implementing backend support for this type of feature in the cpp2 supplemental library should be straightforward...

But with all of these factors considered, would this sort of feature even be worth including in Syntax2? Or is it just another unnecessary feature to steal from Go for the sake of stealing it?
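For readers unfamiliar with the scope guard approach mentioned above, a minimal Syntax1 sketch might look like the following. The names scope_guard and run_with_guard are hypothetical and not part of cppfront or its support library; the point is only that a destructor can run an arbitrary callable at scope exit.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: runs the stored callable when the guard is destroyed.
template <typename F>
class scope_guard {
public:
    explicit scope_guard(F f) : action_(std::move(f)) {}
    scope_guard(const scope_guard&) = delete;
    scope_guard& operator=(const scope_guard&) = delete;
    ~scope_guard() { action_(); }

private:
    F action_;
};

// Records the order of events so the deferred action is observable.
inline std::vector<std::string> run_with_guard() {
    std::vector<std::string> log;
    {
        // Plays the role of: defer log.push_back("cleanup");
        scope_guard guard{[&log] { log.push_back("cleanup"); }};
        log.push_back("work");
    }  // guard is destroyed here, so "cleanup" is appended after "work"
    return log;
}
```

A language-level defer would presumably lower to something of this shape, which is why the backend support is described as straightforward.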

Multiple `in` arguments will generate exponential number of calls

Section A.4 of D0708, "Multiple in and/or forward in the same definite last use", has wording for avoiding an exponential number of function calls, but I don't think it achieves that.

In cpp2:

void func(auto&& w, auto&& x, auto&& y, auto&& z);

f: (in w : std::string, in x : std::string, in y : std::string, in z : std::string) = {
  func(w, x, y, z);
}

I would expect a 16-way switch here, as each of the four invented parameters of func will be deduced as either std::string or const std::string&. What is the utility of the four bullet-point subsumption language? You have to perform OR for each combination of argument value categories, and you're necessarily locked into the bad brute-force implementation.
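The counting argument can be illustrated in plain C++. The sketch below is not how cppfront lowers in parameters; category_mask and example_mask are hypothetical helpers that only show that each lvalue/rvalue combination of the arguments deduces a different set of template arguments, and therefore a different instantiation.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: sets bit k when the k-th argument is deduced as an
// lvalue reference. Every distinct mask corresponds to a distinct
// instantiation of a function taking (auto&&, auto&&, ...).
template <typename... Ts>
constexpr unsigned category_mask(Ts&&...) {
    unsigned mask = 0;
    unsigned bit = 1;
    ((mask |= (std::is_lvalue_reference_v<Ts> ? bit : 0u), bit <<= 1), ...);
    return mask;
}

// w and y are passed as rvalues (as at a definite last use), x and z as lvalues.
inline unsigned example_mask() {
    std::string w = "w", x = "x", y = "y", z = "z";
    return category_mask(std::move(w), x, std::move(y), z);
}
```

With four parameters the mask ranges over 2^4 = 16 values, which is the 16-way switch described above.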

[BUG] Unable to use enums in inspect expressions - `no matching function for call to 'is'`

The current implementation does not support enums in inspect expressions.

This cpp2 code

enum class lexeme : std::uint8_t {
    hash,
};

to_string: (e:lexeme) -> auto = {
    return inspect (e) -> std::string {
        is lexeme::hash = "hash";
        is _ = "INTERNAL_ERROR";
    };
}

compiles by cppfront to (skipping enum declaration):

[[nodiscard]] auto to_string(cpp2::in<lexeme> e) -> auto{
    return [&] () -> std::string { auto&& __expr = (e);
        if (cpp2::is<lexeme::hash>(__expr)) { if constexpr( requires{"hash";} ) if constexpr( std::is_convertible_v<CPP2_TYPEOF("hash"),std::string> ) return "hash"; else return std::string{}; else return std::string{}; }
        else return "INTERNAL_ERROR"; }
    ()
; }

and fails to compile with the cpp1 compiler with the error:

main.cpp2:26:13: error: no matching function for call to 'is'
[build]         if (cpp2::is<lexeme::hash>(__expr)) { if constexpr( requires{"hash";} ) if constexpr( std::is_convertible_v<CPP2_TYPEOF("hash"),std::string> ) return "hash"; else return std::string{}; else return std::string{}; }
[build]             ^~~~~~~~~~~~~~~~~~~~~~
[build] /Users/filipsajdak/dev/execspec/external/cppfront/include/cpp2util.h:517:6: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'C'

The issue is that an enum value cannot match any of the provided is function overloads. To handle that, we need a special overload for enums.
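One possible shape for such an overload is sketched below. This is an assumption about how it could be written, not the code in cpp2util.h: is_enum_value and lexeme_name are hypothetical names, and the second enumerator is added only so the fallback branch can be exercised.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

enum class lexeme : std::uint8_t { hash, other };

// Hypothetical helper (not part of cpp2util.h): true when e equals the
// enumerator passed as a non-type template argument.
template <auto Value, typename E,
          typename = std::enable_if_t<std::is_enum_v<E> &&
                                      std::is_same_v<decltype(Value), E>>>
constexpr bool is_enum_value(E e) {
    return e == Value;
}

// Hand-written equivalent of the inspect expression from the report.
inline std::string lexeme_name(lexeme e) {
    if (is_enum_value<lexeme::hash>(e)) { return "hash"; }
    return "INTERNAL_ERROR";
}
```

The key difference from the failing candidate is that the template parameter is a value (auto) rather than a type, so lexeme::hash is a valid explicit argument.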

[BUG] Expected outputs in tests need to be updated

Describe the bug
Two expected outputs are stale

mixed-function-expression-with-pointer-capture
mixed-test-parens

To Reproduce
See the failing workflow run here.

modern-cmake/cppfront#9

Additional context
The workflow doesn't show the diff of the failure, yet, unfortunately.

Possible oversight with postfix "type" keyword?

I have a bit of a concern about the postfix type and namespace keywords, both about consistency and implementation caveats.

The main gist of my concern is that with a postfix type keyword, a type's parents cannot be declared as they are in syntax 1, because the colon is already used for the keyword.

Of course it could be done via a Java-style extends (or similar) keyword, or via a second colon after the type. But that just seems like introducing visual noise into the syntax for the sake of having a postfix keyword.

Another concern is how template declarations and concept requirements would work. Would the template/requires keywords also come after the colon, like this?
my_type : template<typename T> requires (X) type
or like this
my_type : type template<typename T> requires (X)
This, at least to me, seems a bit noisy as well.

And finally - consistency: this way there will be prefix keywords (e.g. if, for, etc.), postfix keywords without a colon (e.g. noexcept) and postfix keywords with a colon (type). This, again, seems to me like additional syntactical noise for the sake of "using the shiny postfix".

Are there any valid reasons for the postfix keywords for types/namespaces that I just don't see? How would inheritance, templates, etc. work with the postfix keywords?

[BUG] Aggressively consuming copied variables

The issue is that the last use detection of variables doesn't take into account loops. If the last use is detected inside a loop body it will apply std::move() and it will be consumed on the first iteration.

wreakit: (copy s: std::string) -> std::string = {
    wreak: std::string = "it";
    i: int = 0;
    while i < 2 next i++ {
        wreak = s;
    }
    return wreak;
}

Generates the code:

[[nodiscard]] auto wreakit(std::string s) -> std::string{
    std::string wreak { "it" }; 
    int i { 0 }; 
    for( ; i < 2; ++i ) {
        wreak = std::move(s);
    }
    return wreak; 
}

The expected value returned by wreakit("ralph") is "ralph", but the actual value is "".
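The two lowerings can be compared side by side in plain C++. The function names below are hypothetical. The first mirrors the generated code from the report; the second is what one would expect when the last use sits inside a loop body.

```cpp
#include <string>
#include <utility>

// Mirrors the generated code: the move happens on every iteration, so the
// second iteration assigns from a string that has already been moved from.
inline std::string assign_in_loop_with_move(std::string s) {
    std::string result = "it";
    for (int i = 0; i < 2; ++i) {
        result = std::move(s);
    }
    return result;
}

// Expected lowering: copy inside the loop, so s is intact on every iteration.
inline std::string assign_in_loop_with_copy(std::string s) {
    std::string result = "it";
    for (int i = 0; i < 2; ++i) {
        result = s;
    }
    return result;
}
```

The standard only says a moved-from std::string is left in a valid but unspecified state, so the result of the first function depends on the library implementation; the empty string observed in the report is the typical outcome.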

[BUG] cpp2::to_string(std::optional<T>) is calling std::to_string

The cpp2::to_string overload for std::optional calls std::to_string on the contained value.

template<typename T>
auto to_string(std::optional<T> const& o) -> std::string {
    if (o.has_value()) {
        return std::to_string(o.value());
    }
    return "(empty)";
}

That makes the below code fail to compile:

v : std::optional<std::string> = "this will not work";
std::cout << "(v)$" << std::endl;

cppfront will generate

std::optional<std::string> v { "this will not work" }; 
std::cout << "My map: " + cpp2::to_string(v) + "\n" << std::endl;

The result is a call to std::to_string(std::string), and there is no such overload in the standard library.
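One way to avoid the problem is to have the optional overload call back into the same overload set instead of calling std::to_string directly. The sketch below uses a hypothetical namespace sketch and is not the actual cpp2util.h code; it only assumes that string and arithmetic overloads are declared before the optional one.

```cpp
#include <optional>
#include <string>
#include <type_traits>

namespace sketch {

// Strings are returned unchanged.
inline std::string to_string(std::string const& s) { return s; }

// Arithmetic types are delegated to the standard library.
template <typename T, typename = std::enable_if_t<std::is_arithmetic_v<T>>>
std::string to_string(T const& t) {
    return std::to_string(t);
}

// Optionals recurse into the overload set above, so optional<std::string>
// no longer ends up calling std::to_string(std::string).
template <typename T>
std::string to_string(std::optional<T> const& o) {
    return o.has_value() ? to_string(o.value()) : std::string{"(empty)"};
}

}  // namespace sketch
```

With this arrangement both optional<int> and optional<std::string> resolve to a viable overload for the contained value.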

Scoping and overloading

Hi Mr. Sutter. Really loved the talk (finished watching today) and appreciate your work on the language and C++ in general.

About 3 minutes ago I just got off of a Skype call with my brother. He's still new to C++, whereas I've been with the language for around 14-ish years. He was having some trouble with a segfault in a very simple project and it took me a while to figure out what the exact issue was. This is definitely something I've shot myself in the foot with quite a bit; or at least it created all sorts of headaches when it came to debugging.

In short, here's what we had up:

ComplexWidget.hpp:

class ComplexWidget : public QWidget {

public:
    ComplexWidget();

private:
    QLineEdit *m_numberValue;
};

ComplexWidget.cpp:

ComplexWidget::ComplexWidget() : ...
{
    QLineEdit *m_numberValue = new QLineEdit("");
}

...

void ComplexWidget::onButtonClicked()
{
    int n = m_numberValue->text().toInt();
}

Whenever he pressed his button, hoping to get the value from m_numberValue and parse it to an integer, we were treated to a segmentation fault. It wasn't until after some debugging that I spotted what the true problem was. In the constructor, he re-declared m_numberValue. I told him to get rid of the QLineEdit * in the constructor and then his program worked, with no segmentation faults. This actually took me a little while to hunt down as well. Not once, when we were compiling, was there any warning about this. This is an issue related to scoping.

As for what to do to prevent this, making things safer, my proposal is: Don't allow programmers to re-declare variable names. Give a compilation error instead. I'm sure some other rules could be added to this.
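The same mistake can be reproduced without Qt. In the sketch below, buggy_widget and fixed_widget are hypothetical stand-ins for the class above, and std::unique_ptr is used instead of a raw pointer so the example neither leaks nor crashes.

```cpp
#include <memory>

struct buggy_widget {
    buggy_widget() {
        // Declares a NEW local variable that shadows the member of the same
        // name; the member below is never assigned.
        std::unique_ptr<int> number = std::make_unique<int>(7);
        (void)number;
    }
    std::unique_ptr<int> number;  // still null after construction
};

struct fixed_widget {
    fixed_widget() {
        // No type in front of the name: this assigns to the member.
        number = std::make_unique<int>(7);
    }
    std::unique_ptr<int> number;
};
```

GCC and Clang provide a -Wshadow option intended to flag this kind of declaration, but it is not enabled by default, which matches the experience of seeing no warning.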

[SUGGESTION] in cpp2 mode, string literals should be string_literal by default

Raw pointers are fundamentally problematic if we have to do pointer arithmetic or offset based access. A major source for pointers, that we inherited from C, are strings. There are safer alternatives to C strings (i.e., string_view), but they are not used by default.
We still need classic C strings for compatibility reasons, but the default should be a safe construct.

Thus, I would suggest that a string literal in cpp2 mode is considered a string_view by default. Traditional C strings could be constructed by using, e.g., the c suffix:

cpp2: "abc" -> cpp: "abc"sv
cpp2: "abc"c -> cpp: "abc"

That would make string handling much cleaner and safer. Compatibility with existing code is a concern, but as cpp2 code is by definition new anyway, we can always add the 'c' suffix if needed to get a traditional C string.
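A small C++ comparison shows one practical difference between the two literal forms. The function names are hypothetical; the sv suffix is the standard library's std::string_view literal.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

using namespace std::string_view_literals;

// A C string has no stored length: strlen scans for the first '\0'.
inline std::size_t length_as_c_string() { return std::strlen("ab\0cd"); }

// A string_view literal carries its length, so the embedded '\0' is kept.
inline std::size_t length_as_string_view() { return "ab\0cd"sv.size(); }
```

Because the view knows its own size, code that consumes it does not need pointer arithmetic or a terminator scan to find the end.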

Discussion: variables const by default

Have you considered making variables const by default? I have seen it float around as a recommendation and personally am of the "const everything" camp. It encourages more careful code, catches many errors and makes code more readable (if I see the same variable later down the line, I know it has the same value and don't need to check every line in between).

a: int = 3;
a = 5;  // error! implicit const

b: mutable int = 3;
b = 5;  // fine

[BUG] `std const::string`

main : () -> int = {
        i : const int = 0;
        s : const std::string = "Hello, world!";
}

gets turned into

[[nodiscard]] auto main() -> int{
 int  const i { 0 }; 
 std const::string s { "Hello, world!" }; 
}

[BUG] Lambda that use capture `i$` with lambda argument `n` inside if statement failed with assertion

Using a capture inside an if condition fails for some operand orders.

In this cpp2 code cppfront fails on if i$ + n < std::ssize(line$):

lex_process_line: (line: std::string) = {
    i := 0;
    peek := :(n : int) -> char = {
        if i$ + n < std::ssize(line$) {} // it triggers assertion
        return '\0';
    };
}

https://godbolt.org/z/bTvzW4P6z

On Compiler Explorer it fails with the error:

cppfront: source/cppfront.cpp:1561: void cpp2::cppfront::emit(cpp2::postfix_expression_node&, bool): Assertion `n.expr' failed.
Compiler returned: 139

On my setup, it fails with the error:

Assertion failed: (cap.capture_expr->cap_grp == &captures), function build_capture_lambda_intro_for, file cppfront.cpp, line 1495.

The issue can be solved by changing the order of variables in the if statement:

lex_process_line: (line: std::string) = {
    i := 0;
    peek := :(n : int) -> char = {
        if n + i$ < std::ssize(line$) {} // works
        return '\0';
    };
}

https://godbolt.org/z/h8PGEs39K

by using parentheses:

lex_process_line: (line: std::string) = {
    i := 0;
    peek := :(n : int) -> char = {
        if (i$ + n) < std::ssize(line$) {} // works
        return '\0';
    };
}

https://godbolt.org/z/4v7YxevY7

Replacing n with the constant also works:

lex_process_line: (line: std::string) = {
    i := 0;
    peek := :(n : int) -> char = {
        if i$ + 2 < std::ssize(line$) {} // works
        return '\0';
    };
}

https://godbolt.org/z/hcefPbers

My expectation is that cppfront does not crash. It should work with various orders of variables or should show the error message if there is some preferred order.

[SUGGESTION] Add a -o flag to specify the output file

Currently cppfront seems to write its output .cpp file next to the input .cpp2 file. This makes it difficult to integrate with setups that have separate build and source directories. Please add a -o command line argument so the output path can be specified from the outside.

Typo in `is`

I am sure the code in

cppfront/include/cpp2util.h

Lines 539 to 543 in fd0c99d

    
template< typename C, typename X >
    requires (std::is_base_of_v<X, C> && !std::is_same_v<C,X>)
auto is( X const* x ) -> bool {
    return dynamic_cast<C const&>(x) != nullptr;
}

should cast to a pointer rather than to a reference:

  return dynamic_cast<C const*>(x) != nullptr;
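A self-contained version of the corrected check might look like this. The class hierarchy and the name is_pointer_to are hypothetical and only serve to exercise the cast; the static_assert stands in for the requires clause of the original.

```cpp
#include <type_traits>

struct base { virtual ~base() = default; };
struct derived : base {};
struct other : base {};

// Pointer form of the check: dynamic_cast to a pointer type yields nullptr
// when *x is not a C, which is what the comparison relies on.
template <typename C, typename X>
bool is_pointer_to(X const* x) {
    static_assert(std::is_base_of_v<X, C> && !std::is_same_v<C, X>,
                  "C must be a class strictly derived from X");
    return dynamic_cast<C const*>(x) != nullptr;
}
```

A reference cast would signal failure by throwing std::bad_cast rather than by producing a null value, so comparing its result against nullptr cannot express the intended test.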

[BUG] Negation not working on bools and functions that return bools

Exclamation signs disappear when compiled by cppfront.

Cpp2 code:

main: () -> int = {
    b : bool = false;

    c := !b;

    while !b || !is_true() {
        b = !true;
    }
}

is_true: () -> bool = false;

After compilation with cppfront (I will limit the code to the lines that matter):

[[nodiscard]] auto main() -> int{
    bool b { false }; 

    auto c { b }; 

    while( b ||  is_true() ) {
        b = true;
    }
}

All exclamation signs disappeared - no way to negate the boolean expression.

Expected result:

[[nodiscard]] auto main() -> int{
    bool b { false }; 

    auto c { !b }; 

    while( !b ||  !is_true() ) {
        b = !true;
    }
}

Enable discussions?

Just a suggestion to consider enabling the GitHub Discussions feature on this cppfront repo.

A stated project aim is to start a discussion, and this could be a quick way to kickstart things. It could be convenient for now. The Issues tab has a different connotation and feels like a barrier to contributions that aim at a collaborative conversation.

(I don't have much experience of Discussions pros and cons.)

Discussion: class member functions should be const by default

I know that classes are still not supported in cppfront, but that makes this discussion even more relevant.

One of the things that lambda functions (arguably) get right is making their operator() const by default and requiring mutable if you want to change data from the lambda.

This should be done also for cpp2 classes. This has the advantages of (1) making lambdas less of a special case, and (2) making const-correct code the default.

This is kind of related to issue #25 .

[BUG] Add support for defining pointer to pointers.

Hi, I am very interested in this project's future!

Cppfront fails to compile this code

main: () -> int = {
    a:     int = 2;
    pa:   *int = a&;
    ppa: **int = pa&;
    return a*pa**ppa**; // 8
}

As of now, ppa: **int = pa&; does not work.

[BUG] the grammar comments are not always correct

Out of curiosity I have implemented an alternative parser for cppfront / cpp2, which uses a PEG grammar as input for a parser generator. During that experiment, I noticed that the grammar rules embedded as //G comments are not always correct. I list the errors that I noticed below.

One preliminary note: The cppfront compiler has a rather relaxed concept of keywords. In most cases it will accept a keyword where an identifier is expected; for example, it will happily compile if: () -> void = { }. I don't think that is a good idea, so my grammar explicitly distinguishes between keywords and identifiers (modulo the few context-specific soft keywords like in/out etc.). For some grammar rules that requires changes where the parser previously worked by accident (i.e., by not recognizing a certain keyword).

a) id_expression

    //G id-expression
    //G     unqualified-id
    //G     qualified-id
    //G

here the order is wrong, it should be

    //G id-expression
    //G     qualified-id
    //G     unqualified-id    
    //G

b) primary_expression

    //G primary-expression:
    //G     literal
    //G     ( expression-list )
    //G     id-expression
    //G     unnamed-declaration
    //G     inspect-expression
    //G

This does not correspond to the source code order. Furthermore, the expression-list is optional. And if we distinguish keywords from literals, we potentially need some extra rules to handle keywords that are currently silently eaten as identifiers. I would suggest

    //G primary-expression:
    //G     inspect-expression
    //G     id-expression
    //G     literal
    //G     '(' expression-list? ')'
    //G     unnamed-declaration
    //G     'nullptr'
    //G     'true'
    //G     'false'
    //G     'typeid' '(' expression ')'
    //G     'new' < id-expression > '(' expression-list? ')'

c) nested-name-specifier

    //G nested-name-specifier:
    //G     ::
    //G     unqualified-id ::

this has to support nested scopes. I would suggest

    //G nested-name-specifier:
    //G     :: (unqualified-id ::)*
    //G     (unqualified-id ::)+

d) template-argument

    //G template-argument:
    //G     expression
    //G     id-expression

There should be a comment here that we disable '<'/'>'/'<<'/'>>' in the expressions until a new parenthesis is opened. In fact, that causes some of the expression rules to be cloned until we reach the level below these operators. (In my implementation these are the rules with the suffix _no_cmp.)

e) id-expression from fundamental types

We want to accept builtin types like int as type ids. Currently this works by accident because the parser does not even recognize these as keywords. When enforcing that keywords are not identifiers we need rules for these, too. I have added a fundamental-type alternative at the end of id-expression, and have defined it as follows:

fundamental-type
  'void'
  fundamental-type-modifier-list? 'char'
  'char8_t'
  'char16_t'
  'char32_t'
  'wchar_t'
  fundamental-type-modifier-list? 'int'
  'bool'
  'float'
  'double'
  'long' 'double'
  fundamental-type-modifier-list

fundamental-type-modifier-list
  fundamental-type-modifier+

fundamental-type-modifier
  'unsigned'
  'signed'
  'long'
  'short'

[BUG] using N:namespace = ... suppress creation of struct for multiple return values

I know that namespaces are not supported yet by cppfront. Unfortunately, a namespace can still be written following the l-to-r approach: cppfront accepts it, but the generated code fails to compile with the cpp1 compiler.

N: namespace = fun: () -> (a:int) = {
    a = 42;
    return;
}

Cppfront compiles fine but generates code that will not compile as it doesn't have a struct definition for multiple return values.

// ----- Cpp2 support -----
#include "cpp2util.h"



//=== Cpp2 definitions ==========================================================

   namespace N { [[nodiscard]] auto fun() -> fun__ret{
                       cpp2::deferred_init<int> a;
    a.construct(42);
    return  { std::move(a.value()) }; 
} };

https://godbolt.org/z/vr51TExGG

No definition of fun__ret.

Expected behavior: cppfront forbids namespaces/struct/class to be used like that or generates correct code.

Caveat: There's little else in the C stdlib that allocates a resource

@hsutter regarding your question:

cppfront/include/cpp2util.h

Line 809 in bf5998a

    
           //  ... is that it? I don't think it's useful to provide a c_raii just for fopen,

I am currently trying to use cppfront to build my project (that's why I send PRs - I am implementing what I am missing).

I use something like your c_raii (I call it scope_exit and I have a function on_scope_exit that produces it) for handling popen (here I am using UFCS chaining from here: #18):

execute: (cmd : std::string) -> auto = {
    std_out := popen(cmd.c_str(), "r").on_scope_exit(pclose);
    buf : std::array<char, 1024> = ();

    output : std::string = ();

    read_size := fread(buf.begin(), 1, buf.size(), std_out);
    while read_size > 0 next read_size = fread(buf.begin(), 1, buf.size(), std_out) {
        output += std::string(buf.begin(), read_size); 
    }
    return output;
}

and to remove temporary directories on scope exit

create_temporary_directory: () -> auto = {
    tmp := fs::temp_directory_path();
    build_dir := on_scope_exit(tmp / "md_tests/build", :(p : fs::path) = { 
        fs::remove_all(p); 
        std::cout << "removed tmp directory: (p)$" << std::endl;
    } );
    fs::create_directories(build_dir);
    std::cout << "created tmp directory: (fs::path(build_dir))$" << std::endl;
    return build_dir;
}

My current implementation of on_scope_exit():

template <typename T, typename D>
struct scope_exit {
    scope_exit(T v, D d) 
        : value(v)
        , deleter(d)
    {
    }

    ~scope_exit() {
        deleter(value);
    }

    operator T&() { return value; }

private:
    T value;
    D deleter; 
};

on_scope_exit: (forward v : _, forward d : _) -> auto = {
    return scope_exit(v,d);
}

I will check your implementation and will let you know.

[BUG] UFCS doesn't work with class/struct member variables

UFCS is not triggered when a function is called as a class method on a class member variable.

main: () -> int = {
    p := std::pair(1,2);
    p.first.ufcs(); // bad! compiles to p.first.ufcs() instead of CPP2_UFCS_0(ufcs, p.first)
}

The current implementation handles member function call syntax only on a single variable. Below are some examples of what works and what does not, plus what changes with #18.

main: () -> int = {
    i := 42;
    i.ufcs();              // works

    j := fun();
    j.i.ufcs();            // doesn't work

    fun().i.ufcs();        // doesn't work

    k := fun().i;
    k.ufcs();              // works

    get_i(j).ufcs();       // works with https://github.com/hsutter/cppfront/pull/18

    get_i(fun()).ufcs();   // works with https://github.com/hsutter/cppfront/pull/18

    res := int(42).ufcs(); // works with https://github.com/hsutter/cppfront/pull/18

    int(j.i).ufcs();       // works with https://github.com/hsutter/cppfront/pull/18
}

ufcs: (i:int) -> auto = {
    return i+2;
}

fun: () -> (i:int) = {
    i = 42;
    return;
}

get_i: (r:_) -> auto = {
    return r.i;
}

I know that UFCS is still a work in progress; I am reporting this to track it. I ran into the issue while working on my small project: I was returning variables using the multiple return values syntax and was trying to trigger UFCS on one of the member variables.
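For context, the dispatch that UFCS has to perform can be sketched in plain C++ for a single function name. This is an illustration under assumptions, not the CPP2_UFCS macro: has_member_ufcs, call_ufcs and with_member are hypothetical, and the sketch handles only a function called ufcs with no extra arguments.

```cpp
#include <type_traits>
#include <utility>

// Free function from the report.
inline int ufcs(int i) { return i + 2; }

// A type that does have a member named ufcs, for comparison.
struct with_member {
    int ufcs() const { return 100; }
};

// Detects whether obj.ufcs() is a valid expression for a given T.
template <typename T, typename = void>
struct has_member_ufcs : std::false_type {};

template <typename T>
struct has_member_ufcs<
    T, std::void_t<decltype(std::declval<T const&>().ufcs())>>
    : std::true_type {};

// Prefer the member function when it exists, else call the free function.
template <typename T>
int call_ufcs(T const& obj) {
    if constexpr (has_member_ufcs<T>::value) {
        return obj.ufcs();
    } else {
        return ufcs(obj);
    }
}
```

The object expression can be anything, including a member access such as p.first, which is the case the report says is currently emitted as a plain member call.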

[SUGGESTION] Namespaces

I want to use namespaces in the cpp2. Do you have any plan for syntax that will be used for it?

I know that the code is identified as cpp2 by

//  Switch to cpp2 mode if we're not in a comment, not inside nested { },
//  and the line starts with "nonwhitespace :" but not "::"

Based on my current experience with cppfront I try:

N : namespace = {
    fun: (i : _) -> auto = { return i * 10; }
}

it produces

namespace N { {
    [[nodiscard]] auto fun(auto const& i) -> auto{return i * 10; }
} };

close enough - too many braces. (https://godbolt.org/z/f61h9W8Mn)

To my surprise, this worked

N : namespace = fun: (i : _) -> auto = { return i * 10; }

and produces

namespace N { [[nodiscard]] auto fun(auto const& i) -> auto{return i * 10; } };

(https://godbolt.org/z/nWWbs7d3K)

In lower_to_cpp1() it accidentally matches

//  Object with optional initializer
else if (!printer.doing_declarations_only() && n.is(declaration_node::object))

the fun part is that using the same pattern we can define a struct or a class

N : struct = i:int = 42;

that produces

struct N { int i { 42 };  };

Voilà! cppfront has class support ;) (https://godbolt.org/z/89aPas1nr)

Probably the syntax should be

N : namespace = {
    fun: (i : _) -> auto = { return i * 10; }
}

What do you think?

I am trying to introduce the change in the cppfront that enables it but I need to spend some time to understand how it works to be able to spot the right place.

PS: I didn't know what type of issue to select - sorry for making a mess.

[question] Why Cant we use the c# syntax instead of new syntax

Millions of developers are already familiar with C# syntax. There are definitely a few differences, but I think it would be worth it if, one day, millions of C# developers found that they were able to code in C++.

[BUG] Duplicate symbols from cpp2util.h

Describe the bug
In cpp2util.h file there are duplicated symbols of functions:

cpp2::to_string
cpp2::report_and_terminate
cpp2::fopen

To Reproduce
Steps to reproduce the behavior:

Sample code - distilled down to minimal essentials please
main.cpp2 file:

#include <iostream>

main: () -> int = {
    std::cout << "Running my new app\n" << std::endl;
}

test.cpp2 file:

#include <string>

message: () -> std::string = {
    return "hello from the cpp2";
}

Command lines including which C++ compiler you are using

src % clang++ --version
Apple clang version 14.0.0 (clang-1400.0.29.102)
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

src % cppfront main.cpp2 test.cpp2
main.cpp2... ok (mixed Cpp1/Cpp2, Cpp2 code passes safety checks)

test.cpp2... ok (mixed Cpp1/Cpp2, Cpp2 code passes safety checks)

src % clang++ -std=c++20 -Icppfront/include main.cpp test.cpp 
duplicate symbol 'cpp2::to_string(...)' in:
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/main-b0e56a.o
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/test-c3d2ea.o
duplicate symbol 'cpp2::report_and_terminate(std::__1::basic_string_view<char, std::__1::char_traits<char> >, char const*)' in:
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/main-b0e56a.o
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/test-c3d2ea.o
duplicate symbol 'cpp2::fopen(char const*, char const*)' in:
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/main-b0e56a.o
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/test-c3d2ea.o
duplicate symbol 'cpp2::to_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)' in:
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/main-b0e56a.o
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/test-c3d2ea.o
ld: 4 duplicate symbols for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Expected result - what you expected to happen
A program should link and produce executable
Actual result/error

duplicate symbol 'cpp2::to_string(...)' in:
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/main-b0e56a.o
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/test-c3d2ea.o
duplicate symbol 'cpp2::report_and_terminate(std::__1::basic_string_view<char, std::__1::char_traits<char> >, char const*)' in:
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/main-b0e56a.o
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/test-c3d2ea.o
duplicate symbol 'cpp2::fopen(char const*, char const*)' in:
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/main-b0e56a.o
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/test-c3d2ea.o
duplicate symbol 'cpp2::to_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)' in:
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/main-b0e56a.o
    /var/folders/52/hk4pdfpj0qx83161z_d05ckm0000gp/T/test-c3d2ea.o
ld: 4 duplicate symbols for architecture arm64

Additional context
I still need to check whether there are other duplicated symbols that are currently excluded by preprocessor rules.
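
The linker output above is what one would expect if the reported functions are defined in a shared header (presumably cpp2util.h, which both generated files include) without `inline`. That cause is an assumption, not something the report confirms. The sketch below uses a hypothetical namespace `sketch` and a hypothetical function `describe` to show the usual remedy: marking header-defined functions `inline` so that identical definitions from several translation units are merged.

```cpp
// Hypothetical header-style helpers. `sketch` and `describe` are illustrative
// names only; they are not part of cpp2util.h.
#include <string>

namespace sketch {

// A non-template function defined in a header WITHOUT `inline` is emitted as a
// separate external definition in every translation unit that includes the
// header, so linking two such object files reports a duplicate symbol.
// With `inline`, the linker is allowed to merge the identical definitions.
inline auto describe(std::string const& s) -> std::string {
    return s;
}

// Catch-all overload, marked `inline` for the same reason.
inline auto describe(...) -> std::string {
    return "(no overload)";
}

} // namespace sketch
```

If this header were included from both main.cpp and test.cpp, the two object files should link without duplicate-symbol errors, because every definition is `inline`.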

[BUG] Using multiple return inside cpp2 function generates code that calls value()

cppfront adds a call to the value() method when accessing one of the values returned from a function that returns multiple values. This happens when the function is called from another Cpp2 function.

cppfront compiles the code below

vals: () -> (i : int) = {
    i = 42;
    return;
}

main: () -> int = {
    v := vals();
    v.i;
}

int cpp1fun() {
    v = vals();
    v.i;
}

into

// ----- Cpp2 support -----
#include "cpp2util.h"


struct vals__ret {
    int i;
    };
[[nodiscard]] auto vals() -> vals__ret;
[[nodiscard]] auto main() -> int;

int cpp1fun() {
    v = vals();
    v.i;
}

//=== Cpp2 definitions ==========================================================

[[nodiscard]] auto vals() -> vals__ret{
        cpp2::deferred_init<int> i;
    i.construct(42);
    return  { std::move(i.value()) }; 
}

[[nodiscard]] auto main() -> int{
    auto v { vals() }; 
    v.i.value();
}

In the generated main() function, v.i.value() does not compile because int has no value() member function.
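
For comparison, here is a hand-written sketch of the lowering one would expect. The member of the returned struct is a plain int, so the caller should read it directly. The names `vals_ret` and `use_vals` are hypothetical stand-ins (the latter replaces main so the snippet can be called from a test), and the body of `vals` is simplified to avoid depending on cpp2util.h.

```cpp
// Hypothetical stand-in for the generated vals__ret struct.
struct vals_ret {
    int i;
};

// Simplified stand-in for the generated vals(): returns the named value in a struct.
[[nodiscard]] auto vals() -> vals_ret {
    int i = 42;
    return { i };
}

// Stand-in for the generated main(): the member is accessed directly,
// with no .value() call, because vals_ret::i is a plain int.
[[nodiscard]] auto use_vals() -> int {
    auto v { vals() };
    return v.i;
}
```

The .value() call is only meaningful on the cpp2::deferred_init<int> local inside vals() itself, not on the member of the returned struct.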

[BUG] order-independence of functions fails when using default arguments

While implementing semantic analysis I noticed a conceptual problem with the order-independence approach of cpp2. Consider this code here:

foo: (x : int = bar()) -> int = { return 3*x+1; }
bar: (x : int = 2) -> int = { return x/2; }
main: () -> int = { return foo(); }

cppfront happily accepts that code, but it then fails in C++ because the order of the forward declarations is wrong. Swapping the order of foo and bar fixes that problem. (Modulo another bug: the default argument is repeated at the declaration site, which also makes the C++ compiler unhappy.)
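
A hand-written C++ sketch of a declaration order that compiles for this example is shown below. It is not actual cppfront output. It relies on two C++ rules: a name used in a default argument must already be declared at that point, and a default argument may be specified only once across the declarations of a function in the same scope.

```cpp
// bar must be declared before it is named in foo's default argument.
auto bar(int x = 2) -> int;
auto foo(int x = bar()) -> int;

// The definitions do not repeat the default arguments; repeating them
// would be a redefinition error.
auto foo(int x) -> int { return 3 * x + 1; }
auto bar(int x) -> int { return x / 2; }
```

With these declarations, foo() evaluates bar() first, which yields 2 / 2 = 1, and then returns 3 * 1 + 1 = 4.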

One question is what the intended semantics should be. Can default arguments reference functions that are further down in the translation unit? Intuitively, if we fully bet on order-independence, the answer should be yes. On the other hand, implementing that is actually quite subtle. Note that it is not always possible to find a topological order, for example here:

foo: (x : int = bar()) -> int = { return 3*x+1; }
bar: (x : int = foo()) -> int = { return x/2; }
main: () -> int = { return foo(); }

Clearly that must trigger an error. We could detect this cyclic dependency, but doing so increases the complexity of the compiler. Another question: is the compiler required to analyze default arguments if they are never evaluated? If not, we could simply translate them lazily on demand and check for cycles during expansion. (Lowering that to C++ will be a nightmare, though: this approach only works if default arguments are expanded by cppfront itself and not by the underlying Cpp1 compiler.)

Note that we will have exactly the same problem with auto return types. These also require analyzing the function itself, which might depend on other functions further down.

I see basically two options. Either we forbid this, requiring that functions are defined before they are used as default arguments or with auto return types. Or we make function analysis lazy, which requires the compiler to topologically order functions according to their dependencies, producing an error if that is not possible.
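
The second option can be sketched as a depth-first topological sort with cycle detection. This is a minimal illustration, not cppfront's implementation. The `DepGraph` type, the `Mark` enum, and the functions `visit` and `declaration_order` are all hypothetical names, and the graph is assumed to map each function name to the functions its default arguments (or deduced return type) depend on.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical dependency graph: function name -> functions it depends on.
using DepGraph = std::map<std::string, std::vector<std::string>>;

enum class Mark { unvisited, in_progress, done };

// Depth-first visit. Appends `name` to `order` after all of its dependencies.
// Returns false if a dependency cycle is reachable from `name`.
inline bool visit(std::string const& name, DepGraph const& deps,
                  std::map<std::string, Mark>& marks,
                  std::vector<std::string>& order)
{
    Mark const current = marks[name];  // new entries start as Mark::unvisited
    if (current == Mark::done)        return true;
    if (current == Mark::in_progress) return false;  // back edge: cycle

    marks[name] = Mark::in_progress;
    if (auto it = deps.find(name); it != deps.end()) {
        for (auto const& dep : it->second) {
            if (!visit(dep, deps, marks, order)) return false;
        }
    }
    marks[name] = Mark::done;
    order.push_back(name);
    return true;
}

// Returns an order in which every function appears after its dependencies,
// or std::nullopt if the dependencies are cyclic.
inline std::optional<std::vector<std::string>>
declaration_order(DepGraph const& deps)
{
    std::map<std::string, Mark> marks;
    std::vector<std::string> order;
    for (auto const& entry : deps) {
        if (!visit(entry.first, deps, marks, order)) return std::nullopt;
    }
    return order;
}
```

For the first example above (foo depends on bar), this yields an order with bar before foo. For the second example (foo and bar depend on each other), it returns std::nullopt, which is the point at which a compiler would report the cycle as an error.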

template< typename C, typename X >
    requires (std::is_base_of_v<X, C> && !std::is_same_v<C,X>)
auto is( X const* x ) -> bool {
    return dynamic_cast<C const&>(x) != nullptr;
}
