dkorpel / ctod
Automatically translate C code to D
C:
int x[10] = {0};
D:
int[10] x = [0];
Results in a compiler error for size mismatch. Nicer would be:
int[10] x = 0;
This won't work in all situations, but the current behavior will never work.
#define foo 5 // comment
enum foo = 5 // comment;
In a struct, floats should be initialized to 0 to prevent surprises.
However, in a union, D does not permit setting the default value of members that aren't the first.
So the following doesn't work:
typedef union
{
stbir_uint32 u;
float f;
} stbir__FP32;
union _Stbir__FP32 {
stbir_uint32 u;
float f = 0; // error
}alias stbir__FP32 = _Stbir__FP32;
I feel like the current macro translation situation is poor in ctod. Probably not ctod's fault, and again, we are creeping towards a full compiler but...
Just ran into this:
#define MIN(a,b) (((a)<(b))?(a):(b))
pixels[y*image->width + x+1].r = MIN((int)pixels[y*image->width + x+1].r + (int)((float)rError*7.0f/16), 0xff);
Translates to:
enum string MIN(string a,string b) = ` (((a)<(b))?(a):(b))`;
pixels[y*image.width + x+1].r = MIN(cast(int)pixels[y*image.width + x+1].r + cast(int)(cast(float)rError*7.0f/16), 0xff);
Lots of problems here:
I get that ctod has to do something here, but this isn't very useful. I also get that recognizing that MIN is now a macro call, and therefore turning the expressions passed to it into strings, would be difficult to do automatically. But I'd almost rather have a nested function call than an enum here.
Can we explore other options?
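One of the options hinted at above is a plain function instead of a string enum. The sketch below stays on the C side and only shows that a function gives the same result as the macro for int arguments; min_int and clamp_channel are hypothetical names, not something ctod emits.

```c
#include <assert.h>

/* The macro from the report. */
#define MIN(a,b) (((a)<(b))?(a):(b))

/* Hypothetical function form. For int arguments it returns the same value
   as the macro, and each argument is evaluated exactly once. */
int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Add a scaled error term to a channel value and cap it at 0xff,
   mirroring the expression in the report. */
int clamp_channel(int value, int error)
{
    int sum = value + (int)((float)error * 7.0f / 16);
    assert(MIN(sum, 0xff) == min_int(sum, 0xff));
    return min_int(sum, 0xff);
}
```

A function like this would translate to D without any string mixin, at the cost of fixing the argument type.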
C has a defined macro, offsetof. This should be translated to D's offsetof property.
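For reference, a minimal C sketch of the macro in use. The struct Color and the function name are made up for illustration; the D side would presumably read the same value through the field's offsetof property.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

/* Hypothetical struct used only for this illustration. */
struct Color {
    unsigned char r, g, b, a;
};

/* offsetof takes a type name and a member name, which is why it
   cannot be translated like an ordinary function call. */
void check_color_layout(void)
{
    assert(offsetof(struct Color, r) == 0);
    assert(offsetof(struct Color, b) == 2);
}
```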
Another weird one. I originally thought it was related to #4, but it happens without the unsigned attribute.
void foo(void) {
size_t a = sizeof(unsigned char) * 5;
size_t b = sizeof(unsigned char);
size_t c = sizeof(int) * 5;
size_t d = sizeof(int);
}
void foo() {
size_t a = sizeofcast(ubyte) * 5;
size_t b = ubyte.sizeof;
size_t c = sizeofcast(int) * 5;
size_t d = int.sizeof;
}
sizeof(...) * something is used a lot in malloc calls, so this is an important one.
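A small C sketch of the pattern in question, to show what the translated D has to preserve. alloc_ints is a hypothetical helper, not taken from any of the files mentioned here.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate room for count ints; the caller frees the result. */
int *alloc_ints(size_t count)
{
    return malloc(sizeof(int) * count);
}

void check_sizeof_product(void)
{
    /* sizeof(type) is an ordinary size_t operand of the multiplication. */
    assert(sizeof(unsigned char) * 5 == 5);
    assert(sizeof(int) * 5 == 5 * sizeof(int));

    int *p = alloc_ints(5);
    if (p != NULL) {
        p[4] = 42;          /* the last of the five slots is usable */
        assert(p[4] == 42);
        free(p);
    }
}
```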
This is likely a somewhat uncommon occurrence, as most code will typedef structs into a symbol, but using a struct with a tag as a type results in some odd code.
struct S {
int x;
};
struct T {
struct S s;
};
void foo(struct T t);
struct S {
int x;
}
struct T {
struct S ;S s;
}
struct T ;void foo(T t);
I have a file that uses structs without typedefs, and it doesn't translate well.
typedef unsigned char X;
void main()
{
unsigned char c = 5;
c = (unsigned char)(c + 5);
c = (X)(c + 5);
}
alias X = ubyte;
int main() {
ubyte c = 5;
c = cast(ubyte)(c + 5);
c = (X)(c + 5);
}
That second line should change into a cast. It may not be as easily detectable. But there is a lot of code that uses typedefs, and casts.
Remove the parentheses around the expression, and it's recognized as a cast.
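To confirm that the typedef form really is a cast in C and not a call, here is a minimal sketch. It assumes 8-bit unsigned char; wrap_with_typedef is a hypothetical name.

```c
#include <assert.h>

typedef unsigned char X;

/* (X)(c + 10) converts the int result back to unsigned char,
   exactly like (unsigned char)(c + 10) would. */
unsigned char wrap_with_typedef(void)
{
    unsigned char c = 250;
    c = (X)(c + 10);        /* 260 wraps to 4 when unsigned char is 8 bits */
    return c;
}

void check_typedef_cast(void)
{
    assert(wrap_with_typedef() == (unsigned char)260);
}
```

The parser cannot tell (X)(expr) from a parenthesized call target without knowing that X names a type, which is likely why the typedef has to be tracked.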
So I let this run overnight chewing through https://codeberg.org/drummyfish/tinyphysicsengine/src/branch/master/tinyphysicsengine.h and it never finished.
Seems like the Node accumulator gets caught in an infinite loop finding nodes.
I have no idea what would cause this, maybe the file is too long?
In the project I'm working on (raylib), many #defines are specified in a config.h file, and many are specified by the makefile. Some way to distinguish between them would be helpful:
e.g.:
#ifdef PLATFORM_DESKTOP // specified by the makefile
#ifdef SUPPORT_IMAGE_EXPORT // specified by the config.h
I'd like some option of translation for these. Some I want to be version statements, some I want to be enums/static if:
version(PLATFORM_DESKTOP) {
static if(SUPPORT_IMAGE_EXPORT) {
I'm not sure how to envision this. Maybe a configuration file for ctod? I'm not sure if there would be a way to infer the right usage from the existing file. Especially since a lot of the config options are commented out in the config file, so ctod won't even see how they are defined.
Not sure how this happened when converting this file: https://github.com/schveiguy/draylib/blob/acb0b099169d73ac2fc4c11ddf00776bdf0aaa40/raylibc/rtextures.c
All the function calls and the image code is just missing.
In C, the macro va_arg does some funky stuff with a type name. You use it like:
va_arg(v, int);
which comes out untouched on the D side, but obviously this is invalid syntax.
This should translate to:
va_arg!int(v);
This translation isn't critical, I can do a search/replace, but it would be nice to have. Probably not a huge problem, as not many functions are actually varargs.
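For context, a complete C function using va_arg. sum_ints is a made-up example, but the macro calls are the standard ones from <stdarg.h>.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum count int arguments. The second argument of va_arg is a type
   name, not an expression, which is what makes it awkward to translate. */
int sum_ints(int count, ...)
{
    va_list v;
    int total = 0;

    va_start(v, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(v, int);
    va_end(v);
    return total;
}

void check_sum_ints(void)
{
    assert(sum_ints(3, 1, 2, 3) == 6);
}
```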
Not sure if this was intentional. In order to build on macos, I used the build from the original tree-sitter source, so I don't technically need this to build. But I did expect it to actually work with an apparent makefile, only to find it's empty.
#ifndef foo
#define foo bar
#endif
version (foo) {} else {
enum foo = bar;
}
Somewhat nonsensical. Though I get how this happens. Just bringing it up in case there's any better way to handle this.
typedef struct S {
unsigned x[10];
unsigned y;
unsigned int z[10];
} S;
void foo(void)
{
unsigned x = 5;
}
=>
struct S {
[10] x;
y;
uint[10] z;
}
void foo() {
x = 5;
}
I believe unsigned without a further type is unsigned int.
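This can be checked in C itself. The sketch below assumes a C11 compiler for _Generic; IS_UNSIGNED_INT and the function name are hypothetical.

```c
#include <assert.h>

/* _Generic (C11) picks a branch based on the static type of its operand. */
#define IS_UNSIGNED_INT(x) _Generic((x), unsigned int: 1, default: 0)

/* Returns 1 if declarations written with a bare "unsigned" have type
   unsigned int, both for scalars and for array elements. */
int bare_unsigned_is_unsigned_int(void)
{
    unsigned x = 5;
    unsigned y[10] = {0};
    return IS_UNSIGNED_INT(x) && IS_UNSIGNED_INT(y[0]);
}

void check_bare_unsigned(void)
{
    assert(bare_unsigned_is_unsigned_int());
    assert(sizeof(unsigned) == sizeof(unsigned int));
}
```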
int buf[10];
if(buf[0] == 5) {}
int[10] buf;
if(buf[0].ptr == 5) {}
That .ptr shouldn't be there.
C:
void foo(void) {
int x[10];
int *ptr = x + 5;
}
D:
void foo() {
int[10] x;
int* ptr = x + 5; // should be x.ptr + 5
}
Not sure if this is solvable in the general case, but you seem to be able to sniff out pointer usage in other cases when it's a static array.
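Both .ptr reports come down to when a C array decays to a pointer. A minimal C sketch of the two cases (the function name is made up):

```c
#include <assert.h>

void check_array_decay(void)
{
    int x[10] = {0};

    /* In arithmetic, x decays to &x[0]; this is where D needs x.ptr. */
    int *ptr = x + 5;
    assert(ptr == &x[5]);
    *ptr = 7;
    assert(x[5] == 7);

    /* Indexing yields an int element, so nothing pointer-like remains
       and no .ptr belongs on the result. */
    x[0] = 5;
    assert(x[0] == 5);
}
```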
if (len < sizeof(struct mg_dns_header)) return 0;
into
if (len < struct mg_dns_header;.sizeof) return 0;
From: https://github.com/cesanta/mongoose/blob/092f2ce0b32aba3e818652aacb0273d5e6e6f6fc/src/dns.c#L79
In C, when you define an enum type, the members are accessible without the namespace.
This needs to be reproduced in D for equivalent code to compile.
e.g.:
enum X
{
A,
B
};
int x = A;
current conversion:
enum X
{
A,
B
}
int x = A;
Proposed conversion:
enum X
{
A,
B
}
alias A = X.A;
alias B = X.B;
int x = A;
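The C behavior being relied on, as a self-contained sketch (first_member is a hypothetical name):

```c
#include <assert.h>

enum X
{
    A,
    B
};

/* Enumerators live in the enclosing scope in C, so A needs no X prefix. */
int first_member(void)
{
    int x = A;
    return x;
}

void check_enum_scope(void)
{
    assert(first_member() == 0);
    assert(B == 1);
}
```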
In a file I'm translating, I have this (this is common for Windows systems):
// Function specifiers in case library is build/used as a shared library (Windows)
// NOTE: Microsoft specifiers to tell compiler that symbols are imported/exported from a .dll
#if defined(_WIN32)
#if defined(BUILD_LIBTYPE_SHARED)
#define RAYGUIAPI __declspec(dllexport) // We are building the library as a Win32 shared library (.dll)
#elif defined(USE_LIBTYPE_SHARED)
#define RAYGUIAPI __declspec(dllimport) // We are using the library as a Win32 shared library (.dll)
#endif
#endif
// Function specifiers definition
#ifndef RAYGUIAPI
#define RAYGUIAPI // Functions defined as 'extern' by default (implicit specifiers)
#endif
Then things are defined like:
RAYGUIAPI void GuiEnable(void);
But when passed via ctod it comes out like:
RAYGUIAPI GuiEnable();
Which somehow swallows the return type. I can work around by just removing all the RAYGUIAPI in all cases, but this seems like something that might need addressing.
No rush of course on this, I'm not building DLLs here.
Some possible thoughts -- I don't see how you can correctly translate this to D, as it doesn't allow such a string replacement as the C preprocessor allows. But what if you could just define direct string replacements? Like, just say, ctod --redefine RAYGUIAPI=export or ctod --redefine RAYGUIAPI= ?
I was playing with transforming neomutt/nntp source code and it seemed to hang. I didn't home in on the exact construct that is causing the parsing issue. :/
The attached newsrc.txt file is a slightly reduced version. It is about 100 lines and takes 20s to translate. Delete a few lines and it goes to 9 seconds; delete the right lines and it's under 1 sec. I'm not sure if this is still a valid reproduction case, as I've deleted enough arbitrarily that it likely isn't valid C anymore either.
So in my code base, I have something like:
char header[] = "LOTS OF TEXT...";
// sometime later
foo(header, sizeof(header)-1);
This gets translated using ctod to:
char * header = "LOTS OF TEXT...";
// sometime later
foo(header, sizeof(header)-1);
It's clear from this that we don't want the size of the pointer minus 1, but the number of bytes (minus the null character).
A couple of problems here:
The original type is char[n], not a pointer.
sizeof(header) - 1 is going to strip off the last character, not the zero terminator!
If header was typed as const char header[], then this would have compiled and done exactly the wrong thing!
So what to do?
One of the worst things ctod can do is to translate the code into something that compiles, but does the wrong thing. Because nobody is going to scrutinize this.
The sizeof call is obviously wrong, so at least it's flagged by the compiler. But I'm wondering if that was an accident, because other sizeof calls are properly translated.
But really I wonder if this kind of pattern should be recognized, and changed to char[N] = "LOTS OF TEXT...\0";, where N is detected by ctod to at least make the sizeof calculation accurate?
For reference, the real code is here:
https://github.com/schveiguy/draylib/blob/acb0b099169d73ac2fc4c11ddf00776bdf0aaa40/raylibc/external/stb_image_write.h#L770
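The difference between the array and pointer forms can be checked directly in C. This sketch uses the placeholder string from above; the function name is made up.

```c
#include <assert.h>
#include <string.h>

void check_header_sizes(void)
{
    char header[] = "LOTS OF TEXT...";
    const char *as_pointer = header;

    /* For the array, sizeof counts every byte including the terminator,
       so sizeof(header) - 1 is the text length. */
    assert(sizeof(header) == strlen(header) + 1);
    assert(sizeof(header) - 1 == 15);

    /* For a pointer, sizeof is the pointer size, unrelated to the text. */
    assert(sizeof(as_pointer) == sizeof(char *));
}
```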
If I have a function in C that takes a sized array, and a call with that same type, the translated D code will build, but won't be equivalent.
e.g.:
#include <stdio.h>
void foo(unsigned short arr[2]) {
arr[0] = 5;
}
int main() {
// nested array needed to trick ctod into not putting a .ptr on it
unsigned short arr[4][2] = { 0 };
foo(arr[0]);
printf("arr[0] is %d\n", arr[0][0]);
return 0;
}
module test;
@nogc nothrow:
extern(C): __gshared:
public import core.stdc.stdio;
void foo(ushort[2] arr) {
arr[0] = 5;
}
int main() {
// nested array needed to trick ctod into not putting a .ptr on it
ushort[2][4] arr = 0;
foo(arr[0]);
printf("arr[0] is %d\n", arr[0][0]);
return 0;
}
The C code prints 5, the D code prints 0.
My recommendation is probably to use a pointer instead of the static array for the parameters, or else use ref. The former is more likely to compile with correct code without modification.
In a translated file, I have
struct sdefl_freq {
unsigned lit[SDEFL_SYM_MAX];
unsigned off[SDEFL_OFF_MAX];
};
struct sdefl_code_words {
unsigned lit[SDEFL_SYM_MAX];
unsigned off[SDEFL_OFF_MAX];
};
struct sdefl_lens {
unsigned char lit[SDEFL_SYM_MAX];
unsigned char off[SDEFL_OFF_MAX];
};
In the D file I get:
sdefl_freq;
sdefl_code_words;
sdefl_lens;
Not sure why this is happening.
Reference file is: https://github.com/schveiguy/draylib/blob/0a7b3d1ada6ce4daedd95ed7fee0d34422b1782b/raylib/external/sdefl.h#L138
C:
#ifndef foo
int x;
#else
long x;
#endif
D:
version (foo) {} else {
int x;
} else {
c_long x;
}
What needs to happen, unfortunately, is the else branch needs to be copied into the first brace set. Not sure if this is easy to do.
I see that probably it is required to add some libtree-sitter and libc-parser objects to make it run.
Can you please give some hints how to build it?
I can build it for arm architecture so you will be able to add it to the repo.
int foo[5]= {0,1,2,3,4};
int bar[5]= {1,2,3,4,5};
int[5] foo = 0;
int[5] bar = [1,2,3,4,5];
The key is it has to be a static array, and the initializer values have to start with a 0.
This took me forever to figure out because I'm translating stb_image, which is a giant nest of bit manipulation and lookup tables, and there are some static tables in the huffman decoding that started with 0! So basically, the huffman decoding was failing, and I couldn't figure out why.
Now that I have found this, I have it building and working ;)
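For the record, this is what the two C forms mean. It is a minimal sketch with a made-up function name; only a lone {0} zero-fills the array, while a longer list that happens to start with 0 keeps every listed value.

```c
#include <assert.h>

void check_zero_leading_initializer(void)
{
    int foo[5] = {0, 1, 2, 3, 4};   /* five explicit values, the first is 0 */
    int zeros[5] = {0};             /* one value; the rest are zero-filled */

    for (int i = 0; i < 5; ++i) {
        assert(foo[i] == i);
        assert(zeros[i] == 0);
    }
}
```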
I got it to build on macos.
I ran it on my first c file, here: https://github.com/schveiguy/draylib/blob/0a7b3d1ada6ce4daedd95ed7fee0d34422b1782b/raylib/rmodels.c
After running, I got a rmodels.d. But the diff is:
0a1,4
> module rmodels;
> @nogc nothrow:
> extern(C): __gshared:
>
106,108c110,111
< #ifndef MAX_MATERIAL_MAPS
< #define MAX_MATERIAL_MAPS 12 // Maximum number of maps supported
< #endif
---
>
>
5041c5044
< #endif
---
> #endif
\ No newline at end of file
It's almost like it's giving up early or something. Does it deal properly with header files? Would it be best to translate preprocessed files?
Based on #8 (comment) (so it's not lost)
Most C code that initializes C structs does so via = {0}, which has no D equivalent.
Alternatively, assigning {0} could be replaced with some template that zeroes everything. But likely this is not required for most C code.
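What = {0} guarantees on the C side, as a small sketch. struct Config is a hypothetical type chosen to mix an int, a float and a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct for illustration only. */
struct Config {
    int width;
    float scale;
    const char *name;
};

void check_zero_struct(void)
{
    /* The first member gets 0; all remaining members are initialized
       as if they had static storage: 0, 0.0f, and a null pointer. */
    struct Config c = {0};
    assert(c.width == 0);
    assert(c.scale == 0.0f);
    assert(c.name == NULL);
}
```

Any D replacement would need to reproduce all three of these results, including the zero float.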
int *x = malloc(4);
In D, a cast(int*) should be added:
int* x = cast(int*) malloc(4);
It currently doesn't do that.
str.c
struct S { double x; int y; }
Sarray[2] = {
{1.5, 2},
{2.5, 3}
}
That produces
module str;
@nogc nothrow:
extern(C): __gshared:
struct S { double x = 0; int y; }S[2] Sarray = [
[1.5, 2],
[2.5, 3]
];
Which results in the error
str.d(5): Error: cannot implicitly convert expression `[1.5, 2.0]` of type `double[]` to `S`
str.d(6): Error: cannot implicitly convert expression `[2.5, 3.0]` of type `double[]` to `S`
The fix is simple, it should instead generate
struct S { double x = 0; int y; }S[2] Sarray = [
{1.5, 2},
{2.5, 3}
];
This:
#ifdef __cplusplus
}
#endif
translates to this:
version(none) {
}
}
Which doesn't work... The initial header translates to:
#ifdef __cplusplus
extern "C" {
//! #endif
Which isn't great, but at least is obviously wrong, and it still has the __cplusplus statement there instead of the unrelated version(none).
A: https://tigerbeetle.com/blog/2023-09-19-64-bit-bank-balances-ought-to-be-enough-for-anybody/
https://github.com/tigerbeetle/tigerbeetle/blob/main/src/clients/c/tb_client.h
#include <stdint.h>
typedef __uint128_t tb_uint128_t;
to:
public import core.stdc.stdint; // why not core.int128 - Cent or std.int128 ?
alias tb_uint128_t = __uint128_t; // wrong!
Similar reference:
C:
int x[3] = {10, 20};
typedef struct {
int x;
int y;
} S;
S y = {10, 20};
D:
int[3] x = [10, 20];
struct _S {
int x;
int y;
}alias _S S;
S y = [10, 20];
The struct initializer should not use [] brackets.
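The C semantics behind both initializers, as a minimal sketch with a made-up function name. Note that the array has three elements but only two listed values, so the size-mismatch error reported earlier for array literals would presumably apply to the D array here as well.

```c
#include <assert.h>

typedef struct {
    int x;
    int y;
} S;

void check_partial_initializers(void)
{
    int x[3] = {10, 20};   /* x[2] is implicitly 0 */
    S y = {10, 20};        /* positional: y.x = 10, y.y = 20 */

    assert(x[0] == 10 && x[1] == 20 && x[2] == 0);
    assert(y.x == 10 && y.y == 20);
}
```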
for(int i = 1, j = 2; i < 5; ++i) {
}
for(int i = 1;int j = 2; i < 5; ++i) {
}
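In C, the first clause of the for statement is a single declaration with two declarators, so both i and j are scoped to the loop. A small sketch (count_iterations is a hypothetical name):

```c
#include <assert.h>

/* Runs the loop from the report and adds j once per iteration. */
int count_iterations(void)
{
    int n = 0;
    for (int i = 1, j = 2; i < 5; ++i) {
        n += j;             /* j stays 2 for i = 1, 2, 3, 4 */
    }
    return n;
}

void check_for_declarators(void)
{
    assert(count_iterations() == 8);
}
```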