jameysharp / corrode

C to Rust translator

License: GNU General Public License v2.0

Haskell 68.22% Makefile 1.18% Python 25.37% Batchfile 5.23%

corrode's Issues

Unsupported rust generated (hex float literal unsupported)

Corrode generated code from a csmith file that rustc does not accept.

Compile with:
$ stack exec -- corrode -Wall -I/home/pmatos/Projects/csmith/runtime /home/pmatos/tmp/corrode-csmith/csmith_1.c
/home/pmatos/tmp/corrode-csmith/csmith_1.rs
/home/pmatos/tmp/corrode-csmith/csmith_1.o
$ rustc /home/pmatos/tmp/corrode-csmith/csmith_1.rs
/home/pmatos/tmp/corrode-csmith/csmith_1.rs:693:12: 693:17 error: hexadecimal float literal is not supported
/home/pmatos/tmp/corrode-csmith/csmith_1.rs:693 0x1.0p-100f32 * sf1 * (0x1.0p-28f32 * sf2)
    ^~~~~
/home/pmatos/tmp/corrode-csmith/csmith_1.rs:693:35: 693:40 error: hexadecimal float literal is not supported
/home/pmatos/tmp/corrode-csmith/csmith_1.rs:693 0x1.0p-100f32 * sf1 * (0x1.0p-28f32 * sf2)
    ^~~~~
error: aborting due to previous error

files.zip
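
One possible workaround, sketched here as an assumption rather than as what Corrode does: since both rejected literals are exact powers of two, the generated code could build them from their IEEE 754 bit patterns instead of emitting hex float literals. The helper name pow2_f32 and the function scale are hypothetical.

```rust
/// Build the f32 value 2^exp directly from its IEEE 754 bit pattern.
/// Only normal (non-subnormal) exponents are handled; that limit is an
/// assumption of this sketch.
pub fn pow2_f32(exp: i32) -> f32 {
    assert!((-126..=127).contains(&exp), "exponent outside normal f32 range");
    // Sign bit 0, biased exponent in bits 23..31, mantissa bits all zero.
    f32::from_bits(((exp + 127) as u32) << 23)
}

/// Stand-in for the rejected expression
/// `0x1.0p-100f32 * sf1 * (0x1.0p-28f32 * sf2)`.
pub fn scale(sf1: f32, sf2: f32) -> f32 {
    pow2_f32(-100) * sf1 * (pow2_f32(-28) * sf2)
}
```

Literals with a non-trivial mantissa would need a more general conversion than this.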

fix type unification for ternary conditional

In the expression c ? p : 0, if p is a pointer then the 0 should be interpreted as a null pointer. Generally, if either branch is a pointer, we should force the other branch to be a pointer as well.
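
As a rough illustration of the intended output (the function name is hypothetical), the Rust side of c ? p : 0 with a pointer-typed p might look like this, with the literal 0 turned into a null pointer of the same type:

```rust
use std::ptr;

/// C: `return c ? p : 0;` where `p` has type `int *`.
/// The `0` branch is unified with the pointer branch and becomes null.
pub fn select_or_null(c: bool, p: *mut i32) -> *mut i32 {
    if c { p } else { ptr::null_mut() }
}
```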

Keep Language.Rust as a separate Haskell library?

The Language.Rust module would be very useful for other people who are generating Rust from Haskell. In particular, I've been looking at writing a Rust backend to Elm for quite a while, and this would be very useful, since Elm is written in Haskell.

Would you consider releasing it separately on Hackage? Possibly with a more permissive license, that would allow it to be used in MIT or BSD licensed projects?

Corrode does not produce .o files, halting build systems

As the corrode tool currently does not produce a .o file, build systems which rely on that (or the -o flag) currently halt very quickly with an error. Producing anything would be better than nothing, and the following might be palatable options:

1. Simply drop an empty file there. This is unlikely to work well, but it's quite easy.
2. Drop an object file that has no contents (and thus can be constant data in corrode) into place.
3. Invoke rustc on the generated .rs file.
4. Some combination of 2 and #54 - akin to LLVM's ThinLTO.
5. Some combination of 3 and #54 - akin to "classic" LTO, that both generates normal code and also permits the linker to throw it away and start over.

implement byteswap.h GCC builtins

I tried to run corrode on this program: https://github.com/tomhughes/libdwarf/blob/master/dwarfexample/simplereader.c

and got this error:

corrode: ("/usr/include/x86_64-linux-gnu/bits/byteswap.h": line 47): illegal undefined variable; check whether a real C compiler accepts this:
    __builtin_bswap32

I have no idea whether I'm correctly interpreting this error, but I thought I'd report it :)
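
For what it's worth, Rust's integer types already have a byte-swapping method, so one plausible translation target for this builtin (a sketch, not Corrode's actual behaviour) is a thin wrapper around u32::swap_bytes. The wrapper name is made up for the example.

```rust
/// Hypothetical Rust stand-in for GCC's `__builtin_bswap32`.
pub fn builtin_bswap32(x: u32) -> u32 {
    // `swap_bytes` reverses the byte order of the integer.
    x.swap_bytes()
}
```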

Translate types only as needed

Standard C system headers have a bazillion declarations in them, most of which are not used by any given translation unit. So it's unfortunate when Corrode fails to translate a declaration and gives up, if just skipping that declaration would have led to correct output for a particular translation unit.

I'd like to solve this by making baseTypeOf defer most of the work it currently does. For each declared type, it should save just enough information that we can recognize whether the type is used later. We shouldn't check whether we know how to translate the type until its first use.

Doing this correctly means that when a type is first used, translating it may trigger the first use of other types. The translator must not go into an infinite loop if a type references itself, directly or indirectly.

extern declarations need to be similarly lazy. An extern declaration should only force the types it uses to be inspected if the declaration is used.

Would be nice: Distinguish between using a pointer to a type versus using the type itself. If all uses of a type are behind pointers, then we can translate the type as an unconstructable enum, like we currently do for union, because the pointer representation doesn't depend on the type it points to. In that case we should not report any error even if the type is one we can't translate. If, on the other hand, a struct or union ever has its fields accessed or is copied, or a C enum has its values used, then we need to translate the type. So this generalizes the special-case treatment we currently give union.
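
A small sketch of the pointer-only case, with hypothetical names: a type that is only ever used behind a pointer can become an enum with no variants, and the pointer is still an ordinary thin pointer.

```rust
/// Stand-in for a C type whose definition we could not (or did not need to)
/// translate. It has no variants, so no value of it can be constructed.
pub enum Opaque {}

/// Code that only passes the pointer around never needs the definition.
pub fn is_null_handle(handle: *mut Opaque) -> bool {
    handle.is_null()
}
```

The pointer representation does not depend on the pointee here, which is why no error needs to be reported for the untranslated type.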

does not build on Windows

I get the following error running cabal install or stack install in the project root:

cabal install output:

Resolving dependencies...
Configuring corrode-0.1.0.0...
Building corrode-0.1.0.0...
Failed to install corrode-0.1.0.0
Build log ( C:\Users\crame\AppData\Roaming\cabal\logs\corrode-0.1.0.0.log ):
Building corrode-0.1.0.0...
Preprocessing library corrode-0.1.0.0...

src\Language\Rust\Corrode\C.lhs:1:1: error:
    File name does not match module name:
    Saw: `Main'
    Expected: `Language.Rust.Corrode.C'
cabal: Leaving directory '.'
cabal: Error: some packages failed to install:
corrode-0.1.0.0

stack install output:

corrode-0.1.0.0: build
Preprocessing library corrode-0.1.0.0...

C:\Users\crame\rustiness\corrode\src\Language\Rust\Corrode\C.lhs:1:1:
    File name does not match module name:
    Saw: `Main'
    Expected: `Language.Rust.Corrode.C'

--  While building package corrode-0.1.0.0 using:
      C:\Users\crame\AppData\Roaming\stack\setup-exe-cache\x86_64-windows\setup-Simple-Cabal-1.22.5.0-ghc-7.10.3.exe --builddir=.stack-work\dist\2672c1f3 build lib:corrode exe:corrode --ghc-options " -ddump-hi -ddump-to-file"
    Process exited with code: ExitFailure 1

This occurs on 64 bit Windows after a fresh install of the Haskell Platform.

implement global variables

We're already tagging all the generated functions as unsafe, so we can blithely generate code to read and write global mutable variables. Wheee!

Add static items to the Rust AST's data Item datatype. See the Let constructor from data Stmt for a very similar example.
For any CDeclExt which does not have the CExtern storage specifier and does not have function type (IsFunc or CFunDeclr), emit a static item. Compare the handling of let-bindings in localDecls to get an idea of what to do.
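
The Rust that such a static item would produce might look roughly like the following sketch, for a C global int counter = 0; and a function that increments it. The names are invented for the example.

```rust
// C: int counter = 0;
static mut COUNTER: i32 = 0;

// C: int next_id(void) { counter += 1; return counter; }
// Already `unsafe`, like every function Corrode generates, so reading and
// writing the mutable static is allowed here.
pub unsafe fn next_id() -> i32 {
    COUNTER += 1;
    COUNTER
}
```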

Arrays are broken!

It seems like arrays are decayed into mere pointers. This causes all sorts of problems.
Let's start with a very simple test example (one that does not require stdio):

int main() { int array[1]; array[0] = 42; return array[0]; }

[adrien@MacBookPro ~/dev/repos/corrode/tests]$ gcc -Wall array.c -o array
[adrien@MacBookPro ~/dev/repos/corrode/tests]$ ./array; echo $?
42

So far so good. But Corrode translates that to:

fn main() {
    let ret = unsafe { _c_main() };
    std::process::exit(ret);
}

#[no_mangle]
pub unsafe fn _c_main() -> i32 {
    let mut array : *mut i32;
    {
        let _rhs = 42i32;
        let _lhs = &mut *array.offset(0i32 as (isize));
        *_lhs = _rhs;
    }
    *array.offset(0i32 as (isize))
}

Calling rustc on the generated code:

[adrien@MacBookPro ~/dev/repos/corrode/tests]$ rustc array.rs
array.rs:11:26: 11:31 error: use of possibly uninitialized variable: `array` [E0381]
array.rs:11         let _lhs = &mut *array.offset(0i32 as (isize));
                                     ^~~~~
array.rs:11:26: 11:31 help: run `rustc --explain E0381` to see a detailed explanation
array.rs:14:6: 14:11 error: use of possibly uninitialized variable: `array` [E0381]
array.rs:14     *array.offset(0i32 as (isize))
                 ^~~~~
array.rs:14:6: 14:11 help: run `rustc --explain E0381` to see a detailed explanation
error: aborting due to 2 previous errors

rustc legitimately complains that we're not initializing the array, and it's right! We're never allocating memory anywhere.

Rust has an array type, which I think is appropriate to handle these kinds of cases. I feel like the semantics of C arrays and Rust arrays are similar, but I haven't dived deep into it.
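
To make that concrete, here is a sketch of what the body of the test program could look like with a Rust array. Zero-initializing the array is an assumption made for the example; C leaves it uninitialized.

```rust
// C: int main() { int array[1]; array[0] = 42; return array[0]; }
pub fn c_main() -> i32 {
    // A fixed-size Rust array lives on the stack, like the C one.
    let mut array: [i32; 1] = [0; 1];
    array[0] = 42;
    array[0]
}
```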

There are a couple interesting things to note about arrays, and how we can handle them in Corrode.

First, through some mechanism unknown to me, they are decayed into pointers. I feel like this has to do with baseTypeOf and derivedTypeOf (which seem to ignore the relevant information from CDerivedDeclarator), but I can't say I understand this part of the code quite yet. My understanding is that there isn't an equivalent Rust.Type either, which would contain the relevant information for the translation.

Arrays have a whole bunch of interesting properties:

their size can be omitted, making the type of the declaration incomplete, but a subsequent declaration can fix this and make the type complete: 6.2.5.22

22 An array type of unknown size is an incomplete type. It is completed, for an identifier of
that type, by specifying the size in a later declaration (with internal or external linkage).

This is interesting because I am not sure how the language-c package would handle cases like this:

int global_array[];
// [...]
int global_array[25];

When we access the relevant information parsed by language-c, we are definitely more interested in knowing that global_array has 25 elements.

Ever since C99, arrays can have variable length - they are still allocated on the stack though. But this feature was downgraded from mandatory in C99 to optional in C11, and it is generally frowned upon by a subset of C developers. Implementing this in Rust while preserving the semantics may be hard! Perhaps the alloca function (or its Windows equivalent) could be used instead?
Arrays are decayed into pointers when they are passed as an argument to a function. But also, TIL, you can have a static specifier within the brackets to hint the compiler that the array (while still being decayed) must contain at least N elements! This is amazing and I didn't know that despite having worked with C for a very long time. Having quickly tested it, it seems like language-c accepts it (but I don't know whether it holds the info somewhere, and whether this info would be useful at all in translating to Rust - my gut feeling is probably not).
Another interesting thing is the interaction between sizeof and arrays. The interesting part to know is that if we declare an array, say:

int array[25];

Using the symbol array anywhere in C code is gonna be completely equivalent to &array[0], except in the case where we take the sizeof of it! In the case of VLA, we want to return the expression used to compute the number of elements in the array times the size of an element in the array. In the case of a non-VLA we simply return the size known at compile time (#elements x sizeof(element)). It is a very common idiom to have a macro defined as such:

#define ARRAY_COUNT(x) (sizeof(x) / sizeof((x)[0]))

You can then use this to get, for example, the size of a compile-time declared string (provided it's declared as an incomplete array):

const char s[] = "Corrode rules!";
size_t len_s = ARRAY_COUNT(s);
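
If arrays were translated to Rust arrays, the same idiom could be expressed with std::mem::size_of_val. This is only a sketch with made-up names; note that the count includes the trailing NUL, as it does in C.

```rust
use std::mem::size_of_val;

// C: const char s[] = "Corrode rules!";  (14 characters plus the NUL)
pub static S: [u8; 15] = *b"Corrode rules!\0";

// C: ARRAY_COUNT(s), i.e. sizeof(s) / sizeof(s[0])
pub fn array_count_s() -> usize {
    size_of_val(&S) / size_of_val(&S[0])
}
```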

I discovered a few other broken things related to sizeof and arrays, but I'll talk about it in detail in the sizeof issue :).

translate nicer type names by using typedef names

A fairly common pattern in C looks like this:

typedef struct private private;
struct private {
    ...
};

Often the typedef is in a public header file and the full struct definition is in a private header file.

Corrode gets this case wrong at the moment: any uses of the private type alias see an empty struct type, even if the full definition appears later.

I think the best way to fix this is to make typedef resolution more lazy. When the typedef itself is processed, it should record an action that will look up the referenced struct as needed, instead of storing the definition the struct has at that instant. Then baseTypeOf needs to be able to run such actions when it sees a CTypeDef type specifier.

implement do-while loops

The CWhile constructor of CStatement is currently only translated if its Bool flag is False, meaning this is a regular while-loop which tests the loop condition before entering the loop.

If that flag is True, then we need a different translation plan. Since Rust doesn't have do-while loops, we'll have to translate these like this:

loop {
    ...
    if !cond { break; }
}

But like for loops, it's a little trickier than that if the loop contains any continue statements. The same strategy we use on for loops should work here, though:

'breakTo: loop {
    'continueTo: loop {
        ...
        break;
    }
    if !cond { break; }
}

Like with for loops, break and continue statements should be translated to break 'breakTo and break 'continueTo, respectively.
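
Here is a complete example of that plan applied to a small do-while loop containing both continue and break. The C loop and the function name are invented for illustration.

```rust
// C:
//   do {
//       i++;
//       if (i % 2) continue;
//       if (sum > 1000) break;
//       sum += i;
//   } while (i < limit);
pub fn sum_even_up_to(limit: i32) -> i32 {
    let mut i = 0;
    let mut sum = 0;
    'break_to: loop {
        'continue_to: loop {
            i += 1;
            if i % 2 != 0 {
                break 'continue_to; // C `continue`: skip to the condition
            }
            if sum > 1000 {
                break 'break_to; // C `break`: leave the whole loop
            }
            sum += i;
            break; // end of the body, fall through to the condition
        }
        if !(i < limit) {
            break;
        }
    }
    sum
}
```

The body runs at least once even when the condition is false from the start, which is the property that distinguishes do-while from while.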

psa: build requires ghc 7.10

This is a meta-issue. Requiring GHC 7.10 isn't a problem in itself, but the README should be updated to say so.

I am super new to Haskell and attempted briefly to use GHC 8.0.1, the current version supplied by homebrew on OSX to build corrode, resulting in the following error:

Resolving dependencies...
cabal: Could not resolve dependencies:
trying: corrode-0.1.0.0 (user goal)
next goal: base (dependency of corrode-0.1.0.0)
rejecting: base-4.9.0.0/installed-4.9... (conflict: corrode => base>=4.8 &&
<4.9)
rejecting: base-4.9.0.0, base-4.8.2.0, base-4.8.1.0, base-4.8.0.0,
base-4.7.0.2, base-4.7.0.1, base-4.7.0.0, base-4.6.0.1, base-4.6.0.0,
base-4.5.1.0, base-4.5.0.0, base-4.4.1.0, base-4.4.0.0, base-4.3.1.0,
base-4.3.0.0, base-4.2.0.2, base-4.2.0.1, base-4.2.0.0, base-4.1.0.0,
base-4.0.0.0, base-3.0.3.2, base-3.0.3.1 (constraint from non-upgradeable
package requires installed instance)
Dependency tree exhaustively searched.

For those using a Mac and homebrew, I have instructions here

https://gist.github.com/seanjensengrey/c423a5ab0276758030e74f501a63994f

that outline how to get the required tooling installed and corrode compiled.

Define [repr(C)] on translated structs

Currently, Corrode fails to add #[repr(C)] to translated struct definitions, meaning that the generated code does not preserve behavior and cannot interact with C code.
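
A sketch of the expected output for a hypothetical C struct follows. With #[repr(C)] the fields keep their declaration order and get C-style padding, so the size and alignment are predictable; without it, Rust is free to reorder fields.

```rust
use std::mem::{align_of, size_of};

// C: struct record { unsigned char tag; int value; unsigned short flags; };
#[repr(C)]
pub struct Record {
    pub tag: u8,    // offset 0, followed by 3 bytes of padding
    pub value: i32, // offset 4
    pub flags: u16, // offset 8, followed by 2 bytes of tail padding
}

/// Returns (size, alignment) of the struct in bytes.
pub fn record_layout() -> (usize, usize) {
    (size_of::<Record>(), align_of::<Record>())
}
```

The offsets in the comments assume a target where i32 is 4-byte aligned.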

configure travis-ci

probably test against 8.x and 7.10 so that things like #39 would get caught sooner

implement array types as synonyms for pointers

Handle CArrDeclr constructor from CDerivedDeclarator in the derive helper function in cTypeOf. (At that point the catch-all case can be deleted as derive will handle all three kinds of declarators.)

For now, throw away any information about the length of the array and just translate it to the IsPtr type, just like CPtrDeclr.

C language va_list Error

corrode: ("/usr/lib/gcc/x86_64-linux-gnu/5/include/stdarg.h": line 40): illegal undefined type; check whether a real C compiler accepts this:
__builtin_va_list

and stdarg.h line 40 is:

typedef __builtin_va_list __gnuc_va_list;

use current directory as include search directory

I run corrode -Wall src/main.h and it says it cannot find the file src/version.h. This is fixed by adding -I., but it would be better to include the current directory by default.

Can't translate basic C programs?

Everything that I try to translate seems to result in pub unsafe fn main() -> i32 { 0i32 } or something similar (depending on the types used in the C program). Why is this?

Consider a corrode bindgen

Just wondering, would it be possible to use corrode as a sort of bindgen? That is, instead of generating any code, just generate all the extern "C" declarations of types and functions for FFI use. And if it's possible, do you think it would have any advantage over the existing bindgen?

implement C99 compound literals

Translating the CCompoundLit constructor of CExpression should use the typeName helper on its CDeclaration field to get the type which the compound literal is supposed to construct, and then should use interpretInitializer to translate the CInitializerList field to a Rust expression.

license

It looks like you are using GPL v2. Have you considered adding the "or any later version" option, in order to improve compatibility with other licenses?

Unable to build on Windows

I'm new to Haskell so it's likely that I've set up something incorrectly, but I get this error when trying to build this:

Preprocessing library corrode-0.1.0.0...

C:\Users\Matt Ickstadt\Code\Rust\corrode\src\Language\Rust\Corrode\C.lhs:1:1:
    File name does not match module name:
    Saw: `Main'
    Expected: `Language.Rust.Corrode.C'
Completed 12 action(s).

--  While building package corrode-0.1.0.0 using:
      C:\stack\setup-exe-cache\x86_64-windows\setup-Simple-Cabal-1.22.5.0-ghc-7.10.3.exe --builddir=.stack-work\dist\2672c1f3 build lib:corrode exe:corrode --ghc-options " -ddump-hi -ddump-to-file"
    Process exited with code: ExitFailure 1

I'm not sure where it's getting 'Main' from, the file has module Language.Rust.Corrode.C as expected.

Infos

Windows 10 x64
Stack Version 1.1.2, Git revision c6dac65e3174dea79df54ce6d56f3e98bc060ecc (3647 commits) x86_64 hpack-0.14.0
ghc 7.10.3
STACK_ROOT=C:\stack

Edit:
I'm having the same issue in a fresh Ubuntu 16.04 VM.
The stack commands I ran to have this issue:
stack setup (installs ghc 7.10.3)
stack install

Edit:
I'm also having the same issue in Ubuntu with the cabal build system.

Edit:
The build works fine when cloned inside of ubuntu rather than using a shared folder, so I'm guessing this is yet another windows symlink problem.

Linux/Debian `cabal install` Failure

This is my first cabal/haskell build and I got a difficult to parse error:

cabal: Error: some packages failed to install:
corrode-0.1.0.0 depends on language-c-0.5.0 which failed to install.
language-c-0.5.0 failed during the configure step. The exception was:
ExitFailure 1

I found some similar results on stack overflow:

Though after making sure ~/.cabal/bin was in my path I still had the same error.

It might have been user error on my part but I found that I had to install happy and alex first, before running cabal install like this:

$ cabal install happy
$ cabal install alex
$ cabal install

Posting this for visibility/searching if someone else has the same issue.

When an error is encountered, provide the include trace/stack/whatever

The error in #49 may be harder to eliminate than necessary, as I can't tell how the project arrived at the (very, very internal to the C library) header in question.

implement union

C file:

// test.c
#include <stdio.h>
struct Fish {
  char type[];
  int age;
} fish;

I run ~$corrode test.c and get

corrode: ("/usr/include/wchar.h": line 85): Corrode doesn't handle this yet:
    union {
        unsigned int __wch; char __wchb[4];
    }

Any tips on why this has happened?
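
For reference, Rust now has a union item that lines up with this declaration fairly directly. The sketch below is not what Corrode emits; the type and field names are invented, and it assumes unsigned int maps to c_uint.

```rust
use std::os::raw::{c_char, c_uint};

// C: union { unsigned int __wch; char __wchb[4]; }
#[repr(C)]
#[derive(Clone, Copy)]
pub union WcharValue {
    pub wch: c_uint,
    pub wchb: [c_char; 4],
}

/// Reading any union field is unsafe, because Rust cannot know which
/// field was written last.
pub fn first_byte(v: WcharValue) -> c_char {
    unsafe { v.wchb[0] }
}
```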

implement empty statements

In C, one kind of legal statement is the empty statement ;. language-c represents such statements as CExpr Nothing _. interpretStatement should handle this case by returning an empty block, Rust.BlockExpr (Rust.Block [] Nothing).

The usual time when people use empty statements is in loop bodies, where all of the loop's side effects happen in the loop header. Conveniently, our translations of loop bodies are all wrapped in toBlock calls, so that idiom won't generate an unnecessary pair of empty curly braces.

implement extern inline functions

This should translate pretty trivially to Rust, as it's essentially a matter of changing the function declaration syntax, then slapping an extern "C" on the front.

Encountered while attempting to compile LMDB:

corrode -pthread -O2 -g -W -Wall -Wno-unused-parameter -Wbad-function-cast -Wuninitialized   -c mdb.c
("/usr/x86_64-pc-linux-gnu/include/sys/sysmacros.h": line 38): illegal storage class specifier for function; check whether a real C compiler accepts this:
    extern
make: *** [Makefile:82: mdb.o] Error 1

implicitly construct/dereference function pointers

In C, a function call looks like this:

int putchar(int);
putchar('A');

Indirecting through a function pointer looks like this:

int (*f)(int) = &putchar;
(*f)('B');

But it can also look like this:

int (*g)(int) = putchar;
g('C');

Notably, C automatically dereferences a function pointer if it's called, and it automatically takes the address of a function if it's used where a function pointer is expected.

In Rust, function items have a type such as fn(i32) -> i32, which is a kind of pointer. So first we need to translate C types like int (*)(int) to that Rust type, not to *const fn(i32) -> i32 or something.

Then we need two special cases in expression evaluation:

Taking the address of an expression that has function type should be a no-op.
Dereferencing an expression that has function type should be a no-op.

Here's an example Rust program which calls the C standard library's qsort function, passing it a callback which is implemented in Rust: https://is.gd/QLJxHh
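
A smaller self-contained sketch of the same idea, using invented names: the function item coerces to a function pointer without an explicit address-of, and calling through the pointer needs no explicit dereference.

```rust
use std::os::raw::c_int;

// C: int double_it(int x) { return x * 2; }
pub extern "C" fn double_it(x: c_int) -> c_int {
    x * 2
}

pub fn call_through_pointer(x: c_int) -> c_int {
    // C: int (*f)(int) = &double_it;   or   int (*f)(int) = double_it;
    let f: extern "C" fn(c_int) -> c_int = double_it;
    // C: (*f)(x);   or   f(x);
    f(x)
}
```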

fix initialization for enum-typed variables

In my ongoing quest to corrode libsel4, I have come across a new blocker. (I'm using -DNDEBUG to dodge __FUNCTION__ for now.)

("libsel4/include/sel4/objecttype.h": line 23): Corrode doesn't handle this yet:
    seL4_NotificationObject

Here's what leads up to line 23 as per gcc -E:

#14 "libsel4/include/sel4/objecttype.h"
typedef enum api_object {
    seL4_UntypedObject,
    seL4_TCBObject,
    seL4_EndpointObject,
    seL4_NotificationObject,
    seL4_CapTableObject,
    seL4_NonArchObjectTypeCount,
} seL4_ObjectType;

__attribute__((deprecated("use seL4_NotificationObject"))) static const seL4_ObjectType seL4_AsyncEndpointObject = seL4_NotificationObject;

It's nothing to do with the __attribute__, as it aborts the same way without it.

So, could it be something to do with referencing an enum member (seL4_NotificationObject), but corrode doesn't keep enough state/context to know what's going on? (I recall from other issues that this crops up elsewhere.)

Related to #19, not fixed by #40?

implement array subscript operator

The CIndex constructor of CExpression should be translated as if it added its two operands and then dereferenced the result. Note that in the C expression e1[e2], although we usually expect e1 to be a pointer and e2 to be an integer, C permits them to be the other way around. But pointer/integer addition is commutative too, so the translation can be exactly like *(e1 + e2).
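
In Rust terms, and assuming the pointer operand has already been identified, the translation of e1[e2] could look like this sketch (the function name is hypothetical):

```rust
// C: base[index], which is the same as *(base + index)
pub unsafe fn index_i32(base: *const i32, index: i32) -> i32 {
    // `offset` counts in elements, like C pointer arithmetic.
    *base.offset(index as isize)
}
```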

Perhaps implement a "corrode-ld" and "corrode-ar" for inter-translation-unit conversion?

Handling C's inline/static inline/extern inline behavior seems like it might be somewhat tricky unless the linker plays along.

As I'm not sure we can get rustc to produce output files with the shape the linker expects for these features, it may be worthwhile to make corrode-cc mostly do the job of parsing the C files and do local conversions, followed by dumping an intermediate form, while corrode-ld/corrode-ar collect those files, convert them sensibly, and produce something much more like a crate.

Translate __FUNCTION__

Now that I can at least partially read Haskell, I'm going to try out a (potentially) easy first change: Translate __FUNCTION__. It's currently regarded as badSource:

("shared_types_gen.h": line 19): illegal undefined variable; check whether a real C compiler accepts this:
    __FUNCTION__

Would it be correct and appropriate to add __FUNCTION__ to getSymbolIdent's list of builtin symbols?

Properly translating this to Rust depends on rust-lang/rfcs/pull/1719, but for now, I could look up the current function name in the symbol environment.

I could then follow up by translating __FILE__ and __LINE__ to Rust's file! and line! macros respectively, which is arguably slightly simpler, but __FUNCTION__ is first in my sights.
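
The __FILE__ and __LINE__ half of that plan maps onto macros that already exist in Rust. A minimal sketch, with a made-up function name:

```rust
// C: printf("%s:%d\n", __FILE__, __LINE__);
pub fn here() -> (&'static str, u32) {
    // `file!()` expands to the source file name and `line!()` to the
    // current line number, both at compile time.
    (file!(), line!())
}
```

Note that these report positions in the generated Rust file, not in the original C source.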

implement alignof

In src/Language/Rust/Corrode/C.hs, interpretExpr should handle the CAlignofType constructor of CExpression in a manner very similar to how it handles CSizeofType, except by translating to std::mem::align_of.
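
The Rust expression being targeted would look something like this sketch (the wrapper function is only there to make the example callable):

```rust
use std::mem::align_of;

/// C: `_Alignof(char)` and `_Alignof(int)`, assuming char maps to u8
/// and int maps to i32.
pub fn alignof_demo() -> (usize, usize) {
    (align_of::<u8>(), align_of::<i32>())
}
```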

implement typedef for function types

I got this error

("/usr/include/libio.h": line 360): illegal undefined type; check whether a real C compiler accepts this:
    __io_read_fn

but found

335    typedef __ssize_t __io_read_fn (void *__cookie, char *__buf, size_t __nbytes);

implement goto in C

extern int printf (const char *format, ...);
extern int getchar(void);

int main(void)
{
    int n=0;
    printf("input a string ：\n");
loop: if(getchar()!='\n')
      {
          n++;
          goto loop;
      }
      printf("length is %d\n",n);
}

corrode: ("goto.c": line 10): Corrode doesn't handle this yet:
    loop:
        if (getchar() != '\n')
        {
            n++;
            goto loop;
        }
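
For this particular shape, a label followed by a backward goto to it, one possible hand translation is a labeled loop where goto becomes continue. This is a sketch only, not a general goto algorithm; getchar is passed in as a closure so the example does not depend on stdin.

```rust
// C:
//   loop: if (getchar() != '\n') { n++; goto loop; }
pub fn count_line_length(mut getchar: impl FnMut() -> i32) -> i32 {
    let mut n = 0;
    'restart: loop {
        if getchar() != '\n' as i32 {
            n += 1;
            continue 'restart; // C: goto loop;
        }
        break; // falling past the `if` leaves the labeled statement
    }
    n
}
```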

Complex numbers and complex.h are not handled

As far as I can tell, there are only two things stopping us from compiling anything that includes <complex.h> (section 7.3).

we don't handle the builtin complex type _Complex
we don't recognize when an extern declaration is to a library builtin function

For the first problem, I think we should translate using num_complex::Complex. That works especially nicely since the aforementioned struct is parametrized over the type of the components, so we can easily bundle together all the types of complex numbers: double complex, float complex, long double complex etc. In terms of code in corrode, we'd be extending baseTypeOf to support CComplexType.

Furthermore, almost all the required functions in <complex.h> match up to something in num_complex::Complex (and they have the same branch cuts 😅), with three easily circumvented exceptions: cpow, csqrt, and cproj.

However, I'm not sure how we link up these implementations to the extern declarations in <complex.h>, and that seems like a more general problem we should solve...

Is there some list of builtin functions for which implementations are automatically linked? Aren't there conflicts if functions of the same names are defined in a program (and what about if the builtin function is only in scope after an include - like cabs)? Maybe someone with more C experience can weigh in...

Output incomplete corrosion on error?

Would it be difficult or ugly to output the partial / incomplete corroded source on error (or, if I've missed it, is there already a way to do so)?

implement enum

C's enum types are pretty easy to translate except for the fact that C allows arbitrary integer values to be used where an enum variant is expected. Rust's enums don't allow that (as far as I know).

So step one is: Figure out what we can translate C enums to in Rust that will allow valid C programs to be translated (even if they use values outside the range of the enum), while ideally preserving all uses of the names of the enum variants.
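
One candidate encoding, sketched with invented names: a type alias for the underlying integer plus one constant per variant. Out-of-range values stay representable and the variant names survive. Using u32 as the underlying type is an assumption of the example.

```rust
// C: enum color { RED, GREEN, BLUE };
pub type Color = u32;
pub const RED: Color = 0;
pub const GREEN: Color = 1;
pub const BLUE: Color = 2;

pub fn color_name(c: Color) -> &'static str {
    match c {
        RED => "red",
        GREEN => "green",
        BLUE => "blue",
        // C allows any integer here, so the translation must too.
        _ => "unknown",
    }
}
```

The cost is that the compiler no longer checks matches for exhaustiveness over the variants.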

implement int128

Corrode doesn't handle this yet:
    __int128

Installation instructions?

I know how to run C and Rust code, but with Haskell… I don't know where to start. Could you add "How to run this for C coders" instructions?

implement inline assembly

The CAsm constructor of CStatement, and the CAssemblyStatement it contains, are almost easy to translate to Rust's asm! macro.

At a high level, both syntaxes deliberately have the same semantics: Rust borrows LLVM's, which exists to support Clang's attempts to exactly match GCC. However, I'm not sure the details actually agree. I'd like someone who has some experience with inline assembly to take this on.

A correct implementation probably starts like this:

interpretStatement stmt@(CAsm (CAsmStmt qual expr outops inops clobbers node) _) = do
    volatile <- case qual of
        Just (CVolatQual _) -> return True
        Nothing -> return False
        _ -> badSource stmt "qualifier on inline assembly"

If volatile, then the "volatile" option should be added as the last parameter to the asm! macro.

The template expression can be extracted using something like the code that would run for interpretExpr True (CConst (CStrConst expr node)) (try factoring out the code that handles CStringLiteral expressions) except that the asm! macro probably doesn't expect byte strings, so it should be expanded as a Unicode string instead. Similar string extraction is needed inside the operands and clobbers.

I'm stuck trying to figure out how to map operands and clobbers. Some cases will probably work if they're copied through unchanged, but maybe others are more complicated?

implement test suite

relates to #43

Handle builtins

When trying to compile LMDB with CC=corrode:

("/usr/x86_64-pc-linux-gnu/include/bits/byteswap.h": line 47): illegal undefined variable; check whether a real C compiler accepts this:
    __builtin_bswap32
make: *** [Makefile:82: mdb.o] Error 1

A minimal approach might be to simply match against __builtin_ (which is reserved), and pass them through unchanged. The resulting Rust code will fail to compile due to calling an undefined function, prompting the user to correct it.

Can't corrode a hello world example

Hi,

I tried to compile the following simple program, but I get the following error:

#include <stdio.h>

int main() {
    printf("Hello World!");
    return 0;
}

corrode: ("/usr/include/wchar.h": line 85): Corrode doesn't handle this yet:
    union {
        unsigned int __wch; char __wchb[4];
    }

Let me know if you need any details about my environment/compiler.

sequence points

In C, assignment expressions (including pre and post increment/decrement) are supposed to be grouped into sequence points and evaluated together. I haven't looked up the details yet, so I'm probably doing that wrong.

There may be an opportunity to report a useful diagnostic if we find multiple writes, or mixed reads and writes, to the same variable within one sequence point. I could imagine that rejecting programs with those errors might make the translation easier, in which case we should absolutely do that.

generate FFI bindings for external symbols

For a semantics-preserving translation, we need to arrange to call any native C functions that are not present in the current translation unit. In a post-processing pass, we'll probably want to identify existing crates that expose the same FFI bindings and import those instead, but for now hopefully this will be enough to get translated modules to compile, link, and run identically to their original C versions.

This should probably wait for issue #4 to be completed so we have global variables handled, not just functions.

If a CDeclExt constructor of CExternalDeclaration either has storage specifier CExtern or is a function prototype, then record its name and type in a global map of symbols which may need FFI binding.
Have interpretFunction record its symbol in a global set of functions which do not need FFI bindings created. Same for CDeclExt when encountering a declaration that is not a function prototype and is not CExtern.
For each symbol in the maybe-FFI map which is not in the no-FFI set, get the symbol's type from the map and add a declaration to an extern block in the generated code.

We must not emit FFI bindings for symbols which are defined in the current module, because otherwise we'd get name collisions. I'd also prefer not to emit FFI bindings for symbols which aren't used by the current module: C programs normally include header files which are absolutely full of declarations that the program doesn't use.

Add a WriterT monad transformer to the EnvMonad stack with a Set of the possible-FFI symbols that have been referenced. Add to the set when translating CVar in interpretExpr.
When generating the extern block, use only symbols which are in the used-symbols set, while still subtracting the no-FFI set.
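
The selection rule described above (declared as possibly external, referenced, and not defined locally) can be sketched as plain set operations. This is only an illustration in Rust, not Corrode's Haskell implementation. The function name `extern_declarations` and the use of strings for symbol types are assumptions made for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the declarations that belong in the generated extern block.
/// `maybe_ffi` maps each possibly-external symbol to its declaration text,
/// `no_ffi` holds symbols defined in this translation unit, and
/// `used` holds symbols actually referenced by translated expressions.
fn extern_declarations(
    maybe_ffi: &BTreeMap<String, String>,
    no_ffi: &BTreeSet<String>,
    used: &BTreeSet<String>,
) -> Vec<String> {
    maybe_ffi
        .iter()
        // Keep a symbol only if it is used and not defined locally.
        .filter(|(name, _)| used.contains(*name) && !no_ffi.contains(*name))
        .map(|(_, decl)| decl.clone())
        .collect()
}
```

A `BTreeMap` keeps the output order deterministic, which makes the generated code easier to diff. That is a design choice of this sketch rather than a requirement.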

Note that FFI-binding functions are permitted to have varargs signatures, even though pure Rust functions are not. And since the first thing anyone ever wants to call is printf, we'd better handle the varargs functions.
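
For reference, a variadic FFI declaration for printf might look like the sketch below. The wrapper `hello` is a hypothetical name used only for the example. The block is written as `unsafe extern "C"`, which recent Rust toolchains accept. Older toolchains spell it as plain `extern "C"`.

```rust
use std::os::raw::{c_char, c_int};

// Declaration of the C library's printf; `...` marks the variadic tail,
// which is only permitted on foreign function declarations.
unsafe extern "C" {
    fn printf(format: *const c_char, ...) -> c_int;
}

/// Calls printf with one integer argument and returns printf's result,
/// which is the number of bytes written on success.
fn hello(n: c_int) -> c_int {
    // The format string needs an explicit trailing NUL for C.
    let fmt = b"Hello %d\n\0";
    unsafe { printf(fmt.as_ptr() as *const c_char, n) }
}
```

Calling `hello(42)` goes through the C library's printf. The caller is responsible for matching the arguments to the format string, just as in C.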

implement sizeof/alignof for expressions

The CSizeofExpr and CAlignofExpr constructors of CExpression should be translated almost identically to the CSizeofType and CAlignofType constructors, respectively, except that the expression needs to be translated using interpretExpr.

However, I'm not sure about the exact translation. There are two possibilities:

If the expression is supposed to have its side effects evaluated, then we should pass a borrow of the resulting expression to std::mem::size_of_val.
Otherwise, if side effects are not supposed to be evaluated, we should just use the resultType from translating the expression, throwing away the expression itself, and proceed just like CSizeofType/CAlignofType.

Before trying to implement this, we need a citation to C99 or C11 which clarifies whether the expression should actually be evaluated.
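
To make the difference between the two candidate translations concrete, here is a small sketch. The function names and the `next` helper are hypothetical. The first version evaluates the operand and therefore runs its side effect. The second uses only the operand's type.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical operand with a visible side effect: bumps a counter, yields a u64.
fn next(counter: &mut u32) -> u64 {
    *counter += 1;
    0
}

/// Candidate 1: evaluate the operand and measure a borrow of the result.
fn sizeof_evaluating(counter: &mut u32) -> usize {
    size_of_val(&next(counter))
}

/// Candidate 2: discard the operand and use only its result type.
fn sizeof_type_only() -> usize {
    size_of::<u64>()
}
```

Both return the same size, but only the first one changes `counter`. That difference is the behavior the requested citation has to settle.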

Bad conversion of static const var = {}

Hey,

there seems to be a pretty big issue with static const var = {} in Corrode.

Input:

static const D = {0x78a3, 0x1359, 0x4dca, 0x75eb, 0xd8ab, 0x4141, 0x0a4d, 0x0070, 0xe898, 0x7779, 0x4079, 0x8cc7, 0xfe73, 0x2b6f, 0x6cee, 0x5203};

Output:

static D : *mut isize = 0x78a3i32 as (*mut isize);

special-case translation for main

Blindly translating main just like any other C function results in rustc reporting that main has the wrong type. In C, main should have one of these types:

int main(void)
int main(int argc, char **argv)
int main(int argc, char **argv, char **environ)

(That's from memory; somebody please correct me if I have something wrong.)

In Rust, main must be declared fn main() -> (). If you want access to the program's arguments or environment variables, you're supposed to get them using definitions from std::env. To exit with a specific status code, call std::process::exit. And main must not be declared either unsafe or extern, but we should normally use both for all translated functions.

Also, it is legal in C to call main recursively or from elsewhere in the program, so we should translate the C main just like any other C function, preserving its argument types and everything.

I think that means Corrode must rename main where it's defined and anywhere that it's called; and I think when we translate main, we should also emit a wrapper function that is a legal Rust declaration for main, which sets up the expected arguments, calls the translated main in an unsafe block, and finally calls exit with the result.

The translated C code can expect argv and environ to be mutable null-terminated arrays of mutable null-terminated strings, and none of those things describe what we get from std::env. So for the two-argument and three-argument variants of main, I guess we need to:

allocate a Vec<OsString> to hold ownership of a copy of each string;
call .push('\0') on each one;
allocate a Vec<*mut u8> to hold the result of .as_mut_ptr() on each string;
call .push(std::ptr::null_mut()) on the latter vector to null-terminate it;
and pass .as_mut_ptr() of that vector to the translated main.

Or something like that? And we need to ensure that both vectors stay alive until main returns, because the raw pointers returned by as_mut_ptr carry no lifetime and won't keep them alive for us.
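
A minimal sketch of the wrapper idea for the two-argument variant follows. It swaps in CString and byte vectors for the Vec<OsString> proposed above, because CString handles the NUL terminator directly. The names `c_main` and `call_c_main` are hypothetical. The body of `c_main` is only a stand-in for the renamed, translated C main.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

/// Stand-in for the renamed translation of C's `main` (hypothetical).
/// Returns argc plus the length of argv[0] so the call is observable.
unsafe extern "C" fn c_main(argc: c_int, argv: *mut *mut c_char) -> c_int {
    let mut len: isize = 0;
    let first = *argv;
    if !first.is_null() {
        while *first.offset(len) != 0 {
            len += 1;
        }
    }
    argc + len as c_int
}

/// Builds a mutable, null-terminated argv from `args` and calls `entry`.
fn call_c_main(
    args: Vec<Vec<u8>>,
    entry: unsafe extern "C" fn(c_int, *mut *mut c_char) -> c_int,
) -> c_int {
    // Owns the NUL-terminated copies; must stay alive until `entry` returns.
    let mut owned: Vec<Vec<u8>> = args
        .into_iter()
        .map(|a| {
            CString::new(a)
                .expect("argument contains an interior NUL")
                .into_bytes_with_nul()
        })
        .collect();
    // Raw pointers into `owned`, followed by the terminating null pointer.
    let mut argv: Vec<*mut c_char> = owned
        .iter_mut()
        .map(|s| s.as_mut_ptr() as *mut c_char)
        .collect();
    argv.push(std::ptr::null_mut());
    let argc = owned.len() as c_int;
    // Both vectors are still in scope here, so the pointers remain valid.
    unsafe { entry(argc, argv.as_mut_ptr()) }
}
```

A generated Rust main could then collect std::env::args_os() into byte vectors (on Unix, via std::os::unix::ffi::OsStringExt::into_vec), pass them to `call_c_main`, and hand the result to std::process::exit.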

__BLOCKS__ is defined, but blocks are not supported

When including MacOSX10.11.sdk/usr/include/stdlib.h, corrode does not parse the C block syntax extension:

The symbol `^' does not fit here.

Invoking the compiler with -U__BLOCKS__ helps, but I think __BLOCKS__ should be undefined by default.