visq / language-c

This project forked from cartazio/language-c

Source repository for https://hackage.haskell.org/package/language-c

Home Page: http://visq.github.io/language-c/

License: Other

Haskell 45.99% C 40.68% Shell 1.42% CSS 0.14% Ruby 0.08% Yacc 8.11% Makefile 0.90% SWIG 0.03% Nix 0.07% Lex 2.57%

language-c's Issues

Add support for GCC legacy __sync_XXX builtins

From https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html:

The following built-in functions are intended to be compatible with those described in the Intel Itanium Processor-specific Application Binary Interface, section 7.4. As such, they depart from normal GCC practice by not using the ‘__builtin_’ prefix and also by being overloaded so that they work on multiple types.

The definition given in the Intel documentation allows only for the use of the types int, long, long long or their unsigned counterparts. GCC allows any scalar type that is 1, 2, 4 or 8 bytes in size other than the C type _Bool or the C++ type bool. Operations on pointer arguments are performed as if the operands were of the uintptr_t type. That is, they are not scaled by the size of the type to which the pointer points.

There are about a dozen of these; they look more or less like this one:

type __sync_fetch_and_add (type *ptr, type value, ...)
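
As a rough illustration of the overloading described in the quoted documentation, the same builtin can be applied to operands of different widths without any `__builtin_` prefix. This is a minimal sketch assuming GCC or Clang; the wrapper names `fetch_add_int` and `fetch_add_u64` are invented for the example.

```c
#include <stdint.h>

/* Atomically adds delta to *counter and returns the value held before the add. */
int fetch_add_int(int *counter, int delta) {
    return __sync_fetch_and_add(counter, delta);
}

/* Same builtin name, different operand type: the compiler picks the width. */
uint64_t fetch_add_u64(uint64_t *counter, uint64_t delta) {
    return __sync_fetch_and_add(counter, delta);
}
```

A parser that only knows fixed prototypes cannot type these calls directly, which is presumably why they need special handling.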

Missing support for `__uint128_t` on ARM64 (Apple M1)

Hello,

I just moved to a new ARM64-based laptop running macOS 13.0.1 and am running into this problem when building a Haskell package I authored:

binaryninja> Preprocessing library for binaryninja-0.1.0..
binaryninja> c2hs: C header contains errors:
binaryninja>
binaryninja> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/mach/arm/_structs.h:498: (column 2) [ERROR]  >>> Syntax error !
binaryninja>   The symbol `__uint128_t' does not fit here.
binaryninja>

It appears language-c, which is used by c2hs, isn't expecting the __uint128_t token and generates a syntax error.

There's a similar issue posted a while back here:
https://discourse.haskell.org/t/problem-with-language-c-on-arm-mac/3841

Please let me know if there's anything I can do to help with debugging.

Build failure with GHC 7.8

Latest published version (0.9.0.1) fails to build with GHC 7.8:

[32 of 40] Compiling Language.C.Analysis.TravMonad ( src/Language/C/Analysis/TravMonad.hs, dist/build/Language/C/Analysis/TravMonad.o )

src/Language/C/Analysis/TravMonad.hs:432:48:
    Not in scope: ‘<$>’
    Perhaps you meant ‘<*>’ (imported from Control.Applicative)
Error: cabal: Failed to build language-c-0.9.0.1

Building with GHC 7.10 succeeds.

The missing import is probably easily fixable so GHC 7.8 build can be restored.

In my role as hackage trustee, I revised to base >= 4.8 on hackage to fix the build plan: https://hackage.haskell.org/package/language-c-0.9.0.1/revisions/

Add support for GNU C11 extension __auto_type

Support statement attributes

GCC Attribute Syntax allows the following:

In GNU C, an attribute specifier list may appear as part of a null statement. The attribute goes before the semicolon.

ie, this is currently a syntax error in language-c:

switch (e) {
  case 1:
         foo ();
         __attribute__((fallthrough)) ;
   default:
         bar ();
}

Unfortunately adding the naive additional production to statement leads to two extra shift/reduce conflicts and it's not obvious to me if they're benign or problematic.

statement ::
  ...
  | attrs ';'                   {% withNodeInfo $1 $ CAttrStmt $1 }

Weird edge case when case is labeled

Language C parses the following code:



int
main()
{
    int x;

    x = 0;

    switch(x) {
        {
            x = 1 + 1;
            foo:
            case 1:
                return 1;
            case 2:
                return 9;
        }
    }
}

Into this (note the extra nesting that contains both the first and the second case):

int main()
{
    int x;
    x = 0;
    switch (x)
    {
        {
            x = 1 + 1;
        foo:
        case 1:
            return 1;
        case 2:
            return 9;
        }
    }
}

I think this happens because Language.C greedily adds the foo label into the compound statement of x = 1 + 1, which in turn grabs the next case. Or something like that.

Anyway, it's not much of a bug since the resulting code still compiles with the same behavior. Just thought you might want to know.

By the way thanks for this project, it really is awesome!

_Float128 is unrecognized

Under gcc 7.2 with glibc 2.26 on Arch Linux, parsing /usr/include/stdlib.h with the option -D__STDC_WANT_IEC_60559_TYPES_EXT__ results in an error:

  Syntax error !
  The symbol `strtof128' does not fit here.

The offending code from stdlib.h is as follows:

#if __HAVE_FLOAT128 && __GLIBC_USE (IEC_60559_TYPES_EXT)
/* Likewise for the '_Float128' format  */
extern _Float128 strtof128 (const char *__restrict __nptr,
                      char **__restrict __endptr)
     __THROW __nonnull ((1));
#endif

To repro, take the BasicUsage.hs example and:

parse "/usr/include/stdlib.h" instead of "test.c"
use parseCFile options ["-D__STDC_WANT_IEC_60559_TYPES_EXT__"] instead of []
run the example on a problematic platform (current Arch Linux should suffice, but I think another system with gcc 7.2 and glibc 2.26 should also suffice)

Relevant link about _Float128: https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Floating-Types.html#Floating-Types

[Analysis] Trav ought to be a monad transformer

Here's a wishlist item I've been thinking about: I wish Trav was a monad transformer built up out of smaller simpler monad transformers.

Trav s a is meant to be extensible via the userState component s and the handleDecl function from MonadTrav, but this isn't always enough.

For example if I want to handle some events by running some kind of imperative operation (e.g. some kind of Union-Find algorithm or maybe feeding facts to an external solver like Z3) that's not easy to do right now.

Instead of baking in user state into the monad, Trav should be a monad transformer: TravT m a and Trav s a = TravT (State s) a (I'm not sure how user state interacts with Trav's exception mechanism).

In fact, ideally the MonadSymtab, MonadName, and MonadCError parts of Trav should all be broken out into separate transformers that layer on the functionality one by one (so that if I want to have some kind of extended symbol information, for example, I could add my own MonadSymtab instance in a custom stack). The default TravT should be a newtype around a particular instantiation of a stack of simpler transformers.

Benign duplicate type definition is not an error in GNU C

typedef int Foo;
typedef int Foo;

This code compiles fine with gcc. It is a GNU extension. GNU99 mode does not (but should) support it. It is a real problem, because OSX headers define va_list twice: once in /usr/include/sys/_types/_va_list.h and once in $TOOLCHAIN/usr/lib/clang/7.3.0/include/stdarg.h. They both define it to __builtin_va_list.

Support of C11's spelling of alignof

C11 standardized the use of alignof through the _Alignof operator, see http://en.cppreference.com/w/c/language/_Alignof. language-c will correctly parse the following spellings: alignof, __alignof__, and __alignof, but throws an error when parsing the new _Alignof (which the alignof macro gets expanded to in C11).
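
For reference, a small sketch of the two spellings side by side, assuming a C11-capable GCC or Clang. The function names are made up for the example; the comparison uses `int` because the GNU and C11 operators are expected to agree for it on common targets.

```c
#include <stddef.h>

/* C11 keyword spelling, the one reported as a parse error. */
size_t align_int_c11(void) { return _Alignof(int); }

/* GNU spelling that predates C11 and is already accepted. */
size_t align_int_gnu(void) { return __alignof__(int); }
```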

Missing builtins: bswap32, bswap64

On Mac OS X:

/usr/include/libkern/i386/_OSByteOrder.h:60: (column 12) [WARNING]  >>> AST invariant violated
  unknown function: __builtin_bswap32
/usr/include/libkern/i386/_OSByteOrder.h:74: (column 12) [WARNING]  >>> AST invariant violated
  unknown function: __builtin_bswap64
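
A minimal sketch of what the two builtins do, assuming GCC or Clang; the wrapper names `swap32` and `swap64` are hypothetical. Note that no header declares the builtins, which is why an analysis that relies on declarations reports them as unknown.

```c
#include <stdint.h>

/* Reverses the byte order of a 32-bit value. */
uint32_t swap32(uint32_t x) { return __builtin_bswap32(x); }

/* Reverses the byte order of a 64-bit value. */
uint64_t swap64(uint64_t x) { return __builtin_bswap64(x); }
```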

Add support for multiple enumerator attributes

From GCC documentation on attribute syntax:

In GNU C, an attribute specifier list may appear as part of an enumerator. The attribute goes after the enumeration constant, before =, if present. The optional attribute in the enumerator appertains to the enumeration constant. It is not possible to place the attribute after the constant expression, if present.

where attribute specifier list is multiple attribute specifiers with no intervening tokens.

OSX /usr/include/time.h has multiple __attribute__((availability(...))) attributes stacked on each enumerator:

typedef enum {
_CLOCK_REALTIME __OSX_AVAILABLE(10.12) __IOS_AVAILABLE(10.0) __TVOS_AVAILABLE(10.0) __WATCHOS_AVAILABLE(3.0) = 0,
}

where __OSX_AVAILABLE(_vers) is __attribute__((availability(macosx,introduced=_vers))) and similarly for the others.

sizeofType is very broken

It doesn't take padding in structs into account. It really shouldn't be exported in this state, or at the very least it should have a really big warning attached!
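
To make the padding point concrete, here is a small sketch in C. The struct and function names are invented for the example, and the exact sizes depend on the target ABI, so the code only compares the naive sum of member sizes against what the compiler reports.

```c
#include <stddef.h>

/* A char followed by an int: most ABIs insert padding after tag. */
struct padded {
    char tag;
    int value;
};

/* What a size computation that ignores padding would report. */
size_t naive_size(void) { return sizeof(char) + sizeof(int); }

/* What the compiler actually lays out. */
size_t real_size(void) { return sizeof(struct padded); }

/* Offset of value, which must respect the alignment of int. */
size_t value_offset(void) { return offsetof(struct padded, value); }
```

On a typical x86_64 ABI the naive sum is smaller than the real size, so any offset or allocation derived from the naive figure would be wrong.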

TypeDefRef should be defined with a Type not Maybe Type

I can't find any code in language-c where TypeDefRef is constructed with Nothing :: Maybe Type.

Maybe we should have:

data TypeDefRef = TypeDefRef Ident Type {- used to be (Maybe Type) -} NodeInfo

0-based line number directives generate errors.

Initially I noticed the error as a c2hs failure on most projects when using c2hs against gcc-11.0.0 (development trunk).

Complete c2hs example:

$ cat a.chs
module M where
$ /usr/bin/c2hs '--cpp=x86_64-pc-linux-gnu-gcc-10.2.0' '--cppopts=-E'  a.chs
$ /usr/bin/c2hs '--cpp=x86_64-pc-linux-gnu-gcc-11.0.0' '--cppopts=-E'  a.chs
c2hs: C header contains errors:

a.i:1: (column 1) [ERROR]  >>> Lexical error !
  The character '#' does not fit here.

This seems to happen because gcc-11 slightly renumbered line numbers for synthetic entries:

$ diff -U0 <(gcc-10.2.0 -E -x c /dev/null) <(gcc-11.0.0 -E -x c /dev/null)
--- /dev/fd/63  2020-08-08 09:09:37.245505668 +0100
+++ /dev/fd/62  2020-08-08 09:09:37.245505668 +0100
@@ -1,4 +1,3 @@
-# 1 "/dev/null"
-# 1 "<built-in>"
-# 1 "<command-line>"
-# 31 "<command-line>"
+# 0 "/dev/null"
+# 0 "<built-in>"
+# 0 "<command-line>"
@@ -6 +5 @@
-# 32 "<command-line>" 2
+# 0 "<command-line>" 2

Note: # 1 "/dev/null" changed to # 0 "/dev/null".

I think the error is also seen when running language-c directly as:

$ ghci
Prelude> Language.C.parseC (Data.ByteString.Char8.pack "# 1 \"/dev/null\"\n") Language.C.nopos
*** Exception: No match in record selector posOffset

Prelude> Language.C.parseC (Data.ByteString.Char8.pack "# 0 \"/dev/null\"\n") Language.C.nopos
Left <no file>:: [ERROR]  >>> Syntax Error !
  Lexical error !
  The character '#' does not fit here.

Thanks!

Should ignore '-g...' preprocessor options for GCC

With -ggdb3 (or -g3) gcc will leave preprocessor definitions in the output:

# 1 "foo.c"
# 5 "/home/user/example//"
# 13 "<built-in>"
#define __STDC__ 1
#define __STDC_VERSION__ 199901L
#define __STDC_UTF_16__ 1
#define __STDC_UTF_32__ 1
#define __STDC_HOSTED__ 1
#define __GNUC__ 6
#define __GNUC_MINOR__ 3
#define __GNUC_PATCHLEVEL__ 0
#define __VERSION__ "6.3.0 20170406"
...

(Not just for builtins, also for regular #defines occurring in the source)

The documentation is at https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html

What are TyAny and TyFloatN supposed to represent?

Aside from these I've come up with this MachineDesc for x86_64

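{-# LANGUAGE LambdaCase, RecordWildCards #-}
-- \case needs LambdaCase; MachineDesc { .. } needs RecordWildCards.

x86_64 :: MachineDesc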
x86_64 =
  let iSize = \case
        TyBool    -> 1
        TyChar    -> 1
        TySChar   -> 1
        TyUChar   -> 1
        TyShort   -> 2
        TyUShort  -> 2
        TyInt     -> 4
        TyUInt    -> 4
        TyLong    -> 8
        TyULong   -> 8
        TyLLong   -> 8
        TyULLong  -> 8
        TyInt128  -> 16
        TyUInt128 -> 16
      fSize = \case
        TyFloat    -> 4
        TyDouble   -> 8
        TyLDouble  -> 16
        TyFloatN{} -> error "TyFloatN"
      builtinSize = \case
        TyVaList -> 24
        TyAny    -> error "TyAny"
      ptrSize  = 8
      voidSize = 1
      iAlign   = \case
        TyBool    -> 1
        TyChar    -> 1
        TySChar   -> 1
        TyUChar   -> 1
        TyShort   -> 2
        TyUShort  -> 2
        TyInt     -> 4
        TyUInt    -> 4
        TyLong    -> 8
        TyULong   -> 8
        TyLLong   -> 8
        TyULLong  -> 8
        TyInt128  -> 16
        TyUInt128 -> 16
      fAlign = \case
        TyFloat    -> 4
        TyDouble   -> 8
        TyLDouble  -> 16
        TyFloatN{} -> error "TyFloatN"
      builtinAlign = \case
        TyVaList -> 8
        TyAny    -> error "TyAny"
      ptrAlign  = 8
      voidAlign = 1
  in  MachineDesc { .. }

Semantic analysis support for Float128 (Unexpected typespec: CFloat128Type)

Right now if we run analyzeAST on a translation unit that uses 128 bit floats, we will die at tDirectType:

language-c/src/Language/C/Analysis/DeclAnalysis.hs

Line 269 in 3f348f5

TSNonBasic t -> astError node ("Unexpected typespec: " ++ show t)

with the message

Unexpected typespec: CFloat128Type (NodeInfo "/usr/include/stdlib.h": line 236, ...

Support tracking the C include stack

As I found and posted on jameysharp/corrode#50 :

gcc -E puts these directives into the output:
# linenum filename flags
The (space-separated) flags are:

1. Begin a new file

2. Return to a file after an included file ends

3. Came from a system header file

4. Should be considered to be wrapped in extern "C" (irrelevant for Corrode, as it lacks C++)

As a result, you can simply treat "1" as "push" (with "3" setting a flag on it), "2" as "pop", and any directive with neither replaces the top-of-stack - that gives you the full include stack.

Then, at each newline that does not end such a directive, increment the line number of the current top of the stack.
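
The push/pop scheme described above can be sketched in C as follows. This is only an illustration under stated assumptions: `struct line_marker`, `parse_line_marker` and `depth_after` are hypothetical names, file names are assumed to fit in 255 bytes and to contain no escaped quotes, and only the stack depth is tracked rather than the full stack.

```c
#include <stdio.h>
#include <stdlib.h>

/* One parsed line marker of the form: # linenum "filename" flags */
struct line_marker {
    long line;
    char file[256];
    int enters_file;    /* flag 1 */
    int returns_to;     /* flag 2 */
    int system_header;  /* flag 3 */
};

/* Returns 1 and fills *m if s is a line marker, 0 otherwise. */
int parse_line_marker(const char *s, struct line_marker *m) {
    int consumed = -1;
    m->enters_file = m->returns_to = m->system_header = 0;
    /* %n is only stored if the closing quote was matched. */
    if (sscanf(s, "# %ld \"%255[^\"]\"%n", &m->line, m->file, &consumed) != 2
        || consumed < 0)
        return 0;
    const char *p = s + consumed;
    for (;;) {
        char *end;
        long flag = strtol(p, &end, 10);
        if (end == p) break;            /* no more numeric flags */
        if (flag == 1) m->enters_file = 1;
        else if (flag == 2) m->returns_to = 1;
        else if (flag == 3) m->system_header = 1;
        p = end;
    }
    return 1;
}

/* Depth of the include stack after a marker: push on 1, pop on 2,
   otherwise the top entry is replaced and the depth stays the same. */
int depth_after(int depth, const struct line_marker *m) {
    if (m->enters_file) return depth + 1;
    if (m->returns_to && depth > 0) return depth - 1;
    return depth;
}
```

A full implementation would keep the file name and current line for each stack entry and increment the top entry's line at every newline that does not end a marker.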

EDIT: Oh, hm. Looks like Language.C mishandles this - fixing it would require altering Language.C.Data.Position to have a posParent or posCause member, which is itself a Maybe Position, so as to form the include stack, and then track it that way.

GCC __builtin_ builtins appear as undeclared identifiers

GCC (and Clang, evidently) have a ton of builtins that are just C standard functions, prefixed with __builtin_. From 6.59 Other Built-in Functions Provided by GCC:

The ISO C90 functions abort, abs, acos, asin, atan2, atan, calloc, ceil, cosh, cos, exit, exp, fabs, floor, fmod, fprintf, fputs, frexp, fscanf, isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper, labs, ldexp, log10, log, malloc, memchr, memcmp, memcpy, memset, modf, pow, printf, putchar, puts, scanf, sinh, sin, snprintf, sprintf, sqrt, sscanf, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, tanh, tan, vfprintf, vprintf and vsprintf are all recognized as built-in functions unless -fno-builtin is specified (or -fno-builtin-function is specified for an individual function). All of these functions have corresponding versions prefixed with __builtin_.

As well as

GCC provides built-in versions of the ISO C99 floating-point comparison macros that avoid raising exceptions for unordered operands. They have the same names as the standard macros (isgreater, isgreaterequal, isless, islessequal, islessgreater, and isunordered), with __builtin_ prefixed. We intend for a library implementor to be able to simply #define each standard macro to its built-in equivalent. In the same fashion, GCC provides fpclassify, isfinite, isinf_sign, isnormal and signbit built-ins used with __builtin_ prefixed. The isinf and isnan built-in functions appear both with and without the __builtin_ prefix.

And there is more (see the GCC documentation linked above).

Related: jameysharp/corrode#99 (clang's <math.h> uses __builtin_fabsf)
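
A short sketch of why these show up as undeclared identifiers: the prefixed names can be called without including the header that declares the unprefixed function. The wrapper names below are made up for the example, and GCC or Clang is assumed.

```c
#include <stddef.h>

/* No <stdlib.h>, <string.h> or <math.h> is included here on purpose. */

/* Prefixed version of abs. */
int abs_builtin(int x) { return __builtin_abs(x); }

/* Prefixed version of strlen. */
size_t strlen_builtin(const char *s) { return __builtin_strlen(s); }

/* Prefixed version of the C99 isgreater comparison macro. */
int greater_builtin(double a, double b) { return __builtin_isgreater(a, b); }
```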

Possible asm syntax with trailing colon in parameters

I ran into a c2hs issue when compiling fltkhs on Windows with OpenGL support, and I think I've traced it to the syntax issue it is complaining about. Please let me know if this should instead be reported somewhere else.

The error I got complained of this header in my Stack GHC folder:

C:/Users/Mike/AppData/Local/Programs/stack/x86_64-windows/ghc-8.8.4/mingw/x86_64-w64-mingw32/include/psdk_inc/intrin-impl.h:836: (column 222) [ERROR]  >>> Syntax error !
  The symbol `)' does not fit here.

The contents of the header are more or less this (line numbers differ somewhat but all the pieces below are the same): https://github.com/Alexpux/mingw-w64/blob/2dce559/mingw-w64-headers/include/psdk_inc/intrin-impl.h

As best as I can tell, what's happening is that:

__FLAGCLOBBER2 is being defined to nothing
This macro and this macro have __asm__ statements that end in : __FLAGCLOBBER2);, which becomes : );
The macros are then used a few places including here (which is the line 836 in my version)

So, it looks like either here or here in Parser.y could be modified to accept the : ) sequence with a colon but no clobbers. But, I am not sure if it would be correct to do so. If this all looks sensible I can whip up a PR.
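
For reference, a reduced form of the construct in question, written as a sketch rather than taken from the mingw-w64 header. It assumes GCC or Clang, both of which are expected to accept a clobber colon with nothing after it; the function name is hypothetical.

```c
/* Empty asm template that ties the output to the input, followed by a
   third colon with an empty clobber list, i.e. the ": )" sequence. */
int pass_through_asm(int x) {
    int y;
    __asm__ ("" : "=r"(y) : "0"(x) : );
    return y;
}
```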

Also pinging @deech due to the fltkhs build issue. (Can be worked around for now by editing the header.)

Support _Atomic and _Atomic ( type )

C11 has introduced a rather unpleasant complication.
The keyword _Atomic is allowed to be used in two different contexts:
(1) as a type qualifier if it is not followed by an opening parenthesis
(2) as part of a type specifier if it is followed by ( type )
It is tricky to extend the happy spec to support both forms in a conflict-free way.
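
The two forms can be seen side by side in this small sketch, which assumes a C11 compiler with `<stdatomic.h>`; the variable and function names are invented for the example.

```c
#include <stdatomic.h>

_Atomic int hits_qualifier;     /* form (1): _Atomic as a type qualifier */
_Atomic(int) hits_specifier;    /* form (2): _Atomic ( type ) as a type specifier */

/* Increments both counters and returns their sum. */
int bump_both(void) {
    atomic_fetch_add(&hits_qualifier, 1);
    atomic_fetch_add(&hits_specifier, 1);
    return atomic_load(&hits_qualifier) + atomic_load(&hits_specifier);
}
```

The parser only learns which form it is looking at after seeing the token that follows `_Atomic`, which is the source of the grammar conflict mentioned above.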

Version parsing in Clang availability attribute

I encountered this in the OS X 10.12 SDK (/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/string.h):

__OSX_AVAILABLE(10.12.1)
// expands to
__OS_AVAILABILITY(macosx,introduced=10.12.1)
// expands to
__attribute__((availability(macosx,introduced=10.12.1)))

language-c tries to parse the version number as a float, leading to a syntax error:

The symbol `.1' does not fit here.

This is an unfortunate complication, but I guess it should be supported.

GCC preprocessor output generated in non-ASCII locales cannot be processed

see this issue

Parse errors when parsing macOS system headers

I have recently noticed the issues in the title. There seem to be some syntax extensions involved which language-c does not appear to currently support.

A small example:

Contents of test.c:

#include <dirent.h>

Contents of Test.hs:

{-# LANGUAGE LambdaCase #-}

import           Language.C
import           Language.C.System.Preprocess
import           Language.C.System.GCC

main :: IO ()
main = do
  let gcc = newGCC "/usr/bin/gcc"
      fileName = "test.c"

  runPreprocessor gcc (cppFile fileName) >>= \case
    Left err -> putStrLn ("Preprocessor error: " ++ show err)

    Right preprocessed ->
      case parseC preprocessed (initPos fileName) of
        Left err -> putStrLn ("Parse error: " ++ show err)
        Right _ -> putStrLn "Success"

Executing runghc Test.hs in the same directory as test.c produces the following:

Parse error: /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/include/dirent.h:159: (column 10) [ERROR]  >>> Syntax Error !
  Syntax error !
  The symbol `^' does not fit here.

The relevant line, and surrounding lines, in that header are these:

int scandir_b(const char *, struct dirent ***,
    int (^)(const struct dirent *) __scandir_noescape,         //  **** This is line 159  *****
    int (^)(const struct dirent **, const struct dirent **) __scandir_noescape)
    __DARWIN_INODE64(scandir_b) __OSX_AVAILABLE_STARTING(__MAC_10_6, __IPHONE_3_2);

I think this syntax extension is for blocks, but I haven't used that feature before.

Support putting arbitrary command line options first

At the moment this library issues commands like this (verified with strace)

» /usr/bin/gcc -o /tmp/c2hsc499725-608499725-610.i -E /tmp/c2hsc499725-608.src
gcc: warning: /tmp/c2hsc499725-608.src: linker input file unused because linking not done

To get rid of this warning one solution is to write this

/usr/bin/gcc -x c -o /tmp/c2hsc499725-608499725-610.i -E /tmp/c2hsc499725-608.src

At the moment this is not possible, because only a limited set of options is allowed to go first: https://github.com/visq/language-c/blob/master/src/Language/C/System/GCC.hs#L104

Make a hackage release for GHC 8.8

There is no current hackage release that supports GHC 8.8
master supports GHC 8.8 right now - could you please cut a release?

Add cabal test suite for harness

_Float128 handling seems to be still broken

In glibc-2.26, in /usr/include/bits/floatn.h:

/* The type _Float128 exists only since GCC 7.0.  */
# if !__GNUC_PREREQ (7, 0) || defined __cplusplus
typedef __float128 _Float128;
# endif

I think this is saying that in GCC older than 7.0 and in C++ mode, _Float128 is not a type natively recognized by the compiler, so it's put in a typedef. This is consistent with what we can tell from the glibc-2.26 announcement at https://sourceware.org/ml/libc-announce/2017/msg00001.html.
#41 adds _Float128 to language-c, so language-c becomes like a compiler that natively recognizes _Float128, i.e. GCC 7.0 and above, under which the above typedef line would be rejected.
I don't know how exactly language-c deals with macros when parsing source files, specifically what macro definitions it starts with by default, but I wonder, to properly handle this case in floatn.h, whether language-c should define macros related to GCC version and pretend to be GCC 7?

Currently, there is the following test case that breaks: Take the BasicUsage.hs example, but instead of 'test.c', compile a file with the following content:

#include <bits/floatn.h>

And:

specify an older gcc (e.g. gcc 5) as argument to newGCC OR
use a filename extension of '.cpp' instead of '.c' (I'm not sure whether parsing of .cpp files is within the scope of language-c.)

An error is observed for the typedef line: "Syntax error ! The symbol ',' does not fit here."

I'm not sure where language-c gets its default macro definitions, but I suspect it just takes the macros from whatever gcc is used for newGCC.

Add support for Clang blocks

"Blocks" are closures and are supported by Clang and used in OSX system headers. It would be fairly straightforward to add: they are simple function pointers but instead of *, they use ^ for the pointer symbol. E.g. void (^)(int, char).

Update package on hackage

Hi, just wondering if this package could be updated on hackage. I'm having some build problems because cabal is pulling the latest from hackage which still uses foldWithKey (rather than foldrWithKey).

Thanks

Problem generating function binding with c2hs

I'm writing some ncurses bindings with c2hs and for some reason it's running into trouble parsing a declaration.

Here is the c2hs code:

{# pointer *WINDOW as Window #}
{# fun keypad { `Window', `Bool' } -> `Int' }

And here is the ncurses.h declaration I'm trying to generate a FFI call to:

extern NCURSES_EXPORT(int) keypad (WINDOW *,bool);

c2hs is failing with:

./Graphics/Ncurses/Raw.chs:30: (column 13) [ERROR]  >>> Internal wrapper error!
  Something went wrong generating a bare structure wrapper.
  makeArg:arg=False
  cdecl=CDecl [CTypeSpec (CTypeDef (Ident "WINDOW" 143904934 (NodeInfo ("/usr/include/curses.h": line 675, in file included from ("/home/mitchell/haskell/ncurses-raw/dist-newstyle/build/x86_64-linux/ghc-8.4.2/ncurses-raw-0.1.0/noopt/build/Graphics/Ncurses/Raw.chs.h": line 2)) (("/usr/include/curses.h": line 675, in file included from ("/home/mitchell/haskell/ncurses-raw/dist-newstyle/build/x86_64-linux/ghc-8.4.2/ncurses-raw-0.1.0/noopt/build/Graphics/Ncurses/Raw.chs.h": line 2)),6) (Name {nameId = 4074}))) (NodeInfo ("/usr/include/curses.h": line 675, in file included from ("/home/mitchell/haskell/ncurses-raw/dist-newstyle/build/x86_64-linux/ghc-8.4.2/ncurses-raw-0.1.0/noopt/build/Graphics/Ncurses/Raw.chs.h": line 2)) (("/usr/include/curses.h": line 675, in file included from ("/home/mitchell/haskell/ncurses-raw/dist-newstyle/build/x86_64-linux/ghc-8.4.2/ncurses-raw-0.1.0/noopt/build/Graphics/Ncurses/Raw.chs.h": line 2)),6) (Name {nameId = 4075})))] [(Just (CDeclr Nothing [CPtrDeclr [] (NodeInfo ("/usr/include/curses.h": line 675, in file included from ("/home/mitchell/haskell/ncurses-raw/dist-newstyle/build/x86_64-linux/ghc-8.4.2/ncurses-raw-0.1.0/noopt/build/Graphics/Ncurses/Raw.chs.h": line 2)) (("/usr/include/curses.h": line 675, in file included from ("/home/mitchell/haskell/ncurses-raw/dist-newstyle/build/x86_64-linux/ghc-8.4.2/ncurses-raw-0.1.0/noopt/build/Graphics/Ncurses/Raw.chs.h": line 2)),1) (Name {nameId = 4076}))] Nothing [] (OnlyPos <no file> (<no file>,-1))),Nothing,Nothing)] (NodeInfo ("/usr/include/curses.h": line 675, in file included from ("/home/mitchell/haskell/ncurses-raw/dist-newstyle/build/x86_64-linux/ghc-8.4.2/ncurses-raw-0.1.0/noopt/build/Graphics/Ncurses/Raw.chs.h": line 2)) (("/usr/include/curses.h": line 675, in file included from ("/home/mitchell/haskell/ncurses-raw/dist-newstyle/build/x86_64-linux/ghc-8.4.2/ncurses-raw-0.1.0/noopt/build/Graphics/Ncurses/Raw.chs.h": line 2)),1) (Name {nameId = 4077}))
  idx=1

And the internal c2hs function that is producing this output is defined as:

makeArg :: Position -> (Bool, Int) -> CDecl -> CST s CExpr
makeArg _ (arg, _) (CDecl _ [(Just (CDeclr (Just i) _ _ _ _), _, _)] n) =
  return $ case arg of
    False -> CVar i n
    True -> CUnary CIndOp (CVar i n) n
makeArg _ (arg, idx) (CDecl _ [] n) =
  let i = internalIdent $ "c2hs__dummy_arg_" ++ show idx
  in return $ case arg of
    False -> CVar i n
    True -> CUnary CIndOp (CVar i n) n
makeArg pos (arg, idx) cdecl =
  internalWrapperErr pos ["makeArg:arg=" ++ show arg,
                          "cdecl=" ++ show cdecl,
                          "idx=" ++ show idx]

The problem here is the pattern match

(Just (CDeclr (Just i) _ _ _ _), _, _)

is failing because we instead have parsed

(Just (CDeclr Nothing ...)

So, it appears somehow we've parsed a nameless, abstract declarator, but it clearly has a name - keypad!

New hackage release

Would you mind making a new hackage release? This would be helpful for users of Arch Linux and Fedora 27 as both are using glibc 2.26, which is associated with #39.

Callback declaration with _Nullable doesn't get parsed

FreeBSD stdio.h has this code:

int (* _Nullable _close)(void *);

Parsing this code via c2hs gives:

>>> Syntax error!
The symbol `_close' does not fit here.

I suppose it is a bug, because clang itself processes this file just fine.

Provide NFData instances for AST data types

Please consider providing NFData instances for the types in Language.C.Syntax.AST.

New release on hackage

It would be very nice to have a new release on hackage. We at FreeBSD use C11 and Clang extensively in system headers, but the current language-c version does not support various specifiers, like _Nullable.

revise bounds on Hackage to allow latest deepseq

Right now language-c can't be built with GHC 9.8:

Error: cabal: Could not resolve dependencies:
[__0] trying: language-c-0.9.2 (user goal)
[__1] next goal: process (dependency of language-c)
[__1] rejecting: process-1.6.18.0/installed-b188 (conflict: language-c =>
deepseq>=1.4.0.0 && <1.5, process => deepseq==1.5.0.0/installed-5710)
[__1] trying: process-1.6.18.0
[__2] trying: unix-2.8.3.0/installed-7905 (dependency of process)
[__3] next goal: time (dependency of unix)
[__3] rejecting: time-1.12.2/installed-c858 (conflict: language-c =>
deepseq>=1.4.0.0 && <1.5, time => deepseq==1.5.0.0/installed-5710)

Support all IEC 60559 types (_Float{32,64,128},_Float{32,64,128}x)

Hackage's ncurses-0.2.16 package fails to build against the ncurses-6.1 C library, as its headers now pull in references to _Float32 and _Float64. c2hs fails to build ncurses with:

c2hs: C header contains errors:

/usr/include/wchar.h:396: (column 17) [ERROR]  >>> Syntax error !
  The symbol `wcstof32' does not fit here.

/usr/include/wchar.h contents are:

#ifdef __USE_ISOC99
/* Likewise for `float' and `long double' sizes of floating-point numbers.  */
extern float wcstof (const wchar_t *__restrict __nptr,
                     wchar_t **__restrict __endptr) __THROW;
extern long double wcstold (const wchar_t *__restrict __nptr,
                            wchar_t **__restrict __endptr) __THROW;
#endif /* C99 */

/* Likewise for `_FloatN' and `_FloatNx' when support is enabled.  */

#if __HAVE_FLOAT16 && defined __USE_GNU
extern _Float16 wcstof16 (const wchar_t *__restrict __nptr,
                          wchar_t **__restrict __endptr) __THROW;
#endif

#if __HAVE_FLOAT32 && defined __USE_GNU
extern _Float32 wcstof32 (const wchar_t *__restrict __nptr,
                          wchar_t **__restrict __endptr) __THROW;
#endif

#if __HAVE_FLOAT64 && defined __USE_GNU
extern _Float64 wcstof64 (const wchar_t *__restrict __nptr,
                          wchar_t **__restrict __endptr) __THROW;
#endif

#if __HAVE_FLOAT128 && defined __USE_GNU
extern _Float128 wcstof128 (const wchar_t *__restrict __nptr,
                            wchar_t **__restrict __endptr) __THROW;
#endif

#if __HAVE_FLOAT32X && defined __USE_GNU
extern _Float32x wcstof32x (const wchar_t *__restrict __nptr,
                            wchar_t **__restrict __endptr) __THROW;
#endif

#if __HAVE_FLOAT64X && defined __USE_GNU
extern _Float64x wcstof64x (const wchar_t *__restrict __nptr,
                            wchar_t **__restrict __endptr) __THROW;
#endif

#if __HAVE_FLOAT128X && defined __USE_GNU
extern _Float128x wcstof128x (const wchar_t *__restrict __nptr,
                              wchar_t **__restrict __endptr) __THROW;
#endif

To clarify: gcc-8.1.0 directly supports the following types: _Float{32,64,128}, _Float{32,64,128}x (they are not typedef aliases of other types from syntax standpoint).

c2hs fails to parse lzlib.h

When I run c2hs on the following Haskell:

module Codec.Lzip ( LZErrno (..) 
                  ) where

#include <lzlib.h>

{# enum LZ_Errno as LZErrno {underscoreToCase} #}

I get

c2hs: C header contains errors:

/usr/include/lzlib.h:59: (column 23) [ERROR]  >>> Syntax error !
  The symbol `uint8_t' does not fit here.

The file /usr/include/lzlib.h contains the following:

/*  Lzlib - Compression library for the lzip format
    Copyright (C) 2009-2019 Antonio Diaz Diaz.

    This library is free software. Redistribution and use in source and
    binary forms, with or without modification, are permitted provided
    that the following conditions are met:

    1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.

    This library is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*/

#ifdef __cplusplus
extern "C" {
#endif

#define LZ_API_VERSION 1

static const char * const LZ_version_string = "1.11";

enum LZ_Errno { LZ_ok = 0,         LZ_bad_argument, LZ_mem_error,
                LZ_sequence_error, LZ_header_error, LZ_unexpected_eof,
                LZ_data_error,     LZ_library_error };


const char * LZ_version( void );
const char * LZ_strerror( const enum LZ_Errno lz_errno );

int LZ_min_dictionary_bits( void );
int LZ_min_dictionary_size( void );
int LZ_max_dictionary_bits( void );
int LZ_max_dictionary_size( void );
int LZ_min_match_len_limit( void );
int LZ_max_match_len_limit( void );


/*---------------------- Compression Functions ----------------------*/

struct LZ_Encoder;

struct LZ_Encoder * LZ_compress_open( const int dictionary_size,
                                      const int match_len_limit,
                                      const unsigned long long member_size );
int LZ_compress_close( struct LZ_Encoder * const encoder );

int LZ_compress_finish( struct LZ_Encoder * const encoder );
int LZ_compress_restart_member( struct LZ_Encoder * const encoder,
                                const unsigned long long member_size );
int LZ_compress_sync_flush( struct LZ_Encoder * const encoder );

int LZ_compress_read( struct LZ_Encoder * const encoder,
                      uint8_t * const buffer, const int size );
int LZ_compress_write( struct LZ_Encoder * const encoder,
                       const uint8_t * const buffer, const int size );
int LZ_compress_write_size( struct LZ_Encoder * const encoder );

enum LZ_Errno LZ_compress_errno( struct LZ_Encoder * const encoder );
int LZ_compress_finished( struct LZ_Encoder * const encoder );
int LZ_compress_member_finished( struct LZ_Encoder * const encoder );

unsigned long long LZ_compress_data_position( struct LZ_Encoder * const encoder );
unsigned long long LZ_compress_member_position( struct LZ_Encoder * const encoder );
unsigned long long LZ_compress_total_in_size( struct LZ_Encoder * const encoder );
unsigned long long LZ_compress_total_out_size( struct LZ_Encoder * const encoder );


/*--------------------- Decompression Functions ---------------------*/

struct LZ_Decoder;

struct LZ_Decoder * LZ_decompress_open( void );
int LZ_decompress_close( struct LZ_Decoder * const decoder );

int LZ_decompress_finish( struct LZ_Decoder * const decoder );
int LZ_decompress_reset( struct LZ_Decoder * const decoder );
int LZ_decompress_sync_to_member( struct LZ_Decoder * const decoder );

int LZ_decompress_read( struct LZ_Decoder * const decoder,
                        uint8_t * const buffer, const int size );
int LZ_decompress_write( struct LZ_Decoder * const decoder,
                         const uint8_t * const buffer, const int size );
int LZ_decompress_write_size( struct LZ_Decoder * const decoder );

enum LZ_Errno LZ_decompress_errno( struct LZ_Decoder * const decoder );
int LZ_decompress_finished( struct LZ_Decoder * const decoder );
int LZ_decompress_member_finished( struct LZ_Decoder * const decoder );

int LZ_decompress_member_version( struct LZ_Decoder * const decoder );
int LZ_decompress_dictionary_size( struct LZ_Decoder * const decoder );
unsigned LZ_decompress_data_crc( struct LZ_Decoder * const decoder );

unsigned long long LZ_decompress_data_position( struct LZ_Decoder * const decoder );
unsigned long long LZ_decompress_member_position( struct LZ_Decoder * const decoder );
unsigned long long LZ_decompress_total_in_size( struct LZ_Decoder * const decoder );
unsigned long long LZ_decompress_total_out_size( struct LZ_Decoder * const decoder );

#ifdef __cplusplus
}
#endif

Thanks!
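One plausible reading of the error, given that the quoted header contains no #include of its own, is that uint8_t is simply not declared when the parser reaches line 59. Under that assumption, a minimal sketch of the situation looks like this: the prototype is copied from the header above, and buffer_bytes is a hypothetical helper added only to show the type in use.

```c
#include <stdint.h>   /* declares uint8_t; must precede the prototypes below */

/* Copied from the quoted header, which includes nothing itself. */
struct LZ_Encoder;
int LZ_compress_read( struct LZ_Encoder * const encoder,
                      uint8_t * const buffer, const int size );

/* Hypothetical helper, not part of lzlib: sums the bytes in a buffer. */
int buffer_bytes( const uint8_t * const buffer, const int size )
{
    int sum = 0;
    for( int i = 0; i < size; ++i )
        sum += buffer[i];
    return sum;
}
```

If this reading is right, adding #include <stdint.h> before #include <lzlib.h> in the .chs file should be enough.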

Handle GCC command-line flags used by libtool

The following GCC flags need to keep their arguments together with the flag in gccParseCPPArgs:

-MF file
-MT target
-MQ target

Otherwise the filename or target gets separated from the flag and gcc tries to interpret it as a source filename, because of the special case that applies when "-M" is a prefix of the flag.

Projects that use libtool use both -MF and -MT, so this bug prevents language-c-based tools like Corrode from being used as drop-in replacements for GCC in those build environments.
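The grouping rule described above can be sketched in a few lines. This is an illustration in C, not the code of gccParseCPPArgs (which is Haskell); the function names are hypothetical and the table holds only the three flags named in this issue.

```c
#include <stddef.h>
#include <string.h>

/* Flags from the issue whose argument is the next command-line word. */
static const char *const takes_arg[] = { "-MF", "-MT", "-MQ" };

/* Returns 1 if `flag` consumes the following command-line word. */
int flag_takes_argument(const char *flag)
{
    for (size_t i = 0; i < sizeof takes_arg / sizeof takes_arg[0]; ++i)
        if (strcmp(flag, takes_arg[i]) == 0)
            return 1;
    return 0;
}

/* Counts the words left over as input files once every flag,
   together with the argument it owns, has been skipped. */
int count_input_files(int argc, const char *const argv[])
{
    int files = 0;
    for (int i = 0; i < argc; ++i) {
        if (flag_takes_argument(argv[i]))
            ++i;                      /* skip the flag's argument as well */
        else if (argv[i][0] != '-')
            ++files;                  /* a bare word is taken as an input file */
    }
    return files;
}
```

With a libtool-style command line such as -MT foo.lo -MD -MF .deps/foo.Tpo -c foo.c, only foo.c is counted as an input file; without the table, foo.lo and .deps/foo.Tpo would be counted too.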

language-c-0.8 fails to parse fa.h / stdio.h

Hello,

I wrote a small binding to the library libfa (http://augeas.net/libfa/) with c2hs.

However, I noticed that when c2hs is compiled with the newer language-c-0.8 instead of language-c-0.7.2, it fails to parse the header file of libfa. With the older version it works. This is also why I am posting the issue here instead of in the c2hs repository. If you think this is unrelated to language-c, feel free to redirect me.
As the end of the build log shows, it fails to parse the standard stdio.h header that libfa's header includes. My version of stdio.h is the one bundled with Apple LLVM version 9.1.0 (clang-902.0.39.2), as I am using macOS 10.13.4. I have also attached the lines of stdio.h that cause the failure.

Here is the build log; I tidied it up a bit to include only the relevant information:

Failed to install libfa-1.0.8.23
Build log:
Using Parsec parser
Configuring libfa-1.0.8.23...
Dependency base ==4.10.1.0: using base-4.10.1.0
Source component graph: component lib
Configured component graph:
    component libfa-1.0.8.23-ZfhKuK3orHDHB6gQt3Gon
        include base-4.10.1.0
Linked component graph:
    unit libfa-1.0.8.23-ZfhKuK3orHDHB6gQt3Gon
        include base-4.10.1.0
        FiniteAutomata=libfa-1.0.8.23-ZfhKuK3orHDHB6gQt3Gon:FiniteAutomata
Ready component graph:
    definite libfa-1.0.8.23-ZfhKuK3orHDHB6gQt3Gon depends base-4.10.1.0
Using Cabal-2.2.0.0 compiled by ghc-8.4
Using compiler: ghc-8.2.2
Using alex version 3.2.4 found on system at: /Users/travis/.cabal/bin/alex
Using ar found on system at: /usr/bin/ar
Using c2hs version 0.28.5 found on system at: /Users/travis/.cabal/bin/c2hs
No cpphs found
No doctest found
Using gcc version 4.2.1 found on system at: /usr/bin/clang
Using ghc version 8.2.2 found on system at: /usr/local/bin/ghc
Using ghc-pkg version 8.2.2 found on system at: /usr/local/bin/ghc-pkg
No ghcjs found
No ghcjs-pkg found
No greencard found
Using haddock version 2.18.1 found on system at: /usr/local/bin/haddock
Using happy version 1.19.9 found on system at: /Users/travis/.cabal/bin/happy
Using haskell-suite found on system at: haskell-suite-dummy-location
Using haskell-suite-pkg found on system at: haskell-suite-pkg-dummy-location
No hmake found
Using hpc version 0.67 found on system at: /usr/local/bin/hpc
Using hsc2hs version 0.68.2 found on system at: /usr/local/bin/hsc2hs
No hscolour found
No jhc found
Using ld found on system at: /usr/bin/ld
No lhc found
No lhc-pkg found
Using pkg-config version 0.29.2 found on system at: /usr/local/bin/pkg-config
Using runghc version 8.2.2 found on system at: /usr/local/bin/runghc
Using strip found on system at: /usr/bin/strip
Using tar found on system at: /usr/bin/tar
No uhc found
Component build order: library
Preprocessing library for libfa-1.0.8.23..

/Users/travis/.cabal/bin/c2hs '--cpp=/usr/bin/clang' '--cppopts=-E' '--cppopts=-D__GLASGOW_HASKELL__=802' '--cppopts=-Ddarwin_BUILD_OS=1' '--cppopts=-Dx86_64_BUILD_ARCH=1' '--cppopts=-Ddarwin_HOST_OS=1' '--cppopts=-Dx86_64_HOST_ARCH=1' '--cppopts=-I/usr/local/include' '--cppopts=-includedist/dist-sandbox-c7261b7a/build/autogen/cabal_macros.h' '--include=dist/dist-sandbox-c7261b7a/build' '--cppopts=-I/usr/local/Cellar/[email protected]/8.2.2/lib/ghc-8.2.2/base-4.10.1.0/include' '--cppopts=-I/usr/local/Cellar/[email protected]/8.2.2/libexec/integer-gmp/include' '--cppopts=-I/usr/local/Cellar/[email protected]/8.2.2/lib/ghc-8.2.2/integer-gmp-1.0.1.0/include' '--cppopts=-I/usr/local/Cellar/[email protected]/8.2.2/lib/ghc-8.2.2/include' '--output-dir=dist/dist-sandbox-c7261b7a/build' '--output=FiniteAutomata.hs' ./FiniteAutomata.chs

c2hs: C header contains errors:
/usr/include/_stdio.h:137: (column 19) [ERROR]  >>> Syntax error !
  The symbol `_close' does not fit here.

The cabal file looks like this:

name:                libfa
version:             0.1.0.0
build-type:          Simple
cabal-version:       >=1.24

library
  exposed-modules:   FiniteAutomata
  build-depends:     base >= 4.9 && < 4.12
  includes:          fa.h
  include-dirs:      "@LIBFA_INCLUDE_DIRS@"
  extra-lib-dirs:    "@LIBFA_LIBRARY_DIRS@"
  extra-libraries:   fa
  default-language:  Haskell2010
  other-extensions:  ForeignFunctionInterface
  build-tools:       c2hs

And a small excerpt of _stdio.h, see line 137:

   126  typedef struct __sFILE {
   127    unsigned char *_p;  /* current position in (some) buffer */
   128    int _r;   /* read space left for getc() */
   129    int _w;   /* write space left for putc() */
   130    short _flags;   /* flags, below; this FILE is free if 0 */
   131    short _file;    /* fileno, if Unix descriptor, else -1 */
   132    struct  __sbuf _bf; /* the buffer (at least 1 byte, if !NULL) */
   133    int _lbfsize; /* 0 or -_bf._size, for inline putc */
   134  
   135    /* operations */
   136    void  *_cookie; /* cookie passed to io functions */
   137    int (* _Nullable _close)(void *);
   138    int (* _Nullable _read) (void *, char *, int);
   139    fpos_t  (* _Nullable _seek) (void *, fpos_t, int);
   140    int (* _Nullable _write)(void *, const char *, int);
   141  
   142    /* separate buffer for long sequences of ungetc() */
   143    struct  __sbuf _ub; /* ungetc buffer */
   144    struct __sFILEX *_extra; /* additions to FILE to not break ABI */
   145    int _ur;    /* saved _r when _r is counting ungetc data */
   146  
   147    /* tricks to meet minimum requirements even when malloc() fails */
   148    unsigned char _ubuf[3]; /* guarantee an ungetc() buffer */
   149    unsigned char _nbuf[1]; /* guarantee a getc() buffer */
   150  
   151    /* separate buffer for fgetln() when line crosses buffer boundary */
   152    struct  __sbuf _lb; /* buffer for fgetln() */
   153  
   154    /* Unix stdio files get aligned to block boundaries on fseek() */
   155    int _blksize; /* stat.st_blksize (may be != _bf._size) */
   156    fpos_t  _offset;  /* current lseek offset (see WARNING) */
   157  } FILE;
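The failing line uses _Nullable, a Clang nullability qualifier. Assuming the parser simply does not know that qualifier, one possible workaround is to define it away for the parsing pass. The sketch below does this inside the source; mini_file, close_stub and mini_close are hypothetical stand-ins for the excerpt above, not Apple's definitions.

```c
/* Make the Clang-only qualifier vanish on compilers that lack it. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif
#if !__has_feature(nullability)
#define _Nullable
#endif

/* Cut-down stand-in for struct __sFILE from the excerpt. */
struct mini_file {
    void *cookie;                        /* passed to the callback */
    int (* _Nullable close_fn)(void *);  /* same shape as _close */
};

/* Trivial callback used to exercise the function pointer. */
int close_stub(void *cookie)
{
    (void)cookie;
    return 0;
}

/* Calls the close callback if one is set, otherwise returns -1. */
int mini_close(struct mini_file *f)
{
    return f->close_fn ? f->close_fn(f->cookie) : -1;
}
```

The same effect could probably be had without touching any source by passing a define such as -D_Nullable= through the preprocessor options.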

Cannot parse OSX system headers due to Clang `__builtin_convertvector` "function"

Documentation is here: https://clang.llvm.org/docs/LanguageExtensions.html#langext-builtin-convertvector

The weird thing about __builtin_convertvector is that it takes a type as its second argument which results in a parse error on OSX (XCode 8.3) while parsing <emmintrin.h>:

/Applications/Xcode83.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.1.0/include/emmintrin.h:395: (column 64) [ERROR]  >>> Syntax Error !
  Syntax error !
  The symbol `__v2df' does not fit here.

(where __v2df is defined as typedef double __v2df __attribute__ ((__vector_size__ (16)));).

The language-c lexer is context-sensitive and makes a CTokTyIdent for typedef'd identifiers and a CTokIdent otherwise. The parser for argument_expression_list (and ultimately, primary_expression) does not expect tyident tokens.
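For readers who have not met the builtin, here is a small sketch of a call whose second argument is a typedef name, which is exactly the token kind the expression grammar rejects. The typedef names v2df and v2sf and the function widen are stand-ins chosen for this example, and the fallback branch is there only so the snippet also builds on compilers without the builtin.

```c
/* Vector typedefs in the style of the one quoted from emmintrin.h. */
typedef double v2df __attribute__((__vector_size__(16)));
typedef float  v2sf __attribute__((__vector_size__(8)));

#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Converts two floats to two doubles, element by element. */
v2df widen(v2sf v)
{
#if __has_builtin(__builtin_convertvector)
    /* The second argument is a type name, not an expression. */
    return __builtin_convertvector(v, v2df);
#else
    v2df r = { (double)v[0], (double)v[1] };
    return r;
#endif
}
```

Because v2df is a typedef name, the lexer hands the parser a type-identifier token in argument position, which matches the error reported above.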

test-failure in stackage nightly build

It's not clear to me how serious this is, so I'm marking the test suite as an expected failure for now. Please let me know if we should keep it like that or if it can/will be fixed.

> /tmp/stackage-build14/language-c-0.6$ dist/build/language-c-harness/language-c-harness
Changing to test directory test/harness and compiling
make: *** No rule to make target 'prepare'.  Stop.
language-c-harness: callProcess: make "prepare" (exit 2): failed

Towards a more complete Analysis.Export

Hi everyone,

I was already talking a bit with lambdageek on reddit (InformalInflation, that's me). In short, I have a use case where I would like to apply some transformations to a C file after type checking (I need the type information to perform the transformations). As lambdageek pointed out, I should probably do this on the SemRep and not on the AST.

So, after applying my transformation on SemRep I still need to pretty-print it back into the file. I believe that is what Analysis.Export is for.

I saw the warning:

WARNING : This is just an implementation sketch and not very well tested.

And well, if I am not mistaken, there's no top-level function like exportSemRep :: GlobalDecls -> CTranslUnit, which would be what I need.

Before I start adding functions towards the goal, is there something I should know?

tl;dr: I would like to contribute to Analysis.Export, what do I need to know?

language-c and C11 support

I originally reported this to c2hs, where it was noted that language-c provides the relevant functionality: haskell/c2hs#159

I noticed this on Alpine Linux 3.3.1 when compiling the ncurses package. Are there plans for C11 support in language-c? I know this package looks to have just changed ownership, and the Haskell is a bit beyond my current ability at this point, or I'd try to add a fix for the _Noreturn case I hit.

The error log:

Configuring ncurses-0.2.15...
Building ncurses-0.2.15...
Preprocessing library ncurses-0.2.15...
c2hs: C header contains errors:

/usr/include/stdlib.h:44: (column 11) [ERROR]  >>> Syntax error !
  The symbol `void' does not fit here.

And that section of stdlib.h:

_Noreturn void abort (void);
int atexit (void (*) (void));
_Noreturn void exit (int);
_Noreturn void _Exit (int);
int at_quick_exit (void (*) (void));
_Noreturn void quick_exit (int);

For now I just patch stdlib.h and remove the _Noreturn's to work around this:
https://github.com/mitchty/alpine-linux-ghc-bootstrap/blob/master/test-7.10/Dockerfile#L19

But I think a more general approach would be to have language-c start to support C11 constructs. I don't think full on generic support is really needed yet or anything from the appendices.
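Until the parser understands the specifier, a less intrusive workaround than patching the system header might be to define _Noreturn away for the binding-generation pass only. The sketch below assumes a hypothetical macro BINDING_PASS that is defined only during that pass; fail and checked_div are made-up names that mirror the shape of the declarations quoted above.

```c
/* BINDING_PASS is a hypothetical macro, assumed to be defined only while a
   binding generator preprocesses this file (for example through -D). */
#ifdef BINDING_PASS
#define _Noreturn   /* hide the C11 specifier from a parser that lacks it */
#endif

#include <stdlib.h>

/* Same shape as the stdlib.h declarations quoted above. */
_Noreturn void fail(int code);

/* Divides a by b, terminating the program on division by zero. */
int checked_div(int a, int b)
{
    if (b == 0)
        fail(1);        /* never returns */
    return a / b;
}

void fail(int code)
{
    exit(code);
}
```

A normal compile sees the real C11 keyword, so the workaround does not change the compiled code.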

Support __int128

Add support for omitting braces in initializer lists

Language.C should support this:

struct Foo {
  char arr[3];
};

struct Foo foo = {0};

It's more error-prone in deeply nested structs to be adding each of the layers manually. All C compilers support this.
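A slightly larger sketch shows what brace elision means for a nested aggregate. struct Bar and same_initializers are hypothetical additions; struct Foo is the one from the example above.

```c
#include <string.h>

struct Foo {
    char arr[3];
};

/* Hypothetical wrapper, added to get a second level of nesting. */
struct Bar {
    struct Foo inner;
    int tail;
};

/* Inner braces elided: 1, 2, 3 fill inner.arr and 4 fills tail. */
static const struct Bar elided = { 1, 2, 3, 4 };

/* Fully braced equivalent. */
static const struct Bar braced = { { { 1, 2, 3 } }, 4 };

/* Returns 1 if both initializers produced the same object contents. */
int same_initializers(void)
{
    return memcmp(elided.inner.arr, braced.inner.arr,
                  sizeof elided.inner.arr) == 0
        && elided.tail == braced.tail;
}
```

In struct Foo foo = {0}; the single 0 initializes arr[0] through the same rule, and the remaining elements are zero-initialized.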

Hackage release and major version bump

Hi, I need language-c > 0.5 for c2hs to work for me, so I'd like a new hackage release as it's been a while. The latest changes, however, will necessitate a major version bump as they break c2hs in a minor way (the type of partitionDeclSpecs). The c2hs fix is trivial, but we can't PR that until language-c-0.6.0 goes up on hackage.

__label__ LABEL must precede all other declarations in a block

I had occasion to need local label declarations a la gcc, and noticed that the parser (sorry, tell me how to tell what version I have and I'll tell you) only accepts __label__ LABEL if it's at the very start of a block. Put anything before it, declaration or otherwise, and one gets "user error" from the parser.

So I fixed it. I'm just mailing this before I forget in the hope that you can use the fix, better it, or say it's already been fixed in the latest version.
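For context, here is a small sketch of a local label declaration in its conventional position at the start of a block, which is the form the existing grammar accepts and, as far as I can tell from GCC's documentation, the placement GCC itself expects. This is GNU C, and first_zero is a hypothetical function written for this example.

```c
/* Returns the index of the first zero in a[0..n), or -1 if there is none. */
int first_zero(const int *a, int n)
{
    int found = -1;
    {
        __label__ done;              /* label is local to this block */
        for (int i = 0; i < n; ++i) {
            if (a[i] == 0) {
                found = i;
                goto done;
            }
        }
    done: ;
    }
    return found;
}
```

The label done is visible only inside the inner block, which is what makes such declarations useful in macros that expand to blocks.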

You need to look at Parser.y where the original definition of compound statement is
...
compound_statement
: '{' enter_scope block_item_list leave_scope '}'
{% withNodeInfo $1 $ CCompound [] (reverse $3) }
| '{' enter_scope label_declarations block_item_list leave_scope '}'
{% withNodeInfo $1 $ CCompound (reverse $3) (reverse $4) }
...
The label declaration stuff is the label_declarations in the second alternative, and as you can see it's set to come before the block_item stuff, which is either declaration or statement, repeated. So there is no mistake. It's set up as I described.

Now it's notionally easy to fix this. Just broaden your definition of block items in the AST to admit label declarations as an extra kind of item, and practically that's hey presto and done. You'd have to modify the definition of CCompound in the AST to match, to make it take just block item list as an argument, and not label declarations and block items.

But that's got way too many ramifications for me to countenance, as the AST is exported and so you are practically stuck with it, silly as it is in places! (What's with not really parsing declarators at all, just delivering what is practically a stream of tokens in the AST? But I digress ...) Other people may rely on the AST staying the way it is.

OK, no problem, we can loosen the parse to allow mixed label declarations and the existing block items, just as they are, and trivially pick the two apart out of the mixed list for the existing CCompound constructor to use. That's what I'll do.

So start by erasing that second alternative from block parse, the one that fixed label declarations as having to come first:
...
RM | '{' enter_scope label_declarations block_item_list leave_scope '}'
RM {% withNodeInfo $1 $ CCompound (reverse $3) (reverse $4) }
...
Now improve block_item to allow a label declaration by adding a new alternative, last:
...
ADD | "__label__" identifier_list ';' { Right $2 }
...
You'll notice I made it return a "Right", so the type has become an Either:
...
RM block_item :: { CBlockItem }
ADD block_item :: { Either CBlockItem (Reversed [Ident])}
...
One has to go through the existing block item parse alternatives and add a "Left" to what they return. Now we're good to go back to the amputated compound parse and fish out the bits from the mixed list of Lefts and Rights that it will now receive.
...
RM {% withNodeInfo $1 $ CCompound [] (reverse $3) }
ADD {% withNodeInfo $1 $ CCompound (rights $ reverse $3) (lefts $ reverse $3) }
...
Here rights picks out the Right elements and lefts picks out the Left elements from the mixed list $3 that's been received from the block_item_list parse. Those are defined in the post-logue. I am sure you can write those!

You can better this by removing some of the Reverse/reverse constructs. I couldn't be bothered, as I was hacking to make it work, not make it beautiful. I'll append my diff file.
diffs.txt

Happy hunting!