Coder Social home page Coder Social logo

Comments (2)

expipiplus1 avatar expipiplus1 commented on June 26, 2024

Hopefully this should just be a simple change in the lexer. PR's welcome!

from language-c.

mtolly avatar mtolly commented on June 26, 2024

So, I looked into this and I think I found the fix, but Alex might need to release a bug fix first.

I saved the sample from the linked issue as a UTF-8 file:

# 1 "test.c"
# 1 "<built-in>"
# 1 "<命令行>"
# 31 "<命令行>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<命令行>" 2
# 1 "test.c"
int main()
{
 return 0;
}

And sure enough got Prelude.head: empty list. The error comes from the second usage of head at this location, and is caused by the first non-ASCII line # 1 "<命令行>".

Basically the problem is that Alex is assuming the input bytestring is UTF-8, but the InputStream is a byte-by-byte abstraction (effectively Latin-1). In these lines:

\#$space*@digits$space*(\"($infname|@charesc)*\"$space*)?(@int$space*)*\r?$eol
  { \pos len str -> setPos (adjustLineDirective len (takeChars len str) pos) >> lexToken' False }

Alex is passing 12 for len, which is the correct Unicode codepoint length of # 1 "<命令行>" plus a newline at the end. But takeChars then takes 12 bytes off the bytestring, so adjustLineDirective receives a broken string which does not include the double quote at the end.

The correct fix is to put Alex back into Latin-1 mode (my impression is that this was the default previously, but was then switched in Alex 3.0). This is done with the %encoding "latin1" directive (added in Alex 3.1.7). However, it still doesn't work because there was a remaining bug in character counting that caused it to still pass the too-short length. This was fixed in haskell/alex#156 but even though that was merged a year ago it appears to not have made it into the recent Alex 3.2.6. So, I'll ping that to see when it can be released.

from language-c.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.