cgmb / guardonce

Utilities for converting from C/C++ include guards to #pragma once and back again.

License: MIT License

Python 99.19% C 0.81%

c cpp converter pragma-once include-guards c-plus-plus

guardonce's Introduction

guardonce

Why Convert?

Include guards suck. They're tiring to type and tedious to update. Worse, the task of updating boilerplate leaves room for copy/paste errors, or other mistakes. #pragma once is simpler and less error prone. That's why you should convert to #pragma once.

Alas, though #pragma once is available on all the most commonly used compilers, it's not available on every compiler. Perhaps one day you'll add support for a platform with a barebones compiler with no support for #pragma once and you'll have to convert back. That's ok. It's easy!

What exactly is guardonce?

There are three main tools provided by guardonce:

checkguard helps find any broken include guards you may already have in your project. These should be addressed before converting.
guard2once converts files with include guards into files with #pragma once directives. This ensures your entire project is consistently using #pragma once.
once2guard converts files with #pragma once directives back into files with include guards. This ensures your entire project is consistently using include guards.

How to use:

First, check your project for broken headers. To recursively search your project directories for the names of all files that lack proper include guards, use the following command, substituting your project's directory for the quoted string:

checkguard -r "source_directory"

By default, checkguard is very forgiving. It accepts either #pragma once or anything that looks like an include guard. If you know that all your guards should match some format, you can be more strict by using -p to specify a pattern to check against.

If certain files are not supposed to have include guards, feel free to leave them be. Files without include guards are ignored by this next step.

Now, all that remains is converting the headers to use #pragma once:

guard2once -r "source_directory"

You're done! Double check that the result matches your expectations and start using #pragma once in your new code. Know that if you ever need to switch back, it's as simple as:

once2guard -r "source_directory"

If the default guard style doesn't appeal to you, there are a few options to customize it. Maybe take a look through once2guard --help or check out a walkthrough for some examples.

How to Install:

Whether you use Python 2 or Python 3, these tools can be installed with pip. Run python -m pip install guardonce and you're off to the races.

If you'd rather not use pip, it is possible to instead just run from the repository. However, you'll need to use slightly different commands. Add the repository to your PYTHONPATH and invoke the tools as python modules, as illustrated below.

Linux / OSX

git clone https://github.com/cgmb/guardonce.git
export PYTHONPATH="$(pwd)/guardonce"
python -m guardonce.checkguard -r ~/myproject

Windows

git clone https://github.com/cgmb/guardonce.git
set "PYTHONPATH=%CD%\guardonce"
python -m guardonce.checkguard -r "%USERPROFILE%\myproject"

Note that on Windows you might need to invoke guardonce via python -m even if you install with pip.

guardonce's Issues

Add option to read from stdin

Support for writing to stdout was requested in #25 so guardonce could be integrated with other tools. I considered adding support for reading from stdin to go along with that, but eventually decided against it due to time constraints. It's still a good idea, though, so it's going in the backlog.

One trick to this is that reading from stdin implies writing to stdout, because where else would we put the output? I was thinking I'd name the flag --stdio to make that fact really obvious, but --stdin seems to be the less surprising name. So, I'll probably just name the flag --stdin.

Another consideration is the path. The input file path is often used for generating the include guard pattern, but guardonce doesn't know the actual file path if input comes from stdin. There needs to be a way to explicitly pass guardonce that information, so we'll add an --assume-filepath=<file> option to do so.

--assume-filepath will be required when using --stdin together with --pattern for guard2once or checkguard and will be required even when using --stdin alone for once2guard. The difference here is because once2guard always uses a pattern—if you don't specify one, it just uses a default pattern.
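
The rule above can be sketched as a small argument check. This is only an illustration of the proposal, not guardonce's code: the helper names (build_parser, needs_assumed_path, parse) are hypothetical, and --stdin and --assume-filepath are the flags proposed in this issue rather than options known to exist.

```python
import argparse


def build_parser(tool_name):
    """Build a parser with the stdin-related flags proposed in this issue."""
    parser = argparse.ArgumentParser(prog=tool_name)
    parser.add_argument("--stdin", action="store_true",
                        help="read from stdin and write the result to stdout")
    parser.add_argument("--assume-filepath", metavar="FILE",
                        help="path to assume for input that arrives on stdin")
    parser.add_argument("-p", "--pattern",
                        help="guard pattern to check against or generate")
    return parser


def needs_assumed_path(tool_name, args):
    """Return True when --assume-filepath must accompany --stdin."""
    if not args.stdin:
        return False
    if tool_name == "once2guard":
        # once2guard always applies a pattern (a default one if none is given).
        return True
    # guard2once and checkguard only need the path when a pattern is in play.
    return args.pattern is not None


def parse(tool_name, argv):
    """Parse argv and reject --stdin calls that lack a required file path."""
    parser = build_parser(tool_name)
    args = parser.parse_args(argv)
    if needs_assumed_path(tool_name, args) and args.assume_filepath is None:
        parser.error("--stdin requires --assume-filepath in this configuration")
    return args
```

Keeping the rule in one function makes the asymmetry between once2guard and the other two tools easy to test on its own.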

Handle Windows line endings on Linux and vice versa

checkguard is rather unhappy when you use Windows line endings on Linux. It regards your files as being broken. Perhaps it should be more permissive. The point of checkguard is to let you know if there are potential problems with your include guards, not to admonish you for using the wrong line ending convention.

This definitely affects v2. Not sure about v1.
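
One possible way to be more permissive is to tolerate a stray carriage return at the end of a directive line. The sketch below is an assumption about how such a check could look, not the pattern guardonce uses; PRAGMA_ONCE and has_pragma_once are hypothetical names.

```python
import re

# Assumption: the contents keep their original line endings (for example,
# read with open(path, newline="") on Python 3), so "\r" may precede "\n".
PRAGMA_ONCE = re.compile(r"^[ \t]*#[ \t]*pragma[ \t]+once[ \t]*\r?$", re.MULTILINE)


def has_pragma_once(contents):
    """Return True if a #pragma once line is present, whatever the line ending."""
    return PRAGMA_ONCE.search(contents) is not None
```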

Identify duplicate guards

It would be nice if checkguard could tell you if you had any duplicate include guards. Two files with the same guard symbol are likely to be a problem, but even if it's intentional it's something to be aware of.
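
A minimal sketch of the idea: collect the guard symbol of each header and report any symbol seen in more than one file. It assumes, for simplicity, that the first #ifndef in a file is its guard; find_duplicate_guards is a hypothetical helper, not part of guardonce.

```python
import re
from collections import defaultdict

# Simplification: treat the first #ifndef in a file as its include guard.
GUARD = re.compile(r"^[ \t]*#[ \t]*ifndef[ \t]+(\w+)", re.MULTILINE)


def find_duplicate_guards(headers):
    """Map each duplicated guard symbol to the sorted list of files using it.

    `headers` maps a file name to that file's contents.
    """
    files_by_symbol = defaultdict(list)
    for name, contents in headers.items():
        match = GUARD.search(contents)
        if match:
            files_by_symbol[match.group(1)].append(name)
    return {symbol: sorted(files)
            for symbol, files in files_by_symbol.items()
            if len(files) > 1}
```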

Understand C++: Ignore comments

When searching through files, anything within comment blocks should be ignored as irrelevant.

Understand C++: Ignore strings

When searching through files, anything within strings should be ignored as irrelevant.
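
Both of these issues could be handled by one pre-pass that blanks out comment and string contents before searching for directives. The following is a rough sketch under stated limits: it does not handle raw string literals, digit separators such as 1'000, or line comments continued with a backslash. blank_comments_and_strings is a hypothetical name.

```python
def blank_comments_and_strings(source):
    """Replace comment and literal contents with spaces, keeping newlines.

    The result has the same length as the input, so offsets still line up.
    """
    out = []
    i = 0
    n = len(source)
    state = "code"  # one of: code, line_comment, block_comment, string, char
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if state == "code":
            if ch == "/" and nxt in ("/", "*"):
                state = "line_comment" if nxt == "/" else "block_comment"
                out.append("  ")
                i += 2
                continue
            if ch == '"':
                state = "string"
            elif ch == "'":
                state = "char"
            out.append(ch)
        elif state == "line_comment":
            if ch == "\n":
                state = "code"
                out.append(ch)
            else:
                out.append(" ")
        elif state == "block_comment":
            if ch == "*" and nxt == "/":
                state = "code"
                out.append("  ")
                i += 2
                continue
            out.append(ch if ch == "\n" else " ")
        else:  # inside a string or character literal
            quote = '"' if state == "string" else "'"
            if ch == "\\" and nxt:
                # Skip the escaped character so \" does not end the literal.
                out.append(" " + ("\n" if nxt == "\n" else " "))
                i += 2
                continue
            if ch == quote:
                state = "code"
                out.append(ch)
            else:
                out.append(ch if ch == "\n" else " ")
        i += 1
    return "".join(out)
```

Because the output keeps the same length and line structure, a directive found in the blanked text can be edited at the same offset in the original.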

cp1251 file processing/autodetect encoding as an option

Can you add cp1251 file processing as an option? Or encoding autodetection as an option?

checkguard.py should understand #pragma once

Typically, checkguard is looking for files with mismatched include guards. Files with a #pragma once don't need to be included in the error list by default.

Support other header file suffixes

*.hpp is reasonably common and should be supported. Possibly other suffixes, too.

Workaround: Edit isHeaderFile(fileName) in crules.py.

Warn about using reserved symbols for include guards

It would be nice if checkguard could warn about include guards using symbols that are reserved for the compiler or standard library. It's an easy thing to forget about when picking your guard pattern. I put a reminder in the docs, but it wouldn't be hard to automatically check that.
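
Such a check could be as small as the sketch below. It covers only the common underscore rules (a leading underscore followed by an uppercase letter or another underscore, and a double underscore anywhere, which C++ reserves) and deliberately ignores other reserved name families. reserved_reason is a hypothetical helper.

```python
import re

# Reserved in both C and C++: a leading underscore followed by an uppercase
# letter or another underscore. Reserved in C++: "__" anywhere in the name.
ALWAYS_RESERVED = re.compile(r"^_[A-Z_]|__")


def reserved_reason(symbol):
    """Return a short warning for a risky guard symbol, or None if it looks fine."""
    if ALWAYS_RESERVED.search(symbol):
        return "reserved for the implementation"
    if symbol.startswith("_"):
        return "leading underscore is reserved at file scope"
    return None
```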

Strip trailing whitespace after removing include guards

It would be nice to have a flag for this. I try to ensure guardonce makes minimal changes to the files it touches, but removing the guards without removing the surrounding whitespace may violate style conventions. The Qt Creator changeover is a concrete example of a time when this would have been helpful.

Path pattern argument not optional

The argument to path is supposed to be optional, but guardonce will complain about a missing argument if there is any filter following it.

For example:

cgmb@localhost:~/abseil-cpp$ checkguard -r -o guard -p 'path | upper | append _' absl
Missing argument from "path" in pattern

Workaround: Put a big number for the path depth. As long as the actual path depth is smaller than that, the behaviour will be the same as not supplying an argument.

First line of file is ignored if encoded as UTF-8 with BOM

When looking for preprocessor directives, guardonce looks for # characters that have only been preceded by whitespace. For files that are UTF-8 with BOM, the BOM is interpreted as a series of characters that precede everything else on the line. As such, no preprocessor commands can be found on the first line of the file.

So, this is only a problem if the guard is on the first line of the file. If it appears anywhere else, the guard is found as normal and the BOM passes through guardonce unmodified to appear in the output as expected.

Experimenting with GCC, it seems that the BOM is ignored by compilers. This is probably reasonable behaviour, as it's very unlikely that a C header file could start with those characters and still be valid.
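
A possible fix, sketched here as an assumption rather than the actual patch, is to split a leading UTF-8 BOM off the raw bytes before scanning and to put it back when writing. split_utf8_bom is a hypothetical helper.

```python
import codecs


def split_utf8_bom(data):
    """Split raw file bytes into (bom, rest); bom is empty if none is present."""
    if data.startswith(codecs.BOM_UTF8):
        return codecs.BOM_UTF8, data[len(codecs.BOM_UTF8):]
    return b"", data
```

Scanning `rest` lets a directive on the first line be found, and writing `bom + converted_rest` keeps the BOM in the output.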

Handle #endif with no space before comments

The VulkanSDK includes a bunch of guards that end with stuff like: #endif//MATCH_H. They're valid, but guardonce expects a space or a newline after endif, so checkguard complains about them.

My goal is to have a smarter version of guardonce that actually understands comments (as mentioned in #7), but perhaps there's an easier fix for now.
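
One candidate for the easier fix is to end the match on a word boundary instead of requiring whitespace. This is a sketch of that idea, not guardonce's actual pattern; ENDIF and is_endif are hypothetical names.

```python
import re

# \b ends the match at any non-identifier character, so "#endif//X" and
# "#endif/*X*/" are accepted while "#endification" is not.
ENDIF = re.compile(r"^[ \t]*#[ \t]*endif\b")


def is_endif(line):
    """Return True if the line starts with an #endif directive."""
    return ENDIF.match(line) is not None
```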

Handling non-ascii encoding with --stdout

I had the bad luck of applying it to some UTF-8 code, which produced this error:

Error processing /home/kwesolow/....
(UnicodeEncodeError) 'ascii' codec can't encode character '\xe9' in position 123938: ordinal not in range(128)

Oddly, the error does not occur when running in place.

With clang-format this works fine, and I can, for example, capture correctly encoded output via
fixed_content = subprocess.check_output([GUARD2ONCE_CMD, path, '--stdout']).decode()
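
One way a tool can sidestep this class of error on Python 3 is to write already-encoded bytes to the binary layer of stdout, so that the terminal's text codec never gets involved. This is a sketch of the general technique, not a description of how guardonce resolved the issue; write_bytes_to_stdout is a hypothetical helper.

```python
import sys


def write_bytes_to_stdout(data, stream=None):
    """Write raw bytes to stdout without re-encoding them.

    On Python 3, text streams expose the underlying binary stream as
    `.buffer`; streams without that attribute are written to directly.
    """
    stream = stream if stream is not None else sys.stdout
    binary = getattr(stream, "buffer", stream)
    binary.write(data)
    binary.flush()
```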

Customize once2guard's #endif output

Many projects include a comment after each #endif to specify what conditional is ending. It would be nice if you could specify how once2guard did this. For example, from node:

#ifndef SRC_BASE_OBJECT_H_
#define SRC_BASE_OBJECT_H_
...
#endif  // SRC_BASE_OBJECT_H_

A major part of why I built guardonce was to make it easy to convert back from #pragma once if a problem ever arises. If the generated code breaks a style guide and needs manual correction, that's a problem. once2guard and guard2once should be able to round-trip most guard styles.

There are lots of different styles, so perhaps the easiest thing to do would be to allow the user to provide a template. There just needs to be some way to refer to the generated guard symbol from that template.

It would work something like,

once2guard -t '#endif /* % */' file.h

Fancier template languages like jinja seem a bit overkill, so perhaps a single special character to refer to the guard symbol would be sufficient. Or maybe I should use {} to match find and friends?
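
Either placeholder choice reduces to a plain string substitution. The sketch below only illustrates the idea discussed here; render_endif is a hypothetical helper and says nothing about the syntax the tool finally adopted.

```python
def render_endif(template, guard_symbol, placeholder="%"):
    """Fill the guard symbol into an #endif template."""
    return template.replace(placeholder, guard_symbol)


# The same helper works with "{}" as the placeholder.
percent_style = render_endif("#endif /* % */", "SRC_BASE_OBJECT_H_")
brace_style = render_endif("#endif  // {}", "SRC_BASE_OBJECT_H_", placeholder="{}")
```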

Using include guards and pragma once together

It seems a lot of people like the idea of using both include guards and #pragma once in the same file. I'm not a fan.

Unlike just using #pragma once, if you use both together you still need to manually maintain the uniqueness of the include guard symbol. Eliminating that burden was the best feature of #pragma once, in my opinion. With that gone, the only possible advantage of also using #pragma once would be performance, but there is no performance improvement on modern compilers.

Compilers can recognize that a file is protected by include guards and optimize its inclusion just as they would for #pragma once (as long as they check that the guard is never #undef'd). GCC has long done this, but MSVC historically lacked this optimization. That's the main reason why you see so much old information promoting the use of #pragma once and include guards together.

That information is out of date. To quote Microsoft's documentation on #pragma once in VS2015:

There is no advantage to use of both the #include guard idiom and #pragma once in the same file. The compiler recognizes the #include guard idiom and implements the multiple include optimization the same way as the #pragma once directive if no non-comment code or preprocessor directive comes before or after the standard form of the idiom:

</endrant>

With all that being said, it's clear that using both together is a style that many people use. guardonce has no support for working with files that contain both, but maybe it should. At the very least, it should help people migrate away from using both to using just one.

Allow getting fixed/converted file on stdout

Similar to clang-format's behavior, where one can select either in-place editing or stdout as the target.

It would give more flexibility when integrating guardonce into a bigger tool suite (e.g. formatting code and fixing include guards in a single pass).

Add support for multiple --exclude arguments

Match files against all --exclude arguments, and ignore them if any match.
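
With argparse, repeated flags can be collected using action="append". The sketch below assumes shell-style glob matching on the whole path, which may differ from the matching guardonce actually performs; build_parser and is_excluded are hypothetical names.

```python
import argparse
import fnmatch


def build_parser():
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of the flag into one list.
    parser.add_argument("--exclude", action="append", default=[],
                        metavar="PATTERN")
    parser.add_argument("path")
    return parser


def is_excluded(path, patterns):
    """Return True if the path matches any of the exclude patterns."""
    return any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns)
```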

Support other include guard conventions

The function to transform the file name into the include guard needs to be more flexible. It's particularly important in guard2once.py since we're trying to match any existing convention a user might have for their files.

Workaround: Edit guardSymbol(fileName) in crules.py to transform the file name and return the include guard symbol for the file.

Errors when converting files do not result in non-zero exit codes

When guard2once fails, it still returns a 0 exit code, causing scripts to misinterpret the result.
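
The expected behaviour can be sketched as a driver loop that remembers failures and reports them through its return value. convert_all and its convert callback are hypothetical stand-ins for the tool's internals.

```python
import sys


def convert_all(paths, convert):
    """Run convert(path) on every path; return 1 if any conversion failed."""
    failures = 0
    for path in paths:
        try:
            convert(path)
        except Exception as error:  # keep going so every file gets reported
            print("Error processing {}: {}".format(path, error), file=sys.stderr)
            failures += 1
    return 1 if failures else 0
```

A command-line entry point would then pass this value to sys.exit so that calling scripts can detect the failure.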

Failures handling UTF-16, UTF-32 encoded files

It seems that Visual Studio may generate UTF-16 header files for Resource Files. An example of such a file is renderdoccmd/resource.h. I expect that UTF-32 files have the same problem, though I have never encountered one.

Under Python 2, guardonce actually happens to handle this case correctly, as resource files don't have guards. checkguard notes that no guard was found, and both guard2once and once2guard ignore it. This is not because guardonce is behaving intelligently. Even if there were a guard, it would not be recognized, and guardonce would exhibit the same behaviour. That's not ideal, but as long as checkguard is telling you that the files are a problem, and as long as guard2once and once2guard do no harm to the files, it's acceptable.

Under Python 3, guardonce fails to decode the file to string, and prints a cryptic error message:

'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

This is from Linux, where utf-8 is the default codec. There's probably a different message under Windows. The behaviour is mostly the same as under Python 2, but all utilities print that error message, and checkguard does not print out the file name. It's hard to track down what file has the problem, because I'm not including enough information in that error message. That's not acceptable.

It's hard to say what the right thing to do is. Programs like file and vim will guess these encodings, though sed and gcc won't. UTF-16 and UTF-32 are pretty distinctive. They will have a BOM, and it's very likely that a large percentage of bytes in the file are going to be null. It's very unlikely that a real C header would start with the BOM characters in any encoding, or be full of null bytes.

Another possibility is to allow the user to specify the encodings of their files, but that may be complicated, as even in the renderdoc example above, most files in the repository are UTF-8 and there's only a single UTF-16 file. Many developers probably don't know how all their files are encoded, and there's probably a mixture of encodings within the repository.

At least for now, the plan is to make Python 3's behaviour match Python 2. Everything beyond complaining about and ignoring these files is a bonus.

Operate on individual files

The ability to operate on individual files would be nice to allow usage in combination with other tools, like find and xargs.

Add processing another guard case

Please add handling for another form of guard, for example this one from our legacy code:

#if !defined(CONVERTER_H)
#define CONVERTER_H
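
Recognizing this form alongside #ifndef could look roughly like the sketch below. It only accepts a lone !defined test, so compound conditions are rejected; GUARD_OPEN and guard_symbol are hypothetical names, not guardonce's implementation.

```python
import re

GUARD_OPEN = re.compile(
    r"^[ \t]*#[ \t]*(?:"
    r"ifndef[ \t]+(\w+)"                                   # #ifndef SYMBOL
    r"|if[ \t]*![ \t]*defined[ \t]*\([ \t]*(\w+)[ \t]*\)"  # #if !defined(SYMBOL)
    r"|if[ \t]*![ \t]*defined[ \t]+(\w+)"                  # #if !defined SYMBOL
    r")[ \t]*$"
)


def guard_symbol(line):
    """Return the symbol tested by a guard-opening line, or None."""
    match = GUARD_OPEN.match(line.rstrip("\r\n"))
    if match is None:
        return None
    # Exactly one of the three groups is set when the pattern matches.
    return next(group for group in match.groups() if group is not None)
```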

once2guard fails on files containing unicode if --endif-style specified

My first encoding bug. Hurrah! once2guard silently fails to convert files containing unicode characters if --endif-style is specified. This only affects Python 2.

This seems to stem from accidentally combining unicode and str. Using either one consistently is sufficient to fix the problem.

I kind of miss static typing.

Python 3 Support

guardonce v1.0 supports only Python 2.7.

guardonce v2.0 will support Python 2.7 and Python 3.5.

Handle include guards with values

checkguard will complain about perfectly valid guards like this:

#ifndef VULKAN_H_
#define VULKAN_H_ 1
...
#endif

The problem with just accepting a value for include guards is that it makes confusing an include guard and a constant declaration more likely. This looks pretty similar:

#ifndef M_PI
#define M_PI 3.14159265358979323
#endif

But, perhaps something could still be done about these cases? If guardonce checked that the #ifndef/#endif covered the whole file (aside from comments and preprocessor directives), it could be reasonably certain that it was dealing with an include guard. That, of course, depends on #7.

Do most people who give a value to their include guards use 1? Perhaps that would be a decent heuristic to use in the meantime? Needs research.
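
The interim heuristic floated above, accepting only an empty value or 1, could be sketched like this. Whether 1 really is the dominant value is the open question; looks_like_guard_define and its allowed_values default are hypothetical.

```python
import re

DEFINE = re.compile(r"^[ \t]*#[ \t]*define[ \t]+(\w+)[ \t]*(.*?)[ \t]*$")


def looks_like_guard_define(line, symbol, allowed_values=("", "1")):
    """Return True if the line defines `symbol` with no value or an allowed one."""
    match = DEFINE.match(line)
    return (match is not None
            and match.group(1) == symbol
            and match.group(2) in allowed_values)
```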

Extraneous newline is left at the end of file in some cases

When running guard2once 2.4.0 on files like this one, it will work correctly but an extraneous newline will be left at the end of file, causing clang-format to fail.

Not a huge issue as clang-format can fix it on its own, but I thought I'd report it anyway 🙂

Thanks a lot for making this tool, it's a huge time saver.

Failures handling unicode in headers when using Python 3 on Windows

My second encoding bug. :\

Apparently, the default encoding for Windows is CP-1252. A not-uncommon scenario would be processing a file that turns out to be UTF-8. In that case, decoding will fail during the file read if the file contains a byte sequence that's invalid for CP-1252. Fancy quotes, for example.

This only happens for Python 3, because in Python 2 the string isn't decoded. There's really no need to decode it, because any string of characters outside of the ASCII range is irrelevant to guardonce, and can be passed through without modification. To my knowledge, the only popular-ish encodings that mangle the ASCII range are UTF16 and UTF32, so aside from files with those encodings the Python 2 method of being Unicode-oblivious works great.

In general, there's no way to know the encoding of a given file. Given that my parsing will work on nearly all encodings aside from UTF16 and UTF32, I'm tempted to switch to bytestrings in Python 3 so that I get the same behaviour as Python 2 and so I can bypass the whole character-encoding guessing game.
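
The bytestring idea can be illustrated with a bytes regular expression: because the directive itself is pure ASCII, the same pattern finds it whether the surrounding text is UTF-8 or CP-1252. This is a sketch of the approach, with IFNDEF and find_guard_symbol as hypothetical names.

```python
import re

# A bytes pattern: only ASCII bytes matter, everything else passes through.
IFNDEF = re.compile(rb"^[ \t]*#[ \t]*ifndef[ \t]+([A-Za-z_][A-Za-z0-9_]*)",
                    re.MULTILINE)


def find_guard_symbol(raw):
    """Return the guard symbol as bytes, or None, without decoding the file."""
    match = IFNDEF.search(raw)
    return match.group(1) if match else None
```

Reading with open(path, "rb") and writing with open(path, "wb") would then leave every non-ASCII byte exactly as it was.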

Add support for multiple file and directory arguments

A command like ./guard2once.py src/* might expand to a call like ./guard2once.py src/dir1 src/dir2 src/dir3. It would be nice to support operating on each argument given. Right now that's just an error, but the intention is obvious.

Less Archery; More Music

To be changed when http://youtu.be/siwpn14IE7E no longer requires Flash.

once2guard --help throws exception

The use of % in argparse help strings needs to be escaped. I caught this the first time I wrote the help for --endif-style, but I went back and forth on what symbol to use, and it got missed in b7ac670.

This was an embarrassingly simple oversight. The unit tests for guardonce have been focused on the algorithms handling searching and substitution, but until yesterday there were no integration tests checking that they're glued together correctly. That has been corrected, and this will not be missed again. There's only a few integration tests at the moment, but they will be expanded upon as development continues.
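
For reference, argparse applies %-formatting to help strings, so a literal percent sign has to be written as %%. The snippet below is a minimal demonstration with made-up help text, not the actual help for --endif-style.

```python
import argparse

parser = argparse.ArgumentParser(prog="once2guard")
parser.add_argument(
    "--endif-style", metavar="TEMPLATE",
    # "%%" renders as a single "%"; a bare "%" would break --help.
    help="template for the #endif line, where %% stands for the guard symbol")

help_text = parser.format_help()
```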

Add UTF-16 and UTF-32 support

Currently, UTF-8 is the only supported encoding for guardonce. UTF-16 and UTF-32 files cannot be processed. I began to improve the handling of those files in #22 by ensuring they were flagged as a problem or ignored. However, it's possible to do better.

There are a few heuristics that can be used to guess that a file is UTF-16 or UTF-32. The simplest is to check if the first few bytes of the file match a BOM. It's extremely unlikely that any header file in another encoding would start with the same bytes as a UTF BOM, so this seems sufficient for our case.

The encoding used for reading should also be used for writing when processing the file in place. However, I'm less sure about the correct behaviour for printing the new file to stdout. I suspect that I should just use the same encoding there too, though perhaps I should use the output stream's desired encoding if it is known.
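
The BOM heuristic can be sketched with the constants from Python's codecs module. The order of the checks matters, because the UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian BOM. guess_encoding_from_bom is a hypothetical helper, and falling back to UTF-8 is an assumption.

```python
import codecs

# UTF-32 must be tested before UTF-16: FF FE 00 00 starts with FF FE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding_from_bom(data, default="utf-8"):
    """Return (encoding, bom_length) based on the first bytes of the file."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return default, 0
```

Decoding `data[bom_length:]` with the returned encoding, then writing the original BOM followed by the re-encoded text, would keep an in-place conversion in the file's own encoding.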