Coder Social home page Coder Social logo

matus-chochlik / ctcache Goto Github PK

View Code? Open in Web Editor NEW
73.0 73.0 26.0 2.79 MB

Cache for clang-tidy static analysis results

License: Boost Software License 1.0

Shell 1.47% Python 90.45% Dockerfile 0.37% CSS 0.74% HTML 1.66% JavaScript 1.93% CMake 2.81% C++ 0.57%

ctcache's People

Contributors

bartdesmet avatar christian-steinmeyer avatar coryan avatar dsilakov avatar felixoid avatar generhr avatar greg-kent avatar haudren-woven avatar marcsanfacon avatar matus-chochlik avatar n-coder avatar nicocvn avatar solarispika avatar staffanf avatar timangus avatar tony-p avatar zitrax avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

ctcache's Issues

Move import of requests

Since the "requests" module is only used by the server cache, how about moving that import so that clang-tidy-cache can work without "requests" installed? I.e. something like:

diff --git a/clang-tidy-cache b/clang-tidy-cache
index c887e63..ee17407 100755
--- a/clang-tidy-cache
+++ b/clang-tidy-cache
@@ -12,7 +12,6 @@ import errno
 import getpass
 import logging
 import hashlib
-import requests
 import tempfile
 import subprocess
 import json
@@ -222,6 +221,7 @@ class ClangTidyCacheHash(object):
 # ------------------------------------------------------------------------------
 class ClangTidyServerCache(object):
     def __init__(self, log, opts):
+        import requests
         self._log = log
         self._opts = opts
 

run-clang-tidy - No such file or directory: '-p=...'

It's probably related to #16

I did exactly the same thing:

CTCACHE_CLANG_TIDY=/usr/bin/clang-tidy-17 CTCACHE_DIR=/tmp/ctcache run-clang-tidy-17 -clang-tidy-binary /usr/local/bin/clang-tidy-cache

I tried with clang-tidy-cache from main and from commit 0364427 (Jan 20, 2023). Both have the same problem.

I have:

$ CTCACHE_CLANG_TIDY=/usr/bin/clang-tidy-17 CTCACHE_DIR=/tmp/ctcache time run-clang-tidy-17 -clang-tidy-binary /usr/local/bin/clang-tidy-cache
<class 'FileNotFoundError'>: [Errno 2] No such file or directory: '-p=/home/...'
Unable to run clang-tidy.
Command exited with non-zero status 1
0.09user 0.01system 0:00.11elapsed 99%CPU (0avgtext+0avgdata 16456maxresident)k
0inputs+0outputs (0major+4391minor)pagefaults 0swaps

Ability to apply path striping to source files

We Are using a conda virtual environment and ros which generates code, and these 2 both have instances where paths end up in the source code. For this reason we want to also have the option to strip paths from the source output.

I have already made a PR #59 that implements this.

Question: Shared cache

Does ctcache support sharing 'local' cache locations across multiple machines? Say for example running builds on multiple servers but having a shared network attached storage with a single ctcache shared by all the servers? The servers could be running builds concurrently with each other so could be reading and writing to the cache at the same time.

Feature Request: Consider adding a `clang-tidy-cache --show-stats` option

ccache has a --show-stats flag that's useful to see the value its providing. https://ccache.dev/manual/4.2.html

Typical usage is like:

ccache --zero-stats
... do the build using ccache
ccache --show-stats

It may be nice if clang-tidy-cache had similar functionality when run in local mode.

What should the output be? Cache hits, cache misses, etc. In case it helps, the output from ccache looks like the following:

cache directory                     /h/.cache/ccache
primary config                      /h/.config/ccache/ccache.conf
secondary config (readonly)         /etc/ccache.conf
stats updated                       Mon Apr 12 16:33:48 2021
stats zeroed                        Mon Apr 12 16:32:03 2021
cache hit (direct)                   971
cache hit (preprocessed)              89
cache miss                             0
cache hit rate                    100.00 %
cleanups performed                     0
files in cache                      2039
cache size                         157.6 MB
max cache size                       5.0 GB

Of course, the output doesn't need to exactly match this, but it's something to consider and draw inspiration from.

--show-stats doesn't work

./clang-tidy-cache --show-stats
<class 'TypeError'>: ClangTidyCache.query_stats() takes 1 positional argument but 2 were given

clang-tidy runs that produce warnings are cached

It seems like ctcache caches runs that have produced clang-tidy warnings, so that the next time the same file is processed no warning is shown.

Take this file (warning.cpp):

#include <string>
#include <vector>

void Foo(const std::vector<std::string> bar)
{
}

void Bar()
{
  std::vector<std::string> foo{"foo", "bar"};
  Foo(foo);
}

When run with ctcache:

% ./clang-tidy-cache clang-tidy --checks=performance-\* warning.cpp -- g++ -c warning.cpp -o warning.o
8 warnings generated.
warning.cpp:4:41: warning: the const qualified parameter 'bar' is copied for each invocation; consider making it a reference [performance-unnecessary-value-param]
void Foo(const std::vector<std::string> bar)
                                        ^
                                       &
Suppressed 7 warnings (7 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
% ./clang-tidy-cache clang-tidy --checks=performance-\* warning.cpp -- g++ -c warning.cpp -o warning.o
<no output>

One option would be to not cache runs that have produced output on stdout.

Question: Are there tuneable parameters to control cache size?

I'm looking into using this clang-tidy cache for my project (googleapis/google-cloud-cpp#6170), and so far it looks pretty promising.

I'm wondering about cache growth. Are there tunable parameters prune cache files that haven't been used in N days? I see that the cache files themselves are empty, so I'm not concerned about size, but the number of files may grow without bound otherwise.

Thanks in advance.

Add option on remote caches to be read-only?

It'd be useful to have a "read-only" flag for remote caches, similar to ccache's support for a read-only=true flag that can be specified on a per-remote basis. See https://ccache.dev/manual/4.9.1.html#_attributes_for_all_backends.

Our scenario is a bunch of CI machines that run builds, including clang-tidy checks. These all use a common cache using clang-tidy-cache-server, which greatly speeds up the CI pipelines (over 98% gains). Developers would also like to benefit from this cache for local builds, but we wouldn't want to pollute the cache as they're making local iterations to their code, often in a tight development loop with repeated builds using clang-tidy.

If there's no objection, I'm happy to prepare a change, potentially with CTCACHE_[S3|GCS|HOST]_READ_ONLY flags that default to 0 but can be set to 1. The effect would be for store_in_cache functions to simply return.

Some issues with MSVC

I have a project built using CMake with Ninja build system and it supports Windows so the compiler is cl.exe from MSVC.

I tried to use ctcache and had some issues that I think can be fixed easily:

  1. cl.exe writes file name to standard error even when successfully pre-compiled the file, causing ctcache to fail to cache here

    ctcache/clang-tidy-cache

    Lines 858 to 859 in dac994e

    if stderr:
    return None

    I think the check can be changed to look at proc.returncode.
  2. In adjusting compiler options to print precompiled content here

    ctcache/clang-tidy-cache

    Lines 102 to 104 in dac994e

    for i in range(1, len(self._compiler_args)):
    if self._compiler_args[i-1] in ["-E"]:
    self._compiler_args.insert(i, "-P")

    The -P option tells cl.exe to write the content to a .i file. To achieve what is desired by the code, it should change -E to -EP.
  3. In FileLock.release, before os.unlink, self.lockfile should be closed first, or os.unlink will fail.

    ctcache/clang-tidy-cache

    Lines 450 to 457 in dac994e

    def release(self):
    if self.lock_handle is not None:
    try:
    os.unlink(self.lockfile) # Remove the lock file upon release
    except OSError:
    pass # Ignore errors if the file doesn't exist or has already been released
    finally:
    self.lock_handle = None

With these changes, I can successfully cache clang-tidy's output.

I am just not sure how would you like to fix this in a proper way.

Questions regarding integration with codechecker

Hello,
I am trying to integrate your caching wrapper with codechecker (https://github.com/Ericsson/codechecker), and I see some strange issues, all of which I can fix locally myself, but I still would like to mention these problems:

  1. Why is adjust_chunk method not ignoring lines that start with '#' ? Is there a reason why those lines can matter? I think in general adjust_chunk should not try to replace anything in preprocessed code, it is usable for command line arguments but not for source code.

  2. I think _omit_after(ct_args, ["-export-fixes"] should be adjusted to work with --export-fixes as well

  3. It seems command for clang-tidy that is generated by codechecker is quite special and current wrapper does not expect such command:
    /usr/lib/llvm-15/bin/clang-tidy /build/asd/file.cpp --export-fixes /build/asd/build/reports_clang-tidy/fixit/file.yaml -- -Qunused-arguments -Wall -Wextra -x c++ --target=x86_64-pc-linux-gnu -DBOOST_NO_AUTO_PTR -DBOOST_NO_CXX11_UNICODE_LITERALS -DBOOST_STACKTRACE_BACKTRACE_INCLUDE_FILE="/usr/lib/gcc/x86_64-linux-gnu/10/include/backtrace.h" -DBOOST_STACKTRACE_USE_BACKTRACE -DEIGEN_MPL2_ONLY -DUNICODE -D_UNICODE -I/build/asd/dev/libraries/mwstd/public -isystem /build/asd/dev/third_party/boost_1_78 -g -fPIC -fexperimental-library -fdata-sections -ffunction-sections -fexceptions -ftrapping-math -stdlib=libc++ -fvisibility-ms-compat -fopenmp=libomp -std=c++20 -isystem /usr/lib/llvm-15/include/c++/v1 -isystem /usr/local/include -isystem /usr/include/x86_64-linux-gnu -isystem /usr/include

    • First issue happens because there is no compiler command after -- (no clang++-15 for example), and wrapper does not add its own compiler command
    • Second issue happens because -E is not added if there is no -c in input
    • Third issue happens because there is no actual cpp file mentioned in the second part of command (after --)

The question: Are you aware of all this? Does it make sense to fix all of this? I think I can help with fixing if you want.

Frequent stats.lock contention

Hi,

Since trying to move to the latest version of ctcache (with the stats added to local-mode) we're getting frequent errors below. Looks to be heavy contention on the stats file when running parallel builds. Interestingly these errors don't cause any builds failures... are the exceptions being ignored and the build has actually not completed or are these errors benign and should therefore be turned into warnings instead of errors?

Timeout (3 seconds) exceeded while acquiring lock.
Traceback (most recent call last):
  File "/ctcache/dac994e/clang-tidy-cache", line 436, in acquire
    self.lock_handle = os.open(self.lockfile, os.O_CREAT | os.O_EXCL)
FileExistsError: [Errno 17] File exists: '/external/ctcache/stats.lock'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ctcache/dac994e/clang-tidy-cache", line 486, in update_stats
    hits, misses = self.read_stats()
  File "/ctcache/dac994e/clang-tidy-cache", line 473, in read_stats
    with FileLock(self.stats_file() + ".lock") as _:
  File "/ctcache/dac994e/clang-tidy-cache", line 460, in __enter__
    return self.acquire()
  File "/ctcache/dac994e/clang-tidy-cache", line 443, in acquire
    raise RuntimeError(msg)

Consider `boto3` an optional import

After the recent update the boto3 has became a mandatory installed module, and it's not a lightweight at all.

apt-get install python3-boto3
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libarchive13 libjsoncpp1 librhash0
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  docutils-common libfreetype6 libimagequant0 libjbig0 libjpeg-turbo8 libjpeg8 liblcms2-2
  libpaper-utils libpaper1 libpng16-16 libtiff5 libwebp6 libwebpdemux2 libwebpmux3 python3-botocore
  python3-dateutil python3-docutils python3-jmespath python3-olefile python3-pil python3-roman
  python3-s3transfer sgml-base ucf xml-core
Suggested packages:
  liblcms2-utils docutils-doc fonts-linuxlibertine | ttf-linux-libertine texlive-lang-french
  texlive-latex-base texlive-latex-recommended python-pil-doc python3-pil-dbg sgml-base-doc debhelper
The following NEW packages will be installed:
  docutils-common libfreetype6 libimagequant0 libjbig0 libjpeg-turbo8 libjpeg8 liblcms2-2
  libpaper-utils libpaper1 libpng16-16 libtiff5 libwebp6 libwebpdemux2 libwebpmux3 python3-boto3
  python3-botocore python3-dateutil python3-docutils python3-jmespath python3-olefile python3-pil
  python3-roman python3-s3transfer sgml-base ucf xml-core
0 upgraded, 26 newly installed, 0 to remove and 17 not upgraded.
Need to get 5665 kB of archives.
After this operation, 55.7 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Will you consider a patch to make it optional?

Writeback to local cache upon remote cache hit

Changes in #50 worked very well for our setup to pull down cache clang-tidy results from the remote cache, thus greatly reducing local development time, without pushing back changes to the remote caches as to avoid pollution with random changes.

However, there's still one concern left, which is that remote cache hits do not result in local cache writes, so all subsequent invocations keep making the remote calls, which add to latency and server load. The suggested write back behavior would be similar to https://ccache.dev/manual/4.5.1.html#_storage_interaction.

I've prepared a draft change locally (and will push a draft PR, linking to this issue) that implements this behavior, but it may be worth considering if it should be done under an option.

Computation of the hash value does not always detect code changes

Description of the problem

The current implementation of hash_inputs() does not always detect if the code to be analyzed has changed. It seems that the issue can be traced to how #24 modified the hash_inputs() modification.

For example, consider the following clang-tidy invocation (using a compilation database):

clang-tidy -p build_dir foo-lib/foo.cpp

Upon parsing the compilation database to extract the compiler arguments, the compiler command line would look like:

clang++ -D__clang_analyzer__=1 __whatever_compiler_flags__ -o - -E foo-lib/foo.cpp

hash_inputs will then calculate the hash based on the modification time of the file foo-lib/foo.cpp. If the headers included by that file have changed since last run this will not impact the computation of the hash and clang-tidy-cache will use the cached information if available.

Discussion

Prior #24 the compiler command line was actually executed to collect the output of the preprocessor (which was then filtered using adjust_chunks()). The advantage of this approach is that any changes in the translation unit lead to a different hash, and I assume this is the desired behavior.

Note that, #24 was motivated by issues observed in VSCode+Ninja but it is not clear (to me at least) what was the real bottleneck.

Would it be possible to actually revert these changes to avoid the issue?

Thoughts:

  • #24 introduced a better way to find the location of the .clang-tidy file via a dedicated argument (--directories_with_clang_tidy=) and this should probably be retained. That makes it possible to simplify the parsing of the compiler command output.
  • During the parsing of the compiler arguments -c is replaced with -E. Should this be -E -P to automatically remove linemarkers (# lines)? This way there is no need anymore to parse line by line the output and this can be used to directly update the hash. This might improve performance which was the original issue in #24.
  • The use of -E -P instead of -E to bypass calling adjust_chunks() on the compiler output somewhat relates to one of the item in #23 as this make it possible to avoid touching the compiler output.

I can open a PR but I first wanted to discuss the issue as this is essentially asking for reverting a big part of #24.

Order of local versus remote caches when doing lookups and stores

I was perusing the code a little more in the context of #50, and stumbled upon the logic that adds the caches:

https://github.com/matus-chochlik/ctcache/blob/main/clang-tidy-cache#L896

        if not self._caches or opts.cache_locally():
            self._caches.append(ClangTidyLocalCache(log, opts))

I'd expect the local cache to be prepended to the list, rather than appended, such that methods like is_cached and get_cache_data would check the local cache first, and hence avoid making more expensive remote calls.

Am I missing something obvious here? If not, would it make sense to tweak this?

run-clang-tidy support

run-clang-tidy is a tool that reads a compilation DB and invokes clang-tidy for each source file in parallel and then aggregates the results into a single file:

https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clang-tidy/tool/run-clang-tidy.py

I tried to use ctcache and run-clang-tidy together.

You can run this setup in a directory containing the compilation DB with:
run-clang-tidy -export-fixes clang-tidy.yaml -clang-tidy-binary clang-tidy
where clang-tidy executable found in $PATH is a ctcache wrapper.

run-clang-tidy invokes, for each source file, a command that looks like the following:
clang-tidy -export-fixes /tmp/tmppri1nz_t/tmpeko_clr7.yaml -p=/absolute/path/to/CWD /path/to/file.cpp

This causes 2 issues with ctcache:

  1. path to compilation DB is given with argument -p=/some/path instead of -p /some/path
  2. argument -export-fixes /tmp/tmppri1nz_t/tmpeko_clr7.yaml uses an autogenerated random file that changes for each run.

I could workaround these issues with the following patch:

diff --git a/clang-tidy-cache b/clang-tidy-cache
index 279aacd..8172107 100755
--- a/clang-tidy-cache
+++ b/clang-tidy-cache
@@ -29,6 +29,14 @@ class ClangTidyCacheOpts(object):

         self._strip_list = os.getenv("CTCACHE_STRIP", "").split(':')

+        tmpArgs = []
+        for arg in args:
+            if arg.startswith("-p="):
+                tmpArgs.append("-p")
+                tmpArgs.append(arg[3:])
+            else:
+                tmpArgs.append(arg)
+        args = tmpArgs
         if args.count("--") == 1:
             # Invoked with compiler args on the actual command line
             i = args.index("--")
@@ -483,7 +491,15 @@ def hash_inputs(opts):

     for chunk in sorted(set([opts.adjust_chunk(arg) for arg in co_args[1:]])):
         result.update(chunk)
-    for chunk in sorted(set([opts.adjust_chunk(arg) for arg in ct_args[1:]])):
+    tmpArgs = []
+    i = 1
+    while i < len(ct_args):
+      if ct_args[i] == "-export-fixes":
+          i += 2 # ignore args '-export-fixes /path/to/file.yaml'
+          continue
+      tmpArgs.append(ct_args[i])
+      i += 1
+    for chunk in sorted(set([opts.adjust_chunk(arg) for arg in tmpArgs])):
         result.update(chunk)

     return result.hexdigest()

Client/server mode no longer working by default

Following doesn't work any more

This mode is enabled by setting the CTCACHE_HOST (localhost by default) and optionally CTCACHE_PORT (5000 by default) environment variables.

This is after commit a10ce62. Simplest fix is to change the new option

    def save_output(self) -> bool:
        return os.getenv("CTCACHE_SAVE_OUTPUT", "1") == "1"

so that it returns false by default, maintaining the old behaviour by default, i.e. return os.getenv("CTCACHE_SAVE_OUTPUT", "0") == "1".

[suggestion] change CTCACHE_STRIP delimiter to platform default path delimiter

Using CTCACHE_STRIP gets a bit hairy under Windows, as the currently used delimiter (:) is part of each and every absolute path. This unnecessarily makes usage of CTCACHE_STRIP harder for absolute paths.

This is the relevant piece of code:

self._strip_list = os.getenv("CTCACHE_STRIP", "").split(':')

It would be nice if this could be changed to use os.pathsep instead of : in order to adhere to different OS conventions, unless there is a specific reason to prefer : as the delimiter.

Failing to parse the compile commands DB

Hello!

Thanks for a fantastic tool! However, I tried to integrate ctcache into my project and experienced the following issue. _load_compile_command_db fails silently when trying to parse the JSON. The problem seems so be that the backslashes are not escaped properly: Expecting ',' delimiter: line 1094 column 151 (char 232775). The compile_commands.json file causing the error is attached.

Getting rid of replace in _load_compile_command_db seems to work as a temporary workaround for me:

def _load_compile_command_db(self, filename):
    try:
        f = open(filename)
        self._compile_commands_db = json.loads(f.read())
        f.close()
    except Exception:
        return False

Also, I think that it would probably be better to explicitly signal about JSON parsing errors instead of silently suppressing them.

compile_commands.json.txt

Successful compile command reported as erroneous

I was wondering why ctcache didn't cache any clang-tidy output for my header files and after some investigation it turned out that the compile command stored for my header files would emit warning: treating 'c-header' input as 'c++-header' when in C++ mode, this behavior is deprecated [-Wdeprecated] when executed, but still succeed. Your current logic for any compiler that isn't MSVC is that you don't look at the compile command exit code but rather whether stderr is empty (

if stderr:
).

Shouldn't a check for the exit code suffice for all compilers?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.