
pycachesim's Introduction

pycachesim

A single-core cache hierarchy simulator written in python.


The goal is to accurately simulate the caching (allocation/hit/miss/replace/evict) behavior of all cache levels found in modern processors. It is developed as a backend to kerncraft, but a command line interface to replay LOAD/STORE instructions is also planned.

Currently supported features:
  • Inclusive cache hierarchies
  • LRU, MRU, RR and FIFO policies
  • N-way cache associativity
  • Write-allocate with write-back caches
  • Non-write-allocate with write-through caches
  • Write-combining with sub-blocking
  • Tracking of cacheline states (e.g., using dirty bits)
  • Speed (core is implemented in C)
  • Python 2.7+ and 3.4+ support, with no other dependencies
Planned features:
  • Report cachelines on all levels (preliminary support through backend.verbosity > 0; see the sketch after this list)
  • Report timeline of cache events (preliminary support through backend.verbosity > 0)
  • Visualize events (html file?)
  • Interface to Valgrind Infrastructure (see Lackey) for access history replay.
  • (uncertain) instruction cache
  • Optional classification into compulsory/capacity and conflict misses (by simulating other cache configurations in parallel)
  • (uncertain) multi-core support
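
The preliminary reporting mentioned above is enabled through the backend's verbosity attribute (its use appears in an issue report further below; the meaning of individual verbosity levels is an assumption here):

from cachesim import CacheSimulator, Cache, MainMemory

mem = MainMemory()
l1 = Cache("L1", 64, 8, 64, "LRU")
mem.load_to(l1)
mem.store_from(l1)
cs = CacheSimulator(l1, mem)

l1.backend.verbosity = 1  # values > 0 print per-event details for this level (level semantics assumed)
cs.load(2342)             # cacheline events for this load are reported on stdout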

License

pycachesim is licensed under AGPLv3.

Usage

from cachesim import CacheSimulator, Cache, MainMemory

mem = MainMemory()
l3 = Cache("L3", 20480, 16, 64, "LRU")  # 20MB: 20480 sets, 16-way, cacheline size of 64 bytes
mem.load_to(l3)
mem.store_from(l3)
l2 = Cache("L2", 512, 8, 64, "LRU", store_to=l3, load_from=l3)  # 256KB
l1 = Cache("L1", 64, 8, 64, "LRU", store_to=l2, load_from=l2)  # 32KB
cs = CacheSimulator(l1, mem)

cs.load(2342)  # Loads one byte from address 2342, should be a miss in all cache-levels
cs.store(512, length=8)  # Stores 8 bytes to addresses 512-519,
                         # will also be a load miss (due to write-allocate)
cs.load(512, length=8)  # Loads from address 512 until (exclusive) 520 (eight bytes)

cs.force_write_back()
cs.print_stats()

This should print:

CACHE *******HIT******** *******MISS******* *******LOAD******* ******STORE*******
   L1      1 (       8B)      2 (      65B)      3 (      73B)      1 (       8B)
   L2      0 (       0B)      2 (     128B)      2 (     128B)      1 (      64B)
   L3      0 (       0B)      2 (     128B)      2 (     128B)      1 (      64B)
  MEM      2 (     128B)      0 (       0B)      2 (     128B)      1 (      64B)

Each row refers to one memory level, starting with L1 and ending with main memory. The 3 loads in L1 are the sum of all individual accesses to the cache hierarchy: 1 (first load) + 1 (store with write-allocate) + 1 (second load) = 3.

The 1 hit is for bytes that were already cached. Internally, pycachesim operates on cache-lines, to which all addresses are mapped. Thus, the two misses reported throughout all cache levels are actually two complete cache-lines, and once a cache-line has been loaded, consecutive accesses to the same cache-line are handled as hits. That is also the reason why the data sizes increase from L1 to L2: L1 is accessed byte-wise, while L2 is accessed only at cache-line granularity.

So: hits, misses, stores and loads in L1 are counted byte-wise; all other statistics are based on cache-lines.
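
As a quick worked check of these numbers (plain arithmetic with the 64-byte cacheline size from the example):

cl_size = 64
print(2342 // cl_size)                 # 36: the first load touches cacheline 36 -> first miss
print(512 // cl_size, 519 // cl_size)  # 8 8: the store and the second load stay within cacheline 8
# Two distinct cachelines are touched in total, matching the two misses per level above;
# the second load finds cacheline 8 already present, which yields the single L1 hit.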

When using victim caches, setting victims_to to the victim cache level will cause pycachesim to forward unmodified cache-lines to that level on replacement. During a miss, victims_to is checked for availability and only hit if the cache-line is found there. This means that load stats will equal hit stats in victim caches, and misses should always be zero.
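
A minimal wiring sketch, assuming victims_to is passed to the Cache constructor in the same way as store_to and load_from (the cache sizes and the placement of the victim cache next to L2 are purely illustrative):

from cachesim import CacheSimulator, Cache, MainMemory

mem = MainMemory()
l3 = Cache("L3", 20480, 16, 64, "LRU")
vc = Cache("VC", 64, 8, 64, "LRU", store_to=l3, load_from=l3)  # hypothetical victim cache
l2 = Cache("L2", 512, 8, 64, "LRU", store_to=l3, load_from=l3,
           victims_to=vc)  # unmodified lines replaced in L2 are forwarded to VC
l1 = Cache("L1", 64, 8, 64, "LRU", store_to=l2, load_from=l2)
mem.load_to(l3)
mem.store_from(l3)
cs = CacheSimulator(l1, mem)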

Comparison to other Cache Simulators

While searching for a more versatile cache simulator for kerncraft, I stumbled across the following:

  • gem5: A very fully-featured full-system simulator, but it is complex to extract only the memory subsystem.
  • dineroIV: Nice and simple code, but it does not support exclusive caches and is not available under an open source license.
  • cachegrind: Maintained and stable code from a well-established open source project, but it only supports inclusive first- and last-level caches.
  • callgrind: see cachegrind
  • SMPcache: Supports only a single cache, runs on Windows with a GUI, and is not freely available.
  • CMPsim: Only academically published; the source code was never made available.
  • CASPER: Only academically published; the source code was never made available.
The packages were compared on instruction-cache support, blocks, sub-blocks, associativity, LRU, MRU, FIFO, RR, CCC, 3+ levels, exclusive caches, victim caches, multi-core support, API and open-source availability; the numbered notes below define each criterion. The interface and licensing comparison:

Package     API                 open source
gem5        python, ruby, c++   yes, BSD-style
dineroIV    c                   no, free for non-commercial use
cachegrind  cli                 yes, GPLv2
callgrind   cli                 yes, GPLv2
SMPcache    Windows GUI         no, free for education and research
CMPsim      ?                   no, source not public
CASPER      perl, c             no, source not public
pycachesim  python, C backend   yes, AGPLv3


  1. Instruction cache support (typically L1I)

  2. Cacheline/block granular caching

  3. Sub-blocking/sectoring for in-cache storage

  4. Support for n-way associativity

  5. Support for the least-recently-used (LRU) replacement policy

  6. Support for the most-recently-used (MRU) replacement policy

  7. Support for the first-in-first-out (FIFO) replacement policy

  8. Support for the random (RR) replacement policy

  9. Classification of misses into: compulsory (first time access), capacity (access after replacement), conflict (would have been a hit with full-associativity)

  10. Support for combining at least three cache levels

  11. Exclusive cache relations (two levels may not share the same cacheline)

  12. Victim caches, where only evicted lines end up (e.g., AMD Bulldozer L3)

  13. Multi-core cache hierarchies with private and shared caches and cache coherency protocol

  14. Supported interfaces (cli = command-line-interface)

  15. Published under an Open Source Initiative approved license?

pycachesim's People

Contributors

cod3monk, janljl, lailaelbeheiry, mikuhn, reynozeros


pycachesim's Issues

Segfaults with 64bit addresses

I see that 64-bit support was added pretty recently. I get a lot of nondeterministic segfaults when trying to use backend.c by itself.

Randomly on these lines:

self->placement[set_id*self->ways+i] = self->placement[set_id*self->ways+i-1];

if(self->placement[set_id*self->ways+i].invalid == 0 &&

In cases where it crashes, set_id and cl_id are negative while the source addr is positive; I get the feeling that shouldn't be the case.

Cache__get_cacheline_id takes addr as a long long, shifts it to the right, then returns it as a long. This can result in a negative return value; is that OK?

inline static long Cache__get_cacheline_id(Cache* self, long long addr) {
    return addr >> self->cl_bits;
}
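
For intuition, the narrowing described in the question can be modeled in plain Python (whether long is 32 or 64 bits wide in backend.c depends on the platform; the 32-bit case is assumed here):

addr = 0xffffffffffff       # a 48-bit address
cl_id = addr >> 6           # full-width cacheline id for 64-byte lines: 4398046511103
low32 = cl_id & 0xffffffff  # keep only the low 32 bits, as a narrowing store would
signed = low32 - (1 << 32) if low32 >= (1 << 31) else low32
print(signed)               # -1: a positive address became a negative id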

Request for Instructions on How to Feed a binary file to the simulator

Hi,

I find your simulator very interesting for my research. I am trying to figure out how to feed a workload into the simulator, but unfortunately I could not find any information. I would be very thankful if you could provide some instructions on how to do it.

Thank You &
Warm Regards,
Piyumal
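
Until official instructions exist, a minimal replay sketch on top of the public load/store API might look as follows (the trace format, one "R|W address n_bytes" triple per line, and the file name trace.txt are assumptions, not something pycachesim prescribes):

from cachesim import CacheSimulator, Cache, MainMemory

mem = MainMemory()
l1 = Cache("L1", 64, 8, 64, "LRU")
mem.load_to(l1)
mem.store_from(l1)
cs = CacheSimulator(l1, mem)

with open("trace.txt") as trace:
    for line in trace:
        kind, addr, n_bytes = line.split()
        if kind == "R":
            cs.load(int(addr, 0), length=int(n_bytes))
        else:
            cs.store(int(addr, 0), length=int(n_bytes))

cs.force_write_back()
cs.print_stats()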

Consider Write Allocate Misses not as LOADs

Currently a Write Allocate Miss leads to a LOAD on the same cache level. This in turn produces a LOAD and a MISS in the statistics.

A better solution would be to produce a LOAD in the corresponding load_from cache and inject the cacheline into the current level.

There seems to be a logic error in cache block size checks

In cache.py, Cache.__init__(), around line 293:

assert store_to is None or store_to.cl_size <= cl_size, \
    "cl_size may only increase towards main memory."
assert load_from is None or load_from.cl_size <= cl_size, \
    "cl_size may only increase towards main memory."

If cl_size may only increase towards main memory, then I would expect l1b = 32, l2b = 64, l3b = 128 to work, since L3 is closest to memory and L1 is furthest away, but it does not. l1b = 128, l2b = 64, l3b = 32 works. Obviously these sizes are usually the same, but intuitively I think the comment is correct and the logic is wrong.
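
A minimal reproduction of the reported behavior (the set and way counts are arbitrary; only the cl_size values matter):

from cachesim import Cache

l3 = Cache("L3", 4096, 16, 128, "LRU")
l2 = Cache("L2", 512, 8, 64, "LRU", store_to=l3, load_from=l3)
# Raises AssertionError: "cl_size may only increase towards main memory.",
# although cl_size does increase towards main memory here (64 in L2, 128 in L3).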

Understanding stats: more hits than loads in L1 ?!

Hi all and thanks for the tool.

I'm feeding pycachesim with addresses and sizes to collect cache statistics.
I only issue loads, since stores are done in the initialization of my application and are negligible for my initial estimation.
I have instantiated a memory hierarchy exactly as in the readme:

mem = MainMemory()
l3 = Cache("L3", 20480, 16, 64, "LRU")  # 20MB: 20480 sets, 16-way, cacheline size of 64 bytes
mem.load_to(l3)
mem.store_from(l3)
l2 = Cache("L2", 512, 8, 64, "LRU", store_to=l3, load_from=l3)  # 256KB
l1 = Cache("L1", 64, 8, 64, "LRU", store_to=l2, load_from=l2)  # 32KB
cs = CacheSimulator(l1, mem)

and then I do a lot of loads:

for line in trace:
    address, n_bytes = parse_line(line)
    cs.load(address, length=n_bytes)
cs.force_write_back()
cs.print_stats()

However, when I print the stats, I'm getting more hits than loads in the L1:

CACHE *******HIT******** *******MISS******* *******LOAD******* ******STORE******* ******EVICT*******
   L1 6774536 (345751301B)  15627 (  812688B) 3815705 (194257152B)      0 (       0B)      0 (       0B)
   L2   8025 (  513600B)   7602 (  486528B)  15627 ( 1000128B)      0 (       0B)      0 (       0B)
   L3    269 (   17216B)   7333 (  469312B)   7602 (  486528B)      0 (       0B)      0 (       0B)
  MEM   7333 (  469312B)      0 (       0B)   7333 (  469312B)      0 (       0B)      0 (       0B)

Also, adding the bytes from hits and misses does not sum up to the loaded bytes. Are loaded bytes accounted for differently from the hit/miss calculation? Am I assuming something here that pycachesim is not able to handle?

Also, n_bytes is often larger than a cache line (64B), because I trace loads of a variable-size structure. Do I have to manually break it into loads of 64B (or less) for the simulator to work correctly, or does it already break them into multiple loads under the hood?
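
If manual splitting does turn out to be necessary, a chunking helper is easy to sketch (load_split is a hypothetical workaround, not part of pycachesim):

def load_split(cs, address, n_bytes, cl_size=64):
    """Replay one large access as cacheline-sized (or smaller) loads."""
    end = address + n_bytes
    while address < end:
        chunk = min(cl_size - address % cl_size, end - address)  # stop at line boundaries
        cs.load(address, length=chunk)
        address += chunk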

For L2 and higher, data seems to be consistent (e.g., hit+miss = loads).

Thank you in advance.

Incompatibility with 64-bit addresses

I encountered a bug where pycachesim is incompatible with 64-bit addresses. It seems that all addresses are truncated to 32 bits. The following test illustrates the problem:

from cachesim import CacheSimulator, Cache, MainMemory

mem = MainMemory()
l1 = Cache("L1", 1, 1, 32, "LRU")
mem.load_to(l1)
mem.store_from(l1)
cs = CacheSimulator(l1, mem)

cs.load(0xf, 4)
print('MISS', l1.stats()['MISS_count'], 'HIT', l1.stats()['HIT_count'])

cs.load(0xf, 4)
print('MISS', l1.stats()['MISS_count'], 'HIT', l1.stats()['HIT_count'])

print("Should be a miss -> ")
cs.load(0xf | (1<<32), 4)
print('MISS', l1.stats()['MISS_count'], 'HIT', l1.stats()['HIT_count'])

cs.load(0xf | (1<<32), 4)
print('MISS', l1.stats()['MISS_count'], 'HIT', l1.stats()['HIT_count'])

The result is (with 0.3.1):

MISS 1 HIT 0
MISS 1 HIT 1
Should be a miss -> 
MISS 1 HIT 2
MISS 1 HIT 3

Feature Request: Get loaded cache lines

As a user, I would like to know which memory addresses are loaded from the MainMemory. I came up with a "solution" that parses cachesim's log output, but that is not really a solution. Here it is for reference:

import sys
import io
from cachesim import CacheSimulator, Cache, MainMemory

mem = MainMemory()
l1 = Cache("L1", 128, 4, 32, "LRU")
mem.load_to(l1)
mem.store_from(l1)
cs = CacheSimulator(l1, mem)
l1.backend.verbosity=4

def cache_op(cache, op, addr, width):
    """This is wild"""
    old = cache.stats()['MISS_count']
    old_sys = sys.stdout
    sys.stdout = io.StringIO()
    op(addr, width)
    X = sys.stdout
    sys.stdout = old_sys
    delta =  cache.stats()['MISS_count'] - old
    lines = [x for x in X.getvalue().split("\n") if 'MISS' in x and 'LOAD' in x]
    cls = []
    for line in lines:
        line = line[line.index("cl_id=")+len("cl_id="):]
        line = line[:line.index(" ")]
        line = int(line)
        if line not in cls:
            cls.append(line)
    assert len(cls) == delta, X.getvalue()
    return cls

print('Loaded CL_ids:', cache_op(l1, cs.load, 16, 4))
print('Loaded CL_ids:', cache_op(l1, cs.load, 20, 4))
print('Loaded CL_ids:', cache_op(l1, cs.load, 32, 4))
print('Loaded CL_ids:', cache_op(l1, cs.load, 127, 4))

Result:

Loaded CL_ids: [0]
Loaded CL_ids: []
Loaded CL_ids: [1]
Loaded CL_ids: [3, 4]
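
The stdout swap can be written more defensively with contextlib.redirect_stdout; this is purely a restructuring of the snippet above, not a new capability:

import contextlib
import io

def cache_op(cache, op, addr, width):
    """As above, but the capture is exception-safe."""
    old = cache.stats()['MISS_count']
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):  # stdout is restored even if op() raises
        op(addr, width)
    delta = cache.stats()['MISS_count'] - old
    cls = []
    for line in buf.getvalue().split("\n"):
        if 'MISS' in line and 'LOAD' in line:
            cl_id = int(line.split("cl_id=")[1].split(" ")[0])
            if cl_id not in cls:
                cls.append(cl_id)
    assert len(cls) == delta, buf.getvalue()
    return cls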

Steps to add a feature in the C backend

I want to add a feature to the backend and test it out with pycachesim. What are the steps I need to follow for my changes to be reflected in the Python libs? Do I have to recompile the backend into a shared object every time I make a change?

Any help will be appreciated. Thanks in advance.

Circular import

I am trying to run examples.py with

python examples.py

in the pycachesim directory, but I get this error:

ImportError: cannot import name 'backend' from partially initialized module 'cachesim' (most likely due to a circular import)

I'm currently running Windows 11 on my machine.
