


anmlzoo's Introduction

ANMLZoo Automata Processing Benchmark Suite

IMPORTANT: if using ANMLZoo for experiments, please see ERRATA below...

ANMLZoo is under continual development. Please make sure to use the proper release for comparisons to prior work.

If you have any questions or comments, please feel free to contact [email protected] or create an Issue ticket.

Description

High-performance automata-processing engines are traditionally evaluated using a limited set of regular expression rulesets. While regular expression rulesets are valid real-world examples of use cases for automata processing, they represent a small proportion of all use cases for automata-based computing. With the recent availability of architectures and software frameworks for automata processing, many new applications have been found to benefit from automata processing. These applications show a wide variety of characteristics that differ from prior, popular regular-expression benchmarks, and these characteristics should be considered when designing new systems for automata processing. This paper presents ANMLZoo, a benchmark repository for automata-based applications as well as automata engines for both von Neumann and reconfigurable dataflow architectures.

Errata

Since the publication of ANMLZoo, we have found a few issues with the construction of the benchmarks. We've listed below the main issues that may impact your experimentation, along with suggested ways to work around them. We will update the benchmark suite with Version 1.1 in Summer 2017, which addresses most of these issues.

  1. The prefix merging algorithm in VASim had a bug that missed some minimization opportunities: We have since fixed this bug and are now able to properly minimize applications like SPM. SPM even in the ANMLZoo paper had a node count of over 100,000! We originally included the application because the Micron compiler was able to identify these optimization opportunities.

  2. RandomForest was incorrectly generated: The RandomForest benchmark was generated incorrectly for the ANMLZoo paper. We recognized this very early on and have already generated a new application, rf.1chip.anml, that fits within a single chip. The old application used for the paper remains in the repo. This is discussed in the RandomForest README.

  3. Some ANMLZoo benchmarks were improperly labeled as compiling to 1 chip: To generate ANMLZoo "standard candle" automata, we increased the widget count until the number of "rectangular blocks" used by the Micron compiler exceeded the number of rectangular blocks available on the Micron D480 chip. Unfortunately, this was not the correct way to identify whether an automaton used more than 1 chip's worth of resources. Consequently, we now know that applications such as Levenshtein, EntityResolution, Snort, and ClamAV actually require 1.5 chips (or 3 half cores). We are looking to remedy this in Version 1.1.
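
To illustrate the kind of prefix merging mentioned in erratum 1, here is a minimal Python sketch. It is not VASim's implementation: the `State` layout and the `merge_prefixes` function are hypothetical, and the merge rule used here (same symbol set, same start and report flags, same set of predecessors) is just one conservative way to find states that are always active at the same time.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical STE-like state: a symbol set plus outgoing edges."""
    symbols: str
    start: bool = False
    report: bool = False
    succ: set = field(default_factory=set)  # ids of successor states


def merge_prefixes(states):
    """Merge states that share symbols, flags and predecessors until stable."""
    # Work on a copy so the caller's automaton is left untouched.
    states = {sid: State(s.symbols, s.start, s.report, set(s.succ))
              for sid, s in states.items()}
    changed = True
    while changed:
        changed = False
        # Recompute predecessor sets for the current automaton.
        preds = {sid: set() for sid in states}
        for sid, s in states.items():
            for t in s.succ:
                preds[t].add(sid)
        # Group states that are indistinguishable from the left.
        groups = {}
        for sid in sorted(states):
            s = states[sid]
            key = (s.symbols, s.start, s.report, frozenset(preds[sid]))
            groups.setdefault(key, []).append(sid)
        for ids in groups.values():
            if len(ids) < 2:
                continue
            keep, drop = ids[0], set(ids[1:])
            for d in drop:
                states[keep].succ |= states[d].succ
                del states[d]
            # Redirect any edge that pointed at a dropped state.
            for s in states.values():
                if s.succ & drop:
                    s.succ = (s.succ - drop) | {keep}
            changed = True
            break  # predecessor sets are stale; start a new pass
    return states


# Two chains spelling "cat" and "car" share the prefix "ca".
states = {
    0: State("c", start=True, succ={1}),
    1: State("a", succ={2}),
    2: State("t", report=True),
    3: State("c", start=True, succ={4}),
    4: State("a", succ={5}),
    5: State("r", report=True),
}
merged = merge_prefixes(states)
print(len(states), "->", len(merged))
```

In this toy input the two `c` states merge first, which makes the two `a` states share a predecessor so that they merge on the next pass. A missed merge early in the chain therefore hides every merge after it.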

TODO

The benchmark suite has known mistakes, outlined in the Errata. In addition, there are features that many users have requested and that we plan to add in future versions. They are outlined below. If you would like a feature or application added to ANMLZoo, please create an issue ticket with a full description and use case of your feature.

  • Fix Erratum #3.
  • Add code for ANML emission.
  • Support MNRL file format.
  • Add more inputs for training and testing of automata optimization algorithms and automata processing engines and architecture development.
  • Add more automata within each "benchmark" label. Applications like Hamming and Levenshtein have a huge amount of play in how they are generated. They were originally generated with semi-arbitrary parameters, so variants with other dimensions could be added.
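
As a rough illustration of how a Levenshtein automaton depends on its parameters, the Python sketch below simulates the textbook NFA for a pattern and a maximum edit distance d. It is not the generator used for ANMLZoo; the function names are hypothetical, and it checks a whole word rather than a stream. Each state is a pair (i, e): i pattern characters consumed with e edits spent, so this formulation has (len(pattern) + 1) * (d + 1) states.

```python
def closure(states, n, d):
    """Add states reached by deleting pattern characters (no input consumed)."""
    out = set(states)
    stack = list(states)
    while stack:
        i, e = stack.pop()
        nxt = (i + 1, e + 1)
        if i < n and e < d and nxt not in out:
            out.add(nxt)
            stack.append(nxt)
    return out


def within_distance(pattern, word, d):
    """Return True if word is within edit distance d of pattern."""
    n = len(pattern)
    active = closure({(0, 0)}, n, d)
    for c in word:
        nxt = set()
        for i, e in active:
            if i < n and pattern[i] == c:
                nxt.add((i + 1, e))          # match
            if e < d:
                nxt.add((i, e + 1))          # insertion in word
                if i < n:
                    nxt.add((i + 1, e + 1))  # substitution
        active = closure(nxt, n, d)
        if not active:
            return False
    return any(i == n for i, _ in active)


print(within_distance("kitten", "sitting", 3))
print(within_distance("kitten", "sitting", 2))
```

Changing the pattern length or d changes the size and shape of the automaton, which is the "play" in generation parameters that the item above refers to.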

Benchmark Contributors

Jack Wadden
Vinh Dang
Deyuan Guo
Elaheh Sadredini
Ke Wang
Chunkun Bo
Nathan Brunelle
Tom Tracy II
Matt Grimm

This suite was originally compiled by Jack Wadden ([email protected]).

If you use this benchmark suite in a publication, please use the following citation:

Wadden, J., Dang, V., Brunelle, N., Tracy II, T., Guo, D., Sadredini, E., Wang, K., Bo, C., Robins, G., Stan, M., and Skadron, K. "ANMLZoo: A Benchmark Suite for Exploring Bottlenecks in Automata Processing Engines and Architectures." 2016 IEEE International Symposium on Workload Characterization (IISWC'16). IEEE, 2016.

@inproceedings{ANMLZoo,  
    title={{ANMLZoo: A Benchmark Suite for Exploring Bottlenecks in Automata Processing Engines and Architectures}},  
    author={Wadden, Jack and Dang, Vinh and Brunelle, Nathan and Tracy II, Tom and Guo, Deyuan and Sadredini, Elaheh and Wang, Ke and Bo, Chunkun and Robins, Gabriel and Stan, Mircea and Skadron, Kevin},
    booktitle={Proceedings of the IEEE International Symposium on Workload Characterization (IISWC)},  
    year={2016},
}

License

Each benchmark and automata processing engine in ANMLZoo is individually licensed. Please refer to the benchmark directories for individual license files.

Acknowledgements

This work was started at the University of Virginia and was supported at various points by the following organizations: the ARCS Foundation, the National Science Foundation (CCF-1116673, CCF-1629450, EF-1124931), Micron Technologies, and the Center for Future Architectures Research (C-FAR), one of the six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.

anmlzoo's Issues

Protomata protomata_new.anml does not compile

When attempting to compile protomata_new.anml with apcompile version 1.7-32, I receive the following error:

Error [-141]: on line 33760 - STE contains an invalid symbol set specification.

It appears that the STE defined on that line has an empty symbol set:

<state-transition-element id="__11088__" symbol-set="[]">
      <report-on-match reportcode="640"/>
</state-transition-element>
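
A quick way to locate such elements before running the compiler is to scan the file for STEs whose symbol set is empty. The Python sketch below uses only the standard library; the function name and the sample document are illustrative, and it assumes the ANML file does not declare a default XML namespace.

```python
import xml.etree.ElementTree as ET


def find_empty_symbol_sets(anml_text):
    """Return the ids of STEs whose symbol-set attribute is missing or empty."""
    root = ET.fromstring(anml_text)
    bad = []
    for ste in root.iter("state-transition-element"):
        if ste.get("symbol-set", "").strip() in ("", "[]"):
            bad.append(ste.get("id"))
    return bad


# Illustrative input modelled on the element quoted in this issue.
sample = """<anml version="1.0">
  <automata-network id="demo">
    <state-transition-element id="ok" symbol-set="[a]"/>
    <state-transition-element id="__11088__" symbol-set="[]">
      <report-on-match reportcode="640"/>
    </state-transition-element>
  </automata-network>
</anml>"""

empty = find_empty_symbol_sets(sample)
print(empty)
```

For a file on disk, the same loop can be run over `ET.parse(path).getroot()`.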

SPM computation source

I'm looking for code similar to the one in RF, but for generating the rule mining sequences as an NFA (.anml). ANMLZoo includes only the bible example in SPM, while the published paper covers more than one mining example.

Levenshtein ANML not reporting properly

The standard Levenshtein ANML file in ANMLZoo (24_20x3.1chip.anml) only reports on the final character of a string, even though it indicates in the name that it should have an edit distance of d=3.

[Figure: 24_20x3 1chip_anml] 24_20x3.1chip.anml

The leven program properly reports on this string:
[Figure: leven 20x3_anml] leven_20x3.anml

Notes:
If I make a Levenshtein ANML with the Micron code, it reports edits from BEFORE the end (but does not report inserts):
[Figure: micron_test] Micron_test.anml

The leven program reports inserts both before and after the end of the string:
[Figure: leven_micron_test] leven_Micron_test.anml

The leven program also reports properly for insertions:
[Figure: leven_micron_test_insertions] leven_Micron_test.anml

Brill brill_opt.anml does not compile

Compiling the Brill benchmark brill_opt.anml fails with apcompile 1.7-32:

Error [-204]: on line 1 - ANML syntax error

Adding additional attributes to the <anml> tag appears to fix this:

 <anml version="1.0"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
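
The workaround above can be applied mechanically. The following Python sketch rewrites the first opening `<anml ...>` tag to the one quoted in this issue. The helper name is hypothetical, and the sketch assumes it is acceptable to discard whatever attributes the original tag carried, since the issue does not show the original tag.

```python
import re

# Opening tag quoted in the issue as the one that compiles.
FIXED_TAG = '<anml version="1.0"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'


def patch_anml_tag(text):
    """Replace the first opening <anml ...> tag with FIXED_TAG."""
    # A callable replacement avoids any escape processing of FIXED_TAG.
    return re.sub(r"<anml\b[^>]*>", lambda m: FIXED_TAG, text, count=1)


original = "<anml>\n<automata-network id=\"brill\"/>\n</anml>\n"
patched = patch_anml_tag(original)
print(patched.splitlines()[0])
```

Only the first match is replaced, and the closing `</anml>` tag is left alone because the pattern requires `<anml` without a slash.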

CPU performance in ANMLZoo paper

Hello Jack,

In the ANMLZoo paper ("ANMLZoo: A Benchmark suite .. "), the CPU performance in Fig 8 appears almost the same across all the benchmarks, even though each benchmark has a different number of states and a different level of complexity.

Can you explain why this occurred? How did you produce it via VASim?

Thank you,
Rasha
