stanfordaha / aha
Home Page: https://stanfordaha.github.io/aha-wiki-page/
Following the guidance on the website, I installed the aha Docker image and ran "garnet", "halide", and "map" successfully, but when I try to run "aha test apps/pointwise" I get some errors:
Running command: verilator -Wall -Wno-INCABSPATH -Wno-DECLFILENAME -Wno-fatal --cc Interconnect.v --exe Interconnect_driver.cpp --top-module Interconnect
%Warning-PINMISSING: Interconnect.v:2112: Cell has missing pin: ren_out
%Warning-PINMISSING: Use "/* verilator lint_off PINMISSING */" and lint_on around source to disable this message.
%Warning-PINMISSING: Interconnect.v:2112: Cell has missing pin: wen_out
%Warning-PINMISSING: Interconnect.v:10384: Cell has missing pin: parallel_out
%Error: Specified --top-module 'Interconnect' isn't at the top level, it's under another cell 'Garnet'
%Error: Exiting due to 1 error(s)
%Error: Command Failed /usr/bin/verilator_bin -Wall -Wno-INCABSPATH -Wno-DECLFILENAME -Wno-fatal --cc Interconnect.v --exe Interconnect_driver.cpp --top-module Interconnect
Found 1 error(s):
I don't have Cadence Incisive, so I installed Verilator instead.
The full sequence of commands was:
aha garnet --width 16 --height 4 --verilog --interconnect-only --no-pd
aha halide apps/pointwise
aha map apps/pointwise --width 16 --height 4 --interconnect-only --no-pd
aha test apps/pointwise
Thanks a lot! Happy new year.
Hello. Using the currently provided Docker image, I configured the CGRA, compiled the example Halide code, and performed RTL simulation.
Where can I find the source code for the provided aha binary? The code provided in the 'cgra_pnr' directory and the PnR results of the binary tend to differ, and I am wondering how I can compile the binary anew.
Our CI 'aha-flow' has stopped working because of a Python-package version requirement in the Docker build. In particular, aha/Dockerfile (line 95) pins the "packaging" package to a specific version:
pip install packaging==21.3
This worked fine for a long time, but it now causes an error later in the build (https://buildkite.com/stanford-aha/aha-flow/builds/10718):
#56 12.73 ERROR: requirements-parser 0.11.0 has requirement packaging>=23.2, but you'll have packaging 21.3 which is incompatible.
I will be submitting a change removing the packaging==21.3 version requirement. My question for the group:
I was using garnet to test the demosaic_complex app, which has three parallel output ports (RGB). It could not identify the pixel size from the gold.raw file. Keyi suggested we take the .pgm file from Halide instead, since it carries the pixel-width information.
For an example of a failing build: https://buildkite.com/stanford-aha/aha-flow/builds/940
Dependabot encountered the following error when parsing your .dependabot/config.yml:
Automerging is not enabled for this account. You can enable it from the [account settings](https://app.dependabot.com/accounts/hofstee/settings) screen in your Dependabot dashboard.
Please update the config file to conform with Dependabot's specification using our docs and online validator.
Currently, it seems that even if the builds succeed, the PR is reported as a failure, e.g. cdonovick/peak#137.
I've been trying to go through the basic flow in the readthedocs page just to get started. Everything works well until I run:
aha test apps/pointwise
At which point I get a long error printout from the simulator which ends with:
file: flop_unq1.sv
module worklib.flop_unq1:sv
errors: 0, warnings: 0
Elaborating the design hierarchy:
Caching library 'worklib' ....... Done
.config_config_addr(config_config_addr),
|
ncelab: *E,CUVPOM (./Interconnect_tb.sv,88|26): Port name 'config_config_addr' is invalid or has multiple connections.
.config_config_data(config_config_data),
|
ncelab: *E,CUVPOM (./Interconnect_tb.sv,89|26): Port name 'config_config_data' is invalid or has multiple connections.
.config_read(config_read),
|
ncelab: *E,CUVPOM (./Interconnect_tb.sv,90|19): Port name 'config_read' is invalid or has multiple connections.
.config_write(config_write),
|
ncelab: *E,CUVPOM (./Interconnect_tb.sv,91|20): Port name 'config_write' is invalid or has multiple connections.
.stall(stall),
|
ncelab: *E,CUVPOM (./Interconnect_tb.sv,94|13): Port name 'stall' is invalid or has multiple connections.
irun: *E,ELBERR: Error during elaboration (status 1), exiting.
</STDOUT>
Found 1 error(s):
1) Got return code 1.
Traceback (most recent call last):
File "tbg.py", line 390, in <module>
test.test()
File "tbg.py", line 321, in test
directory=tempdir)
File "/aha/fault/fault/tester/staged_tester.py", line 276, in compile_and_run
self._compile_and_run(target=target, **kwargs)
File "/aha/fault/fault/tester/staged_tester.py", line 261, in _compile_and_run
self.run(target)
File "/aha/fault/fault/tester/staged_tester.py", line 251, in run
target_obj.run(self.actions)
File "/aha/fault/fault/system_verilog_target.py", line 715, in run
err_str=sim_err_str, disp_type=self.disp_type)
File "/aha/fault/fault/subprocess_run.py", line 163, in subprocess_run
raise AssertionError
AssertionError
Traceback (most recent call last):
File "/aha/bin/aha", line 11, in <module>
load_entry_point('aha', 'console_scripts', 'aha')()
File "/aha/aha/aha.py", line 51, in main
args.dispatch(args, extra_args)
File "/aha/aha/util/test.py", line 27, in dispatch
cwd=args.aha_dir / "garnet",
File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/aha/bin/python', 'tbg.py', 'garnet.v', 'garnet_stub.v', PosixPath('/aha/Halide-to-Hardware/apps/hardware_benchmarks/apps/pointwise/bin/pointwise.bs.json')]' returned non-zero exit status 1.
Am I doing something wrong here? Do either @hofstee or @Kuree have any idea what might be going wrong?
Hey Steve, the retry strategy implemented in garnet.py and regress.py (line 49 in 251bd0e) catches subprocess.CalledProcessError, but since there is no else branch on the if 'SIGSEGV' in str(e): check, any other error is silently swallowed and the regression just continues on.
We probably want an else: raise Exception("Exception in mapping/garnet") or something like that.
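As a sketch (the function name and retry count here are my assumptions, not the actual garnet.py/regress.py code), the point is simply to re-raise anything that isn't a SIGSEGV instead of falling through:

```python
# Sketch of the suggested fix: only SIGSEGV failures are retried;
# any other CalledProcessError is re-raised instead of being swallowed.
import subprocess

def run_with_retry(cmd, max_retries=3):
    for _ in range(max_retries):
        try:
            subprocess.check_call(cmd)
            return
        except subprocess.CalledProcessError as e:
            if 'SIGSEGV' in str(e):
                continue  # segfaults are treated as flaky; retry
            else:
                # the missing else branch: don't let other errors pass silently
                raise Exception("Exception in mapping/garnet") from e
    raise Exception("SIGSEGV persisted after retries")
```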
Hey @hofstee I'm trying to run the quick start described in the docs on kiwi, but I get the following error:
dhuff@kiwi:~/aha$ pip install -e .
Defaulting to user installation because normal site-packages is not writeable
Obtaining file:///home/dhuff/aha
ERROR: Package 'aha' requires a different Python: 3.6.9 not in '>=3.7'
Any idea what I'm doing wrong?
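In case it helps anyone hitting the same thing, a quick way to confirm which interpreter pip will use (the >= 3.7 floor is taken from the error above):

```python
# Sanity check: the aha package declares python_requires >= 3.7, so the
# interpreter that runs pip must be at least that new.
import sys

if sys.version_info < (3, 7):
    raise SystemExit(f"Python {sys.version.split()[0]} is too old; need >= 3.7")
print("Python version OK:", sys.version.split()[0])
```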
It seems like it gets to the point where a testbench is compiled in Verilator and then begins running, but after a while the process running Verilator drops to 0% CPU and appears to hang. Not sure if this is because a testbench failed to terminate or something else.
Previously, using the aha Docker image on a generic Linux server with Verilator installed, I was able to run garnet generation, Halide compilation, map/PnR, and the testbench sequentially on a subset of the applications in the Halide-to-Hardware repo.
The problem I now have is that in the last few months a lot of the feature work related to clockwork has moved forward, and I am no longer able to coax the commands (garnet->halide->map->test) into running in the Docker container. I suspect that the accumulated changes have fully deprecated the ability to run anything outside the specific dev environment the team uses, which includes Xcelium and Cadence IP.
Here's a series of commands that used to run for harris/resnet_layer_gen/others, but no longer does:
# within aha docker container
apt update && apt install verilator
...
python garnet.py --width 32 --height 16 --verilog --interconnect-only --no-pd
aha halide apps/harris
aha map apps/harris --width 32 --height 16
aha test apps/harris
Previously I had applied single-line edits in garnet.py and tbg.py inspired by
Those no longer seem to help get past the errors.
Is the glb / global buffer related to, or required for, running these types of tests? #1277
Executive summary: I'm not sure we are properly testing dense resnet layers.
Details:
I'm not sure what to do with this information (yet). And I have not done the work to thoroughly, under controlled conditions, verify what's really happening (yet). But here's what it looks like:
I tried using the "aha regress" command to run 'conv1' on its own, as a dense test, and it failed.
But! If I include a non-layer test e.g. gaussian in the test suite, e.g. "gaussian" followed by "conv1", then both tests pass.
I run the tests with the garnet daemon turned ON, which means that in the case where both tests pass, the "conv1" test (re)uses the Verilog that was built for the "gaussian" test. I think this is relevant, but I'm not sure what it means, since both are dense tests (I think?) and both should be using the same Verilog anyway (right?)
The next thing I would probably try: What happens if we run "gaussian" + "conv1" with daemon OFF? I'm guessing that conv1 fails.
Until I have better information, I guess I will file this as both a "garnet" issue and an "aha" issue...
I will include @kalhankoul96 as an assignee, because I think he'll be interested, and because he can remove himself and/or add more assignees if there's anyone else that might obviously want to know about this...
Garnet issue is here: StanfordAHA/garnet#1070
Dependabot was set up to create pull requests against the branch coreir-dev, but couldn't find it.
If the branch has been permanently deleted, you can update Dependabot's target branch in the .dependabot/config.yml file in this repo.
Hello, I have a question regarding Amber CGRA mapping.
Using the provided Gaussian application example, we obtained the final PnR result using the aha map and pnr commands. Upon inspection, the placement of the PE tiles used was relatively spread out. Can you tell me why?
I understand that, to solve the placement problem, the thunder library used in the AHA PnR process computes HPWL (sum of squared distances from the center of the net to each block/cluster), a spreading potential, legalization energy, etc. Given that, shouldn't the PE tiles in the resulting placement end up close to each other?
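To illustrate why I expected clustering, here is a toy wirelength-style cost (my own sketch, not thunder's actual objective or code): a placer minimizing it is rewarded for moving connected blocks closer together.

```python
# Toy half-perimeter wirelength (HPWL) cost: the half-perimeter of each
# net's bounding box, summed over all nets. Tighter packing scores lower.
def hpwl(net):
    """net: list of (x, y) tile coordinates connected by one net."""
    xs = [x for x, _ in net]
    ys = [y for _, y in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_cost(nets):
    return sum(hpwl(net) for net in nets)

# A spread-out placement costs more than a packed one:
spread = [[(0, 0), (15, 3)], [(15, 3), (2, 2)]]
packed = [[(0, 0), (1, 0)], [(1, 0), (1, 1)]]
```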
The picture below shows the results of Gaussian application PnR, following the order described in the AHA wiki (https://github.com/StanfordAHA/aha/wiki/CGRA-Design-Flow). The layout seems different from what I expected, so I would appreciate it if you could let me know whether I have misunderstood anything.
Also, the provided applications do not perform global placement, because "DISABLE_GP" is set to 1 everywhere in the aha/util/application_parameters.json file. Is there a reason for this? I tried performing PnR with the DISABLE_GP setting removed, but an error occurred during the PnR process and it stopped midway. Is "DISABLE_GP" set to 1 everywhere because of this phenomenon?
Thank you for reading the question.
Your dependency file specified a branch or reference for lake, but Dependabot couldn't find it at the project's source. Has it been removed?
Whenever scripts and CI definitions update on master, the changes need to be propagated to the other branches for the Buildkite CI to work. Additionally, we usually want the branches running the most up-to-date version of CI they can.
One way to fix this might be to create a scripts branch where updating the scripts is done, and then merge that into both master and the other branches.
Problem: AHA regressions show log information out of order. I.e., in regress.py we see the following code:
print(f"--- Running regression: {args.config}")
...
generate_sparse_bitstreams(sparse_tests, width, height)
However, in the log output, the sparse_bitstream info comes out before we see the "Running regression" tag:
% docker exec $container /bin/bash -c "aha regress pr" >& regress.log
% egrep '(BASE|Running)' regress.log
TEST BASE NAME: matmul_ijk
TEST BASE NAME: mat_mattransmul
...
--- Running regression: pr
An example of this behavior can be seen in the "Aha regressions" step of the Buildkite log at https://buildkite.com/stanford-aha/garnet/builds/4660
Why: the sparse_bitstream log info comes out on stderr, whereas the print("Running regression") goes to stdout. Until the stdout buffer fills to some minimum capacity (e.g. 1024 characters), it does not get written out to the terminal. Meanwhile, stderr continues to spew.
Solution: periodically flush stdout so that output appears in a more timely manner.
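Something along these lines (illustrative sketch only, not the actual regress.py patch):

```python
# Flush stdout explicitly so that our banner lines interleave correctly
# with the subprocess's unbuffered stderr, instead of sitting in the
# block buffer until it fills up.
def log(msg):
    print(msg, flush=True)  # flush=True forces the line out immediately

log("--- Running regression: pr")
# ... then launch the noisy subprocess; its stderr can no longer
# appear in the log before the banner above.
```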
I have a fix that I will submit once the tapeout panic is over.
The official Docker image garnet:latest currently takes up about 7.5G of disk space.
(kiwi)% docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
stanfordaha/garnet latest 7579cf0747ef 4 days ago 7.41GB
Is there some easy way to reduce the image size? I have an idea that might help.
The .git directories associated with some of the aha submodule repos are enormous, due mainly to poor git-repo hygiene on the part of the repo maintainers (yes, that includes me). Below we see that the .git metadata for four submodules takes up almost a gigabyte of disk space:
(aha) root@daa174656670:/aha# du -shx /aha/.git/modules/* | sort -hr
418M ../.git/modules/clockwork
340M ../.git/modules/Halide-to-Hardware
104M ../.git/modules/sam
63M ../.git/modules/garnet
The metadata is large generally because someone checked a big binary file into the repo in question; e.g., the clockwork repo metadata includes a 90MB file libntl.a. The file does not exist in the repo itself, at least not in the master branch, but because it was pushed and subsequently deleted, it lives on as master-branch metadata. In addition, clockwork has over a hundred "active" branches, many of them very old and unused, and many of which still contain the undeleted file. Extracting this object, and many similar ones, from the repos would be extremely time-consuming.
But what if there was an easier way?
I have written a script that can recover a missing .git directory, such that one might (1) delete the .git metadata before writing the final image, and (2) use the script from .bashrc to re-install .git whenever a user fires up a new container.
I am preparing a new aha branch to implement and test out this idea, initially targeting the largest offender (clockwork); watch for an upcoming related pull request on this topic.
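Concretely, the idea might look something like this (the restore-script name and paths here are hypothetical placeholders, not the actual implementation):

```shell
# Hypothetical sketch only: script name and paths are assumptions.
# In the Dockerfile, before committing the final image:
#   RUN rm -rf /aha/.git/modules/clockwork
# In the image's .bashrc, lazily re-install the metadata on first use;
# the outer guard makes this a no-op outside the aha container:
if [ -d /aha/.git/modules ] && [ ! -d /aha/.git/modules/clockwork ]; then
    /aha/bin/restore-git-metadata clockwork
fi
```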
Meanwhile, your comments and suggestions are more than welcome!
Dependabot encountered the following error when parsing your .dependabot/config.yml:
The property '#/' contains additional properties ["automerged_updates"] outside of the schema when none are allowed
Please update the config file to conform with Dependabot's specification using our docs and online validator.