spinnakermanchester / spinnaker_pdp2
Cognitive systems modelling on SpiNNaker
License: GNU General Public License v3.0
With the latest changes, the number of missing ticks reported by 'output_mon_lens' has increased significantly.
This appears to be a problem of SDP messages being dropped, not a problem with actual computation on SpiNNaker.
A delay was inserted before sending data to the host to try to reduce the problem [process_t.c/line 308]. This seems to be working but should only be a temporary measure. We may need to stop using output_mon_lens and use a different approach to report data.
Reported by @lplana
Replicated with: https://github.com/SpiNNakerManchester/IntegrationTests/tree/less_java
With Java enabled, it works as normal.
Currently, only the SDRAM region requirements are included. This may result in a run-time error if cores run out of SDRAM.
The following features of Lens-style training sets are currently not supported:
Some of these features are required for the example networks provided, such as the simple visual-semantic-phonological network visSemPhon.
Network congestion is present in large networks, and congestion leads to packet dropping.
Occasionally, dropped packets cannot be picked up by the reinjector and are permanently lost, leading to deadlock, as every packet is required to complete the computation.
The current implementation does not have a way of dealing with permanently lost packets.
This issue has also been raised in SpiNNakerManchester/SpiNNFrontEndCommon
Compiling the SpiNNaker_PDP2 C code with armcc produces the following error:
armcc -c --c99 --cpu=5te --apcs interwork --min_array_alignment=4 -I /home/plana/scratch/gfe_tests/pdp2/spinnaker_tools/include -Ofast -Wall -Wextra -DPRODUCTION_CODE -Otime -DAPPLICATION_NAME_HASH=0xa43c1b0a -g -o build/input.o input.c
Fatal error: C3900U: Unrecognized option '-all'.
Warning: C3910W: Old syntax, please use '-E'.
Fatal error: C3900U: Unrecognized option '-xtra'.
The errors seem to be caused by the gcc-style options '-Wall' and '-Wextra', which armcc appears to parse as '-W' followed by the unrecognized options '-all' and '-xtra'. Additionally, using both '-Ofast' and '-Otime' does not seem right.
The SpiNNaker_PDP2 Makefile does not set any compilation flags or options. It #includes 'Makefile.SpiNNFrontEndCommon'.
Compilation completes correctly with arm-none-eabi-gcc.
Currently, a w core is created in the machine graph for every possible (group, group) pair, even if a link between the pair does not exist. This results in an all-zero weight matrix, which is wasteful. Unfortunately, these cores cannot simply be removed, because they contribute to system synchronisation.
Currently, each group is transformed into a single [w, s, i, t] core pipeline, irrespective of the group size (in terms of units). This will not scale to any arbitrary size.
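One way to scale beyond a single pipeline per group would be to partition each group's units into subgroups, each mapped to its own [w, s, i, t] core pipeline. A minimal sketch of such a partition (the helper name and per-core limit are illustrative, not the actual PDP2 mapping code):

```python
# Sketch: partition a group's units into subgroups that each fit on
# one [w, s, i, t] core pipeline. Names and the per-core limit are
# illustrative, not part of the current PDP2 implementation.
def split_group(num_units, max_units_per_core):
    """Return (start, end) unit ranges, one per core pipeline."""
    return [(start, min(start + max_units_per_core, num_units))
            for start in range(0, num_units, max_units_per_core)]
```

Each (start, end) range would then be handed to its own pipeline at graph-construction time.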
It is possible to test implementation correctness by comparing the output files generated by the examples in the repository with reference output files.
The reference files are attached here. Please note that the extension has been changed from '.out' to '.txt' due to repository file type restrictions.
example rand10x40:
REF_rand10x40_test_20e.txt
REF_rand10x40_train.txt
REF_rand10x40_train_test_20e.txt
example rogers-basic:
REF_rogers-all-links.txt
example simple_past_tense:
REF_simple_past_tense_train_test.txt
Currently, a global timeout is used, which assumes that there is an upper bound on program execution time. This is not adequate.
A better alternative is to time out on lack of progress. Given that all cores need to send and receive packets "continuously", an upper bound can be set for lack of progress. This has to be done carefully as both deadlock and livelock must be catered for.
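The idea can be sketched as a watchdog that is reset on progress rather than armed with a global deadline. This is a host-side Python model only; the class name and API are hypothetical, and "progress" must be defined carefully (e.g. completed simulation ticks rather than raw packet counts) so that livelock, where packets keep flowing without progress, also trips the timeout:

```python
import threading
import time

class ProgressWatchdog:
    """Sketch: time out on lack of progress instead of total run time.

    Hypothetical helper, not PDP2 API. progress() should be called on
    each unit of real progress (e.g. a completed tick); a stall longer
    than stall_limit then indicates deadlock or livelock.
    """
    def __init__(self, stall_limit):
        self.stall_limit = stall_limit          # seconds without progress
        self._last = time.monotonic()
        self._lock = threading.Lock()

    def progress(self):
        """Record that real progress has been made."""
        with self._lock:
            self._last = time.monotonic()

    def stalled(self):
        """True if no progress has been reported within stall_limit."""
        with self._lock:
            return time.monotonic() - self._last > self.stall_limit
```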
After the update to align PDP2 with SpiNNTools version 6, PacketGatherer-related warnings appear when running the examples. Some of the warnings are included below.
Selected warnings:
2021-05-05 20:36:35 WARNING: The transmission buffer for SYSTEM:PacketGatherer(0,0) on 0,0,2 was blocked on 402924800 occasions. This is often a sign that the system is experiencing back pressure from the communication fabric. Please either: 1. spread the load over more cores, 2. reduce your peak transmission load, or 3. adjust your mapping algorithm.
2021-05-05 20:36:35 WARNING: The callback queue for SYSTEM:PacketGatherer(0,0) on 0,0,2 overloaded on 2560 occasions. This is often a sign that the system is running too quickly for the number of neurons per core. Please increase the machine time step or time_scale_factor or decrease the number of neurons per core.
2021-05-05 20:36:35 WARNING: The DMA queue for SYSTEM:PacketGatherer(0,0) on 0,0,2 overloaded on 278530 occasions. This is often a sign that the system is running too quickly for the number of neurons per core. Please increase the machine time step or time_scale_factor or decrease the number of neurons per core.
2021-05-05 20:36:35 WARNING: A Timer tick callback in SYSTEM:PacketGatherer(0,0) on 0,0,2 was still executing when the next timer tick callback was fired off 71565312 times. This is a sign of the system being overloaded and therefore the results are likely incorrect. Please increase the machine time step or time_scale_factor or decrease the number of neurons per core
2021-05-05 20:36:35 WARNING: The timer for SYSTEM:PacketGatherer(0,0) on 0,0,2 fell behind by up to 402655296 ticks. This is a sign of the system being overloaded and therefore the results are likely incorrect. Please increase the machine time step or time_scale_factor or decrease the number of neurons per core
We have had to set the integration test on simple_past_tense.py to convert SpinnmanTimeoutException or SpiNNManCoresNotInStateException to SkipTest, as this happens much too often.
Currently, the only mechanism supported is to read a Lens weights file.
Currently, routing keys are assigned by the GFE using the SpiNNTools default key assignment algorithm. This works correctly but a targeted algorithm could result in a more efficient use of the key space.
Currently, each vertex requests a part of the key space in which to indicate the unit being processed and, additionally, encode functionality. The added features normally include: packet type and colour, execution phase and group/subgroup data. These could be encoded efficiently in the routing key, saving key space and also saving decoding effort in the receiving core.
The assignment must be done carefully so that packets are sent only to where they are needed, as is done currently. This requires correct key/mask combinations.
Two possible approaches were suggested by the SpiNNaker software team, each with pros and cons:
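Whichever approach is chosen, a targeted encoding could pack the extra features at fixed bit positions of the key, so receiving cores decode them with a mask instead of arithmetic. The field widths below are purely illustrative, not the actual PDP2/SpiNNTools assignment:

```python
# Sketch of a targeted key layout: group id plus per-packet fields
# packed at fixed bit positions. Widths are illustrative only.
PHASE_BITS, COLOUR_BITS, TYPE_BITS, UNIT_BITS = 1, 1, 2, 12

def make_key(group, phase, colour, pkt_type, unit):
    """Pack group id and per-packet fields into one multicast key."""
    key = group
    key = (key << PHASE_BITS) | phase
    key = (key << COLOUR_BITS) | colour
    key = (key << TYPE_BITS) | pkt_type
    key = (key << UNIT_BITS) | unit
    return key

def unit_of(key):
    """Receiving cores recover the unit with a single mask."""
    return key & ((1 << UNIT_BITS) - 1)
```

With fields at fixed positions, the same layout yields the key/mask combinations needed to route packets only where they are needed.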
The binaries are currently stored in a directory that is not inside (i.e., not a child directory of) the main spinn_pdp2 directory.
This results in the code only working if installed in developer/editable mode.
In developer/editable mode, the spinn_pdp2 directory is only referenced, not copied into site-packages, therefore the code
# path to binary files
binaries_path = os.path.join(os.path.dirname(__file__), "..", "binaries")
works.
In a normal install, the spinn_pdp2 directory is copied into site-packages; however, the "binaries" directory is not.
It could be, but with a generic name like "binaries" this is not recommended.
Also, the build would fail if you need sudo access to site-packages.
The PyPA recommends that any data files you wish to be accessible at run time be included inside the package.
ref: https://setuptools.pypa.io/en/latest/userguide/datafiles.html
The fix is to move binaries under spinn_pdp2 and change the code that references it.
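With the binaries moved to spinn_pdp2/binaries, the path is resolved inside the package itself and so survives both editable and normal installs. A minimal sketch (the helper name is illustrative):

```python
import os

# Sketch of the proposed fix: once the binaries live under
# spinn_pdp2/binaries, the path is package-relative and works for
# editable and normal installs alike. Helper name is illustrative.
def binaries_path(module_file):
    """module_file is the __file__ of a module inside spinn_pdp2."""
    return os.path.join(os.path.dirname(os.path.abspath(module_file)),
                        "binaries")
```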
From pull request #29, output_mon_lens is no longer used. The generation of Lens-style output files is handled by function write_Lens_output_file() in mlp_networks.py.
Group types such as BIAS (bias clamp) or INPUT (hard clamp) do not require a [w, s, i, t] core pipeline. They can be optimised.
The arrival of the following packets is not verified before moving to the next processing tick:
Arrival of these packets is difficult to verify due to their multicast/broadcast nature.
These packets are transmitted during periods of quiet network traffic, so they are unlikely to be dropped and, if dropped, are very likely to be picked up and successfully reinjected.
Lens supports the use of different example sets for training and testing. Also, multiple sets can be used in each stage.
Often, testing is done on a different example set or sets (possibly a subset of the original, or sometimes something completely new, if generalisation performance is being assessed).
Lens has an option for loading all the example sets at the beginning and then switching between them, which is generally more efficient.
The MAX_CRIT group criterion is described in the Lens Manual.
This group criterion, implemented in function max_stop_crit() [file process_t.c] needs re-writing as the current implementation only works correctly if only one unit has the largest target, which is not usually the case.
Additionally, as this criterion is based on group-wide values rather than on individual unit ones, the function needs to make a correct distributed decision when the output group is split across multiple subgroups.
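A tie-aware version of the check can be sketched as follows. This is a Python model of one reasonable interpretation only, namely that the criterion is met when the unit(s) with the largest output are among the units sharing the largest target; the real implementation is max_stop_crit() in process_t.c and must additionally make a distributed decision across subgroups:

```python
# Sketch of a tie-aware MAX_CRIT check (Python model, illustrative
# interpretation: the criterion is met when every unit with the largest
# output is among the units sharing the largest target).
def max_stop_crit(outputs, targets):
    max_t = max(targets)
    max_o = max(outputs)
    winners = {i for i, t in enumerate(targets) if t == max_t}
    top_out = {i for i, o in enumerate(outputs) if o == max_o}
    return top_out <= winners  # all top outputs sit on max-target units
```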
@joannavioletmoy reports result differences with respect to Lens when training network rand10x40 for 300 epochs, testing after every 10 epochs.
Although differences are expected due to fixed-point (PDP2) vs double (Lens) numeric representation, further verification is needed because implementation issues could also be the cause.
Weight fixed-point representation was changed to s16.5. With larger weights, partial nets (s4.27 representation) can get outside the [-16.0, 16.0) range. They may need a longer type and saturation.
This may also be the case for the backpropagation of error deltas.
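The required saturation can be modelled as clamping the raw accumulator instead of letting it wrap. This is a Python sketch of the arithmetic only; the widths come from the s4.27 format mentioned above (4 integer + 27 fraction bits plus sign, raw range [-2^31, 2^31 - 1], i.e. approximately [-16.0, 16.0)):

```python
# Python model of saturating addition for the s4.27 partial-net format.
INT_BITS, FRAC_BITS = 4, 27
RAW_MAX = (1 << (INT_BITS + FRAC_BITS)) - 1   # just under +16.0
RAW_MIN = -(1 << (INT_BITS + FRAC_BITS))      # exactly -16.0

def sat_add(a, b):
    """Add two raw s4.27 values, clamping instead of wrapping."""
    return max(RAW_MIN, min(RAW_MAX, a + b))
```

On ARM, the equivalent in C would use a wider intermediate type (or the DSP saturating instructions) before clamping back to 32 bits.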