seer-lab / arc
A tool to automatically repair concurrency bugs in Java.
Home Page: http://www.sqrlab.ca/software/arc/
License: Mozilla Public License 2.0
Currently we have to manually adjust ConTest's KingProperties file for every new program. Basically, we need to make sure ConTest will apply to the current system under test. To do this we modify the KingProperties file to specify the targetClasses and sourceDirs.
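For reference, the two entries we edit look something like the following (the property names come from the text above; the paths are illustrative placeholders, not real values):

```properties
# Hypothetical KingProperties entries -- paths are placeholders
targetClasses = ./workarea/classes
sourceDirs = ./workarea/src
```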
Due to issues with Pyevolve, we will create and use our own evolutionary algorithm in ARC.
Most of the algorithm components are already present; really only the main loop remains to be written.
Certain individuals might stray down a path that only yields poor success rates. One option is to prune/restart the lowest x scoring individuals (or the lowest x% of the population).
This might be too strict; a better variation might be to perform the pruning only if the individual has remained in the lower bound for n generations.
The non-functional fitness score measures the performance of the evolved software. That said, we are looking at measuring the time and resources used during the execution of the system under test.
We can acquire a large set of statistics using the Linux /usr/bin/time -v command. Unfortunately this breaks the cross-platform nature to some extent (though we have not tried ARC on Windows or Mac yet).
The output of this command looks like the following:
Command being timed: "python2 arc.py"
User time (seconds): 299.15
System time (seconds): 20.79
Percent of CPU this job got: 111%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:46.96
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 89368
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1433714
Voluntary context switches: 20243
Involuntary context switches: 16198
Swaps: 0
File system inputs: 8
File system outputs: 3072
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
From this we can extract the maximum memory used from the Maximum resident set size field. I am not sure memory is of concern though. We can also compute the real time the process spent on the CPU using User time + System time. We also have the wall time, though other processes could affect this, and I believe the real time would be a better measure.
We can then calculate the score like so (we can also take into account avg/min/max values if we want):
non-functional score = (max_memory * m) + (real_time * n) + (wall_time * p)
Again not too sure if we need wall-time or max memory.
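A minimal sketch of how the score above could be computed from the fields of the sample report; the function name and the default weights (only real time counted) are assumptions for illustration, not ARC's actual code:

```python
def non_functional_score(max_memory_kb, real_time_s, wall_time_s,
                         m=0.0, n=1.0, p=0.0):
    """Weighted non-functional score (lower is better).  The weights
    m, n, p are illustrative placeholders; with the defaults only
    real time (User + System) contributes."""
    return (max_memory_kb * m) + (real_time_s * n) + (wall_time_s * p)

# Using the sample /usr/bin/time -v report above:
#   real time = 299.15 + 20.79 s, max RSS = 89368 kB, wall = 4:46.96
score = non_functional_score(89368, 299.15 + 20.79, 4 * 60 + 46.96)
```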
The evolutionary algorithm works on optimizing the target software's functional or non-functional fitness score. The general process is to first optimize the functional fitness to ensure the correctness of the software and then to optimize the non-functional fitness score. The second phase attempts to optimize the software (in the most correct state found from the previous phase) for best performance without sacrificing any functionality.
Work proceeds in the functional phase until functional termination occurs (no change in the score for a while, or a minimum fitness threshold is met). When termination occurs, a second, more focused test is applied to ensure the individual indeed works correctly. The phase then switches to non-functional; at this switch the population is transformed into the best functional individual found. Work then proceeds in the non-functional phase until non-functional termination occurs (no change in the score for a while). Note that any individual that reduces the functional score is discarded.
When working off of the correct software found from the first phase we can assume that no errors are present. This assumption will allow us to reduce the timeout parameter to an average of x runs of the correct software (plus some cushion time).
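The reduced timeout could be computed like this (the helper name and the 25% cushion are illustrative assumptions):

```python
def reduced_timeout(runtimes, cushion=0.25):
    """Timeout for the non-functional phase: the average of x runs of
    the known-correct variant, plus a cushion (25% is an illustrative
    value, not ARC's)."""
    average = sum(runtimes) / len(runtimes)
    return average * (1.0 + cushion)

reduced_timeout([4.0, 5.0, 6.0])  # -> 6.25
```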
The functional fitness is currently scored on the success rate alone. The problem with this is that it is hard to distinguish two individuals that have the same success rate.
The proposed solution is to consider the timeouts as well. A timeout could have resulted in a successful execution if the time limit had been extended. Thus timeouts should affect the fitness (worse than a success, but better than a deadlock/data race).
Therefore the functional fitness score could be:
score = (# of successes * n) + (# of timeouts * m)
n in this case represents the factor of weighting for successful executions, while m is the weighting for timeouts.
It might also be worth considering the density of dataraces in an individual by using the number of errors in the testsuite.
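A sketch of the scoring formula above; the weights n and m are illustrative placeholders chosen so that n > m > 0, keeping the ordering success > timeout > deadlock/data race (failures contribute nothing):

```python
def functional_score(successes, timeouts, n=2.0, m=1.0):
    """Weighted functional fitness.  The weights are illustrative,
    not ARC's actual values."""
    return (successes * n) + (timeouts * m)

# Two individuals with the same success count are now distinguishable:
functional_score(8, 2)  # -> 18.0
functional_score(8, 0)  # -> 16.0
```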
After every mutation, an individual gets compiled before the ConTest evaluation phase can occur. In a situation where the mutation causes the individual to fail to compile, there should be some handling to choose another mutation until either mutations are exhausted (thereby restarting the individual) or a valid mutation compiles.
It might be more appropriate to move the compiling phase directly into the mutation method (to keep them tightly coupled).
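The retry handling could look like this sketch; `operators` and `compiles` are stand-ins for ARC's mutation operators and its javac step, not real ARC names:

```python
import random

def mutate_until_compiles(individual, operators, compiles):
    """Try operators until one produces a compiling mutant; returning
    None signals the caller to restart the individual."""
    remaining = list(operators)
    while remaining:
        operator = random.choice(remaining)
        candidate = operator(individual)
        if compiles(candidate):
            return candidate        # valid, compiling mutation
        remaining.remove(operator)  # exhausted; try a different operator
    return None                     # all mutations failed to compile
```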
Adding the TXL files (which perform the mutations) to the arc repository.
Currently, if two different members of the population arrive at the same program structure after mutation, each is evaluated. Time would be saved if after we do the first evaluation, we record a hash of the mutated program and the result of the evaluation. For every new mutation, we can check against the hash list. If we find it, there is no need to test it again. Reject the mutation and try again.
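The hash-and-reject idea could be sketched like this (the function name and the use of SHA-256 are assumptions; any stable hash of the mutant's source would do):

```python
import hashlib

evaluated = {}  # SHA-256 of mutant source -> recorded evaluation result

def cached_evaluation(source_text, evaluate):
    """Return (result, was_cached).  `evaluate` stands in for the
    ConTest evaluation run; a cache hit means the mutant can be
    rejected without re-testing."""
    key = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    if key in evaluated:
        return evaluated[key], True
    result = evaluate(source_text)
    evaluated[key] = result
    return result, False
```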
Somehow we need to make TXL replace variables (randomly) based on the contents of a file.
The existing source code must be converted to Python 2.7 to work with the Pyevolve framework. Currently the source code is written in Python 3.2.
Additionally, some of the formatting must be adjusted.
The README must also reflect this change.
Create the individual representation that will be used in Pyevolve's evolutionary approach.
A 2D list of binary values (like the table presented) to act as the genome for the individual. Additionally, there will be values that each individual will track:
Mutation Operator 1: | 0 | 0 | 0 |
Mutation Operator 2: | 0 |
Mutation Operator 3: | 0 | 0 |
Mutation is a single change in the 2D list of binary values.
There is no crossover.
Finish the ConTest module that will perform the following:
Some of the mutation operators can be improved:
We need some mutation operators to address the new Java 5 concurrency structures:
We need a mutation operator that changes the lock object to that of another shared object
Currently the evolution process will only terminate if an individual meets 100% success rate or if the generation limit is reached.
It might be more applicable to include alternative terminating criteria that can also be used such as:
The mutation operators seem very general in terms of their applicability (for example, out of all the operators, total - 1 are used for data race situations and total are used for deadlock situations). The heuristic search capabilities are not going to be visible or effective in this situation.
The following describes an adaptation to the mutation operator selection. Based on prior execution results, if the selected operator ends up increasing the rate of successful executions, then it can be said that the mutation operator was effective. Thus, in future generations, individuals might benefit from more applications of said mutation operator. Therefore, all the mutation operators will have a dynamic rank based on their historical success rate for this software.
The dynamic ranking can be determined using the following formula:
rank_score += current_successes - previous_successes
The rank scores are then sorted and the highest scores are ranked first. From this ranking we then want to weight the percentages of a mutation operator being selected for application. To avoid converging on only a few operators, a floor and ceiling will be applied to the percentage modifiers. Most likely the format will be something like: best rated +10%, worst -10%, second best +8%, second worst -8%, etc. This approach still keeps all the operators in play regardless of how they are ranked.
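One simple way to realize the weighting step is a linear spread between the ceiling and floor (rather than fixed 2% steps); the function name, the ±10% cap, and the operator names in the example are illustrative assumptions:

```python
def operator_weights(rank_scores, cap=0.10):
    """Map operators, sorted by rank score, to selection-weight
    multipliers between 1 + cap (best) and 1 - cap (worst).  The
    floor/ceiling keeps every operator selectable."""
    ordered = sorted(rank_scores, key=rank_scores.get, reverse=True)
    k = len(ordered)
    weights = {}
    for i, op in enumerate(ordered):
        modifier = cap if k == 1 else cap - i * (2.0 * cap / (k - 1))
        weights[op] = 1.0 + modifier
    return weights

# e.g. three operators with rank scores 5, 3, 1:
weights = operator_weights({"ASAV": 5, "EXCR": 3, "SHSB": 1})
```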
ARC currently mutates all individuals, then evaluates all individuals, and finally validates the potential solutions for a fix. The downside of the current implementation is that if a potential fix is found in the first individual, ARC does not validate it until after all individuals have been evaluated.
The proposed optimization is to evaluate and then validate each individual as a whole. This way, if a fix is found in an early individual, ARC does not have to wait until all individuals are evaluated.
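The restructured generation loop could look like this sketch; all four callable names are illustrative stand-ins for ARC's phases:

```python
def run_generation(population, mutate, evaluate, validate):
    """Evaluate then immediately validate each individual, so a fix
    found early short-circuits the rest of the generation."""
    for individual in population:
        mutant = mutate(individual)
        if evaluate(mutant) and validate(mutant):
            return mutant   # potential fix confirmed: stop here
    return None             # no fix this generation
```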
Create a 10 page paper about ARC.
Part (all?) of the problem we are having with projects not coming from the right source is in the mutate function. For example, 'create representation' was being called before 'create local project' and 'mutate project'.
The purpose of this branch is to attempt to fix these issues. It has already opened up additional problems that I have created hotfixes for.
TODO:
The current approach for replacement is to first wait n generations and then perform the replacement every generation thereafter. This approach is a bit harsh and illogical. The proposed fix is to only perform the replacement every n generations instead, effectively on a defined interval.
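The interval check is simple; the helper name is illustrative:

```python
def should_replace(generation, interval):
    """Replace the lowest-scoring individuals only on a fixed
    interval, not every generation after a warm-up period."""
    return generation > 0 and generation % interval == 0

[g for g in range(1, 13) if should_replace(g, 4)]  # -> [4, 8, 12]
```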
The wiki must be updated with the new genome representation, as well as the new approach for ARC.
The new genome representation is for a specific state of the individual. The only concerns are which mutations can be applied to the individual (there is no crossover anymore, as the search consists of heuristically guided mutations). Therefore, each bit represents a possible location where the associated mutation operator can be applied in the source code. This works because TXL will encounter the mutation's pattern in the source code in the same order each time, so it becomes possible to flip a bit and apply said mutation.
For example, this genome can be acquired after checking how many pattern instances of mutation operators 1, 2 and 3 were detected.
Mutation Operator 1: | 0 | 0 | 0 |
Mutation Operator 2: | 0 |
Mutation Operator 3: | 0 | 0 |
Applying a random mutation might flip the rightmost bit of mutation operator 3. The result is that the third mutation operator will be applied at the second instance of the TXL match for that mutation pattern.
The new approach does not use crossover as the problem seems ill-fitted for it. After some brainstorming there was no clear approach to incorporate crossover. Therefore, crossover was removed entirely which now changes the search algorithm into more of an evolutionary approach. The solution will now be found by simply applying mutation in a heuristically guided manner using prior test execution feedback.
Also update the README so it remains current.
We discovered a bug: if an individual runs out of mutations, its mutation list [ASAV, ..., SHSB] stops growing. The adjust-operator-weights function scans past the end of the list at higher generations, causing an out-of-bounds exception. We are considering how to fix this.
When we switch from functional to non-functional, the score changes radically (it decreases, as the non-functional phase is just starting). The sliding window would then consider both the functional AND non-functional scores when ranking non-functional operators. We don't want this.
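One way to avoid the mixing is to clear the window at the phase switch; the class name and its shape are illustrative, not ARC's actual sliding-window code:

```python
class SlidingWindow:
    """Fixed-size window of recent scores used when ranking operators.
    Resetting it at the functional -> non-functional switch keeps
    mixed-phase scores out of the non-functional ranking."""

    def __init__(self, size):
        self.size = size
        self.scores = []

    def add(self, score):
        self.scores.append(score)
        self.scores = self.scores[-self.size:]  # keep only the newest

    def reset(self):
        self.scores = []  # call this at the phase switch

window = SlidingWindow(size=5)
for s in [10, 20, 30]:
    window.add(s)
window.reset()  # non-functional phase starts with a clean window
```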