Check if the condition for pruning is part of a larger boolean expression

While checking mutually exclusive conditions which are used for pruning, only basic conditions are considered.
e.g
patterns such as following are considered

txn OnCompletion
int 1
==
bnz

however, patterns where condition is part of larger expression are not considered

txn OnCompletion
int 1
==
txna ApplicationArgs 0
byte 0x65
==
&&
bnz label

Boolean expression which are simple chain of && could be implemented with present capabilities of tealer. However, generalizing this requires more analysis.

Infer group index on paths

Every basic block could have an associated group_indexes : Optional[List[int]] field, which we would collect based on things like

txn GroupIndex
int VALUE

This would allow us to start reasoning on the gtxn KEY field element, to see if we are checking a group transaction field from the current contract, or another one

Execution path redundancy highlighting

As a user I would like to have an option to highlight redundant asserts and operations along an execution path so that I can optimize execution and readability of my program.

Improve detector output

Currently, the output of the detector is difficult to read, we should clarify what it found and what it shows.

We might want to add a small description for each detector, similar to what we have for slither

Add printer to list methods, ids and it's args of ARC-4 application.

Improve support for groups of transactions

Most of the Algorand system relies on multiple contracts that interact with each other within a group of transactions, while currently tealer analyse one contract at the time. This leads to miss a lot of context-related information, and prevents the development of more complex analyses.

We should make tealer work on a set of contracts, and adds the semantics related information. Detecting the group of transaction, and what function should be called within the group might be tricky. I think we can try to have a couple of heuristics to detect this automatically, and have has a fallback mechanism a user configuration.

Plan:

Add support for a group of transactions
Add group configuration by the user
Detection of functions that should be called within a group
Creation of a “group information” printer
Update the existing detectors to rely on the group information

Update output format to reduce the number of output files

Currently, detectors explore paths recursively. For each execution path, the detector will check if the path satisfies a condition and does not report the execution path if it does satisfy the condition. The conditions are verified at the block level and the detector would not know whether to report a path or not until it reaches the end of it. This results in a large number of output files as each path is represented in a separate file.

Using the analysis described in #96, we can find the root block B which lacks the condition i.e All execution paths containing B will be vulnerable. This helps in representing multiple vulnerable paths in one output file and also helps developers in finding the code location to update i.e developers can just add the condition in block B.

Creation of a regex system to allow querying Tealer’s API and internal information

Creating detectors can be a difficult and long task, as it requires knowing about Tealer’s internals and Python. However, most of the existing detectors work by exploring the paths of the contracts and look for incorrect patterns (ex: all the paths check for canDelete). To ease the creation of detectors and custom rules for developers, we could create a DSL that will allow developers to query Tealer’s information without the need to know Tealer’s internals.

I don't have a clear view of what the DSL would like, but I think we can try to at least express the existing detectors in some form, and iterate over the format.

Plan

Extract high-level patterns from existing detectors
Design a DSL
Create a query API

rekeyTo check missing checks at beginning of code

tealer dev branch commit: 403cff3

The following code appears to be ignored and Tealer insists the smart contract contract can be rekeyed, yet the very first instructions are:

#pragma version 5
txn RekeyTo
global ZeroAddress
!=
txn CloseRemainderTo
global ZeroAddress
!=
||
txn AssetCloseTo
global ZeroAddress
!=
||
bnz main_l104

....

main_l104:
int 0
return

I end up getting about 2380 rekey_to errors in fact, even though it's literally the first thing checked.

CFG construction error

CFG recovery of teal programs fails when the subroutine jumps to a label which is defined before this subroutine and when a single subroutine contains multiple retsub instructions.

subroutine jumps to a label which is defined before this subroutine

#pragma version 5
b main
getmod:
    %
    retsub
is_odd:
    int 2
    b getmod
main:
    int 5
    callsub is_odd
    return

results in

subroutine containing multiple retsub instructions

#pragma version 5
b main
is_even:
    int 2
    %
    bz return_1
    int 0
    retsub
return_1:
    int 1
    retsub
main:
    int 4
    callsub is_even
    return

results in

most likely, this is because of assumptions

lack of back jump, which are introduced in Teal v4.

presence of single return point from a subroutine

tealer/tealer/teal/parse_teal.py

Lines 76 to 81 in 8a7fe21

    
           # A retsub? Assign the current open entry points list to it and then reset the list 
        
           if isinstance(ins, Retsub): 
        
               # pylint: disable=no-member 
        
               ins.set_labels(entries[:]) 
        
               entries = []

Parser incorrectly parses byte instructions with ":" character in them as labels

Parser uses following code to parse each line of teal program.

def parse_line(line: str) -> Optional[instructions.Instruction]:
    comment = ""
    if "//" in line:
        comment_start = line.find("//")
        # save comment
        comment = line[comment_start:]
        # strip line of comments and leading/trailing whitespace
        line = line[:comment_start].strip()
    if ":" in line:
        return instructions.Label(line[0 : line.find(":")])
    for key, f in parser_rules:
        if line.startswith(key):
            ins = f(line[len(key) :].strip())
            ins.comment = comment
            return ins

Above code checks for : character in the line for labels before comparing with instruction prefixes. As byte constants can contain :, they will be considered as label which is not the case. e.g

byte "this is not a label:2" // wil be parsed as label instruction by the parser.
thisisalabel:

Tealer parser doesn't parse all the formats used to define byte constants in teal

Teal allows defining byte constants in multiple formats. currently, only few formats are parsed correctly, namely byte and byte base64. Following are the format's used to define byte constants in teal

byte base64 AAAA...
byte b64 AAAA...
byte base64(AAAA...)
byte b64(AAAA...)
byte base32 AAAA...
byte b32 AAAA...
byte base32(AAAA...)
byte b32(AAAA...)
byte 0x0123456789abcdef...
byte "\x01\x02"
byte "string literal"

ref: https://developer.algorand.org/docs/get-details/dapps/avm/teal/specification/#constants-and-pseudo-ops

Create source code mapping

See algorand/pyteal#449

Code Patterns of False Positives for Transaction Field based detectors

Detectors such as feeCheck, canCloseAccount, canCloseAsset, rekeyTo traverse the CFG and detect if there's a check involving a transaction field.

txn [Fee | CloseTo | AssetCloseTo | RekeyTo]
[int ... | (addr ... | global ZeroAddress)]
[ == | < | > | .. ]

The same checks could be implemented following different patterns:
1.) Verifies transaction index in the group and then checks the transaction field by accessing it with gtxn INDEX FIELD. issue #75
e.g

txn GroupIndex
int 0
==
assert
gtxn 0 AssetCloseTo
global ZeroAddress
==
assert

2.) Verify the GroupSize is certain value and verify fields of all transactions in the group. issue #87
e.g

global GroupSize
int 2
==
...
gtxn 0 CloseRemainderTo
global ZeroAddress
==
&&
gtxn 1 CloseRemainderTo
global ZeroAddress
==
&&
...

3.) Using a loop to iterate over all transactions in the group and verifying the transaction field.
e.g

12: int 0
13: store 9

14: label1:
15: load 9
16: Gtxns RekeyTo
17: global ZeroAddress
18: ==
19: assert
20: load 9
21: int 1
22: +
23: store 9
24: load 9
25: global GroupSize
26: <
27: bnz label1

Few false positives stem from not using pruning for stateless detectors.
e.g. Account can be closed only if the transaction is a payment transaction. canCloseAccount doesn't need to explore paths where the path is taken only if transaction type is not payment. And also canCloseAccount can only explore paths which are taken if and only if the transaction type is payment. Similar pruning conditions can be considered for canCloseAsset detector.

Add issue template

Add missing transactions fields

Tealer is missing some transaction fields (for example AssetSender). We need to go over the doc and check if others are missing and add them

Add printer to output possible values of transaction fields in json

Transaction field analysis is a general framework useful to find possible values of transaction fields.
Given a contract, the analysis gives information on the transaction calling the contract. For a given field F, it gives a list of possible values for F so that the contract execution will not fail. If field F has a value not present in the possible value, the execution will fail.

Assume T is the transaction responsible for the execution of contract C.

Kinds of information available:

Possible values of transaction T's field F irrespective of its position in the group. it does not matter if T is at index 0, 1, or .. F can only have one of the possible values.
Possible values of T's field F if it is at a given position in the group. If T is at index 0, what values can F have? If T is at index 1, what values can F have? This information is available for all possible indices.
Possible values of field F of other transactions in the group. In a group transaction containing T, what values can field F of the transaction at 0 have? what values can field F of the transaction at 1 have? This information is available for all possible indices.

Supported fields as of now:

GroupIndex: possible positions/indices of T in the group
GroupSize: possible sizes of groups containing T.
Transaction type: Possible transaction types T can have: (pay, axfer, ..., appl)? if application call (NoOp, OptIn, CloseOut, UpdateApplication)?
Fee field: Maximum fee T can have.
All address field types (RekeyTo, CloseRemainderTo, AssetCloseTo, Sender, ...): Taking RekeyTo as an example, possible values of T's RekeyTo? RekeyTo can be any address or RekeyTo must be ZeroAddress or RekeyTo has to be some known address or some unknown address?

Question regarding "Lack of OnCompletion check allows to delete the app"

Great tool!

Hi, i was running tealer for this contract in our algo-builder/examples. And i came accross:

Lack of OnCompletion check allows to delete the app
	Check the path in can_delete_1.dot

But in the pyTEAL code delete app is handled (check here). While analyzing the cfg, it shows a red path during onInitialize (app deployment) when appID == 0. How can app be deleted for this path? Could you please validate/confirm?

Here is the o/p path.

Dynamic str trick to remove lots of code.

Random generic idea. As Instructions/Fields class names seems to match 1:1 with _str string in most places.
Consider adding the following to the base classes and override only when necessary.

     def __str__(self):
         return self.__class__.__qualname__.lower()

This could be implemented in a metaclass or new if want to avoid the u-perf penalty.

Detect call to inner transaction without a fixed fee

Call to inner transaction should have the fee parameter set to a value (which should probably be zero)

Immediate arguments of instructions intcblock and bytecblock are not parsed

intcblock and bytecblock instructions are used to define constants which are stored separately from the stack and can be referenced later in the code using instructions intc, intc_[0123], bytec, bytec_[0123]. Most of the time, assembler creates the constants based on the integer and bytes used in the program. Teal also allows defining integer and byte constants explicitly using intcblock and bytecblock. The constants are space separated and are written in the same line. e.g.

intcblock 1 2 3
bytecblock "1" "2" "3"

Missing global fields and transaction fields in teal parser

There are few global fields (used with instruction global) and transaction fields (using with instructions txn, gtxn, ..) which are not defined in tealer parser but are stated in teal reference.

Missing Global Fields

CurrentApplicationAddress
GroupID

ref : https://developer.algorand.org/docs/get-details/dapps/avm/teal/opcodes/#global-f

Missing Transaction Fields

ExtraProgramPages
Nonparticipation
Logs (itxn only)(array field)
NumLogs (itxn only)
CreatedAssetID (itxn only)
CreatedApplicationID (itxn only)

ref: https://developer.algorand.org/docs/get-details/dapps/avm/teal/opcodes/#txn-f

Notes: few fields are only available with itxn instruction only.

Next release checklist

#1
#2

In addition, we should prepare the next release to be ready for pip to ease the installation

Have proper tests related to "usage"

Automated tests for running the tool from the terminal and comparing to the expected output.

Add CODEOWNERS

Missing teal instructions in teal parser

Few teal instructions are not parsed in tealer, might be because they were added in the recent version. These are the missing instructions

Missing Instructions

ecdsa_verify v
ecdsa_pk_decompress v
ecdsa_pk_recover v
loads
stores
cover n
uncover n
extract s l
extract3
extract_uint16
extract_uint32
extract_uint64
app_params_get i
log
itxn_begin
itxn_field f
itxn_submit
itxn f
itxna f i
txnas f
gtxnas t f
gtxnsas f
args

ref: https://developer.algorand.org/docs/get-details/dapps/avm/teal/opcodes/

Add support for instruction "aliases"

Teal assembler supports various alias instructions which are not mentioned in the Opcodes document.

method {s: method-signature}: Alias to bytes {4-byte method selector}
extract instruction without immediate arguments is equivalent to extract3.
txn instruction with 2 immediate arguments(txn {i} {j} is equivalent to txna {i} {j}
gtxn instruction with 3 immediate arguments(gtxn {i} {j} {k}) is equivalent to gtxna {i} {j} {k}
gtxns instruction with 2 immediate arguments(gtxns {i} {j}) is equivalent to gtxnsa {i} {j}

Tealer error in check_rekey_to

When running the command tealer contract.teal, I get the following error:

Traceback (most recent call last):
  File "/Users/gidon/tealer/venv/bin/tealer", line 33, in <module>
    sys.exit(load_entry_point('tealer==0.0.2', 'console_scripts', 'tealer')())
  File "/Users/gidon/tealer/venv/lib/python3.8/site-packages/tealer-0.0.2-py3.8.egg/tealer/__main__.py", line 75, in main
  File "/Users/gidon/tealer/venv/lib/python3.8/site-packages/tealer-0.0.2-py3.8.egg/tealer/detectors/rekeyto.py", line 79, in detect
  File "/Users/gidon/tealer/venv/lib/python3.8/site-packages/tealer-0.0.2-py3.8.egg/tealer/detectors/rekeyto.py", line 74, in check_rekey_to
  File "/Users/gidon/tealer/venv/lib/python3.8/site-packages/tealer-0.0.2-py3.8.egg/tealer/detectors/rekeyto.py", line 74, in check_rekey_to
  File "/Users/gidon/tealer/venv/lib/python3.8/site-packages/tealer-0.0.2-py3.8.egg/tealer/detectors/rekeyto.py", line 74, in check_rekey_to
  [Previous line repeated 18 more times]
  File "/Users/gidon/tealer/venv/lib/python3.8/site-packages/tealer-0.0.2-py3.8.egg/tealer/detectors/rekeyto.py", line 54, in check_rekey_to
KeyError: 4

I haven't managed to isolate the breaking code (the teal code is over 2000 lines) so I am not sure what the issue is.

When I run the command with the --print-cfg option then it does successfully create the CFG.

Fields of instructions asset_holding_get and asset_params_get are ignored by the parser

While parsing, the fields of instructions asset_holding_get and asset_params_get are not stored with the instruction. This might be intended. These are the fields

asset_holding_get fields

AssetBalance
AssetFrozen

ref: https://developer.algorand.org/docs/get-details/dapps/avm/teal/opcodes/#asset_holding_get-i

asset_params_get fields

AssetBalance
AssetFrozen
AssetDefaultFrozen
AssetUnitName
AssetName
AssetURL
AssetMetadataHash
AssetManager
AssetReserve
AssetFreeze
AssetClawback
AssetCreator

ref: https://developer.algorand.org/docs/get-details/dapps/avm/teal/opcodes/#asset_params_get-i

Add support for TEALv6 opcodes

Is there a plan to work on adding support for the new TEALv6 opcodes?

I'm primarily interested to be able to generate control-flow graphs of TEALv6 programs with Tealer, hence I suggest adding:

Representation of TEALv6 instructions into tealer/teal/instructions/instructions.py
Representation of Account Parameters Fields into a new file, tealer/teal/instructions/instructions.py, which are required for the acct_params_get f instruction.
New parsing rules into tealer/teal/instructions/parse_instruction.py
Relevant tests

I'm eager to work on this, but I wouldn't want to scoop other's effort, if somebody already planning to work on it.

Use assert conditions for pruning

For detectors such as canUpdate and canDelete, a set of mutually exclusive conditions are used for pruning. This conditions follow pattern

txn OnCompletion
int CONSTANT
==
[bz | bnz]

if the constant is NoOp or OptIn or CloseOut or DeleteApplication, the canUpdate can prune the branch(path) which is taken when the condition txn OnCompletion == int CONSTANT is true.

Detectors should consider assert instruction as well

txn OnCompletion
int CONSTANT
==
assert

If the detector detects this pattern, the detector could stop exploring paths further

"Not found txnas ApplicationArgs"

#12 (comment)

Execution Path Cost Analysis Feature Suggestion

Would be really awesome if we can get op code cost analysis along various execution paths, hard part would be how you handle branching and such.

Detect contracts lacking validations on asset id field of asset transfer transaction

If a contract verifies that a certain transaction in the group is an "AssetTransfer" Then it should also check that the asset-id of the transaction is valid.

See building-secure-contracts/not-so-smart-contracts/algorand/asset_id_check

Use data flow analysis to find constraints on transaction fields in a contract

Dataflow Analysis to find constraints on transaction fields

What are constraints on transaction fields

For each basic block in the CFG

For each Transaction Field, find all possible values for the given transaction field which leads to successful execution of the basic block.
- e.g: block_transaction_context[B][F] stores the possible values form field F in basic block B.
  if block_transaction_context[b]["GroupSize"] = [1, 3]. Then, if and only if the group size of the transaction invoking this contracts has groupsize 1 or 3,
  - The entry point of basic block b will be executed AND
  - The execution reaches the end of the basic block b without reverting AND
  - The execution will continue without reverting in atleast one of execution paths.
- i.e All execution paths that contain basic block b constraint the group size to 1 or 3.

NOTE: only assert instructions are considered to check if execution reverts or not.
e.g if sequence of instructions are

...
Global GroupSize
int 3
==
assert
...

Then

The groupsize must be 3 for successful execution of the basic block.
block_transaction_context[b]["GroupSize"] = [3]

Implementation

Source of constraints for a given basic block

Predecessor blocks
Block itself
Successor blocks

From Predecessors: if and only if field has these values the execution will reach this block.
From Block: if and only if field has these values the execution will reach exit point of block without reverting.
From Successors: if and only if field has these values the execution will successfully continue after exiting from the block.

block_transaction_context[b] = Union(block_transaction_context[bi] for bi in predecessors[b]) \
                            && block_transaction_context[b] \
                            && Union(block_transaction_context[bi] for bi in successors[b])
# Note: && is intersection

consider

B3 has predecessors B1, B2 and successors B4, B5.
Given following transaction contexts,

    # predecessors
    block_transaction_context[B1]["GroupSize] = [1, 3], 
    block_transaction_context[B2]["GroupSize] = [1, 5],
    # block
    block_transaction_context[B3]["GroupSize"] = [1, 3, 5]
    # successors
    block_transaction_context[B4]["GroupSize] = [3], # asserts groupsize is 3 in B4
    block_transaction_context[B5]["GroupSize] = [5], # asserts groupsize is 5 in B5

Then new context will be,

    block_transaction_context = {1, 3, 5} && {1, 3, 5} && {3, 5} = {3, 5}

Branch Based Constraints

Teal contracts generally use conditional branch instructions directly on the constraints(conditional expression).
e.g, consider B1 from above is

...
Global GroupSize
int 3
==
bnz B3

This implies

# while considering for B3 
block_transaction_context[B1]["GroupSize"] = [3] # instead of [1, 3] which is the case for other children of B1

To solve this, we can use path based transaction context while considering information from predecessors

# path_transaction_context[bi][b] gives the transaction constraints for path bi -> b
block_transaction_context[b] = Union((block_transaction_context[bi] && path_transaction_context[bi][b]) for bi in predecessors[b]) \
                            && block_transaction_context[b] \
                            && Union(block_transaction_context[bi] for bi in successors[b])
# Note: && is intersection

Algorithm

Phase 1

block_transaction_context = dict()
path_transaction_context = dict()

for B in basic_blocks:
    for F in transaction_fields:
        c = ConstraintsFromAssert(B, F)
        if c == set(): # NullSet(F)
            c = UniversalSet(F)
        
        block_transaction_context[B][F] = c

        if B[-1] == Ins(BZ) or B[-1] == Ins(BNZ):    # last instruction is conditional branch
            # constraints that must be satisfied to take the jump.
            true_branch_constraints = ComputeTrueBranchConstraints(B, F)
            # constraints if the default branch is taken
            false_branch_constraints = ComputeFalseBranchConstraints(B, D) # default branch
                        
            if true_branch_constraints = set(): # NullSet(F)
                true_branch_constraints = UniversalSet(F)

            if false_branch_constraints = set(): # NullSet(F)
                false_branch_constraints = UniversalSet(F)

            # successors[0] is the default branch and successors[1] is the target branch
            
            path_transaction_context[successors[B][0]][B][F] = false_branch_constraints
            path_transaction_context[successors[B][1]][B][F] = true_branch_constraints

Phase 2

worklist = [b for b in basic_blocks]

while worklist:
    B = worklist[0]
    worklist = worklist[1:]
    new_transaction_context = merge_information(B)
    if new_transaction_context != block_transaction_context[B]:
        worklist.extend(predecessors(B))
        worklist.extend(successors(B))
        worklist = list(set(worklist))

Transaction Fields to consider for constraints

Global Fields:

GroupSize: Finite -> [1, 16]

Transaction Fields:

GroupIndex: Finite -> [0, 15]
Type: Finite -> [pay{}, keyreg{}, acfg{}, axfer{}, afrx{}, appl{Creation, UpdateApplication, DeleteApplication, NoOp, ...}]
RekeyTo: Addr -> Store concrete constraints only
CloseRemainderTo: Addr
AssetCloseTo: Addr
Sender: Addr
Receiver: Addr
...

Note

==, != are both considered for fields with small domain.

If block B is,
```
Global GroupSize
int 2
==
assert
```
Then, block_transaction_context[B]["GroupSize"] = [2]
And if block B is
```
Global GroupSize
int 2
!=
assert
```
Then, block_transaction_context[B]["GroupSize"] = list(set(range(1, 16)) - set([2]))

However, We cannot do the same for Addr type fields. we will only consider constraints which are small enough to store and enumerate. if the number of possible values are small, then we store them otherwise the possible values are considered to be ALL possible values(no constraints).
Using the gtxn instruction, constraints can be computed for other transactions in the group as well. This helps in detecting the possible group structures. Constraints of other transactions in the group can also be used for contracts which distribute the security checks to multiple contracts.

Uses

Currently canUpdate, canDelete detectors check for following sequence of instructions and report execution paths that do not have that sequence
```
txn OnCompletion
int [UpdateApplication | DeleteApplication]
==
```
These detectors can be divided into isUpdatable, isDeletable and unprotectedUpdateApplication, unprotectedDeleteApplication detectors.
- isUpdatable: is the contract updatable?. Check all the leaf basic blocks in the CFG, if any of the block has appl{UpdateApplication} in it's context and doesn't contain return 0 then there's a execution path that will be executed when transaction is UpdateApplication call.
- unprotectedUpdateApplication: update application has proper access controls?. if the contract is Updatable then check constraints on all the execution paths(using leaf basic blocks) with appl{UpdateApplication}, if there is no constraint on SenderAddress, then report that execution path. Based on Tealer capabilities, it can be checked if SenderAddress is checked against state variable.
- isDeleteable and unprotectedDeleteApplication is same as above but for DeleteApplication.
Printers for UpdateApplication, DeleteApplication paths.
Highlight basic blocks which has appl{UpdateApplication} in their paths for UpdateApplication printer. Highlight basic blocks which has appl{DeleteApplication}.
Get Heuristics on contract type: stateless or stateful. We could check if there are any blocks which does not have any appl{UpdateApplication, DeleteApplication, ...}. This implies that either contract is stateless or contract is stateful and block is dead block. Similar deduction can be made if any of the blocks has only appl{...}.
Get information on the group size and possible combinations of group this contract might be part of.
- This helps in deducing the group structure itself.
- Detectors can perform better analysis for contracts where checks are distributed among multiple contracts. e.g First contract in the group verifies fields of all transactions in the group.
- Reduce FP for contracts which check group size and fields of all transactions in the group.
Improve analysis of current detectors which check if contracts are validating certain transaction fields.

Blockers

Currently, assert instruction or (bz/bnz) instructions only consider single comparison expression. A constraint is considered if and only if comparison operator is immediately followed by assert or bz/bnz. This does give some reasonable constrainsts. However, developers generally chain multiple conditions and then assert or branch on that large expression.
- Ideally, entire boolean expression should be constructed and constraints should be extracted from that.
- However, based on the benchmark contracts, in practice, most of this expressions are chain of && operations, which are much easier to extract from, than the generalised expressions.

ToDo

Evaluate above use cases, complexity and tradeof and decide whether to proceed with this approach. Find edgecases that are not considered and improve the algorithm.
Implement improved algorithm with as much generality as possible.
Improve False Positives of the current detectors
Implement and evaluate above detectors and printers.
Use heuristics for contract type and use it for future analysis.
Find other use cases

Related/Solves issues #75, #82, #83, #87, #90, #95

Add simple loop detection

Yup, seems like loops are possible in newer Teal. Detectors need to avoid recurring on loop-y paths (it end up doing "inf" recursion)

Spray a check about the current BB being already in current path ?

        # check for loops
        if bb in current_path:
            return

Detect paths missing ClearState transaction check

When a contract relies on validations performed in a different application, the contract checks that the target application is called in the group. The contract should also check that the Application call transaction is not a ClearState transaction. ClearState transaction executes the ClearState Program instead of the Approval Program. Because validations are generally performed in the Approval program, the contract will be vulnerable

Improve parsing

The CI is failing on these new opcodes:

https://github.com/algorand/smart-contracts/blob/master/devrel/AVM1.0/loopsandsub.teal

And probably others. We have to investigate if some opcodes are not parsed, or not correctly handled

Update linter + CI

We should update the linter, and pin the CI:

tealer/.github/workflows/linter.yml

Line 40 in 48e1108

uses: docker://github/super-linter:v4

Similar to crytic/slither#1157

Use single instruction for semantically equivalent but syntactically different instructions

The instructions int NUM, pushint NUM are equivalent. replacing pushint NUM instructions int NUM will reduce the overhead of checking both type of instructions for syntax based checks. There might be other instructions.

Use integer values along with names for Named Integer Constants

Integer values are not considered as valid named integer constant.
e.g Integer value for UpdateApplication is 4.

txn OnCompletion
int 4
==

The above pattern results in false positives for update application detector, as detector doesn't consider int 4 as int UpdateApplication even though both are equivalent

Create benchmark of real-world attacks and large codebases

On dev several detectors give too many false positives. We need to create a benchmark to get a better sense of how to improve the current detectors and make sure tealer give a good rate of FP <> FN <> TP

Related: #65

Teal parser fails to parse numbers written in octal and hex representation

Teal supports representing numbers in hex (prefix: 0x), octal (prefix: 0) and decimal. tealer's parser uses int() to convert to python int everywhere. This will result in error, if the numbers are represented using hex format and in incorrect results if the numbers are represented in octal.

In some places, parser checks the numbers using .isdigit() before converting to a integer. For octal numbers prefix 0 is also a digit which makes .isdigit() return True unlike for hex representation where x in 0x is not a digit, as a result, octal numbers are incorrectly parses even in the presence of .isdigit() check.

ref: https://developer.algorand.org/docs/get-details/dapps/avm/teal/specification/#constants-and-pseudo-ops

Fix execution paths generated by group-size detector

detect_missing_tx_field_validations and search_paths already address the issues with the generation of execution paths.

tealer/tealer/detectors/utils.py

Lines 51 to 229 in c590caa

    
           def detect_missing_tx_field_validations( 
        
               entry_block: "BasicBlock", checks_field: Callable[["BlockTransactionContext"], bool] 
        
           ) -> List[List["BasicBlock"]]: 
        
               """search for execution paths lacking validations on transaction fields. 
        
               `tealer.analyses.dataflow` calculates possible values a transaction field can have to successfully complete execution. 
        
               Information is computed at block level. Each block's context(`BlockTransactionContext`) 
        
               will have information on possible values for a transaction field such that 
        
                   - Execution reaches to the first instruction of block `B` 
        
                   - Execution reaches exit instruction of block `B` without failing. 
        
               The goal of current detectors is to find if a given transaction field(s) can have a certain vulnerable value(s). 
        
               e.g: is_updatable detector checks if the transaction can be UpdateApplication type transaction. Here, the vulnerable value for 
        
               transaction type is "UpdateApplication". if "UpdateApplication" is a possible value then the contract is considered vulnerable and is reported. 
        
               In order to find whether the given contract is vulnerable, it is sufficient to check the leaf blocks: whether transaction field can 
        
               have target value and execution reaches end of a leaf block. 
        
               However, The current output format requires "a execution path" from entry block to leaf block. And when a leaf block is vulnerable, 
        
               it does **not** mean that all execution paths ending at that leaf block are vulnerable. It only means that atleast one path ending at 
        
               that leaf block is vulnerable. 
        
               For this reason, a traversal over the CFG is performed to enumerate execution paths. And for each block in the path it is checked 
        
               whether transaction field(s) can have the target vulnerable value. 
        
               Args: 
        
                   entry_block: entry basic block of the CFG 
        
                   checks_field: Given a block context, should return True if the target field cannot have the vulnerable value 
        
                       or else False. 
        
                       e.g: For is_updatable detector, vulnerable value is `UpdateApplication`. 
        
                       if `block_ctx.transaction_types` can have `UpdateApplication`, this method will 
        
                       return false or else returns True. 
        
               Returns: 
        
                   Returns a list of vulnerable execution paths: none of the blocks in the path check the fields. 
        
               """ 
        
               def search_paths( 
        
                   bb: "BasicBlock", 
        
                   current_path: List["BasicBlock"], 
        
                   paths_without_check: List[List["BasicBlock"]], 
        
                   current_call_stack: List[Tuple[Optional["BasicBlock"], "Subroutine"]], 
        
                   current_subroutine_executed: List[List["BasicBlock"]], 
        
               ) -> None: 
        
                   """ 
        
                   Args: 
        
                       current_call_stack: list of callsub blocks and called subroutine along the current path. 
        
                           e.g current_callsub_blocks = [(Bi, S1), (Bj, S2), (Bk, S3), ...] 
        
                           => 1. Bi, Bj, Bk, .. all end with callsub instruction. 
        
                              2. Bi called subroutine S1. S1 contains block Bj which called subroutine S2. 
        
                                   S2 contains Bk which calls S3. 
        
                                   S1 calls S2, S2 calls S3, ... 
        
                       current_subroutine_executed: list of subroutine basic blocks that were executed in the path. 
        
                           e.g if call stack is [(Bi, S1), (Bj, S2), (Bk, S3)] 
        
                           1. current_subroutine_executed[0] contains basic blocks of S1 that were executed before call to S2. 
        
                           2. current_subroutine_executed[1] contains basic blocks of S2 that were executed before call to S3. 
        
                           3. current_subroutine_executed[2] contains basic blocks of S3 that were executed upto this call. 
        
                           Main purpose of this argument is to identify loops. A loop contains a back edge to a previously executed block. 
        
                           if presence of loop is checked using `return True if bb in current_path else False`, detector misses paths that 
        
                           does not contain a loop. 
        
                           Mainly, when a subroutine is called twice in a path, the second call is treated as a back edge because the subroutine's blocks 
        
                           will be present in the current path (added in the first call.) 
        
                           e.g 
        
                               ``` 
        
                               callsub Sa  // B0 
        
                               callsub Sa  // B1 
        
                               int 1       // B2 
        
                               return 
        
                               Sa:         // B3 
        
                                   callsub Sb 
        
                                   retsub      // B4 
        
                               Sb:         // B5 
        
                                   retsub 
        
                               ``` 
        
                           before executing B1, current_path = [B0, B3, B5, B4], bb = B1 
        
                           after executing B1, current_path = [B0, B3, B5, B4, B1], bb = B3 
        
                           B3 is already present in the current_path and so is treated as a loop even though it's not. 
        
                           In order to identify loops properly, basic blocks that are executed as part of the current subroutines are tracked. 
        
                           For above example, 
        
                               At the start of the function 
        
                                   bb = B0, current_path = [], current_call_stack = [(None, Main)], current_subroutine_executed = [[]] 
        
                               At the end of the function before calling search_paths again 
        
                                   bb = B0, current_path = [B0], current_call_stack = [(None, Main)], current_subroutine_executed = [[B0]] 
        
                               # B0 calls Sa. Calling a subroutine adds a item to current_call_stack and current_subroutine_executed. 
        
                               Start 
        
                                   bb = B3, current_path = [B0], current_call_stack = [(None, Main), (B0, Sa)], current_subroutine_executed = [[B0], []] 
        
                               End 
        
                                   bb = B3, current_path = [B0, B3], current_call_stack = [(None, Main), (B0, Sa)], current_subroutine_executed = [[B0], [B3]] 
        
                               # B3 calls Sb. Calling a subroutine adds a item to current_call_stack and current_subroutine_executed. 
        
                               Start 
        
                                   bb = B5, current_path = [B0, B3], current_call_stack = [(None, Main), (B0, Sa), (B3, Sb)], current_subroutine_executed = [[B0], [B3], []] 
        
                               End 
        
                                   bb = B5, current_path = [B0, B3, B5], current_call_stack = [(None, Main), (B0, Sa), (B3, Sb)], current_subroutine_executed = [[B0], [B3], [B5]] 
        
                               # B5 returns the execution from Sb. Returning from a subroutine removes a item from current_call_stack, current_subroutine_executed. 
        
                               Start 
        
                                   bb = B4, current_path = [B0, B3, B5], current_call_stack = [(None, Main), (B0, Sa)], current_subroutine_executed = [[B0], [B3]] 
        
                               End 
        
                                   bb = B4, current_path = [B0, B3, B5, B4], current_call_stack = [(None, Main), (B0, Sa)], current_subroutine_executed = [[B0], [B3, B4]] 
        
                               Start 
        
                                   bb = B1, current_path = [B0, B3, B5, B4], current_call_stack = [(None, Main)], current_subroutine_executed = [[B0]] 
        
                               End 
        
                                   bb = B1, current_path = [B0, B3, B5, B4, B1], current_call_stack = [(None, Main)], current_subroutine_executed = [[B0, B1]] 
        
                               .... 
        
                   """ 
        
                   # check for loops 
        
                   if bb in current_subroutine_executed[-1]: 
        
                       logger_detectors.debug( 
        
                           f"Loop Path: current_full_path = {current_path}\n, current_subroutine_executed = {current_subroutine_executed[-1]}, current_block: {repr(bb)}" 
        
                       ) 
        
                       return 
        
                   if validated_in_block(bb, checks_field): 
        
                       logger_detectors.debug( 
        
                           f"Validated Path: current_full_path = {current_path}\n, current_block: {repr(bb)}" 
        
                       ) 
        
                       return 
        
                   current_path = current_path + [bb] 
        
                   # leaf block 
        
                   if len(bb.next) == 0: 
        
                       logger_detectors.debug(f"Vulnerable Path: current_full_path = {current_path}") 
        
                       paths_without_check.append(current_path) 
        
                       return 
        
                   # do we need to make a copy of lists in [:-1]??? 
        
                   current_subroutine_executed = current_subroutine_executed[:-1] + [ 
        
                       current_subroutine_executed[-1] + [bb] 
        
                   ] 
        
                   if isinstance(bb.exit_instr, Callsub): 
        
                       teal = bb.teal 
        
                       assert teal is not None 
        
                       called_subroutine = teal.subroutines[bb.exit_instr.label] 
        
                       # check for recursion 
        
                       already_called_subroutines = [frame[1] for frame in current_call_stack] 
        
                       if called_subroutine in already_called_subroutines: 
        
                           # recursion 
        
                           return 
        
                       current_call_stack = current_call_stack + [(bb, called_subroutine)] 
        
                       current_subroutine_executed = current_subroutine_executed + [[]] 
        
                   if isinstance(bb.exit_instr, Retsub): 
        
                       # if this block is retsub then it returns execution to the return 
        
                       # point of callsub block. return point is the next instruction after the callsub 
        
                       # instruction. 
        
                       (callsub_block, _) = current_call_stack[-1] 
        
                       assert callsub_block is not None 
        
                       return_point = callsub_block.sub_return_point 
        
                       if return_point is not None and return_point in bb.next: 
        
                           search_paths( 
        
                               return_point, 
        
                               current_path, 
        
                               paths_without_check, 
        
                               current_call_stack[:-1],  # returning from a subroutine 
        
                               current_subroutine_executed[:-1], 
        
                           ) 
        
                   else: 
        
                       for next_bb in bb.next: 
        
                           search_paths( 
        
                               next_bb, 
        
                               current_path, 
        
                               paths_without_check, 
        
                               current_call_stack, 
        
                               current_subroutine_executed, 
        
                           ) 
        
               teal = entry_block.teal 
        
               assert teal is not None 
        
               paths_without_check: List[List["BasicBlock"]] = [] 
        
               search_paths(entry_block, [], paths_without_check, [(None, teal.main)], [[]]) 
        
               return paths_without_check

group-size detector implements a function to generate execution paths. This function does not address the recently uncovered issues.

tealer/tealer/detectors/groupsize.py

Lines 119 to 164 in c590caa

    
               def _check_groupsize( 
        
                   self, 
        
                   bb: BasicBlock, 
        
                   current_path: List[BasicBlock], 
        
                   paths_without_check: List[List[BasicBlock]], 
        
                   used_abs_index: bool, 
        
               ) -> None: 
        
                   """Find execution paths with missing GroupSize check. 
        
                   This function recursively explores the Control Flow Graph(CFG) of the 
        
                   contract and reports execution paths with missing GroupSize 
        
                   check. 
        
                   This function is "in place", modifies arguments with the data it is 
        
                   supposed to return. 
        
                   Args: 
        
                       bb: Current basic block being checked(whose execution is simulated.) 
        
                       current_path: Current execution path being explored. 
        
                       paths_without_check: 
        
                           Execution paths with missing GroupSize check. This is a 
        
                           "in place" argument. Vulnerable paths found by this function are 
        
                           appended to this list. 
        
                       used_abs_index: Should be True if absolute index in `current_path`. 
        
                   """ 
        
                   # check for loops 
        
                   if bb in current_path: 
        
                       return 
        
                   if MAX_GROUP_SIZE not in bb.transaction_context.group_sizes: 
        
                       # GroupSize is checked 
        
                       return 
        
                   if not used_abs_index: 
        
                       used_abs_index = self._accessed_using_absolute_index(bb) 
        
                   current_path = current_path + [bb] 
        
                   if not bb.next: 
        
                       # leaf block 
        
                       if used_abs_index: 
        
                           # accessed a field using absolute index in this path. 
        
                           paths_without_check.append(current_path) 
        
                       return 
        
                   for next_bb in bb.next: 
        
                       self._check_groupsize(next_bb, current_path, paths_without_check, used_abs_index)

The group-size could be updated to use the detect_missing_tx_field_validations

"""Detector for finding execution paths missing GroupSize check."""

from typing import List, TYPE_CHECKING
from functools import lru_cache

from tealer.detectors.abstract_detector import (
    AbstractDetector,
    DetectorClassification,
    DetectorType,
)
from tealer.teal.basic_blocks import BasicBlock
from tealer.teal.instructions.instructions import (
    Gtxn,
    Gtxna,
    Gtxnas,
    Gtxns,
    Gtxnsa,
    Gtxnsas,
)
from tealer.teal.teal import Teal

from tealer.utils.algorand_constants import MAX_GROUP_SIZE
from tealer.utils.analyses import is_int_push_ins
from tealer.analyses.utils.stack_ast_builder import construct_stack_ast, UnknownStackValue
from tealer.detectors.utils import (
    detect_missing_tx_field_validations,
    detector_terminal_description
)

if TYPE_CHECKING:
    from tealer.utils.output import SupportedOutput
    from tealer.teal.instructions.instructions import Instruction
    from tealer.teal.context.block_transaction_context import BlockTransactionContext


class MissingGroupSize(AbstractDetector):  # pylint: disable=too-few-public-methods
   # ...

    @lru_cache(maxsize=None)
    @staticmethod
    def _accessed_using_absolute_index(bb: BasicBlock) -> bool:
        """Return True if a instruction in bb access a field using absolute index

        a. gtxn t f, gtxna t f i, gtxnas t f,
        b. gtxns f, gtxnsa f i, gtxnsas f

        Instructions in (a) take transaction index as a immediate argument.
        Return True if bb contains any one of those instructions.

        Instructions in (b) take transaction index from the stack.
        `gtxns f` and `gtxnsa f i` take only one argument and it is the transaction index.
        `gtxnsas f` takes two arguments and transaction index is the first argument.
        Return True if the transaction index is pushed by an int instruction.
        """
        stack_gtxns_ins: List["Instruction"] = []
        for ins in bb.instructions:
            if isinstance(ins, (Gtxn, Gtxna, Gtxnas)):
                return True
            if isinstance(ins, (Gtxns, Gtxnsa, Gtxnsas)):
                stack_gtxns_ins.append(ins)
        if not stack_gtxns_ins:
            return False
        ast_values = construct_stack_ast(bb)
        for ins in stack_gtxns_ins:
            index_value = ast_values[ins].args[0]
            if isinstance(index_value, UnknownStackValue):
                continue
            is_int, _ = is_int_push_ins(index_value.instruction)
            if is_int:
                return True
        return False


    def detect(self) -> "SupportedOutput":
        """Detect execution paths with missing GroupSize check.

        Returns:
            ExecutionPaths instance containing the list of vulnerable execution
            paths along with name, check, impact, confidence and other detector
            information.
        """

        def checks_group_size(block_ctx: "BlockTransactionContext") -> bool:
            # return True if group-size is checked in the path
            # otherwise return False
            return MAX_GROUP_SIZE not in block_ctx.group_sizes
        
        paths_without_check: List[List[BasicBlock]] = detect_missing_tx_field_validations(
            self.teal.bbs[0], checks_group_size
        )

        # paths_without_check contain all the execution paths not checking the GroupSize
        # Only report paths which also use absolute_index

        paths_to_report: List[List[BasicBlock]] = []
        for path in paths_without_check:
            for bb in path:
                if self._accessed_using_absolute_index(bb):
                    paths_to_report.append(path)
                    break

        description = detector_terminal_description(self)
        filename = "missing_group_size"

        results = self.generate_result(paths_to_report, description, filename)
        construct_stack_ast.cache_clear()
        return results

Improve False Positives

In the limitted analysis that I have done it seems like the tool does not recognise the following rekey check, leading to false positives:

txn GroupIndex
int 3
==
assert
gtxn 3 RekeyTo
global ZeroAddress
==
assert

which is the equivalent of

txn RekeyTo
global ZeroAddress
==
assert

This may also extend to other checks.

Improve support of stateful and stateless contracts

Currently Tealer does not have a good notion of stateful/stateless. We have some some information from the detectors

tealer/tealer/detectors/abstract_detector.py

Lines 12 to 15 in 8a7fe21

    
           class DetectorType(ComparableEnum): 
        
               STATELESS = 0 
        
               STATEFULL = 1 
        
               STATEFULLGROUP = 2

But its not really used so far. We have to develop further the API to express better the statefullness of the contract, and make sure the proper detector are used based on that.

What we could do:

Create statefulness information into Teal
Create detection of the statefulness based on semantics heuristics
Adapt existing detectors to be enabled/change their strategies based on the statefulness

Recover method // push 4 bytes

We should start adding support to recover the function id, like in

method "some_func()void"     // pushes 4-byte method selector on to the stack

Have a uniform output directory and filenames

Detectors and printers output .dot files. All these files are saved in the current directory by default. User can use --dest option to select the destination directory. When user uses this, all the files that are generated in that run will be saved in the selected directory. Most of the times users will not use the --dest option and even if used, user has to select the directory for every run of the tool.

We can use a standard directory structure for tealer output files.

All files will be saved in the default directory tealer-export.
Assuming the users run tealer on multiple contracts, each contract should have a separate sub-directory.

Each of the detectors could generate one or more dot files to represent the output. It makes sense to have a separate directory to save the output files of each of the detectors.

Printers either

Print to the terminal (human-summary)
Output single file (call-graph, cfg)
Output multiple files (subroutine-cfg, transaction-context)

Printers generating multiple files will have a separate directory. Printers generating the single files will save in the main directory.

Directory structure:

root_directory = "tealer-export"
contract_directory = f"teal.contract_name"
main_directory = f"{root_directory}/{contract_directory}"

printer_directory = f"{main_directory}" if printer_outputs_single_file else f"{main_directory}/{printer.NAME}"
detector_directory = f"{main_directory}/{detector.NAME}"

# filenames of detectors

for index, output in enumerate(outputs):
      file_location = f"{detector_directory}/{detector.NAME}-{index}.{file-extension}" # file-extension is dot

Contract Name:

# __main__.py fetch_contract function
def fetch_contract(args: argparse.Namespace) -> Tuple[str, str]:
    program: str = args.program
    network: str = args.network
    b32_regex = "[A-Z2-7]+"
    if program.isdigit():
        # is a number so a app id
        print(f'Fetching application using id "{program}"')
        # contract_name = f"app_{appid}", ex: app_991196662
        return get_application_using_app_id(network, int(program)), f"app_{program}"
    if len(program) == 52 and re.fullmatch(b32_regex, program) is not None:
        # is a txn id: base32 encoded. length after encoding == 52
        print(f'Fetching logic-sig contract that signed the transaction "{program}"')
        # contract_name = f"txn_{txn_id[:8].lower()}"
        return logic_sig_from_txn_id(network, program), f"txn_{program[:8].lower()}"
    if len(program) == 58 and re.fullmatch(b32_regex, program) is not None:
        # is a address. base32 encoded. length after encoding == 58
        print(f'Fetching logic-sig of contract account "{program}"')
        # contract_name = f"lsig_{address[:8].lower()}"
        return logic_sig_from_contract_account(network, program), f"lsig_{program[:8].lower()}"
    # file path
    print(f'Reading contract from file: "{program}"')
    try:
        contract_name = program[:-len(".teal")] if program.endswith(".teal") else program
        contract_name = contract_name.lower()
        with open(program, encoding="utf-8") as f:
            return f.read(), contract_name
    except FileNotFoundError as e:
        raise TealerException from e

Its better to remove --dest option if we use this directory structure.

Replace intc, bytec instructions with int and byte instructions

Teal allows declaring constants using intcblock which can be accessed later using intc {constant number} instruction. When detectors traverse the contract, they don't have access to integer value the intc refers to. This may result in False Positives as detectors can't reason about the values declared using intc. Same goes for bytecblock and bytec instructions.

Infer groupsize on paths

Along with group index(#75 ), infer groupsize from instructions like

global GroupSize
int VALUE
==

This allows to reason if we are checking the fields of all transactions in the group using gtxn KEY field implying we are checking the transaction field of current contract along with others.

	# A retsub? Assign the current open entry points list to it and then reset the list
	if isinstance(ins, Retsub):
	# pylint: disable=no-member
	ins.set_labels(entries[:])
	entries = []

	def detect_missing_tx_field_validations(
	entry_block: "BasicBlock", checks_field: Callable[["BlockTransactionContext"], bool]
	) -> List[List["BasicBlock"]]:
	"""search for execution paths lacking validations on transaction fields.

	`tealer.analyses.dataflow` calculates possible values a transaction field can have to successfully complete execution.
	Information is computed at block level. Each block's context(`BlockTransactionContext`)
	will have information on possible values for a transaction field such that
	- Execution reaches to the first instruction of block `B`
	- Execution reaches exit instruction of block `B` without failing.

	The goal of current detectors is to find if a given transaction field(s) can have a certain vulnerable value(s).
	e.g: is_updatable detector checks if the transaction can be UpdateApplication type transaction. Here, the vulnerable value for
	transaction type is "UpdateApplication". if "UpdateApplication" is a possible value then the contract is considered vulnerable and is reported.

	In order to find whether the given contract is vulnerable, it is sufficient to check the leaf blocks: whether transaction field can
	have target value and execution reaches end of a leaf block.
	However, The current output format requires "a execution path" from entry block to leaf block. And when a leaf block is vulnerable,
	it does not mean that all execution paths ending at that leaf block are vulnerable. It only means that atleast one path ending at
	that leaf block is vulnerable.

	For this reason, a traversal over the CFG is performed to enumerate execution paths. And for each block in the path it is checked
	whether transaction field(s) can have the target vulnerable value.

	Args:
	entry_block: entry basic block of the CFG
	checks_field: Given a block context, should return True if the target field cannot have the vulnerable value
	or else False.

	e.g: For is_updatable detector, vulnerable value is `UpdateApplication`.
	if `block_ctx.transaction_types` can have `UpdateApplication`, this method will
	return false or else returns True.
	Returns:
	Returns a list of vulnerable execution paths: none of the blocks in the path check the fields.
	"""

	def search_paths(
	bb: "BasicBlock",
	current_path: List["BasicBlock"],
	paths_without_check: List[List["BasicBlock"]],
	current_call_stack: List[Tuple[Optional["BasicBlock"], "Subroutine"]],
	current_subroutine_executed: List[List["BasicBlock"]],
	) -> None:
	"""
	Args:
	current_call_stack: list of callsub blocks and called subroutine along the current path.
	e.g current_callsub_blocks = [(Bi, S1), (Bj, S2), (Bk, S3), ...]
	=> 1. Bi, Bj, Bk, .. all end with callsub instruction.
	2. Bi called subroutine S1. S1 contains block Bj which called subroutine S2.
	S2 contains Bk which calls S3.
	S1 calls S2, S2 calls S3, ...
	current_subroutine_executed: list of subroutine basic blocks that were executed in the path.
	e.g if call stack is [(Bi, S1), (Bj, S2), (Bk, S3)]
	1. current_subroutine_executed[0] contains basic blocks of S1 that were executed before call to S2.
	2. current_subroutine_executed[1] contains basic blocks of S2 that were executed before call to S3.
	3. current_subroutine_executed[2] contains basic blocks of S3 that were executed upto this call.

	Main purpose of this argument is to identify loops. A loop contains a back edge to a previously executed block.
	if presence of loop is checked using `return True if bb in current_path else False`, detector misses paths that
	does not contain a loop.
	Mainly, when a subroutine is called twice in a path, the second call is treated as a back edge because the subroutine's blocks
	will be present in the current path (added in the first call.)
	e.g
	```
	callsub Sa // B0
	callsub Sa // B1
	int 1 // B2
	return
	Sa: // B3
	callsub Sb
	retsub // B4
	Sb: // B5
	retsub
	```
	before executing B1, current_path = [B0, B3, B5, B4], bb = B1
	after executing B1, current_path = [B0, B3, B5, B4, B1], bb = B3

	B3 is already present in the current_path and so is treated as a loop even though it's not.

	In order to identify loops properly, basic blocks that are executed as part of the current subroutines are tracked.
	For above example,
	At the start of the function
	bb = B0, current_path = [], current_call_stack = [(None, Main)], current_subroutine_executed = [[]]

	At the end of the function before calling search_paths again
	bb = B0, current_path = [B0], current_call_stack = [(None, Main)], current_subroutine_executed = [[B0]]

	# B0 calls Sa. Calling a subroutine adds a item to current_call_stack and current_subroutine_executed.
	Start
	bb = B3, current_path = [B0], current_call_stack = [(None, Main), (B0, Sa)], current_subroutine_executed = [[B0], []]
	End
	bb = B3, current_path = [B0, B3], current_call_stack = [(None, Main), (B0, Sa)], current_subroutine_executed = [[B0], [B3]]

	# B3 calls Sb. Calling a subroutine adds a item to current_call_stack and current_subroutine_executed.
	Start
	bb = B5, current_path = [B0, B3], current_call_stack = [(None, Main), (B0, Sa), (B3, Sb)], current_subroutine_executed = [[B0], [B3], []]
	End
	bb = B5, current_path = [B0, B3, B5], current_call_stack = [(None, Main), (B0, Sa), (B3, Sb)], current_subroutine_executed = [[B0], [B3], [B5]]

	# B5 returns the execution from Sb. Returning from a subroutine removes a item from current_call_stack, current_subroutine_executed.
	Start
	bb = B4, current_path = [B0, B3, B5], current_call_stack = [(None, Main), (B0, Sa)], current_subroutine_executed = [[B0], [B3]]
	End
	bb = B4, current_path = [B0, B3, B5, B4], current_call_stack = [(None, Main), (B0, Sa)], current_subroutine_executed = [[B0], [B3, B4]]

	Start
	bb = B1, current_path = [B0, B3, B5, B4], current_call_stack = [(None, Main)], current_subroutine_executed = [[B0]]
	End
	bb = B1, current_path = [B0, B3, B5, B4, B1], current_call_stack = [(None, Main)], current_subroutine_executed = [[B0, B1]]

	....
	"""
	# check for loops
	if bb in current_subroutine_executed[-1]:
	logger_detectors.debug(
	f"Loop Path: current_full_path = {current_path}\n, current_subroutine_executed = {current_subroutine_executed[-1]}, current_block: {repr(bb)}"
	)
	return

	if validated_in_block(bb, checks_field):
	logger_detectors.debug(
	f"Validated Path: current_full_path = {current_path}\n, current_block: {repr(bb)}"
	)
	return

	current_path = current_path + [bb]

	# leaf block
	if len(bb.next) == 0:
	logger_detectors.debug(f"Vulnerable Path: current_full_path = {current_path}")
	paths_without_check.append(current_path)
	return

	# do we need to make a copy of lists in [:-1]???
	current_subroutine_executed = current_subroutine_executed[:-1] + [
	current_subroutine_executed[-1] + [bb]
	]
	if isinstance(bb.exit_instr, Callsub):
	teal = bb.teal
	assert teal is not None
	called_subroutine = teal.subroutines[bb.exit_instr.label]
	# check for recursion
	already_called_subroutines = [frame[1] for frame in current_call_stack]
	if called_subroutine in already_called_subroutines:
	# recursion
	return
	current_call_stack = current_call_stack + [(bb, called_subroutine)]
	current_subroutine_executed = current_subroutine_executed + [[]]

	if isinstance(bb.exit_instr, Retsub):
	# if this block is retsub then it returns execution to the return
	# point of callsub block. return point is the next instruction after the callsub
	# instruction.
	(callsub_block, _) = current_call_stack[-1]
	assert callsub_block is not None
	return_point = callsub_block.sub_return_point
	if return_point is not None and return_point in bb.next:
	search_paths(
	return_point,
	current_path,
	paths_without_check,
	current_call_stack[:-1], # returning from a subroutine
	current_subroutine_executed[:-1],
	)
	else:
	for next_bb in bb.next:
	search_paths(
	next_bb,
	current_path,
	paths_without_check,
	current_call_stack,
	current_subroutine_executed,
	)

	teal = entry_block.teal
	assert teal is not None
	paths_without_check: List[List["BasicBlock"]] = []
	search_paths(entry_block, [], paths_without_check, [(None, teal.main)], [[]])
	return paths_without_check

	def _check_groupsize(
	self,
	bb: BasicBlock,
	current_path: List[BasicBlock],
	paths_without_check: List[List[BasicBlock]],
	used_abs_index: bool,
	) -> None:
	"""Find execution paths with missing GroupSize check.

	This function recursively explores the Control Flow Graph(CFG) of the
	contract and reports execution paths with missing GroupSize
	check.

	This function is "in place", modifies arguments with the data it is
	supposed to return.

	Args:
	bb: Current basic block being checked(whose execution is simulated.)
	current_path: Current execution path being explored.
	paths_without_check:
	Execution paths with missing GroupSize check. This is a
	"in place" argument. Vulnerable paths found by this function are
	appended to this list.
	used_abs_index: Should be True if absolute index in `current_path`.
	"""

	# check for loops
	if bb in current_path:
	return

	if MAX_GROUP_SIZE not in bb.transaction_context.group_sizes:
	# GroupSize is checked
	return

	if not used_abs_index:
	used_abs_index = self._accessed_using_absolute_index(bb)

	current_path = current_path + [bb]
	if not bb.next:
	# leaf block
	if used_abs_index:
	# accessed a field using absolute index in this path.
	paths_without_check.append(current_path)
	return
	for next_bb in bb.next:
	self._check_groupsize(next_bb, current_path, paths_without_check, used_abs_index)

	class DetectorType(ComparableEnum):
	STATELESS = 0
	STATEFULL = 1
	STATEFULLGROUP = 2

crytic / tealer Goto Github PK

tealer's People

Contributors

Stargazers

Watchers

Forkers

tealer's Issues