google / go-flow-levee
License: Apache License 2.0
Describe the bug
While iterating on #278, I noticed that internal/pkg/levee/testdata/test-config.yaml will pass all tests with the following Source specification removed:
Sources:
# ...
- Package: "levee_analysistest/example/core"
Type: "SourceManipulator"
This was introduced in #264.
To Reproduce
Remove or comment out the above configuration. Observe that tests pass. See #278 first test run for demonstration.
func TestDeletingFromTaintedMapDoesNotTaintTheKey(key string, sources map[string]core.Source) {
delete(sources, key)
// TODO: no report should be produced here
core.Sink(key) // want "a source has reached a sink"
}
The simplest way to handle this would be to check if the delete's associated Call's StaticCallee is the Builtin function delete, and avoid propagating if that is the case.
Particularly as we examine cross-function and whole-program analysis, it will be useful to be able to benchmark our overall runtime. In particular, anecdotal evidence suggests that the construction of the SSA graph by buildssa.Analyzer constitutes the majority of our runtime. It would be good to put actual data behind such claims, identify any significant performance regressions, and evaluate the runtime costs of future work.
This issue is related to #161.
Consider the following test case:
func TestPanicTypeAssertSource(i interface{}) {
s := i.(core.Source)
_ = s
// The dominating type assertion would panic if i were not a source type.
core.Sink(i) // TODO(210) want "a source has reached a sink"
}
Clearly, the only way that the call to core.Sink can be reached is if i holds a value of type core.Source. The above code is therefore unsafe.
The following case is more ambiguous, and currently we do not expect a report to be produced:
func TestOkayTypeAssert(i interface{}) {
s, ok := i.(core.Source)
_, _ = s, ok
// The dominating type assertion will not panic.
core.Sink(i)
}
Since the comma-ok form is used, it is possible that the type assertion failed and that i is in fact safe to sink. However, there is certainly a possibility that i contains a value of type core.Source, and therefore this code should perhaps be considered unsafe.
Currently, any interaction between a source and a collection taints the entire collection. Explicit access of non-source elements of a tainted collection should not necessarily be considered tainted. For example:
func Foo(s Source, key string) {
m := map[interface{}]interface{} {"one": 1, "two": 2}
m["source"] = s // this taints m
Sink(m) // this emits a Diagnostic, as well it should
Sink(m["one"]) // this currently emits a Diagnostic, because m is tainted, but const access of this non-Source element should not necessarily emit.
Sink(m[key]) // this is ambiguous, so this should emit a Diagnostic
m["source"] = nil // Related to #92, the propagated taint has been removed
Sink(m) // since the only taint has been removed, this should no longer emit Diagnostic.
For reference: #92
Slices must be handled carefully, as the positions can shift if a user is using a slice as, e.g., a queue. See SliceTricks for potential pitfalls. Any ambiguity should propagate taint rather than halting it.
Currently, when a tainted value is placed in a collection, the entire collection is considered to be tainted. This is not unreasonable considering that if a collection containing a tainted value reaches a sink, the tainted value will almost certainly "reach the sink" as well, since e.g. printing a collection implies printing its contents.
This does have the unfortunate side effect that any key access on the collection will yield a value that is considered to be tainted. For example, analyzing the following instructions yields a diagnostic, even though the value that reaches the sink is not really tainted:
m := map[string]interface{}{"source": s}
core.Sink(m[""]) // diagnostic: "a source has reached a sink"
This sequence of instructions, in which a map is properly "sanitized" (ahem...) also yields a diagnostic:
m := map[string]interface{}{"source": s}
delete(m, "source")
core.Sink(m) // diagnostic: "a source has reached a sink"
A similar "sanitization" effect could be obtained by setting a slice or map key to nil.
We would like to be able to keep track of which keys/values in a collection are actually tainted, in order to avoid producing false positives.
We currently do not traverse through Allocs, except when they are slice-like:
source.go
case *ssa.Alloc:
// An Alloc represents the allocation of space for a variable. If a Node is an Alloc,
// and the thing being allocated is not an array, then either:
// a) it is a Source value, in which case it will get its own traversal when sourcesFromBlocks
// finds this Alloc
// b) it is not a Source value, in which case we should not visit it.
// However, if the Alloc is an array, then that means the source that we are visiting from
// is being placed into an array, slice or varags, so we do need to keep visiting.
if _, isArray := utils.Dereference(t.Type()).(*types.Array); isArray {
s.visitReferrers(n, maxInstrReached, lastBlockVisited)
}
At the time that this was implemented, avoiding Allocs allowed us to avoid many false positives. With changes to the traversal code over time, it seems that this check may need to be revisited, as it can cause false negatives:
func TestPointerToNamedStructIsTainted(s core.Source, i core.Innocuous) {
// This test is failing because &x introduces an Alloc,
// and we don't traverse through non-array Allocs
colocateInnocPtr(s, &i)
core.Sink(i) // TODO(212) want "a source has reached a sink"
}
At the time of writing, only a single false positive is obtained:
func TestSelectSourceAndInnoc(sources <-chan core.Source, innocs <-chan core.Innocuous) {
select {
case s := <-sources:
core.Sink(s) // want "a source has reached a sink"
case i := <-innocs:
core.Sink(i) // an undesired report is produced here
}
}
In this case the incorrect behavior is likely not due to traversal through Allocs, but rather a mishandling of the SSA instructions associated with select statements.
This issue relates to this test case:
func TestTaintingInterfaceValueDoesNotTaintContainedValue(s *core.Source, str string) {
var x interface{} = str
colocate(s, x)
// TODO(209): no report should be produced below
core.Sink(str) // want "a source has reached a sink"
}
func colocate(s *core.Source, x interface{}) {}
See #200 for additional context.
It would likely be useful for users to be able to customize all or part of the report message. Currently, the report message is generic: "<sink position>: a source has reached a sink: <source position>". This is an adequate message in general, but users may wish to add something to it to make it more actionable, e.g. by specifying what a developer who sees the message should do about it (this is just an example): "The sensitive value at <source position> must be sanitized before reaching the call at <sink position>".
Consider the following test which is expected to create a report but does not:
func Oops(source core.Source) {
go core.Sink(source) // want "a source has reached a sink"
}
(We are assuming that core.Source has been configured as a source and core.Sink has been configured as a sink.)
Currently, when the analyzer is used through go vet, if no config flag is provided or the provided path is invalid, the analyzer produces this error message:
levee: failed prerequisites: fieldtags, source
This message does not provide any hints as to its cause, which will likely leave users scratching their heads. (I know I've scratched my own head more than once when dealing with this issue.)
If the config can't be read, the fieldtags and source analyzers will fail:
// fieldtags/analyzer.go
conf, err := config.ReadConfig()
if err != nil {
return nil, err
}
Then, in the analysis package, running levee fails because its dependencies failed:
// Report an error if any dependency failed.
var failed []string
for _, dep := range act.deps {
if dep.err != nil {
failed = append(failed, dep.String())
}
}
if failed != nil {
sort.Strings(failed)
act.err = fmt.Errorf("failed prerequisites: %s", strings.Join(failed, ", "))
return
}
Note that when the analyzer is run directly (i.e. not through go vet), the error message is much better:
error reading analysis config: open config.yaml: no such file or directory
This message may still be a bit confusing to users who did not specify a config at all. The "open config.yaml" part is due to the fact that the config flag defaults to config.yaml.
Within a type switch, non-default blocks have definite type information that remains hidden behind the interface type. That information could be carried within each case block and the blocks dominated by that case entry for better reporting. This could possibly also be applied to type assertions, though nontrivial effort would be required to support "typ, ok := foo.(bar)" style type assertions.
A sample of (suppressed) test cases will be introduced by #149.
This issue replaces some TODOs that have been in the code for some time now:
levee.go:50
if b == fn.Recover {
// TODO Handle calls to sinks in a recovery block.
continue // skipping Recover since it does not have instructions, rather a single block.
}
source.go:323
if b == fn.Recover {
// TODO Handle calls to log in a recovery block.
continue
}
Currently, if a sink is called in a function's Recover block, we will not catch it. This is an issue, since such a sink call may leak source data, just like any other sink call.
Currently, configuration permits exclusion of functions from levee analysis (but not upstream analyzers) via the Exclude: clause. However, a consumer may wish to exclude analysis based on filename. In particular, consumers that use a vendor/ directory may wish to exclude analysis and possible findings from packages outside their control. These vendor/ packages do not contain vendor in the package's path, so exclusion would currently need to be done on a per-package basis. While functional, this presents a potential pain point for consumers.
analysistest expects testdata to act as a fake GOROOT/GOPATH. We currently have various tests at .../testdata/src/example.com/... to test the various analyzers.
IntelliJ expects our project to be configured with Go modules. This fake root is opaque to IntelliJ. As a consequence, the IDE can't resolve the linkage for packages within testdata. E.g.,
This conflict makes development of tests stickier than it needs to be. Additionally, while I believe I'm the only one who does this, it also prevents general execution of an analyzer from the command-line. I am partial to using ssadump when investigating issues, but this is prevented in any file with an unresolved import.
I propose adding a trivial go.mod to each testdata/src/<some shared root> so that our IDEs can resolve these paths. See #273 and the associated branch for an example within config/testdata. This allows proper linking within IntelliJ while retaining test compatibility.
For compatibility with the builder within analysistest, we would need each testdata/src/... to be self-contained. Cross-package imports are permitted, but not cross-module imports. This requires a shared root within testdata/src/. #273 demonstrates this as testdata/src/github.com/. In practice, we should avoid collision with real Go modules and adopt testdata/src/<identifier>, e.g. testdata/src/levee_configtest/....
I'd love to hear your thoughts about adding this as a convention going forward. At your convenience, please pull my pseudo-module-testing branch and see if this works with your own workflow. (I had to execute config_test to trigger compiling and linkage.)
I'd love to hear your thoughts about adding this as a convention going forward. At your convenience, please pull my pseudo-module-testing
branch and see if this works with your own workflow. (I had to execute config_test
to trigger compiling and linkage.)
Document in README that users are welcome to open an Issue when they experience unexpected behavior, false positives or false negatives, etc.
This issue was prompted by discussion on #264.
Currently, specifying a source type does not require specifying a field, so any named type can be identified as a Source in the configuration (using the package and type keys). Although we do not know of any users leveraging this currently, it is easy to imagine how/why users might want to configure sources that are not structs:
type Secret string
type Secret interface {
ASecret()
}
Any named type can have methods declared on it. For structs, sensitive information is ultimately stored in fields. Since propagating taint to the results of any source method was too coarse, the fieldpropagator analyzer was introduced to identify methods on struct sources that could return values tainted by a sensitive field. Consider, e.g.:
type Source struct {
SomethingPublic string
SomethingPrivate string `levee:"source"`
}
func (s *Source) DoesNotReturnATaintedValue() string {
return s.SomethingPublic
}
func (s *Source) ReturnsATaintedValue() string {
return s.SomethingPrivate
}
In the first case, the return value isn't tainted (more on this later), because the field being returned is not a source. We do want the return value of the second method to be tainted, because it returns a source field. We have tests validating this behavior.
Currently, the behavior with respect to methods on Source types is:
For methods on values that are not Sources, we always consider the returned values to be tainted, whether or not the receiver value itself is tainted. Since we don't scrutinize these methods' bodies at all, we assume that they always propagate taint, just like we do for regular functions.
Coming back to this statement:
If the method is a fieldpropagator, then it always returns a tainted value, and it will get its own propagation (i.e., the call will be identified as a source and a propagation will start from there).
If a source type has no fields, then its methods will not be identified as field propagators. However, I believe that methods on non-struct source types should always propagate taint. Indeed:
Given this reasoning, I believe that we should:
- rename the fieldpropagator analyzer to accurately reflect this behavior. Maybe something like methodpropagator.
Further discussion:
At the time of writing the fieldpropagator analyzer (and before), we were concerned about false positives. In general, there's no harm in a simple getter that returns a non-source field. However, in some cases it is possible that non-source fields may be tainted, which effectively turns any method that returns such a field into a propagator. Continuing with the previous example:
func Oops(s *Source) {
s.SomethingPublic = s.SomethingPrivate
Sink(s.DoesNotReturnATaintedValue()) // OOPS, actually, it *does* return a tainted value, and it just reached a sink
}
Currently, we do not have a good way of modeling something like this. The fieldpropagator analyzer merely identifies methods as being propagators or not. It does not say which fields could be incorporated in any return values. In any case, the propagation code does not precisely model the taint status of fields either, because different FieldAddrs referring to the same field are different ssa.Values, so tainting one reference to a field does not taint the others.
Furthermore, consider this other example:
func (s *Source) MoreOops() {
s.SomethingPublic = s.SomethingPrivate
}
This behavior would also be very difficult to model using our existing abstractions.
Ultimately, what I believe this points to is:
- The fieldpropagator analyzer is currently doing a very limited form of IPA (interprocedural analysis), which only captures whether or not the function always returns a value derived from a source field. Implementing a more complete form of IPA (e.g., function summaries describing the behavior of functions in more detail) will likely help alleviate some of the issues outlined here.
- The propagation code does not precisely model the taint status of fields as ssa values.
This issue falls under the umbrella of #101.
Some ssa Instructions, for whatever reason, return token.NoPos when asked for their position (instruction.Pos()). If such an instruction is identified as a Source and we produce a report for it, the report will say "... source: -", which is rather unhelpful.
I first noticed this when running levee on kubernetes. A report was being produced where a Source was an ssa.Extract instruction. According to my investigations, (ssa.Extract).Pos() always returns NoPos.
I wrote a small analyzer to find which instructions sometimes return NoPos, and which ones always return NoPos. Running on kubernetes, these are my findings:
The following instructions had no position at least 1 time:
*ssa.Alloc
*ssa.BinOp
*ssa.Call
*ssa.ChangeType
*ssa.Convert
*ssa.Field
*ssa.FieldAddr
*ssa.IndexAddr
*ssa.MakeClosure
*ssa.MakeInterface
*ssa.Panic
*ssa.Phi
*ssa.Return
*ssa.Slice
*ssa.Store
*ssa.TypeAssert
*ssa.UnOp
Many of these instructions could be Sources. (Some clearly could not.) I have not deeply investigated why these instructions sometimes have a position and sometimes don't. At this time, I am not convinced that it is worth investigating.
The following instructions had no position every time:
*ssa.ChangeInterface
*ssa.Extract
*ssa.If
*ssa.Jump
*ssa.Next
*ssa.RunDefers
In my opinion, it would be sufficient to handle Extracts, since none of the other instructions could be Sources, except maybe Nexts, but those will always occur with related Extracts (I am 90% confident of this).
I think we should improve the reporting at least for Extracts, and possibly for other instructions too. For other instructions, I believe a deeper investigation would be needed to ensure that special handling is actually required, i.e. that other instructions could simultaneously 1. be identified as Sources, and 2. have no position.
For Extracts, I believe a reasonable approximation would be to use the position of the associated Tuple. PR here: #107
Consider the following:
type Source struct {
secret string // `levee:"source"`
}
func Bar(s Source) {
s.secret = ""
Sink(s) // This should not emit a Diagnostic.
}
Explicitly setting to zero-values all fields containing source data should be recognized as sanitization.
#283 is adding the ability to suppress reports at call sites using a code comment. In most cases, the suppression behaves as expected. There is one case where the (lack of) suppression behavior is surprising:
// TODO(#284): we don't actually want a report here
fmt.Println(
core.SinkAndReturn(s), // levee.DoNotReport // want "a source has reached a sink"
)
This is because in the comment map, the "// levee.DoNotReport" comment is (unexpectedly) associated with the s identifier, not the core.SinkAndReturn call.
Since s is an argument to core.SinkAndReturn, we could find the identifier by traversing the tree under the CallExpr that corresponds to core.SinkAndReturn and find the suppressing comment on the identifier. However, looking for suppressing comments on the arguments may cause misleading behavior, as in other cases it would suggest that it is possible to suppress a report for a specific argument, which isn't the case.
See #283 (comment) for additional discussion.
Here is a bit of code that can be used to dump the comment map for a file, which may be handy in investigating this issue.
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
	"sort"
)

func main() {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, os.Args[1], nil, parser.ParseComments)
	if err != nil {
		panic(err)
	}
	// type CommentMap map[Node][]*CommentGroup
	cm := ast.NewCommentMap(fset, f, f.Comments)
	nodes := make([]ast.Node, 0, len(cm))
	for node := range cm {
		if node.Pos() == f.Pos() {
			continue
		}
		nodes = append(nodes, node)
	}
	sort.Slice(nodes, func(i, j int) bool { return nodes[i].Pos() < nodes[j].Pos() })
	for _, node := range nodes {
		comments := cm[node]
		fmt.Printf("%#v\n", node)
		fmt.Println(fset.Position(node.Pos()).Line)
		for _, c := range comments {
			fmt.Println(fset.Position(c.Pos()).Line)
			fmt.Println(c.Text())
		}
	}
}
A user may wish to configure identification of sources, sinks, and sanitizers based on many criteria beyond package, type, and method names.
See previous discussion for context.
Identification of sources, sinks, and sanitizers
Much of our work is directly with ssa.Values and ssa.Instructions. Whether those values or instructions represent one element or another, the configuration to identify these members should be as uniform as possible. In that light, specification should follow:
ValueSpecifier
A ValueSpecifier marks an ssa.Value, e.g., for identification as a source or as the safe result of a sanitizer.
Values should be specifiable via any or all of:
- Specifier name: for identification during reporting
- Type / Field: package path and type name, optional field name
- Field tags
- Scope: local variable, free variable, or global
- Context: within a specific package or function call. See CallSpecifier below.
- Is a reference
- Is "reference-like," e.g., slices, maps, chans
- Const value (e.g., "PASSWORD")
- Argument position (for use below)
CallSpecifier
A CallSpecifier marks an *ssa.Call, e.g., as a sink or sanitizer.
Calls should be specifiable via any or all of:
- Specifier name: for identification during reporting
- Symbol: package path, function name, optional receiver name
- Context: a CallSpecifier indicating scope, such as package or specific functions
- Based on argument value and position (ValueSpecifier)
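To make the shape concrete, here is a purely hypothetical YAML sketch of what such specifiers might look like; every key name below is invented for illustration, and nothing here is implemented:

```yaml
ValueSpecifiers:
  - Name: password-literal       # used in reports
    Package: "example.com/auth"  # context: only within this package
    Type: "Credentials"
    Field: "Password"
    FieldTag: 'levee:"source"'
    Scope: local
    IsReference: true
CallSpecifiers:
  - Name: log-sink
    Package: "log"
    Function: "Printf"
    Receiver: ""                 # optional receiver name
    ArgValue:                    # a sink only when a matching value is passed
      Specifier: password-literal
      Position: 1
```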
Currently, analysis is executed on all provided packages, as well as the transitive closure of all Go package dependencies.
A user may consider third-party packages, generated code, or specifically sanctioned packages to be out of scope for analysis. While upstream analysis on these may be necessary (e.g. buildssa), we should bypass our own analysis of such specified packages as much as possible.
Ideally, this bypass should occur prior to entry into the various analysis.Analyzers to reduce the size of the transitive dependency graph.
Consider this function:
func TestSanitizedSourceDoesNotTriggerFindingWhenTypeAsserted(s *core.Source) {
sanitized := core.Sanitize(s)[0].(*core.Source)
core.Sinkf("Sanitized %v", sanitized)
}
The relevant part of the SSA graph is:
The issue here is that the TypeAssert is not identified as being produced by a sanitizer according to isProducedBySanitizer. We still don't produce a report here, because the current propagation code finds the sanitizer and determines that the sinked value is dominated by the sanitizing call.
Still, this incorrect behavior may lead to false positives.
The following need to be included or updated in our documentation:
- Given a configured source type Foo, what other types do we implicitly consider sources?
The following test cases were introduced by #195 and are currently failing:
func TestCallWithStructReferenceTaintsEveryField(h Headers) {
fooByPtr(&h) // without interprocedural assessment, foo can do anything, so this call should taint every field on h
core.Sink(h.Name) // TODO want "a source has reached a sink"
core.Sink(h.Other) // TODO want "a source has reached a sink"
}
func TestCallWithStructValueDoesNotTaintNonReferenceFields(h Headers) {
foo(h) // h is passed by value, so only its reference-like fields should be tainted
core.Sink(h.Name)
core.Sink(h.Other) // TODO want "a source has reached a sink"
}
type Headers struct {
Name string
Auth map[string]string `levee:"source"`
Other map[string]string
}
func fooByPtr(h *Headers) {}
func foo(h Headers) {}
The crux of the issue here is that since one of Headers's fields is a source, that source will be in scope inside the callee, and so in theory it could taint every other field. Without interprocedural analysis, we have to assume that every field will be tainted.
In fact, we have to do this inference whenever a struct with a tainted field is passed to a call, not just when a field is a source.
See #195 for additional discussion.
Consider this case:
func Test(str *string, src core.Source) {
fmt.Sscan(src.Data, &str)
core.Sink(str) // TODO() want "a source has reached a sink"
}
If you're not familiar with Sscan (I wasn't), it essentially splits its first argument (a string) on whitespace and places the resulting strings into its subsequent arguments (which have type ...interface{}). (See here for an example of using the closely related Sscanf function.)
So, clearly, if the string is tainted, the varargs slice can also be tainted. This slice is implicit, however: to construct it, in SSA form a slice is made and the arguments are placed into it via Store instructions. Since for Stores we only propagate from the value being stored to the storage location, we can't currently handle this case.
One solution would be to allow taint to propagate backwards through stores. This is likely to cause false positives, however.
Currently, the analysis performed by the fieldpropagator analyzer to determine if a given method on a Source type is a field propagator is very basic, and it only works for simple getters. As just one example, the following method is not identified as a field propagator:
func (s Source) DerivedFromData() string { // want DerivedFromData:"field propagator identified"
derived := fmt.Sprintf("%s", s.data)
return derived
}
The main analyzer would identify derived as being tainted by s.data, but the fieldpropagator analyzer does not. Currently, we do not propagate taint to the return value of a method on a Source type unless the method is a field propagator. This means that we would not find a potential leak if it involves a field propagator like the one above.
I think both analyzers should use the same propagation logic.
(This will also apply to interprocedural analysis when we get there.)
At present, reporting of Diagnostics is a generic "a source has reached a sink" message with source and sink positions included. This can be useful in simple cases, but can be opaque in the event that a source has reached a sink via taint through many propagating steps.
While a full trace of propagation steps may be unnecessary, diagnostic reporting should be improved to the point that a developer reading a diagnostic does not need familiarity with SSA, taint propagation, or go-flow-levee to remediate the failure. For instance, a more general message could read: "<source name and position> has exceeded its data boundary by reaching <sink name and position>. Arguments to <sink name> must be sanitized."
PR #178 added explicit handling for each node type.
For most node types, it is obvious what the correct behavior is, e.g. when doing a MapUpdate, we shouldn't be traversing to the Key or the Value, only the Map.
For other node types, the correct behavior is not obvious.
Here are the cases that need further investigation, and likely, more tests:
Currently, when building a graph of a source, we stop following the graph when a sanitizer is encountered.
However, there is no guarantee that such a sanitizer will dominate the yet-to-be-discovered sinks.
Therefore, we should build the graph beyond sanitizers.
Once we know all the sinks, we could trim the graph by removing sinks that are dominated by a sanitizer.
In order to minimize the risk of configuration errors, we would like the unmarshalling of the configuration to be strict. Specifically, it should disallow unknown fields in matchers. This avoids errors where a typo in a key name leads to an incorrect configuration. We don't want typos to lead to silent failures, e.g. a matcher is broken and the user thinks their code is safe when it is not.
There are already two test cases covering this behavior to some extent, in config/matcher_test.go:
{
desc: "Unmarshaling is strict",
yaml: `
Blahblah: foo
PackageRE: bar`,
shouldErrorOnLoad: false, // TODO true
},
(It is the same case, in the tests for two different matchers.)
Currently, the following can cause a report:
core.Sinkf("%T", source)
There is actually no issue in the above example: the type of a sensitive value is not sensitive.
We can likely borrow some logic from the printf analyzer.
If we want to be especially diligent, we could go through all the formatting verbs and determine how to handle each one. From a cursory glance, I think the relevant ones are %T, and maybe %s and %q. For the latter two, we could e.g. do some kind of analysis of the appropriate String() method for the corresponding argument.
levee treats panic as a sink, but does not create a report for the cases below when AllowPanicOnTaintedValues=false (the default config):
func Oops() {
go panic(Source{"sensitive data"})
}
func OopsIDidItAgain() {
defer panic(Source{"sensitive data"})
}
(We are assuming that Source has been configured as a source and Sink has been configured as a sink.)
Consider the following test case:
func TestTaintedAndSinkedInDifferentBranches(objects chan interface{}) {
select {
case objects <- core.Source{}:
case s := <-objects:
// TODO(211) want no report here, because objects is only tainted if the
// other branch is taken, and only one branch can be taken
core.Sink(s) // want "a source has reached a sink"
}
}
Given the semantics of select, one of two things can happen:
- we put a source on the objects channel
- we receive an interface{} value from the objects channel and we sink it
As indicated by the TODO, it is therefore not possible that the channel be tainted and that a tainted value is obtained from it in the same call to this function.
However, this code may still be unsafe, because the fact that one of the cases puts a source on the channel indicates that the channel may contain sources in general. Consider e.g. a case where this function is called concurrently in two different goroutines. Such a case would be unsafe code. Handling this behavior in full generality within the scope of a single function would require us to model every interaction with the channel. Alternatively, this could be handled via interprocedural analysis (#99).
This issue is somewhat related to #161 and #210, in the sense that it is related to a behavior where we may want to leverage
Similar to #98, consider the following:
func Sinker(st string) {
Sink(st)
}
func Foo(s Source){
Sinker(s.String())
}
During analysis of Sinker, the concrete type of st is not a concern when it reaches a sink. However, without also configuring Sinker as a sink, the taint propagation from s -> st := s.String() -> Sink(st) is lost.
Currently, we are avoiding booleans, because it is quite clear that those can't meaningfully be tainted. There are other Basic types we may want to avoid though, among the list of types.BasicKinds:
type BasicKind int
const (
Invalid BasicKind = iota // type is invalid
// predeclared types
Bool
Int
Int8
Int16
Int32
Int64
Uint
Uint8
Uint16
Uint32
Uint64
Uintptr
Float32
Float64
Complex64
Complex128
String
UnsafePointer
// types for untyped values
UntypedBool
UntypedInt
UntypedRune
UntypedFloat
UntypedComplex
UntypedString
UntypedNil
// aliases
Byte = Uint8
Rune = Int32
)
(this is from go/types/type.go
)
For additional context, see this discussion: #178 (comment)
We may wish to be able to detect incorrect uses of sanitization by value. Consider the following sanitizer:
func Sanitize(s Source) Source {
s.Data = "<redacted>"
return s
}
Now consider the following incorrect use of this sanitizer:
func Oops(s Source) {
Sanitize(s)
Sink(s)
}
The issue is rather obvious: since Sanitize receives a copy of s, the original s is not sanitized.
Although the issue is obvious, we currently do not produce a report for it. For additional context, here is the correct way to use a "value" sanitizer:
func Value(s Source) {
san := Sanitize(s)
Sink(san)
}
The reason the incorrect case above does not yield a report is that our checks to determine whether a source was sanitized before reaching a sink rely on domination of the sink instruction by the sanitizer instruction. This allows us to handle cases like this:
func Pointer(s Source) {
Sanitize(&s)
Sink(s)
}
So it seems we will need some way to treat sanitizers differently depending on whether they take their argument by pointer or by value.
Consider this test case:
func TestSourcePointerAssertedFromParameterEface(e interface{}) {
	s := e.(*core.Source)
	core.Sink(s) // TODO want "a source has reached a sink"
}
The SSA is:
func TestSourcePointerAssertedFromParameterEface(e interface{})
0: entry
0(*ssa.TypeAssert ): t0 = typeassert e.(*example.com/core.Source)
1(*ssa.Alloc ): t1 = new [1]interface{} (varargs)
2(*ssa.IndexAddr ): t2 = &t1[0:int]
3(*ssa.MakeInterface ): t3 = make interface{} <- *example.com/core.Source (t0)
4(*ssa.Store ): *t2 = t3
5(*ssa.Slice ): t4 = slice t1[:]
6(*ssa.Call ): t5 = example.com/core.Sink(t4...)
7(*ssa.Return ): return
If it were a CommaOk TypeAssert, there would be an Extract, so the Source would be detected in sources.go:sourcesFromBlocks.
If it were a TypeAssert to a Source value (not a pointer), there would be an Alloc, so again the source would be detected in sources.go:sourcesFromBlocks.
The fieldpropagator analyzer currently does not identify fieldpropagators that get their fields from another fieldpropagator, e.g.:
type Source struct{ secret string }

func (s Source) Secret() string {
	return s.secret
}

func (s Source) FormattedSecret() string {
	return "the secret is: " + s.Secret()
}
The above example is admittedly a bit frivolous. I'm not sure how common these are.
The analysis is fact-based, so if a method has already been analyzed, its status as a fieldpropagator can be checked by calling pass.ImportObjectFact(obj, &isFieldPropagator{}).
Let's deconstruct that Issue title. Consider the following:
type Source struct {
	secret string // `levee:"source"`
}
type Foo Source
The type Source will be identified as a source due to the field tag. The type Foo has the underlying type Source, and should also be identified as a source type without explicit configuration.
In this instance, the underlying struct literal contains a field tag, which takes part in type identity. As such, Foo.secret also carries the field tag, and Foo will be identified as a source type.
However, consider instead
type Source struct { // Suppose this source type is identified by package path and name regexp
	secret string
}
type Foo Source
Because Source is matched by regexp, if that regexp does not also match Foo, then Foo will currently not be considered a source. This should be corrected by this issue.
Implementation detail: if the underlying struct literal were marked as a source type, rather than the named type Source, this would propagate more easily.
A user may have highly specific considerations outside the scope of the configuration we provide, perhaps interacting with ssa, types, or library-specific packages. We should consider exposing an entrypoint between our analysis steps for users to provide their own identification methods.
If there is user interest in this feature, please respond here with a comment or your choice of affirming :reaction:.
This is a continuation of #87.
At time of writing, reporting of diagnostics can be suppressed at a function-level granularity. While functional, this has the potential to suppress a wider breadth of reports than intended.
Suppression via code comment, e.g. nolint:levee <Justification>, would allow fine-grained suppression via a comment immediately preceding a sink call. Optionally, we may consider suppression of source identification or taint propagation.
If analysis is integrated to become a blocking test, users will be dismayed if a false positive blocks progress. It should be possible to bypass these errors.
This could be done via an explicit allow-list (inclusive-)or an in-code comment in the // NOLINT style. The former should perhaps be favored given the possibility of out-of-scope findings, e.g. a finding in a user's third-party dependencies or in generated code.
Consider the following:
type Source struct {
	secret string // `levee:"source"`
}

type Container struct {
	ID      int
	content *Source // This field should implicitly be marked similarly as a source
}
While explicit instantiation c := Container{ID: 10, content: &Source{secret: "my-token"}} should currently be detected via taint propagation to the struct's field, this context may be lost outside the instantiation of the wrapper.
The following test case was introduced by #195 and is currently failing:
func TestTaintFieldOnNonSourceStruct(s core.Source, i *core.Innocuous) {
	i.Data = s.Data
	core.Sink(i)      // TODO want "a source has reached a sink"
	core.Sink(i.Data) // TODO want "a source has reached a sink"
}
This behavior is inadequate: it should be possible to taint a field on a non-Source struct, and tainting this field should taint the struct.
The following test case is also failing:
func TestTaintNonSourceFieldOnSourceType(s core.Source, i *core.Innocuous) {
	s.ID, _ = strconv.Atoi(s.Data)
	core.Sink(s.ID) // TODO want "a source has reached a sink"
}
It should be possible to taint a non-source field on a source type.
This is related to this piece of propagation code:
if !prop.config.IsSourceField(typPath, typName, fieldName) && !prop.taggedFields.IsSourceFieldAddr(t) {
	return
}
Currently, we stop traversing when we reach a field unless the field is a Source.
See #195 for additional discussion.
This issue describes some bugs involving sanitization that could lead to false positives.
Consider the following case:
func TestTaintedInIfButSanitizedBeforeIfExit() {
	var e interface{}
	if false {
		e = core.Source{}
		e = core.Sanitize(e)[0]
	}
	core.Sink(e)
}
With the current code, e is not considered sanitized when it reaches core.Sink. This is because the call to core.Sanitize does not dominate the call to core.Sink: one can get from var e interface{} to core.Sink(e) without going through the if block. However, on that path e is not tainted.
For this first case, correct behavior can be obtained by stopping traversal upon reaching a sanitizer. The fact that the sanitizer has been reached still needs to be recorded, though, so that for other cases involving pointers, domination can be checked.
However, even if we don't traverse through the sanitizer, the following case still poses an issue:
func TestPointerTaintedInIfButSanitizedBeforeIfExit() {
	var e interface{}
	if false {
		s := &core.Source{}
		core.SanitizePtr(s)
		e = s
	}
	core.Sink(e)
}
In this case, even if we stop traversing when we hit SanitizePtr, there is still a path from s to e (via e = s) to core.Sink. Then, because the call to core.SanitizePtr does not dominate the call to core.Sink, e is not considered sanitized.
I have opened a PR in an attempt to address this: #154. However, I am not sure how to fix the second bug without making the code much more convoluted. I also feel like many additional tests would be needed.
A key limitation at present is in the following:
func Sinker(obj interface{}) {
	Sink(obj)
}

func Foo(s Source) {
	Sinker(s)
}
During analysis of Sinker, the concrete type of obj is not known to be a Source. The pointer package could be used to produce a PointsTo set of possible concrete types which could reach Sink(obj).
Because whole-program analysis can be expensive, configuration to enable/disable this feature should be provided.
Currently, we only look for the tag levee:"source". This tag should be user-configurable.
Consider the following (slightly contrived) test case:
func TestPropagationViaSourceMethod(s core.Source) {
	str := s.Propagate(s.Data)
	core.Sink(str) // TODO want "a source has reached a sink"
}
Currently, we do not produce a report in the above case. This is because of the following snippet in propagation.go:
// This is to avoid attaching calls where the source is the receiver, ex:
// core.Sinkf("Source id: %v", wrapper.Source.GetID())
if recv := t.Call.Signature().Recv(); recv != nil && sourcetype.IsSourceType(prop.config, prop.taggedFields, recv.Type()) {
	return
}
As far as I know, this check is to avoid incorrectly propagating through safe getters such as Source.GetID(). If a getter is not safe, it should be identified by the fieldpropagators analyzer, which will cause the Call to be identified as a source in source.go.
However, as seen in the test case at the beginning of this issue, this does not take into account methods on source types that can propagate taint received via one of their arguments. This can cause false negatives, and is inconsistent with how we handle calls in general: if a tainted argument goes in, we assume that whatever comes out is tainted.