codeql-uboot's People

Contributors

adityasharad, xcorail

codeql-uboot's Issues

Step 6 - Relating two variables

Step 6: Relating two variables

In step 4, you wrote a query that finds the definitions of functions named memcpy in the codebase. Now, we want to find all the calls to memcpy in the codebase.

One way to do this is to declare two variables: one to represent functions, and one to represent function calls. Then you will have to create a relationship between these variables in the where section, so that they are restricted to only functions that are named memcpy, and calls to exactly those functions.
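
The two-variable approach described above could be sketched as follows. This is only one possible shape for the query, not the course's reference solution; it assumes the standard `cpp` library's `FunctionCall` class and its `getTarget()` predicate, and the variable names `call` and `f` are arbitrary.

```ql
import cpp

// Two variables: one ranging over function calls, one over functions.
from FunctionCall call, Function f
where
  // Relate the two variables: the call must target exactly this function...
  call.getTarget() = f and
  // ...and the function must be named memcpy.
  f.getName() = "memcpy"
select call, "a call to memcpy"
```

Without the `call.getTarget() = f` condition the two variables would be unrelated, and the query would pair every call in the codebase with every function named memcpy.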

Step 9 - Write your own class

Step 9: Write your own class

In this step we will learn how to write our own CodeQL classes. This will help us make the logic of our query more readable, easier to reuse, and easier to refine.

We'd like to find the same results as in the previous step, i.e. the top level expressions that correspond to the ntohl, ntohs and ntohll macro invocations. It would be useful if we could refer to all such expressions directly, just like we can use MacroInvocation from the standard library to refer to all macro invocations.

We will define a class to describe exactly this set of expressions, and use it in the last step of this course.

The Expr class is the set of all expressions, and we are interested in a more specific set of expressions, so the class we write will be a subclass of Expr.
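
A minimal sketch of such a subclass is shown here. The class name `NetworkByteSwap` is a hypothetical choice for illustration, and the body assumes the `MacroInvocation` predicates `getMacro()` and `getExpr()` from the standard `cpp` library.

```ql
import cpp

// Hypothetical class name: the set of top-level expressions that
// result from expanding ntohs, ntohl or ntohll.
class NetworkByteSwap extends Expr {
  // The characteristic predicate restricts which Expr values belong to the class.
  NetworkByteSwap() {
    exists(MacroInvocation mi |
      mi.getMacro().getName().regexpMatch("ntoh(s|l|ll)") and
      this = mi.getExpr()
    )
  }
}

from NetworkByteSwap n
select n, "a network byte swap"
```

The `exists` quantifier introduces the temporary variable `mi` without exposing it to users of the class, so a query can simply declare a variable of type `NetworkByteSwap`.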

Step 8 - Changing the selected output

Step 8: Changing the selected output

In the previous step, you found invocations of the macros we are interested in. Modify your query to find the top-level expressions these macro invocations expand to.

Note: An expression is a source code element that can have a value at runtime. Invoking a macro can bring various source code elements into scope, including expressions.
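
One way to change the selected output is sketched below. It assumes the `getExpr()` predicate on `MacroInvocation`, which yields a top-level expression produced by the invocation when there is one; treat it as an illustration rather than the expected answer.

```ql
import cpp

from MacroInvocation mi
where mi.getMacro().getName().regexpMatch("ntoh(s|l|ll)")
// Select the expression the macro expands to, rather than the invocation itself.
select mi.getExpr(), "top-level expression of a network byte-order macro"
```

Invocations that do not expand to an expression have no value for `getExpr()`, so they drop out of the results.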

Step 7 - Relating two variables, continued

Step 7: Relating two variables, continued

In step 5, you wrote a query that finds the definitions of macros named ntohs, ntohl and ntohll in the codebase. Now, we want to find all the invocations of these macros in the codebase.

This will be similar to what you did in step 6, where you created variables for functions and function calls, and restricted them to look for a particular function and its calls.

Note: A macro invocation is a place in the source code that calls a particular macro. This is comparable to how a function call is a place in the source code that calls a particular function.
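
By analogy with functions and function calls, a query over macros and their invocations might look like the sketch below. It assumes the `Macro` and `MacroInvocation` classes and the `getMacro()` predicate from the standard `cpp` library; the regular expression is just one way to match the three names.

```ql
import cpp

// One variable for macro invocations, one for the macros themselves.
from MacroInvocation mi, Macro m
where
  // Relate the invocation to the macro it invokes...
  mi.getMacro() = m and
  // ...and restrict the macro to ntohs, ntohl or ntohll
  // (regexpMatch must match the whole name).
  m.getName().regexpMatch("ntoh(s|l|ll)")
select mi, "an invocation of a network byte-order macro"
```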

Step 4 - Anatomy of a query

Step 4: Anatomy of a query

Now let's analyze what you have written. A CodeQL query has the following basic structure:

import /* ... path to some CodeQL libraries ... */

from /* ... variable declarations ... */
where /* ... logical formulas that say something about the variables ... */
select /* ... expressions to output ... */

The from/where/select part is the query clause: it describes what we are trying to find in the source code.

Let's look closer at the query we wrote in the previous step.

import cpp

from Function f
where f.getName() = "strlen"
select f, "a function named strlen"

Imports

At the top of the query is import cpp. This is an import statement. It brings into scope the standard CodeQL library that models C/C++ code, allowing us to use its features in our query. We'll use this library in every query, and in later steps we'll also use some more specialized libraries.

Classes

In the from section, there is a declaration Function f. Here we declare a variable named f which has the type Function. Function is a class declared in the standard library (you can jump to the definition using F12). A class represents a collection of values, in this case the collection of all C/C++ functions in the source code.

Predicates

Now look at the expression f.getName() in the where section. Here we call the predicate getName on the variable f of type Function. Predicates are the building blocks of a query: they express logical properties that we want to hold. Some predicates return results (like getName), and some predicates do not (they just assert that a property must be true).

So far your query finds all functions with the name strlen. It does this by asserting that the result of f.getName() is equal to the string "strlen".
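
To illustrate the difference between the two kinds of predicate, the same query can be sketched with a predicate that has no result. This assumes the `hasName` predicate that the standard `cpp` library provides on declarations; it only asserts that the name matches instead of returning it.

```ql
import cpp

from Function f
// hasName has no result: it simply holds or does not hold for each f.
where f.hasName("strlen")
select f, "a function named strlen"
```

Both forms should select the same functions; `getName()` is the more flexible choice when the name itself is needed later, for example in the select clause.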

Step 3 - Your first query

Step 3: Your first query

You will now run a simple CodeQL query, to understand its basic concepts and get familiar with your IDE.

โŒจ๏ธ Activity: Run a CodeQL query

  1. Edit the file 3_function_definitions.ql with the following contents:

    import cpp
    
    from Function f
    where f.getName() = "strlen"
    select f, "a function named strlen"

    Don't copy / paste this code, but instead type it slowly. You will see the CodeQL auto-complete suggestions in your IDE as you type.

    • After typing from and the first letters of Function, the IDE will propose a list of available classes from the CodeQL library for C/C++. This is a good way to discover what classes are available to represent standard patterns in the source code.
    • After typing where f. the IDE will propose a list of available predicates that you can call on the variable f.
    • Type the first letters of getName() to narrow down the list.
    • Move your cursor to a predicate name in the list to see its documentation. This is a good way to discover what predicates are available and what they mean.
  2. Run this query: Right-click on the query editor, then click CodeQL: Run Query.

  3. Inspect the results appearing in the results panel. Click on the result hyperlinks to navigate to the corresponding locations in the U-Boot code. Do you understand what this query does? You probably guessed it! This query finds all functions with the name strlen.

Now it's time to submit your query. You have two ways to do that, and both are explained in the comments below. Once you have chosen your method, submit your answer!

Read carefully: you will need to follow the same steps to submit your answers to later steps. You can always come back to this issue later to check the submission instructions.

Step 5 - Using different classes and their predicates

Step 5: Using different classes and their predicates

We want to identify integer values that are supplied from network data. A good way to spot those is to look for use of network ordering conversion macros such as ntohl, ntohll, and ntohs.

In the from section of the query, you declare some variables, and state the types of those variables. The type tells us what the possible values are for the variable.

In the previous query you were querying for values in the class Function to find functions in the source code. We have to query a different type to find macros in the source code instead. Can you guess its name?

Note: These network ordering conversion utilities can be macros or functions depending on the platform. In this course, we are looking at a Linux database, where they are macros.
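
As a hedged sketch, a query over macro definitions could take the shape below. It assumes that the class in question is `Macro` from the standard `cpp` library, and uses `regexpMatch` as one of several possible ways to match the three names.

```ql
import cpp

// Macro ranges over macro definitions, just as Function ranges over functions.
from Macro m
// regexpMatch holds only if the whole name matches the pattern.
where m.getName().regexpMatch("ntoh(s|l|ll)")
select m, "a network byte-order conversion macro"
```

An equivalent formulation would join three `m.getName() = "..."` comparisons with `or`.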

Step 10 - Data flow and taint tracking analysis

Step 10: Data flow and taint tracking analysis

Great! You made it to the final step!

In step 9 we found expressions in the source code that are likely to have integers supplied from remote input, because they are being processed with invocations of ntohl, ntohll, or ntohs. These can be considered sources of remote input.

In step 6 we found calls to memcpy. These calls can be unsafe when their length arguments are controlled by a remote user. Their length arguments can be considered sinks: they should not receive user-controlled values without further validation.

Combining these pieces of information, we know that code is vulnerable if tainted data flows from a network integer source to a sink in the length argument of a memcpy call.

However, how do we know whether data from a particular source might reach a particular sink? Answering this question is the job of data flow or taint tracking analysis. Given the number of results (hundreds of memcpy calls and a large number of macro invocations), it would be quite a lot of work to triage all these cases manually.

To make our triaging job easier, we will have CodeQL do this analysis for us.

You will now write a query to track the flow of tainted data from network-controlled integers to the memcpy length argument. As a result you will find 9 real vulnerabilities!

To achieve this, we'll use the CodeQL taint tracking library. This library allows you to describe sources and sinks, and its predicate hasFlowPath holds when tainted data from a given source flows to a given sink.
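
A sketch of how such a query might be put together is shown below. It assumes the class-based `TaintTracking::Configuration` API, which matches the `hasFlowPath` predicate mentioned above; newer releases of the CodeQL libraries replace it with a module-based API, so this may need adapting. The names `NetworkByteSwap` and `Config`, the configuration string, and the `@id` value are hypothetical.

```ql
/**
 * @kind path-problem
 * @id cpp/network-to-memcpy-length
 */

import cpp
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph

// Hypothetical class: expressions produced by ntohs, ntohl or ntohll.
class NetworkByteSwap extends Expr {
  NetworkByteSwap() {
    exists(MacroInvocation mi |
      mi.getMacro().getName().regexpMatch("ntoh(s|l|ll)") and
      this = mi.getExpr()
    )
  }
}

// Hypothetical configuration describing the sources and sinks.
class Config extends TaintTracking::Configuration {
  Config() { this = "NetworkToMemcpyLength" }

  // Sources: values converted from network byte order.
  override predicate isSource(DataFlow::Node source) {
    source.asExpr() instanceof NetworkByteSwap
  }

  // Sinks: the length argument (index 2) of a call to memcpy.
  override predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall call |
      call.getTarget().getName() = "memcpy" and
      sink.asExpr() = call.getArgument(2)
    )
  }
}

from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink, source, sink, "Network byte swap flows to memcpy length"
```

The `@kind path-problem` metadata and the `PathGraph` import let the results view display the full path from each source to each sink, which helps when triaging individual results.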

Step 1 - Welcome to the course!

Welcome to the CodeQL U-Boot Challenge for C/C++

We created this course to help you quickly learn CodeQL, our query language and engine for code analysis. The goal is to find several remote code execution (RCE) vulnerabilities in the open-source software known as U-Boot, using CodeQL and its libraries for analyzing C/C++ code. To find the real vulnerabilities, you'll need to write a sequence of queries, making them more precise at each step of the course.

More detail

The goal is to find a set of 9 remote-code-execution vulnerabilities in the U-Boot boot loader. These vulnerabilities were originally discovered by GitHub Security Lab researchers and have since been fixed. An attacker positioned on the local network, or in control of a malicious NFS server, could potentially achieve remote code execution on a U-Boot powered device. This was possible because the code read data from the network (which could be attacker-controlled) and passed it as the length argument of a call to the memcpy function. When such a length argument is not properly validated before use, it may lead to exploitable memory corruption vulnerabilities.

U-Boot contains hundreds of calls to both memcpy and libc functions that read data from the network. You can often recognize network data being acted upon through use of the ntohs (network to host short) and ntohl (network to host long) functions or macros. These swap the byte ordering for integer values that are received in network ordering to the host's native byte ordering (which is architecture dependent).

In this course, you will use CodeQL to find such calls. Many of those calls may actually be safe, so throughout this course you will refine your query to reduce the number of false positives, and finally track down the unsafe calls to memcpy that are influenced by remote input.

Upon completion of the course, you will have created a CodeQL query that is able to find variants of this common vulnerability pattern.

Step 1: Know where to get help

Bookmark these useful documentation links:

If you get stuck during this course and need some help, the best place to ask for help is on the GitHub Security Lab Slack. Request an invitation from the Security Lab Get Involved page and ask in the channel #codeql-writing. There are also sample solutions in the course repository, but please try to solve the tasks on your own first!

Hope this is exciting! Please close this issue now, then wait for the next set of instructions to appear in a comment below.

Step 2 - Set up your IDE

Step 2: Set up your environment

We will use the CodeQL extension for Visual Studio Code. You will take advantage of IDE features like auto-complete, contextual help and jump-to-definition.

Don't worry, you'll do this setup only once, and you'll be able to use it for future CodeQL development.

Follow the instructions below.
