krypteria / yaralyze

Yaralyze is a malware detection tool for Android that relies on two types of static analysis: Yara rule analysis and hash analysis.

Java 87.08% Python 12.92%
android-application antimalware java malware-detection python static-analysis yara-rules yara-scanner

Yaralyze

There are currently more than 7,260,000,000 mobile devices in the world, which means that 91.54% of the world's population has one. Approximately 2,500,000,000 of these devices run Android as their operating system.

It is no secret that these devices are becoming more and more important to us. They are with us practically all day long and contain a lot of personal information, which makes them an interesting target for malicious actors.

How can malware be detected?

Malware analysis can be categorised into three main types: static analysis, dynamic analysis and hybrid analysis.

Static analysis is any analysis that does not need to execute the code in order to analyse it. It is based on searching for patterns through rules or heuristics, which makes it extremely safe because there is no possibility of activating the malware unintentionally. This type of analysis is faster than dynamic analysis and, by the very nature of its detection system, has a high detection rate for known malware.

Dynamic analysis, on the other hand, is any analysis that needs to run the malware in order to analyse it, which means that a larger infrastructure must be in place to isolate it so that its execution does not affect real systems. This type of analysis is more reliable than static analysis and can detect unknown malware.

Finally, an analysis that uses both static and dynamic analysis techniques is known as hybrid analysis. Currently, well-known anti-malware solutions such as Kaspersky, Avira or Avast, among others, use this type of analysis, dividing it into distinct stages.

What are Yara rules?

Yara rules fall within the category of static analysis. They are a type of malware signature that allows analysts to identify and classify known malware.


A rule has three sections: a meta section, where information about the rule itself is usually placed; a strings section, where the patterns to search for in the file are defined; and a condition section, which defines the condition those patterns must meet for the file to be considered malware. Yara rules can be extremely complex, so I recommend reading the Yara documentation if you want to understand in more detail how they work.
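To make the three sections concrete, the following Python sketch models a rule as a small data structure and evaluates it against raw bytes. It is a simplified stand-in for illustration, not the Yara engine: the `ToyRule` class, its field names, the sample patterns and the "any"/"all" conditions are all assumptions chosen to mirror the meta, strings and condition sections.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRule:
    """Simplified stand-in for a Yara rule (hypothetical, not the real engine)."""
    name: str
    meta: dict = field(default_factory=dict)     # information about the rule itself
    strings: dict = field(default_factory=dict)  # identifier -> byte pattern to look for
    condition: str = "any"                       # "any" or "all" of the strings must occur

    def matches(self, data: bytes) -> bool:
        # Check each pattern from the strings section against the data.
        hits = [pattern in data for pattern in self.strings.values()]
        if not hits:
            return False
        # Apply the condition section to the individual pattern results.
        return any(hits) if self.condition == "any" else all(hits)


# Hypothetical rule: name, description and patterns are invented for this example.
rule = ToyRule(
    name="ExampleBanker",
    meta={"description": "Hypothetical rule for illustration only"},
    strings={"$a": b"evil.example/gate.php", "$b": b"\x6a\x40\x68\x00\x30"},
    condition="any",
)

sample = b"...http://evil.example/gate.php..."
clean = b"just an ordinary file"

print(rule.matches(sample))  # True: pattern $a occurs in the sample
print(rule.matches(clean))   # False: no pattern occurs
```

Real Yara rules support far richer strings (hex patterns with wildcards, regular expressions) and conditions (counts, offsets, file size), so this sketch only conveys the overall shape of a rule.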

Another point in favour of Yara rules is that they are a current technique that is becoming widely used by analysts, which means that a large number of community-contributed rules are available.

Now let's get to the point: Yaralyze, a malware detection tool

Yaralyze is a malware detection tool for Android devices that employs two static analysis techniques, one using Yara rules and the other based on hash analysis. It allows reports to be stored and viewed. It is designed with a client-server architecture, where the server can be hosted in the cloud so that it is always available from any mobile device that has the client installed. It makes use of more than 130,000 Yara rules and more than 500,000 hashes of malware apps obtained from VirusShare and GitHub (the rules and hashes are not published in the repository).

Analysis with Yara Rules

Analysis of application hashes
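As a rough illustration of hash-based detection in a client-server setup, the sketch below hashes an APK and looks the digest up first in a local (client) set and then in a remote (server) set. The use of SHA-256, the function names, the in-memory sets and the `fake_server` callable are assumptions made for this example; the source does not state which hash algorithm or storage Yaralyze uses.

```python
import hashlib
from typing import Callable, Set


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large APKs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lookup_hash(apk_hash: str, client_db: Set[str],
                query_server: Callable[[str], bool]) -> str:
    """Return where the hash was found: 'client', 'server' or 'none'."""
    if apk_hash in client_db:
        return "client"   # known locally, no network request needed
    if query_server(apk_hash):
        return "server"   # known only to the server
    return "none"         # no match anywhere


# Placeholder byte strings stand in for real APK contents (hypothetical data).
on_client = hashlib.sha256(b"placeholder sample A").hexdigest()
on_server = hashlib.sha256(b"placeholder sample B").hexdigest()
unknown = hashlib.sha256(b"placeholder benign app").hexdigest()

client_db = {on_client}
server_db = {on_server}


def fake_server(apk_hash: str) -> bool:
    """Stand-in for a request to the server's hash database."""
    return apk_hash in server_db


for h in (on_client, on_server, unknown):
    print(lookup_hash(h, client_db, fake_server))
```

In this sketch a hash found on the client never triggers a server request, while a server-only hash and an unknown hash both require one.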

Testing on real malware

Two types of tests were carried out. The first tested the effectiveness of the tool in detecting known malware, using samples of the Brata, Sharkbot, Cerberus and Flubot malware families. The second tested the speed of analysis.

[Images: analysis2, analysis1]

As can be seen in the images, the tool detects the malware files and does not produce a false positive on the genuine WinRAR APK.

| APK | T1 | T2 | T3 | T4 | Average |
|---|---|---|---|---|---|
| Flubot (malware) | 2.27s | 2.23s | 2.24s | 2.29s | 2.257s |
| Sharkbot (malware) | 2.54s | 2.51s | 2.53s | 2.56s | 2.535s |
| WinRAR | 2.18s | 2.20s | 2.16s | 2.16s | 2.175s |

| Location of the application hash | T1 | T2 | T3 | T4 | Average |
|---|---|---|---|---|---|
| Client DB | 0.079s | 0.081s | 0.078s | 0.077s | 0.0787s |
| Server DB | 0.088s | 0.085s | 0.087s | 0.091s | 0.0877s |
| No match | 0.087s | 0.088s | 0.084s | 0.088s | 0.0867s |

In the first table, the average analysis times are very similar. This is because every APK analysed goes through all the Yara rules, even if it has already been flagged as malware, since later rules may narrow down the type of malware involved. The analysis time also depends on the size of the APK being analysed, as one would expect, and these APKs did not differ much in size.
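The design choice described above, evaluating every rule even after a first match, can be sketched as follows. Rules are reduced here to a name and a single byte pattern, which is a deliberate simplification, and the rule names and patterns are hypothetical.

```python
from typing import Dict, List, Optional


def scan_all(data: bytes, rules: Dict[str, bytes]) -> List[str]:
    """Evaluate every rule and return all matching rule names (no early exit)."""
    return [name for name, pattern in rules.items() if pattern in data]


def scan_first(data: bytes, rules: Dict[str, bytes]) -> Optional[str]:
    """Stop at the first matching rule; faster on malware but less informative."""
    for name, pattern in rules.items():
        if pattern in data:
            return name
    return None


# Hypothetical rules: one generic indicator and one family-specific indicator.
rules = {
    "GenericAndroidMalware": b"sendTextMessage",
    "ExampleBankerFamily": b"overlay_inject",
}

data = b"...sendTextMessage...overlay_inject..."

print(scan_all(data, rules))    # both rule names: the second narrows down the family
print(scan_first(data, rules))  # only the generic rule name
```

Because `scan_all` always visits every rule, its cost is roughly the same for clean and malicious files of similar size, which is consistent with the near-identical averages in the first table.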

In the second table the times are also very similar. This may seem strange, because when the hash is in the server's database, or when there is no match, the client has to make a request to the server, which should slow down the analysis. The similar times can be explained by the fact that, at the time of testing, the server was receiving only a single request and so was not under heavy load, and the database does not contain enough hashes to noticeably slow down the searches.
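The averages in both tables can be re-derived from the four individual runs. The short sketch below does so using only the measurements listed above; the dictionary and variable names are chosen for this example.

```python
from statistics import mean

# Run times in seconds, copied from the two tables.
yara_times = {
    "Flubot (malware)": [2.27, 2.23, 2.24, 2.29],
    "Sharkbot (malware)": [2.54, 2.51, 2.53, 2.56],
    "WinRAR": [2.18, 2.20, 2.16, 2.16],
}
hash_times = {
    "Client DB": [0.079, 0.081, 0.078, 0.077],
    "Server DB": [0.088, 0.085, 0.087, 0.091],
    "No match": [0.087, 0.088, 0.084, 0.088],
}

averages = {label: mean(runs) for label, runs in {**yara_times, **hash_times}.items()}

for label, avg in averages.items():
    print(f"{label}: {avg:.5f}s")

# Extra time spent when the hash has to be looked up on the server.
overhead = averages["Server DB"] - averages["Client DB"]
print(f"Server lookup overhead: {overhead * 1000:.1f} ms")
```

For example, the four Flubot runs average 2.2575s, which the table lists as 2.257s, and a server lookup is on average about 9 ms slower than a client lookup.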
