
danielplohmann / apiscout


This project aims at simplifying Windows API import recovery on arbitrary memory dumps

License: BSD 2-Clause "Simplified" License

Makefile 0.11% Python 96.79% C 0.79% HTML 0.06% Java 2.25%
reverse-engineering windows-api malware-analysis malware-classifier

apiscout's Introduction

ApiScout

This project aims at simplifying Windows API import recovery. As input, arbitrary memory dumps for a known environment can be processed (please note: a reference DB has to be built first, using apiscout/db_builder).
The output is an ordered list of identified Windows API references with some meta information, and an ApiVector fingerprint.

  • scout.py -- should give a good outline on how to work with the library.
  • ida_scout.py -- is a convenience GUI wrapper for use in IDA Pro.
  • GhidraScout.java -- is a Ghidra plugin for ApiScout (contributed by @mari-mari).
  • match.py -- demonstrates how ApiVectors can be matched against each other and collections of fingerprints.
  • collect.py -- builds a database of WinAPI fingerprints (ApiVectors) that can be used for matching.
  • export.py -- generates ApiQR diagrams that visualize ApiVectors.
  • update.py -- pulls the most recent ApiVector DB from Malpedia (requires Malpedia account / API token).

The code should be fully compatible with Python 2 and 3.
There is a blog post describing ApiScout in more detail: http://byte-atlas.blogspot.com/2017/04/apiscout.html.
Another blog post explains how ApiVectors are constructed and stored: https://byte-atlas.blogspot.com/2018/04/apivectors.html.
We also presented a paper at Botconf 2018 that describes the ApiScout methodology in depth, including an evaluation over Malpedia: https://journal.cecyf.fr/ojs/index.php/cybin/article/view/20/23

Version History

  • 2023-03-27: v2.0.2 - Bugfix for IdaScout and handling of sets, contributed by @7a6570 (THX!!)
  • 2023-01-02: v2.0.1 - Bugfix for lief-based import table parsing with Python 3.10, contributed by @malware-kitten (THX!!)
  • 2022-09-20: v2.0.0 - (potentially BREAKING) crawl results now have one additional output field with a set of calling references for a given WinAPI. This is also included in the JSON output of scout.py, contributed by @renzhexigua (THX!!)
  • 2022-08-01: v1.2.0 - Added plugin for Ghidra, contributed by @mari-mari (THX!!)
  • 2022-01-17: v1.1.9 - Fixed ida_scout.py to work with IDA 7.5+ when apiscout is also installed as a Python package.
  • 2021-10-04: v1.1.8 - Extension of winapi contexts based on observations provided by @blattm (THX!).
  • 2021-08-30: v1.1.7 - Fixed deprecation warning in ApiQR as raised by numpy.
  • 2021-07-31: v1.1.6 - It's no longer required to keep a fixed LIEF version. (THX to @cccs-rs!)
  • 2021-01-10: v1.1.5 - Python3 LIEF package fixed to version 0.10.1 (THX to @akhribfarouk!)
  • 2020-12-09: v1.1.4 - Python3 fixes on DatabaseBuilder (THX to @Dump-GUY!)
  • 2020-07-13: v1.1.3 - Added "install_requires" to setup.py to ensure dependencies are installed.
  • 2020-06-30: v1.1.0 - Now using LIEF for import table parsing. Fixed bug which would not produce ApiVectors when using import table parsing. ApiScout is now also available through PyPI.
  • 2020-03-03: Added a script to pull the most recent ApiVector DB from Malpedia (requires Malpedia account / API token).
  • 2020-03-02: Ported to IDA 7.4 (THX to @jenfrie).
  • 2020-02-18: DB Builder is now compatible up to Python 3.7 (THX to @elanfer).
  • 2019-10-08: Workaround for broken filtering of the API view in IDA 7.3 (THX to @enzok for pointing this out).
  • 2019-08-22: Fixed a bug where missing type info in IDA would lead to a crash (now gives an error message instead).
  • 2019-08-20: Added self-filter to eliminate pointers to own memory image that could be mistakenly treated as API references.
  • 2019-06-06: Added support for proper type reconstruction for annotated APIs in IDA Pro (THX to @FlxP0c)
  • 2019-05-15: Added numpy support for vector calculations (based on implementation provided by @garanews - THX!)
  • 2019-05-15: Fixed a bug in PE mapper where buffer would be shortened because of misinterpretation of section sizes.
  • 2019-01-23: QoL improvements: automated data folder deployment when used as module, logger initialization (THX to @jdval)
  • 2018-08-23: Fixed a bug in PE mapper where the PE header would be overwritten by (empty) section data.
  • 2018-08-21: Added functionality that allows using import table information instead of crawling for references.
  • 2018-07-31: Fixed convenience functions to create/export vectors from/to lists and dicts, added test coverage.
  • 2018-07-23: WARNING: Change in ApiVector format -- Introduced sorted ApiVectors which are even more space efficient (20%+).
  • 2018-06-25: Fixed incompatibility with IDA Pro 7.0+ (THX to @nazywam!)
  • 2018-05-23: Added further semantic context groups (THX to Quoscient.io)
  • 2018-03-27: Heuristic estimation of Windows API reference counts added
  • 2018-03-06: ApiQR visualization of vector results (C-1024)
  • 2017-11-28: Added own import table parser to enrich result information
  • 2017-08-24: Multi-Segment support in IDA Pro (THX to @nazywam!)
  • 2017-05-31: Added Windows 7 SP1 64bit import DB (compatible to Malpedia)

Credits

The idea has previously gone through multiple iterations until reaching this refactored release.
Thanks to Thorsten Jenke and Steffen Enders for their previous endeavours and evaluating a proof-of-concept of this method.
More thanks to Steffen Enders for his work on the visualization of ApiQR diagrams.
Also thanks to Ero Carrera for pefile and Elias Bachaalany for the IDA Python AskUsingForm template. :)
Additionally, many thanks to Andrea Garavaglia for his performance benchmarks that led to drastic speedups in the applied matching!

Pull requests welcome! :)

apiscout's People

Contributors

7a6570, catsuryuu, cccs-rs, danielplohmann, dump-guy, elanfer, garanews, jdval, mari-mari, nazywam, renzhexigua, rikyoz, steffenenders, trietptm

apiscout's Issues

Wrong mapping in exported images.

Hi @danielplohmann,
The compressed API vector "g1A170" is shown correctly on the Malpedia website, where the import is mapped to the top left corner of the image. However, when setting the vector in an ApiQR instance and exporting the image (to PNG or HTML), it is mapped to advapi32.dll:BackupEventLog.
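
Judging only by the vectors quoted on this page, compressed ApiVectors appear to use a run-length notation in which a character may be followed by a decimal repeat count. The sketch below expands such a string under that assumption, which can help when comparing what the website and an exported image are given as input. `decompress_vector` is a hypothetical helper written for illustration and is not part of the apiscout API.

```python
import re


def decompress_vector(compressed):
    """Expand run-length notation: a character followed by a decimal count.

    Assumption: digits never occur as payload characters, only as counts.
    """
    return re.sub(
        r"([^0-9])([0-9]+)",
        lambda match: match.group(1) * int(match.group(2)),
        compressed,
    )


expanded = decompress_vector("g1A170")
print(len(expanded))   # number of characters after expansion
print(expanded[:4])    # leading characters of the expanded vector
```

Under this reading, both "g1A170" and the PlugX vector quoted in the LIEF issue further down expand to 171 characters. That would fit a 1024-bit vector if each character carried 6 bits, but the 6-bit interpretation is an assumption and not something this page states.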

IDA 7.0 - ValueError: Invalid chooser passed

Hey, great job on apiscout!

I've noticed that recent(ish) changes to idaapi have broken some stuff in apiscout. This is what I get when I try to run the script on IDA Version 7.0.170914 Linux x86_64:

/home/michal/work/apiscout/ida_scout.py: Invalid chooser passed.
Traceback (most recent call last):
  File "/home/michal/ida-7.0/python/ida_idaapi.py", line 553, in IDAPython_ExecScript
    execfile(script, g)
  File "/home/michal/work/apiscout/ida_scout.py", line 63, in <module>
    main()
  File "/home/michal/work/apiscout/ida_scout.py", line 42, in main
    parameters = tools.formGetParameters()
  File "/home/michal/work/apiscout/apiscout/IdaTools.py", line 148, in formGetParameters
    form = IdaApiScoutOptionsForm(db_folder)
  File "/home/michal/work/apiscout/apiscout/IdaForm.py", line 84, in __init__
    'cApiDbChooser' : Form.EmbeddedChooserControl(self.apiDbChooser)
  File "/home/michal/ida-7.0/python/ida_kernwin.py", line 4219, in __init__
    raise ValueError("Invalid chooser passed.")
ValueError: Invalid chooser passed.

My guess is that they made some breaking changes in the Choose interface, as seen here (give it a moment to load, it's a pretty big diff):
idapython/src@d99a893#diff-35a3e7c5c3a9f77d6e7f50ec29de6401R680

Doing a rough sed s/Choose2/Choose/g on the IdaForm.py seems to fix the issue, however, probably some more examination should be done ;)

I'll take a look at the idaapi form implementation and the apiscout implementation and will probably report back in a few days.
Cheers!
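
For readers without sed at hand, the rough substitution mentioned above can be sketched in Python. Unlike the plain sed expression, this version matches whole identifiers only, so names that merely contain "Choose2" are left untouched. `rename_choose2` is a hypothetical helper, and the sketch makes no claim that renaming alone is a complete fix; as noted in the report, further examination is probably needed.

```python
import re


def rename_choose2(source_text):
    """Replace the identifier Choose2 with Choose, whole words only."""
    return re.sub(r"\bChoose2\b", "Choose", source_text)


# Illustrative input line; not copied from IdaForm.py.
print(rename_choose2("class ApiChooser(Choose2):"))
```

To apply it to a file, read the file contents into a string, pass them through `rename_choose2`, and write the result back, ideally after keeping a backup copy.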

Failed installation with "setup.py" because of "lief"


Running setup.py install for lief fails with an error on Windows 10 64-bit with Python 3.9.1 and pip 20.3.3:


python setup.py install
.
.
Collecting lief
  Downloading lief-0.10.1.tar.gz (12.7 MB)
     |████████████████████████████████| 12.7 MB
Using legacy 'setup.py install' for lief, since package 'wheel' is not installed.
Installing collected packages: lief
    Running setup.py install for lief ... error
    ERROR: Command errored out with exit status 1:

FIX for Windows 32 bits -->

(make sure that Python and pip are installed)

  1. Open a Command Prompt as Administrator.
  2. Run this command:
pip install "https://lief-project.github.io/packages/lief/lief-0.11.0.dev0-cp39-cp39-win32.whl"
  3. Re-run the install command:
python setup.py install

FIX for Windows 64 bits -->

(make sure that Python and pip are installed)

  1. Open a Command Prompt as Administrator.
  2. Run this command:
pip install "https://lief-project.github.io/packages/lief/lief-0.11.0.dev0-cp39-cp39-win_amd64.whl"
  3. Re-run the install command:
python setup.py install

FIX for Other platforms -->

(make sure that Python and pip are installed)

  1. Go to this website:
https://lief.quarkslab.com/packages/lief/
  2. Choose the file for your system platform.
  3. Run:
pip install <YOUR CHOSEN FILE FOR SELECTED PLATFORM>

This is the official repository of LIEF:
https://github.com/lief-project/LIEF

IDA 7.3 - AttributeError: 'module' object has no attribute 'MAXSTR'

Selecting the Filter button produces the following error:

Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 315, in 'calling callback function'
  File "C:\Program Files\IDA Pro 7.3\python\ida_kernwin.py", line 6205, in helper_cb
    r = self.handler(button_code)
  File "C:/Users/ez/Desktop/tools/apiscout-master\apiscout\IdaForm.py", line 192, in OnButtonApplyFilter
    self.SetControlValue(self.cApiInfo, "APIs: %d/%d (filtered to 0x%x - 0x%x, range: 0x%x)" % (len(self.apiChooser.items), len(self.apiChooser.all_items), addr_from, addr_to, byte_range))
  File "C:\Program Files\IDA Pro 7.3\python\ida_kernwin.py", line 6924, in SetControlValue
    tid, _ = self.ControlToFieldTypeIdAndSize(ctrl)
  File "C:\Program Files\IDA Pro 7.3\python\ida_kernwin.py", line 6951, in ControlToFieldTypeIdAndSize
    return (3, min(_ida_kernwin.MAXSTR, ctrl.size))
AttributeError: 'module' object has no attribute 'MAXSTR'
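
One conceivable stopgap for this kind of error is to define the missing attribute before the form code runs. The sketch below shows the general pattern on a stand-in object so that it runs outside IDA. Both the helper name `ensure_maxstr` and the fallback value of 1024 are assumptions made for illustration, and this is not necessarily the workaround that was adopted in the project.

```python
import types


def ensure_maxstr(kernwin_module, fallback=1024):
    """Define MAXSTR on the given module object if it is missing.

    The fallback value is an assumption, not taken from this report.
    """
    if not hasattr(kernwin_module, "MAXSTR"):
        kernwin_module.MAXSTR = fallback
    return kernwin_module.MAXSTR


# Stand-in for the real _ida_kernwin module, used only for demonstration.
fake_kernwin = types.SimpleNamespace()
ensure_maxstr(fake_kernwin)
print(fake_kernwin.MAXSTR)
```

An existing value is never overwritten, so the helper would be harmless on IDA versions where the attribute is present.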

DatabaseBuilder skipping some dlls

Regardless of whether I omit the --auto option or modify config.py, DatabaseBuilder.py still skips some DLLs while processing and parsing exports.

Tested on Windows 7 SP1 Professional and Windows 10 Pro.
Python 3.7

I attached a screenshot in which I specified the option to parse only DLLs from my directory; you can see that advapi32.dll, iertutil.dll, crypt32.dll, etc. are still not processed.
It's an amazing tool, but could you please check this issue?
[Screenshot: Capture]

False Positive on DllBaseChecker[32|64]

As I said before, you can avoid the false positives of the scanners by forcing a memory allocation, and it worked every time.
This is the fix (maybe it is not sophisticated, but it works like a charm):


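#include <windows.h>
#include <tchar.h>
#include <winbase.h>
#include <shellapi.h> // CommandLineToArgvW
#include <stdio.h>
#include <stdlib.h>   // malloc
#include <string.h>   // strlen
#include <stdbool.h>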
// bitness check courtesy of http://stackoverflow.com/a/12338526
// Check windows
#if _WIN32 || _WIN64
#if _WIN64
#define ENV64BIT
#else
#define ENV32BIT
#endif
#endif
// Check GCC
#if __GNUC__
#if __x86_64__ || __ppc64__
#define ENV64BIT
#else
#define ENV32BIT
#endif
#endif
#define AllotOff_MEM 100000000
int main() {
	DWORD written_b = 0;
	HANDLE hStdOut = 0;
	HINSTANCE hDllBase = 0;
	LPWSTR* szArglist;
	char buffer[50];
	int nArgs;
	hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
	szArglist = CommandLineToArgvW(GetCommandLineW(), &nArgs);
	if (nArgs > 1) {
		GetFileAttributesW(szArglist[1]); // from winbase.h
		if (INVALID_FILE_ATTRIBUTES == GetFileAttributesW(szArglist[1]) && GetLastError() == ERROR_FILE_NOT_FOUND)
		{
			char cFileNotFound[] = "DLL not found?\n";
			WriteFile(hStdOut, cFileNotFound, strlen(cFileNotFound), &written_b, 0);
			return 0;
		}
		else {
			// suppress output of popups (Entry Point not found etc.)
			UINT oldErrorMode = SetErrorMode(SEM_FAILCRITICALERRORS);
			SetErrorMode(oldErrorMode | SEM_FAILCRITICALERRORS);
			hDllBase = LoadLibraryW(szArglist[1]);

		}
	}
	else {
		char output[] = "Usage: DllBaseChecker[32|64].exe <dll_to_load>";
		WriteFile(hStdOut, output, strlen(output), &written_b, 0);
		return 0;
	}
#if defined(ENV64BIT)
	bool a = true;
	bool b = false;
	int	s, l = 0;
	char * memdmp = NULL;
	//ByPassing the False Positive here because it is to much for the memory but not the CPU.
	memdmp = (char *)malloc(AllotOff_MEM);
	if (a != false)
	{
		goto zone_two;
		for (s = 0; s < 100000; s++)
		{
			l += s;
		}
	}
zone_three:
	if (memdmp != NULL)
	{
	if (sizeof(void*) != 8)
	{
		wprintf(L"ENV64BIT: Error: pointer should be 8 bytes. Exiting.");
		return 0;
		b = true;
	}
	else {
		sprintf(buffer, "DLL loaded at: 0x%llx\n", (unsigned long long)hDllBase); // cast to match %llx
		WriteFile(hStdOut, buffer, strlen(buffer), &written_b, 0);
		b = true;
	}
	}
zone_two:
	if (b != true)
	{
		for (s = 0; s < 100000; s++)
		{
			l += s;
		}
		goto zone_three;
	}
	return 0;
#elif defined (ENV32BIT)
	bool a = true;
	bool b = false;
	int	s,l = 0;
	char * memdmp = NULL;
	memdmp = (char *)malloc(AllotOff_MEM);
	if (a!=false)
	{
		goto zone_two;
		for (s=0;s<100000;s++)
		{
			l += s;
		}
	}
	zone_three:
	if (memdmp != NULL)
	{
		if (sizeof(void*) != 4)
		{
			wprintf(L"ENV32BIT: Error: pointer should be 4 bytes. Exiting.");
			b = true;
			return 0;
		}
		else {
			sprintf(buffer, "DLL loaded at: 0x%x\n", (unsigned int)hDllBase);
			WriteFile(hStdOut, buffer, strlen(buffer), &written_b, 0);
			b = true;
		}
	}
zone_two:
	if (b!=true)
	{
		for (s = 0; s < 100000; s++)
		{
			l += s;
		}
		goto zone_three;
	}
#else
#error "Must define either ENV32BIT or ENV64BIT".
#endif
	return 0;
}

Lief error reading file in Python 3.10

The file I used is a copy of plugx from Malpedia, but the error should reproduce regardless.

When using ApiScout on 3.8 this all works as expected

Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from apiscout.ApiScout import ApiScout
>>> buf = open('malpedia/win.plugx/vt-2021-08-03-MustangPanda/9c89776d100ecb57dc84742b433c7d7d0eca291f523cc662c94e2582c1a476f4_unpacked', 'rb').read()
>>> scout = ApiScout()
>>> scout.setBaseAddress(0)
>>> scout_ev = scout.evaluateImportTable(buf, is_unmapped=True)
 is not an ELF
>>> scout.getWinApi1024Vectors(scout_ev).get('import_table', {}).get('vector', None)
'A109EA5wA3QA4EA5BAQA5CA4IQA7SAExoAgQXpD}-MpYyl?'

However when using 3.10:

Python 3.10.9 (main, Dec 08 2022, 14:49:06) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from apiscout.ApiScout import ApiScout
>>> buf = open('malpedia/win.plugx/vt-2021-08-03-MustangPanda/9c89776d100ecb57dc84742b433c7d7d0eca291f523cc662c94e2582c1a476f4_unpacked', 'rb').read()
>>> scout = ApiScout()
>>> scout.setBaseAddress(0)
>>> scout_ev = scout.evaluateImportTable(buf, is_unmapped=True)
.....snipped.....
Unknown format
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/site-packages/apiscout-2.0.0-py3.10.egg/apiscout/ApiScout.py", line 194, in evaluateImportTable
    bitness = 32 if lief_binary.header.machine == lief.PE.MACHINE_TYPES.I386 else 64
AttributeError: 'NoneType' object has no attribute 'header'

It looks like the actual error is just above the line mentioned in the error.

lief_binary = lief.parse(bytearray(binary))

When not explicitly casting it to a bytearray the file will parse.

Working example (under 3.10)

Python 3.10.9 (main, Dec 08 2022, 14:49:06) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lief
>>> from apiscout.ApiScout import ApiScout
>>> scout = ApiScout()
>>> scout.setBaseAddress(0)
>>> buf = open('malpedia/win.plugx/vt-2021-08-03-MustangPanda/9c89776d100ecb57dc84742b433c7d7d0eca291f523cc662c94e2582c1a476f4_unpacked', 'rb').read()
>>> lief_binary = lief.parse(buf)
>>> results = {"import_table": []}
>>> bitness = 32 if lief_binary.header.machine == lief.PE.MACHINE_TYPES.I386 else 64
>>> for imported_library in lief_binary.imports:
...     for func in imported_library.entries:
...         if func.name:
...             results["import_table"].append((func.iat_address + 0, 0xFFFFFFFF, imported_library.name.lower() + "_0x0", func.name, bitness, True, 1, set()))
... 
>>> scout.getWinApi1024Vectors(results).get('import_table', {}).get('vector', None)
'A109EA5wA3QA4EA5BAQA5CA4IQA7SAExoAgQXpD}-MpYyl?'
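
The AttributeError in the failing session arises because the parse result is None by the time `.header` is accessed. A small guard can turn that into a clearer error at the point of failure. The sketch below is an illustration only: `parse_pe_or_raise` is a hypothetical helper, and the parser is passed in as an argument so the example runs without LIEF installed.

```python
def parse_pe_or_raise(parse, binary):
    """Call the given parser and fail loudly if it returns None."""
    parsed = parse(binary)
    if parsed is None:
        raise ValueError(
            "input could not be parsed; check the type passed to the parser"
        )
    return parsed


# Stand-in parser that accepts bytes and rejects everything else,
# loosely imitating the behaviour described in this report.
def fake_parse(data):
    return {"size": len(data)} if isinstance(data, bytes) else None


print(parse_pe_or_raise(fake_parse, b"MZ"))
```

With LIEF available, the call would presumably look like `parse_pe_or_raise(lief.parse, buf)`, following the working session shown above.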

Add symbols for some MFC DLLs

It should be possible to enhance DbBuilder by using information from Microsoft DEF files to also get symbol names for mfc42(u).dll and some other DLLs.
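
As a rough illustration of the idea, the sketch below reads the EXPORTS section of a module-definition (DEF) file and builds a mapping from ordinal to symbol name. `parse_def_exports` is a hypothetical helper, not part of DbBuilder, and the sample entries and ordinals are made up for the example. It handles only the simple `name @ordinal` form and ignores other DEF statements.

```python
import re

# Matches "name @ordinal" or "name=internal @ordinal"; names may contain "@".
_EXPORT_LINE = re.compile(
    r"^\s*(?P<name>[^\s=;]+)(?:=\S+)?\s+@\s*(?P<ordinal>[0-9]+)\b"
)


def parse_def_exports(def_text):
    """Return {ordinal: name} for entries listed after EXPORTS."""
    ordinal_to_name = {}
    in_exports = False
    for raw_line in def_text.splitlines():
        line = raw_line.split(";", 1)[0].strip()  # drop ";" comments
        if not line:
            continue
        if line.split()[0].upper() == "EXPORTS":
            in_exports = True
            line = line[len("EXPORTS"):]  # entries may follow on the same line
        if not in_exports:
            continue
        match = _EXPORT_LINE.match(line)
        if match:
            ordinal_to_name[int(match.group("ordinal"))] = match.group("name")
    return ordinal_to_name


# Invented sample content, for demonstration only.
sample = """
LIBRARY MFC42
EXPORTS
    ; comment line
    ??0CObject@@QAE@XZ @ 256 NONAME
    DllGetVersion @5
    PlainExport
"""
print(parse_def_exports(sample))
```

Entries without an explicit ordinal, such as `PlainExport` in the sample, are skipped because they add nothing to an ordinal-to-name lookup.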
