cert-polska / mwdb-core
Malware repository component for samples & static configuration with REST API interface.
Home Page: https://mwdb.readthedocs.io/
License: Other
It would be nice to be able to tag objects directly from the feed (Recent objects view). Maybe display [+] while hovering over the Tags field.
Moved from CERT.pl internal repository. Reported originally by chivay
Typed-config (https://github.com/bwindsor/typed-config) is used by Malwarecage to provide configuration that can be easily extended by plugins.
The current version of the library is missing one feature needed for extensibility and one bugfix. Both are added/fixed by the following PRs:
I've just copied the fixed code of the library under core/typedconfig, but it would be best to use it as a PyPI dependency.
Moved from CERT.pl internal repository. Reported originally by psrok1
Promises can't be easily cancelled in the current model, but we should at least check whether the target component is still mounted.
Moved from CERT.pl internal repository. Reported originally by psrok1
Feature Category
Describe the problem
https://<mwdb>/sample/<hash>
https://<mwdb>/config/<hash>
https://<mwdb>/blob/<hash>
Describe the solution you'd like
Test if view renders correctly in these cases:
https://<mwdb>/sample/<existent sha256 hash>
https://<mwdb>/sample/<existent md5 hash>
https://<mwdb>/sample/<non-existent sha256 hash>
https://<mwdb>/sample/<non-existent md5 hash>
https://<mwdb>/config/<existent hash>
https://<mwdb>/config/<non-existent hash>
https://<mwdb>/blob/<existent hash>
https://<mwdb>/blob/<non-existent hash>
Currently, there is no way to know what plugins are loaded and installed for Malwarecage.
It would be great to have a page in the UI to list the plugins and even a toggle-switch component to turn on or off each plugin.
This feature is already supported by backend and can be used for links to objects where we don't have information about object type.
Environment information
Malwarecage version (from /about): 2.0.0-alpha1
Behaviour the bug (what happened?)
JSONDecodeError exception.
Expected behaviour
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
Proposed fix
ValidationError instead of returning error via field like obj.errors. We should provide a wrapper that loads request data and universally handles ValidationError, JSONDecodeError and other exceptions that can occur due to corrupted input.
model.object.Object.add_parent (test-and-set race)
model.object.Object.give_access (ObjectPermission.create seems ok at first sight)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
context)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
cursor.execute(statement, parameters)
psycopg2.IntegrityError: duplicate key value violates unique constraint "ix_relation_parent_child"
DETAIL: Key (parent_id, child_id)=(3014895, 3014562) already exists.
Moved from CERT.pl internal repository. Reported originally by psrok1
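The duplicate-key traceback above is typical of a test-and-set race: two requests both check that the relation is missing and then both insert it. One common remedy is to skip the check, attempt the insert, and treat a unique-constraint violation as "already exists". The sketch below illustrates the idea only; it uses the standard-library sqlite3 module as a stand-in for PostgreSQL/SQLAlchemy, and the `relation` table layout and the `add_parent` helper are assumptions, not the project's code.

```python
import sqlite3


def add_parent(conn: sqlite3.Connection, parent_id: int, child_id: int) -> bool:
    """Insert the relation; return False if it already exists (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO relation (parent_id, child_id) VALUES (?, ?)",
                (parent_id, child_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Another request won the race; the relation is already there.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relation (parent_id INTEGER, child_id INTEGER, "
    "UNIQUE (parent_id, child_id))"
)
first = add_parent(conn, 3014895, 3014562)
second = add_parent(conn, 3014895, 3014562)
row_count = conn.execute("SELECT COUNT(*) FROM relation").fetchone()[0]
```

The second call reports that nothing was added instead of raising, so the caller can answer the request normally.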
Feature Category
Describe the problem
conn = redis.from_url(app_config.malwarecage.redis_uri)
if request.method == 'GET':
    """
    DownloadResource is token-based and shouldn't be limited
    """
    if request.endpoint != 'downloadresource':
        # 1000 per 10 seconds
        rate_limit(conn, "get-request", 10, 1000)
        # 2000 per 1 minute
        rate_limit(conn, "get-request", 60, 2000)
        # 6000 per 5 minutes
        rate_limit(conn, "get-request", 5 * 60, 6000)
        # 10000 per 15 minutes
        rate_limit(conn, "get-request", 15 * 60, 10000)
else:
    # 10 per 10 seconds
    rate_limit(conn, "set-request", 10, 10)
    # 30 per 1 minute
    rate_limit(conn, "set-request", 60, 30)
    # 100 per 5 minutes
    rate_limit(conn, "set-request", 5 * 60, 100)
    # 200 per 15 minutes
    rate_limit(conn, "set-request", 15 * 60, 200)
Describe the solution you'd like
We should be able to customize rate limits via configuration, with good defaults included.
Limits should be grouped by their semantics instead of by the HTTP method used, e.g.:
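To make the "configuration with defaults" idea concrete, here is a minimal sketch. The `count/seconds` string format, the `DEFAULT_LIMITS` table and both helper names are assumptions made for illustration; only the numeric defaults are taken from the hard-coded snippet quoted in this issue. Each parsed pair is in the same (period, count) order as the arguments passed to `rate_limit` there.

```python
from typing import Dict, List, Tuple

# Hypothetical defaults, mirroring the hard-coded values quoted in the issue.
DEFAULT_LIMITS: Dict[str, str] = {
    "get-request": "1000/10s 2000/60s 6000/300s 10000/900s",
    "set-request": "10/10s 30/60s 100/300s 200/900s",
}


def parse_limits(spec: str) -> List[Tuple[int, int]]:
    """Parse 'count/<seconds>s' items into (period_seconds, max_requests) pairs."""
    limits = []
    for item in spec.split():
        count, period = item.split("/")
        if not period.endswith("s"):
            raise ValueError(f"Unsupported period in rate limit: {item!r}")
        limits.append((int(period[:-1]), int(count)))
    return limits


def limits_for(group: str, overrides: Dict[str, str]) -> List[Tuple[int, int]]:
    """Use the configured value for a group, falling back to the default."""
    return parse_limits(overrides.get(group, DEFAULT_LIMITS[group]))


default_set = limits_for("set-request", {})
custom_get = limits_for("get-request", {"get-request": "5/1s"})
```

New semantic groups would only need another key in the defaults table and the matching key in the configuration file.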
Describe the problem you are facing
Currently, when I want to execute the same set of queries in mwdb frequently (once a day, once a week, ...) I don't have any easy way to do this. I can only type them ad hoc or keep a list of queries on the side.
Describe the solution you'd expect
I would like to have an option to save queries (e.g. tag:"yara:win_formbook" AND NOT tag:"ripped:*") so I can quickly access them when I need. In this solution, I could visit the Search page, or a special dashboard, and choose a query I want to execute, without having to type it manually or paste it from a notepad.
Currently the only way to add a parent-child relation between files from the web UI is to click on the Add child button, which redirects us to the upload view.
In many cases we want to add a relation between existing objects by providing the identifier (hash) of the child object. We should be able to do that without uploading file contents.
Action is already possible from the mwdblib CLI:
$ mwdb link --help
Usage: mwdb link [OPTIONS] PARENT CHILD
Set relationship for objects
Feature Category
Describe the problem
The in-blob key in configuration has a special meaning: it allows putting links to blob objects in static configuration when a value is too big to be presented directly or when we want an additional level of relationship at the single-key level.
All related blobs must be uploaded manually by the user, along with proper relationships and config transformation to the in-blob form. Malwarecage handles it only at presentation level (web app). It's not very convenient for config uploaders that want to take advantage of this feature.
Describe the solution you'd like
During configuration upload, Malwarecage should look for a {"in-blob": ...} structure provided as a value:
{
    "config-key": {
        "in-blob": {
            "blob_name": ...,
            "blob_type": ...,
            "content": ...
        }
    }
}
Blob description should follow the same scheme which is used as blob upload request body:
After config upload, the config contents should be transformed to the form before upload:
{
    "config-key": {
        "in-blob": <blob dhash (sha256)>
    }
}
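The transformation described in this issue can be sketched as follows. This is an illustration under stated assumptions, not Malwarecage code: the helper name `extract_in_blobs` is hypothetical, only top-level config keys are inspected, blob content is assumed to be text encoded as UTF-8, and the hash is the sha256 of the blob contents as the issue describes.

```python
import hashlib
from typing import Any, Dict, List, Tuple


def extract_in_blobs(config: Dict[str, Any]) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Return (transformed config, blob descriptions to upload). Hypothetical helper."""
    blobs: List[Dict[str, Any]] = []
    transformed: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict) and isinstance(value.get("in-blob"), dict):
            blob = value["in-blob"]
            # Assumption: content is text and is hashed as UTF-8 bytes.
            digest = hashlib.sha256(blob["content"].encode("utf-8")).hexdigest()
            blobs.append(blob)
            transformed[key] = {"in-blob": digest}
        else:
            # Plain values and already-hashed in-blob references pass through.
            transformed[key] = value
    return transformed, blobs


config = {
    "cnc": "1.2.3.4",
    "script": {
        "in-blob": {"blob_name": "script", "blob_type": "raw", "content": "hello"}
    },
}
new_config, pending_blobs = extract_in_blobs(config)
```

Because an existing hash reference is a string rather than a dictionary, running the helper on an already transformed config leaves it unchanged.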
Additional constraints:
in-blob upload without Capabilities.adding_blobs should result in rejection with HTTP 403 error
in-blob value which doesn't follow the TextBlobSchema should result in rejection with HTTP 400 error
mwdblib.util.config_dhash utility should handle that case by evaluating sha256 of blob contents and transforming config to "in-blob": <contents sha256> form before the actual hash computing
Change version in package.json to the latest and check whether there is no regression and everything still works.
Warning: componentWillReceiveProps has been renamed, and is not recommended for use. See https://fb.me/react-unsafe-component-lifecycles for details.
* Move data fetching code or side effects to componentDidUpdate.
* If you're updating state whenever props change, refactor your code to use memoization techniques or move it to static getDerivedStateFromProps. Learn more at: https://fb.me/react-derived-state
* Rename componentWillReceiveProps to UNSAFE_componentWillReceiveProps to suppress this warning in non-strict mode. In React 17.x, only the UNSAFE_ name will work. To rename all deprecated lifecycles to their new names, you can run `npx react-codemod rename-unsafe-lifecycles` in your project source folder.
Please update the following components: Switch
react-dom.development.js:12449
Related issue: remix-run/react-router#6871
Moved from CERT.pl internal repository. Reported originally by psrok1
Handle parent: and child: selectors in search, accepting FieldGroup with subquery.
Example syntax:
tag:"document:win32:xls" AND parent:(tag:emotet AND upload_time:[2020-01-01 TO 2020-01-04])
which looks for objects tagged as document:win32:xls having a parent tagged as emotet and uploaded between 1st and 4th January 2020.
AFAIK that's not the typical use case of Lucene syntax but seems to be correctly handled by Luqum used in Malwarecage.
In [1]: parser.parse("parent:(tag:emotet AND upload_time:[2020-01-01 TO 2020-01-04])")
Out[1]: SearchField('parent', FieldGroup(AndOperation(SearchField('tag', Word('emotet')), SearchField('upload_time', Range(Word('2020-01-01'), Word('2020-01-04'))))))
Another thing that doesn't resolve correctly after the session expires:
Redirects to: https://mwdb.cert.pl/relations
Moved from CERT.pl internal repository. Reported originally by pp
The current permission model involves groups that allow users to share samples and all related artifacts within one of their own groups. That's how the public feed works: each user is a member of the "public" group, so everybody can share objects with all users.
This feature can be used to create groups for organizations that want to have a common workspace in Malwarecage. Unfortunately, all group management features and knowledge about group members can be accessed only by a Malwarecage administrator (needs the manage_users capability), which limits the feature's usability for external users.
Proposed improvements:
Action required notification via Malwarecage. Maybe we should optionally allow users to register custom workspaces?
Moved from CERT.pl internal repository. Reported originally by psrok1
Write a first test that authenticates in Malwarecage and checks whether Logged as: <login> is shown.
Moved from CERT.pl internal repository. Reported originally by icedevml
We already have a limit in place, but it's just hardcoded in one of the nginx configurations:
client_max_body_size 50M;
https://github.com/CERT-Polska/malwarecage/blob/master/malwarefront/default.conf.template#L6
Moved from CERT.pl internal repository. Reported originally by psrok1
Environment information
current master (17705d3)
Behaviour the bug (what happened?)
Screenshots
There is a huge mess in API documentation, schemas and validation that must be addressed before the stable release.
Refactored endpoint groups
REST API refactor checklist
authenticated_access, use Object.access directly and raise NotFound if necessary with appropriate message
Post refactor
authenticated_access
Endpoint /events which notifies about new objects/tags/elements in mwdb, respecting the user's workspace view (groups) and permissions.
The feature is limited by the fact that we're using a synchronous uWSGI backend, which doesn't work well with websockets/event-stream.
Implementation proposal by @icedevml:
Moved from CERT.pl internal repository. Reported originally by psrok1
Description
Notice: this PR should be moved to the Karton-MWDB Plugin repository once it is set up
When the Karton-MWDB plugin is installed, MWDB tries to communicate with the Karton dashboard in order to get the status of tasks. By default, no such env variable exists.
It would be great to document that such an env variable can be set using mwdb-vars.env:
$ cat mwdb-vars.env
MALWARECAGE_REDIS_URI=redis://redis/
...
...
KARTON_DASHBOARD_URL=http://<KARTON-DASHBOARD-URL>:5000/
Currently we're getting ssdeep directly from GitHub during Docker build. Maybe it would be better to use the BUILD_LIB option proposed by the library author, along with libfuzzy-dev installed.
Reference: https://python-ssdeep.readthedocs.io/en/latest/installation.html
Moved from CERT.pl internal repository. Reported originally by psrok1
Currently the uploader:<login> query only lets us search our own uploads unless we are an administrator (manage_users capability required).
python-magic has only two methods for providing input: from_file (filename) and from_buffer (bytes).
Because we don't have any guarantee that file.stream is a named file, we probably need to create a NamedTemporaryFile on our own.
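A minimal sketch of that approach, using only the standard library: the upload stream is copied into a NamedTemporaryFile and a path-based function is called on it. The helper name `with_named_copy` is hypothetical, the stream is assumed to be seekable, and reopening the temporary file by name while it is still open is assumed to work (true on Unix-like systems, not on Windows). With python-magic the callable would be `magic.from_file`; a stand-in is used here so the example has no third-party dependency.

```python
import io
import shutil
import tempfile
from typing import BinaryIO, Callable


def with_named_copy(stream: BinaryIO, func: Callable[[str], str]) -> str:
    """Copy the stream to a named temporary file and call func(path)."""
    with tempfile.NamedTemporaryFile() as tmp:
        shutil.copyfileobj(stream, tmp)
        tmp.flush()     # make sure the bytes are visible to readers by name
        stream.seek(0)  # rewind so later code can read the upload again
        return func(tmp.name)


def first_bytes(path: str) -> str:
    """Stand-in for magic.from_file: opens the file by its name."""
    with open(path, "rb") as handle:
        return handle.read(2).decode("ascii")


result = with_named_copy(io.BytesIO(b"MZ\x90\x00"), first_bytes)
```

The temporary file is removed when the `with` block exits, so nothing is left on disk after detection.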
Moved from CERT.pl internal repository. Reported originally by psrok1
Direct encodeURIComponent doesn't work because of this bug: remix-run/history#505
We need to:
encodeSearchQuery from @malwarefront/helpers that encodes queries twice
Bug is fixed in history 5.0.0 (react-router dependency, remix-run/history#689, remix-run/history#656)
Fix some long-standing issues, update browser support level, and introduce a few major new features:
...
Stop decoding pathnames (#656)
Unfortunately, v5.0.0 is still in beta, so we need to use our workaround until the stable release.
Reference: remix-run/react-router#7173
Moved from CERT.pl internal repository. Reported originally by psrok1
Current regex: ^[A-Za-z0-9_-]{1,32}$
It means that:
_
_-__-____---___
S
This is not really a bug, but can be confusing. Created with proposal status to be discussed.
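The behaviour of the quoted pattern can be checked with a few lines of Python. This is only an illustration; `is_valid` is a hypothetical helper name, and `fullmatch` is used in place of the explicit `^` and `$` anchors.

```python
import re

# Same character class and length bounds as the regex quoted in the issue.
PATTERN = re.compile(r"[A-Za-z0-9_-]{1,32}")


def is_valid(name: str) -> bool:
    """True when the whole string matches the pattern."""
    return PATTERN.fullmatch(name) is not None


candidates = ["_", "_-__-____---___", "S", "", "a" * 33, "a b"]
accepted = [name for name in candidates if is_valid(name)]
```

The three strings listed in the issue are accepted, while the empty string, a 33-character name and a name containing a space are rejected.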
Moved from CERT.pl internal repository. Reported originally by psrok1
Feature Category
Describe the problem
Describe the solution you'd like
malwarecage.ini) with JSON as a default
Behaviour the bug (what happened?)
Expected behaviour
400 Bad Request when password is too long and will be truncated.
/auth/login and /auth/change_password
Passwords are UTF-8 encoded so limitation must be provided for bytes, not chars.
Reproduction Steps
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaś
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaź
are accepted during logon (the same multibyte prefix for last char)
Additional context
Reported by nsty.
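The difference between counting characters and counting bytes can be shown with a short sketch. The 72-byte limit below is an assumption chosen for illustration (the issue does not state the actual limit), and `password_too_long` is a hypothetical helper name. The two sample passwords are built to differ only in a final two-byte character.

```python
MAX_PASSWORD_BYTES = 72  # assumption: the real limit depends on the hashing scheme


def password_too_long(password: str, limit: int = MAX_PASSWORD_BYTES) -> bool:
    """Check the UTF-8 encoded length, not the number of characters."""
    return len(password.encode("utf-8")) > limit


first = "a" * 71 + "ś"
second = "a" * 71 + "ź"

# Both strings are 72 characters long but 73 bytes once encoded.
encoded_first = first.encode("utf-8")
encoded_second = second.encode("utf-8")

# Cutting at the assumed limit leaves the same bytes for both passwords,
# because the last characters share their first UTF-8 byte.
same_after_truncation = (
    encoded_first[:MAX_PASSWORD_BYTES] == encoded_second[:MAX_PASSWORD_BYTES]
)
```

A check on `len(password)` would accept both strings, which is why the validation has to look at the encoded length.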
Feature Category
Describe the problem
admin account has access_all_objects capability enabled. To share all objects with other users in organisation, administrator must create new group with access_all_objects enabled before any object is added or reupload/exclusively share all previously added objects.
Describe the solution you'd like
access_all_objects capability enabled
Related tickets:
ReactDOM.render
Possible solutions:
On LOGOUT action - navPath is remembered in redux state to be recovered on LOGIN_SUCCESS.
But something goes wrong with that. It works when the user logs out manually, but an expired session probably triggers this bug. It occurred when using a different browser that hadn't been used for a while.
Moved from CERT.pl internal repository. Reported originally by chivay
Feature Category
Describe the problem
When no plugins are installed, an empty table is shown in the about page.
Describe the solution you'd like
Instead of an empty table, write "No plugins are installed. Visit our documentation[link] to learn about Malwarecage plugins and how they can be used and installed."
Describe alternatives you've considered
Show nothing
Currently we can provide some attributes (known also as "metakeys") along with object contents during upload. They can be used to provide additional information in the same transaction as object upload so they can be used by plugins.
It would be nice to have the same feature available for tags.
Moved from CERT.pl internal repository. Reported originally by psrok1
It's really ineffective for huge graphs and can quickly lead to hitting the rate limit.
Moved from CERT.pl internal repository. Reported originally by psrok1
Links are broken on tags.
Clicking results in:
react-dom.production.min.js:957 Uncaught TypeError: e.props.tagClick is not a function
at onClick (Tag.js:27)
at onClick (react-router-dom.js:133)
at Object.<anonymous> (react-dom.production.min.js:33)
at h (react-dom.production.min.js:53)
at react-dom.production.min.js:57
at m (react-dom.production.min.js:77)
at at (react-dom.production.min.js:942)
at it (react-dom.production.min.js:931)
at st (react-dom.production.min.js:955)
at pt (react-dom.production.min.js:1041)
Allow to query for samples uploaded by multiple groups to provide something like "reputation score" based on number of uploads. This can be used to hunt for interesting campaigns.
Moved from CERT.pl internal repository. Reported originally by nazywam
Great examples can be found in Elasticsearch Lucene query documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_ranges
* (count:[10 TO *]) to make an unbounded range: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_ranges
>= operator syntax
age:>10
age:>=10
age:<10
age:<=10
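One way to think about the operator syntax is as shorthand for a one-sided range. The sketch below rewrites a single `field:<op>value` term into an Elasticsearch-style range, where square brackets are inclusive and curly brackets are exclusive. The helper name `to_range` is hypothetical, and whether the parser used by Malwarecage accepts mixed brackets is not confirmed here.

```python
import re

# field, one of the four comparison operators, then a value without spaces
_OPERATOR_TERM = re.compile(r"^(?P<field>\w+):(?P<op>>=|<=|>|<)(?P<value>\S+)$")


def to_range(term: str) -> str:
    """Rewrite 'age:>=10' style terms as ranges; return other terms unchanged."""
    match = _OPERATOR_TERM.match(term)
    if match is None:
        return term
    field, op, value = match["field"], match["op"], match["value"]
    if op == ">=":
        return f"{field}:[{value} TO *]"
    if op == ">":
        return f"{field}:{{{value} TO *]"
    if op == "<=":
        return f"{field}:[* TO {value}]"
    return f"{field}:[* TO {value}}}"


rewritten = [to_range(t) for t in ["age:>10", "age:>=10", "age:<10", "age:<=10"]]
```

Terms without an operator, such as `tag:emotet`, pass through untouched, so the rewrite could run as a preprocessing step before parsing.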
Proposed by @ITAYC0HEN
Feature Category
Related issue #36
Describe the problem
Plugins don't identify themselves in Malwarecage. They don't report their status or version.
A plugin can deliver both backend and frontend extensions. Some plugins are frontend-only. Frontend extensions are built into the frontend bundle at compile time. Backend extensions are loaded at run time.
Backend plugin entrypoint (__init__.py) usually looks like this:
from plugin_engine import PluginAppContext
from .resources import MQueryResource
from .schema import MQueryYaraJobSchema


def entrypoint(app_context: PluginAppContext):
    # Initialization code which also could be "first installation" code
    app_context.register_resource(MQueryResource, "/mquery/search")
    app_context.register_schema_spec("MQueryYaraJobSchema", MQueryYaraJobSchema)
    ...


__plugin_entrypoint__ = entrypoint
Describe the solution you'd like
A plugin should provide its author/version/docstring information via module-level dunder names (__author__, __version__ and __doc__, as in the __plugin_entrypoint__ case):
"""MQuery integration plugin"""
from plugin_engine import PluginAppContext
from .resources import MQueryResource
from .schema import MQueryYaraJobSchema

__author__ = "CERT Polska"
__version__ = "1.0.0"


def entrypoint(app_context: PluginAppContext):
    # Initialization code which also could be "first installation" code
    app_context.register_resource(MQueryResource, "/mquery/search")
    app_context.register_schema_spec("MQueryYaraJobSchema", MQueryYaraJobSchema)
    ...


__plugin_entrypoint__ = entrypoint
Frontend-only plugins should also deliver the information stub in __init__.py (e.g. plugins/certInfo/__init__.py customizing the logo and adding our Terms of Service):
"""mwdb.cert.pl logo and Terms of Service plugin"""
__author__ = "CERT Polska"
__version__ = "1.0.0"
It means that __init__.py is now mandatory and __plugin_entrypoint__ is optional.
This information should be gathered by the backend plugin loader plugin_engine.load_plugins (https://github.com/CERT-Polska/malwarecage/blob/master/plugin_engine.py#L97) and exposed via the global dictionary plugin_engine.loaded_plugins (due to the current code structure it would be difficult to expose it another way; a good topic for a future code refactor):
loaded_plugins = {
    "<module name>": {
        "active": bool,
        "author": ...,
        "description": ...,
        "version": ...
    }
}
When __author__, __doc__ or __version__ is missing, None should be placed instead.
When a plugin is successfully loaded, active should be set to True, and to False otherwise. The reason for inactivity should stay visible only in logs (the most common case will probably be a lack of configuration).
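The gathering step could look roughly like the sketch below. It is an illustration under assumptions: `register_plugin_info` is a hypothetical helper, the description is read from the module docstring, and `types.ModuleType` objects stand in for imported plugin packages.

```python
import types
from typing import Any, Dict

loaded_plugins: Dict[str, Dict[str, Any]] = {}


def register_plugin_info(module: types.ModuleType, active: bool) -> None:
    """Record plugin metadata, using None for any missing attribute."""
    loaded_plugins[module.__name__] = {
        "active": active,
        "author": getattr(module, "__author__", None),
        "description": getattr(module, "__doc__", None),
        "version": getattr(module, "__version__", None),
    }


# Stand-in for a plugin that declares all of its metadata.
plugin = types.ModuleType("mquery", "MQuery integration plugin")
plugin.__author__ = "CERT Polska"
plugin.__version__ = "1.0.0"
register_plugin_info(plugin, active=True)

# Stand-in for a plugin that declares nothing and failed to load.
bare = types.ModuleType("bare_plugin")
register_plugin_info(bare, active=False)
```

Using `getattr` with a default keeps the loader tolerant of plugins that were written before the metadata convention existed.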
Complete plugin information (from plugin_engine.loaded_plugins) should be exposed via the /server endpoint (ServerInfoResource, https://github.com/CERT-Polska/malwarecage/blob/master/resources/server.py#L28)
The plugin information should be exposed in the web app's /about view under the banner.
Keep in mind that /server is already loaded into the Redux state as "client-side config" (https://github.com/CERT-Polska/malwarecage/blob/master/malwarefront/src/commons/config/config.js) and used to fetch the Malwarecage version.
Feature Category
Describe the problem
It is very interesting to see whether some of the values in a config are shared between different samples or even different families. For example, I'd like to click on an IP from QBot's config and see that it reuses an IP from previous Emotet campaigns.
Currently, the only way to see this is to click on a config item, which generates a query. The query is too specific (e.g. it contains the field name and not only the value, which can cause conflicts with different namings like "C&C" and "remote_server").
Describe the solution you'd like
There are several options. The first one is to visualize the relationships via a graph.
The second is to have a little badge with the number of configs that have the same value; upon click, it would expand to a graph/search.
Describe the problem
Currently, due to Docker limitations (COPY), the existence and required location of the plugin's requirements file aren't documented.
For example, for the karton plugin, one needs to have a requirements-karton.txt in malwarecage/plugins.
Describe the solution you'd like
Document the requirement for the plugin's requirements file to be in the malwarecage/plugins folder and not in the malwarecage/plugins/plugin_name/ folder.
Describe alternatives you've considered
Ignoring caching plugin requirements and building them each time
Add the ability to filter by the sample quality. Sample quality may be computed for example as max(uploader.feed_quality) for all the people that uploaded the file.
Moved from CERT.pl internal repository. Reported originally by msm
Description
If a user wants to update the permissions of a group, they need to click on the "update" button. This is confusing because the UI shows one state while in reality, without clicking the button, this state is lost.
Alternatives:
An alternative approach is to keep the "update" button grayed out until changes are made. Then the button turns blue and an on-screen indication reminds the user to save. In addition, if the user leaves the page with unsaved changes, Malwarecage triggers an "are you sure?" pop-up.
Currently ShowObjectPresenter.setCurrentTab looks like this:
setCurrentTab(tab, subtab) {
    let pathElements = this.props.history.location.pathname.split("/");
    let newPath = pathElements.slice(0, 3).concat([tab]).concat(subtab ? [subtab] : []).join("/");
    this.props.history.replace(newPath);
}
We can use the Link component to provide transitions between tabs instead of rewriting this.props.history manually.
Moved from CERT.pl internal repository. Reported originally by psrok1
Feature Category
Describe the problem
If a required capability is missing, Malwarecage returns 403 with "You are not permitted to perform this action" without saying which permission is needed.
Describe the solution you'd like
Add more details to the error, including the name (or friendly name) of the required capability.
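As an illustration of the requested behaviour, the sketch below raises an error that names the missing capability. Everything here is hypothetical: the `Forbidden` class stands in for the framework's 403 exception, and the `requires_capabilities` decorator and its calling convention are assumptions, not Malwarecage's actual implementation.

```python
import functools
from typing import Callable, Set


class Forbidden(Exception):
    """Stand-in for an HTTP 403 error."""


def requires_capabilities(*required: str) -> Callable:
    """Reject the call when the user lacks any of the required capabilities."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(user_capabilities: Set[str], *args, **kwargs):
            missing = [cap for cap in required if cap not in user_capabilities]
            if missing:
                raise Forbidden(
                    "You are not permitted to perform this action "
                    f"(missing capability: {', '.join(missing)})"
                )
            return func(user_capabilities, *args, **kwargs)
        return wrapper
    return decorator


@requires_capabilities("manage_users")
def list_users(user_capabilities: Set[str]):
    return ["admin"]
```

Calling `list_users(set())` raises `Forbidden` with a message that mentions `manage_users`, while a user holding that capability gets the normal result.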