controlmeta

The overall idea is to have a repository for image data and metadata about the images. The image data is pretty simple: just a blob of image data and a MIME type to decode it. The metadata is a bit more complex. There are many kinds of metadata we can imagine: faces, faces associated with names, wall clocks, cars, cigarettes, boats, meetings, relationships between people, etc. The only thing that can be said about the metadata is that it is incredibly diverse, so an initial challenge will be how to represent it all. Here is a sketch for how we can go about that.

Persistent data model

We model the whole thing in PostgreSQL using a hybrid relational/JSON (NoSQL) model.

Raw data:

obid, source uri, mimetype, blob

The obid is a unique object ID. The source URI is a URI (or equivalent) that points to the location where the data can uniquely be found. The blob is just the bytes of the object, and the mimetype describes it.

So far so good. We will also allow, but not encourage, the blob to be empty. In that case the semantics should be to look up the source URI and get the bytes from there. This allows us to work with skinny databases in situations where that is useful (like testing, perhaps). It is not encouraged for production use.
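The empty-blob rule can be sketched in a few lines of Python. This is only an illustration: the `RawObject` class, the `load_bytes` helper, and the injected `fetch` callable are hypothetical names, not part of the project.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RawObject:
    """One raw-data row: obid, source uri, mimetype, blob (hypothetical shape)."""
    obid: int
    source_uri: str
    mimetype: str
    blob: Optional[bytes] = None  # None (or b"") models an empty blob


def load_bytes(obj: RawObject, fetch: Callable[[str], bytes]) -> bytes:
    """Return the stored blob, or fetch the bytes from source_uri if it is empty."""
    if obj.blob:
        return obj.blob
    # Assumption: the caller supplies a fetcher that understands the URI scheme.
    return fetch(obj.source_uri)
```

Passing the fetcher in as a parameter keeps the sketch independent of how the source is actually reached (HTTP, S3, a filesystem).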

Metadata:

obid, metadataid, json-blob

metadataid, Human readable description

Inherent in this model is the possibility of the JSON blobs having a high number of foreign keys into external data models, e.g. people (identities) or various ontologies for this, that, or the other thing. We chose simply to say: good for them. There will be room for consolidation, but this model simply will not care.
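As a rough sketch of the three tables listed above, the following uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server. The table names, column types, and helper functions are assumptions made for illustration; in PostgreSQL the blob column would more naturally be `bytea` and the JSON column `jsonb`.

```python
import json
import sqlite3

# Hypothetical table and column names; SQLite stands in for PostgreSQL here.
SCHEMA = """
CREATE TABLE raw_data (
    obid       INTEGER PRIMARY KEY,
    source_uri TEXT NOT NULL,
    mimetype   TEXT NOT NULL,
    blob       BLOB              -- may be NULL, see the empty-blob rule
);
CREATE TABLE metadata_type (
    metadataid  TEXT PRIMARY KEY,
    description TEXT NOT NULL    -- human-readable description
);
CREATE TABLE metadata (
    obid       INTEGER NOT NULL REFERENCES raw_data(obid),
    metadataid TEXT NOT NULL REFERENCES metadata_type(metadataid),
    json_blob  TEXT NOT NULL     -- free-form JSON, opaque to the schema
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_metadata(conn: sqlite3.Connection, obid: int, metadataid: str, payload: dict) -> None:
    """Attach one JSON document of the given metadata type to an object."""
    conn.execute(
        "INSERT INTO metadata (obid, metadataid, json_blob) VALUES (?, ?, ?)",
        (obid, metadataid, json.dumps(payload)),
    )


def get_metadata(conn: sqlite3.Connection, obid: int, metadataid: str) -> list:
    """Return all JSON documents of one metadata type for an object, oldest first."""
    rows = conn.execute(
        "SELECT json_blob FROM metadata WHERE obid = ? AND metadataid = ? ORDER BY rowid",
        (obid, metadataid),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]
```

The schema deliberately knows nothing about what is inside the JSON, which matches the "this model simply will not care" stance.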

Task distribution

This will have to be a distributed model. The metadata annotation agents should get a task from somewhere, get the raw data and perhaps other metadata they require, perform a calculation, and then write the result to the metadata repo.

IT IS CRUCIAL THAT THIS MODEL IS IN PLACE FROM DAY ONE!

This is the one thing that we can't compromise on when it comes to scaling.

It must be possible to write small scripts in whatever language to do things. These scripts should be first-class citizens in the ingestion facility.

Distribution of tasks should use some queueing mechanism: either one from Amazon or perhaps 0MQ.

The tasks are then picked up, processed and the results stored back to the metadata storage. It can then be used for search & collation by known features ("tags").
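The pick-up, process, store cycle might look like the loop below. It is a minimal single-process sketch: the standard-library `queue.Queue` stands in for whichever real queueing mechanism is chosen, and the `(obid, metadataid)` task shape, the dict-based raw store, and the list-based metadata store are all assumptions.

```python
import queue
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical task shape: which object to annotate, and with which metadata type.
Task = Tuple[int, str]


def run_agent(
    tasks: "queue.Queue[Task]",
    raw_store: Dict[int, bytes],
    metadata_store: List[Tuple[int, str, Any]],
    annotate: Callable[[bytes], Any],
) -> int:
    """Drain the queue: fetch raw data, compute metadata, write the result back.

    Returns the number of tasks processed.
    """
    processed = 0
    while True:
        try:
            obid, metadataid = tasks.get_nowait()
        except queue.Empty:
            return processed
        raw = raw_store[obid]                             # get the raw data
        result = annotate(raw)                            # perform the calculation
        metadata_store.append((obid, metadataid, result)) # write to the metadata repo
        processed += 1
```

An agent written in another language would follow the same three steps; only the queue client and the storage calls change.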

REST interface

The task distribution network will have a REST interface so that it's possible to read/write all data from/to the central service. Agents should only have to communicate through REST, and optionally through whatever methods are used to access actual content (e.g. direct access to S3, or a filesystem or whatever).

Accessing media and metadata

GET /media -> Gets all the media stored. This could be a very long list indeed.
    => XXX Return value is missing from the documentation.

POST /media -> Store a media item, assign a document ID, and return the ID.
    => XXX Return value is missing from the documentation.

GET /media/id/{id} -> Gets the content stored in the blob.
    Success => The document, as a document with a MIME type
    Nonexistent document => HTTP 404

POST /media/id/{id} -> Upload new media, get an ID back in the reply.
    Success => XXX Missing
    Nonexistent document => HTTP 404

DELETE /media/id/{id} -> Delete a particular entry.
    Success => XXX Missing
    Nonexistent document => HTTP 404

GET /media/id/{id}/metatype/{metatype} -> The JSON describing the metadata

POST /media/id/{id}/metatype/{metatype} -> The JSON describing the metadata

DELETE /media/id/{id}
    Success => XXX Missing
    Nonexistent ID => HTTP 404
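For agents that talk to the service over REST, the paths above can be built with two small helpers. This is a sketch only: the function names and the base URL in the usage note are made up, and nothing here fills in the return values that the list above marks as missing.

```python
from typing import Optional
from urllib.parse import quote


def media_url(base: str, obid: Optional[int] = None) -> str:
    """Build /media (the collection) or /media/id/{id} (one item)."""
    root = base.rstrip("/") + "/media"
    if obid is None:
        return root
    return f"{root}/id/{obid}"


def metadata_url(base: str, obid: int, metatype: str) -> str:
    """Build /media/id/{id}/metatype/{metatype}, percent-encoding the metatype."""
    return f"{media_url(base, obid)}/metatype/{quote(metatype, safe='')}"
```

With a hypothetical base of `http://localhost:8080`, `metadata_url(base, 42, "faces")` gives `http://localhost:8080/media/id/42/metatype/faces`.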

Accessing the task queue

Some ideas:

Creating and updating individual tasks .....

GET /task/id/{id}
PUT /task/id/{id}
GET /task/{type/{type}}/waiting/next
GET /task/{type/{type}}/waiting/pick
GET /task/{type/{type}}/in-progress/list
GET /task/{type/{type}}/done/list
GET /task/{type/{type}}/list
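The paths suggest three per-type task states: waiting, in-progress, and done. One possible reading, sketched below in Python, is that "next" peeks at the oldest waiting task while "pick" claims it and moves it to in-progress. That interpretation is an assumption, as are the `TaskBoard` class and its method names; the text above does not define the semantics.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class TaskBoard:
    """In-memory model of waiting / in-progress / done task lists per task type."""

    def __init__(self) -> None:
        self.waiting: Dict[str, Deque[int]] = {}
        self.in_progress: Dict[str, List[int]] = {}
        self.done: Dict[str, List[int]] = {}

    def add(self, task_type: str, task_id: int) -> None:
        """Queue a new task of the given type."""
        self.waiting.setdefault(task_type, deque()).append(task_id)

    def next(self, task_type: str) -> Optional[int]:
        """Assumed meaning of waiting/next: look at the oldest task without claiming it."""
        pending = self.waiting.get(task_type)
        return pending[0] if pending else None

    def pick(self, task_type: str) -> Optional[int]:
        """Assumed meaning of waiting/pick: claim the oldest task, mark it in-progress."""
        pending = self.waiting.get(task_type)
        if not pending:
            return None
        task_id = pending.popleft()
        self.in_progress.setdefault(task_type, []).append(task_id)
        return task_id

    def finish(self, task_type: str, task_id: int) -> None:
        """Move a claimed task to done; raises if the task is not in progress."""
        self.in_progress[task_type].remove(task_id)
        self.done.setdefault(task_type, []).append(task_id)
```

Separating a read-only peek from a state-changing claim keeps two agents from processing the same task, provided the real service performs the claim atomically.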

code backdoor

We discovered a malicious backdoor in the project's dependencies. The affected version is ce06894. The backdoor is the request package: the controlmeta/requirements.txt file lists request as a dependency.

Even though the request package has been removed from PyPI, many mirror sites have not completely deleted it, so it can still be installed. For example: https://mirrors.neusoft.edu.cn/pypi/web/simple/request/

Anyone who uses such a mirror site to download and install this project will be vulnerable.

Analysis of the malicious functions of the request package:

1. Remote download of malicious code
   When the request package is installed, the setup.py file in the package is executed. The setup.py file contains the logic for the attacker to remotely download and execute malicious code. The C2 domain name is encoded and obfuscated. The decoded C2 address is: https://dexy.top/request/check.so.

2. Release the remote control Trojan and persist it
   The malicious code loaded remotely during the installation of the request package has two functions:
   - Release the remote control Trojan to the .uds folder in the current user's HOME directory. The Trojan is named _err.log (for example, /root/.uds/_err.log). The content of the _err.log remote control Trojan script is compressed and base64-encoded, which reduces its size and hinders analysis.
   - Implant malicious backdoor commands in .bashrc to achieve persistence.

3. Issue stealing instructions
   The attacker issues Python secret-stealing instructions through the remote control Trojan to steal sensitive information (Coinbase account secrets). Once decoded, the stealing instruction requests the C2 service at http://dexy.top/x.pyx and remotely loads the stealing Trojan. The remotely loaded Trojan is used to steal browser cookies, Coinbase accounts and passwords, etc.

Repair suggestion: replace request in controlmeta/requirements.txt with requests
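A small check along these lines could flag the problem before installation. It is a sketch under stated assumptions: the `find_flagged` helper and its simple name-matching regex are hypothetical, it only handles plain requirement lines (not `-r` includes or URLs), and the flagged set contains only the package named in this report.

```python
import re
from typing import Iterable, List

# Matches the distribution name at the start of a requirement line.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def find_flagged(requirements_text: str, flagged: Iterable[str] = ("request",)) -> List[int]:
    """Return the 1-based line numbers whose package name is in the flagged set."""
    flagged_names = {name.lower() for name in flagged}
    hits = []
    for lineno, line in enumerate(requirements_text.splitlines(), start=1):
        line = line.split("#", 1)[0]  # ignore comments
        match = _NAME.match(line)
        if match and match.group(1).lower() in flagged_names:
            hits.append(lineno)
    return hits
```

Because the name is compared exactly, the legitimate `requests` package is not flagged.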
