controlmeta's Introduction

controlmeta

The overall idea is to have a repository for image data and metadata about the images. The image data is pretty simple: just a blob of image data and a MIME type to decode it. The metadata is a bit more complex. There are many kinds of metadata we can imagine: faces, faces associated with names, wall clocks, cars, cigarettes, boats, meetings, relationships between people, etc. The only thing that can be said about the metadata is that it is incredibly diverse, so an initial challenge will be how to represent it all. Here is a sketch of how we can go about that.

Persistent data model

We model the whole thing in PostgreSQL using a hybrid relational / JSON (NoSQL) model.

Raw data:

obid, source uri, mimetype, blob

The obid is a unique object ID. The source uri is a URI (or equivalent) that points to the location where the data can uniquely be found. The blob is just the bytes of the object, and the mimetype describes it.

So far so good. We will also allow, but not encourage, that the blob is empty. In that case the semantics should be to look up the source and get the bytes from there. This allows us to work with skinny databases in situations where that is useful (like testing perhaps). It is not encouraged for production use.
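The empty-blob rule above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the `RawObject` class, the `load_bytes` helper, and the `fake_fetch` function are hypothetical names introduced here, and an empty blob is assumed to be represented as `None`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RawObject:
    """One raw-data row: obid, source uri, mimetype, blob (hypothetical class)."""
    obid: int
    source_uri: str
    mimetype: str
    blob: Optional[bytes] = None  # None models the allowed "empty blob" case


def load_bytes(obj: RawObject, fetch: Callable[[str], bytes]) -> bytes:
    """Return the stored bytes, or look them up at the source when the blob is empty."""
    if obj.blob:
        return obj.blob
    return fetch(obj.source_uri)


def fake_fetch(uri: str) -> bytes:
    """Stand-in fetcher for this illustration; a real one would read from the URI."""
    return b"bytes-from:" + uri.encode("utf-8")


stored = RawObject(1, "file:///tmp/a.png", "image/png", b"\x89PNG")
skinny = RawObject(2, "file:///tmp/b.png", "image/png")  # skinny-database row
```

Passing the fetcher in as a parameter keeps the fallback easy to swap out in tests, which is the situation the skinny-database mode is meant for.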

Metadata:

obid, metadataid, json-blob

metadataid, human-readable description

Inherent in this model is the possibility that the JSON blobs contain a high number of foreign keys into external data models, e.g. people (identities) or various ontologies for this, that, or the other thing. We chose simply to say: good for them. There will be room for consolidation, but this model simply will not care.
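The two metadata tables could look roughly like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a server, whereas the design above targets PostgreSQL. The table names, column types, and the `faces` example row are assumptions made for illustration only.

```python
import json
import sqlite3

# In-memory SQLite stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadata_type (
    metadataid  TEXT PRIMARY KEY,
    description TEXT NOT NULL          -- human-readable description
);
CREATE TABLE metadata (
    obid        INTEGER NOT NULL,      -- refers to the raw-data object
    metadataid  TEXT NOT NULL REFERENCES metadata_type(metadataid),
    json_blob   TEXT NOT NULL          -- free-form JSON, schema not enforced
);
""")

# Hypothetical metadata type and annotation; person_ids point into an
# external identity model that this store does not know or care about.
conn.execute("INSERT INTO metadata_type VALUES (?, ?)",
             ("faces", "Faces detected in an image"))
conn.execute("INSERT INTO metadata VALUES (?, ?, ?)",
             (1, "faces", json.dumps({"count": 2, "person_ids": [17, 42]})))


def metadata_for(obid, metadataid):
    """Return all decoded JSON blobs of one metadata type for one object."""
    rows = conn.execute(
        "SELECT json_blob FROM metadata WHERE obid = ? AND metadataid = ?",
        (obid, metadataid),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]
```

The relational part only tracks which object and which metadata type a blob belongs to. Everything inside the JSON is left to the agent that wrote it.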

Task distribution

This will have to be a distributed model. Each metadata annotation agent should get a task from somewhere, fetch the raw data and perhaps other metadata it requires, perform a calculation, and then write the result to the metadata repo.

IT IS CRUCIAL THAT THIS MODEL IS IN PLACE FROM DAY ONE!

This is the one thing that we can't compromise on when it comes to scaling.

It must be possible to write small scripts in whatever language to do things. These scripts should be first-class citizens in the ingestion facility.

Distribution of tasks should use some queueing mechanism: either one from Amazon or perhaps 0MQ.

The tasks are then picked up, processed and the results stored back to the metadata storage. It can then be used for search & collation by known features ("tags").
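The pick-up, process, store-back cycle described above can be illustrated with a small single-process sketch. Python's standard `queue.Queue` stands in for the real queueing mechanism, plain dictionaries stand in for the raw-data and metadata stores, and `byte_count_agent` is a hypothetical annotation agent. None of these names come from the project.

```python
import queue

tasks = queue.Queue()                        # stand-in for the real task queue
raw_store = {1: b"abc", 2: b"hello world"}   # obid -> raw bytes
metadata_store = {}                          # (obid, metadataid) -> JSON-like dict


def byte_count_agent(data: bytes) -> dict:
    """Hypothetical annotation: record the size of the object in bytes."""
    return {"bytes": len(data)}


def run_agent(metadataid, compute):
    """Drain the queue: get a task, fetch raw data, compute, write the result back."""
    processed = 0
    while True:
        try:
            obid = tasks.get_nowait()
        except queue.Empty:
            return processed
        metadata_store[(obid, metadataid)] = compute(raw_store[obid])
        tasks.task_done()
        processed += 1


for obid in raw_store:
    tasks.put(obid)
done = run_agent("byte_count", byte_count_agent)
```

An agent written this way is just a function from bytes to a JSON-like dictionary, which is the kind of small script the design wants to support.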

REST interface

The task distribution network will have a REST interface so that it's possible to read/write all data from/to the central service. Agents should only have to communicate through REST, and optionally through whatever methods are used to access actual content (e.g. direct access to S3, or a filesystem or whatever).

Accessing media and metadata

GET /media -> Gets all the media stored. This could be a very long list indeed.
  => XXX Return value is missing from the documentation.

POST /media -> Store a media item, assign a document ID, and return the ID.
  => XXX Return value is missing from the documentation.

GET /media/id/{id} -> Gets the content stored in the blob.
  Success => The document, served with its MIME type
  Nonexistent document => HTTP 404

POST /media/id/{id} -> Upload new media, get an ID back in the reply.
  Success => XXX Missing
  Nonexistent document => HTTP 404

DELETE /media/id/{id} -> Delete a particular entry.
  Success => XXX Missing
  Nonexistent document => HTTP 404

GET /media/id/{id}/metatype/{metatype} -> The JSON describing the metadata

POST /media/id/{id}/metatype/{metatype} -> The JSON describing the metadata

DELETE /media/id/{id}
  Success => XXX
  Nonexistent ID => 404
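To make the documented 404 behaviour concrete, here is a minimal in-memory model of three of the media endpoints. It is not an HTTP server and not the project's implementation: `MediaStore` and its method names are hypothetical. The success codes 201 and 204 are placeholders chosen for this sketch, since the endpoint list leaves those return values unspecified.

```python
import itertools


class MediaStore:
    """In-memory stand-in for the media part of the REST interface (hypothetical)."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def post_media(self, mimetype, blob):
        """Models POST /media: store an item, assign an ID, return the ID."""
        media_id = next(self._ids)
        self._items[media_id] = (mimetype, blob)
        return 201, media_id  # 201 is an assumption, not from the documentation

    def get_media(self, media_id):
        """Models GET /media/id/{id}: the document with its MIME type, or 404."""
        if media_id not in self._items:
            return 404, None
        return 200, self._items[media_id]

    def delete_media(self, media_id):
        """Models DELETE /media/id/{id}: remove the entry, or 404."""
        if media_id not in self._items:
            return 404
        del self._items[media_id]
        return 204  # 204 is an assumption, not from the documentation


store = MediaStore()
status, new_id = store.post_media("image/png", b"\x89PNG")
```

Calling `store.get_media(new_id)` returns the stored MIME type and bytes, and the same call after `store.delete_media(new_id)` returns 404.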

Accessing the task queue

Some ideas:

Creating and updating individual tasks .....

GET /task/id/{id}
PUT /task/id/{id}
GET /task/{type/{type}}/waiting/next
GET /task/{type/{type}}/waiting/pick
GET /task/{type/{type}}/in-progress/list
GET /task/{type/{type}}/done/list
GET /task/{type/{type}}/list
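The waiting, in-progress, and done states implied by these paths could be modelled as in the sketch below. The semantics are guesses, because the endpoints are only listed as ideas. Here `next` is assumed to peek at the oldest waiting task without claiming it, and `pick` is assumed to claim it and move it to in-progress. `TaskBoard` and all of its method names are hypothetical.

```python
from collections import deque


class TaskBoard:
    """Per-type task lists: waiting -> in-progress -> done (hypothetical model)."""

    def __init__(self):
        self.waiting = {}      # task type -> deque of task ids
        self.in_progress = {}  # task type -> list of task ids
        self.done = {}         # task type -> list of task ids

    def add(self, task_type, task_id):
        self.waiting.setdefault(task_type, deque()).append(task_id)

    def peek_next(self, task_type):
        """Assumed meaning of .../waiting/next: look at the oldest task, do not claim it."""
        q = self.waiting.get(task_type)
        return q[0] if q else None

    def pick(self, task_type):
        """Assumed meaning of .../waiting/pick: claim the oldest task for processing."""
        q = self.waiting.get(task_type)
        if not q:
            return None
        task_id = q.popleft()
        self.in_progress.setdefault(task_type, []).append(task_id)
        return task_id

    def finish(self, task_type, task_id):
        """Move a claimed task from in-progress to done."""
        self.in_progress[task_type].remove(task_id)
        self.done.setdefault(task_type, []).append(task_id)

    def list_all(self, task_type):
        """Assumed meaning of .../list: every task of this type, grouped by state."""
        return {
            "waiting": list(self.waiting.get(task_type, [])),
            "in-progress": list(self.in_progress.get(task_type, [])),
            "done": list(self.done.get(task_type, [])),
        }


board = TaskBoard()
board.add("faces", 1)
board.add("faces", 2)
```

Separating a non-claiming read from a claiming one matters once several agents poll the same queue, since only the claiming call changes state.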

controlmeta's People

Contributors

la3lma

controlmeta's Issues

code backdoor

We discovered a malicious backdoor in the project's dependencies. The affected version is ce06894. The backdoor is the request package: the controlmeta/requirements.txt file lists request as a dependency.

Even though the request package has been removed from PyPI, many mirror sites have not completely deleted it, so it can still be installed. For example: https://mirrors.neusoft.edu.cn/pypi/web/simple/request/

Anyone who downloads and installs this project's dependencies from such a mirror site is vulnerable.

Analysis of the malicious functions of the request package:

1. Remote download of malicious code. When the request package is installed, the setup.py file in the package is executed. The setup.py file contains logic that lets the attacker remotely download and execute malicious code. The C2 domain name is encoded and obfuscated. The decoded C2 address is: https://dexy.top/request/check.so.

2. Release of the remote control Trojan and persistence. The malicious code loaded remotely during installation of the request package does two things:
   - It releases the remote control Trojan to the .uds folder in the current user's HOME directory. The Trojan is named _err.log (for example, /root/.uds/_err.log). The content of the _err.log Trojan script is compressed and base64-encoded, which reduces its size and makes it harder to detect and analyze.
   - It implants malicious backdoor commands in .bashrc to achieve persistence.

3. Issuing of stealing instructions. The attacker issues Python secret-stealing instructions through the remote control Trojan to steal sensitive information (Coinbase account secrets). Once decoded, the stealing instruction requests the C2 service at http://dexy.top/x.pyx and remotely loads the stealing Trojan. The functions of the remotely loaded Trojan include stealing browser cookies, Coinbase accounts and passwords, etc.

Repair suggestion: replace request in controlmeta/requirements.txt with requests
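A small check along the following lines could flag the typosquatted name before anything is installed. It is a sketch, not a complete requirements parser: `find_typosquats` is a hypothetical helper, and the sample text is invented for illustration rather than copied from the project's requirements.txt.

```python
import re


def find_typosquats(requirements_text, bad_names=("request",)):
    """Return the known-bad package names that appear as requirements."""
    hits = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):   # skip blanks and pip options
            continue
        # The package name ends at the first version, extras, or marker character.
        name = re.split(r"[\s\[<>=!~;@]", line, maxsplit=1)[0].lower()
        if name in bad_names:
            hits.append(name)
    return hits


# Invented sample, for illustration only.
sample = "flask==0.10\nrequest\nrequests>=2.0\n# request in a comment\n"
flagged = find_typosquats(sample)
```

Matching on the exact package name matters here, because the legitimate `requests` package differs from the malicious one by a single trailing letter.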
