mayhem-lab / cspot

c spot run

License: Other

C 50.29% Makefile 4.47% C++ 38.02% Shell 2.23% Python 1.12% Dockerfile 0.06% PHP 0.12% HTML 0.01% Cython 0.04% CMake 1.51% TeX 2.14%

cspot's Introduction

CSPOT

Installation

Currently, CSPOT builds from source using a bash installation script on CentOS 7, CentOS 8 Stream, CentOS 9 Stream, and Ubuntu 20.04. It requires Docker CE, which the script will install, but to do so it will remove the docker package that ships with the distro. It is best to read the install scripts before running them to make sure that the installation will not "corrupt" the local installation. CSPOT was originally designed to work in virtualized environments (e.g. clouds), so it is not "gentle" with respect to software installation.

The following shows the installation procedure for CentOS 7.

git clone https://github.com/MAYHEM-Lab/cspot.git
cd cspot
sudo ./install-centos7.sh

To install CSPOT in /usr/local,

cd build
sudo ninja install

Overview

CSPOT is an acronym (slightly reordered) for Serverless Platform of Things in C. It is an empirical coding experiment that amalgamates ``Serverless'' computing (i.e. Functions as a Service – FaaS) and distributed computing principles targeting Internet of Things (IoT) applications.

The goal of CSPOT is to explore the interplay between application programming abstractions, runtime systems and operating systems in a tiered cloud setting. The design presupposes that IoT sensors and actuators will communicate with computing elements at the ``edge'' of the network (e.g. edge clouds that can implement public cloud services near where data is gathered and actuation occurs). Edge computing, in turn, may need to employ resources at a regional level (e.g. a private cloud) or more globally (e.g. in a public cloud).

CSPOT attempts to layer a platform across these three tiers that supports a common set of low-level abstractions for programmers to use to construct applications. The programming model that CSPOT supports is one akin to ``Functions as a Service'' in which control-flow is event driven. Thus a CSPOT program consists of events triggered by functions applied to a distributed set of storage objects that are implemented as part of a single, common storage abstraction.

The initial implementation and its test applications use C as the programming language. However, functions execute within Linux containers making it possible to employ a mixed-language programming approach.

Current CSPOT Abstractions

The current implementation supports three primary programming abstractions:

* Wide-area Objects of Functions (WOOFs) – Append-only storage objects capable of persisting data in fixed-size elements
* Handlers – Functions that may be triggered by the platform when data is appended to a WOOF
* Namespaces – Collections of WOOFs that share a common name prefix

The intention is to begin with a minimal set of abstractions and API semantics and to expand as needed. Performance optimization, in particular, may drive modification and/or expansion of the API semantics.

Namespaces

Each namespace defines a flat space for storing WOOFs and handler code. Each namespace is located on some host within the system. While yet to be implemented, the intention is to implement access control between namespaces. Initially, each namespace corresponds to a top level directory on a Linux host that contains related state (WOOFs and handler code).

WOOFs

A WOOF is an append-only sequence of fixed-size memory regions (called elements) managed as a circular buffer. The size of each element in a WOOF as well as the history size (the maximal number of the most recent appends to the WOOF) are specified when the WOOF is created and cannot be changed. CSPOT does not interpret the content of each memory element. However, it does assign a unique 64-bit sequence number to each element when the element has been successfully added to a WOOF.

Note that the historical capacity of a WOOF is programmer-determined. When the circular buffer wraps, the oldest data in the WOOF is simply overwritten. However, the sequence number space for each WOOF does not wrap.
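The wrap-around behavior described above can be sketched with a small in-memory ring buffer. This is only an illustration under stated assumptions: the toy_woof type and the toy_* functions are hypothetical names invented here, not part of CSPOT, and the real WOOF is a memory-mapped file rather than a heap allocation. Sequence numbers are assumed to start at 1, which matches the ``Hello world'' output later in this document.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy in-memory stand-in for a WOOF (hypothetical, not the CSPOT structure). */
typedef struct {
    size_t element_size;   /* bytes per element, fixed at creation */
    size_t history_size;   /* number of elements retained */
    uint64_t last_seq;     /* last sequence number assigned; 0 means empty */
    unsigned char *slots;  /* history_size * element_size bytes */
} toy_woof;

/* Returns 0 on success and -1 on failure. */
int toy_create(toy_woof *w, size_t element_size, size_t history_size)
{
    assert(w != NULL);
    if (element_size == 0 || history_size == 0) return -1;
    w->slots = calloc(history_size, element_size);
    if (w->slots == NULL) return -1;
    w->element_size = element_size;
    w->history_size = history_size;
    w->last_seq = 0;
    return 0;
}

/* Appends one element and returns its sequence number.
 * The slot index wraps; the sequence number does not. */
uint64_t toy_put(toy_woof *w, const void *element)
{
    uint64_t seq;
    size_t slot;

    assert(w != NULL && element != NULL);
    seq = w->last_seq + 1;
    slot = (size_t)((seq - 1) % w->history_size);
    memcpy(w->slots + slot * w->element_size, element, w->element_size);
    w->last_seq = seq;
    return seq;
}

/* Oldest sequence number still held in the buffer (0 if empty). */
uint64_t toy_oldest(const toy_woof *w)
{
    assert(w != NULL);
    if (w->last_seq == 0) return 0;
    if (w->last_seq <= w->history_size) return 1;
    return w->last_seq - w->history_size + 1;
}

void toy_destroy(toy_woof *w)
{
    free(w->slots);
    w->slots = NULL;
}
```

With a history size of 3, five appends leave sequence numbers 3 through 5 in the buffer: the two oldest elements have been overwritten, while the sequence numbers keep increasing.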

The CSPOT API

The current WOOF C-language API consists of the following API calls

int WooFCreate(char *woof_name, unsigned long element_size, unsigned long history_size)

creates a WooF with fixed-sized elements and specified history size
woof_name is the local or fully qualified name of the WOOF to create
element_size refers to the number of bytes in a memory region
history_size refers to the number of elements (not bytes) in the WOOF history
returns < 0 on failure

unsigned long WooFPut(char *woof_name, char *handler_name, void *element)

woof_name is the local or fully qualified name of an existing WOOF
handler_name is the name of a file in the WOOF’s namespace that contains the handler code or NULL. When handler_name is NULL, no handler will be triggered after the append of the element.
element is an in parameter that points to a memory region to be appended to the WOOF
the call returns the sequence number of the element or a representation of -1 on failure

int WooFGet(char *woof_name, void *element, unsigned long long seq_no)

woof_name is the local or fully qualified name of an existing WOOF
element is an out parameter that points to a memory region to be filled with the contents of the element from the WOOF
seq_no is the sequence number, from the WOOF, of the element to be retrieved (sequence number zero is not a valid sequence number and, thus, when it is specified in a call, WooFGet() returns the element having the largest sequence number stored in the WOOF). If the sequence number is invalid (i.e. out of the range of sequence numbers in the WOOF) an error is returned.
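The sequence-number rules for WooFGet() can be illustrated with a toy lookup function. The sketch below is not the CSPOT implementation: toy_woof, toy_put(), toy_get(), HISTORY and ELEMENT_SIZE are hypothetical names, and treating an already overwritten element as out of range is an assumption about what ``out of the range of sequence numbers in the WOOF'' means.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HISTORY 4        /* toy history size (assumption) */
#define ELEMENT_SIZE 16  /* toy element size in bytes (assumption) */

/* Toy stand-in for a WOOF; zero-initialize before use. */
typedef struct {
    uint64_t last_seq;  /* 0 means nothing has been appended yet */
    unsigned char slots[HISTORY][ELEMENT_SIZE];
} toy_woof;

/* Appends one element and returns its sequence number (starting at 1). */
uint64_t toy_put(toy_woof *w, const void *element)
{
    assert(w != NULL && element != NULL);
    w->last_seq++;
    memcpy(w->slots[(w->last_seq - 1) % HISTORY], element, ELEMENT_SIZE);
    return w->last_seq;
}

/* Copies the element with the given sequence number into element.
 * seq_no == 0 selects the most recent element.
 * Returns 0 on success and -1 if the WOOF is empty or seq_no is out of range. */
int toy_get(const toy_woof *w, void *element, uint64_t seq_no)
{
    uint64_t oldest;

    assert(w != NULL && element != NULL);
    if (w->last_seq == 0) return -1;
    if (seq_no == 0) seq_no = w->last_seq;
    oldest = (w->last_seq > HISTORY) ? w->last_seq - HISTORY + 1 : 1;
    if (seq_no < oldest || seq_no > w->last_seq) return -1;
    memcpy(element, w->slots[(seq_no - 1) % HISTORY], ELEMENT_SIZE);
    return 0;
}
```

After six appends into this four-slot toy, a get with sequence number 0 returns element 6, a get of 3 succeeds, and gets of 2 (overwritten) or 7 (not yet assigned) fail.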

void WooFInit()

allows a Linux process external to CSPOT to make calls to WooFPut()
reads its parameters from environment variables that the calling process must set

This API definition is, more or less, stable. There is an internal API for implementing ``fast-path'' WOOF accesses, but it is not maintained in the current release and is definitely subject to change.

There are several features of the API that, perhaps, require some scrutiny.

First, this is the complete API (a WooFRemove() call will be included in a future release). A well-formed CSPOT program uses WOOFs as its only data structures and WooFCreate(), WooFPut(), and WooFGet() are the only operations supported for those data structures.

Secondly, only a call to WooFPut() causes a computation to be initiated. That is, CSPOT requires that program state be appended to a WOOF as a prerequisite to executing a computation. As a result, the elements stored in a program’s set of WOOFs represent the full program state in the event of failure and the program can be resumed from that state. Parsing the program state so that the program can be resumed is not currently automated.

Thirdly, handlers are concurrent and may execute out of order with respect to their invocation. Synchronization occurs when a sequence number is assigned to an element when it is appended to a WOOF. That is, a call to WooFPut() will append the element and return a sequence number as a transaction. Note that there are no primitives for synchronizing handlers beyond this transaction.

Lastly, WooFInit() is included as an optimization that allows CSPOT client applications to ``join'' a namespace. By default, each WOOF is addressed by a URN and, when the API code parses the WOOF name, a fully qualified name will generate a network request and response. As a local optimization, it is possible to address WOOFs by path name, but to do so, the process must initialize the namespace state. WooFInit() is a primitive that implements this initialization.

WOOF Names

WOOF names are either interpreted locally, with respect to the namespace of the handler that is referring to them, or fully qualified as a URI beginning with the string ``woof://''. A name must be unique within each namespace. If the prefix of the name string is ``woof://'' the remainder of the string is interpreted by the current implementation as an absolute path to the WOOF on the host where it is located. If not, it is interpreted relative to the namespace path for the referring handler.

Additionally, each namespace must contain binary files carrying the handlers that can be executed on WOOFs within the namespace. The handler names and the WOOF names must not conflict.
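The naming rule can be sketched as a small resolver. This is a minimal illustration, not the CSPOT parser: woof_name_is_qualified() and woof_name_to_path() are hypothetical helpers, and the sketch assumes, as the text above states, that everything after ``woof://'' is an absolute path with no host component.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WOOF_PREFIX "woof://"

/* Returns 1 if the name begins with the woof:// prefix, 0 otherwise. */
int woof_name_is_qualified(const char *name)
{
    assert(name != NULL);
    return strncmp(name, WOOF_PREFIX, strlen(WOOF_PREFIX)) == 0;
}

/* Resolves a WOOF name to a path on the local host.
 * A fully qualified name yields the text after the prefix; any other name
 * is taken relative to ns_path, the namespace of the referring handler.
 * Returns 0 on success and -1 if out is too small. */
int woof_name_to_path(const char *ns_path, const char *name,
                      char *out, size_t out_len)
{
    int n;

    assert(ns_path != NULL && name != NULL && out != NULL);
    if (woof_name_is_qualified(name)) {
        n = snprintf(out, out_len, "%s", name + strlen(WOOF_PREFIX));
    } else {
        n = snprintf(out, out_len, "%s/%s", ns_path, name);
    }
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

For a handler in /tmp/ns, the local name hello-woof resolves to /tmp/ns/hello-woof, while woof:///tmp/other/w1 resolves to /tmp/other/w1 regardless of the handler's namespace.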

WOOF Handlers

Each WOOF handler must have the following function signature as its top-level entry point

int HandlerName(WOOF *woof, unsigned long seq_no, void *element)

When the CSPOT runtime system invokes the handler, it will pass an opaque handle for the WOOF, the sequence number of the element that the handler is to handle, and a pointer to the element. The handler should return a value >= 0 on success and < 0 on failure. Handlers should not persist state other than by calling WooFPut() on one or more WOOFs (possibly creating them when needed).

The CSPOT Runtime

Each WOOF is implemented as a memory-mapped file within a namespace. Handlers run within a Docker container associated with the namespace that contains them. Thus, the CSPOT platform creates a container per namespace and maps all WOOFs referred to in an API call into the address space of the handler making the call. Consequently, it is necessary to start a platform component for each namespace. Currently each namespace platform must be started manually using the command

woofc-namespace-platform -N path-to-namespace

The namespace platform must be executing before any puts to the namespace take place. That is, the platform is intended to function as a long running daemon that services the namespace for all applications that access WOOFs contained within it.

The namespace platform creates an internal append-only log for the namespace that the runtime uses to trigger handlers. A threaded process running within the container monitors the tail of the namespace log. When a call to WooFPut() specifies a handler, the code will append a TRIGGER record to the log indicating that a handler must be triggered. Threads within the dispatch process claim TRIGGER records exclusively (and append their claims to the log) and, once claimed, trigger the handler specified in the record.
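The exclusive-claim step can be sketched with a C11 atomic compare-and-swap. This is an illustration only: trigger_record, claim_trigger() and next_claim() are hypothetical names, the records live in a plain array rather than in the namespace log, and the claim is marked in place instead of being appended to the log as the runtime does. Thread ids are assumed to be nonzero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy TRIGGER record (hypothetical layout, not the CSPOT log format). */
typedef struct {
    const char *handler;    /* handler named in the put */
    unsigned long seq_no;   /* sequence number of the appended element */
    atomic_int claimed_by;  /* 0 = unclaimed, otherwise the claiming thread id */
} trigger_record;

/* Tries to claim a record for thread_id (must be nonzero).
 * Returns 1 if this caller won the claim and 0 if it was already claimed. */
int claim_trigger(trigger_record *rec, int thread_id)
{
    int expected = 0;

    assert(rec != NULL && thread_id != 0);
    return atomic_compare_exchange_strong(&rec->claimed_by, &expected, thread_id);
}

/* Scans the log from *cursor and returns the first record this thread
 * manages to claim, or NULL once the tail of the log has been reached. */
trigger_record *next_claim(trigger_record *log, size_t log_len,
                           size_t *cursor, int thread_id)
{
    assert(log != NULL && cursor != NULL);
    while (*cursor < log_len) {
        trigger_record *rec = &log[(*cursor)++];
        if (claim_trigger(rec, thread_id)) return rec;
    }
    return NULL;
}
```

Because the compare-and-swap succeeds for exactly one caller per record, two dispatch threads scanning the same log never fire the same handler invocation twice.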

Each container is also run with the ``-i'' option. As a result, if a handler writes to standard out or standard error, the resulting output will appear on the tty associated with the shell that launched the platform. That is, the platform aggregates the standard out and standard error file descriptors from all handlers executing in the namespace it is managing.

Because the handler is actually executing in a separate process within a namespace container, the process must execute bootstrap code to map the WOOF and pass the sequence number to the handler. As a result, the handler code must be wrapped in a C main() routine that is part of CSPOT. This main() routine is contained in the file woofc-shepherd.c.

Additionally, it is possible to issue CSPOT API calls from outside of a namespace so that CSPOT programs can communicate with external users and programs.

A call to WooFPut() or WooFGet() that specifies a fully-qualified URN will generate a network message (using ZeroMQ, https://zeromq.org) when the call is from an application component that is external to the namespace, or when CSPOT determines that a handler is referencing a WOOF in another namespace. It is possible to use a Linux path name to reference a WOOF, but an external process must make a call to WooFInit() before doing so to initialize the runtime environment. Handlers, however, inherit the environment in which they are to execute and, thus, need not call WooFInit().

Example Applications

A CSPOT application consists of an initial Linux process that starts the application by issuing one or more calls to WooFPut(), a set of WOOFs that the application will access, and a set of handlers that the runtime triggers optionally when data is appended to a WOOF. Each handler must be wrapped by the code contained in woofc-shepherd.c so that the API can find the internal runtime system log and also map the WOOFs referred to in any API calls. The initial process must make a call to WooFInit() after setting one or more environment variables appropriately before it attempts to issue a WooFPut() call. All of the namespace platforms must be running for the WOOFs that are mentioned in the application or the application will not execute.

Build Model

The CSPOT runtime causes the namespace containers to mount the namespace top-level directory from the host as a Docker volume. Each namespace container assumes that the handler binary is compiled for the baseline distribution used by the container (currently CentOS 7) and is present in the top-level namespace directory before it is invoked.

The example applications contained in this repo build using make and copy the binaries into the namespace. This methodology works when the Linux distribution that is used to build CSPOT matches the baseline used in the containers (CentOS 7, at present). However, if the distribution that builds CSPOT differs from the container distribution, the in-container binaries should be built separately, in a container, so that the dynamically loaded libraries are compatible.

Hello World (cspot/apps/hello-world)

The ``Hello world'' application consists of a single handler which prints the string ``Hello world'' and then prints a string that the initial process has appended to the WOOF. Here is the source code for the handler hw().

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include "woofc.h"
#include "hw.h"

int hw(WOOF *wf, unsigned long seq_no, void *ptr)
{
    HW_EL *el = (HW_EL *)ptr;
    fprintf(stdout,"hello world\n");
    fprintf(stdout,"from woof %s at %lu with string: %s\n",
                    wf->shared->filename, seq_no, el->string);
    fflush(stdout);
    return(1);

}

Note that the handler’s entry point must be a C function and that all handlers take 3 arguments:

* a pointer to a WOOF structure (defined in woofc.h)
* a sequence number
* a void * pointer to an element

The size of the elements is defined when the WOOF is created. The header file hw.h defines a C structure that the application uses as the type of each element in the WOOF.

#ifndef HW_H
#define HW_H
struct obj_stc
{
    char string[255];
};
typedef struct obj_stc HW_EL;
#endif

Finally, the initial start process takes a WOOF name to use, creates the WOOF (with a history size of 5), types the element as an HW_EL, fills in a string, and calls WooFPut() with ``hw'' specified as a handler.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

#include "woofc.h"
#include "hw.h"

#define ARGS "f:N:W:"
char *Usage = "hw-start -W woof_name\n\
\t-N namespace <CWD is the default>\n";

char Fname[4096];
char Wname[4096];
char NameSpace[4096];
char Namelog_dir[4096];
int UseNameSpace;

char putbuf1[1024];
char putbuf2[1024];

int main(int argc, char **argv)
{
	int c;
	int err;
	HW_EL el;
	unsigned long long ndx;

	while((c = getopt(argc,argv,ARGS)) != EOF) {
		switch(c) {
			case 'f':
			case 'W':
				strncpy(Fname,optarg,sizeof(Fname));
				break;
			case 'N':
				UseNameSpace = 1;
				strncpy(NameSpace,optarg,sizeof(NameSpace));
				break;
			default:
				fprintf(stderr,
				"unrecognized command %c\n",(char)c);
				fprintf(stderr,"%s",Usage);
				exit(1);
		}
	}

	if(Fname[0] == 0) {
		fprintf(stderr,"must specify filename for woof\n");
		fprintf(stderr,"%s",Usage);
		fflush(stderr);
		exit(1);
	}

	if(Namelog_dir[0] != 0) {
		sprintf(putbuf2,"WOOF_NAMELOG_DIR=%s",Namelog_dir);
		putenv(putbuf2);
	}

	if(UseNameSpace == 1) {
		sprintf(Wname,"woof://%s/%s",NameSpace,Fname);
		sprintf(putbuf1,"WOOFC_DIR=%s",NameSpace);
		putenv(putbuf1);
	} else {
		strncpy(Wname,Fname,sizeof(Wname));
	}

	WooFInit(); // attach to namespace

	err = WooFCreate(Wname,sizeof(HW_EL),5); // create a WOOF
	if(err < 0) {
		fprintf(stderr,"couldn't create woof from %s\n",Wname);
		fflush(stderr);
		exit(1);
	}

	/*
	 * copy string into a structure to be stored as an element
	 * in the WOOF
	 */
	memset(el.string,0,sizeof(el.string));
	strncpy(el.string,"my first bark",sizeof(el.string));

	/*
	 * put the string in the WOOF and trigger a handler
	 */
	ndx = WooFPut(Wname,"hw",(void *)&el);

	if(WooFInvalid(ndx)) {
		fprintf(stderr,"first WooFPut failed for %s\n",Wname);
		fflush(stderr);
		exit(1);
	}

	printf("successfully appended %s to %s at seq_no %llu\n",
		"my first bark",
		Wname,
		ndx);

	return(0);
}

The code for this application is in the apps/hello-world subdirectory of the CSPOT repo.

To run ``Hello world'', first start the namespace platform for the application’s namespace. Typically, the method is to copy the CSPOT runtime into a directory to use as the namespace and then to copy the code (handlers and start program) to the namespace. The easiest way to start the platform is to cd into the namespace on the host and to run the platform without any arguments. It will use the current working directory as the namespace in this case.

mkdir test-name-space
cp cspot/build/bin/woofc* test-name-space
cp cspot/apps/hello-world/hw-start test-name-space
cp cspot/apps/hello-world/hw test-name-space
cd test-name-space
./woofc-namespace-platform

Once the platform is running, it will spawn a Docker container. Unfortunately, the interaction between pthreads, the Linux system command, and docker isn’t completely bug free in CentOS 7. Currently, woofc-namespace-platform can’t be terminated with a when running in the foreground. Alternatively, killing the process ID with ``kill -HUP'' will also trigger a clean up of the docker container. Any other form of termination may leave the container running which holds the port associated with the namespace.

Once the platform is running, run the application

./hw-start -W hello-woof

So, for example, if CSPOT were installed in /home/centos/cspot, the commands would be

cd /home/centos
mkdir /home/centos/test-name-space
cp /home/centos/cspot/build/bin/woofc* test-name-space
cp /home/centos/cspot/build/bin/hello-world/hw test-name-space
cp /home/centos/cspot/build/bin/hello-world/hw-start test-name-space
cp /home/centos/cspot/build/bin/hello-world/hw-client test-name-space
cd /home/centos/test-name-space
./woofc-namespace-platform >& namespace.log &
./hw-start -W hello-woof

Because the start program creates the WOOF ``hello-woof'' in this example, the WOOF name is specified as a path. If successful, in this example, the start program should have printed

successfully appended my first bark to hello-woof at seq_no 1

and the file namespace.log should contain

hello world
at 1 with string: my first bark

Because the handler prints to stdout, the output of the handler will be sent to the controlling tty of the shell that is running the platform.

To continue appending to ``hello-woof'' without recreating the WOOF each time, a client program (contained in cspot/apps/hello-world/hw-client) simply calls WooFPut() on the same WOOF.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

#include "woofc.h"
#include "hw.h"

#define ARGS "f:N:W:"
char *Usage = "hw-client -W woof_name\n\
\t-N namespace <CWD is the default>\n";

char Fname[4096];
char Wname[4096];
char NameSpace[4096];
char Namelog_dir[4096];
int UseNameSpace;

char putbuf1[1024];
char putbuf2[1024];

int main(int argc, char **argv)
{
	int c;
	int err;
	HW_EL el;
	unsigned long long ndx;

	while((c = getopt(argc,argv,ARGS)) != EOF) {
		switch(c) {
			case 'f':
			case 'W':
				strncpy(Fname,optarg,sizeof(Fname));
				break;
			case 'N':
				UseNameSpace = 1;
				strncpy(NameSpace,optarg,sizeof(NameSpace));
				break;
			default:
				fprintf(stderr,
				"unrecognized command %c\n",(char)c);
				fprintf(stderr,"%s",Usage);
				exit(1);
		}
	}

	if(Fname[0] == 0) {
		fprintf(stderr,"must specify filename for woof\n");
		fprintf(stderr,"%s",Usage);
		fflush(stderr);
		exit(1);
	}

	if(Namelog_dir[0] != 0) {
		sprintf(putbuf2,"WOOF_NAMELOG_DIR=%s",Namelog_dir);
		putenv(putbuf2);
	}

	if(UseNameSpace == 1) {
		sprintf(Wname,"woof://%s/%s",NameSpace,Fname);
		sprintf(putbuf1,"WOOFC_DIR=%s",NameSpace);
		putenv(putbuf1);
	} else {
		strncpy(Wname,Fname,sizeof(Wname));
	}

	/*
	 * copy string into a structure to be stored as an element
	 * in the WOOF
	 */
	memset(el.string,0,sizeof(el.string));
	strncpy(el.string,"my second bark",sizeof(el.string));

	/*
	 * put the string in the WOOF and trigger a handler
	 */
	ndx = WooFPut(Wname,"hw",(void *)&el);

	if(WooFInvalid(ndx)) {
		fprintf(stderr,"first WooFPut failed for %s\n",Wname);
		fflush(stderr);
		exit(1);
	}

	printf("successfully appended %s to %s at seq_no %llu\n",
		"my second bark",
		Wname,
		ndx);

	return(0);
}

If the client is running in the same namespace, it can refer to the WOOF by a path name. Otherwise, as in the following example, the client uses a fully-qualified WOOF name.

./hw-client -W woof://127.0.0.1/home/centos/test-name-space/hello-woof

Note that if the client had been located on another machine, the IP address or DNS name of the machine hosting the namespace would be substituted for the local IP address ``127.0.0.1'' in this example.

Runs Test (cspot/apps/runs-test)

The Runs test application is intended to simulate an IoT processing pipeline. A producing handler (``RHandler'' in the application) generates a stream of pseudo-random numbers. The next stage of the pipeline (``SHandler'') processes the stream in batches of ``sample size'' (specified as the ``-s'' parameter) and computes the Runs test statistic for each sample. It then puts each statistic in a WOOF for the final stage of the pipeline (``KHandler''), which runs a KS-test for the set of statistics against a z-transformed, empirically generated Normal distribution of the same size. The number of such samples it considers is specified by the ``-c'' parameter to the start program.
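As a rough illustration of what a per-sample Runs test computation involves, the sketch below counts runs above and below a threshold and computes the expected number of runs and its variance, following the Wald-Wolfowitz formulation. That this is the variant used in apps/runs-test is an assumption; runs_stat and runs_test() are hypothetical names, and the threshold is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Result of a runs count over one sample (hypothetical type). */
typedef struct {
    size_t runs;      /* observed number of runs */
    double expected;  /* expected number of runs for a random sequence */
    double variance;  /* variance of the number of runs */
} runs_stat;

/* Counts runs of values at or above the threshold versus below it.
 * Returns 0 on success and -1 if all values fall on one side, in which
 * case the statistic is undefined. */
int runs_test(const double *x, size_t n, double threshold, runs_stat *out)
{
    size_t n_above = 0, n_below = 0, runs = 0, i;
    double a, b, total;

    assert(x != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        int above = x[i] >= threshold;
        if (above) n_above++; else n_below++;
        /* a new run starts at the first value and at every side change */
        if (i == 0 || above != (x[i - 1] >= threshold)) runs++;
    }
    if (n_above == 0 || n_below == 0) return -1;

    a = (double)n_above;
    b = (double)n_below;
    total = a + b;
    out->runs = runs;
    out->expected = 2.0 * a * b / total + 1.0;
    out->variance = 2.0 * a * b * (2.0 * a * b - a - b)
                    / (total * total * (total - 1.0));
    return 0;
}
```

The z-score for a sample is then (runs - expected) divided by the square root of the variance; the square root is left to the caller so that the sketch depends on nothing beyond the standard headers shown.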

The apps/runs-test subdirectory contains several versions of this program

c-runstest.c: sequential C implementation
c-runstat.c: C implementation using pthreads and shared memory in an event-driven style
cspot-runstat: CSPOT implementation of c-runstat running in a single namespace
cspot-runstat-fast: CSPOT implementation that does not run ``RHandler'' in a container
cspot-runstat-multi-ns: CSPOT implementation of c-runstat that runs handlers in separate namespaces

On-going and Future Work

There is a lot left to do.

On Puts, Gets, Appends, and Reads

The minimalist initial API uses WooFPut() as the primary API abstraction for moving state between application components. This emphasis is intended to promote the use of append-only semantics in a FaaS context. For IoT, doing so will (may) make it possible to program distributed IoT applications in a FaaS style.

However, it introduces an asymmetry between writing and reading program state that may make application programming more difficult. Specifically, all reads must be namespace local (requiring a WooFOpen() to obtain an internal WOOF handle). Logically, no asymmetry is mandated. Thus it will be important to understand whether building it into the API is useful or confusing.

The API design also influences the performance of the system. In particular, mapping a WOOF into the memory space of a process running in a container is a performance-expensive operation under the current implementation supported by Linux. Thus, it is useful, as a programmer-controlled optimization, to allow the mapping to be reused. Because WooFPut() takes a WOOF name, it must first map the WOOF, then do the put, and then unmap the WOOF (there are optimization possibilities here, to be sure). To make multiple puts to the same WOOF more efficient, the API currently includes WooFAppend(), which takes a handle returned from WooFOpen() (in the same way WooFRead() does) to a WOOF in the local namespace. Indeed, WooFPut() uses WooFAppend() internally. Its implementation looks something like

unsigned long WooFPut(char *woof_name, char *handler_name, void *element)
{
   WOOF *woof;
   unsigned long seq_no;

   if(woof_name is a local WOOF) {
      woof = WooFOpen(woof_name);      /* map the WOOF */
      seq_no = WooFAppend(woof, handler_name, element);
      WooFFree(woof);                  /* unmap the WOOF */
   } else {
      seq_no = send a put request to the put proxy for the WOOF's namespace
   }
   return(seq_no);
}

I/O

I/O creates another related question that the project must investigate. In particular, it is possible for a process outside of a namespace to make a call to WooFPut() to introduce data but, without an analogous WooFGet() call, there is no way to get data back out of a namespace. Thus the put/get API that, ultimately, is part of the prototype is richer than the minimalist API:

unsigned long WooFPut(char *woof_name, char *handler_name, void *element)
- woof_name is the local or fully qualified name of an existing WOOF
- handler_name is the name of a file in the WOOF’s namespace that contains the handler code or NULL. When handler_name is NULL, no handler will be triggered after the append of the element.
- element is an in parameter that points to a memory region to be appended to the WOOF
- the call returns the sequence number of the element or a representation of -1 on failure
- can be called either from within a handler or from a process outside of a namespace

int WooFGet(char *woof_name, void *element, unsigned long seq_no)
- woof_name is the local or fully qualified name of an existing WOOF
- element is an out parameter pointing to memory that will be filled in by the specified WOOF element
- seq_no is the sequence number of the element to be returned through the element pointer
- returns < 0 if the call fails to successfully return the element
- WOOF can either be in the local namespace or a remote namespace

WOOF *WooFOpen(char *woof_name)
- woof_name is the local or fully qualified name of an existing WOOF
- returns an opaque handle to an in-memory data structure referring to the WOOF or NULL on failure
- if the WOOF is not in the local namespace, the call fails

int WooFAppend(WOOF *woof, char *handler_name, void *element)
- woof is an opaque handle returned from a call to WooFOpen()
- handler_name is the name of a file in the WOOF’s namespace that contains the handler code or NULL. When handler_name is NULL, no handler will be triggered after the append of the element.
- element is an in parameter that points to a memory region to be appended to the WOOF
- the call returns the sequence number of the element or a representation of -1 on failure
- the WOOF must be in the local namespace

int WooFRead(WOOF *woof, void *element, unsigned long seq_no)
- woof is an opaque handle returned from a call to WooFOpen()
- element is an out parameter pointing to memory that will be filled in by the specified WOOF element
- seq_no is the sequence number of the element to be returned through the element pointer
- returns < 0 if the call fails to successfully return the element

void WooFFree(WOOF *woof)
- releases the in-memory data structure created by a call to WooFOpen()

There are two possibilities for the API, long-term. The first is that WooFPut() and WooFGet() are symmetric, meaning that they can both be called from within a handler or outside of a namespace. From an API design perspective, this option is attractive but it promotes the use of WOOFs as random access memories from a read perspective. The second option is that WooFGet() (which turns out to be necessary in some form – see below) is restricted to be executed only outside of a handler.

The current CSPOT implementation does not restrict WooFGet() – it is symmetric with respect to WooFPut(). However, the applications will not use it to implement cross-namespace random access memory in an attempt to determine if it should be restricted.

WooFGet() turns out to be necessary in order to get application state out of the application. That is, without WooFGet() the final output of an application must reside inside a namespace (as a file – not a WOOF). To get access to this state, then, the application user must have read access to the Linux directory which implements the namespace on the machine where the output is stored. Thus, it is necessary to implement an API primitive to extract application state from the various namespaces it uses (which is WooFGet() in the current API). As mentioned above, there is a question regarding whether WooFGet() should be a full-fledged CSPOT API call (symmetric with respect to WooFPut()) or not.

To Delete or Not to Delete – a Question of Access Controls

One glaring omission from the current API is a lack of a way to destroy an existing WOOF. That’s not strictly true in the sense that WooFCreate() resets an existing WOOF if it already exists, thereby overwriting its original contents. However, there is currently no way to remove a WOOF permanently from a namespace.

Because WOOFs can grow and shrink (by being ``recreated'' with different sizes) the argument for a destroy API call is one regarding WOOF name conflicts within a namespace. That is, one wishes to remove a WOOF from the namespace because the name conflicts with another name. However, allowing the name to be reused by a subsequent call to WooFCreate() simply delays the conflict resolution until the create. That is, removing a name really only needs to happen when another create wants to use the name.

This delayed binding of name conflict resolution is possible as long as the access control permissions are not associated with the WOOF name. If they are, then a WooFCreate() cannot resolve a name conflict since the caller may not have permission to ``take over'' the name (and thereby delete the WOOF’s contents).

It is possible to use something similar to user-group-world but then the namespace cannot be flat. That is, each user would need to be able to carve out a subtree within the namespace.

Another possibility is that namespaces carry access controls, but all WOOFs within a namespace are viewed to be part of the same trust domain. From the perspective of using messaging as an authentication mechanism (e.g. CURVE in ZeroMQ), this option makes the most sense, but it then creates the possibility of a proliferation of namespaces.

The project must resolve this issue when determining the security model. At present, there are no authentication mechanisms or access controls implemented.

cspot's People

Contributors

Stargazers

Watchers

Forkers

fatihbakir tubbz-alt lukasbrand tekaireb cspot-pipeline

cspot's Issues

CSPOT is currently only single host

At the moment, the code does not parse WOOF URIs to determine the host. That is, regardless of whether there is a host and user specification in the URI, the code will send messages to localhost.

[REFACTOR] Separate message format from transport layer

it should be possible to create a message independently of the transport used to send the messages
this will enable things like constructing WooFPuts and WooFGets in other languages i.e. python without dependency on CSPOT.

fix is_local in WooFCreate()

WooFCreate() needs to test to see if a namespace is local. This test currently does not check to see if the URI carries a host and, if so, compare the host IP with the local IP. Fix this.

[REFACTOR] Switch build system to CMake

zmsg_destroy() not found in compilation

For some reason, calls to zmsg_destroy() (which is documented as being part of the CZMQ API) do not resolve when a code that includes them is compiled. It is possible that CZMQ doesn't include the call any longer but one suspects it is necessary to prevent internal memory leaks. Need to figure out exactly what is happening here.

Fix shutdown

SIGINT is being intercepted in the platform so shutting down is an arduous process.

[FEATURE] handlers as shared libraries

Handlers will be loaded and dynamically linked against a woof shepherd process.
There will still be one shepherd process forked per handler, with each shepherd process responsible for linking a different handler.
This work may also help with the eventual goal of keeping handlers running.

[BUG] investigate SELinux policy violation bug and fix it

add message timeouts

The CZMQ messaging uses the REQ and REP paradigms at the client and server respectively. This messaging pattern does not include the possibility of a timeout. It should be changed to use asynchronous versions and message timeouts.

message retry and WooFPut() cross-namespace semantics

ZeroMQ recommends tearing down and re-establishing a socket connection that has timed out. The current WooFPut() semantics are that the caller gets back the sequence number of the put or an error.

With a proxy architecture for cross-namespace puts, these semantics make message retry in the event of a network partition difficult because the client can't know whether the put was lost or the reply was lost.

One way to fix this issue, so that the client can "know" whether the put "made it", is to implement a cross-namespace get that returns sequence numbers and a separate one that returns the elements associated with specific sequence numbers. With these two calls a client could then use the following logic:

 first_highest = get highest sequence number
 seq_no = put(element)
 if(seq_no == ERROR) {
       this_highest = get highest sequence number
       if(this_highest == first_highest) {
             // the put didn't make it
       } else {  // the reply may have been lost
             fetch the elements for the sequence numbers between first_highest and this_highest
             loop through elements to see if value is there
       }
 }
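
The logic above can be sketched in C against a toy in-memory log. Everything here is hypothetical: the `sim_*` functions stand in for the two proposed cross-namespace calls (which CSPOT does not provide yet), and the `drop_request` / `drop_reply` flags only simulate the two failure cases.

```c
#include <stddef.h>
#include <string.h>

#define SIM_CAP   64
#define SIM_ELSZ  32
#define SIM_ERROR ((unsigned long)-1)

/* Toy in-memory log standing in for a remote WOOF (illustrative only). */
struct sim_woof {
    char elements[SIM_CAP][SIM_ELSZ];
    unsigned long latest;  /* highest sequence number assigned so far */
    int drop_request;      /* fault injection: the put never arrives */
    int drop_reply;        /* fault injection: the put lands, the reply is lost */
};

static unsigned long sim_latest_seqno(const struct sim_woof *w)
{
    return w->latest;
}

static unsigned long sim_put(struct sim_woof *w, const char *el)
{
    if (w->drop_request || w->latest + 1 >= SIM_CAP) {
        return SIM_ERROR;
    }
    w->latest++;
    strncpy(w->elements[w->latest], el, SIM_ELSZ - 1);
    w->elements[w->latest][SIM_ELSZ - 1] = '\0';
    return w->drop_reply ? SIM_ERROR : w->latest;
}

static const char *sim_get(const struct sim_woof *w, unsigned long seq)
{
    return (seq >= 1 && seq <= w->latest) ? w->elements[seq] : NULL;
}

/*
 * Put `el` and, on error, work out whether the put actually landed.
 * Returns the element's sequence number, or SIM_ERROR if no matching
 * element was appended (so a retry would not duplicate it).
 */
unsigned long put_checked(struct sim_woof *w, const char *el)
{
    unsigned long before = sim_latest_seqno(w);
    unsigned long seq = sim_put(w, el);
    unsigned long after;
    unsigned long s;

    if (seq != SIM_ERROR) {
        return seq;
    }
    after = sim_latest_seqno(w);
    if (after == before) {
        return SIM_ERROR;              /* the put didn't make it */
    }
    for (s = before + 1; s <= after; s++) {
        const char *got = sim_get(w, s);
        if (got != NULL && strcmp(got, el) == 0) {
            return s;                  /* the reply was lost, the put landed */
        }
    }
    return SIM_ERROR;                  /* only other writers advanced the log */
}
```

Matching by value cannot tell the caller's element apart from an identical element written by another client in the same window, so a real protocol would probably need a per-put identifier.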

For now, we'll assume that ZeroMQ is being diligent about making socket connections but we'll need to fix this at some point.

[FEATURE] Cache open WooF's and recently invoked handlers

Let handlers be loaded from a shared library

Right now, we need a separate executable for each handler, which isn't strictly necessary for CSPOT's operation. Being an executable isn't even part of the API; the writers of handlers are only concerned with their functions. This is the raison d'être of shared libraries.

Loading the handlers from a shared library would allow for having multiple handlers within a single binary, which would be helpful for deployment.
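
A minimal sketch of what loading a handler could look like with the POSIX `dlopen()` / `dlsym()` interface. The `handler_fn` signature and the `load_handler` name are assumptions made for illustration, not the CSPOT API.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed handler signature for this sketch; the real CSPOT type may differ. */
typedef int (*handler_fn)(void *woof, unsigned long seq_no, void *element);

/*
 * Open the shared library at lib_path and look up handler_name in it.
 * On success, returns the function and stores the library handle in
 * *lib_out (pass it to dlclose() when the handler is no longer needed).
 * On failure, returns NULL and leaves *lib_out untouched.
 */
handler_fn load_handler(const char *lib_path, const char *handler_name,
                        void **lib_out)
{
    void *lib = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    void *sym;

    if (lib == NULL) {
        return NULL;            /* dlerror() describes the failure */
    }
    sym = dlsym(lib, handler_name);
    if (sym == NULL) {
        dlclose(lib);
        return NULL;
    }
    *lib_out = lib;
    return (handler_fn)sym;     /* POSIX allows this object-to-function cast */
}
```

Because the lookup is by symbol name, one library can export several handlers, which matches the deployment benefit described above. Older toolchains may need `-ldl` at link time.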

Add event logging

At present, only TRIGGER requests in each namespace are logged. As a result, it is possible to produce a causal order of handlers but not of all WOOF updates, since no log entry is generated when a put with a NULL handler is executed.

Also, cross-namespace logging needs to be enabled by carrying the host-log sequence number in the cross-namespace messages.

[WIKI] hello-world wiki outdated

Hello,

For the hello-world app, it's also necessary to include a -W argument. I've updated the workflow with the argument: https://github.com/berendeanicolae/cspot-wiki/blob/hw-wiki/Home.md
Unfortunately, the wiki cannot be updated via pull requests.

Thanks,
Nicolae

Handler runtime is single threaded

At the moment, each namespace is served by a single process in a single container. Cross-namespace messages use threads, but the main runtime is single threaded.

To fix this, the runtime must log the acceptance of a handler in a critical section so that multiple threads do not fire the same handler for a single post.
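
The "exactly one thread fires the handler" property can be illustrated with a small sketch. Note the substitution: instead of a mutex-protected critical section around the log write, this uses a C11 atomic exchange on an in-memory flag, which gives the same single-winner guarantee for the claim itself. The array, its size, and `try_accept_event` are hypothetical; the real event log is persistent and would presumably need the log append itself to be protected.

```c
#include <stdatomic.h>

#define MAX_EVENTS 128

/*
 * One flag per logged TRIGGER event, set once a thread has accepted it.
 * Static storage is zero-initialized, so nothing is accepted at start.
 */
static atomic_int accepted[MAX_EVENTS];

/*
 * Try to claim the event with the given sequence number.
 * Returns 1 for exactly one caller per event, 0 for every later caller,
 * and -1 if event_seq is out of range for this toy table.
 */
int try_accept_event(unsigned long event_seq)
{
    if (event_seq >= MAX_EVENTS) {
        return -1;
    }
    /* atomic_exchange returns the previous value; only the first caller sees 0 */
    return atomic_exchange(&accepted[event_seq], 1) == 0 ? 1 : 0;
}
```

A worker thread would fire the handler only when `try_accept_event()` returns 1 and skip the event otherwise.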

client side caching of element size

When a woof is remote, the protocol uses two messages to access it (put or get). The first message gets the element size and the second performs the operation.

Currently there is a cache (based on woof name) to cache this value. The assumption is that it is used in clients that do not persist across subsequent calls to WooFCreate() that could change the element size.

One way to address this issue is to put in some kind of init function for remote access.
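
A sketch of what such a cache with an explicit init function might look like. All names (`el_size_cache_init`, `el_size_cache_lookup`, `el_size_cache_store`) and the fixed table size are hypothetical, and the sketch is not thread safe.

```c
#include <string.h>

#define CACHE_SLOTS   16
#define WOOF_NAME_LEN 256

struct el_size_entry {
    char name[WOOF_NAME_LEN];
    unsigned long el_size;
    int valid;
};

static struct el_size_entry cache[CACHE_SLOTS];

/* Hypothetical init hook for remote access: forget every cached size. */
void el_size_cache_init(void)
{
    memset(cache, 0, sizeof(cache));
}

/* Returns 1 and sets *el_size on a hit, 0 on a miss. */
int el_size_cache_lookup(const char *woof_name, unsigned long *el_size)
{
    int i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && strcmp(cache[i].name, woof_name) == 0) {
            *el_size = cache[i].el_size;
            return 1;
        }
    }
    return 0;
}

/* Remember el_size for woof_name, replacing any earlier value for that name. */
void el_size_cache_store(const char *woof_name, unsigned long el_size)
{
    int slot = -1;
    int i;

    if (strlen(woof_name) >= WOOF_NAME_LEN) {
        return;                        /* name too long to cache */
    }
    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && strcmp(cache[i].name, woof_name) == 0) {
            slot = i;                  /* existing entry wins */
            break;
        }
        if (!cache[i].valid && slot < 0) {
            slot = i;                  /* remember the first free slot */
        }
    }
    if (slot < 0) {
        slot = 0;                      /* table full: evict slot 0 */
    }
    strcpy(cache[slot].name, woof_name);
    cache[slot].el_size = el_size;
    cache[slot].valid = 1;
}
```

A long-lived client would call `el_size_cache_init()` whenever it has reason to believe a remote WooFCreate() may have changed element sizes, at the cost of one extra size request per woof afterwards.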

CSPOT remote WooFPut segfault when target woof does not exist

When running

idx = woof.put("woof://localhost:53933/%s/%s" % (funcname, constants.WOOF_NAME), constants.HANDLER_NAME, elstr)

against a target woof that has not been created, the client segfaults. Once the target woof has been created, the application no longer segfaults.

the backtrace from gdb is

#0  0x00007ffff6a7e510 in __memcpy_ssse3 () from /lib64/libc.so.6
#1  0x00007ffff7613a11 in zframe_new (data=0xa1f2a0, size=4294967295) at src/zframe.c:62
#2  0x000000000040bc5e in WooFMsgPut (woof_name=0x7fffe8c91270 "woof://localhost:53933/add/lambda_woof", hand_name=0x7fffee1a62e0 "awspy_lambda", element=0xa1f2a0, el_size=4294967295) at woofc-access.c:2229
#3  0x00000000004069f9 in WooFPut (wf_name=0x7fffe8c91270 "woof://localhost:53933/add/lambda_woof", hand_name=0x7fffee1a62e0 "awspy_lambda", element=0xa1f2a0) at woofc.c:685
#4  0x0000000000405458 in woof_put (self=<optimized out>, args=<optimized out>) at pywoof.c:53
#5  0x00007ffff71bdb68 in _PyCFunction_FastCallDict () from /lib64/libpython3.6m.so.1.0
#6  0x00007ffff72334f3 in call_function () from /lib64/libpython3.6m.so.1.0
(the rest of the trace continues into the depths of the python interpreter so I have omitted it)

The console output from the client (with #define DEBUG) is

func name:  add
request args:  {}
WooFCreate: opened /result-b0aa8cfc-a4d1-4cb7-8aad-0badebfdc039 with inode 1119515
string for woof: '{"function": "add", "result_woof": "result-b0aa8cfc-a4d1-4cb7-8aad-0badebfdc039", "context": null}{"a": 14,"b": 22}'
WooFPut: called woof://localhost:53933/add/lambda_woof awspy_lambda
WE SUSPECT THAT IT IS A REMOTE PUT
WooFMsgGetElSize: woof: woof://localhost:53933/add/lambda_woof trying enpoint >tcp://127.0.0.1:53933
WooFMsgGetElSize: woof: woof://localhost:53933/add/lambda_woof got new msg
WooFMsgGetElSize: woof: woof://localhost:53933/add/lambda_woof got WOOF_MSG_GET_EL_SIZE command frame frame
WooFMsgGetElSize: woof: woof://localhost:53933/add/lambda_woof got woof_name namespace frame
WooFMsgGetElSize: woof: woof://localhost:53933/add/lambda_woof added woof_name namespace to frame
WooFMsgGetElSize: woof: woof://localhost:53933/add/lambda_woof sending message to server at >tcp://127.0.0.1:53933
WooFMsgGetElSize: woof: woof://localhost:53933/add/lambda_woof recvd size: 4294967295 message from server at >tcp://127.0.0.1:53933
WooFMsgPut: woof: woof://localhost:53933/add/lambda_woof trying enpoint >tcp://127.0.0.1:53933
WooFMsgPut: woof: woof://localhost:53933/add/lambda_woof got new msg
WooFMsgPut: woof: woof://localhost:53933/add/lambda_woof got WOOF_MSG_PUT command frame frame
WooFMsgPut: woof: woof://localhost:53933/add/lambda_woof got woof_name namespace frame
WooFMsgPut: woof: woof://localhost:53933/add/lambda_woof added woof_name namespace to frame
WooFMsgPut: woof: woof://localhost:53933/add/lambda_woof got frame for handler name awspy_lambda
WooFMsgPut: woof: woof://localhost:53933/add/lambda_woof appended frame for handler name awspy_lambda
WooFMsgPut: woof: woof://localhost:53933/add/lambda_woof for new frame for 0x21baf70, size 4294967295
[1]    6481 segmentation fault  ./awsapi_client

and the output from woofc-namespace-platform is

starting platform listening to port 53933
woofc-container: started message server
WooFOpen: couldn't open woof: /add/lambda_woof
WooFProcessGetElSize: couldn't open lambda_woof (woof://localhost:53933/add/lambda_woof
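
The logs suggest a likely cause: the server could not open the woof, the client received an element size of 4294967295 (the value of -1 stored in a 32-bit unsigned integer), and that size was then passed on to zframe_new(). A possible guard, as a sketch only, is to validate the size before building the put message. The function name and the upper bound below are assumptions, not CSPOT constants.

```c
#include <stdint.h>

/* Assumed sanity limit for this sketch; CSPOT defines no such constant here. */
#define MAX_SANE_EL_SIZE (16UL * 1024UL * 1024UL)

/* Returns 1 if el_size looks like a usable element size, 0 otherwise. */
int el_size_is_valid(unsigned long el_size)
{
    /* -1 in either 32-bit or native unsigned long form is an error reply */
    if (el_size == (unsigned long)UINT32_MAX || el_size == (unsigned long)-1) {
        return 0;
    }
    if (el_size == 0 || el_size > MAX_SANE_EL_SIZE) {
        return 0;
    }
    return 1;
}
```

With a check like this, the remote put path could return an error to the caller instead of attempting to copy roughly 4 GiB from the element buffer.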

[REFACTOR] Switch compiler to C++ compiler

need a handler throttle

woofc-container uses multiple threads to service the event log and (now) forks and execs each handler. If one handler does multiple puts, it is possible to create a backlog of processes within the container. We probably need a throttle that will restrict new process spawns when the backlog reaches a specified maximum. The easiest way to do this would be to add a global semaphore to woofc-container that the WooFForker() threads use to determine when it is safe to call fork().
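
A minimal sketch of such a throttle using a POSIX unnamed semaphore. The `throttle_*` names are hypothetical, and where exactly the calls would sit inside woofc-container is an assumption.

```c
#include <errno.h>
#include <semaphore.h>

/* Global count of free handler slots (hypothetical name). */
static sem_t fork_slots;

/* Allow at most max_backlog handler processes at a time. Returns 0 on success. */
int throttle_init(unsigned int max_backlog)
{
    return sem_init(&fork_slots, 0, max_backlog);
}

/* Block until a slot is free; a forker thread would call this before fork(). */
int throttle_acquire(void)
{
    int rc;

    do {
        rc = sem_wait(&fork_slots);
    } while (rc == -1 && errno == EINTR);  /* retry if interrupted by a signal */
    return rc;
}

/* Non-blocking variant: returns 1 if a slot was taken, 0 if none was available. */
int throttle_try_acquire(void)
{
    return sem_trywait(&fork_slots) == 0 ? 1 : 0;
}

/* Give the slot back; call once the handler process has been reaped. */
int throttle_release(void)
{
    return sem_post(&fork_slots);
}
```

Each forker thread would pair one `throttle_acquire()` before `fork()` with one `throttle_release()` after the child exits. Some toolchains need `-pthread` to link the semaphore functions.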

caching open woofs

It looks like the expensive operation is a call to WooFOpen()/WooFFree(). Caching open woofs, however, is problematic because the woof can change while it is cached. Only WooFCreate() can rewrite a woof, and it keeps the woof state (e.g. element size and history size) in its persistent state. Reopening the woof on each use ensures that the current state is used in whatever process or container the woof is being accessed, whereas with a cached woof a rewrite causes inconsistency.

A possible fix is to delete the woof in the create if it exists. The cachers would then cache the creation time and check whether it has changed before using a woof. Calling stat() should be faster than calling WooFOpen()/WooFClose().
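
A sketch of the stat()-based check. Since POSIX `struct stat` has no creation-time field, this version uses the inode number as a stand-in: deleting and recreating the backing file normally yields a different inode. That is an assumption with a known weakness (inode numbers can be reused), so a real implementation might prefer a birth time where the platform exposes one. The `woof_stamp_*` names are hypothetical.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* What a cacher would remember about a woof's backing file (hypothetical). */
struct woof_stamp {
    dev_t dev;
    ino_t ino;
};

/* Record the current identity of `path`. Returns 0 on success, -1 if stat() fails. */
int woof_stamp_take(const char *path, struct woof_stamp *out)
{
    struct stat sb;

    if (stat(path, &sb) != 0) {
        return -1;
    }
    out->dev = sb.st_dev;
    out->ino = sb.st_ino;
    return 0;
}

/*
 * Returns 1 if `path` still refers to the file recorded in `cached`,
 * 0 if it has been removed or replaced by a different file.
 */
int woof_stamp_still_valid(const char *path, const struct woof_stamp *cached)
{
    struct woof_stamp now;

    if (woof_stamp_take(path, &now) != 0) {
        return 0;
    }
    return now.dev == cached->dev && now.ino == cached->ino;
}
```

A cacher would take a stamp when it opens a woof and reopen the woof whenever `woof_stamp_still_valid()` returns 0.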

enabling non CSPOT clients

Currently, a process must have a local namespace in order to make any WOOFC calls. However, if a client never uses an implied-local WOOF (i.e. always specifies a target namespace explicitly), then it need not have a local namespace platform active.

Public API should support old compilers

We have a stable C ABI for handlers and external programs. Right now, our exported CMake config tries to add the C++17 flag to the handlers, which is not needed.
