
nuclio — "Serverless" for Real-Time Events and Data Processing

nuclio is a new serverless project, derived from iguazio's elastic data life-cycle management service for high-performance events and data processing. nuclio is being extended to support a large variety of event and data sources. You can use nuclio as a standalone binary (for example, for IoT devices), package it within a Docker container, or integrate it with a container orchestrator like Kubernetes.

nuclio is extremely fast. A single function instance can process hundreds of thousands of HTTP requests or data records per second. This is 10–100 times faster than some other frameworks. See Architecture Details to learn how it works.

Note: nuclio is still under development, and is not recommended for production use.

Why Another "serverless" Project?

We considered existing cloud and open-source serverless solutions, but none addressed our needs:

  • Real-time processing with minimal CPU and I/O overhead and maximum parallelism

  • Native integration with a large variety of data and event sources, and processing models

  • Abstraction of data resources from the function code, to support code portability, simplicity, and data-path acceleration

  • Simple debugging, regression testing, and multi-versioned CI/CD pipelines

  • Portability across low-power devices, laptops, on-prem clusters, and public clouds

We designed nuclio to be extendable, using a modular and layered approach. We hope many will join us in developing new modules and integrations with more event and data sources, developer tools, and cloud platforms.

nuclio High-Level Architecture

[architecture diagram]

Function Processors
A processor listens on one or more event sources (for example, HTTP, Message Queue, Stream), and executes user functions with one or more parallel workers. The workers use language-specific runtimes to execute the function (via native calls, SHMEM, or shell). Processors use abstract interfaces to integrate with platform facilities for logging, monitoring, and configuration, allowing for greater portability and extensibility (such as logging to a screen, file, or log stream).
Controller
A controller accepts function and event-source specifications, invokes builders and processors through an orchestration platform (such as Kubernetes), and manages function elasticity, life cycle, and versions.
Event Sources
Functions can be invoked through a variety of event sources (such as HTTP, RabbitMQ, Kafka, Kinesis, DynamoDB, iguazio v3io, or schedule), which are defined in the function specification.
Event sources are divided into several event classes (req/rep, async, stream, polling), which define the sources' behavior.
Different event sources can plug seamlessly into the same function without sacrificing performance, allowing for portability, code reuse, and flexibility.
Data Bindings
Data-binding rules allow users to specify persistent input/output data resources to be used by the function. (Data connections are preserved between executions.) Bound data can be in the form of files, objects, records, messages, etc.
The function specification may include an array of data-binding rules, each specifying the data resource and its credentials and usage parameters.
Data-binding abstraction allows using the same function with different data sources of the same type, and enables function portability.
Builder
A builder receives raw code and optional build instructions and dependencies, and generates the function artifact — a binary file or a Docker container image, which the builder can also push to a specified image repository.
The builder can run in the context of the CLI or as a separate service for automated development pipelines.
Dealer
A dealer is used with streaming and batch jobs to distribute a set of tasks or data partitions/shards among the available function instances, and to guarantee that all tasks are completed successfully. For example, if a function reads from a message stream with 20 partitions, the dealer guarantees that the partitions are distributed evenly across workers, taking into account the number of function instances and failures (see the sketch following this overview).
nuclio SDK
The nuclio SDK is used by function developers to write, test, and submit their code, without the need for the entire nuclio source tree.
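
To make the dealer's even-distribution guarantee more concrete, the following standalone Go sketch (illustrative only, not nuclio code) shows a simple round-robin assignment of stream partitions to function instances:

package main

import "fmt"

// assignPartitions distributes partition IDs round-robin across instances,
// so no instance owns more than one partition more than any other.
func assignPartitions(numPartitions, numInstances int) [][]int {
    owned := make([][]int, numInstances)
    for p := 0; p < numPartitions; p++ {
        owned[p%numInstances] = append(owned[p%numInstances], p)
    }
    return owned
}

func main() {
    // 20 stream partitions across 4 function instances -> 5 partitions each
    for instance, partitions := range assignPartitions(20, 4) {
        fmt.Printf("instance %d: partitions %v\n", instance, partitions)
    }
}

If an instance fails, re-running the assignment with the reduced instance count redistributes its partitions among the remaining workers.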

For more information about the nuclio architecture, see Architecture.

Getting-Started Example

Following is a basic step-by-step example of using the nuclio Go (golang) SDK. For more advanced examples, see the SDK examples directory.

Get the nuclio Artifacts

Download the nuclio SDK

Download the nuclio Go SDK (nuclio-sdk) by running the following command:

go get -d github.com/nuclio/nuclio-sdk

Download or Build the nuclio CLI

Build the CLI (nuctl) from the source and add it to your path by running the following commands:

go get github.com/nuclio/nuclio/cmd/nuctl
export PATH=$PATH:$GOPATH/bin

You can find a full CLI guide here, or just run nuctl --help after installing nuctl.

Create a New Function

Create an example.go file, add code to import the nuclio SDK, and define a simple Handler() function that uses the SDK. The following simple function returns the text "Hello, World":

package handler

import (
    "github.com/nuclio/nuclio-sdk"
)

func Handler(context *nuclio.Context, event nuclio.Event) (interface{}, error) {
    return "Hello, World", nil
}

The more advanced function example below uses the Event and Context interfaces to handle inputs and logs, and returns a structured HTTP response instead of a simple text string, giving you more granular control over the output.

package handler

import (
    "github.com/nuclio/nuclio-sdk"
)

func Handler(context *nuclio.Context, event nuclio.Event) (interface{}, error) {
    context.Logger.Info("Request received: %s", event.GetURL())

    return nuclio.Response{
        StatusCode:  200,
        ContentType: "application/text",
        Body:        []byte("Response from handler"),
    }, nil
}
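
As a further illustration (not part of the original example), the following sketch parses a JSON request payload; it assumes that the SDK's Event interface also exposes a GetBody() accessor that returns the raw request body as bytes:

package handler

import (
    "encoding/json"

    "github.com/nuclio/nuclio-sdk"
)

func Handler(context *nuclio.Context, event nuclio.Event) (interface{}, error) {
    // event.GetBody() is assumed to return the raw request payload.
    var payload map[string]interface{}
    if err := json.Unmarshal(event.GetBody(), &payload); err != nil {
        return nuclio.Response{
            StatusCode:  400,
            ContentType: "application/text",
            Body:        []byte("expected a JSON request body"),
        }, nil
    }

    context.Logger.Info("Parsed %d fields from the request body", len(payload))

    return nuclio.Response{
        StatusCode:  200,
        ContentType: "application/text",
        Body:        []byte("Request body parsed successfully"),
    }, nil
}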

Build and Execute the Function

Use any of the following supported methods to build and execute your function.

Build and Execute the Function Locally

  1. Build the sample Handler() function by running the following CLI command:

    nuctl build example -p <example.go directory>
    

    Advanced build options and package or binary dependencies can be specified in the build.yaml file, which is located in the root path of the source code. See the examples for sample uses, or read the builder documentation.

  2. Run the processor locally to serve the function. The following command serves the function on port 8080:

    docker run -p 8080:8080 example:latest
    

    Then, use a browser to access the function on the specified port (8080 in this example).
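
    Alternatively, you can invoke the function from the command line; for example, with curl:

    curl http://localhost:8080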

Build and Execute the Function on a Kubernetes Cluster

  1. Prepare the cluster:

    1. Ensure that you have a working Kubernetes cluster and the Kubernetes CLI (kubectl). For a detailed explanation on how to properly install and configure Kubernetes, and optionally create a local Docker image registry, see the Kubernetes documentation.

    2. Verify that the nuclio controller deployment is running, or use the following command to start it:

      kubectl create -f https://raw.githubusercontent.com/nuclio/nuclio/development/hack/k8s/resources/controller.yaml
  2. Build or run the function:

    If you intend to create multiple function instances from the same code, you can use the build command to build the function and push the built image to a local or remote repository. You can then create different instances of the function, at any time, and specify unique parameters and environment variables for each instance by using the run command options. Alternatively, you can build and run the function by using a single run command, as demonstrated here:

    nuctl run myfunc -p <example.go directory> -r <cluster-ip:31276>
    

    For the CLI to connect to the Kubernetes cluster, you must have a Kubernetes configuration file in the default path (~/.kube/config) or set the KUBECONFIG environment variable to the correct file path. You can also use the CLI -k option to point to the Kubernetes configuration file and override the default.

    In the above command, -r <cluster-ip:31276> indicates where the function image should be pushed. If you followed the Kubernetes documentation, you will have a Docker registry set up in your Kubernetes cluster, listening on node port 31276. You can, of course, use any other Docker registry (or Docker Hub), but this guide assumes the former.

    When a function has already been built and pushed to the repository, you can use the -i option of the run command to set the function's image path, as shown in the example below. Setting this option skips the build phase, thereby eliminating the need to specify any build parameters (such as the path or the name of the handler function).
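
    For example, once the image has been built and pushed, deploying from the existing image might look like the following (a sketch; substitute your own registry and image path):

    nuctl run myfunc -i <registry-host:port>/myfunc:latest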

  3. Test your function:

    Use the nuctl get command to verify your function:

    nuctl get fu
    

    Following is a sample output for this command:

      NAMESPACE | NAME    | VERSION |   STATE   |      LOCAL URL      | NODE PORT | REPLICAS
      default   | hello   | latest  | processed | 10.107.164.223:8080 |     31010 | 1/1
      default   | myfunc  | latest  | processed | 10.96.188.133:8080  |     31077 | 1/1
    

    Use the nuctl exec command to invoke the function (by default, this performs an HTTP GET):

    nuctl exec myfunc
    

    Note: Because the functions are implemented as a Custom Resource Definition (CRD) in Kubernetes, you can also create a function using the Kubernetes kubectl command-line utility and APIs — for example, by running kubectl create -f function.yaml. We recommend using the CLI, as it is more robust and includes step-by-step verification.

    The nuclio controller automatically creates the Kubernetes function, pods, deployment, service, and optionally a pod auto-scaler. You can also view the status of your function by using kubectl get functions, or watch the Kubernetes deployments and services with your function name and proper labels.

    To access the function, you can send HTTP requests to the exposed local or remote function service port (node port). (Specific external ports can be specified with the --port CLI option.)
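
    For example, assuming the node port 31077 listed for myfunc in the sample output above, you can invoke the function from outside the cluster with curl:

    curl http://<cluster-ip>:31077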

    Note: If you want to assign a custom API URL to your function, you can use the Kubernetes ingress resources. In future versions of nuclio, this task will be automated.

Function Versioning

By default, functions are tagged with version latest. Versions can be published and assigned aliases (for example, "production" or "beta"). Earlier versions can be viewed in the CLI, and can be managed independently. Earlier versions are immutable and cannot be modified.

To publish a function and tag it with an alias, use the nuctl update command with the --publish and --alias options, as demonstrated in the following example:

nuctl update myfunc --publish --alias prod

Function Configuration and Metadata

Like other Kubernetes resources, a function can be defined or retrieved by using a YAML or JSON function configuration file. This allows granular and reusable specification of function resources, parameters, events, and data bindings. For more details, see the function-specification documentation.

Following is a sample function YAML configuration file:

apiVersion: "nuclio.io/v1"
kind: Function
metadata:
  name: example
spec:
  image: example:latest
  replicas: 1
  env:
  - name: SOME_ENV
    value: abc

You can create functions from the YAML or JSON configuration file by specifying the -f option in the run or build CLI commands. You can also use the configuration file as a template, and override specific parameters with command-line arguments. The following example uses a function.yaml template configuration file to create a function, and explicitly sets the function name and the value of one of the environment variables (overriding the template definitions):

nuctl run myfunc -f function.yaml -e ENV_PARAM=somevalue -r <cluster-ip:31276>

The following command returns a YAML file with the full function specification and status:

nuctl get function myfunc -o yaml
