gchq / kai

Kai is an experimental Graph-as-a-Service framework built with the Amazon CDK

License: Apache License 2.0

JavaScript 0.45% TypeScript 74.22% Python 25.33%

kai's Introduction

Kai

Kai is an experimental Graph as a Service application built on AWS. It uses the Amazon CDK.

The cdk.json file tells the CDK Toolkit how to execute your app.

NOTE: As Kai is currently early in development and likely subject to breaking changes, we do not advise this product be used in any production capacity. If you have an interest in using Kai in production, please watch this repository to stay updated.

Useful commands

  • npm run build: compile TypeScript to JS
  • npm run watch: watch for changes and compile
  • npm run lint: run the ESLint style checks
  • npm run test: run the Jest unit tests
  • npm run e2e: run the end-to-end Jest integration tests
  • cdk deploy: deploy this stack to your default AWS account/region
  • cdk diff: compare the deployed stack with the current state
  • cdk synth: emit the synthesized CloudFormation template

Configuration

Kai has a number of properties that can be changed in the cdk.json file or by passing context values through the --context option.

| Name | Type | Default value | Description |
| --- | --- | --- | --- |
| vpcId | string | "DEFAULT" | The VPC that the EKS cluster will use. By default, the default VPC of the account you are deploying with is used. If this property is removed, a new VPC is created. If a VPC ID is specified, that VPC is used. |
| extraIngressSecurityGroups | string | "" | Additional security groups that will be added to every application load balancer that comes with a Gaffer deployment. To add several, use a comma-separated list, e.g. "sg-xxxxxxxxx, sg-yyyyyyyyyy". The security group of the EKS cluster is added automatically. |
| globalTags | object | {} | Tags that get added to every taggable resource. |
| clusterNodeGroup | object | null | Configuration for the EKS cluster nodegroup. See below for details. |
| userPoolConfiguration | object | null | Cognito UserPool configuration. See below for details. |
| graphDatabaseProps | object | see cdk.json | Configuration for the DynamoDB graph database's autoscaling. |

Changing the nodegroup properties

By default, Kai ships with a nodegroup with the following parameters:

{
    "instanceType": "m3.medium",
    "minSize": 1,
    "maxSize": 10,
    "preferredSize": 2
}

These properties are changeable through the context variable: "clusterNodeGroup".

Graph Database Autoscaling

Depending on your needs, you may want to change the autoscaling properties of the Graph Database. The default properties in the cdk.json file are as follows:

{
    "graphDatabaseProps": {
      "minCapacity": 1,
      "maxCapacity": 25,
      "targetUtilizationPercent": 80
    }
}

The minimum and maximum capacity refer to DynamoDB read and write capacity units.

These settings have not yet been tested at production scale, so they may change once they have been.
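
As a rough illustration of how minCapacity, maxCapacity and targetUtilizationPercent interact, the sketch below estimates the capacity that autoscaling would aim for at a given load. It assumes the usual definition of utilization as consumed capacity divided by provisioned capacity. The function name and the simplified arithmetic are illustrative assumptions, not Kai's code and not the exact behaviour of the AWS autoscaler.

```python
def target_capacity(consumed_units: int,
                    min_capacity: int = 1,
                    max_capacity: int = 25,
                    target_utilization_percent: int = 80) -> int:
    """Estimate the provisioned capacity units needed for a given load."""
    # Smallest whole number of units that keeps utilization at or below
    # the target. Integer ceiling division avoids float rounding issues.
    wanted = -(-consumed_units * 100 // target_utilization_percent)
    # Autoscaling never goes below the minimum or above the maximum.
    return max(min_capacity, min(max_capacity, wanted))


print(target_capacity(8))   # 10 units keep utilization at 80%
print(target_capacity(40))  # 25, capped at maxCapacity
```

With the defaults, a sustained load above 20 consumed units cannot be held at 80% utilization because the result is capped at maxCapacity.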

Cognito UserPool configuration

By default Kai uses a vanilla AWS Cognito UserPool to manage authentication with the application. The default UserPool and UserPoolClient settings can be overridden by supplying a userPoolConfiguration context option populated as shown here:

{
    "defaultPoolConfig": {
        "userPoolProps": {
            "selfSignUpEnabled": false // See below for full options
        },
        "userPoolClientOptions": {
            "disableOAuth": true // See below for full options
        }
    }
}

The full list of userPoolProps and userPoolClientOptions can be found in Amazon's documentation.

Alternatively, a pre-configured external pool can be referenced using the following example:

{
    "externalPool": {
        "userPoolId": "myRegion_userPoolId",
        "userPoolClientId": "randomString"
    }
}

kai's People

Contributors

d47853, dependabot[bot], m29827, macenturalxl1, p3430233, t11947

kai's Issues

Add a programmatic client

Add clients for Kai that will interact with the deployed infrastructure via the REST API. These should be aimed at developers and data scientists who want to interact with Kai programmatically. As a first step, a Python client should be developed. If developers want other languages, we can add those later.

Integrate user pool with REST API

Only allow users who are part of the user pool to access the REST API. The pool could also be used to determine who administers a graph (this should be whoever created the graph initially).

Decouple graph IDs from release names

At the moment we use the graphId as the Helm release name when deploying a Gaffer graph. Unfortunately, this means that all graphIds have to be lowercase alphanumeric strings. We could link the two instead: keep the graphId as given and derive a uniqueId, essentially the lowercased graphId, to use as the release name. The uniqueId would then have to be the primary key in the graph table, to stop users creating graphs whose release names conflict.
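
A minimal sketch of that idea, assuming the uniqueId is simply the lowercased graphId and that a set stands in for the primary-key check on the graph table. The function names are hypothetical and not part of Kai.

```python
def release_name_for(graph_id: str) -> str:
    """Derive the uniqueId (used as the Helm release name) from a graphId."""
    if not (graph_id.isascii() and graph_id.isalnum()):
        raise ValueError("graphId must be alphanumeric: %r" % graph_id)
    return graph_id.lower()


def register_graph(graph_id: str, taken: set) -> str:
    """Reserve the uniqueId, rejecting graphIds that differ only by case."""
    unique_id = release_name_for(graph_id)
    if unique_id in taken:
        raise ValueError("release name already in use: %r" % unique_id)
    taken.add(unique_id)
    return unique_id


taken = set()
print(register_graph("MyGraph", taken))  # mygraph
```

A second call with "MYGRAPH" would raise, because both graphIds map to the same release name.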

Clean up warnings in user pool and tests

Some lint warnings were introduced by #12. To see the warnings, run npm run lint as indicated in the README. Currently the warnings are as follows:

lib/app-stack.ts
  33:15  warning  'userPool' is assigned a value but never used  @typescript-eslint/no-unused-vars

lib/authentication/user-pool.ts
  46:49  warning  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
  53:59  warning  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
  56:33  warning  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
  63:52  warning  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion

test/authentication/user-pool-config.test.ts
  17:13  warning  'cdk' is defined but never used      @typescript-eslint/no-unused-vars
  18:13  warning  'cognito' is defined but never used  @typescript-eslint/no-unused-vars

test/authentication/user-pool.test.ts
  19:13  warning  'cognito' is defined but never used  @typescript-eslint/no-unused-vars

The no-unused-vars warnings can be solved by removing the variable/constant they are assigned to and just running the constructor. The no-non-null-assertion ones are fixed by doing a null check and handling it appropriately. It looks as though they have been checked already by the fromConfig function, in which case they can be temporarily disabled.

Add endpoints to Graph objects after deployment

Upon successful deployment of a Gaffer Graph, the Graph object in the backend table should be updated with the various endpoints that get created by the application load balancer. These include:

  • The Gaffer REST service / UI
  • The Hadoop Namenode UI
  • The Accumulo Monitor

The URLs for these interfaces can be found by running a kubectl get ing command, which should be replicated in the Graph deployment lambda.
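
One way the lambda could extract the hostnames is to parse the JSON form of that command's output (kubectl get ing -o json). The sketch below assumes the standard Kubernetes Ingress status layout. The sample document and its hostname are made up for illustration, and the function name is hypothetical.

```python
import json


def ingress_hostnames(kubectl_json: str) -> dict:
    """Map each ingress name to its first load balancer hostname."""
    endpoints = {}
    for item in json.loads(kubectl_json).get("items", []):
        name = item["metadata"]["name"]
        # The status is empty until the load balancer has been provisioned.
        entries = item.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        hosts = [entry["hostname"] for entry in entries if "hostname" in entry]
        if hosts:
            endpoints[name] = hosts[0]
    return endpoints


# Hypothetical sample shaped like `kubectl get ing -o json` output.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "test-hdfs", "namespace": "default"},
         "status": {"loadBalancer": {"ingress": [{"hostname": "hdfs.example.com"}]}}},
        {"metadata": {"name": "test-monitor", "namespace": "default"},
         "status": {"loadBalancer": {}}},
    ]
})

print(ingress_hostnames(SAMPLE))
```

Ingresses whose load balancer has not been created yet are left out, so the caller can tell which endpoints are still pending.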

Add automated E2E testing

Add testing that deploys the project on AWS, makes some API calls and checks the response. These tests should check that:

  • The project deploys correctly
  • A user can create a graph
  • A user can see that their graph exists
  • A user cannot add a graph twice
  • A user can delete a graph

The tests should also cover any new features implemented before this issue is resolved.

Generated passwords break Accumulo configuration file parsing

Punctuation characters in generated Accumulo passwords break configuration file parsing:

/etc/accumulo/conf/accumulo-site.xml:37.15: StartTag: invalid element name
    <value>m?<|Ppg'</value>
...
Initializing Accumulo...
[Fatal Error] accumulo-site.xml:37:15: The content of elements must consist of well-formed character data or markup.
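
One possible fix is to XML-escape the password before writing it into the configuration file. The sketch below uses Python's standard library to do that and checks that the result parses. The property name is a placeholder, and how Kai actually renders accumulo-site.xml is not shown here, so treat this as an assumption-laden illustration rather than the project's fix.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def property_element(name: str, value: str) -> str:
    """Render one <property> entry with XML-special characters escaped."""
    return "<property><name>{}</name><value>{}</value></property>".format(
        escape(name), escape(value))


password = "m?<|Ppg'"  # the generated password from the log above
xml_text = property_element("example.password", password)  # placeholder name
print(xml_text)

# The escaped entry parses cleanly and round-trips to the original password.
assert ET.fromstring(xml_text).find("value").text == password
```

An alternative would be to restrict generated passwords to alphanumeric characters, which avoids the escaping question altogether.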

Introduce more separation to graphs

All graphs currently have to have unique names because they all share the default namespace. If users could create and use different namespaces, they could re-use graph names. From an admin point of view, it would also be easier to locate problems with a graph when searching through a namespace containing 5 graph deployments rather than 50.

It would also allow users to make use of test / dev / ref namespaces to use as a sort of environment and allow teams to keep their graphs within their own namespace.

Application load-balancers not created

Adding a graph through the REST API reports success, but the application ingress load balancers for the Accumulo, HDFS and Gaffer UIs fail to create. Logs from the ingress controller pod show a permissions problem for the role:

E0630 09:47:45.838990       1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to find existing LoadBalancer due to AccessDenied: User: arn:aws:sts::01234567890123456789:assumed-role/KaiStackm29827-GraphPlatformEksClusterNodegroupgra-G22JPFJROJLR/i-0187f6c77f1c6a4e8 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers\n\tstatus code: 403, request id: 05b649bf-2e19-4e95-9a12-b045f9c79700"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"test-hdfs"}

Add continuous integration

Add continuous integration with a framework of your choice. Typically we use Travis CI.

This should:

  • Build the project
  • Run all the tests

Remove EBS volumes when cluster deleted

Similar to the behaviour of application load balancers described in gh-26, EBS volumes provisioned when graphs are deployed into a cluster are orphaned when the cluster is torn down.

Allow users to run bulk ingest to load large volumes of data into a graph

The ingest should be carried out by lambdas which can run spark-submit jobs on the Kubernetes cluster. These lambdas should initially be developed outside of Kai and referenced via their ARN. The admins of Kai need some way of adding ingest lambdas to the deployment. The easiest way I can think of to do this is with configuration. You could do it via REST, but that would require a new user pool etc.

The ingest objects should be stored in DynamoDB and should have the rough structure:

{
    "name": "My Ingest Job",
    "arn": "lambda arn",
    "arguments": {
        "inputFile": "text",
        "generatorJson": "json"
    }
} 

A Kai user should be able to retrieve these objects (minus the arn) and a UI should be able to use the arguments and their types to render a form that the user can fill in to trigger a bulk ingest.
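
A minimal sketch of the "minus the arn" step, assuming ingest objects are plain dictionaries shaped like the example above. The function name is hypothetical.

```python
def public_view(ingest_job: dict) -> dict:
    """Return a copy of an ingest object that is safe to show to users."""
    # Drop the Lambda ARN; keep everything a UI needs to render the form.
    return {key: value for key, value in ingest_job.items() if key != "arn"}


job = {
    "name": "My Ingest Job",
    "arn": "lambda arn",
    "arguments": {"inputFile": "text", "generatorJson": "json"},
}

print(public_view(job))
```

The stored object is left untouched, so the backend can still look up the ARN when the user triggers the ingest.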

Add Basic CRUD backend using DynamoDB

Create a DynamoDB table as part of the deployment.
Integrate the table with the existing Lambda functions.

When a graph is added (initially by providing a graphId and schema) via the API Gateway, an entry should be created in the DynamoDB table containing its ID, the owner and a URL which should take the user to the Gaffer REST API. Until #15 or some other mechanism for deploying a Gaffer instance is done, we can stub this value.

When a graph is deleted, the row in the DynamoDB table should be deleted.

When the API is queried for all the graphs, all the graphs the user owns in the DynamoDB table should be returned.
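
The three behaviours can be sketched with an in-memory stand-in for the table. This is only an illustration of the expected semantics: the class, its method names, the field names and the stub URL are assumptions, and a real implementation would call DynamoDB from the Lambda functions instead.

```python
class InMemoryGraphTable:
    """Stand-in for the DynamoDB graph table, keyed by graphId."""

    def __init__(self):
        self._rows = {}

    def add_graph(self, graph_id: str, owner: str, url: str = "STUB") -> dict:
        # The URL is stubbed until a Gaffer instance can really be deployed.
        if graph_id in self._rows:
            raise ValueError("graph already exists: %r" % graph_id)
        row = {"graphId": graph_id, "owner": owner, "url": url}
        self._rows[graph_id] = row
        return row

    def delete_graph(self, graph_id: str) -> None:
        # Raises KeyError if the graph does not exist.
        del self._rows[graph_id]

    def graphs_owned_by(self, owner: str) -> list:
        return [row for row in self._rows.values() if row["owner"] == owner]


table = InMemoryGraphTable()
table.add_graph("roads", owner="alice")
table.add_graph("rail", owner="bob")
print(table.graphs_owned_by("alice"))
```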

Externalise storage of graph credentials

Accumulo passwords are generated when the add_graph lambda is invoked. It would be better to create and store these when the stack is provisioned, and to change the add_graph lambda to retrieve the credentials when required.

503 Temporarily unavailable message on random ALB endpoints

Sometimes when you deploy a graph, one or more of the endpoints will not work. I deployed one graph and the HDFS namenode was broken, then in the next one the monitor was showing the same thing. It could be something to do with the AZ the node is deployed in, but this needs further investigation.

Add a UI for the app

Add a dynamic SPA which will allow users to provision and manage their Graph instances. It should use some kind of UI framework to make it easier to deliver features in the future. This should probably be either React or Angular.

Deployment fails when extra security groups are not set

When extra security groups are not set, the deployment fails as the environment cannot contain a null value.

This is due to line 43 in worker.ts which is extra_security_groups: extraSecurityGroups == "" ? null : extraSecurityGroups

The desired behaviour should be that if the context variable is an empty string, the environment variable is not set.
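
The desired behaviour can be sketched as follows. The actual fix belongs in worker.ts. This Python version only illustrates the logic, and the function name and the base parameter are hypothetical.

```python
def worker_environment(extra_security_groups: str, base: dict = None) -> dict:
    """Build the environment, omitting the variable when nothing is configured."""
    environment = dict(base or {})
    # Only set the key for a non-empty value, never store null.
    if extra_security_groups.strip():
        environment["extra_security_groups"] = extra_security_groups
    return environment


print(worker_environment(""))                            # {}
print(worker_environment("sg-xxxxxxxxx, sg-yyyyyyyyyy"))
```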

Remove hard coded resource names

To encourage collaboration some of the hard coded names should be removed from the deployment. These include:

  • The name of the cluster (currently this is a variable in cdk.json but it can be autogenerated)
  • The name of the REST API. This uses the ID of the construct, so setting the ID to something like this.node.uniqueId + "API" would work

Improve reliability of ALB deployment

We use the ALB helm chart to deploy the ingress controller. However this deployment is unreliable and often fails. This is particularly frustrating as it has to happen after the EKS cluster has been deployed which takes around half an hour. This results in a lot of time wasted and makes it harder to deploy Kai.

To address this, find the underlying cause of the issue and fix it, or just deploy the Kubernetes resources separately.

Add an underlying database for users and graphs

Add an underlying persistent database which will store data about users and their graphs.
The database needs to be indexed by username, so that graph owners can see which graphs they own, and by REST endpoint, so that we can quickly delete instances by their REST endpoint. It makes sense to use a relational database for this.

Create API Gateway

Create an API Gateway (Can be stubbed for now). The endpoints it must support initially are:

GET /graphs - Get all the graphs the user owns

POST /graph - Start a graph in AWS. Data in the POST request must include:

  • The name of the graph
  • The schema of the graph

It should return a REST endpoint where the graph is located.

DELETE /graph/ - Delete the graph at a given URL.

Add Cognito User Pool

Add a Cognito User pool which will allow users to sign in to the app once #7 is done. It will also allow us to control which graphs are returned from a backend service such as the one implemented in #17
