gchq / kai

Kai is an experimental Graph-as-a-Service framework built with the Amazon CDK

License: Apache License 2.0

JavaScript 0.45% TypeScript 74.22% Python 25.33%

kai's People

Contributors

d47853, dependabot[bot], m29827, macenturalxl1, p3430233, t11947

kai's Issues

Add an underlying database for users and graphs

Add an underlying persistent Database which will store data about users and their graphs.
The database needs to be indexed by username, so graph owners can see which graphs they own, and by REST endpoint, so that we can quickly delete instances. It makes sense to use a relational database for this.
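As a rough illustration of the two indexes described above, here is a minimal sketch using SQLite from the Python standard library. The table name, column names and helper functions are hypothetical and are not taken from Kai; a real deployment would likely use a managed database instead.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open a database and create a hypothetical 'graphs' table with both indexes."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS graphs ("
        " graph_id TEXT NOT NULL,"
        " username TEXT NOT NULL,"
        " rest_endpoint TEXT NOT NULL UNIQUE)"  # UNIQUE creates an index on the endpoint
    )
    # Secondary index so that "which graphs does this user own?" is a cheap lookup.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_graphs_username ON graphs (username)"
    )
    return conn


def add_graph(conn, graph_id, username, rest_endpoint):
    """Record a graph, its owner and its REST endpoint."""
    with conn:
        conn.execute(
            "INSERT INTO graphs VALUES (?, ?, ?)",
            (graph_id, username, rest_endpoint),
        )


def graphs_owned_by(conn, username):
    """Return the graph IDs owned by a user, sorted by ID."""
    rows = conn.execute(
        "SELECT graph_id FROM graphs WHERE username = ? ORDER BY graph_id",
        (username,),
    )
    return [row[0] for row in rows]


def delete_by_endpoint(conn, rest_endpoint):
    """Delete the graph at a REST endpoint and return the number of rows removed."""
    with conn:
        cursor = conn.execute(
            "DELETE FROM graphs WHERE rest_endpoint = ?", (rest_endpoint,)
        )
    return cursor.rowcount
```

The unique constraint on the endpoint doubles as the index used for deletion, and it prevents two graphs from being registered at the same endpoint.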

Add Cognito User Pool

Add a Cognito user pool which will allow users to sign in to the app once #7 is done. It will also allow us to control which graphs are returned from a backend service such as the one implemented in #17.

Allow users to run bulk ingest to load large volumes of data into a graph

The ingest should be carried out by lambdas which can run spark-submit jobs on the Kubernetes cluster. These lambdas should initially be developed outside of Kai and referenced via their ARN. The admins of Kai need some way of adding ingest lambdas to the deployment. The easiest way I can think of to do this is with configuration. You could do it via REST, but that would require a new user pool etc.

The ingest objects should be stored in DynamoDB and should have the rough structure:

{
    "name": "My Ingest Job",
    "arn": "lambda arn",
    "arguments": {
        "inputFile": "text",
        "generatorJson": "json"
    }
} 

A Kai user should be able to retrieve these objects (minus the arn) and a UI should be able to use the arguments and their types to render a form that the user can fill in to trigger a bulk ingest.
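A minimal sketch of how the object above could be prepared for a user: one helper strips the ARN, and another turns the arguments into field descriptions that a UI could render as a form. The function names and the shape of the field descriptions are assumptions made for illustration, not part of Kai.

```python
import copy


def public_view(ingest):
    """Return a copy of an ingest object without its Lambda ARN."""
    view = copy.deepcopy(ingest)
    view.pop("arn", None)  # the ARN is only meaningful to admins
    return view


def form_fields(ingest):
    """Describe each argument as a form field, in the order it was declared."""
    return [
        {"name": name, "type": arg_type}
        for name, arg_type in ingest.get("arguments", {}).items()
    ]


ingest_job = {
    "name": "My Ingest Job",
    "arn": "lambda arn",
    "arguments": {"inputFile": "text", "generatorJson": "json"},
}

print(public_view(ingest_job))
print(form_fields(ingest_job))
```

Copying before removing the ARN keeps the stored object intact, so the backend can still use the ARN to invoke the lambda when the form is submitted.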

Generated passwords break Accumulo configuration file parsing.

Punctuation characters in generated passwords for Accumulo break configuration file parsing.

/etc/accumulo/conf/accumulo-site.xml:37.15: StartTag: invalid element name
    <value>m?<|Ppg'</value>
...
Initializing Accumulo...
[Fatal Error] accumulo-site.xml:37:15: The content of elements must consist of well-formed character data or markup.
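One possible fix is to XML-escape the password before it is written into the configuration file; another would be to restrict the characters used when generating it. The sketch below shows the escaping approach with the Python standard library. The property name is a placeholder, and how Kai actually writes accumulo-site.xml is not shown in this issue.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def property_xml(name, value):
    """Render one <property> element with the name and value XML-escaped."""
    return "<property><name>{}</name><value>{}</value></property>".format(
        escape(name), escape(value)
    )


# The password from the error output above, which contains '<'.
password = "m?<|Ppg'"
snippet = property_xml("example.password", password)  # placeholder property name

# The escaped snippet parses cleanly and round-trips to the original password.
parsed = ET.fromstring(snippet)
print(parsed.find("value").text == password)
```

`escape` replaces `&`, `<` and `>`, which are the characters that are not allowed unescaped in element content.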

Add a UI for the app

Add a dynamic SPA which will allow users to provision and manage their Graph instances. It should use some kind of UI framework to make it easier to deliver features in the future. This should probably be either React or Angular.

Introduce more separation to graphs

All graphs currently must have unique names because they all share the default namespace. If users could create and use different namespaces, they could reuse graph names. From an admin point of view, it would also be easier to locate problems with a graph when searching through a namespace containing 5 graph deployments rather than 50.

It would also allow users to make use of test / dev / ref namespaces to use as a sort of environment and allow teams to keep their graphs within their own namespace.

503 Temporarily unavailable message on random ALB endpoints

Sometimes when you deploy a graph, one or more of the endpoints will not work. I deployed one graph and the HDFS namenode was broken, then in the next one the monitor was showing the same thing. It could be something to do with the AZ the node is deployed in, but this needs further investigation.

Add automated E2E testing

Add testing that deploys the project on AWS, makes some API calls and checks the response. These tests should check that:

  • The project deploys correctly
  • A user can create a graph
  • A user can see that their graph exists
  • A user cannot add a graph twice
  • A user can delete a graph

As well as any new features implemented before this issue is resolved.

Improve reliability of ALB deployment

We use the ALB Helm chart to deploy the ingress controller. However, this deployment is unreliable and often fails. This is particularly frustrating as it has to happen after the EKS cluster has been deployed, which takes around half an hour. This wastes a lot of time and makes it harder to deploy Kai.

To address this, find the underlying cause of the issue and fix it, or just deploy the Kubernetes resources separately.

Externalise storage of graph credentials

Accumulo passwords are generated when the add_graph lambda is invoked. It would be better to create and store these when the stack is provisioned, and to change the add_graph lambda to retrieve the credentials when required.

Integrate user pool with REST API

Only allow users who are part of the user pool to access the REST API. The pool could also be used to determine who administers the graph (this should be whoever created the graph initially).

Add Basic CRUD backend using DynamoDB

Create a DynamoDB table as part of the deployment.
Integrate the table with the existing Lambda functions.

When a graph is added (initially by providing a graphId and schema) via the API Gateway, an entry should be created in the DynamoDB table containing its ID, the owner and a URL which should take the user to the Gaffer REST API. Until #15 or some other mechanism for deploying a Gaffer instance is done, we can stub this value.

When a graph is deleted, the row in the DynamoDB table should be deleted.

When the API is queried for all the graphs, all the graphs the user owns in the DynamoDB table should be returned.
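The three behaviours above can be sketched with an in-memory stand-in for the table. This is not DynamoDB code: the class, its method names and the stubbed URL are all hypothetical, and a real implementation would call the table from the Lambda functions instead.

```python
STUB_URL = "https://example.invalid/rest"  # placeholder until graphs are really deployed


class GraphTable:
    """In-memory stand-in for the graph table, keyed by graphId."""

    def __init__(self):
        self._rows = {}

    def add(self, graph_id, owner, url=STUB_URL):
        """Create an entry holding the graph's ID, its owner and its (stubbed) URL."""
        if graph_id in self._rows:
            raise ValueError("graph already exists: " + graph_id)
        self._rows[graph_id] = {"graphId": graph_id, "owner": owner, "url": url}

    def delete(self, graph_id):
        """Remove the row for a graph; return True if a row was removed."""
        return self._rows.pop(graph_id, None) is not None

    def owned_by(self, owner):
        """Return only the rows belonging to the requesting user."""
        return [row for row in self._rows.values() if row["owner"] == owner]
```

Filtering by owner in `owned_by` mirrors the requirement that a query for all graphs returns only the caller's own graphs.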

Add a programmatic client

Add clients for Kai that will interact with the deployed infrastructure via the REST API. These should be aimed at developers and data scientists who want to interact with Kai programmatically. As a first step, a Python client should be developed; if developers want other languages, we can add those later.

Deployment fails when extra security groups are not set

When extra security groups are not set, the deployment fails as the environment cannot contain a null value.

This is due to line 43 in worker.ts which is extra_security_groups: extraSecurityGroups == "" ? null : extraSecurityGroups

The desired behaviour should be that if the context variable is an empty string, the environment variable is not set.
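The desired behaviour is sketched below in Python purely to illustrate the logic; worker.ts itself is TypeScript, and the function and the `cluster_name` key are hypothetical. The idea is to add the key only when there is a value, instead of mapping an empty string to null.

```python
def build_environment(cluster_name, extra_security_groups=""):
    """Build the environment mapping, leaving out the optional key when it is empty."""
    environment = {"cluster_name": cluster_name}  # hypothetical required entry
    if extra_security_groups:
        # Only set the variable when the context value is a non-empty string.
        environment["extra_security_groups"] = extra_security_groups
    return environment


print(build_environment("kai"))
print(build_environment("kai", "sg-0123,sg-4567"))
```

Because the key is absent rather than null, the environment never contains a null value.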

Create API Gateway

Create an API Gateway (Can be stubbed for now). The endpoints it must support initially are:

GET /graphs - Get all the graphs the user owns

POST /graph - Start a graph in AWS. Data in the post request must include

  • The name of the graph
  • The Schema of the graph
    It should return a REST endpoint, where the graph is located.

DELETE /graph/ deletes the graph at a given URL.

Decouple graph IDs from release names

At the moment we use the graphId as the Helm release name when deploying a Gaffer graph. Unfortunately, this means that all graphIds have to be lowercase alphanumerical strings. We could have them linked - so have a uniqueId which is basically a lowercased graphId that we use as a release name. We would have to use this uniqueId as the primary key in the graph table to stop users having releaseIds which conflict.
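A minimal sketch of the linked-ID idea, assuming the uniqueId is simply the lowercased graphId and that graphIds remain alphanumeric. The function names and the dictionary standing in for the graph table are hypothetical.

```python
def release_name(graph_id):
    """Derive the uniqueId (used as the release name) by lowercasing the graphId."""
    if not (graph_id.isascii() and graph_id.isalnum()):
        raise ValueError("graphId must be alphanumeric: " + repr(graph_id))
    return graph_id.lower()


def register(table, graph_id, owner):
    """Store a graph under its uniqueId, rejecting IDs that collide after lowercasing."""
    unique_id = release_name(graph_id)
    if unique_id in table:
        raise KeyError("release name already in use: " + unique_id)
    table[unique_id] = {"graphId": graph_id, "owner": owner}
    return unique_id


table = {}
print(register(table, "RoadTraffic", "alice"))
```

Keying the table on the uniqueId means "RoadTraffic" and "roadtraffic" cannot both be registered, which is the conflict the issue wants to prevent.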

Add continuous integration

Add continuous integration with a framework of your choice. Typically we use Travis CI.

This should:

  • Build the project
  • Run all the tests

Remove EBS volumes when cluster deleted

Similar to behaviour of application load-balancers described in gh-26, EBS Volumes provisioned when graphs are deployed into a cluster are orphaned when the cluster is torn down.

Add endpoints to Graph objects after deployment

Upon successful deployment of a Gaffer Graph, the Graph object in the backend table should be updated with the various endpoints that get created by the application load balancer. These include:

  • The Gaffer REST service / UI
  • The Hadoop Namenode UI
  • The Accumulo Monitor

The URLs for these interfaces can be found by running a kubectl get ing command, which should be replicated in the graph deployment lambda.
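As a sketch of how the lambda might extract the endpoints, the function below parses the JSON that `kubectl get ing -o json` prints and maps each ingress name to its load-balancer hostnames. It assumes the lambda already has that JSON as a string; how the command is invoked, and how ingress names map onto the three interfaces, is left open here. The function name and sample hostname are hypothetical.

```python
import json


def ingress_endpoints(kubectl_json):
    """Map each ingress name to the hostnames reported in its load-balancer status."""
    document = json.loads(kubectl_json)
    endpoints = {}
    for item in document.get("items", []):
        name = item["metadata"]["name"]
        # The status is empty until the load balancer has been provisioned.
        entries = item.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        endpoints[name] = [e["hostname"] for e in entries if "hostname" in e]
    return endpoints


sample = json.dumps({
    "items": [
        {
            "metadata": {"name": "test-hdfs"},
            "status": {"loadBalancer": {"ingress": [{"hostname": "hdfs.example.invalid"}]}},
        },
        {"metadata": {"name": "test-monitor"}, "status": {"loadBalancer": {}}},
    ]
})
print(ingress_endpoints(sample))
```

An ingress with an empty list has not yet been assigned a load balancer, so the lambda would need to retry or wait before updating the Graph object.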

Clean up warnings in user pool and tests

Some lint warnings were introduced by #12. To see the warnings, run npm run lint as indicated in the README.
Currently the warnings are as follows:

lib/app-stack.ts
  33:15  warning  'userPool' is assigned a value but never used  @typescript-eslint/no-unused-vars

lib/authentication/user-pool.ts
  46:49  warning  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
  53:59  warning  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
  56:33  warning  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion
  63:52  warning  Forbidden non-null assertion  @typescript-eslint/no-non-null-assertion

test/authentication/user-pool-config.test.ts
  17:13  warning  'cdk' is defined but never used      @typescript-eslint/no-unused-vars
  18:13  warning  'cognito' is defined but never used  @typescript-eslint/no-unused-vars

test/authentication/user-pool.test.ts
  19:13  warning  'cognito' is defined but never used  @typescript-eslint/no-unused-vars

The no-unused-vars warnings can be solved by removing the variable/constant they are assigned to and just running the constructor. The no-non-null-assertion ones are fixed by doing a null check and handling it appropriately. It looks as though they have been checked already by the fromConfig function, in which case they can be temporarily disabled.

Remove hard coded resource names

To encourage collaboration some of the hard coded names should be removed from the deployment. These include:

  • The name of the cluster (currently this is a variable in cdk.json but it can be autogenerated)
  • The name of the REST API. This uses the ID of the construct, so setting the ID to something like this.node.uniqueId + "API" would work

Application load-balancers not created

Adding a graph through the REST API reports success, but the application ingress load-balancers for the Accumulo, HDFS and Gaffer UIs fail to create. Logs from the ingress controller pod show there is a permissions problem for the role:

E0630 09:47:45.838990       1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to find existing LoadBalancer due to AccessDenied: User: arn:aws:sts::01234567890123456789:assumed-role/KaiStackm29827-GraphPlatformEksClusterNodegroupgra-G22JPFJROJLR/i-0187f6c77f1c6a4e8 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers\n\tstatus code: 403, request id: 05b649bf-2e19-4e95-9a12-b045f9c79700"  "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"test-hdfs"}