gchq / kai
Kai is an experimental Graph-as-a-Service framework built with the Amazon CDK
License: Apache License 2.0
Add an underlying persistent Database which will store data about users and their graphs.
The database needs to be indexed by username, so graph owners can see which graphs they own, and by REST endpoint, so that instances can be deleted quickly. It makes sense to use a relational database for this.
Change oraclejdk8 to openjdk8
Create the initial folder structure using the Amazon CDK.
Add a line to the docs informing the community that Kai is still very early in development and is not secure, and that any production use of Kai is therefore at the user's own risk.
The ingest should be carried out by lambdas which can run spark-submit jobs against the Kubernetes cluster. These lambdas should initially be developed outside of Kai and referenced via their ARN. The admins of Kai need some way of adding ingest lambdas to the deployment. The easiest way to do this is probably with configuration. You could do it via REST, but that would require a new user pool etc.
The ingest objects should be stored in DynamoDB and should have the rough structure:
{
"name": "My Ingest Job",
"arn": "lambda arn",
"arguments": {
"inputFile": "text",
"generatorJson": "json"
}
}
A Kai user should be able to retrieve these objects (minus the arn) and a UI should be able to use the arguments and their types to render a form that the user can fill in to trigger a bulk ingest.
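As a rough sketch of the "minus the arn" behaviour, the helper below strips the lambda ARN from an ingest object before it is returned to a user. The `IngestJob` type and `stripArn` function are illustrative assumptions based on the structure above, not Kai's actual code.

```typescript
// Illustrative sketch, not Kai's real types: field names follow the
// rough ingest-object structure stored in DynamoDB.
interface IngestJob {
  name: string;
  arn: string;
  arguments: Record<string, string>;
}

// Return a user-facing copy of an ingest job with the lambda ARN removed,
// so clients only see the name and the argument types needed to render a form.
function stripArn(job: IngestJob): Omit<IngestJob, "arn"> {
  const { arn, ...visible } = job;
  return visible;
}

const job: IngestJob = {
  name: "My Ingest Job",
  arn: "arn:aws:lambda:eu-west-2:000000000000:function:ingest",
  arguments: { inputFile: "text", generatorJson: "json" },
};
const visible = stripArn(job);
```

A UI could then iterate over `visible.arguments` to build the form fields that trigger a bulk ingest.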
Punctuation characters in generated passwords for Accumulo break configuration file parsing.
/etc/accumulo/conf/accumulo-site.xml:37.15: StartTag: invalid element name
<value>m?<|Ppg'</value>
...
Initializing Accumulo...
[Fatal Error] accumulo-site.xml:37:15: The content of elements must consist of well-formed character data or markup.
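One way to avoid this is to generate passwords from an alphabet that contains no XML-significant characters (`<`, `>`, `&`, `'`, `"`), so the value can be embedded in accumulo-site.xml safely. The alphabet and length below are assumptions for illustration, not Kai's actual password policy.

```typescript
import { randomInt } from "crypto";

// Alphabet deliberately excludes <, >, &, ' and " so the generated
// password cannot break well-formed XML in accumulo-site.xml.
const SAFE_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!";

// Build a password of the given length using a CSPRNG.
function generateSafePassword(length = 16): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += SAFE_CHARS[randomInt(SAFE_CHARS.length)];
  }
  return out;
}

const password = generateSafePassword();
```

Alternatively the password could be XML-escaped at write time, but restricting the alphabet keeps the value safe in any downstream config format.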
Add a dynamic SPA which will allow users to provision and manage their Graph instances. It should use some kind of UI framework to make it easier to deliver features in the future. This should probably be either React or Angular.
All graphs currently have to have unique names, as they all share the default namespace. If users could create and use different namespaces, they could re-use graph names, and from an admin point of view it would be easier to locate problems with a graph when searching through a namespace containing 5 graph deployments rather than 50.
It would also allow users to treat test / dev / ref namespaces as a sort of environment, and allow teams to keep their graphs within their own namespace.
When the cluster is torn down as the stack is deleted, the load balancers and their target groups remain. This probably happens because the alb helm chart is torn down before the graphs have unregistered.
Sometimes when you deploy a graph, one or more of the endpoints will not work. I deployed one graph and the HDFS namenode was broken; in the next one, the monitor was showing the same thing. It could be something to do with the AZ the node is deployed in, but this needs further investigation.
Add testing that deploys the project on AWS, makes some API calls and checks the responses. These tests should cover the existing API behaviour, as well as any new features implemented before this issue is resolved.
The add graph worker always logs a message saying Gaffer was successfully deployed, even when it wasn't. To fix this, make changes to lib/workers/lambdas/add_graph.py so that this message is not printed if the deployment fails.
We use the ALB helm chart to deploy the ingress controller. However, this deployment is unreliable and often fails. This is particularly frustrating as it has to happen after the EKS cluster has been deployed, which takes around half an hour. This results in a lot of wasted time and makes it harder to deploy Kai.
To address this, find the underlying cause of the issue and fix it, or just deploy the Kubernetes resources separately.
Accumulo passwords are generated when the add_graph lambda is invoked. It would be better to create and store these when the stack is provisioned, and change the add_graph lambda to retrieve the credentials when required.
Only allow users who are part of the user pool to access the REST API. The pool could also be used to determine who administrates the graph (this should be whoever created the graph initially).
Create a DynamoDB table as part of the deployment.
Integrate the table with the existing Lambda functions.
When a graph is added (initially by providing a graphId and schema) via the API Gateway, an entry should be created in the DynamoDB table containing its id, the owner and a URL which should take the user to the Gaffer REST API. Until #15 or some other mechanism for deploying a Gaffer instance is done, we can stub this value.
When a graph is deleted, the row in the DynamoDB table should be deleted.
When the API is queried for all the graphs, all the graphs the user owns in the DynamoDB table should be returned.
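The three behaviours above imply a record shape along these lines. The `GraphRecord` layout and `buildGraphRecord` helper are assumptions sketched from the description, not Kai's actual table schema.

```typescript
// Illustrative record shape for the graph table, keyed by graphId.
interface GraphRecord {
  graphId: string;
  owner: string;
  restUrl: string;
}

// Build the item written to DynamoDB when a graph is added.
function buildGraphRecord(graphId: string, owner: string): GraphRecord {
  return {
    graphId,
    owner,
    // Stubbed until a real deployment mechanism (#15) supplies the
    // actual Gaffer REST API URL.
    restUrl: "http://example.invalid/rest",
  };
}

const record = buildGraphRecord("roadTraffic", "alice");
```

Deleting a graph then removes the item by `graphId`, and the "get all graphs" query filters on `owner`.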
Add clients for Kai that will interact with the deployed infrastructure via the REST API. These should be aimed at developers and data scientists who want to programmatically interact with Kai. As a first step, a Python client should be developed; if developers want other languages, we can add those later down the line.
Subtask of #6
Add a Lambda which starts a Gaffer deployment in EKS. It should use the Gaffer Helm chart defined in Gaffer Docker. It should return as soon as the request has been sent and provide some mechanism for tracking the progress of the deployment.
When extra security groups are not set, the deployment fails, as the environment cannot contain a null value.
This is due to line 43 in worker.ts, which reads: extra_security_groups: extraSecurityGroups == "" ? null : extraSecurityGroups
The desired behaviour is that if the context variable is an empty string, the environment variable is not set at all.
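A minimal sketch of the intended behaviour: omit the key entirely rather than assigning null. The `buildWorkerEnvironment` function name is illustrative; the real fix lives in worker.ts.

```typescript
// Build the worker's environment map, leaving extra_security_groups
// unset (rather than null) when the context variable is empty.
function buildWorkerEnvironment(extraSecurityGroups: string): Record<string, string> {
  const env: Record<string, string> = {};
  if (extraSecurityGroups !== "") {
    env["extra_security_groups"] = extraSecurityGroups;
  }
  return env;
}
```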
Create an API Gateway (Can be stubbed for now). The endpoints it must support initially are:
GET /graphs - Get all the graphs the user owns
POST /graph - Start a graph in AWS. Data in the post request must include
DELETE /graph/ - Delete the graph at a given URL.
At the moment we use the graphId as the Helm release name when deploying a Gaffer graph. Unfortunately, this means that all graphIds have to be lowercase alphanumeric strings. We could keep them linked: have a uniqueId which is basically a lowercased graphId that we use as the release name. We would have to use this uniqueId as the primary key in the graph table to stop users having releaseIds which conflict.
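The derivation could be as simple as the sketch below; the `toReleaseName` function name and the exact normalisation rules are assumptions for illustration.

```typescript
// Derive a Helm-safe release name (lowercase alphanumeric only)
// from a user-chosen graphId.
function toReleaseName(graphId: string): string {
  return graphId.toLowerCase().replace(/[^a-z0-9]/g, "");
}

const releaseName = toReleaseName("RoadTraffic_2020");
```

Because two distinct graphIds can normalise to the same release name, using the normalised value as the table's primary key is what catches the conflict at creation time.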
Add continuous integration with a framework of your choice. Typically we use Travis CI.
This should:
Similar to behaviour of application load-balancers described in gh-26, EBS Volumes provisioned when graphs are deployed into a cluster are orphaned when the cluster is torn down.
Upon successful deployment of a Gaffer Graph, the Graph object in the backend table should be updated with the various endpoints that get created by the application load balancer. These include:
The URLs for these interfaces can be found by running a kubectl get ing command, which should be replicated in the Graph deployment lambda.
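Rather than scraping the table output, the lambda could request `kubectl get ing -o json` and read the load-balancer hostnames from the standard Ingress status fields. The helper below is an assumed sketch of that parsing step, not Kai's actual code.

```typescript
// Minimal shape of `kubectl get ing -o json` output, covering only
// the fields we read (standard Kubernetes Ingress status layout).
interface IngressList {
  items: {
    metadata: { name: string };
    status: { loadBalancer: { ingress?: { hostname?: string }[] } };
  }[];
}

// Map each ingress name to the URL of its load balancer, skipping
// ingresses whose address has not been assigned yet.
function ingressUrls(raw: string): Record<string, string> {
  const list: IngressList = JSON.parse(raw);
  const urls: Record<string, string> = {};
  for (const item of list.items) {
    const host = item.status.loadBalancer.ingress?.[0]?.hostname;
    if (host) {
      urls[item.metadata.name] = `http://${host}`;
    }
  }
  return urls;
}
```

The resulting map can then be written onto the Graph object in the backend table.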
Add a CONTRIBUTING.md file similar to the one in Gaffer Docker which will outline guidelines for contributing.
Some lint warnings were introduced by #12. To see the warnings, run npm run lint as indicated in the README. Currently the warnings are as follows:
lib/app-stack.ts
33:15 warning 'userPool' is assigned a value but never used @typescript-eslint/no-unused-vars
lib/authentication/user-pool.ts
46:49 warning Forbidden non-null assertion @typescript-eslint/no-non-null-assertion
53:59 warning Forbidden non-null assertion @typescript-eslint/no-non-null-assertion
56:33 warning Forbidden non-null assertion @typescript-eslint/no-non-null-assertion
63:52 warning Forbidden non-null assertion @typescript-eslint/no-non-null-assertion
test/authentication/user-pool-config.test.ts
17:13 warning 'cdk' is defined but never used @typescript-eslint/no-unused-vars
18:13 warning 'cognito' is defined but never used @typescript-eslint/no-unused-vars
test/authentication/user-pool.test.ts
19:13 warning 'cognito' is defined but never used @typescript-eslint/no-unused-vars
The no-unused-vars warnings can be solved by removing the variable/constant they are assigned to and just running the constructor. The no-non-null-assertion ones are fixed by doing a null check and handling it appropriately. It looks as though they have been checked already by the fromConfig function, in which case they can be temporarily disabled.
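For the no-non-null-assertion warnings, the null check can be factored into a small guard like the one below. The `requireDefined` helper and the `userPoolId` value are illustrative assumptions, not the actual code in user-pool.ts.

```typescript
// Replace a non-null assertion (value!) with an explicit check that
// fails loudly when the value was never configured.
function requireDefined<T>(value: T | undefined, name: string): T {
  if (value === undefined) {
    throw new Error(`${name} is not configured`);
  }
  return value;
}

// Example: instead of `config.userPoolId!`, validate once up front.
const userPoolId = requireDefined("eu-west-2_example", "userPoolId");
```

Where fromConfig has already validated the value, an inline eslint-disable comment on those lines is a reasonable temporary alternative.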
To encourage collaboration, some of the hard-coded names should be removed from the deployment. For these, something like this.node.uniqueId + "API" would work.
Adding a graph through the REST API is reporting success, but the application ingress load-balancers for the Accumulo, HDFS and Gaffer UIs are failing to create. Logs from the ingress controller pod show there is a permissions problem for the role:
E0630 09:47:45.838990 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to find existing LoadBalancer due to AccessDenied: User: arn:aws:sts::01234567890123456789:assumed-role/KaiStackm29827-GraphPlatformEksClusterNodegroupgra-G22JPFJROJLR/i-0187f6c77f1c6a4e8 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers\n\tstatus code: 403, request id: 05b649bf-2e19-4e95-9a12-b045f9c79700" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"test-hdfs"}