cloudnativegame / aigc-gateway
A user gateway that provides a serverless AIGC experience.
License: Apache License 2.0
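In the current web front end, nothing changes on screen after the INSTALL, PAUSE, or RESUME button is clicked; the state only updates after a page refresh. This makes for a poor interaction experience and easily leads users to think the operation failed and to repeat it.
Expected behavior: after a button is clicked, the interface updates promptly and automatically, for example by showing an in-progress status, and repeated clicks are prevented.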
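One way to get the requested behavior is to track a pending flag per action, so that a button can show an in-progress label and ignore repeat clicks until the request settles. The following is a minimal sketch under that assumption; `createActionGuard`, `run`, `label`, and the label text are hypothetical names and are not taken from the dashboard's code.

```javascript
// Minimal sketch: guard an async action so repeat clicks are ignored
// while a request is in flight. All names here are hypothetical.
function createActionGuard() {
  const pending = new Set(); // actions currently in flight, e.g. "install"

  return {
    isPending(action) {
      return pending.has(action);
    },
    // Text a button could show: "INSTALL" normally, "INSTALL..." while pending.
    label(action) {
      const text = action.toUpperCase();
      return pending.has(action) ? text + "..." : text;
    },
    // Runs `request` unless the same action is already in flight.
    // Resolves to true if the request ran, false if the click was ignored.
    async run(action, request) {
      if (pending.has(action)) return false;
      pending.add(action);
      try {
        await request();
        return true;
      } finally {
        pending.delete(action); // re-enable the button on success or failure
      }
    },
  };
}
```

In a Vue component the pending flag would need to live in reactive state (for example in `data`) so that the template re-renders and a `:disabled` binding takes effect; the plain `Set` above only illustrates the logic.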
This project addresses the resource management problems of AIGC instances by providing an AIGC serverless gateway built on the auto-scaling capabilities of cloud-native architecture.
The gateway has the following features:
As shown in the figure below, AIGC-Gateway supports managing multiple AIGC model collections. When user A logs in, requests the gateway, and selects a model, the gateway returns the access endpoint of the corresponding instance for the user to connect to and use.
As shown in the figure below, when a new user B logs in and selects an AIGC model, the gateway calls the instance collection interface to create a new instance for that user.
As shown in the figure below, when an existing user A logs out or the session expires, the gateway calls the instance collection interface to scale in that specific instance, releasing computing resources while preserving storage resources.
As shown in the figure below, when an existing user A logs in again and selects an AIGC model, the gateway calls the instance collection interface to create the corresponding instance for user A. The instance name and access endpoint stay the same as before, and the data on the mounted persistent storage disk is not lost.
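The login and logout lifecycle described above can be illustrated with a toy in-memory model. This is only a sketch of the observable behavior (stable name and endpoint, storage kept across logout), not the gateway's implementation; the class `InstanceCollection`, its fields, and the endpoint format are assumptions made for illustration.

```javascript
// Toy in-memory model of the lifecycle described above.
// Not the gateway's implementation; names and endpoint format are made up.
class InstanceCollection {
  constructor(model) {
    this.model = model;
    this.nextId = 0;
    this.instances = new Map(); // user -> { id, name, endpoint, running, storage }
  }

  // Login: create the instance on first use, or bring it back under the same identity.
  login(user) {
    let inst = this.instances.get(user);
    if (!inst) {
      const id = this.nextId++;
      inst = {
        id,
        name: `${this.model}-${id}`,
        endpoint: `instance${id}.example.invalid`, // placeholder host
        running: false,
        storage: [], // stands in for the persistent disk
      };
      this.instances.set(user, inst);
    }
    inst.running = true;
    return inst.endpoint;
  }

  // Logout or session expiry: release compute, keep the record and its storage.
  logout(user) {
    const inst = this.instances.get(user);
    if (inst) inst.running = false;
  }
}

// Usage: user A keeps the same endpoint and data across a logout.
const sd = new InstanceCollection("stable-diffusion");
const firstEndpoint = sd.login("userA");
sd.instances.get("userA").storage.push("outputs/image-1.png");
sd.logout("userA");
const secondEndpoint = sd.login("userA");
const otherEndpoint = sd.login("userB");
```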
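Add an option in values.yaml so that the generated Ingress resources for logto and aigc-gateway use a Path other than "/". The goal is to reduce the number of domain names that have to be requested and to speed up deployment.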
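We would like the following features or documentation to be added.
1 To prevent problems after installation, please add:
a A note on the Terway version dependency, plus an FAQ entry covering the network dependency problem and its solution (for Alibaba Cloud customers).
b An update of the kruise-game version: --version 0.3.1
2 After a user logs in, errors sometimes occur that make the application deployed from the template unusable. Please provide a way to delete resources, so that customers can manually release and recreate them.
3 Please provide a way for users to delete a template manually, for cases where deletion is needed.
For example, deletion should be allowed when a customer no longer uses the image (application) defined in the current template and needs to switch to another application, or when AIGC-Gateway is no longer used.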
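When an AIGC instance becomes abnormal, for example when the underlying compute, storage, or network resources fail to start or the model hangs, users need to be able to decide for themselves whether to reclaim or restart the instance.
Add a /restart interface that users can call to restart the container. Correspondingly, add a Restart button to the Dashboard that users can click while the instance is still loading (spinner shown) or running normally.
Add a /delete interface that deletes the compute instance together with its storage. Correspondingly, add an Uninstall button to the Dashboard that users can click at any stage after installation.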
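The button rules in this request can be written down as a small lookup table. The sketch below is illustrative only: the state names (`NotInstalled`, `Starting`, `Running`, `Paused`) are assumptions, since the source only says that Restart applies while the instance is loading or running and that Uninstall applies at any stage after installation.

```javascript
// Sketch of which Dashboard buttons are clickable per instance state.
// The state names are hypothetical; only the rules come from the request above.
const BUTTONS_BY_STATE = {
  NotInstalled: ["Install"],
  Starting: ["Restart", "Uninstall"], // spinner shown
  Running: ["Pause", "Restart", "Uninstall"],
  Paused: ["Resume", "Uninstall"],
};

function availableButtons(state) {
  if (!Object.prototype.hasOwnProperty.call(BUTTONS_BY_STATE, state)) {
    throw new Error(`unknown instance state: ${state}`);
  }
  return [...BUTTONS_BY_STATE[state]]; // copy so callers cannot mutate the table
}
```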
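The goal is to keep GameServerSets (gss) that are unrelated to AI from being shown in the dashboard.
The namespace and the filter labels should be configurable.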
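A filter of this kind could look like the sketch below, which keeps only objects in a configured namespace whose labels match every configured key and value. The function name `filterGameServerSets`, the option names, and the label `aigc-gateway/enabled` are hypothetical; only the `metadata.namespace` and `metadata.labels` fields are standard Kubernetes object metadata.

```javascript
// Sketch: keep only GameServerSets in the configured namespace that carry
// all configured labels. Option names and the example label are assumptions.
function filterGameServerSets(items, { namespace, labels = {} } = {}) {
  return items.filter((item) => {
    const meta = item.metadata || {};
    if (namespace && meta.namespace !== namespace) return false;
    const itemLabels = meta.labels || {};
    // Every configured label must be present with the same value.
    return Object.entries(labels).every(([key, value]) => itemLabels[key] === value);
  });
}

// Usage with made-up objects:
const sampleItems = [
  { metadata: { name: "stable-diffusion-cpu", namespace: "default", labels: { "aigc-gateway/enabled": "true" } } },
  { metadata: { name: "minecraft", namespace: "default" } },
  { metadata: { name: "sd-other-ns", namespace: "games", labels: { "aigc-gateway/enabled": "true" } } },
];
const shown = filterGameServerSets(sampleItems, {
  namespace: "default",
  labels: { "aigc-gateway/enabled": "true" },
});
```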
line 16 in ./aigc-dashboard/src/components/engine.vue
methods: {
getData() {
this.axios.get("/resources").then((response) => {
this.items = response.data
}).catch((error) => {
this.items = [{"kind":"GameServerSet","apiVersion":"game.kruise.io/v1alpha1","metadata":{"name":"stable-diffusion-cpu","namespace":"default","uid":"198242b9-f6c8-4d33-8a80-5bbee44fa353","resourceVersion":"3534496","generation":11,"creationTimestamp":"2023-05-19T11:05:15Z","annotations":{"game.kruise.io/reserve-ids":"3,2,1","kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"game.kruise.io/v1alpha1\",\"kind\":\"GameServerSet\",\"metadata\":{\"annotations\":{},\"name\":\"stable-diffusion-cpu\",\"namespace\":\"default\"},\"spec\":{\"gameServerTemplate\":{\"spec\":{\"containers\":[{\"args\":[\"--listen\",\"--skip-torch-cuda-test\",\"--no-half\"],\"command\":[\"python3\",\"launch.py\"],\"env\":[{\"name\":\"POD_NAME\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"metadata.name\"}}}],\"image\":\"yunqi-registry.cn-shanghai.cr.aliyuncs.com/lab/stable-diffusion:v1.0.0-cpu\",\"name\":\"stable-diffusion\",\"readinessProbe\":{\"failureThreshold\":3,\"initialDelaySeconds\":5,\"periodSeconds\":10,\"successThreshold\":1,\"tcpSocket\":{\"port\":7860},\"timeoutSeconds\":1}}]}},\"network\":{\"networkConf\":[{\"name\":\"IngressClassName\",\"value\":\"nginx\"},{\"name\":\"Port\",\"value\":\"7860\"},{\"name\":\"Host\",\"value\":\"instances\\u003cid\\u003e.c5464a5f2c39341d3b3eda6e2dd37b505.cn-hangzhou.alicontainer.com\"},{\"name\":\"PathType\",\"value\":\"ImplementationSpecific\"},{\"name\":\"Path\",\"value\":\"/\"},{\"name\":\"Annotation\",\"value\":\"nginx.ingress.kubernetes.io/auth-url: https://dashboard.c5464a5f2c39341d3b3eda6e2dd37b505.cn-hangzhou.alicontainer.com/sign-in\"},{\"name\":\"Annotation\",\"value\":\"nginx.ingress.kubernetes.io/auth-signin: 
https://dashboard.c5464a5f2c39341d3b3eda6e2dd37b505.cn-hangzhou.alicontainer.com/\"}],\"networkType\":\"Kubernetes-Ingress\"},\"replicas\":0,\"scaleStrategy\":{\"scaleDownStrategyType\":\"ReserveIds\"},\"updateStrategy\":{\"rollingUpdate\":{\"maxUnavailable\":\"100%\",\"podUpdatePolicy\":\"InPlaceIfPossible\"}}}}\n"},"managedFields":[{"manager":"manager","operation":"Update","apiVersion":"game.kruise.io/v1alpha1","time":"2023-05-19T11:06:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:availableReplicas":{},"f:currentReplicas":{},"f:labelSelector":{},"f:maintainingReplicas":{},"f:observedGeneration":{},"f:readyReplicas":{},"f:replicas":{},"f:updatedReadyReplicas":{},"f:updatedReplicas":{},"f:waitToBeDeletedReplicas":{}}},"subresource":"status"},{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"game.kruise.io/v1alpha1","time":"2023-05-22T08:05:50Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:gameServerTemplate":{".":{},"f:spec":{}},"f:network":{".":{},"f:networkConf":{},"f:networkType":{}},"f:scaleStrategy":{},"f:updateStrategy":{".":{},"f:rollingUpdate":{".":{},"f:maxUnavailable":{},"f:podUpdatePolicy":{}}}}}},{"manager":"manager","operation":"Update","apiVersion":"game.kruise.io/v1alpha1","time":"2023-05-22T08:05:50Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{"f:game.kruise.io/reserve-ids":{}}}}},{"manager":"aigc-gateway","operation":"Update","apiVersion":"game.kruise.io/v1alpha1","time":"2023-05-22T08:12:48Z","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:gameServerTemplate":{"f:metadata":{".":{},"f:creationTimestamp":{}},"f:spec":{"f:containers":{}}},"f:replicas":{},"f:reserveGameServerIds":{}}}}]},"spec":{"replicas":2,"gameServerTemplate":{"metadata":{"creationTimestamp":null},"spec":{"containers":[{"name":"stable-diffusion","image":"yunqi-registry.cn-shanghai.cr.aliyuncs.com/lab/stable-diffu
sion:v1.0.0-cpu","command":["python3","launch.py"],"args":["--listen","--skip-torch-cuda-test","--no-half"],"env":[{"name":"POD_NAME","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.name"}}}],"resources":{},"readinessProbe":{"tcpSocket":{"port":7860},"initialDelaySeconds":5,"timeoutSeconds":1,"periodSeconds":10,"successThreshold":1,"failureThreshold":3}}]}},"reserveGameServerIds":[3,2,1],"updateStrategy":{"rollingUpdate":{"maxUnavailable":"100%","podUpdatePolicy":"InPlaceIfPossible"}},"scaleStrategy":{},"network":{"networkType":"Kubernetes-Ingress","networkConf":[{"name":"IngressClassName","value":"nginx"},{"name":"Port","value":"7860"},{"name":"Host","value":"instances\u003cid\u003e.c5464a5f2c39341d3b3eda6e2dd37b505.cn-hangzhou.alicontainer.com"},{"name":"PathType","value":"ImplementationSpecific"},{"name":"Path","value":"/"},{"name":"Annotation","value":"nginx.ingress.kubernetes.io/auth-url: https://dashboard.c5464a5f2c39341d3b3eda6e2dd37b505.cn-hangzhou.alicontainer.com/sign-in"},{"name":"Annotation","value":"nginx.ingress.kubernetes.io/auth-signin: https://dashboard.c5464a5f2c39341d3b3eda6e2dd37b505.cn-hangzhou.alicontainer.com/"}]}},"status":{"observedGeneration":11,"replicas":2,"readyReplicas":1,"availableReplicas":1,"currentReplicas":2,"updatedReplicas":2,"updatedReadyReplicas":1,"maintainingReplicas":0,"waitToBeDeletedReplicas":0,"labelSelector":"game.kruise.io/owner-gss=stable-diffusion-cpu"}}]
})
}
},
I prefer to make dashboard show nothing when OKG is not installed.
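A minimal sketch of that preference: on a failed request, fall back to an empty list instead of hardcoded sample data. Here `loadItems` is a hypothetical helper, and `fetchResources` stands in for the `this.axios.get("/resources")` call in the component, so the sketch can run without axios.

```javascript
// Sketch: return the fetched items, or an empty list when the request fails
// (for example when OKG is not installed). `fetchResources` is a stand-in
// for the component's axios call and must return a promise.
async function loadItems(fetchResources) {
  try {
    const response = await fetchResources();
    return Array.isArray(response.data) ? response.data : [];
  } catch (error) {
    return []; // show nothing rather than placeholder data
  }
}
```

In the component, the resolved value would be assigned to `this.items` in both the success and the failure case.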
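Which user information needs to be passed, such as the most basic item, the user name, should be configurable at the gss level. It could be configured in the network definition of the gss.
For example, SD should be visible only to artists, and sound-effect models only to the people who work on sound effects.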
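In the aigc-gateway dashboard, after recover is clicked, in some cases the gs is not created right away, the status that is retrieved is wrong, and the corresponding button in the UI changes to install.
One such case: a webhook is used for admission control on the related resources, and the webhook fails or times out.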
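After logto is deployed for the first time, opening the home page, creating the root account, and then creating a web application fails with high probability (almost 100%).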
at https://logto-admin.c55fb5feab13d4cd2bedc00cd1268eb91.ap-northeast-1.alicontainer.com/console/index.4348ad5c.js:5:61147
at new Promise ()
at r (https://logto-admin.c55fb5feab13d4cd2bedc00cd1268eb91.ap-northeast-1.alicontainer.com/console/index.4348ad5c.js:5:61053)
at c._fetch (https://logto-admin.c55fb5feab13d4cd2bedc00cd1268eb91.ap-northeast-1.alicontainer.com/console/index.4348ad5c.js:5:56192)
at async i (https://logto-admin.c55fb5feab13d4cd2bedc00cd1268eb91.ap-northeast-1.alicontainer.com/console/index.4348ad5c.js:5:50996)
at async a. [as json] (https://logto-admin.c55fb5feab13d4cd2bedc00cd1268eb91.ap-northeast-1.alicontainer.com/console/index.4348ad5c.js:5:51903)
at async https://logto-admin.c55fb5feab13d4cd2bedc00cd1268eb91.ap-northeast-1.alicontainer.com/console/index.4348ad5c.js:5:2783145
at async https://logto-admin.c55fb5feab13d4cd2bedc00cd1268eb91.ap-northeast-1.alicontainer.com/console/index.4348ad5c.js:5:340258
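The documented way of configuring the logto M2M app does not match the actual installation; update the documentation so that users can get started quickly.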