sogno-platform / clonemap
cloud-native Multi-Agent Platform
License: Other
When an agent is deleted, the agent list of the corresponding agency is not updated.
This prevents new agents from being scheduled to this agency if it has already reached the maximum number of agents per agency.
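The bookkeeping this report asks for can be sketched as follows. This is a minimal illustration, not the clonemap implementation: the type agencyInfo and the methods removeAgent and hasCapacity are hypothetical names introduced here, and the real schema may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// agencyInfo is a hypothetical stand-in for the per-agency bookkeeping.
// It is NOT the clonemap schema.
type agencyInfo struct {
	ID     int
	Agents []int // IDs of the agents currently placed in this agency
}

var errAgentNotFound = errors.New("agent not found in agency")

// removeAgent deletes agentID from the agency's agent list so that the
// slot counts as free again.
func (a *agencyInfo) removeAgent(agentID int) error {
	for i, id := range a.Agents {
		if id == agentID {
			a.Agents = append(a.Agents[:i], a.Agents[i+1:]...)
			return nil
		}
	}
	return errAgentNotFound
}

// hasCapacity reports whether another agent fits under the per-agency limit.
func (a *agencyInfo) hasCapacity(agentsPerAgency int) bool {
	return len(a.Agents) < agentsPerAgency
}

func main() {
	ag := &agencyInfo{ID: 0, Agents: []int{3, 4}}
	fmt.Println("capacity before delete:", ag.hasCapacity(2))
	if err := ag.removeAgent(4); err != nil {
		fmt.Println(err)
	}
	fmt.Println("capacity after delete:", ag.hasCapacity(2))
}
```

The point of the sketch is only that the capacity check and the delete path must read and write the same list; if the delete path skips the list update, the agency stays "full" forever.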
While trying to terminate single agents or a full MAS, I found that deletion does not work as expected.
Two examples:
I send the request requests.delete("http://localhost:30009/api/clonemap/mas/2/agents/4")
and get a response with status 200.
The corresponding agency received the request: 2022-01-11 07:40:01,106 - [INFO] - Agency: Received Request: DELETE /api/agency/agents/4
Afterwards, I can still access http://localhost:30009/api/clonemap/mas/2/agents/4
and get all data. Also, the docker container is not stopped.
Expected behavior: The docker container is stopped and the agent with ID 4 is deleted.
I send the request requests.delete("http://localhost:30009/api/clonemap/mas/0")
and get a response with status 200.
Before this request, running docker ps -a yields:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d609977c113c testdizwionclonemap "python -u agentlib/…" 9 seconds ago Up 8 seconds mas-0-im-0-agency-5.mas0agencies
1357070d0d84 testdizwionclonemap "python -u agentlib/…" 10 seconds ago Up 9 seconds mas-0-im-0-agency-4.mas0agencies
19410bb7b002 testdizwionclonemap "python -u agentlib/…" 10 seconds ago Up 10 seconds mas-0-im-0-agency-3.mas0agencies
2c8b3ebd44d9 testdizwionclonemap "python -u agentlib/…" 11 seconds ago Up 11 seconds mas-0-im-0-agency-2.mas0agencies
147a352a9b38 testdizwionclonemap "python -u agentlib/…" 14 seconds ago Up 13 seconds mas-0-im-0-agency-1.mas0agencies
d08e9659a79c testdizwionclonemap "python -u agentlib/…" 15 seconds ago Up 14 seconds mas-0-im-0-agency-0.mas0agencies
3335807b57f4 eclipse-mosquitto:1.6.13 "/docker-entrypoint.…" About a minute ago Up About a minute 0.0.0.0:30883->1883/tcp, :::30883->1883/tcp clonemap_mqtt_1
2d25e956e033 registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/ams "./ams" About a minute ago Up About a minute 0.0.0.0:30009->9000/tcp, :::30009->9000/tcp clonemap_ams_1
9d106a3448d4 registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/clonemap_local "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp kubestub
The MAS logs the following (I also added two GET requests):
kubestub | Received Request: DELETE /api/container/0
mqtt_1 | 1641887899: Socket error on client auto-8C05EB97-C8E6-E789-1157-8A4A1424425B, disconnecting.
mqtt_1 | 1641887902: Socket error on client auto-B97DC532-C70A-32B5-5673-5915E5B0AFBB, disconnecting.
mqtt_1 | 1641887904: Socket error on client auto-DFE8AD9A-6AEE-7665-8904-77E3931471C9, disconnecting.
mqtt_1 | 1641887907: Socket error on client auto-3A860F08-4444-28F1-5EF2-A13277D1F795, disconnecting.
ams_1 | [INFO] 2022/01/11 07:59:57 Received Request: GET /api/clonemap/mas/1
ams_1 | [ERROR] 2022/01/11 07:59:57 /api/clonemap/mas/1 MAS does not exist
ams_1 | [INFO] 2022/01/11 08:00:00 Received Request: GET /api/clonemap/mas/0
After the request, docker ps -a shows the expected state, i.e. all agency containers are removed:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3335807b57f4 eclipse-mosquitto:1.6.13 "/docker-entrypoint.…" 2 minutes ago Up 2 minutes 0.0.0.0:30883->1883/tcp, :::30883->1883/tcp clonemap_mqtt_1
2d25e956e033 registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/ams "./ams" 2 minutes ago Up 2 minutes 0.0.0.0:30009->9000/tcp, :::30009->9000/tcp clonemap_ams_1
9d106a3448d4 registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/clonemap_local "docker-entrypoint.s…" 2 minutes ago Up 2 minutes 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp kubestub
However, the GET request for MAS 0 still returns all the data.
Also, if I post a new MAS, ID 0 is not reused; instead, a MAS with ID 1 is created. I think this is because the GET request for MAS 0 does not return "MAS does not exist".
Expected behavior: The MAS is fully deleted and I can reuse ID 0.
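The expected delete semantics can be sketched with a small in-memory store. Everything here is an assumption made for illustration: masStore, register, get and remove are hypothetical names, the store only keeps a MAS name instead of the full specification, and whether clonemap intends to reuse IDs at all is not stated in the report.

```go
package main

import (
	"errors"
	"fmt"
)

var errMASNotFound = errors.New("MAS does not exist")

// masStore is a hypothetical in-memory store keyed by MAS ID.
type masStore struct {
	mas map[int]string // MAS ID -> MAS name (stand-in for the full spec)
}

func newMASStore() *masStore {
	return &masStore{mas: make(map[int]string)}
}

// register stores a MAS under the lowest unused ID and returns that ID.
func (s *masStore) register(name string) int {
	id := 0
	for {
		if _, used := s.mas[id]; !used {
			break
		}
		id++
	}
	s.mas[id] = name
	return id
}

// get returns the stored MAS or errMASNotFound.
func (s *masStore) get(id int) (string, error) {
	name, ok := s.mas[id]
	if !ok {
		return "", errMASNotFound
	}
	return name, nil
}

// remove drops the MAS entry itself, not only its containers.
func (s *masStore) remove(id int) error {
	if _, ok := s.mas[id]; !ok {
		return errMASNotFound
	}
	delete(s.mas, id)
	return nil
}

func main() {
	s := newMASStore()
	id := s.register("test")
	_ = s.remove(id)
	_, err := s.get(id)
	fmt.Println("GET after DELETE:", err)
	fmt.Println("ID of next MAS:", s.register("test2"))
}
```

In this sketch a GET after a DELETE fails with "MAS does not exist", and the freed ID is handed out again on the next registration.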
@kwe712 @s-daehling :
Could you comment on this error and tell me if it's a bug or actually a feature?
As you can see, I am running it locally and using the following docker-compose:
version: "3.7"
services:
  kubestub:
    container_name: kubestub
    hostname: kubestub
    image: registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/clonemap_local
    environment:
      CLONEMAP_LOG_LEVEL: ${CLONEMAP_LOG_LEVEL}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - 8000:8000
    networks:
      - clonemap-net
    stop_grace_period: 30s
  ams:
    image: registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/ams
    environment:
      CLONEMAP_DEPLOYMENT_TYPE: ${CLONEMAP_DEPLOYMENT_TYPE}
      CLONEMAP_STORAGE_TYPE: ${CLONEMAP_STORAGE_TYPE}
      CLONEMAP_LOG_LEVEL: ${CLONEMAP_LOG_LEVEL}
      CLONEMAP_STUB_HOSTNAME: ${CLONEMAP_STUB_HOSTNAME}
    ports:
      - 30009:9000
    depends_on:
      - kubestub
    networks:
      - clonemap-net
  mqtt:
    image: eclipse-mosquitto:1.6.13
    ports:
      - 30883:1883
    depends_on:
      - kubestub
      - ams
    networks:
      - clonemap-net
networks:
  clonemap-net:
    name: clonemap-net
    attachable: true
with the following .env:
CLONEMAP_STORAGE_TYPE=local
CLONEMAP_DEPLOYMENT_TYPE=local
# mqtt
CLONEMAP_LOG_LEVEL=info
# fe kubestub
CLONEMAP_STUB_HOSTNAME=kubestub
CLONEMAP_MODULE_MQTT=true
CLONEMAP_MODULE_FRONTEND=true
Starting an agentlib MAS works fine with the docker-compose version but (for me) fails with Kubernetes. The MAS specification is:
{
  "config":{
    "name":"test",
    "agentsperagency":1,
    "mqtt":{
      "active":true
    },
    "df":{
      "active":false
    },
    "logger":{
      "active":true,
      "msg":true,
      "app":true,
      "status":true,
      "debug":true
    }
  },
  "imagegroups":[
    {
      "config": {
        "image":"agentlibtest"
      },
      "agents":[...]
    }
  ],
  "graph":{
    "node":null,
    "edge":null
  }
}
Running the same example with docker-compose, the image gets pulled from registry.git-ce.rwth-aachen.de/ebc/projects/ebc_acs0017_bmwi_agent/agents_python/agentlib:latest just fine.
With Kubernetes, it results in an ErrImagePull, even if the image has already been pulled locally and regardless of whether image-pull-policy is set to "Never" or "Always".
Describe the bug
The clonemap_local kubestub container exits with error code 137, without any meaningful error indicator.
The agencies are not terminated correctly, and all other containers in the docker-compose exit with error code 2.
Expected behavior
Indicate what caused the container to shut down.
Additional content
Logs of kubestub:
Caught sig: terminated
Stop Agency Container mas-0-im-0-agency-0
I have 3 MAS running, so from the code I would expect three print statements for agency shutdown, but only one appears:
func (stub *LocalStub) terminate(gracefulStop chan os.Signal) {
	var err error
	sig := <-gracefulStop
	fmt.Printf("Caught sig: %+v\n", sig)
	for i := range stub.agencies {
		agencyName := "mas-" + strconv.Itoa(stub.agencies[i].MASID) + "-im-" +
			strconv.Itoa(stub.agencies[i].ImageGroupID) + "-agency-" +
			strconv.Itoa(stub.agencies[i].AgencyID)
		fmt.Println("Stop Agency Container " + agencyName)
		err = stub.deleteAgency(stub.agencies[i].MASID, stub.agencies[i].ImageGroupID,
			stub.agencies[i].AgencyID)
		if err != nil {
			fmt.Println(err)
			// os.Exit(0)
		}
	}
}
Container state:
State
  Dead        false
  Error
  ExitCode    137
  FinishedAt  2023-11-12T14:02:04.413342133Z
  OOMKilled   false
  Paused      false
  Pid         0
  Restarting  false
  Running     false
  StartedAt   2023-11-09T17:09:15.202650758Z
  Status      exited
Smaller things that have been accumulating
The Custom Resource Definition for the deployment of an etcd cluster on k8s is not available anymore. Moreover, the etcd-operator is deprecated.
The function LocalStub.createAgency creates an agency using Go's exec package. When the command for starting the agency's container is built, the name of the configured image is used without being checked first. This could make code injection possible if an image name is chosen that contains a ; followed by any command. That command would then be executed on the kubestub container.
Depending on how it is set, the log level setting may cause the same problem.
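One way to close this hole is to check the image name against an allow-list pattern before it reaches the command line. The sketch below is an assumption about how such a check could look, not the fix used in clonemap: validImageName and imageRefPattern are names introduced here, and the pattern is deliberately stricter than the full Docker image reference grammar.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageRefPattern is a conservative allow-list: the name must start with an
// alphanumeric character and may only contain characters that occur in
// registry hosts, repository paths, tags and digests. It is NOT the complete
// Docker reference grammar.
var imageRefPattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._/:@-]*$`)

// validImageName reports whether name is safe to pass on as an image name.
// Shell metacharacters (; | & $ spaces, quotes) and a leading '-' are rejected.
func validImageName(name string) bool {
	return len(name) <= 255 && imageRefPattern.MatchString(name)
}

func main() {
	for _, name := range []string{
		"eclipse-mosquitto:1.6.13",
		"agentlibtest; rm -rf /",
	} {
		fmt.Printf("%q valid: %v\n", name, validImageName(name))
	}
}
```

Independently of the validation, exec.Command does not invoke a shell by itself, so passing the image name as its own argument (rather than inside a string handed to sh -c) avoids shell interpretation. A similar allow-list could be applied to the log level value.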
Finalize the first version of the WebUI and add documentation for its usage
If an image given in the imagegroups config does not exist, clonemap (in the docker version) only reports exit code 125. Returning an "Image not found" message or similar would be more helpful here.
In some cases, instead of reporting the error code, clonemap falsely reports that the agencies were created, but I haven't been able to reproduce that problem so far.
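The docker run documentation reserves three exit codes for failures outside the containerized program: 125 when the error is with the Docker daemon itself, 126 when the contained command cannot be invoked, and 127 when it cannot be found. A missing image falls under 125. The sketch below shows how such a code could be turned into a readable message; runFailure, classifyDockerRunExit and describeFailure are hypothetical names, and the message wording is only a suggestion.

```go
package main

import "fmt"

// runFailure is a hypothetical classification of `docker run` exit codes.
type runFailure int

const (
	failNone            runFailure = iota // exit code 0
	failDocker                            // 125: docker itself failed
	failNotInvocable                      // 126: contained command cannot be invoked
	failCommandNotFound                   // 127: contained command cannot be found
	failContainer                         // anything else: exit code of the container's process
)

// classifyDockerRunExit maps an exit code of `docker run` to a failure class.
func classifyDockerRunExit(code int) runFailure {
	switch code {
	case 0:
		return failNone
	case 125:
		return failDocker
	case 126:
		return failNotInvocable
	case 127:
		return failCommandNotFound
	default:
		return failContainer
	}
}

// describeFailure builds a human-readable message for a failed agency start.
func describeFailure(code int, image string) string {
	switch classifyDockerRunExit(code) {
	case failNone:
		return "agency container started"
	case failDocker:
		return fmt.Sprintf("docker failed before the container started (exit code 125); "+
			"check that image %q exists and can be pulled", image)
	case failNotInvocable:
		return fmt.Sprintf("command in image %q cannot be invoked (exit code 126)", image)
	case failCommandNotFound:
		return fmt.Sprintf("command in image %q cannot be found (exit code 127)", image)
	default:
		return fmt.Sprintf("agency container for image %q exited with code %d", image, code)
	}
}

func main() {
	fmt.Println(describeFailure(125, "agentlibtest"))
}
```

Exit code 125 covers other daemon-side errors as well, so the message only points at the image as the likely cause rather than asserting it.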
Current behavior: When an agent's task() function ends with a runtime error, the agent stops running without further consequence.
Desired behavior: The runtime error should be returned to the agency and logged properly, and the agency should either come to a graceful stop and restart, or attempt to restart the individual agent.