industryessentials / ymir

YMIR, a streamlined model development product.

License: Apache License 2.0

Shell 1.08% Dockerfile 0.03% Python 54.97% Mako 0.03% HTML 0.79% TypeScript 23.65% JavaScript 12.73% Less 1.69% CSS 0.71% Go 4.31%

ymir's People

Contributors

aryalfrat, elliotmessi, fenrir-z, ijtljz8rm4yr, liule1613, liuzz07, phoenix-xhuang, pubalglib, rzjm, sun-shine6, under-chaos, windsorhwu, yance-dev, yzbx, zhang-sj930104

ymir's Issues

Model Verification:docker images only had sample_image

Describe the issue
In Model Verification, the only Docker image offered is sample_image.

To Reproduce
After training a model (with yolov4), I tried to verify it, but the Docker image list contained only the sample image.

Env

  • OS: ubuntu:18.04
  • Ymir Version [ release-1.0.0]
  • Docker version 20.10.16, build aa7e414
  • Docker Compose version v2.5.1


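Cannot connect to the labeling platform

From a dataset, I click the Label button in the menu bar, enter an email address, and click Label, but the labeling platform never receives the task. The error log is below. The platform worked right after it was set up; labeling stopped working later. I tried both label_studio and label_free and saw the same behavior.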
`INFO : [20220621-01:10:25] invoker_cmd_base.py:83:server_invoke(): request:
{'user_id': '0001', 'repo_id': '000023', 'req_type': 1001, 'task_id': 't00000010000232949781655773825', 'task_parameters': '{"dataset_id": 125, "keywords": ["cat", "person"], "extra_url": null, "labellers": ["[email protected]"], "keep_annotations": true, "validation_dataset_id": null, "network": null, "backbone": null, "hyperparameter": null, "model_id": null, "mining_algorithm": null, "top_k": null, "generate_annotations": null, "docker_image": null, "docker_image_id": null}', 'req_create_task': {'task_type': 3, 'labeling': {'dataset_id': 't0000001000023ec62961655706852', 'labeler_accounts': ['[email protected]'], 'in_class_ids': [4, 3], 'project_name': 'label_$sample_project_None_1655706551.9790225_mining_dataset_3', 'export_annotation': True}}}
async_mode: True
work_dir: /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825
INFO : [20220621-01:10:25] percent_log_util.py:65:write_percent_log(): writing task info to /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/out/monitor.txt
t00000010000232949781655773825 1655773825.686129 0.0 2
INFO : [20220621-01:10:25] invoker_task_base.py:73:create_subtask_workdir_monitor(): task t00000010000232949781655773825 logging weights:
{'/home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/out/monitor.txt': 1.0}

DEBUG : [20220621-01:10:25] connectionpool.py:228:_new_conn(): Starting new HTTP connection (1): 127.0.0.1:9098
DEBUG : [20220621-01:10:25] connectionpool.py:456:make_request(): http://127.0.0.1:9098 "POST /api/v1/tasks HTTP/1.1" 200 20
INFO : [20220621-01:10:25] invoker_task_base.py:159:task_invoke(): processing subtask 0
INFO : [20220621-01:10:25] utils.py:85:wrapper(): |-server_invoke costs 0.01s(0.00m).
INFO : [20220621-01:10:25] server.py:61:data_manage_request(): task t00000010000232949781655773825 result:
INFO : [20220621-01:10:25] invoker_task_labeling.py:25:subtask_invoke_0(): labeling_request: dataset_id: "t0000001000023ec62961655706852"
labeler_accounts: "[email protected]"
in_class_ids: 4
in_class_ids: 3
project_name: "label
$sample_project_None_1655706551.9790225_mining_dataset_3"
export_annotation: true

INFO : [20220621-01:10:25] label_runner.py:56:start_label_task(): start label task!!!
INFO : [20220621-01:10:25] utils.py:24:run_command(): starting cmd:
mir export --root /home/qianyan/ymir/ymir/ymir-workplace/sandbox/0001/000023 --media-location /home/qianyan/ymir/ymir/ymir-workplace/ymir-assets --asset-dir /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/label_t00000010000232949781655773825/Images --annotation-dir /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/label_t00000010000232949781655773825/Images --src-revs t0000001000023ec62961655706852@t0000001000023ec62961655706852 --format ls_json -w /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/export_work_dir --cis cat;person

INFO : [20220621-01:10:26] utils.py:30:run_command(): run cmd succeed:
missing annotations: 0, empty annotations: 0 out of 2 assets
git result:
M annotations.mir
M context.mir
M keywords.mir
M metadatas.mir
M tasks.mir

[exporting-task-1655773826.1539779 43ab7ee] export from t0000001000023ec62961655706852@t0000001000023ec62961655706852
5 files changed, 17 deletions(-)
command done: exporting-task-1655773826.1539779@exporting-task-1655773826.1539779, return code: 0
|-cmd_run costs 0.14s(0.00m).

INFO : [20220621-01:10:26] label_free.py:168:run(): start LABELFREE run()
DEBUG : [20220621-01:10:26] connectionpool.py:228:_new_conn(): Starting new HTTP connection (1): xxxxx:8763
INFO : [20220621-01:12:36] percent_log_util.py:65:write_percent_log(): writing task info to /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/out/monitor.txt
t00000010000232949781655773825 1655773956.380800 1.0 4 130603 HTTPConnectionPool(host='xxxxx', port=8763): Max retries exceeded with url: /api/projects (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 95, in create_connection
raise err
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 85, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 398, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 239, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/usr/lib/python3.8/http/client.py", line 1256, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
self.send(msg)
File "/usr/lib/python3.8/http/client.py", line 951, in send
self.connect()
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 205, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 440, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='xxxxxx', port=8763): Max retries exceeded with url: /api/projects (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/app/ymir_controller/controller/label_model/base.py", line 19, in wrapper
_ret = f(*args, **kwargs)
File "/app/ymir_controller/controller/label_model/label_free.py", line 169, in run
project_id = self.create_label_project(project_name, keywords, collaborators, expert_instruction)
File "/app/ymir_controller/controller/label_model/label_free.py", line 55, in create_label_project
resp = self._requests.post(url_path=url_path, json_data=data)
File "/app/ymir_controller/controller/label_model/request_handler.py", line 23, in post
resp = requests.post(
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 117, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='xxxxx', port=8763): Max retries exceeded with url: /api/projects (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

INFO : [20220621-01:12:36] label_runner.py:78:start_label_task(): finish label task!!!`
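
The traceback above ends in a connection timeout to the label platform host on port 8763. One way to narrow this down is to test plain TCP reachability from the same machine or container that runs the controller. The following is a minimal sketch, not YMIR code; the function name `can_connect` is hypothetical, and the host and port you pass in are whatever your own deployment uses.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `can_connect("<label-host>", 8763)` with the real host in place of the placeholder tells you whether the port is reachable at all. A `False` result points at networking (firewall, wrong host setting, service not listening) rather than at the labeling task itself.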

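Inference with the yolov4 image fails, and other public images report a container error

Hello,

I have run into two problems:

  1. My project trains normally with yolov4. After the model is trained, running inference fails with an error that result.yaml and infer-result.json cannot be found.

  2. When training with public images other than yolov4, such as yolov5 or mmdet, I get "Error: Could not load UVM kernel module. Is nvidia-modprobe installed". However, nvidia-modprobe is already installed correctly.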

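A suggestion about a problem under unusual usage that is not really a bug

Description:
With an existing ymir-workplace (i.e. the project already has data), moving the whole ymir folder to another path and redeploying (or renaming the parent folder) causes all newly created dataset preprocessing tasks and labeling tasks to hang at 0%.

Steps:
1. Change the path or name of ymir's parent folder (keeping the owner and permissions of the files and folders under ymir-workplace unchanged)
2. Redeploy
3. Create a new dataset preprocessing task (such as dataset sampling) or a labeling task

Error shown: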
ERROR : [2022-06-07 03:07:11,538] Job "update_monitor_percent_log (trigger: interval[0:00:20], next run at: 2022-06-07 03:07:31 UTC)" raised an exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/apscheduler/executors/base.py", line 125, in run_job
retval = job.func(*job.args, **job.kwargs)
File "ymir_monitor/monitor/utils/crontab_job.py", line 55, in update_monitor_percent_log
runtime_log_content = PercentLogHandler.parse_percent_log(log_path)
File "/app/common/common_utils/percent_log_util.py", line 31, in parse_percent_log
with open(log_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory:

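Looking into the cause, I found it may be that raw_log_contents under MONITOR_FINISHED_KEY:v1 in the Redis database stores absolute paths.
After I edited the values in MONITOR_FINISHED_KEY:v1 and restarted, everything recovered.

PS: I don't think this counts as a bug, since it is an unusual deployment operation, and I wasn't going to report it. Most people won't rename or move the folder and then restart. But would it be better for Redis to store relative paths? After all, YMIR_PATH is already set in .env.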
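
The relative-path idea can be sketched as follows. This is only an illustration of the suggestion, not how YMIR stores its monitor entries; the helper names and both directory paths are hypothetical, and `posixpath` is used because the deployment described here runs on Linux.

```python
import posixpath


def to_relative(abs_path: str, base_dir: str) -> str:
    """Convert an absolute path under base_dir into a path relative to base_dir."""
    return posixpath.relpath(abs_path, start=base_dir)


def to_absolute(rel_path: str, base_dir: str) -> str:
    """Resolve a stored relative path against the current base_dir."""
    return posixpath.normpath(posixpath.join(base_dir, rel_path))


# Hypothetical locations before and after moving the deployment.
old_base = "/home/user/ymir/ymir-workplace"
new_base = "/data/ymir-moved/ymir-workplace"

# Store only the relative part, then resolve it against whatever base is current.
stored = to_relative(old_base + "/sandbox/work_dir/task_a/out/monitor.txt", old_base)
resolved = to_absolute(stored, new_base)
print(stored)
print(resolved)
```

With this layout the stored value no longer depends on where the parent folder lives, so a move or rename only changes the base directory read from configuration.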

bash ymir.sh start

ERROR: The Compose file './docker-compose.yml' is invalid because:
networks.ymirnetwork.ipam.config value Additional properties are not allowed ('gateway' was unexpected)
Unsupported config option for services.backend: 'runtime'

Running bash ymir.sh start does not succeed

Describe the issue
My system is Windows 10. I installed Docker and WSL 2 and confirmed that Docker uses WSL 2. When I go to the next step, "bash ymir.sh start", it downloads something like yolo-mining-xxxx and then closes by itself. I can't identify the error from its output.

To Reproduce
My installation steps:

  1. Installed Docker and restarted the computer
  2. Installed WSL 2, set Docker to use WSL 2, then restarted
  3. Used vi in WSL to change '$\r' to '\x'; without this step it shows errors like '$\r' not found .....
  4. Ran "bash ymir.sh start", which automatically downloads what it needs
  5. It closed
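
The "'$\r' not found" message in step 3 is typically what bash prints when a script was saved with Windows (CRLF) line endings. As a hedged sketch of one way to normalize such a file, assuming that is the cause here, the helpers below (hypothetical names, not part of YMIR) rewrite a file with Unix (LF) line endings.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with LF line endings."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(crlf_to_lf(data))
```

For example, `convert_file("ymir.sh")` would normalize the start script. Working on bytes avoids any guess about the file's text encoding.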

Expected behavior
YMIR starts.

Desktop (please complete the following information):

  • OS: windows 10
  • Browser: not use

Suggestion: better log monitoring and standard log levels

Describe the issue
It is hard to find the log file and check a problem quickly.

If something goes wrong, we have to find the current log at ymir-workplace/sandbox/work_dir/TaskTypeTraining/{taskid}/sub_task/t0xxxxxxxx/out/ymir-executor-out.log

Also, the log file is very large, and many of the log entries are not very helpful. I suggest adding log level control such as debug/info/error.
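
As an illustration of the kind of level control being requested, here is a minimal sketch using Python's standard `logging` module. It is not YMIR's logging setup; the environment variable name `DEMO_LOG_LEVEL` and the function `make_logger` are assumptions made for this example.

```python
import logging
import os

# Supported level names; anything else falls back to the default.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def make_logger(name: str, default_level: str = "INFO") -> logging.Logger:
    """Create a logger whose threshold comes from the DEMO_LOG_LEVEL variable."""
    level_name = os.environ.get("DEMO_LOG_LEVEL", default_level).upper()
    level = _LEVELS.get(level_name, _LEVELS[default_level])
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s : [%(asctime)s] %(message)s"))
        logger.addHandler(handler)
    return logger
```

With `DEMO_LOG_LEVEL=error` set, calls such as `logger.info(...)` and `logger.debug(...)` are dropped and only errors reach the log, which keeps the file small when detail is not needed.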

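The UI does not navigate properly

At first I installed as the test user and it failed. Later, YMIR's technical staff helped remotely and installed as root, and the software worked normally. Many thanks to the YMIR technical staff for their support.
The images run inside a virtual machine on a laptop. Possibly because the virtual machine was shut down abnormally (the laptop ran out of battery), the main page could later be opened but not used normally. The platform's technical staff said it might be a permissions problem, so I reinstalled as the test user (removed all images, then downloaded and installed again). During installation the only message was "Found orphan containers (ymir-master_label_redis_1, ymir-master_label_api_1, ymir-master_label_minio_1, ymir-master_label_nginx_1, ymir-master_label_mysql_1, ymir-master_label_celery_worker_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up" (I am not sure whether this "Found orphan containers ..." message counts as a problem). The situation is the same as before: the UI opens and the sample project created earlier is still there (I don't understand why it is still there), but clicking the sample project does not open its detail page, and clicking the "Label Management" button also fails to navigate or display properly.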

[Backend]tox error

I want to modify some backend code, but I can't proceed because tox fails:
===================================================== log end =====================================================
ERROR: could not install deps [-rrequirements.txt, -rrequirements-dev.txt]; v = InvocationError('/Users/xxx/Documents/code/ymir/ymir/backend/.tox/python/bin/python -m pip install -rrequirements.txt -rrequirements-dev.txt', 1)
_____________________________________________________ summary _____________________________________________________

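Data import fails in the command-line version

Following the current tutorial, the labels file for the command-line version is placed under mir_demo_repo, but in practice this fails with the error "repo is dirty".
Without the labels file, however, the labels are not recognized.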

GPU not recognized

Describe the issue
GPU not recognized

To Reproduce
1、YMIR:
ymir

2、Nvidia Info

nmsi
nver

Environment (please complete the following information):

  • OS: ubuntu:18.04
  • Ymir Version [ release-1.0.0]
  • Docker version 20.10.16, build aa7e414
  • Docker Compose version v2.5.1
  • Nvidia-docker version:
    NVIDIA Docker: 2.6.0
    Client: Docker Engine - Community
    Version: 20.10.16
    API version: 1.41
    Go version: go1.17.10
    Git commit: aa7e414
    Built: Thu May 12 09:17:23 2022
    OS/Arch: linux/amd64
    Context: default
    Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.16
API version: 1.41 (minimum version 1.12)
Go version: go1.17.10
Git commit: f756502
Built: Thu May 12 09:15:28 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.4
GitCommit: 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
runc:
Version: 1.1.1
GitCommit: v1.1.1-0-g52de29d
docker-init:
Version: 0.19.0
GitCommit: de40ad0

Why is my NVIDIA driver broken?

Describe the issue
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'bash ymir.sh start'
  2. See error

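Expected behavior
I ran bash ymir.sh start, opened the web page, and successfully uploaded my dataset zip file. When I clicked Label to enter the labeling page, the page could not be displayed. After rebooting, I found a problem with the GPU driver: my display was abnormal, and nvidia-smi reported that no driver was present. Why did this break my GPU driver? When I run bash ymir.sh start again, I can open the web page, but after a while the terminal reports an error. The web page stays up.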

Desktop (please complete the following information):

  • OS: [Ubuntu20.04 LTS]
  • Browser [chrome version 101.0.4951.64]
  • docker-version:20.10

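Training a model from the GUI fails with "permission denied nvidia-docker"

As the title says: after deploying by following the GUI deployment steps, the web UI is accessible and datasets import successfully.
But when I start training a model, it reports a failure, and the log shows the error "permission denied nvidia-docker". What is the cause?
The error is shown in the screenshot.

In addition:

  1. Newer Docker versions no longer need the nvidia-docker command; they use docker run --gpus all. Does this project not support that mode yet?
  2. The program seems to run as root by default, so newly created folders, logs, and so on are all owned by root. Can it be configured to run as the current user?
  3. How do I import my own Docker image into the project? I could not find this in the help documentation.

Thanks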

Opening the public images in the image list reports that sharing images failed

ymir_app.log shows:
Traceback (most recent call last):
File "/app/ymir_app/app/api/api_v1/endpoints/images.py", line 142, in get_shared_images
shared_images = await get_shared_images_from_github(
File "/usr/local/lib/python3.8/dist-packages/fastapi_cache/decorator.py", line 49, in inner
ret = await func(*args, **kwargs)
File "/app/ymir_app/app/api/api_v1/endpoints/images.py", line 154, in get_shared_images_from_github
shared_images = get_github_table(url, timeout=timeout)
File "/app/ymir_app/app/utils/github.py", line 37, in get_github_table
tbl = get_markdown_table(url, timeout)
File "/app/ymir_app/app/utils/github.py", line 13, in get_markdown_table
resp = requests.get(url, headers=HEADERS, timeout=timeout)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /IndustryEssentials/ymir/master/docker_executor/public_index.md (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f651cab3400>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2022-06-30 00:17:20 +0000] [52] [INFO] 172.168.254.8:59090 - "GET /api/v1/images/shared HTTP/1.0" 200

Which address is it failing to connect to, and how should I resolve this?
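
The address in question can be read directly from the last exception line in the log. As a small sketch (the helper `failed_url` and its regular expression are written for this example and are not part of requests or YMIR), the following rebuilds the URL from a "Max retries exceeded" message:

```python
import re
from typing import Optional

# Matches the "HTTP(S)ConnectionPool(host=..., port=...): Max retries exceeded with url: ..."
# text that appears in requests/urllib3 connection errors like the one in the log.
_POOL_ERROR = re.compile(
    r"(?P<scheme>HTTPS?)ConnectionPool\(host='(?P<host>[^']+)', port=(?P<port>\d+)\): "
    r"Max retries exceeded with url: (?P<path>\S+)"
)


def failed_url(message: str) -> Optional[str]:
    """Rebuild the URL that could not be reached, or return None if no match."""
    m = _POOL_ERROR.search(message)
    if m is None:
        return None
    scheme = m.group("scheme").lower()
    return f"{scheme}://{m.group('host')}:{m.group('port')}{m.group('path')}"
```

Applied to the error above, this yields the public_index.md file on raw.githubusercontent.com, port 443, so the host running the backend needs outbound HTTPS access to that site.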

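Training failed partway through with an unknown error

Describe the bug
After training for a while (about one day), it failed with an unknown error.
The training set has 2,500 images and the test set has 200.
The GPU is an RTX 3050 8G. I changed parameters for training: batch from 64 to 16, and image size from 608 x 608 to 416 x 416.

The training error log ymir-executor-out.log is as follows: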
2022-07-07 04:35:28,817 - /darknet/train_watcher.py[line:43] - ERROR: error occured in handler: <function _DarknetTrainingHandler._on_best_weights_modified at 0x7ff41bca5830> and path: /out/models/yolov4_best.weights
Traceback (most recent call last):
File "/darknet/train_watcher.py", line 41, in on_modified
handler(self, src_path)
File "/darknet/train_watcher.py", line 52, in _on_best_weights_modified
export_dir='/out/models')
File "/darknet/convert_model_darknet2mxnet_yolov4.py", line 437, in run
net.load_weights(load_param_name)
File "/darknet/convert_model_darknet2mxnet_yolov4.py", line 409, in load_weights
ptr = set_data(module, ptr)
File "/darknet/convert_model_darknet2mxnet_yolov4.py", line 391, in set_data
conv_weights = weights[ptr:ptr +num_weights]
File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 511, in getitem
return self._get_nd_basic_indexing(key)
File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 792, in _get_nd_basic_indexing
return self._slice(key.start, key.stop)
File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 907, in _slice
start, stop, _ = _get_index_range(start, stop, self.shape[0])
File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 2343, in _get_index_range
raise IndexError('Slicing stop %d exceeds limit of %d' % (stop, length))
IndexError: Slicing stop 40305504 exceeds limit of 39606267

install failed

To Reproduce
Steps to reproduce the behavior:

  1. git clone the repo
  2. check out the tag release-1.0.0
  3. run the shell script with "bash ymir.sh start"
  4. an error message appears (shown in the attached screenshot)

Environment (please complete the following information):

  • Server OS: ubuntu:20.04
  • Ymir Version [ release-1.0.0]

label-free doesn't work

Describe the issue
label-free doesn't work when I click Label.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'Label'
  2. Upload my dataset (data.zip just contains 18 pictures)
  3. See error

Screenshots

I want to label a new image dataset. The zip file I uploaded contains just 18 pictures, but it has been loading for nearly a day with no result. Yet I imported a dataset with 1k pictures in 10 seconds.

Screenshot from 2022-05-14 15-18-07

When I enter the label-free site, it also shows that data is being read.

Screenshot from 2022-05-14 14-51-03

When I click the Label or Import button, it reports "The current project is reading data. Please refresh the project list and try again".
Screenshot from 2022-05-14 14-57-48

Desktop (please complete the following information):

  • OS: [Ubuntu 20.04 LTS]
  • Browser [chrome]
  • docker-version 20.10

Additional context
I also got the label-free repo via git clone https://github.com/IndustryEssentials/label-free.git, and the problem is the same. Is there something wrong with the label-free platform, or am I doing something wrong?

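GPU load question

env:
OS: ubuntu 20.04
GPU: nvidia 2080ti (reference card)

Description:
During training, the GPU usage shows that only half of the GPU's resources are in use.

Request:
I want to train with the GPU at full load. How do I configure this?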

Cannot log in to local YMIR server

Hello. First, thanks for the wonderful YMIR project. I just installed the YMIR GUI on my computer, and the app is running. But I cannot log in after entering the user name and password. Is there a way to debug the problem, for example a log file or app output?

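Please provide a ymir developer guide

I hope you can provide technical architecture documentation and an outline of ymir's design.
Ideally there would also be documentation on how to set up a locally debuggable ymir environment on Ubuntu,
to make it easier for code enthusiasts to understand ymir.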

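Questions about custom images

1. We started training successfully but could not find the in folder inside the container. We want to use the configuration in the in folder to rewrite our own image files.
2. We want to build a custom YOLOX image, but when training is started from the YMIR page it defaults to YOLOV4 and darknet. Could you tell us where the frontend and backend code for this part can be changed?
Thanks!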

Is cluster-mode deployment supported?

I've deployed and tested YMIR and it's very cool, but I have multiple GPU nodes. Is cluster-mode deployment supported?
If not, is there a plan to support it?

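Copying a public image to the image list spins forever, and the log file contains errors

When I copy a public image to the image list, the spinner never finishes, and the log file contains error messages.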

Desktop (please complete the following information):

  • OS: ubuntu22.04
  • Docker version 20.10.17, build 100c701
  • Docker Compose version v2.5.1
  • Version ymir-release-1.0.0

ymir_controller.log:
INFO : [20220630-01:40:36] invoker_cmd_base.py:83:server_invoke(): request:
{'user_id': '0001', 'repo_id': '000000', 'req_type': 16, 'task_id': 't0000001000000ae49a01656553236', 'singleton_op': 'industryessentials/executor-det-yolov4-training:release-1.1.0'}
async_mode: True
work_dir:
INFO : [20220630-01:40:36] utils.py:24:run_command(): starting cmd:
docker image inspect industryessentials/executor-det-yolov4-training:release-1.1.0 --format ignore_me

ERROR : [20220630-01:40:36] utils.py:27:run_command(): run cmd error:
stderr: docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.32' not found (required by docker) docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.34' not found (required by docker)

stdout:
INFO : [20220630-01:40:36] utils.py:24:run_command(): starting cmd:
docker pull industryessentials/executor-det-yolov4-training:release-1.1.0

ERROR : [20220630-01:40:36] utils.py:27:run_command(): run cmd error:
stderr: docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.32' not found (required by docker) docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.34' not found (required by docker)

stdout:
INFO : [20220630-01:40:36] utils.py:85:wrapper(): |-server_invoke costs 0.02s(0.00m).
INFO : [20220630-01:40:36] server.py:61:data_manage_request(): task t0000001000000ae49a01656553236 result: code: 130401
message: "docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.32\' not found (required by docker)\ndocker: /usr/lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.34' not found (required by docker)\n"

RC_CMD_ERROR_UNKNOWN: unkown error

My ymir service runs in a virtual machine. When I train a model, it reports an error:

model_group_23 > Train
Model Detail
Model Name model_group_23 V2 mAP 0.00%
Task State
Current State Invalid
Failure
Error Reason RC_CMD_ERROR_UNKNOWN: unkown error

My env:
1. VM
2. CentOS 7
3. Docker version 20.10.16
4. docker-compose version 1.27.2
5. YMIR config: SERVER_RUNTIME=runc

Running YMIR in WSL2 fails

Hi
I fail to start YMIR on my Windows PC using WSL2. I think it is because Docker Compose is v2 but your code uses v1 (or something similar). See the execution trace below. I have checked that the Nvidia GPU drivers etc. are correctly installed.
/Tomas

Using default tag: latest
latest: Pulling from industryessentials/executor-det-yolov4-training
Digest: sha256:ab7fd377e7945ad668547921ee8b0ddd2a24a55c655614cc836ff5e5d93e1855
Status: Image is up to date for industryessentials/executor-det-yolov4-training:latest
docker.io/industryessentials/executor-det-yolov4-training:latest
Using default tag: latest
latest: Pulling from industryessentials/executor-det-yolov4-mining
Digest: sha256:17f6e5bf7192780acf897a8e24b6c10b280287daa4953e67e27d17bb483f6477
Status: Image is up to date for industryessentials/executor-det-yolov4-mining:latest
docker.io/industryessentials/executor-det-yolov4-mining:latest
[+] Running 6/6
⠿ Container ymir-clickhouse-1 Removed 0.1s
⠿ Container ymir-db-1 Removed 0.1s
⠿ Container ymir-viz-redis-1 Removed 0.1s
⠿ Container ymir-tensorboard-1 Removed 0.1s
⠿ Container ymir-redis-1 Removed 0.1s
⠿ Network ymir_ymirnetwork Removed 0.2s

in prod mode, pulling images.
[+] Running 7/7
⠿ backend Pulled 1.6s
⠿ viz-redis Pulled 1.6s
⠿ web Pulled 1.6s
⠿ clickhouse Pulled 1.6s
⠿ tensorboard Pulled 1.6s
⠿ redis Pulled 1.6s
⠿ db Pulled 1.6s
[+] Running 6/6
⠿ Network ymir_ymirnetwork Created 0.0s
⠿ Container ymir-tensorboard-1 Created 0.1s
⠿ Container ymir-redis-1 Created 0.1s
⠿ Container ymir-viz-redis-1 Created 0.1s
⠿ Container ymir-clickhouse-1 Created 0.1s
⠿ Container ymir-db-1 Created 0.1s
⠋ Container ymir-backend-1 Creating 0.0s
Error response from daemon: Unknown runtime specified nvidia
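
The final line says the Docker daemon does not know a runtime called nvidia. The runtimes a daemon has registered can be listed with `docker info --format '{{json .Runtimes}}'`. As a hedged sketch, the helper below (a hypothetical function, not part of YMIR) checks that JSON output for a given runtime name; the two sample strings are made up for illustration and are not taken from the report above.

```python
import json


def has_runtime(runtimes_json: str, name: str = "nvidia") -> bool:
    """Return True if `name` is a key of the JSON object printed by
    `docker info --format '{{json .Runtimes}}'`."""
    runtimes = json.loads(runtimes_json)
    return isinstance(runtimes, dict) and name in runtimes


# Made-up sample outputs, for illustration only.
without_nvidia = '{"runc": {"path": "runc"}}'
with_nvidia = '{"nvidia": {"path": "nvidia-container-runtime"}, "runc": {"path": "runc"}}'

print(has_runtime(without_nvidia))
print(has_runtime(with_nvidia))
```

If the real output lacks an nvidia entry, the daemon that Compose talks to has no NVIDIA runtime registered, independent of whether the GPU driver itself is installed.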

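LabelFree labeling tool fails when downloading the data package

Hello, I'd like to ask for a solution to the download timeout shown for tasks with more than 1,000 images. I have already tried re-pulling the label free image, but that did not solve it.
1) docker-compose -f docker-compose.labelfree.yml pull
2) bash ymir.sh start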

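Cannot jump from the UI to the label studio labeling system

labelstudio is correctly configured and started, and the docker ps output looks normal.
In the ymir web UI, I click "Jump to labeling platform" under the dataset.
labelstudio opens, but the configured token is not valid and I have to log in.
After logging in, I click "Jump to labeling platform" again, but labelstudio is empty and cannot load the data from the ymir platform.
The two systems appear to be independent.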

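Changing MYSQL_PASSWORD prevents login after the initial startup

Describe the bug
With no ymir-workplace present (i.e. on the first, initializing startup), changing MYSQL_PASSWORD in .env makes it impossible to log in to the admin account after startup. Checking the MySQL database shows no tables at all. (I tried many times; initializing without changing MYSQL_PASSWORD does not have this problem.)

To Reproduce
Steps to reproduce the behavior:

  1. Delete ymir-workplace (if it exists)
  2. Change MYSQL_PASSWORD in .env
  3. Start with bash ymir.sh start
  4. Log in with [email protected]

By the way: I found this because I had been using an older version of ymir that contained old data. After updating the code with git pull, startup looked normal, but going from labeling data to registering an account on the labeling platform (label_studio) returned a 307. I had no choice but to delete ymir-workplace; after deleting it and re-initializing, the redirect worked. I don't know whether this was a one-off or also a bug. (If it is a bug, will it happen again with future version upgrades? If it can't be reproduced and was just a one-off, it may be my own problem.)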

ARM Rebuild

1、cd ymir/backend/src/ymir_app/deploy/redis
2、docker build -t industryessentials/ymir-backend-redis -f Dockerfile .

After rebuilding, I ran bash ymir.sh start.
Log:
standard_init_linux.go:219: exec user process caused: exec format error

It can't pull industryessentials/executor-det-yolov4-training from aliyun mirror

docker pull industryessentials/executor-det-yolov4-training

You can see that layer 11323ed2c653 is shown as Downloading, but there seems to be nothing to download.

8412b44cad21: Pulling fs layer

Using default tag: latest
latest: Pulling from industryessentials/executor-det-yolov4-training
11323ed2c653: Downloading
aac8cf1d1c79: Download complete
fc85802d11de: Download complete
5ecaf0dceab7: Download complete
52ea4560e85a: Download complete
de9ca64a95e6: Download complete
e4165bf6e171: Download complete
77cc6fb46c25: Downloading [================================>                  ]  519.5MB/804.9MB
f54c7e16b308: Download complete
4e21240e72c2: Downloading [=================================>                 ]  359.9MB/532.4MB

It then pulls again from Docker Hub, which is very slow in China. Please provide another Docker registry in China.

upload file error

To Reproduce
Steps to reproduce the behavior:

  1. prepare the source images
  2. include annotations
  3. zip them into one file
  4. upload

Expected behavior
There should be seventy files, but the result is empty.

ymir-workplace has no directory named "ymir-sharing" (version Ymir-release-1.0.0)

Thanks for kindly releasing this fun project! I tried ymir on my PC and launched it successfully, but when importing voc2012 I couldn't find "ymir-sharing", so I created it and put voc2012 under that directory. Whatever I do, I can't import the voc2012 dataset in ymir-gui. Please help me, thanks!
