industryessentials / ymir

YMIR, a streamlined model development product.

License: Apache License 2.0

Shell 1.08% Dockerfile 0.03% Python 54.97% Mako 0.03% HTML 0.79% TypeScript 23.65% JavaScript 12.73% Less 1.69% CSS 0.71% Go 4.31%

ymir's Introduction

📘Usage Instruction | 🛠️Installation | 🚀Projects | 🤔Issues Report | 📰License

📫 Feedback on usage issues: [email protected] / Professional consulting for server equipment: [email protected]

English | 简体中文

Citations

If you wish to refer to YMIR in your work, please use the following BibTeX entry.

@inproceedings{huang2021ymir,
      title={YMIR: A Rapid Data-centric Development Platform for Vision Applications},
      author={Phoenix X. Huang and Wenze Hu and William Brendel and Manmohan Chandraker and Li-Jia Li and Xiaoyu Wang},
      booktitle={Proceedings of the Data-Centric AI Workshop at NeurIPS},
      year={2021},
}

What's new

Version 2.0.0 updated on 11/08/2022

YMIR platform

A new model performance diagnosis module.
A new function for visual evaluation of model inference results.
Adding a public algorithm library with a variety of built-in high-precision algorithms.
One-click deployment function, supporting the deployment of algorithms to prerequisite certified devices.
New operating instructions.
Refactored code structure.

Docker

Support yolov5
Support mmdetection
Support yolov7
Support detectron2
Support nanodet
Support vidt: An Extendable, Efficient and Effective Transformer-based Object Detector
Support ymir image testing tool library
Support demo sample image creation documentation
Support ymir image development extension library

For more, see ymir-executor-fork.

Public Docker images

Update yolov5 training image: youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi
Update mmdetection training image: youdaoyzbx/ymir-executor:ymir2.0.0-mmdet-cu111-tmi
Update yolov5 image with rv1126 chip deployment support: youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmid

For more code updates, see ymir-dev.

Deployment Prerequisite (optional)

YMIR supports deploying trained models and public algorithm models directly to certified devices. For more hardware specs, please check the details.

Introduction

Catalog

Citations
What's New
Deployment Prerequisite (optional)
1. Introduction to AI SUITE-YMIR
- 1.1. Main functions
- 1.2. Apply for trial
2. Installation
- 2.1. Environment dependencies
- 2.2. Installation of YMIR-GUI
3. Use YMIR-GUI: typical model production process
4. For advanced users: YMIR-CMD (command line) user's guide
- 4.1 Installation
- 4.2 Typical model production process
5. Get the code
- 5.1. Code contribution
- 5.2. About training, inference and mining docker images
6. Design concept
7. MISC
- 7.1. FAQ

1. Introduction to AI SUITE-YMIR

As a streamlined model development product, YMIR (You Mine In Recursion) focuses on dataset versioning and model iteration in the AI SUITE open-source series.

AI commercialization is reaching maturity in terms of computing hardware, algorithms, and so on. However, the adoption of AI often encounters challenges such as a lack of skilled developers, high development costs, and long iteration cycles.

As a platform, YMIR provides an end-to-end AI development system. This platform reduces costs for companies using artificial intelligence and accelerates the adoption of artificial intelligence. YMIR provides ML developers with one-stop services for data processing, model training, and other steps required in the AI development cycle.

The YMIR platform provides effective model development capabilities with a data-centric approach. The platform integrates concepts such as active learning, data and model version control, and workspaces. YMIR enables rapid, parallel iteration of datasets and projects for multiple specific tasks. The platform uses an open API design, so third-party tools can also be integrated into the system.

1.1. Main functions

A typical model development process can usually be summarized in a few steps: defining the task, preparing the data, training the model, evaluating the model, and deploying the model.

Define the target: Before starting an AI development project, it is important to be clear about what is to be analyzed. This will help developers correctly convert the problem into several typical ML modeling tasks, such as image classification, object detection, etc. Different tasks have different data requirements.
Prepare data: Data preparation is the foundation of a successful AI project. The most important task in this step is to ensure the quality of the data and its annotations. Collecting all the required data at the beginning is the ideal situation for many projects. In practice, however, the project developer may find that some data is missing in subsequent stages, so additional data may be needed as the project evolves.
Train model: This operation is commonly referred to as "modeling". This step refers to the exploration and analysis of prepared data to discover the internal patterns and any links between the input and the expected prediction target. The result of this step is usually one or more machine learning models. These models can be applied to new data to obtain predictions. Developers train their own models using mainstream model training frameworks, such as PyTorch, TensorFlow, Darknet, etc.
Evaluate model: The development process is not complete once the model has been trained. Models need to be evaluated and checked before being put into production. Getting a production-quality model in a single pass is rarely easy; you need to adjust parameters and iterate on the model continuously. Some common metrics can help you evaluate models quantitatively and pick a satisfactory one.
Deploy model: Models are developed and trained based on previously available data (possibly test data). After a satisfactory model is obtained, it will be applied to real data to make predictions at scale.

The YMIR platform mainly serves users who need to produce models at scale. It provides an easy-to-use interface that facilitates managing and viewing data and models. The platform contains functional modules such as project management, tag management, model deployment, system configuration, and Docker image management. It supports the following main functions.

Function Module	Primary Function	Secondary Function	Function Description
Project Management	Project Management	Project Editing	Supports adding, deleting, and editing projects and project information
Project Management	Iteration Management	Iteration Preparation	Supports setting up the dataset and model information needed for iteration
Project Management	Iteration Management	Iteration Steps	Supports populating the task parameters of the next step with data from the previous round
Project Management	Iteration Management	Iteration Charts	Supports displaying the datasets and models generated during iteration as a graphical comparison
Project Management	Dataset Management	Import Datasets	Supports importing prepared datasets by copying public datasets, from URLs, from paths, or from local files
Project Management	Dataset Management	View Datasets	Supports visualization of image data and annotations, and viewing of historical information
Project Management	Dataset Management	Edit Datasets	Supports editing and deleting datasets
Project Management	Dataset Management	Dataset Versions	Supports creating new dataset versions from the source dataset, with the version number incremented over time
Project Management	Dataset Management	Data Preprocessing	Supports image data fusion, filtering, and sampling operations
Project Management	Dataset Management	Data Mining	Supports finding the most beneficial data for model optimization in a large number of datasets
Project Management	Dataset Management	Data Annotation	Supports adding annotations to image data
Project Management	Dataset Management	Data Inference	Supports adding annotations to a dataset by specifying a model
Project Management	Model Management	Model Import	Supports local import of model files to the platform
Project Management	Model Management	Training Models	Supports selecting datasets and labels and adjusting training parameters to train models as required, and viewing the corresponding model results after completion
Project Management	Model Management	Model Validation	Supports uploading a single image to visually check how the model performs on real images and verify its accuracy
Tag Management	Tag Management	Add Tags	Supports adding primary names and aliases of training tags
Model Deployment	Algorithm Management	Public Algorithm	Supports algorithm customization, viewing and trying out public algorithms, and adding them to my algorithms
Model Deployment	Algorithm Management	Public Algorithm	Supports publishing my algorithms to public algorithms
Model Deployment	Algorithm Management	My Algorithms	Supports viewing and editing my published algorithms and added algorithms
Model Deployment	Algorithm Management	Deploy Algorithms	Supports deploying my algorithms to devices and viewing deployment history
Model Deployment	Device Management	View Devices	Supports viewing device information and deployment history
Model Deployment	Device Management	Edit Device	Supports adding, deploying, and deleting devices
Model Deployment	Device Management	Support Devices	Supports viewing and purchasing supported devices
System Configuration	Image Management	My Images	Supports adding custom Docker images to the system (available to administrators only)
System Configuration	Image Management	Public Images	Supports viewing public Docker images uploaded by others and copying them to your own system
System Configuration	Permissions Configuration	Permissions Management	Supports configuring user permissions (available to administrators only)

1.2. Apply for trial

We provide an online trial version for your convenience. If you need it, please fill out the Apply for YMIR Trial form, and we will send the trial information to your email address.

2. Installation

How should users choose between the GUI and CMD?

The GUI version, which supports model training and model iteration, is more suitable for ordinary users.
If you need to modify the default configuration of the system, installing CMD is recommended.
If you have already deployed an existing version of YMIR, please refer to the Upgrade Instructions.

This chapter contains the installation instructions for YMIR-GUI. If you need to use CMD, please refer to the Ymir-CMD user guide.

2.1. Environment dependencies

1. NVIDIA drivers must be properly installed before installing YMIR. For detailed instructions, see https://www.nvidia.cn/geforce/drivers/.

2. Docker and Docker Compose installation:

docker compose >= 1.29.2, docker >= 20.10
Installation of Docker and Docker Compose: https://docs.docker.com/get-docker/
Installation of nvidia-docker: see the nvidia-docker install-guide

## Check the maximum CUDA version supported by the host
nvidia-smi
## If the host supports CUDA 11+, check nvidia-docker with:
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
## If the host supports CUDA 10+, check nvidia-docker with:
sudo docker run --rm --gpus all nvidia/cuda:10.2-base-ubuntu18.04 nvidia-smi
## These commands should produce console output similar to the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   62C    P0    55W /  75W |   4351MiB /  7680MiB |     94%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      8132      C                                    4349MiB |
+-----------------------------------------------------------------------------+

Hardware Suggestions

NVIDIA GeForce RTX 2080 Ti or higher is recommended.
The host should support CUDA 11.2 or later (the maximum CUDA version supported by the host >= 11.2).
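
The version requirements above can be checked mechanically. The following is a minimal sketch, not part of YMIR: the helper names (`parse_version`, `meets_minimum`, `cuda_version_from_smi`) are hypothetical, and the minimums are the ones stated in this section. It compares version strings as reported by `docker --version`, `docker compose version`, and the `nvidia-smi` header.

```python
import re

# Minimum versions taken from the requirements listed in this section.
MINIMUMS = {"docker": "20.10", "docker compose": "1.29.2", "cuda": "11.2"}


def parse_version(text):
    """Return the first dotted version number found in `text` as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare two version strings numerically, component by component."""
    return parse_version(found) >= parse_version(minimum)


def cuda_version_from_smi(smi_output):
    """Extract the 'CUDA Version' field from the nvidia-smi header, if present."""
    match = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
    return match.group(1) if match else None


# Header line copied from the sample nvidia-smi output above.
header = "| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |"
print(meets_minimum(cuda_version_from_smi(header), MINIMUMS["cuda"]))  # True
```

In practice the input strings would come from running the corresponding commands on the host; the sketch only covers the comparison step.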

2.2. Installation of YMIR-GUI

The user must ensure that all the conditions in Environment dependencies have been met; otherwise the installation may fail.

The YMIR-GUI project package is on DockerHub and the steps to install and deploy YMIR are as follows:

Clone the deployment project YMIR to the local server:

git clone https://github.com/IndustryEssentials/ymir.git

If no GPU is available and you need to run in CPU mode, switch to CPU boot mode by setting the SERVER_RUNTIME parameter to runc in the .env file:

# nvidia for gpu, runc for cpu.

SERVER_RUNTIME=runc
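
If you prefer to script this change rather than edit the file by hand, something along the following lines may work. This is only a sketch: `set_env_value` is a hypothetical helper, and the sample text assumes the file currently contains `SERVER_RUNTIME=nvidia`, which is inferred from the comment above rather than confirmed.

```python
def set_env_value(env_text, key, value):
    """Return `env_text` with `key` set to `value`, appending the key if it is absent."""
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Leave comments and lines without an assignment untouched.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Assumed current content of the relevant part of .env.
sample = "# nvidia for gpu, runc for cpu.\nSERVER_RUNTIME=nvidia\n"
print(set_env_value(sample, "SERVER_RUNTIME", "runc"))
```

To apply it, read the .env file in the YMIR directory, pass its text through the function, and write the result back.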

If you do not need the LabelFree labeling platform, you can run the start command directly with the default configuration: bash ymir.sh start. It is recommended not to use sudo, as this may cause permission problems.

When the service starts, it asks whether you want to send usage reports to the YMIR development team. The default is yes if you enter nothing.
The default port for YMIR's Model Deployment module is 18801. If this conflicts with another service, go to the YMIR directory and edit the .env file to configure the Model Deployment port and MySQL access password:

DEPLOY_MODULE_HOST_PORT=18801
DEPLOY_MODULE_URL=${DEPLOY_MODULE_HOST_PORT}
DEPLOY_MODULE_MYSQL_ROOT_PASSWORD=deploy_db_passwd
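
Before choosing a port, it can help to check whether it is already taken on the host. The sketch below uses only the Python standard library; `is_port_free` is a hypothetical helper, and the check is a point-in-time test, so another process could still claim the port afterwards.

```python
import socket


def is_port_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True


# 18801 is the default Model Deployment port mentioned above.
print(is_port_free(18801))
```

If the function returns False for 18801, pick another value for DEPLOY_MODULE_HOST_PORT and check that one instead.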

Run the start command after the modification: bash ymir.sh start.

After the service has started successfully, YMIR will be available at http://localhost:12001/. To stop the service, run: bash ymir.sh stop
The default initial user is the super administrator. You can check the account and password in the .env file under the project path and modify them before deployment. It is recommended to change the password through the user management interface after the deployment is complete.

To send email notifications, first set the outgoing mailbox information in the .env file:

# Email Notification
EMAILS_ENABLED=True
FRONTEND_ENTRYPOINT=<YMIR FRONTEND URL>
SMTP_TLS=
SMTP_PORT=
SMTP_HOST=
SMTP_USER=
SMTP_PASSWORD=
EMAILS_FROM_EMAIL= <SENDER EMAIL ADDRESS>
EMAILS_FROM_NAME=ymir-project
EMAIL_RESET_TOKEN_EXPIRE_HOURS=1
EMAIL_TEMPLATES_DIR=/app/email-templates/build

3. Use YMIR-GUI: typical model production process

As shown in the figure, YMIR divides the model development process into multiple steps. Details about how to run each step are listed in the subsequent sections.

Data and labels are necessary for training deep learning models, and training requires a large amount of labeled data. In reality, however, most of the available data is unlabeled, and labeling all of it manually is too costly in terms of labor and time.

Therefore, the YMIR platform uses active learning: it first obtains an initial model, either by local import or by training on a small amount of labeled data, and then uses this initial model to mine, from a large amount of data, the data most beneficial for improving the model. After mining is complete, only this part of the data is labeled, and the original training dataset is expanded efficiently.

The updated dataset is used to train the model again to improve its capability. Compared with labeling all of the data and then training, this approach is more efficient and reduces the cost of labeling low-quality data. Through the cycle of mining, labeling, and training, high-quality data is accumulated and the model capability is improved.
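
The mining, labeling, and training cycle can be illustrated with a toy example. This is a conceptual sketch only and does not use any YMIR API: the "model" is a single threshold on numbers in [0, 1], the `oracle` function stands in for a human annotator, and uncertainty sampling is assumed as the mining strategy.

```python
def train(labeled):
    """Toy 'model': a threshold halfway between the two classes."""
    negatives = [x for x, y in labeled if not y]
    positives = [x for x, y in labeled if y]
    return (max(negatives) + min(positives)) / 2


def uncertainty(model, x):
    """Higher score means the sample is closer to the decision boundary."""
    return -abs(x - model)


def oracle(x):
    """Stands in for a human annotator: returns (sample, label)."""
    return (x, x >= 0.5)


def iterate(labeled, unlabeled, top_k, rounds):
    """Run the mine -> label -> retrain cycle for a number of rounds."""
    model = train(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Mining: pick the samples the current model is least sure about.
        ranked = sorted(unlabeled, key=lambda x: uncertainty(model, x), reverse=True)
        mined = ranked[:top_k]
        unlabeled = [x for x in unlabeled if x not in mined]
        # Labeling: only the mined samples are annotated.
        labeled = labeled + [oracle(x) for x in mined]
        # Training: retrain on the expanded dataset.
        model = train(labeled)
    return model, labeled, unlabeled


model, labeled, remaining = iterate(
    [(0.0, False), (1.0, True)], [0.1, 0.45, 0.6, 0.9], top_k=2, rounds=1
)
print(remaining)  # [0.1, 0.9]
```

Only the two samples nearest the boundary are labeled in this round; the easy samples 0.1 and 0.9 stay unlabeled, which is the cost saving the text describes.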

This section uses a complete model iteration process as an example to illustrate how to use the YMIR platform. Please check Operating Instructions.

4. For advanced users: YMIR-CMD (command line) user's guide

This chapter contains the instructions for the YMIR-CMD. If you need to use the GUI, please refer to Ymir-GUI Installation.

4.1 Installation

Mode I. Pip Installation

# Requires Python >= 3.8.10
$ pip install ymir-cmd
$ mir --version

Mode II. Installation from the source

$ git clone --recursive https://github.com/IndustryEssentials/ymir.git
$ cd ymir/ymir/command
$ python setup.py clean --all install
$ mir --version

4.2 Typical model production process

The above figure shows a typical process of model training: 1) the user prepares external data, 2) imports it into the system, 3) appropriately filters the data, and 4) begins training to obtain a model (possibly with low accuracy). Based on this model, the user then 5) selects images suitable for further training from a dataset to be mined, 6) annotates these images, 7) merges the annotated results with the original training set, and 8) uses the merged results to run the training process again to obtain a better model. This section implements the process shown above using the command line. For details, please check the CMD usage instructions.

5. Get the code

5.1. Code contribution

Any code in the YMIR repo should follow the coding standards and will be checked in the CI tests.

Functional code needs to be unit tested.
Use flake8 or black to check and format the code before committing. Both follow PEP 8 and the Google Python Style Guide.
Python code must pass static type checking with mypy.

Also check out MSFT Encoding Style for more advice.

5.2. About training, inference, mining docker images and model package structure

Check this document for training, inference and mining details.

Check this document for model package structure details.

6. Design concept

We use Git's concept of code version control to manage our data and models, and use branches to create new projects so that different tasks on the same set of images can run in parallel. Additions, retrievals, updates, and deletions of datasets, as well as basic operations, are recorded as commits to branches. Logically, each commit stores an updated version of the dataset or a new model, as well as the metadata of the operation that led to this change. Finally, only the data changes are merged into the main branch. This branch conceptually aggregates all the data annotated by many projects on the platform. Please see Life of a dataset for specific design concepts.
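
The branch-and-commit idea can be sketched in a few lines. This is an illustration of the concept only, not YMIR's actual storage format or API: the `Commit` and `Repo` classes are hypothetical, datasets are reduced to sets of asset names, and merging is simplified to a union, so deletions made on a branch are not propagated.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Commit:
    assets: FrozenSet[str]            # snapshot of the dataset (asset names)
    model: Optional[str]              # model produced by the operation, if any
    operation: str                    # metadata: what led to this change
    parent: Optional["Commit"] = None


class Repo:
    def __init__(self):
        self.branches: Dict[str, Commit] = {"main": Commit(frozenset(), None, "init")}

    def branch(self, name, source="main"):
        """Start a new project from the current head of `source`."""
        self.branches[name] = self.branches[source]

    def commit(self, branch, operation, add=(), remove=(), model=None):
        """Record a dataset change or a new model as a commit on `branch`."""
        head = self.branches[branch]
        assets = (head.assets | frozenset(add)) - frozenset(remove)
        self.branches[branch] = Commit(assets, model, operation, head)
        return self.branches[branch]

    def merge_data(self, source, target="main"):
        """Merge only the data, not the models, from `source` into `target`."""
        merged = self.branches[target].assets | self.branches[source].assets
        return self.commit(target, f"merge data from {source}", add=merged)


repo = Repo()
repo.commit("main", "import", add=["a.jpg", "b.jpg"])
repo.branch("cats")   # two projects working on the same images in parallel
repo.branch("dogs")
repo.commit("cats", "label", add=["c.jpg"])
repo.commit("cats", "train", model="model-1")
repo.commit("dogs", "label", add=["d.jpg"])
repo.merge_data("cats")
repo.merge_data("dogs")
print(sorted(repo.branches["main"].assets))  # ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg']
```

After both merges, the main branch holds the data from both projects, while the trained model stays on the branch that produced it.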

7. MISC

7.1. FAQ

Why did the upload of the local dataset fail?

Regardless of whether the dataset has label files, both the images folder and the annotations folder must be created. Images go in the images folder, and the format is limited to jpg, jpeg, and png. Annotation files go in the annotations folder in Pascal VOC format (when there are no annotation files, the folder is left empty). Put the images and annotations folders in the same folder and compress it into a ".zip" package (not ".rar").
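
A package can be checked against these rules before uploading. The sketch below is an assumption-laden helper, not part of YMIR: `check_dataset_zip` is a hypothetical name, it assumes Pascal VOC annotations are stored as .xml files, and it accepts the two folders either at the top level of the archive or inside a parent folder.

```python
import io
import zipfile
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_dataset_zip(zip_source):
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = []
    with zipfile.ZipFile(zip_source) as archive:
        names = archive.namelist()
    folders = set()
    for name in names:
        path = PurePosixPath(name)
        for folder in ("images", "annotations"):
            if folder in path.parts:
                folders.add(folder)
        if name.endswith("/"):
            continue  # directory entry, nothing else to check
        suffix = path.suffix.lower()
        if "images" in path.parts and suffix not in IMAGE_SUFFIXES:
            problems.append(f"unsupported image format: {name}")
        if "annotations" in path.parts and suffix != ".xml":
            problems.append(f"annotation is not a .xml file: {name}")
    for folder in ("images", "annotations"):
        if folder not in folders:
            problems.append(f"missing folder: {folder}/")
    return problems


# Build a small in-memory package: one image and an empty annotations folder.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("images/cat.jpg", b"")
    archive.writestr("annotations/", b"")
print(check_dataset_zip(buffer))  # []
```

The function also accepts a path to a .zip file on disk, since `zipfile.ZipFile` takes either a path or a file-like object.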

How should I obtain training and mining configuration files?

The default configuration file templates need to be extracted from the Docker image.

The training image youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi has a configuration file template located at: /img-man/training-template.yaml

For the mining and inference image youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi, the configuration file templates are located at: /img-man/mining-template.yaml (mining) and /img-man/infer-template.yaml (infer).
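
One possible way to read a template out of an image is to run `cat` inside a throwaway container. The sketch below assumes Docker is installed, the image has been pulled, and the image contains a `cat` binary; `template_command` and `read_template` are hypothetical helper names, not part of YMIR.

```python
import subprocess

IMAGE = "youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi"


def template_command(image, template_path):
    """Build a docker command that prints one file from the image and exits."""
    return ["docker", "run", "--rm", "--entrypoint", "cat", image, template_path]


def read_template(image, template_path):
    """Run the command and return the template text (requires Docker)."""
    result = subprocess.run(
        template_command(image, template_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


print(" ".join(template_command(IMAGE, "/img-man/training-template.yaml")))
```

Calling `read_template(IMAGE, "/img-man/mining-template.yaml")` would return the mining template in the same way, provided the assumptions above hold.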

How can the trained model be used outside the system?

After successful training, the system outputs the ID of the model. You can find the corresponding file by this ID at --model-location. It is a tar file that can be extracted directly with the tar command to get the "mxnet" model files in parameters and JSON format.
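
The same inspection can be done from Python with the standard `tarfile` module. This is a sketch under assumptions: `list_model_files` is a hypothetical helper, and the demo package and the file names inside it are placeholders invented for illustration, not the names YMIR actually writes.

```python
import io
import os
import tarfile
import tempfile


def list_model_files(package_path):
    """Return the names of the regular files inside a model package."""
    with tarfile.open(package_path) as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def make_demo_package(path):
    """Create a stand-in package with placeholder file names."""
    with tarfile.open(path, "w") as archive:
        for name in ("model-symbol.json", "model-0000.params"):
            data = b"placeholder"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo-model-id")
    make_demo_package(demo)
    print(list_model_files(demo))  # ['model-0000.params', 'model-symbol.json']
```

For a real model, pass the path of the file found under --model-location instead of the demo package.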

How can deployment, debugging, and operation problems on Windows be solved?

YMIR has not been fully tested on Windows servers, so we cannot provide support for it at this time.

How do I import models I have already trained?

See this document.

ymir's Issues

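Training fails partway through after running for a while, showing an unknown error

Describe the bug
Training failed after running for a while (about one day), showing an unknown error.
The training data has 2,500 images and the test data has 200.
The GPU is an RTX 3050 8G. The training parameters were modified: batch was changed from 64 to 16, and the image size from 608x608 to 416x416.

The training error log ymir-executor-out.log is as follows: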
2022-07-07 04:35:28,817 - /darknet/train_watcher.py[line:43] - ERROR: error occured in handler: <function _DarknetTrainingHandler._on_best_weights_modified at 0x7ff41bca5830> and path: /out/models/yolov4_best.weights
Traceback (most recent call last):
  File "/darknet/train_watcher.py", line 41, in on_modified
    handler(self, src_path)
  File "/darknet/train_watcher.py", line 52, in _on_best_weights_modified
    export_dir='/out/models')
  File "/darknet/convert_model_darknet2mxnet_yolov4.py", line 437, in run
    net.load_weights(load_param_name)
  File "/darknet/convert_model_darknet2mxnet_yolov4.py", line 409, in load_weights
    ptr = set_data(module, ptr)
  File "/darknet/convert_model_darknet2mxnet_yolov4.py", line 391, in set_data
    conv_weights = weights[ptr:ptr +num_weights]
  File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 511, in __getitem__
    return self._get_nd_basic_indexing(key)
  File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 792, in _get_nd_basic_indexing
    return self._slice(key.start, key.stop)
  File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 907, in _slice
    start, stop, _ = _get_index_range(start, stop, self.shape[0])
  File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 2343, in _get_index_range
    raise IndexError('Slicing stop %d exceeds limit of %d' % (stop, length))
IndexError: Slicing stop 40305504 exceeds limit of 39606267

Environment (please complete the following information):

Server OS: Ubuntu20.04
Ymir Version release-1.0.0
ymir-executor-out.tar.gz

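"permission denied nvidia-docker" error when training a model from the GUI

As the title says:
After deploying by following the GUI deployment steps, the web UI is accessible and datasets can be imported successfully.
But when I start training a model, it fails, and the log shows the error "permission denied nvidia-docker". What is the cause?
The error is shown in the screenshot.

Also:

In newer Docker versions the nvidia-docker command is no longer needed; docker run --gpus all is used instead. Does the project not support this mode yet?
The program seems to run as root by default, so newly created folders, logs, and so on are all owned by root. Can it be configured to run as the current user?
How do I import my own Docker image into a project? I have not found this in the help documentation yet.

Thanks.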

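Is deployment on Windows not supported yet?

After labeling on the LabelFree annotation platform and downloading the VOC-format package, the width and height in the XML do not match the size of the corresponding image

Could you provide a document on how to import custom images into ymir-gui?

There are many details I do not understand. My self-hosted private registry image cannot be imported, and neither can a public Docker Hub image, wuyuanmm1990/yolov5_docker:1.0. Am I missing some details?

Dataset details bug after manually building the viewer module

The viewer module builds successfully, but the API returns a 500 error when I open the dataset details page. If I enter the page a second time, or click the random page button to refresh, the images are displayed.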

ARM Rebuild

1. cd ymir/backend/src/ymir_app/deploy/redis
2. docker build -t industryessentials/ymir-backend-redis -f Dockerfile .

After rebuilding, I ran bash ymir.sh start.
Log:
standard_init_linux.go:219: exec user process caused: exec format error

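Model training fails; failure reason CMD: unknown error

OS: CentOS
Training a model on data always fails, and the sample code fails the same way. Failure reason CMD: unknown error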

install failed

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

git clone the repo
checkout to the tag of release-1.0.0
run the shell script by "bash ymir.sh start"
give me an error message like this screen

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

Server OS: ubuntu:20.04
Ymir Version [ release-1.0.0]


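Cannot download the dataset when the sample project reaches the labeling step

MinIO reports that the file does not exist

Could you provide more detailed deployment documentation, or screenshots? I am having problems with deployment!

Questions about custom images

1. We started training successfully but could not find the in folder inside the container. We want to use the configuration in the in folder to adapt our own image.
2. We want to build a YOLOX image, but training started from the YMIR page defaults to YOLOv4 (darknet). Could you tell us where in the frontend and backend code this can be changed?
Thanks!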

Unable to terminate labeling task and fetch result dataset

Clicking "Terminate" has no effect

bash ymir.sh start

ERROR: The Compose file './docker-compose.yml' is invalid because:
networks.ymirnetwork.ipam.config value Additional properties are not allowed ('gateway' was unexpected)
Unsupported config option for services.backend: 'runtime'

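Is there a button to delete data or projects?

If I no longer want a project or a dataset, how do I delete it?

Copying a public image to the image list keeps spinning, and the log file contains errors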

Desktop (please complete the following information):

OS: ubuntu22.04
Docker version 20.10.17, build 100c701
Docker Compose version v2.5.1
Version ymir-release-1.0.0

ymir_controller.log:
INFO : [20220630-01:40:36] invoker_cmd_base.py:83:server_invoke(): request:
{'user_id': '0001', 'repo_id': '000000', 'req_type': 16, 'task_id': 't0000001000000ae49a01656553236', 'singleton_op': 'industryessentials/executor-det-yolov4-training:release-1.1.0'}
async_mode: True
work_dir:
INFO : [20220630-01:40:36] utils.py:24:run_command(): starting cmd:
docker image inspect industryessentials/executor-det-yolov4-training:release-1.1.0 --format ignore_me

ERROR : [20220630-01:40:36] utils.py:27:run_command(): run cmd error:
stderr: docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by docker)
docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by docker)

stdout:
INFO : [20220630-01:40:36] utils.py:24:run_command(): starting cmd:
docker pull industryessentials/executor-det-yolov4-training:release-1.1.0

stdout:
INFO : [20220630-01:40:36] utils.py:85:wrapper(): |-server_invoke costs 0.02s(0.00m).
INFO : [20220630-01:40:36] server.py:61:data_manage_request(): task t0000001000000ae49a01656553236 result: code: 130401
message: "docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.32\' not found (required by docker)\ndocker: /usr/lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.34' not found (required by docker)\n"

Cannot log in to local YMIR server

Hello. First, thanks for the wonderful YMIR project. I just installed YMIR GUI on my computer and the app is running, but I cannot log in after entering the user name and password. Is there a way to debug the problem, for example a log file or app output?

Suggestion: need a better log monitor and log standard levels

Describe the issue
It is hard to find the log file and check the problem quickly.

If something goes wrong, we need to find the current log at ymir-workplace/sandbox/work_dir/TaskTypeTraining/{taskid}/sub_task/t0xxxxxxxx/out/ymir-executor-out.log

Also, the log file is very large and some of the logs may not be very helpful. I suggest adding log level control such as debug/error/info.

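Error when running inference with the yolov4 image, and "container error" when using other public images

Hello,

I have run into two problems:

My project currently trains fine with yolov4, but after the model is trained, the inference function fails with an error that result.yaml and infer-result.json cannot be found.
When training with public images other than yolov4, such as yolov5 or mmdet, I get "Error: Could not load UVM kernel module. Is nvidia-modprobe installed". However, nvidia-modprobe is installed correctly.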

Support for cluster mode deployment?

I have deployed and tested it, and it is very cool. But I have multiple GPU nodes. Is cluster mode deployment supported?
If not, is there a plan to support it?

label-free can't work

Describe the issue
label-free can't work when i click the label

To Reproduce
Steps to reproduce the behavior:

Go to 'Label'
Upload my dataset (data.zip just contains 18 pictures)
See error

Screenshots

I want to label a new image dataset. After uploading the zip file with just 18 pictures, it has been loading for nearly one day with no result, although I imported a dataset with 1k pictures in 10 seconds.

When I enter the label-free site, it also shows that data is being read.

When I click the label or import button, it reports "The current project is reading data.Please refresh the project list and try again".

Desktop (please complete the following information):

OS: [Ubuntu 20.04 LTS]
Browser [chrome]
docker-version 20.10

Additional context
I also got the label-free repo via git clone https://github.com/IndustryEssentials/label-free.git and hit the same problem. Is there something wrong with the label-free platform, or am I doing something wrong?

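Provide a YMIR developer manual

Please provide technical architecture documentation and an overview of YMIR's design.
Ideally there would also be a guide on setting up a locally debuggable YMIR environment on Ubuntu,
so that interested developers can understand YMIR more easily.

The WeChat QR code has expired

As the title says, could you post a working QR code?

Data import fails in the CMD version

Following the current tutorial, the labels file for the CMD version is placed under mir_demo_repo, but in practice this fails with the error: repo is dirty

Without the labels file, the labels are not recognized.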
[Backend]tox error

I want to modify some backend code, but I cannot get past this error:
===================================================== log end =====================================================
ERROR: could not install deps [-rrequirements.txt, -rrequirements-dev.txt]; v = InvocationError('/Users/xxx/Documents/code/ymir/ymir/backend/.tox/python/bin/python -m pip install -rrequirements.txt -rrequirements-dev.txt', 1)
_____________________________________________________ summary _____________________________________________________

Model Verification:docker images only had sample_image

Describe the issue
Model Verification: docker images only had sample_image

To Reproduce
When I finished training the model (using yolov4) and wanted to verify it, the docker image list only had samples.

Screenshots

Env

OS: ubuntu:18.04
Ymir Version [ release-1.0.0]
Docker version 20.10.16, build aa7e414
Docker Compose version v2.5.1


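Cannot connect to the labeling platform

Using a dataset, I select the label button in the menu bar, enter an email address, and click label, but the labeling platform does not receive the task. The error log is below. The platform worked right after it was set up, but labeling stopped working later. I tried label_studio and label_free with the same result.
INFO : [20220621-01:10:25] invoker_cmd_base.py:83:server_invoke(): request: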
{'user_id': '0001', 'repo_id': '000023', 'req_type': 1001, 'task_id': 't00000010000232949781655773825', 'task_parameters': '{"dataset_id": 125, "keywords": ["cat", "person"], "extra_url": null, "labellers": ["[email protected]"], "keep_annotations": true, "validation_dataset_id": null, "network": null, "backbone": null, "hyperparameter": null, "model_id": null, "mining_algorithm": null, "top_k": null, "generate_annotations": null, "docker_image": null, "docker_image_id": null}', 'req_create_task': {'task_type': 3, 'labeling': {'dataset_id': 't0000001000023ec62961655706852', 'labeler_accounts': ['[email protected]'], 'in_class_ids': [4, 3], 'project_name': 'label_$sample_project_None_1655706551.9790225_mining_dataset_3', 'export_annotation': True}}}
async_mode: True
work_dir: /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825
INFO : [20220621-01:10:25] percent_log_util.py:65:write_percent_log(): writing task info to /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/out/monitor.txt
t00000010000232949781655773825 1655773825.686129 0.0 2
INFO : [20220621-01:10:25] invoker_task_base.py:73:create_subtask_workdir_monitor(): task t00000010000232949781655773825 logging weights:
{'/home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/out/monitor.txt': 1.0}

DEBUG : [20220621-01:10:25] connectionpool.py:228:_new_conn(): Starting new HTTP connection (1): 127.0.0.1:9098
DEBUG : [20220621-01:10:25] connectionpool.py:456:make_request(): http://127.0.0.1:9098 "POST /api/v1/tasks HTTP/1.1" 200 20
INFO : [20220621-01:10:25] invoker_task_base.py:159:task_invoke(): processing subtask 0
INFO : [20220621-01:10:25] utils.py:85:wrapper(): |-server_invoke costs 0.01s(0.00m).
INFO : [20220621-01:10:25] server.py:61:data_manage_request(): task t00000010000232949781655773825 result:
INFO : [20220621-01:10:25] invoker_task_labeling.py:25:subtask_invoke_0(): labeling_request: dataset_id: "t0000001000023ec62961655706852"
labeler_accounts: "[email protected]"
in_class_ids: 4
in_class_ids: 3
project_name: "label$sample_project_None_1655706551.9790225_mining_dataset_3"
export_annotation: true

INFO : [20220621-01:10:25] label_runner.py:56:start_label_task(): start label task!!!
INFO : [20220621-01:10:25] utils.py:24:run_command(): starting cmd:
mir export --root /home/qianyan/ymir/ymir/ymir-workplace/sandbox/0001/000023 --media-location /home/qianyan/ymir/ymir/ymir-workplace/ymir-assets --asset-dir /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/label_t00000010000232949781655773825/Images --annotation-dir /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/label_t00000010000232949781655773825/Images --src-revs t0000001000023ec62961655706852@t0000001000023ec62961655706852 --format ls_json -w /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/export_work_dir --cis cat;person

INFO : [20220621-01:10:26] utils.py:30:run_command(): run cmd succeed:
missing annotations: 0, empty annotations: 0 out of 2 assets
git result:
M annotations.mir
M context.mir
M keywords.mir
M metadatas.mir
M tasks.mir

[exporting-task-1655773826.1539779 43ab7ee] export from t0000001000023ec62961655706852@t0000001000023ec62961655706852
5 files changed, 17 deletions(-)
command done: exporting-task-1655773826.1539779@exporting-task-1655773826.1539779, return code: 0
|-cmd_run costs 0.14s(0.00m).

INFO : [20220621-01:10:26] label_free.py:168:run(): start LABELFREE run()
DEBUG : [20220621-01:10:26] connectionpool.py:228:_new_conn(): Starting new HTTP connection (1): xxxxx:8763
INFO : [20220621-01:12:36] percent_log_util.py:65:write_percent_log(): writing task info to /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/out/monitor.txt
t00000010000232949781655773825 1655773956.380800 1.0 4 130603 HTTPConnectionPool(host='xxxxx', port=8763): Max retries exceeded with url: /api/projects (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 95, in create_connection
raise err
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 85, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 398, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 239, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/usr/lib/python3.8/http/client.py", line 1256, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
self.send(msg)
File "/usr/lib/python3.8/http/client.py", line 951, in send
self.connect()
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 205, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 440, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='xxxxxx', port=8763): Max retries exceeded with url: /api/projects (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/app/ymir_controller/controller/label_model/base.py", line 19, in wrapper
_ret = f(*args, **kwargs)
File "/app/ymir_controller/controller/label_model/label_free.py", line 169, in run
project_id = self.create_label_project(project_name, keywords, collaborators, expert_instruction)
File "/app/ymir_controller/controller/label_model/label_free.py", line 55, in create_label_project
resp = self._requests.post(url_path=url_path, json_data=data)
File "/app/ymir_controller/controller/label_model/request_handler.py", line 23, in post
resp = requests.post(
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 117, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='xxxxx', port=8763): Max retries exceeded with url: /api/projects (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

INFO : [20220621-01:12:36] label_runner.py:78:start_label_task(): finish label task!!!`
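The traceback above ends in `[Errno 110] Connection timed out`, meaning the TCP connection to the labeling service on port 8763 was never established. A minimal sketch for checking reachability, assuming it is run from the same environment that issues the request (the `/app/ymir_controller/...` paths suggest the controller container); `can_connect` is a hypothetical helper, not part of YMIR:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False
```

Calling `can_connect(host, 8763)` with the host that is masked as `xxxxx` in the log should return `False` while the problem persists. If it returns `True` from the host machine but `False` from inside the container, the configured address is probably not routable from the container network.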

Why is my NVIDIA driver broken?

Describe the issue
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown

To Reproduce
Steps to reproduce the behavior:

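Run 'bash ymir.sh start'
See error

Expected behavior
I ran bash ymir.sh start, opened the web page, and successfully uploaded my own dataset zip file. Clicking Label took me to the labeling page, but the labeling page could not be displayed. After rebooting, I found a problem with the GPU driver: my display was abnormal, and nvidia-smi reported that no GPU driver was present. Why did this break my GPU driver? When I ran bash ymir.sh start again, I could open the web page, but after a while the terminal reported an error. The web page was still up.
Screenshots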
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

OS: [Ubuntu20.04 LTS]
Browser [Chrome version 101.0.4951.64]
Version [e.g. 22]
docker-version:20.10

Roughly when will version 1.2.0 be released?

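Changing MYSQL_PASSWORD makes login impossible after the initial startup

Describe the bug
With no ymir-workplace present (that is, on the very first initialization), changing MYSQL_PASSWORD in .env makes it impossible to log in with the admin account after startup. Inspecting the MySQL database shows no tables at all. (I tried many times; initializing without changing MYSQL_PASSWORD does not have this problem.)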

To Reproduce
Steps to reproduce the behavior:

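Delete ymir-workplace (if it exists)
Change MYSQL_PASSWORD in .env
Start with bash ymir.sh start
Log in with [email protected]

By the way: I found this problem because I had previously used an older version of ymir that contained old data. After updating the code with git pull, startup was normal, but using Label Data and then registering an account on the labeling platform (label_studio) returned a 307. I had no choice but to delete ymir-workplace; after deleting it and initializing again, the redirect worked normally. I don't know whether this was a one-off or also a bug. (If it is a bug, will it happen again when upgrading to future versions? If it cannot be reproduced and was a one-off, it may be a problem on my side.)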

upload file error

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

Prepare the source images
Include the annotations
Zip everything into one file
Upload the zip file

Expected behavior
The archive contains seventy files,
but the result is empty.

Running YMIR in WSL2 fails

Hi
I fail to start YMIR on my Windows PC using WSL2. I think it is because Docker Compose is v2 but your code uses v1 (or something similar). See the execution trace below. I have checked that the Nvidia GPU drivers etc. are correctly installed.
/Tomas

Using default tag: latest
latest: Pulling from industryessentials/executor-det-yolov4-training
Digest: sha256:ab7fd377e7945ad668547921ee8b0ddd2a24a55c655614cc836ff5e5d93e1855
Status: Image is up to date for industryessentials/executor-det-yolov4-training:latest
docker.io/industryessentials/executor-det-yolov4-training:latest
Using default tag: latest
latest: Pulling from industryessentials/executor-det-yolov4-mining
Digest: sha256:17f6e5bf7192780acf897a8e24b6c10b280287daa4953e67e27d17bb483f6477
Status: Image is up to date for industryessentials/executor-det-yolov4-mining:latest
docker.io/industryessentials/executor-det-yolov4-mining:latest
[+] Running 6/6
⠿ Container ymir-clickhouse-1 Removed 0.1s
⠿ Container ymir-db-1 Removed 0.1s
⠿ Container ymir-viz-redis-1 Removed 0.1s
⠿ Container ymir-tensorboard-1 Removed 0.1s
⠿ Container ymir-redis-1 Removed 0.1s
⠿ Network ymir_ymirnetwork Removed 0.2s

in prod mode, pulling images.
[+] Running 7/7
⠿ backend Pulled 1.6s
⠿ viz-redis Pulled 1.6s
⠿ web Pulled 1.6s
⠿ clickhouse Pulled 1.6s
⠿ tensorboard Pulled 1.6s
⠿ redis Pulled 1.6s
⠿ db Pulled 1.6s
[+] Running 6/6
⠿ Network ymir_ymirnetwork Created 0.0s
⠿ Container ymir-tensorboard-1 Created 0.1s
⠿ Container ymir-redis-1 Created 0.1s
⠿ Container ymir-viz-redis-1 Created 0.1s
⠿ Container ymir-clickhouse-1 Created 0.1s
⠿ Container ymir-db-1 Created 0.1s
⠋ Container ymir-backend-1 Creating 0.0s
Error response from daemon: Unknown runtime specified nvidia
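
The last line of the trace, `Unknown runtime specified nvidia`, is what the Docker daemon reports when a container requests a runtime the daemon has not been configured with, so it may be worth checking that before suspecting the Compose version. A minimal sketch, assuming the JSON comes from `docker info --format '{{json .Runtimes}}'`; `has_runtime` is a hypothetical helper and the sample string is illustrative, not taken from the reporter's machine:

```python
import json


def has_runtime(runtimes_json: str, name: str = "nvidia") -> bool:
    """Return True if `name` is a key in the JSON object of registered runtimes."""
    runtimes = json.loads(runtimes_json)
    return name in runtimes


# Illustrative input, shaped like the output of:
#   docker info --format '{{json .Runtimes}}'
sample = '{"io.containerd.runc.v2": {"path": "runc"}, "runc": {"path": "runc"}}'
print(has_runtime(sample))  # False
```

If the check returns `False` on the real output, the NVIDIA container runtime is likely not registered with the Docker daemon used inside WSL2, independent of which Compose version is installed.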

Database Error

label-free can't load dataset

Running bash ymir.sh start is not successful

Describe the issue
My system is Windows 10, with Docker and WSL 2 already installed, and I checked that Docker is set to use WSL 2. But when I went to the next step, "bash ymir.sh start", it downloaded something like yolo-mining-xxxx and then closed by itself. I can't identify the error from its output.

To Reproduce
My installation step:

Installed Docker and restarted the computer
Installed WSL 2, set Docker to use WSL 2, then restarted
Used vi in WSL to change '$\r' to '\x'; without this step it shows errors like '$\r' not found .....
Ran "bash ymir.sh start", which automatically downloaded what it needs
It closed
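
The `'$\r' not found` message mentioned in the steps above is typical of a shell script saved with Windows (CRLF) line endings, which bash reads as a stray carriage return at the end of each line. A minimal sketch of normalizing such a file to LF, assuming this is the cause here; both function names are hypothetical:

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF, leaving lone LFs untouched."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with LF line endings."""
    with open(path, "rb") as f:
        original = f.read()
    converted = crlf_to_lf(original)
    # Only write back when something actually changed.
    if converted != original:
        with open(path, "wb") as f:
            f.write(converted)
```

For example, `convert_file("ymir.sh")` would rewrite the start script in place. Working in bytes avoids any text-mode newline translation that could undo the conversion on Windows.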

Expected behavior
YMIR starts.
Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

OS: windows 10
Browser: not use


The runc can't train data

With the runc runtime, training does not work, and no error appears in the console.

502 Bad Gateway (nginx/1.21.6)

Both the docker-compose approach and the bash ymir.sh start approach report: 502 Bad Gateway

Version: CommitID: 34aa07d

Is only the object detection task supported?

Is there any plan to support more CV tasks besides object detection?

It can't pull industryessentials/executor-det-yolov4-training from aliyun mirror

docker pull industryessentials/executor-det-yolov4-training

You can see that the layer 11323ed2c653 shows as Downloading, but nothing is actually being downloaded for it.

8412b44cad21: Pulling fs layer

Using default tag: latest
latest: Pulling from industryessentials/executor-det-yolov4-training
11323ed2c653: Downloading
aac8cf1d1c79: Download complete
fc85802d11de: Download complete
5ecaf0dceab7: Download complete
52ea4560e85a: Download complete
de9ca64a95e6: Download complete
e4165bf6e171: Download complete
77cc6fb46c25: Downloading [================================>                  ]  519.5MB/804.9MB
f54c7e16b308: Download complete
4e21240e72c2: Downloading [=================================>                 ]  359.9MB/532.4MB

It then pulls from Docker Hub again, which is very slow in China. Please provide another Docker registry in China.

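About the LabelFree labeling tool failing to download data packages

Hello, I would like to ask how to resolve the download timeout shown for tasks with more than 1000 images. I have already tried pulling the label free image again, but the problem remains.
1) docker-compose -f docker-compose.labelfree.yml pull
2) bash ymir.sh start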

Opening Public Images in the image list reports that sharing images failed

ymir_app.log shows:
Traceback (most recent call last):
File "/app/ymir_app/app/api/api_v1/endpoints/images.py", line 142, in get_shared_images
shared_images = await get_shared_images_from_github(
File "/usr/local/lib/python3.8/dist-packages/fastapi_cache/decorator.py", line 49, in inner
ret = await func(*args, **kwargs)
File "/app/ymir_app/app/api/api_v1/endpoints/images.py", line 154, in get_shared_images_from_github
shared_images = get_github_table(url, timeout=timeout)
File "/app/ymir_app/app/utils/github.py", line 37, in get_github_table
tbl = get_markdown_table(url, timeout)
File "/app/ymir_app/app/utils/github.py", line 13, in get_markdown_table
resp = requests.get(url, headers=HEADERS, timeout=timeout)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /IndustryEssentials/ymir/master/docker_executor/public_index.md (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f651cab3400>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2022-06-30 00:17:20 +0000] [52] [INFO] 172.168.254.8:59090 - "GET /api/v1/images/shared HTTP/1.0" 200

Which URL is it failing to connect to, and how can I fix this?
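
The target address can be read off the final `ConnectionError` line of the traceback, which names the host, port, and URL path. A minimal sketch that rebuilds the full URL from a message of that shape; `target_url` is a hypothetical helper, and the pattern only assumes the message format quoted above:

```python
import re
from typing import Optional

# Matches the pool type, host, port, and path in a "Max retries exceeded" message.
_PATTERN = re.compile(
    r"(?P<pool>HTTPS?)ConnectionPool\(host='(?P<host>[^']+)', port=(?P<port>\d+)\): "
    r"Max retries exceeded with url: (?P<path>\S+)"
)


def target_url(message: str) -> Optional[str]:
    """Rebuild the URL a failed request was aimed at, or None if not recognized."""
    m = _PATTERN.search(message)
    if m is None:
        return None
    scheme = m.group("pool").lower()
    host, port, path = m.group("host"), int(m.group("port")), m.group("path")
    # Omit the port when it is the default for the scheme.
    default_port = 443 if scheme == "https" else 80
    netloc = host if port == default_port else f"{host}:{port}"
    return f"{scheme}://{netloc}{path}"


msg = (
    "HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): "
    "Max retries exceeded with url: "
    "/IndustryEssentials/ymir/master/docker_executor/public_index.md (Caused by ...)"
)
print(target_url(msg))
# https://raw.githubusercontent.com/IndustryEssentials/ymir/master/docker_executor/public_index.md
```

So the request that fails is a fetch of the public image index from raw.githubusercontent.com, and `[Errno 111] Connection refused` suggests that host is not reachable from the machine running the backend.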

ymir-workplace has no dir named "ymir-sharing", version Ymir-release-1.0.0

Thanks for kindly releasing this fun project! I tried YMIR on my PC and launched it successfully, but when I wanted to import VOC2012 I could not find "ymir-sharing", so I created it and put VOC2012 under that directory. Whatever I do, I cannot import the VOC2012 dataset in ymir-gui. Please help, thanks!

The number of pictures is 235, but labeling failed and the number is None

RC_CMD_ERROR_UNKNOWN: unkown error

My YMIR service is running in a virtual machine. When I train the model, it reports an error.

model_group_23 > Train

Model Detail

Model Name	model_group_23 V2	mAP	0.00%

Task State

Current State	Invalid

Failure

Error Reason	RC_CMD_ERROR_UNKNOWN: unkown error

My env:
1. VM
2. CentOS 7
3. Docker version 20.10.16
4. docker-compose version 1.27.2
5. YMIR config => SERVER_RUNTIME=runc

Failed to start a training task

GPU not recognized

Describe the issue
GPU not recognized

To Reproduce
1. YMIR:

2. Nvidia Info

Environment (please complete the following information):

OS: ubuntu:18.04
Ymir Version [ release-1.0.0]
Docker version 20.10.16, build aa7e414
Docker Compose version v2.5.1
Nvidia-docker version:
NVIDIA Docker: 2.6.0
Client: Docker Engine - Community
Version: 20.10.16
API version: 1.41
Go version: go1.17.10
Git commit: aa7e414
Built: Thu May 12 09:17:23 2022
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.16
API version: 1.41 (minimum version 1.12)
Go version: go1.17.10
Git commit: f756502
Built: Thu May 12 09:15:28 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.4
GitCommit: 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
runc:
Version: 1.1.1
GitCommit: v1.1.1-0-g52de29d
docker-init:
Version: 0.19.0
GitCommit: de40ad0

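The UI does not navigate properly

At first I tried installing as the test user without success. Later the YMIR platform's technical staff remotely helped install it as root, and the software worked normally. Many thanks to the YMIR platform's technical staff for their support.
Because this is a virtual machine running on a laptop, with the images running inside the virtual machine, an abnormal shutdown of the virtual machine (the laptop ran out of battery) may be the cause: afterwards the main page could be opened but could not be used normally. I consulted the platform's technical staff, who explained that it might be a permissions problem, so I reinstalled as the test user (clearing all images and downloading and installing again). During installation no problem was reported other than " Found orphan containers (ymir-master_label_redis_1, ymir-master_label_api_1, ymir-master_label_minio_1, ymir-master_label_nginx_1, ymir-master_label_mysql_1, ymir-master_label_celery_worker_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up" (I am also not sure whether " Found orphan containers ......." counts as a problem). The situation is the same as before: the UI opens and the sample project I created earlier is still there (I don't understand why it is still there), but clicking the sample project does not open its detail page, and clicking the "Label Management" button still does not navigate or display properly.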

Permission denied: 'nvidia-docker'

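GPU load issue

Env:
OS: ubuntu 20.04
GPU: NVIDIA 2080 Ti, reference-design card

Description:
During training, I checked the GPU usage. Only half of the GPU's resources are used.

Request:
I want training to use the GPU at full load. How do I configure that?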

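Cannot jump from the UI to the label studio labeling system

labelstudio is correctly configured and started, and the docker ps output looks normal.
From the ymir web UI, I click "Jump to labeling platform" under Data.

labelstudio opens, but the configured token is invalid and I have to log in.
After logging in and clicking "Jump to labeling platform" again, labelstudio is empty and cannot load the data from the ymir platform.
The two systems appear to be independent.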

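A suggestion about an issue that is not really a bug, under an unconventional operation

Description:
With an existing ymir-workplace in the project (that is, the project already has data), moving the whole ymir folder to another path and redeploying (or renaming the parent folder) causes the preprocessing tasks and labeling tasks of all newly created datasets to hang at 0%.

Steps:
1. Change the path or name of ymir's parent folder (the owner and permissions of the files and folders under ymir-workplace stay the same as before)
2. Redeploy
3. Create a dataset preprocessing task (such as dataset sampling) or a labeling task

Error shown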
ERROR : [2022-06-07 03:07:11,538] Job "update_monitor_percent_log (trigger: interval[0:00:20], next run at: 2022-06-07 03:07:31 UTC)" raised an exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/apscheduler/executors/base.py", line 125, in run_job
retval = job.func(*job.args, **job.kwargs)
File "ymir_monitor/monitor/utils/crontab_job.py", line 55, in update_monitor_percent_log
runtime_log_content = PercentLogHandler.parse_percent_log(log_path)
File "/app/common/common_utils/percent_log_util.py", line 31, in parse_percent_log
with open(log_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory:

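Looking for the cause, I found it is probably because raw_log_contents under MONITOR_FINISHED_KEY:v1 in the redis database stores absolute paths.
After modifying the values in MONITOR_FINISHED_KEY:v1 and restarting, it recovered.

PS: I don't think this counts as a bug, since it is an unconventional deployment operation, and I originally didn't intend to report it. Most people won't rename or move the path and then start again. But would it be better for redis to store relative paths? After all, YMIR_PATH is set in .env.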
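
The suggestion above can be sketched as follows: store each log path relative to a deployment root and resolve it against the current root when reading it back. This is only an illustration of the idea under stated assumptions, not YMIR's actual monitor code; the function names and the example directories are hypothetical, and the root is assumed to come from something like YMIR_PATH:

```python
import posixpath


def to_stored_path(abs_path: str, root: str) -> str:
    """Convert an absolute path under `root` to a root-relative path for storage."""
    return posixpath.relpath(abs_path, start=root)


def to_runtime_path(stored_path: str, root: str) -> str:
    """Resolve a stored relative path against the current deployment root."""
    return posixpath.normpath(posixpath.join(root, stored_path))


old_root = "/home/user/ymir"       # hypothetical location before the move
new_root = "/data/ymir-moved"      # hypothetical location after the move
log = old_root + "/ymir-workplace/sandbox/work_dir/task/out/monitor.txt"

stored = to_stored_path(log, old_root)
print(stored)
# ymir-workplace/sandbox/work_dir/task/out/monitor.txt
print(to_runtime_path(stored, new_root))
# /data/ymir-moved/ymir-workplace/sandbox/work_dir/task/out/monitor.txt
```

With this scheme the stored value no longer embeds the old location, so a moved or renamed deployment would resolve the same entry to a file that exists.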
