industryessentials / ymir

YMIR, a streamlined model development product.

License: Apache License 2.0


ymir's Introduction

 
 
 

📫 Feedback on usage issues: [email protected] / Professional consulting for server equipment: [email protected]

 
 


Citations

If you wish to refer to YMIR in your work, please use the following BibTeX entry.

@inproceedings{huang2021ymir,
      title={YMIR: A Rapid Data-centric Development Platform for Vision Applications},
      author={Phoenix X. Huang and Wenze Hu and William Brendel and Manmohan Chandraker and Li-Jia Li and Xiaoyu Wang},
      booktitle={Proceedings of the Data-Centric AI Workshop at NeurIPS},
      year={2021},
}

What's new

Version 2.0.0 updated on 11/08/2022

YMIR platform

  • A new model performance diagnosis module.
  • A new function for visual evaluation of model inference results.
  • A public algorithm library with a variety of built-in high-precision algorithms.
  • A one-click deployment function that supports deploying algorithms to certified prerequisite devices.
  • New operating instructions.
  • Refactored code structure.

Docker

For more details, see ymir-executor-fork.

Updates to the public Docker images:

  • Update yolov5 training image: youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi
  • Update mmdetection training image: youdaoyzbx/ymir-executor:ymir2.0.0-mmdet-cu111-tmi
  • Update yolov5 image with rv1126 chip deployment support: youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmid

For more code updates, see ymir-dev.

Deployment Prerequisite (optional)

YMIR supports deploying trained models and public algorithm models directly to certified devices. For hardware specifications, please check the details.

 
 

Introduction


1. Introduction to AI SUITE-YMIR

As a streamlined model development product, YMIR (You Mine In Recursion) focuses on dataset versioning and model iteration in the AI SUITE open-source series.

 
 

AI commercialization is reaching maturity in terms of computing hardware, algorithms, and related areas. However, the adoption of AI often encounters challenges such as a lack of skilled developers, high development costs, and long iteration cycles.

As a platform, YMIR provides an end-to-end AI development system. This platform reduces costs for companies using artificial intelligence and accelerates the adoption of artificial intelligence. YMIR provides ML developers with one-stop services for data processing, model training, and other steps required in the AI development cycle.

The YMIR platform provides effective model development capabilities with a data-centric approach. The platform integrates concepts such as active learning, data and model version control, and workspaces. YMIR enables rapid, parallel iteration of datasets and projects for multiple specific tasks. The platform uses an open API design, so third-party tools can also be integrated into the system.

1.1. Main functions

A typical model development process can usually be summarized in a few steps: defining the task, preparing the data, training the model, evaluating the model, and deploying the model.

  • Define the target: Before starting an AI development project, it is important to be clear about what is to be analyzed. This will help developers correctly convert the problem into several typical ML modeling tasks, such as image classification, object detection, etc. Different tasks have different data requirements.

  • Prepare data: Data preparation is the foundation of a successful AI project. The most important task in this step is to ensure the quality of the data and its annotations. Collecting all the required data at the beginning is ideal for many projects, but in practice developers may find in later stages that some data is missing, and additional data may be needed as the project evolves.

  • Train model: This operation is commonly referred to as "modeling". It is the exploration and analysis of prepared data to discover internal patterns and any links between the input and the expected prediction target. The result of this step is usually one or more machine learning models, which can be applied to new data to obtain predictions. Developers train their own models using mainstream training frameworks such as PyTorch, TensorFlow, and Darknet.

  • Evaluate model: The development process is not complete once the model has been trained. Models need to be evaluated and checked before being put into production. Obtaining a production-quality model in one pass is rarely easy, so you need to adjust parameters and iterate on the model continuously. Common metrics can help you evaluate models quantitatively and pick a satisfactory one.

  • Deploy model: Models are developed and trained based on previously available data (possibly test data). After a satisfactory model is obtained, it will be applied to real data to make predictions at scale.

The YMIR platform is designed for users who need to produce models at scale. It provides an easy-to-use interface that makes it convenient to manage and view data and models. The platform contains functional modules such as project management, tag management, model deployment, system configuration, and Docker image management. It supports the following main functions.

| Function Module | Primary Function | Secondary Function | Function Description |
|---|---|---|---|
| Project Management | Project Management | Project Editing | Supports adding, deleting, and editing projects and project information |
| Project Management | Iteration Management | Iteration Preparation | Supports setting up the dataset and model information needed for iteration |
| Project Management | Iteration Management | Iteration Steps | Supports populating the task parameters of the next step with data from the previous round |
| Project Management | Iteration Management | Iteration Charts | Supports displaying the datasets and models generated during iteration as graphical comparisons in the interface |
| Project Management | Dataset Management | Import Datasets | Supports importing prepared datasets by copying public datasets, or from URLs, paths, and local files |
| Project Management | Dataset Management | View Datasets | Supports visualization of image data and annotations, and viewing of historical information |
| Project Management | Dataset Management | Edit Datasets | Supports editing and deleting datasets |
| Project Management | Dataset Management | Dataset Versions | Supports creating new dataset versions from the source dataset, with the version number incremented over time |
| Project Management | Dataset Management | Data Preprocessing | Supports fusion, filtering, and sampling of image data |
| Project Management | Dataset Management | Data Mining | Supports finding the data most beneficial for model optimization in a large number of datasets |
| Project Management | Dataset Management | Data Annotation | Supports adding annotations to image data |
| Project Management | Dataset Management | Data Inference | Supports adding annotations to a dataset by specifying a model |
| Project Management | Model Management | Model Import | Supports local import of model files to the platform |
| Project Management | Model Management | Training Models | Supports selecting datasets and labels, adjusting training parameters to train models as required, and viewing the model results after completion |
| Project Management | Model Management | Model Validation | Supports uploading a single image to check the model's performance on real images through visualization |
| Tag Management | Tag Management | Add Tags | Supports adding primary names and aliases of training tags |
| Model Deployment | Algorithm Management | Public Algorithm | Supports algorithm customization, viewing and trying out public algorithms, and adding them to my algorithms |
| Model Deployment | Algorithm Management | Public Algorithm | Supports publishing my algorithms to public algorithms |
| Model Deployment | Algorithm Management | My Algorithms | Supports viewing and editing my published and added algorithms |
| Model Deployment | Algorithm Management | Deploy Algorithms | Supports deploying my algorithms to devices and viewing deployment history |
| Model Deployment | Device Management | View Devices | Supports viewing device information and deployment history |
| Model Deployment | Device Management | Edit Device | Supports adding, deploying, and deleting devices |
| Model Deployment | Device Management | Supported Devices | Supports viewing and purchasing supported devices |
| System Configuration | Image Management | My Images | Supports adding custom Docker images to the system (administrators only) |
| System Configuration | Image Management | Public Images | Supports viewing public Docker images uploaded by others and copying them to your own system |
| System Configuration | Permissions Configuration | Permissions Management | Supports configuring user permissions (administrators only) |

1.2. Apply for trial

We provide an online trial version for your convenience. To request access, please fill out the Apply for YMIR Trial form, and we will send the trial information to your email address.

2. Installation

Should you install the GUI or the CMD version?

  1. The GUI version, which supports model training and model iteration, is more suitable for ordinary users.

  2. If you need to modify the default configuration of the system, installing CMD is recommended.

  3. If you have already deployed an existing version of YMIR, please refer to the Upgrade Instructions.

This chapter contains the installation instructions for YMIR-GUI. If you need to use CMD, please refer to the Ymir-CMD user guide.

2.1. Environment dependencies

  1. NVIDIA drivers must be properly installed before installing YMIR. For detailed instructions, see https://www.nvidia.cn/geforce/drivers/.

  2. Install Docker and Docker Compose, then verify that containers can access the GPU:
## check the maximum CUDA version supported by the host
nvidia-smi
## if the host supports CUDA 11+, check nvidia-docker
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
## if the host supports CUDA 10+, check nvidia-docker
sudo docker run --rm --gpus all nvidia/cuda:10.2-base-ubuntu18.04 nvidia-smi
## these commands should produce console output like the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   62C    P0    55W /  75W |   4351MiB /  7680MiB |     94%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      8132      C                                    4349MiB |
+-----------------------------------------------------------------------------+
  3. Hardware suggestions
  • NVIDIA GeForce RTX 2080 Ti or higher is recommended.

  • The maximum CUDA version supported by the host should be >= 11.2.
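
The CUDA requirement can be checked programmatically. The following is a minimal sketch, not part of YMIR: it assumes the `CUDA Version: X.Y` field appears in the nvidia-smi header as in the sample output above, and the helper names `parse_cuda_version` and `meets_minimum` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Minimum CUDA version taken from the hardware suggestions above.
MIN_CUDA = (11, 2)


def parse_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(smi_output: str, minimum: Tuple[int, int] = MIN_CUDA) -> bool:
    """Return True when the reported CUDA version is at least `minimum`."""
    version = parse_cuda_version(smi_output)
    return version is not None and version >= minimum


# Header line copied from the sample console output above.
sample = "| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |"
print(parse_cuda_version(sample))  # (11, 6)
print(meets_minimum(sample))       # True
```

On a real host, the text passed to these helpers would come from running nvidia-smi, for example via `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.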

2.2. Installation of YMIR-GUI

Make sure that all the conditions in the CUDA environment dependencies above have been met; otherwise the installation may fail.

The YMIR-GUI project package is on DockerHub. The steps to install and deploy YMIR are as follows:

  1. Clone the deployment project YMIR to the local server:
git clone https://github.com/IndustryEssentials/ymir.git
  2. If no GPU is available and you need to run in CPU mode, switch to CPU boot mode by setting the SERVER_RUNTIME parameter in the .env file to runc:

# nvidia for gpu, runc for cpu.

SERVER_RUNTIME=runc

  3. If you do not need the label-free labeling platform, you can start the service directly with the default configuration: bash ymir.sh start. Running the command with sudo is not recommended, as it may cause permission problems.
  • When the service starts, it asks whether you want to send usage reports to the YMIR development team. The default is yes if you enter nothing.
  • The default port number for YMIR's Model Deployment module is 18801. If this port conflicts with another service, go to the YMIR directory and edit the .env file to configure the Model Deployment port and the MySQL access password:
DEPLOY_MODULE_HOST_PORT=18801
DEPLOY_MODULE_URL=${DEPLOY_MODULE_HOST_PORT}
DEPLOY_MODULE_MYSQL_ROOT_PASSWORD=deploy_db_passwd

Execute the start command after the modification: bash ymir.sh start.

  4. After the service has started successfully, YMIR will be available at http://localhost:12001/. To stop the service, run: bash ymir.sh stop

  5. The default initial user is the super administrator. You can check the account and password in the .env file under the project path and modify them before deployment. Changing the password through the user management interface after deployment is recommended.

  6. To send email notifications, first set the outgoing mailbox information in the .env file:
# Email Notification
EMAILS_ENABLED=True
FRONTEND_ENTRYPOINT=<YMIR FRONTEND URL>
SMTP_TLS=
SMTP_PORT=
SMTP_HOST=
SMTP_USER=
SMTP_PASSWORD=
EMAILS_FROM_EMAIL=<SENDER EMAIL ADDRESS>
EMAILS_FROM_NAME=ymir-project
EMAIL_RESET_TOKEN_EXPIRE_HOURS=1
EMAIL_TEMPLATES_DIR=/app/email-templates/build
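
Several of the steps above come down to editing KEY=VALUE entries in the .env file. The sketch below illustrates one way to script such edits; it is not a YMIR tool. The helper names, the replacement port 18802, and the choice of which SMTP keys count as required are all assumptions made for illustration.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, _, value = stripped.partition("=")
        values[key.strip()] = value.strip()
    return values


def set_env_value(text: str, key: str, value: str) -> str:
    """Return `text` with `key` set to `value`, appending the key if it is absent."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith(key + "="):
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def missing_email_settings(env: Dict[str, str]) -> List[str]:
    """List the mail keys (an assumed 'required' set) that are still empty."""
    required = ["SMTP_HOST", "SMTP_PORT", "SMTP_USER", "SMTP_PASSWORD", "EMAILS_FROM_EMAIL"]
    return [key for key in required if not env.get(key)]


# A small stand-in for the real .env file.
sample_env = """# nvidia for gpu, runc for cpu.
SERVER_RUNTIME=nvidia
DEPLOY_MODULE_HOST_PORT=18801
SMTP_HOST=
SMTP_PORT=
"""

updated = set_env_value(sample_env, "SERVER_RUNTIME", "runc")
updated = set_env_value(updated, "DEPLOY_MODULE_HOST_PORT", "18802")  # hypothetical free port
env = parse_env(updated)
print(env["SERVER_RUNTIME"], env["DEPLOY_MODULE_HOST_PORT"])
print(missing_email_settings(env))
```

Working on the file contents as a string keeps comments and line order intact, so the result can be written back to .env before running bash ymir.sh start.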

3. Use YMIR-GUI: typical model production process

 
 

As shown in the figure, YMIR divides the model development process into multiple steps. Details about how to run each step are listed in the subsequent sections.

Deep learning needs data and labels, and training requires a large amount of labeled data. In practice, however, most available data is unlabeled, and labeling all of it manually costs too much labor and time.

The YMIR platform therefore uses active learning. It first obtains an initial model, either by local import or by training on a small amount of labeled data, and then uses this model to mine, from a large pool of data, the samples most beneficial for improving the model. After mining is complete, only this part of the data is labeled, which expands the original training dataset efficiently.

The updated dataset is then used to train the model again and improve its capability. This is more efficient than labeling the entire dataset before training, because it reduces the cost of labeling low-value data. Through the cycle of mining, labeling, and training, the pool of high-quality data grows and the model improves.
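
The mining, labeling, and training cycle can be sketched in a few lines. The example below is a toy illustration only and makes no claim about YMIR's actual mining algorithms: the "model" is a one-dimensional threshold, mining is plain uncertainty sampling (the points closest to the threshold), and the `oracle` function stands in for a human annotator.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, int]  # (feature value, label 0 or 1)


def train(labeled: List[Sample]) -> float:
    """Toy 'model': a threshold halfway between the two class means."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2


def mine(threshold: float, pool: List[float], top_k: int) -> List[float]:
    """Pick the top_k pool items closest to the threshold (the most uncertain ones)."""
    return sorted(pool, key=lambda x: abs(x - threshold))[:top_k]


def iterate(
    labeled: List[Sample],
    pool: List[float],
    oracle: Callable[[float], int],
    rounds: int,
    top_k: int,
) -> Tuple[float, List[Sample], List[float]]:
    """Run the mine -> label -> merge -> retrain cycle for a fixed number of rounds."""
    labeled, pool = list(labeled), list(pool)
    threshold = train(labeled)
    for _ in range(rounds):
        mined = mine(threshold, pool, top_k)
        if not mined:
            break
        labeled += [(x, oracle(x)) for x in mined]   # label only the mined data
        pool = [x for x in pool if x not in mined]   # remove it from the unlabeled pool
        threshold = train(labeled)                   # retrain on the expanded dataset
    return threshold, labeled, pool


def oracle(x: float) -> int:
    """Stand-in annotator: the true class boundary is at 5.0."""
    return 1 if x >= 5.0 else 0


initial = [(1.0, 0), (7.0, 1)]
unlabeled = [0.5, 2.0, 3.5, 4.2, 4.8, 5.1, 6.0, 8.5, 9.0]
final_threshold, final_labeled, final_pool = iterate(initial, unlabeled, oracle, rounds=2, top_k=2)
print(final_threshold, len(final_labeled), len(final_pool))
```

Only four of the nine pool items are ever labeled, and they are the ones nearest the decision boundary, which is the cost saving the paragraphs above describe.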

This section uses a complete model iteration process as an example to illustrate how to use the YMIR platform. Please check Operating Instructions.

4. For advanced users: YMIR-CMD (command line) user's guide

This chapter contains the instructions for the YMIR-CMD. If you need to use the GUI, please refer to Ymir-GUI Installation.

4.1 Installation

Mode I. Pip Installation

# Requires Python >= 3.8.10
$ pip install ymir-cmd
$ mir --version

Mode II. Installation from the source

$ git clone --recursive https://github.com/IndustryEssentials/ymir.git
$ cd ymir/ymir/command
$ python setup.py clean --all install
$ mir --version

4.2 Typical model production process

[Figure: typical model production process]

The figure above shows a typical model training process: 1) the user prepares external data, 2) imports it into the system, 3) filters the data appropriately, and 4) trains an initial model (possibly with low accuracy). Based on this model, the user then 5) selects images suitable for further training from a dataset to be mined, 6) annotates these images, 7) merges the annotated results with the original training set, and 8) runs training again on the merged set to obtain a better model. This section implements the process above using the command line. For details, please check the CMD usage instructions.

5. Get the code

5.1. Code contribution

Any code in the YMIR repo should follow the coding standards and will be checked in the CI tests.

  • Functional code needs to be unit tested.

  • Use flake8 (linting) or black (formatting) to check the code before committing. The code should follow the PEP 8 and Google Python Style guides.

  • Python code must pass static type checking with mypy.

Also check out the MSFT coding style for more advice.

5.2. About training, inference, mining docker images and model package structure

Check this document for training, inference and mining details.

Check this document for model package structure details.

6. Design concept

We borrow the concept of code version control from Git to manage our data and models, and use branches to create new projects so that different tasks on the same set of images can run in parallel. Additions, retrievals, updates, and deletions of datasets, along with other basic operations, are recorded as commits to branches. Logically, each commit stores an updated version of the dataset or a new model, as well as the metadata of the operation that led to the change. Finally, only the data changes are merged into the main branch, which conceptually aggregates all the data annotated by the many projects on the platform. Please see Life of a dataset for specific design concepts.
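
To make the branch-and-commit idea concrete, here is a toy in-memory sketch. It does not reflect YMIR's actual storage format or API: the `Repo` and `Commit` classes, the dataset representation (asset id mapped to a list of labels), and the union-based merge are all assumptions chosen for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional

Dataset = Dict[str, List[str]]  # asset id -> sorted list of labels


@dataclass
class Commit:
    parent: Optional[str]  # id of the previous commit, None for the first one
    operation: str         # metadata describing what produced this version
    data: Dataset          # full dataset snapshot at this commit


def union(base: Dataset, other: Dataset) -> Dataset:
    """Combine two datasets, keeping every asset and every label."""
    merged = {asset: list(labels) for asset, labels in base.items()}
    for asset, labels in other.items():
        merged[asset] = sorted(set(merged.get(asset, [])) | set(labels))
    return merged


class Repo:
    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}
        self.branches: Dict[str, Optional[str]] = {"main": None}

    def snapshot(self, branch: str) -> Dataset:
        head = self.branches[branch]
        return union({}, self.commits[head].data) if head else {}

    def commit(self, branch: str, operation: str, changes: Dataset) -> str:
        """Record `changes` on top of the branch head and advance the branch."""
        parent = self.branches[branch]
        data = union(self.snapshot(branch), changes)
        payload = json.dumps([parent, operation, data], sort_keys=True)
        commit_id = hashlib.sha1(payload.encode()).hexdigest()[:8]
        self.commits[commit_id] = Commit(parent, operation, data)
        self.branches[branch] = commit_id
        return commit_id

    def branch(self, name: str, source: str = "main") -> None:
        """Start a new project branch at the current head of `source`."""
        self.branches[name] = self.branches[source]

    def merge_data(self, source: str, target: str = "main") -> str:
        """Merge only the data of `source` into `target`."""
        return self.commit(target, f"merge {source}", self.snapshot(source))


repo = Repo()
repo.commit("main", "import", {"img1": [], "img2": []})
repo.branch("cats")
repo.branch("dogs")
repo.commit("cats", "label cats", {"img1": ["cat"]})
repo.commit("dogs", "label dogs", {"img2": ["dog"]})
repo.merge_data("cats")
repo.merge_data("dogs")
print(repo.snapshot("main"))
```

The two project branches annotate the same images independently, and the main branch ends up holding the annotations from both, mirroring the aggregation described above.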

7. MISC

7.1. FAQ

Why did the upload of the local dataset fail?

Regardless of whether the dataset has label files, both an images folder and an annotations folder must be created. Images go in the images folder, and the format is limited to jpg, jpeg, and png. Annotation files go in the annotations folder in Pascal VOC format (if there are no annotation files, leave the folder empty). Put the images and annotations folders in the same folder and compress them into a ".zip" package (not a .rar package).
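
The packaging rules above can be checked before uploading. The sketch below is one possible helper, not a YMIR utility: `pack_dataset` is a hypothetical name, and placing images/ and annotations/ at the top level of the archive is an assumption about the expected layout.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List

ALLOWED_IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pack_dataset(dataset_dir: Path, zip_path: Path) -> List[str]:
    """Zip dataset_dir, which must contain images/ and annotations/, and return the archived names."""
    images = dataset_dir / "images"
    annotations = dataset_dir / "annotations"
    for folder in (images, annotations):
        if not folder.is_dir():
            raise FileNotFoundError(f"missing required folder: {folder.name}")
    bad = [p.name for p in images.iterdir()
           if p.is_file() and p.suffix.lower() not in ALLOWED_IMAGE_SUFFIXES]
    if bad:
        raise ValueError(f"unsupported image formats: {bad}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder in (images, annotations):
            # Write the folder entry itself so an empty annotations/ folder is preserved.
            archive.write(folder, folder.name)
            for path in sorted(folder.iterdir()):
                if path.is_file():
                    archive.write(path, f"{folder.name}/{path.name}")
        return archive.namelist()


# Demonstration on a throwaway dataset with one image and no annotations.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset"
    (root / "images").mkdir(parents=True)
    (root / "annotations").mkdir()
    (root / "images" / "a.jpg").write_bytes(b"placeholder")
    archived = pack_dataset(root, Path(tmp) / "dataset.zip")
print(archived)
```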

How should I obtain training and mining configuration files?

The default configuration template needs to be extracted from the Docker image.

The training image youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi has a configuration file template located at: /img-man/training-template.yaml

For mining and inference, the configuration file templates in youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi are located at: /img-man/mining-template.yaml (mining) and /img-man/infer-template.yaml (inference).
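
One way to copy a template out of an image is to create a stopped container and use docker cp. The sketch below builds those commands with the standard Docker CLI subcommands create, cp, and rm; the helper names and the temporary container name ymir-template-tmp are hypothetical, and this is not necessarily how the YMIR team extracts the files.

```python
import subprocess
from typing import List


def build_extract_commands(
    image: str,
    template_path: str,
    dest: str,
    container: str = "ymir-template-tmp",  # hypothetical temporary container name
) -> List[List[str]]:
    """Return the docker commands that copy `template_path` out of `image`."""
    return [
        ["docker", "create", "--name", container, image],        # create without starting
        ["docker", "cp", f"{container}:{template_path}", dest],   # copy the file to the host
        ["docker", "rm", container],                               # clean up the container
    ]


def extract_template(image: str, template_path: str, dest: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for command in build_extract_commands(image, template_path, dest):
        subprocess.run(command, check=True)


commands = build_extract_commands(
    "youdaoyzbx/ymir-executor:ymir2.0.0-yolov5-cu111-tmi",
    "/img-man/training-template.yaml",
    ".",
)
for command in commands:
    print(" ".join(command))
```

Calling extract_template with the same arguments performs the copy on a host where Docker is installed and the image is available.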

How can the trained model be used outside the system?

After successful training, the system outputs the ID of the model. You can find the corresponding file by this ID under --model-location. It is a tar file that can be extracted directly with the tar command to obtain the "mxnet" model files in parameters and JSON format.
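
The same extraction can be done from Python with the standard tarfile module. This is a minimal sketch: the file names model-symbol.json and model-0000.params inside the stand-in archive are hypothetical placeholders, not a statement of what a real YMIR model package contains.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def list_and_extract(model_tar: Path, dest: Path) -> List[str]:
    """Extract a model tar file into `dest` and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(model_tar) as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support safe extraction filters.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return names


# Demonstration with a stand-in archive; real model files live under --model-location.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    source = tmp_path / "src"
    source.mkdir()
    for name in ("model-symbol.json", "model-0000.params"):  # hypothetical file names
        (source / name).write_bytes(b"")
    model_tar = tmp_path / "model_id_placeholder"
    with tarfile.open(model_tar, "w") as archive:
        for path in sorted(source.iterdir()):
            archive.add(path, arcname=path.name)
    extracted = list_and_extract(model_tar, tmp_path / "out")
    on_disk = sorted(p.name for p in (tmp_path / "out").iterdir())
print(extracted)
```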

How do I solve deployment, debugging, and operation problems on Windows?

YMIR has not been fully tested on Windows Server, so we cannot provide support for it at this time.

How do I import models I have already trained?

See this document.

All Contributors

ymir's People

Contributors

aryalfrat, elliotmessi, fenrir-z, ijtljz8rm4yr, liule1613, liuzz07, phoenix-xhuang, pubalglib, rzjm, sun-shine6, under-chaos, windsorhwu, yance-dev, yzbx, zhang-sj930104


ymir's Issues

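Training fails partway through with an unknown error

Describe the bug
After training for a while (about a day), the task failed with an unknown error.
The training data is 2,500 images and the test data is 200 images.
The GPU is an RTX 3050 8G. The training parameters were modified: batch was changed from 64 to 16, and the image size was changed from 608 x 608 to 416 x 416.

The training error log, ymir-executor-out.log, is as follows: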
2022-07-07 04:35:28,817 - /darknet/train_watcher.py[line:43] - ERROR: error occured in handler: <function _DarknetTrainingHandler._on_best_weights_modified at 0x7ff41bca5830> and path: /out/models/yolov4_best.weights
Traceback (most recent call last):
File "/darknet/train_watcher.py", line 41, in on_modified
handler(self, src_path)
File "/darknet/train_watcher.py", line 52, in _on_best_weights_modified
export_dir='/out/models')
File "/darknet/convert_model_darknet2mxnet_yolov4.py", line 437, in run
net.load_weights(load_param_name)
File "/darknet/convert_model_darknet2mxnet_yolov4.py", line 409, in load_weights
ptr = set_data(module, ptr)
File "/darknet/convert_model_darknet2mxnet_yolov4.py", line 391, in set_data
conv_weights = weights[ptr:ptr +num_weights]
File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 511, in __getitem__
return self._get_nd_basic_indexing(key)
File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 792, in _get_nd_basic_indexing
return self._slice(key.start, key.stop)
File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 907, in _slice
start, stop, _ = _get_index_range(start, stop, self.shape[0])
File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 2343, in _get_index_range
raise IndexError('Slicing stop %d exceeds limit of %d' % (stop, length))
IndexError: Slicing stop 40305504 exceeds limit of 39606267

Environment (please complete the following information):

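Training a model in the GUI fails with "permission denied nvidia-docker"

As the title says.
After deploying according to the GUI deployment steps, the web interface is accessible and datasets can be imported successfully.
However, starting model training fails, and the log shows the error "permission denied nvidia-docker". What causes this?
The error is shown in the screenshot:
[screenshot]

In addition:

  1. Newer Docker versions no longer need the nvidia-docker command and use docker run --gpus all instead. Does this project not support that mode yet?
  2. The program seems to run as root by default, so newly created folders, logs, and so on are all owned by root. Can it be configured to run as the current user?
  3. How do I import my own Docker image into the project? I have not found this in the help documentation yet.

Thanks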

ARM Rebuild

1. cd ymir/backend/src/ymir_app/deploy/redis
2. docker build -t industryessentials/ymir-backend-redis -f Dockerfile .

After rebuilding, I ran bash ymir.sh start.
Log:
standard_init_linux.go:219: exec user process caused: exec format error

install failed


To Reproduce
Steps to reproduce the behavior:

  1. git clone the repo
  2. Check out the tag release-1.0.0
  3. Run the shell script with "bash ymir.sh start"
  4. An error message appears, as in this screenshot

[screenshot]


Environment (please complete the following information):

  • Server OS: ubuntu:20.04
  • Ymir Version: release-1.0.0


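Questions about custom Docker images

1. We started training successfully but could not find the in folder inside the container. We want to use the configuration in the in folder to adapt our own image.
2. We want to build a custom YOLOX image, but the YMIR page defaults to YOLOv4 (darknet) when starting training. Could you tell us where in the frontend and backend code this can be changed?
Thanks!
[screenshot]
[screenshot]
[screenshot]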

bash ymir.sh start

ERROR: The Compose file './docker-compose.yml' is invalid because:
networks.ymirnetwork.ipam.config value Additional properties are not allowed ('gateway' was unexpected)
Unsupported config option for services.backend: 'runtime'

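Copying a public image to the image list spins indefinitely, with errors in the log file

When copying a public image to the image list, the page keeps spinning, and the log file contains error messages.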

Desktop (please complete the following information):

  • OS: ubuntu22.04
  • Docker version 20.10.17, build 100c701
  • Docker Compose version v2.5.1
  • Version ymir-release-1.0.0

ymir_controller.log:
INFO : [20220630-01:40:36] invoker_cmd_base.py:83:server_invoke(): request:
{'user_id': '0001', 'repo_id': '000000', 'req_type': 16, 'task_id': 't0000001000000ae49a01656553236', 'singleton_op': 'industryessentials/executor-det-yolov4-training:release-1.1.0'}
async_mode: True
work_dir:
INFO : [20220630-01:40:36] utils.py:24:run_command(): starting cmd:
docker image inspect industryessentials/executor-det-yolov4-training:release-1.1.0 --format ignore_me

ERROR : [20220630-01:40:36] utils.py:27:run_command(): run cmd error:
stderr: docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by docker)
docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by docker)

stdout:
INFO : [20220630-01:40:36] utils.py:24:run_command(): starting cmd:
docker pull industryessentials/executor-det-yolov4-training:release-1.1.0

ERROR : [20220630-01:40:36] utils.py:27:run_command(): run cmd error:
stderr: docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by docker)
docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by docker)

stdout:
INFO : [20220630-01:40:36] utils.py:85:wrapper(): |-server_invoke costs 0.02s(0.00m).
INFO : [20220630-01:40:36] server.py:61:data_manage_request(): task t0000001000000ae49a01656553236 result: code: 130401
message: "docker: /usr/lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.32\' not found (required by docker)\ndocker: /usr/lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.34' not found (required by docker)\n"

Cannot log in to local YMIR server

Hello. First, thanks for the wonderful YMIR project. I just installed YMIR GUI on my computer, and the app is running, but I cannot log in after entering the user name and password. Is there a way to debug the problem, for example a log file or app output?

Suggestion: better log monitoring and standard log levels

Describe the issue
It is hard to find the log file and diagnose problems quickly.

When something goes wrong, we need to find the current log at ymir-workplace/sandbox/work_dir/TaskTypeTraining/{taskid}/sub_task/t0xxxxxxxx/out/ymir-executor-out.log

Also, the log file is very large, and many of the entries are not very helpful. I suggest adding log level control such as debug/info/error.
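
Until such level control exists, large logs can be filtered after the fact. The sketch below is a hypothetical helper, not part of YMIR: it assumes level markers of the form "INFO :" or "ERROR:" as seen in the log excerpts on this page, and the sample lines are abbreviated from those excerpts.

```python
import re
from typing import Iterable, List

LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40}
# Matches markers such as "INFO : [...]" and "... - ERROR: ...".
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR)\b\s*:")


def filter_by_level(lines: Iterable[str], minimum: str = "ERROR") -> List[str]:
    """Keep only the lines whose level marker is at or above `minimum`."""
    threshold = LEVELS[minimum]
    kept = []
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match and LEVELS[match.group(1)] >= threshold:
            kept.append(line)
    return kept


sample_log = [
    "INFO : [20220630-01:40:36] utils.py:24:run_command(): starting cmd:",
    "ERROR : [20220630-01:40:36] utils.py:27:run_command(): run cmd error:",
    "DEBUG : [20220621-01:10:25] connectionpool.py:228:_new_conn(): Starting new HTTP connection",
    "2022-07-07 04:35:28,817 - /darknet/train_watcher.py[line:43] - ERROR: error occured in handler",
]
errors = filter_by_level(sample_log)
for line in errors:
    print(line)
```

In practice the lines would be read from the ymir-executor-out.log file at the path given above. Lines without a level marker, such as traceback frames, are dropped by this simple filter.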

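Inference with the yolov4 image fails, and other public images fail with a container error

Hello,

I have run into two problems:

  1. My project trains normally with yolov4, but after training, the inference function reports that result.yaml and infer-result.json cannot be found.
    [screenshot]

  2. When training with public images other than yolov4, such as yolov5 or mmdet, I get "Error: Could not load UVM kernel module. Is nvidia-modprobe installed". However, nvidia-modprobe is installed correctly.
    [screenshot]
    [screenshot]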

Support for cluster-mode deployment?

Is cluster-mode deployment supported?
I have deployed and tested YMIR, and it is very cool, but I have multiple GPU nodes. Is cluster-mode deployment supported?
If not, are there plans to support it?

label-free can't work

Describe the issue
label-free does not work when I click Label.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'Label'
  2. Upload my dataset (data.zip just contains 18 pictures)
  3. See error

Screenshots

I want to label a new image dataset. After I uploaded the zip file, which contains just 18 pictures, it kept loading for nearly a day with no result, even though I had imported a dataset with 1k pictures in 10 seconds.

[Screenshot from 2022-05-14 15-18-07]

When I open the label-free site, it also shows that data is being read.

[Screenshot from 2022-05-14 14-51-03]

When I click the Label or Import button, it reports "The current project is reading data. Please refresh the project list and try again".
[Screenshot from 2022-05-14 14-57-48]

Desktop (please complete the following information):

  • OS: [Ubuntu 20.04 LTS]
  • Browser [chrome]
  • docker-version 20.10

Additional context
I also got the label-free repo via git clone https://github.com/IndustryEssentials/label-free.git, and it has the same problem. Is there something wrong with the label-free platform, or am I doing something wrong?

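Provide a YMIR developer manual

It would be helpful to provide technical architecture documentation and an overview of YMIR's design.
Ideally there would also be a document explaining how to set up a locally debuggable YMIR environment on Ubuntu,
so that code enthusiasts can understand YMIR more easily.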

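Data import fails in the CMD version

Following the current tutorial, the labels file for the CMD version is placed under mir_demo_repo, but in practice this produces the error "repo is dirty".
[screenshot]
Without the labels file, however, the labels are not recognized.
[screenshot]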

[Backend]tox error

I want to modify some backend code, but tox fails and I cannot proceed.
===================================================== log end =====================================================
ERROR: could not install deps [-rrequirements.txt, -rrequirements-dev.txt]; v = InvocationError('/Users/xxx/Documents/code/ymir/ymir/backend/.tox/python/bin/python -m pip install -rrequirements.txt -rrequirements-dev.txt', 1)
_____________________________________________________ summary _____________________________________________________

Model Verification:docker images only had sample_image

Describe the issue
Model Verification: the Docker image list only contains sample_image.

To Reproduce
After I finished training a model (with yolov4) and wanted to verify it, the Docker image list only contained sample images.

Screenshots
[screenshot]
[screenshot]

Env

  • OS: ubuntu:18.04
  • Ymir Version [ release-1.0.0]
  • Docker version 20.10.16, build aa7e414
  • Docker Compose version v2.5.1


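Cannot connect to the labeling platform

Using a dataset, I select the Label button in the menu bar, enter an email address, and click Label, but the labeling platform does not receive the task. The error is shown below. The platform worked right after it was set up, but labeling stopped working later. I tried both label_studio and label_free with the same result.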
INFO : [20220621-01:10:25] invoker_cmd_base.py:83:server_invoke(): request:
{'user_id': '0001', 'repo_id': '000023', 'req_type': 1001, 'task_id': 't00000010000232949781655773825', 'task_parameters': '{"dataset_id": 125, "keywords": ["cat", "person"], "extra_url": null, "labellers": ["[email protected]"], "keep_annotations": true, "validation_dataset_id": null, "network": null, "backbone": null, "hyperparameter": null, "model_id": null, "mining_algorithm": null, "top_k": null, "generate_annotations": null, "docker_image": null, "docker_image_id": null}', 'req_create_task': {'task_type': 3, 'labeling': {'dataset_id': 't0000001000023ec62961655706852', 'labeler_accounts': ['[email protected]'], 'in_class_ids': [4, 3], 'project_name': 'label_$sample_project_None_1655706551.9790225_mining_dataset_3', 'export_annotation': True}}}
async_mode: True
work_dir: /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825
INFO : [20220621-01:10:25] percent_log_util.py:65:write_percent_log(): writing task info to /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/out/monitor.txt
t00000010000232949781655773825 1655773825.686129 0.0 2
INFO : [20220621-01:10:25] invoker_task_base.py:73:create_subtask_workdir_monitor(): task t00000010000232949781655773825 logging weights:
{'/home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/out/monitor.txt': 1.0}

DEBUG : [20220621-01:10:25] connectionpool.py:228:_new_conn(): Starting new HTTP connection (1): 127.0.0.1:9098
DEBUG : [20220621-01:10:25] connectionpool.py:456:make_request(): http://127.0.0.1:9098 "POST /api/v1/tasks HTTP/1.1" 200 20
INFO : [20220621-01:10:25] invoker_task_base.py:159:task_invoke(): processing subtask 0
INFO : [20220621-01:10:25] utils.py:85:wrapper(): |-server_invoke costs 0.01s(0.00m).
INFO : [20220621-01:10:25] server.py:61:data_manage_request(): task t00000010000232949781655773825 result:
INFO : [20220621-01:10:25] invoker_task_labeling.py:25:subtask_invoke_0(): labeling_request: dataset_id: "t0000001000023ec62961655706852"
labeler_accounts: "[email protected]"
in_class_ids: 4
in_class_ids: 3
project_name: "label
$sample_project_None_1655706551.9790225_mining_dataset_3"
export_annotation: true

INFO : [20220621-01:10:25] label_runner.py:56:start_label_task(): start label task!!!
INFO : [20220621-01:10:25] utils.py:24:run_command(): starting cmd:
mir export --root /home/qianyan/ymir/ymir/ymir-workplace/sandbox/0001/000023 --media-location /home/qianyan/ymir/ymir/ymir-workplace/ymir-assets --asset-dir /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/label_t00000010000232949781655773825/Images --annotation-dir /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/label_t00000010000232949781655773825/Images --src-revs t0000001000023ec62961655706852@t0000001000023ec62961655706852 --format ls_json -w /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/export_work_dir --cis cat;person

INFO : [20220621-01:10:26] utils.py:30:run_command(): run cmd succeed:
missing annotations: 0, empty annotations: 0 out of 2 assets
git result:
M annotations.mir
M context.mir
M keywords.mir
M metadatas.mir
M tasks.mir

[exporting-task-1655773826.1539779 43ab7ee] export from t0000001000023ec62961655706852@t0000001000023ec62961655706852
5 files changed, 17 deletions(-)
command done: exporting-task-1655773826.1539779@exporting-task-1655773826.1539779, return code: 0
|-cmd_run costs 0.14s(0.00m).

INFO : [20220621-01:10:26] label_free.py:168:run(): start LABELFREE run()
DEBUG : [20220621-01:10:26] connectionpool.py:228:_new_conn(): Starting new HTTP connection (1): xxxxx:8763
INFO : [20220621-01:12:36] percent_log_util.py:65:write_percent_log(): writing task info to /home/qianyan/ymir/ymir/ymir-workplace/sandbox/work_dir/TaskTypeLabel/t00000010000232949781655773825/sub_task/t00000010000232949781655773825/out/monitor.txt
t00000010000232949781655773825 1655773956.380800 1.0 4 130603 HTTPConnectionPool(host='xxxxx', port=8763): Max retries exceeded with url: /api/projects (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 95, in create_connection
raise err
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 85, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 398, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 239, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/usr/lib/python3.8/http/client.py", line 1256, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
self.send(msg)
File "/usr/lib/python3.8/http/client.py", line 951, in send
self.connect()
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 205, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 440, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='xxxxxx', port=8763): Max retries exceeded with url: /api/projects (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/app/ymir_controller/controller/label_model/base.py", line 19, in wrapper
_ret = f(*args, **kwargs)
File "/app/ymir_controller/controller/label_model/label_free.py", line 169, in run
project_id = self.create_label_project(project_name, keywords, collaborators, expert_instruction)
File "/app/ymir_controller/controller/label_model/label_free.py", line 55, in create_label_project
resp = self._requests.post(url_path=url_path, json_data=data)
File "/app/ymir_controller/controller/label_model/request_handler.py", line 23, in post
resp = requests.post(
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 117, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='xxxxx', port=8763): Max retries exceeded with url: /api/projects (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2d4c251bb0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

INFO : [20220621-01:12:36] label_runner.py:78:start_label_task(): finish label task!!!`
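
The log above shows the controller opening a connection to port 8763 on the (masked) labeling host and giving up with [Errno 110] Connection timed out. A timeout, as opposed to a refusal, usually means no reply came back at all, for example because of a wrong address, a firewall, or a stopped host. A minimal sketch for checking plain TCP reachability from the same environment the controller runs in follows; the function name and the placeholder host are illustrative and not part of YMIR.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False


# Example call (replace the placeholder with the host configured for the labeling tool):
# can_connect("label-host.example", 8763)
```

If this returns False from inside the backend container but True from the host machine, the address configured for the labeling tool is probably not reachable from inside Docker.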

Why is my NVIDIA driver broken?

Describe the issue
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown

To Reproduce
Steps to reproduce the behavior:

  1. Run 'bash ymir.sh start'
  2. See error

Expected behavior
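I ran bash ymir.sh start, opened the web page, and uploaded my own dataset zip file successfully. When I clicked label to open the labeling page, the page could not be displayed. After a reboot I found a problem with the GPU driver: my monitor misbehaved, and nvidia-smi reported that no GPU driver was present. Why did this break my GPU driver? When I run bash ymir.sh start again, I can open the web page, but after a while the terminal reports an error. The web page stays up.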
Screenshots
Screenshot from 2022-05-13 09-37-44
Screenshot from 2022-05-13 09-42-17
Screenshot from 2022-05-13 09-45-27

Desktop (please complete the following information):

  • OS: [Ubuntu20.04 LTS]
  • Browser [Chrome version 101.0.4951.64]
  • Version [e.g. 22]
  • docker-version:20.10
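
The message "nvml error: driver not loaded" in the error above usually indicates that the NVIDIA kernel module is not loaded on the host, which would also match nvidia-smi reporting no driver. A small sketch for confirming this on a Linux host by reading /proc/modules follows; the helper names are illustrative, and this only reports the module state, not why it changed.

```python
from pathlib import Path


def nvidia_module_loaded(proc_modules_text: str) -> bool:
    """Return True if a kernel module named exactly 'nvidia' is listed."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first column of /proc/modules is the module name.
        if fields and fields[0] == "nvidia":
            return True
    return False


def check_host(path: str = "/proc/modules") -> bool:
    """Read the module list from `path` (Linux only) and check for 'nvidia'."""
    p = Path(path)
    return p.exists() and nvidia_module_loaded(p.read_text())
```

If check_host() returns False after a reboot, one common cause is a kernel update without a matching driver rebuild, though nothing in the report confirms that this is what happened here.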

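Changing MYSQL_PASSWORD makes login fail after the initial startup

Describe the bug
When there is no ymir-workplace yet (that is, on the first initialization startup), changing MYSQL_PASSWORD in .env makes it impossible to log in to the admin account after startup. Inspecting the MySQL database shows no tables at all. (I tried many times; initializing without changing MYSQL_PASSWORD does not have this problem.)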

To Reproduce
Steps to reproduce the behavior:

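  1. Delete ymir-workplace (if it exists)
  2. Change MYSQL_PASSWORD in .env
  3. Start with bash ymir.sh start
  4. Log in with [email protected]

As an aside: I found this because I had been using an older version of ymir and still had data from that version. After updating the code with git pull, startup looked normal, but going from labeling a dataset to registering an account on the labeling platform (label_studio) returned a 307. Only then did I resort to deleting ymir-workplace; after deleting it and reinitializing, the redirect worked normally. I don't know whether this was a one-off or also a bug. (If it is a bug, will it recur on future version upgrades? If it cannot be reproduced and was a one-off, it may be a problem on my side.)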

upload file error

To Reproduce
Steps to reproduce the behavior:

  1. Prepare the source images
  2. Include the annotations
  3. Zip them into one file
  4. Upload

Expected behavior
The archive contains seventy files, but the result after upload is empty.
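
One thing worth checking before uploading is what the archive actually contains and how it is laid out, since an extra wrapping folder or misplaced annotations could make an importer see nothing. The sketch below counts files per top-level folder in a zip; it does not encode YMIR's required layout (the report does not state it), and the folder names in the usage note are only examples.

```python
import zipfile
from collections import Counter


def summarize_zip(source) -> Counter:
    """Count files per top-level folder; files at the archive root count under '.'.

    `source` may be a path or a binary file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            parts = info.filename.split("/")
            top = parts[0] if len(parts) > 1 else "."
            counts[top] += 1
    return counts
```

For example, summarize_zip("dataset.zip") might return Counter({"images": 70, "annotations": 70}); a single unexpected top-level key wrapping everything is a hint that the archive has one more folder level than intended.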

Running YMIR in WSL2 fails

Hi
I fail to start YMIR on my Windows PC using WSL2. I think this is because Docker Compose is v2 while your code uses v1 (or something similar). See the execution trace below. I have checked that the NVIDIA GPU drivers etc. are correctly installed.
/Tomas

Using default tag: latest
latest: Pulling from industryessentials/executor-det-yolov4-training
Digest: sha256:ab7fd377e7945ad668547921ee8b0ddd2a24a55c655614cc836ff5e5d93e1855
Status: Image is up to date for industryessentials/executor-det-yolov4-training:latest
docker.io/industryessentials/executor-det-yolov4-training:latest
Using default tag: latest
latest: Pulling from industryessentials/executor-det-yolov4-mining
Digest: sha256:17f6e5bf7192780acf897a8e24b6c10b280287daa4953e67e27d17bb483f6477
Status: Image is up to date for industryessentials/executor-det-yolov4-mining:latest
docker.io/industryessentials/executor-det-yolov4-mining:latest
[+] Running 6/6
⠿ Container ymir-clickhouse-1 Removed 0.1s
⠿ Container ymir-db-1 Removed 0.1s
⠿ Container ymir-viz-redis-1 Removed 0.1s
⠿ Container ymir-tensorboard-1 Removed 0.1s
⠿ Container ymir-redis-1 Removed 0.1s
⠿ Network ymir_ymirnetwork Removed 0.2s

in prod mode, pulling images.
[+] Running 7/7
⠿ backend Pulled 1.6s
⠿ viz-redis Pulled 1.6s
⠿ web Pulled 1.6s
⠿ clickhouse Pulled 1.6s
⠿ tensorboard Pulled 1.6s
⠿ redis Pulled 1.6s
⠿ db Pulled 1.6s
[+] Running 6/6
⠿ Network ymir_ymirnetwork Created 0.0s
⠿ Container ymir-tensorboard-1 Created 0.1s
⠿ Container ymir-redis-1 Created 0.1s
⠿ Container ymir-viz-redis-1 Created 0.1s
⠿ Container ymir-clickhouse-1 Created 0.1s
⠿ Container ymir-db-1 Created 0.1s
⠋ Container ymir-backend-1 Creating 0.0s
Error response from daemon: Unknown runtime specified nvidia
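
The final line of the trace, "Unknown runtime specified nvidia", is what the Docker daemon reports when a container requests a runtime named nvidia that the daemon does not know about, so the Compose version may not be the cause. A hedged sketch for checking whether a daemon configuration file registers that runtime follows. It assumes the configuration is a JSON file such as /etc/docker/daemon.json; Docker Desktop with WSL2 keeps its settings elsewhere, and runtimes can also be registered through daemon flags, which this check does not see.

```python
import json


def has_nvidia_runtime(daemon_json_text: str) -> bool:
    """Return True if the daemon config text registers a runtime named 'nvidia'."""
    try:
        config = json.loads(daemon_json_text or "{}")
    except json.JSONDecodeError:
        return False
    if not isinstance(config, dict):
        return False
    runtimes = config.get("runtimes", {})
    return isinstance(runtimes, dict) and "nvidia" in runtimes


# Example (path is an assumption; adjust for your installation):
# from pathlib import Path
# print(has_nvidia_runtime(Path("/etc/docker/daemon.json").read_text()))
```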

bash ymir.sh start does not succeed

Describe the issue
My system is Windows 10, with Docker and WSL 2 already installed, and I have checked that Docker runs through WSL 2. But when I go on to the next step, "bash ymir.sh start", it downloads something like yolo-mining-xxxx and then closes by itself. I can't identify the error from its output.

To Reproduce
My installation step:

  1. Installed Docker and restarted the computer
  2. Installed WSL 2, set Docker to use WSL 2, then restarted
  3. Used vi in WSL to change '$\r' to '\x'; without this step it shows something like '$\r' not found .....
  4. Ran "bash ymir.sh start", which then automatically downloaded what it needs
  5. It closed
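
The '$\r' not found message in step 3 is typical of a shell script saved with Windows (CRLF) line endings, which can happen when a repository is checked out on Windows and then run under WSL. Assuming that is the cause here, a small sketch that normalizes line endings without hand-editing in vi follows; the function names are illustrative.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite `path` in place with LF endings; return True if it was changed."""
    p = Path(path)
    original = p.read_bytes()
    converted = crlf_to_lf(original)
    if converted != original:
        p.write_bytes(converted)
        return True
    return False


# Example: convert_file("ymir.sh")
```

Other files read by the script, such as .env, may need the same treatment if they were checked out the same way.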

Expected behavior
YMIR starts.
Screenshots
image

Desktop (please complete the following information):

  • OS: windows 10
  • Browser: not use


Cannot pull industryessentials/executor-det-yolov4-training from the Aliyun mirror

docker pull industryessentials/executor-det-yolov4-training

You can see that layer 11323ed2c653 shows as Downloading, but there appears to be nothing to download.

8412b44cad21: Pulling fs layer

Using default tag: latest
latest: Pulling from industryessentials/executor-det-yolov4-training
11323ed2c653: Downloading
aac8cf1d1c79: Download complete
fc85802d11de: Download complete
5ecaf0dceab7: Download complete
52ea4560e85a: Download complete
de9ca64a95e6: Download complete
e4165bf6e171: Download complete
77cc6fb46c25: Downloading [================================>                  ]  519.5MB/804.9MB
f54c7e16b308: Download complete
4e21240e72c2: Downloading [=================================>                 ]  359.9MB/532.4MB

It then pulls again from Docker Hub, which is very slow in China. Please provide another Docker registry in China.

image

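Error downloading the data package with the LabelFree labeling tool

Hello, I'd like to ask how to resolve the download timeout shown for tasks with more than 1000 images. I have already tried re-pulling the label free image, but the problem persists.
1) docker-compose -f docker-compose.labelfree.yml pull
2) bash ymir.sh start
WechatIMG402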

Opening the public images in the image list reports that sharing images failed

ymir_app.log shows:
Traceback (most recent call last):
File "/app/ymir_app/app/api/api_v1/endpoints/images.py", line 142, in get_shared_images
shared_images = await get_shared_images_from_github(
File "/usr/local/lib/python3.8/dist-packages/fastapi_cache/decorator.py", line 49, in inner
ret = await func(*args, **kwargs)
File "/app/ymir_app/app/api/api_v1/endpoints/images.py", line 154, in get_shared_images_from_github
shared_images = get_github_table(url, timeout=timeout)
File "/app/ymir_app/app/utils/github.py", line 37, in get_github_table
tbl = get_markdown_table(url, timeout)
File "/app/ymir_app/app/utils/github.py", line 13, in get_markdown_table
resp = requests.get(url, headers=HEADERS, timeout=timeout)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /IndustryEssentials/ymir/master/docker_executor/public_index.md (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f651cab3400>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2022-06-30 00:17:20 +0000] [52] [INFO] 172.168.254.8:59090 - "GET /api/v1/images/shared HTTP/1.0" 200

Which URL is it failing to connect to, and how can I fix this?
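
The target address can be read from the last exception line in the traceback: the host, port, and path are all in the HTTPSConnectionPool(...) message. As a convenience, the sketch below rebuilds the full URL from such a message; the regular expression is written against the message format seen in the logs on this page, and the function name is illustrative.

```python
import re

# Matches messages like:
# HTTPSConnectionPool(host='example.com', port=443): Max retries exceeded with url: /path
_POOL_RE = re.compile(
    r"(?P<scheme>HTTPS?)ConnectionPool\(host='(?P<host>[^']+)', port=(?P<port>\d+)\): "
    r"Max retries exceeded with url: (?P<path>\S+)"
)


def url_from_pool_error(message: str):
    """Return the URL a requests/urllib3 pool error refers to, or None if not found."""
    m = _POOL_RE.search(message)
    if m is None:
        return None
    scheme = m.group("scheme").lower()
    host, port, path = m.group("host"), int(m.group("port")), m.group("path")
    default_port = 443 if scheme == "https" else 80
    netloc = host if port == default_port else f"{host}:{port}"
    return f"{scheme}://{netloc}{path}"
```

Applied to the error above, this yields the public_index.md file under raw.githubusercontent.com, so the backend container needs outbound HTTPS access to that host for the shared image list to load.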

ymir-workplace has no dir named "ymir-sharing", version Ymir-release-1.0.0

Thanks for kindly releasing this fun project! I tried YMIR on my PC and launched it successfully, but when I went to import voc2012 I could not find "ymir-sharing", so I created it and put voc2012 under that directory. Whatever I do, I cannot import the voc2012 dataset in ymir-gui. Please help me, thanks!

RC_CMD_ERROR_UNKNOWN: unkown error

My YMIR service is running in a virtual machine. When I train a model, it reports an error:

model_group_23 > Train
Model Detail
Model Name model_group_23 V2 mAP 0.00%
Task State
Current State Invalid
Failure
Error Reason RC_CMD_ERROR_UNKNOWN: unkown error

My env:
1. VM
2. CentOS 7
3. Docker version 20.10.16
4. docker-compose version 1.27.2
5. YMIR config => SERVER_RUNTIME=runc

GPU not recognized

Describe the issue
GPU not recognized

To Reproduce
1、YMIR:
ymir

2、Nvidia Info

nmsi
nver

Environment (please complete the following information):

  • OS: ubuntu:18.04
  • Ymir Version [ release-1.0.0]
  • Docker version 20.10.16, build aa7e414
  • Docker Compose version v2.5.1
  • Nvidia-docker version:
    NVIDIA Docker: 2.6.0
    Client: Docker Engine - Community
    Version: 20.10.16
    API version: 1.41
    Go version: go1.17.10
    Git commit: aa7e414
    Built: Thu May 12 09:17:23 2022
    OS/Arch: linux/amd64
    Context: default
    Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.16
API version: 1.41 (minimum version 1.12)
Go version: go1.17.10
Git commit: f756502
Built: Thu May 12 09:15:28 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.4
GitCommit: 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
runc:
Version: 1.1.1
GitCommit: v1.1.1-0-g52de29d
docker-init:
Version: 0.19.0
GitCommit: de40ad0

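The UI does not navigate properly

At first I tried installing as the test user and it failed. Later, YMIR platform engineers helped remotely and installed it as root, after which the software worked normally. Many thanks to the YMIR platform engineers for their support.
Because this is a virtual machine running on a laptop, with the images running inside that VM, the problem may stem from an unclean shutdown of the VM (the laptop ran out of battery). Afterwards the main page could be opened but not used normally. I asked the platform engineers, who said it might be a permissions issue, so I reinstalled as the test user (removed all images, then downloaded and installed again). During installation, the only message reported was "Found orphan containers (ymir-master_label_redis_1, ymir-master_label_api_1, ymir-master_label_minio_1, ymir-master_label_nginx_1, ymir-master_label_mysql_1, ymir-master_label_celery_worker_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up" (I'm not sure whether "Found orphan containers ......." counts as a problem). The situation is the same as before: the UI opens, and the sample project created earlier is still there (I don't understand why it is still there), but clicking the sample project does not navigate to its detail page, and clicking the "Label Management" button in the UI still does not navigate or display properly.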

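GPU load issue

env:
OS: Ubuntu 20.04
GPU: NVIDIA 2080 Ti, reference-design card

Description:
During training, the GPU usage information shows that only half of the GPU's resources are in use.

微信图片_20220705143754

Request:
I want to train with the GPU fully loaded. How do I configure this?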

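Cannot navigate properly from the UI to the Label Studio labeling system

labelstudio is correctly configured and started, and the docker ps output looks normal.
In the ymir web UI, I click "Go to labeling platform" under Data.
image
labelstudio opens, but the configured token is not valid and I have to log in.
After logging in and clicking "Go to labeling platform" again, labelstudio is empty and cannot load the data from the ymir platform.
The two systems appear to be independent.
image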

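A suggestion about a not-quite-bug under an unconventional operation

Description:
When the project already has a ymir-workplace (that is, the project already has data), moving the whole ymir folder to another path and redeploying (or renaming the parent folder) causes the preprocessing tasks and labeling tasks of all newly created datasets to hang at 0%.

Steps:
1. Change the path/name of ymir's parent folder (owners and permissions of the files and folders under ymir-workplace stay the same as before)
2. Redeploy
3. Create a new dataset preprocessing task (such as dataset sampling) or a labeling task

Error shown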
ERROR : [2022-06-07 03:07:11,538] Job "update_monitor_percent_log (trigger: interval[0:00:20], next run at: 2022-06-07 03:07:31 UTC)" raised an exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/apscheduler/executors/base.py", line 125, in run_job
retval = job.func(*job.args, **job.kwargs)
File "ymir_monitor/monitor/utils/crontab_job.py", line 55, in update_monitor_percent_log
runtime_log_content = PercentLogHandler.parse_percent_log(log_path)
File "/app/common/common_utils/percent_log_util.py", line 31, in parse_percent_log
with open(log_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory:

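Looking for the cause, I found that it is probably because raw_log_contents in MONITOR_FINISHED_KEY:v1 in the redis database stores absolute paths.
After modifying the value in MONITOR_FINISHED_KEY:v1 and restarting, things recovered.

P.S. I don't think this counts as a bug, since it is an unconventional deployment operation. I wasn't going to report it, as most people won't rename or move the path and then restart. But would it be better for redis to store relative paths? After all, YMIR_PATH is set in .env.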
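
To illustrate the relative-path suggestion: the idea is to store each log path relative to a configured root (YMIR_PATH from .env, per the report) and to resolve it against the current root when reading. The sketch below shows only that idea. It is not YMIR's monitor code, the function names are hypothetical, and the example paths in the usage note are made up.

```python
import posixpath


def to_relative(abs_path: str, root: str) -> str:
    """Express `abs_path` relative to `root` before storing it."""
    return posixpath.relpath(abs_path, start=root)


def to_absolute(rel_path: str, root: str) -> str:
    """Resolve a stored relative path against the current `root` at read time."""
    return posixpath.normpath(posixpath.join(root, rel_path))
```

With this scheme, a path stored as sandbox/work_dir/t1/out/monitor.txt while the root was /home/a/ymir/ymir-workplace resolves to /data/ymir/ymir-workplace/sandbox/work_dir/t1/out/monitor.txt after the folder is moved to /data/ymir, so stored entries would survive a move without manual edits.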
