
deepvac / mlab


The "cloud" in "alchemist on the cloud" (云上炼丹师)

License: GNU General Public License v3.0

Shell 12.37% Prolog 49.56% Dockerfile 38.07%
conda cuda deepvac docker firefox ibus k8s kde kdenlive konsole kubernetes mkl pytorch realvnc ubuntu vnc vscode

mlab's Introduction

DeepVAC

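DeepVAC provides engineering standards for PyTorch-based AI projects. To that end, DeepVAC includes:

The internal logic of many PyTorch AI projects is largely the same, so DeepVAC aims to extract the more general logic. This gives project code an advantage in correctness, readability, and maintainability.

To make an AI project conform to the DeepVAC standard, read the DeepVAC standard carefully. To understand how the deepvac library is designed, read the deepvac library design document.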

How to build your own PyTorch AI project on DeepVAC

1. Read the DeepVAC standard

A quick skim is enough to form a first impression.

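2. Prepare the environment

DeepVAC depends on:

  • Python3. Python2 is not supported, since it has been deprecated;
  • packages: torch, torchvision, tensorboard, scipy, numpy, cv2, Pillow;

Install these dependencies yourself with pip (or git clone); the details are not repeated here.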
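Before installing deepvac, you may want to confirm that the dependencies above can be imported. The following is a minimal sketch that uses only the standard library. The module list is an assumption that maps the packages above to their import names: cv2 is provided by the opencv-python package, and PIL is provided by Pillow.

```python
import importlib.util

# Import names for the dependencies listed above (assumed mapping:
# cv2 comes from opencv-python, PIL comes from Pillow).
DEEPVAC_DEPS = ["torch", "torchvision", "tensorboard", "scipy", "numpy", "cv2", "PIL"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DEEPVAC_DEPS)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all DeepVAC dependencies found")
```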

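For most users, the most convenient and efficient option is to use MLab HomePod as the DeepVAC environment. It is a prebuilt Docker image that spares users unnecessary environment setup. Inside the MLab organization, we also use MLab HomePod for day-to-day model training.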

3. Install the deepvac library

Install it with pip:
pip3 install deepvac
or
python3 -m pip install deepvac

If you need the latest deepvac code from GitHub, use developer mode as follows:

Developer mode

  • Clone the project locally: git clone https://github.com/DeepVAC/deepvac
  • Add the following to your entry file:
import sys
# replace with your local deepvac directory
sys.path.insert(0, '/home/gemfield/github/deepvac')

Or set the PYTHONPATH environment variable:

export PYTHONPATH=/home/gemfield/github/deepvac
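In developer mode it is easy to import a pip-installed copy by accident instead of your clone. Here is a small standard-library sketch that reports which file Python would import a module from. The helper name module_origin is hypothetical.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file path Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # With the sys.path.insert or PYTHONPATH setup above, this should point
    # inside your local clone rather than into site-packages.
    print(module_origin("deepvac"))
    print(sys.path[:3])
```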

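4. Create your own PyTorch project

  • Initialize your project's git repository;
  • Create the first research branch in the repository, e.g. a branch named LTS_b1_aug9_movie_video_plate_130w;
  • Switch to the LTS_b1 branch above and start working;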

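5. Write the configuration file

The configuration file is always named config.py and lives in your project's root directory. Add from deepvac import new, AttrDict at the top of the file; all user configuration is stored in this file. The config module provides 6 predefined scopes: config.core, config.aug, config.cast, config.datasets, config.backbones, config.loss. Use them as follows:

  • All trainer-related configuration (including train, val, and test) is defined in config.core.<my_train_class>;
  • All configuration for augmentation modules in deepvac.aug is defined in config.aug.<my_aug_class>;
  • All model-conversion configuration is defined in config.cast.<the_caster_class>;
  • All Dataset-related configuration is defined in config.datasets.<my_dataset_class>;
  • All loss-related configuration is defined in config.loss.<my_loss_class>;
  • Users can open their own scopes, e.g. config.my_stuff = AttrDict(), then config.my_stuff.name = 'gemfield';
  • Users can use new() to initialize a config instance and clone() to deep-copy config items.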
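The sketch below makes the scope layout above concrete. AttrDict and new are minimal hypothetical stand-ins written for illustration, not deepvac's implementations. Two details are assumptions: that new takes the trainer class name, as in new("MyTrain"), and that clone() is a method on the config. In a real project, import the real names from deepvac as described above.

```python
import copy


class AttrDict(dict):
    """Hypothetical stand-in for deepvac's AttrDict: a dict with attribute access."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            # Raising AttributeError keeps hasattr() and copy.deepcopy() working.
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        self[key] = value

    def clone(self):
        # Deep copy, so edits to the clone never leak back into the original.
        return copy.deepcopy(self)


def new(train_class_name):
    """Hypothetical stand-in for deepvac's new(): build the six predefined scopes."""
    config = AttrDict()
    for scope in ("core", "aug", "cast", "datasets", "backbones", "loss"):
        config[scope] = AttrDict()
    config.core[train_class_name] = AttrDict()
    return config


# config.py-style usage
config = new("MyTrain")
config.core.MyTrain.batch_size = 8
config.core.MyTrain.log_dir = "output/log"
config.my_stuff = AttrDict()          # a user-defined scope
config.my_stuff.name = "gemfield"
```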

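More configuration topics:

  • loading pretrained models;
  • loading checkpoints;
  • using TensorBoard;
  • using TorchScript;
  • converting to ONNX;
  • converting to NCNN;
  • converting to CoreML;
  • converting to TensorRT;
  • converting to TNN;
  • converting to MNN;
  • enabling quantization;
  • enabling EMA;
  • enabling automatic mixed precision (AMP) training;

For these, and for a more detailed explanation of the configuration file, read the config documentation.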

train.py in the project root references config.py as follows:

from config import config as deepvac_config
from deepvac import DeepvacTrain

class MyTrain(DeepvacTrain):
    ......

my_train = MyTrain(deepvac_config)
my_train()

test.py in the project root references config.py as follows:

from config import config as deepvac_config
from deepvac import Deepvac

class MyTest(Deepvac):
    ......

my_test = MyTest(deepvac_config)
my_test()

From then on, code in train.py/test.py reads and writes config.core items as follows:

print(self.config.log_dir)
print(self.config.batch_size)
......

Because config plays such a central role, deepvac also provides the following APIs to make the config module easier to use:

  • AttrDict
  • new
  • interpret
  • fork
  • clone
from deepvac import AttrDict, new, interpret, fork

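For how to use these APIs, see the config API documentation.

6. Write synthesis/synthesis.py (optional)

This file generates the project's dataset and automatically checks and cleans it. This step is optional; if you need it, refer to the implementation of the Synthesis2D project under the DeepVAC organization.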
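As an illustration of the "automatic checking and cleaning" step, here is a hypothetical sketch that is not taken from Synthesis2D. It filters a label file in the "image_path label" format used by train.txt in step 8, and drops malformed lines and lines whose image file does not exist.

```python
import os


def clean_label_file(lines, image_root):
    """Split 'path label' lines into (kept, dropped).

    A line is dropped if it does not have exactly two space-separated fields,
    or if the image it points to does not exist under image_root.
    """
    kept, dropped = [], []
    for line in lines:
        fields = line.strip().split(" ")
        if len(fields) != 2 or not os.path.isfile(os.path.join(image_root, fields[0])):
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped
```

A synthesis script might write `kept` back to train.txt and log `dropped` for manual review.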

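7. Write aug/aug.py (optional)

This file implements the data augmentation strategy. The deepvac.aug module defines its own syntax for data augmentation and supports reuse at two levels: aug and composer. For example, to reuse SpeckleAug, which adds random speckles: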

from deepvac.aug.base_aug import SpeckleAug

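This reuses a low-level aug operator. We can also reuse a composer someone else has written, just as directly. For example, deepvac.aug provides RetinaAugComposer for face-detection data augmentation: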

from deepvac.aug import RetinaAugComposer

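That covers direct reuse. Projects more often need custom extensions, and in most cases they also need to reuse torchvision's transforms Compose. How is that handled? First, a clarification: composer is a deepvac.aug concept, while Compose is a torchvision transforms concept. The similar names are purely a coincidence.

Extending your own composer is also simple. For example, I can define a custom composer (named GemfieldComposer) that uses or reuses the following augmentation logic:

  • Compose as defined by torchvision transforms;
  • deepvac's built-in aug operators;
  • aug operators I wrote myself.

For more detailed steps, see: Using the deepvac.aug module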
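A composer that mixes torchvision transforms, built-in operators, and your own operators can be sketched in plain Python. Everything here is hypothetical and is not the deepvac.aug API: the Composer class, hflip, make_speckle, and the (op, probability) list format. Images are plain 2-D lists, so the sketch runs without NumPy. A torchvision transforms.Compose object is itself callable, so it could in principle go in the same list.

```python
import random


class Composer:
    """Hypothetical composer: applies each (op, probability) pair in order."""

    def __init__(self, ops, seed=None):
        self.ops = ops
        self.rng = random.Random(seed)

    def __call__(self, img):
        for op, p in self.ops:
            if self.rng.random() < p:
                img = op(img)
        return img


def hflip(img):
    # Stand-in for a built-in aug op: mirror each row of a 2-D image.
    return [row[::-1] for row in img]


def make_speckle(num_points, value=255, seed=None):
    # Stand-in for a custom aug op: set a few random pixels to `value`.
    rng = random.Random(seed)

    def speckle(img):
        out = [row[:] for row in img]  # copy so the input is not modified
        for _ in range(num_points):
            out[rng.randrange(len(out))][rng.randrange(len(out[0]))] = value
        return out

    return speckle


class GemfieldComposer(Composer):
    def __init__(self, seed=None):
        super().__init__([(hflip, 1.0), (make_speckle(2, seed=seed), 1.0)], seed=seed)
```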

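8. Write the Dataset class

The code goes in data/dataloader.py. Inherit from the deepvac.datasets class hierarchy. For example, the FileLineDataset class wraps the following train.txt format:

# train.txt: the first column is the image path, the second column is the label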
img0/1.jpg 0
img0/2.jpg 0
...
img1/0.jpg 1
...
img2/0.jpg 2
...

Sometimes the second column is a string, and you want FileLineDataset to read images with cv2 instead of PIL Image. You can reimplement it through inheritance as follows:

import cv2
from deepvac.datasets import FileLineDataset

class FileLineCvStrDataset(FileLineDataset):
    def _buildLabelFromLine(self, line):
        # keep the second column as a string label
        line = line.strip().split(" ")
        return [line[0], line[1]]

    def _buildSampleFromPath(self, abs_path):
        # read the image with cv2 (BGR ndarray) instead of the default Pillow loader
        sample = cv2.imread(abs_path)
        sample = self.compose(sample)
        return sample

Note that FileLineCvStrDataset is in fact already provided by deepvac.datasets.
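To show the override pattern without torch or deepvac installed, here is a framework-free, hypothetical sketch of the FileLineDataset idea. It is not deepvac's class. It reuses the two hook names from the example above, and the loader and compose defaults are identity functions chosen for illustration.

```python
import os


class FileLineListDataset:
    """Hypothetical sketch: one 'path label' sample per line of a list file."""

    def __init__(self, root, list_file, loader=None, compose=None):
        self.root = root
        self.loader = loader or (lambda path: path)       # e.g. cv2.imread in practice
        self.compose = compose or (lambda sample: sample)
        with open(list_file, encoding="utf-8") as f:
            self.samples = [self._buildLabelFromLine(line) for line in f
                            if line.strip() and not line.lstrip().startswith("#")]

    def _buildLabelFromLine(self, line):
        path, label = line.strip().split(" ")
        return path, int(label)

    def _buildSampleFromPath(self, abs_path):
        return self.compose(self.loader(abs_path))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        return self._buildSampleFromPath(os.path.join(self.root, path)), label


class StrLabelDataset(FileLineListDataset):
    # Same idea as FileLineCvStrDataset: keep the label as a string.
    def _buildLabelFromLine(self, line):
        path, label = line.strip().split(" ")
        return path, label
```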

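9. Write the training and validation scripts

In the DeepVAC standard, train.py represents the training paradigm. Model training code goes in train.py and inherits from the DeepvacTrain class: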

from deepvac import DeepvacTrain

class MyTrain(DeepvacTrain):
    pass

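A subclass of DeepvacTrain may need to reimplement the following methods before training can start:

| Method (* = users usually reimplement it) | Function | Notes |
|---|---|---|
| preEpoch | User operations before each epoch; DeepvacTrain does nothing | Can be redefined if needed |
| preIter | User operations before each batch iteration; DeepvacTrain does nothing | Can be redefined if needed |
| postIter | User operations after each batch iteration; DeepvacTrain does nothing | Can be redefined if needed |
| postEpoch | User operations after each epoch; DeepvacTrain does nothing | Can be redefined if needed |
| doFeedData2Device | DeepvacTrain moves the sample and target (labels) from the dataloader to the device | Can be redefined if needed |
| doForward | DeepvacTrain runs network inference and assigns the result to self.config.output | Can be redefined if needed |
| doLoss | DeepvacTrain computes this iteration's loss from self.config.output and self.config.target | Can be redefined if needed |
| doBackward | Backpropagation; DeepvacTrain calls self.config.loss.backward() | Can be redefined if needed |
| doOptimize | Weight update; DeepvacTrain calls self.config.optimizer.step() | Can be redefined if needed |
| doSchedule | Learning-rate update; DeepvacTrain calls self.config.scheduler.step() | Can be redefined if needed |
| * doValAcc | Computes the model's acc in val mode; DeepvacTrain does nothing | Usually needs to be redefined; writing to TensorBoard depends on it |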
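The table above describes a template-method design: the base class owns the loop and calls hooks that subclasses override. Here is a simplified, hypothetical sketch of such a loop that uses the same method names. It is not DeepvacTrain's actual implementation. In particular, two details are assumptions: calling doSchedule once per epoch, and running a validation pass that ends in doValAcc.

```python
class MiniTrain:
    """Hypothetical hook-based training loop mirroring the method names above."""

    def __init__(self, train_loader, val_loader, epochs):
        self.train_loader = train_loader
        self.val_loader = val_loader
        self.epochs = epochs
        self.calls = []  # records which steps ran, in order

    # Hooks that do nothing by default
    def preEpoch(self): pass
    def postEpoch(self): pass
    def preIter(self): pass
    def postIter(self): pass
    def doValAcc(self): pass

    # Steps with default behaviour (here they only record that they ran)
    def doFeedData2Device(self): self.calls.append("feed")
    def doForward(self): self.calls.append("forward")
    def doLoss(self): self.calls.append("loss")
    def doBackward(self): self.calls.append("backward")
    def doOptimize(self): self.calls.append("optimize")
    def doSchedule(self): self.calls.append("schedule")

    def __call__(self):
        for self.epoch in range(self.epochs):
            self.preEpoch()
            for self.sample, self.target in self.train_loader:
                self.preIter()
                self.doFeedData2Device()
                self.doForward()
                self.doLoss()
                self.doBackward()
                self.doOptimize()
                self.postIter()
            self.doSchedule()  # assumption: once per epoch
            for self.sample, self.target in self.val_loader:
                self.doFeedData2Device()
                self.doForward()
                self.doLoss()
            self.doValAcc()
            self.postEpoch()
```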

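A typical implementation:

from config import config as deepvac_config
from deepvac import LOG, DeepvacTrain  # assumption: LOG is exported by deepvac

class MyTrain(DeepvacTrain):
    ...
    # Overridden because the base class cannot handle list-type labels
    def doFeedData2Device(self):
        self.config.target = [anno.to(self.config.device) for anno in self.config.target]
        self.config.sample = self.config.sample.to(self.config.device)

    # Set config.core.acc
    def doValAcc(self):
        self.config.acc = your_acc  # placeholder: replace with the accuracy you computed
        LOG.logI('Test accuracy: {:.4f}'.format(self.config.acc))


train = MyTrain(deepvac_config)
train()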

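10. Write the test script

In the DeepVAC standard, test.py represents the testing paradigm. Test code goes in test.py and inherits from the Deepvac class.

It differs fundamentally from the train/val logic in train.py in that it:

  • drops the train/val context;
  • no longer runs the network under the autograd context;
  • no longer computes loss, backward passes, or optimization steps;
  • uses DeepVAC's *Report modules to measure accuracy and speed;

A subclass of Deepvac must (re)implement the following methods before testing can start:

| Method (* = must be reimplemented) | Function | Notes |
|---|---|---|
| preIter | User operations before each batch iteration; Deepvac does nothing | Can be redefined if needed |
| postIter | User operations after each batch iteration; Deepvac does nothing | Can be redefined if needed |
| doFeedData2Device | Deepvac moves the sample and target (labels) from the dataloader to the device | Can be redefined if needed |
| doForward | Deepvac runs network inference and assigns the result to self.config.output | Can be redefined if needed |
| doTest | Fully user-defined test logic; results can be added with report.add(gt, pred) to generate a report | See the test logic below |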

A typical implementation:

from config import config as deepvac_config
from deepvac import Deepvac

class MyTest(Deepvac):
    ...
    def doTest(self):
        ...

test = MyTest(deepvac_config)
test()
# or: test(input_tensor)

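When test() is called, the DeepVAC framework tests according to the following priority:

  • If the user passes an argument, e.g. test(input_tensor), doFeedData2Device + doForward run on that input_tensor, and testing ends;
  • If the user has overridden doTest(), doTest() runs, and testing ends;
  • If the user has configured config.my_test.test_loader, that loader is iterated and doFeedData2Device + doForward run on each sample, and testing ends;
  • If none of the above applies, it exits with an error.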
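The priority rules above amount to a simple dispatch. Here is a hypothetical sketch of that dispatch, not Deepvac's code. The return values and the RuntimeError are choices made for illustration, and the loader items here are bare samples rather than (sample, target) pairs.

```python
class MiniTest:
    """Hypothetical sketch of the test() dispatch order described above."""

    def __init__(self, test_loader=None):
        self.test_loader = test_loader
        self.outputs = []

    def doFeedData2Device(self):
        pass  # a real implementation would move self.sample to the device

    def doForward(self):
        self.outputs.append(self.sample)  # stand-in for network inference

    def doTest(self):
        raise NotImplementedError

    def _has_custom_doTest(self):
        # True only if a subclass overrides doTest().
        return type(self).doTest is not MiniTest.doTest

    def __call__(self, input_tensor=None):
        if input_tensor is not None:            # 1. explicit input
            self.sample = input_tensor
            self.doFeedData2Device()
            self.doForward()
            return self.outputs[-1]
        if self._has_custom_doTest():           # 2. user-defined doTest()
            return self.doTest()
        if self.test_loader is not None:        # 3. configured test_loader
            for self.sample in self.test_loader:
                self.doFeedData2Device()
                self.doForward()
            return self.outputs
        raise RuntimeError("no input, no doTest() override and no test_loader")  # 4.
```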

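DeepVAC community products

| Product | Description | Current version | How to get / deploy |
|---|---|---|---|
| DeepVAC | A distinctive PyTorch engineering standard | 0.6.0 | pip install deepvac |
| libdeepvac | A distinctive PyTorch model deployment framework | 1.9.0 | SDK, download & extract |
| MLab HomePod | The most advanced containerized PyTorch model training environment to date | 2.0 | docker run / k8s |
| MLab RookPod | The most advanced storage solution to date costing under 100,000 RMB | NA | hardware spec + k8s yaml |
| pyRBAC | A Python RBAC implementation based on Keycloak | NA | pip install (coming soon) |
| DeepVAC PyTorch | A PyTorch package customized for MLab HomePod pro | 1.9.0 | conda install -c gemfield pytorch |
| DeepVAC LibTorch | A LibTorch library customized for libdeepvac | 1.9.0 | archive, download & extract |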

mlab's People

Contributors

gemfield


mlab's Issues

xrdp not supported on k8s

HomePod 2.0 added RDP support. It tested OK on Docker, but connections fail on a k8s cluster.

[Feature] Add the user to the sudo group

usermod -aG sudo gemfield
Defaults        env_reset
Defaults        mail_badpass
Defaults        secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"

# User privilege specification
root    ALL=(ALL:ALL) ALL
gemfield ALL=(ALL:ALL) ALL
# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL

# Allow members of group sudo to execute any command
%sudo   ALL=(ALL:ALL) ALL

#includedir /etc/sudoers.d

[Tweak] Reduce the size of the MKL static libraries

rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/bin
rm -rf /opt/intel/conda_channel
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/benchmarks
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/examples
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/*.so
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_blas*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_blacs*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_gf*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_intel_thread.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_pgi_thread.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_tbb_thread.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_intel_ilp64.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_lapack95*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_cdft_core.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_scalapack*
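Before running a list of rm -rf commands like the one above, a dry run can show what the globs match and how much space they would free. Here is a hypothetical standard-library sketch that deletes nothing. Symlinks are counted by their own size and are not followed.

```python
import glob
import os


def _tree_size(path):
    """Bytes used by a file, or by all files under a directory (symlinks not followed)."""
    if os.path.isdir(path) and not os.path.islink(path):
        total = 0
        for dirpath, _dirnames, filenames in os.walk(path):
            for name in filenames:
                total += os.lstat(os.path.join(dirpath, name)).st_size
        return total
    return os.lstat(path).st_size


def plan_removal(patterns):
    """Expand shell-style glob patterns; return (sorted paths, total bytes). Deletes nothing."""
    paths = sorted({p for pattern in patterns for p in glob.glob(pattern)})
    return paths, sum(_tree_size(p) for p in paths)


if __name__ == "__main__":
    mkl = "/opt/intel/compilers_and_libraries_2020.4.304/linux"
    paths, total = plan_removal([mkl + "/bin", mkl + "/mkl/examples",
                                 mkl + "/mkl/lib/intel64_lin/*.so"])
    print(len(paths), "paths,", round(total / 2**20, 1), "MiB")
```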

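Planned features for MLab RookPod 1.0

Features

  • Distributed storage, supporting file systems, block storage, and object storage;
  • Support for hot and cold storage;
  • Support for exporting and importing data, enabling off-site backup;

Hardware specifications

  • At least 10G network switching;
  • Hot storage uses SSDs; cold storage uses SSDs or HDDs

[bug] import onnx_coreml fails on HomePod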

>>> import onnx_coreml
WARNING:root:scikit-learn version 0.24.1 is not supported. Minimum required version: 0.17. Maximum required version: 0.19.2. Disabling scikit-learn conversion API.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/onnx_coreml/__init__.py", line 6, in <module>
    from .converter import convert
  File "/opt/conda/lib/python3.8/site-packages/onnx_coreml/converter.py", line 35, in <module>
    from coremltools.converters.nnssa.coreml.graph_pass.mlmodel_passes import remove_disconnected_layers, transform_conv_crop

Someone has already run into this problem: onnx/onnx-coreml#585

[Feature] PyTorch updates since MLab HomePod 2.0 pro

  • 0dc40474fe Peter Bell Tue Jul 6 19:05:39 2021 -0700 Migrate glu from the THC to ATen (CUDA) (#61153); note: glu is the Gated Linear Unit;
  • a69e947ffd Freey0 Wed Jul 7 07:42:49 2021 -0700 avg_pool3d_backward: Port to structured (#59084)
  • 45ce26c397 Xue Haotian Wed Jul 7 12:32:43 2021 -0700 Port isposinf & isneginf kernel to structured kernels (#60633)
  • baa518e2f6 Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add Int32 support for NNAPI (#59365)
  • cf285d8eea Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::slice NNAPI converter (#59364)
  • d26372794a Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::detach NNAPI converter (#58543)
  • 0be228dd5f Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::flatten NNAPI converter (#60885)
  • b297f65b66 Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::div NNAPI converter (#58541)
  • eab18a9a40 Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::to NNAPI converter (#58540)
  • 14d604a13e Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::softmax NNAPI converter (#58539)
  • 179b3ab88c Xiao Wang Wed Jul 7 20:45:42 2021 -0700 [cuDNN] Enable cudnn_batchnorm_spatial_persistent for BatchNorm3d channels_last_3d (#59129)

[bug] apt update on HomePod fails with KeyError: 'suite'

The error is as follows:

gemfield@bd0d4c6acd4c:/etc/apt$ sudo apt update
Hit:1 http://dl.google.com/linux/chrome/deb stable InRelease
Hit:2 http://packages.microsoft.com/repos/code stable InRelease                                                      
Hit:3 https://packages.microsoft.com/repos/vscode stable InRelease                                                   
Get:4 http://security.ubuntu.com/ubuntu focal-security InRelease [109 kB]                                            
Hit:5 http://archive.ubuntu.com/ubuntu focal InRelease                                                               
Hit:6 http://ppa.launchpad.net/kdenlive/kdenlive-stable/ubuntu focal InRelease                                       
Hit:7 https://apt.repos.intel.com/mkl all InRelease                                                  
Get:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]                              
Get:9 http://archive.neon.kde.org/user focal InRelease [166 kB]                           
Get:10 http://security.ubuntu.com/ubuntu focal-security/main amd64 DEP-11 Metadata [28.5 kB]        
Get:11 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Get:12 http://security.ubuntu.com/ubuntu focal-security/universe amd64 DEP-11 Metadata [71.3 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 DEP-11 Metadata [365 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-updates/main DEP-11 64x64 Icons [87.9 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 DEP-11 Metadata [411 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 DEP-11 Metadata [2,540 B]
Get:17 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 DEP-11 Metadata [1,765 B]
Fetched 1,457 kB in 4s (386 kB/s)                                              
Traceback (most recent call last):
  File "/usr/lib/cnf-update-db", line 26, in <module>
    col.create(db)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 94, in create
    self._fill_commands(con)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 138, in _fill_commands
    self._parse_single_commands_file(con, fp)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 176, in _parse_single_commands_file
    suite=tagf.section["suite"]
KeyError: 'suite'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 153, in apport_excepthook
    with os.fdopen(os.open(pr_filename,
FileNotFoundError: [Errno 2] No such file or directory: '/var/crash/_usr_lib_cnf-update-db.0.crash'

Original exception was:
Traceback (most recent call last):
  File "/usr/lib/cnf-update-db", line 26, in <module>
    col.create(db)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 94, in create
    self._fill_commands(con)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 138, in _fill_commands
    self._parse_single_commands_file(con, fp)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 176, in _parse_single_commands_file
    suite=tagf.section["suite"]
KeyError: 'suite'
Reading package lists... Done

Compiling a pybind11-based program on MLab HomePod 2.0 pro fails

Error message:

pydeepvac.cpp:37:53: error: no matching function for call to ‘pybind11::class_<deepvac::SyszuxVisionTerror>::def(const char [8], <unresolved overloaded function type>)’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1315:13: note: candidate: ‘template<class Func, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const char*, Func&&, const Extra& ...) [with Func = Func; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1315 |     class_ &def(const char *name_, Func&& f, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1315:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   couldn’t deduce template parameter ‘Func’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1333:13: note: candidate: ‘template<pybind11::detail::op_id id, pybind11::detail::op_type ot, class L, class R, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const pybind11::detail::op_<id, ot, L, R>&, const Extra& ...) [with pybind11::detail::op_id id = id; pybind11::detail::op_type ot = ot; L = L; R = R; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1333 |     class_ &def(const detail::op_<id, ot, L, R> &op, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1333:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘const pybind11::detail::op_<id, ot, L, R>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1345:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const pybind11::detail::initimpl::constructor<Args ...>&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1345 |     class_ &def(const detail::initimpl::constructor<Args...> &init, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1345:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘const pybind11::detail::initimpl::constructor<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1351:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const pybind11::detail::initimpl::alias_constructor<Args ...>&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1351 |     class_ &def(const detail::initimpl::alias_constructor<Args...> &init, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1351:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘const pybind11::detail::initimpl::alias_constructor<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1357:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(pybind11::detail::initimpl::factory<Args ...>&&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1357 |     class_ &def(detail::initimpl::factory<Args...> &&init, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1357:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘pybind11::detail::initimpl::factory<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1363:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(pybind11::detail::initimpl::pickle_factory<Args ...>&&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1363 |     class_ &def(detail::initimpl::pickle_factory<Args...> &&pf, const Extra &...extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1363:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘pybind11::detail::initimpl::pickle_factory<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);

[Feature] Install the latest protobuf

On HomePod:

# dependencies
sudo apt-get install autoconf automake libtool curl make g++ unzip

git clone https://github.com/protocolbuffers/protobuf.git
cd protobuf
git submodule update --init --recursive
bash -x ./autogen.sh

./configure
make
make check
sudo make install
# refresh the shared library cache
sudo ldconfig

[Optimization] Change how the pybind11 package is installed

The current installation method is:

apt install python3-pybind11

This package has the following dependencies:

gemfield@ThinkPad-X1C:~$ apt show python3-pybind11
Package: python3-pybind11
Version: 2.5.0-5
Source: pybind11
Origin: Ubuntu
Installed-Size: 610 kB
Depends: python3:any, pybind11-dev (= 2.5.0-5)
Recommends: python3-numpy
Homepage: https://github.com/pybind/pybind11
Download-Size: 113 kB
APT-Manual-Installed: yes
APT-Sources: http://archive.ubuntu.com/ubuntu groovy/universe amd64 Packages
......

As a result, the following 3 deb packages actually get installed:

  • python3-pybind11
  • pybind11-dev
  • python3

This is not necessary.

python3-pybind11 contains the following files:

/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/dependency_links.txt
/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/top_level.txt
/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/not-zip-safe
/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/PKG-INFO
/usr/lib/python3/dist-packages/pybind11/_version.py
/usr/lib/python3/dist-packages/pybind11/__main__.py
/usr/lib/python3/dist-packages/pybind11/__init__.py
/usr/lib/python3/dist-packages/pybind11/include/pybind11/cast.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/complex.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/buffer_info.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/common.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/operators.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/functional.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/attr.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/pytypes.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/embed.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/eigen.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/stl.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/eval.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/iostream.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/numpy.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/pybind11.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/options.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/chrono.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/stl_bind.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/common.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/typeid.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/internals.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/init.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/descr.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/class.h

pybind11-dev contains the following files:

/usr/lib/cmake/pybind11/pybind11Targets.cmake
/usr/lib/cmake/pybind11/pybind11Tools.cmake
/usr/lib/cmake/pybind11/FindPythonLibsNew.cmake
/usr/lib/cmake/pybind11/pybind11ConfigVersion.cmake
/usr/lib/cmake/pybind11/pybind11Config.cmake
/usr/share/doc/python3-pybind11/copyright
/usr/share/doc/pybind11-dev/copyright
/usr/share/doc/pybind11-dev/changelog.Debian.gz
/usr/include/pybind11/cast.h
/usr/include/pybind11/complex.h
/usr/include/pybind11/buffer_info.h
/usr/include/pybind11/common.h
/usr/include/pybind11/operators.h
/usr/include/pybind11/functional.h
/usr/include/pybind11/attr.h
/usr/include/pybind11/pytypes.h
/usr/include/pybind11/embed.h
/usr/include/pybind11/eigen.h
/usr/include/pybind11/stl.h
/usr/include/pybind11/eval.h
/usr/include/pybind11/iostream.h
/usr/include/pybind11/numpy.h
/usr/include/pybind11/pybind11.h
/usr/include/pybind11/options.h
/usr/include/pybind11/chrono.h
/usr/include/pybind11/stl_bind.h
/usr/include/pybind11/detail/common.h
/usr/include/pybind11/detail/typeid.h
/usr/include/pybind11/detail/internals.h
/usr/include/pybind11/detail/init.h
/usr/include/pybind11/detail/descr.h
/usr/include/pybind11/detail/class.h

[Feature] xrdp runs successfully on Ubuntu 18.04 but fails on Ubuntu 20.04

After investigation, the logs show the difference:

# on ubuntu 18.04
xrdp-sesman[57]: (57)(140460571419968)[INFO ] ++ created session (access granted): username gemfield, ip ::ffff:222.128.gemfield:33552 - socket: 11
xrdp-sesman[57]: (57)(140460571419968)[INFO ] starting Xorg session...
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 8 (AF_INET6 :: port 5910)
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 8 (AF_INET6 :: port 6010)
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 8 (AF_INET6 :: port 6210)
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 7 (AF_INET6 ::ffff:127.0.0.1 port 3350)
xrdp[62]: (62)(139675925534528)[INFO ] xrdp_wm_log_msg: login successful for display 10
xrdp[62]: (62)(139675925534528)[INFO ] xrdp_wm_log_msg: login successful for display 10
xrdp-sesman[64]: (64)(140460571419968)[INFO ] calling auth_start_session from pid 64
xrdp[62]: (62)(139675925534528)[DEBUG] xrdp_wm_log_msg: started connecting
xrdp-sesman[64]: pam_unix(xrdp-sesman:session): session opened for user gemfield by (uid=0)
dbus-daemon[10]: [system] Activating service name='org.freedesktop.login1' requested by ':1.5' (uid=0 pid=64 comm="/usr/sbin/xrdp-sesman --nodaemon " label="docker-default (enforce)") (using servicehelper)
New seat seat0.
dbus-daemon[10]: [system] Activating service name='org.freedesktop.systemd1' requested by ':1.6' (uid=0 pid=66 comm="/lib/systemd/systemd-logind " label="docker-default (enforce)") (using servicehelper)
dbus-daemon[10]: [system] Successfully activated service 'org.freedesktop.login1'
dbus-daemon[10]: [system] Activated service 'org.freedesktop.systemd1' failed: Launch helper exited with unknown return code 1


# on ubuntu 20.04
xrdp-sesman[11]: (11)(140645830477376)[INFO ] ++ created session (access granted): username gemfield, ip ::ffff:222.128.gemfield:54766 - socket: 11
xrdp-sesman[11]: (11)(140645830477376)[INFO ] starting Xorg session...
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 8 (AF_INET6 :: port 5910)
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 8 (AF_INET6 :: port 6010)
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 8 (AF_INET6 :: port 6210)
xrdp[37]: (37)(139680321079104)[INFO ] xrdp_wm_log_msg: login successful for display 10
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 7 (AF_INET6 ::ffff:127.0.0.1 port 3350)
xrdp[37]: (37)(139680321079104)[DEBUG] xrdp_wm_log_msg: started connecting
xrdp-sesman[39]: (39)(140645830477376)[INFO ] calling auth_start_session from pid 39
xrdp-sesman[39]: pam_unix(xrdp-sesman:session): session opened for user gemfield by (uid=0)
dbus-daemon[9]: [system] Activating service name='org.freedesktop.login1' requested by ':1.0' (uid=0 pid=39 comm="/usr/sbin/xrdp-sesman --nodaemon " label="docker-default (enforce)") (using servicehelper)
dbus-daemon[9]: [system] Activated service 'org.freedesktop.login1' failed: Launch helper exited with unknown return code 1
xrdp-sesman[39]: pam_systemd(xrdp-sesman:session): Failed to create session: Launch helper exited with unknown return code 1
xrdp-sesman[39]: (39)(140645830477376)[DEBUG] Closed socket 6 (AF_INET6 ::ffff:127.0.0.1 port 3350)
xrdp-sesman[39]: (39)(140645830477376)[DEBUG] Closed socket 7 (AF_INET6 ::ffff:127.0.0.1 port 3350)
......
dbus-daemon[9]: [system] Activating service name='org.freedesktop.login1' requested by ':1.1' (uid=1000 pid=43 comm="/usr/lib/xorg/Xorg :10 -auth .Xauthority -config x" label="docker-default (enforce)") (using servicehelper)
dbus-daemon[9]: [system] Activated service 'org.freedesktop.login1' failed: Launch helper exited with unknown return code 1

On Ubuntu 20.04 there is a D-Bus error related to org.freedesktop.login1.

Converting ONNX to TNN on MLab HomePod fails: ImportError: cannot import name 'optimizer' from 'onnx' (/opt/conda/lib/python3.8/site-packages/onnx/__init__.py)

The conversion command (with the -optimize=1 flag enabled):

python3 onnx2tnn.py /app/gemfield/onnxmodels/v2.onnx -version=v1.0 -optimize=1 -half=0 -o /app/gemfield/onnxmodels

It then fails with:

Traceback (most recent call last):
  File "onnx2tnn.py", line 41, in do_optimize
    import onnx2tnn.onnx_optimizer.onnx_optimizer as opt
ModuleNotFoundError: No module named 'onnx2tnn.onnx_optimizer'; 'onnx2tnn' is not a package

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "onnx2tnn.py", line 148, in <module>
    main()
  File "onnx2tnn.py", line 120, in main
    do_optimize(onnx_net_path, input_shape)
  File "onnx2tnn.py", line 43, in do_optimize
    import onnx_optimizer.onnx_optimizer as opt
  File "/app/gemfield/github/TNN/tools/onnx2tnn/onnx-converter/onnx_optimizer/onnx_optimizer.py", line 8, in <module>
    from onnx import optimizer
ImportError: cannot import name 'optimizer' from 'onnx' (/opt/conda/lib/python3.8/site-packages/onnx/__init__.py)
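Newer `onnx` releases (1.9 onward, to my knowledge) no longer ship `onnx.optimizer`; it was split out into the standalone `onnxoptimizer` package (`pip install onnxoptimizer`). A minimal sketch of an import fallback that could stand in for the failing `from onnx import optimizer` line in TNN's `onnx_optimizer.py`. The helper name `import_first` is hypothetical, and the sketch assumes both modules expose a compatible `optimize(model, passes)` function; pinning an older `onnx` is the other obvious workaround. It uses `typing.Sequence` so it also runs on the Python 3.8 in HomePod.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first(candidates: Sequence[str]) -> ModuleType:
    """Return the first module in `candidates` that imports successfully.

    Hypothetical helper: tries each dotted module name in order and raises
    ImportError listing every failure if none of them can be imported.
    """
    failures = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            failures.append(f"{name}: {exc}")
    raise ImportError("no candidate module could be imported: " + "; ".join(failures))


# Older onnx provides onnx.optimizer; newer onnx needs the separate onnxoptimizer.
# optimizer = import_first(["onnx.optimizer", "onnxoptimizer"])
```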

Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired

During model inference, the output is flooded with warning messages:

  • Environment: MLab HomePod 2.0 pro
  • Error message:
[W ___torch_mangle_579.py:74] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:115] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:156] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:196] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:228] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:260] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
no text detected
[W ___torch_mangle_579.py:74] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:115] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:156] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:196] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:228] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:260] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
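The warning concerns how an output pixel index is mapped back to an input coordinate during bilinear upsampling. A pure-Python sketch of the two 1-D linear conventions, written from my understanding of PyTorch's behavior (the function name `source_coord` is hypothetical), shows why results shift between `align_corners=True` and the post-0.4.0 default `False`. Passing `align_corners` explicitly to `F.interpolate(..., mode="bilinear", align_corners=...)` or `nn.Upsample` in the model source and re-exporting the TorchScript model should stop the warnings.

```python
def source_coord(dst: int, in_size: int, out_size: int, align_corners: bool) -> float:
    """Map output index `dst` to a fractional input coordinate (1-D linear upsampling)."""
    if align_corners:
        # The corner pixels of input and output are aligned exactly.
        if out_size <= 1:
            return 0.0
        return dst * (in_size - 1) / (out_size - 1)
    # Pixel centres are aligned instead (the default since PyTorch 0.4.0).
    src = (dst + 0.5) * in_size / out_size - 0.5
    return max(src, 0.0)  # linear modes clamp negative coordinates to 0


for flag in (True, False):
    print(flag, [round(source_coord(d, 2, 4, flag), 3) for d in range(4)])
# True [0.0, 0.333, 0.667, 1.0]
# False [0.0, 0.25, 0.75, 1.25]
```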

[New feature] Add faiss

RUN git clone https://github.com/facebookresearch/faiss && \
    cd faiss && \
    mkdir build && \
    cd build && \
    cmake .. && \
    make VERBOSE=1 && \
    make install && \
    cd faiss/python && \
    python setup.py install && \
    cd ../../ && \
    make clean
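Once the image is built, one way to sanity-check the faiss Python bindings is to compare `faiss.IndexFlatL2` against a brute-force search. A minimal pure-Python reference, assuming small inputs given as lists of float vectors (the function name `knn_l2` is hypothetical); like `IndexFlatL2`, it reports squared L2 distances.

```python
from typing import List, Sequence, Tuple


def knn_l2(database: Sequence[Sequence[float]],
           queries: Sequence[Sequence[float]],
           k: int) -> Tuple[List[List[float]], List[List[int]]]:
    """Exact k-nearest-neighbour search by squared L2 distance.

    Returns (distances, ids) with one row per query, nearest first,
    mirroring the layout returned by faiss's index.search(xq, k).
    """
    all_dists, all_ids = [], []
    for q in queries:
        # Sort (distance, id) pairs; ties are broken by the smaller id.
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(q, v)), i)
            for i, v in enumerate(database)
        )[:k]
        all_dists.append([d for d, _ in scored])
        all_ids.append([i for _, i in scored])
    return all_dists, all_ids


xb = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
xq = [[1.0, 1.0]]
print(knn_l2(xb, xq, k=2))  # ([[1.0, 2.0]], [[1, 0]])
```

With faiss installed, the equivalent check would be roughly `index = faiss.IndexFlatL2(2)`, `index.add(np.array(xb, dtype="float32"))`, `D, I = index.search(np.array(xq, dtype="float32"), 2)`; `D` and `I` should match up to float32 rounding, although faiss may order tied neighbours differently.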

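Differences between HomePod:2.0 and HomePod:2.0-pro

On top of the standard edition, pro adds several packages aimed at low-level developers:

  • Boost development libraries (libboost-dev libboost-filesystem-dev libboost-program-options-dev libboost-system-dev);
  • CUDA development libraries (pro is based directly on the nvidia/cuda:devel series of images);
  • MKL static libraries (downloaded from Intel's repository, used to build statically linked libtorch applications);
  • the pycuda package, a runtime dependency for TensorRT;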
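To tell which variant a running container is, one could probe for some of the extra packages listed above. A rough sketch: the probes (an `nvcc` binary on `PATH` or under `/usr/local/cuda/bin`, the `/usr/include/boost` directory, an importable `pycuda`) are my assumptions about where these packages land, not something the image documents, and `detect_pro_packages` is a hypothetical name. The MKL static libraries are not probed because their install path is not stated.

```python
import importlib.util
import os
import shutil
from typing import Dict


def detect_pro_packages() -> Dict[str, bool]:
    """Heuristically report which pro-only components appear to be present."""
    return {
        # CUDA devel images ship the nvcc compiler; runtime images do not.
        "cuda_devel": shutil.which("nvcc") is not None
        or os.path.exists("/usr/local/cuda/bin/nvcc"),
        # libboost-dev installs its headers under /usr/include/boost (assumed path).
        "boost_headers": os.path.isdir("/usr/include/boost"),
        # pycuda is a Python package, so check whether it can be located.
        "pycuda": importlib.util.find_spec("pycuda") is not None,
    }


if __name__ == "__main__":
    for name, present in detect_pro_packages().items():
        print(f"{name:14s} {'yes' if present else 'no'}")
```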

Minimum requirements for an AI inference deployment environment

Host machine:

  • x86_64 CPU;
  • Linux;
  • 8 GB RAM;
  • 4 GB GPU memory (Turing architecture or newer);
  • nvidia-driver 450+;
  • Docker 19.03+;
  • NVIDIA Container Toolkit;

Each AI inference capability requires:

  • 4 GB GPU memory;
  • 4 GB RAM;
  • 4 CPU cores.
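The requirements above can be turned into a quick capacity estimate. A minimal sketch, assuming each capability consumes exactly its per-capability budget and nothing is reserved for the OS or other processes; only the host RAM and GPU-memory minimums are checked (driver, Docker and GPU architecture are not), and `max_capabilities` is a hypothetical name.

```python
# Host minimums and per-capability budgets taken from the requirements list.
HOST_MIN_RAM_GB = 8
HOST_MIN_GPU_MEM_GB = 4
PER_CAP_RAM_GB = 4
PER_CAP_GPU_MEM_GB = 4
PER_CAP_CPU_CORES = 4


def max_capabilities(ram_gb: int, gpu_mem_gb: int, cpu_cores: int) -> int:
    """Estimate how many inference capabilities fit on one host.

    Returns 0 if the host is below the stated minimums; otherwise the
    scarcest resource (RAM, GPU memory or CPU cores) sets the limit.
    """
    if ram_gb < HOST_MIN_RAM_GB or gpu_mem_gb < HOST_MIN_GPU_MEM_GB:
        return 0
    return min(
        ram_gb // PER_CAP_RAM_GB,
        gpu_mem_gb // PER_CAP_GPU_MEM_GB,
        cpu_cores // PER_CAP_CPU_CORES,
    )


print(max_capabilities(ram_gb=32, gpu_mem_gb=8, cpu_cores=16))  # 2 (GPU memory is the bottleneck)
```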

homepod:2.0-pro: torch.cuda.is_available() returns False and reports "Error 804: forward compatibility was attempted on non supported HW"

ENV

  • GPU model: GeForce GTX 1650
  • Driver version: 460.80
  • CUDA version: 11.3
  • OS: Ubuntu 20.04

ERROR

  • nvidia-smi works
  • import torch; print(torch.cuda.is_available())
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448278899/work/c10/cuda/CUDAFunctions.cpp:115.)
  return torch._C._cuda_getDeviceCount() > 0
False
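Error 804 corresponds to `cudaErrorCompatNotSupportedOnDevice`: the CUDA forward-compatibility driver package was loaded on a GPU that does not support it. As I understand it, forward compatibility is limited to datacenter-class GPUs, so a GeForce card such as the GTX 1650 can hit this if the container picks up a compat `libcuda` instead of the host driver's. A sketch to check which `libcuda` was actually mapped into the process, assuming Linux `/proc`; `mapped_libraries` is a hypothetical helper, and a path containing `compat` is only a likely hint, not a confirmed diagnosis.

```python
from typing import Iterable, List, Optional


def mapped_libraries(substring: str, lines: Optional[Iterable[str]] = None) -> List[str]:
    """Return distinct file paths mapped into this process that contain `substring`.

    Reads /proc/self/maps (Linux only) unless `lines` is supplied; the sixth
    column of each maps line, when present, is the backing file path.
    """
    if lines is None:
        with open("/proc/self/maps") as f:
            lines = f.readlines()
    paths: List[str] = []
    for line in lines:
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping: no backing file
        path = fields[5].strip()
        if substring in path and path not in paths:
            paths.append(path)
    return paths
```

Usage inside the container: run `import torch; torch.cuda.is_available()` first so the driver library is loaded, then `print(mapped_libraries("libcuda"))`.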

[New feature] PyTorch updates since HomePod 1.0

  • fd02fc5d715a7647631c5806db736794edc2a52f: Port put_ and take from TH to ATen
  • 4170a6cc24c6867ca6cd48f5581e98a4be89e593: Migrate mode from TH to ATen
  • 6866c033d5aa134a83bc1cb84e3e084a7329167f: [JIT] Add recursive scripting for class type module attributes

[bug] Running homepod 1.1 fails with: Error response from daemon: AppArmor enabled on system but the docker-default profile could not be loaded: strconv.Atoi: parsing "file": invalid syntax.

The error is as follows:

gemfield@ThinkPad-X1C:~$ docker run -it --rm -p 5900:5900 -p 7030:7030 -v /app/gemfield:/app/gemfield -v /home/gemfield/github:/home/gemfield/github gemfield/homepod:1.1 bash
docker: Error response from daemon: AppArmor enabled on system but the docker-default profile could not be loaded: strconv.Atoi: parsing "file": invalid syntax.
