deepvac / mlab

The "cloud" in "alchemist on the cloud"

License: GNU General Public License v3.0


mlab's Introduction

DeepVAC

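DeepVAC provides an engineering standard for PyTorch-based AI projects. To that end, DeepVAC consists of:

Software engineering standard;
Code standard;
The deepvac library.

Many PyTorch AI projects share largely the same underlying logic, so DeepVAC aims to factor out the common parts, making project code more accurate, more readable, and easier to maintain.

To make an AI project comply with the DeepVAC standard, read the DeepVAC standard carefully. To understand how the deepvac library is designed, read the design notes of the deepvac library.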

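How to build your own PyTorch AI project on DeepVAC

1. Read the DeepVAC standard

A quick skim to form a first impression is enough.

2. Prepare the environment

DeepVAC depends on:

Python 3. Python 2 is deprecated and not supported;
Packages: torch, torchvision, tensorboard, scipy, numpy, cv2, Pillow.

Install these dependencies yourself with pip (or git clone); the details are omitted here.

For most users, the most convenient and efficient option is to use MLab HomePod as the DeepVAC environment. It is a prebuilt Docker image that saves unnecessary environment setup time. Within the MLab organization, we also use MLab HomePod for day-to-day model training.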

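3. Install the deepvac library

Install it with pip:
pip3 install deepvac
or
python3 -m pip install deepvac

If you need the latest deepvac code from GitHub, use developer mode as follows:

Developer mode

Clone the project locally: git clone https://github.com/DeepVAC/deepvac
Add the following to your entry file: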

import sys
#replace with your local deepvac directory
sys.path.insert(0,'/home/gemfield/github/deepvac')

Or set the PYTHONPATH environment variable:

export PYTHONPATH=/home/gemfield/github/deepvac
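
After changing sys.path or PYTHONPATH, it can be useful to check which copy of a package the interpreter would actually load. The following is a minimal sketch using only the standard library; the helper name locate_package is hypothetical and not part of deepvac.

```python
import importlib.util


def locate_package(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # The package is not importable with the current sys.path.
        return None
    return spec.origin


if __name__ == "__main__":
    # Prints the path of deepvac/__init__.py that wins the import, or None.
    print(locate_package("deepvac"))
```

If the printed path points into your local clone, developer mode is active; if it points into site-packages, the pip-installed copy is still taking precedence.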

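4. Create your own PyTorch project

Initialize a git repository for your project;
Create the first research branch in the repository, for example one named LTS_b1_aug9_movie_video_plate_130w;
Switch to that LTS_b1 branch and start working.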

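5. Write the configuration file

The configuration file is always named config.py and lives in the root directory of your project. Add from deepvac import new, AttrDict at the top of the file; all user configuration goes in this file. The config module provides six predefined scopes: config.core, config.aug, config.cast, config.datasets, config.backbones, config.loss. They are used as follows:

All trainer-related configuration (including train, val, and test) is defined in config.core.<my_train_class>;
All configuration for the augmentation modules in deepvac.aug is defined in config.aug.<my_aug_class>;
All model-conversion configuration is defined in config.cast.<the_caster_class>;
All Datasets-related configuration is defined in config.datasets.<my_dataset_class>;
All loss-related configuration is defined in config.loss.<my_loss_class>;
Users can create their own scopes, for example config.my_stuff = AttrDict(), followed by config.my_stuff.name = 'gemfield';
Users can call new() to initialize a config instance and clone() to deep-copy configuration items.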
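
To make the scope layout concrete, here is a minimal sketch that does not import deepvac. The AttrDict class below is a hypothetical stand-in (a dict with attribute access), make_config is an invented helper, and the option values are placeholders; the real new() and AttrDict in deepvac may behave differently.

```python
class AttrDict(dict):
    """Hypothetical stand-in for deepvac.AttrDict: a dict with attribute access."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        self[key] = value


def make_config():
    """Build a config tree with the six predefined scopes named in the text."""
    config = AttrDict()
    for scope in ("core", "aug", "cast", "datasets", "backbones", "loss"):
        config[scope] = AttrDict()
    return config


config = make_config()

# Trainer options live under config.core.<my_train_class>.
config.core.MyTrain = AttrDict()
config.core.MyTrain.batch_size = 32
config.core.MyTrain.log_dir = "output/log"

# A user-defined scope, as described in the text.
config.my_stuff = AttrDict()
config.my_stuff.name = "gemfield"

print(config.core.MyTrain.batch_size, config.my_stuff.name)
```

Keeping each class's options under its own key means two trainers or two datasets in the same project can use the same option names without colliding.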

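More configuration:

Loading pretrained models;
Loading checkpoints;
Using tensorboard;
Using TorchScript;
Converting to ONNX;
Converting to NCNN;
Converting to CoreML;
Converting to TensorRT;
Converting to TNN;
Converting to MNN;
Enabling quantization;
Enabling EMA;
Enabling automatic mixed precision (AMP) training;

For these topics, and for a more detailed explanation of the configuration file, read the config documentation.

In train.py in the project root, reference config.py as follows: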

from config import config as deepvac_config
from deepvac import DeepvacTrain

class MyTrain(DeepvacTrain):
    ......

my_train = MyTrain(deepvac_config)
my_train()

In test.py in the project root, reference config.py as follows:

from config import config as deepvac_config
from deepvac import Deepvac

class MyTest(Deepvac):
    ......

my_test = MyTest(deepvac_config)
my_test()

After that, the code in train.py/test.py reads and writes the items in config.core as follows:

print(self.config.log_dir)
print(self.config.batch_size)
......

In addition, given how central config is, deepvac provides the following APIs to make the config module easier to use:

AttrDict
new
interpret
fork
clone

from deepvac import AttrDict, new, interpret, fork

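For how to use these APIs, see the config API documentation.

6. Write synthesis/synthesis.py (optional)

Write this file to generate the project's dataset and to check and clean that dataset automatically. This step is optional; if you need it, refer to the implementation of the Synthesis2D project under the Deepvac organization.

7. Write aug/aug.py (optional)

Write this file to implement the data augmentation strategy. The deepvac.aug module has its own syntax for data augmentation and supports reuse at two levels: aug and composer. For example, to reuse SpeckleAug, which adds random speckles: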

from deepvac.aug.base_aug import SpeckleAug

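This reuses a low-level aug operator. You can also directly reuse a composer that someone else has written. For example, deepvac.aug provides RetinaAugComposer for face-detection data augmentation: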

from deepvac.aug import RetinaAugComposer

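The above covers direct reuse, but projects more often need custom extensions, and in most cases they also need to reuse the compose from torchvision transforms. What then? To clarify: composer is a concept of the deepvac.aug module, while compose is a concept of the torchvision transforms module; the similarity of the two names is purely coincidental.

Extending your own composer is also simple. For example, I can define a custom composer (named GemfieldComposer here) that can use or reuse the following augmentation logic:

The compose defined by torchvision transforms;
The aug operators built into deepvac;
Aug operators I wrote myself.

For more detailed steps, see: Using the deepvac.aug module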
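
The idea of a composer that chains the three kinds of augmentation logic can be sketched without deepvac or torchvision. Everything below is a hypothetical stand-in: the Composer base class, the (operator, probability) step format, and the three toy operators are assumptions for illustration and do not reproduce the deepvac.aug syntax.

```python
import random


class Composer:
    """Hypothetical composer: applies (aug, probability) steps in order."""

    def __init__(self, steps, seed=None):
        self.steps = list(steps)
        self.rng = random.Random(seed)

    def __call__(self, sample):
        for aug, prob in self.steps:
            # random() is in [0, 1), so prob=1.0 always applies the step.
            if self.rng.random() < prob:
                sample = aug(sample)
        return sample


def invert(pixels):
    """A user-written aug operator: invert 8-bit pixel values."""
    return [255 - p for p in pixels]


def clamp_dark(pixels):
    """Stand-in for a built-in aug operator: raise very dark pixels to 16."""
    return [max(p, 16) for p in pixels]


def to_float(pixels):
    """Stand-in for a torchvision-style compose: scale pixels to [0, 1]."""
    return [p / 255 for p in pixels]


class GemfieldComposer(Composer):
    """Chains a user operator, a built-in-style operator, and a compose."""

    def __init__(self, seed=None):
        super().__init__([(invert, 1.0), (clamp_dark, 1.0), (to_float, 1.0)], seed)


if __name__ == "__main__":
    print(GemfieldComposer()([0, 255, 128]))
```

Because every step is just a callable, a torchvision compose object could be dropped into the step list in place of to_float without changing the composer.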

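8. Write the Dataset class

The code goes in data/dataloader.py. Inherit from the deepvac.datasets class hierarchy; for example, the FileLineDataset class wraps files in the following train.txt format:

# train.txt: the first column is the image path, the second column is the label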
img0/1.jpg 0
img0/2.jpg 0
...
img1/0.jpg 1
...
img2/0.jpg 2
...

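Sometimes the second column is a string, and you want to replace the Image-based image loading in FileLineDataset with cv2. You can reimplement it through inheritance as follows: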

import cv2
from deepvac.datasets import FileLineDataset

class FileLineCvStrDataset(FileLineDataset):
    def _buildLabelFromLine(self, line):
        # keep the label (second column) as a string
        line = line.strip().split(" ")
        return [line[0], line[1]]

    def _buildSampleFromPath(self, abs_path):
        # load the image with cv2 instead of the default Pillow Image loader
        sample = cv2.imread(abs_path)
        sample = self.compose(sample)
        return sample

Incidentally, FileLineCvStrDataset is already provided in deepvac.datasets.
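
The train.txt format itself is simple enough to parse without any framework. The sketch below shows one way to turn such lines into (path, label) pairs; the function names parse_file_line and load_samples are hypothetical, and this is not how FileLineDataset is implemented internally.

```python
def parse_file_line(line, label_type=int):
    """Split one 'path label' line into a (path, label) tuple."""
    path, label = line.strip().split(" ", 1)
    return path, label_type(label)


def load_samples(lines, label_type=int):
    """Parse an iterable of lines, skipping blank lines and '#' comments."""
    samples = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        samples.append(parse_file_line(stripped, label_type))
    return samples


if __name__ == "__main__":
    demo = ["# path label", "img0/1.jpg 0", "img1/0.jpg 1"]
    print(load_samples(demo))
```

Passing label_type=str keeps the second column as a string, which mirrors the string-label case described above.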

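9. Write the training and validation script

In the Deepvac standard, train.py represents the training paradigm. The model training code goes in train.py and inherits from the DeepvacTrain class: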

from deepvac import DeepvacTrain

class MyTrain(DeepvacTrain):
    pass

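A subclass of DeepvacTrain may need to reimplement the following methods before training can start:

Method (* means users usually need to reimplement it)	Function	Notes
preEpoch	User hook before each epoch; DeepvacTrain does nothing	Users may override it if needed
preIter	User hook before each batch iteration; DeepvacTrain does nothing	Users may override it if needed
postIter	User hook after each batch iteration; DeepvacTrain does nothing	Users may override it if needed
postEpoch	User hook after each epoch; DeepvacTrain does nothing	Users may override it if needed
doFeedData2Device	DeepvacTrain moves the sample and target (label) from the dataloader to the device	Users may override it if needed
doForward	DeepvacTrain runs the network forward pass and assigns the result to self.config.output	Users may override it if needed
doLoss	DeepvacTrain computes this iteration's loss from self.config.output and self.config.target	Users may override it if needed
doBackward	Backpropagation; DeepvacTrain calls self.config.loss.backward()	Users may override it if needed
doOptimize	Weight update; DeepvacTrain calls self.config.optimizer.step()	Users may override it if needed
doSchedule	Learning-rate update; DeepvacTrain calls self.config.scheduler.step()	Users may override it if needed
* doValAcc	Computes the model's acc in val mode; DeepvacTrain does nothing	Users usually need to override it; tensorboard logging depends on it

A typical implementation looks like this: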

class MyTrain(DeepvacTrain):
    ...
    # the base class cannot handle list-typed labels, so override this method
    def doFeedData2Device(self):
        self.config.target = [anno.to(self.config.device) for anno in self.config.target]
        self.config.sample = self.config.sample.to(self.config.device)

    # initialize config.core.acc
    def doValAcc(self):
        self.config.acc = your_acc
        LOG.logI('Test accuracy: {:.4f}'.format(self.config.acc))


train = MyTrain(deepvac_config)
train()
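
To see how the hooks in the table fit together, the following toy loop records the order in which they fire. It is only a sketch: ToyTrain is a hypothetical class, the per-iteration order follows the table, and calling doSchedule once per epoch is an assumption rather than documented DeepvacTrain behavior.

```python
HOOKS_PER_ITER = ["preIter", "doFeedData2Device", "doForward", "doLoss",
                  "doBackward", "doOptimize", "postIter"]


class ToyTrain:
    """Toy loop that records hook calls; not the DeepvacTrain implementation."""

    def __init__(self, epochs, batches):
        self.epochs = epochs
        self.batches = batches
        self.calls = []

    def hook(self, name):
        # Record the hook, then run the subclass override if one exists.
        self.calls.append(name)
        method = getattr(self, name, None)
        if callable(method):
            method()

    def __call__(self):
        for _ in range(self.epochs):
            self.hook("preEpoch")
            for _ in range(self.batches):
                for name in HOOKS_PER_ITER:
                    self.hook(name)
            self.hook("doSchedule")   # assumed: once per epoch
            self.hook("postEpoch")
        return self.calls


class MyToyTrain(ToyTrain):
    finished_epochs = 0

    def postEpoch(self):
        # A user override: count finished epochs.
        self.finished_epochs += 1


if __name__ == "__main__":
    trainer = MyToyTrain(epochs=1, batches=2)
    print(trainer())
```

A subclass only defines the hooks it cares about; every other step keeps its default behavior, which is the same pattern the table describes for DeepvacTrain.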

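10. Write the test script

In the Deepvac standard, test.py represents the test paradigm. The test code goes in test.py and inherits from the Deepvac class.

It differs fundamentally from train/val in train.py in the following ways:

The train/val context is dropped;
The network no longer runs in an autograd context;
Loss, backward, and optimization computations are no longer performed;
Deepvac's *Report modules are used to measure accuracy and speed;

A subclass of Deepvac must (re)implement the following methods before testing can start:

Method (* means it must be reimplemented)	Function	Notes
preIter	User hook before each batch iteration; Deepvac does nothing	Users may override it if needed
postIter	User hook after each batch iteration; Deepvac does nothing	Users may override it if needed
doFeedData2Device	Deepvac moves the sample and target (label) from the dataloader to the device	Users may override it if needed
doForward	Deepvac runs the network forward pass and assigns the result to self.config.output	Users may override it if needed
doTest	Fully user-defined test logic; results can be added with report.add(gt, pred) to generate a report	See the test logic below

A typical implementation looks like this: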

class MyTest(Deepvac):
    ...
    def doTest(self):
        ...

test = MyTest(deepvac_config)
test()
#test(input_tensor)

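When test() is executed, the DeepVAC framework runs the test in the following order of priority:

If the user passes an argument, for example test(input_tensor), doFeedData2Device + doForward are run on that input_tensor, and the test ends;
If the user has overridden doTest(), doTest() is executed, and the test ends;
If the user has configured config.my_test.test_loader, that loader is iterated, doFeedData2Device + doForward are run on each sample, and the test ends;
If none of the above applies, it exits with an error.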
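
The priority order above can be written down as a small dispatch routine. This is a toy sketch under stated assumptions: ToyTest is a hypothetical class, the override check compares the subclass method against the base method, and the real Deepvac class may detect these cases differently.

```python
class ToyTest:
    """Toy dispatcher mirroring the priority order; not the Deepvac class."""

    def __init__(self, test_loader=None):
        self.test_loader = test_loader
        self.log = []

    def doFeedData2Device(self, sample):
        self.log.append(("feed", sample))
        return sample

    def doForward(self, sample):
        self.log.append(("forward", sample))

    def doTest(self):
        raise NotImplementedError

    def __call__(self, input_tensor=None):
        # 1. An explicit input takes priority over everything else.
        if input_tensor is not None:
            self.doForward(self.doFeedData2Device(input_tensor))
            return "input"
        # 2. Next, a user override of doTest().
        if type(self).doTest is not ToyTest.doTest:
            self.doTest()
            return "doTest"
        # 3. Then a configured test loader.
        if self.test_loader is not None:
            for sample in self.test_loader:
                self.doForward(self.doFeedData2Device(sample))
            return "loader"
        # 4. Nothing to test: fail loudly.
        raise RuntimeError("no input, no doTest override, no test_loader")


class MyToyTest(ToyTest):
    def doTest(self):
        self.log.append(("custom", None))


if __name__ == "__main__":
    print(ToyTest()(5), MyToyTest()(), ToyTest(test_loader=[1, 2])())
```

Returning the name of the branch taken makes the priority order easy to check, for example when both a doTest() override and a loader are present.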

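DeepVAC community products

Product	Description	Current version	How to get / deployment form
DeepVAC	A distinctive PyTorch engineering standard	0.6.0	pip install deepvac
libdeepvac	A distinctive PyTorch model deployment framework	1.9.0	SDK, download & extract
MLab HomePod	The most advanced containerized PyTorch model training environment to date	2.0	docker run / k8s
MLab RookPod	The most advanced storage solution to date costing under 100,000 RMB	NA	Hardware specification + k8s yaml
pyRBAC	A Python implementation of RBAC based on Keycloak	NA	pip install (coming soon)
DeepVAC build of PyTorch	PyTorch package customized for the MLab HomePod pro edition	1.9.0	conda install -c gemfield pytorch
DeepVAC build of LibTorch	LibTorch library customized for libdeepvac	1.9.0	Archive, download & extract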


pytorch: RuntimeError: 1D tensors expected, but got 2D and 2D tensors

The error is as follows:

>>> t1.dot(t2.t())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: 1D tensors expected, but got 2D and 2D tensors

usermod -aG sudo gemfield

Defaults        env_reset
Defaults        mail_badpass
Defaults        secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"

# User privilege specification
root    ALL=(ALL:ALL) ALL
gemfield ALL=(ALL:ALL) ALL
# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL

# Allow members of group sudo to execute any command
%sudo   ALL=(ALL:ALL) ALL

#includedir /etc/sudoers.d

rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/bin
rm -rf /opt/intel/conda_channel
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/benchmarks
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/examples
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/*.so
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_blas*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_blacs*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_gf*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_intel_thread.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_pgi_thread.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_tbb_thread.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_intel_ilp64.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_lapack95*
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_cdft_core.a
rm -rf /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_scalapack*

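[New feature] Add the kubectl client

Add the kubectl client.

MLab RookPod 1.0 planned features

Features

Distributed storage supporting file system, block storage, and object storage;
Support for hot and cold storage;
Support for data export and import, enabling off-site backup;

Hardware specifications

At least 10G network switching;
Hot storage uses SSD; cold storage uses SSD or HDD.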

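HomePod does not support uploading local files

deepvac.datasets depends on pycocotools

Installation steps:

pip3 install pycocotools

[Change] All tools needed for NCNN conversion are built into HomePod

The build-from-source process is also captured in the Dockerfile.

[bug] import onnx_coreml fails on HomePod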

>>> import onnx_coreml
WARNING:root:scikit-learn version 0.24.1 is not supported. Minimum required version: 0.17. Maximum required version: 0.19.2. Disabling scikit-learn conversion API.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/onnx_coreml/__init__.py", line 6, in <module>
    from .converter import convert
  File "/opt/conda/lib/python3.8/site-packages/onnx_coreml/converter.py", line 35, in <module>
    from coremltools.converters.nnssa.coreml.graph_pass.mlmodel_passes import remove_disconnected_layers, transform_conv_crop

Someone else has already run into this problem: onnx/onnx-coreml#585

[New feature] PyTorch updates since MLab HomePod 2.0 pro

0dc40474fe Peter Bell Tue Jul 6 19:05:39 2021 -0700 Migrate glu from the THC to ATen (CUDA) (#61153); note: glu is the Gated Linear Unit;
a69e947ffd Freey0 Wed Jul 7 07:42:49 2021 -0700 avg_pool3d_backward: Port to structured (#59084)
45ce26c397 Xue Haotian Wed Jul 7 12:32:43 2021 -0700 Port isposinf & isneginf kernel to structured kernels (#60633)
baa518e2f6 Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add Int32 support for NNAPI (#59365)
cf285d8eea Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::slice NNAPI converter (#59364)
d26372794a Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::detach NNAPI converter (#58543)
0be228dd5f Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::flatten NNAPI converter (#60885)
b297f65b66 Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::div NNAPI converter (#58541)
eab18a9a40 Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::to NNAPI converter (#58540)
14d604a13e Akshit Khurana Wed Jul 7 12:37:51 2021 -0700 Add aten::softmax NNAPI converter (#58539)
179b3ab88c Xiao Wang Wed Jul 7 20:45:42 2021 -0700 [cuDNN] Enable cudnn_batchnorm_spatial_persistent for BatchNorm3d channels_last_3d (#59129)

[bug] apt update on HomePod fails with: KeyError: 'suite'

The error is shown below:

gemfield@bd0d4c6acd4c:/etc/apt$ sudo apt update
Hit:1 http://dl.google.com/linux/chrome/deb stable InRelease
Hit:2 http://packages.microsoft.com/repos/code stable InRelease                                                      
Hit:3 https://packages.microsoft.com/repos/vscode stable InRelease                                                   
Get:4 http://security.ubuntu.com/ubuntu focal-security InRelease [109 kB]                                            
Hit:5 http://archive.ubuntu.com/ubuntu focal InRelease                                                               
Hit:6 http://ppa.launchpad.net/kdenlive/kdenlive-stable/ubuntu focal InRelease                                       
Hit:7 https://apt.repos.intel.com/mkl all InRelease                                                  
Get:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]                              
Get:9 http://archive.neon.kde.org/user focal InRelease [166 kB]                           
Get:10 http://security.ubuntu.com/ubuntu focal-security/main amd64 DEP-11 Metadata [28.5 kB]        
Get:11 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Get:12 http://security.ubuntu.com/ubuntu focal-security/universe amd64 DEP-11 Metadata [71.3 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 DEP-11 Metadata [365 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-updates/main DEP-11 64x64 Icons [87.9 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 DEP-11 Metadata [411 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 DEP-11 Metadata [2,540 B]
Get:17 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 DEP-11 Metadata [1,765 B]
Fetched 1,457 kB in 4s (386 kB/s)                                              
Traceback (most recent call last):
  File "/usr/lib/cnf-update-db", line 26, in <module>
    col.create(db)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 94, in create
    self._fill_commands(con)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 138, in _fill_commands
    self._parse_single_commands_file(con, fp)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 176, in _parse_single_commands_file
    suite=tagf.section["suite"]
KeyError: 'suite'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 153, in apport_excepthook
    with os.fdopen(os.open(pr_filename,
FileNotFoundError: [Errno 2] No such file or directory: '/var/crash/_usr_lib_cnf-update-db.0.crash'

Original exception was:
Traceback (most recent call last):
  File "/usr/lib/cnf-update-db", line 26, in <module>
    col.create(db)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 94, in create
    self._fill_commands(con)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 138, in _fill_commands
    self._parse_single_commands_file(con, fp)
  File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 176, in _parse_single_commands_file
    suite=tagf.section["suite"]
KeyError: 'suite'
Reading package lists... Done

Compiling a pybind11-based program fails on MLab HomePod 2.0 pro

Error message:

pydeepvac.cpp:37:53: error: no matching function for call to ‘pybind11::class_<deepvac::SyszuxVisionTerror>::def(const char [8], <unresolved overloaded function type>)’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1315:13: note: candidate: ‘template<class Func, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const char*, Func&&, const Extra& ...) [with Func = Func; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1315 |     class_ &def(const char *name_, Func&& f, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1315:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   couldn’t deduce template parameter ‘Func’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1333:13: note: candidate: ‘template<pybind11::detail::op_id id, pybind11::detail::op_type ot, class L, class R, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const pybind11::detail::op_<id, ot, L, R>&, const Extra& ...) [with pybind11::detail::op_id id = id; pybind11::detail::op_type ot = ot; L = L; R = R; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1333 |     class_ &def(const detail::op_<id, ot, L, R> &op, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1333:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘const pybind11::detail::op_<id, ot, L, R>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1345:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const pybind11::detail::initimpl::constructor<Args ...>&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1345 |     class_ &def(const detail::initimpl::constructor<Args...> &init, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1345:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘const pybind11::detail::initimpl::constructor<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1351:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(const pybind11::detail::initimpl::alias_constructor<Args ...>&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1351 |     class_ &def(const detail::initimpl::alias_constructor<Args...> &init, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1351:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘const pybind11::detail::initimpl::alias_constructor<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1357:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(pybind11::detail::initimpl::factory<Args ...>&&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1357 |     class_ &def(detail::initimpl::factory<Args...> &&init, const Extra&... extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1357:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘pybind11::detail::initimpl::factory<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);
      |                                                     ^
In file included from /deepvac/libtorch/include/torch/csrc/utils/pybind.h:7,
                 from /deepvac/libtorch/include/torch/csrc/api/include/torch/python.h:12,
                 from /deepvac/libtorch/include/torch/extension.h:6,
                 from /please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:1:
/deepvac/libtorch/include/pybind11/pybind11.h:1363:13: note: candidate: ‘template<class ... Args, class ... Extra> pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >& pybind11::class_< <template-parameter-1-1>, <template-parameter-1-2> >::def(pybind11::detail::initimpl::pickle_factory<Args ...>&&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = deepvac::SyszuxVisionTerror; options = {}]’
 1363 |     class_ &def(detail::initimpl::pickle_factory<Args...> &&pf, const Extra &...extra) {
      |             ^~~
/deepvac/libtorch/include/pybind11/pybind11.h:1363:13: note:   template argument deduction/substitution failed:
/please_cd_to/home/gemfield/libdeepvac-face/src/pydeepvac.cpp:37:53: note:   mismatched types ‘pybind11::detail::initimpl::pickle_factory<Args ...>’ and ‘const char [8]’
   37 |         .def("process", &SyszuxVisionTerror::process);

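MLab HomePod 2.1: features to be added

Add support for torch.elastic;
Target the latest NVIDIA driver available in the Ubuntu 20.04 repository;

[New feature] Add the packages needed for TensorRT conversion

[New feature] Install the latest protobuf

On HomePod: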

#dependency
sudo apt-get install autoconf automake libtool curl make g++ unzip

git clone https://github.com/protocolbuffers/protobuf.git
cd protobuf
git submodule update --init --recursive
bash -x ./autogen.sh

./configure
make
make check
make install
#refresh shared library cache.
ldconfig

apt install python3-pybind11

This package has the following dependencies:

gemfield@ThinkPad-X1C:~$ apt show python3-pybind11
Package: python3-pybind11
Version: 2.5.0-5
Source: pybind11
Origin: Ubuntu
Installed-Size: 610 kB
Depends: python3:any, pybind11-dev (= 2.5.0-5)
Recommends: python3-numpy
Homepage: https://github.com/pybind/pybind11
Download-Size: 113 kB
APT-Manual-Installed: yes
APT-Sources: http://archive.ubuntu.com/ubuntu groovy/universe amd64 Packages
......

As a result, the following three deb packages are actually installed:

python3-pybind11
pybind11-dev
python3

This is not necessary.

Of these, python3-pybind11 contains the following files:

/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/dependency_links.txt
/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/top_level.txt
/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/not-zip-safe
/usr/lib/python3/dist-packages/pybind11-2.5.0.egg-info/PKG-INFO
/usr/lib/python3/dist-packages/pybind11/_version.py
/usr/lib/python3/dist-packages/pybind11/__main__.py
/usr/lib/python3/dist-packages/pybind11/__init__.py
/usr/lib/python3/dist-packages/pybind11/include/pybind11/cast.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/complex.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/buffer_info.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/common.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/operators.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/functional.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/attr.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/pytypes.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/embed.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/eigen.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/stl.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/eval.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/iostream.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/numpy.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/pybind11.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/options.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/chrono.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/stl_bind.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/common.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/typeid.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/internals.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/init.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/descr.h
/usr/lib/python3/dist-packages/pybind11/include/pybind11/detail/class.h

pybind11-dev contains the following files:

/usr/lib/cmake/pybind11/pybind11Targets.cmake
/usr/lib/cmake/pybind11/pybind11Tools.cmake
/usr/lib/cmake/pybind11/FindPythonLibsNew.cmake
/usr/lib/cmake/pybind11/pybind11ConfigVersion.cmake
/usr/lib/cmake/pybind11/pybind11Config.cmake
/usr/share/doc/python3-pybind11/copyright
/usr/share/doc/pybind11-dev/copyright
/usr/share/doc/pybind11-dev/changelog.Debian.gz
/usr/include/pybind11/cast.h
/usr/include/pybind11/complex.h
/usr/include/pybind11/buffer_info.h
/usr/include/pybind11/common.h
/usr/include/pybind11/operators.h
/usr/include/pybind11/functional.h
/usr/include/pybind11/attr.h
/usr/include/pybind11/pytypes.h
/usr/include/pybind11/embed.h
/usr/include/pybind11/eigen.h
/usr/include/pybind11/stl.h
/usr/include/pybind11/eval.h
/usr/include/pybind11/iostream.h
/usr/include/pybind11/numpy.h
/usr/include/pybind11/pybind11.h
/usr/include/pybind11/options.h
/usr/include/pybind11/chrono.h
/usr/include/pybind11/stl_bind.h
/usr/include/pybind11/detail/common.h
/usr/include/pybind11/detail/typeid.h
/usr/include/pybind11/detail/internals.h
/usr/include/pybind11/detail/init.h
/usr/include/pybind11/detail/descr.h
/usr/include/pybind11/detail/class.h

# on ubuntu 18.04
xrdp-sesman[57]: (57)(140460571419968)[INFO ] ++ created session (access granted): username gemfield, ip ::ffff:222.128.gemfield:33552 - socket: 11
xrdp-sesman[57]: (57)(140460571419968)[INFO ] starting Xorg session...
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 8 (AF_INET6 :: port 5910)
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 8 (AF_INET6 :: port 6010)
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 8 (AF_INET6 :: port 6210)
xrdp-sesman[57]: (57)(140460571419968)[DEBUG] Closed socket 7 (AF_INET6 ::ffff:127.0.0.1 port 3350)
xrdp[62]: (62)(139675925534528)[INFO ] xrdp_wm_log_msg: login successful for display 10
xrdp[62]: (62)(139675925534528)[INFO ] xrdp_wm_log_msg: login successful for display 10
xrdp-sesman[64]: (64)(140460571419968)[INFO ] calling auth_start_session from pid 64
xrdp[62]: (62)(139675925534528)[DEBUG] xrdp_wm_log_msg: started connecting
xrdp-sesman[64]: pam_unix(xrdp-sesman:session): session opened for user gemfield by (uid=0)
dbus-daemon[10]: [system] Activating service name='org.freedesktop.login1' requested by ':1.5' (uid=0 pid=64 comm="/usr/sbin/xrdp-sesman --nodaemon " label="docker-default (enforce)") (using servicehelper)
New seat seat0.
dbus-daemon[10]: [system] Activating service name='org.freedesktop.systemd1' requested by ':1.6' (uid=0 pid=66 comm="/lib/systemd/systemd-logind " label="docker-default (enforce)") (using servicehelper)
dbus-daemon[10]: [system] Successfully activated service 'org.freedesktop.login1'
dbus-daemon[10]: [system] Activated service 'org.freedesktop.systemd1' failed: Launch helper exited with unknown return code 1


# on ubuntu 20.04
xrdp-sesman[11]: (11)(140645830477376)[INFO ] ++ created session (access granted): username gemfield, ip ::ffff:222.128.gemfield:54766 - socket: 11
xrdp-sesman[11]: (11)(140645830477376)[INFO ] starting Xorg session...
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 8 (AF_INET6 :: port 5910)
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 8 (AF_INET6 :: port 6010)
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 8 (AF_INET6 :: port 6210)
xrdp[37]: (37)(139680321079104)[INFO ] xrdp_wm_log_msg: login successful for display 10
xrdp-sesman[11]: (11)(140645830477376)[DEBUG] Closed socket 7 (AF_INET6 ::ffff:127.0.0.1 port 3350)
xrdp[37]: (37)(139680321079104)[DEBUG] xrdp_wm_log_msg: started connecting
xrdp-sesman[39]: (39)(140645830477376)[INFO ] calling auth_start_session from pid 39
xrdp-sesman[39]: pam_unix(xrdp-sesman:session): session opened for user gemfield by (uid=0)
dbus-daemon[9]: [system] Activating service name='org.freedesktop.login1' requested by ':1.0' (uid=0 pid=39 comm="/usr/sbin/xrdp-sesman --nodaemon " label="docker-default (enforce)") (using servicehelper)
dbus-daemon[9]: [system] Activated service 'org.freedesktop.login1' failed: Launch helper exited with unknown return code 1
xrdp-sesman[39]: pam_systemd(xrdp-sesman:session): Failed to create session: Launch helper exited with unknown return code 1
xrdp-sesman[39]: (39)(140645830477376)[DEBUG] Closed socket 6 (AF_INET6 ::ffff:127.0.0.1 port 3350)
xrdp-sesman[39]: (39)(140645830477376)[DEBUG] Closed socket 7 (AF_INET6 ::ffff:127.0.0.1 port 3350)
......
dbus-daemon[9]: [system] Activating service name='org.freedesktop.login1' requested by ':1.1' (uid=1000 pid=43 comm="/usr/lib/xorg/Xorg :10 -auth .Xauthority -config x" label="docker-default (enforce)") (using servicehelper)
dbus-daemon[9]: [system] Activated service 'org.freedesktop.login1' failed: Launch helper exited with unknown return code 1

As the logs show, on Ubuntu 20.04 there are dbus errors related to org.freedesktop.login1.

ONNX-to-TNN conversion fails on MLab HomePod: ImportError: cannot import name 'optimizer' from 'onnx' (/opt/conda/lib/python3.8/site-packages/onnx/__init__.py)

The conversion command is as follows (with the -optimize=1 switch enabled):

python3 onnx2tnn.py /app/gemfield/onnxmodels/v2.onnx -version=v1.0 -optimize=1 -half=0 -o /app/gemfield/onnxmodels

It then fails with:

Traceback (most recent call last):
  File "onnx2tnn.py", line 41, in do_optimize
    import onnx2tnn.onnx_optimizer.onnx_optimizer as opt
ModuleNotFoundError: No module named 'onnx2tnn.onnx_optimizer'; 'onnx2tnn' is not a package

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "onnx2tnn.py", line 148, in <module>
    main()
  File "onnx2tnn.py", line 120, in main
    do_optimize(onnx_net_path, input_shape)
  File "onnx2tnn.py", line 43, in do_optimize
    import onnx_optimizer.onnx_optimizer as opt
  File "/app/gemfield/github/TNN/tools/onnx2tnn/onnx-converter/onnx_optimizer/onnx_optimizer.py", line 8, in <module>
    from onnx import optimizer
ImportError: cannot import name 'optimizer' from 'onnx' (/opt/conda/lib/python3.8/site-packages/onnx/__init__.py)
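
The last frame fails on `from onnx import optimizer`. To my knowledge, ONNX dropped its built-in optimizer module in the 1.9 release, and the code continues as the standalone `onnxoptimizer` package. The following is a minimal sketch of a fallback import, assuming `onnxoptimizer` has been installed separately; the helper name `load_onnx_optimizer` is hypothetical and not part of TNN.

```python
import importlib


def load_onnx_optimizer():
    """Return an ONNX optimizer module, or None if neither location is importable.

    Tries the legacy in-tree location first (onnx.optimizer), then the
    standalone onnxoptimizer package (assumed to be installed separately).
    """
    for name in ("onnx.optimizer", "onnxoptimizer"):
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers both a missing parent package and a missing submodule.
            continue
    return None


opt = load_onnx_optimizer()
if opt is None:
    print("no ONNX optimizer module found")
else:
    print("using optimizer module:", opt.__name__)
```

If neither module is available, running the converter without -optimize=1 may avoid this code path, since the traceback shows the import happening inside do_optimize.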

Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired

Warning messages flood the output during model inference.

Environment: MLab HomePod 2.0 pro
Error messages:

[W ___torch_mangle_579.py:74] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:115] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:156] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:196] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:228] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:260] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
no text detected
[W ___torch_mangle_579.py:74] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:115] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:156] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:196] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:228] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )
[W ___torch_mangle_579.py:260] Warning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. (function )

[New feature] Add faiss

RUN git clone https://github.com/facebookresearch/faiss && \
    cd faiss && \
    mkdir build && \
    cd build && \
    cmake .. && \
    make VERBOSE=1 && \
    make install && \
    cd faiss/python && \
    python setup.py install && \
    cd ../../ && \
    make clean

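Boost development libraries (libboost-dev libboost-filesystem-dev libboost-program-options-dev libboost-system-dev);
CUDA development libraries (pro is based directly on the nvidia/cuda:devel image series);
MKL static libraries (downloaded from the Intel repository, used to build statically linked libtorch applications);
pycuda package, a dependency of the TensorRT runtime.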

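Could HomePod add a macOS Big Sur theme?

The macOS Big Sur theme does look better than the default, but it needs to be evaluated first.
In addition, the login avatar is set to the DeepVAC logo by default.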

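Minimum requirements for the AI inference deployment environment

Host:

x86_64 CPU;
Linux;
8 GB RAM;
4 GB CUDA RAM (Turing architecture or newer);
nvidia-driver 450+;
docker 19.03+;
NVIDIA Container Toolkit.

Per AI inference capability:

4 GB CUDA RAM;
4 GB RAM;
4 CPU cores.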
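
The per-capability figures can be turned into a rough capacity estimate for a given host. This is only a sketch: it assumes the requirements add up linearly and ignores memory used by the OS or desktop, and the function name `max_capabilities` is hypothetical.

```python
# Per-capability figures taken from the list above.
PER_CAPABILITY = {"ram_gb": 4, "cuda_ram_gb": 4, "cpu_cores": 4}


def max_capabilities(ram_gb, cuda_ram_gb, cpu_cores):
    """Upper bound on concurrent inference capabilities for one host.

    Assumes requirements add up linearly and ignores OS/desktop overhead.
    """
    available = {"ram_gb": ram_gb, "cuda_ram_gb": cuda_ram_gb, "cpu_cores": cpu_cores}
    # The scarcest resource decides how many capabilities fit.
    return min(int(available[k] // PER_CAPABILITY[k]) for k in PER_CAPABILITY)


# Minimum host from the list (the core count of 4 is an assumption).
print(max_capabilities(ram_gb=8, cuda_ram_gb=4, cpu_cores=4))
# A larger host where CPU cores are the limiting resource.
print(max_capabilities(ram_gb=32, cuda_ram_gb=16, cpu_cores=12))
```

Taking the minimum across resources reflects that a capability needs all three at once, so spare RAM cannot compensate for missing CUDA memory.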

homepod:2.0-pro torch.cuda.is_available return False and throws " Error 804: forward compatibility was attempted on non supported HW "

ENV

GPU model: GeForce GTX 1650
Driver version: 460.80
CUDA version: 11.3
OS: Ubuntu 20.04

ERROR

nvidia-smi ok
import torch; print(torch.cuda.is_available())

/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448278899/work/c10/cuda/CUDAFunctions.cpp:115.)
  return torch._C._cuda_getDeviceCount() > 0
False
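
One commonly reported cause of Error 804 is that the container picks up libcuda from the CUDA forward-compatibility package, which NVIDIA documents as installed under /usr/local/cuda/compat and supports mainly on data-center GPUs rather than GeForce cards. Whether that applies to this report is an assumption; the issue does not confirm it. The sketch below only checks whether a compat directory appears on LD_LIBRARY_PATH; the helper name `compat_entries` is hypothetical.

```python
import os


def compat_entries(ld_library_path):
    """Return LD_LIBRARY_PATH entries that have a path component named 'compat'."""
    entries = [p for p in ld_library_path.split(":") if p]
    return [p for p in entries if "compat" in p.strip("/").split("/")]


found = compat_entries(os.environ.get("LD_LIBRARY_PATH", ""))
print("compat directories on LD_LIBRARY_PATH:", found or "none")
```

An empty result does not rule the cause out, because the library could also be registered through the dynamic linker cache instead of the environment variable.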

[New feature] Add a set of basic Python packages

onnx
onnxruntime
onnx-simplifier
requests
protobuf ?
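
A quick way to see which of these packages an image already provides is to ask the import system for each one. This is a minimal sketch; the mapping from pip names to import names (`onnxsim` for onnx-simplifier, `google.protobuf` for protobuf) is stated here as an assumption to verify against the installed versions.

```python
import importlib.util

# pip name -> import name (onnx-simplifier and protobuf import under different names)
PACKAGES = {
    "onnx": "onnx",
    "onnxruntime": "onnxruntime",
    "onnx-simplifier": "onnxsim",
    "requests": "requests",
    "protobuf": "google.protobuf",
}


def is_importable(module_name):
    """True if the import system can locate the module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. 'google') is missing or malformed.
        return False


status = {pip_name: is_importable(mod) for pip_name, mod in PACKAGES.items()}
for pip_name, ok in status.items():
    print(f"{pip_name:16s} {'ok' if ok else 'MISSING'}")
```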

[New feature] PyTorch updates since HomePod 1.0

fd02fc5d715a7647631c5806db736794edc2a52f: Port put_ and take from TH to ATen
4170a6cc24c6867ca6cd48f5581e98a4be89e593: Migrate mode from TH to ATen
6866c033d5aa134a83bc1cb84e3e084a7329167f: [JIT] Add recursive scripting for class type module attributes

HomePod's System Settings opens to a black screen

Opening System Settings in HomePod shows a black screen, probably because a KDE component is missing.

[New feature] Add support for TNN model conversion

[bug] Running homepod 1.1 fails with: Error response from daemon: AppArmor enabled on system but the docker-default profile could not be loaded: strconv.Atoi: parsing "file": invalid syntax.