lyken17 / efficient-pytorch

My best practice of training large dataset using PyTorch.

Efficient-PyTorch

Speed overview

By following these tips, we can achieve ~730 images/second with PyTorch when training ResNet-50 on ImageNet. Compared with the benchmarks reported for TensorFlow and MXNet, this performance is still competitive.

Epoch: [0][430/5005]    Time 0.409 (0.405)      Data 626.6 (728.0)      Loss 6.8381 (6.9754)    Error@1 100.000 (99.850) Error@5 99.609 (99.259)
Epoch: [0][440/5005]    Time 0.364 (0.404)      Data 704.2 (727.9)      Loss 6.8506 (6.9725)    Error@1 100.000 (99.851) Error@5 99.609 (99.258)
Epoch: [0][450/5005]    Time 0.350 (0.403)      Data 730.7 (727.3)      Loss 6.8846 (6.9700)    Error@1 100.000 (99.847) Error@5 99.609 (99.258)
Epoch: [0][460/5005]    Time 0.357 (0.402)      Data 716.8 (727.4)      Loss 6.9129 (6.9680)    Error@1 100.000 (99.849) Error@5 99.609 (99.256)
Epoch: [0][470/5005]    Time 0.346 (0.401)      Data 740.8 (727.4)      Loss 6.8574 (6.9657)    Error@1 100.000 (99.850) Error@5 98.828 (99.249)
Epoch: [0][480/5005]    Time 0.425 (0.400)      Data 601.8 (727.3)      Loss 6.8467 (6.9632)    Error@1 100.000 (99.849) Error@5 99.609 (99.239)
Epoch: [0][490/5005]    Time 0.358 (0.399)      Data 715.2 (727.2)      Loss 6.8319 (6.9607)    Error@1 100.000 (99.848) Error@5 99.609 (99.232)
Epoch: [0][500/5005]    Time 0.347 (0.399)      Data 737.4 (726.9)      Loss 6.8426 (6.9583)    Error@1 99.609 (99.843)  Error@5 98.047 (99.220)
Epoch: [0][510/5005]    Time 0.346 (0.398)      Data 740.5 (726.7)      Loss 6.8245 (6.9561)    Error@1 100.000 (99.839) Error@5 99.609 (99.211)
Epoch: [0][520/5005]    Time 0.350 (0.452)      Data 730.7 (724.0)      Loss 6.8270 (6.9538)    Error@1 99.609 (99.834)  Error@5 97.656 (99.193)
Epoch: [0][530/5005]    Time 0.340 (0.450)      Data 752.9 (724.4)      Loss 6.8149 (6.9516)    Error@1 100.000 (99.832) Error@5 98.047 (99.183)
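
The `Data` column in the log above appears to report throughput (images/second) rather than data-loading time. A minimal sketch of that reading, assuming a batch size of 256: the log does not state the batch size, but 5005 iterations per epoch matches ImageNet's roughly 1.28M training images at 256 per batch, and dividing 256 by the `Time` column reproduces the logged values closely.

```python
# Assumption: batch size 256 (inferred from the numbers; the log does not state it).
BATCH_SIZE = 256


def images_per_second(iter_time_s, batch_size=BATCH_SIZE):
    """Throughput implied by the wall-clock time of one iteration."""
    return batch_size / iter_time_s


# (Time, Data) pairs copied from the log above.
samples = [(0.409, 626.6), (0.364, 704.2), (0.350, 730.7)]
for iter_time, logged in samples:
    computed = images_per_second(iter_time)
    print(f"time={iter_time:.3f}s  computed={computed:.1f}  logged={logged}")
```

The small differences come from the `Time` column being rounded to three decimals.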

Key Points of Efficiency

Most frameworks now adopt cuDNN as their backend. Without special optimization, inference time is similar across frameworks. To optimize training time, we focus on other aspects, such as:

Data Loader

The default combination of datasets.ImageFolder + data.DataLoader is not enough for large-scale classification. In my experience, even after upgrading to a Samsung 960 Pro (read 3.5 GB/s, write 2.0 GB/s), the whole training pipeline still suffers from disk I/O.

The cause is the slow reading of discontinuous small chunks. To optimize this, we need to dump the small JPEG images into one large binary file. TensorFlow has its own TFRecord and MXNet uses RecordIO. Besides these two, there are other options such as hdf5, pth, n5, and lmdb. Here I choose lmdb because

1. TFRecord is a private protocol that is hard to hack into. RecordIO's documentation is confusing and does not provide a clean Python API.
2. hdf5, pth, and n5, though they offer a straightforward JSON-like API, require loading the whole file into memory. This is not practical when you work with large datasets like ImageNet.
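
The idea of packing many small files into one large file can be sketched with the standard library alone. The example below is not LMDB and not the repository's code: it is a simplified stand-in in which `pack` and `PackedReader` are hypothetical names, records are written back to back, and a list of (offset, length) pairs serves as the index. Reads go through a read-only memory map, so only the touched pages need to be resident.

```python
import mmap
import os
import tempfile


def pack(records, path):
    """Write all byte records back to back; return [(offset, length), ...]."""
    index = []
    with open(path, "wb") as f:
        for blob in records:
            index.append((f.tell(), len(blob)))
            f.write(blob)
    return index


class PackedReader:
    """Random access into the packed file through a read-only memory map."""

    def __init__(self, path, index):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._index = index

    def __len__(self):
        return len(self._index)

    def __getitem__(self, i):
        offset, length = self._index[i]
        return self._map[offset:offset + length]

    def close(self):
        self._map.close()
        self._file.close()


# Stand-ins for encoded JPEG bytes.
fake_images = [bytes([i]) * (10 + i) for i in range(5)]
packed_path = os.path.join(tempfile.mkdtemp(), "images.bin")
reader = PackedReader(packed_path, pack(fake_images, packed_path))
print(len(reader), reader[3] == fake_images[3])
reader.close()
```

A real key-value store such as LMDB adds what this sketch leaves out: a persistent index, transactions, and safe concurrent readers.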

Data Parallel

PyTorch's default data parallelism, powered by nn.DataParallel, is inefficient! First, because of Python's GIL, multi-threading does not fully utilize all cores (see torch/nn/parallel/parallel_apply.py#47). Second, the collective scheme of DataParallel gathers all results on cuda:0. This leads to an imbalanced workload and sometimes to OOM, especially when you are running segmentation models.

nn.parallel.DistributedDataParallel provides a more elegant solution: instead of launching calls from different threads, it starts multiple processes (no GIL) and assigns a balanced workload to all GPUs.

(on-going) detailed scripts and experiment numbers.

efficient-pytorch's Issues

Mark

msgpack Error

I ran into the following error. I am using Python 3.5.
Traceback (most recent call last):
  File "train.py", line 47, in <module>
    train_dataset = ImageFolderLMDB(traindir, train_transform)
  File "/media/folder2lmdb.py", line 32, in __init__
    self.keys = msgpack.loads(txn.get(b'__keys__'))
  File "msgpack/_unpacker.pyx", line 209, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.

Large memory occupation

Hi, I'm training Faster R-CNN on 4 GPUs with the COCO dataset converted to LMDB.
I used num_workers=4 for the DataLoader and found that memory usage is almost 60 GB.
I suspect that the whole dataset is being read into memory. But according to your description in the README,

Here I choose lmdb because
2. hdf5, pth, and n5, though they offer a straightforward JSON-like API, require loading the whole file into memory. This is not practical when you work with large datasets like ImageNet.

LMDB shouldn't behave like this. Any thoughts?
I can share part of my dataset code:

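import os
import six
import lmdb
import numpy as np
import pyarrow as pa
from PIL import Image
from torch.utils.data import Dataset

class LMDBWrapper(object):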
    def __init__(self, lmdb_path):
        self.env = lmdb.open(lmdb_path, max_readers=1, 
                             subdir=os.path.isdir(lmdb_path),
                             readonly=True, lock=False,
                             readahead=False, meminit=False)
        with self.env.begin(write=False) as txn:
            self.length = pa.deserialize(txn.get(b'__len__'))
            self.keys = pa.deserialize(txn.get(b'__keys__'))

    def get_image(self, image_key):
        env = self.env
        with env.begin(write=False) as txn:
            byteflow = txn.get(u'{}'.format(image_key).encode('ascii'))
        imgbuf = pa.deserialize(byteflow)
        buf = six.BytesIO()
        buf.write(imgbuf)
        buf.seek(0)
        image = Image.open(buf).convert('RGB')

        return np.asarray(image)


class LMDBDataset(Dataset):
    def __init__(self, lmdb_path):
        self.lmdb = None
        self.lmdb_path = lmdb_path

    def init_lmdb(self):
        self.lmdb = LMDBWrapper(self.lmdb_path)

    def __getitem__(self, idx):
        if self.lmdb is None:
            self.init_lmdb()

class CocoInstanceLMDBDataset(LMDBDataset):
    def __init__(self, lmdb_path):
        super().__init__(lmdb_path=lmdb_path)

    def __getitem__(self, idx):
        super().__getitem__(idx)
        ann = self.filtered_anns[idx]
        data = dict()
        # transforms
        return data

Final lmdb file for ImageNet?

Great work.
Could you provide the final lmdb file for ImageNet?

About the transform in dataset

Hello!

Your code for accelerating training is really helpful! Thank you!
In most cases, we only need a few transformations for data augmentation, such as flip and multi-crop. I noticed that the code released with Non-local (https://github.com/facebookresearch/video-nonlocal-net/tree/master/process_data/kinetics) stores the transformed data in the lmdb file. Would this accelerate training compared with your current code? Have you compared the two methods?
If you have run experiments on this, could you share them with us?

What is the function of this line? Why do you assign None to img, target?

Efficient-PyTorch/tools/folder2lmdb.py

Line 38 in 7407177

img, target = None, None

Calling lmdb.open() more than once leads to a performance drop

Hi, thanks for your hard work; it helps me a lot.
I tried writing my data in lmdb format to boost my PyTorch performance, but I ran into a very strange situation.
If I write code like this in my train.py:

    train_dataset = CamVid(
        'data',
        image_set='train',
        download=args.download
    )
   
    valid_dataset = CamVid(
        'data',
        image_set='val',
        download=args.download
    )

nvidia-smi reports around 80% GPU-Util during training. But if I do this:

    train_dataset = CamVid(
        'data',
        image_set='train',
        download=args.download
    )
    valid_dataset = train_dataset   # do not initialize another dataset instance

GPU-Util immediately jumps to 97%. I have tried saving my training and validation data into one lmdb file and into separate files, and nothing changes. Is there a reason why this happens? Why would calling lmdb.open() twice in one process cause this difference? Is there anything I did wrong? Thanks.

There is my dataset script: https://github.com/weiaicunzai/pytorch-camvid/blob/refactor/optimize_transformations/dataset/camvid.py

My project code is now at https://github.com/weiaicunzai/pytorch-camvid/tree/refactor/optimize_transformations, if you need it. Please note that it is not on the master branch.
Simply run

python train.py -net segnet -b 5

to run my code

Thanks in advance.

Speed is a bit slower after using lmdb.

Speed is a bit slower after using lmdb.
I have 30k images of size 1000x1000, stored on an SSD.
Are there locks in lmdb that slow it down?

GIL Claim

I am a bit confused about what you mentioned about the GIL not allowing truly parallelizable code. The official docs here seem to claim if you set num_workers to anything > 1, you can use individual python processes.

training slower after using lmdb

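Hello, after converting my image dataset to lmdb, the training speed actually dropped.
Checking GPU utilization, I found that it fluctuates heavily, from 10% to 95%. I then increased num_workers to 8, but GPU utilization still fluctuates. Only one CPU process shows 500% utilization while the rest are around 20%, so the work is clearly not distributed evenly across the CPU threads.
My understanding is that after converting the images to an lmdb file, I/O time should drop substantially and GPU utilization should be high, yet in practice it fluctuates a lot. Could there be a problem with how I created the lmdb file?
My lmdb creation and dataset reading code is as follows: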

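About running LMDB and its efficiency

Hello!
I ran into two problems when generating a dataset with lmdb:
1. While generating the mdb file, the throughput of txn.put drops sharply, and the program sometimes even hangs.
2. Do you know whether PyTorch's DataLoader prepares the next batch while the GPU is computing? I ask because my data preprocessing does not use the methods in torchvision.
Please let me know if anything is unclear. Thanks. @Lyken17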

Why does it become slower than PyTorch ImageFolder after using ImageFolderLMDB?

I do not know why using ImageFolderLMDB is slower than the original PyTorch ImageFolder. As seen below, the data loader is still very slow.
==> training...
Epoch: [1][0/10010] Time 20.673 (20.673) Data 19.890 (19.890) Loss 18.0700 (18.0700) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000)
Epoch: [1][128/10010] Time 5.867 (2.609) Data 5.624 (2.248) Loss 14.5102 (14.8609) Acc@1 0.000 (0.212) Acc@5 2.344 (1.163)
Epoch: [1][256/10010] Time 0.427 (2.561) Data 0.000 (2.194) Loss 13.6098 (14.3173) Acc@1 0.781 (0.395) Acc@5 4.688 (2.022)
Epoch: [1][384/10010] Time 1.767 (2.547) Data 1.533 (2.185) Loss 13.1298 (14.0282) Acc@1 0.781 (0.611) Acc@5 4.688 (2.713)
Epoch: [1][512/10010] Time 0.427 (2.536) Data 0.000 (2.168) Loss 13.2681 (13.7872) Acc@1 1.562 (0.778) Acc@5 7.031 (3.513)
Epoch: [1][640/10010] Time 0.428 (2.546) Data 0.000 (2.171) Loss 13.5265 (13.5746) Acc@1 0.781 (0.963) Acc@5 5.469 (4.254)
Epoch: [1][768/10010] Time 0.428 (2.528) Data 0.000 (2.148) Loss 13.7340 (13.4076) Acc@1 2.344 (1.185) Acc@5 11.719 (5.024)
Epoch: [1][896/10010] Time 0.428 (2.545) Data 0.000 (2.161) Loss 9.5429 (13.2311) Acc@1 2.344 (1.360) Acc@5 14.844 (5.666)
Epoch: [1][1024/10010] Time 0.432 (2.552) Data 0.000 (2.167) Loss 13.3585 (13.0788) Acc@1 4.688 (1.572) Acc@5 15.625 (6.357)
Epoch: [1][1152/10010] Time 0.427 (2.573) Data 0.000 (2.189) Loss 10.7350 (12.9583) Acc@1 3.125 (1.752) Acc@5 14.844 (6.968)
Epoch: [1][1280/10010] Time 0.426 (2.582) Data 0.000 (2.197) Loss 11.1195 (12.8287) Acc@1 1.562 (1.976) Acc@5 10.938 (7.657)
Epoch: [1][1408/10010] Time 0.428 (2.596) Data 0.000 (2.210) Loss 11.9660 (12.6995) Acc@1 8.594 (2.179) Acc@5 22.656 (8.289)
Epoch: [1][1536/10010] Time 0.428 (2.617) Data 0.000 (2.232) Loss 12.3775 (12.5861) Acc@1 5.469 (2.372) Acc@5 19.531 (8.919)
Epoch: [1][1664/10010] Time 0.429 (2.630) Data 0.000 (2.245) Loss 13.2347 (12.4921) Acc@1 3.906 (2.576) Acc@5 15.625 (9.503)
Epoch: [1][1792/10010] Time 0.428 (2.644) Data 0.000 (2.260) Loss 13.2709 (12.3985) Acc@1 5.469 (2.781) Acc@5 20.312 (10.083)
Epoch: [1][1920/10010] Time 0.429 (2.654) Data 0.000 (2.272) Loss 12.1958 (12.2974) Acc@1 3.906 (2.996) Acc@5 12.500 (10.677)
Epoch: [1][2048/10010] Time 0.428 (2.660) Data 0.000 (2.278) Loss 11.1101 (12.1962) Acc@1 8.594 (3.199) Acc@5 19.531 (11.232)
Epoch: [1][2176/10010] Time 0.427 (2.668) Data 0.000 (2.287) Loss 10.9185 (12.1079) Acc@1 8.594 (3.425) Acc@5 23.438 (11.803)
Epoch: [1][2304/10010] Time 0.427 (2.676) Data 0.000 (2.294) Loss 9.6112 (12.0138) Acc@1 6.250 (3.621) Acc@5 20.312 (12.326)
Epoch: [1][2432/10010] Time 0.428 (2.681) Data 0.000 (2.298) Loss 10.1364 (11.9359) Acc@1 5.469 (3.829) Acc@5 17.969 (12.881)
Epoch: [1][2560/10010] Time 0.429 (2.690) Data 0.000 (2.307) Loss 11.3065 (11.8573) Acc@1 10.156 (4.054) Acc@5 21.875 (13.425)
Epoch: [1][2688/10010] Time 0.427 (2.695) Data 0.000 (2.312) Loss 9.2579 (11.7760) Acc@1 9.375 (4.267) Acc@5 24.219 (13.953)
Epoch: [1][2816/10010] Time 0.429 (2.700) Data 0.000 (2.316) Loss 9.7976 (11.7000) Acc@1 7.031 (4.478) Acc@5 24.219 (14.497)
Epoch: [1][2944/10010] Time 0.429 (2.703) Data 0.000 (2.319) Loss 10.7845 (11.6275) Acc@1 8.594 (4.686) Acc@5 31.250 (14.988)
Epoch: [1][3072/10010] Time 0.429 (2.707) Data 0.000 (2.322) Loss 10.2352 (11.5538) Acc@1 7.812 (4.890) Acc@5 25.781 (15.484)
Epoch: [1][3200/10010] Time 0.427 (2.711) Data 0.000 (2.326) Loss 8.9137 (11.4865) Acc@1 7.812 (5.096) Acc@5 28.906 (15.967)
Epoch: [1][3328/10010] Time 0.427 (2.716) Data 0.000 (2.333) Loss 9.0424 (11.4188) Acc@1 14.844 (5.327) Acc@5 30.469 (16.466)
Epoch: [1][3456/10010] Time 0.425 (2.719) Data 0.000 (2.336) Loss 8.0848 (11.3500) Acc@1 11.719 (5.547) Acc@5 31.250 (16.957)
Epoch: [1][3584/10010] Time 0.428 (2.721) Data 0.000 (2.339) Loss 9.6352 (11.2820) Acc@1 8.594 (5.767) Acc@5 31.250 (17.437)
Epoch: [1][3712/10010] Time 0.424 (2.737) Data 0.000 (2.355) Loss 9.3809 (11.2152) Acc@1 12.500 (5.973) Acc@5 34.375 (17.908)
Epoch: [1][3840/10010] Time 0.428 (2.750) Data 0.000 (2.367) Loss 9.5469 (11.1501) Acc@1 17.188 (6.177) Acc@5 35.156 (18.367)
Epoch: [1][3968/10010] Time 14.771 (2.754) Data 14.521 (2.373) Loss 8.2969 (11.0877) Acc@1 21.094 (6.392) Acc@5 40.625 (18.829)
Epoch: [1][4096/10010] Time 19.031 (2.757) Data 18.763 (2.375) Loss 10.2194 (11.0260) Acc@1 11.719 (6.595) Acc@5 36.719 (19.264)
Epoch: [1][4224/10010] Time 15.243 (2.758) Data 14.990 (2.376) Loss 9.9028 (10.9593) Acc@1 16.406 (6.821) Acc@5 37.500 (19.714)
Epoch: [1][4352/10010] Time 0.427 (2.758) Data 0.000 (2.375) Loss 9.0877 (10.8982) Acc@1 10.938 (7.043) Acc@5 35.938 (20.154)
Epoch: [1][4480/10010] Time 0.428 (2.759) Data 0.000 (2.376) Loss 8.4131 (10.8390) Acc@1 17.188 (7.257) Acc@5 40.625 (20.591)
Epoch: [1][4608/10010] Time 0.428 (2.760) Data 0.000 (2.378) Loss 8.1081 (10.7796) Acc@1 17.188 (7.478) Acc@5 41.406 (21.022)
Epoch: [1][4736/10010] Time 0.427 (2.763) Data 0.000 (2.381) Loss 8.6183 (10.7280) Acc@1 16.406 (7.686) Acc@5 37.500 (21.441)
Epoch: [1][4864/10010] Time 0.428 (2.763) Data 0.000 (2.381) Loss 8.0330 (10.6725) Acc@1 21.094 (7.886) Acc@5 46.094 (21.837)
Epoch: [1][4992/10010] Time 0.432 (2.765) Data 0.000 (2.383) Loss 8.0333 (10.6177) Acc@1 14.062 (8.091) Acc@5 32.031 (22.243)
Epoch: [1][5120/10010] Time 0.427 (2.767) Data 0.000 (2.386) Loss 8.0960 (10.5613) Acc@1 17.969 (8.296) Acc@5 41.406 (22.637)
Epoch: [1][5248/10010] Time 0.437 (2.768) Data 0.000 (2.387) Loss 10.3152 (10.5071) Acc@1 14.062 (8.490) Acc@5 42.969 (23.038)
Epoch: [1][5376/10010] Time 0.427 (2.769) Data 0.000 (2.388) Loss 7.0815 (10.4570) Acc@1 21.094 (8.688) Acc@5 46.875 (23.417)
Epoch: [1][5504/10010] Time 0.429 (2.771) Data 0.000 (2.390) Loss 7.7832 (10.4077) Acc@1 18.750 (8.889) Acc@5 37.500 (23.800)
Epoch: [1][5632/10010] Time 0.430 (2.772) Data 0.000 (2.392) Loss 7.5529 (10.3525) Acc@1 20.312 (9.094) Acc@5 41.406 (24.189)
Epoch: [1][5760/10010] Time 0.427 (2.773) Data 0.000 (2.393) Loss 8.4238 (10.3028) Acc@1 15.625 (9.299) Acc@5 38.281 (24.568)
Epoch: [1][5888/10010] Time 0.432 (2.775) Data 0.000 (2.395) Loss 8.2156 (10.2520) Acc@1 18.750 (9.495) Acc@5 45.312 (24.929)
Epoch: [1][6016/10010] Time 0.427 (2.775) Data 0.000 (2.395) Loss 6.8980 (10.2046) Acc@1 10.938 (9.690) Acc@5 39.062 (25.283)
Epoch: [1][6144/10010] Time 0.427 (2.776) Data 0.000 (2.396) Loss 7.0476 (10.1565) Acc@1 17.969 (9.886) Acc@5 39.844 (25.650)
Epoch: [1][6272/10010] Time 0.430 (2.776) Data 0.000 (2.396) Loss 6.9353 (10.1123) Acc@1 21.875 (10.068) Acc@5 47.656 (25.985)
Epoch: [1][6400/10010] Time 0.430 (2.777) Data 0.000 (2.397) Loss 7.1662 (10.0699) Acc@1 20.312 (10.249) Acc@5 42.188 (26.319)
Epoch: [1][6528/10010] Time 0.428 (2.779) Data 0.000 (2.399) Loss 8.6659 (10.0253) Acc@1 17.969 (10.437) Acc@5 41.406 (26.658)
Epoch: [1][6656/10010] Time 0.427 (2.778) Data 0.000 (2.398) Loss 7.8911 (9.9847) Acc@1 24.219 (10.616) Acc@5 42.188 (26.972)
Epoch: [1][6784/10010] Time 0.428 (2.778) Data 0.000 (2.398) Loss 7.2867 (9.9397) Acc@1 28.125 (10.805) Acc@5 56.250 (27.306)
Epoch: [1][6912/10010] Time 15.802 (2.779) Data 15.550 (2.400) Loss 6.6393 (9.8960) Acc@1 13.281 (10.987) Acc@5 35.938 (27.635)
Epoch: [1][7040/10010] Time 17.926 (2.780) Data 17.665 (2.400) Loss 7.5947 (9.8533) Acc@1 21.875 (11.172) Acc@5 46.094 (27.955)
Epoch: [1][7168/10010] Time 17.781 (2.780) Data 17.528 (2.400) Loss 6.9177 (9.8136) Acc@1 21.094 (11.351) Acc@5 45.312 (28.273)
Epoch: [1][7296/10010] Time 17.527 (2.781) Data 17.265 (2.400) Loss 7.7678 (9.7743) Acc@1 18.750 (11.522) Acc@5 42.969 (28.572)
Epoch: [1][7424/10010] Time 19.432 (2.782) Data 19.173 (2.402) Loss 7.9332 (9.7351) Acc@1 23.438 (11.700) Acc@5 50.000 (28.874)
Epoch: [1][7552/10010] Time 17.467 (2.782) Data 17.203 (2.401) Loss 7.6893 (9.6940) Acc@1 22.656 (11.887) Acc@5 47.656 (29.188)
Epoch: [1][7680/10010] Time 17.817 (2.782) Data 17.555 (2.401) Loss 6.7318 (9.6543) Acc@1 25.781 (12.051) Acc@5 46.875 (29.478)
Epoch: [1][7808/10010] Time 17.593 (2.782) Data 17.328 (2.400) Loss 6.4962 (9.6153) Acc@1 21.875 (12.228) Acc@5 44.531 (29.770)
Epoch: [1][7936/10010] Time 0.429 (2.782) Data 0.000 (2.400) Loss 6.8113 (9.5767) Acc@1 23.438 (12.404) Acc@5 46.875 (30.066)
Epoch: [1][8064/10010] Time 0.427 (2.783) Data 0.000 (2.402) Loss 6.9980 (9.5405) Acc@1 20.312 (12.583) Acc@5 45.312 (30.356)
Epoch: [1][8192/10010] Time 0.427 (2.783) Data 0.000 (2.402) Loss 7.9975 (9.5054) Acc@1 24.219 (12.750) Acc@5 49.219 (30.635)
Epoch: [1][8320/10010] Time 0.427 (2.783) Data 0.000 (2.402) Loss 7.0064 (9.4687) Acc@1 24.219 (12.919) Acc@5 44.531 (30.910)
Epoch: [1][8448/10010] Time 0.427 (2.785) Data 0.000 (2.404) Loss 7.0096 (9.4320) Acc@1 21.094 (13.092) Acc@5 49.219 (31.185)
Epoch: [1][8576/10010] Time 0.429 (2.785) Data 0.000 (2.404) Loss 7.2485 (9.3975) Acc@1 22.656 (13.253) Acc@5 51.562 (31.456)
Epoch: [1][8704/10010] Time 0.427 (2.786) Data 0.000 (2.405) Loss 8.0318 (9.3637) Acc@1 25.000 (13.416) Acc@5 52.344 (31.717)
Epoch: [1][8832/10010] Time 0.426 (2.787) Data 0.000 (2.406) Loss 7.6812 (9.3286) Acc@1 19.531 (13.572) Acc@5 42.188 (31.976)
Epoch: [1][8960/10010] Time 0.427 (2.787) Data 0.000 (2.406) Loss 6.5118 (9.2929) Acc@1 26.562 (13.735) Acc@5 51.562 (32.238)
Epoch: [1][9088/10010] Time 0.427 (2.788) Data 0.000 (2.406) Loss 7.4556 (9.2585) Acc@1 27.344 (13.901) Acc@5 53.906 (32.497)
Epoch: [1][9216/10010] Time 0.428 (2.787) Data 0.000 (2.405) Loss 6.5463 (9.2264) Acc@1 26.562 (14.053) Acc@5 51.562 (32.747)
Epoch: [1][9344/10010] Time 0.426 (2.788) Data 0.000 (2.406) Loss 6.0262 (9.1929) Acc@1 29.688 (14.207) Acc@5 57.031 (33.005)
Epoch: [1][9472/10010] Time 0.433 (2.788) Data 0.000 (2.406) Loss 7.2222 (9.1606) Acc@1 25.000 (14.363) Acc@5 48.438 (33.257)
Epoch: [1][9600/10010] Time 0.433 (2.788) Data 0.000 (2.406) Loss 6.4898 (9.1287) Acc@1 21.875 (14.515) Acc@5 50.781 (33.494)
Epoch: [1][9728/10010] Time 0.428 (2.789) Data 0.000 (2.406) Loss 7.7649 (9.0955) Acc@1 26.562 (14.669) Acc@5 60.938 (33.748)
Epoch: [1][9856/10010] Time 0.426 (2.790) Data 0.000 (2.407) Loss 5.6638 (9.0646) Acc@1 29.688 (14.819) Acc@5 53.125 (33.993)
Epoch: [1][9984/10010] Time 0.427 (2.789) Data 0.000 (2.406) Loss 6.1442 (9.0355) Acc@1 29.688 (14.967) Acc@5 53.906 (34.216)

Acc@1 14.991 Acc@5 34.258
epoch 1, total time 27894.71

Creating and loading val.lmdb

python main.py /home/likunyan/Imagenet --lmdb
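This fails with an error because val.lmdb cannot be found. However, the val directory contains only files and no subdirectories, which is inconsistent with the structure of train. How should I handle this?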

Meet error when num_workers>=2

It seems that multiprocessing cannot pickle self.env because of its type (Environment). Is there any way to solve this?
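
One common way around an unpicklable handle is to keep only the path on the dataset object, open the handle lazily inside each worker process, and drop it in `__getstate__`. The sketch below is a hedged illustration using an ordinary file handle as a stand-in for the LMDB environment (open file objects cannot be pickled either); `LazyHandleDataset` is a hypothetical name, and the comment marks where `lmdb.open(...)` would go.

```python
import os
import pickle
import tempfile


class LazyHandleDataset:
    """Keeps only the path when pickled; opens the handle on first access."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # opened lazily, once per process

    def _ensure_open(self):
        if self._handle is None:
            # With LMDB, lmdb.open(self.path, readonly=True, lock=False) would go here.
            self._handle = open(self.path, "rb")
        return self._handle

    def read(self, offset, length):
        handle = self._ensure_open()
        handle.seek(offset)
        return handle.read(length)

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_handle"] = None  # never ship the open handle to another process
        return state


demo_path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(demo_path, "wb") as f:
    f.write(b"hello world")

dataset = LazyHandleDataset(demo_path)
dataset.read(0, 5)                           # the handle is now open in this process
clone = pickle.loads(pickle.dumps(dataset))  # works because the handle was dropped
print(clone.read(6, 5))
```

Each DataLoader worker receives a pickled copy with no handle and opens its own on first use.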

msgpack.exceptions.ExtraData: unpack(b) received extra data

self.keys = msgpack.loads(txn.get(b'__keys__'))

msgpack==0.5.6

Using only lmdb without DDP is very slow, even slower than the original imread

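import os
import os.path as osp

import lmdb

# raw_reader and dumps_pyarrow are assumed to be the helpers from tools/folder2lmdb.py.
def folder2lmdb(anno_file, name="train", write_frequency=5000, num_workers=16):
    ids = []
    annotation = []
    # Each line: "<filename> <label fields ...>"
    with open(anno_file, 'r') as f:
        for line in f:
            fields = line.strip().split()
            ids.append(fields[0])
            annotation.append(fields[1:])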
    lmdb_path = osp.join("app_%s.lmdb" % name)
    isdir = os.path.isdir(lmdb_path)

    print("Generate LMDB to %s" % lmdb_path)
    db = lmdb.open(lmdb_path, subdir=isdir,
                   map_size=1099511627776 * 2, readonly=False,
                   meminit=False, map_async=True)
    
    print(len(ids), len(annotation))
    txn = db.begin(write=True)
    idx = 0
    for filename, label in zip(ids, annotation):
        print(filename, label)
        image = raw_reader(filename)
        txn.put(u'{}'.format(idx).encode('ascii'), dumps_pyarrow((image, label)))
        if idx % write_frequency == 0:
            print("[%d/%d]" % (idx, len(annotation)))
            txn.commit()
            txn = db.begin(write=True)
        idx += 1

    # finish iterating through dataset
    txn.commit()
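    # idx now equals the number of records written, so range(idx) covers every key;
    # range(idx + 1) would add one key that has no record behind it.
    keys = [u'{}'.format(k).encode('ascii') for k in range(idx)]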
    with db.begin(write=True) as txn:
        txn.put(b'__keys__', dumps_pyarrow(keys))
        txn.put(b'__len__', dumps_pyarrow(len(keys)))

    print("Flushing database ...")
    db.sync()
    db.close()

class DetectionLMDB(data.Dataset):
    def __init__(self, db_path, transform=None, target_transform=None, dataset_name='WiderFace'):
        self.db_path = db_path
        self.env = lmdb.open(db_path, subdir=osp.isdir(db_path),
                             readonly=True, lock=False,
                             readahead=False, meminit=False)
        with self.env.begin(write=False) as txn:
            # self.length = txn.stat()['entries'] - 1
            self.length = pa.deserialize(txn.get(b'__len__'))
            self.keys = pa.deserialize(txn.get(b'__keys__'))

        self.transform = transform
        self.target_transform = target_transform


        self.name = dataset_name
        self.annotation = list()
        self.counter = 0

    def __getitem__(self, index):
        im, gt, h, w = self.pull_item(index)
        return im, gt

    def pull_item(self, index):
        img, target = None, None
        env = self.env
        with env.begin(write=False) as txn:
            byteflow = txn.get(self.keys[index])
        unpacked = pa.deserialize(byteflow)

        # load image
        imgbuf = unpacked[0]
        buf = six.BytesIO()
        buf.write(imgbuf)
        buf.seek(0)
        img = Image.open(buf).convert('RGB')
        img = cv2.cvtColor(np.asarray(img),cv2.COLOR_RGB2BGR)  
        height, width, channels = img.shape
        # load label
        target = unpacked[1]

        if self.target_transform is not None:
            target = self.target_transform(target, width, height)

        if self.transform is not None:
            target = np.array(target)
            img, boxes, labels, poses, angles = self.transform(img, target[:, :4], target[:, 4], target[:,5], target[:,6])
            target = np.hstack((boxes, np.expand_dims(labels, axis=1),
                                       np.expand_dims(poses, axis=1),
                                       np.expand_dims(angles, axis=1)))

        return torch.from_numpy(img).permute(2, 0, 1), target, height, width

    def __len__(self):
        return self.length

    def __repr__(self):
        return self.__class__.__name__ + ' (' + self.db_path + ')'

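I generated the lmdb with the code above and used DetectionLMDB as the dataset, but it is very slow and I don't know why. Does it have to be used together with DDP?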

Missing LICENSE

Hello, thanks a lot for this repo :) Right now there is no license info, which means that strictly speaking the code is under no permission/exclusive copyright.

Would it be possible to add a license file to the repo, in case you intend to open source this code? Thanks a lot for consideration, I am happy to help/ open a PR.

LMDB getting slower after iterations

I built a custom dataset in LMDB format and train the model in PyTorch's DDP mode with 4 GPUs.

Training suddenly slows down after some iterations (by about 6 times). Memory usage doesn't change much.

Do you know the possible reasons?

DataLoader gets stuck when using lmdb

Has anyone encountered this problem?

A detailed description can be found in:
jnwatson/py-lmdb#350

Bugs in the DDP implementation

Hi,

Good suggestions:) However, there are some problems in the DDP usage, e.g., the metric acc1 is not gathered from different machines.

Regards
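
To illustrate the point about gathering metrics: each process in a distributed job sees only its own shard, so accuracy printed from a single rank can differ from the global value. A minimal plain-Python sketch with made-up counts follows; `global_top1` is a hypothetical helper, and in an actual DDP job the two sums would typically be combined across processes with `torch.distributed.all_reduce` rather than a Python list.

```python
def global_top1(per_rank_counts):
    """per_rank_counts: list of (num_correct, num_samples), one entry per rank."""
    correct = sum(c for c, _ in per_rank_counts)
    total = sum(n for _, n in per_rank_counts)
    return 100.0 * correct / total


# Made-up counts for four ranks, purely for illustration.
counts = [(30, 100), (50, 100), (10, 50), (45, 50)]
rank0_only = 100.0 * counts[0][0] / counts[0][1]
print(f"rank 0 only: {rank0_only:.1f}%")
print(f"all ranks:   {global_top1(counts):.1f}%")
```

Summing the correct and total counts before dividing also keeps the result right when ranks process different numbers of samples.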

Format of ImageFolder

I am wondering what is the file format/structure of ImageFolder? When I downloaded a zip file from imagenet 1k, I just get a folder containing 50000 JPEG images, but it seems like I cannot directly run the code on this folder. Where/how can I get my data in the correct format?
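
For reference, torchvision's ImageFolder expects one subdirectory per class under the root, with that class's images inside it. A flat folder of validation images therefore has to be sorted into class subdirectories first. Below is a minimal sketch, assuming you can build a mapping from file name to class name out of the ground-truth file shipped with your copy of the dataset; `sort_into_class_dirs`, the mapping, and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def sort_into_class_dirs(flat_dir, labels):
    """Move each file in `flat_dir` into `flat_dir/<class_name>/`.

    `labels` maps file name -> class name (hypothetical; build it from
    whatever ground-truth file comes with your copy of the dataset).
    """
    for file_name, class_name in labels.items():
        class_dir = os.path.join(flat_dir, class_name)
        os.makedirs(class_dir, exist_ok=True)
        shutil.move(os.path.join(flat_dir, file_name),
                    os.path.join(class_dir, file_name))


# Tiny demonstration with empty placeholder files.
val_dir = tempfile.mkdtemp()
demo_labels = {"img_0001.JPEG": "n01440764", "img_0002.JPEG": "n01443537"}
for name in demo_labels:
    open(os.path.join(val_dir, name), "wb").close()

sort_into_class_dirs(val_dir, demo_labels)
print(sorted(os.listdir(val_dir)))
```

After this step the validation folder has the same layout as the training folder.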

read lmdb file

from folder2lmdb import ImageFolderLMDB
from torch.utils.data import DataLoader
from torchvision.transforms import transforms
import torchvision

dir_folder = "/home/DATA//lsun/church/church_outdoor_train_lmdb/data.mdb"

transform = transforms.Compose([
    transforms.ToTensor(),
])

dataset = ImageFolderLMDB(dir_folder, transform, transform) # <--- error is here
loader = DataLoader(dataset, batch_size=64)

Error

     31     # self.length = pa.deserialize(txn.get(b'__len__'))
     32     # self.keys = txn.stat().keys()
---> 33     self.keys = pa.deserialize(txn.get(b'__keys__'))

TypeError: a bytes-like object is required, not 'NoneType'
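
In py-lmdb, `txn.get` returns `None` when a key is absent, so this `TypeError` suggests that the database has no `__keys__` entry, which would be expected if it was not written by folder2lmdb.py. A small guard makes that failure explicit. This is a sketch only: `require` and `FakeTxn` are hypothetical names, and `FakeTxn` is a dict-backed stand-in for a read transaction so the example runs without an LMDB file.

```python
def require(txn, key):
    """Fetch `key` from a transaction-like object, failing loudly if absent."""
    value = txn.get(key)
    if value is None:
        raise KeyError(
            f"{key!r} not found; was this database written by a different tool?"
        )
    return value


class FakeTxn:
    """Hypothetical stand-in for an LMDB read transaction (dict-backed)."""

    def __init__(self, data):
        self._data = data

    def get(self, key):
        return self._data.get(key)  # None when the key is missing


txn = FakeTxn({b"__len__": b"3"})
print(require(txn, b"__len__"))
try:
    require(txn, b"__keys__")
except KeyError as err:
    print("missing:", err)
```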

About the msgpack

I noticed that you are using pyarrow for serialization and msgpack for deserialization. Is msgpack faster at deserialization?
I have a problem with msgpack when using:

the error is:

whereas using pa.serialize gives no errors.
Have you met this problem, or do you have any advice?
I think it may be caused by the version of msgpack, so could you tell me which msgpack version you use?

Why open transactions (txn) repeatedly?

    txn = db.begin(write=True)
    for idx, data in enumerate(data_loader):
        # print(type(data), data)
        image, label = data[0]
        txn.put(u'{}'.format(idx).encode('ascii'), dumps_pyarrow((image, label)))
        if idx % write_frequency == 0:
            print("[%d/%d]" % (idx, len(data_loader)))
            txn.commit()
            txn = db.begin(write=True)

Here you repeatedly commit the data and re-open the transaction. Is this to prevent the file from becoming too large? Is it necessary? In practice, I have not seen LMDB crash because of excessive memory use, but the dataset I used may have been too small.
I just find it odd; the code here looks weird.

add attributes like imgs and classes in ImageFolder()

This seems really helpful for improving I/O for PyTorch datasets. However, I notice that the return value is not entirely in the same format as ImageFolder() when it is simply swapped into a training script. Do you plan to make attributes like the following available on ImageFolderLMDB?

    classes (list): List of the class names.
    class_to_idx (dict): Dict with items (class_name, class_index).
    imgs (list): List of (image path, class_index) tuples

https://pytorch.org/docs/stable/_modules/torchvision/datasets/folder.html#ImageFolder
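
As a rough sketch of how `classes` and `class_to_idx` could be derived: torchvision's ImageFolder assigns indices to the class subdirectories in sorted name order, and the same mapping can be computed once at conversion time. The function name `find_classes` here is a local helper for illustration, and storing the result in the LMDB file (for example under an extra key) is an assumption, not something the repository is known to do.

```python
import os
import tempfile


def find_classes(root):
    """Return (classes, class_to_idx) from the subdirectory names of `root`."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx


# Tiny demonstration with three empty class folders.
demo_root = tempfile.mkdtemp()
for name in ("dog", "cat", "bird"):
    os.mkdir(os.path.join(demo_root, name))

classes, class_to_idx = find_classes(demo_root)
print(classes, class_to_idx)
```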

Speed up but not enough.

I tested it and also got 700+ samples/s, but I don't think that's enough.
DALI is not a good solution; it costs too much GPU memory. I'm considering OpenCV.
Thanks for sharing this. (I think your code needs an update.)