
glenn-jocher commented on June 17, 2024

Hello! Thanks for your detailed question and for diving deep into the YOLOv5 code! 🌟

The line if im.shape[0] < 5 that you're referring to checks the shape of the image array. In YOLOv5, images are typically handled in CHW (Channels, Height, Width) format after transformations, which is the layout PyTorch expects, whereas OpenCV and PIL use the conventional HWC format. This check detects the CHW case: if the first dimension (the channel count in CHW) is less than 5, the image is most likely in CHW format and is transposed to HWC, the layout the subsequent preprocessing steps expect.

The threshold of 5 works because an image in CHW format practically never has 5 or more channels (RGB has 3 and RGBA has 4), whereas in HWC format the first dimension is the height, which is normally far larger than 4 pixels. This makes it a quick way to infer the tensor layout.

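Your suggestion im.shape[0] <= 3 would not be appropriate here, because it would miss 4-channel (RGBA) images in CHW format and leave them untransposed. Both thresholds share the same edge case: an HWC image whose height falls below the threshold would be incorrectly transposed, which is rare but could theoretically occur.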
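A quick way to compare the two thresholds is to apply each to a few example shapes. This is a minimal sketch using plain shape tuples instead of NumPy arrays, so it runs without dependencies; the helper names chw_lt5 and chw_le3 are hypothetical and not part of YOLOv5.

```python
# Example image shapes as (label, shape) pairs.
SHAPES = [
    ("CHW RGB", (3, 480, 640)),
    ("CHW RGBA", (4, 480, 640)),
    ("HWC RGB", (480, 640, 3)),
    ("HWC, height 4 px", (4, 640, 3)),
    ("HWC, height 2 px", (2, 640, 3)),
]


def chw_lt5(shape):
    """YOLOv5's check: treat dim 0 as a channel count if it is below 5."""
    return shape[0] < 5


def chw_le3(shape):
    """Proposed alternative: treat dim 0 as a channel count only if it is 3 or less."""
    return shape[0] <= 3


# Print which layout each check would infer for every example shape.
for label, shape in SHAPES:
    print(f"{label:<18} {shape!s:<16} <5: {chw_lt5(shape)!s:<6} <=3: {chw_le3(shape)}")
```

Only the first three rows are realistic inputs. The < 5 check classifies all three correctly, while <= 3 leaves the RGBA array untransposed. The last two rows are the rare tiny-height HWC cases: < 5 misfires at heights of 4 or less, and <= 3 at heights of 3 or less.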

I hope this clears up the confusion! Let me know if you have any more questions. Happy coding! 😊

from yolov5.

Le0v1n commented on June 17, 2024

Thank you very much for your response! @glenn-jocher

Actually, I hadn't thought of the RGBA image format, and your explanation gave me some useful insight. I have another small question. When I use the default training parameters (python train.py --data coco128.yaml --weights yolov5s.pt --img 640), im has shape [H, W, C] at this point instead of [C, H, W]. Here is a screenshot from my debugger:

[Screenshot of debugger output]

At this point, the condition if im.shape[0] < 5 is actually checking whether H < 5 rather than C < 5. Could the code be changed from if im.shape[0] < 5 to if im.ndim < 5?

# Pre-process
n, ims = (len(ims), list(ims)) if isinstance(ims, (list, tuple)) else (1, [ims])  # number, list of images
shape0, shape1, files = [], [], []  # image and inference shapes, filenames
for i, im in enumerate(ims):
	f = f"image{i}"  # filename
	if isinstance(im, (str, Path)):  # filename or uri
		im, f = Image.open(requests.get(im, stream=True).raw if str(im).startswith("http") else im), im
		im = np.asarray(exif_transpose(im))  
	elif isinstance(im, Image.Image):  # PIL Image
		im, f = np.asarray(exif_transpose(im)), getattr(im, "filename", f) or f
	files.append(Path(f).with_suffix(".jpg").name)
	# if im.shape[0] < 5:  # image in CHW
	if im.ndim < 5:  # 💡 This is the modification/change.
		im = im.transpose((1, 2, 0))  # reverse dataloader .transpose(2, 0, 1)
	im = im[..., :3] if im.ndim == 3 else cv2.cvtColor(im, cv2.COLOR_GRAY2BGR)  # enforce 3ch input
	s = im.shape[:2]  # HWC
	shape0.append(s)  # image shape
	g = max(size) / max(s)  # gain
	shape1.append([int(y * g) for y in s])
	ims[i] = im if im.data.contiguous else np.ascontiguousarray(im)  # update
shape1 = [make_divisible(x, self.stride) for x in np.array(shape1).max(0)]  # inf shape
x = [letterbox(im, shape1, auto=False)[0] for im in ims]  # pad
x = np.ascontiguousarray(np.array(x).transpose((0, 3, 1, 2)))  # stack and BHWC to BCHW
x = torch.from_numpy(x).to(p.device).type_as(p) / 255  # uint8 to fp16/32

Thank you very much for your patience and response!

glenn-jocher commented on June 17, 2024

Hello again!

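I appreciate your follow-up question and the code snippet you've provided. Unfortunately, if im.ndim < 5 wouldn't address the issue. The .ndim property is the number of dimensions in the array, which for a single image is 3 (or 2 for grayscale) regardless of whether the layout is HWC or CHW. The condition would therefore always be true, and every image would be transposed, including HWC images that are already in the correct layout; a 2-D grayscale array would even raise an error, because transpose((1, 2, 0)) needs three axes.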
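The difference is easy to see on shapes alone. This sketch, again using plain tuples rather than NumPy arrays, mimics what im.transpose((1, 2, 0)) does to a shape (transpose_shape is a hypothetical helper, not a YOLOv5 or NumPy function) and shows that ndim is 3 for both layouts:

```python
def transpose_shape(shape, axes=(1, 2, 0)):
    """Return the shape im.transpose(axes) would have; (1, 2, 0) maps CHW to HWC."""
    return tuple(shape[a] for a in axes)


hwc = (480, 640, 3)  # typical HWC image from OpenCV or PIL
chw = (3, 480, 640)  # CHW array, e.g. after the dataloader's .transpose(2, 0, 1)

for shape in (hwc, chw):
    ndim = len(shape)  # what im.ndim reports for a single image
    print(shape, "ndim < 5:", ndim < 5, "| shape[0] < 5:", shape[0] < 5)

# Under `if im.ndim < 5:` both arrays get transposed:
print(transpose_shape(chw))  # (480, 640, 3): correct HWC result
print(transpose_shape(hwc))  # (640, 3, 480): the HWC image is scrambled
```

Under the original shape[0] < 5 check, only the CHW array is transposed, which is the intended behaviour.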

The original intent of if im.shape[0] < 5 is to check whether the image is in CHW format, assuming that no image height (the first dimension in HWC format) is less than 5 pixels, which is a reasonable assumption for the datasets typically used. The check is specifically designed to catch images that are already in the layout PyTorch expects (CHW) rather than HWC.

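Seeing im in HWC format at this point is expected for most inputs: file paths, URLs, PIL images, and OpenCV arrays all arrive here as HWC, so im.shape[0] is the height and the check correctly skips the transpose. The transpose branch is there for arrays that have already been converted to CHW, as the inline comment reverse dataloader .transpose(2, 0, 1) indicates. The conversion to the CHW layout the model expects happens later, in the batch transpose commented BHWC to BCHW.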

For now, the existing check should suffice in most scenarios, but if you're encountering specific issues with image formats, you might need to add additional checks or transformations based on your particular use case or dataset.

Thank you for your keen observations, and feel free to reach out if you have more questions! 😊

Le0v1n commented on June 17, 2024

@glenn-jocher Thank you very much for your reply. Using im.ndim < 5 directly would be too crude, since it ignores the difference between HWC and CHW. Thanks for pointing that out.

To be honest, the method you have written is really great and can be applied to the majority of datasets. I suggest adding a comment after this code segment, as without any explanation, others might also find it confusing.

Overall, thank you very much for your reply! 😊

glenn-jocher commented on June 17, 2024

@Le0v1n hello!

Thank you for your understanding and for the suggestion to add a comment for clarity. It's a great idea to help others who might be reviewing the code in the future. I'll pass this feedback along to the team to consider adding a descriptive comment in the next update.

We appreciate your engagement and thoughtful suggestions! If you have any more ideas or questions, feel free to share. Happy coding! 😊
