Hello @Killuagg, thank you for your interest in YOLOv5! Please visit our Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Requirements
Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
- Notebooks with free GPU:
- Google Cloud Deep Learning VM. See GCP Quickstart Guide
- Amazon Deep Learning AMI. See AWS Quickstart Guide
- Docker Image. See Docker Quickstart Guide
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
Introducing YOLOv8
We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023: YOLOv8!
Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.
Check out our YOLOv8 Docs for details and get started with:
pip install ultralytics
@Killuagg hi there,
Thank you for reaching out and for providing details about your setup and issue. To help you increase the FPS for your camera capture on the Raspberry Pi 4B, here are a few suggestions:
- Verify Latest Versions: Ensure you are using the latest versions of torch and the YOLOv5 repository. Newer releases sometimes resolve performance issues through optimizations and bug fixes.
- Optimize Model Inference:
  - Use TensorRT (NVIDIA GPUs only): TensorRT can significantly improve inference speed, but it requires an NVIDIA GPU (for example, a Jetson board), so it cannot run on a Raspberry Pi. On a supported device, you can export a TensorRT engine with:
sudo apt-get install -y libopenblas-base libopenmpi-dev
wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt -O yolov5s.pt
python3 export.py --weights yolov5s.pt --img 640 --batch 1 --device 0 --include engine
    This generates a TensorRT engine file (yolov5s.engine) that you can use for inference.
- Reduce Image Size: Lowering the image size can increase FPS. You can try reducing the --img parameter to 320 or even lower, depending on your accuracy requirements: python detect.py --weights best.onnx --img 320 --conf 0.7 --source 0
  Note that an ONNX model is exported with a fixed input size by default, so re-export it at the new size first: python export.py --weights best.pt --include onnx --img 320
- Use a More Efficient Model: If you are using yolov5s, you might want to try yolov5n (nano), which is designed to be lighter and faster, with a potential trade-off in accuracy. Since your model is custom-trained, this means training a nano model on your dataset and exporting it (here yolov5n.onnx stands for that exported model): python detect.py --weights yolov5n.onnx --img 640 --conf 0.7 --source 0
- Optimize Code: Make sure that webcam capture and model inference do not block each other. For example, you can use threading to handle webcam capture and inference in parallel.
- Hardware Acceleration: Make sure you are using whatever acceleration the Raspberry Pi offers, for example an OpenCV build with hardware-accelerated video capture. Keep in mind that PyTorch and ONNX Runtime do not use the Pi's VideoCore GPU, so model inference itself runs on the CPU.
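To tell whether any of these changes (smaller --img, a nano model, threading) actually helps, it is useful to measure FPS the same way before and after. A minimal sketch, assuming you call tick() once per processed frame; the FPSMeter class is a hypothetical helper, not part of YOLOv5. It averages over a window of recent frames, because a single-frame estimate fluctuates a lot on a Raspberry Pi.

```python
import time
from collections import deque


class FPSMeter:
    """Frames per second averaged over the last `window` frame intervals (hypothetical helper)."""

    def __init__(self, window=30, clock=time.perf_counter):
        self.clock = clock
        # N + 1 timestamps span N frame intervals
        self.stamps = deque(maxlen=window + 1)

    def tick(self):
        """Record one processed frame and return the current FPS estimate (0.0 until two ticks)."""
        self.stamps.append(self.clock())
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Inside detect.py you would create meter = FPSMeter() before the dataset loop and call fps = meter.tick() at the top of each iteration, in place of a one-frame estimate such as 1 / (current_time - prev_time).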
If you continue to experience issues, please provide a minimal reproducible example of your code. This will help us investigate further. You can find more details on creating a minimal reproducible example here.
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
Thank you for your reply. First, when I try to run detect.py with --img 320, this error is produced: "expected 620 not 320 size". So I can only run 640 on my Raspberry Pi. If I want to run the TensorRT model on my Raspberry Pi, do I need to run it on a GPU? The only device available is the CPU. Is there any code inside detect.py that limits my FPS?
Hi @Killuagg,
Thank you for your follow-up and for providing additional details. Let's address your concerns one by one.
Image Size Error
The error you encountered ("expected 620 not 320 size") indicates that your ONNX model was exported with a fixed input size (640x640 by default), so it rejects 320x320 images. To run at 320, re-export the model at that size (python export.py --weights best.pt --include onnx --img 320) and pass the same --img 320 to detect.py. If you prefer to stay at 640, let's focus on optimizing other aspects.
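If you are unsure which size an exported model expects, you can read it from the ONNX file. A minimal sketch, assuming onnxruntime is installed (it is what detect.py uses for .onnx weights); the function names static_hw and onnx_input_hw are hypothetical. ONNX Runtime reports the input shape as a list such as [1, 3, 640, 640], and axes exported as dynamic (with export.py --dynamic) appear as strings or None instead of integers.

```python
def static_hw(input_shape):
    """Return (height, width) if an NCHW input shape is fixed, else None."""
    h, w = input_shape[2], input_shape[3]
    return (h, w) if isinstance(h, int) and isinstance(w, int) else None


def onnx_input_hw(model_path):
    """Read the first input's spatial size from an ONNX model (requires onnxruntime)."""
    import onnxruntime as ort  # imported lazily so static_hw works without onnxruntime

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return static_hw(session.get_inputs()[0].shape)
```

Calling onnx_input_hw("best.onnx") on the Pi tells you which --img value detect.py must use with that file.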
TensorRT on Raspberry Pi
TensorRT only runs on NVIDIA GPUs (for example, Jetson boards). The Raspberry Pi 4B has no NVIDIA GPU, so TensorRT cannot be used there, and the CPU is the only device available to PyTorch on it. You can still optimize your CPU setup:
- Stay with a CPU backend: Skip TensorRT on the Pi and keep using a CPU-friendly format such as your current ONNX model (run through ONNX Runtime), combined with the smaller input size and lighter model suggested earlier.
- Optimize Inference Code: Make your inference code as efficient as possible. For example, use threading to handle webcam capture and model inference in parallel, reducing potential bottlenecks.
Code Example for Threading
Here's an example of how you might use threading to improve performance:
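import threading
import time

import cv2
import torch

# Load the exported ONNX model via PyTorch Hub (AutoShape handles pre/post-processing)
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.onnx")

latest_frame = None  # newest camera frame, shared between the two threads
frame_lock = threading.Lock()  # protects latest_frame
stop_event = threading.Event()  # tells both threads to exit


# Function to capture frames
def capture_frames():
    global latest_frame
    cap = cv2.VideoCapture(0)
    while not stop_event.is_set():
        ret, frame = cap.read()
        if not ret:
            stop_event.set()
            break
        with frame_lock:
            latest_frame = frame  # overwrite, so inference always gets the newest frame
    cap.release()


# Function to run inference
def run_inference():
    while not stop_event.is_set():
        with frame_lock:
            frame = latest_frame
        if frame is None:
            time.sleep(0.01)  # camera not ready yet
            continue
        results = model(frame[..., ::-1])  # OpenCV BGR to the RGB order AutoShape expects
        results.print()  # Process results here


# Start threads
thread1 = threading.Thread(target=capture_frames, daemon=True)
thread2 = threading.Thread(target=run_inference, daemon=True)
thread1.start()
thread2.start()
try:
    thread1.join()
    thread2.join()
except KeyboardInterrupt:
    stop_event.set()  # Ctrl+C stops both loops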
Verify Latest Versions
Please ensure you are using the latest versions of torch and the YOLOv5 repository. This can sometimes resolve performance issues due to optimizations and bug fixes in newer releases.
Minimum Reproducible Example
If you continue to experience issues, please provide a minimal reproducible example of your code. This will help us investigate further. You can find more details on creating a minimal reproducible example here.
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
Thank you for sharing the info. May I know another method that does not use TensorRT lite? I mean, is there a solution involving only the CPU, not the GPU? Sorry for asking. Also, will training on 2000 images affect the FPS? I have another model trained on 800 images and the FPS is the same.
Why, after I run detect.py with source 0 (the webcam), can the resulting mp4 file not be played on my Raspberry Pi or on Windows 11?
Hi @Killuagg,
Thank you for your detailed follow-up! Let's address your questions and concerns step by step.
CPU-Only Optimization
If you're looking to optimize your YOLOv5 model inference on a CPU-only setup, here are a few strategies you can employ:
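- Model Quantization: Quantization speeds up inference by reducing the precision of the weights (and, depending on the method, the activations). PyTorch's dynamic quantization only converts layer types such as torch.nn.Linear, and YOLOv5 is built almost entirely from convolutions, so expect little speedup from the snippet below; an INT8 export such as python export.py --weights best.pt --include tflite --int8 --data data.yaml also quantizes the convolutions:
import torch
from torch.quantization import quantize_dynamic

# YOLOv5 .pt files are checkpoint dicts; run from the yolov5 directory so the model classes can be unpickled
ckpt = torch.load('best.pt', map_location='cpu')  # on torch>=2.6 also pass weights_only=False
model = ckpt['model'].float().eval()  # weights are stored in FP16; convert to FP32 for CPU use
quantized_model = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
torch.save(quantized_model, 'best_quantized.pt')
- Use a Smaller Model: If you're currently using yolov5s, consider switching to yolov5n (nano), which is designed to be lighter and faster. For your custom classes, the nano model must first be trained on your dataset; then run it with: python detect.py --weights yolov5n.pt --img 640 --conf 0.7 --source 0
- Optimize Code Execution: Make sure your code is optimized for performance. For example, using threading to handle webcam capture and model inference in parallel can help reduce bottlenecks.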
Dataset Size Impact
The number of images used for training (2000 vs. 800) does not affect FPS during inference. FPS depends on the model architecture (for example yolov5n vs. yolov5s), the input image size, and the computational power of your device. Training on more images changes the values of the learned weights, not the size of the network, so the amount of computation per frame stays the same; that is why both of your models run at the same FPS. A larger dataset can, however, improve accuracy.
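As a rough illustration (ignoring fixed per-frame costs such as capture, pre-processing and NMS), the convolution work per frame scales with the number of input pixels, while the number of training images $N$ does not appear in it at all:

```latex
\text{cost per frame} \propto H \times W, \qquad
\frac{640 \times 640}{320 \times 320} = \frac{409600}{102400} = 4
```

So a 320x320 input needs roughly a quarter of the convolution work of a 640x640 input, whereas going from 800 to 2000 training images leaves the per-frame cost unchanged.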
Video Playback Issues
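Regarding the MP4 file not playing on your Raspberry Pi and Windows 11, the cause could be the codec or the way the file is finalized. The 'mp4v' (MPEG-4 Part 2) codec, which YOLOv5 also uses, is not supported by every player; H.264 ('avc1') is more widely supported but only works if your OpenCV build includes an H.264 encoder. The writer must also be released when recording ends: if the process stops before release(), the MP4 may be left incomplete and unplayable. Here's an example of how to save the video correctly:
import cv2

cap = cv2.VideoCapture(0)  # webcam; replace with your source
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # 'XVID' for .avi files, 'avc1' for H.264 if your build supports it
# The size must match the frames you write, otherwise the output may be empty or unplayable
out = cv2.VideoWriter('output.mp4', fourcc, 20.0, (w, h))

try:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        out.write(frame)  # Write the frame
finally:
    # Release everything when the job is finished (also on Ctrl+C) so the MP4 is finalized
    cap.release()
    out.release()
    cv2.destroyAllWindows()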
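The same concern applies to detect.py: there the writers live in the vid_writer list, and with a webcam source the loop only ends when you interrupt it. A minimal sketch, assuming you wrap the `for path, im, im0s, vid_cap, s in dataset:` loop in try/finally; release_writers is a hypothetical helper, not a YOLOv5 function.

```python
def release_writers(writers):
    """Release every writer in a list like detect.py's vid_writer.

    Slots that never got a writer are None and are skipped.
    Returns the number of writers released.
    """
    released = 0
    for writer in writers:
        if writer is not None:
            writer.release()  # finalizes the file (for MP4, writes its index)
            released += 1
    return released


# Usage inside run() (sketch):
# try:
#     for path, im, im0s, vid_cap, s in dataset:
#         ...
# finally:
#     release_writers(vid_writer)
```

Putting the call in finally means it also runs when Ctrl+C raises KeyboardInterrupt, which is the usual way a webcam run is stopped.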
Minimum Reproducible Example
To help us better understand and resolve your issue, could you please provide a minimal reproducible example of your code? This will allow us to reproduce the bug and investigate a solution. You can find more details on creating a minimal reproducible example here. This step is crucial for us to provide accurate and effective support.
Verify Latest Versions
Lastly, please ensure you are using the latest versions of torch and the YOLOv5 repository. This can sometimes resolve performance issues due to optimizations and bug fixes in newer releases.
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
# Ultralytics YOLOv5, AGPL-3.0 license
"""
Run YOLOv5 detection inference on images, videos, directories, globs, YouTube, webcam, streams, etc.

Usage - sources:
    $ python detect.py --weights yolov5s.pt --source 0                               # webcam
                                                     img.jpg                         # image
                                                     vid.mp4                         # video
                                                     screen                          # screenshot
                                                     path/                           # directory
                                                     list.txt                        # list of images
                                                     list.streams                    # list of streams
                                                     'path/*.jpg'                    # glob
                                                     'https://youtu.be/LNwODJXcvt4'  # YouTube
                                                     'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream

Usage - formats:
    $ python detect.py --weights yolov5s.pt                 # PyTorch
                                 yolov5s.torchscript        # TorchScript
                                 yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                                 yolov5s_openvino_model     # OpenVINO
                                 yolov5s.engine             # TensorRT
                                 yolov5s.mlmodel            # CoreML (macOS-only)
                                 yolov5s_saved_model        # TensorFlow SavedModel
                                 yolov5s.pb                 # TensorFlow GraphDef
                                 yolov5s.tflite             # TensorFlow Lite
                                 yolov5s_edgetpu.tflite     # TensorFlow Edge TPU
                                 yolov5s_paddle_model       # PaddlePaddle
"""

import argparse
import csv
import os
import platform
import sys
from pathlib import Path

import torch
import time
import pyttsx3

# Initialize the TTS engine
engine = pyttsx3.init()

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative

from ultralytics.utils.plotting import Annotator, colors, save_one_box

from models.common import DetectMultiBackend
from utils.dataloaders import IMG_FORMATS, VID_FORMATS, LoadImages, LoadScreenshots, LoadStreams
from utils.general import (
    LOGGER,
    Profile,
    check_file,
    check_img_size,
    check_imshow,
    check_requirements,
    colorstr,
    cv2,
    increment_path,
    non_max_suppression,
    print_args,
    scale_boxes,
    strip_optimizer,
    xyxy2xywh,
)
from utils.torch_utils import select_device, smart_inference_mode
@smart_inference_mode()
def run(
    weights=ROOT / "best.onnx",  # model path or triton URL
    source=ROOT / "Data/images",  # file/dir/URL/glob/screen/0(webcam)
    data=ROOT / "data.yaml",  # dataset.yaml path
    imgsz=(640, 640),  # inference size (height, width)
    conf_thres=0.25,  # confidence threshold
    iou_thres=0.45,  # NMS IOU threshold
    max_det=1000,  # maximum detections per image
    device="",  # cuda device, i.e. 0 or 0,1,2,3 or cpu
    view_img=False,  # show results
    save_txt=False,  # save results to *.txt
    save_csv=False,  # save results in CSV format
    save_conf=False,  # save confidences in --save-txt labels
    save_crop=False,  # save cropped prediction boxes
    nosave=False,  # do not save images/videos
    classes=None,  # filter by class: --class 0, or --class 0 2 3
    agnostic_nms=False,  # class-agnostic NMS
    augment=False,  # augmented inference
    visualize=False,  # visualize features
    update=False,  # update all models
    project=ROOT / "runs/detect",  # save results to project/name
    name="exp",  # save results to project/name
    exist_ok=False,  # existing project/name ok, do not increment
    line_thickness=3,  # bounding box thickness (pixels)
    hide_labels=False,  # hide labels
    hide_conf=False,  # hide confidences
    half=False,  # use FP16 half-precision inference
    dnn=False,  # use OpenCV DNN for ONNX inference
    vid_stride=1,  # video frame-rate stride
):
    source = str(source)
    save_img = not nosave and not source.endswith(".txt")  # save inference images
    is_file = Path(source).suffix[1:] in (IMG_FORMATS + VID_FORMATS)
    is_url = source.lower().startswith(("rtsp://", "rtmp://", "http://", "https://"))
    webcam = source.isnumeric() or source.endswith(".streams") or (is_url and not is_file)
    screenshot = source.lower().startswith("screen")
    if is_url and is_file:
        source = check_file(source)  # download

    # Directories
    save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
    (save_dir / "labels" if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Load model
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
    stride, names, pt = model.stride, model.names, model.pt
    imgsz = check_img_size(imgsz, s=stride)  # check image size

    # Dataloader
    bs = 1  # batch_size
    if webcam:
        view_img = check_imshow(warn=True)
        dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
        bs = len(dataset)
    elif screenshot:
        dataset = LoadScreenshots(source, img_size=imgsz, stride=stride, auto=pt)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
    vid_path, vid_writer = [None] * bs, [None] * bs

    # FPS calculation
    prev_time = time.time()

    # Run inference
    model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))  # warmup
    seen, windows, dt = 0, [], (Profile(device=device), Profile(device=device), Profile(device=device))
    for path, im, im0s, vid_cap, s in dataset:
        current_time = time.time()
        fps = 1 / (current_time - prev_time)
        prev_time = current_time

        with dt[0]:
            im = torch.from_numpy(im).to(model.device)
            im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
            im /= 255  # 0 - 255 to 0.0 - 1.0
            if len(im.shape) == 3:
                im = im[None]  # expand for batch dim
            if model.xml and im.shape[0] > 1:
                ims = torch.chunk(im, im.shape[0], 0)

        # Inference
        with dt[1]:
            visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
            if model.xml and im.shape[0] > 1:
                pred = None
                for image in ims:
                    if pred is None:
                        pred = model(image, augment=augment, visualize=visualize).unsqueeze(0)
                    else:
                        pred = torch.cat((pred, model(image, augment=augment, visualize=visualize).unsqueeze(0)), dim=0)
                pred = [pred, None]
            else:
                pred = model(im, augment=augment, visualize=visualize)

        # NMS
        with dt[2]:
            pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

        # Second-stage classifier (optional)
        # pred = utils.general.apply_classifier(pred, classifier_model, im, im0s)

        # Define the path for the CSV file
        csv_path = save_dir / "predictions.csv"

        # Create or append to the CSV file
        def write_to_csv(image_name, prediction, confidence):
            """Writes prediction data for an image to a CSV file, appending if the file exists."""
            data = {"Image Name": image_name, "Prediction": prediction, "Confidence": confidence}
            write_header = not csv_path.is_file()  # check before open(), which creates the file
            with open(csv_path, mode="a", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=data.keys())
                if write_header:
                    writer.writeheader()
                writer.writerow(data)

        # Process predictions
        for i, det in enumerate(pred):  # per image
            seen += 1
            if webcam:  # batch_size >= 1
                p, im0, frame = path[i], im0s[i].copy(), dataset.count
                s += f"{i}: "
            else:
                p, im0, frame = path, im0s.copy(), getattr(dataset, "frame", 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # im.jpg
            txt_path = str(save_dir / "labels" / p.stem) + ("" if dataset.mode == "image" else f"_{frame}")  # im.txt
            s += "%gx%g " % im.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            imc = im0.copy() if save_crop else im0  # for save_crop
            annotator = Annotator(im0, line_width=line_thickness, example=str(names))
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, 5].unique():
                    n = (det[:, 5] == c).sum()  # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    c = int(cls)  # integer class
                    label = names[c] if hide_conf else f"{names[c]}"
                    confidence = float(conf)
                    confidence_str = f"{confidence:.2f}"

                    if save_csv:
                        write_to_csv(p.name, label, confidence_str)

                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
                        with open(f"{txt_path}.txt", "a") as f:
                            f.write(("%g " * len(line)).rstrip() % line + "\n")

                    if save_img or save_crop or view_img:  # Add bbox to image
                        c = int(cls)  # integer class
                        label = None if hide_labels else (names[c] if hide_conf else f"{names[c]} {conf:.2f}")
                        annotator.box_label(xyxy, label, color=colors(c, True))
                    if save_crop:
                        save_one_box(xyxy, imc, file=save_dir / "crops" / names[c] / f"{p.stem}.jpg", BGR=True)

            # Overlay FPS on the frame
            cv2.putText(im0, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)

            # Stream results
            im0 = annotator.result()
            if view_img:
                if platform.system() == "Linux" and p not in windows:
                    windows.append(p)
                    cv2.namedWindow(str(p), cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)  # allow window resize (Linux)
                    cv2.resizeWindow(str(p), im0.shape[1], im0.shape[0])
                cv2.imshow(str(p), im0)
                cv2.waitKey(1)  # 1 millisecond

            # Save results (image with detections)
            if save_img:
                if dataset.mode == "image":
                    cv2.imwrite(save_path, im0)
                else:  # 'video' or 'stream'
                    if vid_path[i] != save_path:  # new video
                        vid_path[i] = save_path
                        if isinstance(vid_writer[i], cv2.VideoWriter):
                            vid_writer[i].release()  # release previous video writer
                        if vid_cap:  # video
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:  # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                        save_path = str(Path(save_path).with_suffix(".mp4"))  # force *.mp4 suffix on results videos
                        vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
                    vid_writer[i].write(im0)

        # Print time (inference-only)
        LOGGER.info(f"{s}{'' if len(det) else '(no detections), '}{dt[1].dt * 1E3:.1f}ms")

        detections = []
        for *xyxy, conf, cls in reversed(det):
            detections.append({'label': names[int(cls)]})

        # Assuming 'detections' is your list of detected objects
        for det in detections:
            # Extract the label of the detected object
            label = det['label']
            print(f"Detected: {label}")  # Debugging print statement

            # Generate voice feedback
            # NOTE: runAndWait() blocks until the phrase has been spoken, so the detection
            # loop (and therefore the FPS) stalls every time something is detected
            engine.say(f"Detected {label}")
            engine.runAndWait()

    # Print results
    t = tuple(x.t / seen * 1e3 for x in dt)  # speeds per image
    LOGGER.info(f"Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {(1, 3, *imgsz)}" % t)
    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ""
        LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}{s}")
    if update:
        strip_optimizer(weights[0])  # update model (to fix SourceChangeWarning)
def parse_opt():
    """Parses command-line arguments for YOLOv5 detection, setting inference options and model configurations."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--weights", nargs="+", type=str, default=ROOT / "yolov5s.pt", help="model path or triton URL")
    parser.add_argument("--source", type=str, default=ROOT / "data/images", help="file/dir/URL/glob/screen/0(webcam)")
    parser.add_argument("--data", type=str, default=ROOT / "data/coco128.yaml", help="(optional) dataset.yaml path")
    parser.add_argument("--imgsz", "--img", "--img-size", nargs="+", type=int, default=[640], help="inference size h,w")
    parser.add_argument("--conf-thres", type=float, default=0.25, help="confidence threshold")
    parser.add_argument("--iou-thres", type=float, default=0.45, help="NMS IoU threshold")
    parser.add_argument("--max-det", type=int, default=1000, help="maximum detections per image")
    parser.add_argument("--device", default="", help="cuda device, i.e. 0 or 0,1,2,3 or cpu")
    parser.add_argument("--view-img", action="store_true", help="show results")
    parser.add_argument("--save-txt", action="store_true", help="save results to *.txt")
    parser.add_argument("--save-csv", action="store_true", help="save results in CSV format")
    parser.add_argument("--save-conf", action="store_true", help="save confidences in --save-txt labels")
    parser.add_argument("--save-crop", action="store_true", help="save cropped prediction boxes")
    parser.add_argument("--nosave", action="store_true", help="do not save images/videos")
    parser.add_argument("--classes", nargs="+", type=int, help="filter by class: --classes 0, or --classes 0 2 3")
    parser.add_argument("--agnostic-nms", action="store_true", help="class-agnostic NMS")
    parser.add_argument("--augment", action="store_true", help="augmented inference")
    parser.add_argument("--visualize", action="store_true", help="visualize features")
    parser.add_argument("--update", action="store_true", help="update all models")
    parser.add_argument("--project", default=ROOT / "runs/detect", help="save results to project/name")
    parser.add_argument("--name", default="exp", help="save results to project/name")
    parser.add_argument("--exist-ok", action="store_true", help="existing project/name ok, do not increment")
    parser.add_argument("--line-thickness", default=3, type=int, help="bounding box thickness (pixels)")
    parser.add_argument("--hide-labels", default=False, action="store_true", help="hide labels")
    parser.add_argument("--hide-conf", default=False, action="store_true", help="hide confidences")
    parser.add_argument("--half", action="store_true", help="use FP16 half-precision inference")
    parser.add_argument("--dnn", action="store_true", help="use OpenCV DNN for ONNX inference")
    parser.add_argument("--vid-stride", type=int, default=1, help="video frame-rate stride")
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    print_args(vars(opt))
    return opt


def main(opt):
    """Executes YOLOv5 model inference with given options, checking requirements before running the model."""
    check_requirements(ROOT / "requirements.txt", exclude=("tensorboard", "thop"))
    run(**vars(opt))


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)
I am using my modified detect1.py file from YOLOv5 PyTorch. I already followed the code you showed, but it still cannot show the video. Can you help me modify the code I shared?
Hi @Killuagg,
Thank you for sharing your detailed code and setup. Let's address your concerns step by step to ensure we can help you effectively.
Video Playback Issues
The issue with the video not playing could be related to how the video is being saved or displayed. Let's ensure that the video is saved correctly and that the display logic is handled properly.
Ensure Correct Video Saving
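First, let's ensure that the video is saved in a format your players support. The 'mp4v' codec used below (and by YOLOv5) is MPEG-4 Part 2, not H.264; if your player rejects it, try 'avc1' (H.264), which only works if your OpenCV build includes an H.264 encoder. Here's a snippet that sizes the writer to the camera frames and always releases it, so the MP4 is finalized:
import cv2

cap = cv2.VideoCapture(0)  # webcam; replace with your source
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # 'XVID' for .avi files, 'avc1' for H.264 if your build supports it
out = cv2.VideoWriter('output.mp4', fourcc, 20.0, (w, h))  # size must match the written frames

try:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        out.write(frame)  # Write the frame
finally:
    # Release everything when the job is finished (also on Ctrl+C)
    cap.release()
    out.release()
    cv2.destroyAllWindows()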
Ensure Correct Video Display
Next, let's ensure that the video display logic is handled correctly. Here's a simplified version of your detect.py script focusing on video display:
import cv2
import torch
from ultralytics.utils.plotting import Annotator, colors  # same import as your detect1.py

from models.common import DetectMultiBackend
from utils.dataloaders import LoadStreams
from utils.general import check_img_size, non_max_suppression, scale_boxes

# Load model
device = torch.device('cpu')  # Change to 'cuda' if you have a GPU
model = DetectMultiBackend('best.onnx', device=device)
stride, names = model.stride, model.names
imgsz = check_img_size((640, 640), s=stride)  # check image size

# Dataloader
source = '0'  # webcam
# auto=model.pt: only PyTorch weights accept variable input shapes; ONNX needs the exact exported size
dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=model.pt)

# Run inference
model.warmup(imgsz=(1, 3, *imgsz))  # warmup
for path, im, im0s, vid_cap, s in dataset:
    im = torch.from_numpy(im).to(device)
    im = im.float() / 255.0  # 0 - 255 to 0.0 - 1.0
    if len(im.shape) == 3:
        im = im[None]  # expand for batch dim

    # Inference
    pred = model(im)

    # NMS
    pred = non_max_suppression(pred, 0.25, 0.45, None, False, max_det=1000)

    # Process predictions
    for i, det in enumerate(pred):  # per image
        im0 = im0s[i].copy()
        annotator = Annotator(im0, line_width=3, example=str(names))
        if len(det):
            det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()
            for *xyxy, conf, cls in reversed(det):
                label = f'{names[int(cls)]} {conf:.2f}'
                annotator.box_label(xyxy, label, color=colors(int(cls), True))

        # Display results (LoadStreams ends the loop when 'q' is pressed in the window)
        cv2.imshow(str(path[i]), annotator.result())
        cv2.waitKey(1)  # 1 millisecond

cv2.destroyAllWindows()
Verify Latest Versions
Please ensure you are using the latest versions of torch and the YOLOv5 repository. This can sometimes resolve performance issues due to optimizations and bug fixes in newer releases.
Minimum Reproducible Example
If the issue persists, please provide a minimal reproducible example of your code. This will help us investigate further. You can find more details on creating a minimal reproducible example here. This step is crucial for us to provide accurate and effective support.
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
I am sorry. I am confused about where I need to place the code inside detect.py.
Hi @Killuagg,
Thank you for your patience and for providing more details about your setup. Let's clarify where to place the code within your detect.py script to ensure everything runs smoothly.
Integrating the Code into detect.py
- Import Necessary Libraries: Ensure you have all the necessary imports at the beginning of your script.
- Initialize the Model and Dataloader: This should be done before the main inference loop.
- Run Inference and Display Results: This is where the main logic of processing each frame and displaying the results will go.
Here's a structured example to guide you:
import argparse

import cv2
import pyttsx3
import torch
from ultralytics.utils.plotting import Annotator, colors

from models.common import DetectMultiBackend
from utils.dataloaders import LoadStreams
from utils.general import check_img_size, non_max_suppression, scale_boxes

# Initialize the TTS engine
engine = pyttsx3.init()


# Define the main function
def run(weights='best.onnx', source='0', imgsz=(640, 640), conf_thres=0.25, iou_thres=0.45, max_det=1000, device='cpu', view_img=False):
    # Load model
    device = torch.device(device)
    model = DetectMultiBackend(weights, device=device)
    stride, names = model.stride, model.names
    imgsz = check_img_size(imgsz, s=stride)  # check image size

    # Dataloader (auto=model.pt keeps ONNX inputs at the exact exported size)
    dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=model.pt)

    # Run inference
    model.warmup(imgsz=(1, 3, *imgsz))  # warmup
    for path, im, im0s, vid_cap, s in dataset:
        im = torch.from_numpy(im).to(device)
        im = im.float() / 255.0  # 0 - 255 to 0.0 - 1.0
        if len(im.shape) == 3:
            im = im[None]  # expand for batch dim

        # Inference
        pred = model(im)

        # NMS
        pred = non_max_suppression(pred, conf_thres, iou_thres, None, False, max_det=max_det)

        # Process predictions
        for i, det in enumerate(pred):  # per image
            im0 = im0s[i].copy()
            annotator = Annotator(im0, line_width=3, example=str(names))
            if len(det):
                det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()
                for *xyxy, conf, cls in reversed(det):
                    label = f'{names[int(cls)]} {conf:.2f}'
                    annotator.box_label(xyxy, label, color=colors(int(cls), True))

            # Display results (only when run with --view-img; press 'q' in the window to quit)
            if view_img:
                cv2.imshow(str(path[i]), annotator.result())
                cv2.waitKey(1)  # 1 millisecond

            # Generate voice feedback
            detections = [{'label': names[int(cls)]} for *xyxy, conf, cls in reversed(det)]
            for d in detections:
                engine.say(f"Detected {d['label']}")
                engine.runAndWait()  # blocks until speech finishes, pausing detection meanwhile

    cv2.destroyAllWindows()


# Define the argument parser
def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='best.onnx', help='model path')
    parser.add_argument('--source', type=str, default='0', help='source')
    parser.add_argument('--imgsz', type=int, nargs='+', default=[640, 640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='cpu', help='cuda device or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand a single value to h,w
    return opt


# Main entry point
if __name__ == "__main__":
    opt = parse_opt()
    run(**vars(opt))
Explanation:
- Imports: Ensure all necessary libraries are imported at the beginning.
- Model Initialization: The model is loaded and initialized before the main loop.
- Inference Loop: The loop processes each frame, performs inference, and displays the results.
- Voice Feedback: The text-to-speech engine provides voice feedback for detected objects.
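Because engine.runAndWait() blocks, every spoken announcement pauses detection, which is one concrete reason the FPS drops whenever objects are in view. A minimal sketch of moving speech to a background thread, assuming pyttsx3 as in the script above; SpeechWorker and make_pyttsx3_speaker are hypothetical names. The engine is created inside the worker thread as a precaution, since some TTS backends may be tied to the thread that created them, and repeated labels are throttled by a cooldown so a static object is not announced on every frame.

```python
import queue
import threading
import time


def make_pyttsx3_speaker():
    """Build a speak(text) function backed by pyttsx3 (requires `pip install pyttsx3`)."""
    import pyttsx3  # imported here so the engine is created in the worker thread

    engine = pyttsx3.init()

    def speak(text):
        engine.say(text)
        engine.runAndWait()  # still blocks, but only the worker thread

    return speak


class SpeechWorker:
    """Announce detections on a background thread so the detection loop never waits."""

    def __init__(self, make_speak=make_pyttsx3_speaker, cooldown=3.0, maxsize=3, clock=time.monotonic):
        self.make_speak = make_speak
        self.cooldown = cooldown  # seconds before the same label may be announced again
        self.clock = clock
        self.last_said = {}  # label -> time it was last queued
        self.q = queue.Queue(maxsize=maxsize)
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def announce(self, label):
        """Queue 'Detected <label>' without blocking; returns True if it was queued."""
        now = self.clock()
        if now - self.last_said.get(label, float("-inf")) < self.cooldown:
            return False  # announced recently
        try:
            self.q.put_nowait(f"Detected {label}")
        except queue.Full:
            return False  # speech is behind; skip instead of building a backlog
        self.last_said[label] = now
        return True

    def _loop(self):
        speak = self.make_speak()  # created in this thread
        while True:
            text = self.q.get()
            if text is None:  # sentinel from close()
                break
            speak(text)

    def close(self):
        """Finish speaking what is already queued, then stop the worker thread."""
        self.q.put(None)
        self.thread.join()
```

In the detection loop you would create speech = SpeechWorker() once, replace the engine.say(...)/engine.runAndWait() pair with speech.announce(label), and call speech.close() after the loop. Dropping announcements while the queue is full keeps the spoken feedback close to what the camera currently sees.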
Next Steps:
- Verify Latest Versions: Ensure you are using the latest versions of torch and the YOLOv5 repository.
- Minimum Reproducible Example: If you encounter further issues, please provide a minimal reproducible example. This will help us investigate and resolve the issue more effectively. You can find more details on creating a minimal reproducible example here.
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
I have evaluated my model with val.py. The dataset was images extracted from video. When tested with a test dataset from Google, it has high metrics. If I use a test dataset extracted from Raspberry Pi video, I only get 60% metrics. How can I improve it?
Hi @Killuagg,
Thank you for reaching out and sharing your evaluation results. It's good that your model performs well on the test dataset from Google; the gap on the dataset extracted from video on the Raspberry Pi suggests those frames differ from what the model saw during training. Let's explore some potential reasons and solutions to improve your metrics:
Dataset Quality and Diversity:
- Consistency: Ensure that the images extracted from the video on the Raspberry Pi are of consistent quality and resolution. Variations in lighting, angle, and motion blur can affect model performance.
- Diversity: The dataset from Google might be more diverse compared to the video frames. Ensure that your training dataset includes a wide variety of scenarios similar to those in your video.
Data Augmentation:
- Applying data augmentation techniques can help improve the robustness of your model. Techniques such as random cropping, rotation, flipping, and color adjustments can help your model generalize better to different conditions.
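Model Fine-Tuning:
- Fine-tune your model on frames extracted from the Raspberry Pi video so it can adapt to the specific characteristics of those frames. Keep the frames you evaluate on out of the training set; consecutive video frames are nearly identical, so any overlap would inflate the validation metrics.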
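Hyperparameter Tuning:
- Experiment with different hyperparameters such as learning rate, batch size, and number of epochs. Tuning these parameters can sometimes lead to significant improvements in model performance. YOLOv5 can also search for hyperparameters automatically with train.py --evolve (see the Hyperparameter Evolution tutorial).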
Test-Time Augmentation (TTA):
- Use Test-Time Augmentation (TTA) during evaluation to improve metrics. TTA runs predictions on several augmented versions of each input image (in YOLOv5, flipped and rescaled copies) and then combines the results. You can enable TTA by adding the --augment flag to your val.py command, replacing the weights and data with your own: python val.py --weights yolov5x.pt --data coco.yaml --img 832 --augment --half
- Because TTA runs several forward passes per image, it also slows inference, which matters on a Raspberry Pi.
- For more details on TTA, you can refer to the Test-Time Augmentation (TTA) documentation.
Evaluate on Latest Versions:
- Ensure you are using the latest versions of torch and the YOLOv5 repository. Updates often include performance improvements and bug fixes that could benefit your model's performance.
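For the data augmentation suggestion above: YOLOv5's train.py reads its augmentation settings (keys such as hsv_h, hsv_s, hsv_v, degrees, translate, scale, shear, perspective, flipud, fliplr, mosaic, mixup) from a flat hyperparameter YAML, for example data/hyps/hyp.scratch-low.yaml, selected with the --hyp flag. The sketch below copies such a file and changes only the values you name, keeping the comments. override_hyp is a hypothetical helper, and the override values are illustrative guesses for Pi camera lighting and mounting angle, not tuned settings.

```python
import re

# Matches flat "key: value  # comment" lines as used in YOLOv5 hyp files.
_HYP_LINE = re.compile(r"^(\w+):\s*([^#\s]+)\s*(#.*)?$")


def override_hyp(base_text: str, overrides: dict) -> str:
    """Hypothetical helper: return hyp YAML text with selected values replaced.

    Comments and untouched keys are preserved; unknown keys raise KeyError so
    a typo does not silently leave the original value in place.
    """
    out, seen = [], set()
    for line in base_text.splitlines():
        m = _HYP_LINE.match(line)
        if m and m.group(1) in overrides:
            key, comment = m.group(1), m.group(3)
            new_line = f"{key}: {overrides[key]}"
            out.append(f"{new_line}  {comment}" if comment else new_line)
            seen.add(key)
        else:
            out.append(line)
    missing = set(overrides) - seen
    if missing:
        raise KeyError(f"not found in base hyp file: {sorted(missing)}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    from pathlib import Path

    base = Path("data/hyps/hyp.scratch-low.yaml")  # path inside a YOLOv5 clone
    if base.exists():
        # Illustrative values only: stronger brightness jitter, small rotations.
        custom = override_hyp(base.read_text(), {"hsv_v": 0.6, "degrees": 5.0})
        Path("hyp.custom.yaml").write_text(custom)
```

You would then train with the custom file, for example python train.py --weights best.pt --data your_data.yaml --hyp hyp.custom.yaml, where your_data.yaml stands in for your own dataset file.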
If you could provide a minimal reproducible example of your code, it would help us investigate further. You can find more details on creating a minimal reproducible example here. This step is crucial for us to provide accurate and effective support.
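When trying the fine-tuning and hyperparameter suggestions in the list above, it helps to validate every run on the same set of Raspberry Pi frames and compare the numbers directly. YOLOv5's train.py writes per-epoch metrics to results.csv in each run directory (for example runs/train/exp/results.csv). The sketch below assumes the space-padded column names YOLOv5 uses, such as metrics/mAP_0.5:0.95; check the header of your own file. best_epoch is a hypothetical helper.

```python
import csv
import io
from pathlib import Path


def best_epoch(results_csv_text: str, metric: str = "metrics/mAP_0.5:0.95"):
    """Hypothetical helper: return (epoch, value) for the best epoch by `metric`.

    YOLOv5 pads its CSV header names with spaces, so they are stripped here.
    """
    reader = csv.reader(io.StringIO(results_csv_text))
    header = [name.strip() for name in next(reader)]
    if metric not in header:
        raise KeyError(f"{metric!r} not in columns {header}")
    epoch_col, metric_col = header.index("epoch"), header.index(metric)
    best = None
    for row in reader:
        if not row:
            continue  # skip blank lines
        epoch, value = int(float(row[epoch_col])), float(row[metric_col])
        if best is None or value > best[1]:
            best = (epoch, value)
    if best is None:
        raise ValueError("results file has no data rows")
    return best


if __name__ == "__main__":
    # Compare every run under the default YOLOv5 training output directory.
    for path in sorted(Path("runs/train").glob("*/results.csv")):
        epoch, value = best_epoch(path.read_text())
        print(f"{path.parent.name}: best mAP@0.5:0.95 = {value:.4f} at epoch {epoch}")
```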
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO and Vision AI!