YOLO
- Ömer Faruk Subaşı
- Feb 18
- 6 min read

What is YOLO?
YOLO (You Only Look Once) is a popular algorithm for real-time object detection. It uses convolutional neural networks (CNNs) to identify and localize different objects in images. Inspired by the visual processing centers of animals, YOLO performs object detection with a single CNN.
How Does YOLO Work?

Dividing the Image: YOLO divides the image into a grid consisting of a certain number of cells. Each cell represents a specific region of the image.
Analyzing Each Cell: Each cell tries to detect whether there is an object inside it. If an object is present, its location, size, and class (e.g., car, human) are predicted.
Merging Predictions: Predictions from all cells are combined, and as a result, the locations of objects in the image and their respective classes are determined.
Displaying Results: Finally, detected objects are marked with bounding boxes on the image, showing where and what the objects are.
Advantages of YOLO

Fast and Real-Time: YOLO detects all objects in an image in a single forward pass, making it ideal for real-time applications.
Holistic Approach: YOLO reasons over the entire image in a single pass, so its predictions take global context into account and are more consistent.
High Accuracy: YOLO utilizes the power of convolutional neural networks to achieve high accuracy in object detection.
General Purpose: YOLO can be used for various object detection tasks and can detect different objects simultaneously.
Computational Efficiency: Compared to other object detection algorithms, YOLO requires less computational power, making it suitable for mobile and embedded devices.
Applications and Real-World Use Cases
Security and Surveillance:
YOLO is used globally in security cameras to detect unauthorized access, suspicious activities, and dangerous objects. It is widely applied in shopping malls, airports, and public areas.
Autonomous Vehicles:
YOLO is used in self-driving technologies to detect pedestrians, vehicles, traffic lights, and signs. Companies like Tesla integrate YOLO-based object detection models into their autonomous systems.
Drone Technology:
Drones used in agriculture and military operations actively employ YOLO for object and human detection. It is particularly used for identifying harmful plants and mapping processes.
Medical Imaging:
YOLO algorithms are utilized in medical imaging for detecting abnormalities, tumors, and lesions. Hospitals leverage this technology to accelerate diagnostic processes.
E-commerce and Retail:
Security cameras in stores analyze customer behavior, optimize shelf arrangements, and detect product theft using YOLO. Amazon’s Just Walk Out store technology also benefits from similar technologies.
Intelligent Traffic Management:
YOLO is used for real-time detection of vehicles, pedestrians, and bicycles in traffic. This data is integrated into city management systems to optimize traffic flow and reduce accident risks.
YOLO Model Architecture
YOLO (You Only Look Once) uses a single neural network to directly convert input images into bounding boxes and class probabilities. Its architecture and layers allow it to perform this process quickly and efficiently.

Network Architecture
Input Layer: Receives the input image and resizes it to a model-compatible size (typically 416x416 pixels).
Feature Maps: Uses CNN to extract features from the input image, which are then used for classification and localization.
Prediction Layer: Predicts multiple bounding boxes and class probabilities for each grid cell; these predictions are then drawn onto the image.
Layers and Their Functions
Convolutional Layers: Extract features from the image and capture low-level details.
ReLU (Rectified Linear Unit): Zeroes out negative activations, giving the model the non-linearity it needs and speeding up learning.
Pooling Layers: Make feature maps more compact while preserving the important information.
Fully Connected Layers: Predict bounding boxes and class probabilities.
Prediction Layer: Produces final predictions, determining classes and positions.
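A quick way to see how these layers shrink a 416x416 input down to the 13x13 prediction grid is to apply the standard output-size formula, (W - K + 2P) / S + 1, layer by layer. This is a simplified sketch: the exact layer stack varies between YOLO versions, and the kernel/stride/padding values below are illustrative assumptions, not the actual network configuration.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv (or pool) layer: (W - K + 2P) / S + 1."""
    return (size - kernel + 2 * pad) // stride + 1

size = 416
# A 3x3 conv with padding 1 and stride 1 keeps the spatial size unchanged.
size = conv_out(size, kernel=3, stride=1, pad=1)   # still 416
# Each stride-2 layer halves the spatial size; five of them: 416 -> 13.
for _ in range(5):
    size = conv_out(size, kernel=3, stride=2, pad=1)
print(size)  # -> 13, the grid the prediction layer operates on
```

This is why the input size matters: 416 is divisible by 32 (2^5), so five downsampling steps land exactly on a 13x13 grid.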

Fundamentals of YOLO Object Detection
At its core, YOLO maps the input image directly to bounding boxes and class probabilities in one forward pass of a single neural network. The building blocks below explain how that mapping works.
Grid-Based Division
The YOLO algorithm divides the input image into a grid. For example, a 416x416 image can be divided into a 13x13 grid of 32x32-pixel cells.
Each grid cell attempts to detect objects within its area.
The cell that contains an object's center is responsible for detecting that object, which is how the model learns object positions in the image.
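This center-to-cell assignment can be sketched in a few lines of Python. The 416-pixel input and 13x13 grid match the example above; the function name is ours, not part of any YOLO library.

```python
def responsible_cell(cx, cy, img_size=416, grid=13):
    """Return the (row, col) of the grid cell containing an object's center."""
    stride = img_size / grid          # 416 / 13 = 32 pixels per cell
    return int(cy // stride), int(cx // stride)

# An object centered at pixel (x=200, y=150) lands in row 4, column 6:
print(responsible_cell(200, 150))  # -> (4, 6)
```

During training, only this one cell is made responsible for predicting that object's box.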
Bounding Box Regression
A bounding box is a rectangular box that surrounds the detected object.
YOLO predicts multiple bounding boxes for each grid cell (e.g., 2 or 3 boxes per cell).
The coordinates (x, y, width, and height) of these boxes are computed using regression, ensuring accurate placement.
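The regression outputs are relative, so they must be decoded into pixel coordinates. The sketch below assumes a YOLOv1-style parameterization — (x, y) as the center's offset within its cell and (w, h) as fractions of the image size; later versions use anchor boxes and a slightly different encoding.

```python
def decode_box(cell_row, cell_col, x, y, w, h, img_size=416, grid=13):
    """Convert cell-relative regression outputs to absolute pixel coordinates.

    (x, y): offset of the box center within its cell, each in [0, 1].
    (w, h): box width/height as fractions of the whole image.
    Returns (x_min, y_min, x_max, y_max).
    """
    stride = img_size / grid
    cx = (cell_col + x) * stride      # absolute center, x axis
    cy = (cell_row + y) * stride      # absolute center, y axis
    bw, bh = w * img_size, h * img_size
    return cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2

# A box centered in cell (4, 6), a quarter of the image wide and half as tall:
print(decode_box(4, 6, 0.5, 0.5, 0.25, 0.5))  # -> (156.0, 40.0, 260.0, 248.0)
```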

Intersection Over Union (IoU)
IoU measures the overlap between the predicted bounding box and the actual box.
Step 1: Compute the intersection area between the predicted and actual boxes.
Step 2: Compute the union area between the predicted and actual boxes.
Step 3: Calculate the IoU score by dividing the intersection area by the union area.

Result: The closer the IoU value is to 1, the more accurate the predicted box is. Typically, an IoU value above 0.5 is considered a successful prediction.
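The three steps above translate directly into code. This is a minimal sketch for axis-aligned boxes in (x_min, y_min, x_max, y_max) form:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Step 1: intersection rectangle (zero if the boxes don't overlap).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Step 2: union = area A + area B - intersection.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    # Step 3: the ratio, in [0, 1].
    return inter / union if union else 0.0

# Two 100x100 boxes shifted by 50 px share a 50x50 patch: IoU = 2500/17500.
print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # -> 0.1428...
```

Under the usual 0.5 threshold, this prediction would not count as a successful match.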

Confidence Score
The confidence score determines whether a bounding box contains an object and how certain the model is about its presence.
Step 1: The model predicts a confidence score for each bounding box, indicating the likelihood of an object being present.
Step 2: If the confidence score is high (e.g., 0.9), the model assumes an object is inside the box. If low (e.g., 0.2), it assumes the box is empty.
Step 3: The confidence score is used together with IoU for object detection.
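In practice, the first use of the confidence score is a simple filter: boxes below a threshold are discarded before any further processing. A minimal sketch (the 0.5 threshold and the list-of-pairs format are illustrative choices):

```python
def filter_by_confidence(predictions, threshold=0.5):
    """Keep only (confidence, box) pairs whose score passes the threshold."""
    return [p for p in predictions if p[0] >= threshold]

preds = [(0.9, (10, 10, 50, 50)),   # confident -> kept
         (0.2, (30, 30, 80, 80))]   # likely empty -> dropped
print(filter_by_confidence(preds))  # -> [(0.9, (10, 10, 50, 50))]
```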
Non-Maximum Suppression (NMS)
NMS eliminates multiple bounding boxes representing the same object.
Step 1: The model sorts all predictions based on confidence scores.
Step 2: The highest confidence box is selected, and surrounding boxes with high IoU values are eliminated.
Step 3: This process continues until only the best bounding box remains for each object.
Result: NMS removes redundant boxes and ensures the model makes the best prediction for each object.
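The three NMS steps can be implemented greedily in a few lines. This is a simplified sketch of the classic algorithm (real pipelines usually run it per class); the 0.5 IoU threshold is a common but tunable choice:

```python
def iou(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    # Step 1: sort predictions by confidence, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # Step 2: keep the most confident remaining box...
        best = order.pop(0)
        keep.append(best)
        # ...and drop every remaining box that overlaps it too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep  # Step 3: repeat until no candidates remain

boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 300, 300)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # -> [0, 2]; the duplicate box 1 is suppressed
```

Boxes 0 and 1 describe the same object (IoU ≈ 0.68), so only the higher-scoring one survives, while the distant box 2 is untouched.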

Object Classification
YOLO classifies objects detected within each bounding box.
Class Probabilities: The model predicts probabilities for possible classes in each bounding box.
Class Prediction: The highest probability class is chosen as the detected object.
Combination with Confidence Score: The classification result is combined with the confidence score to determine both object presence and class.
Object Detection Example

Pc (Confidence Score): Indicates whether an object exists in a cell. A Pc value close to 1 means the model is highly confident about an object’s presence. A Pc value close to 0 means the cell is empty.
Bx (X Coordinate): Represents the bounding box's X coordinate, indicating its horizontal center.
By (Y Coordinate): Represents the bounding box's Y coordinate, indicating its vertical center.
Bw (Width): Represents the bounding box's width.
Bh (Height): Represents the bounding box's height.
C1, C2, C3 (Class Probabilities): The predicted probability of each possible class; the highest one determines the object's class.
Grid-Based Division
The YOLO algorithm divides the image into a grid. Each grid cell attempts to detect whether there is an object in its own region.
Is There an Object? (Pc - Confidence Score)
Step 1: The model calculates the Pc value for each cell. This value indicates whether there is an object in the cell.
Step 2: If Pc is close to 1, the model predicts that there is an object in that cell. If Pc is close to 0, the model predicts that the cell is empty, and the remaining values for that cell are not evaluated.
Bounding Box Prediction
Step 1: If the model thinks there is an object in that cell (Pc is high), a bounding box prediction is made.
Step 2: Using the Bx and By values, the center of the bounding box is determined.
Step 3: Using the Bw and Bh values, the width and height of the bounding box are determined.
Class Labels (C1, C2, C3)
Step 1: The model predicts the type of object using class labels (C1, C2, C3).
Step 2: The model calculates a probability for each class and selects the class with the highest probability.
Step 3: The selected class label determines the type of detected object, such as a vehicle, human, or animal.
Result
Step 1: The model detects the presence of an object in each cell based on the Pc value.
Step 2: Bounding box predictions determine the object's position and dimensions.
Step 3: Class labels predict the type of object and finalize the detection.
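The whole walkthrough can be sketched as a decoder for one cell's prediction vector [Pc, Bx, By, Bw, Bh, C1, C2, C3]. The class names, the 0.5 threshold, and the function name are illustrative assumptions to match the example, not part of any YOLO library:

```python
CLASSES = ["vehicle", "human", "animal"]   # hypothetical labels for C1..C3

def decode_cell(vector, pc_threshold=0.5):
    """Interpret one cell's [Pc, Bx, By, Bw, Bh, C1, C2, C3] prediction.

    Returns None for an empty cell, else (class_name, score, (bx, by, bw, bh)).
    """
    pc, bx, by, bw, bh = vector[:5]
    if pc < pc_threshold:               # Step 1: Pc low -> no object here
        return None
    class_probs = vector[5:]            # Step 3: pick the most likely class
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])
    score = pc * class_probs[best]      # presence confidence x class probability
    return CLASSES[best], score, (bx, by, bw, bh)   # Step 2: box stays as-is

print(decode_cell([0.0, 0, 0, 0, 0, 0, 0, 0]))               # -> None (empty)
print(decode_cell([0.9, 0.5, 0.5, 0.3, 0.4, 0.1, 0.8, 0.1]))
# -> ('human', 0.72..., (0.5, 0.5, 0.3, 0.4))
```

A full detector runs this decode for every cell, then applies NMS to the surviving boxes.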