
Algorithm Reference: Mean Shift Segmentation

Type: Non-Parametric Clustering / Density Estimation
Library: OpenCV (cv2.pyrMeanShiftFiltering)
Application: Image Smoothing & Segmentation

Unlike K-Means, Mean Shift is a non-parametric algorithm: the user does not have to specify the number of clusters (K) beforehand. It treats the image's pixels as samples drawn from an underlying probability density function and iteratively shifts each point towards the nearest "mode" (local density peak).

1. Mathematical Formulation

The core of the algorithm is the "Mean Shift Vector," which points in the direction of the maximum increase in density. For a data point x, the mean shift vector m(x) is the kernel-weighted mean of the sample points x_i minus x itself, where K is a kernel function (typically a Gaussian kernel):

m(x) = [ Σ_i K(x - x_i) x_i ] / [ Σ_i K(x - x_i) ] - x

The algorithm repeats the update x ← x + m(x) until the magnitude of the shift vector falls below a threshold. For an image, this moves every pixel to the center of its local "colour neighborhood," creating flat, cartoon-like regions.
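The update rule can be sketched in a few lines of NumPy. This is a minimal illustration rather than the OpenCV implementation: it assumes a Gaussian kernel, and the function name `mean_shift_point`, the `bandwidth`, `tol` and `max_iter` parameters, and the toy 2-D data are all hypothetical choices made for the example.

```python
import numpy as np

def mean_shift_point(x, points, bandwidth=1.0, tol=1e-3, max_iter=100):
    """Shift a single point x toward the nearest density mode of `points`."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        # Gaussian kernel weights K(x - x_i) for every sample x_i
        sq_dist = np.sum((points - x) ** 2, axis=1)
        weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        # Mean shift vector: kernel-weighted mean of the samples minus x
        m = weights @ points / weights.sum() - x
        x = x + m
        # Stop once the shift is smaller than the threshold
        if np.linalg.norm(m) < tol:
            break
    return x

# Toy data: two well-separated clusters in a 2-D feature space
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(100, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(100, 2)),
])

mode_a = mean_shift_point([0.5, -0.5], points)
mode_b = mean_shift_point([4.5, 5.5], points)
print(mode_a, mode_b)
```

Each starting point ends up close to the centre of the cluster it started in, without the number of clusters ever being specified.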

2. Python Backend Logic (Snippet)

Running Mean Shift naively on high-resolution images is computationally expensive (O(N^2) in the number of pixels N). To mitigate this, ImageStylo uses the pyramid-based implementation provided by OpenCV.

We perform the operation on a downscaled version of the image (Gaussian Pyramid) to define clusters, then propagate those clusters back to the high-resolution original.


import cv2


def process_meanshift(image, spatial_rad, color_rad, level):
    """Apply pyramid mean shift filtering to an 8-bit, 3-channel image."""
    # spatial_rad: the spatial window radius (location), passed as sp
    # color_rad: the colour window radius (chromaticity), passed as sr
    # level: the maximum pyramid level (speed optimization), passed as maxLevel
    shifted_image = cv2.pyrMeanShiftFiltering(
        image,
        sp=spatial_rad,
        sr=color_rad,
        maxLevel=level,
    )

    return shifted_image

3. Complexity & Constraints

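Traditional Mean Shift is O(T × n × h^d), where T is the number of iterations, n the number of pixels, and h^d the size of a window of radius h in a d-dimensional feature space. This is too slow for real-time web use on 4K images.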

Optimization Strategy: `cv2.pyrMeanShiftFiltering` builds a Gaussian pyramid of `maxLevel` + 1 levels and starts the filtering on the coarsest one. Each level halves the width and the height, so the coarsest level holds 4^maxLevel times fewer pixels: a `maxLevel` of 2 or 3 reduces the pixel count there by a factor of 16 or 64 respectively. This cuts processing time substantially while preserving the main colour boundaries.
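The reduction factors can be checked with a little arithmetic. The sketch below assumes each level halves both dimensions, rounding up, which matches the default output size of `cv2.pyrDown`; the helper name `pyramid_sizes` and the 3840 × 2160 (4K UHD) input size are hypothetical choices for the example.

```python
def pyramid_sizes(width, height, max_level):
    """Return the (width, height) of each pyramid level, from 0 to max_level."""
    sizes = [(width, height)]
    for _ in range(max_level):
        w, h = sizes[-1]
        # Halve each dimension, rounding up (assumed default pyrDown behaviour)
        sizes.append(((w + 1) // 2, (h + 1) // 2))
    return sizes

full_pixels = 3840 * 2160
for level, (w, h) in enumerate(pyramid_sizes(3840, 2160, 3)):
    factor = full_pixels / (w * h)
    print(f"level {level}: {w}x{h}, {factor:.0f}x fewer pixels")
```

For a 4K frame, level 2 is 960 × 540 and level 3 is 480 × 270, which gives the factors of 16 and 64 quoted above.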
