leetcode @ class

March 25, 2025


i was leetcoding in between listening to lectures. my deep learning class covered cnns (convolutional neural networks). a cnn is a type of neural network architecture that learns spatial patterns through hierarchical feature extraction, loosely mimicking human visual processing.

core components:

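  • convolutional layer: applies filters to detect features (edge -> shape -> object)
  • filters/kernels: small matrices (commonly 3x3 or 5x5) that slide across the image
  • output dim: (input size - filter size + 2 * padding) / stride + 1, rounded down. with no padding this is just (input size - filter size) / stride + 1.
  • padding: add border elements (usually zeros) to preserve spatial dimensions
    • "same" padding (stride 1, odd filter size): (filter size - 1) / 2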
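a quick sketch of the two formulas above in python, mostly so i can sanity check numbers. the function names `conv_output_size` and `same_padding` are my own, not from any library, and i'm assuming square inputs and filters and the same padding on every side.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """output size along one spatial dimension of a convolution."""
    # padding adds `padding` elements on each side, hence the factor of 2
    padded = input_size + 2 * padding
    if filter_size > padded:
        raise ValueError("filter is larger than the padded input")
    # floor division: positions where the filter would hang off the edge are dropped
    return (padded - filter_size) // stride + 1


def same_padding(filter_size):
    """padding that keeps output size == input size (assumes stride 1, odd filter)."""
    if filter_size % 2 == 0:
        raise ValueError("this formula assumes an odd filter size")
    return (filter_size - 1) // 2


print(conv_output_size(5, 3))                           # no padding: 5 -> 3
print(conv_output_size(5, 3, padding=same_padding(3)))  # padding 1: 5 -> 5
```

with stride 2 the floor matters: a 7-wide input and a 3-wide filter give (7 - 3) // 2 + 1 = 3.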

example:

Padding = (filter size - 1) / 2
For 3x3 filter: Padding = (3 - 1) / 2 = 1

Input: 5x5 image
Add 1-pixel border around image → 7x7
Filter: 3x3
Output size = (7 - 3) / 1 + 1 = 5 (so 5x5)

this keeps the output size equal to the input size.
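to convince myself, here is a naive sliding-window convolution in plain python (no numpy) run on the 5x5 example. this is only a sketch: `pad2d` and `conv2d` are made-up helper names, the all-ones image and filter are toy data, and like most deep learning libraries it computes cross-correlation (the filter is not flipped).

```python
def pad2d(image, pad):
    """surround a 2D list with `pad` rows/columns of zeros."""
    width = len(image[0]) + 2 * pad
    top = [[0] * width for _ in range(pad)]
    middle = [[0] * pad + list(row) + [0] * pad for row in image]
    bottom = [[0] * width for _ in range(pad)]
    return top + middle + bottom


def conv2d(image, kernel, stride=1, pad=0):
    """naive 2D convolution of a square kernel over a 2D list."""
    padded = pad2d(image, pad)
    k = len(kernel)
    out_h = (len(padded) - k) // stride + 1
    out_w = (len(padded[0]) - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # multiply the filter with the window it currently covers, then sum
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += padded[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


image = [[1] * 5 for _ in range(5)]   # toy 5x5 image of ones
kernel = [[1] * 3 for _ in range(3)]  # toy 3x3 filter of ones

out = conv2d(image, kernel, pad=1)
print(len(out), len(out[0]))  # 5 5
print(out[0][0], out[2][2])   # 4 9: the corner window overlaps 4 real pixels, the center 9
```

without the padding (`pad=0`) the same call gives a 3x3 output.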

what do 1x1 filters do?

  • dimensionality reduction (reduce channels but preserve spatial dim)
  • computational efficiency (fewer channels going into the larger filters that follow means fewer multiplications)
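the way i think about it: a 1x1 filter is a weighted sum across channels at every pixel, so height and width stay the same and only the channel count changes. a minimal sketch in plain python, where `conv1x1`, the toy feature map, and the weights are all made up for illustration (no bias, no activation):

```python
def conv1x1(feature_map, weights):
    """feature_map: [channels_in][height][width], weights: [channels_out][channels_in]."""
    c_in = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    out = []
    for w_row in weights:  # one row of weights per output channel
        channel = [
            [sum(w_row[c] * feature_map[c][y][x] for c in range(c_in))
             for x in range(width)]
            for y in range(height)
        ]
        out.append(channel)
    return out


# toy input: 4 channels of 3x3, channel c filled with the value c + 1
fmap = [[[c + 1] * 3 for _ in range(3)] for c in range(4)]

# toy weights mapping 4 channels down to 2
weights = [
    [1, 0, 0, 0],              # output channel 0: copy input channel 0
    [0.25, 0.25, 0.25, 0.25],  # output channel 1: average of all input channels
]

reduced = conv1x1(fmap, weights)
print(len(reduced), len(reduced[0]), len(reduced[0][0]))  # 2 3 3
```

4 channels went in and 2 came out, but the 3x3 spatial size did not change.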

GoogLeNet/inception networks popularized this, using 1x1 filters to:

  • reduce parameters
  • control channel depth (the number of feature maps passed to the next layer)
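a back-of-the-envelope count of how much a 1x1 bottleneck can save. the channel sizes (256 -> 32 -> 64) are numbers i picked for illustration, not the actual GoogLeNet configuration, and `conv_weights` is my own helper that ignores biases.

```python
def conv_weights(filter_size, channels_in, channels_out):
    """number of weights in a conv layer with square filters (biases ignored)."""
    return filter_size * filter_size * channels_in * channels_out


# option a: 5x5 conv straight from 256 channels to 64
direct = conv_weights(5, 256, 64)

# option b: 1x1 conv down to 32 channels first, then the 5x5 conv to 64
bottleneck = conv_weights(1, 256, 32) + conv_weights(5, 32, 64)

print(direct, bottleneck)  # 409600 59392
```

under these assumed sizes the bottleneck version needs roughly 7x fewer weights for the same number of output channels.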