in between listening to the lectures i was leetcoding. my deep learning class taught about CNNs. a CNN is a type of neural network architecture that learns spatial patterns through hierarchical feature extraction, loosely mimicking human visual processing.
core components:
- convolutional layer: applies filters to detect features (edge -> shape -> object)
- filters/kernels: small matrices (e.g. 3x3 or 5x5) that slide across the image
- output dim: (input size - filter size) / stride + 1, without padding (see the numpy sketch after this list)
- padding: add a border of zeros around the input to preserve spatial dimensions
- "same" padding: (filter size - 1) / 2 keeps the output the same size as the input (stride 1, odd filter size)
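here's a quick numpy sketch (my own toy code, not from the lecture) of the sliding-filter operation and where the output dim formula comes from:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """naive 2d convolution (technically cross-correlation), no padding.
    output dim follows (input size - filter size) / stride + 1."""
    h, w = image.shape
    kh, kw = kernel.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # 5x5 input
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)  # 3x3 vertical edge detector (illustrative)
print(conv2d(image, edge_filter).shape)            # (3, 3) -> (5 - 3) / 1 + 1 = 3
```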
example (3x3 filter, stride 1):
- padding = (filter size - 1) / 2 = (3 - 1) / 2 = 1
- input: 5x5 image, add a 1-pixel border → 7x7
- filter: 3x3, so output size = (7 - 3) / 1 + 1 = 5x5
- the output size matches the input size, so spatial dimensions are preserved
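a tiny helper (my own naming) to sanity-check that arithmetic; it folds padding into the general formula (input size - filter size + 2*padding) / stride + 1:

```python
def output_size(input_size, filter_size, stride=1, padding=0):
    """spatial output dim of a conv layer, padding included"""
    return (input_size - filter_size + 2 * padding) // stride + 1

pad = (3 - 1) // 2                               # "same" padding for a 3x3 filter
print(output_size(5, 3, stride=1, padding=pad))  # 5 -> matches the input size
print(output_size(5, 3, stride=1, padding=0))    # 3 -> shrinks without padding
```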
what do 1x1 filters do?
- dimensionality reduction: reduce the number of channels while preserving spatial dimensions
- computational efficiency: fewer channels makes the following convolutions cheaper
- reduce parameters
- control network depth: add layers and nonlinearity without changing spatial size
- GoogLeNet/Inception networks popularized this
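a quick pytorch sketch (framework and the 256/64 channel sizes are my own choices, purely illustrative) of a 1x1 conv squeezing channels and cutting parameters, the same bottleneck trick the inception modules use:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)              # (batch, channels, height, width)

reduce = nn.Conv2d(256, 64, kernel_size=1)   # 1x1 conv: 256 -> 64 channels
print(reduce(x).shape)                       # torch.Size([1, 64, 28, 28]) -> spatial dims preserved

# parameter savings: squeeze channels with a 1x1 conv before the expensive 3x3 conv
direct = nn.Conv2d(256, 256, kernel_size=3, padding=1)
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),
    nn.Conv2d(64, 256, kernel_size=3, padding=1),
)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(direct), count(bottleneck))      # ~590k vs ~164k parameters
```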