Pruning

$$\arg\min_{W_p} L(x; W_p) \quad \text{subject to} \quad \lVert W_p \rVert_0 < N$$

Goal is to keep the number of non-zero weights in the pruned weights W_p within a pre-determined number N
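The constraint on the number of non-zero weights can be illustrated with a small sketch. It keeps the largest-magnitude entries of a flat weight list and zeros the rest. The function names `prune_to_budget` and `l0_norm` and the toy values are hypothetical, and the sketch only enforces the constraint; it does not minimize the loss L.

```python
def prune_to_budget(weights, max_nonzero):
    """Zero all but the `max_nonzero` largest-magnitude entries of a flat list."""
    if max_nonzero <= 0:
        return [0.0] * len(weights)
    # Indices sorted by absolute value, largest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:max_nonzero])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def l0_norm(weights):
    """Number of non-zero entries, i.e. ||W||_0."""
    return sum(1 for w in weights if w != 0.0)


W = [0.9, -0.05, 0.3, -1.2, 0.01, 0.4]  # toy weights (made up for illustration)
N = 4
W_p = prune_to_budget(W, N - 1)  # strict constraint ||W_p||_0 < N
print(W_p)           # [0.9, 0.0, 0.0, -1.2, 0.0, 0.4]
print(l0_norm(W_p))  # 3
```

Passing `N - 1` reflects the strict inequality in the constraint; if the budget is meant as "at most N", pass `N` instead.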

Unstructured pruning vs coarse-grained pruning

Unstructured pruning is hard to accelerate due to its irregular sparsity pattern
However, it offers more flexibility
Coarse-grained pruning is less flexible but easy to accelerate
Weights of a convolutional layer have 4 dimensions: input channels,
output channels, kernel height, kernel width
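A rough sketch of why the two styles differ in how easily they accelerate: unstructured pruning leaves the 4-D tensor the same size with zeros scattered through it, while removing whole output channels yields a smaller dense tensor. The index order [c_out][c_in][k_h][k_w], the helper names, and the threshold are assumptions made for illustration only.

```python
import random


def make_conv_weight(c_out, c_in, k_h, k_w, seed=0):
    """Random 4-D weight as nested lists, indexed [c_out][c_in][k_h][k_w] (assumed layout)."""
    rng = random.Random(seed)
    return [[[[rng.uniform(-1.0, 1.0) for _ in range(k_w)]
              for _ in range(k_h)]
             for _ in range(c_in)]
            for _ in range(c_out)]


def shape(weight):
    """Shape of a rectangular, non-empty nested list."""
    dims = []
    while isinstance(weight, list):
        dims.append(len(weight))
        weight = weight[0]
    return tuple(dims)


def prune_fine_grained(weight, threshold):
    """Unstructured: zero individual weights with |w| < threshold; shape is unchanged."""
    return [[[[w if abs(w) >= threshold else 0.0 for w in row]
              for row in kernel]
             for kernel in out_ch]
            for out_ch in weight]


def prune_output_channels(weight, drop):
    """Coarse-grained: remove the whole output channels whose indices are in `drop`."""
    return [out_ch for i, out_ch in enumerate(weight) if i not in drop]


W = make_conv_weight(c_out=8, c_in=4, k_h=3, k_w=3)
print(shape(prune_fine_grained(W, 0.5)))         # (8, 4, 3, 3): same shape, scattered zeros
print(shape(prune_output_channels(W, {1, 6})))   # (6, 4, 3, 3): smaller dense tensor
```

The channel-pruned result can be fed to an ordinary dense convolution, whereas the fine-grained result needs sparse storage or kernels to benefit from its zeros.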

Some common pruning patterns

Fine-grained (irregular) pruning - prune individual weights
Pattern-based pruning - pruned weights follow a particular pattern
Vector-level pruning - prune a row or column
Kernel-level pruning - prune a complete kernel
Channel-level pruning - prune one or more channels
Pattern-based pruning example: N:M sparsity, in every M contiguous elements N are pruned (e.g. 2:4)
2:4 sparsity is supported by the NVIDIA Ampere GPU architecture and can deliver up to ~2x speedup
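N:M sparsity can be sketched on a flat row of weights. The sketch assumes the reading in which N out of every M contiguous elements are pruned (with 2:4, two of every four), and it assumes the pruned elements are chosen by smallest absolute value. The function name `prune_n_of_m` and the sample row are hypothetical.

```python
def prune_n_of_m(weights, n=2, m=4):
    """In every group of `m` contiguous weights, zero the `n` smallest-magnitude ones."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    pruned = list(weights)
    for start in range(0, len(weights), m):
        group = range(start, start + m)
        # Indices of the n entries with the smallest |w| in this group.
        to_zero = sorted(group, key=lambda i: abs(weights[i]))[:n]
        for i in to_zero:
            pruned[i] = 0.0
    return pruned


row = [0.1, -0.8, 0.5, 0.05, 0.7, 0.2, -0.3, 0.9]  # made-up values
print(prune_n_of_m(row))  # [0.0, -0.8, 0.5, 0.0, 0.7, 0.0, 0.0, 0.9]
```

Every group of four ends up with exactly two zeros, so the sparsity is fixed at 50% and the positions of the non-zeros are regular enough for hardware to exploit.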

Neural Network Pruning

Goal is to prune the parameters that are less important
Magnitude pruning considers weights with large absolute value to be more important and removes the other weights
Importance L1 of