Abstract: Structural pruning of neural network parameters reduces computational, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a ...
Abstract: With ongoing advancements in natural language processing (NLP) and deep learning methods, the demand for computational and memory resources has increased considerably, which signifies the ...