Crowd Counting Network with Self-attention Distillation

Authors
Yaoyao Li, Li Wang, Huailin Zhao*, Zhen Nie
School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
*Corresponding author. Email: [email protected]; www.sit.edu.cn
Corresponding Author
Huailin Zhao
Received 5 September 2019, Accepted 11 May 2020, Available Online 2 June 2020.
DOI
https://doi.org/10.2991/jrnal.k.200528.009
Keywords
Self-attention distillation; dilated convolution; crowd counting
Abstract
Context information is essential for a crowd counting network to estimate crowd numbers accurately, especially in congested scenes. However, the shallow layers of common crowd counting networks (e.g., the congested scene recognition network) do not have a large receptive field, so they cannot efficiently utilize context information from the crowd scene. To solve this problem, we propose a crowd counting network with self-attention distillation. Each input image is first sent to the visual geometry group (VGG)-16 network for feature extraction. The extracted features are then processed by the dilated convolutional part for the final crowd density estimation. Specifically, we apply a self-attention distillation strategy at different locations of the dilated convolutional part, so that global context information from the deeper layers guides the learning of the shallower layers. We compare our method with other state-of-the-art works on the UCF-QNRF dataset, and the experimental results demonstrate the superiority of our method.
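The idea of letting deeper layers guide shallower ones can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption: the attention map is built with the common activation-based recipe (sum of squared activations over channels, followed by a spatial softmax), the loss is a mean squared error between the two maps, and the names `attention_map` and `distillation_loss` are hypothetical. Both feature tensors are assumed to share the same spatial size, which holds when dilated convolutions preserve resolution.

```python
import numpy as np


def attention_map(features, p=2):
    """Collapse a (C, H, W) feature tensor into an (H, W) attention map.

    Assumed recipe (not taken from the paper): sum |activation|^p over
    channels, then apply a softmax over all spatial positions.
    """
    energy = np.sum(np.abs(features) ** p, axis=0)  # shape (H, W)
    flat = energy.reshape(-1)
    flat = flat - flat.max()  # subtract the max for numerical stability
    weights = np.exp(flat)
    weights /= weights.sum()  # spatial softmax: entries sum to 1
    return weights.reshape(energy.shape)


def distillation_loss(shallow, deep, p=2):
    """Mean squared error between the shallow and deep attention maps.

    The deep map plays the role of the target. In an autograd framework
    it would be detached so that gradients only reach the shallow layer.
    """
    a_shallow = attention_map(shallow, p)
    a_deep = attention_map(deep, p)
    return float(np.mean((a_shallow - a_deep) ** 2))


# Toy features standing in for two layers of a dilated back-end.
rng = np.random.default_rng(0)
shallow_feat = rng.standard_normal((64, 8, 8))
deep_feat = rng.standard_normal((128, 8, 8))

loss = distillation_loss(shallow_feat, deep_feat)
self_loss = distillation_loss(deep_feat, deep_feat)
```

In training, a term like this would typically be added to the density estimation loss with a small weight, so the shallower layer is encouraged to attend to the same regions as the deeper one. The weighting and the layers chosen here are illustrative only.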
Copyright
© 2020 The Authors. Published by ALife Robotics Corp. Ltd.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).