Deep Fisher Kernels – End to end learning of the Fisher Kernel GMM parameters (Conference Paper)

Author(s): Sydorov, Vladyslav; Sakurada, Mayu; Lampert, Christoph
Title: Deep Fisher Kernels – End to end learning of the Fisher Kernel GMM parameters
Affiliation: IST Austria
Abstract: Fisher Kernels and Deep Learning were two developments with significant impact on large-scale object categorization in recent years. Both approaches were shown to achieve state-of-the-art results on large-scale object categorization datasets, such as ImageNet. Conceptually, however, they are perceived as very different, and it is not uncommon for heated debates to spring up when advocates of the two paradigms meet at conferences or workshops. In this work, we emphasize the similarities between the two architectures rather than their differences and we argue that such a unified view allows us to transfer ideas from one domain to the other. As a concrete example, we introduce a method for learning a support vector machine classifier with a Fisher kernel at the same time as a task-specific data representation. We reinterpret the setting as a multi-layer feed-forward network. Its final layer is the classifier, parameterized by a weight vector, and the two previous layers compute Fisher vectors, parameterized by the coefficients of a Gaussian mixture model. We introduce a gradient-descent-based learning algorithm that, in contrast to other feature learning techniques, is not just derived from intuition or biological analogy, but has a theoretical justification in the framework of statistical learning theory. Our experiments show that the new training procedure leads to significant improvements in classification accuracy while preserving the modularity and geometric interpretability of a support vector machine setup.
Keywords: deep learning; Fisher kernel; Gaussian mixture models; image classification; support vector machines
Conference Title: CVPR: Computer Vision and Pattern Recognition
Conference Dates: June 24-27, 2014
Conference Location: Columbus, OH, USA
Publisher: IEEE
Date Published: 2014-09-24
Start Page: 1402
End Page: 1409
DOI: 10.1109/CVPR.2014.182
Notes: This work was funded in part by the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 308036: "Life-long learning of visual scene understanding" (L3ViSU).
Open access: no
IST Austria Authors
  1. Christoph Lampert