Abstract
Detecting differences in generalization ability between models for visual question answering tasks has proven to be surprisingly difficult. We propose a new statistic, asymptotic sample complexity, for model comparison, and construct a synthetic data distribution to compare a strong baseline CNN-LSTM model to a structured neural network with powerful inductive biases. Our metric identifies a clear improvement in the structured model’s generalization ability relative to the baseline despite their similarity under existing metrics.
Authors
Eli Bingham, Piero Molino, Paul Szerlip, Fritz Obermeyer, Noah D. Goodman
Conference
ViGIL @ NeurIPS 2017
Full Paper
‘Characterizing how Visual Question Answering models scale with the world’ (PDF)
Uber AI