Detecting differences in generalization ability between models for visual question answering tasks has proven to be surprisingly difficult. We propose a new statistic, asymptotic sample complexity, for model comparison, and construct a synthetic data distribution to compare a strong baseline CNN-LSTM model to a structured neural network with powerful inductive biases. Our metric identifies a clear improvement in the structured model’s generalization ability relative to the baseline despite their similarity under existing metrics.
Eli Bingham, Piero Molino, Paul Szerlip, Fritz Obermeyer, Noah D. Goodman
ViGIL @ NeurIPS 2017
Characterizing how Visual Question Answering models scale with the world (PDF)