Children learn thousands of words in the first several years of life, inspiring many theoretical and empirical studies that seek to explain the speed of word learning. The present study revisits recent theoretical analyses of simple sampling models that assume uniformly distributed word frequencies, and instead considers a more realistic Zipfian (i.e., power-law) distribution of words and referents. Our new mathematical analysis finds that simple sampling models cannot account for word learning in feasible time when word and referent frequencies are Zipf-distributed. To salvage learning under realistic distributional assumptions, we propose an active learning model that assumes learners and/or caregivers select (or construct) the contexts from which words are sampled. Using simulations and mathematical analysis, we show that active learners who choose optimal learning situations learn hundreds of times faster than passive learners faced with randomly sampled situations. Thus, consistent with past empirical studies, we find theoretical support for the idea that statistical structure in real-world situations, possibly created by the self-directed learner or by beneficent teachers, can remedy the difficulty of learning a Zipf-distributed vocabulary.
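As a rough illustration of why Zipf-distributed frequencies slow passive sampling, the sketch below counts how many random draws are needed before every word in a small vocabulary has been seen at least once. It is not the model analyzed in this study. It assumes that each draw yields a single word, that a word counts as learned after one exposure, and that the Zipf exponent is 1. The function names (`zipf_weights`, `samples_to_see_all`, `mean_draws`) and the vocabulary size are hypothetical choices made only for this example.

```python
import random
from itertools import accumulate


def zipf_weights(n_words, exponent=1.0):
    """Unnormalized Zipfian weights: the word of rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_words + 1)]


def samples_to_see_all(weights, rng):
    """Count draws until every word has been sampled at least once.

    Assumption for illustration only: one exposure is enough to "learn" a word.
    """
    n_words = len(weights)
    cum_weights = list(accumulate(weights))
    seen = set()
    draws = 0
    while len(seen) < n_words:
        # Draw in batches to keep the pure-Python loop reasonably fast.
        batch = rng.choices(range(n_words), cum_weights=cum_weights, k=1000)
        for word in batch:
            draws += 1
            seen.add(word)
            if len(seen) == n_words:
                break
    return draws


def mean_draws(weights, n_trials, seed):
    """Average number of draws needed over several independent trials."""
    rng = random.Random(seed)
    total = sum(samples_to_see_all(weights, rng) for _ in range(n_trials))
    return total / n_trials


if __name__ == "__main__":
    n_words = 200  # hypothetical vocabulary size, far smaller than a real lexicon
    uniform = [1.0] * n_words
    zipfian = zipf_weights(n_words)
    print("uniform:", mean_draws(uniform, n_trials=20, seed=0))
    print("zipfian:", mean_draws(zipfian, n_trials=20, seed=0))
```

Under these assumptions the Zipfian learner needs more draws than the uniform learner, because the rarest words are seldom sampled. The gap grows with vocabulary size, which is the intuition behind the infeasibility result stated above.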