Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL
Abstract
While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, a recent trend in AutoML is to focus on neural architecture search. In this paper, we introduce Auto-PyTorch, which brings the best of these two worlds together by jointly and robustly optimizing the architecture of networks and the training hyperparameters to enable fully automated deep learning (AutoDL). Auto-PyTorch achieves state-of-the-art performance on several tabular benchmarks by combining multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs) and common baselines for tabular data. To thoroughly study our assumptions on how to design such an AutoDL system, we additionally introduce a new benchmark on learning curves for DNNs, dubbed LCBench, and run extensive ablation studies of the full Auto-PyTorch on typical AutoML benchmarks, eventually showing that Auto-PyTorch performs better than several state-of-the-art competitors on average.
Community
Proposes Auto-PyTorch, an automated deep learning (AutoDL) framework that jointly and robustly optimises the network architecture (neural architecture search, NAS) and the network's training hyperparameters. Also introduces LCBench, a benchmark of learning curves for DNNs (specifically on tabular data).

Combines BOHB (a robust optimiser for AutoDL) with automatically designed portfolios of architectures and hyperparameters. Uses ConfigSpace (from BOHB) to define the preprocessing, architecture, and training hyperparameters. There are two configuration spaces: a smaller one for quick search and a full, larger one for SOTA performance (the latter has MLPNet and ResNet network types; see Tables 1 and 2 for the full parameter list). BOHB (Bayesian Optimization and Hyperband) finds well-performing configurations over multiple budgets by fitting a kernel density estimator (KDE) as a probabilistic model to explore and exploit promising areas of the configuration space. Ensembling is inspired by auto-sklearn. Optimization is warm-started (instead of starting from scratch for every task) with a portfolio built as in PoSH-Auto-Sklearn, which iteratively and greedily adds configurations to the portfolio.

LCBench is tested on OpenML AutoML benchmarks, with cheaper, smaller datasets used for the search over the smaller configuration space; the collected portfolio shows configurations that perform well across datasets. Also analyses portfolio size and budget. Hyperparameter importance is assessed using fANOVA and LPI (local): number of layers, learning rate, and weight decay are important.

BOHB is better than BO (faster error reduction), warm-starting with portfolios helps, ensembles beat single models, and parallel workers in BOHB help. AutoGluon's stacking ensembles are better than the plain ensembles in Auto-PyTorch, although Auto-PyTorch wins when ensembles are not used. Auto-PyTorch (portfolio BOHB) also performs strongly on CIFAR-10 classification on NAS-Bench-201, comparable to GDAS (a one-shot method).

The appendix has the distribution of datasets used for portfolio construction, ablation trajectories (error vs. wall-clock time), more comparisons, and a proof of concept with image data. From Bosch, University of Freiburg, and Leibniz University (Germany).
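The greedy portfolio construction mentioned in this summary can be illustrated with a small sketch. It is not the authors' implementation: it assumes a precomputed matrix of validation errors (candidate configurations by meta-training datasets) and scores a portfolio by the mean, over datasets, of the best error any member reaches. The function name `greedy_portfolio`, the toy matrix, and the use of raw rather than normalised errors are all assumptions made for illustration.

```python
import numpy as np


def greedy_portfolio(errors, size):
    """Greedily pick `size` configurations (hypothetical helper).

    errors[i, j] is the validation error of candidate configuration i
    on meta-training dataset j (assumed to be precomputed).
    Returns the chosen row indices in the order they were added.
    """
    n_configs, n_datasets = errors.shape
    chosen = np.zeros(n_configs, dtype=bool)
    portfolio = []
    # Best error reached so far on each dataset by any portfolio member.
    best_so_far = np.full(n_datasets, np.inf)

    for _ in range(min(size, n_configs)):
        # Mean per-dataset error of the portfolio if each candidate were added.
        scores = np.minimum(errors, best_so_far).mean(axis=1)
        scores[chosen] = np.inf  # never pick the same configuration twice
        pick = int(np.argmin(scores))
        chosen[pick] = True
        portfolio.append(pick)
        best_so_far = np.minimum(best_so_far, errors[pick])
    return portfolio


# Toy data (made up): 4 candidate configurations, 3 datasets.
example_errors = np.array([
    [0.10, 0.80, 0.90],  # specialist for dataset 0
    [0.70, 0.20, 0.90],  # specialist for dataset 1
    [0.40, 0.40, 0.40],  # decent everywhere
    [0.90, 0.90, 0.05],  # specialist for dataset 2
])
print(greedy_portfolio(example_errors, size=3))  # [2, 3, 0]
```

On the toy data the all-rounder is picked first, and specialists follow only where they improve on it, which is the behaviour one would expect from a portfolio meant to warm-start search on unseen datasets.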