# When MAML Can Adapt Fast and How to Assist When It Cannot

@inproceedings{Arnold2021WhenMC, title={When MAML Can Adapt Fast and How to Assist When It Cannot}, author={S{\'e}bastien M. R. Arnold and Shariq Iqbal and Fei Sha}, booktitle={AISTATS}, year={2021} }

Model-Agnostic Meta-Learning (MAML) and its variants have achieved success in meta-learning tasks on many datasets and settings. On the other hand, we are only beginning to understand and analyze how they are able to adapt fast to new tasks. For example, one popular hypothesis is that the algorithms learn good representations for transfer, as in multi-task learning. In this work, we contribute by providing a series of empirical and theoretical studies, and discover several interesting yet…
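The fast-adaptation mechanism the abstract studies can be made concrete with a minimal sketch. This is not the paper's code: it runs exact MAML on hypothetical scalar quadratic task losses L_t(x) = (x − t)², chosen so the inner-loop derivative and the second-order term are available in closed form.

```python
def maml_step(theta, tasks, alpha=0.1, beta=0.01):
    """One exact MAML meta-update on toy quadratic task losses L_t(x) = (x - t)^2."""
    meta_grad = 0.0
    for t in tasks:
        # inner loop: one gradient step on the task loss
        adapted = theta - alpha * 2.0 * (theta - t)
        # outer loop: differentiate the post-adaptation loss through the
        # inner step (d adapted / d theta = 1 - 2*alpha is the second-order term)
        meta_grad += 2.0 * (adapted - t) * (1.0 - 2.0 * alpha)
    return theta - beta * meta_grad / len(tasks)

theta = 0.0
tasks = [1.0, -1.0, 3.0]
for _ in range(2000):
    theta = maml_step(theta, tasks)
# theta converges to the meta-optimal initialization (the task mean, 1.0)
```

For these losses the post-adaptation loss is (1 − 2α)²(θ − t)², so the meta-optimum is the task mean; the point of the sketch is only that the outer gradient flows through the inner update.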

#### 9 Citations

learn2learn: A Library for Meta-Learning Research

- Computer Science, Mathematics
- ArXiv
- 2020

Meta-learning researchers face two fundamental issues in their empirical work: prototyping and reproducibility. Researchers are prone to make mistakes when prototyping new algorithms and tasks…

A Channel Coding Benchmark for Meta-Learning

- Computer Science, Mathematics
- ArXiv
- 2021

This work proposes the channel coding problem as a benchmark for meta-learning and uses it to study several aspects of meta-learning, including the impact of task-distribution breadth and shift, both of which can be controlled in the coding problem.

A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning

- Computer Science
- ICML
- 2021

This work argues that the train-validation split encourages the learned representation to be low-rank without compromising expressivity, whereas the non-splitting variant encourages high-rank representations.

Co-Transport for Class-Incremental Learning

- Computer Science
- ACM Multimedia
- 2021

Co-transport for class-incremental learning (COIL), which learns to relate incremental tasks through class-wise semantic relationships, is proposed; it adapts efficiently to new tasks and stably resists forgetting.

How Important is the Train-Validation Split in Meta-Learning?

- Computer Science, Mathematics
- ICML
- 2021

A detailed theoretical study of whether and when the train-validation split is helpful on the linear centroid meta-learning problem, in the asymptotic setting where the number of tasks goes to infinity; the results highlight that data splitting may not always be preferable, especially when the data is realizable by the model.

Offline Meta-Reinforcement Learning with Advantage Weighting

- Computer Science, Mathematics
- ICML
- 2021

This paper introduces the offline meta-reinforcement learning (offline meta-RL) problem setting and proposes MACAW, an optimization-based meta-learning algorithm that performs well in this setting by using simple supervised regression objectives for both the inner and outer loops of meta-training.

Pruning Meta-Trained Networks for On-Device Adaptation

- Computer Science
- CIKM
- 2021

Adaptation-aware network pruning (ANP) is proposed: a novel pruning scheme that works with existing meta-learning methods to produce a compact network capable of fast adaptation, using a weight-importance metric based on the sensitivity of the meta-objective rather than the conventional loss function.

Modular Meta-Learning with Shrinkage

- Computer Science, Mathematics
- NeurIPS
- 2020

This work develops general techniques based on Bayesian shrinkage to automatically discover and learn both task-specific and general reusable modules and demonstrates that this method outperforms existing meta-learning approaches in domains like few-shot text-to-speech that have little task data and long adaptation horizons.

#### References

Showing 1–10 of 58 references

Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML

- Computer Science, Mathematics
- ICLR
- 2020

The ANIL (Almost No Inner Loop) algorithm is proposed: a simplification of MAML in which the inner loop is removed for all but the (task-specific) head of a MAML-trained network. Performance on the test tasks is entirely determined by the quality of the learned features, so one can even remove the head of the network (the NIL algorithm).
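The ANIL decomposition described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: a fixed matrix stands in for a frozen meta-trained body, and only the task head is adapted on a hypothetical least-squares task.

```python
import numpy as np

def anil_adapt(features, head, X, y, alpha=0.1, steps=200):
    """ANIL-style inner loop on least squares: the feature extractor
    (`features`, a frozen matrix here) receives no inner-loop updates;
    only the task-specific head is adapted by gradient descent."""
    Z = X @ features                              # shared, frozen representation
    w = head.copy()
    for _ in range(steps):
        grad = 2.0 * Z.T @ (Z @ w - y) / len(y)   # mean-squared-error gradient
        w = w - alpha * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, 2.0])                  # a hypothetical regression task
w = anil_adapt(np.eye(2), np.zeros(2), X, y)  # identity features for clarity
```

With good (here, identity) features, adapting the head alone recovers the task solution, which is the intuition behind the feature-reuse hypothesis.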

On First-Order Meta-Learning Algorithms

- Computer Science
- ArXiv
- 2018

A family of algorithms for learning a parameter initialization that can be fine-tuned quickly on a new task, using only first-order derivatives for the meta-learning updates, including Reptile, which works by repeatedly sampling a task, training on it, and moving the initialization towards the trained weights on that task.
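The Reptile update described above fits in a few lines. A toy sketch on scalar quadratic task losses (not the paper's code; it cycles deterministically through tasks where the original samples them at random):

```python
def reptile(theta, tasks, inner_steps=5, alpha=0.1, eps=0.1, meta_iters=300):
    """First-order Reptile on toy quadratic task losses L_t(x) = (x - t)^2."""
    for i in range(meta_iters):
        t = tasks[i % len(tasks)]             # pick a task (cycled for determinism)
        phi = theta
        for _ in range(inner_steps):          # train on it with plain SGD
            phi = phi - alpha * 2.0 * (phi - t)
        theta = theta + eps * (phi - theta)   # move the init toward the trained weights
    return theta

theta = reptile(0.0, [1.0, -1.0, 3.0])
```

The update uses only first-order information: no gradient is ever taken through the inner loop, yet the initialization settles near the tasks' shared solution (here, close to the task mean).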

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

- Computer Science
- ICML
- 2017

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning…

Learning to Learn with Gradients

- Computer Science
- 2018

This thesis discusses gradient-based algorithms for learning to learn, or meta-learning, which aim to endow machines with flexibility akin to that of humans, and shows how these methods can be extended to applications in motor control by combining elements of meta-learning with techniques for deep model-based reinforcement learning, imitation learning, and inverse reinforcement learning.

Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm

- Computer Science, Mathematics
- ICLR
- 2018

This paper finds that deep representations combined with standard gradient descent have sufficient capacity to approximate any learning algorithm, and that gradient-based meta-learning consistently leads to learning strategies that generalize more widely than those represented by recurrent models.

Meta-learning with differentiable closed-form solvers

- Computer Science, Mathematics
- ICLR
- 2019

The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data.
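The closed-form inner solver this summary refers to can be sketched directly. Below is a minimal hypothetical stand-in: the ridge-regression solution used as a base-learner (the paper embeds this inside a meta-trained feature extractor, which is omitted here).

```python
import numpy as np

def ridge_adapt(X, y, lam=1.0):
    """Differentiable closed-form 'inner loop': the ridge solution
    w = (X^T X + lam * I)^{-1} X^T y. Every operation is a standard
    linear-algebra primitive, so gradients can flow through it."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
w = ridge_adapt(X, X @ w_true, lam=1e-6)   # near-exact recovery on clean data
```

Because adaptation is a single linear solve rather than an unrolled gradient loop, the outer meta-objective can backpropagate through it cheaply and exactly.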

Alpha MAML: Adaptive Model-Agnostic Meta-Learning

- Computer Science, Mathematics
- ArXiv
- 2019

An extension to MAML is introduced that incorporates an online hyperparameter-adaptation scheme, eliminating the need to tune the meta-learning and learning rates. Results on the Omniglot database demonstrate a substantial reduction in the need to tune MAML training hyperparameters and improved training stability, with less sensitivity to hyperparameter choice.

Learned Optimizers that Scale and Generalize

- Computer Science, Mathematics
- ICML
- 2017

This work introduces a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead, by introducing a novel hierarchical RNN architecture with minimal per-parameter overhead.

Meta-SGD: Learning to Learn Quickly for Few Shot Learning

- Computer Science
- ArXiv
- 2017

Meta-SGD, an SGD-like, easily trainable meta-learner that can initialize and adapt any differentiable learner in just one step, shows highly competitive performance for few-shot learning on regression, classification, and reinforcement learning.
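The one-step adaptation claim can be illustrated with a scalar sketch (not the paper's code): on quadratic task losses, the meta-learned step size converges to the value that makes a single inner step land exactly on each task's optimum.

```python
def meta_sgd_step(theta, alpha, tasks, beta=0.01):
    """One Meta-SGD meta-update on toy quadratic losses L_t(x) = (x - t)^2:
    both the initialization theta and the learning rate alpha are meta-learned."""
    g_theta, g_alpha = 0.0, 0.0
    for t in tasks:
        inner_grad = 2.0 * (theta - t)
        adapted = theta - alpha * inner_grad           # one learned inner step
        outer_grad = 2.0 * (adapted - t)
        g_theta += outer_grad * (1.0 - 2.0 * alpha)    # d adapted / d theta
        g_alpha += outer_grad * (-inner_grad)          # d adapted / d alpha
    n = len(tasks)
    return theta - beta * g_theta / n, alpha - beta * g_alpha / n

theta, alpha = 0.0, 0.1
for _ in range(2000):
    theta, alpha = meta_sgd_step(theta, alpha, [1.0, -1.0, 3.0])
# alpha approaches 0.5, at which one inner step solves any quadratic task exactly
```

Here alpha is a single scalar; the actual method learns a per-parameter vector of step sizes, but the mechanism — differentiating the post-adaptation loss with respect to the learning rate — is the same.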

Meta Learning via Learned Loss

- Computer Science, Mathematics
- 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021

This paper presents a meta-learning method for learning parametric loss functions that can generalize across different tasks and model architectures, and develops a pipeline for “meta-training” such loss functions, targeted at maximizing the performance of the model trained under them.