Previous post: A simple introduction to Meta-Reinforcement Learning

In this post, we introduce our first Meta-RL algorithm: MAML (Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks). With MAML, you can train agents that quickly adapt in almost any dense-reward environment.

## From fine-tuning to MAML

### Transfer Learning and Multi-Task Learning

The first natural way to learn a new task is to use transfer learning via fine-tuning (e.g. fine-tuning a ResNet trained on ImageNet). However, these methods still require a large amount of data and are not suitable for fast few-shot adaptation. Moreover, in meta-learning we have a distribution of meta-training tasks instead of a single task. Therefore, another method might be to do multi-task learning, i.e. train an optimal policy over all these tasks before doing a fine-tuning adaptation. Hence, the policy should optimise the objective

$$\max_\theta \; \mathbb{E}_{\mathcal{T}_i \sim p(\mathcal{T})}\big[J_{\mathcal{T}_i}(\theta)\big],$$

where $J_{\mathcal{T}_i}(\theta)$ is the expected return of the policy $\pi_\theta$ on task $\mathcal{T}_i$.

Again, a major flaw of this method is that the initial parameter θ is optimised to maximise the average return over all tasks, but this does not guarantee any fast adaptation. Can we directly optimise the initialisation parameter to guarantee good adaptation? In MAML, the parameters of the model are explicitly trained to provide high performance after fine-tuning via gradient descent. Let's see how we can do that!

## MAML Algorithm

### Meta-testing goal

Before explaining how to train MAML (meta-training), let's define what we would expect at meta-test time. Assuming we have found a good initialisation parameter θ from which we can perform efficient one-shot adaptation, then, given a new task, the new parameter θ', obtained by one gradient step on the task objective (θ' = θ + α∇θJ(θ) when maximising the return), should achieve a good performance on the new task.

The figure below illustrates how MAML should work at meta-test time: for every task i = 1, 2, 3, the parameter θ should reach a near-optimal parameter θᵢ*. We are looking for a pretrained parameter that can reach near-optimal parameters for every task in one (or a few) gradient step(s).

[Figure: one-step adaptation from the shared initialisation θ towards the task optima θ₁*, θ₂*, θ₃*.]

### Meta-training

The meta-training algorithm is divided into two parts:

- First, for a given set of tasks, we sample multiple trajectories using θ and update the parameters using one (or multiple) gradient step(s) of the policy gradient objective.
- Second, for the same tasks, we sample multiple trajectories from the updated parameters θ' and backpropagate to θ the gradient of the policy objective.

At meta-testing, we apply this adaptation step to learn a near-optimal policy on the new task.
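Putting the two parts together: the outer update maximises the return obtained *after* the inner adaptation step. In the notation above (following the original MAML paper, written here for return maximisation rather than loss minimisation), the meta-training objective is

$$\max_\theta \; \mathbb{E}_{\mathcal{T}_i \sim p(\mathcal{T})}\big[J_{\mathcal{T}_i}(\theta'_i)\big], \qquad \theta'_i = \theta + \alpha\,\nabla_\theta J_{\mathcal{T}_i}(\theta),$$

where α is the inner-loop step size. Differentiating through θ'ᵢ is what makes the initialisation itself adaptation-aware, rather than merely good on average.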
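To make the two-part structure concrete, here is a minimal sketch of a MAML training loop. The post's setting is RL with policy gradients, but to keep the example self-contained and runnable without an environment, it uses the sinusoid-regression toy problem from the original MAML paper, with mean squared error standing in for the (negative) policy objective. All names and hyperparameters below are illustrative assumptions, not the post's actual code.

```python
import math
import torch

def sample_task():
    """A 'task' is a sine wave with a random amplitude and phase."""
    amp = torch.empty(1).uniform_(0.1, 5.0)
    phase = torch.empty(1).uniform_(0.0, math.pi)

    def sample_batch(k=10):
        x = torch.empty(k, 1).uniform_(-5.0, 5.0)
        return x, amp * torch.sin(x + phase)

    return sample_batch

def init_params():
    """Weights of a small MLP, kept as a plain list of tensors so the
    network can also be evaluated at the adapted parameters theta'."""
    def p(*shape):
        return (0.1 * torch.randn(*shape)).requires_grad_()
    return [p(1, 40), p(40), p(40, 40), p(40), p(40, 1), p(1)]

def forward(params, x):
    w1, b1, w2, b2, w3, b3 = params
    h = torch.relu(x @ w1 + b1)
    h = torch.relu(h @ w2 + b2)
    return h @ w3 + b3

def task_loss(params, batch):
    x, y = batch
    return ((forward(params, x) - y) ** 2).mean()

def adapt(params, batch, alpha=0.01):
    """Part 1 (inner loop): one gradient step from theta to theta'.
    create_graph=True keeps this step differentiable w.r.t. theta."""
    grads = torch.autograd.grad(task_loss(params, batch), params,
                                create_graph=True)
    return [p - alpha * g for p, g in zip(params, grads)]

params = init_params()
meta_opt = torch.optim.Adam(params, lr=1e-3)

for _ in range(2000):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(4):                                   # meta-batch of tasks
        task = sample_task()
        theta_prime = adapt(params, task())              # part 1: adapt from theta
        meta_loss = meta_loss + task_loss(theta_prime, task())  # part 2: fresh data at theta'
    meta_loss.backward()   # backpropagate through the inner step to theta
    meta_opt.step()
```

At meta-test time, adaptation then reduces to a single call to `adapt(params, task())` on the new task, which mirrors the one-step behaviour described in the meta-testing goal above.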