RL with policy advice. Azar et al., ECML 2013.
- Reduction from RL to a bandit problem.
- Regret bounds: regret is the cumulative gap between the return of the optimal policy and the return of the policy actually followed.
- Regret scales with the number of tasks, as \sqrt{M}, rather than with the size of the state and action spaces.
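To make the reduction concrete, here is a minimal sketch that treats each candidate policy as a bandit arm and selects among them with UCB1, tracking cumulative regret against the best policy. This is not the algorithm of Azar et al.; it is a generic illustration. The function name `ucb_over_policies`, the Bernoulli episode returns, and the example mean returns are all assumptions made for illustration.

```python
import math
import random


def ucb_over_policies(policy_returns, num_episodes, seed=0):
    """Pick among fixed policies with UCB1, one policy per episode.

    policy_returns: hypothetical mean episode returns in [0, 1], one per
    policy. They stand in for actually running each policy in an MDP.
    Returns (pull counts per policy, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(policy_returns)
    counts = [0] * k      # episodes played with each policy
    totals = [0.0] * k    # summed observed returns per policy
    best = max(policy_returns)
    regret = 0.0

    for t in range(1, num_episodes + 1):
        if t <= k:
            arm = t - 1   # play every policy once to initialise
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Assumed Bernoulli episode return with the policy's mean.
        reward = 1.0 if rng.random() < policy_returns[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        # Pseudo-regret: gap between the best mean and the chosen mean.
        regret += best - policy_returns[arm]

    return counts, regret


counts, regret = ucb_over_policies([0.2, 0.5, 0.9], num_episodes=2000)
print(counts, regret)
```

Note that the state and action spaces never appear in the selection rule: the learner only sees one return per episode per policy, which is why bounds for this kind of reduction can avoid depending on their size.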
Brunskill and Li, UAI 2013. Reduces RL to an (active) classification problem.
Provably speeding multitask RL. Guo and Brunskill, AAAI 2015. K tasks sampled from M tasks. Evaluation goal: provably improve performance. Approach: quickly cluster, then share.
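The "cluster, then share" idea can be sketched as follows. This is a simplified illustration, not the procedure from Guo and Brunskill: it assumes each task is summarised by a short vector of estimated parameters, groups tasks whose estimates agree within a threshold, and then pools each group's estimates. The function name `cluster_then_share`, the max-norm distance, and the threshold are all hypothetical choices.

```python
def cluster_then_share(task_estimates, threshold):
    """Greedily cluster tasks by estimate similarity, then pool estimates.

    task_estimates: list of equal-length lists of floats, one per task
    (hypothetical per-task parameter estimates).
    threshold: max-norm distance below which two tasks count as the same.
    Returns (clusters as lists of task indices, shared estimate per task).
    """
    clusters = []
    for idx, estimate in enumerate(task_estimates):
        for cluster in clusters:
            # Compare against the first task placed in the cluster.
            representative = task_estimates[cluster[0]]
            distance = max(abs(a - b) for a, b in zip(estimate, representative))
            if distance <= threshold:
                cluster.append(idx)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([idx])

    shared = {}
    for cluster in clusters:
        dim = len(task_estimates[cluster[0]])
        # Sharing step: every task in a cluster uses the cluster mean.
        mean = [
            sum(task_estimates[i][d] for i in cluster) / len(cluster)
            for d in range(dim)
        ]
        for i in cluster:
            shared[i] = mean
    return clusters, shared


estimates = [[0.10, 0.90], [0.12, 0.88], [0.80, 0.20], [0.79, 0.22]]
clusters, shared = cluster_then_share(estimates, threshold=0.05)
print(clusters)  # [[0, 1], [2, 3]]
```

Pooling is only helpful if the clustering is correct, which is why the notes stress clustering quickly and with guarantees before sharing.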
Killian et al., NIPS 2017. Bayesian NNs for modeling MDP dynamics.
Smooth latent policy space for cross-domain transfer. Ammar et al., IJCAI 2015. Limited theoretical results (some nice convergence results).
Model-agnostic meta-learning. Finn et al., ICML 2017.
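As a rough illustration of the MAML update structure (an inner gradient step per task, then an outer step that differentiates through it), here is a toy sketch on scalar quadratic losses. The setup is an assumption chosen so the gradients can be written by hand: each task has loss 0.5 * (theta - c)^2 for a hypothetical target c, and the names `maml_scalar`, `alpha`, and `beta` are illustrative, not taken from the paper's code.

```python
def maml_scalar(task_targets, alpha=0.1, beta=0.5, steps=200, theta=0.0):
    """Toy MAML on tasks with loss_c(theta) = 0.5 * (theta - c) ** 2.

    alpha: inner-loop (adaptation) step size.
    beta: outer-loop (meta) step size.
    Returns the meta-learned initialisation theta.
    """
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_targets:
            # Inner step: one gradient step on this task's loss.
            adapted = theta - alpha * (theta - c)
            # Outer gradient of the post-adaptation loss w.r.t. theta.
            # Chain rule: d(adapted)/d(theta) = 1 - alpha.
            meta_grad += (1.0 - alpha) * (adapted - c)
        theta -= beta * meta_grad / len(task_targets)
    return theta


theta = maml_scalar([1.0, 3.0])
print(theta)
```

In this toy case the meta-learned initialisation settles at the mean of the task targets, the point from which one inner step reduces the average post-adaptation loss the most. Real uses replace the hand-written derivative with automatic differentiation through the inner update.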