[Robot-learning] Collaboration with Brown

Abel, David david_abel at brown.edu
Tue Aug 14 11:57:26 EDT 2018


Hi all,

> I received the following comment from Reid Simmons with a request to revise
> and resubmit the annual report:
>
> "Unclear whether all the reported work was done at UMBC or some was done
> at Brown. If some of the work was done by collaborators, please indicate
> this; if the work was done fully at UMBC, please indicate what types of
> collaboration were done in the past year (and what are expected in the
> coming year)."
>
> The submitted version is attached.  Can some combination of John, Michael,
> Stefanie, Nakul, and David provide some input about collaborations?  I do
> know that Reid has expressed some concern in the past about how/whether the
> two project sites are coordinating, so it would be good to emphasize the
> ways in which the two sites' work is coordinated and complementary.
>

John and I have been collaborating on a project since around March. I
don't see the project described in the attached AMDP writeup, so here's a
brief description.

At a high level, we're investigating whether we can improve how option
models are computed, both in terms of (1) learning options and their
models, and (2) using options to plan (as part of a hierarchy or on their
own). The main insight we're exploiting to improve over current option
models is that the option model shouldn't depend on the exact number of
lower-level actions taken in an execution of the option. Instead, we offer
a variant of options that retains only a *rough estimate* of the number of
lower-level actions taken, on a per-state basis. This estimate is most
critical for determining how much to discount the remainder of a plan
after the option executes.
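
To make the contrast concrete (in my own notation, which may differ from
the writeup's): the standard multi-time option model of Sutton, Precup,
and Singh (1999) folds the exact duration $k$ into the transition model,

   $P_\gamma(s' \mid s, o) = \sum_{k=1}^{\infty} \gamma^{k} \Pr(s', k \mid s, o),$

whereas the variant keeps a single rough duration estimate
$\hat{d}(s, o) \approx \mathbb{E}[k \mid s, o]$ and discounts with it
directly:

   $\hat{P}_\gamma(s' \mid s, o) = \gamma^{\hat{d}(s, o)} \Pr(s' \mid s, o).$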

So far we've shown:

   1. A sample bound for learning options using this new model. (How many
   samples $(s, o, s')$ are needed to determine *roughly* how many
   lower-level actions will be taken when $o$ is executed in $s$? A sketch
   of this estimation step follows the list below.)
   2. A bound on the value function when using the new, learned option
   model, compared to using the usual option models.
   3. John has run some very interesting experiments across a variety of
   Taxi instances that showcase the potential of the method. In short: we
   can learn faster, and with lower variance, if we use the new option model.
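
Here is the sketch promised in (1.): a minimal Python illustration, under
my own assumptions, of the estimation step and a planning backup. All
names are hypothetical; this is not the actual code or the interfaces
from our writeup. The idea is to record, for each execution of an option,
how many lower-level actions it took, keep the empirical mean as the
rough per-state duration estimate, and raise gamma to that estimate when
backing up values through the option.

    from collections import defaultdict

    class RoughDurationOptionModel:
        """Sketch of an option model that keeps only a rough, per-state
        estimate of the option's duration (number of lower-level actions)
        and uses it for discounting. Hypothetical names throughout."""

        def __init__(self, gamma=0.95):
            self.gamma = gamma
            self.step_totals = defaultdict(float)   # (s, o) -> total steps seen
            self.exec_counts = defaultdict(int)     # (s, o) -> executions seen
            self.outcome_counts = defaultdict(int)  # (s, o, s') -> times o ended in s'

        def record(self, s, o, s_next, num_steps):
            # One sample: executing o from s terminated in s_next after
            # num_steps lower-level actions.
            self.step_totals[(s, o)] += num_steps
            self.exec_counts[(s, o)] += 1
            self.outcome_counts[(s, o, s_next)] += 1

        def duration_estimate(self, s, o):
            # Rough per-state duration: the empirical mean over executions.
            return self.step_totals[(s, o)] / max(self.exec_counts[(s, o)], 1)

        def transition_prob(self, s, o, s_next):
            return self.outcome_counts[(s, o, s_next)] / max(self.exec_counts[(s, o)], 1)

        def backup(self, s, o, option_reward, value, states):
            # Discount everything after the option by gamma raised to the
            # *estimated* duration, rather than tracking the exact number
            # of lower-level actions in every execution.
            discount = self.gamma ** self.duration_estimate(s, o)
            expected = sum(self.transition_prob(s, o, sp) * value[sp]
                           for sp in states)
            return option_reward(s, o) + discount * expected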

We have several ongoing subtasks:

   - Use the new option model to inform the option reward model, too (see
   the note after this list).
   - Prove results analogous to (1) and (2) above for the new option
   reward model.
   - Target option models with low variance.
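
For reference on the first subtask: the standard option reward model
folds exact durations into the discounting in the same way,

   $R(s, o) = \mathbb{E}\left[ r_1 + \gamma r_2 + \cdots + \gamma^{k-1} r_k \mid s, o \right],$

so the analogous change would presumably be to discount with the rough
duration estimate there as well; the precise form is exactly what the
subtask is meant to pin down.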

Our writeup is here
<https://www.sharelatex.com/project/5ab3e0446f167e439582055a>. Hope this
helps! Let me know if there's any other information that would be useful.

Best,
-Dave


>
> Michael
>
> --
> Dr. Marie desJardins
> Associate Dean for Academic Affairs
> College of Engineering and Information Technology
> University of Maryland, Baltimore County
> 1000 Hilltop Circle
> Baltimore MD 21250
>
> Email: mariedj at umbc.edu
> Voice: 410-455-3967
> Fax: 410-455-3559
>