[Robot-learning] Collaboration with Brown

Wed Aug 29 07:22:14 EDT 2018

Marie, thanks again for keeping the ball rolling. Here's the pdf you asked
for.

On Wed, Aug 29, 2018 at 4:29 AM Marie desJardins <mariedj at umbc.edu> wrote:

> I've updated the annual report (see attached), but I'm having trouble
> accessing the link that David sent with the writeup -- I haven't used
> ShareLaTeX and I don't know what account (if any) you sent it to.  Can you
> send it as an attachment?
>
> Marie
>
> On 8/14/18 11:57 AM, Abel, David wrote:
>
>> Hi all,
>>
>> I received the following comment from Reid Simmons with a request to
>> revise and resubmit the annual report:
>>
>> "Unclear whether all the reported work was done at UMBC or some was done
>> at Brown. If some of the work was done by collaborators, please indicate
>> this; if the work was done fully at UMBC, please indicate what types of
>> collaboration were done in the past year (and what are expected in the
>> coming year)."
>>
>> The submitted version is attached.  Can some combination of John,
>> Michael, Stefanie, Nakul, and David provide some input about
>> collaborations?  I do know that Reid has expressed some concern in the past
>> about how/whether the two project sites are coordinating, so it would be
>> good to emphasize the ways in which our work at the two sites is
>> coordinated and complementary.
>>
>
> John and I have been collaborating on a project together since around
> March. I don't see the project described in the attached AMDP writeup, so
> here's a brief description.
>
> At a high level, we're investigating whether we can improve how option
> models are computed, both in terms of (1) learning options and their
> models, and (2) using options to plan (as part of a hierarchy or on their
> own). The main insight we're exploiting to improve over current option
> models is that the option model shouldn't depend on the exact number of
> lower-level actions taken in an execution of the option. Instead, we offer
> a variant of options that retains a *rough estimate* of the number of
> lower-level actions taken on a per-state basis. This value is most critical
> in figuring out how much to discount future plans.
>
> So far we've shown:
>
>    1. A sample bound for learning options using this new model. (How many
>    samples $(s, o, s')$ are needed to determine *roughly* how many
>    lower-level actions will be taken when $o$ is executed in $s$?)
>    2. A bound on the value function when using the new, learned, option
>    model, compared to using the usual option models.
>    3. John has conducted some really interesting experiments in a variety
>    of Taxi instances that showcase the potential of the method. In short: we
>    can learn faster, and with lower variance, if we use the new option model.
>
> We have several ongoing subtasks:
>
>    - Use the new option model to inform the option reward model, too.
>    - Prove results similar to (1) and (2) above for the new option
>    reward model.
>    - Target option models with low variance.
>
> Our writeup is here
> <https://www.sharelatex.com/project/5ab3e0446f167e439582055a>. Hope this
> helps! Let me know if there is any other information that would be useful --
>
> Best,
> -Dave
>
>
>>
>> Michael
>>
>> --
>> Dr. Marie desJardins
>> Associate Dean for Academic Affairs
>> College of Engineering and Information Technology
>> University of Maryland, Baltimore County
>> 1000 Hilltop Circle
>> Baltimore MD 21250
>>
>> Email: mariedj at umbc.edu
>> Voice: 410-455-3967
>> Fax: 410-455-3559
>>
>> _______________________________________________
>> Robot-learning mailing list
>> Robot-learning at cs.umbc.edu
>> https://lists.cs.umbc.edu/mailman/listinfo/robot-learning
>>
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Brainstorm__John___Dave_on_Abstraction__Hierarchies__and_Horizons.pdf
Type: application/pdf
Size: 283966 bytes
Desc: not available
URL: <https://lists.cs.umbc.edu/pipermail/robot-learning/attachments/20180829/81a4487d/attachment-0001.pdf>