[Robot-learning] Collaboration with Brown

Tue Aug 21 17:17:00 EDT 2018

Not sure what you mean about "this collaboration issue". As far as I know,
the description Dave sent is the main point of direct collaboration.

Are you saying you think that we need to find more points of contact to
soothe the PM?

On Tue, Aug 21, 2018 at 10:57 AM Marie desJardins <mariedj at umbc.edu> wrote:

> Thanks, David, this is really helpful.
>
> Michael/Stefanie, can you share your thoughts on this collaboration issue?
>
> Marie
>
> On 8/14/18 11:57 AM, Abel, David wrote:
>
>> Hi all,
>>
>> I received the following comment from Reid Simmons with a request to
>> revise and resubmit the annual report:
>>
>> "Unclear whether all the reported work was done at UMBC or some was done
>> at Brown. If some of the work was done by collaborators, please indicate
>> this; if the work was done fully at UMBC, please indicate what types of
>> collaboration were done in the past year (and what are expected in the
>> coming year)."
>>
>> The submitted version is attached.  Can some combination of John,
>> Michael, Stefanie, Nakul, and David provide some input about
>> collaborations?  I do know that Reid has expressed some concern in the past
>> about how/whether the two project sites are coordinating, so emphasizing
>> the ways in which our work is coordinating and complementing each other
>> would be good to add.
>>
>
> John and I have been collaborating on a project together since around
> March. I don't see the project described in the attached AMDP writeup, so
> here's a brief description.
>
> At a high level, we're investigating whether we can improve how option
> models are computed, both in terms of (1) learning options and their
> models, and (2) using options to plan (as part of a hierarchy or on their
> own). The main insight we're exploiting to improve over current option
> models is that the option model shouldn't depend on the exact number of
> lower-level actions taken in an execution of the option. Instead, we offer
> a variant of options that retains a *rough estimate* of the number of
> lower-level actions taken on a per-state basis. This value is most critical
> in figuring out how much to discount future plans.
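>
> As a rough sketch of the contrast (the notation here is illustrative and
> assumed, not taken from the writeup): the usual multi-time option model
> folds the discount into the transition model by summing over every
> possible duration $k$,
>
>    $T_\gamma(s' \mid s, o) = \sum_{k=1}^{\infty} \gamma^k \Pr(s', k \mid s, o)$,
>
> where $\Pr(s', k \mid s, o)$ is the probability that $o$, started in $s$,
> terminates in $s'$ after exactly $k$ lower-level actions. One way to drop
> the exact dependence on $k$ would be to keep only a per-state length
> estimate $\hat{k}(s, o)$ and the end-state distribution:
>
>    $\hat{T}(s' \mid s, o) = \gamma^{\hat{k}(s, o)} \Pr(s' \mid s, o)$.
>
> Under this assumed form, $\hat{k}(s, o)$ only enters through the discount
> factor, which is why a rough estimate of it may be enough.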
>
> So far we've shown:
>
>    1. A sample bound for learning options using this new model. (How many
>    samples $(s, o, s')$ are needed to determine *roughly* how many
>    lower-level actions will be taken when $o$ is executed in $s$?)
>    2. A bound on the value function when using the new, learned, option
>    model, compared to using the usual option models.
>    3. John has conducted some really interesting experiments in a variety
>    of Taxi instances that showcase the potential of the method. In short: we
>    can learn faster, and with lower variance, if we use the new option model.
>
> We have several ongoing subtasks:
>
>    - Use the new option model to inform the option reward model, too.
>    - Prove results similar to (1) and (2) above for the new option
>    reward model.
>    - Target option models with low variance.
>
> Our writeup is here
> <https://www.sharelatex.com/project/5ab3e0446f167e439582055a>. Hope this
> helps! Let me know if there is any other information that would be useful --
>
> Best,
> -Dave
>
>
>>
>> Michael
>>
>> --
>> Dr. Marie desJardins
>> Associate Dean for Academic Affairs
>> College of Engineering and Information Technology
>> University of Maryland, Baltimore County
>> 1000 Hilltop Circle
>> Baltimore MD 21250
>>
>> Email: mariedj at umbc.edu
>> Voice: 410-455-3967
>> Fax: 410-455-3559
>>
>> _______________________________________________
>> Robot-learning mailing list
>> Robot-learning at cs.umbc.edu
>> https://lists.cs.umbc.edu/mailman/listinfo/robot-learning
>>