<div dir="ltr">Hi all,<div><br></div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><font face="Lucida Grande">I received the following comment from Reid

      Simmons with a request to revise and resubmit the annual report:<br>

      <br>

      "Unclear whether all the reported work was done at UMBC or some

      was done at Brown. If some of the work was done by collaborators,

      please indicate this; if the work was done fully at UMBC, please

      indicate what types of collaboration were done in the past year

      (and what are expected in the coming year)."<br>

      <br>

      The submitted version is attached.  Can some combination of John,

      Michael, Stefanie, Nakul, and David provide some input about

      collaborations?  I do know that Reid has expressed some concern in

      the past about how/whether the two project sites are coordinating,

      so emphasizing the ways in which our work is coordinating and

      complementing each other would be good to add.<br></font></div></blockquote><div><br></div><div>John and I have been collaborating on a project since around March. I don't see the project described in the attached AMDP writeup, so here's a brief description.</div><div><br></div><div>At a high level, we're investigating whether we can improve how option models are computed, both in terms of (1) learning options and their models, and (2) using options to plan (as part of a hierarchy or on their own). The main insight we're exploiting to improve over current option models is that the option model shouldn't depend on the exact number of lower-level actions taken in an execution of the option. Instead, we offer a variant of options that retains a <i>rough estimate</i> of the number of lower-level actions taken on a per-state basis. This estimate matters most when determining how much to discount future plans.</div><div><br></div><div>So far we've shown:</div><div><ol><li>A sample bound for learning options using this new model. (How many samples $(s, o, s')$ are needed to determine <i>roughly</i> how many lower-level actions will be taken when $o$ is executed in $s$?)</li><li>A bound on the value function obtained with the new, learned option model, relative to the one obtained with the usual option models.</li><li>Empirical evidence: John has run experiments on a variety of Taxi instances that showcase the method's potential. In short, we learn faster, and with lower variance, when we use the new option model.</li></ol><div>We have several ongoing subtasks:</div></div><div><ul><li>Use the new option model to inform the option reward model, too.</li><li>Prove results similar to (1) and (2) above for the new option reward model.</li><li>Target option models with low variance.</li></ul></div><div><a href="https://www.sharelatex.com/project/5ab3e0446f167e439582055a">Our writeup is here</a>. Hope this helps! 
Let me know if any other information would be useful.</div><div><br></div><div>Best,</div><div>-Dave</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><font face="Lucida Grande">

      <br>

      Michael <br><span class="HOEnZb"><font color="#888888">

      <br>

    </font></span></font><span class="HOEnZb"><font color="#888888">

    <div class="m_-2649567925692750419moz-signature">-- <br>

      Dr. Marie desJardins

      <br>

      Associate Dean for Academic Affairs

      <br>

      College of Engineering and Information Technology

      <br>

      University of Maryland, Baltimore County

      <br>

      1000 Hilltop Circle

      <br>

      Baltimore MD 21250

      <br>

      <br>

      Email: <a class="m_-2649567925692750419moz-txt-link-abbreviated" href="mailto:mariedj@umbc.edu" target="_blank">mariedj@umbc.edu</a>

      <br>

      Voice: 410-455-3967

      <br>

      Fax: 410-455-3559</div>

  </font></span></div>


<br>______________________________<wbr>_________________<br>

Robot-learning mailing list<br>

<a href="mailto:Robot-learning@cs.umbc.edu">Robot-learning@cs.umbc.edu</a><br>

<a href="https://lists.cs.umbc.edu/mailman/listinfo/robot-learning" rel="noreferrer" target="_blank">https://lists.cs.umbc.edu/<wbr>mailman/listinfo/robot-<wbr>learning</a><br>

<br></blockquote></div><br></div></div>