<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<font face="Lucida Grande">Hi everyone,<br>
<br>
I wanted to share the initial reviews we received on our ICAPS
submission (which I've also attached). Based on the reviews, I
think the paper is unlikely to be accepted, so we are working to
see whether we can get some new results for an IJCAI submission.
We are making good progress on developing hierarchical learning
methods for AMDPs, but we need to (a) move to larger/more complex
domains, (b) develop some theoretical analysis (complexity,
correctness, convergence), and (c) work on more AMDP-specific
hierarchy learning techniques (right now we are using an
off-the-shelf method called HierGen that works well but may not
necessarily find the best hierarchy for an AMDP representation).
<br>
<br>
I'd be very interested to talk more about how this relates to the
work that's happening at Brown, and to hear any feedback/ideas you
might have about this work.<br>
<br>
Michael/Stephanie, could we maybe set up a time for the three of
us to have a teleconference? I'll be on vacation next week but
the week after that would be good. Possible times for me -- Mon
1/29 before 11:30am, between 1-2, or after 4pm; Wed 1/31 before
10am or after 2pm; Thu 2/1 between 11-1:30 or 3-4; Fri 2/2 any
time.<br>
<br>
BTW, these are the Brown students who are on this list. Please
let me know if anyone should be added or removed.<br>
<br>
<a class="moz-txt-link-abbreviated" href="mailto:carl_trimbach@brown.edu">carl_trimbach@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:christopher_grimm@brown.edu">christopher_grimm@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:david_abel@brown.edu">david_abel@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:dilip.arumugam@gmail.com">dilip.arumugam@gmail.com</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:edward_c_williams@brown.edu">edward_c_williams@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:jun_ki_lee@brown.edu">jun_ki_lee@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:kcaluru@brown.edu">kcaluru@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:lsw@brown.edu">lsw@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:lucas_lehnert@brown.edu">lucas_lehnert@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:melrose_roderick@brown.edu">melrose_roderick@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:miles_holland@brown.edu">miles_holland@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:nakul_gopalan@brown.edu">nakul_gopalan@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:oberlin@cs.brown.edu">oberlin@cs.brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:sam_saarinen@brown.edu">sam_saarinen@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:siddharth_karamcheti@brown.edu">siddharth_karamcheti@brown.edu</a><br>
<br>
Marie<br>
</font>
<div class="moz-forward-container"><br>
<br>
-------- Forwarded Message --------
<table class="moz-email-headers-table" cellspacing="0"
cellpadding="0" border="0">
<tbody>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">Subject:
</th>
<td>ICAPS 2018 review response (submission [*NUMBER*])</td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">Date: </th>
<td>Thu, 11 Jan 2018 14:59:19 +0100</td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">From: </th>
<td>ICAPS 2018 <a class="moz-txt-link-rfc2396E" href="mailto:icaps2018@easychair.org"><icaps2018@easychair.org></a></td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">To: </th>
<td>Marie desJardins <a class="moz-txt-link-rfc2396E" href="mailto:mariedj@umbc.edu"><mariedj@umbc.edu></a></td>
</tr>
</tbody>
</table>
<br>
<br>
<pre>Dear Marie,

Thank you for your submission to ICAPS 2018. The ICAPS 2018 review
response period starts now and ends on January 13.

During this time, you will have access to the current state of your
reviews and have the opportunity to submit a response. Please keep in
mind the following during this process:

* Most papers have a so-called placeholder review, which was
necessary to give the discussion leaders access to the reviewer
discussion. Some of these reviews list questions that already came
up during the discussion and which you may address in your response,
but in all cases the (usually enthusiastic) scores are meaningless and
you should ignore them. Placeholder reviews are clearly indicated as
such in the review.

* Almost all papers have three reviews. Some may have four. A small
number of papers are missing one review. We hope to get that
review completed in the next day. We apologize for this.

* The deadline for entering a response is January 13th (at 11:59pm
UTC-12, i.e., anywhere in the world).

* Responses must be submitted through EasyChair.

* Responses are limited to 1000 words in total. You can only enter
one response, not one per review.

* You will not be able to change your response after it is submitted.

* The response must focus on any factual errors in the reviews and any
questions posed by the reviewers. Try to be as concise and to the
point as possible.

* The review response period is an opportunity to react to the
reviews, but not a requirement to do so. Thus, if you feel the reviews
are accurate and the reviewers have not asked any questions, then you
do not have to respond.

* The reviews are as submitted by the PC members, without much
coordination between them. Thus, there may be inconsistencies.
Furthermore, these are not the final versions of the reviews. The
reviews can later be updated to take into account the discussions at
the program committee meeting, and we may find it necessary to solicit
other outside reviews after the review response period.

* The program committee will read your responses carefully and
take this information into account during the discussions. On the
other hand, the program committee may not directly respond to your
responses in the final versions of the reviews.

The reviews on your paper are attached to this letter. To submit your
response, log on to the EasyChair Web page for ICAPS 2018 and
select your submission from the menu.
----------------------- REVIEW 1 ---------------------
PAPER: 46
TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
Significance: 2 (modest contribution or average impact)
Soundness: 3 (correct)
Scholarship: 3 (excellent coverage of related work)
Clarity: 3 (well written)
Reproducibility: 3 (authors describe the implementation and domains in sufficient detail)
Overall evaluation: 1 (weak accept)
Reviewer's confidence: 2 (medium)
Suitable for a demo?: 1 (no)
Nominate for Best Paper Award: 1 (no)
Nominate for Best Student Paper Award (if eligible): 1 (no)
[Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
[Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
[Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
[Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
----------- Review -----------
The paper proposes a method for learning abstract Markov decision processes (AMDPs) from demonstration trajectories and model-based reinforcement learning. Experiments show that the method is more effective than the baseline.
On the positive side, a complete method for learning AMDPs is given and is shown to work on the problems used in the experiments. The proposed model-based reinforcement learning method, based on R-MAX, is also shown to outperform the baseline R-MAXQ.
On the negative side, the method for learning the hierarchy, HierGen, is taken from prior work, leaving the adaptation of R-MAX to learn with a hierarchy as the main algorithmic novelty. No convergence proof for the learning method is provided, although it is empirically shown to outperform the baseline R-MAXQ. The experiments are done on toy problems, indicating that the method is probably not ready for more demanding practical problems.
Overall, I am inclined to vote weak accept. The problem is difficult, so I think that the work does represent progress, although it is not yet compelling.
----------------------- REVIEW 2 ---------------------
PAPER: 46
TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
Significance: 2 (modest contribution or average impact)
Soundness: 3 (correct)
Scholarship: 2 (relevant literature cited but could be expanded)
Clarity: 3 (well written)
Reproducibility: 3 (authors describe the implementation and domains in sufficient detail)
Overall evaluation: -1 (weak reject)
Reviewer's confidence: 4 (expert)
Suitable for a demo?: 2 (maybe)
Nominate for Best Paper Award: 1 (no)
Nominate for Best Student Paper Award (if eligible): 1 (no)
[Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
[Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
[Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
[Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
----------- Review -----------
The authors introduce a reinforcement learning algorithm for AMDPs that learns a hierarchical structure and a set of hierarchical models. To learn the hierarchical structure, they rely on an existing algorithm called HierGen. This algorithm extracts causal structure from a set of expert trajectories in a factored state environment.
While R-AMDP outperforms R-MAXQ on the two toy problems, I think there is a lot more work to do to show that R-AMDP is a good basis for developing more general algorithms. First, it would be nice to examine the computational complexity of R-AMDP (rather than just the empirical comparison in Figure 3). Second, what if R-AMDP is just getting lucky in the two toy tasks presented? Maybe there are other problems where R-AMDP performs poorly. Further, stopping the plots at 50 or 60 trials may be misleading, since R-AMDP could be converging to a suboptimal but pretty good policy early on. It’s also not clear that R-AMDP can be scaled to huge state or action spaces. Does the hierarchical structure discovered by HierGen lend itself to transfer when the dynamics change? It would be nice to have a more rigorous analysis of R-AMDP and a longer discussion of its potential pitfalls (when should we expect it to succeed and when should it fail?). There is a hint of this in the discussion about HierGen’s inability to distinguish between correlation and causation.
While reading the abstract, I expected the contribution to be in learning the hierarchy. The authors should probably change the abstract to avoid this confusion.
----------------------- REVIEW 3 ---------------------
PAPER: 46
TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
Significance: 3 (substantial contribution or strong impact)
Soundness: 3 (correct)
Scholarship: 3 (excellent coverage of related work)
Clarity: 3 (well written)
Reproducibility: 5 (code and domains (whichever apply) are already publicly available)
Overall evaluation: 3 (strong accept)
Reviewer's confidence: 4 (expert)
Suitable for a demo?: 3 (yes)
Nominate for Best Paper Award: 1 (no)
Nominate for Best Student Paper Award (if eligible): 1 (no)
[Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
[Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
[Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
[Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
----------- Review -----------
This is only a placeholder review. Please ignore it.
----------------------- REVIEW 4 ---------------------
PAPER: 46
TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
Significance: 2 (modest contribution or average impact)
Soundness: 2 (minor inconsistencies or small fixable errors)
Scholarship: 3 (excellent coverage of related work)
Clarity: 1 (hard to follow)
Reproducibility: 2 (some details missing but still appears to be replicable with some effort)
Overall evaluation: -1 (weak reject)
Reviewer's confidence: 3 (high)
Suitable for a demo?: 2 (maybe)
Nominate for Best Paper Award: 1 (no)
Nominate for Best Student Paper Award (if eligible): 1 (no)
[Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
[Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
[Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
[Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
----------- Review -----------
The paper describes an approach for learning abstract models for hierarchies of AMDPs. These hierarchies are similar to, if not exactly the same as, those used by frameworks such as MAXQ, where each task in the hierarchy is an MDP with actions corresponding to child tasks. Prior AMDP work apparently uses hand-specified models of each task/AMDP, which are directly used for planning. This paper extends that work by learning the models of each task/AMDP. This is done using R-MAX at each task. There is no discussion of convergence guarantees of the approach. Apparently convergence must occur in a bottom-up way. Experiments are shown in two domains and with two hierarchies in one of the domains (Taxi). The approach appears to learn more efficiently than a prior approach, R-MAXQ. The exact reasons for the increased efficiency were not clear to me from the paper.
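As a rough illustration of the per-task learning loop described above, here is a minimal Python sketch of R-MAX-style model learning for a single task; the class and names are hypothetical, for illustration only, and not the paper's actual implementation:

from collections import defaultdict

class RMaxTaskModel:
    """R-MAX-style model for one task: act optimistically until each
    (state, action) pair has been tried at least m times."""

    def __init__(self, actions, m=5, r_max=1.0):
        self.actions = actions
        self.m = m                            # known-ness threshold
        self.r_max = r_max                    # optimistic reward bound
        self.counts = defaultdict(int)        # (s, a) -> visit count
        self.next_counts = defaultdict(int)   # (s, a, s') -> count
        self.reward_sum = defaultdict(float)  # (s, a) -> total reward

    def update(self, s, a, r, s_next):
        # Record one observed transition at this level of the hierarchy.
        self.counts[(s, a)] += 1
        self.next_counts[(s, a, s_next)] += 1
        self.reward_sum[(s, a)] += r

    def known(self, s, a):
        return self.counts[(s, a)] >= self.m

    def reward(self, s, a):
        # Unknown pairs get the optimistic R_max reward, which drives
        # the planner to explore them.
        if not self.known(s, a):
            return self.r_max
        return self.reward_sum[(s, a)] / self.counts[(s, a)]

    def transition_prob(self, s, a, s_next):
        if not self.known(s, a):
            # Model unknown pairs as an R_max self-loop, the standard
            # R-MAX optimism trick.
            return 1.0 if s_next == s else 0.0
        # Empirical transition probabilities once (s, a) is known.
        return self.next_counts[(s, a, s_next)] / self.counts[(s, a)]

In a hierarchy, one such model would presumably be kept per task, with a parent-level transition recorded whenever a child subtask terminates.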
The paper is well-written at a high level, but the more technical and formal descriptions could be improved quite a bit. For example, the key object, the AMDP, is only described informally (the tuple is not described in detail). Most of the paper is written quite informally. Another example is that Table 1 talks about "max planner rollouts", but I didn't see where rollouts are used anywhere in the algorithm description.
After reading the abstract and introduction, I expected that a big part of the contribution would be about actually learning the hierarchy. However, that does not seem to be the case. Rather, an off-the-shelf approach is used to learn hierarchies, which are then plugged into the proposed algorithm for learning the models of tasks. Further, this is only tried for one of the two experimental domains. The abstract and introduction should be clearer about the contributions of the paper.
Overall, I was unclear about what to take away from the paper. The main contribution is apparently Algorithm 1, which uses R-MAX to learn the models of each AMDP in a given hierarchy. Perhaps this is a novel algorithm, but it feels more like a baseline, in the sense that it is the first thing one might try given the problem setup. I may not be appreciating some type of complexity that makes this less straightforward than it appears. This baseline approach would have been more interesting if some form of convergence result were provided, similar to what was provided for R-MAXQ.
The experiments show that R-AMDP learns faster and is more computationally efficient than R-MAXQ. I was unable to get a good understanding of why this is the case, likely because I was not able to revisit the R-MAXQ algorithm, and it is not described in detail in this paper. The authors do try to explain the reasons for the performance improvement, but I was unable to follow them exactly. My best guess based on the discussion is that R-MAXQ does not try to exploit the state abstraction provided for each task by the hierarchy ("R-MAXQ must compute a model over all possible future states in a planning envelope after each action"). Is this the primary reason, or is there some other reason? Adding the ability to exploit abstractions in R-MAXQ seems straightforward, though maybe I'm missing something.
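The state-abstraction guess can be made concrete with a small sketch building on the hypothetical model above (again, illustrative names only): if each task pools experience over abstract states phi(s) rather than ground states, each (state, action) pair reaches the known-ness threshold with far fewer samples:

def make_abstract_update(model, phi):
    """Wrap a per-task R-MAX model so experience is pooled over
    abstract states phi(s) instead of ground states. Many ground
    states map to one abstract state, so each (abstract state,
    action) pair reaches the known-ness threshold m much sooner."""
    def update(s, a, r, s_next):
        model.update(phi(s), a, r, phi(s_next))
    return update

# Example (hypothetical state keys): in Taxi, a navigation subtask
# might only care about the taxi's position, ignoring the passenger
# and destination variables.
def phi(state):
    return (state["taxi_x"], state["taxi_y"])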
------------------------------------------------------
Best wishes,
Gabi Röger and Sven Koenig
ICAPS 2018 program chairs
</pre>
</div>
</body>
</html>