[Robot-learning] Fwd: ICAPS 2018 review response (submission [*NUMBER*])

Lawson Wong lsw at brown.edu
Sun Jan 21 04:51:24 EST 2018


So have kcaluru at brown.edu and miles_holland at brown.edu

The reviews look like typical planning-community reviews -- generally
sensible requests but clearly impossible to accomplish within the page
limit. I guess it's generally hard to please planning reviewers unless
there are some theoretical results. Review 2 actually reads a little like
one that Nakul got for his paper...

I don't know if Michael and Stefanie have answered separately regarding a
meeting; it certainly sounds helpful to continue discussing (AMDP)
hierarchy learning. Both the IJCAI and RSS deadlines fall that week (1/31
and 2/1, respectively), so if possible it may be best to meet after those
deadlines, such as on Fri 2/2 -- unless the intent was to discuss before
the IJCAI deadline.

-Lawson


On Sat, Jan 20, 2018 at 6:04 AM, Littman, Michael <mlittman at cs.brown.edu>
wrote:

> christopher_grimm at brown.edu has graduated.
>
>
> On Fri, Jan 19, 2018 at 12:06 PM, Marie desJardins <mariedj at cs.umbc.edu>
> wrote:
>
>> Hi everyone,
>>
>> I wanted to share the initial reviews we received on our ICAPS submission
>> (which I've also attached).  Based on the reviews, I think the paper is
>> unlikely to be accepted, so we are working to see whether we can get some
>> new results for an IJCAI submission.  We are making good progress on
>> developing hierarchical learning methods for AMDPs but we need to (a) move
>> to larger/more complex domains, (b) develop some theoretical analysis
>> (complexity, correctness, convergence), and (c) work on more AMDP-specific
>> hierarchy learning techniques (right now we are using an off-the-shelf
>> method called HierGen that works well but may not necessarily find the best
>> hierarchy for an AMDP representation).
>>
>> I'd be very interested to talk more about how this relates to the work
>> that's happening at Brown, and to hear any feedback/ideas you might have
>> about this work.
>>
>> Michael/Stephanie, could we maybe set up a time for the three of us to
>> have a teleconference?  I'll be on vacation next week but the week after
>> that would be good.  Possible times for me -- Mon 1/29 before 11:30am,
>> between 1-2, or after 4pm; Wed 1/31 before 10am or after 2pm; Thu 2/1
>> between 11-1:30 or 3-4; Fri 2/2 any time.
>>
>> BTW, these are the Brown students who are on this list.  Please let me
>> know if anyone should be added or removed.
>>
>> carl_trimbach at brown.edu
>> christopher_grimm at brown.edu
>> david_abel at brown.edu
>> dilip.arumugam at gmail.com
>> edward_c_williams at brown.edu
>> jun_ki_lee at brown.edu
>> kcaluru at brown.edu
>> lsw at brown.edu
>> lucas_lehnert at brown.edu
>> melrose_roderick at brown.edu
>> miles_holland at brown.edu
>> nakul_gopalan at brown.edu
>> oberlin at cs.brown.edu
>> sam_saarinen at brown.edu
>> siddharth_karamcheti at brown.edu
>>
>> Marie
>>
>>
>> -------- Forwarded Message --------
>> Subject: ICAPS 2018 review response (submission [*NUMBER*])
>> Date: Thu, 11 Jan 2018 14:59:19 +0100
>> From: ICAPS 2018 <icaps2018 at easychair.org> <icaps2018 at easychair.org>
>> To: Marie desJardins <mariedj at umbc.edu> <mariedj at umbc.edu>
>>
>> Dear Marie,
>>
>> Thank you for your submission to ICAPS 2018. The ICAPS 2018 review
>> response period starts now and ends on January 13.
>>
>> During this time, you will have access to the current state of your
>> reviews and have the opportunity to submit a response.  Please keep in
>> mind the following during this process:
>>
>> * Most papers have a so-called placeholder review, which was
>>   necessary to give the discussion leaders access to the reviewer
>>   discussion. Some of these reviews list questions that already came
>>   up during the discussion and which you may address in your response, but
>>   in all cases the (usually enthusiastic) scores are meaningless and you
>>   should ignore them. Placeholder reviews are clearly indicated as such in
>>   the review.
>>
>> * Almost all papers have three reviews. Some may have four. A very
>>   small number of papers are missing one review. We hope to get that
>>   review completed within the next day. We apologize for this.
>>
>> * The deadline for entering a response is January 13th (at 11:59pm
>>   UTC-12, i.e., anywhere in the world).
>>
>> * Responses must be submitted through EasyChair.
>>
>> * Responses are limited to 1000 words in total. You can only enter
>>   one response, not one per review.
>>
>> * You will not be able to change your response after it is submitted.
>>
>> * The response must focus on any factual errors in the reviews and any
>>   questions posed by the reviewers. Try to be as concise and to the
>>   point as possible.
>>
>> * The review response period is an opportunity to react to the
>>   reviews, but not a requirement to do so. Thus, if you feel the reviews
>>   are accurate and the reviewers have not asked any questions, then you
>>   do not have to respond.
>>
>> * The reviews are as submitted by the PC members, without much
>>   coordination between them. Thus, there may be inconsistencies.
>>   Furthermore, these are not the final versions of the reviews. The
>>   reviews can later be updated to take into account the discussions at
>>   the program committee meeting, and we may find it necessary to solicit
>>   other outside reviews after the review response period.
>>
>> * The program committee will read your responses carefully and
>>   take this information into account during the discussions. On the
>>   other hand, the program committee may not directly respond to your
>>   responses in the final versions of the reviews.
>>
>> The reviews on your paper are attached to this letter. To submit your
>> response, you should log on to the EasyChair Web page for ICAPS 2018 and
>> select your submission from the menu.
>>
>> ----------------------- REVIEW 1 ---------------------
>> PAPER: 46
>> TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
>>
>> Significance: 2 (modest contribution or average impact)
>> Soundness: 3 (correct)
>> Scholarship: 3 (excellent coverage of related work)
>> Clarity: 3 (well written)
>> Reproducibility: 3 (authors describe the implementation and domains in sufficient detail)
>> Overall evaluation: 1 (weak accept)
>> Reviewer's confidence: 2 (medium)
>> Suitable for a demo?: 1 (no)
>> Nominate for Best Paper Award: 1 (no)
>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>> [Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
>> [Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
>> [Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
>> [Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>> [Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
>> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
>>
>> ----------- Review -----------
>> The paper proposes a method for learning abstract Markov decision processes (AMDPs) from demonstration trajectories and model-based reinforcement learning. Experiments show that the method is more effective than the baseline.
>>
>> On the positive side, a complete method for learning AMDPs is given and is shown to work on the problems used in the experiments. The proposed model-based reinforcement learning method based on R-MAX is also shown to outperform the baseline R-MAXQ.
>>
>> On the negative side, the method for learning the hierarchy, HierGen, is taken from prior work, leaving the adaptation of R-MAX to learn with a hierarchy as the main algorithmic novelty. No convergence proof for the learning method is provided, although it is empirically shown to outperform the baseline R-MAXQ. The experiments are done on toy problems, indicating that the method is probably not ready for more demanding practical problems.
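>>
>> As a rough illustration of the R-MAX idea referenced above (not code from the paper; the visit threshold M and all names are invented for the example), model learning for a single tabular task can be sketched in Python as:
>>
>> from collections import defaultdict
>>
>> M = 5          # visit threshold before a (state, action) pair counts as "known"
>> R_MAX = 1.0    # optimistic reward assumed for unknown pairs
>>
>> counts = defaultdict(int)                            # (s, a) -> visit count
>> next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
>> reward_sums = defaultdict(float)                     # (s, a) -> summed reward
>>
>> def record(s, a, r, s_next):
>>     # Update empirical statistics after observing one transition.
>>     counts[(s, a)] += 1
>>     next_counts[(s, a)][s_next] += 1
>>     reward_sums[(s, a)] += r
>>
>> def model(s, a):
>>     # Return (transition distribution, expected reward); optimistic if the pair is unknown.
>>     n = counts[(s, a)]
>>     if n < M:
>>         # Unknown pair: send it to a fictitious absorbing state with maximal reward,
>>         # which steers the planner toward unexplored parts of the task.
>>         return {"s_unknown": 1.0}, R_MAX
>>     dist = {sp: c / n for sp, c in next_counts[(s, a)].items()}
>>     return dist, reward_sums[(s, a)] / n
>>
>> A planner (e.g., value iteration) run on this optimistic model then either exploits the known dynamics or is drawn toward under-visited pairs.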
>>
>> Overall, I am inclined to vote weak accept. The problem is difficult, so I think that the work does represent progress, although it is not yet compelling.
>>
>> ----------------------- REVIEW 2 ---------------------
>> PAPER: 46
>> TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
>>
>> Significance: 2 (modest contribution or average impact)
>> Soundness: 3 (correct)
>> Scholarship: 2 (relevant literature cited but could be expanded)
>> Clarity: 3 (well written)
>> Reproducibility: 3 (authors describe the implementation and domains in sufficient detail)
>> Overall evaluation: -1 (weak reject)
>> Reviewer's confidence: 4 (expert)
>> Suitable for a demo?: 2 (maybe)
>> Nominate for Best Paper Award: 1 (no)
>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>> [Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
>> [Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
>> [Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
>> [Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>> [Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
>> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
>>
>> ----------- Review -----------
>> The authors introduce a reinforcement learning algorithm for AMDPs that learns a hierarchical structure and a set of hierarchical models. To learn the hierarchical structure, they rely on an existing algorithm called HierGen. This algorithm extracts causal structure from a set of expert trajectories in a factored state environment.
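>>
>> To make that input concrete (an invented illustration, not HierGen itself): the demonstrations are trajectories over a factored state, e.g. in a Taxi-like domain, and, roughly speaking, the hierarchy learner inspects which state variables change with which actions. A minimal Python sketch of such data:
>>
>> from dataclasses import dataclass
>>
>> @dataclass
>> class Step:
>>     state: dict   # factored state, e.g. {"taxi_x": 0, "taxi_y": 1, "passenger_at": "R", "dest": "G"}
>>     action: str   # primitive action chosen by the demonstrator
>>
>> # One demonstration is a list of steps; candidate subtasks and their goals are
>> # proposed from the variable/action co-changes observed across many such lists.
>> demo = [
>>     Step({"taxi_x": 0, "taxi_y": 0, "passenger_at": "R", "dest": "G"}, "north"),
>>     Step({"taxi_x": 0, "taxi_y": 1, "passenger_at": "R", "dest": "G"}, "pickup"),
>>     Step({"taxi_x": 0, "taxi_y": 1, "passenger_at": "taxi", "dest": "G"}, "east"),
>> ]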
>>
>> While R-AMDP outperforms R-MAXQ on the two toy problems, I think there is a lot more work to do to show that R-AMDP is a good basis for developing more general algorithms. First, it would be nice to examine the computational complexity of R-AMDP (rather than just the empirical comparison in Figure 3). Second, what if R-AMDP is just getting lucky on the two toy tasks presented? Maybe there are other problems where R-AMDP performs poorly. Further, stopping the plots at 50 or 60 trials may be misleading, since R-AMDP could be converging to a suboptimal but pretty good policy early on. It’s also not clear that R-AMDP can be scaled to huge state or action spaces. Does the hierarchical structure discovered by HierGen lend itself to transfer when the dynamics change? It would be nice to have a more rigorous analysis of R-AMDP and a longer discussion of its potential pitfalls (when should we expect it to succeed and when should we expect it to fail?). There is a hint of this in the discussion about HierGen’s inability to distinguish between correlation and causation.
>>
>> While reading the abstract I expected the contribution to be in learning the hierarchy. The authors should probably change the abstract to avoid this confusion.
>>
>> ----------------------- REVIEW 3 ---------------------
>> PAPER: 46
>> TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
>>
>> Significance: 3 (substantial contribution or strong impact)
>> Soundness: 3 (correct)
>> Scholarship: 3 (excellent coverage of related work)
>> Clarity: 3 (well written)
>> Reproducibility: 5 (code and domains (whichever apply) are already publicly available)
>> Overall evaluation: 3 (strong accept)
>> Reviewer's confidence: 4 (expert)
>> Suitable for a demo?: 3 (yes)
>> Nominate for Best Paper Award: 1 (no)
>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>> [Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
>> [Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
>> [Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
>> [Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>> [Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
>> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
>>
>> ----------- Review -----------
>> This is only a placeholder review. Please ignore it.
>>
>> ----------------------- REVIEW 4 ---------------------
>> PAPER: 46
>> TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
>>
>> Significance: 2 (modest contribution or average impact)
>> Soundness: 2 (minor inconsistencies or small fixable errors)
>> Scholarship: 3 (excellent coverage of related work)
>> Clarity: 1 (hard to follow)
>> Reproducibility: 2 (some details missing but still appears to be replicable with some effort)
>> Overall evaluation: -1 (weak reject)
>> Reviewer's confidence: 3 (high)
>> Suitable for a demo?: 2 (maybe)
>> Nominate for Best Paper Award: 1 (no)
>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>> [Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
>> [Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
>> [Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
>> [Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>> [Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
>> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
>>
>> ----------- Review -----------
>> The paper describes an approach for learning abstract models and hierarchies of AMDPs. These hierarchies are similar to, if not exactly the same as, those used by frameworks such as MAXQ, where each task in the hierarchy is an MDP with actions corresponding to child tasks. Prior AMDP work apparently uses hand-specified models of each task/AMDP, which are directly used for planning. This paper extends that work by learning the models of each task/AMDP. This is done using R-MAX at each task. There is no discussion of convergence guarantees for the approach. Apparently convergence must occur in a bottom-up way. Experiments are shown in two domains, and with two hierarchies in one of the domains (Taxi). The approach appears to learn more efficiently than a prior approach, R-MAXQ. The exact reasons for the increased efficiency were not entirely clear to me from the paper.
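>>
>> To fix ideas (an invented structural sketch, not the paper's code): each node in such a MAXQ/AMDP-style hierarchy is itself a small MDP whose actions are its child tasks, with its own state abstraction and, in the learned setting, its own empirical model. In Python, with all names illustrative:
>>
>> from dataclasses import dataclass, field
>> from typing import Callable, Dict, List, Tuple
>>
>> @dataclass
>> class Task:
>>     name: str
>>     children: List["Task"]              # child tasks; empty at primitive actions
>>     project: Callable[[dict], tuple]    # ground state -> this task's abstract state
>>     terminal: Callable[[tuple], bool]   # termination test on abstract states
>>     model: Dict[Tuple[tuple, str], dict] = field(default_factory=dict)  # learned per-task statistics
>>
>> # A tiny Taxi-flavored hierarchy. Bottom-up convergence means a child's model must
>> # become accurate before the parent's model over that child is meaningful.
>> navigate = Task("navigate", [], lambda s: (s["taxi_x"], s["taxi_y"]), lambda a: False)
>> get = Task("get_passenger", [navigate],
>>            lambda s: (s["taxi_x"], s["taxi_y"], s["passenger_at"]), lambda a: a[2] == "taxi")
>> put = Task("put_passenger", [navigate],
>>            lambda s: (s["taxi_x"], s["taxi_y"], s["dest"]), lambda a: False)
>> root = Task("root", [get, put],
>>             lambda s: (s["passenger_at"], s["dest"]), lambda a: a[0] == a[1])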
>>
>> The paper is well-written at a high level, but the more technical and formal descriptions could be improved quite a bit. For example, the key object, the AMDP, is only described informally (the tuple is not described in detail). Most of the paper is written quite informally. Another example is that Table 1 talks about "max planner rollouts", but I didn't see where rollouts are used anywhere in the algorithm description.
>>
>> After reading the abstract and introduction, I expected that a big part of the contribution would be about actually learning the hierarchy. However, that does not seem to be the case. Rather, an off-the-shelf approach is used to learn hierarchies and then plugged into the proposed algorithm for learning the models of tasks. Further, this is only tried for one of the two experimental domains. The abstract and introduction should be more clear about the contributions of the paper.
>>
>> Overall, I was unclear about what to learn from the paper. The main contribution is apparently Algorithm 1, which uses R-MAX to learn the models of each AMDP in a given hierarchy. Perhaps this is a novel algorithm, but it feels like more of a baseline in the sense that it is the first thing one might try given the problem setup. I may not be appreciating some type of complexity that makes this less straightforward than it appears. This baseline approach would have been more interesting if some form of convergence result had been provided, similar to what was provided for R-MAXQ.
>>
>>
>> The experiments show that R-AMDP learns faster and is more computationally efficient than R-MAXQ. I was unable to get a good understanding of why this was the case. This is likely because I was not able to revisit the R-MAXQ algorithm, and it is not described in detail in this paper. The authors do try to explain the reasons for the performance improvement, but I was unable to follow them exactly. My best guess based on the discussion is that R-MAXQ does not try to exploit the state abstraction provided for each task by the hierarchy ("R-MAXQ must compute a model over all possible future states in a planning envelope after each action"). Is this the primary reason, or is there some other reason? Adding the ability to exploit abstractions in R-MAXQ seems straightforward, though maybe I'm missing something.
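>>
>> To make the state-abstraction point concrete (a toy example of my own, not from the paper): a task that models only its own abstracted state has far fewer (state, action) pairs to estimate than one that models every reachable ground state.
>>
>> def project(ground_state: dict, relevant_vars: tuple) -> tuple:
>>     # Keep only the variables the current task cares about.
>>     return tuple(ground_state[v] for v in relevant_vars)
>>
>> g1 = {"taxi_x": 2, "taxi_y": 4, "passenger_at": "R", "dest": "G", "fuel": 9}
>> g2 = {"taxi_x": 2, "taxi_y": 4, "passenger_at": "R", "dest": "B", "fuel": 3}
>>
>> # For a "go get the passenger" subtask only the taxi position and the passenger
>> # location matter, so these two ground states collapse into one abstract state:
>> nav_vars = ("taxi_x", "taxi_y", "passenger_at")
>> assert project(g1, nav_vars) == project(g2, nav_vars)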
>>
>> ------------------------------------------------------
>>
>> Best wishes,
>> Gabi Röger and Sven Koenig
>> ICAPS 2018 program chairs
>>
>>
>> _______________________________________________
>> Robot-learning mailing list
>> Robot-learning at cs.umbc.edu
>> https://lists.cs.umbc.edu/mailman/listinfo/robot-learning
>>
>>
>
> _______________________________________________
> Robot-learning mailing list
> Robot-learning at cs.umbc.edu
> https://lists.cs.umbc.edu/mailman/listinfo/robot-learning
>
>

