[Robot-learning] Fwd: ICAPS 2018 review response (submission [*NUMBER*])

Stefanie Tellex stefie10 at cs.brown.edu
Mon Jan 22 15:18:56 EST 2018


I can do 2/2 any time in the afternoon.

Stefanie

On 01/22/2018 11:38 AM, Marie desJardins wrote:
> Fri 2/2 would be good for a phone call with Stefanie and Michael.  (Does 
> that work for both of you? -- if so, what time is good?  I'm fairly 
> unconstrained.)
> 
> Fri 2/2 won't work for a larger group meeting, though -- John will be at 
> AAAI.  I'll be traveling on Fri 2/9 but John should be back by then, so 
> maybe we could plan a joint group meeting that day -- do you still have 
> your regular meetings on Fridays?
> 
> Marie
> 
> 
> On 1/21/18 11:23 AM, Stefanie Tellex wrote:
>> I agree, for after the winter deadlines.
>>
>> Stefanie
>>
>> On 01/21/2018 04:51 AM, Lawson Wong wrote:
>>> So have kcaluru at brown.edu and miles_holland at brown.edu
>>>
>>> The reviews look like typical planning-community reviews -- generally 
>>> sensible requests but clearly impossible to accomplish within the 
>>> page limit. I guess it's generally hard to please planning reviewers 
>>> unless there are some theoretical results. Review 2 actually reads a 
>>> little like one that Nakul got for his paper...
>>>
>>> I don't know if Michael and Stefanie have answered separately 
>>> regarding a meeting; it certainly sounds helpful to continue 
>>> discussing (AMDP) hierarchy learning. Both the IJCAI and RSS 
>>> deadlines are on that week (1/31 and 2/1 respectively), so if 
>>> possible it may be best to meet after those deadlines, such as on Fri 
>>> 2/2 -- unless the intent was to discuss before the IJCAI deadline.
>>>
>>> -Lawson
>>>
>>>
>>> On Sat, Jan 20, 2018 at 6:04 AM, Littman, Michael 
>>> <mlittman at cs.brown.edu> wrote:
>>>
>>> christopher_grimm at brown.edu has graduated.
>>>
>>>
>>>     On Fri, Jan 19, 2018 at 12:06 PM, Marie desJardins
>>>     <mariedj at cs.umbc.edu> wrote:
>>>
>>>         Hi everyone,
>>>
>>>         I wanted to share the initial reviews we received on our ICAPS
>>>         submission (which I've also attached).  Based on the reviews, I
>>>         think the paper is unlikely to be accepted, so we are working to
>>>         see whether we can get some new results for an IJCAI submission.
>>>         We are making good progress on developing hierarchical learning
>>>         methods for AMDPs but we need to (a) move to larger/more complex
>>>         domains, (b) develop some theoretical analysis (complexity,
>>>         correctness, convergence), and (c) work on more AMDP-specific
>>>         hierarchy learning techniques (right now we are using an
>>>         off-the-shelf method called HierGen that works well but may not
>>>         necessarily find the best hierarchy for an AMDP representation).
>>>
>>>         I'd be very interested to talk more about how this relates to
>>>         the work that's happening at Brown, and to hear any
>>>         feedback/ideas you might have about this work.
>>>
>>>         Michael/Stefanie, could we maybe set up a time for the three of
>>>         us to have a teleconference?  I'll be on vacation next week but
>>>         the week after that would be good.  Possible times for me -- Mon
>>>         1/29 before 11:30am, between 1-2, or after 4pm; Wed 1/31 before
>>>         10am or after 2pm; Thu 2/1 between 11-1:30 or 3-4; Fri 2/2 
>>> any time.
>>>
>>>         BTW, these are the Brown students who are on this list. Please
>>>         let me know if anyone should be added or removed.
>>>
>>> carl_trimbach at brown.edu
>>> christopher_grimm at brown.edu
>>> david_abel at brown.edu
>>> dilip.arumugam at gmail.com
>>> edward_c_williams at brown.edu
>>> jun_ki_lee at brown.edu
>>> kcaluru at brown.edu
>>> lsw at brown.edu
>>> lucas_lehnert at brown.edu
>>> melrose_roderick at brown.edu
>>> miles_holland at brown.edu
>>> nakul_gopalan at brown.edu
>>> oberlin at cs.brown.edu
>>> sam_saarinen at brown.edu
>>> siddharth_karamcheti at brown.edu
>>>
>>>         Marie
>>>
>>>
>>>         -------- Forwarded Message --------
>>>         Subject:     ICAPS 2018 review response (submission [*NUMBER*])
>>>         Date:     Thu, 11 Jan 2018 14:59:19 +0100
>>>         From:     ICAPS 2018 <icaps2018 at easychair.org>
>>>         To:     Marie desJardins <mariedj at umbc.edu>
>>>
>>>
>>>
>>>         Dear Marie,
>>>
>>>         Thank you for your submission to ICAPS 2018. The ICAPS 2018 
>>> review
>>>         response period starts now and ends on January 13.
>>>
>>>         During this time, you will have access to the current state 
>>> of your
>>>         reviews and have the opportunity to submit a response. Please 
>>> keep in
>>>         mind the following during this process:
>>>
>>>         * Most papers have a so-called placeholder review, which was
>>>            necessary to give the discussion leaders access to the 
>>> reviewer
>>>            discussion. Some of these reviews list questions that 
>>> already came
>>>            up during the discussion and which you may address in your 
>>> response but
>>>            in all cases the (usually enthusiastic) scores are 
>>> meaningless and you
>>>            should ignore them. Placeholder reviews are clearly 
>>> indicated as such in
>>>            the review.
>>>
>>>         * Almost all papers have three reviews. Some may have four. A 
>>> very
>>>            low number of papers are missing one review. We hope to 
>>> get that
>>>            review completed in the next day. We apologize for this.
>>>
>>>         * The deadline for entering a response is January 13th (at 
>>> 11:59pm
>>>            UTC-12, i.e., anywhere in the world).
>>>
>>>         * Responses must be submitted through EasyChair.
>>>
>>>         * Responses are limited to 1000 words in total. You can only 
>>> enter
>>>            one response, not one per review.
>>>
>>>         * You will not be able to change your response after it is 
>>> submitted.
>>>
>>>         * The response must focus on any factual errors in the 
>>> reviews and any
>>>            questions posed by the reviewers. Try to be as concise and 
>>> to the
>>>            point as possible.
>>>
>>>         * The review response period is an opportunity to react to the
>>>            reviews, but not a requirement to do so. Thus, if you feel 
>>> the reviews
>>>            are accurate and the reviewers have not asked any 
>>> questions, then you
>>>            do not have to respond.
>>>
>>>         * The reviews are as submitted by the PC members, without much
>>>            coordination between them. Thus, there may be 
>>> inconsistencies.
>>>            Furthermore, these are not the final versions of the 
>>> reviews. The
>>>            reviews can later be updated to take into account the 
>>> discussions at
>>>            the program committee meeting, and we may find it 
>>> necessary to solicit
>>>            other outside reviews after the review response period.
>>>
>>>         * The program committee will read your responses carefully and
>>>            take this information into account during the discussions. 
>>> On the
>>>            other hand, the program committee may not directly respond 
>>> to your
>>>            responses in the final versions of the reviews.
>>>
>>>         The reviews on your paper are attached to this letter. To 
>>> submit your
>>>         response you should log on to the EasyChair Web page for ICAPS 
>>> 2018 and
>>>         select your submission on the menu.
>>>
>>>         ----------------------- REVIEW 1 ---------------------
>>>         PAPER: 46
>>>         TITLE: Learning Abstracted Models and Hierarchies of Markov 
>>> Decision Processes
>>>         AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie 
>>> Milani, Shane Parr and Marie desJardins
>>>
>>>         Significance: 2 (modest contribution or average impact)
>>>         Soundness: 3 (correct)
>>>         Scholarship: 3 (excellent coverage of related work)
>>>         Clarity: 3 (well written)
>>>         Reproducibility: 3 (authors describe the implementation and 
>>> domains in sufficient detail)
>>>         Overall evaluation: 1 (weak accept)
>>>         Reviewer's confidence: 2 (medium)
>>>         Suitable for a demo?: 1 (no)
>>>         Nominate for Best Paper Award: 1 (no)
>>>         Nominate for Best Student Paper Award (if eligible): 1 (no)
>>>         [Applications track ONLY]: Importance and novelty of the 
>>> application: 6 (N/A (not an Applications track paper))
>>>         [Applications track ONLY]: Importance of planning/scheduling 
>>> technology to the solution of the problem: 5 (N/A (not an 
>>> Applications track paper))
>>>         [Applications track ONLY] Maturity: 7 (N/A (not an 
>>> Applications track paper))
>>>         [Robotics track ONLY]: Balance of Robotics and Automated 
>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>>         [Robotics Track ONLY]: Evaluation on physical 
>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>>         [Robotics Track ONLY]: Significance of the contribution: 6 
>>> (N/A (not a Robotics track paper))
>>>
>>>         ----------- Review -----------
>>>         The paper proposes a method for learning abstract Markov 
>>> decision processes (AMDP) from demonstration trajectories and model 
>>> based reinforcement learning. Experiments show that the method is 
>>> more effective than the baseline.
>>>
>>>         On the positive side, a complete method for learning AMDP is 
>>> given and is shown to work on the problems used in the 
>>> experiments. The proposed model based reinforcement learning method 
>>> based on R-MAX is also shown to outperform the baseline R-MAXQ.
>>>
>>>         On the negative side, the method for learning the hierarchy, 
>>> HierGen, is taken from a prior work, leaving the adaptation of R-MAX 
>>> to learn with a hierarchy as the main algorithmic novelty. No 
>>> convergence proof for the learning method is provided, although it is 
>>> empirically shown to outperform the baseline R-MAXQ. The experiments 
>>> are done on toy problems, indicating that the method is probably not 
>>> ready for more demanding practical problems.
>>>
>>>         Overall, I am inclined to vote weak accept. The problem is 
>>> difficult, so I think that the work does represent progress, although 
>>> it is not yet compelling.
>>>
>>>         ----------------------- REVIEW 2 ---------------------
>>>         PAPER: 46
>>>         TITLE: Learning Abstracted Models and Hierarchies of Markov 
>>> Decision Processes
>>>         AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie 
>>> Milani, Shane Parr and Marie desJardins
>>>
>>>         Significance: 2 (modest contribution or average impact)
>>>         Soundness: 3 (correct)
>>>         Scholarship: 2 (relevant literature cited but could be expanded)
>>>         Clarity: 3 (well written)
>>>         Reproducibility: 3 (authors describe the implementation and 
>>> domains in sufficient detail)
>>>         Overall evaluation: -1 (weak reject)
>>>         Reviewer's confidence: 4 (expert)
>>>         Suitable for a demo?: 2 (maybe)
>>>         Nominate for Best Paper Award: 1 (no)
>>>         Nominate for Best Student Paper Award (if eligible): 1 (no)
>>>         [Applications track ONLY]: Importance and novelty of the 
>>> application: 6 (N/A (not an Applications track paper))
>>>         [Applications track ONLY]: Importance of planning/scheduling 
>>> technology to the solution of the problem: 5 (N/A (not an 
>>> Applications track paper))
>>>         [Applications track ONLY] Maturity: 7 (N/A (not an 
>>> Applications track paper))
>>>         [Robotics track ONLY]: Balance of Robotics and Automated 
>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>>         [Robotics Track ONLY]: Evaluation on physical 
>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>>         [Robotics Track ONLY]: Significance of the contribution: 6 
>>> (N/A (not a Robotics track paper))
>>>
>>>         ----------- Review -----------
>>>         The authors introduce a reinforcement learning algorithm for 
>>> AMDPs that learns a hierarchical structure and a set of hierarchical 
>>> models. To learn the hierarchical structure, they rely on an existing 
>>> algorithm called HierGen. This algorithm extracts causal structure 
>>> from a set of expert trajectories in a factored state environment.
>>>
>>>         While R-AMDP outperforms R-MAXQ on the two toy problems, I 
>>> think there is a lot more work to do to show that R-AMDP is a good 
>>> basis for developing more general algorithms. First, it would be nice 
>>> to examine the computational complexity of R-AMDP (rather than just 
>>> empirical comparison in Figure 3). Second, what if R-AMDP is just 
>>> getting lucky in the two toy tasks presented? Maybe there are other 
>>> problems where R-AMDP performs poorly. Further, stopping the plots at 
>>> 50 or 60 trials may just be misleading since R-AMDP could be 
>>> converging to a suboptimal but pretty good policy early on. It’s also 
>>> not clear that R-AMDP can be scaled to huge state or action spaces. 
>>> Does the hierarchical structure discovered by HierGen lend itself to 
>>> transfer when the dynamics change? It would be nice to have a more 
>>> rigorous analysis of R-AMDP and a longer discussion of its potential 
>>> pitfalls (when should we expect it to succeed and when should it 
>>> fail?). There is a hint of this in the discussion about HierGen’s 
>>> inability to distinguish between correlation and causation.
>>>
>>>         While reading the abstract I expected the contribution to be 
>>> in learning the hierarchy. The authors should probably change the 
>>> abstract to avoid this confusion.
>>>
>>>         ----------------------- REVIEW 3 ---------------------
>>>         PAPER: 46
>>>         TITLE: Learning Abstracted Models and Hierarchies of Markov 
>>> Decision Processes
>>>         AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie 
>>> Milani, Shane Parr and Marie desJardins
>>>
>>>         Significance: 3 (substantial contribution or strong impact)
>>>         Soundness: 3 (correct)
>>>         Scholarship: 3 (excellent coverage of related work)
>>>         Clarity: 3 (well written)
>>>         Reproducibility: 5 (code and domains (whichever apply) are 
>>> already publicly available)
>>>         Overall evaluation: 3 (strong accept)
>>>         Reviewer's confidence: 4 (expert)
>>>         Suitable for a demo?: 3 (yes)
>>>         Nominate for Best Paper Award: 1 (no)
>>>         Nominate for Best Student Paper Award (if eligible): 1 (no)
>>>         [Applications track ONLY]: Importance and novelty of the 
>>> application: 6 (N/A (not an Applications track paper))
>>>         [Applications track ONLY]: Importance of planning/scheduling 
>>> technology to the solution of the problem: 5 (N/A (not an 
>>> Applications track paper))
>>>         [Applications track ONLY] Maturity: 7 (N/A (not an 
>>> Applications track paper))
>>>         [Robotics track ONLY]: Balance of Robotics and Automated 
>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>>         [Robotics Track ONLY]: Evaluation on physical 
>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>>         [Robotics Track ONLY]: Significance of the contribution: 6 
>>> (N/A (not a Robotics track paper))
>>>
>>>         ----------- Review -----------
>>>         This is only a placeholder review. Please ignore it.
>>>
>>>         ----------------------- REVIEW 4 ---------------------
>>>         PAPER: 46
>>>         TITLE: Learning Abstracted Models and Hierarchies of Markov 
>>> Decision Processes
>>>         AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie 
>>> Milani, Shane Parr and Marie desJardins
>>>
>>>         Significance: 2 (modest contribution or average impact)
>>>         Soundness: 2 (minor inconsistencies or small fixable errors)
>>>         Scholarship: 3 (excellent coverage of related work)
>>>         Clarity: 1 (hard to follow)
>>>         Reproducibility: 2 (some details missing but still appears to 
>>> be replicable with some effort)
>>>         Overall evaluation: -1 (weak reject)
>>>         Reviewer's confidence: 3 (high)
>>>         Suitable for a demo?: 2 (maybe)
>>>         Nominate for Best Paper Award: 1 (no)
>>>         Nominate for Best Student Paper Award (if eligible): 1 (no)
>>>         [Applications track ONLY]: Importance and novelty of the 
>>> application: 6 (N/A (not an Applications track paper))
>>>         [Applications track ONLY]: Importance of planning/scheduling 
>>> technology to the solution of the problem: 5 (N/A (not an 
>>> Applications track paper))
>>>         [Applications track ONLY] Maturity: 7 (N/A (not an 
>>> Applications track paper))
>>>         [Robotics track ONLY]: Balance of Robotics and Automated 
>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>>         [Robotics Track ONLY]: Evaluation on physical 
>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>>         [Robotics Track ONLY]: Significance of the contribution: 6 
>>> (N/A (not a Robotics track paper))
>>>
>>>         ----------- Review -----------
>>>         The paper describes an approach for learning abstract models 
>>> for hierarchies of AMDPs. These hierarchies are 
>>> similar, if not exactly the same, as those used by frameworks such as 
>>> MAXQ, where each task in the hierarchy is an MDP with actions 
>>> corresponding to child tasks. Prior AMDP work apparently uses 
>>> hand-specified models of each task/AMDP, which are directly used for 
>>> planning. This paper extends that work by learning the models of each 
>>> task/AMDP. This is done using RMAX at each task. There is not a 
>>> discussion of convergence guarantees of the approach. Apparently 
>>> convergence must occur in a bottom-up way. Experiments are shown in 
>>> two domains and with two hierarchies in one of the domains (Taxi). 
>>> The approach appears to learn more efficiently than a prior approach 
>>> R-MAXQ. The reasons for the increased efficiency were not entirely 
>>> clear to me from the paper.
>>>
>>>         The paper is well-written at a high level, but the more 
>>> technical and formal descriptions could be improved quite a bit. For 
>>> example, the key object, the AMDP, is only described informally (the tuple 
>>> is not described in detail). Most of the paper is written quite 
>>> informally.  Another example is that Table 1 talks about "max planner 
>>> rollouts", but I didn't see where rollouts are used anywhere in the 
>>> algorithm description.
>>>
>>>         After reading the abstract and introduction, I expected that 
>>> a big part of the contribution would be about actually learning the 
>>> hierarchy. However, that does not seem to be the case. Rather, an 
>>> off-the-shelf approach is used to learn hierarchies and then plugged 
>>> into the proposed algorithm for learning the models of tasks. 
>>> Further, this is only tried for one of the two experimental domains. 
>>> The abstract and introduction should be clearer about the 
>>> contributions of the paper.
>>>
>>>         Overall, I was unclear about what to learn from the paper. 
>>> The main contribution is apparently algorithm 1, which uses R-MAX to 
>>> learn the models of each AMDP in a given hierarchy. Perhaps this is a 
>>> novel algorithm, but it feels like more of a baseline in the sense 
>>> that it is the first thing that one might try given the problem 
>>> setup. I may not be appreciating some type of complexity that makes 
>>> this not be straightforward. This baseline approach would have been 
>>> more interesting if some form of convergence result was provided, 
>>> similar to what was provided for R-MAXQ.
>>>
>>>
>>>         The experiments show that R-AMDP learns faster and is more 
>>> computationally efficient than R-MAXQ. I was unable to get a good 
>>> understanding of why this was the case, likely because I was not able 
>>> to revisit the R-MAXQ algorithm and it was not described in detail in 
>>> this paper. The authors do try to explain 
>>> the reasons for the performance improvement, but I was unable to 
>>> follow exactly. My best guess based on the discussion is that R-MAXQ 
>>> does not try to exploit the state abstraction provided for each task 
>>> by the hierarchy ("R-MAXQ must compute a model over all possible 
>>> future states in a planning envelope after each action"). Is this the 
>>> primary reason or is there some other reason? Adding the ability to 
>>> exploit abstractions in R-MAXQ seems straightforward, though maybe 
>>> I'm missing something.
>>>
>>>         ------------------------------------------------------
>>>
>>>         Best wishes,
>>>         Gabi Röger and Sven Koenig
>>>         ICAPS 2018 program chairs
>>>
>>>
>>>         _______________________________________________
>>>         Robot-learning mailing list
>>> Robot-learning at cs.umbc.edu
>>> https://lists.cs.umbc.edu/mailman/listinfo/robot-learning
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
> 
> -- 
> Dr. Marie desJardins
> Associate Dean for Academic Affairs
> College of Engineering and Information Technology
> University of Maryland, Baltimore County
> 1000 Hilltop Circle
> Baltimore MD 21250
> 
> Email: mariedj at umbc.edu
> Voice: 410-455-3967
> Fax: 410-455-3559
> 
> 
> 
