[Robot-learning] Fwd: ICAPS 2018 review response (submission [*NUMBER*])
Littman, Michael
mlittman at cs.brown.edu
Mon Jan 22 15:22:25 EST 2018
Meaning not 11am, but starting at noon? I can do noon.
On Mon, Jan 22, 2018 at 3:18 PM, Stefanie Tellex <stefie10 at cs.brown.edu>
wrote:
> I can do 2/2 any time in the afternoon.
>
> Stefanie
>
>
> On 01/22/2018 11:38 AM, Marie desJardins wrote:
>
>> Fri 2/2 would be good for a phone call with Stefanie and Michael. (Does
>> that work for both of you? -- if so, what time is good? I'm fairly
>> unconstrained.)
>>
>> Fri 2/2 won't work for a larger group meeting, though -- John will be at
>> AAAI. I'll be traveling on Fri 2/9 but John should be back by then, so
>> maybe we could plan a joint group meeting that day -- do you still have
>> your regular meetings on Fridays?
>>
>> Marie
>>
>>
>> On 1/21/18 11:23 AM, Stefanie Tellex wrote:
>>
>>> I agree, for after the winter deadlines.
>>>
>>> Stefanie
>>>
>>> On 01/21/2018 04:51 AM, Lawson Wong wrote:
>>>
>>>> So have kcaluru at brown.edu and miles_holland at brown.edu
>>>>
>>>> The reviews look like typical planning-community reviews -- generally
>>>> sensible requests but clearly impossible to accomplish within the page
>>>> limit. I guess it's generally hard to please planning reviewers unless
>>>> there are some theoretical results. Review 2 actually reads a little like
>>>> one that Nakul got for his paper...
>>>>
>>>> I don't know if Michael and Stefanie have answered separately regarding
>>>> a meeting; it certainly sounds helpful to continue discussing (AMDP)
>>>> hierarchy learning. Both the IJCAI and RSS deadlines fall in that week (1/31
>>>> and 2/1, respectively), so if possible it may be best to meet after those
>>>> deadlines, such as on Fri 2/2 -- unless the intent was to discuss before
>>>> the IJCAI deadline.
>>>>
>>>> -Lawson
>>>>
>>>>
>>>> On Sat, Jan 20, 2018 at 6:04 AM, Littman, Michael
>>>> <mlittman at cs.brown.edu> wrote:
>>>>
>>>> christopher_grimm at brown.edu has graduated.
>>>>
>>>>
>>>> On Fri, Jan 19, 2018 at 12:06 PM, Marie desJardins
>>>> <mariedj at cs.umbc.edu> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I wanted to share the initial reviews we received on our ICAPS
>>>> submission (which I've also attached). Based on the reviews, I
>>>> think the paper is unlikely to be accepted, so we are working to
>>>> see whether we can get some new results for an IJCAI submission.
>>>> We are making good progress on developing hierarchical learning
>>>> methods for AMDPs but we need to (a) move to larger/more complex
>>>> domains, (b) develop some theoretical analysis (complexity,
>>>> correctness, convergence), and (c) work on more AMDP-specific
>>>> hierarchy learning techniques (right now we are using an
>>>> off-the-shelf method called HierGen that works well but may not
>>>> necessarily find the best hierarchy for an AMDP representation).
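For concreteness, a minimal Python sketch of the two-stage pipeline described above: a HierGen-like step derives a task hierarchy from demonstration trajectories, and each task in that hierarchy is then treated as its own AMDP whose model is learned bottom-up. All names here (Task, hiergen, rmax_learn, learn_amdp_models) are illustrative stand-ins, not the project's actual code.

    from dataclasses import dataclass, field

    @dataclass
    class Task:
        """One node of the learned hierarchy; each node is its own AMDP."""
        name: str
        children: list = field(default_factory=list)
        model: object = None

        def postorder(self):
            # Children before parents: lower-level models are learned
            # first, so convergence proceeds bottom-up.
            for child in self.children:
                yield from child.postorder()
            yield self

    def hiergen(trajectories):
        """Stand-in for the off-the-shelf HierGen algorithm: extract
        causal structure from expert trajectories in a factored domain
        and return the root Task of a hierarchy."""
        raise NotImplementedError

    def rmax_learn(task, env, episodes):
        """Stand-in for an R-MAX-style model learner scoped to one task."""
        raise NotImplementedError

    def learn_amdp_models(root, env, episodes):
        """Learn a transition/reward model for every task in the tree."""
        for task in root.postorder():
            task.model = rmax_learn(task, env, episodes)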
>>>>
>>>> I'd be very interested to talk more about how this relates to
>>>> the work that's happening at Brown, and to hear any
>>>> feedback/ideas you might have about this work.
>>>>
>>>> Michael/Stefanie, could we maybe set up a time for the three of
>>>> us to have a teleconference? I'll be on vacation next week but
>>>> the week after that would be good. Possible times for me -- Mon
>>>> 1/29 before 11:30am, between 1-2, or after 4pm; Wed 1/31 before
>>>> 10am or after 2pm; Thu 2/1 between 11-1:30 or 3-4; Fri 2/2 any
>>>> time.
>>>>
>>>> BTW, these are the Brown students who are on this list. Please
>>>> let me know if anyone should be added or removed.
>>>>
>>>> carl_trimbach at brown.edu
>>>> christopher_grimm at brown.edu
>>>> david_abel at brown.edu
>>>> dilip.arumugam at gmail.com
>>>> edward_c_williams at brown.edu
>>>> jun_ki_lee at brown.edu
>>>> kcaluru at brown.edu
>>>> lsw at brown.edu
>>>> lucas_lehnert at brown.edu
>>>> melrose_roderick at brown.edu
>>>> miles_holland at brown.edu
>>>> nakul_gopalan at brown.edu
>>>> oberlin at cs.brown.edu
>>>> sam_saarinen at brown.edu
>>>> siddharth_karamcheti at brown.edu
>>>>
>>>> Marie
>>>>
>>>>
>>>> -------- Forwarded Message --------
>>>> Subject: ICAPS 2018 review response (submission [*NUMBER*])
>>>> Date: Thu, 11 Jan 2018 14:59:19 +0100
>>>> From: ICAPS 2018 <icaps2018 at easychair.org>
>>>> To: Marie desJardins <mariedj at umbc.edu>
>>>>
>>>>
>>>>
>>>> Dear Marie,
>>>>
>>>> Thank you for your submission to ICAPS 2018. The ICAPS 2018 review
>>>> response period starts now and ends on January 13.
>>>>
>>>> During this time, you will have access to the current state of your
>>>> reviews and have the opportunity to submit a response. Please keep in
>>>> mind the following during this process:
>>>>
>>>> * Most papers have a so-called placeholder review, which was
>>>> necessary to give the discussion leaders access to the reviewer
>>>> discussion. Some of these reviews list questions that already came
>>>> up during the discussion and which you may address in your response,
>>>> but in all cases the (usually enthusiastic) scores are meaningless
>>>> and you should ignore them. Placeholder reviews are clearly
>>>> indicated as such in the review.
>>>>
>>>> * Almost all papers have three reviews. Some may have four. A very
>>>> small number of papers are missing one review. We hope to get that
>>>> review completed in the next day. We apologize for this.
>>>>
>>>> * The deadline for entering a response is January 13th (at 11:59pm
>>>> UTC-12, i.e., anywhere in the world).
>>>>
>>>> * Responses must be submitted through EasyChair.
>>>>
>>>> * Responses are limited to 1000 words in total. You can only enter
>>>> one response, not one per review.
>>>>
>>>> * You will not be able to change your response after it is
>>>> submitted.
>>>>
>>>> * The response must focus on any factual errors in the reviews and
>>>> any questions posed by the reviewers. Try to be as concise and to
>>>> the point as possible.
>>>>
>>>> * The review response period is an opportunity to react to the
>>>> reviews, but not a requirement to do so. Thus, if you feel the
>>>> reviews are accurate and the reviewers have not asked any questions,
>>>> then you do not have to respond.
>>>>
>>>> * The reviews are as submitted by the PC members, without much
>>>> coordination between them. Thus, there may be inconsistencies.
>>>> Furthermore, these are not the final versions of the reviews. The
>>>> reviews can later be updated to take into account the discussions at
>>>> the program committee meeting, and we may find it necessary to
>>>> solicit other outside reviews after the review response period.
>>>>
>>>> * The program committee will read your responses carefully and take
>>>> this information into account during the discussions. On the other
>>>> hand, the program committee may not directly respond to your
>>>> responses in the final versions of the reviews.
>>>>
>>>> The reviews on your paper are attached to this letter. To submit
>>>> your response, you should log on to the EasyChair web page for
>>>> ICAPS 2018 and select your submission from the menu.
>>>>
>>>> ----------------------- REVIEW 1 ---------------------
>>>> PAPER: 46
>>>> TITLE: Learning Abstracted Models and Hierarchies of Markov
>>>> Decision Processes
>>>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
>>>> Milani, Shane Parr and Marie desJardins
>>>>
>>>> Significance: 2 (modest contribution or average impact)
>>>> Soundness: 3 (correct)
>>>> Scholarship: 3 (excellent coverage of related work)
>>>> Clarity: 3 (well written)
>>>> Reproducibility: 3 (authors describe the implementation and
>>>> domains in sufficient detail)
>>>> Overall evaluation: 1 (weak accept)
>>>> Reviewer's confidence: 2 (medium)
>>>> Suitable for a demo?: 1 (no)
>>>> Nominate for Best Paper Award: 1 (no)
>>>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>>>> [Applications track ONLY]: Importance and novelty of the
>>>> application: 6 (N/A (not an Applications track paper))
>>>> [Applications track ONLY]: Importance of planning/scheduling
>>>> technology to the solution of the problem: 5 (N/A (not an Applications
>>>> track paper))
>>>> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
>>>> track paper))
>>>> [Robotics track ONLY]: Balance of Robotics and Automated
>>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>>> [Robotics Track ONLY]: Evaluation on physical
>>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>>> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
>>>> (not a Robotics track paper))
>>>>
>>>> ----------- Review -----------
>>>> The paper proposes a method for learning abstract Markov decision
>>>> processes (AMDPs) from demonstration trajectories and model-based
>>>> reinforcement learning. Experiments show that the method is more
>>>> effective than the baseline.
>>>>
>>>> On the positive side, a complete method for learning AMDPs is given
>>>> and is shown to work on the problems used in the experiments. The
>>>> proposed model-based reinforcement learning method, based on R-MAX,
>>>> is also shown to outperform the baseline, R-MAXQ.
>>>>
>>>> On the negative side, the method for learning the hierarchy,
>>>> HierGen, is taken from prior work, leaving the adaptation of R-MAX
>>>> to learn with a hierarchy as the main algorithmic novelty. No
>>>> convergence proof for the learning method is provided, although it
>>>> is empirically shown to outperform the baseline R-MAXQ. The
>>>> experiments are done on toy problems, indicating that the method is
>>>> probably not ready for more demanding practical problems.
>>>>
>>>> Overall, I am inclined to vote weak accept. The problem is
>>>> difficult, so I think that the work does represent progress, although it is
>>>> not yet compelling.
>>>>
>>>> ----------------------- REVIEW 2 ---------------------
>>>> PAPER: 46
>>>> TITLE: Learning Abstracted Models and Hierarchies of Markov
>>>> Decision Processes
>>>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
>>>> Milani, Shane Parr and Marie desJardins
>>>>
>>>> Significance: 2 (modest contribution or average impact)
>>>> Soundness: 3 (correct)
>>>> Scholarship: 2 (relevant literature cited but could be expanded)
>>>> Clarity: 3 (well written)
>>>> Reproducibility: 3 (authors describe the implementation and
>>>> domains in sufficient detail)
>>>> Overall evaluation: -1 (weak reject)
>>>> Reviewer's confidence: 4 (expert)
>>>> Suitable for a demo?: 2 (maybe)
>>>> Nominate for Best Paper Award: 1 (no)
>>>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>>>> [Applications track ONLY]: Importance and novelty of the
>>>> application: 6 (N/A (not an Applications track paper))
>>>> [Applications track ONLY]: Importance of planning/scheduling
>>>> technology to the solution of the problem: 5 (N/A (not an Applications
>>>> track paper))
>>>> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
>>>> track paper))
>>>> [Robotics track ONLY]: Balance of Robotics and Automated
>>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>>> [Robotics Track ONLY]: Evaluation on physical
>>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>>> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
>>>> (not a Robotics track paper))
>>>>
>>>> ----------- Review -----------
>>>> The authors introduce a reinforcement learning algorithm for
>>>> AMDPs that learns a hierarchical structure and a set of hierarchical
>>>> models. To learn the hierarchical structure, they rely on an existing
>>>> algorithm called HierGen. This algorithm extracts causal structure from a
>>>> set of expert trajectories in a factored state environment.
>>>>
>>>> While R-AMDP outperforms R-MAXQ on the two toy problems, I think
>>>> there is a lot more work to do to show that R-AMDP is a good basis
>>>> for developing more general algorithms. First, it would be nice to
>>>> examine the computational complexity of R-AMDP (rather than just the
>>>> empirical comparison in Figure 3). Second, what if R-AMDP is just
>>>> getting lucky on the two toy tasks presented? Maybe there are other
>>>> problems where R-AMDP performs poorly. Further, stopping the plots
>>>> at 50 or 60 trials may be misleading, since R-AMDP could be
>>>> converging to a suboptimal but pretty good policy early on. It's
>>>> also not clear that R-AMDP can be scaled to huge state or action
>>>> spaces. Does the hierarchical structure discovered by HierGen lend
>>>> itself to transfer when the dynamics change? It would be nice to
>>>> have a more rigorous analysis of R-AMDP and a longer discussion of
>>>> its potential pitfalls (when should we expect it to succeed, and
>>>> when should it fail?). There is a hint of this in the discussion
>>>> about HierGen's inability to distinguish between correlation and
>>>> causation.
>>>>
>>>> While reading the abstract, I expected the contribution to be in
>>>> learning the hierarchy. The authors should probably change the
>>>> abstract to avoid this confusion.
>>>>
>>>> ----------------------- REVIEW 3 ---------------------
>>>> PAPER: 46
>>>> TITLE: Learning Abstracted Models and Hierarchies of Markov
>>>> Decision Processes
>>>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
>>>> Milani, Shane Parr and Marie desJardins
>>>>
>>>> Significance: 3 (substantial contribution or strong impact)
>>>> Soundness: 3 (correct)
>>>> Scholarship: 3 (excellent coverage of related work)
>>>> Clarity: 3 (well written)
>>>> Reproducibility: 5 (code and domains (whichever apply) are
>>>> already publicly available)
>>>> Overall evaluation: 3 (strong accept)
>>>> Reviewer's confidence: 4 (expert)
>>>> Suitable for a demo?: 3 (yes)
>>>> Nominate for Best Paper Award: 1 (no)
>>>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>>>> [Applications track ONLY]: Importance and novelty of the
>>>> application: 6 (N/A (not an Applications track paper))
>>>> [Applications track ONLY]: Importance of planning/scheduling
>>>> technology to the solution of the problem: 5 (N/A (not an Applications
>>>> track paper))
>>>> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
>>>> track paper))
>>>> [Robotics track ONLY]: Balance of Robotics and Automated
>>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>>> [Robotics Track ONLY]: Evaluation on physical
>>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>>> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
>>>> (not a Robotics track paper))
>>>>
>>>> ----------- Review -----------
>>>> This is only a placeholder review. Please ignore it.
>>>>
>>>> ----------------------- REVIEW 4 ---------------------
>>>> PAPER: 46
>>>> TITLE: Learning Abstracted Models and Hierarchies of Markov
>>>> Decision Processes
>>>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
>>>> Milani, Shane Parr and Marie desJardins
>>>>
>>>> Significance: 2 (modest contribution or average impact)
>>>> Soundness: 2 (minor inconsistencies or small fixable errors)
>>>> Scholarship: 3 (excellent coverage of related work)
>>>> Clarity: 1 (hard to follow)
>>>> Reproducibility: 2 (some details missing but still appears to
>>>> be replicable with some effort)
>>>> Overall evaluation: -1 (weak reject)
>>>> Reviewer's confidence: 3 (high)
>>>> Suitable for a demo?: 2 (maybe)
>>>> Nominate for Best Paper Award: 1 (no)
>>>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>>>> [Applications track ONLY]: Importance and novelty of the
>>>> application: 6 (N/A (not an Applications track paper))
>>>> [Applications track ONLY]: Importance of planning/scheduling
>>>> technology to the solution of the problem: 5 (N/A (not an Applications
>>>> track paper))
>>>> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
>>>> track paper))
>>>> [Robotics track ONLY]: Balance of Robotics and Automated
>>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>>> [Robotics Track ONLY]: Evaluation on physical
>>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>>> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
>>>> (not a Robotics track paper))
>>>>
>>>> ----------- Review -----------
>>>> The paper describes an approach for learning abstract models and
>>>> hierarchies of AMDPs. These hierarchies are similar to, if not
>>>> exactly the same as, those used by frameworks such as MAXQ, where
>>>> each task in the hierarchy is an MDP with actions corresponding to
>>>> child tasks. Prior AMDP work apparently uses hand-specified models
>>>> of each task/AMDP, which are directly used for planning. This paper
>>>> extends that work by learning the models of each task/AMDP. This is
>>>> done using R-MAX at each task. There is no discussion of the
>>>> convergence guarantees of the approach. Apparently convergence must
>>>> occur in a bottom-up way. Experiments are shown in two domains, and
>>>> with two hierarchies in one of the domains (Taxi). The approach
>>>> appears to learn more efficiently than a prior approach, R-MAXQ. The
>>>> exact reasons for the increased efficiency were not clear to me from
>>>> the paper.
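As a rough illustration of the "R-MAX at each task" construction the review refers to, here is the generic R-MAX bookkeeping one would run at a single task (a hedged sketch, not the paper's R-AMDP implementation):

    from collections import defaultdict

    class RMaxModel:
        """Tabular R-MAX model for one task: use the empirical model once
        a (state, action) pair has been tried m times; before that,
        predict an optimistic r_max self-loop to drive exploration."""

        def __init__(self, m, r_max):
            self.m, self.r_max = m, r_max
            self.count = defaultdict(int)                      # (s, a) -> visits
            self.succ = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s2: n}
            self.rsum = defaultdict(float)                     # (s, a) -> total reward

        def update(self, s, a, r, s2):
            self.count[s, a] += 1
            self.succ[s, a][s2] += 1
            self.rsum[s, a] += r

        def predict(self, s, a):
            """Return (expected reward, {next state: probability})."""
            n = self.count[s, a]
            if n < self.m:  # "unknown" pair: be optimistic
                return self.r_max, {s: 1.0}
            return (self.rsum[s, a] / n,
                    {s2: c / n for s2, c in self.succ[s, a].items()})

Planning (e.g., value iteration) over this optimistic model at each level of the hierarchy is what makes unexplored subtask outcomes look attractive.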
>>>>
>>>> The paper is well-written at a high level, but the more technical
>>>> and formal descriptions could be improved quite a bit. For example,
>>>> the key object, the AMDP, is only described informally (the tuple is
>>>> not described in detail). Most of the paper is written quite
>>>> informally. Another example is that Table 1 mentions "max planner
>>>> rollouts", but I didn't see where rollouts are used anywhere in the
>>>> algorithm description.
>>>>
>>>> After reading the abstract and introduction, I expected that a big
>>>> part of the contribution would be about actually learning the
>>>> hierarchy. However, that does not seem to be the case. Rather, an
>>>> off-the-shelf approach is used to learn hierarchies, which are then
>>>> plugged into the proposed algorithm for learning the models of
>>>> tasks. Further, this is only tried in one of the two experimental
>>>> domains. The abstract and introduction should be clearer about the
>>>> contributions of the paper.
>>>>
>>>> Overall, I was unclear about what to learn from the paper. The main
>>>> contribution is apparently Algorithm 1, which uses R-MAX to learn
>>>> the models of each AMDP in a given hierarchy. Perhaps this is a
>>>> novel algorithm, but it feels like more of a baseline, in the sense
>>>> that it is the first thing one might try given the problem setup. I
>>>> may not be appreciating some complexity that makes this less
>>>> straightforward than it appears. This baseline approach would have
>>>> been more interesting if some form of convergence result had been
>>>> provided, similar to what was provided for R-MAXQ.
>>>>
>>>>
>>>> The experiments show that R-AMDP learns faster and is more
>>>> computationally efficient than R-MAXQ. I was unable to get a good
>>>> understanding of why this is the case, likely because I was not able
>>>> to revisit the R-MAXQ algorithm and it is not described in detail in
>>>> this paper. The authors do try to explain the reasons for the
>>>> performance improvement, but I was unable to follow exactly. My best
>>>> guess, based on the discussion, is that R-MAXQ does not exploit the
>>>> state abstraction provided for each task by the hierarchy ("R-MAXQ
>>>> must compute a model over all possible future states in a planning
>>>> envelope after each action"). Is this the primary reason, or is
>>>> there some other reason? Adding the ability to exploit abstractions
>>>> in R-MAXQ seems straightforward, though maybe I'm missing something.
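The abstraction point is easy to make concrete. Presumably "exploiting the state abstraction" amounts to each task modeling only the state variables relevant to it, along the following lines (the Taxi-style variable names are made up for illustration):

    def project(state, relevant):
        """Abstract a factored state down to the variables a task uses."""
        return tuple(state[v] for v in relevant)

    # A navigation subtask can ignore the passenger and destination, so
    # its learned model ranges over a much smaller state space:
    full_state = {"taxi_x": 2, "taxi_y": 3, "passenger": "in_taxi", "dest": "B"}
    nav_state = project(full_state, ("taxi_x", "taxi_y"))  # -> (2, 3)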
>>>>
>>>> ------------------------------------------------------
>>>>
>>>> Best wishes,
>>>> Gabi Röger and Sven Koenig
>>>> ICAPS 2018 program chairs
>>>>
>>>>
>>
>> --
>> Dr. Marie desJardins
>> Associate Dean for Academic Affairs
>> College of Engineering and Information Technology
>> University of Maryland, Baltimore County
>> 1000 Hilltop Circle
>> Baltimore MD 21250
>>
>> Email: mariedj at umbc.edu
>> Voice: 410-455-3967
>> Fax: 410-455-3559
>>
>>
> _______________________________________________
> Robot-learning mailing list
> Robot-learning at cs.umbc.edu
> https://lists.cs.umbc.edu/mailman/listinfo/robot-learning
>