PARADISE Publications


PARADISE AS THE OBJECTIVE FUNCTION FOR REINFORCEMENT LEARNING: Reinforcement learning for dialogue systems requires an objective function that actually measures system performance in a general way, so that the dialogue system is appropriately optimized. My work on reinforcement learning for optimizing dialogue strategy selection in ELVIS (EmaiL Voice Interactive System) used PARADISE to learn such an objective function.
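
As described in the PARADISE papers below, the learned objective (performance) function is a weighted combination of task success and dialogue costs, with the weights estimated by multiple linear regression against user satisfaction. A sketch of its general form:

  \mathrm{Performance} \;=\; \alpha \cdot \mathcal{N}(\kappa) \;-\; \sum_{i=1}^{n} w_i \cdot \mathcal{N}(c_i)

where \kappa measures task success, the c_i are dialogue cost metrics (for example, number of turns or elapsed time), \mathcal{N} is a z-score normalization function, and \alpha and the w_i are the regression weights.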

Reinforcement Learning Publications:

Marilyn Walker. An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email. Journal of Artificial Intelligence Research, JAIR, Vol. 12, pp. 387-416, 2000.

Marilyn Walker, Jeanne Fromer, Shrikanth Narayanan. Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email. In Proceedings of ACL/COLING 98, 1998.

PARADISE FOR PERFORMANCE MODELING AND EVALUATION OF SPOKEN DIALOGUE SYSTEMS: Our early work on PARADISE was applied to dialogue systems that we built at AT&T. In 2000, PARADISE was adopted as the evaluation framework for the DARPA Communicator project and applied to the official 2000 and 2001 evaluation experiments. This provided an opportunity to test PARADISE and the generality of the derived performance function across systems and over time. A primary research focus in our Communicator work has been to test fully automatic evaluation, using only metrics that can be extracted automatically in real time, so that a dialogue system could eventually adapt based on its own assessment of its current performance. Our work on fully automatic evaluation modules for DARPA Communicator is summarized in the publications below.
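
As a concrete illustration of the modeling approach (not code from the Communicator evaluation itself), the following minimal sketch z-score normalizes a set of automatically extractable per-dialogue metrics, regresses surveyed user satisfaction on them, and applies the fitted function to a new dialogue. The metric names and numbers are hypothetical, purely for illustration.

# Minimal PARADISE-style performance model (illustrative sketch only).
# The metric names and values below are hypothetical, not Communicator data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one dialogue: [task_completion, num_turns, elapsed_secs, asr_error_rate]
metrics = np.array([
    [1, 12,  95, 0.10],
    [0, 25, 210, 0.35],
    [1, 18, 140, 0.20],
    [1,  9,  70, 0.05],
    [0, 30, 260, 0.40],
], dtype=float)

# Surveyed user satisfaction for the same dialogues (e.g., summed survey scores).
satisfaction = np.array([32.0, 14.0, 26.0, 35.0, 11.0])

# PARADISE z-score normalizes each metric so the learned weights are comparable.
mean, std = metrics.mean(axis=0), metrics.std(axis=0)
normalized = (metrics - mean) / std

# Multiple linear regression of user satisfaction on the normalized metrics.
model = LinearRegression().fit(normalized, satisfaction)
print("weights:", dict(zip(
    ["task_completion", "num_turns", "elapsed_secs", "asr_error_rate"],
    model.coef_.round(2))))

# The fitted function can then score a new dialogue from automatically
# extractable metrics alone, approximating user satisfaction in real time.
new_dialogue = (np.array([[1, 15, 120, 0.15]]) - mean) / std
print("predicted satisfaction:", round(float(model.predict(new_dialogue)[0]), 1))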

Evaluation Publications:

M. Walker, R. Passonneau, J. Aberdeen, J. Boland, E. Bratt, J. Garofolo, L. Hirschman, A. Le, S. Lee, S. Narayanan, K. Papineni, B. Pellom, J. Polifroni, A. Potamianos, P. Prabhu, A. Rudnicky, G. Sanders, S. Seneff, D. Stallard, S. Whittaker. Cross-Site Evaluation in DARPA Communicator: The June 2000 Data Collection. Submitted to Computer Speech and Language, 2002.

M. Walker, A. Rudnicky, J. Aberdeen, E. Bratt, J. Garofolo, H. Hastie, A. Le, B. Pellom, A. Potamianos, R. Passonneau, R. Prasad, S. Roukos, G. Sanders, S. Seneff and D. Stallard. DARPA Communicator Evaluation: Progress from 2000 to 2001. In ICSLP 2002.

M. Walker, A. Rudnicky, R. Prasad, J. Aberdeen, E. Bratt, J. Garofolo, H. Hastie, A. Le, B. Pellom, A. Potamianos, R. Passonneau, S. Roukos, G. Sanders, S. Seneff, D. Stallard. DARPA Communicator: Cross-System Results for the 2001 Evaluation. In ICSLP 2002.

Helen Wright Hastie, Rashmi Prasad and Marilyn A. Walker. What's the Trouble: Automatically Identifying Problematic Dialogs in DARPA Communicator Dialog Systems. In Meeting of the Association for Computational Linguistics, 2002.

Helen Wright Hastie, Marilyn A. Walker and Rashmi Prasad. Automatic Evaluation: Using a DATE Dialogue Act Tagger for User Satisfaction and Task Completion Prediction. In Language Resources and Evaluation Conference, 2002.

Marilyn A. Walker, Rebecca Passonneau and Julie E. Boland. Quantitative and Qualitative Evaluation of DARPA Communicator Spoken Dialogue Systems. In Meeting of the Association for Computational Linguistics, 2001.

M. Walker, J. Aberdeen, J. Boland, E. Bratt, J. Garofolo, L. Hirschman, A. Le, S. Lee, S. Narayanan, K. Papineni, B. Pellom, J. Polifroni, A. Potamianos, P. Prabhu, A. Rudnicky, G. Sanders, S. Seneff, D. Stallard, S. Whittaker. DARPA Communicator Dialog Travel Planning Systems: The June 2000 Data Collection. In EUROSPEECH: European Conference on Speech Communication and Technology, 2001.

Marilyn A. Walker and Rebecca Passonneau. DATE: A Dialogue Act Tagging Scheme for Evaluation of Spoken Dialogue Systems. In Human Language Technology Conference, San Diego, March 2001.

Marilyn A. Walker, Lynette Hirschman and John Aberdeen. Evaluation for DARPA Communicator Spoken Dialogue Systems. In Language Resources and Evaluation Conference, LREC, 2000.

Marilyn A. Walker, Candace Kamm and Julie Boland. Developing and Testing General Models of Spoken Dialogue System Performance. In Language Resources and Evaluation Conference, LREC, 2000.

Candace Kamm, Marilyn A. Walker and Diane Litman. Evaluating Spoken Language Systems. In Proceedings of the American Voice Input/Output Society, AVIOS, 1999. Recipient of a Best Paper Award.

Marilyn A. Walker, Candace A. Kamm and Diane J. Litman. Towards Developing General Models of Usability with PARADISE. In Submission, 1999.

Marilyn A. Walker, Julie Boland and Candace Kamm. The Utility of Elapsed Time as a Usability Metric for Spoken Dialogue Systems. In ASRU99, 1999.

Marilyn Walker, Diane J. Litman, Candace A. Kamm and Alicia Abella. Evaluating Spoken Dialogue Agents with PARADISE: Two Case Studies. In Computer Speech and Language, 12(3), 1998.

Marilyn Walker, Diane Litman, Candace Kamm and Alicia Abella. PARADISE: A Framework for Evaluating Spoken Dialogue Agents. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, ACL 97, 1997.