
Do Mistakes Matter? Comparing Trust Responses of Different Age Groups to Errors Made by Physically Assistive Robots


Abstract:


Trust is a key factor in ensuring acceptable human-robot interaction, especially in settings where robots may be assisting with critical activities of daily living. When practically deployed, robots are bound to make occasional mistakes, yet the degree to which these errors will impact a care recipient’s trust in the robot, especially in performing physically assistive tasks, remains an open question. To investigate this, we conducted experiments where participants interacted with physically assistive robots which would occasionally make intentional mistakes while performing two different tasks: bathing and feeding. Our study considered the error response of two populations: younger adults at a university (median age 26) and older adults at an independent living facility (median age 83). We observed that the impact of errors on a user’s trust in the robot depends on both their age and the task that the robot is performing. We also found that older adults tend to evaluate the robot on factors unrelated to the robot’s performance, making their trust in the system more resilient to errors when compared to younger adults. Code and supplementary materials are available on our project webpage1.
Published in: 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN)
Date of Conference: 26-30 August 2024
Date Added to IEEE Xplore: 30 October 2024
Conference Location: Pasadena, CA, USA


SECTION I.

INTRODUCTION

Assistive robots may be used to provide care for a diverse range of individuals spanning across a variety of ages and disabilities. Just as older adults may require more physical assistance as they age, younger adults with motor impairments may similarly rely on a caregiver to assist with activities of daily living, such as bathing and feeding. Preferences and expectations for effective physical-robot interaction may differ widely across these age groups, prompting further investigation into how to best design caregiving systems.

In particular, the factors that impact human-robot trust are critical to understand [1]. During long-term use of a physically assistive robot, it is unavoidable that the robot may make mistakes, just as any human caregiver may. Previous research has considered how such errors impact trust in human-robot cooperative or social interactions [2]–[6]. However, there is comparatively less work [7] exploring how such errors impact trust during physical human-robot interactions, which may be perceived as more personal or risky. It is further unclear whether there are age-based disparities in user trust responses. Understanding these responses is essential for designing systems in which care recipients will not lose confidence given occasional, inevitable errors.

Fig. 1: Top: Examples of intentional errors made by a Stretch RE1 robot performing a bathing task with younger participants. Bottom: Examples of intentional errors made by an Obi robot performing a feeding task with older participants.

In this work, we design and run a human study to investigate how people’s trust towards a robot changes when the robot makes mistakes while performing two different physically assistive tasks: bed-bathing and feeding. For each task, participants experienced several successful trials to establish some baseline trust, which we assessed with a questionnaire. We then randomly exposed participants to a number of manufactured errors and readministered the questionnaire to assess any fluctuation in their attitude toward the robot. To guide our assessment, we rely on the definition of trust as a subject’s openness to the consequences of an action performed by another party, the robot in this case, over which the subject has no control [8]. Per this definition, participants could not influence the robot’s behavior, only being allowed to observe how it performed a given task.

We ran our study with two different participant populations, one group of ten adults (median age 26) at a university and one group of nine older adults (median age 83) at an independent living facility. Through analysis of their questionnaire responses, we found that the effect errors had on trust was a function of both the task and the participant population. For the bathing task, younger adults’ trust in the robot took an initial hit once errors began but returned to their baseline with continued interaction with the robot, even though errors were still present. In the feeding task, younger adults’ trust decreased from baseline after the robot began making mistakes, and remained lowered with continued errors during the task. Older adults, however, did not have any statistically significant differences in their trust towards the robot before or after they were exposed to errors during the feeding task. Further thematic analysis of the responses to open-ended questions from both groups revealed that prior experience with robots impacted whether participants primarily evaluated the robot on task performance or other unrelated factors. We observed that older adults in our study, none of whom had any experience with robots, were more likely to evaluate the feeding robot on factors unrelated to task performance like cost, ethics, or perceived acceptability.

Through this research, we aim to answer the following research questions:

  • RQ1: In a physically assistive context, how do errors in robot behavior affect the trust formed by users towards the robot?

  • RQ2: Do older adults respond differently to robot errors than individuals from a younger-aged population? If so, in what measurable ways?

SECTION II.

Related Work

Across the literature, there is a lack of consensus on how to define trust, with potentially many definitions of trust in different contexts. In the context of physically assistive robotics, care-recipients may have limited ability to independently compensate for or correct mistakes made by the robot during an interaction. For such individuals, their willingness to trust a caregiving robot is likely to be strongly associated with their belief that the robot is capable of behaving appropriately without their intervention [4], [9]. These constraints warrant a notion of trust that accounts for the fact that the user cannot fully control the robot. In this work, we adopt Mayer et al.’s definition of trust as "the willingness of a party to be vulnerable to the actions of another party based on the expectation that the other will perform a particular action important to the trustor, irrespective of the ability to monitor or control that other party." [8]

Existing research has explored the factors that impact trust formation in a variety of robotic applications, including socially assistive robots for older adults [10], robot therapy [11], guidance robots for evacuation scenarios [12], [13], and grocery bagging robots [14], among others. Broadly, these works have established that trust is critical for appropriate human-robot interaction. Furthermore, humans’ trust in robots can be strongly impacted by failures, although they may not always recognize or respond to those robot errors.

Since trust has been demonstrated to be a key indicator of long-term use and acceptance of robotic technology [1], [4], it stands to reason that understanding and mitigating the impact of errors, which are inevitable in long-term caregiving, is paramount in the development of caregiving robots. A number of physically assistive robotic systems have been proposed for tasks like bed bathing [15]–​[18] and feeding [19]–​[21], however, few studies have investigated trust formation, and the impact robot errors can have, in these contexts. Bhattacharjee et al. found that individuals with motor impairments generally preferred for a feeding robot to not make mistakes but were willing to accept errors up to 30% of the time, suggesting that their tolerance of errors may be higher than able-bodied individuals [7]. Our study builds on this work to investigate how trust responses to robot errors may differ among individuals in different age groups and based on the task, robot-assisted bathing or feeding, being performed.

Fig. 2: We designed a custom 3D-printed tool to allow the Stretch RE1 robot to hold a wet washcloth in its gripper.

SECTION III.

Methodology

In this section, we describe our design for simple autonomous bed-bathing and feeding systems, as well as the intentional errors we defined for each. We then outline the procedure for our human study, which investigates how robot errors affect trust in both younger and older adult populations. Finally, we detail the measures and questionnaires that we use to assess changes in trust towards the robot and present our hypotheses for the study.

A. System Design

We developed two systems to complete the bathing and feeding tasks. The systems were designed with the minimum functionality to autonomously complete their respective tasks but mimicked how a more sophisticated system might approach a similar goal.

1) Bathing System Design

We define a simulated assistive bed-bathing task where a Stretch RE1 [22] mobile manipulator uses a wet washcloth to wipe a stripe of shaving cream off of a person’s lower leg while they are lying in a hospital bed. The robot grasps a 3D-printed bathing tool, pictured in Fig. 2, to which we can easily attach and remove wet washcloths. During the human study, we replaced the washcloths once they became saturated with shaving cream – after every three trials. We place a layer of foam between the bottom of the bathing tool and the washcloth, which allows the tool’s bathing surface to better conform to the body and improves the safety of the physical contact.

We use a camera affixed above the hospital bed to capture RGB images of the human body, which are then fed into BlazePose, as implemented in Google’s MediaPipe Pose [23], to determine the position and orientation of the participant’s lower leg. The estimated human pose is verified and adjusted as needed using a custom GUI before robot actuation for bathing assistance begins. We place a single fiducial tag at the bottom left corner of the hospital bed and define a global coordinate frame with its origin at the center of this tag. We transform the detected human pose and the robot’s position to this global frame.
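As a planar illustration of this change of frames (the poses and numbers below are hypothetical, and the actual system works with full 3D poses from the fiducial detector), a keypoint detected in the camera frame can be mapped into the tag-defined global frame with a homogeneous transform:

```python
import math

def make_transform(x, y, theta):
    """Homogeneous 2D rigid transform: rotation theta, translation (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x],
            [s,  c, y],
            [0,  0, 1]]

def apply(T, p):
    """Apply a 3x3 homogeneous transform to a 2D point."""
    x, y = p
    return (T[0][0] * x + T[0][1] * y + T[0][2],
            T[1][0] * x + T[1][1] * y + T[1][2])

def invert(T):
    """Invert a rigid transform: R^T and -R^T t."""
    c, s = T[0][0], T[1][0]
    x, y = T[0][2], T[1][2]
    return [[ c,  s, -(c * x + s * y)],
            [-s,  c, -(-s * x + c * y)],
            [ 0,  0, 1]]

# Hypothetical example: the tag is seen 1 m right and 2 m ahead of the
# camera, rotated 90 degrees; a detected ankle keypoint at (1, 3) in the
# camera frame is expressed in the tag (global) frame.
T_cam_tag = make_transform(1.0, 2.0, math.pi / 2)
ankle_global = apply(invert(T_cam_tag), (1.0, 3.0))
```

The robot's own pose, estimated from the same tag via its head-mounted camera, would be brought into the global frame the same way, so that the human pose and robot pose share one coordinate system.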

The robot uses its head-mounted camera to detect the fiducial tag on the bed and localize itself within the global frame. The robot moves along the bed and extends its end effector to just below the right knee. Finally, the robot moves its end effector down the participant’s shin towards the right ankle in a side-to-side sweeping motion. The robot uses effort sensing at its wrist to maintain contact between the cleaning tool it is holding and the person’s lower leg.
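The effort-based contact maintenance can be sketched as a simple proportional adjustment of the end-effector height; the gains, units, and limits below are illustrative assumptions, not values from the actual controller.

```python
def contact_adjustment(measured_effort, target_effort, gain=0.002, max_step=0.005):
    """Map the wrist-effort error to a bounded vertical end-effector step (m).

    A positive step lowers the tool to press more firmly against the leg; a
    negative step lifts it. Clipping keeps each correction small for safety.
    """
    step = gain * (target_effort - measured_effort)
    return max(-max_step, min(max_step, step))
```

Run once per control cycle, this keeps the washcloth in gentle contact as the leg's contour changes under the sweeping motion.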

2) Feeding System Design

In the feeding task, the objective is for a commercial feeding robot, called Obi [24], to scoop up a morsel of food from its built-in bowls using a spoon, bring the spoon to the participant’s mouth and allow them to take a bite, and then scrape any remaining food on the spoon into the bowls. We separate the feeding process into three distinct segments – scooping, feeding, and cleaning – where intentional errors can be introduced. To synthesize motor commands for the robot to complete the task, we use Obi’s kinesthetic teaching function and physically guide the robot through approximate feeding trajectories. We recorded trajectories for both successful and erroneous robot actions, which we splice into the scooping, feeding, and cleaning actions. To feed participants in our human study, we select scooping, feeding, and cleaning actions, which may be randomly selected to contain errors, and the robot replays these in sequence to execute a full feeding trajectory.
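A minimal sketch of this splicing logic follows; the trajectory names are hypothetical stand-ins for the recorded kinesthetic demonstrations.

```python
# Hypothetical library of prerecorded trajectories; the real system would
# store joint-space recordings captured via Obi's kinesthetic teaching.
TRAJECTORIES = {
    "scoop": {"success": "scoop_ok",  "error": "missed_scoop"},
    "feed":  {"success": "feed_ok",   "error": "stop_short"},
    "clean": {"success": "clean_ok",  "error": "skip_scrape"},
}

def build_feeding_trial(error_segments=()):
    """Splice one scooping, feeding, and cleaning action into a full trial,
    swapping in an erroneous recording for each segment in error_segments."""
    return [TRAJECTORIES[seg]["error" if seg in error_segments else "success"]
            for seg in ("scoop", "feed", "clean")]
```

A trial with an injected feeding error, for instance, is `build_feeding_trial({"feed"})`; the robot then replays the three returned trajectories in sequence.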

The participants were fed while seated in front of a height-adjustable overbed table, with the Obi robot placed on top. We adjust the height of the table the robot sits on to align its spoon, when raised to the feeding position, with the participant’s mouth. Since the robot does not actually attempt to put food inside a person’s mouth, instead only bringing the spoon within a few centimeters of it, participants were instructed to move forward to take a bite, but only if the spoon was easily within reach. To accommodate dietary restrictions, each participant was given a choice of one hard food (Plain or Honey Nut Cheerios, M&M’s, or canned corn) and one soft food (plain yogurt, pudding, or applesauce) to be fed during the experiment. The robot begins the experiment by scooping hard food and then switches to feeding soft food halfway through, after 5 trials.

3) Intentional Errors

For both the bathing and feeding tasks, we define several types of error. The relative severity of each error ranges from those that induce a minor reduction in performance, to those that result in complete task failure. We list the errors for each task, in order of severity, in Table I. Time-series of a successful trial and the intentional errors for each task are depicted in Fig. 3.

B. Study Design

To evaluate peoples’ trust response when physically assistive robots make mistakes, we ran a human study (Carnegie Mellon University IRB approval under 2022.00000337) with informed consent. After consenting to the study, participants were asked to fill out a questionnaire about their demographics and previous experience with robots. Additionally, participants completed the Negative Attitudes Towards Robots (NARS) scale, a validated survey for assessing participants’ baseline levels of anxiety towards robotic agents [25], [26]. We then gave participants a brief description of what the robot would attempt to do and its expected behavior, without mentioning that the robot would make intentional errors, before beginning the experiment.

TABLE I: Summary of intentional errors

For a given task, either bathing or feeding, we ran a total of nine trials, divided into three sets of three trials. While the first set of trials was error-free, we randomly selected errors, defined in Table I, to occur in the last two sets of trials, with no more than two errors appearing in a given set. Each participant experienced three distinct, preprogrammed errors, with the order of the errors being randomized. After each set, we administered a questionnaire with seven Likert items, detailed in Table II, to assess how participants’ attitudes towards the robot changed over the course of the experiment.
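The constraints above — an error-free first set, three distinct errors confined to the last two sets, and at most two errors per set — can be sketched as a schedule generator. The error names passed in are placeholders, not the entries of Table I.

```python
import random

def make_error_schedule(error_types, rng=None):
    """Build a 9-trial schedule (three sets of three) where set 1 is
    error-free and three distinct errors are split across sets 2 and 3,
    with at most two errors in either set. None marks a successful trial."""
    rng = rng or random.Random()
    chosen = rng.sample(error_types, 3)   # three distinct errors, random order
    split = rng.choice([1, 2])            # number of errors placed in set 2
    schedule = []
    for n_errors in (0, split, 3 - split):
        slots = [None] * 3
        for pos in rng.sample(range(3), n_errors):
            slots[pos] = chosen.pop()
        schedule.extend(slots)
    return schedule
```

Any generated schedule satisfies the study's constraints by construction, regardless of the random seed.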

After completing all trials, we asked participants a set of four open-ended questions, described in Table III. At the end of the experiment, we debriefed participants on the true nature of the study, asking whether they suspected that any errors made by the robot were not genuine and then revealing that errors were, in fact, performed intentionally to get their reaction. Participants were specifically asked not to share information about the study with outside individuals in order to maintain the integrity of the study’s deceptive elements.

1) Younger Adult Population

We recruited younger adults for the study by posting promotional flyers in buildings on the Carnegie Mellon University campus. Participants recruited at the university were asked to participate in both bathing and feeding tasks, completing nine trials of each for a total of 18 trials. The order of the two tasks was alternated between participants to eliminate any ordering bias. For both tasks, participants were positioned in a hospital bed so that they could clearly observe the robot’s behavior. The head of the bed was raised to a slight incline so that participants could see the Stretch RE1 robot while laying down for the bathing task. The head of the bed was further raised for the feeding task so that participants could maintain a seated pose as the Obi robot fed them.
我们通过在卡内基梅隆大学校园内的建筑物张贴宣传海报招募了年轻成年人参与这项研究。在大学招募的参与者被要求参与洗澡和喂食两项任务,每项任务完成九次试验,总共进行十八次试验。为了消除顺序偏差,两项任务的顺序在参与者之间交替进行。对于两项任务,参与者被置于病床上,以便他们能够清晰地观察机器人的行为。床头被略微抬高,以便参与者在进行洗澡任务时躺着能够看到 Stretch RE1 机器人。在进行喂食任务时,床头进一步抬高,以便参与者在 Obi 机器人喂食时保持坐姿。

Fig. 3: Summary time-series for successful trials and intentional error cases in the bathing (top) and feeding (bottom) tasks.

2) Older Adult Population

We recruited older adults for the study from Vincentian Terrace Place, an independent living facility in the Pittsburgh area, via a 30-minute, onsite Q&A session. During the session, attendees were shown a live demo of the Obi robot, were given a chance to ask questions about the robot and the study, and could choose to sign up to participate. Due to constraints associated with running the study at the independent living facility, we were unable to conduct the bathing trials with the older adult population. Specifically, it was infeasible to transfer both the fully adjustable hospital bed and the overhead camera rig to the facility for the study. Instead, the recruited older adults only participated in nine feeding trials, which we conducted in an isolated kitchenette room at the independent living facility. Participants completed the feeding trials while seated in a lounge chair in front of a height-adjustable table.

C. Measures

Based on RQ1 and RQ2, introduced in Section I, we develop the following two hypotheses:

  • H1. Observable errors in robot behavior cause users to find the robot less trustworthy and cause them to be less open to future interaction with the robot.

  • H2. The impact of task-affecting robot errors on trust in the robotic system is greater among older adults.

We seek to substantiate our hypotheses using several measures administered at various points throughout our human study. To evaluate participants’ perception of the robot, we adapted seven questions from the Human-Computer Trust (HCT) Scale developed by Madsen et al. [27]. We omitted questions unrelated to the function of the feeding and bathing robots and re-worded the selected questions to be more relevant to the actual task (e.g. replacing the word "system" with "robot"). The seven finalized questions, which we administered as five-point Likert items (1 = Strongly Disagree, 5 = Strongly Agree) after each set of three trials, can be grouped into four constructs involved in trust formation as defined by Madsen et al. [27]: Reliability, Technical Competence, Understandability, and Faith. Each category had two associated Likert items, except Faith, which had only one. The questions, and their associated subscales, are summarized in Table II.

TABLE II: Likert items adapted from the HCT Scale [27]

TABLE III: Open-Ended Questions

At the end of all the trials, we also conducted a short open-ended survey, detailed in Table III, with each participant. For the study with older adults, we replaced the words "older adults" with "people" in Question 1, and replaced "for an older parent or grandparent" with "later in life" in Question 2. All participants were asked to respond to the open-ended questions verbally. Audio was recorded so that responses could be transcribed post-hoc and analyzed.

SECTION IV.

Results

We ran an in-lab study at Carnegie Mellon University (10 participants, 4 female, mean age 26.1±11.5), as well as at an independent living facility (9 participants, 8 female, mean age 81.9±7.6), for a total of 19 participants. For the in-lab environment, each participant completed 18 total trials: nine with the bathing robot and nine with the feeding robot. At the independent living facility, each participant completed nine trials with only the feeding robot. At both locations, the trials were followed by a set of open-ended questions. Due to the poor quality of the recorded audio from one of the participants at the university, their response to the open-ended questions could not be transcribed and had to be excluded from the analysis presented in Section IV-C.1.

Fig. 4: For each set of trials across both tasks and participant populations, we present the composite scores in four trust subscales. The composite scores are given by the sum of all Likert item responses within each subscale. The Reliability, Understandability, and Technical Competence categories are on a scale of 0-10 while the Faith category is on a scale of 0-5. Statistically significant differences in the subscale scores between sets are denoted with an asterisk.

Of the university participants, 5/10 indicated having some previous experience working with robots while none of the participants at the independent living facility reported having such experience. Older participants had a slightly higher median NARS score of 35 compared to younger participants, who had a median score of 29.5. However, we did not observe a statistically significant difference between the two groups after running a two-sample t-test (p=0.55).

The questionnaire shown in Table II was administered after each set of three trials for each task, resulting in three sets of responses per task. Throughout our analysis, we group responses to the Likert items by their subscale, summing the responses to questions within the same subscale to obtain a subscale score (within 0-10 for Reliability, Technical Competence, Understandability; within 0-5 for Faith). We used a Wilcoxon signed-rank test to perform a pairwise comparison of the subscale scores from all three sets of trials. Fig. 4 summarizes, for each task and participant population, the distribution of subscale scores after each set of trials. P-values from the pairwise set comparisons are expressed via annotations on the figures, with statistically significant results denoted with asterisks.
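The composite scoring can be sketched as follows; the item-to-subscale index mapping is an assumption based on Table II's structure (two items per subscale, one for Faith), not taken from the released code.

```python
# Assumed mapping from the seven Likert items (in questionnaire order)
# to the four trust subscales; see Table II for the actual items.
SUBSCALES = {
    "Reliability": (0, 1),
    "Technical Competence": (2, 3),
    "Understandability": (4, 5),
    "Faith": (6,),
}

def subscale_scores(responses):
    """Sum a participant's seven 1-5 Likert responses into one composite
    score per subscale."""
    return {name: sum(responses[i] for i in items)
            for name, items in SUBSCALES.items()}
```

The pairwise comparison between sets would then run a Wilcoxon signed-rank test (e.g. `scipy.stats.wilcoxon`) on the per-participant composite scores from each pair of sets.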

A. Bathing System

Between the first and second sets of responses, we observed a statistically significant difference in scores for the Reliability, Technical Competence, and Understandability categories. There was also a statistically significant difference in the same categories between the second and third sets. Fig. 4 shows that Likert responses decrease after errors are first introduced in the second set, but increase from the second to third set, almost returning to baseline levels. In fact, there was no appreciable difference in responses between the first and third sets, indicating that, after taking an initial hit when errors are first introduced, trust in the robot appears to rebound as the experiment progresses, despite continued mistakes. The cause of this phenomenon is not clear, but previous works have suggested that trust in a robot can be repaired based not only by system reliability but also system transparency and appearance [28]. Alternatively, it is also possible that the intentional errors made by the bathing robot were generally perceived as less severe than those made by the feeding robot. Ultimately, we fail to reject the null hypothesis for H1 for the bathing task because there is no statistically significant difference in participants’ trust towards the robot at the beginning and end of the study.
在第一组和第二组反应之间,我们在可靠性、技术能力和可理解性类别中观察到评分存在统计学上的显著差异。在第二组和第三组之间,这些类别中也存在统计学上的显著差异。 Fig. 4 显示,在第二组中,当错误首次引入后,李克特响应评分下降,但从第二组到第三组又上升,几乎恢复到基线水平。事实上,第一组和第三组之间的响应没有明显差异,这表明,尽管在错误首次引入时信任度有所下降,但随着实验的进行,对机器人的信任似乎会反弹,尽管仍然存在持续的错误。这种现象的原因尚不明确,但以往的研究表明,对机器人的信任不仅可以通过系统可靠性,还可以通过系统透明度和外观来修复 [28] 。或者,也有可能洗澡机器人故意犯的错误普遍被认为比喂食机器人犯的错误严重程度较低。 最终,在沐浴任务中,我们未能拒绝 H1 的原假设,因为在研究开始和结束时,参与者对机器人的信任没有统计学上的显著差异。

B. Feeding System

For the younger population, responses in all subscales except Faith showed a statistically significant decrease between both the first and second sets, as well as the first and third sets. There is no significant difference in responses between the second and third sets. From Fig. 4, we see that trust towards the robot decreases immediately following the introduction of errors in Set 2. However, unlike in the bathing task, trust responses towards the robot do not recover despite continued errors in Set 3, instead remaining practically unchanged from Set 2. From these results, we can reject the null hypothesis for H1 in the younger adult population for the feeding task.

In contrast to this distinct pattern observed in the younger adult group, the older population did not have any significant change in responses between any of the trial sets in any of the four question subscales. As a result, we fail to reject the null hypothesis for H1 for older adults in the feeding task.

To examine H2, we further compare the trust responses between the younger and older adult populations. For this comparison, we compute the difference in subscale scores between Set 1 and Set 2, denoted ∆S12, as well as between Set 1 and Set 3, denoted ∆S13. ∆S12 and ∆S13 represent the change in trust between the error-free trials and the trials where errors were introduced. In order to determine whether there is a statistically significant difference in how errors impact the trust of younger and older adults, we perform a two-sample t-test between the ∆S12 values of both populations, as well as between the ∆S13 values. Across all question categories, there are no statistically significant differences between the trust responses of the younger and older populations after errors were introduced. Therefore, for H2, we fail to reject the null hypothesis and thus cannot show that errors impact older adults' trust in the robot more than for younger adults. In fact, the opposite seemed to occur. We further explore the underlying reasons behind these results in our thematic analysis of participant responses to our set of open-ended questions.
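The two-sample t-test described above can be sketched in a few lines with scipy. The ∆S12 values below are invented placeholders, not the study's data; the sketch only illustrates the mechanics of the comparison.

```python
# Illustrative sketch of the between-group comparison: an independent
# two-sample t-test on per-participant trust changes (Delta-S12).
# All values are invented placeholders, not the study's data.
from scipy.stats import ttest_ind

# Hypothetical change in one subscale score from Set 1 to Set 2,
# one value per participant in each population.
delta_s12_younger = [-1.1, -0.4, -0.9, -0.6, -1.2, -0.3, -0.8, -0.7, -0.5]
delta_s12_older = [-0.6, -0.2, -1.0, -0.4, -0.9, 0.1, -0.7, -0.3, -0.5]

# Two-sample t-test between the two populations' deltas; a large p-value
# would indicate no significant difference in how trust changed.
t_stat, p_value = ttest_ind(delta_s12_younger, delta_s12_older)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

The same call would be repeated for the ∆S13 values and for each question subscale.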

C. Thematic Analysis of Open-Ended Responses

We conducted our thematic analysis of the open-ended responses to the questions summarized in Table III using QualCoder, a qualitative data analysis software [29].

1) Inductive Analysis

To develop a code bank from our open-ended responses, two researchers first independently completed an inductive analysis of the response data for bathing and feeding. We then resolved conflicts and duplicates between both sets of codes to generate a final set of mutually agreed-upon codes. This final set of codes was applied to all of the response data.

We found that, when discussing Q1, 4/9 younger adult participants found the bathing robot to be helpful while 3/9 cited reliability concerns. Across both study locations, 14/18 participants indicated that they were comfortable with the use of the feeding robot as a form of assistance. However, three participants, all older adults, clarified that they were only comfortable with robotic assistance to reduce the workload of a human caregiver or in the absence of human caregiving. They expressed a strong belief in the importance of maintaining human interaction with those who require assistance. Feedback of this nature was not given by any of the younger participants.

When discussing Q2, only 3/9 participants from the younger population said they might purchase the bathing system for an older parent or grandparent. Four participants stated that it was not competent enough to replace a human caregiver, while others expressed a general level of uncertainty about having a robot engage in bathing, deeming it a dangerous and/or sensitive task best assisted by a human. By contrast, responses to the feeding robot were much more positive, with 16/19 participants affirming that they would consider purchasing the robot. Two older adults indicated that personal independence would be a motivating factor for them to use this technology. Reasons cited by younger individuals included easing the burden on caregivers and giving parents or grandparents the option of independence.

Participants who would not consider purchasing the feeding robot gave judgments that were not specifically related to the robot's performance during the trials. Two individuals from the younger population stated they were uncomfortable with the idea of robots having full, or "too much," autonomy. One of the older adults indicated that, although they could think of an individual who would find use in the feeding technology, they were nervous that "she's real proud and stubborn, so she might not accept it." When asked whether they felt that this person's perception of a robot feeding her would be different than that of a person feeding her, they responded, "Yes, she could be [so] stubborn that she could think that way."

Fig. 5: Left Column: Pie charts representing the portion of individuals in each population who stated that they had previous experience with robots. Right Column: Pie charts representing the proportion of performance-based vs. non-performance-based statements made about the robot in each population.

Q3 was intended to assess whether or not participants perceived the robot’s intentional mistakes as genuine errors. When asked "Did the robot perform the task as you expected?", 6/9 younger participants specifically pointed out that the bathing robot made an error while 8/9 did the same for the feeding robot. However, only 4/9 of the older participants reported having observed errors with the feeding robot.

Among responses to Q4, most participants (12/18) did not believe there were any tasks robots should not assist with, or could not think of any. 6/18 participants across both groups mentioned that they would be uncomfortable having robots perform daily living tasks that they perceived as "intimate" or "dangerous", like toileting, shaving, or cooking. Three of these participants were in the younger population. Only one participant, a member of the younger group, cited bathing as an example of such a private task, indicating that very few of our participants had any negative bias towards any of the tasks in our study.

2) Deductive Analysis

While performing the inductive analysis, we observed a trend in the types of responses younger participants gave to Q1 and Q2 compared to older participants. It appeared that younger individuals were more likely to evaluate the robot based on its performance on the task, specifically citing whether or not the robot made mistakes or if they believed the robot completed the task well in their responses. For example, a younger participant gave the following performance-based evaluation of the feeding robot:

"I found the feeding robot to be really effective. I thought it was more consistent most of the time."

Older individuals, on the other hand, were more likely to evaluate the robot based on other factors unrelated to how well the robot performed the task. We consider the following statement, given by an older individual, to be a non-performance-based judgment of the feeding robot:

"If someone was in a home with only one caregiver, I wouldn’t use this - I would use the caregiver, it’s more personal."

To further investigate this, we developed two codes for "Performance-Based" and "Non-Performance-Based" statements and applied them to the open-ended responses in a round of deductive analysis. For the younger adult group, we only apply these codes to statements made about the feeding robot to ensure fair comparison between both populations. We summarize the results of this analysis in Fig. 5.

We observed that, across all older adults at the independent living facility, 83% of their statements regarding their openness to purchasing or using the robot were associated with factors unrelated to performance, such as the likelihood of acceptance by the user or concerns about the lack of human interaction. This mirrors how Likert results from older adults, shown in Fig. 4, had no significant differences between any set of responses, even after errors were introduced in the robot feeding trials. Their evaluation of the robot does not appear to be primarily grounded in its performance, so their trust in the system is relatively unaffected by error.

On the other hand, 48% of responses from individuals in the younger population specifically cited performance and observation of errors when describing whether or not they would consider using or purchasing the robot, while 52% of statements had to do with external factors like general concerns about automation. Compared to older adults, younger individuals’ trust in the robot, as assessed by their Likert item responses in Fig. 4, was more likely to be influenced by errors. Reinforcing this finding, younger adults were also more likely to cite errors when making a later judgment of the system in their open-ended responses. Interestingly, in both groups the proportion of performance-based to non-performance-based comments appears to be roughly aligned with the proportion of participants who did not have any experience with robots, as shown in Fig. 5.
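The tallying behind the proportions reported above (and visualized in the right-hand pie charts of Fig. 5) can be sketched as a simple count over coded statements. The two code labels come from the deductive analysis; the coded statements themselves are invented examples.

```python
# Minimal sketch of tallying deductively coded statements per population.
# The two code labels follow the paper; the statements are invented examples.
from collections import Counter

# Hypothetical (population, code) pairs from the deductive coding pass.
coded = [
    ("older", "Non-Performance-Based"),
    ("older", "Non-Performance-Based"),
    ("older", "Non-Performance-Based"),
    ("older", "Performance-Based"),
    ("younger", "Performance-Based"),
    ("younger", "Non-Performance-Based"),
    ("younger", "Performance-Based"),
    ("younger", "Non-Performance-Based"),
]

def code_proportions(statements, population):
    """Fraction of each code among one population's coded statements."""
    counts = Counter(code for pop, code in statements if pop == population)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}

print(code_proportions(coded, "older"))    # mostly non-performance-based
print(code_proportions(coded, "younger"))  # roughly even split
```

In the study, these per-population proportions (83% non-performance-based for older adults; 48% performance-based for younger adults) are what Fig. 5 reports.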

3) Debrief

As part of the debrief process at the end of the experiment, we asked participants if they had any suspicions that errors made by the robots were not genuine. Of all eighteen participants, only two, both from the younger population, explicitly indicated they had been suspicious that researchers had caused the errors intentionally. An additional two, one from each population, indicated that they thought something was off about the robot's erroneous behavior, but did not think that we had intentionally caused the mistakes. All other participants responded that they had not suspected any kind of manipulation in the robot's performance.

SECTION V.

Conclusion

In this work, we conduct a human study in two different age groups to assess how errors made while performing a physically assistive task, in this case robot-assisted bathing and feeding, impact user trust in the robot. For younger adults, we observed a statistically significant decrease in trust towards the robot after errors were introduced in both the bathing and feeding tasks. However, this trust recovered to baseline despite continued errors in the bathing task but did not recover for the feeding task. In contrast, older adults did not have any statistically significant changes in trust towards the robot before and after it began to make mistakes in the feeding task. Thematic analysis of open-ended responses from both groups revealed that older adults, and generally those with no experience with robots, tend to evaluate the robot on factors completely unrelated to performance. Our results suggest that, for both younger and older adults, trust in physically assistive robots can be resilient to errors depending on the assistive task. However, non-performance-based judgments of the robot may ultimately drive user evaluation of the system, especially if the user has less familiarity with robotic systems.

ACKNOWLEDGMENT

We would like to thank Kenna Embree, Lindsey Efkemann, and Melia Black for their assistance in coordinating this project at Vincentian Terrace Place. We also thank Elizabeth J. Carter for her invaluable insights and feedback.

