Xiaoxuan HEI

Ph.D. candidate
ENSTA, Institut Polytechnique de Paris

Contact:
Office R228
ENSTA - U2IS
828, Boulevard des Maréchaux,
91762 Palaiseau Cedex

Publications

Investigating the Impact of Humor on Learning in Robot-Assisted Education
Xiaoxuan Hei, Heng Zhang and Adriana Tapus
2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Social robots have shown significant potential in enhancing learning experiences, and humor has been proven to be beneficial for learning. This study investigates the impact of both the presence and timing of humor on students’ learning outcomes and overall learning experience. A total of 24 participants were randomly assigned to one of the three conditions: (C1) interact with a robot with no humor, (C2) interact with a robot with humor at pre-defined moments during the lesson, and (C3) interact with a robot that triggers humor based on engagement levels. The results revealed that the humor at pre-defined moments condition (C2) led to significantly better learning outcomes and longer interaction times compared to the other two conditions. While the adaptive humor in Condition C3 did not significantly outperform Condition C1, it showed positive effects on participants' perceived learning effectiveness and engagement. These findings contribute to the understanding of how humor, when strategically timed, can enhance the effectiveness of social robots in educational settings.


Learning from Human Conversations: A Seq2Seq based Multi-modal Robot Facial Expression Reaction Framework in HRI
Zhegong Shangguan, Xiaoxuan Hei, Fangjun Li, Chuang Yu, Siyang Song, Jianzhuang Zhao, Angelo Cangelosi and Adriana Tapus
2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Nonverbal communication plays a crucial role in both human-human and human-robot interactions (HRIs), where facial expressions convey emotions, intentions, and trust. Enabling humanoid robots to generate human-style facial reactions in response to human speech and facial behaviours remains a significant challenge. In this work, we leverage human-human interaction (HHI) datasets to train a humanoid robot, enabling it to learn and imitate facial reactions to both speech and facial expression inputs. Specifically, we extend a sequence-to-sequence (Seq2Seq)-based framework that enables robots to simulate human-style virtual facial expressions appropriate for responding to the perceived human user behaviours. We then propose a deep neural network-based motor mapping model to translate these expressions into physical robot movements. Experiments demonstrate that our facial reaction–motor mapping framework successfully enables robotic self-reactions to various human behaviours; our model best predicts 50 frames (two seconds) of facial reactions in response to input user behaviour of the same duration, aligning with human cognitive and neuromuscular processes. Our code is provided at https://github.com/mrsgzg/Robot_Face_Reaction.
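
As a rough illustration of the two-stage idea described above (not the authors' released implementation; see the GitHub link for that), the Python sketch below pairs a small Seq2Seq encoder-decoder that predicts a 50-frame facial reaction with an MLP motor-mapping model. All dimensions and layer sizes are assumptions.

```python
# Illustrative sketch only; feature and action-unit dimensions are assumptions.
import torch
import torch.nn as nn

class Seq2SeqFacialReaction(nn.Module):
    """Encode ~2 s of observed user behaviour, decode a 50-frame facial reaction."""
    def __init__(self, in_dim=152, au_dim=17, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hidden, batch_first=True)   # reads speech + face features
        self.decoder = nn.GRU(au_dim, hidden, batch_first=True)   # unrolls the reaction frame by frame
        self.head = nn.Linear(hidden, au_dim)

    def forward(self, user_seq, horizon=50):
        _, h = self.encoder(user_seq)                 # summarize the observed frames
        frame = torch.zeros(user_seq.size(0), 1, self.head.out_features)
        outputs = []
        for _ in range(horizon):                      # autoregressive decoding
            out, h = self.decoder(frame, h)
            frame = self.head(out)
            outputs.append(frame)
        return torch.cat(outputs, dim=1)              # (batch, horizon, au_dim)

class MotorMapping(nn.Module):
    """Map predicted facial action units to robot motor commands."""
    def __init__(self, au_dim=17, motor_dim=12):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(au_dim, 64), nn.ReLU(), nn.Linear(64, motor_dim))

    def forward(self, au_seq):
        return self.net(au_seq)

model, mapper = Seq2SeqFacialReaction(), MotorMapping()
user_seq = torch.randn(1, 50, 152)                    # 50 frames (~2 s) of user features
motor_cmds = mapper(model(user_seq))
print(motor_cmds.shape)                               # torch.Size([1, 50, 12])
```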


Estimating User Engagement in Human Robot Interaction Using a Dynamic Bayesian Network
Xiaoxuan Hei, Heng Zhang and Adriana Tapus
2025 IEEE International Conference on Robotics and Automation (ICRA)

Engagement is a key concept in Human-Robot Interaction (HRI), as high engagement often leads to improved user experience and task performance. However, accurately estimating engagement during interactions is challenging. In this study, we propose a Dynamic Bayesian Network (DBN) to infer user engagement from various modalities, including head rotation, eye movements, and facial expressions captured through visual sensors, as well as facial temperature variations measured by a thermal camera. Data were gathered from a human-robot interaction experiment in which a robot guided participants and encouraged them to share their thoughts and insights on environmental issues. Our approach successfully combines these diverse features to offer a thorough assessment of user engagement. The network was tested on its capacity to classify participants as either engaged or not engaged, achieving an accuracy of 0.83 and an Area Under the Curve (AUC) of 0.82. These findings underscore the strength of our DBN in detecting user engagement during interactions.
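
For readers unfamiliar with dynamic Bayesian networks, the minimal sketch below illustrates the general filtering idea on a two-state (engaged / not engaged) model with a few discretized multimodal observations. The structure, feature discretization, and every probability are invented for illustration; they are not the network or the values used in the paper.

```python
# Minimal two-state DBN with forward filtering; all numbers are made up.
import numpy as np

states = ["not_engaged", "engaged"]
prior = np.array([0.5, 0.5])
transition = np.array([[0.8, 0.2],     # P(s_t | s_{t-1} = not_engaged)
                       [0.1, 0.9]])    # P(s_t | s_{t-1} = engaged)

# One binary observation per modality (hypothetical discretization of the cues).
likelihood = {
    "gaze_on_robot": np.array([0.3, 0.8]),   # P(obs = 1 | state)
    "face_positive": np.array([0.4, 0.7]),
    "head_toward":   np.array([0.3, 0.9]),
    "temp_drop":     np.array([0.4, 0.6]),
}

def filter_step(belief, obs):
    """One DBN time slice: predict with the transition model, then update with each modality."""
    belief = transition.T @ belief
    for name, value in obs.items():
        p = likelihood[name]
        belief = belief * (p if value else 1.0 - p)
    return belief / belief.sum()

belief = prior
frames = [
    {"gaze_on_robot": 1, "face_positive": 1, "head_toward": 1, "temp_drop": 0},
    {"gaze_on_robot": 1, "face_positive": 0, "head_toward": 1, "temp_drop": 1},
    {"gaze_on_robot": 0, "face_positive": 0, "head_toward": 0, "temp_drop": 0},
]
for obs in frames:
    belief = filter_step(belief, obs)
    print(dict(zip(states, belief.round(3))))   # running engagement belief per frame
```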


“Oh! It’s Fun Chatting with You!” A Humor-aware Social Robot Chat Framework
Heng Zhang, Adnan Saood, Juan Jose Garcia Cardenas, Xiaoxuan Hei and Adriana Tapus
2025 IEEE International Conference on Robotics and Automation (ICRA)

Humor is a key element in human interactions, essential for building connections and rapport. To enhance human-robot communication, we developed a humor-aware chat framework that enables robots to deliver contextually appropriate humor. This framework takes into account the interaction environment, the user’s profile, and the user’s emotional state. Two GPT models are used to generate responses. The first, named sensor-GPT, processes contextual data from the sensors along with the user’s response and the conversation history to create prompts for the second, chat-GPT. These prompts guide the model on how to integrate appropriate humor elements into the conversation, ensuring that the dialogue is both contextually relevant and humorous. Our experiment compared the effectiveness of humor expression between our framework and the GPT-4o model. The results demonstrate that robots using our framework significantly outperform those using GPT-4o in humor expression, extending conversations, and improving overall interaction quality.
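
A minimal sketch of the two-stage prompting idea follows. The prompt wording, function names, and sensor fields are hypothetical and only illustrate how a first model call could turn sensed context into humor instructions for a second, reply-generating call; this is not the framework's actual implementation.

```python
# Illustrative two-stage pipeline: "sensor-GPT" prepares humor instructions,
# "chat-GPT" generates the robot's reply. Prompts and fields are assumptions.
from openai import OpenAI

client = OpenAI()

def sensor_gpt(context: dict, user_utterance: str, history: list) -> str:
    """Summarize sensed context into humor guidance for the dialogue model."""
    prompt = (
        "You prepare instructions for a social robot's dialogue model.\n"
        f"Environment: {context['environment']}\n"
        f"User profile: {context['profile']}\n"
        f"Estimated emotion: {context['emotion']}\n"
        f"Conversation so far: {history}\n"
        f"Latest user utterance: {user_utterance}\n"
        "Decide whether humor is appropriate now, which humor style to use, "
        "and return a short instruction for the reply generator."
    )
    out = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return out.choices[0].message.content

def chat_gpt(instruction: str, user_utterance: str) -> str:
    """Generate the robot's reply, following the humor instruction."""
    out = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": user_utterance},
        ],
    )
    return out.choices[0].message.content

context = {"environment": "quiet lab", "profile": "student, likes wordplay", "emotion": "slightly bored"}
utterance = "This lesson is taking forever..."
print(chat_gpt(sensor_gpt(context, utterance, []), utterance))
```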


Exploring Cognitive Load Dynamics in Human-Machine Interaction for Teleoperation: A User-Centric Perspective on Remote Operation System Design
Juan Jose Garcia Cardenas, Xiaoxuan Hei and Adriana Tapus
2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Teleoperated robots, especially in hazardous environments, integrate human cognition with machine efficiency, but can increase cognitive load, causing stress and reducing task performance and safety. This study examines the impact of the information available to the operator on cognitive load, physiological responses (e.g., GSR, blinking, facial temperature), and performance during teleoperation in three conditions: C1 - in presence, C2 - remote with visual feedback, and C3 - remote with a telepresence robot. The findings from our user study involving 20 participants show that information availability significantly impacts perceived cognitive load, as evidenced by the differences observed between conditions in our analysis. Furthermore, the results indicate that blinking rates varied significantly among the conditions, and that individuals with higher error scores on the spatial orientation test (SOT), reflecting lower spatial ability, are more likely to experience failure in conditions 2 and 3. Overall, information availability significantly affects cognitive load and teleoperation performance, especially the perception of depth in the robot’s actions. Additionally, the thermal and GSR findings indicate increased stress and anxiety levels when operators perform conditions 2 and 3, corroborating an increase in the user’s cognitive load.


Exploring Help-Seeking Behavior, Performance, and Cognitive Load in Individual Tutoring: A Comparative Study between Human Tutors and Social Robots
Xiaoxuan Hei, Heng Zhang and Adriana Tapus
2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN)

Social robots have become increasingly prevalent in the context of one-on-one tutoring, serving as effective educational aids. In response to this trend, the present study was devised to conduct a comparative analysis between human tutors and robot tutors. Additionally, the study investigates how varying prior knowledge of robots influences students’ help-seeking tendencies. By examining the performance and physiological signals of participants, this research seeks to provide valuable insights into the effectiveness of social robots in educational contexts. 21 participants were divided into three groups: one sought assistance from a human tutor (HT), one from a robot without any prior knowledge of the robot (RT1), and one from a robot after gaining some understanding of its capabilities (RT2). Our results demonstrate that participants sought more help from the robot tutors than from the human tutor, and that participants in Group RT2 performed better than participants in Group RT1. However, participants experienced greater cognitive load when interacting with a robot tutor compared to interacting with a human tutor. Future work could focus on developing interventions to alleviate students’ cognitive load during interactions with robot tutors.


Robot Laughter: Does an appropriate laugh facilitate the robot’s humor performance?
Heng Zhang, Xiaoxuan Hei, Junpei Zhong and Adriana Tapus
2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN)

Laughter serves as a subtle social signal in human interaction, playing an essential role in expressing emotions and facilitating social connections. However, laughter comes in various forms and is usually accompanied by different non-verbal expressions, such as facial expressions and gestures. These accompanying factors can significantly influence the effect of laughter in diverse contexts, thus complicating the research on laughter, especially in understanding its role in social dynamics. Consequently, endowing robots with the ability to appropriately use laughter in interactions with humans is still a big challenge. Our current study focuses on the effect of robot laughter on robot humor expression. Our objective is to investigate whether and how two factors, the type of laughter and the robot laughter gesture, impact the overall humor performance. In this study, we selected four types of laughter (sarcastic, joyful, embarrassed, and relieved laughter) from a laughter corpus based on four specific types of jokes (Affiliative, Aggressive, Self-enhancing, and Self-defeating). For each type of laughter, we designed distinct robot gestures. During the humor performance, the robot NAO delivered jokes accompanied by matching or mismatching laughter, with or without corresponding gestures. To enhance the quantity and diversity of experimental data, we conducted an online survey utilizing recordings of the robot’s humor performance. The experimental findings indicate that when the robot’s laughter matches the type of humor in the joke, participants rate the humor performance significantly higher compared to situations where there is a mismatch. Additionally, the results confirm the positive impact of robot laughter gestures on humor performance.


Toward a Multi-dimensional Humor Dataset for Social Robots
Heng Zhang, Xiaoxuan Hei, Juan Jose Garcia Cardenas, Xin Miao and Adriana Tapus
2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN)

Expressing humor in social interactions presents a significant challenge for humans due to its intricate linguistic nature. This complexity is further magnified when teaching robots to express humor appropriately. Among the various expressions of humor, jokes are one of the most commonly used. Therefore, a well-annotated joke dataset holds significant promise in enhancing a robot’s ability to express humor effectively. This paper introduces a dataset comprising over two thousand jokes, with the aim of providing rich material and multidimensional selection criteria for the humor expression of the robot. The creation of this joke dataset involved a collaborative effort among robot experts studying HRI, psychologists with rich humor research experience, and GPT-3. The annotation process primarily concentrated on four dimensions within the dataset: the humor style of jokes, semantic words (aligned with semantic gestures), keywords, and ratings of joke funniness. We additionally outline several prospective applications of this dataset. We introduce a BERT-based neural network model trained on the dataset with semantic word labels. This model aims to empower robots to choose suitable semantic words from jokes and articulate them alongside corresponding semantic gestures. Moreover, we offer suggestions for utilizing jokes from this dataset to facilitate the adaptive expression of humor by social robots. These endeavors will further enhance the multi-modal humor expression capability of social robots.
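
As one example of the prospective applications mentioned above, the sketch below shows how a BERT token-classification model could be set up to tag the semantic words of a joke (the words to be paired with semantic gestures). The label set, base checkpoint, and setup are assumptions rather than the paper's released model, and the weights would still need to be fine-tuned on the dataset before the predictions mean anything.

```python
# Illustrative BERT token-classification setup for tagging "semantic words" in a joke.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "SEMANTIC"]                      # hypothetical tag set
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels)
)

joke = "I told my robot a joke about batteries, but it got no charge out of it."
enc = tokenizer(joke, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits                # (1, seq_len, num_labels)
pred = logits.argmax(-1)[0]

# Tokens predicted as SEMANTIC would be spoken together with a matching gesture.
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
semantic_words = [t for t, p in zip(tokens, pred) if labels[p] == "SEMANTIC"]
print(semantic_words)   # meaningless until the head is fine-tuned on the joke dataset
```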


A bilingual social robot with sign language and natural language
Xiaoxuan Hei, Chuang Yu, Heng Zhang and Adriana Tapus
HRI '24: Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction

In situations where both deaf and non-deaf individuals are present in a public setting, it would be advantageous for a robot to communicate using both sign and natural languages simultaneously. This would not only address the needs of diverse users but also pave the way for a richer and more inclusive spectrum of human-robot interactions. To achieve this, a framework for a bilingual robot is proposed in this paper. The robot can articulate messages in spoken language, complemented by non-verbal cues such as expressive gestures, while concurrently conveying information through sign language. The system generates natural language expressions with speech audio, spontaneous prosody-based gestures, and sign language displayed by a virtual avatar on the robot’s screen. The preliminary findings from this research showcase the robot’s capacity to seamlessly blend natural language expressions with synchronized gestures and sign language, underlining its potential to revolutionize communication dynamics in diverse settings.


Semantic gesture in robot humor: A new dimension to enhance the humor expression
Heng Zhang, Chuang Yu, Xiaoxuan Hei and Adriana Tapus
HRI '24: Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction

Humor is pervasive in our daily life. It serves not only to build rapport but also to ease tension during interactions and create stronger social connections. In the realm of Human-Robot Interaction (HRI), humor also plays a vital role in fostering engaging and positive interactions. However, endowing robots with the ability to express humor appropriately is still a challenge. Drawing inspiration from pantomime and sign language humor, our research focuses on the role of semantic gestures in a social robot’s expression of humor. In this work, we conducted an experiment in which the NAO robot gave humorous performances using a series of semantic gestures. The results of the online survey show that semantic gestures can significantly enhance the perceived funniness of the robot’s humor performance. Furthermore, the impact of a semantic gesture is closely tied to both the clarity of its expression and the appropriateness of the chosen semantic words.


Evaluating Students’ Experiences in Hybrid Learning Environments: A Comparative Analysis of Kubi and Double Telepresence Robots
Xiaoxuan Hei, Valentine Denis, Pierre-Henri Oréfice, Alia Afyouni, Paul Laborde, Damien Legois, Ioana Ocnarescu, Margarita Anastassova and Adriana Tapus
2023 15th International Conference on Social Robotics (ICSR)

Amidst the Covid-19 pandemic, distance learning was employed on an unprecedented level. As the lockdown measures have eased, it has become a parallel option alongside traditional in-person learning. Nevertheless, the utilization of basic videoconferencing tools such as Zoom, Microsoft Teams, and Google Meet comes with a multitude of constraints that extend beyond technological aspects. These limitations are intricately linked not only with human behavior and psychology but also with pedagogy, drastically changing the interactions that take place during learning. Telepresence robots have been widely used due to their advantages in enhancing a sense of in-person presence. To investigate the opportunities, the impact, and the risks associated with the usage of telepresence robots in an educational context, we conducted an experiment in a real setting, in the specific use case of a design school and a project-based class. We are interested in the experience of the classroom and the relationships between a distance student, his/her peers, and the professor/instructor. This study employed two types of robots: a Kubi robot (a semi-static tablet-based system) and a Double robot (a mobile telepresence robot). The primary objective was to ascertain the perceptions and experiences of both remote and in-person students during their interaction with these robots. The results of the study demonstrate a marked preference among students for the Double robot over the Kubi, as indicated by their feedback.


Robots in education: Influence of Regulatory Focus Theory
Xiaoxuan Hei, Heng Zhang and Adriana Tapus
2023 32nd IEEE International Conference on Robot and Human Interactive Communication (ROMAN)

The Covid-19 pandemic massively expanded the use of distance learning. The limits of this practice have gradually come to light, both for students and for teachers. It is now crucial to design alternative solutions that overcome the shortcomings of videoconferencing in terms of involvement, concentration, learning, and equity. Social robots are increasingly used as tutors in the educational context and help improve teaching efficiency. Many psychology-based principles have been applied in education to guide instructional strategies, motivate students, and create a positive and productive learning environment. In this work, we use Regulatory Focus Theory (RFT), which categorizes an individual’s motivation into two types: Promotion and Prevention. Promotion-focused individuals are motivated by the potential for growth and achievement, whereas prevention-focused individuals are motivated by the potential for avoiding negative outcomes. Based on RFT, we aim to explore whether and how the regulatory-focused behavior of a tutor robot can affect participants’ learning outcomes. A language learning scenario was designed with two conditions: (1) a robot tutor with promotion-focused behavior, and (2) a robot tutor with prevention-focused behavior. The results are encouraging and suggest that a promotion-focused robot tutor can increase the learning efficiency of promotion-focused participants, and that a prevention-focused robot tutor enhances the learning interest of prevention-focused participants.


Speech-Driven Robot Face Action Generation with Deep Generative Model for Social Robots
Chuang Yu, Heng Zhang, Zhegong Shangguan, Xiaoxuan Hei, Angelo Cangelosi and Adriana Tapus
2022 14th International Conference on Social Robotics (ICSR)

Natural co-speech facial action, as a kind of non-verbal behavior, plays an essential role in human communication and also contributes to natural and friendly human-robot interaction. However, many previous approaches to robot speech-driven behaviour generation are rule-based or handcrafted, which makes them time-consuming and limits the synchronization between speech and facial action. Based on a Generative Adversarial Network (GAN), this paper develops an effective speech-driven facial action synthesizer: given an acoustic speech signal, a synchronous and realistic 3D facial action sequence is generated. In addition, a mapping from the generated 3D human facial action to the real robot facial action that drives the Zeno robot’s facial expressions is also provided. The evaluation results show the model has potential for natural human-robot interaction.
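
A minimal conditional-GAN sketch of this idea follows. The architectures and dimensions (e.g., MFCC inputs, a generic facial parameter output) are assumptions chosen for illustration, not the model from the paper: the generator maps a speech feature sequence plus noise to a facial action sequence, and the discriminator judges whether a (speech, face) pair looks real and synchronized.

```python
# Illustrative conditional-GAN sketch; dimensions and architectures are assumptions.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Map a speech feature sequence (plus noise) to a synchronized facial action sequence."""
    def __init__(self, speech_dim=26, noise_dim=16, face_dim=51, hidden=128):
        super().__init__()
        self.noise_dim = noise_dim
        self.rnn = nn.GRU(speech_dim + noise_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, face_dim)

    def forward(self, speech):                                # speech: (B, T, speech_dim)
        noise = torch.randn(speech.size(0), speech.size(1), self.noise_dim)
        h, _ = self.rnn(torch.cat([speech, noise], dim=-1))
        return self.out(h)                                    # (B, T, face_dim)

class Discriminator(nn.Module):
    """Score whether a (speech, facial action) pair looks real and synchronized."""
    def __init__(self, speech_dim=26, face_dim=51, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(speech_dim + face_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, speech, face):
        h, _ = self.rnn(torch.cat([speech, face], dim=-1))
        return torch.sigmoid(self.out(h[:, -1]))              # one real/fake score per clip

G, D = Generator(), Discriminator()
speech = torch.randn(2, 100, 26)                              # 2 clips, 100 frames of MFCC features
fake_face = G(speech)
print(fake_face.shape, D(speech, fake_face).shape)            # (2, 100, 51) and (2, 1)
```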



Review

HRI 2024

ICRA 2024, 2025

IROS 2024, 2025

ROMAN 2024, 2025

ICDL 2025

ICSR 2025

TAROS 2025

Journal: Advanced Robotics



Organization

Workshop IROS 2025

Enhancing Human Engagement in Social Assistive Robotics: Exploring Interaction, Cognitive Load, and Adaptive Support

Workshop ICSR 2025

Cognitive Load and Engagement in Human-Robot Collaboration: From Robots in Hazardous Environments to Educational Applications