Data-efficient multimodal human action recognition for proactive human–robot collaborative assembly: A cross-domain few-shot learning approach

Bibliographic details
Title: Data-efficient multimodal human action recognition for proactive human–robot collaborative assembly: A cross-domain few-shot learning approach
Authors: Wang, Tianyu; Liu, Zhihao; Wang, Lihui; Li, Mian; Wang, Xi Vincent (Dr., 1985)
Source: Robotics and Computer-Integrated Manufacturing, vol. 89
Subject terms: Cross-domain few-shot learning, Data-efficient, Human action recognition, Human–robot collaborative assembly, Multimodal
Abstract: With the recent vision of Industry 5.0, the cognitive capability of robots plays a crucial role in advancing proactive human–robot collaborative assembly. As a basis of mutual empathy, the understanding of a human operator's intention has been primarily studied through the technique of human action recognition. Existing deep learning-based methods demonstrate remarkable efficacy in handling information-rich data such as physiological measurements and videos, where the latter category represents a more natural perception input. However, deploying these methods in new, unseen assembly scenarios requires first collecting abundant case-specific data, which leads to significant manual effort and poor flexibility. To deal with this issue, this paper proposes a novel cross-domain few-shot learning method for data-efficient multimodal human action recognition. A hierarchical data fusion mechanism is designed to jointly leverage skeletons, RGB images and depth maps with complementary information. A temporal CrossTransformer is then developed to enable action recognition with a very limited amount of data. Lightweight domain adapters are integrated to further improve generalization with fast finetuning. Extensive experiments on a real car engine assembly case show the superior performance of the proposed method over the state-of-the-art in both accuracy and finetuning efficiency. Real-time demonstrations and an ablation study further indicate the potential of early recognition, which is beneficial for robot procedure generation in practical applications. In summary, this paper contributes to the rarely explored realm of data-efficient human action recognition for proactive human–robot collaboration.
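To make the abstract's few-shot setting concrete, the sketch below shows a simplified episodic classifier over fused multimodal features. This is a hypothetical illustration, not the paper's method: the hierarchical fusion mechanism is replaced by plain concatenation, and the temporal CrossTransformer by nearest-prototype matching; all feature dimensions and helper names are invented for the example.

```python
import numpy as np

def fuse_modalities(skeleton, rgb, depth):
    """Naive late fusion: concatenate per-modality feature vectors.
    (The paper describes a hierarchical fusion mechanism; this
    stand-in simply concatenates the three feature vectors.)"""
    return np.concatenate([skeleton, rgb, depth])

def classify_episode(support_feats, support_labels, query_feat):
    """Few-shot episode: build one prototype (class mean) per action
    class from the small support set, then assign the query sample
    to the class with the nearest prototype."""
    classes = sorted(set(support_labels))
    protos = {c: np.mean([f for f, l in zip(support_feats, support_labels) if l == c], axis=0)
              for c in classes}
    dists = {c: np.linalg.norm(query_feat - p) for c, p in protos.items()}
    return min(dists, key=dists.get)

# Toy usage: a 2-way, 2-shot episode with made-up feature vectors.
rng = np.random.default_rng(0)
support = [fuse_modalities(rng.normal(c, 0.1, 8), rng.normal(c, 0.1, 16), rng.normal(c, 0.1, 4))
           for c in (0, 1) for _ in range(2)]
labels = [0, 0, 1, 1]
query = fuse_modalities(np.full(8, 1.0), np.full(16, 1.0), np.full(4, 1.0))
print(classify_episode(support, labels, query))  # query lies near the class-1 prototype
```

The episodic structure (tiny labeled support set per new scenario, no retraining) is what makes such methods data-efficient when deployed in unseen assembly cases.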
File description: print
Open access: https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-346813
https://doi.org/10.1016/j.rcim.2024.102785
Database: SwePub
Description
ISSN: 0736-5845
EISSN: 1879-2537
DOI: 10.1016/j.rcim.2024.102785