Food related scene recognition in egocentric images

Bibliographic details
Title: Food related scene recognition in egocentric images
Authors: Leyva Vallina, María
Contributors: Radeva, Petia
Source: Recercat. Dipósit de la Recerca de Catalunya
UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
Publication information: Universitat Politècnica de Catalunya, 2017.
Publication year: 2017
Subject terms: Neural networks (Computer science), scene recognition, reconeixement d'escena, Informàtica [Àrees temàtiques de la UPC], aprenentatge profund, Visió per ordinador, Xarxes neuronals (Informàtica), lifelogging, deep learning, egocentric images, imatges egocèntriques, computer vision, visió computacional
Description: In collaboration with the Universitat de Barcelona (UB). Lifelogging is a growing field, driven by the spread of devices that collect data from our daily routines. Egocentric cameras are particularly interesting devices that capture very rich information about the life of the wearer, including their social interactions, activities, and the contexts where they spend the day. Context, or scene, is one of the things that influences us most in almost every aspect of our lives, and also one of the most challenging to log, analyze, and visualize with an automatic device. Among all kinds of context, one of the most important is the one related to food: we are what we eat, and we eat depending on where we are. To keep track of a person's relation with food-related environments, we propose a deep-learning-based approach to food-related scene recognition in images gathered from an egocentric camera. We explore the problem in detail and propose an optimal framework for food-related environment recognition. Moreover, we introduce a new egocentric dataset called Egoplaces, which contains over 60,000 labeled images distributed in 28 categories (27 food-related scenes and one non-food-related), and we propose several techniques to automatically classify the environment the user is seeing. We faced several challenges, including a small number of images, images with a narrow field of view and noise, and, in particular, a very unbalanced dataset. We propose several techniques to deal with this imbalance, using deep convolutional networks for classification and varying the training strategy. We explore learning incrementally through several training iterations, introducing new categories in each and starting with the most frequent labels.
We also propose a hierarchical learning strategy that exploits the semantic relations among the labels, learning from less specific to more specific categories. We explore the possibility of applying Bayesian inference in hierarchical classification. Finally, we propose introducing repeated images into our dataset to overcome the class imbalance, and a post-classification smoothing technique based on the K-Nearest Neighbours algorithm that exploits the fact that egocentric images come in a sequence.
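The post-classification smoothing idea can be illustrated with a minimal sketch. The record does not give the thesis's actual procedure, so everything here is an assumption: the function name `smooth_labels`, the parameter `k`, and the choice of a majority vote over the `k` temporally nearest frames on each side of an image are hypothetical stand-ins for the K-Nearest Neighbours based method the abstract mentions.

```python
from collections import Counter


def smooth_labels(predicted, k=2):
    """Smooth per-frame labels using their temporal neighbours.

    Assumption: "nearest neighbours" means the k frames before and the
    k frames after each image in the capture sequence. This is only an
    illustration, not the method described in the thesis.
    """
    smoothed = []
    for i, label in enumerate(predicted):
        # Window of the frame itself plus up to k neighbours on each side.
        window = predicted[max(0, i - k): i + k + 1]
        counts = Counter(window)
        top = max(counts.values())
        if counts[label] == top:
            # The original prediction is (one of) the most common: keep it.
            smoothed.append(label)
        else:
            # Otherwise adopt the majority label of the window.
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


frames = ["kitchen", "kitchen", "bar", "kitchen", "kitchen"]
print(smooth_labels(frames, k=1))
```

In this toy sequence the isolated "bar" prediction is surrounded by "kitchen" frames, so the vote replaces it. A larger `k` smooths more aggressively but may erase short, genuine scene changes.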
File description: application/pdf
Language: English
Open access: https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::e4d90885e24f48b0e8eae12f60681325
https://hdl.handle.net/2117/105658
Rights: OPEN
Accession number: edsair.dedup.wf.001..e4d90885e24f48b0e8eae12f60681325
Database: OpenAIRE