Abstract:
A significant part of our time is now spent sharing multimodal data on social media sites such as Instagram, Facebook and Twitter. The way users present themselves on social media can provide useful insights into their behaviours, personalities, perspectives, motives and needs. This article proposes using multimodal data collected from Instagram accounts to predict the five basic prototypical needs described in Glasser's choice theory (i.e., Survival, Power, Freedom, Belonging, and Fun). We automate the identification of unconsciously perceived needs from Instagram profiles using both visual and textual content. The proposed approach aggregates visual and textual features extracted using deep learning and constructs a homogeneous representation for each profile through the proposed Bag-of-Content. Finally, we perform multi-label classification on the fusion of both modalities. We validate our proposal on a large database, consensually annotated by two expert psychologists, with more than 30,000 images, captions and comments. Experiments show promising accuracy and indicate that visual and textual cues carry complementary information.
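
To make the pipeline summarised above concrete, the following is a minimal sketch of one possible reading of it, not the implementation evaluated in this article. The abstract does not specify how Bag-of-Content is built, so the sketch assumes a bag-of-words-style histogram: each per-item feature vector is assigned to its nearest entry in a codebook, and the normalised counts form the profile representation. The codebooks, the function names (`bag_of_content`, `fuse`, `predict_needs`) and the linear scorer with one independent decision per need are all hypothetical stand-ins for the deep features and the classifier actually used.

```python
from math import dist

# The five needs from Glasser's choice theory, as listed in the abstract.
NEEDS = ["Survival", "Power", "Freedom", "Belonging", "Fun"]


def bag_of_content(item_features, codebook):
    """Assumed Bag-of-Content: a normalised histogram of nearest-codeword
    assignments, giving a fixed-length vector regardless of how many
    items (images or texts) the profile contains."""
    counts = [0] * len(codebook)
    for vec in item_features:
        # Index of the codeword closest to this item (Euclidean distance).
        nearest = min(range(len(codebook)), key=lambda k: dist(vec, codebook[k]))
        counts[nearest] += 1
    total = sum(counts) or 1  # avoid division by zero for empty profiles
    return [c / total for c in counts]


def fuse(visual_hist, textual_hist):
    """Assumed fusion: plain concatenation of the two modality histograms."""
    return visual_hist + textual_hist


def predict_needs(profile_vector, weights, biases, threshold=0.0):
    """Multi-label prediction with one independent linear scorer per need
    (hypothetical; a profile may receive zero, one, or several labels)."""
    labels = []
    for need, w, b in zip(NEEDS, weights, biases):
        score = sum(wi * xi for wi, xi in zip(w, profile_vector)) + b
        if score > threshold:
            labels.append(need)
    return labels
```

In this sketch a profile with any number of posts maps to a vector whose length is the sum of the two codebook sizes, which is what allows a single classifier to be applied across profiles; the weights would have to be learned from annotated profiles.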