A paper by Koushi Hiraoka (Graduate School of Information Science and Electrical Engineering) has been accepted for publication in IEEE Access.
Congratulations!
Authors
Koushi Hiraoka, Yugo Nakamura, Yutaka Arakawa
Affiliation
Graduate School of Information Science and Electrical Engineering,
Department of Information Science and Technology
Manuscript Title
EdgeVLM as a Privacy Filter: Towards Privacy-Aware Activity Recognition from Wearable Camera Using Image Captions
Abstract
Egocentric video captured by wearable cameras offers rich contextual information for recognizing human activities in daily life. However, such video often includes sensitive personal details that must be protected from external threats. This creates a fundamental trade-off between preserving data utility and ensuring privacy, particularly in scenarios where continuous activity monitoring is required. In this study, we explore the concept of using EdgeVLM—a vision-language model designed to run entirely on edge devices—as a privacy filter for wearable camera data. We investigate the impact of caption granularity and demonstrate that our method locally transforms egocentric video into semantically rich textual image captions, enabling activity detection without transmitting raw visual content to the cloud. This edge-based processing preserves contextual cues while minimizing privacy risks through data minimization. To evaluate this approach, we conducted a quantitative user study (N=88) that found EdgeVLM-generated captions notably decreased participants’ privacy concerns compared to raw, blurred, or cartoonized images. Critically, from a bystander’s perspective, the proposed method demonstrated privacy protection levels statistically comparable to canny edge detection. Additionally, when combined with accelerometer data, the caption-based method achieved a 77.2% accuracy in recognizing desk activities such as typing, mousing, swiping, drinking, and writing—effectively replacing pixel-level visual information with text while maintaining performance comparable to models using unfiltered visuals. These findings indicate that EdgeVLM-based image captioning is a promising privacy-conscious solution for wearable camera applications, facilitating continuous activity recognition while protecting user privacy at the edge.
Journal name
IEEE Access
Relevant SDGs
SDG 12 (Responsible consumption and production)
Comments
This paper addresses the trade-off between privacy and recognition accuracy in activity recognition using wearable cameras. A key contribution of this work is the application of a local VLM as a privacy filter, with a specific focus on satisfying users’ subjective privacy requirements. By utilizing locally generated captions, our approach significantly enhances user assurance compared to other privacy filters, while achieving recognition accuracy comparable to that of raw, unprocessed images.