Real-time facial emotion detection system using deep learning to classify seven emotion categories from video files or a live camera feed
Understanding human emotions through facial expressions is critical for applications ranging from mental health assessment and customer experience analysis to human-computer interaction and security systems. Traditional emotion recognition methods rely on manual observation and subjective interpretation, which are time-consuming, inconsistent across observers, limited in scale, and unable to process real-time data streams. Existing automated systems often struggle with varying lighting conditions, different facial orientations, and diverse ethnic backgrounds, and many require expensive hardware or cloud processing that raises privacy concerns.
There is a significant need for an accurate, real-time, privacy-preserving emotion recognition system that can operate on standard hardware while processing both recorded videos and live camera feeds. This project addresses these challenges by developing a deep learning-based facial emotion recognition system that combines the RFB-320 SSD for robust face detection with the FER+ ONNX model for emotion classification across seven categories (neutral, happiness, surprise, sadness, anger, disgust, fear). The system processes 64x64 grayscale face images, monitors performance in real time, provides visual feedback through emotion-specific imagery, and operates entirely on local hardware without cloud dependency. By maintaining detection accuracy across varied lighting conditions and facial orientations, it makes emotion analysis accessible for healthcare, customer service, education, and security applications.
The system implements a two-stage pipeline using state-of-the-art deep learning models. Stage 1 employs the RFB-320 SSD (Single Shot Detector) model with pre-trained weights (RFB-320.caffemodel and RFB-320.prototxt) for accurate face localization with a 0.5 confidence threshold. Stage 2 uses the FER+ ONNX model for emotion classification, processing 64x64 grayscale facial regions and outputting probability distributions across the seven emotion categories via softmax activation.
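The stage-1 post-processing can be sketched as follows. This is a hypothetical illustration, assuming the RFB-320 SSD is run through OpenCV's DNN module, whose Caffe SSD output tensor has shape (1, 1, N, 7) with rows of [image_id, class_id, confidence, x1, y1, x2, y2] and normalized coordinates; the function name `filter_faces` and the synthetic detection array are not from the project code.

```python
import numpy as np

def filter_faces(detections, frame_w, frame_h, conf_threshold=0.5):
    """Keep detections above the confidence threshold and scale the
    normalized box coordinates to pixel values on the source frame."""
    boxes = []
    for det in detections[0, 0]:
        confidence = float(det[2])
        if confidence < conf_threshold:
            continue
        x1 = int(det[3] * frame_w)
        y1 = int(det[4] * frame_h)
        x2 = int(det[5] * frame_w)
        y2 = int(det[6] * frame_h)
        boxes.append((x1, y1, x2, y2, confidence))
    return boxes

# Synthetic SSD output: one strong face detection and one weak one.
fake_out = np.array([[[[0, 1, 0.92, 0.25, 0.20, 0.55, 0.60],
                       [0, 1, 0.30, 0.10, 0.10, 0.20, 0.20]]]],
                    dtype=np.float32)
print(filter_faces(fake_out, 320, 240))  # only the 0.92 detection survives
```

With the 0.5 threshold from the pipeline description, the weak 0.30 detection is discarded and the remaining box is returned in pixel coordinates, ready for cropping in stage 2.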
The system supports dual operation modes: real-time webcam processing (expression_ssd_detect_realtime.py) and video file analysis (expression_ssd_detect.py). Each frame undergoes preprocessing including resizing, normalization, and grayscale conversion before face detection. Detected faces are extracted, resized to 64x64 pixels, and fed into the FER+ model for emotion classification. Results are rendered as text overlays on the original frame, with emotion-specific images displayed in separate windows for enhanced visual feedback and user experience.
The application provides intuitive visual output with bounding boxes around detected faces, emotion labels overlaid on video frames, and real-time FPS monitoring for performance tracking. The system displays emotion-specific reference images from the images/ directory (organized by emotion category) in separate windows, enabling users to validate detections. Video processing mode saves output as infer2-test.avi for post-analysis, while keyboard controls (press 'q' to quit) ensure easy operation during both webcam and video file processing.
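The real-time FPS monitoring mentioned above could be implemented along these lines. This is a hedged sketch, not the project's actual implementation: the `FPSMeter` class and its exponential smoothing are assumptions chosen so the overlay value does not flicker frame to frame.

```python
import time

class FPSMeter:
    """Smoothed frames-per-second counter for an on-frame overlay.
    Uses an exponential moving average of instantaneous frame rates."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing
        self.fps = 0.0
        self._last = None

    def tick(self, now=None):
        """Call once per frame; returns the current smoothed FPS.
        `now` lets callers inject timestamps (useful for testing)."""
        now = time.perf_counter() if now is None else now
        if self._last is not None:
            dt = now - self._last
            if dt > 0:
                inst = 1.0 / dt
                self.fps = (self.smoothing * self.fps
                            + (1 - self.smoothing) * inst) if self.fps else inst
        self._last = now
        return self.fps

meter = FPSMeter()
# Simulate a steady 30 ms per frame (~33 FPS) with explicit timestamps.
for i in range(50):
    fps = meter.tick(now=i * 0.03)
print(round(fps, 1))
```

In the capture loop, `meter.tick()` would be called once per iteration and the returned value drawn onto the frame alongside the emotion label.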