Real-time facial emotion detection system using deep learning to classify seven emotion categories from video files or a live camera feed
Understanding human emotions through facial expressions is critical for applications ranging from mental health assessment and customer experience analysis to human-computer interaction and security systems. Traditional emotion recognition methods rely on manual observation and subjective interpretation, which are time-consuming, inconsistent across observers, limited in scale, and unable to process real-time data streams. Existing automated systems often struggle with varying lighting conditions, different facial orientations, and diverse ethnic backgrounds, and many require expensive hardware or cloud processing that raises privacy concerns.
There is a significant need for an accurate, real-time, privacy-preserving emotion recognition system that can operate on standard hardware while processing both recorded videos and live camera feeds. This project addresses these challenges by developing a deep learning-based facial emotion recognition system that combines the RFB-320 SSD for robust face detection with the FER+ ONNX model for emotion classification across seven categories (neutral, happiness, surprise, sadness, anger, disgust, fear). The system processes 64x64 grayscale face images, monitors performance in real time, provides visual feedback through emotion-specific imagery, and operates entirely on local hardware without cloud dependency. By maintaining detection accuracy across varied lighting conditions and facial orientations, it makes emotion analysis accessible for healthcare, customer service, education, and security applications.
The system implements a two-stage pipeline using state-of-the-art deep learning models. Stage 1 employs the RFB-320 SSD (Single Shot Detector) model with pre-trained weights (RFB-320.caffemodel and RFB-320.prototxt) for accurate face localization with a 0.5 confidence threshold. Stage 2 uses the FER+ ONNX model for emotion classification, processing 64x64 grayscale facial regions and outputting probability distributions across the seven emotion categories via softmax activation.
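The stage-1 post-processing can be sketched as follows. This is a hypothetical illustration, assuming the RFB-320 SSD is run through OpenCV's DNN module, whose Caffe SSD output tensor has shape (1, 1, N, 7) with rows of [image_id, class_id, confidence, x1, y1, x2, y2] and normalized coordinates; the function name `filter_faces` and the synthetic detection array are not from the project code.

```python
import numpy as np

def filter_faces(detections, frame_w, frame_h, conf_threshold=0.5):
    """Keep detections above the confidence threshold and scale the
    normalized box coordinates to pixel values on the source frame."""
    boxes = []
    for det in detections[0, 0]:
        confidence = float(det[2])
        if confidence < conf_threshold:
            continue
        x1 = int(det[3] * frame_w)
        y1 = int(det[4] * frame_h)
        x2 = int(det[5] * frame_w)
        y2 = int(det[6] * frame_h)
        boxes.append((x1, y1, x2, y2, confidence))
    return boxes

# Synthetic SSD output: one strong face detection and one weak one.
fake_out = np.array([[[[0, 1, 0.92, 0.25, 0.20, 0.55, 0.60],
                       [0, 1, 0.30, 0.10, 0.10, 0.20, 0.20]]]],
                    dtype=np.float32)
print(filter_faces(fake_out, 320, 240))  # only the 0.92 detection survives
```

With the 0.5 threshold from the pipeline description, the weak 0.30 detection is discarded and the remaining box is returned in pixel coordinates, ready for cropping in stage 2.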
The system supports dual operation modes: real-time webcam processing (expression_ssd_detect_realtime.py) and video file analysis (expression_ssd_detect.py). Each frame undergoes preprocessing including resizing, normalization, and grayscale conversion before face detection. Detected faces are extracted, resized to 64x64 pixels, and fed into the FER+ model for emotion classification. Results are rendered as text overlays on the original frame, with emotion-specific images displayed in separate windows for enhanced visual feedback and user experience.
The application provides intuitive visual output with bounding boxes around detected faces, emotion labels overlaid on video frames, and real-time FPS monitoring for performance tracking. The system displays emotion-specific reference images from the images/ directory (organized by emotion category) in separate windows, enabling users to validate detections. Video processing mode saves output as infer2-test.avi for post-analysis, while keyboard controls (press 'q' to quit) ensure easy operation during both webcam and video file processing.
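The real-time FPS monitoring mentioned above could be implemented along these lines. This is a hedged sketch, not the project's actual implementation: the `FPSMeter` class and its exponential smoothing are assumptions chosen so the overlay value does not flicker frame to frame.

```python
import time

class FPSMeter:
    """Smoothed frames-per-second counter for an on-frame overlay.
    Uses an exponential moving average of instantaneous frame rates."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing
        self.fps = 0.0
        self._last = None

    def tick(self, now=None):
        """Call once per frame; returns the current smoothed FPS.
        `now` lets callers inject timestamps (useful for testing)."""
        now = time.perf_counter() if now is None else now
        if self._last is not None:
            dt = now - self._last
            if dt > 0:
                inst = 1.0 / dt
                self.fps = (self.smoothing * self.fps
                            + (1 - self.smoothing) * inst) if self.fps else inst
        self._last = now
        return self.fps

meter = FPSMeter()
# Simulate a steady 30 ms per frame (~33 FPS) with explicit timestamps.
for i in range(50):
    fps = meter.tick(now=i * 0.03)
print(round(fps, 1))
```

In the capture loop, `meter.tick()` would be called once per iteration and the returned value drawn onto the frame alongside the emotion label.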