â˜šī¸ Facial Emotion Recognition System

Real-time facial emotion detection using deep learning to classify seven emotion categories from video files or a live camera feed.

📋 Problem Statement

Understanding human emotions through facial expressions is critical for applications ranging from mental health assessment and customer experience analysis to human-computer interaction and security systems. Traditional emotion recognition methods rely on manual observation and subjective interpretation, which are time-consuming, inconsistent across observers, limited in scale, and unable to process real-time data streams. Existing automated systems often struggle with varying lighting conditions, different facial orientations, diverse ethnic backgrounds, and require expensive hardware or cloud processing that raises privacy concerns.

There is a significant need for an accurate, real-time, privacy-preserving emotion recognition system that runs on standard hardware and handles both recorded videos and live camera feeds. This project addresses these challenges with a deep learning pipeline that combines the RFB-320 SSD model for robust face detection with the FER+ ONNX model for emotion classification across seven categories (neutral, happiness, surprise, sadness, anger, disgust, fear). The system classifies 64x64 grayscale face crops, monitors performance in real time, provides visual feedback through emotion-specific imagery, and runs entirely on local hardware without any cloud dependency. It detects emotions reliably across varied lighting conditions and facial orientations, making emotion analysis accessible for healthcare, customer service, education, and security applications.

đŸ› ī¸ Implementation

Deep Learning Architecture

The system implements a two-stage pipeline using state-of-the-art deep learning models. Stage 1 employs the RFB-320 SSD (Single Shot Detector) model with pre-trained weights (RFB-320.caffemodel and RFB-320.prototxt) for face localization at a 0.5 confidence threshold. Stage 2 uses the FER+ ONNX model for emotion classification, taking 64x64 grayscale facial regions and outputting a softmax probability distribution over the seven emotion categories.
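The two stages can be sketched as follows. This is a minimal illustration, not the project's exact code: the face-model file names come from the README, but the FER+ file name (`emotion-ferplus.onnx`) and its tensor names are assumptions that may differ for your export.

```python
# Sketch of the two-stage pipeline: RFB-320 SSD (face detection, Caffe
# weights via OpenCV dnn) followed by FER+ (emotion classification, ONNX).
import numpy as np

EMOTIONS = ["neutral", "happiness", "surprise", "sadness",
            "anger", "disgust", "fear"]

def softmax(logits):
    """Turn raw FER+ scores into a probability distribution."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def load_models(proto="RFB-320.prototxt", weights="RFB-320.caffemodel",
                fer_path="emotion-ferplus.onnx"):
    # Imported lazily so the pure helpers above carry no heavy dependencies.
    import cv2
    import onnxruntime as ort
    face_net = cv2.dnn.readNetFromCaffe(proto, weights)
    fer_sess = ort.InferenceSession(fer_path)
    return face_net, fer_sess

def classify_face(fer_sess, face_gray_64x64):
    """Run FER+ on one 64x64 grayscale crop; returns (label, probabilities)."""
    blob = face_gray_64x64.astype(np.float32).reshape(1, 1, 64, 64)
    input_name = fer_sess.get_inputs()[0].name
    logits = fer_sess.run(None, {input_name: blob})[0][0]
    probs = softmax(logits)
    return EMOTIONS[int(np.argmax(probs))], probs
```

The softmax is applied in post-processing here because the public FER+ export emits raw logits; if your model already ends in a softmax layer, the extra call is harmless but redundant.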

Tech stack: Python 3.7+ · OpenCV 4.8+ · ONNX Runtime · NumPy · Deep Learning

Processing Pipeline

The system supports dual operation modes: real-time webcam processing (expression_ssd_detect_realtime.py) and video file analysis (expression_ssd_detect.py). Each frame undergoes preprocessing including resizing, normalization, and grayscale conversion before face detection. Detected faces are extracted, resized to 64x64 pixels, and fed into the FER+ model for emotion classification. Results are rendered as text overlays on the original frame, with emotion-specific images displayed in separate windows for enhanced visual feedback and user experience.
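The detection step above can be illustrated with a small helper, assuming the standard SSD output layout produced by OpenCV's dnn module (`detections[0, 0, i] = [_, _, confidence, x1, y1, x2, y2]` with coordinates normalized to [0, 1]); the helper name is hypothetical.

```python
# Filter SSD detections by confidence and map normalized coordinates to
# pixel-space boxes, mirroring the 0.5 threshold described in the README.
import numpy as np

def extract_boxes(detections, frame_w, frame_h, conf_threshold=0.5):
    """Return pixel-space (x1, y1, x2, y2) boxes above the threshold."""
    boxes = []
    for i in range(detections.shape[2]):
        conf = detections[0, 0, i, 2]
        if conf >= conf_threshold:
            x1, y1, x2, y2 = detections[0, 0, i, 3:7]
            boxes.append((int(x1 * frame_w), int(y1 * frame_h),
                          int(x2 * frame_w), int(y2 * frame_h)))
    return boxes
```

Each returned box is then cropped from the grayscale frame and resized to 64x64 (e.g. with `cv2.resize`) before being passed to the FER+ model.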

User Interface & Output

The application provides intuitive visual output with bounding boxes around detected faces, emotion labels overlaid on video frames, and real-time FPS monitoring for performance tracking. The system displays emotion-specific reference images from the images/ directory (organized by emotion category) in separate windows, enabling users to validate detections. Video processing mode saves output as infer2-test.avi for post-analysis, while keyboard controls (press 'q' to quit) ensure easy operation during both webcam and video file processing.
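The display loop described above can be sketched as below. The output file name and the 'q'-to-quit control follow the README; the `FpsMeter` smoothing, the XVID codec, and the window name are illustrative choices, not the project's exact implementation.

```python
# Video-processing loop: read frames, overlay FPS, save to infer2-test.avi,
# and quit on 'q'. Emotion detection/labeling would slot in before putText.
import time

class FpsMeter:
    """Exponential moving average of instantaneous frames per second."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.fps = 0.0
        self._last = None

    def tick(self, now=None):
        now = time.perf_counter() if now is None else now
        if self._last is not None:
            inst = 1.0 / max(now - self._last, 1e-6)
            self.fps = inst if self.fps == 0.0 else (
                self.alpha * inst + (1 - self.alpha) * self.fps)
        self._last = now
        return self.fps

def run_video(path, out_path="infer2-test.avi"):
    import cv2  # imported here so FpsMeter stays OpenCV-free
    cap = cv2.VideoCapture(path)
    writer, meter = None, FpsMeter()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fps = meter.tick()
        cv2.putText(frame, f"FPS: {fps:.1f}", (10, 25),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
        if writer is None:
            h, w = frame.shape[:2]
            writer = cv2.VideoWriter(out_path,
                                     cv2.VideoWriter_fourcc(*"XVID"),
                                     25, (w, h))
        writer.write(frame)
        cv2.imshow("emotion", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # 'q' quits, per the README
            break
    cap.release()
    if writer is not None:
        writer.release()
    cv2.destroyAllWindows()
```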

💡 Use of This Project

Healthcare & Mental Health

  • Patient Monitoring: Track emotional states of patients in psychiatric facilities and therapy sessions
  • Autism Support: Assist individuals with autism in recognizing and understanding emotions
  • Depression Detection: Monitor emotional patterns for early detection of mental health issues
  • Telemedicine: Remote emotional assessment during virtual consultations

Business & Customer Service

  • Customer Experience: Analyze customer emotions in retail stores and service centers
  • Market Research: Gauge emotional responses to products, advertisements, and prototypes
  • Training Evaluation: Assess employee emotional responses during training programs
  • Call Center Analytics: Monitor agent-customer emotional interactions via video calls

Education & Security

  • E-Learning Platforms: Adapt content based on student engagement and emotional responses
  • Classroom Monitoring: Track student attention and comprehension through emotions
  • Security Systems: Detect suspicious behavior through emotion analysis in surveillance
  • Driver Monitoring: Detect fatigue, stress, or anger in automotive safety systems

📊 Results

  • đŸŽ¯ Face Detection: RFB-320 SSD architecture, 0.5 confidence threshold, multi-face support
  • 😊 Emotion Classes: 7 (neutral, happiness, surprise, sadness, anger, disgust, fear) via the FER+ model
  • đŸ–ŧī¸ Input Processing: 64x64 normalized grayscale images on ONNX Runtime
  • ⚡ Performance: real-time with live FPS monitoring, fully local processing on standard hardware

System Achievements

  • Dual Operation Modes: Real-time webcam processing and video file analysis capabilities
  • RFB-320 SSD Detection: Robust face localization with 0.5 confidence threshold across lighting conditions
  • FER+ Classification: Pre-trained ONNX model for accurate emotion recognition across 7 categories
  • Real-time Performance: FPS monitoring and optimization for standard hardware processing
  • Visual Feedback System: Emotion-specific images displayed for user validation and experience
  • Privacy Preservation: Complete local processing without cloud dependency or data transmission
  • Video Output: Processed video saved as infer2-test.avi for post-analysis and archival
  • Multi-face Processing: Simultaneous detection and classification of multiple faces per frame

Technical Specifications

  • Face Detection Model: RFB-320 SSD with Caffe framework pre-trained weights
  • Emotion Model: FER+ ONNX format with 7-class softmax output layer
  • Input Resolution: 64x64 pixels grayscale for emotion classification
  • Detection Threshold: 0.5 confidence score for face bounding box acceptance
  • Supported Emotions: Neutral, Happiness, Surprise, Sadness, Anger, Disgust, Fear
  • Processing Speed: Real-time at 15-30 FPS on standard CPU hardware
  • Output Format: AVI video file (infer2-test.avi) with emotion overlays
  • Dependencies: Python 3.7+, OpenCV 4.8+, NumPy 1.24+, ONNX Runtime
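The dependency list above can be installed with pip; the package names are assumptions based on the usual PyPI distributions (the CPU build of ONNX Runtime ships as `onnxruntime`).

```shell
# Install the runtime dependencies listed above (CPU-only setup)
pip install "opencv-python>=4.8" "numpy>=1.24" onnxruntime
```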

Performance Characteristics

  • Lighting Robustness: Accurate detection across various lighting conditions and environments
  • Orientation Handling: Works best with frontal face views, degrades gracefully with profile views
  • Multi-face Capability: Processes multiple faces simultaneously in single frame
  • FPS Monitoring: Real-time performance metrics displayed during processing
  • Hardware Efficiency: Optimized for standard CPU processing without GPU requirement
  • Emotion Accuracy: High precision on clear facial expressions, validated against FER+ dataset

Future Enhancements

  • Deep Learning Upgrade: Integration of transformer-based models for improved accuracy
  • Multi-modal Analysis: Combine facial expressions with voice tone and body language
  • Temporal Analysis: Track emotion changes over time for pattern detection
  • GPU Acceleration: CUDA optimization for higher frame rates and resolution
  • Mobile Deployment: Port to iOS/Android with TensorFlow Lite or CoreML
  • API Development: RESTful API for integration with third-party applications