---
language: en
license: apache-2.0
library_name: pytorch
tags:
  - pytorch
  - torch
  - emotion-recognition
  - transformer
  - mediapipe
  - computer-vision
  - deep-learning
  - facial-expression
  - affective-computing
  - sequential-data
model-index:
  - name: emotion_sequence_transformer_bilstm_mp478_seq256
    results:
      - task:
          type: sequence-classification
        dataset:
          type: dataset
          name: Optimized 478-Point 3D Facial Landmark Dataset
        metrics:
          - name: accuracy
            type: accuracy
            value: 0.71
inference: "Supports PyTorch inference"
---

# 🧠 Emotion Sequence Transformer + BiLSTM MP478 Seq256

## 📘 Overview

This repository provides a **Transformer + BiLSTM-based emotion recognition model** trained on sequences of **MediaPipe facial landmarks**. The model classifies human emotions into six categories: **Angry, Disgust, Fear, Happy, Neutral, Sad**.

It processes **temporal sequences** of 256 frames per clip, with 478 landmarks per frame, and learns the dynamic patterns of facial expression. The model is optimized for **real-time emotion inference** and can be used in applications such as **sign language understanding** and **emotion-aware human-computer interaction**.

---

## 🧩 Model Architecture

The model is built around **Transformer encoder layers** for sequence modeling:

1. **Input Layer:**
   - Accepts sequences of shape `(256, 478*3)`, corresponding to the 3D coordinates of 478 landmarks over 256 frames.
2. **Transformer Encoder Layers:**
   - Capture temporal dependencies and dynamic patterns of facial motion using self-attention.
3. **Fully Connected Layers:**
   - Map the encoder outputs to the six emotion classes.
4. **Output Layer:**
   - Softmax activation for multi-class emotion classification.

A minimal, illustrative PyTorch sketch of this stack is shown after the Dataset section below.

---

## 📊 Dataset

**Custom MediaPipe Landmark Dataset**

- Extracted from labeled video clips representing the six emotions.
- Preprocessing includes normalization, sequence grouping (256 frames per clip), and balanced augmentation.
- The dataset is split into training, validation, and test sets.
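---

## 🧪 Illustrative Architecture Sketch

The snippet below makes the architecture described above concrete. It is a minimal sketch, not the released implementation: the layer counts, model width, attention heads, pooling, and the placement of the BiLSTM relative to the Transformer encoder are assumptions, since this card does not specify them. Refer to the released weights and the usage notebook for the exact configuration.

```python
import torch
import torch.nn as nn

NUM_LANDMARKS = 478
SEQ_LEN = 256
NUM_CLASSES = 6  # Angry, Disgust, Fear, Happy, Neutral, Sad


class EmotionSequenceModel(nn.Module):
    """Illustrative Transformer + BiLSTM classifier over landmark sequences.

    All hyperparameters below (d_model, n_heads, n_layers, lstm_hidden) are
    assumptions for demonstration, not values documented on this card.
    """

    def __init__(self, d_model: int = 256, n_heads: int = 8,
                 n_layers: int = 4, lstm_hidden: int = 128):
        super().__init__()
        in_dim = NUM_LANDMARKS * 3                    # 478 landmarks x (x, y, z)
        self.input_proj = nn.Linear(in_dim, d_model)  # project features to model width
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # BiLSTM over the encoder outputs (placement is an assumption)
        self.bilstm = nn.LSTM(d_model, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, NUM_CLASSES)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 256, 478 * 3)
        h = self.input_proj(x)
        h = self.encoder(h)
        h, _ = self.bilstm(h)
        h = h.mean(dim=1)          # temporal average pooling (assumption)
        return self.classifier(h)  # logits; apply softmax for class probabilities


if __name__ == "__main__":
    model = EmotionSequenceModel()
    dummy = torch.randn(2, SEQ_LEN, NUM_LANDMARKS * 3)
    print(model(dummy).shape)  # torch.Size([2, 6])
```

The forward pass returns raw logits; `torch.softmax` can be applied afterwards, matching the softmax output layer described above.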
---

## ⚙️ Training Configuration

| Parameter           | Value                         |
| ------------------- | ----------------------------- |
| **Architecture**    | Transformer                   |
| **Sequence Length** | 256 frames                    |
| **Input Features**  | 478 landmarks × 3 coordinates |
| **Optimizer**       | Adam                          |
| **Learning Rate**   | 1e-4                          |
| **Loss Function**   | CrossEntropyLoss              |
| **Batch Size**      | 32                            |
| **Epochs**          | 60                            |

An illustrative training-loop sketch using these settings is included at the end of this card.

---

## 📈 Performance Summary

| Metric          | Score |
| --------------- | ----- |
| **Accuracy**    | 0.71  |
| **Macro F1**    | 0.70  |
| **Weighted F1** | 0.71  |

### Classification Report

| Class   | Precision | Recall | F1-Score | Support |
| ------- | --------- | ------ | -------- | ------- |
| Angry   | 0.78      | 0.63   | 0.70     | 139     |
| Disgust | 0.77      | 0.79   | 0.78     | 128     |
| Fear    | 0.50      | 0.58   | 0.54     | 114     |
| Happy   | 0.95      | 0.92   | 0.94     | 129     |
| Neutral | 0.61      | 0.81   | 0.69     | 101     |
| Sad     | 0.64      | 0.52   | 0.58     | 134     |

---

## 🖼️ Visualizations

- **Training Accuracy and Loss**
  ![Training Accuracy and Loss](images/Accuracies_and_Losses.png)
  _Training and validation accuracy and loss._

- **Multi-Class ROC Curves (AUC per class)**
  ![ROC Curves](images/ROC_Curves.png)
  _Multi-class ROC curves with AUC values._

- **Confusion Matrix (Heatmap)**
  ![Confusion Matrix](images/Confusion_Matrix.png)
  _Confusion matrix heatmap on the test set._

---

## 🧩 Model Files

| File                                                    | Description                                |
| ------------------------------------------------------- | ------------------------------------------ |
| `emotion_sequence_transformer_mp478_seq256.pt`          | Original PyTorch Transformer model         |
| `emotion_sequence_transformer_mp478_seq256_weights.pt`  | Original PyTorch Transformer model weights |

---

## 🚀 Usage and Preprocessing

To use this model for prediction, you must first preprocess your video data with the provided assets for standardization and label encoding.

### 1. Preprocessing Assets

The files needed for video preprocessing are stored in the **`assets/`** folder of this repository:

| File Name                           | Purpose                                                                                   | Required for Step |
| :---------------------------------- | :---------------------------------------------------------------------------------------- | :---------------- |
| **`emotion_label_encoder.joblib`**  | Maps predicted indices back to human-readable emotion labels (e.g., 0 -> 'Happy').       | Post-Inference    |
| **`global_mean_tensor.pt`**         | Global mean tensor used to **normalize** the extracted MediaPipe features.               | Preprocessing     |
| **`global_std_tensor.pt`**          | Global standard deviation tensor used to **normalize** the extracted MediaPipe features. | Preprocessing     |

Load the **mean tensor** and **std tensor** to standardize your input feature sequences before feeding them into the model. An illustrative preprocessing-and-inference sketch is included at the end of this card.

### 2. Complete Example

For a full, runnable demonstration showing how to load the model, use the assets for standardization, and run inference on a video, refer to the usage notebook:

- **Notebook:** **`emotion-sequence-transformer-bilstm-usage.ipynb`**

This notebook provides the complete code needed to replicate the deployment environment.

---

## 🚀 Key Features

- Real-time emotion recognition from **MediaPipe landmarks**
- Transformer-based sequence modeling of dynamic facial motion
- Handles **six primary emotion classes**

---

## 🏷️ Tags

`emotion-recognition` `transformer` `sequential-data` `mediapipe` `human-emotion` `deep-learning` `pytorch` `torchscript` `affective-computing` `fine-tuning` `real-time`

---

## 👤 Author & Model Info

**Author:** P.S. Abewickrama Singhe
**Developed with:** PyTorch
**License:** Apache-2.0
**Date:** October 2025
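---

## 🧪 Illustrative Training Sketch

The loop below shows how the settings in the Training Configuration table (Adam, learning rate 1e-4, CrossEntropyLoss, batch size 32, 60 epochs) fit together. It is a hedged sketch under stated assumptions: `EmotionSequenceModel` is the illustrative class from the architecture sketch earlier in this card, and the random tensors stand in for the real normalized landmark sequences and labels.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters from the Training Configuration table.
LEARNING_RATE = 1e-4
BATCH_SIZE = 32
EPOCHS = 60

device = "cuda" if torch.cuda.is_available() else "cpu"

# EmotionSequenceModel is the illustrative class from the architecture sketch above.
model = EmotionSequenceModel().to(device)

# Placeholder data; replace with the real normalized landmark sequences and labels.
dummy_x = torch.randn(64, 256, 478 * 3)
dummy_y = torch.randint(0, 6, (64,))
train_loader = DataLoader(TensorDataset(dummy_x, dummy_y),
                          batch_size=BATCH_SIZE, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

for epoch in range(EPOCHS):
    model.train()
    running_loss = 0.0
    for sequences, labels in train_loader:
        sequences, labels = sequences.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(sequences), labels)  # logits vs. class indices
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}/{EPOCHS} - loss: {running_loss / len(train_loader):.4f}")
```

Validation, early stopping, and the balanced augmentation mentioned in the Dataset section are omitted here for brevity.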
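---

## 🔍 Illustrative Inference Sketch

The snippet below ties together the assets described in the Usage section: load the serialized model, standardize a landmark sequence with the global mean and standard deviation tensors, and decode the predicted index with the label encoder. File names match those listed on this card; the MediaPipe feature extraction, the exact tensor shapes, and the broadcasting of the statistics are assumptions, so treat the usage notebook as the authoritative reference.

```python
import joblib
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Full serialized model listed under "Model Files".
model = torch.load("emotion_sequence_transformer_mp478_seq256.pt",
                   map_location=device)
model.eval()

# Standardization statistics and label encoder from the assets/ folder.
global_mean = torch.load("assets/global_mean_tensor.pt", map_location=device)
global_std = torch.load("assets/global_std_tensor.pt", map_location=device)
label_encoder = joblib.load("assets/emotion_label_encoder.joblib")


def predict_emotion(landmark_sequence: torch.Tensor) -> str:
    """landmark_sequence: (256, 478 * 3) MediaPipe features for one clip.

    Extracting these features from a video (MediaPipe Face Mesh, padding or
    truncating to 256 frames) is assumed to happen upstream.
    """
    x = (landmark_sequence.to(device) - global_mean) / global_std  # standardize
    with torch.no_grad():
        logits = model(x.unsqueeze(0))                  # add batch dimension
        probs = torch.softmax(logits, dim=-1)
        pred_idx = int(probs.argmax(dim=-1).item())
    return label_encoder.inverse_transform([pred_idx])[0]  # e.g. "Happy"


# Example call with random features (replace with real MediaPipe landmarks).
print(predict_emotion(torch.randn(256, 478 * 3)))
```

Depending on your PyTorch version, loading the full model may require `torch.load(..., weights_only=False)`; alternatively, the separate `_weights.pt` file can be loaded into a model instance with `load_state_dict`.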