---
language: en
license: apache-2.0
library_name: pytorch
tags:
  - pytorch
  - torch
  - emotion-recognition
  - transformer
  - mediapipe
  - computer-vision
  - deep-learning
  - facial-expression
  - affective-computing
  - sequential-data
model-index:
  - name: emotion_sequence_transformer_bilstm_mp478_seq256
    results:
      - task:
          type: sequence-classification
        dataset:
          type: dataset
          name: Optimized 478-Point 3D Facial Landmark Dataset
        metrics:
          - name: accuracy
            type: accuracy
            value: 0.71
inference: "Supports PyTorch inference"
---

# 🧠 Emotion Sequence Transformer + BiLSTM MP478 Seq256

## 📘 Overview

This repository provides a **Transformer + BiLSTM-based emotion recognition model** trained on sequences of **MediaPipe facial landmarks**. The model classifies human emotions into six categories: **Angry, Disgust, Fear, Happy, Neutral, Sad**.

It processes **temporal sequences** of 256 frames per clip, with 478 landmarks per frame, and learns the dynamic patterns of facial expression. The model is optimized for **real-time emotion inference** and can be used in applications such as **sign language understanding** and **emotion-aware human-computer interaction**.

---

## 🧩 Model Architecture

The model is built around **Transformer encoder layers** for sequence modeling:

1. **Input Layer:**
   - Accepts sequences of shape `(256, 478*3)`, corresponding to the 3D coordinates of 478 landmarks over 256 frames.
2. **Transformer Encoder Layers:**
   - Capture temporal dependencies and dynamic patterns of facial motion using self-attention.
3. **Fully Connected Layers:**
   - Map the encoder outputs to the six emotion classes.
4. **Output Layer:**
   - Softmax activation for multi-class emotion classification.

A minimal, illustrative PyTorch sketch of this stack is shown after the Dataset section below.

---

## 📊 Dataset

**Custom MediaPipe Landmark Dataset**

- Extracted from labeled video clips representing the six emotions.
- Preprocessing includes normalization, sequence grouping (256 frames per clip), and balanced augmentation.
- The dataset is split into training, validation, and test sets.
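---

## 🧪 Illustrative Architecture Sketch

The snippet below makes the architecture described above concrete. It is a minimal sketch, not the released implementation: the layer counts, model width, attention heads, pooling, and the placement of the BiLSTM relative to the Transformer encoder are assumptions, since this card does not specify them. Refer to the released weights and the usage notebook for the exact configuration.

```python
import torch
import torch.nn as nn

NUM_LANDMARKS = 478
SEQ_LEN = 256
NUM_CLASSES = 6  # Angry, Disgust, Fear, Happy, Neutral, Sad


class EmotionSequenceModel(nn.Module):
    """Illustrative Transformer + BiLSTM classifier over landmark sequences.

    All hyperparameters below (d_model, n_heads, n_layers, lstm_hidden) are
    assumptions for demonstration, not values documented on this card.
    """

    def __init__(self, d_model: int = 256, n_heads: int = 8,
                 n_layers: int = 4, lstm_hidden: int = 128):
        super().__init__()
        in_dim = NUM_LANDMARKS * 3                    # 478 landmarks x (x, y, z)
        self.input_proj = nn.Linear(in_dim, d_model)  # project features to model width
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # BiLSTM over the encoder outputs (placement is an assumption)
        self.bilstm = nn.LSTM(d_model, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, NUM_CLASSES)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 256, 478 * 3)
        h = self.input_proj(x)
        h = self.encoder(h)
        h, _ = self.bilstm(h)
        h = h.mean(dim=1)          # temporal average pooling (assumption)
        return self.classifier(h)  # logits; apply softmax for class probabilities


if __name__ == "__main__":
    model = EmotionSequenceModel()
    dummy = torch.randn(2, SEQ_LEN, NUM_LANDMARKS * 3)
    print(model(dummy).shape)  # torch.Size([2, 6])
```

The forward pass returns raw logits; `torch.softmax` can be applied afterwards, matching the softmax output layer described above.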
---

## ⚙️ Training Configuration

| Parameter           | Value                         |
| ------------------- | ----------------------------- |
| **Architecture**    | Transformer                   |
| **Sequence Length** | 256 frames                    |
| **Input Features**  | 478 landmarks × 3 coordinates |
| **Optimizer**       | Adam                          |
| **Learning Rate**   | 1e-4                          |
| **Loss Function**   | CrossEntropyLoss              |
| **Batch Size**      | 32                            |
| **Epochs**          | 60                            |

An illustrative training-loop sketch using these settings is included at the end of this card.

---

## 📈 Performance Summary

| Metric          | Score |
| --------------- | ----- |
| **Accuracy**    | 0.71  |
| **Macro F1**    | 0.70  |
| **Weighted F1** | 0.71  |

### Classification Report

| Class   | Precision | Recall | F1-Score | Support |
| ------- | --------- | ------ | -------- | ------- |
| Angry   | 0.78      | 0.63   | 0.70     | 139     |
| Disgust | 0.77      | 0.79   | 0.78     | 128     |
| Fear    | 0.50      | 0.58   | 0.54     | 114     |
| Happy   | 0.95      | 0.92   | 0.94     | 129     |
| Neutral | 0.61      | 0.81   | 0.69     | 101     |
| Sad     | 0.64      | 0.52   | 0.58     | 134     |

---

## 🖼️ Visualizations

- **Training Accuracy and Loss**
  ![Training Accuracy and Loss](images/Accuracies_and_Losses.png)
  _Training and validation accuracy and loss._

- **Multi-Class ROC Curves (AUC per class)**
  ![ROC Curves](images/ROC_Curves.png)
  _Multi-class ROC curves with AUC values._

- **Confusion Matrix (Heatmap)**
  ![Confusion Matrix](images/Confusion_Matrix.png)
  _Confusion matrix heatmap on the test set._

---

## 🧩 Model Files

| File                                                    | Description                                |
| ------------------------------------------------------- | ------------------------------------------ |
| `emotion_sequence_transformer_mp478_seq256.pt`          | Original PyTorch Transformer model         |
| `emotion_sequence_transformer_mp478_seq256_weights.pt`  | Original PyTorch Transformer model weights |

---

## 🚀 Usage and Preprocessing

To use this model for prediction, you must first preprocess your video data with the provided assets for standardization and label encoding.

### 1. Preprocessing Assets

The files needed for video preprocessing are stored in the **`assets/`** folder of this repository:

| File Name                           | Purpose                                                                                   | Required for Step |
| :---------------------------------- | :---------------------------------------------------------------------------------------- | :---------------- |
| **`emotion_label_encoder.joblib`**  | Maps predicted indices back to human-readable emotion labels (e.g., 0 -> 'Happy').       | Post-Inference    |
| **`global_mean_tensor.pt`**         | Global mean tensor used to **normalize** the extracted MediaPipe features.               | Preprocessing     |
| **`global_std_tensor.pt`**          | Global standard deviation tensor used to **normalize** the extracted MediaPipe features. | Preprocessing     |

Load the **mean tensor** and **std tensor** to standardize your input feature sequences before feeding them into the model. An illustrative preprocessing-and-inference sketch is included at the end of this card.

### 2. Complete Example

For a full, runnable demonstration showing how to load the model, use the assets for standardization, and run inference on a video, refer to the usage notebook:

- **Notebook:** **`emotion-sequence-transformer-bilstm-usage.ipynb`**

This notebook provides the complete code needed to replicate the deployment environment.

---

## 🚀 Key Features

- Real-time emotion recognition from **MediaPipe landmarks**
- Transformer-based sequence modeling of dynamic facial motion
- Handles **six primary emotion classes**

---

## 🏷️ Tags

`emotion-recognition` `transformer` `sequential-data` `mediapipe` `human-emotion` `deep-learning` `pytorch` `torchscript` `affective-computing` `fine-tuning` `real-time`

---

## 👤 Author & Model Info

**Author:** P.S. Abewickrama Singhe
**Developed with:** PyTorch
**License:** Apache-2.0
**Date:** October 2025
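---

## 🧪 Illustrative Training Sketch

The loop below shows how the settings in the Training Configuration table (Adam, learning rate 1e-4, CrossEntropyLoss, batch size 32, 60 epochs) fit together. It is a hedged sketch under stated assumptions: `EmotionSequenceModel` is the illustrative class from the architecture sketch earlier in this card, and the random tensors stand in for the real normalized landmark sequences and labels.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters from the Training Configuration table.
LEARNING_RATE = 1e-4
BATCH_SIZE = 32
EPOCHS = 60

device = "cuda" if torch.cuda.is_available() else "cpu"

# EmotionSequenceModel is the illustrative class from the architecture sketch above.
model = EmotionSequenceModel().to(device)

# Placeholder data; replace with the real normalized landmark sequences and labels.
dummy_x = torch.randn(64, 256, 478 * 3)
dummy_y = torch.randint(0, 6, (64,))
train_loader = DataLoader(TensorDataset(dummy_x, dummy_y),
                          batch_size=BATCH_SIZE, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

for epoch in range(EPOCHS):
    model.train()
    running_loss = 0.0
    for sequences, labels in train_loader:
        sequences, labels = sequences.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(sequences), labels)  # logits vs. class indices
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}/{EPOCHS} - loss: {running_loss / len(train_loader):.4f}")
```

Validation, early stopping, and the balanced augmentation mentioned in the Dataset section are omitted here for brevity.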
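---

## 🔍 Illustrative Inference Sketch

The snippet below ties together the assets described in the Usage section: load the serialized model, standardize a landmark sequence with the global mean and standard deviation tensors, and decode the predicted index with the label encoder. File names match those listed on this card; the MediaPipe feature extraction, the exact tensor shapes, and the broadcasting of the statistics are assumptions, so treat the usage notebook as the authoritative reference.

```python
import joblib
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Full serialized model listed under "Model Files".
model = torch.load("emotion_sequence_transformer_mp478_seq256.pt",
                   map_location=device)
model.eval()

# Standardization statistics and label encoder from the assets/ folder.
global_mean = torch.load("assets/global_mean_tensor.pt", map_location=device)
global_std = torch.load("assets/global_std_tensor.pt", map_location=device)
label_encoder = joblib.load("assets/emotion_label_encoder.joblib")


def predict_emotion(landmark_sequence: torch.Tensor) -> str:
    """landmark_sequence: (256, 478 * 3) MediaPipe features for one clip.

    Extracting these features from a video (MediaPipe Face Mesh, padding or
    truncating to 256 frames) is assumed to happen upstream.
    """
    x = (landmark_sequence.to(device) - global_mean) / global_std  # standardize
    with torch.no_grad():
        logits = model(x.unsqueeze(0))                  # add batch dimension
        probs = torch.softmax(logits, dim=-1)
        pred_idx = int(probs.argmax(dim=-1).item())
    return label_encoder.inverse_transform([pred_idx])[0]  # e.g. "Happy"


# Example call with random features (replace with real MediaPipe landmarks).
print(predict_emotion(torch.randn(256, 478 * 3)))
```

Depending on your PyTorch version, loading the full model may require `torch.load(..., weights_only=False)`; alternatively, the separate `_weights.pt` file can be loaded into a model instance with `load_state_dict`.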