🎬 IMDB Movie Review Sentiment (SimpleRNN | Keras)

A lightweight SimpleRNN model trained on the Keras IMDB dataset to predict movie review sentiment.

This Hugging Face repo hosts the trained model artifact used by a Streamlit inference app.

Training → Model → Inference

Training notebook (Colab): https://colab.research.google.com/drive/14A_qc4aLvx5I0cFsK9lJYHRymGjzZIyK
Inference app (Streamlit): https://github.com/sparklerz/Deep-Learning-Fundamentals-Suite
(page: pages/03_IMDB_Sentiment_SimpleRNN.py)

What’s in this repo

artifacts/simple_rnn_imdb.h5 — trained Keras model
artifacts/config.json — key inference settings:
- max_features (vocab size cap)
- max_len (sequence length)
- threshold_default (classification threshold)
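
As a minimal sketch of how these settings might be read and sanity-checked before inference: the `InferenceConfig` class and the `parse_config` / `load_config` helpers below are hypothetical names, not part of this repo, and the range checks are assumptions rather than rules the app enforces.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical container for the three keys in artifacts/config.json."""
    max_features: int
    max_len: int
    threshold_default: float = 0.5


def parse_config(raw: dict) -> InferenceConfig:
    """Coerce types and reject obviously invalid values (assumed checks)."""
    cfg = InferenceConfig(
        max_features=int(raw["max_features"]),
        max_len=int(raw["max_len"]),
        # Fall back to 0.5 when the key is absent, as the quickstart does.
        threshold_default=float(raw.get("threshold_default", 0.5)),
    )
    if cfg.max_features <= 0 or cfg.max_len <= 0:
        raise ValueError("max_features and max_len must be positive")
    if not 0.0 <= cfg.threshold_default <= 1.0:
        raise ValueError("threshold_default must lie in [0, 1]")
    return cfg


def load_config(path: str) -> InferenceConfig:
    """Read the JSON file at `path` and parse it."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_config(json.load(f))
```

With the file downloaded via hf_hub_download, load_config(cfg_path) would return an object whose fields can be passed to the preprocessing step.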

Inputs

A short English movie review (free text).

Preprocessing (same as Streamlit app)

Lowercase + tokenize with regex: [a-z']+
Convert tokens to integer IDs using the Keras IMDB word index (tensorflow.keras.datasets.imdb.get_word_index())
Apply the standard Keras IMDB offset:
- start token = 1
- unknown token = 2
- word indices are shifted by +3
Cap the vocabulary at max_features: any shifted ID >= max_features becomes 2 (unknown)
Pad/truncate to max_len using pad_sequences (padding="pre", truncating="post")
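
The steps above can be traced by hand on a tiny example. The sketch below is a pure-Python illustration, not the app's code: `TOY_WORD_INDEX` is a made-up vocabulary standing in for imdb.get_word_index(), the `max_features` and `max_len` values are arbitrary, and the padding is written out manually to mirror pad_sequences(padding="pre", truncating="post").

```python
import re

# Hypothetical toy vocabulary (word -> rank); the real ranks come from
# tensorflow.keras.datasets.imdb.get_word_index().
TOY_WORD_INDEX = {"the": 1, "was": 13, "movie": 17, "great": 84}


def encode(text, word_index, max_features, max_len):
    """Turn free text into a fixed-length list of integer IDs (0 = padding)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    seq = [1]  # start token
    for tok in tokens:
        rank = word_index.get(tok)
        # Known words are shifted by +3; words missing from the index become 2.
        idx = rank + 3 if rank is not None else 2
        if idx >= max_features:
            idx = 2  # outside the vocabulary cap, treated as unknown
        seq.append(idx)
    seq = seq[:max_len]  # truncating="post": drop tokens from the end
    return [0] * (max_len - len(seq)) + seq  # padding="pre": zeros in front


print(encode("The movie was GREAT, truly!", TOY_WORD_INDEX, max_features=50, max_len=8))
# → [0, 0, 1, 4, 20, 16, 2, 2]
```

Here "great" shifts to 87, which exceeds the cap of 50, and "truly" is not in the toy vocabulary, so both end up as 2.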

Output

A single probability: P(positive) in [0, 1].
Decision rule:
- Positive if P(positive) >= threshold
- Negative otherwise
Default threshold is read from artifacts/config.json (typically 0.5).
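
The decision rule is a one-line comparison. A minimal sketch, assuming a hypothetical helper named `classify` (the name and the default of 0.5 are illustrative, not taken from the app):

```python
def classify(prob_pos: float, threshold: float = 0.5) -> str:
    """Map P(positive) to a label; a probability equal to the threshold is Positive."""
    return "Positive" if prob_pos >= threshold else "Negative"


print(classify(0.62))                 # → Positive
print(classify(0.62, threshold=0.7))  # → Negative
```

Raising the threshold trades recall on positive reviews for fewer false positives, so the same probability can flip labels as the threshold changes.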

Quickstart (load + predict)

import re
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
import json

REPO_ID = "ash001/imdb-sentiment-simple-rnn"

# Load model + config
model_path = hf_hub_download(REPO_ID, "artifacts/simple_rnn_imdb.h5")
cfg_path   = hf_hub_download(REPO_ID, "artifacts/config.json")
with open(cfg_path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

model = tf.keras.models.load_model(model_path, compile=False)
word_index = imdb.get_word_index()

max_features = int(cfg["max_features"])
max_len = int(cfg["max_len"])
threshold = float(cfg.get("threshold_default", 0.5))

def text_to_sequence(text: str):
    text = text.lower()
    tokens = re.findall(r"[a-z']+", text)

    seq = [1]  # start token
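    for w in tokens:
        raw = word_index.get(w)
        # Known words get the +3 offset; words missing from the index map to 2 (unknown)
        idx = raw + 3 if raw is not None else 2
        if idx >= max_features:
            idx = 2  # outside the vocabulary cap
        seq.append(idx)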

    return pad_sequences([seq], maxlen=max_len, truncating="post", padding="pre")

text = "This movie was surprisingly good, with great acting and a strong ending."
X = text_to_sequence(text)

prob_pos = float(model.predict(X, verbose=0).reshape(-1)[0])
label = "Positive" if prob_pos >= threshold else "Negative"
print("P(positive) =", prob_pos, "|", label)

license: apache-2.0
