🎬 IMDB Movie Review Sentiment (SimpleRNN | Keras)

A lightweight SimpleRNN model trained on the Keras IMDB dataset to predict movie review sentiment.

This Hugging Face repo hosts the trained model artifact used by a Streamlit inference app.

Training → Model → Inference

What’s in this repo

  • artifacts/simple_rnn_imdb.h5: trained Keras model
  • artifacts/config.json: key inference settings
    • max_features (vocab size cap)
    • max_len (sequence length)
    • threshold_default (classification threshold)

Inputs

  • A short English movie review (free text).

Preprocessing (same as Streamlit app)

  • Lowercase + tokenize with regex: [a-z']+
  • Convert tokens to integer IDs using the Keras IMDB word index (tensorflow.keras.datasets.imdb.get_word_index())
  • Apply the standard Keras IMDB offset:
    • start token = 1
    • unknown token = 2
    • word indices are shifted by +3
  • Clip to the vocabulary cap: any shifted index >= max_features becomes 2 (unknown)
  • Pad/truncate to max_len using pad_sequences (padding="pre", truncating="post")
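
The steps above can be sketched on a toy vocabulary without downloading anything. This is a minimal illustration, not the app's code: TOY_WORD_INDEX and its ranks are made up, encode is a hypothetical helper, and the last two lines are a pure-Python stand-in for pad_sequences(padding="pre", truncating="post").

```python
import re

# Hypothetical toy vocabulary standing in for imdb.get_word_index();
# the ranks are made up for illustration.
TOY_WORD_INDEX = {"the": 1, "and": 2, "a": 3, "movie": 17, "good": 49}

START, UNKNOWN, OFFSET = 1, 2, 3  # standard Keras IMDB conventions


def encode(text, word_index, max_features, max_len):
    """Turn raw text into a fixed-length list of integer IDs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    seq = [START]
    for tok in tokens:
        rank = word_index.get(tok)
        # Words missing from the index become the unknown token directly.
        idx = UNKNOWN if rank is None else rank + OFFSET
        # Words beyond the vocabulary cap also become unknown.
        if idx >= max_features:
            idx = UNKNOWN
        seq.append(idx)
    seq = seq[:max_len]                      # truncating="post": keep the first max_len IDs
    return [0] * (max_len - len(seq)) + seq  # padding="pre": left-pad with zeros


print(encode("The movie was GOOD!", TOY_WORD_INDEX, max_features=50, max_len=8))
# [0, 0, 0, 1, 4, 20, 2, 2]
```

Here "was" is missing from the toy index and "good" shifts to 52, which exceeds the cap of 50, so both end up as 2.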

Output

  • A single probability: P(positive) in [0, 1].
  • Decision rule:
    • Positive if P(positive) >= threshold
    • Negative otherwise
  • Default threshold is read from artifacts/config.json (typically 0.5).
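
The decision rule can be written as a small helper. This is a sketch: classify is a hypothetical name, and the 0.5 default is assumed to mirror the typical value of threshold_default.

```python
def classify(prob_pos: float, threshold: float = 0.5) -> str:
    """Map P(positive) to a label; the boundary value counts as Positive."""
    if not 0.0 <= prob_pos <= 1.0:
        raise ValueError(f"expected a probability in [0, 1], got {prob_pos}")
    return "Positive" if prob_pos >= threshold else "Negative"


print(classify(0.5))                  # Positive
print(classify(0.6, threshold=0.7))   # Negative
```

Raising the threshold trades recall on positive reviews for fewer false positives; the model's output itself is unchanged.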

Quickstart (load + predict)

import re
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
import json

REPO_ID = "ash001/imdb-sentiment-simple-rnn"

# Load model + config
model_path = hf_hub_download(REPO_ID, "artifacts/simple_rnn_imdb.h5")
cfg_path   = hf_hub_download(REPO_ID, "artifacts/config.json")
with open(cfg_path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

model = tf.keras.models.load_model(model_path, compile=False)
word_index = imdb.get_word_index()

max_features = int(cfg["max_features"])
max_len = int(cfg["max_len"])
threshold = float(cfg.get("threshold_default", 0.5))

def text_to_sequence(text: str):
    text = text.lower()
    tokens = re.findall(r"[a-z']+", text)

    seq = [1]  # start token
    for w in tokens:
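        rank = word_index.get(w)
        # Words missing from the index map straight to the unknown token (2);
        # known words are shifted by +3, then clipped to the vocabulary cap.
        idx = 2 if rank is None else rank + 3
        if idx >= max_features:
            idx = 2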
        seq.append(idx)

    return pad_sequences([seq], maxlen=max_len, truncating="post", padding="pre")

text = "This movie was surprisingly good, with great acting and a strong ending."
X = text_to_sequence(text)

prob_pos = float(model.predict(X, verbose=0).reshape(-1)[0])
label = "Positive" if prob_pos >= threshold else "Negative"
print("P(positive) =", prob_pos, "|", label)

license: apache-2.0
