🎬 IMDB Movie Review Sentiment (SimpleRNN | Keras)
A lightweight SimpleRNN model trained on the Keras IMDB dataset to predict movie review sentiment.
This Hugging Face repo hosts the trained model artifact used by a Streamlit inference app.
Training → Model → Inference
- Training notebook (Colab): https://colab.research.google.com/drive/14A_qc4aLvx5I0cFsK9lJYHRymGjzZIyK
- Inference app (Streamlit): https://github.com/sparklerz/Deep-Learning-Fundamentals-Suite (page: pages/03_IMDB_Sentiment_SimpleRNN.py)
What's in this repo
- artifacts/simple_rnn_imdb.h5: trained Keras model
- artifacts/config.json: key inference settings:
  - max_features (vocab size cap)
  - max_len (sequence length)
  - threshold_default (classification threshold)
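As a rough illustration of how these three settings might be read, the sketch below parses a config string with the keys listed above. The values 10000 and 500 are placeholders chosen for the example, not the values shipped in artifacts/config.json, and the helper name load_inference_config is hypothetical.

```python
import json

# Placeholder config text; the real values live in artifacts/config.json
# and may differ from the numbers used here.
EXAMPLE_CONFIG = '{"max_features": 10000, "max_len": 500, "threshold_default": 0.5}'


def load_inference_config(raw: str) -> dict:
    """Parse the config JSON and coerce each setting to its expected type."""
    cfg = json.loads(raw)
    return {
        "max_features": int(cfg["max_features"]),  # vocab size cap
        "max_len": int(cfg["max_len"]),  # sequence length
        # Fall back to 0.5 if the threshold key is absent.
        "threshold": float(cfg.get("threshold_default", 0.5)),
    }


settings = load_inference_config(EXAMPLE_CONFIG)
print(settings)
```

Coercing the types up front means a config that stores numbers as strings still works with the preprocessing code.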
Inputs
- A short English movie review (free text).
Preprocessing (same as Streamlit app)
- Lowercase + tokenize with regex: [a-z']+
- Convert tokens to integer IDs using the Keras IMDB word index (tensorflow.keras.datasets.imdb.get_word_index())
- Apply the standard Keras IMDB offset:
  - start token = 1
  - unknown token = 2
  - word indices are shifted by +3
- Clip words to max_features; anything outside becomes 2 (unknown)
- Pad/truncate to max_len using pad_sequences(padding="pre", truncating="post")
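The steps above can be sketched without TensorFlow to show what the encoded sequence looks like. This is a minimal illustration under stated assumptions: toy_index is a made-up five-word vocabulary rather than the real IMDB word index, encode_review is a hypothetical helper, and the last two lines of the function are a pure-Python stand-in for pad_sequences(padding="pre", truncating="post").

```python
import re

# Keras IMDB conventions: 0 = padding, 1 = start, 2 = unknown, words shifted by 3.
START_ID, UNKNOWN_ID, INDEX_OFFSET = 1, 2, 3


def encode_review(text, word_index, max_features, max_len):
    """Encode free text as a fixed-length list of integer IDs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    seq = [START_ID]
    for token in tokens:
        raw = word_index.get(token)
        # Words missing from the index become the unknown token.
        idx = UNKNOWN_ID if raw is None else raw + INDEX_OFFSET
        # Words beyond the vocab cap are also treated as unknown.
        if idx >= max_features:
            idx = UNKNOWN_ID
        seq.append(idx)
    seq = seq[:max_len]  # truncating="post": drop IDs from the end
    return [0] * (max_len - len(seq)) + seq  # padding="pre": zeros in front


# Hypothetical toy word index (rank 1 = most frequent word).
toy_index = {"the": 1, "movie": 2, "was": 3, "great": 4, "dull": 9}

encoded = encode_review("The movie was GREAT, not dull!", toy_index,
                        max_features=10, max_len=8)
print(encoded)  # [0, 1, 4, 5, 6, 7, 2, 2]
```

Here "not" is absent from the toy index and "dull" shifts to 12, which exceeds the cap of 10, so both come out as 2.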
Output
- A single probability: P(positive) in [0, 1].
- Decision rule:
  - Positive if P(positive) >= threshold
  - Negative otherwise
- Default threshold is read from artifacts/config.json (typically 0.5).
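The decision rule is small enough to write as a single function. The sketch below assumes a default threshold of 0.5; label_from_probability is a hypothetical name used only for this example.

```python
def label_from_probability(prob_pos: float, threshold: float = 0.5) -> str:
    """Map P(positive) to a label; a probability equal to the threshold is Positive."""
    if not 0.0 <= prob_pos <= 1.0:
        raise ValueError(f"expected a probability in [0, 1], got {prob_pos}")
    return "Positive" if prob_pos >= threshold else "Negative"


for p in (0.20, 0.50, 0.90):
    print(p, label_from_probability(p))
```

Raising the threshold trades recall on positive reviews for fewer false positives: with threshold=0.7, a score of 0.6 is labeled Negative.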
Quickstart (load + predict)
import re
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
import json
REPO_ID = "ash001/imdb-sentiment-simple-rnn"
# Load model + config
model_path = hf_hub_download(REPO_ID, "artifacts/simple_rnn_imdb.h5")
cfg_path = hf_hub_download(REPO_ID, "artifacts/config.json")
with open(cfg_path, "r") as f:
    cfg = json.load(f)
model = tf.keras.models.load_model(model_path, compile=False)
word_index = imdb.get_word_index()
max_features = int(cfg["max_features"])
max_len = int(cfg["max_len"])
threshold = float(cfg.get("threshold_default", 0.5))
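def text_to_sequence(text: str):
    """Encode a review as a padded array of Keras IMDB word IDs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    seq = [1]  # start token
    for w in tokens:
        raw = word_index.get(w)
        # Unknown words map to 2; known words are shifted by +3
        idx = 2 if raw is None else raw + 3
        if idx >= max_features:
            idx = 2  # outside the vocab cap
        seq.append(idx)
    return pad_sequences([seq], maxlen=max_len, truncating="post", padding="pre")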
text = "This movie was surprisingly good, with great acting and a strong ending."
X = text_to_sequence(text)
prob_pos = float(model.predict(X, verbose=0).reshape(-1)[0])
label = "Positive" if prob_pos >= threshold else "Negative"
print("P(positive) =", prob_pos, "|", label)
license: apache-2.0