metadata
license: apache-2.0
datasets:
  - 4nkh/theme_data
language:
  - en
metrics:
  - precision
  - f1
  - recall
  - accuracy
base_model:
  - google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
  - multi-label
  - theme_detection
  - mentorship
  - entrepreneurship
  - startup success
  - json automation

Theme classification model (multi-label)

This repository contains a BERT model fine-tuned for multi-label classification of short texts into community-oriented themes. The model was trained locally and pushed to the Hugging Face Hub.

Model details

  • Model architecture: bert-base-uncased (fine-tuned)
  • Problem type: multi-label classification
  • Labels: mentorship, entrepreneurship, startup success
  • Training data: train_theme.jsonl (included)
  • Final evaluation (example run):
    • eval_loss: 0.1822
    • eval_micro/f1: 1.0
    • eval_macro/f1: 1.0
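
For readers unfamiliar with the two F1 variants reported above, the following is a minimal sketch of how micro and macro F1 are commonly defined for multi-label outputs. It is not taken from the training script, which may compute them differently; the function names and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for binary indicator rows, one column per label."""
    n_labels = len(y_true[0])
    counts = []  # one (tp, fp, fn) tuple per label
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))

    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1_from_counts(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    return micro, macro


# Toy data (hypothetical): 3 texts, 3 labels.
y_true = [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 1]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.8 0.778
```

Micro F1 weights every individual label decision equally, while macro F1 weights every label equally, so the two diverge when rare labels are predicted worse than common ones.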

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and fine-tuned weights from the Hub
repo = "4nkh/theme_model"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

texts = ["Our co-op paired first-time founders with veteran shop owners to troubleshoot setbacks."]
inputs = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label: apply an independent sigmoid to each label instead of a softmax
probs = torch.sigmoid(logits)
preds = (probs >= 0.5).int()  # 1 where a label's probability reaches the threshold
print("probs", probs.numpy(), "preds", preds.numpy())

Notes

  • The usage example applies a threshold of 0.5 to each label's sigmoid probability. The threshold is not part of the model itself, so adjust it per class as needed.
  • If you want to re-train or fine-tune further, see train_theme_model.py in this folder.
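
The per-class threshold adjustment mentioned in the notes could look like the sketch below. It is illustrative only: the label order is an assumption (check `model.config.id2label` for the real mapping), the threshold values are hypothetical and not tuned on any data, and `decode` is a helper name invented for this example.

```python
import math
from typing import Dict, List

# Assumed label order; verify against model.config.id2label before relying on it.
LABELS = ["mentorship", "entrepreneurship", "startup success"]

# Hypothetical per-class thresholds, chosen only to illustrate the idea.
THRESHOLDS = {"mentorship": 0.5, "entrepreneurship": 0.4, "startup success": 0.6}


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode(logits: List[float], thresholds: Dict[str, float], default: float = 0.5) -> List[str]:
    """Return the labels whose probability meets that label's own threshold.

    Labels missing from `thresholds` fall back to `default`.
    """
    selected = []
    for label, logit in zip(LABELS, logits):
        if sigmoid(logit) >= thresholds.get(label, default):
            selected.append(label)
    return selected


# Made-up logits for one text; their probabilities are roughly 0.88, 0.43, 0.55.
example_logits = [2.0, -0.3, 0.2]
print(decode(example_logits, THRESHOLDS))  # ['mentorship', 'entrepreneurship']
print(decode(example_logits, {}))          # ['mentorship', 'startup success']
```

With real model output, one row of logits can be passed as `outputs.logits[0].tolist()`. Thresholds are usually chosen on a held-out validation set, for example by picking the value that maximizes F1 for each label.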

License

This model is released under the Apache-2.0 license, as declared in the metadata above.