metadata
license: apache-2.0
datasets:
- 4nkh/theme_data
language:
- en
metrics:
- precision
- f1
- recall
- accuracy
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
- multi-label
- theme_detection
- mentorship
- entrepreneurship
- startup success
- json automation
Theme classification model (multi-label)
This repository contains a fine-tuned BERT model for classifying short texts into community-oriented themes. The model was trained locally and pushed to the Hugging Face Hub.
Model details
- Model architecture: bert-base-uncased (fine-tuned)
- Problem type: multi-label classification
- Labels: mentorship, entrepreneurship, startup success
- Training data: train_theme.jsonl (included)
- Final evaluation (example run):
  - eval_loss: 0.1822
  - eval_micro/f1: 1.0
  - eval_macro/f1: 1.0
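The card reports micro- and macro-averaged F1 but does not include the evaluation code. The sketch below illustrates the standard definitions for thresholded multi-hot predictions. It is not the script that produced the numbers above. Micro-F1 pools true/false positives and false negatives over all labels before computing one score. Macro-F1 computes F1 per label and then averages them without weights.

```python
# Sketch: micro- vs macro-averaged F1 for multi-label predictions.
# Assumes y_true and y_pred are lists of multi-hot rows (one 0/1 entry per label).


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> tuple[float, float]:
    n_labels = len(y_true[0])
    per_label = []  # (tp, fp, fn) for each label column
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        per_label.append((tp, fp, fn))

    # Micro: pool the counts over all labels, then compute a single F1.
    tp_sum = sum(c[0] for c in per_label)
    fp_sum = sum(c[1] for c in per_label)
    fn_sum = sum(c[2] for c in per_label)
    micro = _f1(tp_sum, fp_sum, fn_sum)

    # Macro: compute F1 for each label, then take the unweighted mean.
    macro = sum(_f1(*c) for c in per_label) / n_labels
    return micro, macro
```

Under this convention, a label with no positives in either the targets or the predictions contributes 0.0 to the macro average. Macro-F1 can therefore fall below micro-F1 when a rare label is never predicted.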
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo = "4nkh/theme_model"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

texts = ["Our co-op paired first-time founders with veteran shop owners to troubleshoot setbacks."]
inputs = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

# Inference only: no gradients needed
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Multi-label: an independent sigmoid per label, not a softmax across labels
probs = torch.sigmoid(logits)
preds = (probs >= 0.5).int()  # same 0.5 cut-off for every label
print('probs', probs.numpy(), 'preds', preds.numpy())
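The usage snippet prints raw 0/1 vectors. To get label names, the probabilities can be matched against the model's index-to-label mapping. The helper below is a hypothetical sketch. It assumes `id2label` is a dict from integer index to label name, which is the shape of `model.config.id2label` in transformers. If the model was saved without label names, that mapping will contain defaults such as LABEL_0 and LABEL_1. The helper also accepts per-class cut-offs.

```python
# Sketch: map per-label probabilities to label names.
# `decode` is a hypothetical helper, not part of transformers.


def decode(probs, id2label, threshold=0.5):
    """Return the label names whose probability meets the threshold.

    `threshold` may be a single float applied to every label, or a dict
    mapping label name -> float for per-class cut-offs. Labels missing
    from the dict fall back to 0.5.
    """
    selected = []
    for idx, p in enumerate(probs):
        name = id2label[idx]
        cut = threshold.get(name, 0.5) if isinstance(threshold, dict) else threshold
        if p >= cut:
            selected.append(name)
    return selected
```

With the snippet above, a call would look like `decode(probs[0].tolist(), model.config.id2label)`. A dict such as `{"startup success": 0.7}` raises the bar for a single label without changing the others.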
Notes
- The usage example applies a single threshold of 0.5 to every label. Per-class thresholds can be tuned if some labels are systematically over- or under-predicted.
- To re-train or fine-tune the model further, see train_theme_model.py in this folder.
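For re-training, each example's labels must become a multi-hot target vector. When a transformers sequence-classification model is configured with `problem_type="multi_label_classification"`, it uses `BCEWithLogitsLoss`, which expects float targets. The loader below is a sketch under two loudly stated assumptions. First, each line of train_theme.jsonl is assumed to be an object with `"text"` and `"labels"` fields. Second, the label order in `LABELS` is hypothetical. The actual schema and order should be checked against train_theme_model.py and the model config.

```python
import json

# Hypothetical label order; the real order is whatever the model config's label2id records.
LABELS = ["mentorship", "entrepreneurship", "startup success"]


def load_multi_hot(lines, labels=LABELS):
    """Parse JSONL lines into (texts, targets) with float multi-hot targets.

    Assumes records like {"text": "...", "labels": ["mentorship", ...]};
    this schema is an illustration, not the documented format of train_theme.jsonl.
    """
    index = {name: i for i, name in enumerate(labels)}
    texts, targets = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        row = [0.0] * len(labels)
        for name in record["labels"]:
            row[index[name]] = 1.0  # raises KeyError for a label outside the known set
        texts.append(record["text"])
        targets.append(row)
    return texts, targets
```

Because the function accepts any iterable of lines, an open file handle works directly, for example `load_multi_hot(open("train_theme.jsonl", encoding="utf-8"))`.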
License
This model is released under the Apache-2.0 license, as declared in the license field of the metadata above.