---
license: apache-2.0
datasets:
- 4nkh/theme_data
language:
- en
metrics:
- precision
- f1
- recall
- accuracy
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
- multi-label
- theme_detection
- mentorship
- entrepreneurship
- startup success
- json automation
---

# Theme classification model (multi-label)

This repository contains a fine-tuned BERT model for classifying short texts into community-oriented themes. The model was trained locally and pushed to the Hugging Face Hub.

## Model details

- Model architecture: `bert-base-uncased` (fine-tuned)
- Problem type: multi-label classification
- Labels: `mentorship`, `entrepreneurship`, `startup success`
- Training data: `train_theme.jsonl` (included)
- Final evaluation (example run):
  - eval_loss: 0.1822
  - eval_micro/f1: 1.0
  - eval_macro/f1: 1.0
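
Because the problem type is multi-label, fine-tuning uses a per-label sigmoid with binary cross-entropy rather than a softmax over classes. A minimal sketch of that loss computation, using synthetic logits and 0/1 targets (not values from the actual training run):

```python
import torch
import torch.nn as nn

# Synthetic batch: 2 examples, 3 labels (mentorship, entrepreneurship, startup success).
logits = torch.tensor([[2.0, -1.0, 0.5],
                       [-0.5, 1.5, -2.0]])
targets = torch.tensor([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0]])

# BCEWithLogitsLoss applies a sigmoid to each logit, so every label is
# scored independently — an example can belong to several themes at once.
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(loss.item())
```

This is the objective `transformers` selects internally when a sequence-classification model is configured with `problem_type="multi_label_classification"`.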

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo = "4nkh/theme_model"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

texts = ["Our co-op paired first-time founders with veteran shop owners to troubleshoot setbacks."]
inputs = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
probs = torch.sigmoid(logits)  # independent per-label probabilities
preds = (probs >= 0.5).int()   # threshold each label at 0.5
print("probs", probs.numpy(), "preds", preds.numpy())
```
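
To map the 0/1 prediction vector back to label names, look up the model's `id2label` mapping. A minimal sketch with a hard-coded mapping (with a loaded model, read `model.config.id2label` instead):

```python
import torch

# Hypothetical id2label mapping for illustration; with a loaded model,
# use model.config.id2label rather than hard-coding it.
id2label = {0: "mentorship", 1: "entrepreneurship", 2: "startup success"}

preds = torch.tensor([[1, 0, 1]])  # e.g. output of (probs >= 0.5).int()
active = [id2label[i] for i, flag in enumerate(preds[0].tolist()) if flag]
print(active)  # ['mentorship', 'startup success']
```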

## Notes

- This model uses a threshold of 0.5 for multi-label predictions. Adjust thresholds per class as needed.
- To re-train or fine-tune further, see `train_theme_model.py` in this folder.
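
Per-class thresholds can be applied with a single broadcast comparison; the threshold values below are illustrative, not tuned:

```python
import torch

labels = ["mentorship", "entrepreneurship", "startup success"]
# Illustrative per-class thresholds — tune these on a validation set.
thresholds = torch.tensor([0.5, 0.4, 0.6])

probs = torch.tensor([[0.72, 0.35, 0.61]])  # e.g. torch.sigmoid(logits)
preds = (probs >= thresholds).int()         # thresholds broadcast over the batch

predicted = [labels[i] for i in (preds[0] == 1).nonzero().flatten().tolist()]
print(predicted)  # ['mentorship', 'startup success']
```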

## License

Apache-2.0 (see the `license` field in the model card metadata above).