eruiner committed 1184440 (verified; parent: e954aed)

Upload README.md with huggingface_hub

Files changed (1): README.md, +100 -3. The three removed lines were the previous README's only content, a front-matter block declaring `license: apache-2.0`, which the new version keeps.
 
---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-1.5B
- Qwen/Qwen2.5-3B
task_categories:
- text-classification
language:
- en
- zh
tags:
- quality-assessment
- text-quality
- regression
pipeline_tag: text-classification
library_name: transformers
---

# Qwen2.5 Text Quality Classifier

Qwen2.5-1.5B and Qwen2.5-3B fine-tuned for automated text quality assessment. Each model predicts a quality score on a 0-1 scale that reflects educational value and mathematical intelligence.

## Model Details

- **Base Models**: Qwen2.5-1.5B / Qwen2.5-3B
- **Task**: Text quality regression (one score per input text)
- **Languages**: English, Chinese
- **Training Data**: [OpenSQZ/Classifiers-Data](https://huggingface.co/datasets/OpenSQZ/Classifiers-Data)
- **Loss Function**: Mean squared error (MSE)

## Performance

| Model | Test MSE (lower is better) |
|-------|----------------------------|
| Qwen2.5-1.5B | 0.00226 |
| Qwen2.5-3B | 0.00209 |

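MSE is the mean of the squared differences between predicted and reference scores, so its square root (RMSE) is expressed on the same scale as the scores themselves. The sketch below computes both in plain Python; it assumes, since the card does not say so explicitly, that the reported test MSE was measured on 0-1 scores. `mean_squared_error` and `rmse_from_mse` are illustrative helper names, not part of any released code.

```python
import math
from typing import Sequence


def mean_squared_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average of the squared differences between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)


def rmse_from_mse(mse: float) -> float:
    """Root mean squared error, in the same units as the scores."""
    return math.sqrt(mse)


# Reported test MSE values from the Performance table.
# Assumption: they were computed on scores in the 0-1 range.
reported_mse = {"Qwen2.5-1.5B": 0.00226, "Qwen2.5-3B": 0.00209}
for name, mse in reported_mse.items():
    print(f"{name}: RMSE = {rmse_from_mse(mse):.4f}")  # 0.0475 and 0.0457
```

Under that assumption, both models miss the reference score by roughly 0.05 in the root-mean-square sense, which is well under the 0.2-wide score bands in the Quality Score Interpretation table.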
## Quick Start

### Installation

```bash
pip install transformers torch
```

### Usage

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model_name = "OpenSQZ/Qwen2.5-1.5B-Classifier"  # or Qwen2.5-3B-Quality-Classifier
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Predict quality score
text = "Linear algebra is fundamental to understanding vector spaces and matrix operations in mathematics."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)

with torch.no_grad():
    outputs = model(**inputs)
    # The model returns a single raw logit; sigmoid maps it to a 0-1 quality score
    score = torch.sigmoid(outputs.logits).item()

print(f"Quality Score: {score:.3f}")  # Example output: Quality Score: 0.847
```

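To score many documents, for example when filtering a training corpus, the same call can be run over batches. The sketch below is an illustration under stated assumptions, not the authors' pipeline: `batched`, `score_corpus`, and `make_hf_predictor` are hypothetical helper names, the default batch size of 8 is arbitrary, and the EOS token is used as padding only if the tokenizer has no pad token. Batching more than one sequence does require `model.config.pad_token_id` to be set, because the sequence-classification head reads the logit at the last non-padding token of each row.

```python
import math
from typing import Callable, Iterator, List, Sequence

# Hypothetical type: maps a batch of texts to one raw logit per text.
LogitFn = Callable[[List[str]], List[float]]


def batched(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (same mapping as torch.sigmoid)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score_corpus(texts: Sequence[str], predict_logits: LogitFn, batch_size: int = 8) -> List[float]:
    """Return one 0-1 quality score per text, in input order."""
    scores: List[float] = []
    for batch in batched(texts, batch_size):
        scores.extend(sigmoid(logit) for logit in predict_logits(batch))
    return scores


def make_hf_predictor(model, tokenizer, max_length: int = 8192) -> LogitFn:
    """Wrap a loaded model/tokenizer pair (as in the Usage example) as a LogitFn."""
    import torch  # imported here so the pure-Python helpers above do not need torch

    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # assumption: EOS is acceptable as padding
    if model.config.pad_token_id is None:
        model.config.pad_token_id = tokenizer.pad_token_id

    def predict_logits(batch: List[str]) -> List[float]:
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=max_length)
        with torch.no_grad():
            logits = model(**inputs).logits  # assumed shape: (len(batch), 1)
        return logits.squeeze(-1).tolist()

    return predict_logits
```

With `model` and `tokenizer` loaded as in the Usage example, `score_corpus(docs, make_hf_predictor(model, tokenizer))` would return one score per entry of a hypothetical list `docs`. Keeping the model call behind a plain function lets the batching and sigmoid logic be checked without downloading a checkpoint.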
## Quality Score Interpretation

| Score Range | Quality Level | Use Case |
|-------------|---------------|----------|
| 0.8 to 1.0 | Excellent | Premium training data |
| 0.6 to <0.8 | Good | Standard training data |
| 0.4 to <0.6 | Average | Conditional use |
| 0.0 to <0.4 | Poor | Filter out |

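The interpretation table can be applied mechanically when filtering data. A minimal sketch, assuming each band includes its lower bound (so a score of exactly 0.8 counts as Excellent); `quality_level` and `keep_for_training` are hypothetical helper names, and keeping only Excellent and Good texts is an illustrative policy rather than one this card prescribes.

```python
from typing import List, Tuple

# (lower bound, level) pairs from the interpretation table, highest band first.
# Assumption: each band includes its lower bound.
QUALITY_BANDS: List[Tuple[float, str]] = [
    (0.8, "Excellent"),
    (0.6, "Good"),
    (0.4, "Average"),
    (0.0, "Poor"),
]


def quality_level(score: float) -> str:
    """Map a 0-1 quality score to the level names used in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for lower, level in QUALITY_BANDS:
        if score >= lower:
            return level
    raise AssertionError("unreachable: the last band starts at 0.0")


def keep_for_training(scored: List[Tuple[str, float]],
                      keep_levels: Tuple[str, ...] = ("Excellent", "Good")) -> List[str]:
    """Return the texts whose level is in `keep_levels` (illustrative policy)."""
    return [text for text, score in scored if quality_level(score) in keep_levels]


scored_docs = [("doc A", 0.91), ("doc B", 0.65), ("doc C", 0.52), ("doc D", 0.12)]
print([quality_level(s) for _, s in scored_docs])  # ['Excellent', 'Good', 'Average', 'Poor']
print(keep_for_training(scored_docs))  # ['doc A', 'doc B']
```

Scanning the bands from highest to lowest keeps the boundary rule in one place, so switching to upper-inclusive bands would only require changing the comparison.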
## Model Selection

- **1.5B Model**: faster inference; suited to latency-sensitive or real-time scoring
- **3B Model**: slightly lower test MSE (0.00209 vs. 0.00226); better suited to offline batch processing, where per-request latency matters less

## Limitations

- Optimized for educational and mathematical content
- May not generalize well to creative or subjective content
- Scores should be used as guidance, not absolute judgments

## Citation

```bibtex
@misc{qwen25_quality_classifier_2025,
  title={Qwen2.5 Text Quality Classifier},
  author={Chao Li and Yifan Zhang},
  year={2025},
  publisher={OpenSQZ}
}
```

## License

Apache 2.0