pankajmathur committed on
Commit d4bf336 · verified · 1 Parent(s): d9ae245

Update README.md

Files changed (1)
  1. README.md +13 -8
README.md CHANGED
@@ -11,6 +11,7 @@ tags:
11
  pipeline_tag: text-generation
12
  datasets:
13
  - HuggingFaceTB/smol-smoltalk
 
14
  ---
15
 
16
  # nanochat-d20
@@ -19,10 +20,14 @@ datasets:
19
 
20
 
21
  ### Training Pipeline
22
- 1.**Base-training** General PreTraining using nanochat framework
23
- 2. **Mid-training**: General instruction tuning on SmolTalk, MMLU, GSM8K, Spelling tasks
24
- 3. **SFT (Supervised Fine-Tuning)**: Chat-specific training on ARC, GSM8K, SmolTalk
25
- 4. **RL (Reinforcement Learning)**: Optional GRPO-style training on GSM8K (if included)
 
 
 
 
26
 
27
 
28
  ## Repository Structure
@@ -34,13 +39,13 @@ datasets:
34
  ├── mid_checkpoints/d34/ # Mid-training checkpoint
35
  │ ├── model_*.pt
36
  │ └── meta_*.json
37
- ├── chatsft_checkpoints/d34/ # SFT checkpoint
38
  │ ├── model_*.pt
39
  │ └── meta_*.json
40
- ├── chatsft_checkpoints_int8/d34/ # SFT checkpoint
41
  │ ├── model_*.pt
42
  │ └── meta_*.json
43
- ├── chatrl_checkpoints/d34/ # RL checkpoint (if available)
44
  │ ├── model_*.pt
45
  │ └── meta_*.json
46
  ├── report/ # Evaluation reports
@@ -55,7 +60,7 @@ MIT License (same as nanochat)
55
 
56
  ## Acknowledgments
57
 
58
- - [Andrej Karpathy](https://github.com/karpathy) for the nanochat framework and pre-trained base model
59
 
60
  ```bibtex
61
  @misc{nanochat,
 
11
  pipeline_tag: text-generation
12
  datasets:
13
  - HuggingFaceTB/smol-smoltalk
14
+ - karpathy/fineweb-edu-100b-shuffle
15
  ---
16
 
17
  # nanochat-d20
 
20
 
21
 
22
  ### Training Pipeline
23
+
24
+ 1.**Base-training** PreTraining on FineWeb-EDU dataset using nanochat framework
25
+
26
+ 2. **Mid-training**: General instruction tuning on SmolTalk, MMLU, GSM8K, Spelling tasks
27
+
28
+ 3. **SFT (Supervised Fine-Tuning)**: Chat-specific training on ARC, GSM8K, SmolTalk
29
+
30
+ 4. **RL (Reinforcement Learning)**: Optional GRPO-style training on GSM8K (if included)
31
 
32
 
33
  ## Repository Structure
 
39
  ├── mid_checkpoints/d34/ # Mid-training checkpoint
40
  │ ├── model_*.pt
41
  │ └── meta_*.json
42
+ ├── chatsft_checkpoints/d20/ # SFT checkpoint
43
  │ ├── model_*.pt
44
  │ └── meta_*.json
45
+ ├── chatsft_checkpoints_int8/d20/ # SFT checkpoint
46
  │ ├── model_*.pt
47
  │ └── meta_*.json
48
+ ├── chatrl_checkpoints/d20/ # RL checkpoint (if available)
49
  │ ├── model_*.pt
50
  │ └── meta_*.json
51
  ├── report/ # Evaluation reports
 
60
 
61
  ## Acknowledgments
62
 
63
+ - [Andrej Karpathy](https://github.com/karpathy) for the nanochat framework
64
 
65
  ```bibtex
66
  @misc{nanochat,