Active filters: rlhf
tasksource/deberta-small-long-nli • Zero-Shot Classification • 0.1B • 21.6k • 48
sileod/deberta-v3-base-tasksource-nli • Zero-Shot Classification • 0.2B • 5.88k • 133
stanfordnlp/SteamSHP-flan-t5-xl • 6 • 43
stanfordnlp/SteamSHP-flan-t5-large • 105 • 33
sileod/deberta-v3-large-tasksource-nli • Zero-Shot Classification • 0.4B • 448 • 37
sileod/deberta-v3-large-tasksource-rlhf-reward-model • Text Classification • 45.1k • 11
trl-lib/llama-7b-se-rl-peft • 103
trl-lib/llama-7b-se-rm-peft
toloka/gpt2-large-rl-prompt-writing • Text Generation • 0.8B • 7 • 3
AdamG012/chat-opt-1.3b-rlhf-actor-deepspeed • Text Generation • 4 • 5
AdamG012/chat-opt-1.3b-rlhf-critic-deepspeed • Text Generation • 3 • 3
AdamG012/chat-opt-1.3b-rlhf-actor-ema-deepspeed • Text Generation • 1 • 8
sileod/mdeberta-v3-base-tasksource-nli • Zero-Shot Classification • 0.3B • 41 • 18
Text Generation • 2 • 5
Text Generation • 5 • 3
Text Generation • 12 • 6
argilla/roberta-base-reward-model-falcon-dolly • Text Classification • 5 • 4
Text Generation • 1
PKU-Alignment/beaver-7b-v1.0 • Reinforcement Learning • 7B • 10 • 13
lyogavin/Anima33B-DPO-Belle-1k • Text Generation • 1
lyogavin/Anima33B-DPO-Belle-1k-merged • Text Generation • 10 • 12
PKU-Alignment/beaver-7b-v1.0-reward • Reinforcement Learning • 7B • 1.56k • 17
PKU-Alignment/beaver-dam-7b • 3.18k • 17
PKU-Alignment/beaver-7b-v1.0-cost • Reinforcement Learning • 7B • 1.91k • 10
Ablustrund/moss-rlhf-reward-model-7B-zh • 3 • 23
OpenMOSS-Team/moss-rlhf-reward-model-7B-en
OpenMOSS-Team/moss-rlhf-sft-model-7B-en
OpenMOSS-Team/moss-rlhf-policy-model-7B-en
lightonai/alfred-40b-0723 • Text Generation • 9 • 46