I expected the architecture here to be consistent with Ministral3, so why is it different?
Ministral3:
Mistral3ForConditionalGeneration
Devstral-2:
Ministral3ForCausalLM
I'm curious about the reason for this design.
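For anyone who wants to reproduce the comparison, here is a minimal sketch. It assumes the class names above are what each model lists in the `architectures` field of its `config.json`. The helper name `declared_architectures` and the inline JSON strings are hypothetical stand-ins that contain only the values quoted above, not the real files.

```python
import json


def declared_architectures(config_text: str) -> list[str]:
    """Return the `architectures` list from a config.json-style string."""
    config = json.loads(config_text)
    # A missing key yields an empty list instead of raising.
    return list(config.get("architectures", []))


# Illustrative stand-ins holding only the class names quoted above;
# a real config.json contains many more fields.
ministral3_config = '{"architectures": ["Mistral3ForConditionalGeneration"]}'
devstral2_config = '{"architectures": ["Ministral3ForCausalLM"]}'

ministral3_archs = declared_architectures(ministral3_config)
devstral2_archs = declared_architectures(devstral2_config)

print("Ministral3:", ministral3_archs)
print("Devstral-2:", devstral2_archs)
print("Same architecture:", ministral3_archs == devstral2_archs)
```

To check the real files, replace the two strings with the contents of each model's downloaded `config.json`.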