I think the current architecture should be consistent with Ministral3, but why is it different?

#5
by win10 - opened

I expected this model's architecture to be consistent with Ministral3's, but the two are different:

Ministral3:

Mistral3ForConditionalGeneration

Devstral-2:

Ministral3ForCausalLM

I'm curious about the reason for this design.
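
For anyone who wants to reproduce the comparison, here is a minimal sketch. It assumes the two class names above come from the `architectures` field of each repository's `config.json`, which is my guess about where they were read from. The two config strings are hypothetical, trimmed-down stand-ins, not the real files, and `read_architectures` is just an illustrative helper name.

```python
import json


def read_architectures(config_text):
    """Return the list stored under "architectures" in a config.json string.

    Returns an empty list when the key is absent.
    """
    config = json.loads(config_text)
    return list(config.get("architectures", []))


# Hypothetical, trimmed-down config contents (assumption: real files hold many more keys).
ministral3_config = '{"architectures": ["Mistral3ForConditionalGeneration"]}'
devstral2_config = '{"architectures": ["Ministral3ForCausalLM"]}'

ministral3_archs = read_architectures(ministral3_config)
devstral2_archs = read_architectures(devstral2_config)

print("Ministral3:", ministral3_archs)
print("Devstral-2:", devstral2_archs)
print("same" if ministral3_archs == devstral2_archs else "different")
```

To run this against real files, replace the two strings with the text of each downloaded `config.json`.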
