huginn-0125

#662
by heroOfOrion - opened

This is Huginn, version 01/25. This is a latent recurrent-depth model with 3.5B parameters, trained for 800B tokens on AMD MI250X machines. This is a proof-of-concept model, but surprisingly capable in reasoning and code given its training budget and size. All details on this model can be found in the tech report: "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach."

https://huggingface.co/tomg-group-umd/huginn-0125

This model uses the RavenForCausalLM architecture, which is unfortunately not yet supported by llama.cpp. Because of this, I have not attempted to queue this model. Currently, only the following architectures are supported by llama.cpp:

  • GPTNeoXForCausalLM
  • BloomForCausalLM
  • BloomModel
  • MPTForCausalLM
  • OrionForCausalLM
  • BaichuanForCausalLM
  • BaiChuanForCausalLM
  • XverseForCausalLM
  • FalconForCausalLM
  • RWForCausalLM
  • GPTBigCodeForCausalLM
  • GPTRefactForCausalLM
  • StableLmForCausalLM
  • StableLMEpochForCausalLM
  • LlavaStableLMEpochForCausalLM
  • LLaMAForCausalLM
  • LlamaForCausalLM
  • MistralForCausalLM
  • MixtralForCausalLM
  • DeciLMForCausalLM
  • BitnetForCausalLM
  • GrokForCausalLM
  • DbrxForCausalLM
  • MiniCPMForCausalLM
  • MiniCPM3ForCausalLM
  • QWenLMHeadModel
  • Qwen2ForCausalLM
  • Qwen2VLForConditionalGeneration
  • WavTokenizerDec
  • Qwen2MoeForCausalLM
  • GPT2LMHeadModel
  • PhiForCausalLM
  • Phi3ForCausalLM
  • PhiMoEForCausalLM
  • PlamoForCausalLM
  • CodeShellForCausalLM
  • InternLM2ForCausalLM
  • InternLM3ForCausalLM
  • BertModel
  • BertForMaskedLM
  • CamembertModel
  • RobertaModel
  • NomicBertModel
  • XLMRobertaModel
  • XLMRobertaForSequenceClassification
  • GemmaForCausalLM
  • Gemma2ForCausalLM
  • Starcoder2ForCausalLM
  • Rwkv6ForCausalLM
  • RWKV6Qwen2ForCausalLM
  • MambaForCausalLM
  • MambaLMHeadModel
  • FalconMambaForCausalLM
  • CohereForCausalLM
  • Cohere2ForCausalLM
  • OlmoForCausalLM
  • OLMoForCausalLM
  • Olmo2ForCausalLM
  • OlmoeForCausalLM
  • JinaBertModel
  • JinaBertForMaskedLM
  • OpenELMForCausalLM
  • ArcticForCausalLM
  • DeepseekForCausalLM
  • DeepseekV2ForCausalLM
  • DeepseekV3ForCausalLM
  • T5WithLMHeadModel
  • T5ForConditionalGeneration
  • MT5ForConditionalGeneration
  • UMT5ForConditionalGeneration
  • T5EncoderModel
  • JAISLMHeadModel
  • GlmForCausalLM
  • ChatGLMModel
  • ChatGLMForConditionalGeneration
  • NemotronForCausalLM
  • ExaoneForCausalLM
  • GraniteForCausalLM
  • GraniteMoeForCausalLM
  • ChameleonForConditionalGeneration
  • ChameleonForCausalLM
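
For anyone who wants to run the same check before requesting a model, here is a minimal sketch. It assumes the model's config.json has the usual "architectures" field. The helper name `unsupported_architectures`, the example config string, and the shortened supported set are hypothetical and only for illustration; the set is an excerpt of the list above, not the full list.

```python
import json

# Excerpt of the architecture names listed above (assumption: extend as needed).
SUPPORTED_ARCHITECTURES = {
    "LlamaForCausalLM",
    "MistralForCausalLM",
    "Qwen2ForCausalLM",
    "Gemma2ForCausalLM",
}


def unsupported_architectures(config_text, supported=SUPPORTED_ARCHITECTURES):
    """Return architecture names from a config.json string that are not in `supported`."""
    config = json.loads(config_text)
    # A missing "architectures" field is treated as an empty list.
    return [name for name in config.get("architectures", []) if name not in supported]


# Hypothetical config.json content, reduced to the one field that matters here.
example_config = '{"architectures": ["RavenForCausalLM"]}'
print(unsupported_architectures(example_config))
```

An empty result would mean every listed architecture appears in the supported set; anything returned, as with RavenForCausalLM here, is a reason not to queue the model yet.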
mradermacher changed discussion status to closed
