metadata
library_name: transformers
license: apache-2.0
base_model:
  - MaziyarPanahi/calme-3.2-instruct-78b
  - Sakalti/ultiima-72B
tags:
  - merge
  - mergekit

ECE-TRIOMPHANT-2.1-YL-72B-SLERP-V1

This model has been produced by:

  • ROBERGE Marial, engineering student at French Engineering School ECE
  • ESCRIVA Mathis, engineering student at French Engineering School ECE
  • LALAIN Youri, engineering student at French Engineering School ECE
  • RAGE Lilian, engineering student at French Engineering School ECE
  • HUVELLE Baptiste, engineering student at French Engineering School ECE

Under the supervision of:

  • Andre-Louis Rochet, Lecturer at ECE & Co-Founder of TW3 Partners
  • Paul Lemaistre, CTO of TW3 Partners

With the contribution of:

  • ECE engineering school as sponsor and financial contributor
  • François STEPHAN as director of ECE
  • Gérard REUS as acting director of iLab
  • Matthieu JOLLARD, ECE alumnus
  • Louis GARCIA, ECE alumnus

Supervisory structure

The iLab (intelligence Lab) is a structure created by ECE and dedicated to artificial intelligence.

About ECE

ECE, a multi-program, multi-campus, and multi-sector engineering school specializing in digital engineering, trains engineers and technology experts for the 21st century, capable of meeting the challenges of the dual digital and sustainable development revolutions.

ECE-TRIOMPHANT-2.1-YL-72B-SLERP-V1 is a merged language model created from Sakalti/ultiima-72B and MaziyarPanahi/calme-3.2-instruct-78b. Using the SLERP (Spherical Linear Interpolation) method, it combines the strengths of both source models, with the aim of strong performance on complex natural language processing (NLP) tasks.
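As a rough illustration of what SLERP does to a pair of weight tensors, here is a minimal sketch in plain Python. It is not mergekit's implementation: the function name `slerp`, the `eps` threshold, and the fallback to linear interpolation for near-parallel vectors are assumptions made for this example, and the tensors are represented as flat lists of floats.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flat vectors.

    t = 0 returns v0, t = 1 returns v1 (hypothetical helper, for illustration).
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    # Degenerate case: a zero-length vector has no direction, so use plain lerp.
    if norm0 < eps or norm1 < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    omega = math.acos(dot)
    sin_omega = math.sin(omega)

    # Near-parallel (or anti-parallel) vectors: the SLERP formula is
    # numerically unstable here, so fall back to linear interpolation.
    if abs(sin_omega) < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Unlike a straight average, the midpoint of two orthogonal unit vectors keeps unit length, which is the usual motivation for interpolating along the arc instead of the chord.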

Features

  • Merge method: SLERP (Spherical Linear Interpolation).
  • Source models:
    • MaziyarPanahi/calme-3.2-instruct-78b
    • Sakalti/ultiima-72B
  • Strengths:
    • Improved performance on multi-domain and reasoning tasks.
    • Extended processing capacity through the merging of critical layers.
    • bfloat16 precision for fast and efficient computation.
  • Target applications:
    • Mathematical reasoning.
    • Contextual understanding.
    • Instruction following.

Configuration

slices:
  - sources:
      - model: MaziyarPanahi/calme-3.2-instruct-78b
        layer_range: [0, 80]  # Limited to 80 layers
      - model: Sakalti/ultiima-72B
        layer_range: [0, 80]  # Same range as the 78B model
merge_method: slerp
base_model: MaziyarPanahi/calme-3.2-instruct-78b
parameters:
  t:
    - filter: self_attn
      value: [0, 0.25, 0.5, 0.75, 1]
    - filter: mlp
      value: [1, 0.75, 0.5, 0.25, 0]
    - value: 0.5
dtype: bfloat16
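
The two `t` lists in the configuration above act as gradients over the layer stack rather than as per-layer values. The sketch below shows one way to read them, assuming the five anchor values are spread evenly over the 80 layers and linearly interpolated in between. The helper `expand_gradient` is hypothetical, and the exact mapping from layer index to gradient position inside mergekit may differ.

```python
def expand_gradient(anchors, num_layers):
    """Linearly interpolate a short list of anchor values over num_layers layers.

    Hypothetical helper for illustration; not part of mergekit.
    """
    if len(anchors) == 1 or num_layers == 1:
        return [float(anchors[0])] * num_layers

    values = []
    for layer in range(num_layers):
        # Position of this layer in anchor space, from 0 to len(anchors) - 1.
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append((1 - frac) * anchors[lo] + frac * anchors[lo + 1])
    return values


# Anchor values copied from the configuration above.
self_attn_t = expand_gradient([0, 0.25, 0.5, 0.75, 1], 80)
mlp_t = expand_gradient([1, 0.75, 0.5, 0.25, 0], 80)

# Print the interpolation factor for the first, middle, and last layers.
for layer in (0, 40, 79):
    print(layer, round(self_attn_t[layer], 3), round(mlp_t[layer], 3))
```

Under this reading, the self-attention weights move from the base model (`t = 0`) in the first layers toward the second model (`t = 1`) in the last layers, the MLP weights move in the opposite direction, and all remaining tensors use the fixed value 0.5.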