metadata
license: apache-2.0
language:
  - en
library_name: transformers

Boomer-4b: A Leap in Language Model Innovation 🚀

Introduction 🎉

In the spirit of open innovation, we're thrilled to share our work on pretraining with a custom architecture and dataset. boomer-4b is a 3.51 billion parameter model pretrained on custom synthetic data generated in a textbook style. It reflects our commitment to advancing AI through both creative architecture design and thoughtful data composition.

Quick Start 🚀

Jump straight into using boomer-4b:

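import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model weights (in bfloat16) from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("budecosystem/boomer-4b")
model = AutoModelForCausalLM.from_pretrained("budecosystem/boomer-4b", torch_dtype=torch.bfloat16)

# Tokenize a prompt and generate; max_length counts the prompt tokens too.
inputs = tokenizer("Newton's second law", return_tensors="pt")
sample = model.generate(**inputs, max_length=128)

# Decode the generated token IDs back into text.
print(tokenizer.decode(sample[0]))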

Model Insights 🔍

Architecture Highlights:

  • Layers: 24
  • Heads: 32
  • Model Dimension: 2048
  • Vocab Size: 32000
  • Sequence Length: 2048
  • Intermediate Size: 11008
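
As a rough illustration of what these numbers imply, the sketch below derives the per-head dimension and a weights-only memory estimate. The class and function names are hypothetical, the even split of the model dimension across heads is an assumption (the custom architecture may differ), and the memory figure uses the 3.51 billion parameter count from the introduction with 2 bytes per parameter, matching the bfloat16 dtype in the quick start. It ignores activations and any generation cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoomerConfig:
    """Hyperparameters copied from the list above; the class name is hypothetical."""
    num_layers: int = 24
    num_heads: int = 32
    d_model: int = 2048
    vocab_size: int = 32000
    max_seq_len: int = 2048
    intermediate_size: int = 11008

    @property
    def head_dim(self) -> int:
        # Assumption: the model dimension is split evenly across attention heads.
        assert self.d_model % self.num_heads == 0
        return self.d_model // self.num_heads


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB (10**9 bytes); 2 bytes matches bfloat16."""
    return num_params * bytes_per_param / 1e9


cfg = BoomerConfig()
print(cfg.head_dim)              # 64
print(weight_memory_gb(3.51e9))  # about 7.02
```

Under these assumptions each head works on 64 dimensions, and the weights alone need roughly 7 GB in bfloat16.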

Training Configuration 📊

The model was trained with the following hyperparameters:

  • Per Device Train Batch Size: 6
  • Gradient Accumulation Steps: 1
  • Learning Rate: 2e-5
  • Optimizer: AdamW
  • Beta Values: 0.9, 0.99
  • Mixed Precision (FP16): True
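
The hyperparameter names above resemble the arguments of `transformers.TrainingArguments`, so one plausible way to write them down is as keyword arguments for that class. This is a sketch under that assumption, not the training script that was actually used. The `optim` value and the device count are illustrative guesses, since neither is reported here.

```python
# Keys follow the argument names of transformers.TrainingArguments.
# Using that class is an assumption; the card does not name the trainer.
training_kwargs = {
    "per_device_train_batch_size": 6,
    "gradient_accumulation_steps": 1,
    "learning_rate": 2e-5,
    "optim": "adamw_torch",  # assumed AdamW variant
    "adam_beta1": 0.9,
    "adam_beta2": 0.99,
    "fp16": True,
}


def effective_batch_size(kwargs: dict, num_devices: int) -> int:
    """Sequences consumed per optimizer step across all devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


# num_devices=8 is a hypothetical value; the device count is not reported.
print(effective_batch_size(training_kwargs, num_devices=8))  # 48
```

On a machine with a CUDA device, a dictionary like this could be passed as `TrainingArguments(output_dir=..., **training_kwargs)`. With gradient accumulation set to 1, the effective batch size is simply 6 times the number of devices.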

Evaluations and Comparisons 🏅

boomer-4b was evaluated on nine benchmarks. Comparison models are listed for five of them; a dash (-) marks a score that is not reported.

| Model | MMLU | ARC | HellaSwag | GSM8K | Winogrande | MATH | MathQA | DROP | LogiQA |
|---|---|---|---|---|---|---|---|---|---|
| boomer-4b | 55.59 | 58.53 | 74.70 | 47.76 | 72.22 | 4.00 | 35.98 | 0.74 | 31.80 |
| GeneZC/MiniChat-3B | 39.17 | 44.03 | 67.19 | 10.54 | 65.27 | - | - | - | - |
| openlm-research/open_llama_3b_v2 | 27.12 | 44.03 | 71.6 | 0.91 | 67.01 | - | - | - | - |
| microsoft/phi-2 | 58.11 | 61.09 | 75.11 | 54.81 | 74.35 | - | - | - | - |
| TinyLlama/TinyLlama-1.1B-intermediate | 26.04 | 33.87 | 60.31 | 1.44 | 59.51 | - | - | - | - |
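
To compare the models on equal footing, one option is to average only the five benchmarks that every model reports (MMLU, ARC, HellaSwag, GSM8K, Winogrande). The sketch below does that with the scores from the table. The unweighted mean is our choice of summary for illustration, not an official metric.

```python
# Scores copied from the table, in the order:
# MMLU, ARC, HellaSwag, GSM8K, Winogrande.
scores = {
    "boomer-4b": [55.59, 58.53, 74.70, 47.76, 72.22],
    "GeneZC/MiniChat-3B": [39.17, 44.03, 67.19, 10.54, 65.27],
    "openlm-research/open_llama_3b_v2": [27.12, 44.03, 71.6, 0.91, 67.01],
    "microsoft/phi-2": [58.11, 61.09, 75.11, 54.81, 74.35],
    "TinyLlama/TinyLlama-1.1B-intermediate": [26.04, 33.87, 60.31, 1.44, 59.51],
}

# Unweighted mean over the five shared benchmarks (an illustrative summary).
averages = {model: sum(vals) / len(vals) for model, vals in scores.items()}

# Sort models from highest to lowest average.
ranking = sorted(averages, key=averages.get, reverse=True)

for model in ranking:
    print(f"{model}: {averages[model]:.2f}")
```

By this measure boomer-4b averages 61.76, behind microsoft/phi-2 at 64.69 and ahead of the other three listed models.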

Why boomer-4b? ✨

boomer-4b performs strongly across a variety of benchmarks. On the five benchmarks reported for every model (MMLU, ARC, HellaSwag, GSM8K, and Winogrande), it outscores GeneZC/MiniChat-3B, openlm-research/open_llama_3b_v2, and TinyLlama/TinyLlama-1.1B-intermediate on each one, and trails microsoft/phi-2 by 0.4 to 7.1 points. It continues our pursuit of excellence in AI, building on the foundation laid by boomer 1b.

Limitations of boomer-4b

Despite these results, boomer-4b struggles with tasks that require intricate mathematical problem-solving and sophisticated logical reasoning, as reflected in its low scores on MATH (4.00) and LogiQA (31.80). Its DROP score (0.74) is also very low. This uneven performance suggests that the model does not yet apply its knowledge consistently across reasoning and synthesis tasks, and it points to areas for further refinement.

Acknowledgments 🙏

A special thanks to the open-source community and the researchers who paved the way for innovations like boomer. Our team's dedication to curating the dataset and fine-tuning the model has been instrumental in achieving this milestone.

Dive into the future of AI with boomer-4b and explore its capabilities in pushing the boundaries of what's possible in language understanding and beyond.