---
license: apache-2.0
language:
- en
library_name: transformers
---

# Boomer-4b: A Leap in Language Model Innovation 🚀

## Introduction 🎉

In the spirit of open innovation, we're thrilled to share our pioneering work on pretraining with a custom architecture and dataset. **boomer-4b**, our 3.51 billion parameter marvel, represents a significant stride in the AI field. It was crafted meticulously from custom, textbook-style synthetic data. This model exemplifies our commitment to advancing the boundaries of AI not only through creative architecture but also through thoughtful data amalgamation.

## Quick Start 🚀

Jump straight into using boomer-4b:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("budecosystem/boomer-4b")
model = AutoModelForCausalLM.from_pretrained("budecosystem/boomer-4b", torch_dtype=torch.bfloat16)

# Generate a continuation for a short prompt
inputs = tokenizer("Newton's second law", return_tensors="pt")
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))
```

## Model Insights 🔍

**Architecture Highlights** (a hedged configuration sketch appears in the appendix at the end of this card):

- **Layers**: 24
- **Heads**: 32
- **Model Dimension**: 2048
- **Vocab Size**: 32000
- **Sequence Length**: 2048
- **Intermediate Size**: 11008

## Training Configuration 📊

Training used the following hyperparameters (see the sketch in the appendix below):

- **Per Device Train Batch Size**: 6
- **Gradient Accumulation Steps**: 1
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **Beta Values**: 0.9, 0.99
- **Mixed Precision (FP16)**: True

## Evaluations and Comparisons 🏅

boomer-4b has been rigorously evaluated across several benchmarks:

| Model | MMLU | ARC | HellaSwag | GSM8K | Winogrande | MATH | MathQA | DROP | LogiQA |
|-------|------|-----|-----------|-------|------------|------|--------|------|--------|
| **boomer-4b** | 55.59 | 58.53 | 74.70 | 47.76 | 72.22 | 4.00 | 35.98 | 0.74 | 31.80 |
| GeneZC/MiniChat-3B | 39.17 | 44.03 | 67.19 | 10.54 | 65.27 | - | - | - | - |
| openlm-research/open_llama_3b_v2 | 27.12 | 44.03 | 71.6 | 0.91 | 67.01 | - | - | - | - |
| microsoft/phi-2 | 58.11 | 61.09 | 75.11 | 54.81 | 74.35 | - | - | - | - |
| TinyLlama/TinyLlama-1.1B-intermediate | 26.04 | 33.87 | 60.31 | 1.44 | 59.51 | - | - | - | - |

## Why boomer-4b? ✨

boomer-4b's strong performance across a variety of benchmarks showcases its robustness and versatility for its size, particularly on reasoning and language-understanding tasks. It stands as a continuation of our pursuit of excellence in AI, building on the foundation laid by boomer-1b.

## Limitations of boomer-4b

Despite its impressive achievements, boomer-4b encounters challenges in areas requiring intricate mathematical problem-solving and sophisticated logical reasoning, as reflected in its subdued performance on MATH, DROP, and LogiQA. This variability in task performance suggests limitations in its capacity to uniformly apply and adapt its knowledge base across a spectrum of reasoning and synthesis challenges, pointing to areas for further refinement and enhancement.

## Acknowledgments 🙏

A special thanks to the open-source community and the researchers who paved the way for innovations like boomer. Our team's dedication to curating the dataset and fine-tuning the model has been instrumental in achieving this milestone.

Dive into the future of AI with boomer-4b and explore its capabilities in pushing the boundaries of what's possible in language understanding and beyond.
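
## Appendix: Configuration Sketches

For readers who want to see how the architecture highlights above translate into a `transformers` configuration, here is a minimal sketch. The card describes the architecture as custom, so using `LlamaConfig` here is an assumption made purely for illustration; the checkpoint's own `config.json` is authoritative.

```python
from transformers import LlamaConfig

# Hedged sketch: assumes a LLaMA-style decoder-only layout, which the
# numbers listed under "Model Insights" are consistent with. The released
# config.json is the source of truth for the actual custom architecture.
config = LlamaConfig(
    num_hidden_layers=24,         # Layers
    num_attention_heads=32,       # Heads
    hidden_size=2048,             # Model dimension
    intermediate_size=11008,      # Intermediate (FFN) size
    vocab_size=32000,             # Vocab size
    max_position_embeddings=2048, # Sequence length
)
print(config)
```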
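
Likewise, a minimal sketch of a Hugging Face `TrainingArguments` object matching the hyperparameters listed under Training Configuration. The output directory and the specific AdamW implementation (`adamw_torch`) are assumptions, not details taken from the original training script.

```python
from transformers import TrainingArguments

# Hedged sketch of the listed hyperparameters; output_dir and the optim
# choice are illustrative assumptions.
training_args = TrainingArguments(
    output_dir="./boomer-4b-pretrain",  # hypothetical path
    per_device_train_batch_size=6,
    gradient_accumulation_steps=1,
    learning_rate=2e-5,
    optim="adamw_torch",                # AdamW optimizer
    adam_beta1=0.9,
    adam_beta2=0.99,
    fp16=True,                          # mixed precision as listed
)
```

This object would typically be passed to a `transformers.Trainer` together with the model, tokenizer, and dataset; the pretraining corpus itself is not released, so the data-loading side is left out here.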