This model was converted to GGUF format from [`AstroMLab/AstroSage-8B`](https://huggingface.co/AstroMLab/AstroSage-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/AstroMLab/AstroSage-8B) for more details on the model.

---

## Model details

https://arxiv.org/abs/2411.09012

AstroSage-Llama-3.1-8B is a domain-specialized natural-language AI assistant tailored for research in astronomy, astrophysics, and cosmology. Trained on the complete collection of astronomy-related arXiv papers from 2007-2024, along with millions of synthetically generated question-answer pairs and other astronomical literature, AstroSage-Llama-3.1-8B demonstrates excellent proficiency on a wide range of questions. This result highlights the potential of domain specialization in AI, suggesting that focused training can yield capabilities exceeding those of much larger, general-purpose models.

- Base Architecture: Meta-Llama-3.1-8B
- Base Model: Meta-Llama-3.1-8B
- Parameters: 8 billion
- Training Focus: Astronomy, Astrophysics, Cosmology, and Astronomical Instrumentation
- License: Llama 3.1 Community License
- Development Process:
  1. Continued Pre-training (CPT) on astronomical literature
  2. Supervised Fine-tuning (SFT) on QA pairs and instruction sets
  3. Model merging with Meta-Llama-3.1-8B-Instruct (75% CPT+SFT / 25% Meta-Instruct); see the sketch below

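The final merge step above is a linear weighted average of model weights. As a minimal sketch of that idea (not the authors' actual merging code; the pre-merge CPT+SFT checkpoint was not released separately, so the first repo id below is a hypothetical placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical illustration of a 75/25 linear weight merge. The repo id
# "your-org/astrosage-cpt-sft" is a placeholder, not a released checkpoint.
domain = AutoModelForCausalLM.from_pretrained(
    "your-org/astrosage-cpt-sft", torch_dtype=torch.bfloat16)
general = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16)

general_sd = general.state_dict()
merged_sd = {
    # Per-tensor weighted average: 75% domain (CPT+SFT), 25% Meta-Instruct.
    name: 0.75 * tensor + 0.25 * general_sd[name]
    for name, tensor in domain.state_dict().items()
}

domain.load_state_dict(merged_sd)
domain.save_pretrained("astrosage-merged")  # write out the merged model
```

In practice, merges like this are usually done with a dedicated tool such as mergekit rather than by hand.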
## Using the model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("AstroMLab/AstroSage-8b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("AstroMLab/AstroSage-8b")

# Function to generate a response
def generate_response(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, then decode them
    response = outputs[0][inputs["input_ids"].shape[-1]:]
    decoded = tokenizer.decode(response, skip_special_tokens=True)

    return decoded

# Example usage
prompt = """
You are an expert in general astrophysics. Your task is to answer the following question:
What are the main components of a galaxy?
"""
response = generate_response(prompt)
print(response)
```

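Because the released weights are merged with Meta-Llama-3.1-8B-Instruct, the model can likely also be prompted through the tokenizer's chat template rather than a raw string. A hedged variant of the example above (it assumes the tokenizer ships a Llama-3.1-style chat template, which this card does not state, and it reuses the `model` and `tokenizer` loaded above):

```python
# Assumes the tokenizer bundles a chat template; check before relying on it.
messages = [
    {"role": "system", "content": "You are an expert in general astrophysics."},
    {"role": "user", "content": "What are the main components of a galaxy?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the tokens generated after the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```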
## Model Improvements and Performance

AstroSage-Llama-3.1-8B shows remarkable performance improvements:

| Model                  | Score (%) |
|------------------------|-----------|
| AstroSage-Llama-3.1-8B | 80.9      |
| GPT-4o                 | 80.4      |
| LLaMA-3.1-8B           | 73.7      |
| Gemma-2-9B             | 71.5      |
| Qwen-2.5-7B            | 70.4      |
| Yi-1.5-9B              | 68.4      |
| InternLM-2.5-7B        | 64.5      |
| Mistral-7B-v0.3        | 63.9      |
| ChatGLM3-6B            | 50.4      |

The model demonstrates:

- Outperformance of all 8B-parameter models tested
- Performance comparable to GPT-4o (80.4%)
- ~1000x greater cost-effectiveness than proprietary models
- A 7-percentage-point improvement over the base Llama-3.1-8B model

## Training Data

Continued Pre-training:
- ~250,000 arXiv preprints (2007-2024) from astro-ph and gr-qc
- Astronomy-related Wikipedia articles
- Selected astronomy textbooks
- Total: 3.3 billion tokens, 19.9 GB plaintext

Supervised Fine-tuning:
- 8.8 million curated QA pairs
- Filtered Infinity-Instruct-7M dataset
- Paper summaries and metadata
- Total: 2.0 billion tokens, 9.8 GB plaintext

## Intended Use

- Curiosity-driven question answering
- Brainstorming new ideas
- Astronomical research assistance
- Educational support in astronomy
- Literature review and summarization
- Scientific explanation of concepts

## Limitations

- Training data cutoff: January 2024
- As with all LLMs, hallucinations are possible
- Limited by its 8B parameter size for complex reasoning
- Paper metadata is not perfectly memorized
- Performance primarily validated on multiple-choice questions
- Primarily trained for use in English

## Technical Specifications

- Architecture: Based on Meta-Llama-3.1
- Training Infrastructure: ORNL OLCF Frontier
- Hosting: Hugging Face Hub (AstroMLab/AstroSage-8B)

## Ethical Considerations

While this model is designed for scientific use:

- It should not be used as the sole source for critical research decisions
- Output should be verified against primary sources
- It may reflect biases present in the astronomical literature

## Citation and Contact

- Corresponding author: Tijmen de Haan (tijmen dot dehaan at gmail dot com)
- AstroMLab: astromachinelearninglab at gmail dot com

Please cite the AstroMLab 3 paper when referencing this model:

```bibtex
@preprint{dehaan2024astromlab3,
      title={AstroMLab 3: Achieving GPT-4o Level Performance in Astronomy with a Specialized 8B-Parameter Large Language Model},
      author={Tijmen de Haan and Yuan-Sen Ting and Tirthankar Ghosal and Tuan Dung Nguyen and Alberto Accomazzi and Azton Wells and Nesar Ramachandra and Rui Pan and Zechang Sun},
      year={2024},
      eprint={2411.09012},
      archivePrefix={arXiv},
      primaryClass={astro-ph.IM},
      url={https://arxiv.org/abs/2411.09012},
}
```

---

## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)