	---
	license: apache-2.0
	datasets:
	- GainEnergy/SMoE-Training
	- GainEnergy/reasoner
	- GainEnergy/ogai-8x7B
	- GainEnergy/oilandgas-engineering-dataset
	- GainEnergy/ogdataset
	- GainEnergy/upstrimacentral
	- open-r1/OpenR1-Math-220k
	- unsloth/LaTeX_OCR
	base_model: mistralai/Mathstral-7B-v0.1
	tags:
	- oil-gas
	- drilling-engineering
	- mathstral-7b
	- lora
	- fine-tuned
	- energy-ai
	- pragmatic-ai
	- gguf
	- text-generation-inference
	- text-generation
	model-index:
	- name: OGAI-STEM-7B
	  results:
	  - task:
	      type: text-generation
	      name: Engineering AI for Oil & Gas
	    dataset:
	      name: GainEnergy Oil & Gas Corpus
	      type: custom
	    metrics:
	    - name: Engineering Calculations Accuracy
	      type: accuracy
	      value: 94.5
	    - name: Scientific Computation Precision
	      type: precision
	      value: 92.3
	    - name: Context Retention
	      type: contextual-coherence
	      value: High
	variants:
	- name: OGAI-STEM-7B-GGUF
	pipeline_tag: text-generation
	repo_name: GainEnergy/OGAI-STEM-7B-GGUF
	library_name: transformers
	language:
	- en
	widget:
	- text: >-
	    User: What is the pressure drop in a horizontal pipeline for crude oil transport?

	    AI:
	  example_title: Pipeline Pressure Drop Calculation
	- text: >-
	    User: Explain the differences between gas lift and electric submersible pumps in artificial lift.

	    AI:
	  example_title: Artificial Lift Methods
	- text: >-
	    User: How do you calculate mud weight for deepwater drilling?

	    AI:
	  example_title: Mud Weight Calculation
	- text: >-
	    User: Describe the steps to optimize wellbore stability in unconventional reservoirs.

	    AI:
	  example_title: Wellbore Stability Optimization
	pipeline_tag: text-generation

	---

	# OGAI-STEM-7B: AI-Powered Engineering Model for Oil & Gas Calculations

	![Hugging Face](https://img.shields.io/badge/HuggingFace-OGAI--STEM--7B-blue)
	[![License](https://img.shields.io/github/license/huggingface/transformers.svg)](LICENSE)

	## Model Description

	OGAI-STEM-7B is a LoRA fine-tune of Mathstral-7B built for oil and gas engineering, scientific computing, and technical problem-solving. It is tuned for numerical accuracy, complex engineering calculations, and technical document understanding.

	The model is an integral part of GainEnergy's Upstrima AI Platform, enhancing workflows with pragmatic AI agents, scientific computing tools, and retrieval-augmented generation (RAG)-based document analysis.

	## Technical Architecture

	### Base Model Specifications
	- Architecture: Mathstral-7B (Mistral fine-tuned for advanced math reasoning)
	- Parameters: 7B
	- Context Length: 32,768 tokens for long-form scientific queries
	- Mathematical Precision: Enhanced for oil & gas engineering computations

	### Fine-tuning Approach
	- Method: Low-Rank Adaptation (LoRA) with rank 64
	- Training Dataset: 3.2M datapoints from specialized oil & gas engineering sources
	- Hardware: Trained on 8x NVIDIA A100 80GB GPUs
	- Training Time: 2,200 GPU hours
	- Special Features: Improved accuracy in fluid mechanics, pressure drop, and geomechanics calculations
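
To put the rank-64 setting in perspective, the sketch below counts the trainable parameters that LoRA adapters would add. It assumes the adapters target the four attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and uses the published Mistral-7B shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128). The model card does not say which modules were adapted, so the target list is an assumption made for illustration only.

```python
# Rough count of trainable parameters added by rank-r LoRA adapters.
# Assumptions (not stated in the model card): adapters on the four attention
# projections only, with Mistral-7B layer shapes.

HIDDEN = 4096        # model hidden size
KV_DIM = 8 * 128     # 8 key/value heads x head dimension 128 (grouped-query attention)
NUM_LAYERS = 32
RANK = 64            # LoRA rank taken from the model card

# (in_features, out_features) of each assumed target projection
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two matrices per adapted weight: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGETS.values())
total = per_layer * NUM_LAYERS

print(f"per layer: {per_layer:,}")
print(f"total:     {total:,} ({total / 7e9:.2%} of a nominal 7B base)")
```

Under these assumptions the adapters add roughly 54.5M trainable parameters, which is under 1% of the base model. Adapting the MLP projections as well would raise that number.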

	### Performance Optimizations
	- Quantization: 4-bit and 8-bit versions optimized for low-memory inference
	- Inference Speed: Tuned KV cache management for real-time engineering computations
	- Memory Footprint: Runs efficiently on 12GB VRAM with 4-bit quantization
	- Reduced Hallucinations: Domain-specific fine-tuning minimizes incorrect scientific results
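
The 12GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight memory at several precisions and the KV cache size at the full context length. It assumes a nominal 7e9 parameters, Mistral-7B attention shapes (32 layers, 8 key/value heads, head dimension 128), and a 16-bit KV cache without a sliding window. It ignores quantization metadata, activations, and framework overhead, so real usage will be somewhat higher.

```python
# Back-of-the-envelope memory estimate. All constants except CONTEXT_LEN are
# assumptions about the base architecture, not values from the model card.

GIB = 1024 ** 3
NUM_PARAMS = 7e9        # nominal "7B"
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
CONTEXT_LEN = 32_768    # context length listed in the model card


def weight_bytes(num_params: float, bits: int) -> float:
    """Memory for the weights alone at the given bit width."""
    return num_params * bits / 8


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """KV cache size: keys and values (factor 2) for every layer and KV head."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value * num_tokens


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_bytes(NUM_PARAMS, bits) / GIB:5.2f} GiB")

kv_full = kv_cache_bytes(CONTEXT_LEN)
print(f"KV cache at {CONTEXT_LEN:,} tokens: {kv_full / GIB:.2f} GiB")
print(f"4-bit weights + full KV cache: {(weight_bytes(NUM_PARAMS, 4) + kv_full) / GIB:.2f} GiB")
```

Under these assumptions, 4-bit weights take about 3.3 GiB and a full 32,768-token cache adds 4 GiB, which leaves headroom on a 12GB card for overhead.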

	## Deployment-Optimized Versions

	| Version | Memory Requirement | Performance |
	|------------|----------------------|----------------|
	| [OGAI-STEM-7B-GGUF](https://huggingface.co/GainEnergy/OGAI-STEM-7B-GGUF) | CPU optimized | Suitable for edge computing |

	### Local Deployment with vLLM
	```bash
	# Serve the model through vLLM's OpenAI-compatible API, sharded across 2 GPUs
	python -m vllm.entrypoints.openai.api_server \
	    --model GainEnergy/ogai-stem-7b \
	    --tensor-parallel-size 2
	```
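
Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the Python standard library. It assumes the server is listening on vLLM's default address, `http://localhost:8000`, and was started with the model name shown above. `build_request` and `complete` are illustrative helper names, not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"    # assumed: vLLM's default host and port
MODEL = "GainEnergy/ogai-stem-7b"     # must match the --model value given to the server


def build_request(prompt: str, max_tokens: int = 256,
                  temperature: float = 0.0) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the prompt to the running server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server up, `print(complete("How do you calculate mud weight for deepwater drilling?"))` prints the completion. The temperature defaults to 0 here on the assumption that deterministic output is preferable for numerical answers.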

	## How to Use

	### Run Inference in Python
	```python
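	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_name = "GainEnergy/ogai-stem-7b"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	# device_map="auto" places the weights on the available device(s); requires the accelerate package
	model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

	prompt = "Calculate the pressure drop in a 500m pipeline with a 10,000 BPD flow rate."
	# Move the inputs to the model's device instead of hard-coding "cuda"
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=100)
	# Decode the full sequence (prompt plus generated tokens) back to text
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))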
	```
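
The widget examples in this card's metadata frame each question as a `User:` line followed by an `AI:` line. A minimal helper that reproduces that framing is sketched below. Whether this template matches the format used during fine-tuning is an assumption, and `format_prompt` is an illustrative name.

```python
def format_prompt(question: str) -> str:
    """Wrap a question in the 'User: ... / AI:' framing used by the widget examples."""
    return f"User: {question.strip()}\nAI:"


prompt = format_prompt("How do you calculate mud weight for deepwater drilling?")
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the bare prompt.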

	## Citing OGAI-STEM-7B
	```
	@misc{ogai_stem_7b_2025,
	  title={OGAI-STEM-7B: AI Model for Oil \& Gas Scientific Computing},
	  author={GainEnergy AI Team},
	  year={2025},
	  publisher={Hugging Face Models}
	}
	```