# A High-Performance AI Inference Library for ONNX and Phi-4
Phinx is an advanced AI inference library designed to leverage ONNX Runtime GenAI and the Phi-4 Multimodal ONNX model for fast, efficient, and scalable AI applications. Built for developers seeking seamless integration of generative and multimodal AI into their projects, Phinx provides an optimized and flexible runtime environment with robust performance.
## Key Features
- ONNX-Powered Inference – Efficient execution of Phi-4 models using ONNX Runtime GenAI (a brief sketch follows below).
- Multimodal AI – Supports text, image, and multi-input inference for diverse AI tasks.
- Optimized Performance – Accelerated inference leveraging ONNX optimizations for speed and efficiency.
- Developer-Friendly API – Simple yet powerful APIs for quick integration into Delphi, Python, and other platforms.
- Self-Contained & Virtualized – The `Phinx.model` file acts as a virtual folder inside the application, bundling the Phi-4 ONNX model files and all required dependencies into a single, easily distributable format.

Phinx is ideal for AI research, creative applications, and production-ready generative AI solutions. Whether you’re building chatbots, AI-powered content generation, or multimodal assistants, Phinx delivers the speed and flexibility you need!
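For orientation, the sketch below shows roughly what text generation looks like through the `onnxruntime-genai` Python bindings that Phinx builds on. It is a minimal sketch, not Phinx's own API: the model directory and prompt template are placeholders, and call names can vary between `onnxruntime-genai` releases.

```python
# Minimal sketch, assuming a recent onnxruntime-genai release.
# "path/to/phi-4-onnx" is a placeholder for a folder holding the
# Phi-4 ONNX files plus genai_config.json.
import onnxruntime_genai as og

model = og.Model("path/to/phi-4-onnx")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256, temperature=0.7)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("<|user|>Hello!<|end|><|assistant|>"))

# Decode and print tokens as they are produced.
stream = tokenizer.create_stream()
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```

The `<|user|>`/`<|assistant|>` chat template shown here is the Phi-style format; check the model card for the exact template your build expects.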
## Phinx Model File Format (`Phinx.model`)
The `Phinx.model` format is a specialized file format designed for storing ONNX-based machine learning models, optimized for CUDA-powered inference. It provides a structured, efficient, and extensible way to encapsulate all required components for seamless model execution.
### Key Features
#### Self-Contained & Virtualized
- The `Phinx.model` file serves as a virtual folder inside the application, encapsulating all necessary components for model execution.
- It includes the Phi-4 ONNX model files, along with all required dependencies, ensuring portability and ease of deployment.
#### Optimized for CUDA Inference
- Designed to leverage GPU acceleration, making it ideal for high-performance AI applications.
- Ensures fast loading and execution with CUDA-optimized computations (see the sketch below).
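Phinx selects and configures the GPU internally. For reference, this is how CUDA acceleration is typically requested through ONNX Runtime's Python API; the model path is a placeholder.

```python
import onnxruntime as ort

# Listing CUDAExecutionProvider first requests GPU execution;
# ONNX Runtime falls back to CPU if CUDA is unavailable.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers were actually enabled
```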
#### Structured & Extensible
- Stores model weights, metadata, configuration parameters, and dependencies in a well-organized manner (see the hypothetical manifest sketched below).
- Future-proof design allows for extensibility, supporting additional configurations or optimizations as needed.
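The actual `Phinx.model` schema is not published here, but a structured container of this kind typically carries a small manifest describing what it bundles. The field names below are purely hypothetical, for illustration only.

```python
import json

# Hypothetical manifest: none of these field names are part of the
# documented Phinx.model format.
manifest = {
    "format_version": "1.0",
    "model": "phi-4-multimodal",
    "files": ["model.onnx", "model.onnx.data", "genai_config.json"],
    "execution_provider": "cuda",
}
print(json.dumps(manifest, indent=2))
```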
#### Simplified Deployment
- By consolidating all required files into a single `.model` file, it simplifies model distribution and integration into AI applications.
- Eliminates external dependency management, ensuring plug-and-play usability (an illustrative bundling sketch follows below).
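To illustrate the general idea (the real `Phinx.model` layout is not documented here), the sketch below consolidates a model directory into one distributable file using a plain zip archive. `bundle_model` and the paths are hypothetical.

```python
import zipfile
from pathlib import Path

def bundle_model(src_dir: str, out_file: str = "Phinx.model") -> None:
    """Pack every file under src_dir into a single container file.

    Illustrative only: the real Phinx.model format may differ.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(out_file, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in src.rglob("*"):
            if path.is_file():
                zf.write(path, path.relative_to(src))  # keep folder structure

bundle_model("path/to/phi-4-onnx")  # placeholder source directory
```

Shipping one artifact like this is the "virtual folder" idea the format is built around: the application opens the container in place instead of managing loose files.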
🚧 Note: This repository is currently in the setup phase, and documentation is not yet available. However, the code is fully functional and generally stable. Stay tuned—this README, along with the documentation and other resources, will be updated soon! 🚀
Phinx – Powering AI with Phi-4, ONNX & CUDA. Seamless, Efficient, and Built for Performance! ⚡
Made with ❤️ in Delphi