AI & ML interests
Mixture of Experts, Branch Merge Train, International Cooperation, Reuse, https://github.com/ontocord/MDEL
Multi-Domain Expert Learning (M*DEL)**:
How to increase knowledge without breaking the bank?
Ontocord.AI and the open science community.
M*DEL is a volunteer open science community for creating better mixtures of experts, with contributors from: Bedrock AI, TurkuNLP, ETH, Redmond.AI, Incite, MICS CentraleSupelec, Centro de Excelência em Inteligência Artificial, VietAI, Technion - Israel Institute of Technology, Nous Research, University of Western Australia, KoboldAI Community, LAION.AI, Mila, Luleå University of Technology, Juelich Supercomputing Center, Tokyo Tech, RIKEN, Together
OSS AI models can lead to increased innovation, accessibility, transparency, and community building. However, we need a mechanism to train more capable models in an efficient and modular way.
The proposed method, which we call Multi-Domain Expert Learning (MDEL), involves branching from a base model, training each branch independently on a specific domain by updating only specific layers or other adapters, and merging the trained models at the end. The domain-specific layers or adapters are kept as experts, with a classifier used as a router to activate the experts during inference. This approach makes it possible to easily increase the expertise of a model, to independently train more "adapters", and to reuse previously trained experts and models without retraining, resulting in a modular and efficient system.
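To make the workflow concrete, here is a minimal sketch of the branch-train-merge-and-route idea in PyTorch. It is purely illustrative: the helper names (`branch`, `merge_shared_weights`, `DomainRouter`), the simple parameter-averaging merge, and the softmax router are assumptions made for this example, not the project's actual implementation.

```python
# Illustrative sketch only; names and the averaging merge are assumptions.
import copy
import torch
import torch.nn as nn


def branch(base_model: nn.Module, num_domains: int) -> list[nn.Module]:
    """Create one independent copy (branch) of the base model per domain,
    each to be trained separately on its own domain data."""
    return [copy.deepcopy(base_model) for _ in range(num_domains)]


def merge_shared_weights(branches: list[nn.Module],
                         expert_param_names: set[str]) -> dict[str, torch.Tensor]:
    """Average every parameter except the domain-specific layers/adapters,
    which are kept separate as per-domain experts."""
    merged = {}
    for name, _ in branches[0].named_parameters():
        if name in expert_param_names:
            continue  # kept as a per-domain expert, not merged
        merged[name] = torch.stack(
            [dict(b.named_parameters())[name].detach() for b in branches]
        ).mean(dim=0)
    return merged


class DomainRouter(nn.Module):
    """Toy classifier used as a router: scores which domain expert(s)
    to activate for a given input at inference time."""

    def __init__(self, hidden_size: int, num_domains: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_domains)

    def forward(self, pooled_hidden: torch.Tensor) -> torch.Tensor:
        # Returns a probability per domain expert.
        return torch.softmax(self.classifier(pooled_hidden), dim=-1)
```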
In this effort, we seek international labs and open science aligned researchers and companies in various countries to each train a set of domain experts of their choosing, thereby enabling international participation and knowledge sharing. This will also result in lower costs for training and a lower environmental impact due to reuse and lower energy usage. Currently we have volunteers from four continents and are looking for more.
We will be using a variant of the c-BTM method (https://arxiv.org/pdf/2303.14177v1.pdf) and will be focusing on models ranging from 7B to 70B parameters.
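For intuition, the sketch below captures the core c-BTM idea from the linked paper in plain NumPy/scikit-learn: discover domains by clustering document embeddings, then weight each cluster's expert at inference by how close the current context is to that cluster center. The function names, the embedding source, and the temperature are placeholder assumptions, not our actual training configuration.

```python
# Rough illustration of cluster-based expert weighting (c-BTM style); placeholders only.
import numpy as np
from sklearn.cluster import KMeans


def fit_domain_clusters(doc_embeddings: np.ndarray, num_experts: int) -> KMeans:
    """Unsupervised domain discovery: one k-means cluster per expert."""
    return KMeans(n_clusters=num_experts, random_state=0).fit(doc_embeddings)


def expert_weights(context_embedding: np.ndarray,
                   clusters: KMeans,
                   temperature: float = 0.1) -> np.ndarray:
    """Mixture weights over experts: softmax over negative distances
    from the context embedding to each cluster center."""
    dists = np.linalg.norm(clusters.cluster_centers_ - context_embedding, axis=1)
    scores = -dists / temperature
    scores -= scores.max()  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()
```

At inference, each expert's next-token distribution would be combined according to these weights (or only the top-weighted experts would be activated).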
In some of our models, we will also be adding multilingual and multimodal abilities for both understanding and generation, with context lengths of 8K-35K tokens.
Languages will include: hi, vi, en, ja, fi. We may add others if compute is available.
If you are interested in contributing to this project, please reach out to us at [email protected] to learn more about how you can get involved.
Let's work together to create open-source models that benefit everyone! 🤝 #AI #MDEL #Supercomputers #Summit #OpenSource #Innovation #VolunteersNeeded #OpenScience #DemocratizeAI
Requirements for joining this HF repo: By joining this HF repo, you agree that you will not disclose any data we are gathering or ideas we present in our community channels until after a paper has been written.
This protects the intellectual freedom of researchers and their right to publish and benefit from their work.
** Why did we change the term "Layer" to "Learning"? Because, in addition to layer-wise experts, we are also exploring different adapters and architectures like Flamingo (https://arxiv.org/abs/2204.14198), EMU (https://arxiv.org/abs/2307.05222), and a novel multi-node architecture for training LoRAs, which we call lora-x, that will allow us to swap out different component experts to improve the performance of the model.