A standard transformer model analyzes the text both before and after a word to understand its meaning. According to Microsoft, Phi-4-mini is instead based on a variant of the architecture called a decoder-only transformer ...
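As a rough illustration of what "decoder-only" means in practice: the defining ingredient is a causal mask in the attention computation, so each token can attend only to itself and the tokens before it, while an encoder-style transformer attends in both directions. The sketch below is generic NumPy and makes no claim about Phi-4-mini's actual implementation; the `attention` helper and the toy data are purely illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, causal=False):
    """Scaled dot-product attention over one sequence.

    q, k, v: (seq_len, d) arrays. With causal=True, each position is
    blocked from seeing later positions -- the decoder-only behaviour.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # (seq_len, seq_len)
    if causal:
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)      # hide tokens that come after
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                     # 4 toy tokens, 8-dim embeddings
encoder_style = attention(x, x, x)              # sees text before and after each token
decoder_only = attention(x, x, x, causal=True)  # sees only the preceding text
```

The rest of a decoder-only model (feed-forward layers, normalization, layer stacking) looks much like the standard architecture; the causal mask is what turns it into a left-to-right language model.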
Phi-4-multimodal is a 5.6 billion parameter model that uses the mixture-of-LoRAs technique ... billion parameter model based on a dense decoder-only transformer that supports sequences up to ...
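At a high level, a mixture-of-LoRAs setup keeps the base language model frozen and attaches small low-rank adapters for each modality, picking the appropriate adapter when vision or audio inputs are present. The sketch below is a hypothetical, simplified rendering of that idea, not Phi-4-multimodal's actual code: the class names `LoRAAdapter` and `MixtureOfLoRAsLinear`, the rank and scaling values, and the explicit `modality` tag are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class LoRAAdapter(nn.Module):
    """Low-rank update: adds (alpha / r) * x @ A^T @ B^T on top of a frozen layer."""
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))    # zero-init: adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return (x @ self.A.T @ self.B.T) * self.scale

class MixtureOfLoRAsLinear(nn.Module):
    """A frozen base projection plus one LoRA adapter per modality (hypothetical)."""
    def __init__(self, d_in, d_out, modalities=("text", "vision", "audio")):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)          # base model stays frozen
        self.base.bias.requires_grad_(False)
        self.adapters = nn.ModuleDict({m: LoRAAdapter(d_in, d_out) for m in modalities})

    def forward(self, x, modality="text"):
        return self.base(x) + self.adapters[modality](x)

layer = MixtureOfLoRAsLinear(64, 64)
tokens = torch.randn(2, 10, 64)                         # (batch, seq, hidden)
out = layer(tokens, modality="vision")                  # route through the vision adapter
```

One appeal of this arrangement is that supporting a new modality only adds adapter parameters, while the frozen base weights are shared across text, vision, and audio.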
This paper proposes a Dense Transformer Foundation Model with Mixture of Experts (DenseFormer-MoE), which integrates a dense convolutional network, a Vision Transformer, and a Mixture of Experts (MoE) to ...
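For readers unfamiliar with the MoE component, the sketch below shows a generic sparse mixture-of-experts feed-forward layer with top-k gating: a small router scores the experts for each token and only the highest-scoring experts run. It is a textbook illustration under assumed dimensions and expert counts, not the DenseFormer-MoE implementation; `MoEFeedForward` and all of its hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Sparse mixture-of-experts feed-forward block with top-k gating (generic sketch)."""
    def __init__(self, d_model=256, d_hidden=512, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)       # router: scores experts per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        logits = self.gate(x)                           # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)      # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                      # send each token to its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = MoEFeedForward()
tokens = torch.randn(32, 256)   # 32 token embeddings
mixed = moe(tokens)             # each token is processed by its 2 highest-scoring experts
```

The per-expert loop keeps the routing explicit for readability; production MoE layers instead batch tokens per expert so that each expert runs once per forward pass.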
The model uses an innovative hybrid Mamba-Transformer fusion architecture ... first time the Mamba architecture has been applied losslessly to a super-large Mixture of Experts (MoE) model. Tencent ...
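Architectural details beyond the quoted description are not given here, but the general "hybrid" pattern is to interleave linear-time state-space (Mamba-style) blocks with standard self-attention blocks in a single stack. The sketch below illustrates only that interleaving, under heavy simplification: `SimpleSSMBlock` is a plain gated linear recurrence standing in for Mamba's selective scan, and `HybridStack`, the layer counts, and the dimensions are all hypothetical rather than Tencent's design.

```python
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Simplified stand-in for a Mamba-style block: a gated linear recurrence
    scanned over the sequence. Real Mamba uses an input-dependent selective
    state-space scan; this only shows where such a block sits in the stack."""
    def __init__(self, d_model):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.decay = nn.Parameter(torch.full((d_model,), 0.9))

    def forward(self, x):                       # x: (batch, seq, d_model)
        u = self.in_proj(x)
        g = torch.sigmoid(self.gate(x))
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(x.size(1)):              # linear-time scan over tokens
            h = self.decay * h + u[:, t]
            outs.append(h * g[:, t])
        return torch.stack(outs, dim=1)

class AttentionBlock(nn.Module):
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # (a real decoder stack would also apply a causal attention mask here)
        a, _ = self.attn(x, x, x, need_weights=False)
        return self.norm(x + a)

class HybridStack(nn.Module):
    """Alternates state-space-style and attention blocks, one pattern a hybrid
    Mamba-Transformer design can follow."""
    def __init__(self, d_model=128, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            SimpleSSMBlock(d_model) if i % 2 == 0 else AttentionBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = HybridStack()
out = model(torch.randn(2, 16, 128))   # (batch=2, seq=16, d_model=128)
```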
Falcon 2 utilizes an optimized decoder-only transformer architecture that enables strong performance at a smaller scale compared to other open models. TII plans to further boost efficiency using ...