A standard transformer model analyzes the text both before and after a word to understand its meaning. According to Microsoft, Phi-4-mini is instead based on a variant of the architecture called a decoder-only transformer ...
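As a rough illustration of what "decoder-only" means in practice: the defining ingredient is a causal mask in the attention computation, so each token can attend only to itself and the tokens before it, while an encoder-style transformer attends in both directions. The sketch below is generic NumPy and makes no claim about Phi-4-mini's actual implementation; the `attention` helper and the toy data are purely illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, causal=False):
    """Scaled dot-product attention over one sequence.

    q, k, v: (seq_len, d) arrays. With causal=True, each position is
    blocked from seeing later positions -- the decoder-only behaviour.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # (seq_len, seq_len)
    if causal:
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)      # hide tokens that come after
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                     # 4 toy tokens, 8-dim embeddings
encoder_style = attention(x, x, x)              # sees text before and after each token
decoder_only = attention(x, x, x, causal=True)  # sees only the preceding text
```

The rest of a decoder-only model (feed-forward layers, normalization, layer stacking) looks much like the standard architecture; the causal mask is what turns it into a left-to-right language model.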
Phi-4-multimodal is a 5.6 billion parameter model that uses the mixture-of-LoRAs technique ... billion parameter model based on a dense decoder-only transformer that supports sequences up to ...
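At a high level, a mixture-of-LoRAs setup keeps the base language model frozen and attaches small low-rank adapters for each modality, picking the appropriate adapter when vision or audio inputs are present. The sketch below is a hypothetical, simplified rendering of that idea, not Phi-4-multimodal's actual code: the class names `LoRAAdapter` and `MixtureOfLoRAsLinear`, the rank and scaling values, and the explicit `modality` tag are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class LoRAAdapter(nn.Module):
    """Low-rank update: adds (alpha / r) * x @ A^T @ B^T on top of a frozen layer."""
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))    # zero-init: adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return (x @ self.A.T @ self.B.T) * self.scale

class MixtureOfLoRAsLinear(nn.Module):
    """A frozen base projection plus one LoRA adapter per modality (hypothetical)."""
    def __init__(self, d_in, d_out, modalities=("text", "vision", "audio")):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)          # base model stays frozen
        self.base.bias.requires_grad_(False)
        self.adapters = nn.ModuleDict({m: LoRAAdapter(d_in, d_out) for m in modalities})

    def forward(self, x, modality="text"):
        return self.base(x) + self.adapters[modality](x)

layer = MixtureOfLoRAsLinear(64, 64)
tokens = torch.randn(2, 10, 64)                         # (batch, seq, hidden)
out = layer(tokens, modality="vision")                  # route through the vision adapter
```

One appeal of this arrangement is that supporting a new modality only adds adapter parameters, while the frozen base weights are shared across text, vision, and audio.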
This paper proposes a Dense Transformer Foundation Model with Mixture of Experts (DenseFormer-MoE), which integrates a dense convolutional network, a Vision Transformer, and a Mixture of Experts (MoE) to ...
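For readers unfamiliar with the MoE component, the sketch below shows a generic sparse mixture-of-experts feed-forward layer with top-k gating: a small router scores the experts for each token and only the highest-scoring experts run. It is a textbook illustration under assumed dimensions and expert counts, not the DenseFormer-MoE implementation; `MoEFeedForward` and all of its hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Sparse mixture-of-experts feed-forward block with top-k gating (generic sketch)."""
    def __init__(self, d_model=256, d_hidden=512, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)       # router: scores experts per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        logits = self.gate(x)                           # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)      # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                      # send each token to its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = MoEFeedForward()
tokens = torch.randn(32, 256)   # 32 token embeddings
mixed = moe(tokens)             # each token is processed by its 2 highest-scoring experts
```

The per-expert loop keeps the routing explicit for readability; production MoE layers instead batch tokens per expert so that each expert runs once per forward pass.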
The model uses an innovative hybrid Mamba-Transformer fusion architecture ... first time the Mamba architecture has been applied losslessly to a super-large Mixture of Experts (MoE) model. Tencent ...
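Architectural details beyond the quoted description are not given here, but the general "hybrid" pattern is to interleave linear-time state-space (Mamba-style) blocks with standard self-attention blocks in a single stack. The sketch below illustrates only that interleaving, under heavy simplification: `SimpleSSMBlock` is a plain gated linear recurrence standing in for Mamba's selective scan, and `HybridStack`, the layer counts, and the dimensions are all hypothetical rather than Tencent's design.

```python
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Simplified stand-in for a Mamba-style block: a gated linear recurrence
    scanned over the sequence. Real Mamba uses an input-dependent selective
    state-space scan; this only shows where such a block sits in the stack."""
    def __init__(self, d_model):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.decay = nn.Parameter(torch.full((d_model,), 0.9))

    def forward(self, x):                       # x: (batch, seq, d_model)
        u = self.in_proj(x)
        g = torch.sigmoid(self.gate(x))
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(x.size(1)):              # linear-time scan over tokens
            h = self.decay * h + u[:, t]
            outs.append(h * g[:, t])
        return torch.stack(outs, dim=1)

class AttentionBlock(nn.Module):
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # (a real decoder stack would also apply a causal attention mask here)
        a, _ = self.attn(x, x, x, need_weights=False)
        return self.norm(x + a)

class HybridStack(nn.Module):
    """Alternates state-space-style and attention blocks, one pattern a hybrid
    Mamba-Transformer design can follow."""
    def __init__(self, d_model=128, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            SimpleSSMBlock(d_model) if i % 2 == 0 else AttentionBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = HybridStack()
out = model(torch.randn(2, 16, 128))   # (batch=2, seq=16, d_model=128)
```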
Falcon 2 utilizes an optimized decoder-only transformer architecture that enables strong performance at a smaller scale compared to other open models. TII plans to further boost efficiency using ...