A standard transformer model analyzes the text before and after a word to understand its meaning. According to Microsoft, Phi-4-mini is based on a version of the architecture called a decoder-only ...
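To make the contrast concrete, here is a minimal NumPy sketch (illustrative only, not Microsoft's code) of the one mechanical difference: a standard bidirectional transformer lets every token attend to tokens before and after it, while a decoder-only model applies a causal mask so each position only attends to earlier tokens. The sequence length and embedding size below are arbitrary assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, causal=False):
    """Single-head scaled dot-product attention over a (seq_len, d) sequence."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (seq, seq) token-to-token scores
    if causal:
        # Decoder-only: mask out "future" positions (strict upper triangle).
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))                        # 5 tokens, 8-dim embeddings
bidirectional_out = attention(x, x, x, causal=False)   # each token sees both sides
decoder_only_out = attention(x, x, x, causal=True)     # each token sees only the past
```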
Phi-4-multimodal is a 5.6-billion-parameter model that uses the mixture-of-LoRAs technique ... billion parameter model based on a dense decoder-only transformer that supports sequences up to ...
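As a rough illustration of the mixture-of-LoRAs idea (a hedged sketch, not Microsoft's implementation), a frozen base projection can be paired with small low-rank adapters, one per modality, and the adapter matching the input modality is added to the base output. The modality names, rank, and dimensions below are assumptions.

```python
import numpy as np

class LoRAAdapter:
    """Low-rank update delta(x) = scale * x @ A @ B for one modality."""
    def __init__(self, d_in, d_out, rank=8, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((d_in, rank)) * 0.01  # "down" projection
        self.B = np.zeros((rank, d_out))                   # "up" projection, zero-initialized
        self.scale = scale

    def delta(self, x):
        return self.scale * (x @ self.A @ self.B)

class MixtureOfLoRAsLinear:
    """Frozen base weight plus one LoRA adapter per modality."""
    def __init__(self, d_in, d_out, modalities=("text", "vision", "audio")):
        rng = np.random.default_rng(1)
        self.W = rng.standard_normal((d_in, d_out)) * 0.02  # frozen base projection
        self.adapters = {m: LoRAAdapter(d_in, d_out, seed=i)
                         for i, m in enumerate(modalities)}

    def forward(self, x, modality):
        # Route through the adapter that matches the input modality.
        return x @ self.W + self.adapters[modality].delta(x)

layer = MixtureOfLoRAsLinear(d_in=16, d_out=16)
tokens = np.random.default_rng(2).standard_normal((4, 16))
vision_out = layer.forward(tokens, "vision")  # uses the vision-specific adapter
```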
This paper proposes a Dense Transformer Foundation Model with Mixture of Experts (DenseFormer-MoE), which integrates a dense convolutional network, a Vision Transformer, and a Mixture of Experts (MoE) to ...
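For readers unfamiliar with the MoE component, the sketch below shows the standard top-k gated expert layer that such models build on; it is an illustrative simplification rather than the paper's DenseFormer-MoE code, and the expert count, hidden sizes, and routing details are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Top-k gated mixture of small MLP experts (illustrative only)."""
    def __init__(self, d_model=16, d_hidden=32, n_experts=4, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.gate = rng.standard_normal((d_model, n_experts)) * 0.02
        self.experts = [(rng.standard_normal((d_model, d_hidden)) * 0.02,
                         rng.standard_normal((d_hidden, d_model)) * 0.02)
                        for _ in range(n_experts)]
        self.top_k = top_k

    def forward(self, x):
        scores = softmax(x @ self.gate)                      # (tokens, n_experts) gate probabilities
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            top = np.argsort(scores[t])[-self.top_k:]        # the k highest-scoring experts
            weights = scores[t, top] / scores[t, top].sum()  # renormalize over chosen experts
            for w, e in zip(weights, top):
                w1, w2 = self.experts[e]
                out[t] += w * (np.maximum(x[t] @ w1, 0.0) @ w2)  # ReLU MLP expert
        return out

tokens = np.random.default_rng(1).standard_normal((6, 16))
mixed = MoELayer().forward(tokens)                           # each token handled by its top-2 experts
```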
The model uses an innovative hybrid Mamba-Transformer fusion architecture ... first time the Mamba architecture has been applied losslessly to a super-large Mixture of Experts (MoE) model. Tencent ...
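A very rough sketch of the hybrid idea (not Tencent's implementation): sequence-mixing layers alternate between an SSM-style linear recurrence, shown here in its simplest non-selective diagonal form, and causal self-attention. The block ordering, ratio, and dimensions are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(x):
    """Single-head causal self-attention over a (seq, d) sequence."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(np.triu(np.ones_like(scores, dtype=bool), 1), -np.inf, scores)
    return softmax(scores) @ x

def ssm_scan(x, a, B, C):
    """Diagonal linear state-space recurrence: h_t = a*h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(a.shape[0])
    out = []
    for t in range(x.shape[0]):
        h = a * h + B @ x[t]
        out.append(C @ h)
    return np.stack(out)

rng = np.random.default_rng(0)
d, d_state, seq = 16, 32, 6
a = np.full(d_state, 0.9)                        # stable decay on the state diagonal
B = rng.standard_normal((d_state, d)) * 0.1
C = rng.standard_normal((d, d_state)) * 0.1

x = rng.standard_normal((seq, d))
for layer in range(4):
    # Alternate SSM-style and attention layers; real hybrids tune this ratio.
    x = x + (ssm_scan(x, a, B, C) if layer % 2 == 0 else causal_attention(x))
```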
Falcon 2 utilizes an optimized decoder-only transformer architecture that enables strong performance at a smaller scale compared to other open models. TII plans to further boost efficiency using ...