Perform big model inference on a thousand-yuan GPU. BMInf is a low-cost, high-efficiency inference toolkit for big models: it can run models with more than 10 billion parameters on a single thousand-yuan GPU (NVIDIA GTX 1060).
Hardware Friendly
BMInf supports running models with more than 10 billion parameters on a single NVIDIA GTX 1060 GPU.
Open Source
The parameters of the models are open source. Users can run big models locally without relying on an online API.
Comprehensive Ability
BMInf supports CPM1, CPM2.1, and EVA. The abilities of these models cover text completion, text generation, and dialogue generation.
Convenient Deployment
Fast and convenient for developing downstream applications.
With BMInf, you can run inference on big models from anywhere.
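Running a >10B-parameter model in the ~6 GB memory of a GTX 1060 relies on compressing the weights. As an illustration only (this is a simplified sketch of the general int8-quantization idea, not BMInf's actual implementation), each row of a weight matrix can be stored as 8-bit integers plus a single float scale and dequantized on the fly during inference:

```python
# Sketch of row-wise int8 quantization: store 1 byte per weight plus
# one float scale per row, instead of 4 bytes per float32 weight.
# Hypothetical helper names; not part of the BMInf API.

def quantize_row(row):
    """Map a row of float weights to (int8 values, scale)."""
    scale = max(abs(v) for v in row) / 127.0 or 1.0
    return [round(v / scale) for v in row], scale

def dequantize_row(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

q, scale = quantize_row([0.5, -1.27, 0.03, 1.0])
approx = dequantize_row(q, scale)
```

The quantization error is bounded by half a scale step per weight, which is why inference quality stays close to the full-precision model while memory use drops to roughly a quarter.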
We benchmarked BMInf on the CPM2 decoding task across different platforms; its decoding speed far exceeds that of plain PyTorch.
10B Model Decoding Speed
Supported Models
CPM2.1 is an upgraded version of CPM2, a general Chinese pre-trained language model with 11 billion parameters. Based on CPM2, CPM2.1 introduces a generative pre-training task and was trained via the continual learning paradigm. In experiments, CPM2.1 shows better generation ability than CPM2.
CPM1 is a generative Chinese pre-trained language model with 2.6 billion parameters. The architecture of CPM1 is similar to GPT and it can be used in various NLP tasks such as conversation, essay generation, cloze test, and language understanding.
EVA is a Chinese pre-trained dialogue model with 2.8 billion parameters. EVA performs well on many dialogue tasks, especially in the multi-turn interaction of human-bot conversations.