Get Started

MMDeploy provides useful tools for deploying OneDL Lab models to various platforms and devices.

With these tools, you can deploy models using our predefined pipelines or customize your own deployment pipeline.

These instructions may be outdated as of version 1.4.0, especially for non-Linux platforms.

Introduction

In MMDeploy, the deployment pipeline consists of three sequential modules: Model Converter, MMDeploy Model, and Inference SDK.

[Figure: deploy-pipeline]

Model Converter

Model Converter converts training models from OneDL Lab into backend models that can run on target devices. It can transform a PyTorch model into an IR model (ONNX or TorchScript) and convert an IR model into a backend model. Combining the two steps gives one-click, end-to-end model deployment.

MMDeploy Model

MMDeploy Model is the result package exported by Model Converter. Besides the backend models, it also includes the model meta info, which is used by the Inference SDK.

Inference SDK

Inference SDK is developed in C/C++ and wraps the preprocessing, model forward, and postprocessing modules of model inference. It provides FFI bindings for C, C++, Python, C#, Java, and other languages.

Prerequisites

For end-to-end model deployment, MMDeploy requires Python 3.10+ and PyTorch 2.2+.

Step 0. Install uv

Step 1. Install PyTorch following official instructions, e.g.

On GPU platforms:

uv pip install --torch-backend={cuda_version} torch=={pytorch_version} torchvision

On CPU platforms:

uv pip install --torch-backend=cpu torch=={pytorch_version} torchvision

Note

On GPU platforms, please ensure that {cuda_version} matches your host CUDA toolkit version. Otherwise, conflicts are likely when deploying models with TensorRT.
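
One way to check this before installing is to read the toolkit release from the text printed by `nvcc --version`. The sketch below is illustrative only. The helper names are hypothetical, it assumes the usual "release X.Y" line in the nvcc output, and it assumes that {cuda_version} follows the "cu118"-style naming. Verify both against your own environment.

```python
import re


def toolkit_release(nvcc_output: str) -> str:
    """Extract 'major.minor' from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return f"{match.group(1)}.{match.group(2)}"


def backend_tag(release: str) -> str:
    """Turn '11.8' into a 'cu118'-style tag (assumed naming convention)."""
    major, minor = release.split(".")[:2]
    return f"cu{major}{minor}"


# Sample line in the format nvcc normally prints; not taken from a real host.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(backend_tag(toolkit_release(sample)))  # cu118
```

If the tag derived from your host differs from the one you planned to install, resolve the mismatch before converting models for TensorRT.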

Installation

We recommend that users follow our best practices for installing MMDeploy.

Step 0. Install OneDL MMCV.

pip install -U onedl-mim
mim install onedl-mmengine
mim install onedl-mmcv

Step 1. Install MMDeploy and the inference engine. We currently don’t provide prebuilt packages.

Note: Please build MMDeploy from source.

Convert Model

After installation, you can convert a PyTorch model to a backend model by running tools/deploy.py.

Based on the above setup, the following example converts the Faster R-CNN from OneDL-MMDetection to TensorRT:

# clone mmdeploy into ./mmdeploy to get the deployment configs. `--recursive` is not necessary
git clone -b main https://github.com/vbti-development/onedl-mmdeploy.git mmdeploy

# clone the mmdetection repo into ./mmdetection. Its config file is needed to build the PyTorch nn module
git clone -b 3.x https://github.com/vbti-development/onedl-mmdetection.git mmdetection
cd mmdetection
mim install -v -e .
cd ..

# download Faster R-CNN checkpoint
wget -P checkpoints https://mmassets.onedl.ai/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth

# run the command to start model conversion
python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
    mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
    mmdetection/demo/demo.jpg \
    --work-dir mmdeploy_model/faster-rcnn \
    --device cuda \
    --dump-info

The converted model and its meta info can be found in the path specified by --work-dir. Together they make up the OneDL MMDeploy Model, which can be fed to the OneDL MMDeploy SDK for model inference.

For more details about model conversion, see how_to_convert_model. To customize the conversion pipeline, edit the config file by following this tutorial.

Tip

You can convert the above model to an ONNX model and run ONNX Runtime inference just by changing ‘detection_tensorrt_dynamic-320x320-1344x1344.py’ to ‘detection_onnxruntime_dynamic.py’ and setting ‘--device’ to ‘cpu’.
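
The switch described in this tip can be sketched as a small helper that assembles the tools/deploy.py argument list for either backend. This is a hypothetical convenience wrapper, not part of MMDeploy. It reuses the paths from the conversion example above and assumes the ONNX Runtime config sits in the same config directory as the TensorRT one.

```python
import shlex

# Deployment config and device for each backend mentioned in this guide.
DEPLOY_CONFIGS = {
    "tensorrt": ("detection_tensorrt_dynamic-320x320-1344x1344.py", "cuda"),
    "onnxruntime": ("detection_onnxruntime_dynamic.py", "cpu"),
}


def build_deploy_command(backend: str,
                         work_dir: str = "mmdeploy_model/faster-rcnn") -> list[str]:
    """Return the tools/deploy.py argument list for the chosen backend."""
    config, device = DEPLOY_CONFIGS[backend]
    return [
        "python", "mmdeploy/tools/deploy.py",
        f"mmdeploy/configs/mmdet/detection/{config}",
        "mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py",
        "checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth",
        "mmdetection/demo/demo.jpg",
        "--work-dir", work_dir,
        "--device", device,
        "--dump-info",
    ]


# Print the command for inspection instead of running it.
print(shlex.join(build_deploy_command("onnxruntime")))
```

Only the deployment config and the device differ between the two commands. The model config, checkpoint, and test image stay the same.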

Inference Model

After model conversion, you can run inference with either Model Converter or Inference SDK.

Inference by Model Converter

Model Converter provides a unified API named inference_model for this, which hides the differences between inference backend APIs from users. Taking the previously converted Faster R-CNN TensorRT model as an example:

from mmdeploy.apis import inference_model
result = inference_model(
  model_cfg='mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py',
  deploy_cfg='mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py',
  backend_files=['mmdeploy_model/faster-rcnn/end2end.engine'],
  img='mmdetection/demo/demo.jpg',
  device='cuda:0')

Note

‘backend_files’ in this API refers to the backend engine file paths. It MUST be a list, even for a single file, since some inference engines such as OpenVINO and ncnn split the network structure and its weights into two files.
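
To make the list requirement concrete, here is a minimal sketch of a helper that always yields a list, whether it receives one path or several. The helper is hypothetical and not part of mmdeploy.apis, and the two-file names in the second call are illustrative assumptions rather than guaranteed output names.

```python
from pathlib import Path
from typing import Sequence, Union

PathLike = Union[str, Path]


def as_backend_files(files: Union[PathLike, Sequence[PathLike]]) -> list[str]:
    """Return backend files as a list of strings, wrapping a single path."""
    if isinstance(files, (str, Path)):
        return [str(files)]
    return [str(f) for f in files]


# Single-file backend: the TensorRT engine path from the example above.
print(as_backend_files("mmdeploy_model/faster-rcnn/end2end.engine"))
# Two-file backend: structure and weights (file names are assumptions).
print(as_backend_files(["work_dir/end2end.param", "work_dir/end2end.bin"]))
```

The returned list can then be passed as the backend_files argument of inference_model.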

Inference by SDK

You can run the OneDL MMDeploy demo programs in the precompiled package directly to get inference results.

wget https://github.com/vbti-development/onedl-mmdeploy/releases/download/v1.3.1/mmdeploy-1.3.1-linux-x86_64-cuda11.8.tar.gz
tar xf mmdeploy-1.3.1-linux-x86_64-cuda11.8.tar.gz
cd mmdeploy-1.3.1-linux-x86_64-cuda11.8
# run python demo
python example/python/object_detection.py cuda ../mmdeploy_model/faster-rcnn ../mmdetection/demo/demo.jpg
# run C/C++ demo
# build the demo according to the README.md in the folder.
./bin/object_detection cuda ../mmdeploy_model/faster-rcnn ../mmdetection/demo/demo.jpg

Note

In the above commands, the input model is the SDK Model path. It is NOT the engine file path but the path passed to --work-dir. That directory contains not only the engine files but also meta information such as ‘deploy.json’ and ‘pipeline.json’.

The next sections give examples of deploying the converted Faster R-CNN model described above through the SDK's different FFIs (Foreign Function Interfaces).
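
Before handing a directory to the SDK, it can help to confirm that it really is an SDK Model directory. The following sketch checks for the two meta files named in this note and parses them as JSON. The function names are hypothetical, and the sketch makes no assumption about the contents of the files.

```python
import json
from pathlib import Path

# Meta files named in the note above; a real work directory may hold more.
REQUIRED_META = ("deploy.json", "pipeline.json")


def missing_meta_files(model_dir: str) -> list[str]:
    """List the expected meta files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_META if not (root / name).is_file()]


def load_meta(model_dir: str) -> dict:
    """Parse each meta file that is present as JSON, keyed by file name."""
    root = Path(model_dir)
    return {
        name: json.loads((root / name).read_text(encoding="utf-8"))
        for name in REQUIRED_META
        if (root / name).is_file()
    }


# Report on the work directory used in this guide.
missing = missing_meta_files("mmdeploy_model/faster-rcnn")
print("missing meta files:", missing or "none")
```

If the list of missing files is not empty, the path most likely points at an engine file location rather than at the directory passed to --work-dir, or the conversion ran without --dump-info.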

Python API

from mmdeploy_runtime import Detector
import cv2

img = cv2.imread('mmdetection/demo/demo.jpg')

# create a detector
detector = Detector(model_path='mmdeploy_model/faster-rcnn', device_name='cuda', device_id=0)
# run the inference
bboxes, labels, _ = detector(img)
# draw only the boxes whose score passes the threshold
for bbox, label_id in zip(bboxes, labels):
  [left, top, right, bottom], score = bbox[0:4].astype(int), bbox[4]
  if score < 0.3:
      continue
  cv2.rectangle(img, (left, top), (right, bottom), (0, 255, 0))

cv2.imwrite('output_detection.png', img)
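
The thresholding step in the loop above can also be separated from the drawing code. The sketch below is a plain-Python illustration that needs neither the SDK nor OpenCV. It assumes each row is laid out as [left, top, right, bottom, score], as in the loop above, and it runs on made-up detections.

```python
def filter_detections(bboxes, labels, score_thr=0.3):
    """Keep (box, score, label) triples whose score is at least score_thr.

    Each row of bboxes is assumed to be [left, top, right, bottom, score].
    """
    kept = []
    for bbox, label_id in zip(bboxes, labels):
        score = float(bbox[4])
        if score < score_thr:
            continue
        # Truncate coordinates to integers, as the drawing code expects.
        box = tuple(int(v) for v in bbox[:4])
        kept.append((box, score, int(label_id)))
    return kept


# Made-up detections for illustration only.
sample_boxes = [[10.2, 20.7, 110.0, 220.9, 0.92], [5.0, 5.0, 50.0, 50.0, 0.12]]
sample_labels = [0, 2]
print(filter_detections(sample_boxes, sample_labels))
# [((10, 20, 110, 220), 0.92, 0)]
```

Keeping the filter separate makes the threshold easy to change and easy to test without a GPU.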

You can find more examples here.

C++ API

Using the SDK C++ API follows this pattern: load the model, create a predictor, read the image, run inference, and process the result.

Now let’s apply this procedure to the above Faster R-CNN model.

#include <cstdio>
#include <string>
#include <opencv2/opencv.hpp>
#include "mmdeploy/detector.hpp"

int main() {
  const char* device_name = "cuda";
  int device_id = 0;
  std::string model_path = "mmdeploy_model/faster-rcnn";
  std::string image_path = "mmdetection/demo/demo.jpg";

  // 1. load model
  mmdeploy::Model model(model_path);
  // 2. create predictor
  mmdeploy::Detector detector(model, mmdeploy::Device{device_name, device_id});
  // 3. read image
  cv::Mat img = cv::imread(image_path);
  // 4. inference
  auto dets = detector.Apply(img);
  // 5. deal with the result. Here we choose to visualize it
  for (int i = 0; i < dets.size(); ++i) {
    const auto& box = dets[i].bbox;
    fprintf(stdout, "box %d, left=%.2f, top=%.2f, right=%.2f, bottom=%.2f, label=%d, score=%.4f\n",
            i, box.left, box.top, box.right, box.bottom, dets[i].label_id, dets[i].score);
    if (dets[i].score < 0.3) {
      continue;
    }
    cv::rectangle(img, cv::Point{(int)box.left, (int)box.top},
                  cv::Point{(int)box.right, (int)box.bottom}, cv::Scalar{0, 255, 0});
  }
  cv::imwrite("output_detection.png", img);
  return 0;
}

To build this example, add the MMDeploy package to your CMake project as follows. Then pass -DMMDeploy_DIR to cmake to indicate the directory containing MMDeployConfig.cmake. You can find it in the prebuilt package.

find_package(MMDeploy REQUIRED)
target_link_libraries(${name} PRIVATE mmdeploy ${OpenCV_LIBS})

For more SDK C++ API usage, please read these samples.

For C, C# and Java API usage, please read the C demos, C# demos and Java demos respectively. We’ll cover them in more detail in our next release.

Accelerate preprocessing (Experimental)

If you want to fuse preprocessing for acceleration, please refer to this doc.

Evaluate Model

You can test the performance of a deployed model using tools/test.py. For example,

python ${MMDEPLOY_DIR}/tools/test.py \
    ${MMDEPLOY_DIR}/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
    ${MMDET_DIR}/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
    --model ${BACKEND_MODEL_FILES} \
    --metrics ${METRICS} \
    --device cuda:0

Note

The --model option takes the path(s) of the converted engine file(s) when the performance test runs through Model Converter. When you test metrics through the Inference SDK, it refers instead to the directory of the MMDeploy Model.

You can read how to evaluate a model for more details.