Performance is one of the non-functional quality attributes of proc_image_processing. First, the environment was changed to support the CUDA API, which allows developers to create filters that are executed on the GPU of the Jetson AGX Xavier. Filters that run on the CPU can still benefit from its multiple cores.
We also made sure that the CPU implementation of the module uses an efficient C++ coding style and techniques. Among those techniques, we can find:
The production build also now defaults to Release, which is considerably more optimized than the Debug build that was previously used.
To profile a filter's performance, a new parameter has been added to show its execution time. This parameter is visible from the telemetry and can be toggled individually for each filter. When it is enabled, the execution time is shown in the module's output console through a ROS_INFO message.
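As a rough illustration of how such a measurement can be taken, here is a minimal sketch using only the C++ standard library. The helper name timeExecutionMs is hypothetical and is not part of the module; the module's actual timing code may differ, and it reports the value through ROS_INFO rather than returning it.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of proc_image_processing): runs `work` once
// and returns the elapsed wall-clock time in milliseconds.
double timeExecutionMs(const std::function<void()> &work) {
    assert(static_cast<bool>(work));  // an empty std::function cannot be called

    // steady_clock is monotonic, so the measurement is not affected by
    // system clock adjustments.
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

A filter could wrap the body of its apply() method in a call like this and log the returned value only when the execution time parameter is toggled on.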
OpenCV provides a function that lets custom filters benefit from multiple CPU cores. This function, named parallel_for_(), takes as parameters a range (of indices) and an instance of a class that extends cv::ParallelLoopBody. A class named ParallelLoopBodyWrapper has been added to facilitate its usage in custom filters. Here is an example of how to use it to benefit from multiple CPU cores.
First, extend ParallelLoopBodyWrapper and override the operator() function. This function is where you will manipulate and modify the image matrix. Here's an example for the ContrastAndBrightnessFilter:
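class ParallelCABF : public ParallelLoopBodyWrapper {
public:
    explicit ParallelCABF(cv::Mat &image, RangedParameter<double> &contrast, RangedParameter<double> &brightness) :
            image(image),
            contrast_(contrast),
            brightness_(brightness) {}

    ~ParallelCABF() override = default;

    // Called by cv::parallel_for_() with a sub-range of pixel indices.
    void operator()(const cv::Range &range) const override {
        for (auto r = range.start; r < range.end; r++) {
            // Convert the flat pixel index into (row, column) coordinates.
            int y = r / image.cols;
            int x = r % image.cols;

            // operator() is const, so at() returns a const reference; the
            // const_cast is needed to modify the pixel in place.
            auto& vec = const_cast<cv::Vec3b &>(image.at<cv::Vec3b>(y, x));
            for (auto c = 0; c < image.channels(); c++) {
                vec[c] = cv::saturate_cast<uchar>(contrast_.getValue() * (vec[c]) + brightness_.getValue());
            }
        }
    }

private:
    // Copying a cv::Mat copies only its header; the pixel data is shared
    // with the image passed to the constructor.
    cv::Mat image;
    RangedParameter<double> contrast_, brightness_;
};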
Next, with this body implementation as a private inner class of the filter, we can now implement its apply() method this way:
void apply(cv::Mat &image) override {
    // One index per pixel; OpenCV splits the range across the available threads.
    cv::parallel_for_(cv::Range(0, image.rows * image.cols), ParallelCABF(image, contrast_, brightness_));
}
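To see why this pattern parallelizes safely, the following sketch reproduces the per-pixel logic without OpenCV, on a plain interleaved byte buffer. It is an illustration under stated assumptions, not the module's code: adjustRange and saturateToByte are hypothetical names, and saturateToByte only approximates cv::saturate_cast<uchar> (it rounds to the nearest integer and clamps to [0, 255]; behavior at exact .5 ties may differ). Because each flat index touches exactly one pixel, processing the whole range at once or as disjoint sub-ranges gives the same result, which is what allows sub-ranges to be handed to different threads.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for cv::saturate_cast<uchar>: round, then clamp.
std::uint8_t saturateToByte(double value) {
    const long rounded = std::lround(value);
    if (rounded < 0) return 0;
    if (rounded > 255) return 255;
    return static_cast<std::uint8_t>(rounded);
}

// Hypothetical equivalent of operator()(const cv::Range &): applies
// contrast * value + brightness to the pixels with flat index in [start, end).
// `pixels` holds the image row by row with interleaved channels.
void adjustRange(std::vector<std::uint8_t> &pixels, int cols, int channels,
                 double contrast, double brightness, int start, int end) {
    assert(cols > 0 && channels > 0);
    for (int r = start; r < end; r++) {
        // Same index-to-coordinate mapping as in the filter body.
        const int y = r / cols;
        const int x = r % cols;
        const std::size_t offset =
            (static_cast<std::size_t>(y) * cols + x) * channels;
        for (int c = 0; c < channels; c++) {
            pixels[offset + c] =
                saturateToByte(contrast * pixels[offset + c] + brightness);
        }
    }
}
```

Calling adjustRange once over [0, rows * cols), or several times over disjoint sub-ranges that cover the same interval, leaves the buffer in the same final state.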