The purpose of the proc_vision_ros2 node is to load an AI model and run inference on it. It then returns the detections, together with the distance and the angle needed to align. The project was originally written in Python but moved to C++ at the end of this rework.
The project started after the return from the ROBOSUB2025 competition. During this competition, many issues appeared on this node and on several of its functions.
The main issue was the time taken to give a result: between 250 ms and 2 s.
Another issue was the precision of the distance, which was given in steps of 0.5 m.
The third issue was linked to crashes of the system.
These issues needed to be fixed before the system could be used correctly.
To obtain a reliable and usable system, several conditions had to be met: a shorter response time, a finer distance resolution, and no crashes.
The first approach was to time each part in order to find the part that took the most time.
Four areas were defined:
1 preprocess
2 process (detection)
3 post-process
4 other processing
The results of this test:
preprocess: 66 ms
process: 28~34 ms
post-process: 180~350 ms
other processing: 2~2000 ms
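The per-area timing described above can be sketched in Python as follows. This is only a minimal illustration: the `timed` helper, the `timings_ms` dictionary and the placeholder stage bodies are hypothetical names, not the code of the node.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Collected durations in milliseconds, keyed by area name.
timings_ms = defaultdict(list)

@contextmanager
def timed(stage):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings_ms[stage].append((time.perf_counter() - start) * 1000.0)

def process_frame(frame):
    # The four bodies below are placeholders standing in for the real code.
    with timed("preprocess"):
        tensor = [v / 255.0 for v in frame]
    with timed("process"):
        raw = [v * 2.0 for v in tensor]
    with timed("post-process"):
        detections = [v for v in raw if v > 0.5]
    with timed("other processing"):
        result = len(detections)
    return result

process_frame([0, 64, 128, 255])
for stage, values in timings_ms.items():
    print(f"{stage}: min {min(values):.3f} ms, max {max(values):.3f} ms")
```

Keeping every sample rather than a running average makes it possible to report ranges such as the ones listed above.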
To start, the post-process and the other processing needed to be reworked.
Both had the same problem, which was the processing of matrices in plain Python.
This problem was solved by using NumPy, which also kept the time needed for the fix short.
For the post-process, the output of the AI model is now converted into a NumPy matrix and all operations are done on that matrix. For the other processing, the image was already a NumPy matrix, so the change was to operate on it directly. The main point was the extraction of the sub-matrix of each detection.
The time needed after this simple change was:
post-process: 2~4 ms
other processing: 2~100 ms
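A minimal sketch of the two NumPy operations mentioned above, filtering the model output as a matrix and extracting the sub-matrix of a detection. The row layout `[x1, y1, x2, y2, confidence, class_id]` and the function names are assumptions made for the example; the real output layout of the model may differ.

```python
import numpy as np

def filter_detections(raw, min_conf):
    """Keep the rows whose confidence (column 4, assumed layout) reaches min_conf."""
    raw = np.asarray(raw, dtype=np.float32)
    return raw[raw[:, 4] >= min_conf]  # one vectorized comparison, no Python loop

def crop_depth(depth, box):
    """Return the depth sub-matrix covered by box = (x1, y1, x2, y2), clipped to the image."""
    h, w = depth.shape
    x1, y1, x2, y2 = (int(round(float(v))) for v in box)
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    return depth[y1:y2, x1:x2]  # slicing returns a view, so nothing is copied

raw_output = [[10, 20, 50, 60, 0.9, 1],
              [0, 0, 5, 5, 0.2, 0]]
depth_image = np.arange(100 * 100, dtype=np.uint16).reshape(100, 100)

kept = filter_detections(raw_output, min_conf=0.5)
roi = crop_depth(depth_image, kept[0, :4])
print(kept.shape, roi.shape)
```

The slice is a view on the original image, which is one reason this kind of extraction is cheap compared with copying pixels in a Python loop.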
A second round was needed for the other processing. It consisted of changing the technique used to get the distance. The previous method took all the pixels at the centre of the matrix and used them to get the distance. The new method builds a histogram and takes the most frequent value using NumPy.
This change brought the other processing to
2~40 ms, approximately 1 ms per detection. It is not possible to go lower because of Python itself.
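The histogram method can be sketched as below. It is an illustration under assumptions: the depth is taken to be an integer matrix in millimetres, a value of 0 is taken to mean an invalid pixel, and the 10 mm bin width and the name `dominant_depth_mm` are chosen for the example only.

```python
import numpy as np

def dominant_depth_mm(depth_roi, bin_mm=10):
    """Return the centre (in mm) of the fullest depth bin, or None if no pixel is valid."""
    values = np.asarray(depth_roi).ravel()
    values = values[values > 0]  # assumption: 0 marks an invalid depth pixel
    if values.size == 0:
        return None
    bins = (values // bin_mm).astype(np.int64)  # bin index of every pixel
    counts = np.bincount(bins)                  # histogram of the bin indices
    return float(counts.argmax() * bin_mm + bin_mm / 2)

# An object at 3000 mm with a strip of background at 1204 mm and one invalid pixel.
roi = np.full((20, 20), 3000, dtype=np.uint16)
roi[:5, :] = 1204
roi[0, 0] = 0
print(dominant_depth_mm(roi))
```

Taking the most frequent value instead of an average keeps background pixels inside the box from pulling the distance away from the object.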
The preprocess can be separated into two parts. The first is receiving the input message and converting it into a usable image, and the second is transforming that image into the format expected by the model.
For the first part, the modification was to use Image instead of CompressedImage. CompressedImage is good for transport between devices, but the image comes directly from the Jetson, so Image was the better choice. This saved 5 ms, and as a side effect the depth is now given in mm instead of in steps of 0.5 m.
Another change was to remove a transformation that altered the image, because the poor image quality was in fact caused by the camera switching mode between dark and light. This change reduced the time taken by 20 ms.
The final change was to rework the remaining code to reduce the number of operations. This gained another 20 ms.
Together, these changes brought the preprocess down to about 20 ms.
After this first success, the timings of the system were:
preprocess: 19~22 ms
process: 28~34 ms
post-process: 2~4 ms
other processing: 2~40 ms
For the other processing, the 95th percentile was around 15 ms with the number of detections seen during the competition. This gives a typical maximum run time of 22 + 34 + 4 + 15 = 75 ms, which is now within reach of the fixed objective of 66 ms. But two problems appeared: first, the alignment to an object; second, half of the time is spent on the inference of the model.
So the decision was taken to work in three phases: first, create the computation of the required alignment; second, change the model runtime; third, switch the whole node to C++.
The computation was implemented as a function, but its use had to be restricted because of the time taken, 2~3 ms per element. The decision was therefore made to concentrate on only a few objects so as not to degrade the performance.
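As an illustration of restricting the alignment computation to a few objects, the sketch below keeps only the most confident detections and derives a horizontal angle from the pixel offset. Everything here is an assumption made for the example: the source does not give the actual formula, so a simple pinhole camera model is used, and the selection by confidence, the field of view and the names `select_targets` and `alignment_angle_deg` are hypothetical.

```python
import math

def select_targets(detections, max_targets):
    """Keep the max_targets detections with the highest confidence (assumed criterion)."""
    ranked = sorted(detections, key=lambda d: d["confidence"], reverse=True)
    return ranked[:max_targets]

def alignment_angle_deg(center_x, image_width, hfov_deg):
    """Horizontal angle to a pixel column, assuming a pinhole camera model."""
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    return math.degrees(math.atan((center_x - image_width / 2) / focal_px))

detections = [
    {"name": "gate", "confidence": 0.91, "center_x": 480},
    {"name": "buoy", "confidence": 0.40, "center_x": 100},
    {"name": "bin", "confidence": 0.75, "center_x": 320},
]
for det in select_targets(detections, max_targets=2):
    angle = alignment_angle_deg(det["center_x"], image_width=640, hfov_deg=90)
    print(det["name"], round(angle, 2))
```

With a per-element cost of 2~3 ms, capping the number of targets bounds the time added to each frame.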
The model is in ONNX format, exported from a PyTorch model. It has the advantage of running on any computer, and it only needs to be built once for all targets.
The focus quickly turned to TensorRT. A TensorRT engine is compiled to run only on NVIDIA GPUs, but its compact runtime offers better performance. So a test was prepared: trtexec was used to convert the model from ONNX to a TensorRT engine, and this first test showed a reduction of 3 ms on average. A second test was then run in which the conversion also reduced the weights from float32 to float16. It gave an inference time of 16~19 ms.
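The two conversions can be expressed as trtexec command lines. The helper below only builds the argument list; the file names are placeholders, and the exact options used in the project are not given in the source, so this shows only the common `--onnx`, `--saveEngine` and `--fp16` options.

```python
def trtexec_command(onnx_path, engine_path, fp16=False):
    """Build the trtexec argument list that converts an ONNX model to a TensorRT engine."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow float16 precision, as in the second test
    return cmd

# Placeholder file names, for illustration only.
print(" ".join(trtexec_command("model.onnx", "model_fp32.engine")))
print(" ".join(trtexec_command("model.onnx", "model_fp16.engine", fp16=True)))
```

On the target machine the list can be passed to `subprocess.run(cmd, check=True)`. An engine is tied to the GPU and TensorRT version it was built with, so the conversion is normally run on the Jetson itself.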
The final time for Python is 22 + 19 + 4 + 15 = 60 ms. This result complies with the objective of 95% under 66 ms, so the frame rate of 15 fps is achieved.
The switch to C++ was made to get under 33 ms and reach 30 fps. The principle was simply to convert all the Python code to C++. This modification gave a good result of 9~11 ms, which makes 60 fps possible.
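The relation between these latencies and the frame rates can be checked with a few lines of Python. The stage values are the ones quoted in the text; the helper names and the set of candidate frame rates are chosen for the example.

```python
def frame_budget_ms(fps):
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps

def sustainable_fps(latency_ms, rates=(15, 30, 60)):
    """Highest candidate frame rate whose per-frame budget covers latency_ms, else None."""
    fitting = [fps for fps in rates if latency_ms <= frame_budget_ms(fps)]
    return max(fitting) if fitting else None

python_onnx = 22 + 34 + 4 + 15      # 75 ms, Python with the ONNX model
python_tensorrt = 22 + 19 + 4 + 15  # 60 ms, Python with the float16 TensorRT engine
cpp_worst_case = 11                 # ms, upper bound quoted for the C++ node

for label, latency in [("python + onnx", python_onnx),
                       ("python + tensorrt", python_tensorrt),
                       ("c++", cpp_worst_case)]:
    print(label, latency, "ms ->", sustainable_fps(latency), "fps")
```

The budgets are 66.7 ms at 15 fps, 33.3 ms at 30 fps and 16.7 ms at 60 fps, which is where the 66 ms and 33 ms targets come from.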
After this rework, different YOLO models were tried. In the end, the best model for 30 fps is YOLOV11L, at 24~26 ms in general, with the distance and angle of an object computed in 250 microseconds at most and 125 microseconds on average.
The rework is a success. All objectives were achieved: the maximum time went from 2 s to 26 ms, with 100% of the results under the goal. The distance is now given in cm.
During the rework, all sources of crashes were corrected or guarded against.
The branch Base_Branch_for_Python is for all development, and develop is for the version running on the Jetson.