The first try consisted only of images we took with the simulator. A discussion with a professor at ETS taught us that CNNs are resistant to translation, somewhat resistant to rotation, and very weak to variation in scale, so we took many images with variation in scale and only a few with variation in rotation. At the time we had a bug in the simulation that mirrored all images along the y-axis. Here's an example of the images we had in our dataset:



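Had we caught the mirroring bug before labeling, the images could have been corrected in place with a horizontal flip. Here is a minimal sketch, assuming the screenshots are PNG files in a `dataset/simulator` folder (the path is hypothetical) and that "mirrored in y" means flipped left-right:

```python
import glob

import cv2

# Hypothetical path to the simulator screenshots -- adjust to your layout.
DATASET_DIR = "dataset/simulator"

for path in glob.glob(DATASET_DIR + "/*.png"):
    img = cv2.imread(path)
    if img is None:  # skip files OpenCV can't decode
        continue
    # flipCode=1 mirrors the image left-right, undoing the y-mirroring.
    cv2.imwrite(path, cv2.flip(img, 1))
```

Note that any bounding-box annotations drawn on the mirrored images would have to be flipped the same way.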
The dataset has exactly 3,175 of these simulator images. For this try, we only varied the number of training steps and the batch size. We trained the SSD MobileNet V2 FPNLite 640x640 model from the TensorFlow Model Zoo.
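Both of those hyperparameters live in the model's pipeline.config file. As a minimal sketch (not necessarily how we edited it at the time), the TensorFlow Object Detection API's config_util can set them programmatically; the paths here are assumptions:

```python
from object_detection.utils import config_util

# Assumed paths: the pipeline.config shipped with the downloaded
# SSD MobileNet V2 FPNLite 640x640 checkpoint, written back to ./training.
PIPELINE_PATH = "ssd_mobilenet_v2_fpnlite_640x640/pipeline.config"
OUTPUT_DIR = "training"

configs = config_util.get_configs_from_pipeline_file(PIPELINE_PATH)

# The two hyperparameters we varied between runs.
configs["train_config"].batch_size = 16
configs["train_config"].num_steps = 15000

pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, OUTPUT_DIR)
```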
Here's the result with 15,000 steps and a batch size of 16: