Computer Vision for Measuring Boxes: a comparison of Model-based Bayesian Inference and Convolutional Neural Networks
Graduate Thesis/Dissertation   Open access

Elliot Munro
Master of Science - MSc, University of Otago
University of Otago
2021
Handle: https://hdl.handle.net/10523/10753

Abstract

Keywords: New Zealand, Computer-Vision, dimensioning, MCMC, Bayesian-Inference, CNN
This thesis compares two approaches for measuring the dimensions of boxes using computer vision (CV). The first approach is model-based Bayesian Inference (MBI), which uses a geometric cuboid model as well as a camera-conveyor system model. The second approach uses Convolutional Neural Networks (CNNs). The methods were compared on statistical scoring rules applied to posterior probability density function estimates, on training and testing times, and on robustness to added noise and to rounding of the cuboid edges. CNN training data was generated by photo-realistic rendering of computer-generated 3D CAD models, consisting of 6,000 training images, 2,000 validation images, and 2,000 testing images, spread across five textures. Training was performed using Keras. Markov chain Monte Carlo (MCMC) sampling was implemented in Python 3 and run on the same test data as the CNN models. The CNN and MBI approaches were also briefly compared on images of real boxes. Corner Detection (CD) performance was found to be the main limiting factor for MBI, which formed tight posterior estimates when CD was complete. Because of poor mixing, MCMC sampling runs sometimes became stuck in local minima, producing overly tight estimates and poor scores. With full CD and good mixing, MCMC scores tended to outperform CNN scores. MCMC testing time was approximately 1,000 times longer than that of the CNN (70 s vs 70 ms). However, the CNN required significant time (strongly reduced by a GPU) and data to train the model before use. The robustness of the techniques was measured by systematically adding Gaussian noise to images and by rounding the edges of the boxes. Above a threshold noise variance of 100 (images used a 0 to 255 RGB colour scale), CD failed to detect corners, breaking the MBI and causing poor performance compared with the CNN, which degraded more gracefully.
Likewise, when box edges were rounded beyond a threshold of 0.15 (0 = cuboid, 1 = sphere), CD failed, breaking the MBI and causing poor performance compared with the CNN, which again degraded more gracefully. A single-image-input (SII) CNN model was more robust to added noise than a two-image-input (TII) CNN model. There was negligible difference between SII and TII with respect to box edge rounding.
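
To give a sense of scale for the noise-robustness test, the following is a minimal sketch of adding zero-mean Gaussian noise of a given variance to 8-bit pixel values. It is not taken from the thesis: the function name `add_gaussian_noise`, the flat-list image representation, the rounding and clipping to the 0 to 255 range, and the grey test patch are all assumptions made for illustration. A variance of 100 corresponds to a standard deviation of 10 intensity levels.

```python
import random


def add_gaussian_noise(pixels, variance, seed=None):
    """Return a noisy copy of `pixels`, a flat list of 0-255 intensity values.

    Hypothetical helper: the thesis may have applied noise differently.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5  # standard deviation derived from the variance
    noisy = []
    for value in pixels:
        perturbed = value + rng.gauss(0.0, sigma)
        # Round to an integer and clip back into the valid 8-bit range.
        noisy.append(min(255, max(0, round(perturbed))))
    return noisy


# Illustrative flat grey patch, perturbed at the variance threshold
# quoted in the abstract (100 on a 0-255 scale).
patch = [128] * 10_000
noisy_patch = add_gaussian_noise(patch, variance=100, seed=0)
```

Clipping slightly reduces the effective variance for pixels near 0 or 255, so the nominal variance is only reached for mid-range intensities such as the patch used here.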