Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Xiaoxin Tang; Steven Mills; David Eyers; Zhiyi Huang; Kai-Cheung Leung; Minyi Guo; IEEE

doi:10.1109/ICPP.2013.85

Conference proceeding

2013 42nd Annual International Conference on Parallel Processing (ICPP), pp. 718-727

Proceedings of the International Conference on Parallel Processing

01/01/2013

DOI: https://doi.org/10.1109/ICPP.2013.85

Abstract

Computer Science

Computer Science, Hardware & Architecture

Science & Technology

Technology

Parallel programming is now mainstream for HPC applications. Programmers need to parallelize their programs to achieve better performance on multicore systems. However, without a good understanding of the parallelism in algorithms, the scheduling policy of runtime systems, and multicore architectures, programmers usually find it very hard to write high-performance, scalable programs on these parallel platforms. Although using a parallelized library written by experts can reduce the amount of coding work, our study shows that it does not automatically guarantee good performance. A better understanding of the parallelism in algorithms, the OS/runtime systems, and hardware architectures is necessary if programmers wish to further improve performance. In this paper, we use SIFT-based feature matching within large-scale image collections to show the importance of three factors (the level of parallelism, the scheduling policy, and the memory architecture) that affect the performance of large-scale feature matching on multicore systems. We present experimental results from programs based on OpenCV and OpenMP, executed on both 16-core and 64-core machines. These results show that images with a large number of features scale poorly on the 64-core machine because of poor cache utilization. To address this cache performance issue, we propose a Divide-and-Merge algorithm that divides the feature space into several small sub-spaces so that they fit within the cache. Our experiments show that performance tuning that addresses all three factors improves the speedup of feature matching from 10.6x to 21.5x on the 64-core machine. While the speedup is improved by 103%, the scalability of the feature matching algorithm is improved by up to 6.45 times on the 64-core machine with our performance tuning. Our study indicates that performance tuning on multicore systems is very challenging even for a simple image processing algorithm.
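
The abstract does not describe the Divide-and-Merge algorithm in detail. As a rough illustration of the general idea only, the sketch below splits a set of reference descriptors into small blocks, finds each query's nearest neighbour within one block at a time, and merges the per-block results by keeping the closer candidate. Everything here is an assumption made for illustration: the function names (`sq_dist`, `match_block`, `match_divide_and_merge`), the `block_size` parameter, and the use of plain nearest-neighbour search are hypothetical, and this is pure Python rather than the OpenCV and OpenMP code the paper evaluates.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_block(queries, refs, start, stop):
    """For each query, find its nearest reference within refs[start:stop].

    Returns a list of (reference_index, squared_distance) pairs.
    """
    results = []
    for q in queries:
        best_idx, best_d = -1, float("inf")
        for j in range(start, stop):
            d = sq_dist(q, refs[j])
            if d < best_d:
                best_idx, best_d = j, d
        results.append((best_idx, best_d))
    return results


def match_divide_and_merge(queries, refs, block_size):
    """Hypothetical blocked matcher: divide refs into blocks, then merge.

    block_size stands in for "how many descriptors fit in cache"; in a
    real system it would be tuned to the cache size (an assumption here).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    merged = [(-1, float("inf"))] * len(queries)
    for start in range(0, len(refs), block_size):
        stop = min(start + block_size, len(refs))
        # Divide step: match every query against one small block only.
        partial = match_block(queries, refs, start, stop)
        # Merge step: keep whichever candidate is closer for each query.
        merged = [p if p[1] < m[1] else m for p, m in zip(partial, merged)]
    return merged
```

Because the merge keeps the minimum over all blocks, the blocked version returns the same matches as a single scan over all references; only the order in which memory is visited changes. In a parallel setting, one might distribute either the blocks or the queries across threads, but which choice the paper makes is not stated in the abstract.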

Details

Record Identifier: 9926549625701891
Title: Performance Tuning on Multicore Systems for Feature Matching within Image Collections
Creators: Xiaoxin Tang; Steven Mills; David Eyers; Zhiyi Huang; Kai-Cheung Leung; Minyi Guo; IEEE
Publication Details: 2013 42nd Annual International Conference on Parallel Processing (ICPP), pp. 718-727
Academic Unit: Computer Science
Publisher: IEEE
Grant note: 2011AA01A202 / National High-Tech R&D Program of China (863 Program); National High Technology Research and Development Program of China
Date published; e-published: 01/01/2013
Language: English
Resource Type; Subtype: Conference proceeding