Abstract
Approximate <inline-formula><tex-math notation="LaTeX">k</tex-math> <inline-graphic xlink:href="eyers-ieq1-2748131.gif"/> </inline-formula> Nearest Neighbours (A <inline-formula><tex-math notation="LaTeX">k</tex-math> <inline-graphic xlink:href="eyers-ieq2-2748131.gif"/> </inline-formula>NN) search is widely used in domains such as computer vision and machine learning. However, A<inline-formula><tex-math notation="LaTeX">k </tex-math> <inline-graphic xlink:href="eyers-ieq3-2748131.gif"/> </inline-formula>NN search in high-dimensional datasets does not scale well on multicore platforms, due to its large memory footprint. Parallel A <inline-formula><tex-math notation="LaTeX">k</tex-math> <inline-graphic xlink:href="eyers-ieq4-2748131.gif"/> </inline-formula>NN search using space subdivision for filtering helps reduce the memory footprint, but its loss of precision is unstable. In this paper, we propose a new data filtering method-PCAF-for parallel A<inline-formula><tex-math notation="LaTeX">k</tex-math> <inline-graphic xlink:href="eyers-ieq5-2748131.gif"/> </inline-formula>NN search based on principal component analysis. PCAF improves on previous methods, demonstrating sustained, high scalability for a wide range of high-dimensional datasets on both Intel and AMD multicore platforms. Moreover, PCAF maintains highly precise A<inline-formula><tex-math notation="LaTeX">k</tex-math> <inline-graphic xlink:href="eyers-ieq6-2748131.gif"/> </inline-formula>NN search results.