Abstract
Software development effort estimation is important for quality management in the software development industry, yet its automation still remains a challenging issue. Applying machine learning algorithms alone often can not achieve satisfactory results. In this paper, we present an integrated data mining framework that incorporates domain knowledge into a series of data analysis and modeling processes, including visualization, feature selection, and model validation. An empirical study on the software effort estimation problem using a benchmark dataset shows the effectiveness of the proposed approach.