Accurately extracting buildings from aerial images is essential for the timely monitoring of human intervention on land. Distribution discrepancies between diverse unlabeled remote sensing images (caused by changes in imaging sensor, location, and environment) and labeled historical images significantly degrade the generalization performance of deep learning algorithms. Unsupervised domain adaptation (UDA) algorithms have recently been proposed to mitigate these discrepancies without re-annotating training data for new domains. Nevertheless, because a single source domain provides limited information, single-source UDA is suboptimal when multi-temporal and multi-region remote sensing images are available. We propose a multi-source UDA (MSUDA) framework, SPENet, for building extraction, which selects, purifies, and exchanges information from multiple source domains to better adapt the model to the target domain. Specifically, the framework exploits richer knowledge by extracting target-relevant information from multiple source domains, purifying target domain information with low-level building features, and exchanging target domain information in a collaborative learning manner. Extensive experiments and ablation studies conducted on twelve city datasets demonstrate the effectiveness of our method against existing state-of-the-art methods; e.g., our method achieves 59.1% IoU on Austin and Kitsap -> Potsdam, surpassing the target-domain supervised method by 2.2%.
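To make the "selecting" idea of multi-source UDA concrete, the following is a minimal sketch (not the authors' SPENet implementation) of one common way to weight several source domains by their relevance to the target: compare each source domain's mean feature vector to the target's by cosine similarity, softmax-normalize the similarities into weights, and use those weights to combine per-source losses. The function names and the choice of cosine similarity are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def source_relevance_weights(source_feats, target_feats):
    """Hypothetical 'selection' step: score each source domain by the
    cosine similarity between its mean feature vector and the target
    domain's mean feature vector, then softmax-normalize the scores.

    source_feats: list of (n_i, d) arrays, one per source domain
    target_feats: (m, d) array of target-domain features
    returns: (num_sources,) array of weights summing to 1
    """
    mu_t = target_feats.mean(axis=0)
    sims = []
    for feats in source_feats:
        mu_s = feats.mean(axis=0)
        denom = np.linalg.norm(mu_s) * np.linalg.norm(mu_t) + 1e-8
        sims.append(mu_s @ mu_t / denom)
    sims = np.array(sims)
    exp = np.exp(sims - sims.max())  # stable softmax
    return exp / exp.sum()

def weighted_multisource_loss(per_source_losses, weights):
    """Combine per-source segmentation losses with relevance weights,
    so sources that resemble the target contribute more to training."""
    return float(np.dot(per_source_losses, weights))
```

In a full pipeline these features would come from a shared encoder, and the weights would be recomputed as the encoder adapts; here static arrays stand in for that to keep the sketch self-contained.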