Abstract: This chapter describes a technique that can geo-localize arbitrary 2D depictions of architectural sites, including drawings, paintings and historical photographs. This is achieved by aligning the input depiction with a 3D model of the corresponding site. The task is very difficult as the appearance and scene structure in the 2D depictions can be very different from the appearance and geometry of the 3D model, e.g., due to the specific rendering style, drawing errors, age, lighting or change of seasons. In addition, we face a hard search problem: the number of possible alignments of the depiction to a set of 3D models from different architectural sites is huge. To address these issues, we develop a compact representation of complex 3D scenes. 3D models of several scenes are represented by a set of discriminative visual elements that are automatically learnt from rendered views. Similar to object detection, the set of visual elements, as well as the weights of individual features for each element, are learnt in a discriminative fashion. We show that the learnt visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g. watercolor, sketch, historical photograph) and structural changes (e.g. missing scene parts, large occluders) of the scene. We demonstrate that the proposed approach can automatically identify the correct architectural site as well as recover an approximate viewpoint of historical photographs and paintings with respect to the 3D model of the site.

Fig. 1 Our system automatically geo-localizes paintings, drawings, and historical photographs by recovering their viewpoint with respect to a geo-referenced 3D model of the depicted architectural site. Here geo-localized paintings of Notre Dame in Paris are visualized in the Google Earth geobrowser.
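To give a concrete flavor of what "discriminatively learnt weights of individual features for each element" can mean in practice, the sketch below scores candidate patches for one visual element using a linear model obtained in closed form from background feature statistics (an LDA-style formulation). All data here is simulated: the descriptor dimensionality, the background distribution, and the element descriptor are placeholders, not the chapter's actual features or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical patch descriptors: 64-D vectors standing in for
# HOG-like features computed on rendered views.
D = 64

# Background statistics estimated from many generic patches
# (simulated here; in practice they would come from a large
# corpus of rendered views).
background = rng.normal(size=(10_000, D))
mu = background.mean(axis=0)
cov = np.cov(background, rowvar=False) + 1e-3 * np.eye(D)  # regularized

# One "visual element": the descriptor of a distinctive patch
# taken from a rendered view of the 3D model.
element = rng.normal(loc=0.5, size=D)

# Closed-form discriminative weights for this element:
# w = Sigma^{-1} (x_element - mu). Each element gets its own
# weight vector, playing the role of the per-element feature
# weights mentioned in the abstract.
w = np.linalg.solve(cov, element - mu)

def score(patch_descriptor: np.ndarray) -> float:
    """Detection score of a candidate patch for this element."""
    return float(w @ (patch_descriptor - mu))

# The element's own descriptor should outscore a generic patch.
print(score(element) > score(rng.normal(size=D)))
```

Matching an element against a depiction would then amount to evaluating `score` over densely sampled patches and keeping the top-scoring locations; the closed-form weights avoid training a separate classifier per element.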