Abstract: In this paper, we propose a robust and direct 2D-to-3D registration method for localizing 2D cameras in a known 3D environment. Although the 3D environment is known, localizing the cameras remains challenging because of unknown 2D-3D correspondences, outliers, scale ambiguities, and occlusions. Once the cameras are localized, the Structure-from-Motion reconstruction obtained from image correspondences is refined by a constrained nonlinear optimization that exploits knowledge of the scene. We also propose a common optimization framework for both the localization and refinement steps, in which projection errors in one view are minimized while the existing relationships between images are preserved. Occlusions and missing scene parts are handled with a scale histogram, while the effect of data inaccuracies is reduced using an M-estimator-based technique.
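
To make the role of the M-estimator concrete, the following minimal sketch evaluates a robust reprojection cost for a single view. It is an illustration under stated assumptions, not the method of this paper: the abstract does not name a specific M-estimator, so the Huber loss is used here as one common choice; a simple pinhole camera with focal length `f` and principal point `(cx, cy)` is assumed; and the 2D-3D correspondences are taken as given, whereas the proposed method treats them as unknown. All function and parameter names are hypothetical.

```python
import math

def project(point, R, t, f, cx, cy):
    """Pinhole projection of a 3D point into pixel coordinates (assumed camera model)."""
    # Camera-frame coordinates: Xc = R * X + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)

def huber_rho(r, delta):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)

def robust_cost(points3d, points2d, R, t, f, cx, cy, delta=1.0):
    """Sum of Huber losses over the reprojection errors of one view.

    Assumes points3d[i] corresponds to points2d[i]; this is a simplification
    made only for the illustration.
    """
    total = 0.0
    for X, x in zip(points3d, points2d):
        u, v = project(X, R, t, f, cx, cy)
        r = math.hypot(u - x[0], v - x[1])  # reprojection error in pixels
        total += huber_rho(r, delta)
    return total

# Identity pose, one exact observation and one observation off by 10 pixels.
R_id = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t0 = [0.0, 0.0, 0.0]
pts3d = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)]
pts2d = [(0.0, 0.0), (60.0, 0.0)]
cost = robust_cost(pts3d, pts2d, R_id, t0, f=100.0, cx=0.0, cy=0.0, delta=1.0)
```

With `delta=1.0`, the observation that is off by 10 pixels contributes 9.5 to the cost, compared with 50 under a purely quadratic loss, which illustrates how an M-estimator limits the influence of inaccurate data.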