Abstract: Accurate camera motion estimation is essential for many robotics applications involving structure from motion (SfM) and visual SLAM. Such accuracy is typically pursued by refining the estimated motion through nonlinear optimization. As many modern robots are equipped with both 2D and 3D cameras, it is both highly desirable and challenging to exploit data acquired from both modalities for better localization. Existing refinement methods, such as bundle adjustment and loop closing, can be employed only when precise 2D-to-3D correspondences across frames are available. In this paper, we propose a robot localization framework that benefits from both 2D and 3D information without requiring such accurate correspondences to be established. The framework consists of a 2D-3D based initial motion estimation followed by a constrained nonlinear optimization for motion refinement. The initial estimation finds the best possible 2D-to-3D correspondences and localizes the cameras with respect to the 3D scene. The refinement step minimizes the reprojection errors of 3D points while preserving the existing relationships between images. Occlusions and missing scene parts are handled by comparing the image-based reconstruction against the 3D sensor measurements, and the effect of data inaccuracies is reduced using an M-estimator based technique. Our experiments demonstrate that the proposed framework yields a good initial motion estimate and a significant improvement through refinement.
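The core refinement idea summarized above, minimizing reprojection errors of 3D points under a robust M-estimator loss, can be sketched as follows. This is an illustrative toy example, not the paper's actual formulation: the pinhole model, the Huber loss choice, and all parameter values (focal length, noise levels, outlier fraction) are assumptions made for the sketch.

```python
# Sketch: robust pose refinement by minimizing reprojection errors of 3D
# points, with a Huber (M-estimator) loss to downweight bad correspondences.
# All models and constants here are illustrative assumptions.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(points_3d, rvec, tvec, f=500.0, c=320.0):
    """Pinhole projection of 3D world points under pose (rvec, tvec)."""
    R = Rotation.from_rotvec(rvec).as_matrix()
    cam = points_3d @ R.T + tvec          # world -> camera frame
    uv = cam[:, :2] / cam[:, 2:3]         # perspective divide
    return f * uv + c                     # focal length and principal point

def residuals(pose, points_3d, observed_2d):
    # Stacked per-point reprojection errors; pose = [rvec | tvec].
    return (project(points_3d, pose[:3], pose[3:]) - observed_2d).ravel()

rng = np.random.default_rng(0)
pts = rng.uniform([-1, -1, 4], [1, 1, 8], size=(50, 3))   # synthetic scene
true_pose = np.array([0.05, -0.02, 0.01, 0.1, -0.05, 0.2])
obs = project(pts, true_pose[:3], true_pose[3:])
obs += rng.normal(0.0, 0.5, obs.shape)                    # pixel noise
obs[:3] += 40.0                                           # a few gross outliers

# Robust refinement: the Huber loss limits the influence of the outlying
# correspondences that a plain least-squares fit would be dragged toward.
init = np.zeros(6)                                        # rough initial pose
sol = least_squares(residuals, init, args=(pts, obs),
                    loss='huber', f_scale=2.0)
```

With the robust loss, the recovered pose stays close to the true one despite the contaminated correspondences; replacing `loss='huber'` with the default squared loss noticeably degrades the estimate in this setup.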