Abstract : The concept of augmented reality audio (ARA) characterizes techniques where a physically real sound and voice environment is extended with virtual, geolocalized sound objects. We show that the authoring of an ARA scene can be done through an iterative process composed of two stages: in the first one the author has to move in the rendering zone to apprehend the audio spatialization and the chronology of the audio events and in the second one a textual editing of the sequencing of the sound sources and DSP acoustics parameters is done. This authoring process is based on the join use of two XML languages, OpenStreetMap for maps and A2ML for Interactive 3D Audio. A2ML being a format for a cue-oriented interactive audio system, requests for interactive audio services are done through TCDL, a Tag-based Cue Dispatching language. This separation of modeling and audio rendering is similar to what is done for the web of documents with HTML and CSS style sheets.