Abstract: Peer-to-peer file sharing systems have grown to the point that they now generate most Internet traffic, well ahead of Web traffic. Understanding the workload properties of peer-to-peer systems is necessary to optimize their performance. In this paper we present an empirical study of a workload gathered by crawling the eDonkey network, a dominant file sharing system, for over 50 days. Besides confirming the presence of some well-known features, such as the prevalence of free-riding and the Zipf-like distribution of file popularity, we also analyze several previously ignored aspects of such workloads. More specifically, we measure the geographical clustering of peers offering a given file. We find that most files are offered mostly by peers from a single country, although popular files do not have such a clear home country. We also analyze the overlap between the contents offered by different peers. We find that peer contents tend to be clustered, which may be taken as evidence that peers have specific interests. We leverage this property to allow peers to search for content without any server support, by maintaining a list of semantic neighbours, i.e. peers with similar interests. Simulation results confirm the clustering property of the trace and show that a high hit ratio is achieved by querying the most recently discovered peers, even after removing the top 15% most generous peers. Results also indicate that clustering is much stronger for rare files.
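To make the idea of a semantic neighbour list concrete, the following is a minimal sketch of how a peer might keep the peers that most recently served it and query them before falling back to another lookup mechanism. It is only an illustration under stated assumptions: the class and function names, the list capacity, the recency-based eviction rule, and the `fallback` callable are all hypothetical and are not taken from the study.

```python
from collections import OrderedDict


class SemanticNeighbourList:
    """Recency-ordered list of peers that served a request (hypothetical helper)."""

    def __init__(self, capacity=5):
        # capacity is an assumed parameter, not a value from the study
        self.capacity = capacity
        self._peers = OrderedDict()  # the most recently useful peer is last

    def record_upload(self, peer_id):
        # Move (or add) the uploader to the most-recent position.
        self._peers.pop(peer_id, None)
        self._peers[peer_id] = True
        # Evict the least recently useful peers when over capacity.
        while len(self._peers) > self.capacity:
            self._peers.popitem(last=False)

    def neighbours(self):
        # Return peer ids, most recently useful first.
        return list(reversed(self._peers))


def search(file_id, neighbour_list, peer_contents, fallback):
    """Return (peer, hit); hit is True when a semantic neighbour had the file.

    peer_contents maps a peer id to the set of files it offers.
    fallback is an assumed stand-in for any other lookup mechanism.
    """
    for peer in neighbour_list.neighbours():
        if file_id in peer_contents.get(peer, ()):
            neighbour_list.record_upload(peer)
            return peer, True
    peer = fallback(file_id)
    if peer is not None:
        neighbour_list.record_upload(peer)
    return peer, False
```

In a trace-driven simulation, the hit ratio would then be the fraction of requests for which `search` returns `hit == True`.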