Evidence of transcription at polyT short tandem repeats
Résumé
Background: Using the Cap Analysis of Gene Expression technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ∼ 72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers.
Results: Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at short tandem repeats (STRs) corresponding to homopolymers of thymidines (T). Additional analyses confirm that these CAGEs are truly associated with transcriptionally active chromatin marks. Furthermore, we train a sequence-based deep learning model able to predict CAGE signal at T STRs with high accuracy (∼ 81%). Extracting features learned by this model reveals that transcription at T STRs is mostly directed by STR length but also instructions lying in the downstream sequence. Excitingly, our model also predicts that genetic variants linked to human diseases affect this STR-associated transcription.
Conclusions: Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism. We also provide a new metric that can be considered in future studies of STR-related complex traits.
Results: Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at short tandem repeats (STRs) corresponding to homopolymers of thymidines (T). Additional analyses confirm that these CAGEs are truly associated with transcriptionally active chromatin marks. Furthermore, we train a sequence-based deep learning model able to predict CAGE signal at T STRs with high accuracy (∼ 81%). Extracting features learned by this model reveals that transcription at T STRs is mostly directed by STR length but also instructions lying in the downstream sequence. Excitingly, our model also predicts that genetic variants linked to human diseases affect this STR-associated transcription.
Conclusions: Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism. We also provide a new metric that can be considered in future studies of STR-related complex traits.
Origine : Fichiers produits par l'(les) auteur(s)