COVID-19 Disease Map, a computational knowledge repository of SARS-CoV-2 virus-host interaction mechanisms

We hereby describe a large-scale community effort to build an open-access, interoperable, and computable repository of COVID-19 molecular mechanisms - the COVID-19 Disease Map. We discuss the tools, platforms, and guidelines necessary for the distributed development of its contents by a multi-faceted community of biocurators, domain experts, bioinformaticians, and computational biologists. We highlight the role of relevant databases and text mining approaches in enrichment and validation of the curated mechanisms. We describe the contents of the map and their relevance to the molecular pathophysiology of COVID-19 and the analytical and computational modelling approaches that can be applied to the contents of the COVID-19 Disease Map for mechanistic data interpretation and predictions. We conclude by demonstrating concrete applications of our work through several use cases.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic due to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1] has already resulted in the infection of over 40 million people worldwide, of whom one million have died. The molecular pathophysiology that links SARS-CoV-2 infection to the clinical manifestations and course of COVID-19 is complex and spans multiple biological pathways, cell types and organs [2,3]. To gain insight into this complex network, the biomedical research community needs to approach it from a systems perspective, collecting the mechanistic knowledge scattered across the scientific literature and bioinformatic databases, and integrating it using formal systems biology standards.
With this goal in mind, we initiated a collaborative effort involving over 230 biocurators, domain experts, modelers and data analysts from 120 institutions in 30 countries to develop the COVID-19 Disease Map, an open-access collection of curated computational diagrams and models of molecular mechanisms implicated in the disease [4].
To this end, we aligned the biocuration efforts of the Disease Maps Community [5,6], Reactome [7], and WikiPathways [8] and developed common guidelines utilising standardised encoding and annotation schemes, based on community-developed systems biology standards [9][10][11], and persistent identifier repositories [12]. Moreover, we integrated relevant knowledge from public repositories [13][14][15][16] and text mining resources, providing a means to update and refine contents of the map. The fruit of these efforts was a series of pathway diagrams describing key events in the COVID-19 infectious cycle and host response.
We ensured that this comprehensive diagrammatic description of disease mechanisms is machine-readable and computable. This allows us to develop novel bioinformatics workflows, creating executable networks for analysis and prediction. In this way, the map is both human and machine-readable, lowering the communication barrier between biocurators, domain experts, and computational biologists significantly. Computational modelling, data analysis, and their informed interpretation using the contents of the map have the potential to identify molecular signatures of disease predisposition and development, and to suggest drug repositioning for improving current treatments.
The COVID-19 Disease Map is a collection of 41 diagrams containing 1836 interactions between 5499 elements, supported by 617 publications and preprints. A summary of the diagrams available in the COVID-19 Disease Map can be found online in Supplementary Material 1. The map is a constantly evolving resource, refined and updated by ongoing efforts of biocuration, sharing and analysis. Here, we report its current status.
In Section 2 we explain the setup of our community effort to construct the interoperable content of the resource, involving biocurators, domain experts and data analysts. In Section 3 we demonstrate that the scope of the biological maps in the resource reflects the state of the art in the molecular biology of COVID-19. Next, we outline analytical workflows that can be used on the contents of the map, including initial, preliminary outcomes of two such workflows, discussed in detail as use cases in Section 4. We conclude in Section 5 with an outlook to further development of the COVID-19 map and the utility of the entire resource in future efforts towards building and applying disease-relevant computational repositories.

Building and sharing the interoperable content
The COVID-19 Disease Map project involves three main groups: (i) biocurators, (ii) domain experts, and (iii) analysts and modellers:
i. Biocurators develop a collection of systems biology diagrams focused on the molecular mechanisms of SARS-CoV-2.
ii. Domain experts refine the contents of the diagrams, supported by interactive visualisation and annotations.
iii. Analysts and modellers develop computational workflows to generate hypotheses and predictions about the mechanisms encoded in the diagrams.
All three groups have an important role in the process of building the map, by providing content, refining it, and defining the downstream computational use of the map. Figure 1 illustrates the ecosystem of the COVID-19 Disease Map Community, highlighting the roles of different participants, available format conversions, interoperable tools, and downstream uses. The information about the community members and their contributions is disseminated via the FAIRDOMHub [17], so that content distributed across different collections can be uniformly referenced.

Creating and accessing the diagrams
The biocurators of the COVID-19 Disease Map diagrams follow the guidelines developed by the Community, and specific workflows of WikiPathways [8] and Reactome [7]. The biocurators build literature-based systems biology diagrams, representing the molecular processes implicated in the COVID-19 pathophysiology, their complex regulation and the phenotypic outcomes. These diagrams are the main building blocks of the map, and are composed of biochemical reactions and interactions (hereafter collectively called interactions) taking place between different types of molecular entities in various cellular compartments. As there are multiple teams working on related topics, biocurators can provide an expert review across pathways and across platforms. This is possible because all platforms offer intuitive visualisation, interpretation, and analysis of pathway knowledge to support basic and clinical research, genome analysis, modelling, systems biology, and education. Table 1 lists information about the created content. For more details see Supplementary Material 1.
Figure 1 (caption fragment): ... communicating to refine, interpret and apply COVID-19 Disease Map diagrams. These diagrams are created and maintained by biocurators, following pathway database workflows or standalone diagram editors, and reviewed by domain experts. The content is shared via pathway databases or a GitLab repository; all can be enriched by integrated resources of text mining and interaction databases. The COVID-19 Disease Map diagrams, encoded in layout-aware systems biology formats and integrated with external repositories, are available in several formats allowing a range of computational analyses, including network analysis and Boolean, kinetic or multiscale simulations.
Both interactions and interacting entities are annotated following a uniform, persistent identification scheme, using either MIRIAM or Identifiers.org [18], and the guidelines for annotations of computational models [19]. Viral protein interactions are explicitly annotated with their taxonomy identifiers to highlight findings from strains other than SARS-CoV-2. Moreover, tools like ModelPolisher [20], SBMLsqueezer [21] or MEMOTE help to automatically complement the annotations in the SBML format and validate the model (see also Supplementary Material 2).
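As a minimal sketch of the annotation scheme described above, the following Python snippet builds and parses Identifiers.org URIs in the compact `prefix:accession` form. The `Annotation` type and the helper names are illustrative assumptions and are not part of the project's tooling; the two example identifiers (UniProt P0DTC2 for the SARS-CoV-2 spike glycoprotein and NCBI Taxonomy 2697049 for SARS-CoV-2) were chosen only for illustration.

```python
from typing import NamedTuple

# Resolver base for compact identifiers of the form "prefix:accession".
IDENTIFIERS_ORG = "https://identifiers.org/"


class Annotation(NamedTuple):
    """Hypothetical container for one annotation of a diagram element."""
    prefix: str      # namespace, e.g. "uniprot" or "taxonomy"
    accession: str   # identifier within that namespace


def to_uri(annotation: Annotation) -> str:
    """Build a resolvable URI from a namespace prefix and an accession."""
    return f"{IDENTIFIERS_ORG}{annotation.prefix}:{annotation.accession}"


def from_uri(uri: str) -> Annotation:
    """Split an Identifiers.org URI back into prefix and accession."""
    if not uri.startswith(IDENTIFIERS_ORG):
        raise ValueError(f"not an Identifiers.org URI: {uri}")
    # Split at the first colon only, so accessions containing ":" survive.
    prefix, sep, accession = uri[len(IDENTIFIERS_ORG):].partition(":")
    if not (prefix and sep and accession):
        raise ValueError(f"malformed compact identifier: {uri}")
    return Annotation(prefix, accession)


spike = Annotation("uniprot", "P0DTC2")     # SARS-CoV-2 spike glycoprotein
taxon = Annotation("taxonomy", "2697049")   # SARS-CoV-2
print(to_uri(spike))
print(to_uri(taxon))
```

Pairing each viral protein annotation with a taxonomy annotation in this way would let a script separate interactions reported for SARS-CoV-2 from those inferred from other strains.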

Enrichment using knowledge from databases and text mining
The knowledge on COVID-19 mechanisms is rapidly evolving, as demonstrated by the rapid growth of the COVID-19 Open Research Dataset (CORD-19), a source of scientific manuscript text and metadata on COVID-19 and related coronavirus research [28]. CORD-19 currently contains over 130,000 articles and preprints, over four times more than when it was introduced. In such a quickly evolving environment, biocuration efforts need to be supported by other repositories of structured knowledge about molecular mechanisms relevant for COVID-19, like molecular interaction databases, or text mining resources. Contents of such repositories may suggest improvements in the existing COVID-19 Disease Map diagrams, or establish a starting point for developing new pathways (see Section "Biocuration of database and text mining content").

Interaction and pathway databases
Interaction and pathway databases contain structured and annotated information on protein interactions or causal relationships. Interaction databases focus on pairs of molecules, offering broad coverage of literature-reported findings, whereas pathway databases provide detailed descriptions of biochemical processes and the regulation of related interactions, supported by diagrams. Both types of resources can be a valuable input for COVID-19 Disease Map biocurators, given the comparability of identifiers used for molecular annotations, and the reference to publications used for defining an interaction or building a pathway (see Table 2).

Text mining resources
Text-mining approaches can help to sieve through such rapidly expanding literature with natural language processing (NLP) algorithms based on semantic modelling, ontologies, and linguistic analysis to automatically extract and annotate relevant sentences, biomolecules, and their interactions. This scope was recently extended to pathway figure mining: decoding pathway figures into their computable representations [30]. Altogether, these automated workflows lead to the construction of knowledge graphs: semantic networks incorporating ontology concepts, unique biomolecule references, and their interactions extracted from abstracts or full-text documents [31].
The COVID-19 Disease Map Project integrates open-access text mining resources: INDRA [32], BioKB, AILANI COVID-19, and PathwayStudio. All platforms offer keyword-based search allowing interactive exploration. Additionally, the map benefits from an extensive protein-protein interaction (PPI) network generated with a custom text-mining pipeline using OpenNLP and GNormPlus [33]. This pipeline was applied to the CORD-19 dataset and the collection of MEDLINE abstracts associated with the genes in the SARS-CoV-2 PPI network [34] using the Entrez Gene Reference-Into-Function (GeneRIF). For detailed descriptions of the resources, see Supplementary Material 3.

Biocuration using database and text mining content
Molecular interactions from databases and knowledge graphs from text mining resources discussed above (from now on called altogether 'knowledge graphs') have a broad coverage at the cost of depth of mechanistic representation. This content can be used by the biocurators in the process of building and updating the systems biology focused diagrams. Biocurators can use this content in three main ways: by visual exploration, by programmatic comparison, and by direct incorporation of the content.
First, the biocurators can visually explore the contents of the knowledge graphs using available search interfaces to locate new knowledge and encode it in the diagrams. Moreover, solutions like the COVIDminer project, PathwayStudio and AILANI offer a visual representation of a group of interactions for a better understanding of their biological context, allowing search by interactions, rather than just isolated keywords. Finally, INDRA and AILANI offer assistant bots that respond to natural language queries and return meaningful answers extracted from knowledge graphs.
Second, programmatic access and reproducible exploration of the knowledge graphs is possible via data endpoints: SPARQL for BioKB and Application Programming Interfaces for INDRA, AILANI, and Pathway Studio. Users can programmatically submit keyword queries and retrieve functions, interactions, pathways, or drugs associated with submitted gene lists. This way, otherwise time-consuming tasks like an assessment of completeness of a given diagram, or search for new literature evidence, can be automated to a large extent.
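The completeness assessment mentioned above can be sketched as a set comparison between the interactions encoded in a diagram and those retrieved from a knowledge graph. The snippet below is a hedged illustration only: it assumes that both sources have already been exported as pairs of gene or protein symbols, it does not call any of the named endpoints, and the function names and toy data are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# An undirected interaction is stored as an unordered pair of symbols.
Interaction = FrozenSet[str]


def normalise(pairs: Iterable[Tuple[str, str]]) -> Set[Interaction]:
    """Upper-case the symbols and ignore the order within each pair."""
    return {frozenset((a.upper(), b.upper())) for a, b in pairs}


def coverage_report(diagram_pairs: Iterable[Tuple[str, str]],
                    graph_pairs: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Compare a curated diagram against a knowledge graph."""
    diagram = normalise(diagram_pairs)
    graph = normalise(graph_pairs)
    shared = diagram & graph
    return {
        "shared": shared,
        # Candidates for curation: reported in the graph, absent from the diagram.
        "missing_from_diagram": graph - diagram,
        # Diagram content without a counterpart in the graph.
        "not_in_graph": diagram - graph,
        # Fraction of knowledge-graph interactions already in the diagram.
        "coverage": len(shared) / len(graph) if graph else 1.0,
    }


# Toy data for illustration only.
diagram_pairs = [("S", "ACE2"), ("S", "TMPRSS2")]
graph_pairs = [("ace2", "S"), ("S", "TMPRSS2"), ("S", "CTSL")]

report = coverage_report(diagram_pairs, graph_pairs)
print(sorted(tuple(sorted(pair)) for pair in report["missing_from_diagram"]))
```

In practice the pairs would need to be keyed on stable identifiers rather than symbols, since symbol matching alone misses synonyms.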
Finally, biocurators can directly incorporate the content of knowledge graphs into SBML format using BioKC [35]. Additionally, the contents of the Elsevier COVID-19 Pathway Collection can be translated to SBGNML, preserving the layout of the diagrams. The SBGNML content can then be converted into other diagram formats used by biocurators (see Section 2.3 below).

Interoperability of the diagrams and annotations
The biocuration of the COVID-19 Disease Map is distributed across multiple teams, using varying tools and associated systems biology representations. This requires a common approach to annotations of evidence, biochemical reactions, molecular entities and their interactions. Moreover, the interoperability of layout-aware formats is needed for comparison and integration of the diagrams in the map.

Layout-aware formats for molecular mechanisms
The COVID-19 Disease Map diagrams are encoded in one of three layout-aware formats for standardised representation of molecular interactions: SBML [36][37][38], SBGNML [27], and GPML [24]. These XML-based formats focus to a varying degree on user-friendly graphical representation, standardised visualisation, and support of computational workflows. For the detailed description of the formats, see Supplementary Material 1.
Each of these three languages has a different focus: SBML emphasizes standardised representation of the data model underlying molecular interactions, SBGNML provides standardised graphical representation of molecular processes, while GPML allows for a partially standardised representation of uncertain biological knowledge. Nevertheless, all three formats are centered around molecular interactions, provide a constrained vocabulary to encode element and interaction types, encode layout of their diagrams and support stable identifiers for diagram components. These shared properties, supported by a common ontology, the Systems Biology Ontology (http://www.ebi.ac.uk/sbo/main/) [39], allow cross-format mapping and enable translation of key properties between the formats. Therefore, when developing the contents of the map, biocurators use the tools they are familiar with, facilitating this distributed task.

Format interoperability
The COVID-19 Disease Map Community ecosystem of tools and resources (see Figure 1) ensures interoperability between the three layout-aware formats for molecular mechanisms: SBML, SBGNML, and GPML. Essential elements of this setup are tools capable of providing cross-format translation functionality [40,41] and supporting harmonised visualisation processing. Another essential translation interface is a representation of Reactome pathways in WikiPathways GPML [42] and SBML. The SBML export of Reactome content has been optimised in the context of this project and facilitates integration with the other COVID-19 Disease Map software components.
The contents of the COVID-19 Disease Map diagrams can be directly transformed into inputs of computational pipelines and data repositories. Besides the direct use of SBML format in kinetic simulations, CellDesigner SBML files can be transformed into SBML qual [43] using CaSQ [44], enabling Boolean modelling-based simulations (see also Supplementary Material 3). In parallel, CaSQ converts the diagrams to the SIF format (http://www.cbmc.it/fastcent/doc/SifFormat.htm), supporting pathway modelling workflows using simplified interaction networks. Notably, the GitLab repository features an automated translation of stable versions of diagrams into SBML qual. Finally, translation of the diagrams into XGMML format (the eXtensible Graph Markup and Modelling Language) using Cytoscape [45] or GINSim [46] allows for network analysis and interoperability with molecular interaction repositories [47].
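To illustrate how a simplified interaction network in SIF can feed a downstream workflow, the sketch below parses SIF text into signed edges and groups the regulators of each target. It assumes whitespace-delimited lines of the form `source relation target [target ...]` with node names that contain no spaces; the relation labels `POSITIVE` and `NEGATIVE` and the three example edges are assumptions made for illustration and are not taken from an actual export of the map.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)


def parse_sif(text: str) -> List[Edge]:
    """Parse SIF text into a flat list of (source, relation, target) edges."""
    edges: List[Edge] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or a node listed without interactions
        source, relation, *targets = fields
        edges.extend((source, relation, target) for target in targets)
    return edges


def group_regulators(edges: List[Edge]) -> Dict[str, Dict[str, List[str]]]:
    """Map each target to its regulators, grouped by relation type."""
    table: Dict[str, Dict[str, List[str]]] = {}
    for source, relation, target in edges:
        table.setdefault(target, {}).setdefault(relation, []).append(source)
    return table


# Illustrative edges only, loosely following mechanisms named in the text.
EXAMPLE_SIF = """\
TMPRSS2 POSITIVE S_active
furin POSITIVE S_active
Nsp6 NEGATIVE autophagosome_expansion
"""

edges = parse_sif(EXAMPLE_SIF)
regulators = group_regulators(edges)
print(regulators["S_active"])
```

The grouped table is the kind of structure from which a logical rule per target could be drafted, although deciding how activators and inhibitors combine remains a modelling choice.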

Structure and scope of the map
Thanks to the community effort discussed above supported by a rich bioinformatics framework, we constructed the COVID-19 Disease Map, focussing on the mechanisms known from other coronaviruses [48] and suggested by early experimental investigations [PMID:32511329]. Then, we applied the analytical and modelling workflows to the contributed diagrams and associated interaction databases to propose initial map-based insights into COVID-19 molecular mechanisms.
The COVID-19 Disease Map is an evolving repository of pathways affected by SARS-CoV-2 (Figure 2). It is currently centred on molecular processes involved in SARS-CoV-2 entry, replication, and host-pathogen interactions. As mechanisms of host susceptibility, immune response, cell and organ specificity emerge, these will be incorporated into the next versions of the map. The COVID-19 Map represents the mechanisms in a "host cell". This follows literature reports on cell specificity of SARS-CoV-2 [3][49][50][51][52][53]. Some pathways included in the COVID-19 Map may be shared among different cell types, as for example the IFN-1 pathway found in dendritic cells, epithelial cells, and alveolar macrophages [54][55][56][57][58]. While at this stage, we do not address cell specificity explicitly in our diagrams, extensive annotations may allow identification of pathways relevant to the cell type of interest.
The SARS-CoV-2 infection process and COVID-19 progression follow a sequence of steps (Figure 3), starting from viral attachment and entry, which involve various dynamic processes on different time scales that are not captured in static representations of pathways. Correlation of symptoms and potential drugs suggested to date helps downstream data exploration and drug target interpretation in the context of therapeutic interventions.

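[Figure: text labels recovered from a diagram whose image was not preserved. Cell types: antigen-presenting cell, natural killer cell, target cell, granulocytes, dendritic cells. Tissues: bronchial epithelium, nasal mucosa. Pathways: PAMP signalling, integrative stress response, endoplasmic reticulum stress.]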

Virus attachment and entry
[Figure 3 (text labels recovered from the figure; image not preserved): SARS-CoV-2; pathophysiology; virus-host cell interactions and host response; cellular stress; dendritic cells; cytokine release and systemic inflammation; immune response and inflammation modulation; organ damage; multiple organ dysfunction, ARDS, complications; death. Disease course: asymptomatic/pre-symptomatic, mild, severe, critical. Affected tissues: nasal and respiratory epithelium, alveoli, vascular endothelium; lung, heart, kidney. Interventions: Vaccine? Pre-exposure prophylaxis? Antivirals?; treatment of complications; systemic and ventilation support; oxygen therapy; host RAAS. Abbreviations: ARDS, acute respiratory distress syndrome; RAAS, renin-angiotensin-aldosterone system; SIRS, systemic inflammatory response syndrome.]
Transmission of SARS-CoV-2 primarily occurs through contact with respiratory droplets, airborne transmission, and through contact with contaminated surfaces [66][67][68]. Upon contact with the respiratory epithelium, the virus infects cells mostly by binding the spike surface glycoprotein (S) to angiotensin-converting enzyme 2 (ACE2) with the help of serine protease TMPRSS2 [69][70][71][72]. Importantly, recent results suggest viral entry using other receptors of lungs and the immune system [73,74]. Once attached, SARS-CoV-2 can enter cells either by direct fusion of the virion and cell membranes in the presence of proteases TMPRSS2 and furin or by endocytosis in their absence. Regardless of the entry mechanism, the S protein has to be activated to initiate the plasma or endosome membrane fusion process. At the cell membrane, the S protein is activated by TMPRSS2 and furin, whereas in the endosome it is activated by cathepsin B (CTSB) and cathepsin L (CTSL) [71,75]. Activated S promotes the cell- or endosome-membrane fusion [76] with the virion membrane, and then the nucleocapsid is injected into the cytoplasm. These mechanisms are represented in the corresponding diagrams of the map.
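The two entry routes described above can be written down as a handful of Boolean rules, in the spirit of the executable networks that the map is meant to support. This is a deliberately simplified sketch and not the map's own model: reading "TMPRSS2 and furin" and "CTSB and CTSL" as logical AND, treating "absence" as the negation of that conjunction, and all node names are assumptions made for illustration.

```python
from typing import Dict


def entry_rules(state: Dict[str, bool]) -> Dict[str, bool]:
    """Evaluate illustrative Boolean rules for SARS-CoV-2 cell entry."""
    # Attachment: spike protein bound to the ACE2 receptor.
    attached = state["S"] and state["ACE2"]
    # Assumption: both surface proteases are needed for fusion at the membrane.
    membrane_fusion = attached and state["TMPRSS2"] and state["furin"]
    # Assumption: endocytosis is taken whenever surface activation fails.
    endocytosis = attached and not membrane_fusion
    # Assumption: both cathepsins are needed to activate S in the endosome.
    endosomal_fusion = endocytosis and state["CTSB"] and state["CTSL"]
    return {
        "membrane_fusion": membrane_fusion,
        "endocytosis": endocytosis,
        "endosomal_fusion": endosomal_fusion,
        # Either fusion route delivers the nucleocapsid to the cytoplasm.
        "nucleocapsid_release": membrane_fusion or endosomal_fusion,
    }


# A cell without surface proteases but with both cathepsins present.
endosomal_cell = {
    "S": True, "ACE2": True,
    "TMPRSS2": False, "furin": False,
    "CTSB": True, "CTSL": True,
}
print(entry_rules(endosomal_cell))
```

Setting individual inputs to False mimics a perturbation such as protease inhibition, which is how a rule set of this kind could be used to explore candidate interventions.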

Replication and release
Within the host cell, SARS-CoV-2 hijacks the rough endoplasmic reticulum (RER)-linked host translational machinery. It then synthesises viral proteins replicase polyprotein 1a (pp1a) and replicase polyprotein 1ab (pp1ab) directly from the virus (+)genomic RNA (gRNA) [48,77]. Through a complex cascade of proteolytic cleavages, pp1a and pp1ab give rise to 16 non-structural proteins (Nsps) [78][79][80]. Most of these Nsps collectively form the replication transcription complex (RTC) that is anchored to the membrane of the double-membrane vesicle [78,81].

Endoplasmic reticulum stress and unfolded protein response
As discussed above, the virus hijacks the ER to replicate. Production of large amounts of viral proteins exceeds the protein folding capacity of the ER, creating an overload of unfolded proteins. As a result, the unfolded protein response (UPR) pathways are triggered to restore ER homeostasis, using three main signalling routes of UPR via PERK, IRE1, and ATF6 [87]. Their role is to mitigate the misfolded protein load and reduce oxidative stress. The resulting protein degradation is coordinated with a decrease in protein synthesis via eIF2alpha phosphorylation and induction of protein folding genes via the transcription factor XBP1 [88]. When the ER is unable to restore its function, it can trigger cell apoptosis [89,90].
The results are ER stress and activation of the UPR. The expression of some human coronavirus (HCoV) proteins during infection, in particular the S glycoprotein, may induce activation of the ER stress in the host cells [91]. Based on SARS-CoV results, this may lead to activation of the PERK [92], IRE1 and in an indirect manner, of the ATF6 pathways [93].

Autophagy and protein degradation
Processes of degrading malfunctioning proteins and damaged organelles, including the ubiquitin-proteasome system (UPS) and autophagy [94] are essential to maintain energy homeostasis and prevent cellular stress [95,96]. Autophagy is also involved in cell defence, including direct destruction of the viruses via virophagy, presentation of viral antigens, and inhibition of excessive inflammatory reactions [97,98].
SARS-CoV-2 directly affects the process of UPS-based protein degradation, as indicated by the host-virus interactome dataset published recently [34]. This mechanism may be a defence against viral protein degradation [99]. The map describes in detail the nature of this interaction, namely the impact of Orf10 virus protein on the Cul2 ubiquitin ligase complex and its potential substrates.
Interactions between SARS-CoV-2 and host autophagy pathways are inferred based on results from other CoVs. A finding that CoVs use double-membrane vesicles and LC3-I for replication [100] may suggest that the virus induces autophagy, possibly in an ATG5-dependent manner [101], although some evidence points to the contrary [102]. Also, the CoV Nsp6 restricts autophagosome expansion, compromising the degradation of viral components [103]. Recently revealed mutations in Nsp6 [104] indicate its importance, although the exact effect of the mutations remains unknown. Based on the connection between autophagy and the endocytic pathway of the virus replication cycle [105], autophagy modulation was suggested as a potential therapy strategy, either pharmacologically [96][105][106][107], or via fasting [108].

Apoptosis
Apoptosis, a synonym for programmed cell death, is triggered by virus-host interaction upon infection, as the early death of the virus-infected cells may prevent viral replication. Many viruses block or delay cell death by expressing anti-apoptotic proteins to maximize the production of viral progeny [109]. In turn, apoptosis induction at the end of the viral replication cycle might assist in viral dissemination while reducing an inflammatory response. For instance, SARS-CoV-2 [110] and MERS [111] are able to invoke apoptosis in lymphocytes, compromising the immune system.
Apoptosis follows two major pathways [112], called extrinsic and intrinsic. Extrinsic signals are transmitted by death ligands and their receptors (e.g., FasL and TNF-alpha). Activated death receptors recruit adaptors like FADD and TRADD, and initiator procaspases like caspase-8, leading to cell death with the help of effector caspases-3 and 7 [113,114]. In turn, the intrinsic pathway involves mitochondria-related members of the Bcl-2 protein family. Cellular stress causes Bcl-2 proteins-mediated release of cytochrome c from the mitochondria into the cytoplasm. Cytochrome c then forms a complex with Apaf1 and recruits initiator procaspase-9 to form the apoptosome, leading to the proteolytic activation of caspase-9. Activated caspase-9 can now initiate the caspase cascade by activating effector caspases 3 and 7 [114]. The intrinsic pathway is modulated by SARS-CoV molecules [115,116]. As intrinsic apoptosis involves mitochondria, its activity may also be exacerbated by SARS-CoV-2 disruptions of the electron transport chain, mitochondrial translation, and transmembrane transport [34]. The resulting mitochondrial dysfunction may lead to increased release of reactive oxygen species and pro-apoptotic factors.
Another vital crosstalk is that of the intrinsic pathway with the PI3K-Akt pro-survival pathway. Activated Akt can phosphorylate and inactivate various pro-apoptotic proteins, including Bad and caspase-9 [117]. SARS-CoV uses PI3K-Akt signalling cascade to enhance infection [118]. Moreover, SARS-CoV could affect apoptosis in a cell-type-specific manner [119,120].
SARS-CoV structural proteins S, E, M, N, and accessory proteins 3a, 3b, 6, 7a, 8a, and 9b have been shown to act as crucial effectors of apoptosis in vitro. Structural proteins seem to affect mainly the intrinsic apoptotic pathway, with p38 MAPK and PI3K/Akt pathways regulating cell death. Accessory proteins can induce apoptosis via different cascades and in a cell-specific manner [114]. SARS-CoV E and 7a proteins were shown to activate the intrinsic pathway by blocking anti-apoptotic Bcl-XL localized to the ER [121]. SARS-CoV M protein and the ion channel activity of E and 3a were shown to interfere with pro-survival signalling cascades [114,122].

Integrative stress response: endothelial damage, coagulation, and inflammation
The viral replication and the consequent immune and inflammatory responses cause damage to the epithelium and pulmonary capillary vascular endothelium and activate the main intracellular defence mechanisms, as well as the humoral and cellular immune responses. Resulting cellular stress and tissue damage [123,124] impair respiratory capacity and lead to acute respiratory distress syndrome (ARDS) [61,125,126]. Hyperinflammation is a known complication, causing widespread damage, organ failure, and death, followed by a not yet completely understood rapid increase of cytokine levels (cytokine storm) [127][128][129], and acute ARDS [130]. Other reported complications, such as coagulation disturbances and thrombosis, are associated with severe cases, but the specific mechanisms are still unknown [64][131][132][133], although some reports suggest that COVID-19 coagulopathy has a distinct profile [134].
ACE2, used by SARS-CoV-2 for host cell entry, is a regulator of the renin-angiotensin system (RAS) and is widely expressed in the affected organs [142]. The main function of ACE2 is the conversion of angiotensin II (AngII) to angiotensin 1-7 (Ang1-7), and these two angiotensins trigger the counter-regulatory arms of RAS [143]. The signalling via AngII and its receptor AGTR1, elevated in the infected [142,144], induces the coagulation cascade leading to microvascular thrombosis [145], while Ang1-7 and its receptor MAS1 attenuate these effects [143].

PAMP signalling
The innate immune system detects specific pathogen-associated molecular patterns (PAMPs), through Pattern Recognition Receptors (PRRs). Detection of SARS-CoV-2 is mediated through receptors that recognise double-stranded and single-stranded RNA in the endosome during endocytosis of the virus particle, or in the cytoplasm during the viral replication. These receptors mediate the activation of transcription factors such as AP1, NFkappaB, IRF3, and IRF7, responsible for the transcription of antiviral proteins, in particular, interferon-alpha and beta [146,147].
SARS-CoV-2 reduces the production of type I interferons to evade the immune response [49]. The detailed mechanism is not clear yet; however, SARS-CoV M protein inhibits the IRF3 activation [148] and suppresses NFkappaB and COX2 transcription. At the same time, SARS-CoV N protein activates NFkappaB [149], so the overall impact is unclear. These pathways are also negatively regulated by SARS-CoV nsp3 papain-like protease domain (PLPro) [150].
The map contains the initial recognition process of the viral particle by the innate immune system and the viral mechanisms to evade the immune response. It provides the connection between virus entry (detecting the viral endosomal patterns), its replication cycle (detecting cytoplasmic viral patterns), and the effector pathways of pro-inflammatory cytokines, especially of the interferon type I class. The latter seems to play a crucial but complex role in COVID-19 pathology: both negative [151,152] and positive effects [3,153] of interferons on virus replication have been reported.

Interferon type I signalling
Interferons (IFNs) are central players in the antiviral immune response of the host cell [55], specifically affected by SARS-CoV-2 [154][155][156][157]. Type I IFNs are induced upon recognition of viral PAMPs by various host PRRs [48] as discussed earlier. The IFN-I pathway diagram represents the activation of TLR7 and IFNAR and the subsequent recruiting of adaptor proteins and the downstream signalling cascades regulating key transcription factors including IRF3/7, NF-kappaB, AP-1, and ISRE [48,158]. Further, the map shows IRF3-mediated induction of IFN-I, affected by the SARS-CoV-2 proteins. SARS-CoV Nsp3 and Orf6 interfere with IRF3 signalling [159,160] and SARS-CoV M, N, Nsp1 and Nsp3 act as interferon antagonists [48,150,158,161,162]. Moreover, coronaviruses ORF3a, ORF6 and nsp1 proteins can repress interferon expression and stimulate the degradation of IFNAR1 and STAT1 during the Unfolded Protein Response (UPR) [163,164].
Another mechanism of viral RNA recognition is RIG-like receptor signalling [58], leading to STING activation [165], and via the recruitment of TRAF3, TBK1 and IKKepsilon to phosphorylation of IRF3 [56]. This in turn induces the transcription of IFNs alpha, beta and lambda [166]. SARS-CoV viral papain-like-proteases, contained within the nsp3 and nsp16 proteins, inhibit STING and the downstream IFN secretion [167]. In line with this hypothesis, SARS-CoV-2 infection results in a unique inflammatory response defined by low levels of IFN-I and high expression of cytokines [58,168]. The IFN-lambda diagram describes the IFNL receptor signalling cascade [169], including JAK-STAT signalling and the induction of Interferon Stimulated Genes, which encode antiviral proteins [170]. The interactions of SARS-CoV-2 proteins with the IFNL pathway are based on the literature [171] or SARS-CoV homology [58].

Altered host metabolism
Metabolic pathways govern the immune microenvironment by modulating the availability of nutrients and critical metabolites [172]. Infectious entities reprogram host metabolism to create favourable conditions for their reproduction [173]. SARS-CoV-2 proteins interact with a variety of immunometabolic pathways, several of which are described below.
The tryptophan-kynurenine pathway is closely related to heme metabolism. The rate-limiting step of this pathway is catalysed by the indoleamine 2,3-dioxygenase enzymes (IDO1 and IDO2) in dendritic cells, macrophages, and epithelial cells in response to inflammatory cytokines like IFN-gamma, IFN-1, TGF-beta, TNF-alpha, and IL-6 [194][195][196]. Crosstalk with the HMOX1 pathway also increases the expression of IDO1 and HMOX1 in a feed-forward manner. Metabolomics analyses from severe COVID-19 patients revealed enrichment of kynurenines and depletion of tryptophan, indicating robust activation of IDO enzymes [197,198]. Depletion of tryptophan [173,199,200] and kynurenines and their derivatives affect the proliferation and immune response of a range of T cells [176][201][202][203][204][205]. However, despite high levels of kynurenines in COVID-19, CD8+ T-cells and Th17 cells are enriched in lung tissue, and T-regulatory cells are diminished [206]. This raises the question of whether and how the immune response elicited in COVID-19 evades suppression by the kynurenine pathway.
The SARS-CoV-2 protein Nsp14 interacts with three human proteins: GLA, SIRT5, and IMPDH2 [34]. The galactose metabolism pathway, including the GLA enzyme [207], is interconnected with amino sugar and nucleotide sugar metabolism. SIRT5 is a NAD-dependent desuccinylase and demalonylase regulating serine catabolism, oxidative metabolism and apoptosis initiation [208][209][210]. Moreover, nicotinamide metabolism regulated by SIRT5 occurs downstream of the tryptophan metabolism, linking it to the pathways discussed above. Finally, IMPDH2 is the rate-limiting enzyme in the de novo synthesis of GTP, allowing regulation of purine metabolism and downstream potential antiviral targets [211,212].
The pyrimidine synthesis pathway, tightly linked to purine metabolism, affects viral DNA and RNA synthesis. Pyrimidine deprivation is a host-targeted antiviral defence mechanism, which blocks viral replication in infected cells and can be regulated pharmacologically [213][214][215]. It appears that components of the DNA damage response connect the inhibition of pyrimidine biosynthesis to the interferon signalling pathway, probably via STING-induced TBK1 activation that amplifies interferon response to viral infection, discussed above. Inhibition of de novo pyrimidine synthesis may have beneficial effects on the recovery from COVID-19 [215]; however, this may happen only in a small group of patients.

Biocuration roadmap
COVID-19 pathways featured in the previous section cover mechanisms reported so far. Still, certain aspects of the disease were not represented in detail because of their complexity, namely cell-type-specific immune response, and susceptibility features. Their mechanistic description is of great importance, as suggested by clinical reports on the involvement of these pathways in the molecular pathophysiology of the disease. The mechanisms outlined below will be the next targets in our curation roadmap.
Cell type-specific immune response
COVID-19 causes a serious imbalance in multiple populations of immune cells. Some studies report that COVID-19 patients have a significant decrease of peripheral CD4+ and CD8+ cytotoxic T lymphocytes (CTLs), B cells, NK cells, as well as higher levels of a broad range of cytokines and chemokines [128][216][217][218][219]. The disease causes functional exhaustion of CD8+ CTLs and NK cells, induced by SARS-CoV-2 S protein and by excessive pro-inflammatory cytokine response [217,220]. Moreover, the ratio of naïve-to-memory helper T-cells, as well as the decrease of T regulatory cells, correlate with COVID-19 severity [206]. Conversely, high levels of Th17 and cytotoxic CD8+ T-cells have been found in the lung tissue [221]. Pulmonary recruitment of lymphocytes into the airways may explain the lymphopenia and the increased neutrophil-lymphocyte ratio in peripheral blood found in COVID-19 patients [216,222,223]. In this regard, an abnormal increase of the Th17:Treg cell ratio may promote the release of pro-inflammatory cytokines and chemokines, increasing disease severity [224].
Importantly, age is one of the key aspects contributing to the severity of the disease. The elderly are at high risk of developing severe or critical disease [227,247]. Age-related elevated levels of pro-inflammatory cytokines (inflammation) [247][248][249][250], immunosenescence and cellular stress of ageing cells [125,227,247,251,252] may contribute to the risk. In contrast, children are generally less likely to develop severe disease [253,254], with the exception of infants [125][255][256][257]. However, some previously healthy children and adolescents can develop a multisystem inflammatory syndrome following SARS-CoV-2 infection [258][259][260][261][262].
We aim to connect the susceptibility features to specific molecular mechanisms and better understand the contributing factors. This can lead to a series of testable hypotheses, including the role of vitamin D counteracting pro-inflammatory cytokine secretion [270][271][272] in an age-dependent manner [247,273], and modifying the severity of the disease. Another example of a testable hypothesis may be that the immune phenotype associated with asthma inhibits pro-inflammatory cytokine production and modifies gene expression in the airway epithelium, protecting against severe COVID-19 [245,246,274].

Bioinformatics analysis and computational modelling roadmap for hypothesis generation
In order to understand complex and often indirect dependencies between different pathways and molecules, we need to combine computational and data-driven analyses. Standardised representation and programmatic access to the contents of the COVID-19 Disease Map enable the development of reproducible analytical and modelling workflows. Here, we discuss the range of possible approaches and demonstrate preliminary results, focusing on interoperability, reproducibility, and applicability of the methods and tools.
Our goal is to work on the computational challenges as a community, involving the biocurators and domain experts in the analysis of the COVID-19 Disease Map and relying on their feedback to evaluate the outcomes. In this way, we aim to identify approaches to tackle the complexity and the size of the map, proposing a state-of-the-art framework for robust analysis, reliable models, and useful predictions.

Data integration and network analysis
Visualisation of omics data can help contextualise the map with experimental data creating data-specific blueprints. These blueprints could be used to highlight parts of the map that are active in one condition versus another (treatment versus control, patient versus healthy, normal versus infected cell, etc.). Combining information contained in multiple omics platforms can make patient stratification more powerful, by reducing the number of samples needed or by augmenting the precision of the patient groups [275,276]. Approaches that integrate multiple data types without the accompanying mechanistic diagrams [277][278][279] produce patient groupings that are difficult to interpret. In turn, classical pathway analyses often produce long lists mixing generic and cell-specific pathways, making it challenging to pinpoint relevant information. Using disease maps to interpret omics-based clusters addresses the issues related to contextualised visual data analytics.

Footprint-based analysis
Footprints are signatures of a molecular regulator determined by the expression levels of its targets [280]. For example, a footprint can contain targets of a transcription factor (TF) or peptides phosphorylated by a kinase. Combining multiple omics readouts and multiple measurements can increase the robustness of such signatures. Nevertheless, an essential component is the mechanistic description of the targets of a given regulator, allowing computation of its footprint. With available SARS-CoV-2-related omics and interaction datasets [281], it is possible to infer which TFs and signalling pathways are affected upon infection [282]. Combining the COVID-19 Disease Map regulatory interactions with curated collections of TF-target interactions like DoRothEA [283] will provide a contextualised evaluation of the effect of SARS-CoV-2 infection at the TF level.
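The idea of a footprint can be illustrated with a minimal sketch, assuming a simple signed mean of target z-scores as the activity estimate. This is not the statistic used by any particular published method; the gene names, TF names, expression values and regulons below are hypothetical placeholders.

```python
from statistics import mean, pstdev

def footprint_score(expression, regulon):
    """Signed mean of target z-scores.

    expression: gene -> expression change (hypothetical log fold changes).
    regulon: target gene -> mode of regulation (+1 activation, -1 repression).
    Returns None if none of the targets were measured.
    """
    values = list(expression.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("expression values have zero variance")
    contributions = [
        sign * (expression[gene] - mu) / sigma
        for gene, sign in regulon.items()
        if gene in expression
    ]
    return mean(contributions) if contributions else None

# Toy data: all names and numbers are illustrative assumptions.
expression = {"g1": 3.0, "g2": 2.5, "g3": 2.0, "g4": 0.0, "g5": -0.5, "g6": -1.0}
regulons = {
    "TF_A": {"g1": 1, "g2": 1, "g3": 1},    # activates three up-regulated genes
    "TF_B": {"g5": 1, "g6": 1, "g1": -1},   # activates down-regulated genes, represses g1
}

scores = {tf: footprint_score(expression, targets) for tf, targets in regulons.items()}
for tf, score in scores.items():
    print(tf, round(score, 2))
```

A positive score suggests that the targets of a regulator move in the direction expected if the regulator were more active; a negative score suggests the opposite. Dedicated tools additionally estimate the significance of such scores.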

Viral-host interactome
The virus-host interactome is a network of virus-human protein-protein interactions (PPIs) that can help to understand the mechanisms of disease [34][284][285][286]. It can be expanded by merging virus-host PPI data with human PPI and protein data [287] to discover clusters of interactions indicating human mechanisms and pathways affected by the virus [288]. First of all, these clusters can be interpreted at the mechanistic level by visual exploration of COVID-19 Disease Map diagrams. In addition, they can potentially reveal additional pathways to add to the COVID-19 Disease Map (e.g., E protein interactions or TGFBeta diagrams) or suggest new interactions to introduce into the existing diagrams.
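The expansion step described above can be sketched as follows, assuming the simplest possible strategy: take the human proteins targeted by viral proteins, add their direct neighbours in the human PPI network, and report connected components as candidate clusters. The protein identifiers (V1, H1, ...) and edges are hypothetical, and real analyses typically use more elaborate clustering.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Undirected adjacency map from a list of (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def connected_components(graph, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph.get(node, set()) & nodes:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components

# Toy data: identifiers and interactions are illustrative assumptions.
virus_host = [("V1", "H1"), ("V1", "H2"), ("V2", "H5")]
human_ppi = [("H1", "H3"), ("H2", "H3"), ("H5", "H6"), ("H7", "H8")]

graph = build_graph(human_ppi)
targets = {human for _, human in virus_host}

# Expand the viral targets by their direct human interaction partners.
expanded = set(targets)
for target in targets:
    expanded |= graph.get(target, set())

clusters = connected_components(graph, expanded)
for cluster in clusters:
    print(sorted(cluster))
```

In this toy network, H7 and H8 interact with each other but are not reachable from any viral target, so they are left out of the clusters.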

Mechanistic and dynamic computational modelling
Computational modelling is a powerful approach that enables in silico experiments, produces testable hypotheses, helps elucidate regulation and, finally, can predict novel therapeutic targets and candidates for drug repurposing.

Mechanistic pathway modelling
Mechanistic models of pathways allow bridging variations at the scale of molecular activity to variations at the level of cell behaviour. This can be achieved by coupling the molecular interactions of a given pathway with its endpoint and by contextualising the molecular activity using omics datasets. HiPathia is such a method, processing transcriptomic or genomic data to estimate the functional profiles of a pathway conditioned by the data studied and linkable to phenotypes such as disease symptoms or other endpoints of interest [289,290]. Moreover, such mechanistic modelling can be used to predict the effect of interventions as, for example, the effect of targeted drugs [291]. HiPathia integrates directly with the diagrams of the COVID-19 Map using the SIF format provided by CaSQ (see Section 2.3), as well as with the associated interaction databases (see Section 2.2).
The drawback of approaches like HiPathia is their computational complexity, limiting the size of the diagrams they can process. An approach to large-scale mechanistic pathway modelling is to transform the diagrams into causal networks. CARNIVAL [292] combines the causal representation of networks [13] with transcriptomics, phosphoproteomics, or metabolomics data [280] to contextualise cellular networks and extract mechanistic hypotheses. The algorithm identifies a set of coherent causal links connecting upstream drivers such as stimulations or mutations to downstream changes in transcription factor activities.

Discrete computational modelling
Analysis of the dynamics of molecular networks is necessary to deepen our understanding of crucial regulators behind disease-related pathophysiology. The discrete modelling framework provides this possibility. COVID-19 Disease Map diagrams, translated to SBML qual (see Section 2.3), can be directly imported by tools like Cell Collective [293] or GINsim [46] for analysis. Preserving annotations and layout information ensures transparency and reusability of the models. Importantly, Cell Collective is a user-friendly online modelling platform that provides features for real-time in silico simulations and analysis of complex signalling networks. The platform allows users without computational background to simulate or analyse models to generate and prioritise new hypotheses. References and layout are used for model visualisation, supporting the interpretation of the results. The mathematics and code behind each model, however, remain accessible to all users. In turn, GINsim is a tool providing a wide range of analysis methods, including efficient identification of the states of convergence of a given model (attractors). Model reduction functionality can also be employed to facilitate the analysis of large-scale models.
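The notion of an attractor can be illustrated with a minimal sketch of a Boolean model under synchronous updating, where all states are enumerated exhaustively. This brute-force search is only feasible for very small networks and is not how the tools named above work internally; the three-node network and its rules are hypothetical.

```python
from itertools import product

# Hypothetical toy network: the virus induces both an interferon response
# and a viral antagonist that blocks that response.
NODES = ("virus", "antagonist", "ifn")
RULES = {
    "virus": lambda s: s["virus"],                       # input node keeps its value
    "antagonist": lambda s: s["virus"],
    "ifn": lambda s: s["virus"] and not s["antagonist"],
}

def step(state):
    """Synchronous update: every node is recomputed from the current state."""
    current = dict(zip(NODES, state))
    return tuple(bool(RULES[node](current)) for node in NODES)

def find_attractors():
    """Follow every initial state until it revisits a state; collect the cycles."""
    attractors = set()
    for start in product((False, True), repeat=len(NODES)):
        trajectory, state = [], start
        while state not in trajectory:
            trajectory.append(state)
            state = step(state)
        cycle = trajectory[trajectory.index(state):]
        pivot = cycle.index(min(cycle))                  # canonical rotation of the cycle
        attractors.add(tuple(cycle[pivot:] + cycle[:pivot]))
    return attractors

attractors = find_attractors()
for cycle in sorted(attractors):
    print([dict(zip(NODES, state)) for state in cycle])
```

In this toy model both attractors are fixed points: an uninfected state with all nodes off, and an infected state in which the antagonist is on and the interferon response is off.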

Multiscale and stochastic computational modelling
Viral infection and immune response are complex processes that span many different scales, from molecular interactions to multicellular behaviour. The modelling and simulation of such complex scenarios require a dedicated multiscale computational architecture, where multiple models run in parallel and communicate with each other to capture cellular behaviour and intercellular communications. Multiscale agent-based models, also proposed for COVID-19 [295], simulate processes taking place at different time scales, e.g., diffusion, cell mechanics, cell cycle, or signal transduction [294]. PhysiBoSS [296] allows such simulation of intracellular processes by combining the computational framework of PhysiCell [297] with the MaBoSS [298] tool for stochastic simulation of logical models to study transient effects and perturbations [299]. Implementation of detailed COVID-19 signalling models in the PhysiBoSS framework may help to better understand the complex dynamics of multiscale processes, such as interactions and crosstalk between immune system components and the host cell in COVID-19.
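To illustrate what a stochastic simulation of a logical model can reveal about transient effects, the sketch below runs a continuous-time, asynchronous simulation of a toy Boolean network with a Gillespie-style scheme. It is an assumption-laden illustration, not the implementation of any of the tools named above: the network, the uniform flip rate and the time points are all hypothetical.

```python
import random

# Hypothetical toy network: A is a constant input, B follows A,
# and C is switched on by A unless B is already active.
NODES = ("A", "B", "C")
RULES = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and not s["B"],
}

def simulate(initial, max_time, rng, rate=1.0):
    """Return the state at `max_time` for one stochastic trajectory.

    Each node whose rule disagrees with its current value flips with the
    same rate (an assumption made here for simplicity).
    """
    state, time = dict(initial), 0.0
    while True:
        unstable = [n for n in NODES if bool(RULES[n](state)) != state[n]]
        if not unstable:
            return state                     # stable state reached
        time += rng.expovariate(rate * len(unstable))
        if time > max_time:
            return state                     # next flip would occur after max_time
        node = rng.choice(unstable)
        state[node] = not state[node]

def fraction_active(node, max_time, runs=2000, seed=0):
    """Fraction of trajectories in which `node` is active at `max_time`."""
    rng = random.Random(seed)
    initial = {"A": True, "B": False, "C": False}
    hits = sum(simulate(initial, max_time, rng)[node] for _ in range(runs))
    return hits / runs

early = fraction_active("C", max_time=0.5)
late = fraction_active("C", max_time=50.0)
print("C active early:", early, "C active late:", late)
```

In this toy model C can only be active transiently: whenever C switches on before B, the later activation of B switches it off again. A deterministic synchronous analysis of the stable state alone would not show this window of activity.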

Case study: RNA-Seq-based analysis of transcription factor (TF) activity
In this case study, we combine computational approaches discussed above and present results derived from omics data analysis on the COVID-19 Disease Map diagrams. We measured the effect of COVID-19 at the transcription factor (TF) activity level by applying VIPER [300] combined with DoRothEA regulons [283] on RNA-seq datasets of the SARS-CoV-2 infected cell line [168]. Then, we mapped the TFs' normalised enrichment scores (NES) on the Interferon type I signalling pathway diagram of the COVID-19 Disease Map using the SIF files generated by CaSQ (see Section 2.3). As highlighted in Figure 4, our manually curated pathway included some of the most active TFs after SARS-CoV-2 infection, such as STAT1, STAT2, IRF9 and NFKB1. These genes are well known to be involved in cytokine signalling and first antiviral response [301,302]. Interestingly, they are located downstream of various viral proteins (E, S, Nsp1, Orf7a and Orf3a) and members of the MAPK pathway (MAPK8, MAPK14 and MAP3K7). SARS-CoV-2 infection is known to promote MAPK activation, which mediates the cellular response to pathogenic infection and promotes the production of proinflammatory cytokines [281]. Altogether, we identified signaling events that may capture the mechanistic response of the human cells to the viral infection.

Case study: RNA-seq-based analysis of pathway signalling
In this use case, the Hipathia [289] algorithm was used to calculate the level of activity of the subpathways from the COVID-19 Apoptosis diagram, with the aim to evaluate whether COVID-19 Disease Map diagrams can be used for pathway modelling approaches. To this end, a public RNA-seq dataset from human SARS-CoV-2 infected lung cells (GEO GSE147507) was used. First, the RNA-seq gene expression data was normalized with the Trimmed mean of M values (TMM) normalization [303], then rescaled to range [0;1] for the calculation of the signal and normalised using quantile normalisation [304]. The normalised gene expression values were used to calculate the level of activation of the subpathways, then a case/control contrast with a Wilcoxon test was used to assess differences in signaling activity between the two conditions.
Figure 5. The activation levels have been calculated using transcriptional data from GSE147507 and Hipathia mechanistic pathway analysis algorithm. Each node represents a gene (ellipse), a metabolite (circle) or a function (square). The pathway is composed of circuits from a receptor gene/metabolite to an effector gene/function, with interactions simplified to inhibitions or activations (see Section 2.3, SIF format). Significantly deregulated circuits are highlighted by color arrows (red: activated in infected cells). The color of the node corresponds to the level of differential expression of each node in SARS-CoV-2 infected cells vs normal lung cells. Blue: down-regulated elements, red: up-regulated elements, white: elements with not statistically significant differential expression. Hipathia calculates the overall circuit activation, and can indicate deregulated interaction even if interacting elements are not individually differentially expressed.
Results of the Apoptosis pathway analysis can be seen in Figure 5 and Supplementary Material 5. Importantly, Hipathia calculates the overall activation of circuits (series of causally connected elements), and can indicate deregulated interactions resulting from a cumulative effect, even if interacting elements are not individually differentially expressed. When discussing differential activation, we refer to the circuits, while individual elements are mentioned as differentially expressed. The analysis shows an overactivation of several circuits, specifically the one ending in the effector protein BAX. This overactivation seems to be led by the overexpression of the BAD protein, inhibiting the BCL2-MCL1-BCL2L1 complex, which in turn inhibits BAX. Indeed, SARS-CoV-2 infection can invoke caspase-8-induced apoptosis [305], where BAX, together with the ripoptosome/caspase-8 complex, may act as a pro-inflammatory checkpoint [306]. This result is supported by studies in SARS-CoV, showing BAX overexpression following infection [121,307]. Overall, our findings recapitulate reported outcomes. With evolving contents of the COVID-19 Disease Map and new omics data becoming available, new mechanism-based hypotheses can be formulated.
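How a circuit can appear deregulated without its effector being differentially expressed can be illustrated with a toy calculation. The sketch assumes a simplified propagation rule for a linear chain of inhibitions, in which the signal leaving a node is its rescaled expression value multiplied by one minus the incoming inhibitory signal. This rule is an assumption made for illustration and should not be read as the exact Hipathia implementation; all expression values are hypothetical and already rescaled to [0;1].

```python
def circuit_activity(values, chain):
    """Propagate a signal along a linear chain of inhibitions.

    values: node -> expression rescaled to [0, 1].
    chain: nodes from receptor to effector; each node inhibits the next one.
    """
    signal = values[chain[0]]                # the receptor emits its own value
    for node in chain[1:]:
        signal = values[node] * (1.0 - signal)
    return signal

# BAD inhibits the BCL2-MCL1-BCL2L1 complex, which in turn inhibits BAX.
chain = ["BAD", "BCL2_complex", "BAX"]

# Hypothetical rescaled expression: only BAD differs between the conditions.
control = {"BAD": 0.2, "BCL2_complex": 0.8, "BAX": 0.5}
infected = {"BAD": 0.8, "BCL2_complex": 0.8, "BAX": 0.5}

activity_control = circuit_activity(control, chain)
activity_infected = circuit_activity(infected, chain)
print(round(activity_control, 2), round(activity_infected, 2))
```

In this toy setting the BAX expression value is identical in both conditions, yet the activity of the circuit ending in BAX is higher in the infected condition because the upstream inhibition of the complex is stronger.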

Parallel efforts
In the COVID-19 Disease Map community we strive to produce interoperable content and seamless downstream analyses, translating the graphic representations of the molecular mechanisms to executable models. We are also aware of parallel efforts towards modelling of COVID-19 mechanisms, which we plan to include as a part of our ecosystem. These efforts are not yet directly interoperable with the COVID-19 Disease Map content as they either use different notation schemes or require parameters not covered by our biocuration guidelines. At the same time, they provide a complementary source of information and the opportunity to create an even broader toolset to tackle the pandemic.
The modified Edinburgh Pathway Notation (mEPN) scheme [308] allows for the detailed visual encoding of molecular processes using the yEd platform, but diagrams are constructed in such a way as to also function as Petri nets. These can then be used directly for activity simulations using the BioLayout network analysis tool [309]. The current mEPN COVID-19 model details the replication cycle of SARS-CoV-2, integrated with a range of host defence systems, e.g. type 1 interferon signalling, TLR receptors, OAS systems, etc. Simulations of altered gene expression, interactions with drug targets or changes to interaction kinetics can be represented by introducing relevant transitions or nodes directly in the diagram. Currently, models constructed in mEPN can be saved as SBGN.ml files; however, this entails a loss of information, and the associated computational features are not compatible with other COVID-19 Disease Map diagrams (not modelled as Petri nets).
The COVID-19 Disease Map can support dynamic kinetic modelling to quantify the behaviour of different pathways and evaluate the dynamic effects of perturbations. However, it is necessary to assign a kinetic equation or a rate law to every reaction in the diagram to be analysed. This process is challenging because any given reaction depends on its cellular and physiological context, which makes it difficult to parameterise. Software tools like SBMLsqueezer [21] and reaction kinetics databases like SABIO-RK [310] are indispensable in this effort. Nevertheless, the most critical factor is the availability of experimentally validated parameters that can be reliably applied in SARS-CoV-2 modelling scenarios.
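What assigning a rate law to a reaction involves can be sketched for a single hypothetical reaction S -> P. The example assumes irreversible Michaelis-Menten kinetics and a forward Euler integration; the choice of rate law, the parameter values (vmax, km) and the initial concentration are illustrative assumptions, not values taken from any database.

```python
def michaelis_menten(substrate, vmax, km):
    """Irreversible Michaelis-Menten rate law."""
    return vmax * substrate / (km + substrate)

def simulate(s0, vmax, km, dt=0.01, steps=1000):
    """Forward Euler integration of S -> P; returns final (S, P)."""
    substrate, product = s0, 0.0
    for _ in range(steps):
        rate = michaelis_menten(substrate, vmax, km)
        converted = min(rate * dt, substrate)   # never consume more than is available
        substrate -= converted
        product += converted
    return substrate, product

# Hypothetical parameters in arbitrary units.
s_end, p_end = simulate(s0=10.0, vmax=1.0, km=0.5)
print(round(s_end, 3), round(p_end, 3))
```

Even for this single reaction, two parameters have to be supplied, which shows why the availability of validated kinetic parameters becomes the limiting factor when every reaction of a diagram needs a rate law.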

Discussion
The COVID-19 Disease Map is both a knowledgebase and a computational repository. On the one hand, it is a graphical, interactive representation of disease-relevant molecular mechanisms linking many knowledge sources. On the other hand, it is a computational resource of curated content for graph-based analyses and disease modelling. It offers a shared mental map for understanding the dynamic nature of the disease at the molecular level and also its dynamic propagation at a systemic level. Thus, it provides a platform for a precise formulation of models, accurate data interpretation, monitoring of therapy, and potential for drug repositioning.
The COVID-19 Disease Map spans three platforms and assembles diagrams describing molecular mechanisms of COVID-19. These diagrams are grounded in the relevant published SARS-CoV-2 research, completed where necessary by mechanisms discovered in related beta-coronaviruses. This unprecedented effort of community-driven biocuration resulted in over forty diagrams with molecular resolution constructed since March 2020. It demonstrates that expertise in biocuration, clear guidelines and text mining solutions can accelerate the passage from the published findings to a meaningful mechanistic representation of knowledge. The COVID-19 Disease Map can provide the tipping point to shortcut research data generation and knowledge accumulation, creating a formalized and standardized stream of well-defined tasks.
This approach to an emerging pandemic leveraged the capacity and expertise of an entire swath of the bioinformatics community, bringing them together to improve the way we build and share knowledge. By aligning our efforts, we strive to provide COVID-19 specific pathway models, synchronize content with similar resources and encourage discussion and feedback at every stage of the curation process. With new results published every day, and with the active engagement of the research community, we envision the COVID-19 Disease Map as an evolving and continuously updated knowledge base whose utility spans the entire research and development spectrum from basic science to pharmaceutical development and personalized medicine.
Moreover, our approach includes a large-scale effort to create interoperable tools and seamless downstream analysis pipelines to boost the applicability of established methodologies to the COVID-19 Disease Map content. This includes harmonisation of formats, support of standards, and transparency in all steps to ensure wide use and content reusability. Preliminary results of such efforts are presented in the case studies.
The COVID-19 Disease Map Community is open and expanding as more people with complementary expertise join forces. In the longer run, the map's content will help to find robust signatures related to SARS-CoV-2 predisposition or response to various treatments, along with the prioritization of new potential drug targets or drug candidates. We aim to provide the tools to deepen our understanding of the mechanisms driving the infection and help boost drug development supported with testable suggestions. We aim at building armor for new treatments to prevent new waves of COVID-19 or similar pandemics.