Linking and Negotiating Uncertainty Theories Over Linked Data

There is no credibility insurance measure for the information provided by the Web. In most cases, information cannot be checked for accuracy. Semantic Web technologies aim to give structure and sense to information published on the Web and to provide a machine-readable data format for interlinked data. However, Semantic Web standards do not offer a way to represent uncertainty and attach it to such data in a manner that allows reasoning over it. Moreover, uncertainty is context-dependent and may be represented by multiple theories which apply different calculi. In this paper, we present a new vocabulary and a framework for handling generic uncertainty representation and reasoning. The meta-Uncertainty vocabulary offers a way to represent uncertainty theories and annotate Linked Data with uncertainty information. We provide the tools to represent the uncertainty calculi linked to these theories using the LDScript function scripting language. Moreover, we describe the semantics of contexts in uncertainty reasoning with meta-uncertainty. We describe the mapping between RDF triples and their uncertainty information, and we demonstrate the effect on the query writing process in Corese. We discuss the translatability of uncertainty theories and, finally, the negotiation of an answer annotated with uncertainty information.


INTRODUCTION
The Web is a massive source of information, and the emergence of Semantic Web [3] technologies allowed the transition from a document-driven view of the Web to a data-driven one. As no credibility insurance measure is taken in most cases, information on the Web cannot be checked for accuracy. In fact, many websites can post whatever they want including biased, subjective, incomplete or uncertain information [20]. In addition, several issues involving the modeling of uncertain Web Data and services are raised by Benslimane et al. [2] arguing that uncertainty should be modeled and measured in order to be associated with a piece of information or a service. Such a model prepares the ground for querying, searching, composing and harnessing uncertain data. Semantic Web technologies aimed to offer both structure and sense to the existing Web. However, Semantic Web standards do not offer the possibility to represent and attach uncertainty to such pieces of data while keeping standard semantics without further extension.
Various theories exist and can be used to model different aspects or types of uncertainty. Moreover, the choice of one theory among the others depends on the context and the application. Thus we argue that the bridge between uncertainty and the Semantic Web is crucial because of the dependence of many applications on uncertain data, the existence of many uncertainty theories, and the need for interoperability and reusability of uncertain data.
First of all, the definition of uncertainty itself is challenging. It can be epistemic, i.e., stemming from our ignorance (incomplete knowledge, lack of a model) of an entity or process, or ontic, i.e., representing the inherent randomness of a phenomenon or system (a roulette, for instance). In addition, the border between these two types of uncertainty is somewhat blurred and arbitrary, in that it depends on our point of view and on the level of abstraction of knowledge representation.
Multiple uncertainty theories are currently applied in Artificial Intelligence in order to reconcile the data, correct them, extrapolate or predict new values, or simply assess the degree of uncertainty of information or the error compared to a dataset [14]. Probability theory is but one example, if not the best known and most time-honored, of such theories; it focuses on the representation and manipulation of ontic uncertainty. Other theories are interested in the quantization (or "granulation") of vague data. Although the words "tall" or "cold" do not have a specific measure, they can still be modeled using fuzzy sets and manipulated using fuzzy logic. A non-exhaustive list of uncertainty theories and their formalization can be found in [17].
To integrate uncertainty in the Semantic Web stack, two aspects must be taken into account:
• Syntax: uncertainty needs to be represented in order to be queried or published. The format must be machine-readable and interchangeable. The vocabulary proposed by the W3C URW3-XG group [18] makes it possible to annotate data with the type, the model and the derivation of uncertainty. The group offers a limited list of models (fuzzy sets, rough sets, etc.), but provides neither information regarding the quantification of uncertainty nor the specificities of each approach and theory.
• Semantics: reasoners are used on top of ontologies to infer new triples (based on OWL profiles) or to validate the knowledge base against the schema. Direct semantics in OWL [8] is intended to give sense to ontology structures compatible with the SROIQ description logic. Faced with uncertainty as described below, and assuming that triples can be annotated with uncertainty values, OWL does not provide the necessary tools to manipulate uncertainty: when new information is generated, its uncertainty must be provided too. Several extensions to the standards are made in the Schema layer itself, such as FuzzyOWL [22] or PossOWL [19], while others address the 'Unifying logic' layer, like FuzzyDL [11]. Such standards can also be extended to enable inferring new information about uncertainty or to keep only triples that follow a defined set of rules in the knowledge base.
Dividino et al. [12] presented a framework extending both RDF and SPARQL to enable meta-knowledge querying. Their framework enables the system administrator to define meta-knowledge properties and, for each of them, the intended semantics and the knowledge dimension inside the application, in a non-standard format. Thus, the semantics of the dimensions are not publishable, and no semantics are provided for uncertainty itself.
Another problem that arises from the different theories is the relationship between theories and data. Some theories are suitable for specific contexts and applications, which requires either finding a suitable theory to annotate the data or a suitable transformation of the existing annotations to fit the requirement. Both options raise the question of alternatives and of content negotiation with the data sources, along with the possibility for users to represent their preferences in queries.
In this paper, we focus on a generic representation for uncertainty information, one that can be publishable and reusable. We summarize the mUnc vocabulary enabling both the representation of uncertainty theories with their calculi and the annotation of data with uncertainty information. We present an extension to RDF Semantics by giving a contextual meaning to Named Graphs. We offer the possibility to use mUnc alongside our framework, built on top of the Corese Semantic Web engine [10] and using the LDScript function scripting language [9] based on the SPARQL filter language, to map between triples and their uncertainty metadata using multiple mapping modes. We discuss the translatability of uncertainty theories and propose to extend mUnc to fill that gap, and then discuss the negotiation of theories over HTTP as a special case of conneg (Content Negotiation) where clients can state their preferences in terms of uncertainty theories and servers can select or translate among theories to serve the best possible answer.
The rest of the article is organized as follows. Section 2 first recalls the uncertainty in information theory and discusses a model for the existing uncertainty theories in the literature. Then, Section 3 discusses the integration of uncertainty in the semantic web, offering a presentation of the mUnc vocabulary, discussing the publishing of uncertainty approaches and their calculi on the Semantic Web, and giving an overview of metadata mapping modes and how to query over uncertain Linked Data. Section 4 is dedicated to the translatability of uncertainty theories and the negotiation details. Related works are discussed in Section 5 while the last section summarizes our work and offers a glimpse over the perspectives.

UNCERTAINTY IN INFORMATION
Information uncertainty can present itself in different forms depending on the domain of the definition. Data itself can be wrong from the source: entry mistakes, deceit, ambiguity, sensor errors, etc. In the context of decision-making, we link uncertainty to the outcome by defining uncertain data as "entries leading to a wrong output", whether the data are incomplete (or imprecise), vague (or fuzzy, ambiguous), or incorrect (or invalid). These notions appear under other definitions in the literature. For example, Dubois et al. [14] state that incompleteness, uncertainty, graduality, and granularity are different. They consider uncertainty as a measure of ignorance of the truth of a primitive item of information (proposition, statement, a subset of possible values, etc.), quantified by a numerical or symbolic token located at the meta-level. In their context, the authors discuss assigning to each primitive item (fine grain) of information $A$ a number $g(A) \in [0, 1]$ which evaluates the confidence of an agent in the truth of a proposition asserting $v \in A$. This function is called a confidence function, sometimes a capacity [7] or a fuzzy measure [23]. This measure preserves monotonicity with respect to set inclusion: if $A \subseteq B$, then $g(A) \leq g(B)$.
The confidence function can be of multiple types depending on the data being measured. It can be a possibility, a necessity or even a probability measure. The former two measures are dual and the latter satisfies the additivity property. Several theories were established in order to formalize uncertainty, such as probability theory, possibility theory, Dempster-Shafer evidence theory, belief functions, etc. Each of these theories has a set of measures and a logic to read or infer new values.
For instance, possibility theory states that given an event $A$, which is a set of interpretations or outcomes, the possibility of the event occurring is derived from the possibility distribution $\pi$: $\Pi(A) = \max_{\omega \in A} \pi(\omega)$.
Under the Open World Assumption, inconsistency is another challenging aspect of uncertain data: the existence of multiple interpretations of the same knowledge base can lead to undecidability. As an alternative to two-valued logic, other types of logic are exploited, among them defeasible logic based on rules and paraconsistent logic. Such alternatives make it possible to draw conclusions from partial and conflicting information.
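The duality between possibility and necessity measures mentioned above can be illustrated with a minimal sketch. It assumes the standard definitions $\Pi(A) = \max_{\omega \in A} \pi(\omega)$ and $N(A) = 1 - \Pi(\bar{A})$ over a finite set of outcomes; the function names and the example distribution are hypothetical and not part of mUnc.

```python
def possibility(event, distribution):
    """Possibility of an event: the highest degree among its outcomes."""
    return max((distribution[outcome] for outcome in event), default=0.0)


def necessity(event, distribution):
    """Necessity of an event: 1 minus the possibility of its complement."""
    complement = set(distribution) - set(event)
    return 1.0 - possibility(complement, distribution)


# Hypothetical possibility distribution over height classes of a player.
pi = {"short": 0.1, "medium": 0.6, "tall": 1.0}
event = {"medium", "tall"}

print("possibility:", possibility(event, pi))
print("necessity:", necessity(event, pi))
```

An event is fully possible as soon as one of its outcomes is, whereas it is necessary only to the extent that every outcome outside it is impossible.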
Uncertainty is also affected by the context of the reading [12]. A distributive reading of uncertainty means that the statements in a set each provide a portion of the uncertainty value of the whole set. A collective reading, on the other hand, assigns an uncertainty measure to the whole set. The two readings are not exhaustive; an alternative would be inheriting the uncertainty measure of the set.
Klir et al. [17] observe that dealing with uncertainty consists of four distinct levels: representation, calculus, measurement, and development of methodological aspects of the theory. Our focus in this paper is on the first two levels.

INTEGRATING UNCERTAINTY IN THE SEMANTIC WEB
The Semantic Web [3] is meant for machines to understand and infer knowledge as humans do while exploring the Web. Granularizing and linking pieces of data allow machines to provide more relevant content and help in the query resolution process. Schema and ontology languages offer a backbone to the existing information by providing the vocabulary of a defined topic, the relationships ruling over the different terms, and the semantics that allow links to be logically established between them. Several standards (RDF, RDFS, OWL, SPARQL, etc.) of the Semantic Web technology stack enable the representation and querying of RDF data.
To integrate uncertainty in the Semantic Web, this dimension must comply with the standards and, like any other data on the web, must be reusable and publishable. In this section, we offer an ontology covering uncertainty representation, and we show the possibility of replicating the semantics of uncertainty using calculi represented in the LDScript function scripting language.

mUnc: a vocabulary for uncertainty theories
To enable uncertainty representation on the Semantic Web, we need to opt for an interchangeable format to write uncertainty theories. We propose mUnc, an RDFS ontology for uncertainty theories. mUnc (for meta-Uncertainty) enables publishing uncertainty information based on uncertainty theories. Figure 1 gives an overview of the core concepts and properties of mUnc. We have adopted the definition of sentence and world proposed in the URW3-XG ontology. A sentence is an expression evaluating a truth value, while the world represents the context in which a sentence is stated. Still, unlike the previous definition, both sentences and worlds can be annotated with meta information. For instance, the sentence ex:S1 representing the triple ⟨ex:StefanoTacconi, dbo:height, 188⟩ referring to the height of the football player is stated in the context of the French language chapter of DBpedia [4], assuming that the latter is consistent [5]. Uncertainty is considered a specialization of the general concept of meta. This simplifies the task if any future extensions for other types of metadata such as provenance or trust are proposed. We do not include the concept of Agent, as it can be included using other vocabularies like the W3C PROV Ontology.
An uncertainty theory (Uncertainty Approach) is linked to a set of features and operators. The features are the metrics on which uncertainty theory is based to indicate the degree of truth, credibility, or likelihood of a sentence. Each feature links a value to the uncertainty information. The operators represent the logic to apply to the previous values, while the operations are the implemented calculus for such logic. Other concepts in the URW3-XG ontology like the type or the derivation of uncertainty can be represented as features of an uncertainty approach.
To illustrate the previous definitions, we can annotate the previous sentence using probability theory. It can be represented using only one feature: the probability value. We choose three logical operators to include with the definition: and, or, not. Listing 1 (Representing uncertainty with mUnc) shows how to assert that the sentence ex:S1 is true with a probability of 0.7. For the sake of illustration, we use reification to attach an IRI to the sentence, although no preference about metadata representation methods is stated [16].

Uncertainty calculi
Semantic Web ontology languages focus on classificatory ontological knowledge and do not support the provision of procedural attachments or functions inside ontologies. Our model allows linking the features of uncertainty approaches to their proper calculi (arithmetic, logical, comparison, etc.). To represent the calculi, we rely on the LDScript function definition language [9], a programming language whose objects are RDF entities. It is built on top of SPARQL and relies on the SPARQL filter expression language.
LDScript as a language permits variable declaration, assignment, function call, return, etc. Using LDScript, we can define functions named with an IRI and one or several arguments that are variables in the SPARQL syntax. This enables defining uncertainty operations and linking them to uncertainty features.
To continue with the previous example, consider that the sentence ex:S1 is true with a probability of 0.7 and is stated in a context ex:C1 where all contained facts are considered to be true with a probability of 0.9. The probability of the conjunction of two events $A$ and $B$, assumed independent, is given by Equation (1):
$P(A \wedge B) = P(A) \times P(B)$ (1)
Such a value can be calculated for the user using the function referenced by ex:multProbability and defined in LDScript as shown in the following example:
function ex:multProbability(?pA, ?pB) { ?pA * ?pB }
Therefore, binding the function ex:multProbability(0.7, 0.9) during a SPARQL query execution will return 0.63. The former definition of the probabilistic approach using mUnc can be enriched by linking the IRI of the function to the declared feature, simply by adding the triple: ex:ProbabilityValue ex:and ex:multProbability.
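To make the role of operations concrete, the following Python sketch mirrors what a set of calculi for the three chosen operators (and, or, not) might look like for probability theory. It is only an illustration under the independence assumption of the text: the function names and the operator table are hypothetical and are not the LDScript definitions published with mUnc.

```python
def and_probability(p_a, p_b):
    """Conjunction of two independent events: P(A and B) = P(A) * P(B)."""
    return p_a * p_b


def or_probability(p_a, p_b):
    """Disjunction of two independent events: P(A) + P(B) - P(A) * P(B)."""
    return p_a + p_b - p_a * p_b


def not_probability(p_a):
    """Negation of an event: 1 - P(A)."""
    return 1.0 - p_a


# Hypothetical operator table, playing the role of triples that link a
# feature to the function implementing each logical operator.
PROBABILITY_OPERATIONS = {
    "and": and_probability,
    "or": or_probability,
    "not": not_probability,
}

# Sentence ex:S1 (0.7) stated in context ex:C1 (0.9).
print(round(PROBABILITY_OPERATIONS["and"](0.7, 0.9), 2))
```

Keeping the operator-to-function table separate from the functions themselves reflects the idea that a feature only references its calculi, which can then be published and replaced independently.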
As stated before, each function is considered as a resource, due to the IRI defining its name. We can store such functions in SPARQL files all over the Web, and access their code using their reference.

Contextualizing, Mapping and Querying Uncertain Linked Data
As mentioned before, uncertainty depends on a context. A single world may accept different interpretations using different uncertainty measurements. Dubois [13] explains that beliefs are the different views upon a single world. We rely on the fact that data is issued by querying $n$ uncertain data sources $s_1, s_2, \dots, s_n$, each possibly containing several contexts $C_{ij}$, each representing consistent information. This means that each context contains a set of triples that do not lead to contradictory reasoning. mUnc does not provide an extension of RDF Semantics. Instead, we rely on the SPARQL query language to provide a mapping between sentences and the uncertainty information presented to the user. Moreover, we consider mUnc as an approach to providing definitions of known and custom uncertainty theories, for which we do not provide any specific semantics. The possibility of defining a calculus alongside the ontology is a way to generalize and reuse the rules shared between uncertainty theories, such as maximizing or minimizing a feature.
We note $U_{S C_{ij}}$ the uncertainty information about the sentence $S$ cited in the context $C_{ij}$ and $U_{C_{ij}}$ the uncertainty information about the context $C_{ij}$. Each sentence $S$ stated in a context $C_{ij}$ of a source $s_i$ will be mapped to a combined set of pairs (Uncertainty Feature, Uncertainty Value) issued from both sentence and context metadata (noted $\hat{U}_{S C_{ij}}$). This requires defining a metadata-mapping mode (see Table 1).
The modes depend on the purpose of the application, the data itself, and the semantics of uncertainty theories. In the first mode, only uncertainty information linked to the context $C_{ij}$ is considered. The second mode considers only pairs from the lowest level of granularity, while the third mode enables inheriting context metadata but overrides the values for existing features in uncertainty information linked to the sentence.
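The three modes can be sketched as operations on dictionaries of (feature, value) pairs. This is a hedged illustration, not the Corese implementation: the mode names are hypothetical, the "lowest level of granularity" is read here as the sentence level, and the third mode is read as "sentence values take precedence over inherited context values".

```python
def map_metadata(sentence_meta, context_meta, mode):
    """Combine sentence-level and context-level (feature, value) pairs.

    mode is one of three hypothetical names:
      "context_only"  - keep only the pairs attached to the context
      "sentence_only" - keep only the pairs attached to the sentence
      "override"      - inherit context pairs, then let sentence pairs
                        replace values of features present in both
    """
    if mode == "context_only":
        return dict(context_meta)
    if mode == "sentence_only":
        return dict(sentence_meta)
    if mode == "override":
        return {**context_meta, **sentence_meta}
    raise ValueError(f"unknown metadata-mapping mode: {mode}")


# Hypothetical annotations for sentence ex:S1 and context ex:C1.
sentence = {"ex:probabilityValue": 0.7}
context = {"ex:probabilityValue": 0.9, "ex:trustValue": 0.8}

for mode in ("context_only", "sentence_only", "override"):
    print(mode, map_metadata(sentence, context, mode))
```

In the override mode the sentence keeps its own probability value while still inheriting the features that only the context provides.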
In this paper, we use a specific meta-mapping mode which relies on uncertainty calculus to evaluate a new set of pairs based on both sentence and context uncertainty information. The munc:metaList function is declared in the example as "@public". This keyword is implemented in Corese, like many others (@define, @visitor, @trace, ...), to define specific routines in the Semantic Web engine. It allows the code to be accessed globally in the engine through its reference, without the need to rewrite the function with each query. Listing 2 translates the metaList algorithm into LDScript. The result of binding this function in a SPARQL query is a string that groups all uncertainty features and their corresponding values from the Universal Uncertainty Information set of the corresponding sentence.
Corese also implements Linked Functions, which enable storing LDScript [9] functions in external SPARQL query files on the Web. Such functions, referenced by IRIs, may be called at query execution time. This feature permits publishing and executing the calculi of uncertainty approaches. Additionally, it may be extended to leverage existing software libraries from other programming languages like C++ or Java.
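The calculus-based mapping mode can be approximated by the sketch below. It is not the metaList algorithm itself, whose LDScript listing is not reproduced here; it only assumes that a feature present at both levels is combined with the operation attached to it, that a feature present at one level is kept as is, and that the result is serialized as a string. All names are hypothetical.

```python
def combine_metadata(sentence_meta, context_meta, calculi):
    """Merge sentence and context pairs, combining shared features.

    calculi maps a feature to a two-argument function; it must provide an
    entry for every feature present in both inputs.
    """
    combined = {}
    for feature in sorted(set(sentence_meta) | set(context_meta)):
        if feature in sentence_meta and feature in context_meta:
            operation = calculi[feature]
            combined[feature] = operation(sentence_meta[feature],
                                          context_meta[feature])
        elif feature in sentence_meta:
            combined[feature] = sentence_meta[feature]
        else:
            combined[feature] = context_meta[feature]
    return combined


def meta_list(combined):
    """Serialize the combined pairs as a single 'feature=value' string."""
    return "; ".join(f"{feature}={value:.2f}"
                     for feature, value in combined.items())


# Hypothetical calculus: conjunction of independent probabilities.
calculi = {"ex:probabilityValue": lambda p_a, p_b: p_a * p_b}
sentence = {"ex:probabilityValue": 0.7}
context = {"ex:probabilityValue": 0.9}

print(meta_list(combine_metadata(sentence, context, calculi)))
```

With the running example, the probability of ex:S1 (0.7) and that of its context ex:C1 (0.9) are combined into a single pair for the sentence.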
The Semantic Web engine also allows defining specific routines preceding the query execution. One can integrate query transformation or precalculations of some variables. We implemented the previous meta-mapping mode in extension to the visitor "@metadata" and enabled rewriting SPARQL queries to simplify querying for uncertainty information. Using "@metadata" and with munc:metaList publicly defined, querying for the height of the football player Stefano Tacconi in a data source is as follows.

NEGOTIATING UNCERTAINTY ON THE SEMANTIC WEB
In addition to the previous two-step process leading to the generation of Universal Uncertainty Information Sets alongside sentences, users may actually have a preference for one theory or another. In this section, we discuss the translatability between uncertainty theories and how, using HTTP content negotiation (conneg), users may negotiate the theory they want for their results.

Translating uncertainty between theories
Many examples contradict the claim that uncertainty can be represented using probability theory alone. However, the view that uncertainty is a lack of information, or a deficiency due to a shortage of knowledge, urges researchers to believe that it may be unified or, at least, that the different views may be linked. Dubois et al. [15] stated that transformation is useful in any problem where heterogeneous uncertain and imprecise data must be dealt with (e.g. subjective, linguistic-like evaluations and statistical data). Zadeh [25] cites the example of Dempster-Shafer theory, which is a theory of random sets; the latter are probability distributions over possibility distributions. An interesting analysis of the possibility-probability transformation and its links to graphical models can be found in [1].
With the use of our framework, each and every context will issue an answer to the user. If the answers are annotated with the same theory and the same set of features, this enables ranking the results or offers more options to control the results. In the example of search engines, this could support a uniform criterion to order the results shown to the user. However, on an open Web where several open sources are queried, the results might use different features from different theories.
A translation must offer to transform a Universal Uncertainty Information Set $\hat{U}_{S C_{ij}}$ of a sentence $S$ annotated following an uncertainty approach $T_1$ into another set annotated with a different uncertainty approach $T_2$. The translatability of theories should take into account several issues such as symmetry, reversibility, and the possible loss of information.
To fit in with the previous requirements, we define a translatability relationship between two uncertainty theories as follows:
Definition 4.1. A theory $T_1$ has a translatability relationship with a theory $T_2$ if there exists a mapping $M : F_{T_1} \to F_{T_2}$ from the set of features $F_{T_1}$ represented in theory $T_1$ to the set of features $F_{T_2}$ represented in theory $T_2$ such that every possible feature of $F_{T_1}$ is mapped to a set of features of $F_{T_2}$ semantically coherent with the uncertainty initially expressed in $T_1$. We note $T_1 \mathrel{>|} T_2$.
The former definition is valid for all theories that have a relationship allowing the conversion of features from one theory to another, regardless of the loss of information. In case the conversion generates no loss of information, which is a prerequisite for reversing the operation, we define the relationship as follows:
Definition 4.2. A theory $T_1$ has an ideal translatability relationship with a theory $T_2$ if $T_1$ is translatable to $T_2$ ($T_1 \mathrel{>|} T_2$) and there is no loss of information in the translation. We note $T_1 \gg T_2$.
We should mention that an ideal translatability is not necessarily reversible, regardless of the semantics of the translatability: a loss of information in the backward direction disables the backward operation. If the other case is considered, where there is no loss of information in either direction, then we can define a full translation as follows:
Definition 4.3. A theory $T_1$ has a full translatability relationship with a theory $T_2$ iff $T_1$ is ideally translatable to $T_2$ ($T_1 \gg T_2$) and, inversely, $T_2$ is ideally translatable to $T_1$ ($T_2 \gg T_1$). We note $T_1 \otimes T_2$.
Using our mUnc vocabulary and the framework previously proposed, we are able to formalize the translation (if it exists) between the different theories. For this, we extended mUnc with the following set of properties:
• munc:hasTranslation (Definition 4.1)
We can note that full translatability, being an equivalence relation, allows forming equivalence classes by transitive closure, in which we have translatability with no loss of information from a theory $T_i$ to any other theory $T_j$ of its class. We note this set $TC_U(T_i)$.
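The construction of these equivalence classes can be sketched as follows, assuming that ideal translations are given as directed pairs and that full translatability holds exactly when an ideal translation exists in both directions (Definition 4.3). The theory names and the function are hypothetical placeholders, not statements about actual theories.

```python
def full_translatability_classes(theories, ideal_translations):
    """Group theories into classes closed under full translatability.

    ideal_translations is a collection of (source, target) pairs whose
    members all belong to theories. Two theories are linked when an ideal
    translation exists in both directions; classes are the connected
    components of that symmetric relation (its transitive closure).
    """
    ideal = set(ideal_translations)
    neighbours = {theory: set() for theory in theories}
    for source, target in ideal:
        if (target, source) in ideal:
            neighbours[source].add(target)
            neighbours[target].add(source)

    classes, seen = [], set()
    for theory in theories:
        if theory in seen:
            continue
        component, stack = set(), [theory]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        classes.append(component)
    return classes


# Placeholder theories: T1, T2 and T3 are mutually reachable through
# two-way ideal translations; T3 -> T4 is ideal in one direction only.
theories = ["T1", "T2", "T3", "T4"]
ideal = [("T1", "T2"), ("T2", "T1"), ("T2", "T3"), ("T3", "T2"),
         ("T3", "T4")]
print(full_translatability_classes(theories, ideal))
```

The one-way ideal translation from T3 to T4 does not merge their classes, which matches the remark that ideal translatability alone is not reversible.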
To illustrate the previous extension, we propose to represent the example proposed in [21] about the Optimal Transformation (OT) from probability to possibility. We declare a translatability relationship between probability theory ex:Probability and possibility theory ex:Possibility representing the two different uncertainty theories. We enrich the data source with the triples below, where ex:translateProbaToPoss is an LDScript function.
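As an illustration of what such a translation function might compute, the sketch below implements the classical probability-to-possibility transformation attributed to Dubois and Prade, in which the possibility degree of an outcome is the sum of the probabilities of all outcomes that are at most as probable. That the Optimal Transformation of [21] coincides with this formula is an assumption made here, and the Python function is a hypothetical stand-in for ex:translateProbaToPoss, whose LDScript body is not shown in this text.

```python
def probability_to_possibility(probabilities):
    """Map a finite probability distribution to a possibility distribution.

    For each outcome with probability p, the possibility degree is the sum
    of the probabilities q of all outcomes such that q <= p. The most
    probable outcome therefore receives a possibility of 1.
    """
    return {
        outcome: sum(q for q in probabilities.values() if q <= p)
        for outcome, p in probabilities.items()
    }


# Hypothetical distribution over three outcomes.
probabilities = {"a": 0.5, "b": 0.3, "c": 0.2}
for outcome, degree in probability_to_possibility(probabilities).items():
    print(outcome, round(degree, 2))
```

The transformation keeps the ordering of outcomes but discards the additive structure, which is one way a translation can lose information and thus fail to be ideal.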

Negotiation of Uncertainty Headers
Based on the previous model, we can now support the negotiation of answers annotated with different uncertainty theories. Content negotiation can be based on HTTP headers or on non-HTTP methods such as query arguments in IRIs. Following the W3C working draft proposed by Svensson et al. [24], we propose that clients negotiate a representation annotated with a specific uncertainty theory, using q-values to express their preferences regarding the uncertainty theories they are to receive. Since uncertainty theories are already defined using mUnc and named with IRIs, both server and client can exchange and verify the conformity of their options. We propose to handle three use cases:
(1) Uncertainty information exists in the queried source in one or many of the requested uncertainty theories. We answer with the first theory selected by the user. In the example, uncertainty information is issued from a context annotated with evidence.
GET /some/resource HTTP/1.
(2) The data source does not offer information about all requested theories, but a translation from existing uncertainty information to one or more requested theories is available. In this example, uncertainty is available in probability theory. The returned information is evaluated using the function ex:translateProbaToPoss and presented to the client with an indication of the type of translation the data underwent.
GET /some/resource HTTP/1.
We note that the selection of a suitable translation starts from the transitive closure of full translations $TC_U(\text{possibility})$, which offers more information, and then proceeds to the normal translatability relationship.
(3) The data source has no information about the requested theory and no available translations. In that case, we answer the user with the existing information: the default uncertainty information proposed by the server is returned.
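The three use cases can be summarized by a small server-side selection routine. The sketch below is illustrative only: the preference string format (theory IRIs with optional q-values, as in HTTP Accept-style headers), the prefixed names, and both function names are assumptions, and the exact header used for negotiation is not specified here. It also simplifies the selection of translations by not distinguishing full from normal translatability.

```python
def parse_preferences(header_value):
    """Parse 'theory;q=0.9, other;q=0.5' into (theory, q) pairs.

    Pairs are sorted by decreasing q; a missing q defaults to 1.0.
    """
    preferences = []
    for item in header_value.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        q_value = 1.0
        for parameter in parts[1:]:
            if parameter.startswith("q="):
                q_value = float(parameter[2:])
        preferences.append((parts[0], q_value))
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)


def negotiate(preferences, available, translations, default):
    """Select the theory to answer with and how it was obtained."""
    # Use case 1: a requested theory is natively available.
    for theory, _ in preferences:
        if theory in available:
            return theory, "native"
    # Use case 2: a requested theory can be reached by translation.
    for theory, _ in preferences:
        for source in available:
            if (source, theory) in translations:
                return theory, f"translated from {source}"
    # Use case 3: fall back to the server's default information.
    return default, "default"


preferences = parse_preferences("ex:Possibility;q=0.9, ex:Evidence;q=0.5")
available = ["ex:Probability"]
translations = {("ex:Probability", "ex:Possibility")}
print(negotiate(preferences, available, translations, "ex:Probability"))
```

A native annotation in any requested theory is preferred here over a translated one, even when the translated theory has a higher q-value; a server could equally weigh q-values against the kind of translation, which is a design choice this sketch leaves open.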

RELATED WORKS
mUnc can extend the work done by Cabrio et al. [6] by enriching the proposed fuzzy labeling algorithms with definitions of other uncertainty theories that can be more suitable to the data. Linked Data sources can adopt this approach to enrich federated queries with uncertainty information and, progressively, build a consensus-based Linked Data source. A set of other applications, such as fake news detection (definition of a theory and logic for fake news), argumentation-based systems, and even community-based data sources such as DBpedia, can use mUnc to enrich their future content with uncertainty information. Furthermore, mUnc shares the objectives of the W3C Credible Web community group [20] with regard to exchanging data which bears directly on credibility assessment while keeping standardization in data interchange. The focus on uncertainty translatability has mainly been in AI-based applications. We point out that Semantic Web requirements in terms of interoperability and information usage are very important and

CONCLUSION AND PERSPECTIVES
In this paper, we discussed the representation and publication of uncertainty on the Semantic Web. We presented a vocabulary allowing the representation of uncertainty theories and the annotation of sentences using the Semantic Web standards. We explained the publishing of reusable uncertainty calculus using LDScript. We also offered the possibility to translate between uncertainty theories and to negotiate uncertainty information following a specific theory.
Uncertainty representation is the first step of a long process, including the preliminary calculus of uncertainty values and the propagation of uncertainty among interconnected Linked Data sources. The translation process is also a first step enabling, to some extent, merging uncertain data annotated using different uncertainty approaches. In our future work, we would like to implement context overlapping, allowing selectivity between contexts inside a source and the optimization of storage. We would also like to study the relationship between data, applications, and the uncertainty theories that are used.