A. Ailijiang, A. Charapko, M. Demirbas, and T. Kosar, Multileader WAN Paxos: Ruling the Archipelago with Fast Consensus. arXiv CoRR, abs/1703.08905, 2017.

B. Arun, S. Peluso, R. Palmieri, G. Losa, and B. Ravindran, Speeding up Consensus by Chasing Fast Decisions, International Conference on Dependable Systems and Networks (DSN), 2017.

C. E. B. Bezerra, F. Pedone, and R. van Renesse, Scalable State-Machine Replication, International Conference on Dependable Systems and Networks (DSN), 2014.

B. Burns, B. Grant, D. Oppenheimer, E. A. Brewer, and J. Wilkes, Borg, Omega, and Kubernetes, ACM Queue, 2016.

M. Burrows, The Chubby Lock Service for Loosely-Coupled Distributed Systems, Symposium on Operating Systems Design and Implementation (OSDI), 2006.

B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, Benchmarking Cloud Serving Systems with YCSB, Symposium on Cloud Computing (SoCC), 2010.

J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost et al., Spanner: Google's Globally-Distributed Database, Symposium on Operating Systems Design and Implementation (OSDI), 2012.

H. T. Dang, D. Sciascia, M. Canini, F. Pedone, and R. Soulé, NetPaxos: Consensus at Network Speed, Symposium on Software Defined Networking Research (SOSR), 2015.

S. Gilbert and N. A. Lynch, Brewer's Conjecture and the Feasibility of Consistent Available Partition-Tolerant Web Services, 2002.

M. Herlihy and J. M. Wing, Linearizability: A Correctness Condition for Concurrent Objects, ACM Trans. Program. Lang. Syst, 1990.

H. Howard, D. Malkhi, and A. Spiegelman, Flexible Paxos: Quorum Intersection Revisited, International Conference on Principles of Distributed Systems (OPODIS), 2016.

P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, ZooKeeper: Wait-free Coordination for Internet-scale Systems, USENIX Annual Technical Conference (USENIX ATC), 2010.

F. P. Junqueira, B. C. Reed, and M. Serafini, Zab: High-performance broadcast for primary-backup systems, International Conference on Dependable Systems and Networks (DSN), 2011.

L. Lamport, The Part-Time Parliament, ACM Trans. Comput. Syst, 1998.

L. Lamport, Generalized Consensus and Paxos, 2005.

L. Lamport, Fast Paxos, Distributed Computing, 2006.

L. Lamport, Lower Bounds for Asynchronous Consensus, Distributed Computing, 2006.

J. Li, E. Michael, N. K. Sharma, A. Szekeres, and D. R. K. Ports, Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering, Symposium on Operating Systems Design and Implementation (OSDI), 2016.

S. Liu, P. Viotti, C. Cachin, V. Quéma, and M. Vukolic, XFT: Practical Fault Tolerance beyond Crashes, Symposium on Operating Systems Design and Implementation (OSDI), 2016.

Y. Mao, F. P. Junqueira, and K. Marzullo, Mencius: Building Efficient Replicated State Machines for WANs, Symposium on Operating Systems Design and Implementation (OSDI), 2008.

H. Moniz, J. Leitão, R. J. Dias, J. Gehrke, N. M. Preguiça et al., Blotter: Low Latency Transactions for Geo-Replicated Storage, International Conference on World Wide Web (WWW), 2017.

I. Moraru, Egalitarian Distributed Consensus, 2014.

I. Moraru, D. G. Andersen, and M. Kaminsky, There Is More Consensus in Egalitarian Parliaments, Symposium on Operating Systems Principles (SOSP), 2013.

F. Nawab, D. Agrawal, and A. E. Abbadi, DPaxos: Managing Data Closer to Users for Low-Latency and Mobile Applications, International Conference on Management of Data (SIGMOD), 2018.

B. M. Oki and B. Liskov, Viewstamped Replication: A General Primary Copy, Symposium on Principles of Distributed Computing (PODC), 1988.

D. Ongaro and J. K. Ousterhout, In Search of an Understandable Consensus Algorithm, 2014.

F. Pedone and A. Schiper, Generic Broadcast, International Symposium on Distributed Computing (DISC), 1999.

S. Peluso, A. Turcu, R. Palmieri, G. Losa, and B. Ravindran, Making Fast Consensus Generally Faster, International Conference on Dependable Systems and Networks (DSN), 2016.

F. B. Schneider, Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial, ACM Comput. Surv, 1990.

A. Shraer, B. Reed, D. Malkhi, and F. Junqueira, Dynamic Reconfiguration of Primary/Backup Clusters, USENIX Annual Technical Conference (USENIX ATC), 2012.

P. Sutra, On the correctness of Egalitarian Paxos, Inf. Process. Lett, 2020.

A. Turcu, S. Peluso, R. Palmieri, and B. Ravindran, Be General and Don't Give Up Consistency in Geo-Replicated Transactional Systems, International Conference on Principles of Distributed Systems (OPODIS), 2014.

M. Uluyol, A. Huang, A. Goel, M. Chowdhury, and H. V. Madhyastha, Near-Optimal Latency Versus Cost Tradeoffs in Geo-Distributed Storage, Symposium on Networked Systems Design and Implementation (NSDI), 2020.

C. Wang, J. Jiang, X. Chen, N. Yi, and H. Cui, APUS: Fast and Scalable Paxos on RDMA, Symposium on Cloud Computing (SoCC), 2017.

B State-Machine Replication with Atlas

State-machine replication (SMR) implements what the literature calls a universal construction⁴, that is, a general mechanism to obtain a linearizable shared object from a sequential one. In Appendix A.2, we proved that Atlas correctly implements the SMR protocol specification given in §2. This section explains how to build a universal construction from this protocol. To this end, we first introduce some preliminary notions. Then, we explain how to implement any linearizable data type on top of the Atlas protocol.

Preliminaries

We base our reasoning and algorithms upon the notion of trace⁵, that is, a class of equivalent command words. Two words in a class contain the same commands and sort non-commuting ones in the same order. A trace can be seen as a special case of the notion of c

We assume a sequential object specified by the following components: (i) a set of states $\mathcal{S}$; (ii) an initial state $s_0 \in \mathcal{S}$; (iii) a set of commands $\mathcal{C}$ that can be performed on the object; (iv) a set of their response values $\mathcal{V}$; and (v) a transition function $\tau : \mathcal{S} \times \mathcal{C} \to \mathcal{S} \times \mathcal{V}$. In the following, we use special symbols $\bot$ and $\top$ that do not belong to $\mathcal{V}$. When applying a command, we use the $.st$ and $.val$ selectors to extract the state and the response value, respectively, i.e., given a state $s$ and a command $c$, we let $\tau(s, c) = (\tau(s, c).st, \tau(s, c).val)$. Without loss of generality, we consider that commands are applicable to every state. A command $c$ is a read if it does not change the object state: $\forall s.\ \tau(s, c).st = s$; otherwise, $c$ is a write. We denote by Read and Write the sets of read and write commands.

Command words. A command word $x$ is a sequence of commands. The empty word is denoted $1$ and $\mathcal{C}^*$ is the set of all command words.
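To make the components concrete, here is a minimal Python sketch of one possible sequential object. The integer register, the command encoding (`"get"`, `"set"`, `"add"`) and the helper `is_read` are hypothetical choices made for illustration only; the read test enumerates a finite sample of states rather than quantifying over all of them.

```python
from typing import NamedTuple, Optional

class Step(NamedTuple):
    st: int              # state after the command (the .st selector)
    val: Optional[int]   # response value (the .val selector)

def tau(s: int, c: tuple) -> Step:
    """Transition function of a hypothetical integer register."""
    kind = c[0]
    if kind == "get":    # returns the current state and leaves it unchanged
        return Step(s, s)
    if kind == "set":    # overwrites the state with c[1]
        return Step(c[1], None)
    if kind == "add":    # increments the state by c[1]
        return Step(s + c[1], None)
    raise ValueError(f"unknown command {c!r}")

def is_read(c: tuple, states) -> bool:
    """c is a read if it leaves every sampled state unchanged."""
    return all(tau(s, c).st == s for s in states)

STATES = range(4)  # finite sample of states, an assumption of this sketch
```

Under this definition `("get",)` is a read and `("set", 1)` is a write. Note that `("add", 0)` also counts as a read, because the definition looks only at the effect on the state and not at the name of the command.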

We write $c^i \in x$ when $c$ occurs at least $i > 0$ times in $x$. $\mathrm{pos}(c^i, x)$ is the position of the $i$-th occurrence of command $c$ in $x$, with $\mathrm{pos}(c^i, x) = 0$ when $c^i \notin x$. The shorthand $c^i <_x d^j$ stands for $\mathrm{pos}(c^i, x) < \mathrm{pos}(d^j, x)$.
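The occurrence notation can be sketched in a few lines of Python. The function names `count`, `pos` and `before` are hypothetical, positions are taken to be 1-based so that 0 can signal absence, and words are represented as lists of hashable commands.

```python
def count(c, x):
    """Number of occurrences of command c in word x."""
    return sum(1 for d in x if d == c)

def pos(c, i, x):
    """1-based position of the i-th occurrence of c in x, or 0 if there is none."""
    seen = 0
    for p, d in enumerate(x, start=1):
        if d == c:
            seen += 1
            if seen == i:
                return p
    return 0

def before(c, i, d, j, x):
    """The shorthand c^i <_x d^j, read literally as pos(c^i, x) < pos(d^j, x)."""
    return pos(c, i, x) < pos(d, j, x)
```

For the word `list("abac")`, the second `a` sits at position 3 and there is no second `b`, so `pos("b", 2, x)` is 0. Because absence maps to 0, `before` follows the definition literally and reports an absent occurrence as preceding any present one.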

Lemma. For command words $x$ and $y$ and a command $c$, $|xy|_c$ equals $|x|_c + |y|_c$. Moreover, if $c^k \in xy$ then $\mathrm{pos}(c^k, xy)$ equals $\mathrm{pos}(c^k, x)$ if $k \leq |x|_c$, and $|x| + \mathrm{pos}$

We define function $\tau^*$ by the repeated application of $\tau$. In detail, for a state $s$ we define $\tau^*(s, 1) = (s, \mathit{nil})$, for some symbol $\mathit{nil} \in \mathcal{V}$, and if $x$ is non-empty then we have: $\tau^*(s, x) = \tau(s$
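The recursive clause of the definition is cut off in the text above, so the following Python sketch rests on an assumption: it reads "repeated application" as a left-to-right fold that threads the state through each command and keeps the response of the last one. The counter object and the names `tau`, `tau_star` and `NIL` are hypothetical.

```python
NIL = None  # stands in for the symbol nil returned on the empty word

def tau(s, c):
    """Hypothetical counter: 'inc' bumps the state, 'get' returns it."""
    if c == "inc":
        return (s + 1, None)
    if c == "get":
        return (s, s)
    raise ValueError(f"unknown command {c!r}")

def tau_star(s, x):
    """Apply the commands of word x in order.

    Returns (final state, response of the last command), or (s, NIL)
    when x is the empty word.
    """
    val = NIL
    for c in x:
        s, val = tau(s, c)
    return (s, val)
```

With this reading, `tau_star(0, ["inc", "inc", "get"])` threads the state 0, 1, 2 and returns the response of the final `get`.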

Two commands $c$ and $d$ commute, written $c \asymp d$, if in every state $s$ we have:

$\tau^*(s, cd).st = \tau^*(s, dc).st$,
$\tau^*(s, dc).val = \tau^*(s, c).val$,
$\tau^*(s, cd).val = \tau^*(s, d).val$.
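For an object with a small state space, the commutation conditions can be checked by brute force. The sketch below assumes a hypothetical store with two binary registers, commands `("r", k)` and `("w", k, v)`, and a left-to-right `tau_star`; it tests that both orders reach the same state and that each command returns the same response as when it runs alone.

```python
from itertools import product

def tau(s, c):
    """Hypothetical two-register store; s is a tuple of register values."""
    op, k = c[0], c[1]
    if op == "r":                      # read register k
        return (s, s[k])
    if op == "w":                      # write c[2] into register k
        t = list(s)
        t[k] = c[2]
        return (tuple(t), None)
    raise ValueError(f"unknown command {c!r}")

def tau_star(s, x):
    """Left-to-right application of the commands in x."""
    val = None
    for c in x:
        s, val = tau(s, c)
    return (s, val)

STATES = list(product(range(2), repeat=2))  # all states of the toy store

def commute(c, d, states=STATES):
    """Check the commutation conditions in every enumerated state."""
    for s in states:
        st_cd, val_cd = tau_star(s, [c, d])
        st_dc, val_dc = tau_star(s, [d, c])
        if st_cd != st_dc:                   # same final state in both orders
            return False
        if val_dc != tau_star(s, [c])[1]:    # c answers as if it ran alone
            return False
        if val_cd != tau_star(s, [d])[1]:    # d answers as if it ran alone
            return False
    return True
```

In this toy store, writes to different registers commute, two reads commute, and a read and a write of the same register do not, since the write changes what the read returns.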

Two words $x, y \in \mathcal{C}^*$ are equivalent, written $x \sim y$, when there exist words $z_1, \ldots, z_k$ with $k \geq 1$ such that $z_1 = x$, $z_k = y$ and for all $i$, $1 \leq i < k$, there exist words $z'$, $z''$ and commands $c \asymp d$ satisfying $z_i = z' c d z''$ and $z_{i+1} = z' d c z''$. This means that a word can be obtained from another by successive transpositions of neighboring commuting commands.
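The definition by successive transpositions suggests a direct, if exponential, way to enumerate an equivalence class: a breadth-first search over adjacent swaps of commuting commands. The sketch below is illustrative only; the alphabet, the `INDEPENDENT` set and the function names are hypothetical, and the commutation relation is supplied as data rather than derived from an object.

```python
from collections import deque

# Hypothetical commutation relation: only a and c commute.
INDEPENDENT = {frozenset("ac")}

def commute(c, d):
    return frozenset((c, d)) in INDEPENDENT

def trace_of(x):
    """All words reachable from x by swapping adjacent commuting commands."""
    start = tuple(x)
    seen = {start}
    queue = deque([start])
    while queue:
        z = queue.popleft()
        for i in range(len(z) - 1):
            c, d = z[i], z[i + 1]
            if commute(c, d):
                w = z[:i] + (d, c) + z[i + 2:]   # z' c d z''  ->  z' d c z''
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return seen

def equivalent(x, y):
    """x and y are equivalent when y belongs to the class of x."""
    return tuple(y) in trace_of(x)
```

Here the class of `"acb"` contains exactly `"acb"` and `"cab"`: swapping `a` and `c` is allowed, while `b` commutes with neither and stays last.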

Relation $x \sim y$ holds iff $\mathrm{cmd}(x) = \mathrm{cmd}(y)$ and, for any $c \not\asymp d$, $c^i <_x d^j \iff c^i <_y d^j$.
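This characterization gives a polynomial-time equivalence test that avoids enumerating transpositions. The Python sketch below is one possible reading: it assumes that cmd(x) denotes the multiset of commands in x (its definition is not shown in this excerpt), and the names `occurrences`, `equivalent` and the `INDEPENDENT` relation are hypothetical.

```python
from collections import Counter

# Hypothetical commutation relation: only a and c commute.
INDEPENDENT = {frozenset("ac")}

def commute(c, d):
    return frozenset((c, d)) in INDEPENDENT

def occurrences(x):
    """Label each command with its occurrence index: (c, i) for the i-th c."""
    seen = Counter()
    out = []
    for c in x:
        seen[c] += 1
        out.append((c, seen[c]))
    return out

def equivalent(x, y):
    """Same commands, and same relative order for non-commuting occurrences."""
    if Counter(x) != Counter(y):          # assumed meaning of cmd(x) = cmd(y)
        return False
    rank_y = {occ: p for p, occ in enumerate(occurrences(y))}
    occ_x = occurrences(x)
    for a in range(len(occ_x)):
        for b in range(a + 1, len(occ_x)):
            c, d = occ_x[a][0], occ_x[b][0]
            # occ_x[a] precedes occ_x[b] in x; y must agree unless c and d commute
            if not commute(c, d) and rank_y[occ_x[a]] > rank_y[occ_x[b]]:
                return False
    return True
```

With only `a` and `c` commuting, `"acb"` and `"cab"` are equivalent, whereas `"acb"` and `"abc"` are not because they order `b` and `c` differently.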

Lemma 5. If $x \sim y$ then, for every command $c$, $\tau^*(s_0, x|_{\leq c^i}).val = \tau^*(s_0, y|_{\leq c^i}).val$.

M. Herlihy, Wait-free synchronization, ACM Trans. Program. Lang. Syst, 1991.

V. Diekert and G. Rozenberg (eds.), The Book of Traces, 1995.

V. Diekert and Y. Métivier, Partial Commutation and Traces, Handbook of Formal Languages, vol.3, 1997.
URL : https://hal.archives-ouvertes.fr/hal-00307048