A distributed SQL database is a single relational database that replicates data across multiple servers. Distributed SQL databases are strongly consistent, and most support consistency across racks, data centers, and wide area networks, including cloud availability zones and cloud geographic zones. They typically use the Paxos or Raft algorithm to achieve consensus across multiple nodes.

Distributed SQL databases are sometimes referred to as NewSQL, but NewSQL is a broader term that also includes databases that are not distributed.

History

Google's Spanner popularized the modern distributed SQL database concept. Google described the database and its architecture in a 2012 paper titled "Spanner: Google's Globally-Distributed Database." The paper described Spanner as having evolved from a Bigtable-like key-value store into a temporal multi-version database where data is stored in "schematized semi-relational tables."[1]

Spanner combines atomic clocks with the Paxos algorithm to reach consensus on state distributed across servers. In 2010, an earlier implementation, ClustrixDB (now MariaDB Xpand), moved from a hardware appliance to a Paxos-based software database.[2] It was later acquired by MariaDB[3] and added to a SaaS cloud offering called SkySQL.[4] In 2015, former Google engineers founded Cockroach Labs to build CockroachDB, which achieves similar results using the Raft algorithm without atomic clocks or custom hardware.[5]

Spanner is primarily used for transactional and time-series use cases. Google extended this work in a follow-up paper on F1, which it describes as a hybrid transactional/analytical processing database built on Spanner.[1]

Architecture

Distributed SQL databases have the following general characteristics:

  • synchronous replication
  • strong transactional consistency across at least availability zones (i.e. ACID compliance)[6]
  • relational database front end, meaning data is represented as tables with rows and columns as in any other RDBMS
  • automatically sharded data storage
  • underlying key–value storage[7][1]
  • native SQL implementation
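
Two of the characteristics above, underlying key-value storage and automatic sharding, can be illustrated with a minimal sketch. It assumes a hypothetical key layout of `/table/primary_key/column` and hypothetical range split points; real systems use their own binary encodings and split ranges dynamically, so this is an illustration rather than any vendor's actual scheme.

```python
from bisect import bisect_right

def encode_row(table, primary_key, row):
    """Flatten one relational row into key-value pairs, one per column.

    The /table/primary_key/column layout is a hypothetical encoding.
    """
    return {f"/{table}/{primary_key}/{column}": value
            for column, value in row.items()}

class RangeShardMap:
    """Maps a key to a shard index using sorted split points."""

    def __init__(self, split_points):
        self.split_points = sorted(split_points)

    def shard_for(self, key):
        # Shard i holds keys in [split_points[i - 1], split_points[i]).
        return bisect_right(self.split_points, key)

# One row of a "users" table becomes two key-value pairs.
pairs = encode_row("users", "0042", {"name": "Ada", "city": "London"})

# Two split points divide the key space into three shards (0, 1, 2).
shards = RangeShardMap(["/users/0500", "/users/1000"])
placement = {key: shards.shard_for(key) for key in pairs}
```

Because every column of a row shares the same key prefix, all of a row's pairs sort next to each other and land on the same shard in this sketch.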

In terms of the CAP theorem, distributed SQL databases are "CP": consistent and partition-tolerant. They sacrifice availability in that the failure of a primary node can make the database unavailable for writes.
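
The trade-off can be sketched with the majority-quorum rule that consensus algorithms such as Paxos and Raft rely on. The function names below are hypothetical, and the sketch models only the counting rule, not leader election or log replication.

```python
def quorum_size(replica_count):
    """Smallest number of replicas that forms a strict majority."""
    return replica_count // 2 + 1

def can_accept_writes(replica_count, reachable_count):
    """A side of a partition can commit only if it still reaches a majority."""
    return reachable_count >= quorum_size(replica_count)

# Five replicas split by a network partition into groups of 3 and 2.
majority_side = can_accept_writes(5, 3)
minority_side = can_accept_writes(5, 2)
print(majority_side, minority_side)  # prints "True False"
```

The minority side refuses writes instead of accepting changes that could conflict with the majority, which is how consistency is preserved at the cost of availability.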

All distributed SQL implementations require some form of clock synchronization to guarantee consistency. Unlike Spanner, most do not rely on custom hardware to provide atomic clocks. With that hardware, Spanner is able to synchronize writes with temporal guarantees. Implementations without custom hardware require servers to compare clock offsets and potentially retry reads.[8]
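
The idea of retrying reads can be sketched as follows. This is a simplified model assuming a configured maximum clock offset and integer millisecond timestamps; the function name and data layout are hypothetical and do not reproduce any specific product's implementation. A read that finds a version stamped slightly after its own timestamp, but within the maximum offset, cannot tell whether that write really happened later, so it retries at the higher timestamp.

```python
def read_with_uncertainty(versions, read_ts, max_offset_ms):
    """Return (value, final_read_ts, retried) for a list of (commit_ts, value).

    Versions with read_ts < commit_ts <= read_ts + max_offset_ms are
    "uncertain": clock skew alone could explain their later timestamp.
    """
    upper_bound = read_ts + max_offset_ms
    uncertain = [ts for ts, _ in versions if read_ts < ts <= upper_bound]
    retried = bool(uncertain)
    if retried:
        # Retry the read at the newest uncertain timestamp so it is included.
        read_ts = max(uncertain)
    visible = [(ts, value) for ts, value in versions if ts <= read_ts]
    value = max(visible, key=lambda pair: pair[0])[1] if visible else None
    return value, read_ts, retried

history = [(100, "a"), (130, "b"), (900, "c")]
retried_read = read_with_uncertainty(history, read_ts=120, max_offset_ms=50)
clean_read = read_with_uncertainty(history, read_ts=200, max_offset_ms=50)
```

In this sketch the first read is bumped from timestamp 120 to 130 and returns "b", while the second read sees no version inside its uncertainty window and proceeds without a retry.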

Distributed SQL implementations

Vendor | API | License model
Amazon Aurora | PostgreSQL & MySQL | Proprietary
CockroachDB | PostgreSQL-like | Proprietary
Google Spanner | Proprietary SQL-like | Proprietary
MySQL Cluster | MySQL | Open Source (GPLv2)
NuoDB | Proprietary SQL | Proprietary
YugabyteDB | PostgreSQL & Cassandra CQL-like | Open Source (Apache 2.0)
TiDB | MySQL-like | Open Source (Apache 2.0)
MariaDB Xpand | MariaDB | Proprietary
Teradata | Proprietary SQL-like | Proprietary
YDB[9] | Proprietary SQL-like, PostgreSQL-like | Open Source (Apache 2.0)

Compared to NewSQL

CockroachDB, YugabyteDB and others have at times referred to themselves as NewSQL databases. Some NewSQL databases have fundamentally different architectures, but were cited as examples of NewSQL by Matthew Aslett, who coined the term.[10] In essence, distributed SQL databases are built from the ground up as distributed systems, whereas NewSQL also includes replication and sharding technologies added to existing client-server relational databases like PostgreSQL.[11] Some experts define distributed SQL databases as a more specific subset of NewSQL databases.[12]

References

  1. Shute, Jeff; Whipkey, Chad; Vingralek, Radek; et al. (2013). F1: A Distributed SQL Database That Scales (PDF). The 39th International Conference on Very Large Data Bases, August 26th-30th 2013, Riva del Garda, Trento, Italy. Vol. 6. VLDB Endowment. Archived (PDF) from the original on 2026-03-10. Retrieved 2026-05-10.
  2. Higginbotham, Stacey (May 3, 2010). "Clustrix Builds the Webscale Holy Grail: A Database That Scales". gigaom.com. [dead link]
  3. "MariaDB acquires Clustrix". 20 September 2018.
  4. Baer (dbInsight), Tony. "For MariaDB, it's time to put the pieces together". ZDNet.
  5. Morgan, Timothy Prickett (February 22, 2017). "Google Spanner Inspires CockroachDB To Outrun It". The Next Platform.
  6. "The future of databases: distributed SQL & MariaDB". Retrieved 2022-12-21.
  7. "The Architecture of a Distributed SQL Database". 23 September 2020 – via www.youtube.com.
  8. "Living Without Atomic Clocks". Cockroach Labs. April 21, 2020.
  9. "YDB is an open-source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions". ydb.tech.
  10. "What we talk about when we talk about NewSQL — Too much information". Archived from the original on 2020-06-14. Retrieved 2021-01-26.
  11. "SQL vs. NoSQL Databases: What's the Difference?". www.ibm.com. 12 June 2022.
  12. Prabagaren, Gokul (October 30, 2019). "NewSQL — The Next Evolution in Databases". Medium.
