Data professionals have a lot of options when it comes to managed cloud-based analytics warehouses. As the technical program manager for Google BigQuery, I may be biased, but when I look out at competitive offerings, it’s manageability that really sets BigQuery apart.
When it comes to cloud analytics services, the term “fully managed” tends to be used quite broadly. However, not all cloud data warehouses are created equal. BigQuery’s unique serverless architecture sets a high standard for what it means to be a “fully managed” technology. In the end, BigQuery users benefit from an always-improving, seamlessly scalable, fast and reliable service.
Let’s take a look at how BigQuery is architected, and how that translates into better manageability for end users.
Under the hood, BigQuery employs a vast set of multi-tenant services driven by low-level Google infrastructure technologies like Dremel, Colossus, Jupiter and Borg.
Folks can start using BigQuery by simply loading data and running SQL commands. There's no need to build, deploy or provision clusters; no need to size VMs, storage, or hardware resources; no need to set up disks, define replication, configure compression and encryption, and so forth.
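To make "loading data and running SQL commands" concrete, here is a minimal Python sketch. It assumes a client object that behaves like the official google-cloud-bigquery client, where `client.query(sql).result()` runs a statement and waits for it to finish. The table name, the Cloud Storage URI, and the helper names `load_data_sql` and `load_and_count` are hypothetical, chosen only for illustration.

```python
def load_data_sql(table: str, uri: str) -> str:
    """Build a BigQuery LOAD DATA statement for a CSV file in Cloud Storage.

    The table and URI are supplied by the caller; nothing here is specific
    to a real project.
    """
    return (
        f"LOAD DATA INTO {table} "
        f"FROM FILES (format = 'CSV', uris = ['{uri}'])"
    )


def load_and_count(client, table: str, uri: str) -> int:
    """Load a CSV file into a table, then count the table's rows.

    `client` is assumed to expose query(sql).result(), as the
    google-cloud-bigquery Client does.
    """
    # Step 1: load the data. No cluster, disk, or VM is provisioned first.
    client.query(load_data_sql(table, uri)).result()

    # Step 2: run ordinary SQL against the table that was just loaded.
    rows = client.query(f"SELECT COUNT(*) AS n FROM {table}").result()
    return next(iter(rows))["n"]
```

With the real library, the client would typically be created with `from google.cloud import bigquery` followed by `client = bigquery.Client()`, and then passed in as `load_and_count(client, "mydataset.events", "gs://my-bucket/events.csv")`. Note that the sketch contains no sizing or provisioning step at all, which is the point of the paragraph above.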
Users are able to seamlessly scale to dozens of petabytes and back to zero because BigQuery engineers have already deployed the resources required to reach this scale. Therefore, scaling is simply a matter of using BigQuery more, rather than provisioning larger clusters. Folks just need to mind best practices and usage quotas.
BigQuery employs the Capacitor columnar storage format on top of the Colossus storage system, writing customer data in an opinionated fashion that's optimized for performance and durability. Behind the scenes, background processes continually study and optimize storage. BigQuery users are insulated from this underlying complexity.
BigQuery does not have a concept of primary keys, sort keys, indexes or distribution keys, which simplifies database administration. The main tuning left to users is cost optimization, by defining partitioned tables or perhaps employing a data sharding strategy.
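As a rough illustration of the partitioned-table option, the Python sketch below builds two SQL strings: a `CREATE TABLE ... PARTITION BY DATE(...)` statement, and a query whose filter on the partitioning column lets BigQuery read only the matching day instead of the whole table. The table name, column names, and helper functions are hypothetical; only the SQL syntax itself comes from BigQuery.

```python
from datetime import date, timedelta


def partitioned_table_ddl(table: str, columns: dict, partition_column: str) -> str:
    """Build DDL for a table partitioned by day on a TIMESTAMP column."""
    column_list = ", ".join(
        f"{name} {sql_type}" for name, sql_type in columns.items()
    )
    return (
        f"CREATE TABLE {table} ({column_list}) "
        f"PARTITION BY DATE({partition_column})"
    )


def one_day_query(table: str, partition_column: str, day: date) -> str:
    """Build a query that filters on the partitioning column for one day.

    The half-open range [day, next_day) restricts the scan to a single
    daily partition, which is what keeps the bytes processed down.
    """
    next_day = day + timedelta(days=1)
    return (
        f"SELECT COUNT(*) AS n FROM {table} "
        f"WHERE {partition_column} >= TIMESTAMP('{day.isoformat()}') "
        f"AND {partition_column} < TIMESTAMP('{next_day.isoformat()}')"
    )


if __name__ == "__main__":
    # Hypothetical schema, for illustration only.
    schema = {"event_ts": "TIMESTAMP", "user_id": "STRING"}
    print(partitioned_table_ddl("mydataset.events", schema, "event_ts"))
    print(one_day_query("mydataset.events", "event_ts", date(2024, 1, 31)))
```

Notice what is absent: there is no index, sort key, or distribution key to declare. The partitioning clause is the only physical-design decision in the statement, and it exists to control how much data a query scans.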