Apache Hive
Distributed Data Warehouse at Massive Scale
The Apache Hive™ is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale and facilitates reading, writing, and managing petabytes of data residing in distributed storage using SQL.
Battle-Tested Performance
Over 18 years of development and optimization, handling petabytes of data in production environments across the globe.
Vibrant Ecosystem
Seamlessly integrates with Spark, Presto, Impala, and hundreds of other tools in the modern data stack.
SQL-First Approach
Familiar SQL interface makes it easy for data analysts and engineers to work with big data without learning new languages.
Cloud-Native Ready
Native support for S3, Azure Data Lake, Google Cloud Storage, and other cloud storage systems.
Enterprise Security
Comprehensive security features including Kerberos authentication, fine-grained access control, and audit logging.
Apache Foundation
Backed by the Apache Software Foundation with a strong commitment to open source principles and community governance.
What is Apache Hive?
Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale.
Central Metadata Repository
Hive Metastore (HMS) provides a central repository of metadata that can easily be analyzed to make informed, data driven decisions, making it a critical component of many data lake architectures.
SQL Analytics at Scale
Built on top of Apache Hadoop with support for S3, ADLS, GS and more. Hive allows users to read, write, and manage petabytes of data using familiar SQL syntax.
Key Features & Capabilities
Powerful tools for modern data analytics and management
HiveServer2 (HS2)
HS2 supports multi-client concurrency and authentication with better support for open API clients like JDBC and ODBC, enabling seamless integration with business intelligence tools and applications.
Learn More
Hive Metastore Server (HMS)
The central repository of metadata for Hive tables and partitions, providing clients including Hive, Impala, and Spark access through the metastore service API. A fundamental building block for modern data lakes.
Learn MoreACID Transactions
Full ACID support for ORC tables and insert-only support for all other formats, ensuring data consistency and reliability in concurrent environments.
Learn MoreData Compaction
Query-based and MapReduce-based data compactions are supported out-of-the-box, optimizing storage efficiency and query performance.
Learn More
Apache Iceberg Support
Out-of-the-box support for Apache Iceberg tables, a cloud-native, high-performance open table format, via Hive StorageHandler for modern data lake architectures.
Learn More


Security & Observability
Enterprise-grade security with Kerberos authentication and seamless integration with Apache Ranger for authorization and Apache Atlas for data lineage and governance.
Learn More
Low Latency Analytics (LLAP)
Interactive and sub-second SQL queries through persistent query infrastructure and optimized data caching, making Hive suitable for real-time analytics workloads.
Learn MoreCost-Based Optimizer
Apache Calcite's cost-based query optimizer (CBO) and execution framework automatically optimize SQL queries for optimal performance and resource utilization.
Learn MoreData Replication
Bootstrap and incremental replication capabilities for robust backup and disaster recovery, ensuring business continuity and data protection.
Learn MoreReady to Get Started with Apache Hive?
Join thousands of organizations using Apache Hive to power their data analytics and build modern data lakes.