Apache ORC
Initial release: 20 February 2013[1]
Stable release: 2.1.2 / 6 May 2025[2]
Operating system: Cross-platform
Type: Database management system
License: Apache License 2.0
Website: orc.apache.org
Repository: ORC Repository

Apache ORC (Optimized Row Columnar) is a free and open-source column-oriented data storage format.[3] It is similar to other columnar-storage file formats available in the Hadoop ecosystem, such as RCFile and Parquet. It is used by most data processing frameworks, including Apache Spark, Apache Hive, Apache Flink, and Apache Hadoop.
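
As a rough illustration of what "column-oriented" means, the following Python sketch pivots a few records from a row layout into a column layout. The sample records and the helper name `to_columns` are hypothetical, and the sketch does not reproduce ORC's actual on-disk encoding; it only shows why a query that needs one field can read that field's values without touching the others.

```python
# Illustrative sketch only: contrasts row-oriented and column-oriented layouts.
# It does NOT implement ORC's file format; names and data are hypothetical.

# Row-oriented: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "alpha", "score": 1.0},
    {"id": 2, "name": "beta", "score": 2.0},
    {"id": 3, "name": "gamma", "score": 3.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict mapping field name -> list of values."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


# Column-oriented: all values of one field are stored together.
columns = to_columns(rows)

# An aggregate over a single field reads only that field's list.
average_score = sum(columns["score"]) / len(columns["score"])
```

Grouping values of the same field and type together is also what tends to make columnar data compress well, although the compression itself is outside the scope of this sketch.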

In February 2013, the Optimized Row Columnar (ORC) file format was announced by Hortonworks in collaboration with Facebook.[1] The following month, the Apache Parquet format, developed by Cloudera and Twitter, was announced.[4]

The Apache ORC format is widely supported, including by Amazon Web Services' Glue,[5] Google Cloud Platform's BigQuery,[6] and pandas.[7]

History

Version  Original release date  Latest version  Release date  Status
1.0      2016-01-25             1.0.0           2016-01-25    Unsupported
1.1      2016-06-10             1.1.2           2016-07-08    Unsupported
1.2      2016-08-25             1.2.3           2016-12-12    Unsupported
1.3      2017-01-23             1.3.4           2017-10-16    Unsupported
1.4      2017-05-08             1.4.5           2019-12-09    Unsupported
1.5      2018-05-14             1.5.13          2021-09-15    Unsupported
1.6      2019-09-03             1.6.14          2022-04-14    Unsupported
1.7      2021-09-15             1.7.8           2023-01-21    Unsupported
1.8      2022-09-03             1.8.9           2025-05-06    Supported
1.9      2023-06-28             1.9.6           2025-05-06    Supported
2.0      2024-03-08             2.0.5           2025-05-06    Supported
2.1      2025-01-09             2.1.2           2025-05-06    Latest version

References

  1. Alan Gates (February 20, 2013). "The Stinger Initiative: Making Apache Hive 100 Times Faster". Hortonworks blog. Archived from the original on March 28, 2013.
  2. "Apache ORC - Releases". Retrieved May 15, 2025.
  3. Yin Huai, Siyuan Ma, Rubao Lee, Owen O'Malley, and Xiaodong Zhang (2013). "Understanding Insights into the Basic Structure and Essential Issues of Table Placement Methods in Clusters". VLDB '39. pp. 1750–1761. CiteSeerX 10.1.1.406.4342. doi:10.14778/2556549.2556559.
  4. Justin Kestelyn (March 13, 2013). "Introducing Parquet: Efficient Columnar Storage for Apache Hadoop". Cloudera blog. Archived from the original on September 19, 2016. Retrieved May 4, 2017.
  5. "Using the ORC format in AWS Glue". docs.aws.amazon.com. Retrieved August 21, 2024.
  6. "Load an ORC file". cloud.google.com/bigquery/docs. Retrieved May 15, 2025.
  7. "pandas.read_orc". pandas.pydata.org. Retrieved May 15, 2025.
