Apache Hadoop YARN: Yet another resource negotiator

Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O'Malley, Sanjay Radia, Benjamin Reed, Eric Baldeschwieler

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

808 Citations (Scopus)

Abstract

The initial design of Apache Hadoop [1] was tightly focused on running massive, MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agorá - the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage has stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler. In this paper, we summarize the design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN. The new architecture we introduced decouples the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components. We provide experimental evidence demonstrating the improvements we made, confirm improved efficiency by reporting the experience of running YARN on production environments (including 100% of Yahoo! grids), and confirm the flexibility claims by discussing the porting of several programming frameworks onto YARN viz. Dryad, Giraph, Hoya, Hadoop MapReduce, REEF, Spark, Storm, Tez.

Original language: English
Title of host publication: Proceedings of the 4th Annual Symposium on Cloud Computing, SoCC 2013
Publisher: Association for Computing Machinery (ACM)
ISBN (Print): 9781450324281
DOIs: https://doi.org/10.1145/2523616.2523633
Publication status: Published - 01-01-2013
Event: 4th Annual Symposium on Cloud Computing, SoCC 2013 - Santa Clara, CA, United States
Duration: 01-10-2013 to 03-10-2013

Conference

Conference: 4th Annual Symposium on Cloud Computing, SoCC 2013
Country: United States
City: Santa Clara, CA
Period: 01-10-13 to 03-10-13

Fingerprint

Fault tolerance
Electric sparks
Flow control
Scalability
Scheduling
Industry

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Vavilapalli, V. K., Murthy, A. C., Douglas, C., Agarwal, S., Konar, M., Evans, R., ... Baldeschwieler, E. (2013). Apache Hadoop YARN: Yet another resource negotiator. In Proceedings of the 4th Annual Symposium on Cloud Computing, SoCC 2013 [5] Association for Computing Machinery (ACM). https://doi.org/10.1145/2523616.2523633
Vavilapalli, Vinod Kumar ; Murthy, Arun C. ; Douglas, Chris ; Agarwal, Sharad ; Konar, Mahadev ; Evans, Robert ; Graves, Thomas ; Lowe, Jason ; Shah, Hitesh ; Seth, Siddharth ; Saha, Bikas ; Curino, Carlo ; O'Malley, Owen ; Radia, Sanjay ; Reed, Benjamin ; Baldeschwieler, Eric. / Apache Hadoop YARN : Yet another resource negotiator. Proceedings of the 4th Annual Symposium on Cloud Computing, SoCC 2013. Association for Computing Machinery (ACM), 2013.
@inproceedings{f793732e92ba4445a870a105d0373b7c,
title = "Apache {Hadoop} {YARN}: Yet another resource negotiator",
abstract = "The initial design of Apache Hadoop [1] was tightly focused on running massive, MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agor{\'a} - the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage has stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler. In this paper, we summarize the design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN. The new architecture we introduced decouples the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components. We provide experimental evidence demonstrating the improvements we made, confirm improved efficiency by reporting the experience of running YARN on production environments (including 100{\%} of Yahoo! grids), and confirm the flexibility claims by discussing the porting of several programming frameworks onto YARN viz. Dryad, Giraph, Hoya, Hadoop MapReduce, REEF, Spark, Storm, Tez.",
author = "Vavilapalli, {Vinod Kumar} and Murthy, {Arun C.} and Chris Douglas and Sharad Agarwal and Mahadev Konar and Robert Evans and Thomas Graves and Jason Lowe and Hitesh Shah and Siddharth Seth and Bikas Saha and Carlo Curino and Owen O'Malley and Sanjay Radia and Benjamin Reed and Eric Baldeschwieler",
year = "2013",
month = "1",
day = "1",
doi = "10.1145/2523616.2523633",
language = "English",
isbn = "9781450324281",
booktitle = "Proceedings of the 4th Annual Symposium on Cloud Computing, SoCC 2013",
publisher = "Association for Computing Machinery (ACM)",
address = "United States",

}

Vavilapalli, VK, Murthy, AC, Douglas, C, Agarwal, S, Konar, M, Evans, R, Graves, T, Lowe, J, Shah, H, Seth, S, Saha, B, Curino, C, O'Malley, O, Radia, S, Reed, B & Baldeschwieler, E 2013, Apache Hadoop YARN: Yet another resource negotiator. in Proceedings of the 4th Annual Symposium on Cloud Computing, SoCC 2013., 5, Association for Computing Machinery (ACM), 4th Annual Symposium on Cloud Computing, SoCC 2013, Santa Clara, CA, United States, 01-10-13. https://doi.org/10.1145/2523616.2523633

Apache Hadoop YARN : Yet another resource negotiator. / Vavilapalli, Vinod Kumar; Murthy, Arun C.; Douglas, Chris; Agarwal, Sharad; Konar, Mahadev; Evans, Robert; Graves, Thomas; Lowe, Jason; Shah, Hitesh; Seth, Siddharth; Saha, Bikas; Curino, Carlo; O'Malley, Owen; Radia, Sanjay; Reed, Benjamin; Baldeschwieler, Eric.

Proceedings of the 4th Annual Symposium on Cloud Computing, SoCC 2013. Association for Computing Machinery (ACM), 2013. 5.

TY - GEN

T1 - Apache Hadoop YARN

T2 - Yet another resource negotiator

AU - Vavilapalli, Vinod Kumar

AU - Murthy, Arun C.

AU - Douglas, Chris

AU - Agarwal, Sharad

AU - Konar, Mahadev

AU - Evans, Robert

AU - Graves, Thomas

AU - Lowe, Jason

AU - Shah, Hitesh

AU - Seth, Siddharth

AU - Saha, Bikas

AU - Curino, Carlo

AU - O'Malley, Owen

AU - Radia, Sanjay

AU - Reed, Benjamin

AU - Baldeschwieler, Eric

PY - 2013/1/1

Y1 - 2013/1/1

AB - The initial design of Apache Hadoop [1] was tightly focused on running massive, MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agorá - the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage has stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler. In this paper, we summarize the design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN. The new architecture we introduced decouples the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components. We provide experimental evidence demonstrating the improvements we made, confirm improved efficiency by reporting the experience of running YARN on production environments (including 100% of Yahoo! grids), and confirm the flexibility claims by discussing the porting of several programming frameworks onto YARN viz. Dryad, Giraph, Hoya, Hadoop MapReduce, REEF, Spark, Storm, Tez.

UR - http://www.scopus.com/inward/record.url?scp=84893249524&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893249524&partnerID=8YFLogxK

U2 - 10.1145/2523616.2523633

DO - 10.1145/2523616.2523633

M3 - Conference contribution

AN - SCOPUS:84893249524

SN - 9781450324281

BT - Proceedings of the 4th Annual Symposium on Cloud Computing, SoCC 2013

PB - Association for Computing Machinery (ACM)

ER -

Vavilapalli VK, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R et al. Apache Hadoop YARN: Yet another resource negotiator. In Proceedings of the 4th Annual Symposium on Cloud Computing, SoCC 2013. Association for Computing Machinery (ACM). 2013. 5 https://doi.org/10.1145/2523616.2523633