Lesson 1: Background Concepts
Apache Hadoop version 1 has become a popular tool for working with big data. One of the limitations of Hadoop, however, has been the single MapReduce computational paradigm. Apache Hadoop YARN addresses this and other issues. In this lesson, you learn the fundamental differences between Hadoop version 1 and Hadoop version 2 with YARN, and the five clear advantages of the new YARN design.
Lesson 2: Running Hadoop YARN on Personal Systems
A production Hadoop installation, whether it is a local cluster or in the cloud, can be difficult to configure and costly to operate. This lesson presents several installation scenarios, including a single laptop, a desktop, a small cluster, and the cloud. Both the Apache Hadoop source and the Hortonworks HDP Sandbox are used for local systems, and Apache Whirr is demonstrated for installing in the cloud. These environments can be used to try most of the examples in this tutorial.
Lesson 3: Functional Description of YARN Components
Apache Hadoop YARN introduces new components to the Hadoop ecosystem. This lesson explains what each of these components does and how they interact with each other. In addition, various YARN scheduler options and a complete application life cycle walk-through are explained.
Lesson 4: Apache Hadoop YARN Cluster Installation Methods
Installing Hadoop is not as hard as it used to be. In this lesson, both shell script and graphical installation methods are described. The graphical installation employs the new open source Ambari tool. In addition, the steps to install and configure the Ganglia and Nagios cluster monitoring tools are provided.
Lesson 5: Apache Hadoop YARN Cluster Administration Methods
In this lesson, methods for monitoring and administering an Apache Hadoop YARN cluster are described. As in the installation lesson, both shell scripts and the Ambari GUI tool are presented. Several essential administration tips for Apache Hadoop YARN are also provided.
Lesson 6: Running Existing Applications with Apache Hadoop YARN
One of the goals achieved by Hadoop version 2 was compatibility with version 1 applications. In this lesson, the new MapReduce framework that runs under YARN is explained. Almost all existing applications are compatible, and any important differences are presented. In addition, job tracking using the new YARN web GUI is demonstrated.
Lesson 7: Using YARN Distributed Shell and Introduction to Applications
Apache Hadoop YARN includes an application called distributed shell that enables shell commands to be run within YARN Containers on cluster nodes. In this lesson, a distributed shell example is presented and then expanded into a blueprint for other YARN applications.
Lesson 8: Exploring Apache Hadoop YARN Application Frameworks
The Apache Hadoop YARN architecture enables non-MapReduce applications to operate on Hadoop clusters. This capability has spawned a new set of applications that can take advantage of Hadoop’s big data capabilities. In this lesson, some of these application frameworks are introduced, along with how they differ from MapReduce processing.