Scientists, developers, and other technologists from many different industries are taking advantage of AWS to perform big data analytics and meet the challenges of the increasing volume, variety, and velocity of digital information. AWS offers a portfolio of cloud computing services to help you manage big data by reducing costs, scaling to meet demand, and increasing the speed of innovation. In this quest, you’ll learn to work with advanced services for Big Data.
In this Quest, you will delve deeper into the uses and capabilities of Amazon Redshift. You will use a remote SQL client to create and configure tables, and gain practice loading large data sets into Redshift. You will explore the effects of schema variations and compression. You will explore visualization of Redshift data, and connect Redshift with Amazon Machine Learning to create a predictive data model.
In this lab, you will deploy a fully functional Hadoop cluster, ready to analyze log data in just a few minutes. You will start by launching an Amazon EMR cluster and then use a HiveQL script to process sample log data stored in an Amazon S3 bucket. HiveQL is a SQL-like scripting language for data warehousing and analysis. You can then use a similar setup to analyze your own log files.
In this lab, you will experiment with and compare different types of data loading using Amazon Redshift. You will create tables, load data using S3, remote hosts, and practice troubleshooting data loading errors. For the lab to function as written, please DO NOT change the auto assigned region.
This lab demonstrates how to launch an Amazon Elastic MapReduce (EMR) cluster for Big Data processing and use Hive with SQL-style queries to analyze data. You will create a Hadoop cluster using Amazon EMR which will allow to run interactive Hive queries against data stored in Amazon S3. You will use Hive to normalize the data in a more useful way, and you will run queries to analyze the data.
The lab will give you the basic understanding of Amazon Redshift data warehouse service. It will demonstrate the basic steps required to get started with Redshift: creating a cluster, loading data and performing queries against that data.
Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. This hand-son lab will demonstrate how Amazon Kinesis Firehose can capture and automatically load streaming data into an Elasticsearch cluster.
The lab demonstrates how to use Amazon RedShift to create a cluster, load data, run queries and monitor performance. Note: Students will download a free SQL client as part of this lab.