Scientists, developers, and other technologists from many different industries are taking advantage of AWS to perform big data analytics and meet the challenges of the increasing volume, variety, and velocity of digital information. AWS offers a portfolio of cloud computing services to help you manage big data by reducing costs, scaling to meet demand, and increasing the speed of innovation. In this quest, you’ll learn to work with advanced services for Big Data.
In this Quest, you will delve deeper into the uses and capabilities of Amazon Redshift. You will use a remote SQL client to create and configure tables, and gain practice loading large data sets into Redshift. You will explore the effects of schema variations and compression. You will explore visualization of Redshift data, and connect Redshift with Amazon Machine Learning to create a predictive data model.
In this lab, you will deploy a fully functional Hadoop cluster, ready to analyze log data in just a few minutes. You will start by launching an Amazon EMR cluster and then use a HiveQL script to process sample log data stored in an Amazon S3 bucket. HiveQL is a SQL-like scripting language for data warehousing and analysis. You can then use a similar setup to analyze your own log files.
In this lab, you will experiment with and compare different types of data loading using Amazon Redshift. You will create tables, load data from Amazon S3 and from remote hosts, and practice troubleshooting data loading errors. For the lab to function as written, please DO NOT change the auto-assigned region.
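As a rough illustration of what loading from S3 can look like, the sketch below assembles a Redshift COPY statement as a string and keeps a query against the `STL_LOAD_ERRORS` system table for troubleshooting. Nothing here comes from the lab itself: the table name, bucket, IAM role ARN, and the helper `build_copy_statement` are hypothetical, and the code only builds SQL text without connecting to a cluster.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, delimiter="|", region=None):
    """Return a Redshift COPY statement as text (nothing is executed)."""

    def quote(value):
        # Escape single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    parts = [
        f"COPY {table}",
        f"FROM {quote(s3_uri)}",
        f"IAM_ROLE {quote(iam_role_arn)}",
        f"DELIMITER {quote(delimiter)}",
    ]
    if region:
        # REGION is only needed when the bucket is in a different region.
        parts.append(f"REGION {quote(region)}")
    return "\n".join(parts) + ";"


# Query to inspect recent load failures after a COPY fails.
RECENT_LOAD_ERRORS = (
    "SELECT starttime, filename, line_number, colname, err_reason\n"
    "FROM stl_load_errors\n"
    "ORDER BY starttime DESC\n"
    "LIMIT 10;"
)

if __name__ == "__main__":
    # All identifiers below are placeholders, not values from the lab.
    print(build_copy_statement(
        "users",
        "s3://example-bucket/users/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        region="us-west-2",
    ))
    print(RECENT_LOAD_ERRORS)
```

Both statements can be pasted into the SQL client used in the lab once the placeholders are replaced with real values.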
SPL-223 - Hybrid Storage and Data Migration with AWS Storage Gateway File Gateway
This lab demonstrates how to launch an Amazon Elastic MapReduce (EMR) cluster for Big Data processing and use Hive with SQL-style queries to analyze data. You will create a Hadoop cluster using Amazon EMR, which will allow you to run interactive Hive queries against data stored in Amazon S3. You will use Hive to normalize the data into a more useful form, and you will run queries to analyze the data.
Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. This hands-on lab will demonstrate how Amazon Kinesis Firehose can capture and automatically load streaming data into an Elasticsearch cluster.
In this lab, you will enable client-side at-rest encryption using an AWS KMS-managed key for data stored in Amazon S3 with the EMR File System (EMRFS). Within Amazon EMR, you will create a security configuration that encrypts objects written to S3 with client-side encryption using the AWS KMS-managed key you specify, and decrypts objects with the same key that was used to encrypt them. This will allow you to more easily leverage frameworks like Apache Spark, Apache Tez, and Apache Hadoop MapReduce on Amazon EMR to run big data analytics, stream processing, machine learning, and ETL workloads on confidential data.
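For orientation, a security configuration of this kind is a small JSON document. The sketch below builds one for EMRFS client-side encryption with a KMS key (`CSE-KMS` mode). The key ARN and the helper name `build_emrfs_cse_kms_config` are hypothetical, and the lab may set other options as well. The code only produces the JSON and does not call AWS.

```python
import json


def build_emrfs_cse_kms_config(kms_key_arn):
    """Return an EMR security configuration dict for EMRFS CSE-KMS at-rest encryption."""
    return {
        "EncryptionConfiguration": {
            # This sketch covers only at-rest encryption of EMRFS data in S3.
            "EnableInTransitEncryption": False,
            "EnableAtRestEncryption": True,
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": {
                    "EncryptionMode": "CSE-KMS",
                    "AwsKmsKey": kms_key_arn,
                }
            },
        }
    }


if __name__ == "__main__":
    # Placeholder ARN, not a real key.
    config = build_emrfs_cse_kms_config(
        "arn:aws:kms:us-west-2:123456789012:key/EXAMPLE-KEY-ID"
    )
    print(json.dumps(config, indent=2))
```

The resulting JSON string is the kind of document you would supply when creating a security configuration in the EMR console, or, assuming boto3 is available, to the EMR client's `create_security_configuration` call.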