Data Science Capstone: Real World ML Decisions
SPL-214 - Version 1.0.2
This data set is being provided to you by permission of IMDb and is subject to the terms of the AWS Digital Training Agreement (available at https://aws.amazon.com/training/digital-training-agreement). You are expressly prohibited from copying, modifying, selling, exporting or using this data set in any way other than for the purpose of completing this lab.
© 2018 Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited.
Errors or corrections? Email us at email@example.com.
Other questions? Contact us at https://aws.amazon.com/contact-us/aws-training/
Welcome to the AWS Machine Learning Data Science Capstone: Real World ML Decisions lab, where you’ll build, train, and test a machine learning model from the ground up! In this lab, you will clean data, conduct feature engineering, compare algorithms, and get a firsthand look at how Amazon employees working with machine learning approach ML pipelines.
This lab synthesizes the math-based topics you learned in the Machine Learning Data Scientist path. You’ll use machine learning to solve a real-life business challenge that the Amazon Studios team faced in the past. The lab is meant to pair with the free digital content for the Machine Learning Data Science Capstone project, which you can find by selecting your “Learning Library” and searching for “Capstone”: https://www.aws.training/learningobject/wbc?id=27201
For the purposes of this lab:
You are assuming the role of a lead data scientist in 2005 and you’re presented with a challenge: Amazon Studios wants to produce award-winning films and, therefore, focus the budget on projects with the best chance of winning those awards. Using the actual dataset from IMDb, an Amazon subsidiary, for movies made between 1990 and 2005, you begin your investigation.
The IMDb dataset is a feature-rich, comprehensive listing of all films released during that time period; it includes critical data such as cast and crew, synopsis, and other production data.
Your task in this lab is to predict which movies will most likely be nominated for an award during the “upcoming” 2005 awards season by building an awards analysis prediction model.
This lab requires:
- Access to a notebook computer with Wi-Fi and Microsoft Windows, Mac OS X, or Linux (Ubuntu, SuSE, or Red Hat)
- The Qwiklabs lab environment is not accessible using an iPad or tablet device, but you can use these devices to access the student guide.
- For Microsoft Windows users: Administrator access to the computer.
- An Internet browser such as Chrome, Firefox, or IE9 (previous versions of Internet Explorer are not supported).
This lab requires approximately 4 hours to complete.
Notice the lab properties below the lab title:
- setup - The estimated time to set up the lab environment
- access - The time the lab will run before automatically shutting down
- completion - The estimated time the lab should take to complete
- At the top of your screen, launch your lab by clicking
If you are prompted for a token, use the one distributed to you (or credits you have purchased).
A status bar shows the progress of the lab environment creation process. The AWS Management Console is accessible during lab resource creation, but your AWS resources may not be fully available until the process is complete.
- Open your lab by clicking
This will automatically log you into the AWS Management Console.
Please do not change the Region unless instructed.
Common login errors
Error: Federated login credentials
If you see this message:
- Close the browser tab to return to your initial lab window
- Wait a few seconds
- Click again
You should now be able to access the AWS Management Console.
Error: You must first log out
If you see the message “You must first log out before logging into a different AWS account”:
- Click the “click here” link
- Close your browser tab to return to your initial Qwiklabs window
- Click again