MIS 655 Grand Canyon University Data Analytics & Classification Trees Questions
MIS-655 Classification Trees
Directions: Use the information below to complete this assignment.
For this assignment, you will use the “Carseats” dataset located in R’s ISLR package.
You are an analyst for a manufacturer of child car safety seats. Management has collected data on 400 stores and wants to know if you can identify the variables that make a store a high-sales-volume store (10,000 or more units sold per year). The data includes information on competitor prices, shelf location, the community location of the store (income level, average age, urban/non-urban), and other variables. Your task is to determine the indicators that suggest a store will produce a high sales volume and communicate your findings to management.
Question 1: What are the assumptions of classification and regression tree (CART) models? What are the limitations of CART models? For what types of business problems would CART be an appropriate model to use? Use a specific example to support your rationale.
Question 2: Load the ISLR library into your R environment. Within this library is the data set you need for this assignment. Load the “Carseats” data into an object called “carseats” and check to ensure that the data loaded correctly (you should have 400 rows and 11 columns). Management has asked you to predict high sales volume, but the sales variable is the number of units sold (in thousands) in the last year. Create a new variable called “HighVol” that has the classes “yes” and “no” to indicate whether the location sold 10,000 units or more in the past year. How many stores produced a high volume?
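As a starting point for Question 2, the loading and recoding steps might look like the sketch below. It assumes the ISLR package is installed and treats "10,000 units or more" as Sales >= 10, since Sales is recorded in thousands. The explicit ordering of the factor levels is a choice made here, not something the assignment requires.

```r
library(ISLR)  # provides the Carseats data frame

# Copy the data into the object name the assignment asks for.
carseats <- Carseats
dim(carseats)  # the assignment expects 400 rows and 11 columns
str(carseats)  # check column names and types

# Sales is in thousands of units, so 10,000 units corresponds to Sales >= 10.
carseats$HighVol <- factor(ifelse(carseats$Sales >= 10, "yes", "no"),
                           levels = c("no", "yes"))

# Number of stores in each class.
table(carseats$HighVol)
```

The table() call gives the count of high-volume stores that the question asks for.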
Question 3: Load the rpart library into your R environment (rpart contains the tree function necessary for fitting CART models). Partition the data set into training (60%) and testing (40%) sets. Build a single classification tree with the training data and HighVol as the target. (Hint: Be sure to exclude the Sales variable from the model since it was used to create our outcome variable.)
- Which variable(s) were used in the tree model?
- How would you use the model to predict whether or not a store would produce a high volume?
- What is the accuracy of the model when using the training and test data? Use the function confusionMatrix (from the caret package) to create a misclassification table to include with your answer.
- Consider the following store: ShelveLoc = Good, Price = 115, no local advertising budget, and local income of $46,000. Based on the classification model, would a store with those features be predicted to be a high-performing store? Explain your answer.
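The steps in Question 3 can be sketched as follows. This is one possible workflow, not the required one. The seed value, the simple random split, and the object names (model_data, train, test, fit, new_store) are assumptions made for illustration. Sales is dropped from the data frame before fitting so that predictions on new stores do not need a Sales value. The example store specifies only four features, so the remaining predictors are left missing here and rpart falls back on its surrogate or majority rules for them; filling them with typical values is an equally defensible choice.

```r
library(ISLR)   # Carseats data
library(rpart)  # rpart() for fitting CART models
library(caret)  # confusionMatrix()

set.seed(655)  # arbitrary seed so the split is reproducible (assumption)

# Rebuild the outcome, then drop Sales so it cannot leak into the model.
carseats <- Carseats
carseats$HighVol <- factor(ifelse(carseats$Sales >= 10, "yes", "no"),
                           levels = c("no", "yes"))
model_data <- subset(carseats, select = -Sales)

# 60% training / 40% testing split by simple random sampling.
train_idx <- sample(nrow(model_data), size = round(0.6 * nrow(model_data)))
train <- model_data[train_idx, ]
test  <- model_data[-train_idx, ]

# Single classification tree with HighVol as the target.
fit <- rpart(HighVol ~ ., data = train, method = "class")
print(fit)

# Variables that actually appear in a split.
used_vars <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")
print(used_vars)

# Misclassification tables and accuracy on both partitions.
cm_train <- confusionMatrix(predict(fit, train, type = "class"),
                            train$HighVol, positive = "yes")
cm_test  <- confusionMatrix(predict(fit, test, type = "class"),
                            test$HighVol, positive = "yes")
print(cm_train$table)
print(cm_train$overall["Accuracy"])
print(cm_test$table)
print(cm_test$overall["Accuracy"])

# Store from the last bullet. Income and Advertising are in thousands of
# dollars; features the question does not give are left as NA (assumption).
new_store <- data.frame(
  CompPrice   = NA_real_,
  Income      = 46,
  Advertising = 0,
  Population  = NA_real_,
  Price       = 115,
  ShelveLoc   = factor("Good", levels = levels(carseats$ShelveLoc)),
  Age         = NA_real_,
  Education   = NA_real_,
  Urban       = factor(NA, levels = levels(carseats$Urban)),
  US          = factor(NA, levels = levels(carseats$US))
)
new_pred <- predict(fit, new_store, type = "class")
print(new_pred)
```

Because the split is random, the variables selected and the accuracy figures will change with the seed, so any write-up should report the seed that was used.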
Question 4: Pruning a tree is important to ensure that the model has not overfit the data. Following the example provided in the book, prune the model created in Question 3 to minimize the cross-validation error. How did the tree change? How many levels does the pruned tree include? What are the 3 most important variables and their relative importance according to the pruned tree model?
Question 5: What is the accuracy of the pruned tree model when using the training and test data? Use the function confusionMatrix (from the caret package) to create a misclassification table to include with your answer.
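The pruning in Question 4 and the accuracy check in Question 5 might be approached as in the sketch below. It is not necessarily the book's procedure: it picks the complexity parameter with the lowest cross-validated error from the cptable that rpart stores on the fitted model, which is one common reading of "minimize the cross-validation error". The seed, split, and object names are illustrative assumptions. The depth calculation relies on rpart's node numbering, in which node n sits at depth floor(log2(n)).

```r
library(ISLR)   # Carseats data
library(rpart)  # rpart(), printcp(), prune()
library(caret)  # confusionMatrix()

set.seed(655)  # arbitrary seed (assumption); also affects the cross-validation

# Same preparation as for the unpruned tree.
carseats <- Carseats
carseats$HighVol <- factor(ifelse(carseats$Sales >= 10, "yes", "no"),
                           levels = c("no", "yes"))
model_data <- subset(carseats, select = -Sales)
train_idx <- sample(nrow(model_data), size = round(0.6 * nrow(model_data)))
train <- model_data[train_idx, ]
test  <- model_data[-train_idx, ]
fit <- rpart(HighVol ~ ., data = train, method = "class")

# Cross-validated error (xerror) for each candidate tree size.
printcp(fit)
cp_table <- fit$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Prune back to the subtree with the lowest cross-validated error.
pruned <- prune(fit, cp = best_cp)
print(pruned)

# Depth of the deepest node (the root is depth 0).
depth <- max(floor(log2(as.numeric(rownames(pruned$frame)))))
print(depth)

# Variable importance rescaled to percentages; top three shown.
vi <- pruned$variable.importance
print(head(round(100 * vi / sum(vi), 1), 3))

# Misclassification tables and accuracy for the pruned tree.
cm_train_pruned <- confusionMatrix(predict(pruned, train, type = "class"),
                                   train$HighVol, positive = "yes")
cm_test_pruned  <- confusionMatrix(predict(pruned, test, type = "class"),
                                   test$HighVol, positive = "yes")
print(cm_train_pruned$table)
print(cm_train_pruned$overall["Accuracy"])
print(cm_test_pruned$table)
print(cm_test_pruned$overall["Accuracy"])
```

Comparing print(fit) with print(pruned) shows which splits were removed. If the lowest xerror occurs at the root, the pruned tree has no splits and the importance vector is empty.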
Part 2 (Analysis of results and recommendations): Based upon your analysis, what are the indicators of whether a store will produce a high sales volume? Discuss how management can use this information as it considers where to target its efforts for expanding its product line. Present your findings and recommendations in the form of a 250-word (minimum) executive summary that includes relevant data, charts, and tables in Microsoft Word. Be sure to include your R code and R output as a .txt file with your submission.