Instructions:
Read each question below in full before responding. Full credit requires answering all parts of a question completely. Some questions may warrant research to support your answer; please be sure to cite any outside sources you consult. This exam is an individual assessment. Responses that closely resemble submissions from other students, including students in other sections of the class, will receive a score of 0.
Due Date: Tuesday, February 22nd, 2022 @ 11:59PM
Note: All questions for this exam will focus on a review of the online retail datasets. Please refer to the appendix for documentation on how to load these datasets as well as a data dictionary of their contents.
- Introductory Research [25Pts]: You are approached by a UK-based online retail company that provides you with data to analyze, with the goal of exploring trends, understanding potential markets, and recommending ways to increase revenue. To begin your work, you have decided to explore prior research in online retail data analysis. What prior work has been done in this field? What best practices have previous studies recommended for exploring trends in this type of data? Which metrics are considered most important to prioritize in the analysis and to inform value? Cite at least TWO prior works in your
- Exploratory Analysis [75Pts]: Using the data provided and your initial review from question 1, complete the following tasks:
- Introduce at least FIVE questions that you plan to explore in the online retail datasets based on the Exploratory Data Analysis [EDA] approach that we introduced in class. Explain why each of the five questions is relevant for your goal of helping understand trends and potential
- Showcase the results of exploring your questions. For each set of results, please provide a ggplot summary to distill your output, then interpret your findings in Please also include the code you used to develop your results and walk through the programmatic logic. For clarity, you will show at least five ggplot summaries with interpretation.
- Use your results to draw conclusions and make recommendations. Based on the story that your analysis tells, what would you recommend to help the online retailer understand new markets and increase revenue? Is there any covariation that might point to new types of potential customers to target? Does your work show that there are certain products, customers, locations, or times where the opportunity for increasing sales is the highest?
- OPTIONAL EXTRA CREDIT [10Pts]: Explore ways that you can extract further insights to support your proposed strategy using K-Means clustering. First, perform research and define the concept of clustering while also explaining the process of K-Means clustering and why it is useful. Second, research a methodology to perform K-Means clustering on the online retail dataset and show your resulting clusters in a figure. Lastly, interpret your clusters and explain their relevance.
[Hint: Consider exploring the factoextra library for further considerations regarding plotting cluster outputs].
Note: For this exercise, you are welcome to test any number of cluster centers using any number of variables within the data. The main goal for full credit is to be able to implement your code and interpret why your results are meaningful.
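For orientation only, the mechanics of fitting K-Means can be sketched in base R as below. This is a minimal sketch under stated assumptions, not a required or recommended methodology: the data frame `customers` and its columns `orders` and `spend` are hypothetical, synthetic stand-ins that do not come from the online retail datasets, and the choice of two centers is arbitrary.

```r
# Synthetic, hypothetical customer-level features; NOT the exam data.
set.seed(42)
customers <- data.frame(
  orders = c(rpois(50, lambda = 3), rpois(50, lambda = 20)),
  spend  = c(rnorm(50, mean = 100, sd = 20), rnorm(50, mean = 900, sd = 100))
)

# Standardize each column so that neither variable dominates
# the Euclidean distances used by K-Means.
scaled <- scale(customers)

# Fit K-Means with 2 centers (an arbitrary choice for this sketch).
# nstart = 25 reruns the algorithm from 25 random starts and keeps the best fit.
km <- kmeans(scaled, centers = 2, nstart = 25)

# Cluster sizes, then per-cluster means in the original units.
print(table(km$cluster))
print(aggregate(customers, by = list(cluster = km$cluster), FUN = mean))

# Optional plotting, if the factoextra package is installed:
# factoextra::fviz_cluster(km, data = scaled)
```

Your own variables, number of centers, and figure will differ; the interpretation of the resulting clusters is the part that earns credit.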