15.572 - Analytics Lab: Action Learning Seminar on Analytics, Machine Learning & the Digital Economy
(Fall 2017, 9 units)
Instructors: Sinan Aral, Erik Brynjolfsson
In this course, student teams select and complete a project that uses analytics, machine learning, or other digital technologies to solve a business problem.
Each project presents its own specific challenges and includes access to a full dataset. Projects span a variety of industries and sectors and address a diverse range of advanced problem types. In the first three years of the course, project sponsors included Amazon, Boston Public Schools, Dell Services, eBay, Gates Foundation, GE Transportation, IBM Watson, LinkedIn, MasterCard, Nasdaq, and others.
Class rosters from previous years have included students from a variety of programs, including MBA, EMBA, Sloan Fellows, MFin, SDM, IDM, LGO, ORC, MSMS, EECS, and Urban Studies. Analytics Lab is also a required course for all students in the MBAn program.
Class meeting: Thursdays 4-5:30pm, plus a project sponsor pitch session in mid-September and a final presentation workshop in early December (attendance is mandatory).
Fall 2016 Syllabus
The admissions cycle for Fall 2017 will begin in May. The link to the application will appear on this webpage.
Admission is selective and by application only (no bidding is necessary). Applicants should have coursework or experience in analytics, statistics, computer science, management, and economics. Applications are evaluated on relevant learning, experience, and motivation toward data-analytic work, with extra weight given to data-analytic courses taken and to data-analytic project and job experience. Attention is also given to admitting a mix of students with technical and computational experience, managerial experience, experience implementing analytical models, and entrepreneurial work using analytics.
The course is not open to listeners, and in-person attendance is mandatory.
For questions, please contact Susan Young <susany @ mit.edu>; also see the slides presented at the April 2016 Action Learning Open House.
Projects from 2014 and 2015 have included:
1. Big Data as a Service (Amazon, 2014)
- Develop demand forecasts of value to Amazon’s retail vendors
- Data: 1 million records of daily transactions of one product group (textbooks), 16 variables, no vendor identification
- Team effort: correlations, data visualization (LOESS regression in R), and exploration of the best predictors of sales
- Recommendations for further model development
2. The “Myth of the Crystal Ball”: Understanding Forecasting Errors at Amazon (Amazon, 2015)
- Challenge: Help Amazon quantify the impact of supply chain forecasting errors to better prioritize forecast improvements in the future
- Data: 75 million rows containing daily demand and forecast data for 206 thousand products over two weeks
- Analysis: Defined the different kinds of costs associated with forecasting errors and their magnitudes. Used statistical methods in R, running on a cloud computing platform, to quantify profit lost to forecast error
- Recommendation: Incorporate indirect costs into the evaluation of forecasting errors. Look for variation across product categories
3. Understanding Supply and Demand in the Boston Public Schools (Boston Public Schools, 2015)
- Use the BPS student dataset to generate hypotheses about what drives demand for schools in the Boston area, helping BPS to "right-size" school districts
4. Populating "Popular Now": Rebooting our News Story Recommendation Algorithm (Christian Science Monitor, 2015)
- Develop a news recommendation algorithm to drive page views and user engagement on the Christian Science Monitor site, with the goal of beating the existing "Popular Now" algorithm
5. Understanding Successful eBay Sale Prices (eBay, 2015)
- Challenge: Find the factors that best predict successful prices for new and used eBay items in different categories and under a variety of sales conditions
- Data: 3 months of sales data, totaling over 147 million separate transactions (about 24 GB; some preprocessing required)
- Analysis: Using machine learning and a “bag-of-words” model, examined the effect of including special characters on price, the drivers of price differences between new and used items, and price differences between auctions and Buy-It-Now goods
- Recommendations for further analysis: Define a “feature space” for different goods on eBay, perform seller network analysis, and use timing to better predict prices
6. Predicting Hospital Readmission (Dell, 2015)
- Challenge: Use analytics to find the factors that best predict 30-day hospital readmission
- Data: 1,500 patient admissions at one US hospital, with 26 fields describing each case
- Analysis: Generated additional features, then used logistic regression, support vector machines, and classification trees to predict readmission
- Recommendation: Expand analysis to more hospitals and incorporate data from new sources (e.g., wearables) to help reduce readmission risk
7. Finding the Next Watson Use Case (IBM Watson, 2014)
- Case chosen: financial institutions’ compliance with federal regulations
- Problem: Dodd-Frank and the Volcker Rule impose 1,700 pages of regulations, affecting millions of a large bank’s documents and requiring thousands of FTEs; the estimated cost of compliance for all large banks over the last six years is $70 billion
- Team proposal: Use Watson as a “Regulatory Analyst” to sift through information, identify and connect relevant items, conduct impact analyses, and decide on the changes needed to comply
- Potential benefits: large savings in the cost of compliance and improved quality and timeliness of response; regulators could also use it to streamline regulation
8. Identifying Fraud for an Online Gift Card Platform (Raise Marketplace, 2015)
- Challenge: Develop an algorithm to help Raise classify transactions as fraudulent or legitimate
- Data: 100 thousand transactions labeled as “fraudulent” or “non-fraudulent” with 83 descriptive fields
- Analysis: Classified transactions using naïve Bayes, penalized logistic regression, and tree-based methods. Achieved 99.7% accuracy!
- Recommendation: Incorporate user information, time of day, and transaction size to predict fraud
9. Predictive Maintenance in the Elevator and Escalator Industry (Schindler Elevator, 2015)
- Challenge: Help Schindler use predictive analytics to revise its maintenance strategy and better target preventive interventions
- Data: 1,000 elevator-specific files describing elevator operation and maintenance needs
- Analysis: Used regression techniques to predict the potential need for future maintenance and the likelihood of service trips for different elevator codes
- Recommendation: Determine the appropriate priority for elevator maintenance given limited resources. Error codes can be predicted, but efficient allocation of resources is potentially more important
10. Using Geospatial Data to Develop a New Kind of Football Analytics (Telemetry Sports, 2015)
- Challenge: Use a new source of geospatial NFL data to classify plays, evaluate players, and design football strategy
- Data: Real NFL game data from selected Indianapolis Colts plays, as well as over 10,000 simulated football plays from EA’s Madden NFL game
- Analysis: The team used machine learning and regression techniques to identify player positions on the field, isolate player routes in game, classify plays, calculate new measures of “player elusiveness”, and project expected yardage per play
- Recommendation: Geospatial data offers significant opportunities for evaluating success in sports. This type of analysis would be particularly useful for optimal play selection
11. Multi-channel Consumer Profiling for eCommerce (WOOX Innovations, 2014)
- Provide finer segmentation and profiles of potential customers for WOOX’s high-quality headphones
- Data: Internal data on sales efforts, such as the results of a 1M email sales campaign
- Team designed and initiated an analytical approach: conducted a survey (via M-Turk) of consumer brand attitudes and motivations to buy
- Team conducted a social media analysis of brand perceptions
- Recommendations: specific consumer segments by age, social media activity, phone type, etc., and next steps for WOOX’s marketing focus
12. Predicting New Product Adoption for American Apparel (Zensar, 2014)
- Sponsor challenge: “We may have people with experience, wisdom, and opinions, predicting sales of a new line of jeans. Can we do better with analytics?”
- Data: For 128 products introduced in 2013-2014, total sales by week, prices, and some other variables
- Team explored the data and discovered four adoption archetypes (Uniform, Blockbuster, Linear, and Stairstep), but nothing in the data enabled a prediction
- Working quickly with Zensar and AA, the team obtained social media data of consumer comments on some products in the database. Using Word Tree text analytics, the team extracted valuation comments and plotted them against subsequent sales volumes
- Conclusion: “Social Media data provide useful insights on consumer and show correlations with sales that should be explored further.”