Correlation and Regression
Linear regression is one of the essential tools in statistical analysis. In this course, we'll walk step by step through how to conduct many important analyses using SPSS.
Although you will learn the basics of what these statistics are, we'll avoid complicated mathematical discussions and go right to what you need to know to conduct these analyses.
Linear regression is a tool that lets you test relationships among several variables at the same time, control for the effects of other variables, and build simple statistical models for making predictions.
In this course, we'll cover the following key topics:
Correlations: You probably already know this, but understanding how to test the correlation between two variables gets us started in this course.
Simple Linear Regression: Taking correlations one step further by creating a statistical model.
Multiple Linear Regression: Being able to test multiple predictors at the same time and testing the unique effect of each.
Hierarchical Linear Regression: How to test for the influence of different variables by adding them to the model one at a time.
Interaction Analysis: How to test whether there's a two-way interaction between variables (also known as a "moderator" analysis).
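To make the first two topics concrete outside SPSS, here is a minimal Python sketch of a Pearson correlation and a simple linear regression computed by hand. The variable names (hours, score) and all the numbers are made up for illustration and do not come from the course.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simple_regression(x, y):
    """Least-squares intercept and slope for the model y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

r = pearson_r(hours, score)
intercept, slope = simple_regression(hours, score)
predicted = intercept + slope * 6  # model-based prediction for 6 hours

print(f"r = {r:.3f}")
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
print(f"predicted score at 6 hours = {predicted:.1f}")
```

The correlation only summarizes how strongly the two variables move together, while the regression adds an intercept and a slope, which is what makes prediction possible.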
Regression and Correlation
These days many employees spend time during work hours on the Internet doing personal things unrelated to their work. This is called “cyberloafing.” Research at ECU by Mike Sage, a graduate student in Industrial/Organizational Psychology, has related the frequency of cyberloafing to personality and age.
Suggested statistical tests
Skewed data
Recoding and creating new variables
Mann-Whitney U test
Multiple linear regression
Research #4 - Titanic
The Titanic sank in 1912 with the loss of most of its passengers. Details are available on 1309 passengers and crew who were on board. The main use of this data set is for chi-squared tests and logistic regression, with survival as the key dependent variable. It can also demonstrate summary statistics for the categorical variables, and because the cost of the ticket (fare) is very skewed, it can be used to demonstrate skewed data and the difference between means and medians.
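The two ideas mentioned here, skewed data and the chi-squared test, can be sketched in a few lines of Python. The fares and the 2x2 table below are invented for illustration and are not taken from the Titanic data set. The chi-squared function computes only the Pearson statistic, without a continuity correction or a p-value.

```python
from statistics import mean, median

# Made-up, right-skewed fares: a few very expensive tickets pull the mean up.
fares = [7, 8, 8, 10, 13, 26, 30, 80, 250]
fare_mean = mean(fares)
fare_median = median(fares)
print(f"mean fare = {fare_mean}, median fare = {fare_median}")

def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are two passenger groups,
# columns are (survived, did not survive).
observed = [[30, 10],
            [20, 40]]
chi2 = chi_squared(observed)
print(f"chi-squared = {chi2:.3f} on 1 degree of freedom")
```

With skewed data the mean sits well above the median, which is why the median is often the better summary for a variable like fare.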
Research #5 - Psychological
Logistic regression is used to predict a categorical (usually dichotomous) variable from a set of predictor variables.
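As a rough illustration of the idea, the sketch below fits a one-predictor logistic regression by plain gradient descent on the average log-loss. The data are invented, and the learning rate and number of steps are arbitrary choices. Statistical packages such as SPSS use a different fitting routine and also report standard errors and significance tests, which this sketch does not.

```python
from math import exp

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # predicted probability minus outcome
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Hypothetical data: one continuous predictor and a dichotomous (0/1) outcome.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
p_low = sigmoid(b0 + b1 * 0.5)
p_high = sigmoid(b0 + b1 * 4.0)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"P(y=1 | x=0.5) = {p_low:.2f}, P(y=1 | x=4.0) = {p_high:.2f}")
```

A positive b1 means that higher values of the predictor go with a higher predicted probability of the outcome coded 1.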
As an example of the use of logistic regression in psychological research, consider the research done by Wuensch and Poteat and published in the Journal of Social Behavior and Personality, 1998, 13, 139-150.
Research #6 - Financial Distress
This data set deals with financial distress prediction for a sample of companies.
Which features are most indicative of financial distress?
What types of machine learning models perform best on this dataset?
Research #7 - Pizza Store Evaluation
PizzaOL is a new pizzeria in the city that has sold pizza exclusively online through a mobile app for more than 2 years. Customers can pay via e-payment methods such as credit card or PizzaPay. PizzaOL's owner has appointed you to recommend key indicators for assessing PizzaOL's performance and to build its strategy for the next year.
PizzaOL's owner has assigned this task to you to help her build a more objective understanding of her customers. She also wants to plan for next year and allocate the budget optimally. To do so, you will need to perform several data analysis tasks (see sheet Requirements) to capture the effect of different factors and variables on other key indicators.
Capstone Project Employee Attrition
This project is based on a hypothetical dataset downloaded from IBM HR Analytics Employee Attrition & Performance. It has 1,470 data points (rows) and 35 features (columns) describing each employee's background and characteristics, and each row is labelled (for supervised learning) with whether the employee is still at the company or has left to work elsewhere. Machine learning models can help to understand how these factors relate to workforce attrition.
Perform exploratory data analysis to find patterns and to identify the factors most responsible for attrition.
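One simple way to start this kind of exploration is to compare the attrition rate across the levels of a single feature. The sketch below uses only the Python standard library. The eight records are invented, and the column names "OverTime" and "Attrition" are assumptions about how the data are labelled, so adjust them to match the actual file.

```python
from collections import defaultdict

def attrition_rate_by(records, feature, label="Attrition"):
    """Share of employees who left, within each level of one feature."""
    counts = defaultdict(lambda: [0, 0])  # level -> [leavers, total]
    for row in records:
        counts[row[feature]][1] += 1
        if row[label] == "Yes":
            counts[row[feature]][0] += 1
    return {level: left / total for level, (left, total) in counts.items()}

# Hypothetical records; the real dataset has 1,470 rows and 35 columns.
records = [
    {"OverTime": "Yes", "Attrition": "Yes"},
    {"OverTime": "Yes", "Attrition": "Yes"},
    {"OverTime": "Yes", "Attrition": "No"},
    {"OverTime": "Yes", "Attrition": "Yes"},
    {"OverTime": "No", "Attrition": "No"},
    {"OverTime": "No", "Attrition": "No"},
    {"OverTime": "No", "Attrition": "Yes"},
    {"OverTime": "No", "Attrition": "No"},
]

rates = attrition_rate_by(records, "OverTime")
for level, rate in sorted(rates.items()):
    print(f"OverTime = {level}: attrition rate = {rate:.0%}")
```

Repeating the same comparison for each feature gives a first, rough ranking of which factors go with attrition, which a model can then test more carefully.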