Goal 3
Test of hypothesis
One of the more difficult skills in data analysis is deciding which statistical models and tests to use in a particular situation.
The choice of statistical model/test is affected by two things:
- The kind of question we are asking.
- The nature of data we have:
  - what type of variables: ratio, interval, ordinal or nominal?
  - are the assumptions of a particular model or test satisfied by the data?
The schematic key (below) provides an overview of the statistical models and tests we’ve covered in this book. The choices in the key are determined by a combination of the type of question being asked and the nature of the data under consideration.
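The logic of such a key can be sketched as a small function. This is a simplified illustration, not the book's actual key: the function name `suggest_test` and its parameters are hypothetical, and the Wilcoxon signed-rank and Friedman tests are standard non-parametric counterparts that are not listed in this section.

```python
def suggest_test(outcome, groups, paired=False, normal=True):
    """Suggest a test from a simplified key (illustrative only).

    outcome: "count" or "continuous"
    groups:  number of groups or conditions being compared (2 or more)
    paired:  True if the same subjects are measured in every condition
    normal:  True if the normality assumption looks reasonable
    """
    if groups < 2:
        raise ValueError("this sketch only covers comparisons of 2+ groups")
    # Count outcomes (e.g. number of awards) point towards a count model
    if outcome == "count":
        return "Poisson regression"
    if groups == 2:
        if paired:
            return "paired samples t-test" if normal else "Wilcoxon signed-rank test"
        return "two samples t-test" if normal else "Mann-Whitney U test"
    # Three or more groups or conditions
    if paired:
        return "repeated measures ANOVA" if normal else "Friedman test"
    return "one-way ANOVA" if normal else "Kruskal-Wallis test"


print(suggest_test("continuous", 2, paired=True))
print(suggest_test("continuous", 3, normal=False))
```

The two arguments `outcome` and `groups` stand in for "the kind of question", while `paired` and `normal` stand in for "the nature of the data"; a real key would have more branches.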
Test of normality - Awards example
This data set records the number of awards earned by students at one high school. Predictors of the number of awards earned include the type of program in which the student was enrolled (e.g., vocational, general or academic) and the score on their final exam in math. The data set is used to show an example of Poisson regression: the predicted variable is the number of awards, and the predictors are the program type and the math score.
Suggested tests
Descriptive statistics
Skewed data
Kruskal-Wallis test
Poisson regression
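A first look at count data such as the number of awards usually starts with descriptive statistics and a skewness check. The sketch below uses only Python's standard library. The values in `awards` are made up for illustration and are NOT the real data set, and `describe_counts` is a hypothetical helper name. A clearly positive skewness suggests a long right tail, which argues for a rank-based test such as Kruskal-Wallis over one that assumes normality. A Poisson model also assumes the mean and variance are roughly equal, so a variance well above the mean is worth noting.

```python
import statistics


def describe_counts(counts):
    """Return n, mean, sample variance and moment-based skewness of counts."""
    n = len(counts)
    mean = statistics.fmean(counts)
    variance = statistics.variance(counts)  # sample variance (n - 1 denominator)
    # Second and third central moments (n denominator) for the skewness g1
    m2 = sum((x - mean) ** 2 for x in counts) / n
    m3 = sum((x - mean) ** 3 for x in counts) / n
    skewness = m3 / m2 ** 1.5
    return {"n": n, "mean": mean, "variance": variance, "skewness": skewness}


# Made-up award counts for illustration only; NOT the real data set
awards = [0, 0, 0, 0, 1, 0, 1, 0, 2, 0, 1, 3, 0, 0, 5]
summary = describe_counts(awards)
print(summary)
```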
T-test - Cholesterol example
A study tested whether cholesterol was reduced after using a certain brand of margarine as part of a low fat, low cholesterol diet. The subjects consumed on average 2.31g of the active ingredient, stanol ester, a day. This data set contains information on 18 people using margarine to reduce cholesterol over three time points. The data set can be used to demonstrate paired t-tests, repeated measures ANOVA and a mixed between-within ANOVA using the final variable ‘Margarine’. The data set is also good for discussing meaningful differences, as the difference between weeks 4 and 8 is very small but statistically significant.
Suggested tests
Descriptive statistics
Recoding and computing new variables
T-tests
ANOVA: within groups & repeated measures
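The paired t-test works on the per-subject differences between two time points: the statistic is the mean difference divided by its standard error. The sketch below computes it with the standard library only. The readings are made up for six hypothetical subjects and are NOT the real cholesterol data, and `paired_t` is a hypothetical helper name. In practice a function such as `scipy.stats.ttest_rel` would also return the p-value.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired samples t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # One difference per subject: after minus before
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up cholesterol readings for 6 hypothetical subjects; NOT the real data
before = [6.4, 6.8, 6.0, 5.7, 7.1, 6.3]
week4 = [5.9, 6.2, 5.8, 5.3, 6.5, 5.9]
t, df = paired_t(before, week4)
print(f"t = {t:.2f}, df = {df}")
```

Because the denominator is the spread of the differences, a small but very consistent change can still give a large t statistic, which is why a statistically significant difference is not always a meaningful one.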
Mann-Whitney U test - Crime example
This data set gives a variety of variables by US state at two time points 10 years apart. A variety of regressions and t-tests can be carried out, with the main scale dependent variable being Crime Rate (offences per million population) and, for t-tests, the independent variable being whether or not the state is in the south. Most variables are discrete, as they measure populations per 1000; some are continuous, such as those measuring expenditure.
Suggested tests
Descriptive statistics
Skewed data
Recoding and computing new variables
T-tests
Mann-Whitney U test
Scatterplots and correlation
Simple linear and multiple regression
Chi-squared test
Cluster analysis
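The Mann-Whitney U test compares two independent groups, such as southern and non-southern states, without assuming normality. Its statistic counts how often a value from one group exceeds a value from the other, with ties counted as one half. The sketch below computes U directly from that definition. The crime rates are made up and are NOT taken from the data set, and `mann_whitney_u` is a hypothetical helper name. For a p-value, a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples x and y."""
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0  # x value beats y value
            elif xi == yj:
                u_x += 0.5  # ties count as half
    # The two U statistics always sum to len(x) * len(y)
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Made-up crime rates for hypothetical states; NOT the real data
south = [120.5, 98.2, 143.0, 110.7]
non_south = [85.1, 92.4, 101.3, 79.9, 88.0]
u_south, u_other = mann_whitney_u(south, non_south)
print(u_south, u_other)  # prints 19.0 1.0
```

A U value close to 0 or to the product of the two sample sizes indicates that one group tends to have larger values than the other.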
Paired samples t-test - students volunteer example
A teacher developed 3 exams for the same course. He needs to know if they're equally difficult, so he asks his students to complete all 3 exams in random order. Only 19 students volunteer. Their data, partly shown below, are in compare-exams.sav, which holds the number of correct answers for each student on all 3 exams.
Suggested tests
Descriptive statistics
Skewed data
One-sample t-test
Mann-Whitney U test
Two samples t-test
Paired samples t-test
Research #3
Absenteeism is a major expense to most organizations. Getting a handle on it, predicting it and affecting it is important for organizations. This dataset is provided for HR data scientists to practice on.
Inspiration
1- Describe the data set
2- Identify the features most related to absenteeism
3- Report the analysis