
PREPARING THE DATA FOR ANALYSIS
OUTLIER TREATMENT
Outliers are extreme values that can violate the assumptions of parametric models. It is important to investigate why such extreme values occur in a variable; however, since our data is secondary, we are not able to do so.
The presence of outliers in each variable can be detected using a boxplot or a histogram. For our analysis, we used boxplots to identify the variables with outliers. We decided to treat the outliers by winsorizing, i.e. replacing each outlier with a benchmark value. We used the following benchmarks to detect and winsorize the outliers:
- Lower Benchmark: 1st Quartile - 1.5 * Interquartile Range
- Upper Benchmark: 3rd Quartile + 1.5 * Interquartile Range
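As an illustration of how these two benchmarks work, here is a minimal sketch in R. The helper name winsorize_iqr and the example vector x are hypothetical and are not part of our data set or our analysis code. The sketch also assumes the default quartile definition of quantile(), which can differ slightly from the hinges that boxplot() draws.

```r
# Hypothetical helper (illustration only): cap values that fall
# outside the lower and upper benchmarks defined above.
winsorize_iqr <- function(x) {
  # 1st and 3rd quartiles, using quantile()'s default definition (assumption)
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr  # Lower Benchmark
  upper <- q[2] + 1.5 * iqr  # Upper Benchmark
  # Values below 'lower' become 'lower'; values above 'upper' become 'upper'
  pmin(pmax(x, lower), upper)
}

# Made-up example vector with one extreme value (95)
x <- c(10, 12, 13, 14, 15, 16, 18, 95)
x_w <- winsorize_iqr(x)
print(x_w)
```

In this made-up example only the extreme value is changed: it is pulled down to the upper benchmark, while the remaining values are left as they are.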
However, we found that there are no outliers below the lower benchmark for any of the variables. We also identified that variable nos. 14, 17, 19, 20 and 21 have outlier values. The following R code corrects them.
