1. V

    Can I remove these outliers?

    Hi, is it acceptable if I remove the outliers with charges above 55k for this regression analysis? Or is there any other option to minimize their impact in the model? Thank you
  2. A

    Can I exclude outliers when calculating mean or standard deviation (small-sample)?

    Hey, I am analyzing my results for a high-school biology paper. My data set consists of 20 groups in total, each having 4 repeats, so each is a small sample. I have a few (4) outliers that have a value lower than the control group (which is practically impossible and thus certainly the effect of...
  3. K

    SigmaPlot Boxplots

    Hi all, New here and hoping someone might be able to help (with what is a potentially stupid/obvious question--so apologies if that is the case). I am trying to make some boxplots on SigmaPlot, however I have realized the calculation they use to compute the quartiles, as well as the whisker...
  4. A

    Am I dealing with outliers, or something else (skewness of 106)?

    So I have not exactly a homework problem, but I just discovered how fun statistical modelling is, and usually I use already clean datasets. However, I am dealing with a credit default dataset that a lecturer showed me as a challenge. I want to do a logistic regression, a random forest, and XGBoost...
  5. M

    How many standard deviations to determine outliers

    Are outliers more than two or three standard deviations from the mean?
  6. O

    Appropriate use of z-scores

    I was thinking about z-scores and I'm curious about their usage when data are skewed/non-normal. I often see z-scores being used to identify outliers, e.g. with z > 1.96, 2.58, etc. However, the z-score calculation z = (x - mean(x)) / stdev(x) is dependent on the mean, and the mean is not an...
  7. A

    Quick Cronbach's Alpha Question

    Hi all, This may seem like a silly question, but I was wondering if I should remove outliers prior to running reliability analysis or run the tests with outliers included. I am hoping to get Cronbach's alpha values for three questionnaires that I used in my research.
  8. 1

    Winsorizing when I have different size groups

    I have a sample that consists of large, medium, and small firms and I want to run a separate regression for each size group. When I winsorize a variable, should I do it for the whole sample (i.e. select the variable in the whole sample) or for each size group alone (i.e. select the large firm...
  9. A

    LOF : Local Outlier Factor

    I have a few questions about LOF: http://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf 1. I don't understand the k-distance(o) term inside the reach-dist definition: reach-dist_k(p, o) = max { k-distance(o), d(p, o) }. 2. In my opinion, before I use LOF, I have to scale (scale function in R) and...
  10. L

    Outliers, winsorising and t-test

    I have two groups of 25 and 27 people. I want to compare them on a sexual activity score that varies from 0 to 9. My distribution is allegedly normal, and the skewness and kurtosis are within the accepted range, but most people have answers like 0 or 1. I have three people with a score of 4, and one with 6. Using z...
  11. S

    K-means cluster for skewed dataset

    Hi, I have 7 columns (variables), and their percentiles are shown below for 14K rows. I've tried to create k-means clusters for the 14K observations of 7 variables. A-G are products; the numbers are turnover for A-G. If you look at the table, a massive part of the dataset has no turnover for all...
  12. N

    Removing outliers - endless story?

    I've been collecting some data for the first time, investigating differences between three groups of people and their numbers of hospital visits. Before running any tests (probably ANCOVA with a few covariates), I have made a box-plot in SPSS to visualize the data. There are many outliers, which...
  13. J

    increasing the statistical significance of outliers

    Can someone give me some assistance on the use case below? Summary: To create a Key Performance Indicator (as a percent of server utilization where 0% is the best and 100% the worst) that will alert a user when one or more servers in a 30 server environment needs attention. Details: Each...
  14. R

    Durbin Watson related with outliers?

    I am trying to run a multiple regression with sample data from a cohort of 500 subjects. My dependent variable is a psychological score (scale); the independent variables are age (scale), sex (nominal) and education (scale). I ran the test and Durbin-Watson was 1.99, which is good, but I had 2...
  15. G

    winsorisation of a numerical scale from 1 to 23 assigned to the credit rating

    Dear Members, I am preparing a regression model and my supervisor told me to winsorise all the variables. However, among the variables I have a numerical scale from 1 to 23 assigned to the ratings of the 3 major rating agencies. Do you think it makes sense to winsorise it? Can the values of 1 and...
  16. A


    Hello there, I have a data set of 5846 observations, out of which 15 observations are outliers. I need to perform multiple linear regression, which as far as I know is highly sensitive to outliers. Do I have to filter those 15 outliers out, or will they not mess up my analysis, since the number of...
  17. M

    Detecting outliers non parametric data

    Hi, what is the best (and easiest) way to detect outliers in SPSS for skewed, not exactly normally distributed data? I mainly need it for logistic regression, and I assume that even though logistic regression does not assume normality, I still need to make sure that there are no extreme outliers. Thanks a lot
  18. G

    Assessing outliers in a proportion

    Hi all, Just wondering how one would assess outliers in the following: At the start of a two week period, people were asked to nominate a number of good deeds that they would perform in the next two weeks (N). At the end of the two weeks, they report how many they actually perform (C)...
  19. O

    outliers in survival probability

    Hi all, We use survival data of 90 patients that were divided into two subgroups (31 and 59). Two patients (from the subgroup of 31) survived 50% longer than all others. Are they considered outliers? What are the criteria to define outliers? Thanks for your help, Or :D
  20. I

    Fix holes in the sales history for forecast preparation

    I have to forecast roughly 700 products for my company and I am planning on using some kind of smoothing method like Holt-Winters. The problem I have is that the history has "holes" in it, meaning there are dips in the sales history due to out-of-stock situations. Before I can calculate a...
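
For readers following thread 6 ("Appropriate use of z-scores"), the formula quoted in that post, z = (x - mean(x)) / stdev(x), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `z_scores` and `flag_outliers` and the small skewed sample are hypothetical, and the cutoffs are the ones the post mentions, not a recommendation.

```python
from statistics import mean, stdev

def z_scores(xs):
    """z = (x - mean(x)) / stdev(x), using the sample standard deviation."""
    m = mean(xs)
    s = stdev(xs)
    return [(x - m) / s for x in xs]

def flag_outliers(xs, threshold=1.96):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [x for x, z in zip(xs, z_scores(xs)) if abs(z) > threshold]

# Hypothetical right-skewed sample with one extreme value.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

print(flag_outliers(data))        # cutoff 1.96
print(flag_outliers(data, 2.58))  # cutoff 2.58
print(flag_outliers(data, 3.0))   # cutoff 3
```

Because the extreme value itself enters both the mean and the standard deviation, the result depends on the cutoff: in this sample the value 40 is flagged at 1.96 and 2.58 but not at 3.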