# Outliers Parametric model

#### acp

##### New Member
I'm struggling to decide how to normalise my data for modeling.

I am dealing with crowdfunded projects, and a huge chunk of them (15%) raised only $0 to $10 and therefore failed. They produce a very strong positive skew that I have not been able to normalise (I tried log, z-scores, and cubic transformations), and they will not add value to my model.
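A minimal sketch of why such a cluster resists transformation: any strictly increasing transform keeps tied values tied, so a spike at the bottom of the range stays a spike. The data here are synthetic (15% exact zeros, log-normal otherwise), and the cube root is only an assumed stand-in for the "cubic" transform; none of this comes from the actual dataset.

```python
import math
import random

random.seed(0)  # synthetic data, for illustration only

# Hypothetical sample: about 15% of campaigns raise $0, the rest are log-normal.
n = 1000
funding = [0.0 if random.random() < 0.15 else random.lognormvariate(7, 1.5)
           for _ in range(n)]

def share_at_minimum(values):
    """Fraction of observations tied at the smallest value."""
    lowest = min(values)
    return sum(v == lowest for v in values) / len(values)

logged = [math.log1p(v) for v in funding]    # log(1 + x), defined at x = 0
cube_root = [v ** (1 / 3) for v in funding]  # assumed meaning of "cubic"

raw_share = share_at_minimum(funding)
log_share = share_at_minimum(logged)
cbrt_share = share_at_minimum(cube_root)

# The spike at the minimum has the same size before and after transforming.
print(raw_share, log_share, cbrt_share)
```

The transforms reshape the rest of the distribution, but the point mass itself cannot be spread out by them.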

Therefore I decided to remove them, although these outliers are valid.

I tried Winsorizing them but suspected that was wrong, so I concluded that my only option is to trim them by dropping the top and bottom 10% of values.
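To make the difference between the two options concrete, here is a small sketch in plain Python. The helper names `winsorize` and `trim` and the sample values are hypothetical, and the cut points use a simple sorted-index rule rather than any particular library's percentile definition.

```python
def winsorize(values, proportion=0.10):
    """Clip the lowest and highest `proportion` of values to the cut points."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    low, high = ordered[k], ordered[len(ordered) - 1 - k]
    return [min(max(v, low), high) for v in values]

def trim(values, proportion=0.10):
    """Drop the lowest and highest `proportion` of values entirely."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return ordered[k:len(ordered) - k]

# Made-up funding amounts, for illustration only.
data = [0, 0, 5, 120, 300, 450, 800, 1500, 4000, 90000]

winsorized = winsorize(data)
trimmed = trim(data)
print(winsorized)  # [0, 0, 5, 120, 300, 450, 800, 1500, 4000, 4000]
print(trimmed)     # [0, 5, 120, 300, 450, 800, 1500, 4000]
```

Winsorizing keeps every observation but replaces the extremes, while trimming shrinks the sample. Both alter valid observations, which is the concern here.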

Is this approach correct, or is there a better method?
Perhaps a non-parametric model?

#### obh

##### Active Member
Please don't remove a correct outlier; that is a mistake.

Maybe there is a way to transform the data to normal. If you want, you can send example data.
Anyway, non-parametric tests are a good choice, since they make no normality assumption.
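As one illustration of a rank-based method with no normality assumption, here is a minimal sketch of Spearman's rank correlation in plain Python. The `backers` and `funding` values are made up, and the helper names are hypothetical; a statistics library would normally be used instead.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Ordinary product-moment correlation."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up values with a heavy right tail and ties at zero.
backers = [0, 0, 3, 12, 40, 150, 2000]
funding = [0, 0, 45, 300, 1200, 9000, 250000]

rho = spearman(backers, funding)
r = pearson(backers, funding)
print(rho, r)
```

Because only the ordering matters, the extreme campaign and the zeros stay in the data without dominating the result.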

#### acp

##### New Member
Hi, thanks for replying. That was my impression as well.

Here is the link to the data: https://ufile.io/vgxm1
It has 4 features of crowdfunding campaigns on Kickstarter: ID, Backers, Funding, Goal (numeric).

The data is raw, so I can post a Jupyter notebook tomorrow if that helps. However, it should be clean enough for a quick look through.

#### acp

##### New Member
Is it really unreasonable? My understanding is that they don't get the money unless they reach their goal, so setting it to be \$1 and being happy with whatever they raise might be something that makes sense to them.
That's a great observation, Dason. I didn't think of that.

Unfortunately for me this puts trimming out of the question. Any ideas on how to proceed?

Thanks

#### Dason

Can you post a sample of the data and say what your goal/question is?

#### acp

##### New Member
I can only upload this in txt format, and it's quite a small sample. The full thing can be found here as a csv: https://ufile.io/vgxm1

The goal is to construct a regression model to predict the funding of campaigns based on the other input variables.
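One possible direction, not something proposed in this thread, is a robust non-parametric fit such as the Theil-Sen estimator, which takes the median of all pairwise slopes. The sketch below uses a single predictor and made-up numbers; `theil_sen` is a hypothetical helper, and a real model would need all input variables.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Median of pairwise slopes, then the median residual as intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]  # skip pairs with identical x
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

# Made-up data: six campaigns on the line funding = 30 * backers,
# plus one valid but extreme campaign that is kept in the sample.
backers = [0, 2, 5, 10, 20, 40, 80]
funding = [0, 60, 150, 300, 600, 1200, 50000]

slope, intercept = theil_sen(backers, funding)
print(slope, intercept)  # 30.0 0.0
```

The extreme campaign stays in the data, yet the fitted line follows the bulk of the points rather than being pulled toward it.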

Even some pointers for non-parametric models or outlier detection would be useful, since it seems I spent a lot of time doing the wrong thing.
So far I have trimmed, removed outliers more than 2 standard deviations from the mean, and added a constant so as to change the 1s and be able to use log...
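For the outlier-detection part, a rough sketch comparing the 2-standard-deviation rule with a median/MAD rule may help show how differently they behave on skewed data. The values are made up, the helper names are hypothetical, and the 3.5 cutoff with the 0.6745 scaling is a commonly quoted convention rather than a rule from this thread. Flagging a value only marks it for inspection; it is not a reason to delete it.

```python
from statistics import mean, median, pstdev

def flag_two_sd(values):
    """Values more than 2 population standard deviations from the mean."""
    m, s = mean(values), pstdev(values)
    return [v for v in values if abs(v - m) > 2 * s]

def flag_mad(values, threshold=3.5):
    """Values whose modified z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # more than half the values are identical; rule is undefined
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Made-up funding amounts, for illustration only.
funding = [0, 0, 5, 120, 300, 450, 800, 1500, 4000, 90000]

sd_flags = flag_two_sd(funding)
mad_flags = flag_mad(funding)
print(sd_flags)   # [90000]
print(mad_flags)  # [4000, 90000]
```

The single largest value inflates both the mean and the standard deviation, so the 2-SD rule's own threshold moves with the very points it is meant to detect.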

I'm new to this stuff so thanks for the patience and interest!
