# Replacing or removing data

#### russel2000

##### New Member
Hi guys, I need your help. I have quite a large database. It contains the names of companies (observations) and their financial ratios. As you know, a database of financial ratios has too many outliers: for example, one company has ROA = 6 % and another has ROA = 500 %.

So I need to either replace these outliers with another number (for example, the value of the 95th percentile) or remove them. Which is better?

And how can I remove or replace these outliers in SAS? I have maybe 20 financial ratios and I need to do this for each one.

Thank you very much for your suggestions.

#### Con-Tester

##### Member
To extract a subset that includes only those ROA values that fall within a given range, the following SAS code can be used:
Code:
data <subset_name>;
set <all_set_name>;
where (<min_value> <= ROA <= <max_value>);
run;
To create a dataset that adjusts ROA values to fall within a given range, the following SAS code can be used:
Code:
data <subset_name>;
set <all_set_name>;
/* SAS treats a missing value as smaller than any number, so skip missing ROA explicitly. */
if (not missing(ROA) and ROA < <min_value>) then
ROA = <min_value>;
else if (ROA > <max_value>) then
ROA = <max_value>;
run;
In the above code snippets, the correct library+dataset name specifications and values must of course be substituted for “<subset_name>”, “<all_set_name>”, “<min_value>” and “<max_value>”.
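Since the question mentions percentile limits and about 20 ratios, here is a rough sketch of how the second snippet could be extended. It is only one possible approach, and the names are assumptions for illustration: `work.ratios` as the input dataset, `ROA`, `ROE` and `ROS` as three of the ratio variables, and the 5th and 95th percentiles as the limits. PROC MEANS computes the limits, and a data step with arrays then caps each ratio.

```sas
/* Assumed names: work.ratios is the input dataset; ROA, ROE and ROS
   stand in for the full list of ratio variables. */
%let ratios = ROA ROE ROS;

/* Step 1: compute the 5th and 95th percentile of each ratio.
   AUTONAME names the results ROA_P5, ROA_P95, ROE_P5, and so on. */
proc means data=work.ratios noprint;
    var &ratios;
    output out=work.limits p5= p95= / autoname;
run;

/* Step 2: read the one-row limits dataset once (its values are retained),
   then cap every ratio at its own limits. */
data work.ratios_capped;
    if _n_ = 1 then set work.limits(drop=_type_ _freq_);
    set work.ratios;
    array val  {*} &ratios;
    array p_lo {*} ROA_P5  ROE_P5  ROS_P5;
    array p_hi {*} ROA_P95 ROE_P95 ROS_P95;
    do i = 1 to dim(val);
        /* Leave missing values alone; cap everything else. */
        if not missing(val{i}) then
            val{i} = min(max(val{i}, p_lo{i}), p_hi{i});
    end;
    drop i ROA_P5 ROE_P5 ROS_P5 ROA_P95 ROE_P95 ROS_P95;
run;
```

The three arrays must list the variables in the same order. For the full set of ratios, only the `%let` line and the two percentile arrays need to be extended.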

To decide which strategy is the better one, it would help to know how the ROA values are distributed, and also what you wish to do with them (for example, building a given type of model versus generating a simple summary report).
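As a starting point for looking at the distribution, something like the following could be used. It is a minimal sketch: the dataset name `work.ratios` and the identifier variable `company` are assumed for illustration and need to be replaced with the real names.

```sas
/* Assumed names: work.ratios (dataset) and company (company name variable). */
proc univariate data=work.ratios;
    var ROA;
    id company;      /* labels the extreme observations with the company name */
    histogram ROA;   /* graphical view of the distribution */
run;
```

The default output includes a quantiles table and the most extreme observations, which should show whether values such as ROA = 500 % are isolated cases or part of a long tail.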