Data cleaning process - descriptive analysis

#1
I'm trying to clean a large dataset (3200 variables, 108000 subjects). Some variables holding anthropometric measurements contain cells with values like " <190" or " >=85", or text such as " overweight". Could you advise me on the best way to treat these cases in a descriptive analysis? For example, if the variable of interest measures height and 100 cells contain the value " <150cm", what is the better thing to do? How can I calculate the mean?
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
I would report the count per category along with the percentage. I am guessing the damage is done and you can't get back the continuous values, if they ever existed.
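As a rough illustration of the count-and-percentage idea, here is a minimal Python sketch. The regular expressions, the category labels, the function names `classify` and `summarize`, and the sample `height` column are all assumptions made up for this example; real data will likely need its own patterns.

```python
import re
from collections import Counter

# Censored entries such as " <190", ">=85" or "<150cm" (unit suffix optional).
CENSORED = re.compile(r"^\s*(<=|>=|<|>)\s*(\d+(?:\.\d+)?)\s*[a-zA-Z]*\s*$")
# Plain numeric entries such as "172" or "165.5cm".
NUMERIC = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*[a-zA-Z]*\s*$")


def classify(cell):
    """Return a category label for one raw cell value."""
    if cell is None or not str(cell).strip():
        return "missing"
    text = str(cell)
    match = CENSORED.match(text)
    if match:
        # Keep operator and bound together, e.g. "<150".
        return f"{match.group(1)}{match.group(2)}"
    if NUMERIC.match(text):
        return "exact value"
    # Anything else is treated as free text, e.g. "overweight".
    return f"text: {text.strip().lower()}"


def summarize(cells):
    """Return (label, count, percentage) tuples, most frequent first."""
    counts = Counter(classify(c) for c in cells)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(label, n, round(100 * n / total, 1))
            for label, n in counts.most_common()]


# Hypothetical height column, for illustration only.
height = ["172", " <150cm", "165.5", " <150cm", " overweight", "", "181"]
for label, n, pct in summarize(height):
    print(f"{label:20s} {n:3d} {pct:5.1f}%")
```

A mean computed only from the "exact value" cells would ignore the censored ones, so it should be reported next to this table rather than in place of it.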
 
#3
Data cleaning process
Keep a record and look at trends of where most errors are coming from, as this will make it a lot easier to identify and fix incorrect or corrupt data. This is especially important if you are integrating other solutions with your fleet management software, so that errors don’t clog up the work of other departments.
Standardize the point of entry and check how important each entry point is. By standardizing your data entry process you will ensure a clean point of entry and reduce the risk of duplication.
Validate the accuracy of your data once you have cleaned your existing database. Research and invest in data tools that allow you to clean your data in real-time. Some tools now even use AI or machine learning to better test for accuracy.
Identify duplicates, since this will help you save time when analyzing data. Duplication can be avoided by researching and investing in different data cleaning tools, as mentioned above, that can analyze raw data in bulk and automate the process for you.
After your data has been standardized, validated, and scrubbed for duplicates, use third-party sources to append it. Reliable third-party sources can capture information directly from first-party sites, then clean and compile the data to provide more complete information for business intelligence and analytics.
Communicate the new standardized cleaning process to your team. Now that you’ve scrubbed down your data, it’s important to keep it clean. This will help you develop and strengthen your customer segmentation and send more targeted information to customers and prospects, so you want to make sure you get your team in line with it.
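The standardize-then-deduplicate steps above could be sketched in Python roughly as follows. The normalization rule (trim, collapse whitespace, lowercase), the function names, and the sample entries are assumptions for illustration; they are not taken from any particular tool.

```python
from collections import defaultdict


def standardize(value):
    """Trim, collapse internal whitespace and lowercase a raw entry."""
    return " ".join(str(value).split()).lower()


def find_duplicates(records):
    """Group row indices whose standardized values are identical."""
    groups = defaultdict(list)
    for index, value in enumerate(records):
        groups[standardize(value)].append(index)
    # Keep only keys that occur more than once.
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Hypothetical entries, for illustration only.
names = ["Acme Ltd", " acme  ltd ", "Borealis Inc", "ACME LTD"]
duplicates = find_duplicates(names)
print(duplicates)
```

Reporting row indices rather than dropping rows straight away leaves the decision about which copy to keep to a person.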
https://rankifyweb.com