Help with correcting spatial bias and sampling effort


I've recently been carrying out a nationwide survey of tick distribution in the UK. I've collected data by questionnaire from vet practices over the last 8 months and am just about to start analysing my data, but I have come across a rather large problem.

I realise I have a rather large spatial bias in my data. I wanted to cover the whole of the UK randomly, but I have found that vet practices in towns and cities were far more likely to take part than those in the country, which don't have as many staff. Is there a way to correct for this?

Also, I have a problem with sampling effort. I originally asked all vet practices to fill in five questionnaires a week using randomly picked dogs. This hasn't quite gone to plan. Some vet practices have done this, others have sent in 3 one week and 12 the next, and others have only sent a questionnaire in when they have found a tick on a dog (I want negative results as well as positive ones, as I want to create a distribution model). Is there any possible way to correct for this?

My statistical knowledge is quite poor, but any help would be fantastic.




TS Contributor
Some recommendations

PROBLEM No. 1: Representation

You should correct for this using sampling weights, which you should have calculated from the sampling probabilities assigned to each vet practice (typically the inverse of each practice's probability of being sampled). These weights can be thought of as a measure of how many individuals each observation represents. This would be the best option.

If you can't obtain the sampling weights, then you could create some "importance weights" for your data. These are weights that indicate the "importance" of an observation in some vague sense (for instance, if you think that you only reached half of the country vets, you may give each country vet a weight of 2). This is not a formal procedure, so you must be careful when assigning values to each observation.
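To make the "weight of 2" idea concrete, here is a minimal Python sketch. It assumes you can split practices into two groups ("urban" and "rural") and that you know roughly how many you contacted and how many replied in each group. All the numbers and the function names are made up for illustration; they are not from your survey.

```python
def importance_weights(contacted, responded):
    """Weight per group = practices contacted / practices that responded."""
    return {group: contacted[group] / responded[group] for group in contacted}


def weighted_prevalence(records, weights):
    """Weighted share of positive records.

    records: list of (group, tick_found) pairs, with tick_found equal to 0 or 1.
    """
    total = sum(weights[group] for group, _ in records)
    positives = sum(weights[group] * found for group, found in records)
    return positives / total


# Hypothetical counts: every urban practice replied, only half the rural ones did.
contacted = {"urban": 100, "rural": 100}
responded = {"urban": 100, "rural": 50}
weights = importance_weights(contacted, responded)

# Hypothetical questionnaires: (group, 1 if a tick was found else 0).
records = [
    ("urban", 0), ("urban", 0), ("urban", 1), ("urban", 0),
    ("rural", 1), ("rural", 1),
]

print(weights)
print(weighted_prevalence(records, weights))
```

With these made-up numbers each rural record counts twice as much as an urban one, so the weighted proportion of positives comes out higher than the raw 3 out of 6. How you define the groups is a judgment call, which is why this kind of weighting needs care.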

PROBLEM No. 2: Missing or Incomplete Data

This is a common problem in every survey. Usually, statisticians take a larger sample than required so they can drop the useless or incomplete cases. When that is not possible, the alternative is an imputation method for missing data. This is a bit hard if you don't have a statistical background, so you may need to get some help with that.
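Just to show what imputation means in its simplest form, here is a small Python sketch that fills each missing value with the mean of the observed ones. The data and the function name are hypothetical, and mean imputation is only the most basic approach: it understates the variability in the data, so treat this as an illustration rather than a recommendation for your analysis.

```python
def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical weekly measurements from one practice; None marks a missing week.
weekly_values = [5, None, 4, 6, None]
filled = mean_impute(weekly_values)
print(filled)
```

More careful methods, such as multiple imputation, exist, and that is where help from a statistician would be worthwhile.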

Good Luck