Calculate type 2 error using one data file

med

New Member
#1
I am a bit confused about how to calculate type 2 error to check whether the sample data I am using is sufficient. I have a data file that I have used to build a machine learning model. This data file consists of 500 entries describing entities and the funding these entities have received. The ML model uses this information to predict whether an entity will receive funding in the future or not.

Now I want to calculate type 2 error. All the courses and tutorials I have read online talk about two hypotheses, usually tied to two experiments or tests done over different periods of time or in different setups. But for this project I was asked to calculate type 2 error using one data file. I managed to calculate the lower and upper bounds under the null hypothesis, but then I got stuck because I don't have the expected mean under the alternative hypothesis. I actually don't know what the alternative hypothesis should be in this case.

My questions are: Is it possible to calculate type 2 error in this case? Can I use the 500 entries to build the ML model and consider this the null hypothesis, then get another dataset that contains the same information but describes different entities and use it as the alternative hypothesis (this dataset will not necessarily differ in terms of time or any other parameter)?

Thanks.
 

hlsmith

Omega Contributor
#2
Historically, ML has been used more for prediction than for inference, which is partly what you are running up against. What machine learning algorithm did you use, and is this solely a prediction use case? You could build an in-sample confusion matrix, but it will underestimate your type 2 error (the false negative rate), since the same data was used both to build the model and to test it.
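
To illustrate the in-sample point, here is a minimal sketch. Everything in it is an assumption for illustration only: the data are synthetic (500 rows with one made-up feature and a noisy "funded" label), and the model is a toy 1-nearest-neighbour classifier standing in for whatever algorithm you actually used. It compares the false negative rate measured on the training rows with the rate measured on rows that were held out.

```python
import random

def false_negative_rate(y_true, y_pred):
    """Share of actual positives that were predicted as negative (type 2 error rate)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fn / (fn + tp) if (fn + tp) else float("nan")

def predict_1nn(train_X, train_y, x):
    """Toy stand-in model: return the label of the closest training point."""
    nearest = min(range(len(train_X)), key=lambda i: (train_X[i] - x) ** 2)
    return train_y[nearest]

random.seed(0)

# Synthetic stand-in for the 500 entries (assumption, not real funding data).
X = [random.gauss(0, 1) for _ in range(500)]
y = [1 if x + random.gauss(0, 1) > 0 else 0 for x in X]

# Hold out the last 150 rows; the model only ever sees the first 350.
train_X, train_y = X[:350], y[:350]
test_X, test_y = X[350:], y[350:]

# In-sample: predict the same rows the model was built from.
in_sample_pred = [predict_1nn(train_X, train_y, x) for x in train_X]
in_sample_fnr = false_negative_rate(train_y, in_sample_pred)

# Held-out: predict rows the model has never seen.
held_out_pred = [predict_1nn(train_X, train_y, x) for x in test_X]
held_out_fnr = false_negative_rate(test_y, held_out_pred)

print("in-sample false negative rate:", in_sample_fnr)
print("held-out false negative rate:", held_out_fnr)
```

With this toy model every training row is its own nearest neighbour, so the in-sample rate comes out as zero, while the held-out rate does not. A real model will usually show a smaller gap, but the direction of the bias is the same, which is why a train/test split or cross-validation is the usual way to estimate the false negative rate from a single data file.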