Biometric analysis help


I'm attempting to identify a subject by analyzing a number of their characteristics. For any given user, I have anywhere from four to twenty data sets (A, B, C, D, etc.), each containing any number of integers (but all of the same length): A = {213, 456, 9802...}, B = {804, 4329, 903...}, and so on. These data sets are obtained when the user is registered into the program.

Then, to log in, the same characteristics are measured once each; for example, on one attempt to log in, the computer might get the data set {A=123, B=234, ...etc}.

My question: how can I determine whether the user attempting to log in is the same as the one who was registered?

This assumes that the characteristics stay about the same between registration and identification; I have tested this, and it holds.

My current method is, for each list (A, B, C...), to test whether the login value is within one standard deviation of the mean of that list's values. If this is true for every list, the user is accepted; if any value falls outside one standard deviation, the user is rejected.
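For concreteness, here is a minimal Python sketch of the rule described above. The function names, the dictionary layout, and the numbers are made up for illustration; they are not taken from the actual program.

```python
from statistics import mean, stdev

def within_k_sd(enrolled, value, k=1.0):
    """True if value lies within k sample standard deviations of the enrolled mean."""
    return abs(value - mean(enrolled)) <= k * stdev(enrolled)

def accept(enrollment, attempt, k=1.0):
    """Accept only if every characteristic passes the k-standard-deviation test."""
    return all(within_k_sd(enrollment[name], attempt[name], k) for name in enrollment)

# Hypothetical enrollment data: one list of registered values per characteristic.
enrollment = {
    "A": [213, 220, 208, 215, 211],
    "B": [804, 790, 812, 799, 808],
}

print(accept(enrollment, {"A": 214, "B": 801}))  # both values close to their means
print(accept(enrollment, {"A": 214, "B": 900}))  # B is far from its mean
```

The parameter k is the threshold in standard deviations, so the same function covers the "1 or 2 standard deviations" question by changing one argument.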

What I am looking for is a way to determine:
1) The threshold, in standard deviations (e.g. 1 or 2), that the login value should be within
2) The percentage of failed lists that is considered acceptable before the user is rejected
such that the False Acceptance and False Rejection rates are lowest, probably based on the number and/or length of the lists.
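One way to see how these two settings interact is to compute the expected False Rejection rate for a genuine user. The Python sketch below assumes each characteristic is roughly normally distributed and that the characteristics are independent of one another; both are assumptions, not something established in this thread. It says nothing about False Acceptance, which depends on how other people's values are distributed and has to be measured from impostor attempts.

```python
from math import comb
from statistics import NormalDist

def pass_probability(k):
    """Probability that a normally distributed value lies within k standard deviations."""
    return 2 * NormalDist().cdf(k) - 1

def false_rejection_rate(n_lists, k, max_failures=0):
    """Chance a genuine user is rejected when up to max_failures lists may fail.

    Assumes independent, normally distributed characteristics (binomial model).
    """
    p = pass_probability(k)
    q = 1 - p
    p_accept = sum(
        comb(n_lists, f) * q**f * p**(n_lists - f)
        for f in range(max_failures + 1)
    )
    return 1 - p_accept

# Tabulate the False Rejection rate for 10 lists under a few settings.
for k in (1.0, 2.0, 3.0):
    for max_failures in (0, 1, 2):
        frr = false_rejection_rate(10, k, max_failures)
        print(f"k={k}, allowed failures={max_failures}: FRR={frr:.3f}")
```

Under these assumptions, requiring all of 10 lists to fall within one standard deviation rejects a genuine user roughly 98% of the time, which suggests that either a wider threshold or some tolerated failures is needed.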

I know programming, and I know biometrics, but I don't know statistics. It seems like there should be a simple formula for calculating this, and it also seems like my problem can be reduced to something much simpler, but I lack the statistics experience necessary to figure out what these are. Thank you in advance for your help!
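You are comparing a log-in measurement with the mean of the same individual's registered measurements. Assuming the measurements are roughly normally distributed, about 95% of genuine values fall within 2 standard deviations of the mean (more precisely, Z = 1.96), so with a threshold of Z = 2 the genuine user passes about 95 times out of 100. If you want 99%, the threshold (Z) should be about 2.58 standard deviations.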
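The Z multiplier for any desired coverage can be computed from the standard normal distribution. A minimal Python sketch, assuming normally distributed measurements (the function name is just for illustration):

```python
from statistics import NormalDist

def two_sided_z(coverage):
    """Z such that a standard normal value lies within +/- Z with the given probability."""
    return NormalDist().inv_cdf(0.5 + coverage / 2)

for coverage in (0.68, 0.95, 0.99):
    print(f"{coverage:.0%} coverage -> Z = {two_sided_z(coverage):.3f}")
```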
Thank you for your response!

This would be true if there was only one data set per person; however, there is a list of data sets for each person. How many data points in the list would need to be within 2 standard deviations to validate the user as a whole?


Depending on the nature of your data, I would use supervised machine learning techniques.

For example, if you are recording a large number of different variables (say, eye color, height, weight, hair color, etc.), and the variables are the same in every data set, I would use a machine learning technique such as support vector machines or random forests.

In such a setting, your data is a large N x K matrix of N variables and K samples (usually N>>K), and the samples are grouped into categories. In your case, variables correspond to the subsequent integers in the data sets; the data sets are samples; the users are the categories.

The goal of the algorithm is to derive a function that can correctly assign a previously unknown sample (="data set") to the category (="user").

By training a machine learning algorithm, you can not only derive an optimal "formula" for making a decision based on your biometric data set, but also estimate the error rates (misclassifications) from the data you already have.
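To illustrate the idea of estimating the error rate from existing data, here is a small Python sketch using leave-one-out cross-validation. It uses a plain nearest-centroid classifier as a stand-in, not a support vector machine or random forest, and the users and numbers are invented for illustration.

```python
from math import dist
from statistics import mean

def centroids(samples):
    """Average feature vector per user, from (user, vector) pairs."""
    by_user = {}
    for user, vec in samples:
        by_user.setdefault(user, []).append(vec)
    return {u: [mean(col) for col in zip(*vecs)] for u, vecs in by_user.items()}

def predict(cents, vec):
    """Assign the sample to the user whose centroid is closest (Euclidean distance)."""
    return min(cents, key=lambda u: dist(cents[u], vec))

def leave_one_out_error(samples):
    """Fraction of samples misclassified when each is held out of training in turn."""
    errors = 0
    for i, (user, vec) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        if predict(centroids(training), vec) != user:
            errors += 1
    return errors / len(samples)

# Hypothetical data: each sample is one measurement of two characteristics.
samples = [
    ("alice", (213, 804)), ("alice", (220, 790)), ("alice", (208, 812)),
    ("bob", (300, 650)), ("bob", (310, 640)), ("bob", (295, 655)),
]

print(leave_one_out_error(samples))
```

Each sample is classified by a model that never saw it, so the resulting fraction is an estimate of how often a new sample would be assigned to the wrong user. In practice the features would probably need scaling first, since characteristics measured on larger numeric ranges dominate the Euclidean distance.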

In R, there are several packages that can do the job for you, including CMA and MLInterfaces, and of course the various packages for the individual methods.