trinker (11-11-2013)
After Dason explained what inf means I get what his equation is saying.
"If you torture the data long enough it will eventually confess."
-Ronald Harry Coase -
Jake had said in the chatbox that I need to be clearer about what I'm doing with all the individual distance measures. I am taking the mean of them. So from the example above:
Code:
A1 = overlap = 0
A2 = |391 - 339| = 52
A3 = |580 - 199| = 19
A4 = overlap = 0
A5 = overlap = 0
I then do (0 + 52 + 19 + 0 + 0)/5 = 14.2
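As a quick check of the arithmetic above, here is a minimal Python sketch. The name `a_distances` is just an illustrative label for the five per-interval distances; it does not come from the thread.

```python
from statistics import mean

# Distance from each code A interval to the nearest code B interval,
# copied from the example above (0 means the intervals overlap).
a_distances = [0, 52, 19, 0, 0]

# Mean distance for set A only.
mean_distance = mean(a_distances)
print(mean_distance)
```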
I tried to represent this as:
$\frac{1}{n_{a_i}}\sum{\max{\left \{ a_i-b_j,a_j-b_i, 0 \right \}}}$
But Jake suggests this may not be adequate.
"If you torture the data long enough it will eventually confess."
-Ronald Harry Coase -
Three comments:
1. The mean that you compute above is the mean distance for only the A set (not the B set). Maybe this is what you want, I don't know. My assumption had been that you wanted the mean distance over the entire dataset (i.e., over both the A and B sets). Maybe you can clarify.
2. I would just write $n_a$, not $n_{a_i}$.
3. The summation symbol is ambiguous. From looking at the formula alone it is not clear if you are summing over the $i$'s, the $j$'s, or over both (in which case there should really be two summation symbols).
In God we trust. All others must bring data.
~W. Edwards Deming
No, Jake, I want the information to be kept separate. Sorry I was not clear about this, but yes, I calculate it for only set A, and I will then reverse the process and calculate it for only set B.
Thanks. I wasn't sure about this and asked above, but in my flurry of questions it got lost.
Now that I made point one clearer (I hope) do you still feel this way? I don't know how to show I want to sum all the distances from each code A interval to the nearest code B interval.
PS If people want to provide feedback and don't feel like typing out the LaTeX:
My formula is...
Code:
\frac{1}{n_{a_i}}\sum{\max{\left \{ a_i-b_j,a_j-b_i, 0 \right \}}}
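To make the term inside the sum concrete, here is a rough Python sketch of max{a_i - b_j, a_j - b_i, 0} for a single pair of intervals. It assumes that a is an interval's start and b its end, and that i and j label the two intervals being compared. The function name `interval_gap` and every endpoint other than 339 and 391 are made up for illustration.

```python
def interval_gap(start_i, end_i, start_j, end_j):
    """Gap between [start_i, end_i] and [start_j, end_j]; 0 if they overlap."""
    # One of the two differences is positive only when the intervals are
    # disjoint; the 0 covers the overlapping case.
    return max(start_i - end_j, start_j - end_i, 0)

# Hypothetical intervals: one ends at 339, the other starts at 391.
print(interval_gap(300, 339, 391, 420))  # 52

# Hypothetical overlapping intervals.
print(interval_gap(100, 200, 150, 250))  # 0
```

Note that this only measures one pair of intervals; it says nothing yet about which interval in the other set is the nearest one.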
"If you torture the data long enough it will eventually confess."
-Ronald Harry Coase -
It is not really a matter of opinion: The formula alone, without additional outside knowledge about what you are doing with these numbers (which we reading this thread do have, but viewers of your talk and readers of your paper will not have), is ambiguous. Summing over the $i$'s or summing over the $j$'s in your expression would potentially yield completely different results, and we can't tell from the unmarked summation sign which it is supposed to be. Because I have read the rest of this thread, I happen to know that you want to sum over the $i$'s, not the $j$'s. So it should look like this:
$\frac{1}{n_a}\sum_{i}\max\left\{a_i-b_j,\ a_j-b_i,\ 0\right\}$
Notice the subscript qualifying the summation symbol.
The only issue that remains is that your distance measure describes how the distance between $a_i$ and $b_j$ is computed once you have determined which $b_j$ is closest for each $a_i$, but it does not describe exactly how you determine which $b_j$ is closest for each $a_i$. Do you see what I mean?
In God we trust. All others must bring data.
~W. Edwards Deming
@Jake I thought the takes care of that.
This is also my ignorance but it seems I actually want to sum all not as my thinking was that in this particular equation the meant beginning of a interval and meant end of interval. This may mean I actually don't understand as much as I thought.
So my thought is maybe:
"If you torture the data long enough it will eventually confess."
-Ronald Harry Coase -
I think we need to just stop for a moment. This is what BGM wrote:
$\max\left\{a_i-b_j,\ a_j-b_i,\ 0\right\}$
This only gives the distance between two fixed intervals. That isn't the end operation that you're looking to do. For each interval in your set A, you want to look at the distance between that interval and all of the intervals in B and then take the smallest distance. However, I honestly think the notation isn't ideal here. Since you have a set of intervals that you call A and another set that you call B, using $a$ and $b$ as the notation for our intervals is in my opinion... not ideal.
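The procedure described here (compare each A interval with every B interval, keep the smallest distance, then average) might be sketched in Python as follows. This is only an illustration: the interval endpoints and all function names are hypothetical, and intervals are assumed to be (start, end) tuples.

```python
def interval_gap(x, y):
    """Gap between two (start, end) intervals; 0 if they overlap."""
    return max(x[0] - y[1], y[0] - x[1], 0)

def nearest_gap(interval, others):
    """Smallest gap from one interval to any interval in `others`."""
    return min(interval_gap(interval, other) for other in others)

def mean_nearest_gap(from_set, to_set):
    """Mean, over `from_set`, of the gap to the nearest interval in `to_set`."""
    gaps = [nearest_gap(interval, to_set) for interval in from_set]
    return sum(gaps) / len(gaps)

# Hypothetical data, not taken from the thread.
code_a = [(0, 10), (20, 30), (50, 60)]
code_b = [(5, 15), (35, 40)]

print(mean_nearest_gap(code_a, code_b))  # A to nearest B: 5.0
print(mean_nearest_gap(code_b, code_a))  # B to nearest A: 2.5
```

The two directions are computed separately and generally give different values, which fits the plan of calculating the measure for set A and then reversing the process for set B.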
I don't have emotions and sometimes that makes me very sad.
That makes sense. I'm realizing this.
"If you torture the data long enough it will eventually confess."
-Ronald Harry Coase -
I would start by defining some notation for the intervals themselves.
Let where where for all (i.e., A is the set of your intervals you call 'A').
Let where where for all (ie B is the set of intervals you call 'B').
Then we can define
Then the 'distance' function you want to describe can be written as
where
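One way to write out the notation this post describes, offered only as a sketch: every symbol below ($A_i$, $a_{i,1}$, $a_{i,2}$, $n_A$, $d$, $d^*$, $D$) is an assumed name chosen for illustration, and the pairwise distance is taken from the max expression quoted earlier in the thread.

```latex
\begin{align*}
A &= \{A_1, \dots, A_{n_A}\}, & A_i &= [a_{i,1},\, a_{i,2}], & a_{i,1} &\le a_{i,2} \text{ for all } i \\
B &= \{B_1, \dots, B_{n_B}\}, & B_j &= [b_{j,1},\, b_{j,2}], & b_{j,1} &\le b_{j,2} \text{ for all } j
\end{align*}

\begin{align*}
d(A_i, B_j) &= \max\{\, a_{i,1} - b_{j,2},\; b_{j,1} - a_{i,2},\; 0 \,\} \\
d^*(A_i, B) &= \min_{1 \le j \le n_B} d(A_i, B_j) \\
D(A, B) &= \frac{1}{n_A} \sum_{i=1}^{n_A} d^*(A_i, B)
\end{align*}
```

Under these assumptions, the first subscript names the interval and the second marks start (1) or end (2), so the summation index is explicit and the step that picks the nearest B interval is stated separately from the pairwise distance.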
I don't have emotions and sometimes that makes me very sad.
bryangoodrich (11-11-2013), Jake (11-11-2013), trinker (11-11-2013)
Ok here's a rework:
Let represent the interval of code a and the interval of code b. Also let be the start of an interval and be the end of an interval.
Is this more sensible (i.e. does it capture what I'm doing in a mathematically accurate way)? How can it be improved? Critique away.
EDIT I see Dason has put something up while I was thinking this through.
"If you torture the data long enough it will eventually confess."
-Ronald Harry Coase -