
Thread: Representing a distance measure with math equation

  1. #16
    vinux

    Re: Representing a distance measure with math equation




    Quote Originally Posted by Dason View Post
    You could try something like this:
    Let B_j be the jth interval in B, then define
    B = \cup_j B_j

    Now

    \text{dist}(A_i, B) = \inf\{|x - y| \text{ such that } x \in A_i, y \in B\}

    One of the other methods might be better for your audience though?

    If A_i and B are closed and bounded sets (true for a finite union of closed intervals), you can replace inf with min.


    EDIT: @trinker, I think you can take this measure.
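A minimal sketch of why the minimum is attained and what it evaluates to, assuming every interval is closed and bounded. The endpoint symbols s_i, e_i (for A_i) and u_j, v_j (for B_j) and the count m are notation introduced here for illustration, not taken from the thread:

```latex
% Assumed notation: A_i = [s_i, e_i], B_j = [u_j, v_j], B = \bigcup_{j=1}^{m} B_j
\begin{aligned}
\text{dist}(A_i, B)
  &= \min_{x \in A_i,\; y \in B} |x - y| \\
  &= \min_{1 \le j \le m} \; \min_{x \in A_i,\; y \in B_j} |x - y| \\
  &= \min_{1 \le j \le m} \; \max\{\, s_i - v_j,\; u_j - e_i,\; 0 \,\}
\end{aligned}
```

The last line reads: the distance is 0 when A_i overlaps some B_j, and otherwise it is the gap to the nearest B_j.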
    In the long run, we're all dead.

  2. The Following User Says Thank You to vinux For This Useful Post:

    trinker (11-11-2013)

  3. #17
    trinker

    After Dason explained what inf means I get what his equation is saying.
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  4. #18
    trinker

    Jake had said in the chatbox that I need to be clearer about what I'm doing with all the individual distance measures. I am taking the mean of them. So from the example above:

    Code: 
    A1 = overlap     =  0
    A2 = |391 - 339| = 52
    A3 = |580 - 599| = 19
    A4 = overlap     =  0
    A5 = overlap     =  0
    I then do (0 + 52 + 19 + 0 + 0)/5 = 14.2
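The averaging step can be checked with a few lines of Python. This is only a sketch of the arithmetic: the five distances are copied from the listing above, and the variable names are made up for illustration.

```python
# Distance from each code A interval to its nearest code B interval,
# as listed above (0 means the intervals overlap).
distances = {"A1": 0, "A2": 52, "A3": 19, "A4": 0, "A5": 0}

# Mean distance over the A intervals only (the B intervals are not averaged here).
mean_distance = sum(distances.values()) / len(distances)

print(mean_distance)  # 14.2
```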

    I tried to represent this as:

    \frac{1}{n_{a_i}}\sum{\max{\left \{ a_i-b_j,a_j-b_i, 0 \right \}}}

    But Jake suggests this may not be adequate.

  5. #19
    Jake

    Three comments:

    1. The mean that you compute above is the mean distance for only the A set (not the B set). Maybe this is what you want, I don't know. My assumption had been that you wanted the mean distance over the entire dataset (i.e., over both the A and B sets). Maybe you can clarify.

    2. I would just write n_a, not n_{a_i}.

    3. The summation symbol is ambiguous. From looking at the formula alone it is not clear if you are summing over the i's, the j's, or over both (in which case there should really be two summation symbols).
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  6. #20
    trinker

    Quote Originally Posted by Jake View Post
    1. The mean that you compute above is the mean distance for only the A set (not the B set). Maybe this is what you want, I don't know. My assumption had been that you wanted the mean distance over the entire dataset (i.e., over both the A and B sets). Maybe you can clarify.
    No, Jake, I want the information to be kept separate. Sorry I was not clear about this, but yes, I calculate it for set A only, and I will then reverse the process and calculate it for set B only.

    Quote Originally Posted by Jake View Post
    2. I would just write n_a, not n_{a_i}.
    Thanks. I wasn't sure about this and asked above, but in my flurry of questions it got lost.

    Quote Originally Posted by Jake View Post
    3. The summation symbol is ambiguous. From looking at the formula alone it is not clear if you are summing over the i's, the j's, or over both (in which case there should really be two summation symbols).
    Now that I made point one clearer (I hope) do you still feel this way? I don't know how to show I want to sum all the distances from each code A interval to the nearest code B interval.


    PS If people want to provide feedback and don't feel like typing out the LaTeX, my formula is:

    Code: 
    \frac{1}{n_{a_i}}\sum{\max{\left \{ a_i-b_j,a_j-b_i, 0 \right \}}}

  7. #21
    Jake

    Quote Originally Posted by trinker View Post
    Now that I made point one clearer (I hope) do you still feel this way?
    It is not really a matter of opinion: The formula alone, without additional outside knowledge about what you are doing with these numbers (which we reading this thread do have, but viewers of your talk and readers of your paper will not have), is ambiguous. Summing over the i's or summing over the j's in your expression would potentially yield completely different results, and we can't tell from the unmarked summation sign which it is supposed to be. Because I have read the rest of this thread, I happen to know that you want to sum over the i's, not the j's. So it should look like this:

    \frac{1}{n_a}\sum_{i}\max\left\{a_i-b_j, a_j-b_i, 0\right\}

    Notice the i subscript qualifying the summation symbol.

    The only issue that remains is that your distance measure describes how the distance between i and j is computed once you have determined which j is closest for each i, but it does not describe exactly how you determine which j is closest for each i. Do you see what I mean?
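To make the ambiguity concrete, here is a small Python sketch showing that the same pairwise expression gives different totals depending on which index is summed. The intervals are hypothetical (start, end) pairs made up for illustration, and `gap` is an assumed helper name, not something from the thread.

```python
def gap(p, q):
    """Distance between closed intervals p and q given as (start, end); 0 if they overlap."""
    return max(p[0] - q[1], q[0] - p[1], 0)

# Hypothetical intervals, not the data discussed in the thread.
A = [(0, 10), (20, 30), (50, 60)]
B = [(5, 8), (35, 40)]

# Sum over i with j held fixed: one total per B interval.
sum_over_i = [sum(gap(a, b) for a in A) for b in B]

# Sum over j with i held fixed: one total per A interval.
sum_over_j = [sum(gap(a, b) for b in B) for a in A]

print(sum_over_i)  # [54, 40]
print(sum_over_j)  # [25, 17, 52]
```

Neither total is the quantity trinker describes, because neither picks out the nearest B interval for each A interval; that is the remaining issue raised above.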

  8. #22
    trinker

    @Jake I thought the takes care of that.

    This may also be my ignorance, but it seems I actually want to sum over all a, not a_i. My thinking was that, in this particular equation, a_i meant the beginning of an interval and a_j meant the end of the interval. This may mean I actually don't understand as much as I thought.

    So my thought is maybe:


  9. #23
    Dason

    I think we need to just stop for a moment. This is what BGM wrote

    Quote Originally Posted by BGM View Post
    Assume you have intervals in the form of [a_i, b_i] in which you already know a_i \leq b_i.

    It seems that you want to define the "distance" between two intervals [a_i, b_i] and [a_j, b_j] to be

    \max\{a_i-b_j, a_j - b_i, 0\}
    This only gives the distance between two fixed intervals, which isn't the end operation you're looking for. For each interval in your set A, you want to look at the distance between that interval and every interval in B and then take the smallest distance. I also honestly think the notation isn't ideal here: since you have one set of intervals that you call A and another that you call B, writing a single interval as [a_i, b_i] is confusing.
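The two steps described here (a distance between two fixed intervals, then the smallest such distance over all of B) can be sketched in Python as follows. The function names and the example intervals are assumptions made for illustration; intervals are (start, end) pairs with start <= end.

```python
def interval_distance(p, q):
    """max{p_start - q_end, q_start - p_end, 0}: the gap between two intervals, 0 if they overlap."""
    return max(p[0] - q[1], q[0] - p[1], 0)

def nearest_distance(a, B):
    """Smallest distance from interval a to any interval in the non-empty list B."""
    return min(interval_distance(a, b) for b in B)

# Hypothetical data: one A interval and three B intervals.
a = (20, 30)
B = [(5, 8), (35, 40), (100, 120)]

pairwise = [interval_distance(a, b) for b in B]
print(pairwise)                # [12, 5, 70]
print(nearest_distance(a, B))  # 5
```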
    I don't have emotions and sometimes that makes me very sad.

  10. #24
    trinker

    That makes sense. I'm realizing this.

  11. #25
    Dason

    I would start by defining some notation for the intervals themselves.

    Let A = \{A_1, A_2, \ldots, A_n\} where A_i = [a_{i,1}, a_{i,2}] with a_{i,1} < a_{i,2} for all i=1,\ldots,n (i.e., A is the set of intervals you call 'A').

    Let B = \{B_1, \ldots, B_m\} where B_j = [b_{j,1}, b_{j,2}] with b_{j,1} < b_{j,2} for all j=1,\ldots,m (i.e., B is the set of intervals you call 'B').

    Then we can define

    f(A_i, B_j) = \max\{a_{i,1} - b_{j,2}, b_{j,1} - a_{i,2}, 0\}

    Then the 'distance' function you want to describe can be written as

    \text{dist}(A, B) = \frac{1}{n} \sum_{i=1}^n \min_j f(A_i, B_j)

    where \min_j f(A_i, B_j) = \min\{f(A_i, B_1), f(A_i, B_2), \ldots, f(A_i, B_m)\}
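A direct transcription of these definitions into Python might look like the sketch below. Representing each interval as a (start, end) tuple and the example data are assumptions made for illustration; the functions f and dist follow the formulas above.

```python
def f(a, b):
    """f(A_i, B_j) = max{a_{i,1} - b_{j,2}, b_{j,1} - a_{i,2}, 0}."""
    return max(a[0] - b[1], b[0] - a[1], 0)

def dist(A, B):
    """(1/n) * sum over i of min over j of f(A_i, B_j); A and B must be non-empty."""
    return sum(min(f(a, b) for b in B) for a in A) / len(A)

# Hypothetical intervals, not the data discussed in the thread.
A = [(0, 10), (20, 30), (50, 60)]
B = [(5, 8), (35, 40)]

dist_ab = dist(A, B)  # nearest-B distances 0, 5, 10 averaged over n = 3
dist_ba = dist(B, A)  # nearest-A distances 0, 5 averaged over m = 2

print(dist_ab)  # 5.0
print(dist_ba)  # 2.5
```

Swapping the arguments gives a different value, which matches trinker's plan of computing the measure for set A and then separately for set B: dist is not symmetric, so it is not a metric in the strict sense.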

  12. The Following 3 Users Say Thank You to Dason For This Useful Post:

    bryangoodrich (11-11-2013), Jake (11-11-2013), trinker (11-11-2013)

  13. #26
    trinker

    Ok here's a rework:

    Let a represent the interval of code a and b the interval of code b. Also let s be the start of an interval and e be the end of an interval.



    Is this more sensible (i.e. does it capture what I'm doing in a mathematically accurate way)? How can it be improved? Critique away.

    EDIT I see Dason has put something up while I was thinking this through.

  14. #27
    trinker


    @Dason Thanks. I have to read more to understand the formula completely (I follow it, but the f() is unfamiliar; I assume it means "the function of..."). But I'm getting a sense of how important it is to describe ahead of time what each variable stands for.
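On the f() notation: f is just a name for a rule that takes two intervals and returns a number. A worked evaluation may make this concrete; the two pairs of intervals below are made up for illustration and are not from the thread's data.

```latex
% Assumed example 1: A_i = [20, 30], B_j = [35, 40] (no overlap)
f(A_i, B_j) = \max\{20 - 40,\; 35 - 30,\; 0\} = \max\{-20,\; 5,\; 0\} = 5

% Assumed example 2: A_i = [0, 10], B_j = [5, 8] (overlap)
f(A_i, B_j) = \max\{0 - 8,\; 5 - 10,\; 0\} = \max\{-8,\; -5,\; 0\} = 0
```

At most one of the first two terms can be positive, and it equals the gap between the intervals; when the intervals overlap both are negative or zero and the 0 wins.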
