preferred equation display

trinker

ggplot2orBust
#1
I'm looking for people's opinions on the best way to present the same equation. Below is a series of differently formatted equations. Which equation would be preferred in a publication, and why?

\(F1 = \frac{\frac{\sum{f_{i}}}{N} - \frac{\sum{c_{i}}}{N}+ 100}{2}\)

\(F2 = \frac{\sum{f_{i}}}{2N} - \frac{\sum{c_{i}}}{2N}+ 50\)

\(F3 = \frac{\sum{f_{i}} - \sum{c_{i}}}{2N}+ 50\)

\(F4 = \frac{\sum\limits_{i=1}^{n}{f_{i}} - \sum \limits_{i=1}^{n}{c_{i}}}{2N}+ 50\)

\(F5 = \frac{1}{2N}\sum\limits_{i=1}^{n}{f_{i}} - \frac{1}{2N}\sum \limits_{i=1}^{n}{c_{i}}+ 50\)

I've labeled them F1 through F5 to distinguish them.

Thank you in advance.
 

vinux

Dark Knight
#2
I would choose something more readable, so I might not choose F1 (though F1 is not bad). I think the important part is to follow a consistent style. Different journals have different aesthetic preferences for equations.

You could also use a slash (/) for fractions if the denominator is small (I have seen this in JASA).
 
#5
If you are talking about pretty-ness factor only, I like the last two best. I have no real logic behind this decision except they are pleasing to the eye and not so cramped they are impossible to read.
 

vinux

Dark Knight
#6
Oh, I thought your question was a general one. What about this?

\(F6 = \frac{1}{2N} \left ( \sum\limits_{i=1}^{n}{f_{i}} - \sum \limits_{i=1}^{n}{c_{i}} \right )+ 50\)

or
\(F7 = \frac{1}{2N} \sum\limits_{i=1}^{n} ({f_{i}} - {c_{i}})+50\)
or
\(F8 = \sum\limits_{i=1}^{n} \frac{({f_{i}} - {c_{i}})}{2N} +50\)
 

trinker

ggplot2orBust
#7
n is the number of items that are formal (in the f summation) or contextual (in the c summation). N is the total number of words used. f_i and c_i are the individual items that are formal or contextual. I may not have the notation correct, as the original paper did a pretty poor job of writing the equation mathematically.

I actually really like vinux's equation.
 

Dason

Ambassador to the humans
#8
But are f_i and c_i just indicator functions? The problem I have right now is that your notation isn't very clear. I'm thinking it might just be cleaner to define \(n_f\) as the number of words that are formal and \(n_c\) be the number of words that are contextual. Then you reduce the whole thing down to \(\frac{n_f - n_c}{2N} + 50\) which gets rid of the summations. This could probably be cleaned up a little too but if this accurately represents the quantity of interest then I think it conveys the meaning much more clearly.
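
To illustrate the indicator idea, here is a rough sketch in R. The tag vector is made up, and treating f_i and c_i as 0/1 indicators is exactly the assumption being asked about, not something the original paper states.

```r
# Made-up word tags for a tiny excerpt (illustration only)
tags <- c("formal", "contextual", "formal", "other", "formal", "contextual")

f_i <- as.integer(tags == "formal")       # 1 if word i is formal, else 0
c_i <- as.integer(tags == "contextual")   # 1 if word i is contextual, else 0

n_f <- sum(f_i)        # number of formal words
n_c <- sum(c_i)        # number of contextual words
N   <- length(tags)    # total number of words

# The reduced expression from this post
(n_f - n_c) / (2 * N) + 50
```

If the indicators are right, the two summations are nothing more than the counts n_f and n_c.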
 

trinker

ggplot2orBust
#9
I agree, Dason. I didn't know I could do that, as I don't have a math background. This equation is much clearer, particularly to other people in my field.
 
#10
I agree with Dason. But since I tend to think of “n” as something that is fixed in advance and not a random variable, I would prefer to replace Dason's “n_f” simply with “f”, letting f = total number of formal items and c = total number of contextual items.

Thus (f - c)/(2N) + 50

Then I start to speculate about how the underlying proportions (the p's) might vary, maybe as a Kalman filter, or even with varying variance as in garch models. (And thereby I have been influenced by Trinker into being more contextual and less formal by using such an expression as garch.)

Edit: I never really saw how the terms were defined so I could have completely misunderstood it all and been talking in the blue.
 

Dason

Ambassador to the humans
#11
Is that right though? I mean if \(n_f\) is always less than or equal to N then the upper bound is 50.5 (and by similar reasoning the lower bound is 49.5).
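
A quick way to see those bounds is to plug the two extreme cases into the reduced formula. This is only a sketch: `f_reduced` is a hypothetical helper name and the total of 100 words is made up.

```r
# Hypothetical helper: the reduced formula (n_f - n_c) / (2N) + 50
f_reduced <- function(n_f, n_c, N) (n_f - n_c) / (2 * N) + 50

N <- 100                          # made-up total word count

upper <- f_reduced(N, 0, N)       # every word formal:     50.5
lower <- f_reduced(0, N, N)       # every word contextual: 49.5
c(lower = lower, upper = upper)
```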
 

trinker

ggplot2orBust
#13
Here's their original formula. I was trying to make it more pleasing and to use N rather than having to state that the frequencies are percentages (I guess what I did in R was multiply by 100, so, Dason, that allows it to vary 0-100):

F = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)/2

FRANCIS HEYLIGHEN and JEAN-MARC DEWAELE (2002) said:
If we add up the frequencies of the formal categories,
subtract the frequencies of the deictic categories and normalize to 100, we get a
measure which will always increase with an increase of formality. This leads us to the
following simple formula:

F = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun
freq. – verb freq. – adverb freq. – interjection freq. + 100)/2

The frequencies are here expressed as percentages of the number of words belonging to
a particular category with respect to the total number of words in the excerpt. F will
then vary between 0 and 100% (but obviously never reach these limits). The more
formal the language excerpt, the higher the value of F is expected to be.
Maybe I'm miscalculating or misrepresenting it, but it should be able to vary between 0 and 100 (though it will never actually reach those bounds).

Here's an R example of how to calculate it:

Code:
fi <- c(10, 20, 12, 5)   # formal     (the 4 categories)
ci <- c(12, 13, 15, 3)   # contextual (the 4 categories)
oi <- 6                  # other words (neither formal nor contextual)

N <- sum(fi) + sum(ci) + oi

((100*sum(fi/N) - sum(100*ci/N)) + 100)/2

# more contextual
fi <- c(10, 20, 12, 5)   # formal     (the 4 categories)
ci <- c(32, 43, 15, 3)   # contextual (the 4 categories)
oi <- 6

N <- sum(fi) + sum(ci) + oi

((100*sum(fi/N) - sum(100*ci/N)) + 100)/2

# more formal
fi <- c(50, 30, 62, 25)  # formal     (the 4 categories)
ci <- c(32, 43, 15, 3)   # contextual (the 4 categories)
oi <- 6

N <- sum(fi) + sum(ci) + oi

((100*sum(fi/N) - sum(100*ci/N)) + 100)/2
I'm looking for a simple and eloquent equation.

Greta asked:

Can't you just take the proportion 100*sum(fi/N)? (formal proportion)
in the chatbox...

Code:
# more formal
fi <- c(50, 30, 62, 25)  # formal     (the 4 categories)
ci <- c(32, 43, 15, 3)   # contextual (the 4 categories)
oi <- 6

N <- sum(fi) + sum(ci) + oi

> sum(fi/N)*100
[1] 62.78195
> ((100*sum(fi/N) - sum(100*ci/N)) + 100)/2
[1] 63.90977
They give slightly different outputs. So how can I make their poorly formatted equation look more mathematical?
 

Dason

Ambassador to the humans
#14
The reason it can't be reduced is because there are three categories. If there were only 2 categories we could reduce the formula slightly.
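
To spell out the two-category case: if every word were either formal or contextual, then \(N = n_f + n_c\), so \(\frac{n_f - n_c}{N} + 1 = \frac{2 n_f}{N}\) and the formula collapses to the formal percentage \(100\, n_f / N\). Here is a rough check in R that reuses the counts from the example above but sets `oi` to 0, which is an assumption made purely for illustration.

```r
fi <- c(50, 30, 62, 25)   # formal counts (from the example above)
ci <- c(32, 43, 15, 3)    # contextual counts (from the example above)
oi <- 0                   # assumption: no words outside the two groups

N <- sum(fi) + sum(ci) + oi

formal_pct <- 100 * sum(fi) / N
F_score    <- (100 * sum(fi) / N - 100 * sum(ci) / N + 100) / 2

# With oi == 0 the two quantities agree; with oi > 0 they differ
all.equal(formal_pct, F_score)
```

The result is named `F_score` rather than `F` because `F` is a built-in shorthand for `FALSE` in R.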
 

trinker

ggplot2orBust
#15
I played with the equation, and this is the simplest way I can represent it:

\(50\left(\frac{n_{f}-n_{c}}{N} + 1\right)\)

My previous formula did not account for multiplying the proportions by 100 (to make percentages).
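
As a sanity check, the simplified form can be compared against the percentage-based calculation from the earlier R example. This is only a sketch; `n_f` and `n_c` are taken here to be the summed formal and contextual counts.

```r
fi <- c(10, 20, 12, 5)   # formal counts (from the earlier example)
ci <- c(12, 13, 15, 3)   # contextual counts (from the earlier example)
oi <- 6                  # remaining words

N   <- sum(fi) + sum(ci) + oi
n_f <- sum(fi)
n_c <- sum(ci)

original   <- ((100 * sum(fi / N) - sum(100 * ci / N)) + 100) / 2
simplified <- 50 * ((n_f - n_c) / N + 1)

# TRUE if the two forms agree up to floating-point tolerance
all.equal(original, simplified)
```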
 

trinker

ggplot2orBust
#16
This is what I'm thinking more formally:

\( F = 50\left(\frac{n_{f}-n_{c}}{N} + 1\right)\)

\( f = \left \{\text{noun}, \text{adjective}, \text{preposition}, \text{article}\right \}\)

\( c = \left \{\text{pronoun}, \text{verb}, \text{adverb}, \text{interjection}\right \}\)

\( N = \sum{(f + c + \text{conjunctions})} \)
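
A minimal sketch of how this could be computed in R from per-category word counts. The function name `formality`, the category labels used as vector names, and the counts themselves are all hypothetical; it also assumes all eight formal and contextual categories are present in the input.

```r
# Hypothetical helper: formality score from a named vector of word counts
formality <- function(counts) {
  formal     <- c("noun", "adjective", "preposition", "article")
  contextual <- c("pronoun", "verb", "adverb", "interjection")
  n_f <- sum(counts[formal])       # total formal words
  n_c <- sum(counts[contextual])   # total contextual words
  N   <- sum(counts)               # all words, including other categories
  50 * ((n_f - n_c) / N + 1)
}

# Made-up counts for illustration
counts <- c(noun = 10, adjective = 20, preposition = 12, article = 5,
            pronoun = 12, verb = 13, adverb = 15, interjection = 3,
            conjunction = 6)

formality(counts)
```

Keeping the category names inside the function mirrors the set definitions of f and c, so the code reads the same way as the notation.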