# preferred equation display

#### trinker

##### ggplot2orBust
I'm looking for people's opinions on the best way to present the same equation. Below is a series of differently formatted versions of it. Which one would be preferred in a publication, and why?

$$F1 = \frac{\frac{\sum{f_{i}}}{N} - \frac{\sum{c_{i}}}{N}+ 100}{2}$$

$$F2 = \frac{\sum{f_{i}}}{2N} - \frac{\sum{c_{i}}}{2N}+ 50$$

$$F3 = \frac{\sum{f_{i}} - \sum{c_{i}}}{2N}+ 50$$

$$F4 = \frac{\sum\limits_{i=1}^{n}{f_{i}} - \sum \limits_{i=1}^{n}{c_{i}}}{2N}+ 50$$

$$F5 = \frac{1}{2N}\sum\limits_{i=1}^{n}{f_{i}} - \frac{1}{2N}\sum \limits_{i=1}^{n}{c_{i}}+ 50$$
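These five versions are algebraically the same quantity. Here is a quick numeric check in R (the language used later in this thread) with made-up counts. F4 only adds explicit summation limits, so in R it is the same expression as F3:

```r
# Made-up counts for illustration (same numbers as the R example later in the thread)
fi <- c(10, 20, 12, 5)          # formal items
ci <- c(12, 13, 15, 3)          # contextual items
oi <- 6                         # all other words
N  <- sum(fi) + sum(ci) + oi    # total words (96)

F1 <- (sum(fi) / N - sum(ci) / N + 100) / 2
F2 <- sum(fi) / (2 * N) - sum(ci) / (2 * N) + 50
F3 <- (sum(fi) - sum(ci)) / (2 * N) + 50      # F4 is the same expression
F5 <- (1 / (2 * N)) * sum(fi) - (1 / (2 * N)) * sum(ci) + 50

# all.equal() compares with a small numeric tolerance
all.equal(c(F1, F2, F5), rep(F3, 3))   # TRUE
```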

I've labeled them F1 through F5 to distinguish them.

Thank you in advance.

#### vinux

##### Dark Knight
I would choose something more readable, so I probably would not choose F1 (though F1 is not bad). I think the important part is to follow a consistent style. Different journals have different aesthetic preferences for equations.

You could also use a slash (/) for fractions when the denominator is small (I have seen this in JASA).
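For example, F3 with a slash in place of the stacked fraction might look like this (a sketch only; whether a given journal prefers it is a house-style question). The parentheses around 2N avoid any ambiguity about what is being divided:

$$F = \left(\sum_{i=1}^{n} f_{i} - \sum_{i=1}^{n} c_{i}\right) / (2N) + 50$$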

#### Dason

##### Ambassador to the humans
What do n, N, f_i, c_i stand for?

#### duskstar

##### New Member
If you are talking about prettiness alone, I like the last two best. I have no real logic behind this choice except that they are pleasing to the eye and not so cramped that they are hard to read.

#### vinux

##### Dark Knight

$$F6 = \frac{1}{2N} \left ( \sum\limits_{i=1}^{n}{f_{i}} - \sum \limits_{i=1}^{n}{c_{i}} \right )+ 50$$

or
$$F7 = \frac{1}{2N} \sum\limits_{i=1}^{n} ({f_{i}} - {c_{i}})+50$$
or
$$F8 = \sum\limits_{i=1}^{n} \frac{({f_{i}} - {c_{i}})}{2N} +50$$
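F6 to F8 also equal F3. A numeric check in R with made-up counts follows. Note that F7 and F8 pair each f_i with a c_i, which only makes sense if both sums run over the same n. The vectors below assume that:

```r
# Made-up counts; F7 and F8 need fi and ci to have the same length n
fi <- c(10, 20, 12, 5)
ci <- c(12, 13, 15, 3)
N  <- sum(fi) + sum(ci) + 6     # 6 other words, N = 96
stopifnot(length(fi) == length(ci))

F3 <- (sum(fi) - sum(ci)) / (2 * N) + 50
F6 <- (1 / (2 * N)) * (sum(fi) - sum(ci)) + 50
F7 <- (1 / (2 * N)) * sum(fi - ci) + 50     # one sum over paired differences
F8 <- sum((fi - ci) / (2 * N)) + 50         # division moved inside the sum

all.equal(c(F6, F7, F8), rep(F3, 3))   # TRUE
```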


#### trinker

##### ggplot2orBust
n is the number of formal items in the f summation, or the number of contextual items in the c summation. N is the total number of words used. f_i and c_i are the individual formal or contextual items. I may not have the notation correct, as the original paper did a pretty poor job of writing the equation mathematically.

I actually really like vinux's equation.

#### Dason

##### Ambassador to the humans
But are f_i and c_i just indicator functions? The problem I have right now is that your notation isn't very clear. I think it might be cleaner to define $$n_f$$ as the number of words that are formal and $$n_c$$ as the number of words that are contextual. Then the whole thing reduces to $$\frac{n_f - n_c}{2N} + 50$$, which gets rid of the summations. This could probably be cleaned up a little more, but if it accurately represents the quantity of interest, I think it conveys the meaning much more clearly.
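Reading f_i and c_i as 0/1 indicators over all N words (an assumption, since the notation was never formally defined), the summations are just counts, so the summation form and the count form agree. A sketch in R with a hypothetical ten-word tagged text, where "f", "c" and "o" are made-up labels for formal, contextual and other:

```r
# Hypothetical tagged text: one label per word
tags <- c("f", "c", "f", "o", "c", "f", "f", "c", "o", "f")
N <- length(tags)

# f_i and c_i as 0/1 indicators, one per word (i = 1..N)
f_i <- as.integer(tags == "f")
c_i <- as.integer(tags == "c")
F_sum <- (sum(f_i) - sum(c_i)) / (2 * N) + 50

# Count form: n_f and n_c are simply the number of formal / contextual words
n_f <- sum(tags == "f")   # 5
n_c <- sum(tags == "c")   # 3
F_count <- (n_f - n_c) / (2 * N) + 50

all.equal(F_sum, F_count)   # TRUE
```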

#### trinker

##### ggplot2orBust
I agree, Dason. I didn't know I could do that, as I don't have a math background. Especially for other people in my field, this equation is much clearer.

#### GretaGarbo

##### Human
I agree with Dason. But since I tend to think of “n” as something fixed in advance rather than a random variable, I would prefer to replace Dason's “n_f” simply with “f”, letting f be the total number of formal words and c the total number of contextual words.

Thus $$\frac{f - c}{2N} + 50$$

Then I start to speculate about how the underlying proportions (the p's) might vary: maybe as a Kalman filter, and even with varying variance as in garch models. (And thereby I have been influenced by trinker into being more contextual and less formal, using such an expression as garch.)

Edit: I never really saw how the terms were defined, so I may have completely misunderstood it all and been talking nonsense.


#### Dason

##### Ambassador to the humans
Is that right though? I mean if $$n_f$$ is always less than or equal to N then the upper bound is 50.5 (and by similar reasoning the lower bound is 49.5).
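The bound is easy to confirm numerically. A sketch with a hypothetical total of N = 200 words, evaluating the unscaled form at the two extremes (`dason_F` is a name made up for this example):

```r
# Unscaled form: (n_f - n_c) / (2N) + 50, with no multiplication by 100
dason_F <- function(n_f, n_c, N) (n_f - n_c) / (2 * N) + 50

N <- 200                           # hypothetical total word count
dason_F(n_f = N, n_c = 0, N = N)   # every word formal: 50.5
dason_F(n_f = 0, n_c = N, N = N)   # every word contextual: 49.5
```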

#### vinux

##### Dark Knight
Again, I was looking at it mechanically: I simplified F6 further and added F7 and F8.

#### trinker

##### ggplot2orBust
Here's their original formula. I was trying to make it more pleasing, using N rather than having to state that the frequencies are percentages (what I did in R was multiply by 100, so, Dason, that allows it to vary from 0 to 100):

F = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)/2

Francis Heylighen and Jean-Marc Dewaele (2002) said:
If we add up the frequencies of the formal categories,
subtract the frequencies of the deictic categories and normalize to 100, we get a
measure which will always increase with an increase of formality. This leads us to the
following simple formula:

F = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun
freq. – verb freq. – adverb freq. – interjection freq. + 100)/2

The frequencies are here expressed as percentages of the number of words belonging to
a particular category with respect to the total number of words in the excerpt. F will
then vary between 0 and 100% (but obviously never reach these limits). The more
formal the language excerpt, the higher the value of F is expected to be.

Maybe I'm miscalculating or misrepresenting it, but it should be able to vary between 0 and 100 (though it will never actually reach those bounds).
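Writing the quoted formula with percentages shows where the 0 to 100 range comes from. As a sketch of the algebra (not the paper's own notation), let $$n_k$$ be the number of words in category $$k$$, $$p_k = 100\,n_k/N$$ its percentage, and $$n_f$$, $$n_c$$ the word totals over the formal and deictic (contextual) categories:

$$F = \frac{1}{2}\left(\sum_{k \in \text{formal}} p_k - \sum_{k \in \text{deictic}} p_k + 100\right) = \frac{1}{2}\left(\frac{100\,n_f}{N} - \frac{100\,n_c}{N} + 100\right) = 50\left(\frac{n_f - n_c}{N} + 1\right)$$

With $$n_f = N$$ and $$n_c = 0$$ this gives 100; with $$n_f = 0$$ and $$n_c = N$$ it gives 0.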

Here's an R example of how to calculate it:

```r
fi <- c(10, 20, 12, 5)   # formal     (the 4 categories)
ci <- c(12, 13, 15, 3)   # contextual (the 4 categories)
oi <- 6                  # all other words

N <- sum(fi) + sum(ci) + oi

((100*sum(fi/N) - sum(100*ci/N)) + 100)/2

# more contextual
fi <- c(10, 20, 12, 5)   # formal     (the 4 categories)
ci <- c(32, 43, 15, 3)   # contextual (the 4 categories)
oi <- 6

N <- sum(fi) + sum(ci) + oi

((100*sum(fi/N) - sum(100*ci/N)) + 100)/2

# more formal
fi <- c(50, 30, 62, 25)  # formal     (the 4 categories)
ci <- c(32, 43, 15, 3)   # contextual (the 4 categories)
oi <- 6

N <- sum(fi) + sum(ci) + oi

((100*sum(fi/N) - sum(100*ci/N)) + 100)/2
```
I'm looking for a simple and elegant equation.

Someone in the chatbox asked: can't you just take the formal proportion, 100*sum(fi/N)?

```r
# more formal
fi <- c(50, 30, 62, 25)  # formal     (the 4 categories)
ci <- c(32, 43, 15, 3)   # contextual (the 4 categories)
oi <- 6

N <- sum(fi) + sum(ci) + oi

> sum(fi/N)*100
[1] 62.78195
> ((100*sum(fi/N) - sum(100*ci/N)) + 100)/2
[1] 63.90977
```
They give slightly different outputs. So how can I make their poorly formatted equation look more mathematical?

#### Dason

##### Ambassador to the humans
It can't be reduced because there are three categories (formal, contextual, and everything else). If there were only two categories, we could reduce the formula slightly.
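To make this concrete: with no third category, N = n_f + n_c, so n_c = N − n_f and 50((n_f − n_c)/N + 1) = 50((2n_f − N)/N + 1) = 100 n_f/N, which is exactly the formal percentage suggested in the chatbox. A sketch in R reusing trinker's "more formal" counts, but with the other-word count `oi` set to 0 (an assumption for illustration):

```r
# "More formal" counts from the thread, with the other words removed
fi <- c(50, 30, 62, 25)
ci <- c(32, 43, 15, 3)
oi <- 0                          # assumption: only two categories
N  <- sum(fi) + sum(ci) + oi     # 260

F_full   <- ((100 * sum(fi / N) - sum(100 * ci / N)) + 100) / 2
F_formal <- 100 * sum(fi / N)    # the chatbox suggestion

all.equal(F_full, F_formal)      # TRUE: both are about 64.23
```

With `oi <- 6`, as in the original example, the two values differ (63.90977 vs 62.78195), which is the discrepancy noted above.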

#### trinker

##### ggplot2orBust
I played with the equation, and this is the simplest way I can represent it:

$$50\left(\frac{n_{f}-n_{c}}{N} + 1\right)$$

My previous formula did not account for multiplying the proportions by 100 (to make percentages).
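A quick check that the simplified form reproduces the long expression. This reuses trinker's "more formal" counts from the thread, taking n_f and n_c as `sum(fi)` and `sum(ci)`:

```r
# "More formal" counts from earlier in the thread
fi <- c(50, 30, 62, 25)
ci <- c(32, 43, 15, 3)
oi <- 6
N  <- sum(fi) + sum(ci) + oi     # 266

long_form   <- ((100 * sum(fi / N) - sum(100 * ci / N)) + 100) / 2
simple_form <- 50 * ((sum(fi) - sum(ci)) / N + 1)

all.equal(long_form, simple_form)   # TRUE
round(simple_form, 5)               # 63.90977
```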

#### trinker

##### ggplot2orBust
This is what I'm thinking more formally:

$$F = 50\left(\frac{n_{f}-n_{c}}{N} + 1\right)$$

$$f = \left\{\text{noun}, \text{adjective}, \text{preposition}, \text{article}\right\}$$

$$c = \left\{\text{pronoun}, \text{verb}, \text{adverb}, \text{interjection}\right\}$$

$$N = n_{f} + n_{c} + n_{\text{conjunction}}$$
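A minimal sketch of these definitions as an R function, assuming each word has already been tagged with its part of speech. The function name `formality` and the tag strings are hypothetical. Any tag outside the two sets (such as "conjunction") counts toward N only, which matches the definition of N above when the remaining words are conjunctions:

```r
# Hypothetical helper: F from a vector of part-of-speech tags
formality <- function(pos) {
  formal     <- c("noun", "adjective", "preposition", "article")
  contextual <- c("pronoun", "verb", "adverb", "interjection")
  n_f <- sum(pos %in% formal)       # number of formal words
  n_c <- sum(pos %in% contextual)   # number of contextual words
  N   <- length(pos)                # all words, including other tags
  50 * ((n_f - n_c) / N + 1)
}

# Made-up ten-word example: 5 formal, 4 contextual, 1 conjunction
pos <- c("article", "noun", "verb", "preposition", "article",
         "noun", "conjunction", "pronoun", "verb", "adverb")
formality(pos)   # 55
```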