calculate (column-wise) how many entries are larger than the first one

gianmarco

TS Contributor
#1

Hello,
I have the following small data frame (11 rows, 3 columns). I would like to get a new data frame (1 row, 3 columns) that stores the answer to the following question:
for each column, how many of the values in rows 2 through 11 are larger than the value in the first row?

I expect that apply() is the right tool, but I am stuck on how to set its arguments properly.

Thanks for any elucidation.

dataset example
Code:
           V1          V2           V3
1  0.07475911 0.010017181 0.0004135741
2  0.04871367 0.007738233 0.0010774013
3  0.03784803 0.026359263 0.0085449876
4  0.04610366 0.033958232 0.0036214020
5  0.08131207 0.012650076 0.0005890652
6  0.03606262 0.020054940 0.0057157420
7  0.02695144 0.006757635 0.0028437977
8  0.03308447 0.009096489 0.0002031851
9  0.04429150 0.014330313 0.0021478636
10 0.05791205 0.033364729 0.0006484988
11 0.02731512 0.018663906 0.0011334852
dataset code
Code:
mydata <- structure(list(V1 = c(0.0747591058857557, 0.0487136696984259, 
0.0378480340645289, 0.0461036616401223, 0.0813120688124586, 0.0360626171041528, 
0.0269514388399552, 0.033084469153666, 0.044291495248143, 0.0579120542053213, 
0.0273151151084487), V2 = c(0.0100171805122288, 0.00773823273073306, 
0.026359263461461, 0.0339582323053759, 0.0126500756861239, 0.0200549398881861, 
0.0067576348865556, 0.00909648913102188, 0.0143303128004088, 
0.0333647291889855, 0.0186639060879541), V3 = c(0.000413574079856227, 
0.00107740125669744, 0.00854498758739067, 0.00362140200519855, 
0.000589065216804203, 0.00571574198004087, 0.00284379768765823, 
0.00020318505348539, 0.00214786356414398, 0.000648498838053301, 
0.00113348520263523)), .Names = c("V1", "V2", "V3"), row.names = c(NA, 
-11L), class = "data.frame")
 
#2
The following takes the first row and repeats each of its values so that it matches the shape of the data frame without its first row, compares the two element-wise, and then uses colSums() to count, per column, how many values are larger than the first one.

Code:
colSums(mydata[-1,] > mydata[1,][col(mydata[-1,])])

## V1 V2 V3 
##  1  7  9
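
If the matrix indexing in that one-liner is hard to follow, the same comparison can be sketched step by step with sweep(). This is only an illustration and not the method from the post: the names first, rest, larger and counts are made up for the example, and the data is re-typed from the rounded printout in the first post so that the block runs on its own.

```r
# Rounded copy of the example data from the first post (assumption: the
# rounding does not change any of the comparisons)
mydata <- data.frame(
  V1 = c(0.07475911, 0.04871367, 0.03784803, 0.04610366, 0.08131207, 0.03606262,
         0.02695144, 0.03308447, 0.04429150, 0.05791205, 0.02731512),
  V2 = c(0.010017181, 0.007738233, 0.026359263, 0.033958232, 0.012650076, 0.020054940,
         0.006757635, 0.009096489, 0.014330313, 0.033364729, 0.018663906),
  V3 = c(0.0004135741, 0.0010774013, 0.0085449876, 0.0036214020, 0.0005890652,
         0.0057157420, 0.0028437977, 0.0002031851, 0.0021478636, 0.0006484988,
         0.0011334852)
)

first <- unlist(mydata[1, ])      # named numeric vector: first value of each column
rest  <- as.matrix(mydata[-1, ])  # 10 x 3 numeric matrix: rows 2 to 11

# sweep() with MARGIN = 2 compares every column of `rest`
# against the matching element of `first`
larger <- sweep(rest, 2, first, ">")  # 10 x 3 logical matrix

counts <- colSums(larger)  # each TRUE counts as 1
as.data.frame(t(counts))   # 1 row, 3 columns, as requested in the first post
```

The last line turns the named vector into a one-row data frame, which is the shape asked for in the question.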
 

trinker

ggplot2orBust
#3
Thought it might be fun to throw up a tidyverse & data.table solution as well:

Code:
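library(tidyverse)

# For each column, count the entries after the first that exceed the first.
# (summarize_all() is superseded as of dplyr 1.0.0; summarize(across(everything(), ...)) is the newer spelling.)
mydata %>%
    summarize_all(.funs = function(x) sum(x[1] < x[-1]))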

##   V1 V2 V3
## 1  1  7  9

library(data.table)

as.data.table(mydata)[, lapply(.SD, function(x) {sum(x[1] < x[-1])})]

##    V1 V2 V3
## 1:  1  7  9

And here is the original apply() method you asked about (though the colSums() approach is faster):

Code:
apply(mydata, 2, function(x) sum(x[1] < x[-1]))
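
Since the question was about how to set the arguments of apply(), here is a minimal sketch that spells them out and also reshapes the result into the 1 x 3 data frame that was asked for. The helper name count_larger and the variables by_apply, by_vapply and result are made up for this example, and the data is re-typed from the rounded printout in the first post so the block runs on its own.

```r
# Rounded copy of the example data from the first post (assumption: the
# rounding does not change any of the comparisons)
mydata <- data.frame(
  V1 = c(0.07475911, 0.04871367, 0.03784803, 0.04610366, 0.08131207, 0.03606262,
         0.02695144, 0.03308447, 0.04429150, 0.05791205, 0.02731512),
  V2 = c(0.010017181, 0.007738233, 0.026359263, 0.033958232, 0.012650076, 0.020054940,
         0.006757635, 0.009096489, 0.014330313, 0.033364729, 0.018663906),
  V3 = c(0.0004135741, 0.0010774013, 0.0085449876, 0.0036214020, 0.0005890652,
         0.0057157420, 0.0028437977, 0.0002031851, 0.0021478636, 0.0006484988,
         0.0011334852)
)

# Hypothetical helper: how many entries after the first exceed the first?
count_larger <- function(x) sum(x[-1] > x[1])

# MARGIN = 2 calls the function once per column (MARGIN = 1 would be per row)
by_apply <- apply(mydata, MARGIN = 2, FUN = count_larger)

# A data frame is a list of columns, so vapply() can loop over it directly
# without the matrix conversion that apply() performs; integer(1) states
# that each call must return a single integer
by_vapply <- vapply(mydata, count_larger, integer(1))

# Turn the named vector into a one-row data frame
result <- as.data.frame(as.list(by_vapply))
result

##   V1 V2 V3
## 1  1  7  9
```

Either loop gives the same counts here; vapply() just adds a check on the return type.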