Hello,
I have an assignment that requires me to write a function that calculates the correlation between two columns in each of 332 CSV files, based on my input. In short, it computes the correlation only for the files whose number of complete cases is greater than or equal to the threshold. My code is as follows:
corr <- function(directory, threshold = 0){
    setwd(file.path(getwd(), directory))
    a <- list.files(getwd())
    i <- 1
    while(i <= length(a)){
        b <- read.table(a, header = TRUE, sep = ",")
        if(sum(as.numeric(complete.cases(b))) >= threshold & exists("f")){
            tf <- cor(b[, 2], b[, 3], use = "pairwise.complete.obs")
            f <- cbind(f, tf)
            rm(tf)
        }
        if(sum(as.numeric(complete.cases(b))) >= threshold & !exists("f")){
            f <- cor(b[, 2], b[, 3], use = "pairwise.complete.obs")
        }
        i <- i+1
    }
    f
    setwd(...)
}
The code appears to work fine, but as part of a follow-up question I'm supposed to run the following bit of code:
cr <- corr("specdata")
cr <- sort(cr)
set.seed(868)
out <- round(cr[sample(length(cr), 5)], 4)
print(out)
When I run the line that assigns "out", it produces the following error message:
"cannot take a sample larger than the population when 'replace = FALSE'"
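For reference, the same message can be reproduced in isolation whenever the vector being sampled has fewer than 5 elements. This is only a minimal sketch: `cr_short` and `cr_long` are made-up stand-ins, and it assumes (without having confirmed it) that `cr` ends up shorter than 5 after the call to corr.

```r
# Assumption: cr has fewer than 5 elements (here a single hypothetical value).
cr_short <- c(0.25)

# sample(1, 5) asks for 5 draws without replacement from a population of 1.
msg <- tryCatch(
  {
    sample(length(cr_short), 5)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With 5 or more elements the same call succeeds.
cr_long <- seq(-0.5, 0.5, length.out = 10)  # hypothetical correlations
set.seed(868)
idx <- sample(length(cr_long), 5)
print(length(idx))
```

Checking length(cr) and class(cr) right after calling corr would show whether this is what is happening here.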
Clearly the problem lies in that part, but I have no idea how to solve it. Can anybody point me in the right direction as to why this is failing?
I would greatly appreciate it.
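For comparison, here is a minimal sketch of the behaviour described above, written so that it never calls setwd and returns the vector of correlations as the last expression. It is not the assignment's reference solution: the name `corr_sketch`, the temporary folder, and the two toy CSV files are all hypothetical, and it assumes the two columns of interest are columns 2 and 3, as in the code above.

```r
# Hypothetical rewrite: no setwd(), result returned explicitly.
corr_sketch <- function(directory, threshold = 0) {
  # full.names = TRUE gives paths that can be read without changing directory
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  result <- numeric(0)
  for (path in files) {
    dat <- read.csv(path)
    # keep only files with at least `threshold` complete rows
    if (sum(complete.cases(dat)) >= threshold) {
      result <- c(result,
                  cor(dat[, 2], dat[, 3], use = "pairwise.complete.obs"))
    }
  }
  result  # last expression, so this is what the function returns
}

# Toy data (made up) standing in for the real specdata folder.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:4, x = c(1, 2, 3, 4), y = c(2, 4, 6, 8)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:4, x = c(1, NA, 3, 4), y = c(4, 3, NA, 1)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

all_files <- corr_sketch(demo_dir)              # both files qualify
strict <- corr_sketch(demo_dir, threshold = 3)  # only the first file qualifies
print(length(all_files))
print(length(strict))
```

The sketch returns one correlation per qualifying file, so the length of the result tells you how many files passed the threshold.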