Merge two staggered columns to one

ledzep

Point Mass at Zero
#1
Dear all,

I have two columns of data. Some info is on one column and some on the other. I want to merge those two columns to make a new one.

An example data will make things more clear.
Code:
col1  col2 
1      NA
1      NA
1       NA
NA     5
NA     5
NA     5
2       NA
2       NA
2       NA
NA    9
NA    9
I want to merge two columns to a new column and the end result will look like:

Code:
col1  col2   col3
1      NA     1 
1      NA     1
1       NA    1
NA     5      5
NA     5      5
NA     5      5
2       NA    2
2       NA    2
2       NA    2
NA    9       9
NA    9       9


For some reason if(statement) {statement1} else {statment2} is not working.

I have requested R to perform this task:
if col1 is NA then please create col3 based on the values of col2, else based on Col1 itself.

My Code
Code:
if(d.f$col1=="NA") {d.f$col3<-(d.f$col2)} else {d.f$col3<-(d.f$col1)}
Many Thanks.
 

Dason

Ambassador to the humans
#2
Is the data always numeric? If so you could use rowsum with na.rm = T to create the new column.

BTW it's better practice to use "is.na" then to compare to "NA".
 

ledzep

Point Mass at Zero
#3
Thanks for your prompt reply. The data is always numeric. rowsum won't work for me as I have formula inside the statements in my if else. I am not looking to add them (although by the example I've presented makes it look like I want to add these two columns).
I also tried converting NA to 99999 and then apply the if else argument, which didn't work.

Any clue?
 

Link

Ninja say what!?!
#4
are the columns just vectors, or are they columns in a dataframe or matrix?

One option is to just take the max (or min) over the two columns:

Code:
col3 <- pmax(col1, col2, na.rm=TRUE);
OR
col3 <- pmin(col1, col2, na.rm=TRUE)
 

Dason

Ambassador to the humans
#5
... but why do you need the ifelse? Are there cases where there are values in both columns? Then I could understand the concern. But if that's not the case then something like this should do just fine...
Code:
> dat <- data.frame(col1 = c(1,2,3,NA,NA,NA,2,4,2), col2 = c(NA,NA,NA,6,2,8,NA,NA,NA))
> dat$col3 <- rowSums(dat, na.rm = T)
> dat
  col1 col2 col3
1    1   NA    1
2    2   NA    2
3    3   NA    3
4   NA    6    6
5   NA    2    2
6   NA    8    8
7    2   NA    2
8    4   NA    4
9    2   NA    2
 

Dason

Ambassador to the humans
#7
There are a few reasons that your if/else code doesn't work. You should really think about these types of problems as indexing problems instead of trying to cobble together an if/else type situation.

Code:
> dat <- data.frame(col1 = c(1,2,3,NA,NA,NA,2,4,2), col2 = c(NA,NA,NA,6,2,8,NA,NA,1))
> dat$col3 <- dat$col1
> idx <- is.na(dat$col1)
> dat$col3[idx] <- dat$col2[idx]
> dat
  col1 col2 col3
1    1   NA    1
2    2   NA    2
3    3   NA    3
4   NA    6    6
5   NA    2    2
6   NA    8    8
7    2   NA    2
8    4   NA    4
9    2    1    2
On the topic of why your if/else doesn't work: 1) Comparing NA directly to "NA" isn't going to do what you want and 2) Does it make sense to say if(c(TRUE, FALSE, TRUE)){ ... }? Not really - it ignores everything except for the first value of the vector in the if() statement.
 

Link

Ninja say what!?!
#8
Here's a "clean" redo of your ifelse coding:

Code:
col3 <- col1
col3[is.na(col3)] <- col2[is.na(col3)]
I haven't tested it yet but it should work.
 

ledzep

Point Mass at Zero
#10
Thanks Link. That was a very neat code. You just saved my life here.
Many Thanks to both of you (Dason and Link). :)