How to handle unit quantities on invoices

#1
I am working with invoice data and analyzing thousands of prices and part numbers on invoices. I think I have to convert the data into individual data points to be able to sample it correctly and do calculations. Is there a way to convert or transform this data without writing a script to do it? And for what it is worth I am working with R.


Invoice line 1 , Qty = 3 , PartNumber = 6-280Z
Invoice line 2 , Qty = 2, PartNumber = ABC
Invoice line 3 , Qty = 1 PartNumber = 78EFG

converted....

1 , 6-280Z
1 , 6-280Z
1 , 6-280Z
1, ABC
1, ABC
1, 78EFG

Thank!:confused:
 

Dason

Ambassador to the humans
#2
Is the data already in R in that form? Or do you have a text file that looks like that?

Edit: Also I moved the thread to the R/S-plus subforum. Seems like the better place for it.
 

Dason

Ambassador to the humans
#4
Does this work?

Code:
dat <- read.csv("test.txt", header = F)
n <- as.numeric(unlist(strsplit(as.character(dat$V2),"="))[c(F,T)])
item <- unlist(strsplit(as.character(dat$V3), "="))[c(F,T)]
item <- gsub(" ", "", item)
rep(item,n)
The file I was using looked like this:
Code:
Invoice line 1 , Qty = 3 , PartNumber = 6-280Z
Invoice line 2 , Qty = 2, PartNumber = ABC
Invoice line 3 , Qty = 1, PartNumber = 78EFG
What it should look like after you use read.csv
Code:
> dat
               V1        V2                   V3
1 Invoice line 1   Qty = 3   PartNumber = 6-280Z
2 Invoice line 2    Qty = 2     PartNumber = ABC
3 Invoice line 3    Qty = 1   PartNumber = 78EFG
All I create is a vector with the specified number of part numbers. I don't really know what you wanted the "1"'s for in your original post (you have a column of 1s when you say what it should be like when it's converted) but you could probably add those yourself.
 

Dason

Ambassador to the humans
#6
Just starting with R you say? My code probably looks pretty cryptic in that case. You might want to go through it piece by piece and see what I'm doing in each step.

For example the line:
Code:
n <- as.numeric(unlist(strsplit(as.character(dat$V2),"="))[c(F,T)])
is really me starting with dat$V2.

Then I turn it from a factor variable into characters using as.character.

Then I use strsplit which allows you to split strings based on some criteria. I chose to split the string anywhere where a "=" was present.

The result is a list. According to how you specified the layout of the input file I expected that there should be two elements in each part of the list. There would be the part that says "Qty" and then the part that actually gives you the number you want. We want the second part. I use unlist to convert the entire list into a single vector. I know that I want every other element in the vector. Doing something like name_of_vector[c(F,T)] will give every second element of a vector as long as the vector is of even length. You might want to play around with that to see why that works.

The resulting vector is a vector of characters that contains the numbers we want. We actually want to use the numbers, not characters that represent the numbers so I apply as.numeric to convert the strings to numbers.