How to normalize a demand/availability matrix for Citibike data

I am not a statistician, but I would appreciate an outside perspective on my current project analyzing Citibike data. This is a bit complicated, so please bear with me.

My goal is to determine to what extent bikes are delivered to stations when they need them, that is, the average number of deliveries each station receives per hour divided by the average number of times per hour that the station is empty.
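Written as a formula (the symbols are introduced here only for illustration), the ratio for station $s$ and hour $h$ would be:

$$
R_{s,h} = \frac{\bar{D}_{s,h}}{\bar{E}_{s,h}}
$$

where $\bar{D}_{s,h}$ is the average number of bikes delivered to station $s$ during hour $h$ and $\bar{E}_{s,h}$ is the average number of times station $s$ is empty during hour $h$. The ratio is undefined whenever $\bar{E}_{s,h} = 0$, which is the case in many cells of the empty matrix.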

I have created both matrices. The first contains the number of empty instances per hour for each station:

<pre>
> head(empty)
cb_id 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
1 72 1 0 0 0 0 0 0 0 0 0 0 0 23 26 33 22 30 26 4 4 25 17 3 0 0
2 79 22 40 42 35 21 26 31 36 29 6 14 19 3 0 0 0 0 0 0 0 0 1 3 16 0
3 82 0 0 0 0 0 0 0 0 0 0 0 0 3 14 19 6 20 33 45 32 22 13 1 5 4
4 83 5 1 0 3 6 6 6 6 6 6 1 3 6 4 0 2 0 0 1 7 8 7 7 6 5
5 116 19 18 23 12 12 5 0 0 2 5 5 12 19 13 11 3 0 8 10 7 11 29 24 15 3
6 119 0 0 0 5 6 6 6 6 6 6 6 11 15 7 8 12 12 17 15 6 6 7 5 0 2
</pre>


The second contains the total number of bikes delivered per hour to each station:

<pre>

> head(deliveries)
id 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
1 72 3 1 2 8 3 1 0 7 12 17 23 15 17 24 20 16 24 23 15 11 11 8 2 5
2 79 1 0 1 3 1 0 0 2 4 4 13 12 18 12 8 10 6 8 11 8 4 4 0 1
3 82 0 1 4 5 1 0 0 1 0 0 6 6 7 7 11 7 12 10 6 0 2 1 4 1
4 83 0 0 1 1 2 0 1 1 4 2 6 5 9 4 8 4 16 13 11 11 5 7 0 0
5 116 1 2 3 3 3 1 3 10 28 36 21 23 26 33 30 18 26 43 44 26 19 16 4 1
6 119 0 0 1 0 0 0 0 1 0 0 1 1 2 0 0 0 0 1 0 0 0 0 0 1
</pre>


My ultimate objective is to give each station an hourly rating. However, I have several questions about how to do this, and I also have another variable that needs to be integrated into the equation.

First, I would like to weight the first matrix (empty instances per hour) by demand, because it doesn't really matter if a station is empty in the middle of the night when nobody is taking bikes from it. For this, I have a matrix of outgoing trips per hour for each station:

<pre>
> head(demand)
id 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
1 72 25 9 14 3 10 10 28 175 406 230 155 151 202 167 179 185 275 298 280 185 110 84 93 51
2 79 36 17 9 3 2 7 32 88 110 131 89 125 149 165 161 147 178 309 339 201 115 78 67 39
3 82 10 3 5 10 0 11 15 58 129 110 49 62 62 100 70 73 72 86 116 61 49 37 26 22
4 83 24 15 10 5 3 4 39 53 108 98 80 118 116 110 135 158 157 196 176 132 118 94 91 102
5 116 40 45 15 9 16 37 75 205 497 527 362 287 316 353 359 309 365 653 598 468 328 242 168 102
6 119 0 0 1 2 0 0 11 56 26 12 21 6 27 15 18 5 14 19 25 6 4 0 1 0

</pre>


How would one go about weighting the first matrix (empty) by the demand matrix?
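One common way to do this is element-wise multiplication of two aligned matrices. The sketch below assumes both are numeric matrices with one row per station and one column per hour in the same order (the `cb_id`/`id` columns would first have to become row names), and it uses a small excerpt of the numbers above: stations 72, 79 and 82 at hours 2, 8 and 17. The weighting scheme, each hour's demand divided by that station's mean hourly demand, is an illustrative assumption, not the only option.

```r
# Excerpt of the matrices above (stations 72, 79, 82; hours 2, 8, 17).
# Assumption: rows are stations, columns are hours, same order in both.
hours <- c("2", "8", "17")
ids <- c("72", "79", "82")
empty <- matrix(c(0, 0, 26,
                  42, 29, 0,
                  0, 0, 33),
                nrow = 3, byrow = TRUE, dimnames = list(ids, hours))
demand <- matrix(c(14, 406, 298,
                   9, 110, 309,
                   5, 129, 86),
                 nrow = 3, byrow = TRUE, dimnames = list(ids, hours))

# The two matrices must line up cell by cell before multiplying.
stopifnot(identical(dimnames(empty), dimnames(demand)))

# Relative demand: each hour's trips divided by the station's mean hourly
# trips, so every station's weights average to 1 across its hours.
demand_weight <- sweep(demand, 1, rowMeans(demand), "/")

# Element-wise product: empty hours with little demand are discounted,
# empty hours with above-average demand count for more.
empty_weighted <- empty * demand_weight
round(empty_weighted, 2)
```

With this choice, station 79's 42 empty instances at hour 2, when it has only 9 outgoing trips, shrink to about 2.65, while its 29 empty instances at hour 8 keep most of their weight (about 22.36). Multiplying by raw demand would also work, but then stations with heavy overall traffic dominate the weighted matrix.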


Secondly, how does one normalize the data to deal with all of the zeros involved?
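A simple, widely used way to handle the zeros is additive smoothing (a pseudocount): add a small constant to both the numerator and the denominator before dividing, and optionally take the log so the result is symmetric around 0. The sketch below uses the same kind of excerpt (stations 72, 79, 82; hours 2, 8, 17); the pseudocount of 1 is an arbitrary assumption that would need checking against the full data.

```r
# Excerpt of the deliveries and empty matrices above.
hours <- c("2", "8", "17")
ids <- c("72", "79", "82")
deliveries <- matrix(c(2, 12, 23,
                       1, 4, 8,
                       4, 0, 10),
                     nrow = 3, byrow = TRUE, dimnames = list(ids, hours))
empty <- matrix(c(0, 0, 26,
                  42, 29, 0,
                  0, 0, 33),
                nrow = 3, byrow = TRUE, dimnames = list(ids, hours))

# The plain ratio breaks on zeros: x / 0 is Inf and 0 / 0 is NaN.
raw_ratio <- deliveries / empty

# Additive smoothing: add a pseudocount to both counts so every cell is
# finite. The value 1 is an illustrative choice, not a recommendation.
pseudo <- 1
smoothed_ratio <- (deliveries + pseudo) / (empty + pseudo)

# Log scale: 0 = balanced, positive = more deliveries than empty
# instances, negative = fewer.
log_ratio <- log(smoothed_ratio)
round(log_ratio, 2)
```

A larger pseudocount pulls cells with little data (few deliveries and few empty instances) more strongly toward 0.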

Once I have a weighted version of the empty matrix, how would I come up with a per-station, per-hour score?

Again, my objective is to rate the stations so that those that receive more bikes in the hours when they are more often empty get the highest ratings.
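Combining the two ideas, one possible per-station, per-hour score is the log of smoothed deliveries over demand-weighted empty instances, followed by a rank of the stations within each hour. This is a sketch under stated assumptions (relative-demand weights, a pseudocount of 1, the same three-station excerpt of the data above), not an established metric; `as_hour_matrix` is a hypothetical helper.

```r
# Excerpt of the data above (stations 72, 79, 82; hours 2, 8, 17).
hours <- c("2", "8", "17")
ids <- c("72", "79", "82")
# Hypothetical helper: fill a station x hour matrix row by row.
as_hour_matrix <- function(values) {
  matrix(values, nrow = length(ids), byrow = TRUE,
         dimnames = list(ids, hours))
}
empty <- as_hour_matrix(c(0, 0, 26, 42, 29, 0, 0, 0, 33))
deliveries <- as_hour_matrix(c(2, 12, 23, 1, 4, 8, 4, 0, 10))
demand <- as_hour_matrix(c(14, 406, 298, 9, 110, 309, 5, 129, 86))

# 1. Weight empty instances by relative demand (weights average 1 per station).
empty_weighted <- empty * sweep(demand, 1, rowMeans(demand), "/")

# 2. Smooth zeros with a pseudocount and take logs (0 = balanced).
pseudo <- 1
score <- log((deliveries + pseudo) / (empty_weighted + pseudo))

# 3. Rank stations within each hour: 1 = highest score in that hour.
hourly_rank <- apply(-score, 2, rank, ties.method = "min")
round(score, 2)
hourly_rank
```

In this excerpt station 82 ranks first at hour 2 even though it is never empty then, simply because it received deliveries. If deliveries to stations that are not empty should not count toward the rating, the score needs an additional term for that.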


So far, I have tried the `scale` function in R, but it produces standardized values (z-scores, mostly between about -1 and 1 here and often negative), and I am not sure I can weight those by demand. Any advice would be appreciated.

<pre>
> head(scale(empty))
0 1 2 3 4 5 6 7 8 9
[1,] -0.72129601 -0.7601973 -0.74410995 -0.7147783 -0.6839001 -0.6675647 -0.6627556 -0.6502253 -0.6521588 -0.6417624
[2,] 0.02349643 0.6102992 0.70318929 0.5306350 0.1083441 0.3120011 0.5319629 0.7760114 0.5522260 -0.3702965
[3,] -0.75676232 -0.7601973 -0.74410995 -0.7147783 -0.6839001 -0.6675647 -0.6627556 -0.6502253 -0.6521588 -0.6417624
[4,] -0.57943078 -0.7259349 -0.74410995 -0.6080285 -0.4575446 -0.4415110 -0.4315198 -0.4125191 -0.4029757 -0.3702965
[5,] -0.08290249 -0.1434739 0.04845868 -0.2877794 -0.2311891 -0.4791867 -0.6627556 -0.6502253 -0.5690977 -0.4155408
[6,] -0.75676232 -0.7601973 -0.74410995 -0.5368621 -0.4575446 -0.4415110 -0.4315198 -0.4125191 -0.4029757 -0.3702965
10 11 12 13 14 15 16 17 18 19
[1,] -0.6270548 -0.61293290 0.169758386 -0.0757114 0.1131819 -0.1698997 0.2507940 0.2564759 -0.6172263 -0.5890485
[2,] 0.1468310 0.26022362 -0.646510337 -0.8927267 -0.8737715 -0.8747281 -0.8763701 -0.8497344 -0.8011727 -0.7915685
[3,] -0.6270548 -0.61293290 -0.646510337 -0.4527954 -0.3055256 -0.6825021 -0.1249274 0.5543018 1.2682241 0.8285917
[4,] -0.5717773 -0.47506608 -0.524070029 -0.7670321 -0.8737715 -0.8106528 -0.8763701 -0.8497344 -0.7551861 -0.4371585
[5,] -0.3506670 -0.06146562 0.006504641 -0.4842191 -0.5447870 -0.7786151 -0.8763701 -0.5093620 -0.3413068 -0.4371585
[6,] -0.2953895 -0.10742123 -0.156749103 -0.6727611 -0.6345101 -0.4902762 -0.4255045 -0.1264430 -0.1113738 -0.4877885
20 21 22 23
[1,] 0.6108913 0.0184376 -0.6515502 -0.7400978
[2,] -0.8036182 -0.7953597 -0.6515502 -0.2264336
[3,] 0.4411501 -0.1850117 -0.7239196 -0.5795778
[4,] -0.3509752 -0.4901857 -0.5068115 -0.5474738
[5,] -0.1812340 0.6287856 0.1083279 -0.2585376
[6,] -0.4641359 -0.4901857 -0.5791809 -0.7400978

</pre>
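For reference, `scale()` centers and scales each column (here, each hour across stations) to mean 0 and standard deviation 1, so its output is a z-score that can fall outside [-1, 1] (row 3, hour 18 in the output above reaches 1.27) and is often negative, which makes it awkward to multiply by demand. The sketch below checks this on a small excerpt of the empty matrix and shows min-max rescaling to [0, 1] as one non-negative alternative; `min_max` is a hypothetical helper.

```r
# Excerpt of the empty matrix above (stations 72, 79, 82; hours 2, 8, 17).
empty <- matrix(c(0, 0, 26,
                  42, 29, 0,
                  0, 0, 33),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("72", "79", "82"), c("2", "8", "17")))

# scale() standardizes each column to mean 0 and standard deviation 1.
# The result is not bounded to [-1, 1].
z <- scale(empty)
round(colMeans(z), 10)      # 0 for every hour
round(apply(z, 2, sd), 10)  # 1 for every hour

# Hypothetical helper: rescale one column to [0, 1] so values stay
# non-negative and can still be multiplied by demand weights.
min_max <- function(x) {
  rng <- max(x) - min(x)
  if (rng == 0) return(rep(0, length(x)))  # constant column: no variation
  (x - min(x)) / rng
}
empty_01 <- apply(empty, 2, min_max)
empty_01
```

Either way, weighting the raw counts first and normalizing afterwards avoids multiplying demand by negative numbers.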




Thanks in advance.
 