I have been working on this for about five days and have gotten to the point where my brain is going in circles. Hopefully you can help me understand what type of analysis this requires.

I am conducting what should be a ‘simple’ analysis of count data for a rare condition (outcome). The data I have to work with are simply counts by year. I have been asked to determine whether or not this condition is increasing. There are no covariates, and I have 11 observations (11 years).

My first step was to look at the data graphically. The graphs showed one outlier in the third year; however, since I have few data points and have not been asked to make any predictions, I am hoping to leave it in for now (it is in the same direction, just more extreme; see the graph at the bottom of the PDF). I created two- and three-year moving averages, graphed each, and am using the three-year moving average.
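For reference, a trailing k-year moving average replaces each point with the mean of that year and the k-1 years before it, so 11 yearly counts give 9 three-year averages. The sketch below is a minimal pure-Python version (the function name and the counts are hypothetical, for illustration only). Note that the smoothed values are no longer counts, and neighbouring averages share data, which matters if they are later fed into count models.

```python
def moving_average(values, window):
    """Trailing moving average: output i is the mean of
    values[i .. i + window - 1], so there are len(values) - window + 1 outputs."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # slide the window: add the new year, drop the oldest one
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


if __name__ == "__main__":
    counts = [4, 6, 15, 7, 9, 10, 12, 11, 14, 16, 18]  # hypothetical yearly counts
    print(moving_average(counts, 3))
```

A centred average (the mean of the year before, the year itself, and the year after) uses the same sums; only the year each average is plotted against changes.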

In SPSS I tried the Curve Estimation procedure with time as the predictor, using the original data, the two-year moving average, and the three-year moving average. A quadratic function fit the data best (significant, R² = .85); a linear function did not fit at all. (See the bottom of the attached PDF for a graph of the two fitted lines with confidence intervals from SPSS Curve Estimation.) I double-checked this with Excel's curve fitting as well, to be on the safe side. The count data are not significantly skewed or kurtotic.

However, I understand that SPSS curve fitting may not be appropriate here because the data are counts, because of possible autocorrelation (does this apply to independent health events? These would be different people each year), and so on. Would the confidence intervals also likely be off?
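One rough way to look at the autocorrelation question is the lag-1 autocorrelation of the residuals (observed minus fitted) from a trend fit, rather than of the raw counts, since a trend on its own makes raw values look correlated. Smoothing also creates autocorrelation by construction: for independent counts with constant variance, a three-year moving average has a lag-1 correlation of 2/3, because neighbouring averages share two of their three years. The minimal sketch below (hypothetical function name) uses the usual sample ACF estimator; with only 11 points the estimate is very noisy.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation: sum of (x_t - mean) * (x_{t+1} - mean)
    divided by the sum of squared deviations from the mean."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant")
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    return num / denom
```

Applied to residuals from a fitted trend, a value near zero is consistent with the yearly counts being independent once the trend is accounted for.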

This took me on a wild goose chase through various high-level trend analysis texts, which gave me a headache. I am probably overthinking things at this point: what I need is a simple p-value showing that, yes, this is a quadratic relationship, from a test that is appropriate (or at least acceptable to reviewers!).

Eventually I settled on Poisson and negative binomial models, both of which I am still learning. I created a quadratic variable from the three-year moving average count (count squared). Not being certain about the dispersion in the data (I have not tested for it before and am working on it), I fitted a negative binomial model to the data (output attached). I keep getting turned around, so I hope I have it right.
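For comparison, a common specification for a trend in yearly counts uses the raw counts (not a moving average) as the outcome and year and year² as the predictors, with a log link. The sketch below is a minimal pure-Python Poisson regression fitted by iteratively reweighted least squares; it is not SPSS's GENLIN, and the function names and counts are hypothetical. It also reports the Pearson chi-square divided by its degrees of freedom: values well above 1 suggest overdispersion, which is the usual reason to move from Poisson to negative binomial.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination
    with partial pivoting; a is a list of rows, b a list."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson(X, y, max_iter=100, tol=1e-10):
    """Poisson regression (log link) by iteratively reweighted least squares.
    X: list of rows, each starting with 1.0 for the intercept.
    Returns (coefficients, Wald standard errors, deviance, Pearson chi2/df)."""
    p = len(X[0])
    mu = [yi + 0.1 for yi in y]  # glm-style starting values
    eta = [math.log(m) for m in mu]
    beta = None
    for _ in range(max_iter):
        # log link: working response z = eta + (y - mu)/mu, weights w = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(m * row[i] * row[j] for row, m in zip(X, mu)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(m * row[i] * zi for row, m, zi in zip(X, mu, z)) for i in range(p)]
        new_beta = solve(xtwx, xtwz)
        eta = [sum(b * v for b, v in zip(new_beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        converged = beta is not None and max(
            abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    # Wald standard errors: square roots of the diagonal of (X'WX)^-1 at the fit
    xtwx = [[sum(m * row[i] * row[j] for row, m in zip(X, mu)) for j in range(p)]
            for i in range(p)]
    se = [math.sqrt(solve(xtwx, [1.0 if k == i else 0.0 for k in range(p)])[i])
          for i in range(p)]
    deviance = 2 * sum((yi * math.log(yi / m) if yi > 0 else 0.0) - (yi - m)
                       for yi, m in zip(y, mu))
    dispersion = sum((yi - m) ** 2 / m for yi, m in zip(y, mu)) / (len(y) - p)
    return beta, se, deviance, dispersion


def quadratic_trend_design(years):
    """Rows [1, t, t^2] with t = year - mean(year); centring reduces the
    correlation between the linear and squared terms."""
    centre = sum(years) / len(years)
    return [[1.0, yr - centre, (yr - centre) ** 2] for yr in years]


if __name__ == "__main__":
    years = list(range(2004, 2015))                    # hypothetical years
    counts = [4, 6, 15, 7, 9, 10, 12, 11, 14, 16, 18]  # hypothetical counts
    beta, se, dev, disp = fit_poisson(quadratic_trend_design(years), counts)
    for name, b, s in zip(["intercept", "year", "year^2"], beta, se):
        print(f"{name:>9}: {b: .4f} (SE {s:.4f})")
    print(f"deviance {dev:.2f}, Pearson chi2/df {disp:.2f}")
```

The squared term here is the centred year squared; the coefficient on it is what carries the "is the trend curved?" question.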

The issue is that the test of model effects is significant, but the omnibus test is not. The confidence interval also crosses zero. It looks as though this model is a bust, which leads me to think I am doing something wrong.
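On the omnibus test versus the tests of model effects: as far as I know, with default settings in SPSS's Generalized Linear Models output the Omnibus Test is a likelihood-ratio chi-square comparing the fitted model with an intercept-only model (all predictors jointly, so 2 df for year and year²), while Tests of Model Effects report a Wald chi-square for each term. They are different tests and can disagree, particularly with only 11 observations. The minimal sketch below computes both from numbers SPSS prints (log-likelihoods, and an estimate with its standard error); the function names are hypothetical and the values in the usage lines are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square with a positive
    integer df, using the closed forms for even and odd df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    #         * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(h)) + math.sqrt(2 * x / math.pi) * math.exp(-h) * total


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """LR chi-square for nested models: 2 * (logL_full - logL_reduced)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


def wald_test(estimate, std_error):
    """Wald chi-square for a single coefficient: (estimate / SE)^2 on 1 df."""
    stat = (estimate / std_error) ** 2
    return stat, chi2_sf(stat, 1)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the attached output.
    print(likelihood_ratio_test(loglik_reduced=-30.2, loglik_full=-28.9, df=2))
    print(wald_test(estimate=0.012, std_error=0.005))
```

To test the quadratic term specifically by likelihood ratio, compare the model with year and year² against the model with year only (1 df), rather than against the intercept-only model.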

I am not sure what my next step should be at this point. I thought I should check in case I am barking up the wrong tree entirely.

Questions:

1) Am I correct that curve estimation is a good first step, but would not be appropriate as a ‘final answer’ for obtaining a p-value for the shape of a count curve?

2) Is negative binomial analysis what I am looking for here?

3) If so, what would be my next step?

Many thanks for any help you can give me, even if it is simply pointing me to resources!

I am using SPSS 22. I am reasonably familiar with other regressions (mostly logistic regression and Cox survival analysis).

I tried to adapt samples I found online, but I kept getting the error message "By variable not sorted properly". I did sort the data, but I suspect not by the second BY variable.

The table is basically a list of patients with their bed-location transfers. Within the same nursing unit a patient can switch beds. I am trying to get each patient's total length of stay in each nursing unit during the visit. The purpose is to filter out patients who stayed less than 48 hours if the first nursing unit they were placed in after arriving at the hospital was "A".

Here is a sample of the table; DayStayed is the result I want to achieve.

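| ID | Admission Date | CheckinDateTime | CheckoutDatetime | NursingUnit | DayStayed |
|------|--------------------|--------------------|--------------------|---|---|
| 1111 | Jan 1, 2014 8:00AM | Jan 1, 2014 8:00AM | Jan 2, 2014 8:00AM | A | 1 |
| 1111 | Jan 1, 2014 8:00AM | Jan 2, 2014 8:00AM | Jan 3, 2014 8:00AM | A | 2 |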

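Here is the code:

```sas
proc sort data=noSignoff2;
  by ID CheckinDateTime;
run;

data noSignoff2A;
  set noSignoff2;
  /* NOTSORTED makes each run of consecutive rows with the same ID and
     NursingUnit a BY group while keeping the check-in order from PROC SORT;
     plain "by ID NursingUnit" expects the data sorted by NursingUnit */
  by ID NursingUnit notsorted;
  *Note: LOSInUnit is a field I created to calculate the days between CheckinDateTime and CheckoutDatetime;
  /* first.NursingUnit is also 1 whenever first.ID is 1 */
  if first.NursingUnit then DayStayed = 0;
  DayStayed + LOSInUnit;  /* sum statement: DayStayed is retained across rows */
  /* keep only the last row of each unit stay, which holds the unit total;
     without this line every row keeps its running DayStayed */
  if last.NursingUnit then output;
run;
```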
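The grouping logic the data step needs can also be sketched outside SAS. This minimal Python sketch assumes one visit per ID, check-in and check-out as datetime values, and rows as (ID, check-in, check-out, unit) tuples; the function names unit_stays and short_first_stay_in are hypothetical. It sorts by ID and check-in time, then groups only consecutive rows with the same ID and unit (the same idea as BY ... NOTSORTED), so a patient who leaves unit A and later returns gets two separate A stays.

```python
from datetime import datetime
from itertools import groupby


def unit_stays(rows):
    """Collapse bed transfers into one record per consecutive stay in a unit.

    rows: iterable of (patient_id, checkin, checkout, unit) tuples with
    datetime check-in/check-out. Returns a list of
    (patient_id, unit, first_checkin, days_stayed) in visit order.
    """
    # same order as PROC SORT ... BY ID CheckinDateTime
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    stays = []
    # groupby merges only *consecutive* rows with the same (ID, unit)
    for (pid, unit), group in groupby(ordered, key=lambda r: (r[0], r[3])):
        group = list(group)
        seconds = sum((cout - cin).total_seconds() for _, cin, cout, _ in group)
        stays.append((pid, unit, group[0][1], seconds / 86400))
    return stays


def short_first_stay_in(stays, unit="A", min_days=2.0):
    """IDs whose first unit stay is in `unit` and lasted less than min_days
    (2 days = 48 hours). Assumes `stays` is ordered as unit_stays returns it."""
    flagged, seen = set(), set()
    for pid, stay_unit, _, days in stays:
        if pid in seen:
            continue  # only the first stay of each patient matters
        seen.add(pid)
        if stay_unit == unit and days < min_days:
            flagged.add(pid)
    return flagged


if __name__ == "__main__":
    rows = [  # the two sample rows from the question
        ("1111", datetime(2014, 1, 1, 8), datetime(2014, 1, 2, 8), "A"),
        ("1111", datetime(2014, 1, 2, 8), datetime(2014, 1, 3, 8), "A"),
    ]
    stays = unit_stays(rows)
    print(stays)
    print(short_first_stay_in(stays))
```

With the two sample rows, patient 1111 gets a single A stay of 2.0 days (exactly 48 hours), which is not below the cut-off, so the flagged set is empty.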

I have two questions about running a for loop:

1) How to tidy up the output

2) How to speed up execution

```r
install.packages("kappaSize")  # only needed once
library(kappaSize)

p <- seq(0.81, 0.99, 0.05)
q <- c(0.7, 0.8, 0.9)

# Start counting time
ptm <- proc.time()

for (i in seq_along(p)) {
  for (j in seq_along(q)) {
    out <- CIBinary(p[i], 0.8, kappaU = NA, props = q[j], raters = 2, alpha = 0.05)
    # elements 1, 4 and 8 of the result: kappa0, proportion, sample size
    k <- cbind(k0 = out[1], props = out[4], sample = out[8])
    print(k)
  }
}

# End clock
proc.time() - ptm
```

Output:

```
       k0 props sample
kappa0 0.81   0.7  11577
       k0 props sample
kappa0 0.81   0.8  15157
       k0 props sample
kappa0 0.81   0.9  26852
       k0 props sample
kappa0 0.86   0.7    322
       k0 props sample
kappa0 0.86   0.8    422
       k0 props sample
kappa0 0.86   0.9    746
       k0 props sample
kappa0 0.91   0.7     96
       k0 props sample
kappa0 0.91   0.8    126
       k0 props sample
kappa0 0.91   0.9    222
       k0 props sample
kappa0 0.96   0.7     46
       k0 props sample
kappa0 0.96   0.8     60
       k0 props sample
kappa0 0.96   0.9    105
```

```
> proc.time() - ptm
   user  system elapsed
   3.70    0.04    3.76
```

1) Is there a faster way of running it?

2) Is there a nice way to make the output tidier, i.e. so that the header is printed only once? Possibly using some wrapper functions?
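For the tidiness question, the usual pattern is to collect one row per call and print a single table at the end, instead of printing inside the loop; in R this is typically written by building the parameter grid with expand.grid() and combining the results into one data.frame (for example with do.call(rbind, ...)). The minimal Python sketch below only illustrates that pattern and does not address speed: fn is a hypothetical stand-in for the CIBinary call, and the function names are hypothetical too.

```python
from itertools import product


def run_grid(fn, p_values, q_values):
    """Call fn(p, q) for every (p, q) combination, p varying slowest,
    and collect one (p, q, result) row per call instead of printing."""
    return [(p, q, fn(p, q)) for p, q in product(p_values, q_values)]


def format_table(rows, headers=("k0", "props", "sample")):
    """Render the collected rows as plain text with a single header line."""
    lines = ["{:>6} {:>6} {:>8}".format(*headers)]
    for p, q, n in rows:
        lines.append("{:>6.2f} {:>6.2f} {:>8}".format(p, q, n))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical stand-in for CIBinary(p, 0.8, ..., props = q) in R
    def placeholder_sample_size(p, q):
        return 0  # placeholder value, not a real sample size

    rows = run_grid(placeholder_sample_size, [0.81, 0.86, 0.91, 0.96], [0.7, 0.8, 0.9])
    print(format_table(rows))
```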

Thnx