Using %cutpoint macro in SAS


I am trying to find optimal cut-points for my continuous smoking variable using an outcome-based method. I came across a SAS macro, "%cutpoint", that is used for this. However, I have no experience using macros in SAS. The code is

%cutpoint(dvar=, endpoint=, data=, trunc=, type=, range=, fe=, plot=plot, ngroups=,
padjust=, zoom=);
as given in the document "Finding Optimal Cutpoints for Continuous Covariates with Binary and Time-to-Event Outcomes".

I opened my dataset in SAS (SAS 9.4) and ran the code following the explanation in the above document, but it gave me errors (WARNING: Apparent invocation of macro CUTPOINT not resolved. ERROR 180-322: Statement is not valid or it is used out of proper order). After searching on the internet, I learned that SAS does not yet know the macro: I first need the program that defines the "cutpoint" macro, run that in SAS, and only then call the macro. But I do not know how to perform these steps.

Can anyone please help with these steps?


Less is more. Stay pure. Stay poor.
I am taking a look at this. I agree: if one is not familiar with macros, things get confusing very easily.


Well, I skimmed the article (pretty good) and was going to present an example, but I have not found a copy of the actual "cutpoint" macro. Do you have the macro code, not just the call portion from above?
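For what it's worth, once the macro source is in hand, the two steps asked about in the first post (define the macro, then call it) would look roughly like the sketch below. The file path, dataset name, and variable names are all made up for illustration. The parameter names are copied from the call in the first post, and the ones I cannot vouch for are left blank, so their valid values have to come from the macro's own documentation.

```sas
/* Step 1: run the file that DEFINES the macro so SAS compiles it.
   The path is hypothetical; point it at wherever the macro source is saved. */
%include "C:\macros\cutpoint.sas";

/* Step 2: CALL the macro.
   work.mydata, packyears, and case are placeholder names (assumptions).
   trunc, type, range, fe, ngroups, padjust, and zoom are left blank on purpose:
   fill them in from the macro documentation before running. */
%cutpoint(dvar=packyears, endpoint=case, data=work.mydata, trunc=, type=, range=, fe=,
          plot=plot, ngroups=, padjust=, zoom=);
```

The "Apparent invocation of macro CUTPOINT not resolved" warning goes away once step 1 has run in the same SAS session, because the macro is then compiled and known to SAS.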

I also found another post here on this, but I believe I reference a macro based on the macro, so it isn't at the location I listed (

It looks like you were interested in cutpoint not findcut, which is partially presented here:

Let me know your thoughts!
Hi hlsmith,
Thank you very much for looking into this. I am actually a Stata user and use SAS only occasionally, so I had no idea what to do or what the repository does.
My outcome is binary and that's why I am interested in cutpoint rather than findcut :).
Yes, I had also come across all the links you mentioned above, including findcut at Mayo, but did not know what to do with them. As you mentioned, it's their methods (p-values, OR correction) that got me interested in this. Do you think findcut can be modified and used for a binary outcome?
BTW, I also found some other code, but I do not know which software it is for.
Choose Cutpoints for Categorizing a Continuous Predictor

Thanks again
I downloaded findcut from the Mayo website but couldn't find any mention of cutpoint (except as an additional parameter). As presumed, I couldn't use findcut, as I don't have a time variable. Also, if you use Stata: I had stumbled upon 'cutpt' by Phil Clayton (CUTPT: Stata module for empirical estimation of cutpoint for a diagnostic test), which uses a balance of sensitivity and specificity from the ROC curve to find a cutpoint for a continuous variable. I don't know if it's usable in my case. But the point is, when I just played around with cutpt on my data, the optimal cutpoint it gave me for the continuous variable (smoking) was 25.7 pack-years. From an RC spline graph I had created, it seemed like around 20 pack-years is the point from which my graph starts to curve. What do you think?
But I still feel the cutpoint macro is more attractive because of the top 10 cutpoint options they provide and scoring.


The link you provided in post #5 looks like R or S-Plus code. Yes, findcut seems irrelevant for many reasons.

I was getting ready to email the author, J. Mandrekar, since he is still at Mayo Clinic, though I think I will pass. The reasons are that the code is now at least 10 years old and I think the splines could probably be done better now, with confidence bands. Also, the code appeared not to incorporate covariates.

I would imagine since the code outputs chi-sq that it is also getting the ORs from that procedure as well.

This article caught my attention because I actually did most of this coding without a macro last week, but I looked at histograms instead of splines, which I had wanted to do. My example included only a few possible cutpoints, so it was manageable to run all of the models without a macro. I also used logistic regression so that I could control for covariates.
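If there were more than a handful of candidate cutpoints, the same idea could be wrapped in a small macro. This is only a rough sketch of that approach, not the cutpoint macro from the article: the macro name scan_cuts, the dataset work.mydata, and the variables packyears, case, and age are all made up, the outcome is assumed to be coded 0/1, and the covariates are assumed to be numeric.

```sas
/* Fit one logistic model per candidate cutpoint, adjusting for covariates.
   Hypothetical helper; all names below are assumptions for illustration. */
%macro scan_cuts(data=, x=, y=, covars=, cuts=);
  %local i cut;
  %let i = 1;
  /* split the list on blanks only, so decimal cutpoints such as 25.7 survive */
  %let cut = %scan(&cuts, &i, %str( ));
  %do %while(%length(&cut) > 0);
    data _tmp;
      set &data;
      /* dichotomize at the current cutpoint; keep missing values missing */
      if missing(&x) then x_hi = .;
      else x_hi = (&x >= &cut);
    run;
    title "Cutpoint: &x >= &cut";
    proc logistic data=_tmp;
      model &y(event='1') = x_hi &covars;
    run;
    %let i = %eval(&i + 1);
    %let cut = %scan(&cuts, &i, %str( ));
  %end;
  title;
%mend scan_cuts;

%scan_cuts(data=work.mydata, x=packyears, y=case, covars=age, cuts=10 20 30);
```

Keep in mind that picking the cutpoint with the best p-value out of many candidates inflates the type I error, which is why the article's macro offers a p-value adjustment.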

I would be interested in the code if you get it.

Question: on the spline, are you looking for a bimodal shape and potentially targeting the first valley or nadir? Also, what did you plot on your splines (X: continuous variable, Y: ORs)?
OK, so I don't know whether what I did is correct, but the data I am using are in wide form. It's a case-control study and the outcome is binary. So I created a variable named time with value = 1 (only one time point) and ran the findcut macro. It gave me output for 4 methods. Of these, Contal and O'Quigley Method Cox Model Hazard Ratio selected a cutpoint at 30.9 pack-years. Cox Model Wald P-value selected cutpoints ranging from 13.5 to 54 pack-years (p-value < 0.0001). The False Discovery Rate method selected a range from 14 to 53, but the lowest p-value was 0.0000246, and that was for 83 values of pack-years between 23.5 and 47. This agrees with the spline graph as well (my spline graph is attached). Let me know your thoughts on this. (I am not able to upload a Stata (version 9) data file here.)
I plotted the continuous variable on the x axis and log odds ratios on the y axis. In my graph, the log odds ratios increase from 0, start bending around 20 pack-years but still go up until around 35, then start slowing down until they start flattening out around 70 pack-years.


Do you have two potential distributions for pack-years? Typically you try to find a cutoff between two distributions of a continuous variable predicting a binary outcome. For example, non-pregnant women have one hormone distribution and pregnant women have a different one; you then find a plausible demarcation between the two distributions to predict the outcome. I feel like pack-years in your scenario has a linear relationship with the outcome and is monotonic.

How did you get all of the ORs for the graph, or did Findcut generate them?

Also, you have to watch out when you eventually draft results, because you have an artificially contrived outcome prevalence (due to your study design). So certain statistics and your confidence intervals may not translate to other samples.

I am not sure about your tricking of the Findcut macro to get it to work. Will have to think about it.

By two potential distributions, do you mean the distribution of pack-years among cases and controls (my outcome), or something related to a third variable? My final aim is the interaction of the dichotomized pack-years with another binary variable. With splines, I am only able to stratify the graph by the third variable.

I considered linear and various nonlinear forms for pack-years. The RC spline regression model with the knot positions I chose had the lowest AIC, and all nonlinear forms fit better than the linear form (LR test). That's the reason I am using splines.

As I mentioned earlier, I use Stata rather than SAS. So I used a user written command named xbrcspline to generate point wise estimates which I plotted against the pack years.

Also, what did you mean by "artificially contrived outcome prevalence"? My outcome is rare. I understand that in a case-control study, the baseline risk is fixed.


Taking off for the day. But I meant two distributions that predict the outcome. Also, splines stratified by the other variable seem appealing for finding an interaction (perhaps if the lines cross).

Lastly, I am not well versed in spline models!
OK, perhaps tomorrow then. But do let me know what you meant by "artificially contrived outcome prevalence". I am curious now. And thanks for the discussion.


A prevalence topic, but you probably already knew this:

Also, if you were calculating positive predictive values, which I don't think you are, those are influenced by prevalence.

Lastly, I believe prevalence could influence confidence intervals, in that there could be more or fewer observations in some groupings depending on the prevalence.

As for splines, do you specify the degree for the number of changes in direction (e.g., polynomial degree) and the knots for the number of modal shapes?
I think what you are talking about is predicted probability (PP) rather than log odds ratios (coefficients). Since mine is a case-control study, I cannot use PP to plot my graph: the prevalence or baseline risk is fixed by design, and PP depends on the constant term, which is not valid in a case-control study. I am using log odds ratios instead. After running my logistic regression with my spline variables, I create pointwise estimates (log odds ratios) and plot them against the x variable. I hope this is what you were trying to warn me about.

For splines, listing the number of knots is the straightforward option. But I found the knot positions that best fit my data (trying different knot positions, running the regression model, and comparing model fit) and listed the knot positions rather than the number of knots. For example, mine is a restricted cubic spline with 3 knots at the 5th, 50th, and 95th percentiles of the distribution of my continuous variable (smoking, if not equal to 0).


Yes, that is comparable to what I was referencing. I get the presentation of log odds (beta coefficients).

I need to look into making splines with reference to knot location; that was a helpful comment.
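As a starting point on the SAS side, a restricted (natural) cubic spline with knots at chosen percentiles can be requested through the EFFECT statement in PROC LOGISTIC. This is a minimal sketch, assuming a dataset work.mydata with a 0/1 outcome named case and a continuous variable packyears (all placeholder names). Note that the percentiles here are taken over all observations, not only the non-zero smokers as in the Stata fit described above.

```sas
/* Restricted cubic spline with knots at the 5th, 50th, and 95th percentiles.
   work.mydata, case, and packyears are assumed names for illustration. */
proc logistic data=work.mydata;
  effect spl = spline(packyears / naturalcubic basis=tpf(noint)
                      knotmethod=percentilelist(5 50 95));
  model case(event='1') = spl;
  /* save the fitted linear predictor (log odds, intercept included) */
  output out=work.fit xbeta=logodds;
run;

proc sort data=work.fit;
  by packyears;
run;

/* plot the fitted log odds against the continuous variable */
proc sgplot data=work.fit;
  series x=packyears y=logodds;
run;
```

Because the saved linear predictor includes the intercept, only the shape of the curve is meaningful in a case-control design, not its level. To place knots at specific values instead of percentiles, knotmethod=list(...) can be swapped in.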


I did it in SAS, but specifying the locations didn't really seem to move the knots.

My graph says it has five knots. What are the chi-sq values associated with the knots (see below)? And is the 5th used as a reference? Thanks.
I do not exactly understand your graph. What kind of spline did you use? And did you get the estimates and create marginal probabilities? I have no idea about the 5th position being used as a reference. Did you specify the percentiles for the knots or just run the code?


Thanks for your time. Yes, I probably need to explain the content better. This is from a logistic regression with a single continuous variable. Based on the outcome, I am hypothesizing that there are two slightly different distributions of the continuous variable going on. I use the spline to help me confirm the best cutoff for later dichotomization of the continuous variable, though the spline is not as good as looking at numerical output or histograms, since there isn't a clear threshold (cutpoint) between the two potential continuous distributions.

So I modeled the binary variable as predicted by a fitted spline with 5 knots. I let the program find the knots. I get a c-statistic (AUC) for the variable, etc., though I also get the output table above. I was thinking that the model used some type of piecewise regression, i.e., that it predicted the outcome based on 5 line segments (from the spline). Let me know if you have more questions.