Confusion about writing the command for meta-analysis in Stata

I am a beginner in STATA and do not understand how to write syntax for the meta-analysis command using the variables in my research study. Any help would be so greatly appreciated!

My study:
I am trying to use meta-analysis to investigate how several different variables impact prevalence rates for personality disorder diagnosis. There are 30 studies in my analysis, which report the prevalence rates for 10 personality disorder diagnoses. I want to see how the prevalence rates are impacted by the following variables: gender of participants; country that the study was conducted in; method of sample selection; measure for diagnosis.
The examples of the meta-analysis command in Stata that I have seen compare control groups and treatment groups, and I just don't understand how to follow the steps to plug in my variables for this command.

Thank you for reading and any help you can offer!

PhD student in psychology
I believe what you need is to conduct a meta-regression analysis.

See -findit metareg- and install the user-written -metareg- program by Roger Harbord. Then type -help metareg- to review the syntax for metaregression.

To conduct the meta-regression, set the prevalence rate variable as the dependent variable, set any independent variables or covariates as independent variables, and then add the -metareg- option -wsse(SE)- where SE is the variable name of the standard errors of the prevalence rates.

An example of the -metareg- syntax is as follows:

metareg prev_rate gender country sampling_method diagnose_measure, wsse(SE)
After you run the metaregression, you can obtain the inverse-variance weighted, pooled prevalence rate, its standard error, z-test of the rate, and the 95% confidence interval with:

Thank you, RedOwl, this is such a huge help!
I've downloaded the metareg command and reviewed the syntax. I don't have the standard errors of the prevalence rates- is that an analysis that I am supposed to conduct first? How would I do that? The only data I have from each of the 30 studies is the sample size, prevalence rates for the 10 diagnoses, and then the independent variables that relate to study methodology (i.e., gender, sample selection, etc).

I so appreciate your assistance!!
Do you have 95% confidence intervals for each of the prevalence rates?
If so, you can estimate the standard errors from that information.
If you have no means of estimating the standard errors of the prevalence rates,
you could weight the studies based on the inverse of their sample sizes.
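
If the studies do report 95% confidence intervals, a rough way to back out the standard errors is to divide the width of each interval by 2 x 1.96. The sketch below assumes the intervals are symmetric (normal-approximation) intervals on the same scale as the prevalence rate. The names ci_lower and ci_upper are hypothetical, not variables from your data set.

```stata
* Hypothetical variables: ci_lower and ci_upper hold the reported 95% limits.
* Assumes the intervals are symmetric (Wald-type) around the prevalence rate.
gen SE = (ci_upper - ci_lower) / (2 * invnormal(0.975))
```

If the original studies used an exact or otherwise asymmetric interval, this will only approximate the standard error.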

To weight by sample size, assuming your sample size variable is named SS, you would do:

gen InvSqrtSS = 1 / sqrt(SS)

Then substitute InvSqrtSS for the SE in the -wsse- option of -metareg-. Because -metareg-
weights each study by the inverse of its squared standard error (plus the between-study
variance), this actually results in weighting the studies by their sample size, not by its inverse.

metareg prev_rate gender country sampling_method diagnose_measure, wsse(InvSqrtSS)
That is not a perfect solution, but it should provide a good approximation of the results you
seek. It assumes that studies with larger sample sizes tend to have more precise estimates
(i.e., smaller confidence intervals), and it gives the prevalence rates from those studies
relatively greater weight in estimating the pooled prevalence rate.

If I were conducting your study for publication, I would write to the original authors
and request the SEs or CIs for their prevalence rates.

By the way, if your studies are all of equal or approximately equal sample size I would do the
following, which equally weights the study effects in creating the pooled prevalence rate.

gen one = 1
metareg prev_rate gender country sampling_method diagnose_measure, wsse(one)
On further reflection, since prevalence rates are proportions, you could estimate
their standard errors with the following, assuming SS = sample size and
prev_rate = prevalence rate:

gen SE = sqrt((prev_rate * (1 - prev_rate))/SS)
Then you can conduct the metaregression and estimate the
pooled prevalence rate as:

metareg prev_rate gender country sampling_method diagnose_measure, wsse(SE)
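
One thing to check before using this formula: it assumes prev_rate is stored as a proportion between 0 and 1. If your rates are entered as percentages instead, a sketch along the following lines keeps the standard error on the same scale as the outcome. The names prev_pct, p, and SE_pct are hypothetical.

```stata
* Hypothetical: prev_pct holds the prevalence as a percentage (0 to 100)
* Convert to a proportion first
gen p = prev_pct / 100
* Binomial standard error, scaled back to the percent scale
gen SE_pct = 100 * sqrt(p * (1 - p) / SS)
metareg prev_pct gender country sampling_method diagnose_measure, wsse(SE_pct)
```

Note that a study with a prevalence of exactly 0 or 1 gets a standard error of 0 from this formula, so any such study would need separate handling.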

By the way, in all of the above I assume you know how to handle the independent
variables in the model.

I assume gender is a binary variable, because it cannot be included as a string variable.

If country is a numeric variable with multiple values (i.e., a factor variable), you will need to
decompose it into a series of binary dummy variables (country1, country2, country3, etc.)
and then include all but one of those dummy variables in the -metareg- syntax. Similar issues
exist for sampling_method and diagnose_measure.

Also, be aware that the command -metareg- does not allow Stata's factor variable notation.
All of the right-hand side (RHS) variables need to be continuous or binary variables.
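
As a minimal sketch of the dummy-variable step, -tabulate- with the -generate()- option creates one 0/1 indicator per category. The example assumes diagnose_measure has three categories. The stub name "measure" is arbitrary, and the same approach would apply to country and sampling_method.

```stata
* Creates one 0/1 indicator per category: measure1, measure2, measure3
* (assumes diagnose_measure has three categories; "measure" is an arbitrary stub)
tabulate diagnose_measure, generate(measure)

* Include all but one indicator; measure1 is the omitted baseline here
metareg prev_rate gender measure2 measure3, wsse(SE)
```

Which indicator you omit only changes the baseline against which the other coefficients are read, not the fit of the model.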
Thanks again for this extremely helpful information. I'm going to work on creating dummy variables for all the IV's I'm using, aside from gender. I will be sure to get back to you on how this goes. Many thanks, RedOwl!
I entered all the dummy variables and ran the metareg command as you instructed. But I'm confused reading the output because I don't understand how to determine the results for the dummy variables that I left out of the syntax. For example, for the variable measure, there were three categories: clinician, self-report, structured. I created yes/no dummy variables for each. I included two of the three dummy variables (clinician and self-report) and left out one as you instructed (structured). How would I determine the results for structured measures in the output? I apologize if I'm missing something obvious here!
Thanks for your continued help :)
OK, assume you have created binary dummy variables for
clinician, self-report, and structured, where 1 means
the study used that type of measure and 0 means
that it did not.

For any given study, only one of the three dummy
variables can have a value of 1, and the other two
dummy variables must have values of 0.

So, if a study has values of 0 on clinician and also
on self-report, then we know that that study must
have a value of 1 for structured.

In regression, when we break a categorical variable into
a set of dummy variables, we include k-1 dummy variables.
That is, we must omit one of the dummy variables, because
that variable's value can be determined from the others.

In your meta-regression, you are predicting prevalence
rate. Assume you include dummy variables for self-report
and for structured. The prevalence rate you obtain when
solving the regression formula setting the values of
self-report and structured to 0 is the prevalence rate for
the omitted dummy variable, clinician.

You can think of the coefficients for the two dummy
variables you include in the meta-regression as indicators
of how the prevalence rate changes for each as compared
to the omitted dummy variable. The omitted dummy
variable is just the baseline.

You can find more about this by googling "regression with
dummy variables."
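
To make the baseline idea concrete, here is a small sketch of how the predicted prevalence rate for each category could be read off after the model is fitted. It assumes a model with only the two dummies self_report and structured (hypothetical variable names, with clinician omitted), and it assumes -metareg- stores its coefficients in e(b) like other estimation commands, so that _b[] is available.

```stata
* Hypothetical dummies: self_report and structured (clinician is the baseline)
metareg prev_rate self_report structured, wsse(SE)

* Baseline category: both dummies equal 0, so only the constant remains
display "clinician (baseline): " _b[_cons]
* Other categories: constant plus that category's coefficient
display "self-report:          " _b[_cons] + _b[self_report]
display "structured:           " _b[_cons] + _b[structured]
```

With additional covariates in the model, these values would be the predictions with those covariates held at 0.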
That was a very helpful explanation. The output now makes sense to me, thank you! I feel bad to ask another question related to this thread but I think it will be my last one. I want to run a heterogeneity test, and I believe the correct one would be Hedges Q, so that I can drop any variables that do not have sufficient variance for analysis. How would I write the syntax for Hedges Q? I can't find this from doing a google search or looking at other threads in the forum here.
Add -tau2test- to the list of options in the -metareg- command line. That will give you
a test of Q for heterogeneity in the distribution of the effect sizes.

You will see an I-squared statistic in the output. I-squared = 100 * (Q - df)/Q, set to 0 if Q < df.
When I-squared > 75%, heterogeneity is conventionally considered high.

After running the meta-regression, run:

ereturn list
That will show you the Q, which is shown as e(Q). The df for Q is shown in
the ereturn results as e(df_Q).
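
As a quick check, the I-squared formula can be recomputed from these stored results. This sketch assumes it is run immediately after -metareg-, while e(Q) and e(df_Q) are still in memory.

```stata
* Run immediately after -metareg-, while e(Q) and e(df_Q) are still in memory
display "I-squared (%) = " 100 * max(0, (e(Q) - e(df_Q)) / e(Q))
* Upper-tail chi-squared p-value for the Q statistic
display "p-value for Q = " chi2tail(e(df_Q), e(Q))
```

The first line should agree with the I-squared shown in the -metareg- output, up to rounding.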

By the way, you can also add -graph- to the list of options in the -metareg- command
line, but only if you have a single independent variable in a given model. That will produce
a scatterplot with overlaid fitted regression line and with the size of the observations varied
based on weight (i.e., influence based on inverse variance).
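
For instance, a single-covariate model with the plot requested might look like the following. Using gender as the only covariate is just an illustration.

```stata
* A single covariate (gender) so that -graph- is allowed
metareg prev_rate gender, wsse(SE) graph
```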

You should read the help file for -metareg- at:

help metareg
Hi again,
I am in the final stages of reporting my results for this study. I am still a little confused about finding the Q statistic I need to report from my output. I have copied and pasted an example of the output:

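Meta-regression                                       Number of obs  =      23
REML estimate of between-study variance               tau2           =   73.67
% residual variation due to heterogeneity             I-squared_res  = 100.00%
Proportion of between-study variance explained        Adj R-squared  =  -3.94%
With Knapp-Hartung modification
------------------------------------------------------------------------------
 schizotypal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   region_US |  -1.589325   3.889449    -0.41   0.687    -9.677877    6.499226
       _cons |   6.393756   2.145722     2.98   0.007     1.931482    10.85603
------------------------------------------------------------------------------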

. ereturn list

e(N) = 23
e(tau2) = 73.66574354080174
e(df_m) = 1
e(Q) = 1363269045.911021
e(df_Q) = 21
e(I2) = .9999999845958506
e(df_r) = 21
e(q_KH) = 1.000000368421232
e(remll) = -58.00442344053781
e(remll_c) = -681634421.6371268
e(chi2_c) = 1363268727.265407
e(tau2_0) = 70.87640975059107
e(F) = .1669742367781242

So in the above output, I see that the tau2 statistic would make sense to me as a test of heterogeneity. But the I-squared you mentioned as the statistic to look at does not make sense to me, as it says 100%. Please advise if you have time to help again on this! I appreciate all of your help!