voiding data set merging problems when by-variable has different lengths.

Ana

New Member
#1
When merging 2 tables on a common by-variable you run the risk of having a different length on this (or these) by variable(s). If this is the case, SAS will casually inform you with the following warning:



WARNING: Multiple lengths were specified for the BY variable mergevar by input data sets. This may cause unexpected results.



On first sight you might be inclined to ignore this pretty warning but SAS will do exactly as it states: it may cause unexpected results. In some cases the merge might be successful, other times the merge is partial or just not executed at all, leaving you with an empty table.



Take the following example:



Code:
DATA table1;

          length mergevar $10;

          mergevar = "blabla";

          table1varIwant = "interesting stuff";

RUN;

 

DATA table2;

          length mergevar $11;

          mergevar = "blabla";

          table2varIwant = "more interesting stuff";

RUN;

 

PROC SORT DATA=table1;

          BY mergevar;

RUN;

 

PROC SORT DATA=table2;

          BY mergevar;

RUN;

 

DATA combinedtable;

          MERGE table1 table2;

          BY mergevar;

RUN;

The warning will be shown in the log.



WARNING: Multiple lengths were specified for the BY variable mergevar by input data sets. This may cause unexpected results.



Hopefully you will get the correct result but if you have a lot of observations to merge on, changes are… you won’t.



One way to solve this is by first determining the longest length for the ‘mergevar’ and using that information in a new length statement just before the actual merge.







The PROC SQL puts the longest length in a macro variable called ‘maxlength’.





Code:
PROC SQL;

          SELECT max(length) INTO :maxlength

                   FROM sashelp.vcolumn

                             WHERE libname='WORK' 

                                      and memname in

                                      ("table1", "table2")

                                      and upcase(name)="mergevar";

QUIT;

This macrovariable is than used to set the LENGTH of the mergevar.



Code:
DATA combinedtable;

          LENGTH mergevar $ &maxlength;

          MERGE table1 table2;

          BY mergevar;

RUN;


This merge runs without warning and you should have the expected result.

Time to celebrate!