SAS: PROC SQL (How to cope w/ duplicate variable names)

2017/02/02

This is the article how duplicated variables for both datasets should be dealt with merging. In the previous article (LINK), Proq SQL has problems for right join and full join.
The issue is that an observation in the second dataset on SELECT clause will become missing in the created new dataset if the observation is missing in the first dataset on SELECT clause.

To deal with this issue, Jorgensen describes that there are THREE approaches to handling the issue of incoming datasets having variables with the same name. In this article, I will explain the third bullet.

Always select the value from one dataset
Keep the variables from both datasets as separate variables in the output dataset
Assign the value of the output variable based on the values of the incoming variables

The #3 is popular because it is possible to extract no matter which incoming variable is non-missing or dose not have invalid value (Jorgensen). The grammar for #3 has two ways:

The CASE statement: PROC SQL's approach of allowing conditional assignment of values.
The COALESCE statement: A function that allows the selection of the first non-missing value in a list of variables.

REFERENCE:
Jorgensen G. 'PROC SQL: Tips and Translations for Data Step Users.' NESUG 2009. PDF File

SAS: PROC SQL (How to cope w/ duplicate variable names)

Search This Blog

Report Abuse