Formats & Informats in SAS


Formats and informats are among the most important features of SAS. A format defines how the values of a SAS variable appear in output, and it can also be used to group values, all without changing the data stored in the dataset.

An informat is used to 'READ' a value: it tells SAS how the raw input data is laid out so that it can be converted into a stored value. It does not say anything about how SAS will write that data in output. That is the job of a format, which tells SAS how to 'WRITE' (display) the data.
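The difference between reading and writing can be seen in a minimal sketch like the one below. The variable name amount and the sample value are made up for illustration.

```sas
DATA _NULL_;
    /* Informat: COMMA10. reads the text '1,234,567' into a plain number */
    amount = INPUT('1,234,567', COMMA10.);

    /* Format: DOLLAR12. only changes how the number is written to the log */
    PUT amount= DOLLAR12.;

    /* Without a format, the stored number is written as it is */
    PUT amount=;
RUN;
```

The stored value is the same in both PUT statements; only its appearance in the log differs.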

SAS comes with many pre-defined informats & formats. A few of them are:

DOLLAR5.: This informat is used to read values like salary, revenue, profit etc., where the data carries a '$' sign. The '5.' after 'DOLLAR' is the width, i.e. the number of characters to read or write, including the '$' sign and any commas. To write the values back in the same style as the raw data file, name the format for each variable in a FORMAT statement.

COMMA6.: The COMMA informat is used to read numeric values that contain commas as thousands separators, such as 12,500. If such a value is read as a number without an appropriate informat, SAS treats it as invalid numeric data and stores a missing value "." in its place.

DATE9.: SAS has lots of date formats, which we will explain in detail in a future post on date variables. For now just remember that DATE9. is an informat/format used to read/write dates like 12JUN1993.

MMDDYY10.: This informat/format is used to read/write a date like 10/06/1983. Without it SAS cannot read the value as a date: read as a number it becomes missing, and read as a character variable it is just text.

MMDDYY8.: This is used to read or write a date with a two-digit year, like 10/06/83.
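As a rough illustration of the informats and formats listed above, the sketch below reads a few made-up records and prints them. The dataset name builtin_demo, the variable names and the data values are assumptions for this example only.

```sas
DATA builtin_demo;
    /* The colon lets list input use an informat for each field */
    INPUT salary :DOLLAR8.
          units  :COMMA8.
          hired  :DATE9.
          born   :MMDDYY10.;

    /* Formats control how the stored numbers are displayed */
    FORMAT salary DOLLAR8. units COMMA8. hired DATE9. born MMDDYY10.;

    DATALINES;
$23,000 12,500 12JUN1993 10/06/1983
$42,500 1,250 01JAN2001 03/15/1979
;
RUN;

PROC PRINT DATA=builtin_demo;
RUN;
```

All four variables are stored as plain numbers (dates are stored as a count of days), so removing the FORMAT statement changes only how they are printed.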

Let's write SAS formats for a data file containing employee data.

EMP_ID   GENDER   SALARY
101      M        23000
103      m        35000
104      Male     42500
110      F        32500
119      Female   39500


SAS lets users create their own formats & informats using PROC FORMAT: the VALUE statement defines a format and the INVALUE statement defines an informat. (There is no separate PROC INFORMAT.)

We will define a format for the salary bracket of the company employees.
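For completeness, here is a small sketch of a user-defined informat created with the INVALUE statement. The informat name $GENIN, the label 'UNKNOWN' and the dataset gender_demo are hypothetical names chosen for this example.

```sas
PROC FORMAT;
    /* UPCASE converts the raw value to uppercase before it is compared */
    INVALUE $GENIN (UPCASE)
        'M', 'MALE'   = 'MALE'
        'F', 'FEMALE' = 'FEMALE'
        OTHER         = 'UNKNOWN';
RUN;

DATA gender_demo;
    /* The informat standardises the value while it is being read */
    INPUT gender :$GENIN8.;
    DATALINES;
m
Male
F
Female
;
RUN;
```

Unlike a format, an informat changes what is actually stored: here the dataset holds 'MALE' or 'FEMALE' rather than the raw text.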

PROC FORMAT;
    VALUE SAL_BCKT  LOW   - <35000 = 'Less than 35k'
                    35000 - <50000 = 'btw 35 - 50k'
                    50000 - HIGH   = 'Gtr than 50k'
                    OTHER          = 'Missing';
RUN;

Since gender is recorded in several different ways, we create a format that displays all of them in one standard form.

PROC FORMAT;
    VALUE $GEN  'm', 'M', 'Male', 'MALE' = 'MALE'
                'f', 'F', 'Female'       = 'FEMALE';
RUN;

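Now, when we write the code to read this employee data file, we attach these formats to the variables.

DATA EMP_SAL;
    /* INFILE must come before INPUT. */
    /* FIRSTOBS=2 skips the header row, if the file has one as shown above. */
    /* DSD is left out because the sample is space-delimited, not comma-delimited. */
    INFILE "<FILE-PATH>" FIRSTOBS=2;
    INPUT EMP_ID GENDER $ SALARY;
    FORMAT SALARY SAL_BCKT. GENDER $GEN.;
RUN;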

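When printed, the output dataset EMP_SAL shows the following formatted values (the stored values are unchanged).

EMP_ID   GENDER   SALARY
101      MALE     Less than 35k
103      MALE     btw 35 - 50k
104      MALE     btw 35 - 50k
110      FEMALE   Less than 35k
119      FEMALE   btw 35 - 50k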
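To check that a format changes only the display, a sketch along these lines can help. It assumes the EMP_SAL dataset and the SAL_BCKT. format from this post already exist; emp_grouped and SAL_GROUP are names invented for the example.

```sas
DATA emp_grouped;
    SET EMP_SAL;
    /* PUT() returns the formatted text; SALARY itself is not modified */
    SAL_GROUP = PUT(SALARY, SAL_BCKT.);
RUN;

PROC PRINT DATA=emp_grouped;
    /* Override the attached format to show the raw salary beside its bracket */
    FORMAT SALARY BEST8.;
RUN;

/* Procedures group by the formatted value, so no new variable is needed */
PROC FREQ DATA=EMP_SAL;
    TABLES SALARY;
RUN;
```

PROC FREQ counts employees per salary bracket here because it uses the formatted value of SALARY, which is how formats "group" data without touching it.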


SAS users usually want to reuse their customised formats, so it is common practice to save them all together in a single permanent library. Once that library is assigned in a SAS session, all the previously defined formats & informats become available again.

By default SAS stores user-defined formats in the WORK library (in the catalog WORK.FORMATS), which is deleted at the end of the session. The location can be changed to a permanent library using the LIBRARY= option in the PROC FORMAT statement.

PROC FORMAT  LIBRARY = mylib;
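
A fuller sketch of saving a format permanently might look like this. The path is a placeholder to be replaced with a folder you can write to, and the YESNO format is a made-up example.

```sas
/* Placeholder path: point mylib at a real, writable folder */
LIBNAME mylib "<FORMAT-LIBRARY-PATH>";

PROC FORMAT LIBRARY=mylib;
    /* Stored in the catalog mylib.FORMATS instead of WORK.FORMATS */
    VALUE YESNO 1 = 'Yes'
                0 = 'No';
RUN;
```

In a later session, the same LIBNAME statement makes the stored catalog reachable again.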

Now, whenever we use a user-defined format, SAS looks for it first in WORK.FORMATS and then in LIBRARY.FORMATS. It does not look in any other library on its own, so formats saved in mylib will not be found unless we tell SAS where to search. To do that we set

OPTIONS FMTSEARCH=(mylib);

Now SAS will also search the mylib library, after WORK and LIBRARY. To have mylib searched first, list WORK and LIBRARY explicitly after it in the FMTSEARCH option.
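
As a small sketch, the search order can be set explicitly and then checked in the log. It assumes the mylib library has already been assigned with a LIBNAME statement.

```sas
/* Search mylib first, then WORK, then LIBRARY */
OPTIONS FMTSEARCH=(mylib WORK LIBRARY);

/* Write the current FMTSEARCH setting to the log */
PROC OPTIONS OPTION=FMTSEARCH;
RUN;
```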

To see the definitions of all the formats stored in a library, add the FMTLIB option to the PROC FORMAT statement. All the definitions are printed in the output window.

PROC FORMAT LIBRARY = mylib FMTLIB;
RUN;

A 'SELECT' statement can be used to display the definition of particular formats only.
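
For example, the sketch below would print only the two formats defined in this post, assuming they were created with LIBRARY=mylib so that they are stored in that library.

```sas
PROC FORMAT LIBRARY=mylib FMTLIB;
    /* Character formats keep their $ prefix in the SELECT list */
    SELECT SAL_BCKT $GEN;
RUN;
```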




