Steps involved in Code processing in SAS

This post tries to answer questions such as: what exactly happens when SAS runs a program, and what goes on behind the scenes when a user writes code in the SAS programming window and submits it? Understanding this back-end processing helps a programmer know how SAS runs a program, and taking advantage of the execution process can sometimes make life easier.

SAS processes a submitted DATA step in two stages: a compile stage and an execution stage.

Compile Stage: In this stage SAS performs the following tasks
1. SAS reserves an area of memory called the input buffer, which holds one record of raw data at a time while it is being read by the INPUT statement.
2. SAS determines the attributes of each variable (i.e. data type, length etc.) from the statements in the code, not from the input file.
3. SAS checks the code for invalid syntax and other errors and determines the names of the variables.
4. A descriptor portion is formed, which stores all information related to the variables, such as variable name, data type, length, label, format and informat.

During the compile stage, SAS does not read any data from the input file and does not evaluate any conditional statements or loops. A reserved area of memory called the program data vector (PDV) is also created. It holds the values of all variables for one observation at a time, together with the automatic variables _N_ (the iteration counter) and _ERROR_ (the error flag). SAS then goes through the statements of the DATA step in order, and whenever it meets a new variable, it determines its name, data type and length and gives it a place in the program data vector and in the descriptor portion.

Tip: We can declare the length of a variable with a LENGTH statement placed before the INPUT statement. This length is then stored in the descriptor portion instead of the default length.
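
The tip can be illustrated with a minimal sketch. The dataset name EMP_LEN, the 20-byte length and the sample record are illustrative assumptions, not part of the original program. Without the LENGTH statement, list input would give EMP_NAM the default length of 8 bytes and the 9-character name would be truncated.

```sas
DATA EMP_LEN;
    /* Declared before INPUT, so EMP_NAM is created with 20 bytes */
    /* instead of the default 8 bytes used by list input.         */
    LENGTH EMP_NAM $ 20;
    INPUT EMP_ID EMP_NAM $ AGE;
    DATALINES;
101 AJAYKUMAR 30
;
RUN;
```

Note that a variable is added to the program data vector in the order SAS first meets it, so in this sketch EMP_NAM becomes the first variable of the dataset because the LENGTH statement comes before the INPUT statement.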

Let's consider a small program for the salaries of employees.

DATA EMP_SAL;
    INPUT EMP_ID EMP_NAM $ AGE GENDER $ SALARY;
    SALPERAGE = SALARY / AGE;
    DATALINES;
101 AJAY 30 M 30000
102 MANI 28 F 28500
103 SAHIL 32 M 35000
;
RUN;

After the above program is submitted, SAS first creates a descriptor portion, which stores all attributes of the variables. The descriptor portion will look like this:

Descriptor Portion:

Variable    Type        Length    Format    Informat
EMP_ID      Numeric     8 bytes   12.       12.
EMP_NAM     Character   8 bytes   $8.       $8.
AGE         Numeric     8 bytes   12.       12.
GENDER      Character   8 bytes   $8.       $8.
SALARY      Numeric     8 bytes   12.       12.
SALPERAGE   Numeric     8 bytes   12.       12.

As you can see, SAS has assigned the default length of 8 bytes to each variable.
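
Once the DATA step has run, the descriptor portion of the resulting dataset can be inspected with PROC CONTENTS. This is a minimal sketch that assumes the EMP_SAL dataset from the program above exists in the WORK library.

```sas
/* Lists the name, type and length of every variable in EMP_SAL, */
/* along with any format, informat or label stored for it.       */
PROC CONTENTS DATA=EMP_SAL;
RUN;
```

The exact layout of the listing depends on the SAS version, and format or informat columns may be left blank for variables that were never given one explicitly.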

Now SAS is ready to run its second stage of processing the code.

Execution Stage: At the start of the execution stage, all variable values in the program data vector are set to missing. In SAS, a numeric missing value is denoted by a period, i.e. ".", and a character missing value is denoted by a blank, i.e. " ". This reset to missing happens at the start of every iteration of the DATA step, i.e. each time SAS goes back to read a new line of data. The automatic variable _N_ keeps track of the current iteration. SAS keeps executing until it reaches the end-of-file marker.
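
The two kinds of missing value can be checked with a short sketch. The variable names NUM and CHR are hypothetical and used only for illustration. The MISSING function works for both numeric and character arguments.

```sas
DATA _NULL_;
    LENGTH CHR $ 8;
    NUM = .;      /* numeric missing value   */
    CHR = ' ';    /* character missing value */
    /* Each message is written to the SAS log when the test is true */
    IF MISSING(NUM) THEN PUT 'NUM is missing';
    IF MISSING(CHR) THEN PUT 'CHR is missing';
RUN;
```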

Program Data Vector: _n_ = 1

Variable    Type        Length    Format    Informat    Value
EMP_ID      Numeric     8 bytes   12.       12.         .
EMP_NAM     Character   8 bytes   $8.       $8.         (blank)
AGE         Numeric     8 bytes   12.       12.         .
GENDER      Character   8 bytes   $8.       $8.         (blank)
SALARY      Numeric     8 bytes   12.       12.         .
SALPERAGE   Numeric     8 bytes   12.       12.         .


When SAS executes the INPUT statement for the first observation (line of data) in the above program, the program data vector looks like this:

Program Data Vector: _n_ = 1

Variable    Type        Length    Format    Informat    Value
EMP_ID      Numeric     8 bytes   12.       12.         101
EMP_NAM     Character   8 bytes   $8.       $8.         AJAY
AGE         Numeric     8 bytes   12.       12.         30
GENDER      Character   8 bytes   $8.       $8.         M
SALARY      Numeric     8 bytes   12.       12.         30000
SALPERAGE   Numeric     8 bytes   12.       12.         .

Then it calculates the SALPERAGE variable (30000/30 = 1000) and puts this value in the program data vector.

Program Data Vector: _n_ = 1

Variable    Type        Length    Format    Informat    Value
EMP_ID      Numeric     8 bytes   12.       12.         101
EMP_NAM     Character   8 bytes   $8.       $8.         AJAY
AGE         Numeric     8 bytes   12.       12.         30
GENDER      Character   8 bytes   $8.       $8.         M
SALARY      Numeric     8 bytes   12.       12.         30000
SALPERAGE   Numeric     8 bytes   12.       12.         1000

Once SAS reaches the end of the DATA step statements, it writes the values in the program data vector to the output dataset EMP_SAL as one observation and returns to the top of the DATA step. _N_ is set to 2, all variable values in the program data vector are set to missing again, and SAS is ready to read the second line of data.

Program Data Vector: _n_ = 2

Variable    Type        Length    Format    Informat    Value
EMP_ID      Numeric     8 bytes   12.       12.         .
EMP_NAM     Character   8 bytes   $8.       $8.         (blank)
AGE         Numeric     8 bytes   12.       12.         .
GENDER      Character   8 bytes   $8.       $8.         (blank)
SALARY      Numeric     8 bytes   12.       12.         .
SALPERAGE   Numeric     8 bytes   12.       12.         .

SAS keeps executing in this way until it reaches the end-of-file marker, i.e. until no lines of data are left to read.
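
The whole cycle can be watched in the SAS log with a sketch like the one below. It is the same employee program with PUT statements added, which is an addition for illustration and not part of the original program. PUT _ALL_ writes every variable in the program data vector, including the automatic variables _N_ and _ERROR_.

```sas
DATA EMP_SAL;
    /* State of the program data vector at the start of the iteration */
    PUT '--- top of iteration ---';
    PUT _ALL_;
    INPUT EMP_ID EMP_NAM $ AGE GENDER $ SALARY;
    SALPERAGE = SALARY / AGE;
    /* State after the record is read and SALPERAGE is computed */
    PUT '--- after assignment ---';
    PUT _ALL_;
    DATALINES;
101 AJAY 30 M 30000
102 MANI 28 F 28500
103 SAHIL 32 M 35000
;
RUN;
```

The first PUT should appear once more than the second, because SAS only detects the end of the data when the INPUT statement tries to read a line that is not there, and the step stops at that point.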
