Inputting / Importing a file into SAS.


SAS has various methods of reading a file into SAS as a SAS dataset. The right method depends on the type & format of the data being brought in. Following are some of the methods for bringing data files in as a SAS dataset

1. Infile statement
2. Column input
3. Formatted input
4. Using library engine
5. Proc import
6. Using menu options.

Infile statement: The infile statement is the most frequently used data input method. It has various options, depending upon the format of the data in the input file. Below is the syntax for a file in csv format. I will explain the usage of the options in detail

DATA FILE1;
INFILE " <FILE PATH>.../CLASS5.CSV" DSD TRUNCOVER;
INPUT STUDENT ID GENDER $ HEIGHT WEIGHT GRADES $;
RUN;

The Infile statement must come before the Input statement, because it tells SAS which file the Input statement reads from. The Input statement names all the variables. At the compile stage SAS creates a field for each of them in the descriptor portion of the dataset.
In the infile statement we give either the path of the file in " " or a file reference (fileref, written without quotes) in place of the file path. This tells SAS where the data file that we want to read is located.
The DSD option stands for "delimiter sensitive data". With DSD the default delimiter is a comma, so if no delimiter is given in the delimiter= or dlm=' ' option, SAS reads the file as a comma separated value file. DSD also treats two consecutive delimiters as a missing value & removes quotation marks from quoted values.
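
As a rough sketch of how DLM= and DSD work together, the step below reads pipe-delimited values from in-stream data instead of an external file. The dataset name PIPE_DEMO, the "|" delimiter and the two data lines are made up for illustration only.

DATA PIPE_DEMO;
/* DATALINES tells INFILE to read the lines that follow the DATALINES statement */
INFILE DATALINES DLM='|' DSD TRUNCOVER;
INPUT ID GENDER $ HEIGHT WEIGHT GRADES $;
DATALINES;
101|M|150||A
102|F|145|40|B
;
RUN;

Because of DSD, the two consecutive delimiters in the first line should leave WEIGHT missing for that student rather than shifting GRADES into its place.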

TRUNCOVER, MISSOVER, STOPOVER, SCANOVER & FLOWOVER are some of the important options. Which one we need to mention depends on the file, so have a look at the csv file first.

Truncover: This option is mainly useful with the column input & formatted input methods. When a record ends before a field is complete, it assigns whatever is in the input buffer to the variable, even though the value is shorter than the Input statement expects, instead of moving to the next line.

Missover: If the Input statement reaches the end of a line before it has found values for all the variables, Missover sets all the remaining variables to missing instead of reading the next line.

Flowover: This is the default behaviour. SAS continues to read the next input data record if it does not find values for all the variables in the current input line.

Stopover: Causes the datastep to stop processing if an input statement reaches the end of the current line without finding values for all variables. It sets _error_ to 1, stops inputting values into the dataset & prints the incomplete dataline in the log.
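
To see the difference between Flowover and Missover, here is a small sketch with list input and in-stream data. The dataset names and the three data lines are made up for illustration. The second line is deliberately one value short.

DATA FLOW_DEMO;
INFILE DATALINES FLOWOVER;   /* FLOWOVER is the default, written out here for clarity */
INPUT A B C;
DATALINES;
1 2 3
4 5
6 7 8
;
RUN;

DATA MISS_DEMO;
INFILE DATALINES MISSOVER;   /* do not go to the next line for missing values */
INPUT A B C;
DATALINES;
1 2 3
4 5
6 7 8
;
RUN;

With Flowover, SAS should go to the third line to find a value for C, so FLOW_DEMO ends up with two observations (1 2 3 and 4 5 6) and the log carries a note that SAS went to a new line. With Missover, MISS_DEMO should keep three observations, with C missing in the second one.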

Column input: Column input is a simple way of reading a fixed-width .txt file into SAS. There is a drawback though: it reads only standard numeric & character values. A number that contains commas or a dollar sign cannot be read as a numeric value, & dates can be read only as character values.

Consider a txt file having data as below (the first line is a column ruler, not part of the data)

----+----1----+----2----+
AJAY SINGH 35 35000 M
NEERJA     28 32000 F

Here is a program showing column input.

DATA SAL_DATA;
INFILE "<FILE-PATH.TXT>";   /* INFILE must come before INPUT */
INPUT NAME $ 1-10 AGE 12-13 SALARY 15-19 GENDER $ 21;
RUN;

The NAME values occupy columns 1 to 10, hence the position 1-10 in the input statement. Similarly, AGE is at 12-13, SALARY at 15-19 & GENDER at 21.


Formatted Input:
 In this method we mention informats to let SAS know what kind of values we are reading. This method can read standard as well as non-standard values, such as dates or numbers with commas & dollar signs. We mention only the starting position of each value, and the width of the informat tells SAS how many columns to read, unlike the column input method where we have to mention both the starting & finishing position of a value. Every starting position is prefixed by an "@", which is called the column pointer.


Now consider the data from the file above, but this time in a more complex form (the first line is again a column ruler).

----+----1----+----2----+----3----+
MR. AJAY SINGH 35.5 $35000.00 M
NEERJA         28   $32000.00 F


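DATA SAL_DATA;
INFILE "<FILE-PATH>";   /* INFILE must come before INPUT */
/* AGE is read with 4. so that 35.5 fits and a value such as 28 is not divided by 10 */
INPUT  @1  NAME   $15.
       @16 AGE    4.
       @21 SALARY DOLLAR9.2
       @31 GENDER $1.;
FORMAT NAME $15. AGE 4.1 SALARY DOLLAR9.2 GENDER $1.;
RUN;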

The input statement could be written on 1 line, but to make it easier to understand I have put each variable on a new line. This is still a single valid statement, as SAS only needs a ";" semicolon at the end of every statement & we close our Input statement after the GENDER variable. Notice the 'AT' "@" sign before every variable name, which points to the starting position of that variable. An informat follows every variable name. It tells SAS how to read the raw data value.

In order to display a data value from the SAS dataset in the same way it appears in the data file, we have to add a format statement after the input statement. Here it is required to introduce FORMATS & INFORMATS, which will be described at greater length in an upcoming post. For now you can just remember that a SAS Informat tells SAS how to read a raw data value & convert it into the value stored in the dataset, while a SAS Format tells SAS how to write or display the stored value.
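
A date makes the difference easy to see. The sketch below is only an illustration: the dataset name DATE_DEMO, the variable JOINED and the data line are made up. The MMDDYY10. informat reads the text 01/15/2020 and stores it as a SAS date value (a number of days counted from 1st January 1960), while the DATE9. format controls how that number is displayed.

DATA DATE_DEMO;
/* informats: how to READ the raw text */
INPUT @1 NAME $6. @8 JOINED MMDDYY10.;
/* format: how to DISPLAY the stored number */
FORMAT JOINED DATE9.;
DATALINES;
NEERJA 01/15/2020
;
RUN;

PROC PRINT DATA=DATE_DEMO;
RUN;

In the printed output JOINED should appear as 15JAN2020. If the FORMAT statement is removed, the same variable shows up as a plain number.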
