For this Friday’s SASsy session (October 27, 2017), let’s talk about some of the basics of the DATA step.
Since most people use Excel to enter their data, I like to draw comparisons between the two programs: Excel and SAS. As many of you know, I LOVE examples, and yes, I will take someone else’s code, reconfigure it to meet my needs, run it, and go from there. I have noticed this trend with many of the students I meet today, which is AWESOME! However, we need to understand the DATA step. You know the saying: “Garbage In, Garbage Out”.
We tend to start all of our programs with something like this:
Data first;
The keyword “Data” tells SAS that we are about to start working with some data, which is great! But what it is actually doing behind the scenes is creating the framework for your incoming dataset and getting ready to name it. “first” in this example is the name of your dataset. It’s the name YOU give your dataset in SAS. There is nothing magical about this name: it can be whatever name you want, as long as it follows the SAS naming rules (up to 32 characters, starting with a letter or an underscore, and using only letters, numbers, and underscores).
So “Data first;” is creating the framework for your dataset, which you are calling first.
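To see where “Data first;” fits, here is a minimal sketch of a complete DATA step that types a few rows straight into the program with INPUT and DATALINES. The variable names (id, trmt, weight) and the values are hypothetical, made up only for illustration; your own data will have its own variables and will often be imported instead.

```sas
/* Hypothetical data: the variables (id, trmt, weight) and values are made up */
data first;                  /* create the framework for a dataset named FIRST */
  input id trmt $ weight;    /* read three variables; $ marks trmt as character */
  datalines;
1 A 250
2 B 262
3 A 248
4 B 270
;
run;                         /* end of the DATA step */

proc print data=first;       /* list the observations SAS stored in FIRST */
run;
```

After running it, the LOG should contain a note along the lines of “The data set WORK.FIRST has 4 observations and 3 variables.”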
Let’s switch over to Excel. When you enter data into Excel, what is the first thing you do? The second thing? The third thing? And the last thing? Yup! One of the last things you do when entering data into Excel is to save it and give it a name.
In SAS, you save your dataset with a name before entering the data; in Excel, our tendency is to enter the data first, then save and name it last.
You’ve read some data into SAS and called the dataset first, like the example above. But now we want to make a few changes to the dataset without changing the original file. In Excel, you would use the File -> Save As… option: open the file in Excel, then Save As another name, maybe new_first. In SAS we use the SET statement to do this:
Data new_first;
  set first;
run;
“Data new_first;” lets SAS know you are getting ready to save a new dataset with the name new_first. The SET statement tells SAS that you want to open a SAS dataset that already exists, called first, and read its observations into new_first. Again, this is a little backwards from what we do in Excel, but it accomplishes the same thing. Remember: in SAS, we name the new dataset first, then either open another dataset or input the data. In Excel, we tend to open the file, add data, and Save As last.
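Here is a small sketch of actually making a change in the copy while leaving the original alone. It assumes first contains a numeric variable weight recorded in pounds; the new variable weight_kg is a hypothetical example of “a few changes”.

```sas
/* Assumes FIRST has a numeric WEIGHT variable recorded in pounds (hypothetical) */
data new_first;                   /* the new dataset we are saving */
  set first;                      /* read every observation from the existing FIRST */
  weight_kg = weight * 0.453592;  /* new variable, added only to NEW_FIRST */
run;

proc print data=first;            /* FIRST is unchanged: no weight_kg column */
run;

proc print data=new_first;        /* NEW_FIRST has the extra weight_kg column */
run;
```

Just like Save As in Excel, the original dataset first stays exactly as it was.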
Is there anything wrong with this piece of coding:
What is it doing?
When would we use this?
One option that is available in almost every PROC that analyzes data is the DATA= option. In my opinion, it is one of those options that is really great to have and a great habit to get into. I cannot talk about this enough. When you add the DATA= option to the PROC statement, you ensure that SAS uses the correct dataset for the analysis. By default, SAS uses the most recently created dataset in your session. In most cases this is not a problem. However, if you save output results in a SAS dataset within a PROCedure, that new dataset becomes the most recently created one, and SAS will use it in the next PROCedure call, which may or may not be what you want. For example:
Proc means;
  output out=beefmeans mean=mean;
Proc glimmix;
  model weight = trmt;
What dataset is the Proc means using?
What dataset is the Proc glimmix using?
Proc means data=beef;
  output out=beefmeans mean=mean;
run;
Proc glimmix data=beef;
  model weight = trmt;
run;
By using the DATA= option, there is no longer any question about which dataset each PROCedure should use.
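If you want to check which dataset SAS would fall back on, the automatic macro variable SYSLAST holds the name of the most recently created dataset. This sketch assumes a dataset called beef with a numeric weight variable already exists in WORK.

```sas
/* Assumes WORK.BEEF exists and contains a numeric WEIGHT variable */
proc means data=beef noprint;      /* NOPRINT: skip the printed table */
  var weight;
  output out=beefmeans mean=mean;  /* this step creates a new dataset, WORK.BEEFMEANS */
run;

/* SYSLAST = most recently created dataset, the default for a PROC without DATA= */
%put Last dataset created: &syslast;
```

The LOG should show WORK.BEEFMEANS, which is exactly why a following PROC without DATA= would not be reading beef.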
Temporary vs Permanent SAS Datasets
The examples I’ve shown above are all using what is referred to as Temporary SAS Datasets. What exactly does that mean? Well… you close down SAS and all your SAS datasets are gone! But no problem right? You’ll import them again, or read them in again the next time you open your SAS program. This is absolutely FINE!
When you look at your LOG window, which I know you all do every time you run a SAS program 🙂, you should have noticed that your SAS dataset names are not simply FIRST or NEW_FIRST or BEEF. They are listed as WORK.FIRST, WORK.NEW_FIRST, or WORK.BEEF.
But I don’t recall adding WORK. anything to my SAS coding, so where did this come from and should I care about it? WORK is the name of the library inside SAS where your datasets have been stored. Whenever you give a dataset a one-level name such as FIRST, SAS puts it in the WORK library. You may have heard me refer to SAS as a black box. When you read your data into SAS and create that dataset called FIRST, for example, the SAS dataset is stored “inside” SAS. Behind the scenes SAS keeps it in a temporary folder that it manages itself, so there is no file that you name, find, or keep. To stay organized, SAS creates libraries within itself, and one of those libraries is called WORK. The WORK library is a temporary space for all the working files used during your current session. So, when we turn off or close our SAS program, SAS clears out the WORK library, that temporary working space. Ok, this is fine: we now know where that WORK. comes from and what it refers to.
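To see that a one-level name and a WORK. name mean the same thing, here is a short sketch. It assumes a dataset called beef already exists in WORK; the names copy1 and copy2 are hypothetical.

```sas
/* Assumes WORK.BEEF exists; COPY1 and COPY2 are hypothetical names */
data copy1;              /* one-level name: SAS stores it as WORK.COPY1 */
  set beef;
run;

data work.copy2;         /* two-level name: the same WORK library, spelled out */
  set work.beef;         /* WORK.BEEF and BEEF refer to the same dataset */
run;

/* List every dataset currently sitting in the temporary WORK library */
proc contents data=work._all_ nods;
run;
```

Close SAS, open it again, and run only the PROC CONTENTS step: the WORK library will be empty.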
What if we want to create a permanent SAS dataset? A dataset that we create in SAS that can sit outside of the black box. A dataset that has a tangible file that I could save and maybe send to a fellow researcher. These datasets have the file ending .sas7bdat; permanent SAS datasets from older versions of SAS use a different ending (for example, .sd2 for SAS 6 on Windows). A permanent dataset may be handy if you have an EXTREMELY large dataset that takes a long time to read in. But for most of our files today, reading the data in every time I open SAS and creating temporary datasets is just fine.
How do you create a Permanent SAS dataset? By creating a new library in SAS. Remember, SAS organizes itself in terms of libraries: we have the WORK library for temporary space and files. Now we want to create a new one. First, we need to identify a location on our computer/laptop where we want to save the SAS datasets. Then we create a library that is linked to that location. At the top of our SAS program, we will use the following:
libname sasdata "D:\research";
libname – tells SAS that we are creating a new library.
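sasdata – tells SAS that we want to call this new library SASDATA. This can be anything you want to call it, as long as it is no more than 8 characters long and starts with a letter or an underscore!
"D:\research" – is the location of our new SAS library: the folder where SAS will save the dataset files.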
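Before saving anything, you can confirm that the library link worked. This sketch assumes the folder D:\research already exists; by default LIBNAME does not create the folder for you.

```sas
/* Assumes the folder D:\research already exists (LIBNAME won't create it by default) */
libname sasdata "D:\research";   /* libref SASDATA now points at that folder */

libname sasdata list;            /* write the library's details (path, engine) to the LOG */

/* LIBREF() returns 0 when the libref is assigned, nonzero otherwise */
%put SASDATA assigned? %sysfunc(libref(sasdata));
```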
So far, all we’ve done is create a link between a location on our computer and SAS. Remember that when we used our Data first; statement, SAS created WORK.FIRST. So our goal is to create a dataset called SASDATA.FIRST. To accomplish this, we use the same coding as before, except we give SAS the full (two-level) name of the dataset we are saving, which means including the new library name. The DATA statement becomes:
Data sasdata.first;
By using this code, we will now have a file in the D:\research directory called first.sas7bdat.
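Putting the pieces together, here is a minimal sketch of a program that saves a permanent dataset. The variables and values are hypothetical, and it assumes the D:\research folder exists on your machine.

```sas
libname sasdata "D:\research";   /* link the SASDATA library to D:\research */

/* Hypothetical data: the variables (id, trmt, weight) and values are made up */
data sasdata.first;              /* two-level name: library SASDATA, dataset FIRST */
  input id trmt $ weight;
  datalines;
1 A 250
2 B 262
;
run;
```

The LOG should now refer to SASDATA.FIRST instead of WORK.FIRST, and the file first.sas7bdat stays in D:\research after you close SAS.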
Now what? I have a permanent SAS dataset: how do I use it?
Remember the SET statement???
How would you use the SET statement to read a Permanent SAS dataset?
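One possible answer, sketched under the same assumptions as above (the file first.sas7bdat is saved in D:\research). In a new session the library link is gone, so the LIBNAME statement comes first.

```sas
/* New SAS session: reconnect the library before using it */
libname sasdata "D:\research";

/* Option 1: copy the permanent dataset into a temporary one to work with */
data new_first;          /* WORK.NEW_FIRST, cleared when SAS closes */
  set sasdata.first;     /* reads D:\research\first.sas7bdat */
run;

/* Option 2: point a PROC straight at the permanent dataset */
proc print data=sasdata.first;
run;
```

Option 1 keeps the permanent file untouched while you experiment; option 2 skips the copy when you only need to analyze it.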