Random Stata tricks: The 2024 supercut

It’s the end of the year, and what’s more suitable to wrap it up than showing people all the thingamajigs I learned to do in Stata?

What I’m going to show below might not be new to you. But if I was learning them for the first time in 2024, then there’s a fair chance that there exists at least one (1) individual on the internet who will find them useful!


First and foremost, let’s start with something really funny: Stata’s do-file editor somehow allows you to insert your cursor at multiple locations simultaneously, just like an actual programming IDE.

I have no idea why and how, but this turns out to be handy when you need to edit several lines of your code in a similar manner.


Next, let’s talk about two things that, surprisingly, I had no idea about until this year. I’m grouping them because, for me, they serve the same purpose: minimizing unnecessary Stata output on the hard drive.

Seasoned Stata users might know where this is leading: temp files.

Temp files are another kind of Stata macro: the tempfile command creates a local macro that holds a placeholder file name. They are treated akin to locals, which means that if you save something in a temp file, Stata stores it in its temporary directory (display c(tmpdir) shows where that is; it is usually the operating system's temp folder rather than Stata's own folder), associates the stored file with the name held in the tempfile macro, and deletes the file when the do-file or program that created it finishes running. I find them handy for complicated data-cleaning work where I need to dice up the dataset in different ways and stitch the pieces back together.

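So two things: 1. This does not stop the file from ever being written to your hard drive. But at least it no longer lies there forever! 2. Temp files are locals, so they are “gone” gone after the code finishes running (otherwise it would defeat the purpose, wouldn't it?). This also means the code has to be run in one go: if you run a do-file chunk by chunk, the local is lost between runs.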

Alright, enough talking. Here is a sketch of how temp files can be used in said setting:

Stata
// Let's assume the working directory has been set and the dataset is just called "data"
use data, clear
tempfile temp_data 

/* Insert whatever data cleaning code here */

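save "`temp_data'" // Quotes keep this safe if the temp path contains spaces

// If later down the track you need to merge the saved temporary dataset with something else, it can be called just like any other local!
use other_data, clear // Hypothetical second dataset that shares the id variable
merge 1:1 id using "`temp_data'"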
save The_actual_dataset, replace
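
If you'd like something you can actually run, here is a minimal sketch of the same idea using the auto dataset that ships with Stata. The macro name price_only and the split into "price" versus "everything else" are my own choices for illustration, not part of any real workflow. Run it in one go, since the tempfile local does not survive between separately executed chunks.

```stata
// Runnable sketch, assuming the built-in auto dataset is available
sysuse auto, clear
tempfile price_only

display "`price_only'"   // The macro expands to a file path...
display c(tmpdir)        // ...inside Stata's temporary directory

// Set aside one piece of the data in the temp file
preserve
keep make price
save "`price_only'"
restore

// Drop that piece from the main data, then stitch it back on
drop price
merge 1:1 make using "`price_only'"
assert _merge == 3       // Every car should match
drop _merge
```

Nothing is left behind in the working directory afterwards; the temp file is erased when the do-file ends.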

The handy thing with temp files is that they can be used to store formats other than .dta files as well! Here is our second gimmick which solves one of my long-standing questions: How do you export a single .pdf file with all the graphs you wanted in Stata?

It turns out that there is a set of commands called putpdf out there. Those commands quite literally put stuff into a (yet to be formally saved) .pdf file. Here I think things will make more sense if I start with a hypothetical example:

Stata
// Imagine that the dataset has already been loaded and we have a list of variables that each need a) A histogram; b) A kernel density plot.
local var_list /* Insert the variables here */

******************************************
* Plotting
******************************************
putpdf clear // Makes sure that there is no unsaved PDF document left open in memory
putpdf begin // Creates an empty PDF document in memory

putpdf paragraph // Starts a paragraph; the images below are added to the active paragraph

tempfile hist_temp
tempfile kern_temp

foreach var of varlist `var_list' {
  hist `var' // Just vanilla code here as this is not the emphasis
  graph export "`hist_temp'", as(png) replace
  putpdf image "`hist_temp'"
  
  kdensity `var' // Just vanilla code here as this is not the emphasis either
  graph export "`kern_temp'", as(png) replace
  putpdf image "`kern_temp'"
  
  putpdf pagebreak // Adds a pagebreak
  putpdf paragraph // Starts a fresh paragraph on the new page
}

putpdf save "Plots.pdf", replace // Saves the .pdf file to an actual location in the hard drive

The key thing I noticed when using the putpdf commands is that Stata needs a putpdf paragraph command whenever you start a new file or page, because images are added to the currently active paragraph. But other than that, as long as you can keep track of your mental image of the yet-to-be-saved .pdf file, it’s pretty straightforward.

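The other thing you can see in this example is that temp files can store things in whatever format you like, as long as the format is specified explicitly (as(png) here), since a temp file name has no meaningful extension for Stata to infer it from.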


Now onto some cool stuff. Say we have a local holding a list of variables to loop through, and now we need to only loop through some of them. There’s actually a cool way to do this that will make you feel like a real programmer (if you’re a goofball like me, that is)!

We are talking about list manipulations here. Keep in mind that list manipulations take macro names as their arguments, not macro contents. Hence the locals inside a list expression are written as bare names, without the single quotation marks you would normally wrap around them.

Searching “macro lists” in Stata’s help file brings up the full list of things that can be done as list manipulations. I’ll offer two specific examples below to illustrate a bit of technical details:

Stata
/* Example 1 */
// Imagine working with a long survey dataset where there are two observations (id = 1 for husband, 2 for wife) per household (HHID).
// Here I'll reshape the hypothetical dataset to wide and change the variable labels.

quiet ds // This is the command that generates r(varlist)
local varlist `r(varlist)' // Stores a list of all variables

local exclude HHID id // These are the two variables that do not need re-labeling after the reshape
local allvars : list varlist - exclude // allvars becomes a local that includes all the variables excluding HHID and id

foreach var of varlist `allvars' {
  local lbl`var' : variable label `var' // Stores the original variable labels
}

reshape wide `allvars', i(HHID) j(id) string // The string option assumes id is stored as a string; drop it if id is numeric

// This is a foreach-in loop because the names in allvars are now stubs, not existing variables
// (reshape wide appends the value of id to each stub, e.g. income becomes income1 and income2)
foreach var in `allvars' {
  // Loads the original variable label, appends the extra string, and labels the new variables
  label variable `var'1 "`lbl`var'' (Husband)"
  label variable `var'2 "`lbl`var'' (Wife)"
}

Stata
/* Example 2 */
// Imagine looping through a list of variables, where the sequence of the variables matters.
// However, some of the variables are slightly different from the others (e.g., continuous vs. categorical) and require a slightly different set of commands.

local all_variables /* Insert the variables here */
local oddballs /* Insert the "different" variables here */

foreach var in `all_variables' {
  if `: list var in oddballs' {
    /* The "different" set of commands */
  }
  else {
    /* The "regular" set of commands */
  }
}

What you’ll notice from the two examples is that the result of a macro list manipulation can either be stored in another macro (Example 1) or expanded inline (Example 2). When expanded inline, Stata treats the whole expression like a local, so it needs to be wrapped in a left single quote and an apostrophe, as in `: list var in oddballs'.
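
For a quick feel of what else is on that list, here is a small sketch of a few operators documented under help macrolists: difference, intersection, union, sizeof, posof, and in. The macro names and the variable names inside them are made up for illustration, and no dataset needs to be loaded because these are pure string operations on the lists.

```stata
// Two hypothetical lists of names
local a price mpg weight length
local b mpg length turn
local c mpg

local diff   : list a - b               // Elements of a that are not in b
local both   : list a & b               // Elements that appear in both lists
local union  : list a | b               // Elements of a, then elements of b not already in a
local n      : list sizeof a            // Number of elements in a
local pos    : list posof "weight" in a // Position of weight within a (0 if absent)
local inside : list c in a              // 1 if every element of c is in a, 0 otherwise

display "`diff'"    // price weight
display "`both'"    // mpg length
display "`union'"   // price mpg weight length turn
display `n'         // 4
display `pos'       // 3
display `inside'    // 1
```

Note that the macro names (a, b, c) go in bare, while the literal element in posof is given in double quotes.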

List manipulations are supposed to work with locals, but if one of your macro lists is defined as a global they will still work as long as the global macro is appropriately referred to:

Stata
// Let's illustrate this with a modified version from example 1:

quiet ds 
global varlist `r(varlist)' 

global exclude HHID id 
local allvars: list global(varlist)-global(exclude) 

A global macro can be processed by list manipulations if you wrap its name with global().
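
Globals and locals can also be mixed within one expression. Below is a minimal sketch, assuming a global named keep_vars and a local named drop_vars (both hypothetical names chosen for this illustration); only the global needs the global() wrapper.

```stata
// Hypothetical lists: one stored as a global, one as a local
global keep_vars price mpg weight
local drop_vars mpg

// The global is wrapped in global(); the local goes in as a bare name
local remaining : list global(keep_vars) - drop_vars

display "`remaining'"   // price weight

macro drop keep_vars    // Tidy up the global afterwards
```

Dropping the global at the end is just housekeeping, since globals stick around for the rest of the Stata session unless you remove them.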
