2 The SDR design

Iván A. Carrillo and Alan F. Karr

2.1 Finite population

The SDR finite population of interest can be represented as in Table 2.1. At wave 1, i.e., the first time point of interest, there is a finite set, $U_{1 (1)} = U_{1},$ of $N_{1 (1)} = N_{1}$ Ph.D. holders, either recent or not, who satisfy the requirements of the SDR.

Table 2.1
SDR finite population

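$\begin{array}{lccccccccccc}
j: & 1 & & 2 & & 3 & & \cdots & & J - 1 & & J \\
\hline
& U_{1 (1)} & \supseteq & U_{2 (1)} & \supseteq & U_{3 (1)} & \supseteq & \cdots & \supseteq & U_{J - 1 (1)} & \supseteq & U_{J (1)} \\
& N_{1 (1)} & \geq & N_{2 (1)} & \geq & N_{3 (1)} & \geq & \cdots & \geq & N_{J - 1 (1)} & \geq & N_{J (1)} \\
& & & U_{2 (2)} & \supseteq & U_{3 (2)} & \supseteq & \cdots & \supseteq & U_{J - 1 (2)} & \supseteq & U_{J (2)} \\
& & & N_{2 (2)} & \geq & N_{3 (2)} & \geq & \cdots & \geq & N_{J - 1 (2)} & \geq & N_{J (2)} \\
& & & & & U_{3 (3)} & \supseteq & \cdots & \supseteq & U_{J - 1 (3)} & \supseteq & U_{J (3)} \\
& & & & & N_{3 (3)} & \geq & \cdots & \geq & N_{J - 1 (3)} & \geq & N_{J (3)} \\
& & & & & & & \ddots & & \vdots & & \vdots \\
& & & & & & & & & U_{J - 1 (J - 1)} & \supseteq & U_{J (J - 1)} \\
& & & & & & & & & N_{J - 1 (J - 1)} & \geq & N_{J (J - 1)} \\
& & & & & & & & & & & U_{J (J)} \\
& & & & & & & & & & & N_{J (J)} \\
\hline
& U_{1} & & U_{2} & & U_{3} & & \cdots & & U_{J - 1} & & U_{J} \\
& N_{1} & & N_{2} & & N_{3} & & \cdots & & N_{J - 1} & & N_{J}
\end{array}$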

At wave 2, only a subset of the subjects in $U_{1 (1)}$ still satisfy the SDR requirements; we call this subset, of $N_{2 (1)}$ subjects, $U_{2 (1)} .$ In addition, there is a set of recent Ph.D. recipients who have obtained their degree since wave 1 and who also satisfy the other requirements of the survey. This set of in-scope new graduates is called $U_{2 (2)}$ and is of size $N_{2 (2)} .$ Therefore, at wave 2, there is a total of $N_{2} = N_{2 (1)} + N_{2 (2)}$ subjects in the population of interest $U_{2} = U_{2 (1)} \cup U_{2 (2)} .$

At the next wave, wave 3, the same process occurs. Some people in $U_{2 (1)}$ leave the population of interest and there are only $N_{3 (1)}$ left in $U_{3 (1)} .$ The same thing happens with the set $U_{2 (2)};$ only a subset $U_{3 (2)}$ of $N_{3 (2)}$ among them still satisfy the requirements of the SDR. Additionally, there are $N_{3 (3)}$ recent graduates entering the population of interest; this set is called $U_{3 (3)} .$ In total, the finite population of interest at wave 3 is $U_{3} = U_{3 (1)} \cup U_{3 (2)} \cup U_{3 (3)},$ with $N_{3} = N_{3 (1)} + N_{3 (2)} + N_{3 (3)}$ subjects.

This procedure of thinning old cohorts and adding new ones continues until the last wave of interest, wave $J .$ The finite population of interest therefore changes at every wave, for two main reasons. First, some of the subjects in the old cohorts are no longer in scope at the current wave, so they are no longer part of the current target population. Second, the recent graduates are added to the target population at the current wave. We denote by $j = 1,2, \dots, J$ the wave of interest (the index outside the parentheses) and by $j^{'} = 1,2, \dots, J$ the cohort to which a subject belongs (the index inside the parentheses); therefore, $U_{j (j^{'})} = U_{wave (cohort)} .$ Since a cohort enters the population at the wave whose index it carries, $U_{j (j^{'})}$ is defined only for $j^{'} \leq j,$ and $U_{j} = \cup_{j^{'} = 1}^{j} U_{j (j^{'})} .$
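The wave and cohort bookkeeping can be made concrete with a small Python sketch. It is only an illustration under invented assumptions. Each toy subject is described by a hypothetical pair (entry wave, last in-scope wave), and scope is taken to be contiguous, so a subject who leaves never re-enters. The helper names `population_by_wave_and_cohort` and `wave_totals` are ours, not part of the SDR. The code builds $U_{j (j^{'})}$, checks the nesting shown in Table 2.1, and verifies that $N_{j}$ is the sum of the cohort sizes.

```python
from collections import defaultdict

def population_by_wave_and_cohort(subjects, J):
    """Build U[(j, jp)]: the ids in scope at wave j that belong to cohort jp.

    subjects maps a hypothetical subject id to (entry_wave, last_wave): the
    subject joins cohort entry_wave and stays in scope through last_wave.
    Scope is assumed contiguous (no re-entry), which yields the nesting
    U_{j+1 (jp)} <= U_{j (jp)} of Table 2.1.
    """
    U = defaultdict(set)
    for sid, (entry, last) in subjects.items():
        for j in range(entry, min(last, J) + 1):
            U[(j, entry)].add(sid)
    return U

def wave_totals(U, J):
    """Return {j: (U_j, N_j)}, with U_j the union over cohorts jp <= j."""
    totals = {}
    for j in range(1, J + 1):
        parts = [U.get((j, jp), set()) for jp in range(1, j + 1)]
        U_j = set().union(*parts)
        N_j = sum(len(p) for p in parts)  # cohorts are disjoint, so sizes add
        assert N_j == len(U_j)
        totals[j] = (U_j, N_j)
    return totals

# Invented toy data: three waves, six subjects.
toy = {"a": (1, 3), "b": (1, 1), "c": (1, 2),
       "d": (2, 3), "e": (2, 2), "f": (3, 3)}
U = population_by_wave_and_cohort(toy, J=3)
totals = wave_totals(U, J=3)
for j in range(1, 3):  # each cohort can only shrink from one wave to the next
    for jp in range(1, j + 1):
        assert U.get((j + 1, jp), set()) <= U.get((j, jp), set())
print({j: N for j, (_, N) in totals.items()})  # {1: 3, 2: 4, 3: 3}
```

In the toy data, $N_{2} = N_{2 (1)} + N_{2 (2)} = 2 + 2$, mirroring the decomposition described for wave 2.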

2.2 Sampling

The sampling design of the SDR has a structure similar to that of the finite population, as depicted in Table 2.2. At wave 1, a (complex) sample $s_{1 (1)} = s_{1}$ of $n_{1 (1)} = n_{1}$ subjects is selected from the $N_{1}$ elements in $U_{1} .$ Each element $i$ in $s_{1}$ is interviewed and its data are collected; in addition, each element has an associated design weight $w_{i 1} = 1 / π_{i 1},$ the inverse of its inclusion probability at wave 1.
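A numerical check shows why the weights $w_{i 1} = 1 / π_{i 1}$ let $s_{1}$ represent $U_{1}$. For any design with positive inclusion probabilities, the weighted (Horvitz-Thompson) total $\sum_{i \in s_{1}} w_{i 1} y_{i}$ has expectation $\sum_{i \in U_{1}} y_{i}$. The Python sketch below verifies this by exact enumeration under Poisson sampling. Poisson sampling was chosen only because its sample probabilities are easy to list; it is not the SDR design. The toy values of $y_{i}$ and $π_{i 1}$ and all function names are invented for illustration.

```python
from itertools import product
from math import isclose, prod

def design_weights(pi):
    """w_i = 1 / pi_i for each unit i."""
    return {i: 1.0 / p for i, p in pi.items()}

def ht_total(y_sample, w):
    """Horvitz-Thompson (weighted) estimate of a total: sum of w_i * y_i over s."""
    return sum(w[i] * y for i, y in y_sample.items())

def expected_ht_under_poisson(y, pi):
    """Exact E[HT estimate] under Poisson sampling (each unit drawn
    independently with probability pi_i), enumerating all 2^N samples."""
    units = list(y)
    w = design_weights(pi)
    expectation = 0.0
    for flags in product([False, True], repeat=len(units)):
        p_s = prod(pi[u] if f else 1.0 - pi[u] for u, f in zip(units, flags))
        sample = {u: y[u] for u, f in zip(units, flags) if f}
        expectation += p_s * ht_total(sample, w)
    return expectation

# Invented toy population U_1 of four subjects.
y = {1: 3.0, 2: 5.0, 3: 2.0, 4: 4.0}
pi = {1: 0.5, 2: 0.25, 3: 0.8, 4: 0.125}
print(expected_ht_under_poisson(y, pi))  # sum(y.values()) = 14.0, up to rounding
assert isclose(expected_ht_under_poisson(y, pi), sum(y.values()))
```

Setting $y_{i} = 1$ for every unit shows in the same way that the weights sum, in expectation, to the population size $N_{1}$.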

Table 2.2
SDR sample

$\begin{array}{lccccccccccc}
j: & 1 & & 2 & & 3 & & \cdots & & J - 1 & & J \\
\hline
& s_{1 (1)} & \supseteq & s_{2 (1)} & \supseteq & s_{3 (1)} & \supseteq & \cdots & \supseteq & s_{J - 1 (1)} & \supseteq & s_{J (1)} \\
& n_{1 (1)} & \geq & n_{2 (1)} & \geq & n_{3 (1)} & \geq & \cdots & \geq & n_{J - 1 (1)} & \geq & n_{J (1)} \\
& & & s_{2 (2)} & \supseteq & s_{3 (2)} & \supseteq & \cdots & \supseteq & s_{J - 1 (2)} & \supseteq & s_{J (2)} \\
& & & n_{2 (2)} & \geq & n_{3 (2)} & \geq & \cdots & \geq & n_{J - 1 (2)} & \geq & n_{J (2)} \\
& & & & & s_{3 (3)} & \supseteq & \cdots & \supseteq & s_{J - 1 (3)} & \supseteq & s_{J (3)} \\
& & & & & n_{3 (3)} & \geq & \cdots & \geq & n_{J - 1 (3)} & \geq & n_{J (3)} \\
& & & & & & & \ddots & & \vdots & & \vdots \\
& & & & & & & & & s_{J - 1 (J - 1)} & \supseteq & s_{J (J - 1)} \\
& & & & & & & & & n_{J - 1 (J - 1)} & \geq & n_{J (J - 1)} \\
& & & & & & & & & & & s_{J (J)} \\
& & & & & & & & & & & n_{J (J)} \\
\hline
& s_{1} & & s_{2} & & s_{3} & & \cdots & & s_{J - 1} & & s_{J} \\
& n_{1} & & n_{2} & & n_{3} & & \cdots & & n_{J - 1} & & n_{J}
\end{array}$

At the second wave, the elements in $s_{1 (1)}$ who are no longer in scope are simply dropped from the frame (though their observations at wave 1 are kept), and a subsample $s_{2 (1)},$ of size $n_{2 (1)},$ is selected from those still in scope. Not all the members of $s_{1 (1)}$ who are still in scope at wave 2 are retained in the sample. This makes room for the sample of new Ph.D. recipients while keeping the total sample size roughly the same as in wave 1. A sample $s_{2 (2)}$ of size $n_{2 (2)}$ is selected from $U_{2 (2)};$ the people in $s_{2 (2)}$ form the second cohort. The total sample at wave 2 is $s_{2} = s_{2 (1)} \cup s_{2 (2)},$ of size $n_{2} = n_{2 (1)} + n_{2 (2)} \approx n_{1} .$ All the people in $s_{2}$ are interviewed at wave 2. The design weights at wave 2, $w_{i 2} = 1 / π_{i 2},$ are such that the sample $s_{2}$ represents the population of interest at wave 2, namely $U_{2} .$
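A minimal Python sketch of the wave-2 composition follows. It rests on several assumptions. The actual SDR subsampling (stratified PPS, quoted later in this section) is replaced with simple random sampling. A retained case's wave-2 weight follows the common two-phase convention: its wave-1 weight divided by its conditional retention probability. The sketch also assumes $n_{2 (2)} < n_{1}$ and that ids in $U_{2 (2)}$ differ from those in $s_{1 (1)}$. The function name `wave2_sample` and all data are hypothetical.

```python
import random
from math import isclose

def wave2_sample(s1, w1, in_scope, new_frame, n22, rng):
    """Hypothetical sketch of s_2 = s_2(1) U s_2(2); returns {id: w_i2}.

    s1: ids in s_1(1); w1: {id: w_i1}; in_scope: ids still in scope at wave 2;
    new_frame: ids in U_2(2), the new-graduate frame; n22: size of s_2(2).
    """
    eligible = [i for i in s1 if i in in_scope]  # out-of-scope cases are dropped
    n21 = min(len(eligible), len(s1) - n22)      # keeps n_2 close to n_1
    retained = rng.sample(eligible, n21)         # s_2(1): SRS subsample (assumption)
    # Two-phase weight (assumption): w_i2 = w_i1 / P(retained | in s_1(1)).
    w2 = {i: w1[i] * len(eligible) / n21 for i in retained}
    new = rng.sample(new_frame, n22)             # s_2(2): SRS of U_2(2) (assumption)
    w2.update({i: len(new_frame) / n22 for i in new})
    return w2

rng = random.Random(2008)
s1 = list(range(100))                # n_1 = 100
w1 = {i: 50.0 for i in s1}           # equal wave-1 weights: N_1 estimated as 5000
in_scope = set(range(80))            # 80 sampled cases still in scope at wave 2
new_frame = list(range(1000, 1600))  # N_2(2) = 600 new graduates
w2 = wave2_sample(s1, w1, in_scope, new_frame, n22=30, rng=rng)
print(len(w2), round(sum(w2.values()), 6))  # 100 4600.0
assert isclose(sum(w2.values()), 4600.0)
```

With equal wave-1 weights, the retained cases' weights add up to the wave-1 estimate of the in-scope count (here $50 \times 80 = 4000$). The new-cohort weights add up to $N_{2 (2)} = 600$, regardless of which cases the random draws pick.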

The same procedure is repeated at each wave until the last one, wave $J,$ where a subsample of the remaining subjects from each of the previous $J - 1$ cohorts is selected, and a new sample (the new cohort) $s_{J (J)}$ of recent graduates is selected from $U_{J (J)} .$ At the last wave, all people in $s_{J} = \cup_{j^{'} = 1}^{J} s_{J (j^{'})}$ are interviewed, and a design weight $w_{i J} = 1 / π_{i J}$ is created for each person interviewed, so that $s_{J}$ represents the finite population $U_{J} .$
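Every person in $s_{J}$ carries a single cross-sectional weight $w_{i J}$. The same weights therefore give estimates both for $U_{J}$ and for each cohort $U_{J (j^{'})}$ within it, by summing over $s_{J (j^{'})}$. A short Python illustration with invented records follows; the function name `cohort_totals` is hypothetical.

```python
from collections import defaultdict

def cohort_totals(sample_J):
    """sample_J: (cohort jp, w_iJ) for every i in s_J, the union of the s_J(jp).

    Returns ({jp: estimated N_J(jp)}, estimated N_J)."""
    by_cohort = defaultdict(float)
    for cohort, w in sample_J:
        by_cohort[cohort] += w
    return dict(by_cohort), sum(by_cohort.values())

# Invented wave-J records (J = 3): (cohort, cross-sectional weight w_i3).
s_3 = [(1, 120.0), (1, 80.0), (2, 50.0), (3, 25.0), (3, 25.0)]
print(cohort_totals(s_3))  # ({1: 200.0, 2: 50.0, 3: 50.0}, 300.0)
```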

As an example of how the individuals to be dropped are selected, according to NSF (2012), in 2008 the subsample $s_{08} \setminus s_{08 (08)}$ of previous cohorts was selected by stratifying $s_{06}$ "into 150 strata based on three variables: demographic group, degree field, and sex." They go on to explain that:

the past practice of selecting the sample with probability proportional to size continued, where the measure of size was the base weight associated with the previous survey cycle. For each stratum, the sampling algorithm started by identifying and removing self-representing cases through an iterative procedure. Next, the non-self-representing cases within each stratum were sorted by citizenship, disability status, degree field, and year of doctoral degree award. Finally, the balance of the sample (i.e., the total allocation minus the number of self-representing cases) was selected from each stratum systematically with probability proportional to size.
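The quoted procedure can be sketched in Python for a single stratum; it would be run independently in each of the 150 strata. The NSF description does not state the exact self-representing rule, so the sketch assumes a common one. Among the units still in play, a unit is self-representing when $n \, x_{i} / \sum_{k} x_{k} \geq 1$, and the check is repeated after each removal. The measure of size $x_{i}$ is the previous-cycle base weight. The sort by citizenship, disability status, degree field and degree year is assumed to have been done before the call. All function names and data are hypothetical.

```python
import random
from itertools import accumulate

def find_self_representing(units, n):
    """Iteratively flag units whose expected hits n_rem * x / X_rem reach 1.

    units: list of (id, measure of size x > 0); n: stratum allocation.
    Returns (self-representing ids, remaining (id, x) pairs).
    """
    certain, remaining = [], list(units)
    while remaining and len(certain) < n:
        n_rem = n - len(certain)
        total = sum(x for _, x in remaining)
        flagged = {u for u, x in remaining if n_rem * x >= total}
        if not flagged:
            break  # no further self-representing units
        certain += [u for u, _ in remaining if u in flagged]
        remaining = [(u, x) for u, x in remaining if u not in flagged]
    return certain, remaining

def systematic_pps(units, n, rng):
    """Systematic PPS on units already in the desired sort order;
    assumes n * x < X for every unit, so no unit is hit twice."""
    if n <= 0 or not units:
        return []
    cums = list(accumulate(x for _, x in units))  # cumulated sizes
    step = cums[-1] / n                            # sampling interval X / n
    start = rng.uniform(0, step)                   # random start
    points = [start + k * step for k in range(n)]
    selected, k = [], 0
    for (u, _), upper in zip(units, cums):
        while k < n and points[k] < upper:  # point falls in u's size interval
            selected.append(u)
            k += 1
    return selected

def select_stratum(units, n, rng):
    """Self-representing units first, then the balance of the allocation."""
    certain, remaining = find_self_representing(units, n)
    return certain + systematic_pps(remaining, n - len(certain), rng)

# Invented stratum; x = previous-cycle base weight.
stratum = [("a", 500.0), ("b", 300.0), ("c", 100.0), ("d", 50.0), ("e", 50.0)]
print(find_self_representing(stratum, 3)[0])       # ['a', 'b']
print(select_stratum(stratum, 3, random.Random(1)))  # 'a', 'b', then one of c, d, e
```

In this example, unit `b` becomes self-representing only after `a` is removed, which is why the check has to be iterative. Once the loop stops, no remaining unit's size reaches the interval $X / n$, so each unit can receive at most one systematic point.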

It is worth mentioning that up to 1989 the cohort (or more specifically the graduation year) was part of the stratifying variables (and weight-adjustment cells), but beginning in 1991 it has not been; it was replaced by the disability status. For more details about the subsampling procedure, including the description of the sample allocation, see NSF (2012) or Cox, Grigorian, Wang and Harter (2010).

From the preceding description, it is clear that the design of the SDR is not a rotating panel design. Besides the fact that the composition of the finite population of interest changes over time, a rotating panel design would select, at time $j,$ a new cohort from $U_{j},$ and not from $U_{j} \setminus U_{j - 1}$ as the SDR does.
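The set difference in the last sentence is exactly the new-graduate population $U_{j (j)}$. A short derivation shows this under the structure of Table 2.1. It assumes that $U_{j} = \cup_{j^{'} = 1}^{j} U_{j (j^{'})}$ and that $U_{j (j^{'})} \subseteq U_{j - 1 (j^{'})} \subseteq U_{j - 1}$ for $j^{'} < j$. It also assumes that recent graduates were not in scope at the previous wave, i.e., $U_{j (j)} \cap U_{j - 1} = \varnothing$:

$$
\begin{aligned}
U_{j} \setminus U_{j - 1}
 &= \Bigl( \textstyle\bigcup_{j^{'} = 1}^{j - 1} U_{j (j^{'})} \cup U_{j (j)} \Bigr) \setminus U_{j - 1} \\
 &= \textstyle\bigcup_{j^{'} = 1}^{j - 1} \bigl( U_{j (j^{'})} \setminus U_{j - 1} \bigr) \cup \bigl( U_{j (j)} \setminus U_{j - 1} \bigr) \\
 &= \varnothing \cup U_{j (j)} = U_{j (j)} .
\end{aligned}
$$

The terms for $j^{'} < j$ vanish because each such $U_{j (j^{'})}$ is contained in $U_{j - 1}$. The last step uses the disjointness assumption.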

Another peculiarity of the SDR is that, at each wave $j,$ a frame of the recent graduates $U_{j (j)}$ exists, from which the new cohort $s_{j (j)}$ can be selected straightforwardly. In other applications, however, the cost of building such a frame, i.e., a frame of new members, may be excessive (particularly as it accumulates over waves), and the new cohort may need to be selected from $U_{j}$ (as opposed to from $U_{j (j)}$). The method proposed in this paper can also be applied in such cases, as long as a cross-sectional weight can be created for the total sample at wave $j,$ $s_{j},$ so that it represents $U_{j} .$ We further discuss this topic in Section 3.2.

Notice that in the notation $s_{j (j^{'})},$ the quantity $j$ represents the wave to which the sample refers, and $j^{'}$ denotes the sample's cohort, i.e., the wave at which the sample was first selected. The notation for the weights is $w_{i j},$ where the first subscript identifies the subject, and the second refers to the wave of interest, regardless of when the subject was first selected.


Date modified: 2017-09-20


Survey Methodology
