2 The SDR design
Iván A. Carrillo and Alan F. Karr
2.1 Finite population
The SDR finite population of interest can be represented as in Table 2.1.
At wave 1, i.e., the first time of interest, there is a finite set,
$U_{1(1)}$, of $N_{1(1)}$ Ph.D. holders, either recent or not, who
satisfy the requirements of the SDR.
Table 2.1
SDR finite population
At wave 2 only a subset of the subjects in $U_{1(1)}$ still satisfy the
SDR requirements; we call this subset $U_{2(1)}$, of $N_{2(1)}$ subjects.
In addition, there is a set of new, recent Ph.D. recipients, who have
obtained their degree since wave 1 and also satisfy the other requirements
of the survey. This set of new graduates in scope is called $U_{2(2)}$ and
is of size $N_{2(2)}$. Therefore, at wave 2, there is a total of
$N_2 = N_{2(1)} + N_{2(2)}$ subjects in the population of interest,
$U_2 = U_{2(1)} \cup U_{2(2)}$.
At the next wave, wave 3, the same process occurs. Some people in
$U_{2(1)}$ leave the population of interest and there are only $N_{3(1)}$
left in $U_{3(1)}$. The same thing happens with the set $U_{2(2)}$: only a
subset $U_{3(2)}$ of $N_{3(2)}$ among them still satisfy the requirements
of the SDR. Additionally, there are $N_{3(3)}$ recent graduates entering
the population of interest; this set is called $U_{3(3)}$. In total, the
finite population of interest at wave 3 is
$U_3 = U_{3(1)} \cup U_{3(2)} \cup U_{3(3)}$, with
$N_3 = N_{3(1)} + N_{3(2)} + N_{3(3)}$ subjects.
This procedure of thinning old cohorts and adding new cohorts continues
until the last wave of interest, wave $J$. The finite population of
interest changes at every wave for two main reasons. First, some of the
subjects in the old cohorts are no longer in scope at the current wave, so
they are not part of the current target population. Second, the recent
graduates are added to the target population in the current wave. We
denote by $j$ the wave of interest (outside the parentheses) and by $j'$
the cohort to which a subject belongs (inside the parentheses), and
therefore $j' \le j$.
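The thinning-and-adding process described above can be illustrated with a small simulation. This is a minimal sketch, not SDR code: the function name `evolve_population`, the integer subject ids, and the constant `retention` probability are all assumptions made for illustration, and the cohort sizes are arbitrary. The result maps each wave to its cohorts, with the wave as the outer key and the cohort as the inner key, mirroring the wave(cohort) indexing convention.

```python
import random


def evolve_population(initial_size, new_cohort_sizes, retention=0.9, seed=0):
    """Return {wave: {cohort: set_of_subject_ids}}.

    `retention` is a hypothetical per-wave probability that an in-scope
    subject remains in scope; it is not an SDR figure.
    """
    rng = random.Random(seed)
    # Wave 1 contains a single cohort (cohort 1).
    waves = {1: {1: set(range(initial_size))}}
    next_id = initial_size
    for wave, new_size in enumerate(new_cohort_sizes, start=2):
        # Thin every old cohort: each one is a subset of itself at the
        # previous wave.
        current = {
            cohort: {i for i in sorted(ids) if rng.random() < retention}
            for cohort, ids in waves[wave - 1].items()
        }
        # Add the new cohort of recent graduates, labelled by its wave.
        current[wave] = set(range(next_id, next_id + new_size))
        next_id += new_size
        waves[wave] = current
    return waves


waves = evolve_population(initial_size=100, new_cohort_sizes=[10, 12])
# Population size at wave 3 is the sum of the sizes of its three cohorts.
size_wave3 = sum(len(ids) for ids in waves[3].values())
```

Because cohort labels never exceed the wave at which they are observed, the inner keys of `waves[j]` are always `1, ..., j`.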
2.2 Sampling
The sampling design of the SDR has a structure similar to that of the
finite population and is depicted in Table 2.2. At wave 1, a (complex)
sample $s_{1(1)}$ of $n_{1(1)}$ subjects is selected from the $N_{1(1)}$
elements in $U_{1(1)}$. Each element $i$ in $s_{1(1)}$ is interviewed and
its data collected; also, there is a design weight $w_{i1}$ associated
with it, which is the inverse of its inclusion probability at wave 1.
Table 2.2
SDR Sample
At the second wave, the elements in $s_{1(1)}$ who are not in scope
anymore are simply dropped from the frame (though their observations at
wave 1 are kept), and a subsample $s_{2(1)}$ of size $n_{2(1)}$ of those
still in scope is selected. Not all the members of $s_{1(1)}$ who are
still in scope at wave 2 are retained in the sample; this makes room for
the sample of new Ph.D. recipients while maintaining more or less the same
sample size as in wave 1. A sample $s_{2(2)}$ of size $n_{2(2)}$ is
selected from the $N_{2(2)}$ people in $U_{2(2)}$ to form the second
cohort. The total sample at wave 2 is $s_2 = s_{2(1)} \cup s_{2(2)}$,
which is of size $n_2 = n_{2(1)} + n_{2(2)}$, approximately equal to
$n_{1(1)}$. All the people in $s_2$ are interviewed at wave 2. The design
weights at wave 2, $w_{i2}$, are such that the sample $s_2$ represents the
population of interest at wave 2, namely $U_2$.
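To make the role of the wave-2 weights concrete, the sketch below uses the standard two-phase construction: a retained member of the first cohort keeps its wave-1 weight divided by its conditional probability of being retained, and a new-cohort member gets the inverse of its inclusion probability. This is an assumption for illustration only; the text does not spell out how the SDR weights are computed, and operational weights typically include further adjustments. The function name `wave2_weights` and the subject ids are hypothetical.

```python
def wave2_weights(wave1_weights, retention_probs, new_cohort_probs):
    """Return a dict of illustrative wave-2 design weights.

    wave1_weights:    {id: wave-1 design weight} for the wave-1 sample
    retention_probs:  {id: conditional probability of being kept at wave 2},
                      only for old-cohort subjects actually retained
    new_cohort_probs: {id: wave-2 inclusion probability} for the new cohort
    """
    # Old cohort: wave-1 weight inflated by the inverse retention probability.
    old = {i: wave1_weights[i] / p for i, p in retention_probs.items()}
    # New cohort: plain inverse inclusion probability.
    new = {i: 1.0 / p for i, p in new_cohort_probs.items()}
    return {**old, **new}


# Subject "b" was sampled at wave 1 but not retained, so it gets no wave-2 weight.
weights = wave2_weights(
    wave1_weights={"a": 10.0, "b": 20.0},
    retention_probs={"a": 0.5},
    new_cohort_probs={"c": 0.25},
)
estimated_population_size = sum(weights.values())
```

Summing the weights gives an estimate of the wave-2 population size, which is one simple check that the weighted sample stands in for the wave-2 population as a whole.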
The same procedure is repeated at each wave, until the last one, wave $J$,
where a subsample of the remaining subjects from each of the previous
cohorts is selected, and a new sample (the new cohort) $s_{J(J)}$ of
recent graduates is selected from $U_{J(J)}$. At the last wave, all people
in $s_J$ are interviewed and a design weight $w_{iJ}$ is created for each
person interviewed, so that $s_J$ represents the finite population $U_J$.
As for how the individuals to be dropped are selected, for example in
2008, according to NSF (2012), the subsample was selected by stratifying
"into 150 strata based on three variables: demographic group, degree
field, and sex." They go on to explain that:
the past practice
of selecting the sample with probability proportional to size continued, where
the measure of size was the base weight associated with the previous survey
cycle. For each stratum, the sampling algorithm started by identifying and
removing self-representing cases through an iterative procedure. Next, the non-self-representing
cases within each stratum were sorted by citizenship, disability status, degree
field, and year of doctoral degree award. Finally, the balance of the sample (i.e., the total allocation minus the
number of self-representing cases) was selected from each stratum
systematically with probability proportional to size.
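The within-stratum steps quoted above can be sketched as follows. This is an illustrative reading of the description, not the NSF implementation: the function name `systematic_pps` is hypothetical, and a unit is treated as self-representing when its measure of size is at least the current sampling interval, which is a common rule but an assumption here. The caller is expected to pass the units already sorted (for example by the variables listed in the quotation).

```python
import random


def systematic_pps(sizes, n, seed=0):
    """Select n units from one stratum with probability proportional to size.

    Returns (certain, sampled): indices of self-representing units and
    indices picked by systematic PPS among the remaining units.
    """
    rng = random.Random(seed)
    remaining = list(range(len(sizes)))
    certain = []
    # Step 1: iteratively remove self-representing units. After each
    # removal the interval shrinks, so the check is repeated.
    while remaining and len(certain) < n:
        interval = sum(sizes[i] for i in remaining) / (n - len(certain))
        big = [i for i in remaining if sizes[i] >= interval]
        if not big:
            break
        certain.extend(big)
        remaining = [i for i in remaining if i not in big]
    # Step 2: systematic PPS for the balance of the allocation.
    m = n - len(certain)
    sampled = []
    if m > 0 and remaining:
        interval = sum(sizes[i] for i in remaining) / m
        start = rng.uniform(0, interval)
        points = [start + k * interval for k in range(m)]
        cumulative, p = 0.0, 0
        for i in remaining:  # units are taken in the order supplied
            cumulative += sizes[i]
            while p < m and points[p] < cumulative:
                sampled.append(i)
                p += 1
    return certain, sampled


# Hypothetical measures of size (e.g., previous-cycle base weights).
certain, sampled = systematic_pps([50, 1, 2, 3, 4, 1, 2, 3, 4, 5], n=4)
```

In this example the first unit exceeds the initial interval of 75/4 and is taken with certainty; the other three selections come from the systematic pass over the remaining nine units.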
It is worth
mentioning that up to 1989 the cohort (or more specifically the graduation
year) was part of the stratifying variables (and weight-adjustment cells), but
beginning in 1991 it has not been; it was replaced by the disability status.
For more details about the subsampling procedure, including the description of
the sample allocation, see NSF (2012) or Cox, Grigorian, Wang and Harter
(2010).
From the preceding description, it is clear that the design of the SDR is
not a rotating panel design. Besides the fact that the composition of the
finite population of interest is changing over time, a rotating panel
design would select, at time $j$, a new cohort from $U_j$ and not from
$U_{j(j)}$ as the SDR does.
Another peculiarity of the SDR is that, at each wave $j$, a frame of the
recent graduates $U_{j(j)}$ exists, from which the new cohort $s_{j(j)}$
can be selected straightforwardly. However, in other applications, the
cost of building such a frame, i.e., a frame of new members, may be
excessive (particularly as it accumulates over waves), and the new cohort
may need to be selected from $U_j$ (as opposed to from $U_{j(j)}$). The
method proposed in this paper can also be applied in such cases, as long
as for the total sample at wave $j$, $s_j$, a cross-sectional weight can
be created to represent $U_j$. We further discuss this topic in Section
3.2.
Notice that in the notation $s_{j(j')}$ the quantity $j$ represents the
wave to which the sample refers, and $j'$ denotes the sample's cohort,
i.e., the wave at which the sample was first selected. The notation for
the weights is $w_{ij}$, where the first subscript identifies the subject,
and the second refers to the wave of interest, regardless of when the
subject was first selected.