Tracking repeatedly measured variables in the National Longitudinal Survey of Children and Youth: An illustration based on volunteering during adolescence
By examining responses to questions asked repeatedly of the same respondents over several cycles of longitudinal data, researchers can study change over time. Working with these repeated measures, however, can be challenging. This article examines trends in youth volunteering, using data from the National Longitudinal Survey of Children and Youth, to highlight several issues that researchers should consider when working with repeated measures.
Longitudinal analysis using the National Longitudinal Survey of Children and Youth (NLSCY) requires linking a respondent's information across several survey cycles. These repeatedly measured variables can be found in a number of data files produced by the NLSCY.³ To create a new dataset of repeated measures, a researcher must know, for each cycle used in the analysis, both the name of the repeated measure and the data file in which it is located. In this paper, we discuss several challenges in creating a longitudinal dataset from the NLSCY, using self-reported volunteer activities of adolescents aged 12 to 19 as an example.
Organized activities, such as volunteering and community service, can provide positive developmental contexts (Mahoney et al., 2006). The NLSCY provides an opportunity to examine population-based trends over time in adolescents' involvement in organized activities. To conduct this analysis, repeated measures for the same individual need to be tracked over time to determine whether that individual's volunteering behaviour changes.
Information for the same individual can be identified in different NLSCY data files by using the unique person identifier called 'persruk'. The 'persruk' variable allows the researcher to merge variables from multiple data files.
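The merge on 'persruk' can be sketched with the Python standard library. This is a minimal illustration, not code distributed with the NLSCY: it assumes each data file has already been read into a list of records (dictionaries) that each carry a 'persruk' key, and the toy records, including their 'yes'/'no' values, are hypothetical. The helper name merge_on_persruk is likewise illustrative.

```python
from typing import Dict, List

Record = Dict[str, object]


def merge_on_persruk(*files: List[Record]) -> Dict[object, Record]:
    """Inner-join several files on the person identifier 'persruk'.

    Only respondents present in every file are kept. A variable name that
    appears in two files would be overwritten by the later file, so this
    sketch assumes names are unique across the files being merged.
    """
    merged: Dict[object, Record] = {}
    for i, records in enumerate(files):
        by_id = {r["persruk"]: r for r in records}
        if i == 0:
            # Start from a copy of the first file, keyed by person.
            merged = {pid: dict(r) for pid, r in by_id.items()}
        else:
            # Keep only people also found in this file and add its variables.
            merged = {
                pid: {**row, **by_id[pid]}
                for pid, row in merged.items()
                if pid in by_id
            }
    return merged


# Hypothetical toy records standing in for two cycles' data files.
cycle2 = [{"persruk": 101, "BATCBQ5B": "yes"}, {"persruk": 102, "BATCBQ5B": "no"}]
cycle3 = [{"persruk": 101, "CATCBQ5B": "yes"}, {"persruk": 103, "CATCBQ5B": "yes"}]
panel = merge_on_persruk(cycle2, cycle3)
print(panel)  # {101: {'persruk': 101, 'BATCBQ5B': 'yes', 'CATCBQ5B': 'yes'}}
```

Keeping only respondents found in every file is one design choice; an outer join that retains partial records would suit analyses that tolerate attrition.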
In addition to identifying data from the same individual over time, the researcher must consider three further issues. First, the researcher must identify the NLSCY cycles in which the variables of interest were collected. Second, the researcher must evaluate whether the wording of the questions and the response options are consistent across cycles. Third, the researcher must identify the variable name used for the repeated measure in each cycle. NLSCY variable names are formed partly from the location of the survey question and partly from the survey instrument used to collect responses to it. For some variables, tracking a repeated measure across cycles for the same individual is relatively straightforward because the identical question is used in each cycle and only the first letter of the variable name changes between cycles.⁴ In other cases, the situation is more complex.
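For the straightforward case, deriving another cycle's name amounts to swapping the cycle letter. In the sketch below, the letter-to-cycle mapping (B for Cycle 2 through E for Cycle 5) is inferred from the variable names quoted in this article, and rename_for_cycle is a hypothetical helper; the rule applies only when both the question and the survey instrument are unchanged.

```python
# Cycle letters inferred from names quoted in this article
# (BATCBQ5B, CATCBQ5B, DATCBQ5E, EATCBQ5B); earlier cycles are not covered.
CYCLE_PREFIX = {2: "B", 3: "C", 4: "D", 5: "E"}


def rename_for_cycle(varname: str, cycle: int) -> str:
    """Return the name a variable would carry in another cycle, assuming
    only the first (cycle) letter changes.

    This does not hold when the survey instrument differs, e.g. the
    self-complete questionnaire versus the youth questionnaire.
    """
    if cycle not in CYCLE_PREFIX:
        raise ValueError(f"no cycle letter recorded for cycle {cycle}")
    return CYCLE_PREFIX[cycle] + varname[1:]


print(rename_for_cycle("BATCBQ5B", 3))  # CATCBQ5B
```

Applied to the Cycle 2 name BATCBQ5B for Cycle 4, the swap would miss the same cohort's responses, which by then are recorded under DACYD12A because the instrument has changed.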
An example using volunteer activities for adolescents aged 12 to 19
Researchers who wish to include individuals from several age groups within each cycle must identify the variable name for the measure of interest for each age group within each cycle. Even when a question is worded identically across cycles, it can have two different variable names within the same cycle, depending on the survey instrument used to collect the responses. For example, information on volunteer activities is collected with a self-complete questionnaire for adolescents aged 12 to 15, but with an interviewer-administered youth questionnaire for adolescents aged 16 to 19. Consequently, although both instruments ask the identical question and collect the same information, the resulting variables have different names.
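The rule in this paragraph can be written as a small function that maps a respondent's age to the instrument, and hence to the family of variable names, used for the volunteering items. The age bands come from the text; the function name and the returned labels are illustrative.

```python
def instrument_for_age(age: int) -> str:
    """Return the instrument that collects the volunteering items at a given age."""
    if 12 <= age <= 15:
        return "self-complete questionnaire"
    if 16 <= age <= 19:
        return "interviewer-administered youth questionnaire"
    # The article covers the volunteering items for ages 12 to 19 only.
    raise ValueError(f"volunteering items are not covered at age {age}")


print(instrument_for_age(14))  # self-complete questionnaire
```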
Questions on volunteer activities are first asked at age 12 in Cycle 2 and continue through age 19 in Cycle 5. Table 1 displays the eight volunteering questions asked in the NLSCY. The first seven are scored dichotomously (yes/no), while the eighth assesses frequency of volunteering, ranging from every day to less than once a month.
Five of the volunteering questions, questions 2 through 6, are asked of respondents aged 12 or older in Cycles 2 through 5. Responses to these five questions allow for the analysis of within-person change in volunteering over time, as well as the estimation of population-level trajectories in adolescents' volunteer participation. The remaining two dichotomous questions, 1 and 7, are asked only of certain age groups in certain cycles. For example, question 1 was asked only of respondents aged 12 and 13 in Cycles 2, 3, and 4.
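Within-person change on one of questions 2 through 6 can be summarized by counting transitions between successive cycles. This is a sketch under assumptions: responses are taken to be already recoded to True (yes), False (no) or None (missing), and the function name and transition labels are hypothetical rather than NLSCY conventions.

```python
from typing import Dict, Optional


def volunteering_transitions(responses: Dict[int, Optional[bool]]) -> Dict[str, int]:
    """Count transitions between successive cycles in one respondent's answers.

    `responses` maps cycle number to True (yes), False (no) or None (missing).
    Cycles are paired in sorted order; a pair with a missing value on either
    side is skipped.
    """
    counts = {"started": 0, "stopped": 0, "stayed_yes": 0, "stayed_no": 0}
    cycles = sorted(responses)
    for prev, curr in zip(cycles, cycles[1:]):
        before, after = responses[prev], responses[curr]
        if before is None or after is None:
            continue
        if not before and after:
            counts["started"] += 1
        elif before and not after:
            counts["stopped"] += 1
        elif before and after:
            counts["stayed_yes"] += 1
        else:
            counts["stayed_no"] += 1
    return counts


# Hypothetical respondent: no in Cycle 2, yes in Cycles 3 and 4, missing in Cycle 5.
print(volunteering_transitions({2: False, 3: True, 4: True, 5: None}))
# {'started': 1, 'stopped': 0, 'stayed_yes': 1, 'stayed_no': 0}
```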
Table 2 breaks down the dichotomous volunteer activity variables by survey cycle and age group. As the table shows, the same question can have a different variable name within the same cycle depending on the respondent's age: the names of the volunteering variables for adolescents aged 15 and under differ from those for adolescents aged 16 and older. For example, question 5 is named DATCBQ5E in Cycle 4 for adolescents aged 12 to 15, but DACYD12D for adolescents aged 16 and 17. This difference arises because responses to the question are captured by two different survey instruments.
Table 2. National Longitudinal Survey of Children and Youth: Dichotomous Volunteering Variable Names by Cycle, Age Group, and Type of Volunteering
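A chart like Table 2 can also be kept as a lookup keyed on question, cycle and instrument, so that the correct name is chosen from a respondent's age. The sketch below holds only the names quoted in this article and is deliberately incomplete; the remaining entries would come from the NLSCY codebooks. The dictionary and function names are hypothetical.

```python
from typing import Dict, Tuple

SELF = "self-complete"  # ages 12 to 15
YOUTH = "youth"         # ages 16 to 19, interviewer-administered

# (question, cycle, instrument) -> variable name.
# Only names quoted in this article are listed.
VOLUNTEER_VARS: Dict[Tuple[int, int, str], str] = {
    (2, 2, SELF): "BATCBQ5B",
    (2, 3, SELF): "CATCBQ5B",
    (2, 4, YOUTH): "DACYD12A",
    (2, 5, SELF): "EATCBQ5B",
    (2, 5, YOUTH): "EACYD12A",
    (5, 4, SELF): "DATCBQ5E",
    (5, 4, YOUTH): "DACYD12D",
}


def variable_name(question: int, cycle: int, age: int) -> str:
    """Look up the variable name for a question, cycle and respondent age."""
    if not 12 <= age <= 19:
        raise ValueError(f"age {age} is outside 12 to 19")
    instrument = SELF if age <= 15 else YOUTH
    try:
        return VOLUNTEER_VARS[(question, cycle, instrument)]
    except KeyError:
        raise KeyError(
            f"no entry for question {question}, cycle {cycle}, age {age}"
        ) from None


print(variable_name(5, 4, 13))  # DATCBQ5E
print(variable_name(5, 4, 16))  # DACYD12D
```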
In addition to within-cycle differences in variable names, the naming convention changes across cycles. For example, to track an individual's response to question 2, the response of adolescents aged 12 in Cycle 2 is found in the variable BATCBQ5B. The response of the same adolescents in Cycle 3, when they are 14, is found in CATCBQ5B; only the first letter of the variable name changes, indicating the change in cycle. However, the response of the same adolescents in Cycle 4, when they are 16, is found in DACYD12A. This name deviates substantially from the previous two because the survey instrument changes from the self-complete questionnaire to the interviewer-administered youth questionnaire.
Another consideration is that the variables for different age groups can exist in different data files. For example, in Cycle 5 the variable EATCBQ5B is located in the 10-to-19-year-old data file (NLSCY_02_05_1019_mas) for adolescents aged 12 to 15. However, responses for the corresponding variable for adolescents aged 16 to 19, EACYD12A, are located in the Cycle 5 longitudinal file (NLSCY_02_C5_LONG_mas).
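When a request spans several variables, grouping them by file makes it clear which files must be opened and merged. The mapping below holds only the two Cycle 5 locations quoted in this paragraph; the helper name group_by_file and the 'UNKNOWN' placeholder are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Cycle 5 file locations quoted in this article; other variables and
# cycles would need to be added from the NLSCY documentation.
VARIABLE_FILE: Dict[str, str] = {
    "EATCBQ5B": "NLSCY_02_05_1019_mas",  # ages 12 to 15
    "EACYD12A": "NLSCY_02_C5_LONG_mas",  # ages 16 to 19
}


def group_by_file(variables: Iterable[str]) -> Dict[str, List[str]]:
    """Group requested variables by the data file that holds them.

    Variables without a recorded location go under 'UNKNOWN' so that
    they are flagged rather than silently dropped.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for var in variables:
        groups[VARIABLE_FILE.get(var, "UNKNOWN")].append(var)
    return dict(groups)


print(group_by_file(["EATCBQ5B", "EACYD12A"]))
# {'NLSCY_02_05_1019_mas': ['EATCBQ5B'], 'NLSCY_02_C5_LONG_mas': ['EACYD12A']}
```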
In summary, researchers wishing to examine within-person change in the same variable over time should check under different variable names and in different data files if responses appear to be systematically missing for specific age groups. Researchers are encouraged to review the NLSCY documentation and codebooks carefully to ensure that the questions and response options remain consistent across cycles for repeated measures. Some repeated measures are collected only by certain survey instruments or in certain cycles, and repeatedly measured variables may also exist under other variable names or in other data files.
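One way to detect the systematic missingness described here is to compute, for each age, the share of respondents with no value on a variable. This minimal sketch assumes records that carry an 'age' key alongside the variables; the toy records and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def missing_share_by_age(records: List[dict], variable: str) -> Dict[int, float]:
    """Share of respondents at each age with no value for `variable`.

    A variable that is absent from a record or set to None counts as missing.
    """
    totals: Dict[int, int] = defaultdict(int)
    missing: Dict[int, int] = defaultdict(int)
    for rec in records:
        age = rec["age"]
        totals[age] += 1
        if rec.get(variable) is None:
            missing[age] += 1
    return {age: missing[age] / totals[age] for age in sorted(totals)}


def fully_missing_ages(records: List[dict], variable: str, threshold: float = 1.0) -> List[int]:
    """Ages at which the missing share reaches `threshold` (all missing by default)."""
    shares = missing_share_by_age(records, variable)
    return [age for age, share in shares.items() if share >= threshold]


# Hypothetical Cycle 5 records: the self-complete item is empty from age 16 on.
records = [
    {"persruk": 1, "age": 14, "EATCBQ5B": "yes"},
    {"persruk": 2, "age": 15, "EATCBQ5B": "no"},
    {"persruk": 3, "age": 16, "EATCBQ5B": None},
    {"persruk": 4, "age": 17},
]
print(fully_missing_ages(records, "EATCBQ5B"))  # [16, 17]
```

An age group that is entirely missing, as ages 16 and 17 are here, is the cue to look for the same question under the youth questionnaire's variable name and, possibly, in a different data file.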
In conclusion, we recommend that researchers planning a longitudinal analysis of repeatedly measured variables from the NLSCY first determine the availability of the variables of interest across survey cycles. In our own work, we found a chart similar to Table 2 essential for tracking the volunteering variables across age groups and cycles; indeed, not all of the volunteering variables we planned to analyze were asked of each age group in each cycle. By completing such a chart, researchers can gain vital information about the availability of key variables and the feasibility of the proposed study. Researchers are encouraged to include a chart like Table 2 with their research proposal when applying for data access through the Statistics Canada Research Data Centres program.
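The availability chart recommended here can be started from the facts stated earlier in this article: questions 2 to 6 in Cycles 2 to 5, and question 1 only in Cycles 2 to 4 (and only for ages 12 and 13). Question 7 is left out because its cycles are not listed in the text. The chart layout and helper names are illustrative, and a full chart would also record age groups and variable names.

```python
from typing import Dict, Iterable, List, Set

CYCLES = [2, 3, 4, 5]

# Availability by question, as stated in this article. Question 1 is also
# restricted to ages 12 and 13, which this simple chart does not encode.
AVAILABILITY: Dict[int, Set[int]] = {q: {2, 3, 4, 5} for q in range(2, 7)}
AVAILABILITY[1] = {2, 3, 4}


def print_chart(availability: Dict[int, Set[int]]) -> None:
    """Print an x/. grid of questions by cycle."""
    print("Question  " + "  ".join(f"C{c}" for c in CYCLES))
    for q in sorted(availability):
        marks = ["x" if c in availability[q] else "." for c in CYCLES]
        cells = "  ".join(f"{m:>2}" for m in marks)
        print(f"Q{q:<8} {cells}")


def usable_for(availability: Dict[int, Set[int]], cycles: Iterable[int]) -> List[int]:
    """Questions asked in every one of the requested cycles."""
    wanted = set(cycles)
    return sorted(q for q, avail in availability.items() if wanted <= avail)


print_chart(AVAILABILITY)
print(usable_for(AVAILABILITY, [2, 3, 4, 5]))  # [2, 3, 4, 5, 6]
```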
Mahoney, J. L., Harris, A. L., & Eccles, J. S. (2006). Organized activity participation, positive youth development, and the over-scheduling hypothesis. Social Policy Report, 20, 3-30.
Statistics Canada. (2005). Microdata User Guide, National Longitudinal Survey of Children and Youth, Cycle 5, September 2002 to June 2003. Ottawa: Special Surveys Division.
- Corresponding author: Bradley A. Corbett, University of Western Ontario Research Data Centre, 1030 Social Science Building, University of Western Ontario, 1151 Richmond St., London, Ontario, Canada N6A 5C2. Tel.: (519) 850-2971. Email: email@example.com. Affiliations: Statistics Canada, Research Data Centre and Department of Sociology, University of Western Ontario.
- Brock Research Institute for Youth Studies.
- Five cycles of data were available when this article was written.
- For a review of the NLSCY variable naming conventions, see the Cycle 5 User Guide (Statistics Canada, 2005), pp. 45 to 48.