4 Application to the SDR

Iván A. Carrillo and Alan F. Karr


The dataset we use is the restricted SDR data, under a license agreement from NSF. The SDR collects information about employment situation, principal employer, principal job, past employment, recent education, demographics, and disability, among others that vary from wave to wave. We use only information requested in all the waves of interest: 1995, 1997, 1999, 2001, 2003, 2006, and 2008.

To illustrate our methodology, we constructed a model for individuals' salaries over time. The response is the log of salary (in the principal job), with an identity link function and several covariates; modeling the log of salary (as opposed to salary) is standard practice. There are both time-independent covariates (such as gender) and time-dependent ones (such as employment sector). We have four major classes of covariates. The "Degree" variables are: degree field, years since degree, and age at graduation. The "Job" variables are: job field or category, sector, postdoc indicator, adjunct faculty indicator, hours worked per week in the principal job, weeks per year in the principal job, how closely the job is related to the doctoral degree, part-time status for different reasons, number of months since starting in the principal job, the starting month in the principal job, whether the employer or type of job has changed since the previous wave, and whether the employer or type of job changed since the previous wave because the respondent was laid off or the job was terminated. The "Person's demographics" variables are: gender, citizenship status, race/ethnicity, presence of children in the family, marital status, and spouse's working status. Finally, the "Environment" variables are: years since 1995, state (of employment), and the consumer price index (of the region of employment). The full list of variables, interactions, and categories can be found in Carrillo and Karr (2011). For categorical variables, the reference category is the one with the largest count.

The dataset for our model consists of 59,346 subjects and 190,693 observations, distributed as: \(n_{95} = 30{,}234\), \(n_{97} = 30{,}652\), \(n_{99} = 26{,}732\), \(n_{01} = 26{,}778\), \(n_{03} = 24{,}956\), \(n_{06} = 25{,}910\), and \(n_{08} = 25{,}431\). Those data correspond to non-missing salaries between $5,000 and $999,995, for people with consistent ages across the waves, and with a non-missing value for the variable indicating whether the (postsecondary educational institution) employer was public or private. The average (cross-sectional) survey weights for those waves are: \(\bar{w}_{95} = 15.37\), \(\bar{w}_{97} = 16.28\), \(\bar{w}_{99} = 19.96\), \(\bar{w}_{01} = 20.74\), \(\bar{w}_{03} = 22.71\), \(\bar{w}_{06} = 22.93\), and \(\bar{w}_{08} = 24.88\).
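As a quick consistency check on the figures above, the following minimal Python sketch sums the wave sample sizes and recovers the approximate sum of weights in each wave as the sample size times the mean weight. The variable names are illustrative, not taken from the paper, and the weight sums are only as accurate as the rounded means reported in the text.

```python
# Wave sample sizes and mean cross-sectional weights, as reported in the text.
wave_n = {1995: 30234, 1997: 30652, 1999: 26732, 2001: 26778,
          2003: 24956, 2006: 25910, 2008: 25431}
wave_mean_weight = {1995: 15.37, 1997: 16.28, 1999: 19.96, 2001: 20.74,
                    2003: 22.71, 2006: 22.93, 2008: 24.88}

# The wave counts should add up to the total number of observations.
total_observations = sum(wave_n.values())

# Since mean weight = (sum of weights) / n, the product n * mean weight
# approximates the sum of weights in each wave (up to rounding of the means).
weight_sums = {year: wave_n[year] * wave_mean_weight[year] for year in wave_n}

print("total observations:", total_observations)
for year in sorted(weight_sums):
    print(year, round(weight_sums[year]))
```

The total matches the 190,693 observations stated above. The weight sums grow across waves even though the sample sizes shrink, which simply reflects the rising mean weights.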

The survey weights that we use for each wave are the final adjusted weights. These weights are the original design weights adjusted for nonresponse and post-stratification. However, the theory that we developed in Section 3 assumes that the weights are the inverse of the selection probabilities; in other words, the original design weights. This is a mismatch whose effect we plan to investigate in the future. On the other hand, the calculations in the last part of the Appendix (which do not assume anything about the weights) suggest that the effect of this mismatch is small.

The covariates and interactions that we considered were selected because they were suggested either by exploratory analyses or by the subject matter experts at the NSF. Carrillo and Karr (2011) present the estimated \(\boldsymbol{\beta}\) coefficients in the model
\[
y_{ij} = \log\left(\text{SALARY}_{ij}\right) = X'_{ij}\boldsymbol{\beta} + \varepsilon_{ij},
\]
where \(X_{ij}\) includes the intercept along with the other covariates. This \(\boldsymbol{\beta}\) corresponds to the one in model \(\xi\), in Formula (3.1), and whose properties are discussed in Section 3. The working covariance matrix is estimated to be \(\hat{V}_{i} = \hat{\phi}\,\mathbf{R}(\hat{\alpha})\), with
\[
\hat{\phi} = \hat{\sigma}^{2} = \frac{\sum_{i \in s} \sum_{j=95}^{08} w_{ij}\,\hat{e}_{ij}^{2}}{\sum_{i \in s} \sum_{j=95}^{08} w_{ij} - p} = 0.196,
\]
where \(\hat{e}_{ij} = y_{ij} - X'_{ij}\hat{\boldsymbol{\beta}}\) and \(p = 208\) is the number of covariates in \(X_{ij}\); \(w_{ij}\) is the cross-sectional weight for subject \(i\) at wave \(j\) as long as \(i \in s_{j}\) and zero otherwise. The estimate \(\hat{\alpha}\) contains the \(21 = (7 \times 6)/2\) estimated auto-correlations
\[
\hat{\alpha}_{jj'} = \hat{\alpha}_{j'j} = \frac{\sum_{i \in s} \sqrt{w_{ij}}\,\sqrt{w_{ij'}}\;\hat{e}_{ij}\,\hat{e}_{ij'}}{\hat{\phi}\left[\sum_{i \in s} \sqrt{w_{ij}}\,\sqrt{w_{ij'}} - p\right]},
\]
for \(j \neq j' =\) 1995, 1997, 1999, 2001, 2003, 2006, 2008, and \(\hat{\alpha}_{jj} = 1\) for all \(j\). These estimated values form the auto-correlation matrix:

\[
\begin{aligned}
\mathbf{R}(\hat{\alpha}) &=
\begin{pmatrix}
1 & \hat{\alpha}_{95,97} & \hat{\alpha}_{95,99} & \hat{\alpha}_{95,01} & \hat{\alpha}_{95,03} & \hat{\alpha}_{95,06} & \hat{\alpha}_{95,08} \\
 & 1 & \hat{\alpha}_{97,99} & \hat{\alpha}_{97,01} & \hat{\alpha}_{97,03} & \hat{\alpha}_{97,06} & \hat{\alpha}_{97,08} \\
 & & 1 & \hat{\alpha}_{99,01} & \hat{\alpha}_{99,03} & \hat{\alpha}_{99,06} & \hat{\alpha}_{99,08} \\
 & & & 1 & \hat{\alpha}_{01,03} & \hat{\alpha}_{01,06} & \hat{\alpha}_{01,08} \\
 & & & & 1 & \hat{\alpha}_{03,06} & \hat{\alpha}_{03,08} \\
\text{sym} & & & & & 1 & \hat{\alpha}_{06,08} \\
 & & & & & & 1
\end{pmatrix} \\
&=
\begin{pmatrix}
1 & 0.38 & 0.36 & 0.32 & 0.30 & 0.28 & 0.27 \\
 & 1 & 0.42 & 0.36 & 0.33 & 0.32 & 0.31 \\
 & & 1 & 0.46 & 0.38 & 0.36 & 0.34 \\
 & & & 1 & 0.47 & 0.40 & 0.38 \\
 & & & & 1 & 0.49 & 0.44 \\
\text{sym} & & & & & 1 & 0.55 \\
 & & & & & & 1
\end{pmatrix}.
\end{aligned}
\]
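To make the moment estimators of \(\hat{\phi}\) and \(\hat{\alpha}_{jj'}\) concrete, here is a minimal Python sketch that evaluates them on a small made-up data set. It is not the code used for the SDR analysis: the function names, the list-of-lists data layout, and the toy weights, residuals, and value of \(p\) are all assumptions for illustration. Following the definition in the text, a subject who is not in the sample at a wave gets weight zero there (the residual can then be set to any value, for example zero).

```python
import math

def estimate_phi(weights, residuals, p):
    """Weighted residual variance: sum(w * e^2) / (sum(w) - p).

    weights[i][j] and residuals[i][j] refer to subject i at wave j.
    """
    numerator = sum(w * e * e
                    for w_row, e_row in zip(weights, residuals)
                    for w, e in zip(w_row, e_row))
    denominator = sum(w for w_row in weights for w in w_row) - p
    return numerator / denominator

def estimate_alpha(weights, residuals, phi, p, j, k):
    """Estimated auto-correlation between waves j and k (1 on the diagonal)."""
    if j == k:
        return 1.0
    # sqrt(w_ij) * sqrt(w_ik) for each subject i.
    cross = [math.sqrt(w_row[j]) * math.sqrt(w_row[k]) for w_row in weights]
    numerator = sum(c * e_row[j] * e_row[k]
                    for c, e_row in zip(cross, residuals))
    denominator = phi * (sum(cross) - p)
    return numerator / denominator

def correlation_matrix(weights, residuals, p):
    """Full working auto-correlation matrix R(alpha-hat)."""
    phi = estimate_phi(weights, residuals, p)
    n_waves = len(weights[0])
    return [[estimate_alpha(weights, residuals, phi, p, j, k)
             for k in range(n_waves)]
            for j in range(n_waves)]

# Hypothetical toy data: 4 subjects, 2 waves, p = 2 covariates.
toy_weights = [[10.0, 10.0], [10.0, 10.0], [10.0, 10.0], [10.0, 10.0]]
toy_residuals = [[1.0, 0.5], [-1.0, -0.5], [0.5, 1.0], [-0.5, -1.0]]
toy_p = 2

toy_phi = estimate_phi(toy_weights, toy_residuals, toy_p)
toy_R = correlation_matrix(toy_weights, toy_residuals, toy_p)
print(toy_phi)
print(toy_R)
```

With seven waves, the same `correlation_matrix` call would return a 7 × 7 matrix with 21 distinct off-diagonal entries, matching the structure of \(\mathbf{R}(\hat{\alpha})\) above.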

We now give some conclusions about salaries in the Ph.D. workforce based on the estimated coefficients, which appear in Carrillo and Karr (2011). First of all, a sensible estimate of mean salary considers the intercept, the hours worked per week (whose average is 47), and years since degree (average of 15); so that an estimate of the overall average is
\[
\exp\left(9.4 + 47 \times 0.038 - 47^{2} \times 0.0003 + 15 \times 0.03 - 15^{2} \times 0.0006\right) = \$52{,}067,
\]
for a subject with all other continuous covariates equal to zero and in the reference category of all categorical covariates.
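The arithmetic behind this estimate can be reproduced with a few lines of Python. The sketch below plugs in the rounded coefficients exactly as displayed above; the variable names are illustrative. With these rounded values the result is roughly $50,900, about 2% below the reported $52,067. A plausible explanation, which is an assumption here and not something stated in the text, is that the reported figure was computed from unrounded coefficients.

```python
import math

# Rounded coefficients as displayed in the text (log-salary scale).
intercept = 9.4
hours_linear, hours_quadratic = 0.038, -0.0003
years_linear, years_quadratic = 0.03, -0.0006

# Sample averages quoted in the text.
mean_hours = 47   # hours worked per week
mean_years = 15   # years since degree

# Linear predictor on the log scale for the reference subject.
linear_predictor = (intercept
                    + hours_linear * mean_hours
                    + hours_quadratic * mean_hours ** 2
                    + years_linear * mean_years
                    + years_quadratic * mean_years ** 2)

# Back-transform to dollars.
estimated_salary = math.exp(linear_predictor)
print(round(linear_predictor, 4), round(estimated_salary))
```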

All other things being constant, women's salaries are about 93.4% of those of men, whereas race does not seem to have an effect on salaries. The gender × years since 1995 interaction is not significant, so we find no evidence that this salary differential is changing over time. Notice that with a single year's data, we would not be able to evaluate the effect of time. Even more important than that, using only the data from a single wave, say 2008, we would not be able to assess whether the effect of being female is changing over time.

Doctorate holders with a management job have the highest salaries, followed by those in health occupations; on the other hand, those with the lowest salaries are the ones employed in "other" occupations, followed by those in political science.

Among employment sectors, salaries are highest in for-profit industry (20% higher than for the reference category of tenured faculty in public 4-year institutions), followed in order by the federal government, self-employment, and non-profit industry, all of which are higher than the reference category. The lowest salaries are those in two-year colleges and in two- and four-year institutions for which tenure is not applicable.

The largest single negative effect on salaries also occurs within the education sector. Those with positions as adjunct faculty members have salaries that are approximately 59% of the salaries of comparable doctorate holders. Not surprisingly, postdoctoral salaries are only about 74% of the salaries of comparable people in other types of positions.
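Because the response is the log of salary, a coefficient \(\beta_k\) on an indicator variable corresponds to a salary ratio of \(\exp(\beta_k)\) relative to the reference category, all else equal. The minimal Python sketch below converts the ratios quoted in the text back to the log scale. The resulting coefficients are back-calculated from the rounded percentages for illustration only; they are not values taken from Carrillo and Karr (2011), and the helper names are hypothetical.

```python
import math

def ratio_to_log_coefficient(ratio):
    """Log-scale coefficient implied by a salary ratio."""
    return math.log(ratio)

def log_coefficient_to_ratio(coefficient):
    """Salary ratio implied by a log-scale coefficient."""
    return math.exp(coefficient)

# Salary ratios quoted in the text (relative to the comparison group).
reported_ratios = {
    "female vs. male": 0.934,
    "adjunct faculty": 0.59,
    "postdoc": 0.74,
}

# Implied coefficients on the log-salary scale (illustrative only).
implied_coefficients = {name: ratio_to_log_coefficient(ratio)
                        for name, ratio in reported_ratios.items()}

for name, coefficient in implied_coefficients.items():
    print(f"{name}: coefficient {coefficient:.4f}, "
          f"ratio {log_coefficient_to_ratio(coefficient):.3f}")
```

For small effects the coefficient is close to the percentage difference (a ratio of 93.4% corresponds to a coefficient near -0.068), but for large effects such as the adjunct one the two diverge noticeably.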

Sector is also a contributing factor to the hard-to-interpret dependence of salary on the starting month for the current position: salaries are lower for starting months of August and September. Additional analyses show that the monthly effect is present only in the education sector, where, as we have seen, salaries are lower than in industry or government, and in which starting months of August and September are common. Therefore, sector is part of the answer, but not the entire answer. Finer-grained divisions of the education sector, using Carnegie classifications, further reduce, but do not remove, the significance of monthly effects. The SDR does not seem to contain sufficient data to remove the monthly effects entirely, so we have retained the SDR definition of sector.

People with degrees in computing and information sciences have the highest salaries (around 20% higher than in the biological sciences), followed by those in electrical and computer engineering and in economics (approximately 16% higher). Doctorate holders in agricultural and food sciences, environmental life sciences, earth, atmospheric, and ocean sciences, and in "other" social sciences have the lowest salaries. The "other" social sciences are the social sciences excluding economics and political science.

Married people have the highest salaries, followed by those who are in married-like relationships, widowed, separated, divorced, and never married. The latter have salaries only around 89% as high as the married ones; one could argue that there is some association between never having married and age. The presence of children older than two is associated with higher salaries, but the presence of children younger than two is not.

Doctorate holders with jobs only somewhat related to their degree field make around 93% of what people with closely related jobs (the reference category) do. If the job is not related to the doctoral degree as the result of a change in career or professional interests, they make around 82% of what people with closely related jobs do. On the other hand, those with jobs not related for other reasons make only about 76% of what the reference category does.

There is an increase of around 3% for every additional year since doctorate graduation, although the effect diminishes as the number of years grows. We interpret this as the effect of experience. There is a small penalty for receiving the doctorate later in life; for every additional year of age at graduation, the salary decreases by 1%.
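The diminishing return to experience follows from the quadratic term in years since degree. As a rough illustration, the Python sketch below uses the rounded coefficients that appear in the mean-salary calculation earlier in this section (0.03 for the linear term and -0.0006 for the squared term). It considers these two main-effect terms only and ignores any interactions involving years since degree, so the numbers, including the implied turning point, should be read as a sketch under those assumptions and not as results from the paper. The function and variable names are illustrative.

```python
import math

# Rounded main-effect coefficients for years since degree (log-salary scale).
YEARS_LINEAR = 0.03
YEARS_QUADRATIC = -0.0006

def log_salary_contribution(years):
    """Contribution of years since degree to the log of salary."""
    return YEARS_LINEAR * years + YEARS_QUADRATIC * years ** 2

def one_year_growth(years):
    """Salary ratio between year (years + 1) and year `years`, all else equal."""
    return math.exp(log_salary_contribution(years + 1)
                    - log_salary_contribution(years))

# Turning point of the quadratic: where an extra year stops adding salary.
peak_years = -YEARS_LINEAR / (2 * YEARS_QUADRATIC)

for years in (0, 15, 30):
    print(years, round(one_year_growth(years), 4))
print("turning point (years):", round(peak_years, 1))
```

Under these rounded values, the first year after graduation adds about 3%, the gain shrinks to about 1% per year around the sample average of 15 years, and it turns slightly negative beyond roughly 25 years.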

We also found that the regional Consumer Price Index (CPI) is significant. The higher the CPI, the higher the salary. We could not use the CPI associated with the labor market of employment because the SDR data do not identify geography beyond the state. We included the state in the model as a proxy for cost of living; the state effect is highly significant and some state coefficients are among the highest overall. The highest salaries are in California, Washington D.C. and its suburbs, and New York City and its suburbs. On the other hand, the lowest salaries are in Puerto Rico, Vermont, Montana, Maine, Idaho, South Dakota, North Dakota, and in the Territories/Abroad.

Having a part-time job due to being retired or semi-retired is significant and appears in several significant interactions. Because of this, we do not think that the available data present the full picture about retirement, for example, for people who are (semi-)retired and yet have full-time jobs.

Finally, we analyzed residuals; Figures 4.1 and 4.2 show a Box and Whisker plot of standardized residuals by year and a spaghetti plot of standardized residuals, respectively.

Figure 4.1 shows that the model fits reasonably well for all the reference years as most of the standardized residuals lie between -2 and 2. Also, the distributions of residuals do not seem to greatly differ from year to year.

Figure 4.1  Box and Whisker plot of standardized residuals by year

From Figure 4.2 we also conclude that the model fits reasonably well for most people, as most of the lines fluctuate between -2 and 2. Nonetheless, there are a few people for whom the model seems to greatly over-predict in 2003 and a few for whom that happens in 2006. We included several terms in the model to correct this issue, but none did so completely.

Figure 4.2  Spaghetti plot of standardized residuals

The last thing we tried was to produce exploratory classification trees for these residual blips. We found that, in the dataset available, the only thing related to them was the survey mode. The blips in 2003 are disproportionately high for web responses, and the blips in 2006 are disproportionately high for CATI (computer-assisted telephone interviewing) responses. We conclude that either there is a mode effect in these two years or those respondents differ, in those years, in some way that is not captured by the available variables.

Finally, the plot of fitted values versus observed values (which can be found in Carrillo and Karr 2011) tells a similar story. For most observations the model performs well, apart from those few cases in 2003 and 2006 for which there is large over-estimation.
