Survey Methodology
Statistical methods for sampling cross-classified populations under constraints

by Louis-Paul Rivest

  • Release date: January 3, 2024

Abstract

The article considers sampling designs for populations that can be represented as an N × M matrix. For instance, when investigating tourist activities, the rows could be locations visited by tourists and the columns days in the tourist season. The goal is to sample cells (i, j) of the matrix when the number of selections within each row and each column is fixed a priori. The i-th row sample size is the number of selected cells within row i; the j-th column sample size is the number of selected cells within column j. A matrix sampling design gives an N × M matrix of sample indicators, with entry 1 at position (i, j) if cell (i, j) is sampled and 0 otherwise. The first matrix sampling design investigated has one level of sampling, with row and column sample sizes set in advance: the row sample sizes can vary while the column sample sizes are all equal. The fixed margins can be seen as balancing constraints, and algorithms available for selecting such samples are reviewed. A new estimator for the variance of the Horvitz-Thompson estimator for the mean of survey variable y is then presented. Several levels of sampling might be necessary to account for all the constraints; this involves multi-level matrix sampling designs, which are also investigated.
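
To make the notion of a sample-indicator matrix with fixed margins concrete, the following minimal sketch builds one 0/1 matrix whose row sums equal prescribed row sample sizes and whose column sums are all equal, then computes a Horvitz-Thompson estimate of the mean of y. It is only a toy design and an assumption of this illustration, not one of the algorithms reviewed in the article: a feasible matrix is filled cyclically and its columns are then randomly permuted, which gives cell (i, j) the inclusion probability n_i / M. The function names `sample_indicator_matrix` and `ht_mean` are hypothetical.

```python
import random


def sample_indicator_matrix(row_sizes, n_cols, rng):
    """Return an N x M 0/1 matrix with row sums row_sizes and equal column sums.

    Toy design (an assumption, not the article's method): fill ones
    cyclically across columns, then permute the columns at random.
    """
    total = sum(row_sizes)
    if total % n_cols != 0 or not all(1 <= n <= n_cols for n in row_sizes):
        raise ValueError("row sizes must lie in 1..M and sum to a multiple of M")
    base = [[0] * n_cols for _ in row_sizes]
    start = 0
    for i, n_i in enumerate(row_sizes):
        # n_i consecutive positions (mod M) hit n_i distinct columns
        for k in range(n_i):
            base[i][(start + k) % n_cols] = 1
        start += n_i
    # A random column permutation preserves every row and column sum
    perm = list(range(n_cols))
    rng.shuffle(perm)
    return [[row[perm[j]] for j in range(n_cols)] for row in base]


def ht_mean(y, z, row_sizes):
    """Horvitz-Thompson estimate of the mean of y over all N x M cells,
    using the inclusion probabilities n_i / M of the toy design above."""
    n_rows, n_cols = len(y), len(y[0])
    total = 0.0
    for i in range(n_rows):
        pi_i = row_sizes[i] / n_cols  # inclusion probability for row i
        total += sum(y[i][j] / pi_i for j in range(n_cols) if z[i][j])
    return total / (n_rows * n_cols)


if __name__ == "__main__":
    rng = random.Random(2024)
    row_sizes = [2, 1, 3]  # row sample sizes may vary
    z = sample_indicator_matrix(row_sizes, n_cols=3, rng=rng)
    y = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    for row in z:
        print(row)
    print(ht_mean(y, z, row_sizes))
```

With row sample sizes 2, 1 and 3 over three columns, every column receives exactly two selections. The column permutation keeps the margins fixed but yields a low-entropy design, so it serves only to illustrate the constraints and the estimator, not as a practical selection algorithm.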

Key Words: Balanced sampling; Creel surveys; Cube method; Multi-level sampling; Monte Carlo simulation; Variance estimation.

How to cite

Rivest, L.-P. (2023). Statistical methods for sampling cross-classified populations under constraints. Survey Methodology, Statistics Canada, Catalogue No. 12-001-X, Vol. 49, No. 2. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2023002/article/00011-eng.htm.
