The challenges of producing statistics for the Web: sampling and automated data collection of webpage information in the Brazilian Web - ARCHIVED

Articles and reports: 11-522-X201300014255

Description:

The Brazilian Network Information Center (NIC.br) designed and carried out a pilot project to collect data from the Web in order to produce statistics about the characteristics of webpages. Studies on the characteristics and dimensions of the Web require collecting and analyzing information from a dynamic and complex environment. The core idea was to collect data automatically from a sample of webpages using software known as a web crawler. This paper aims to disseminate the methods and results of the study and to present current developments in sampling techniques for a dynamic environment.

Issue Number: 2013000
Author(s): Bertolini Coelho, Isabela; dos Santos, Emerson Gomes; Jaíze Alves da Silva, Suzana; Nascimento Silva, P.L.D.
Format: PDF
Release date: October 31, 2014
