The challenges of producing statistics for the Web: sampling and automated data collection of webpage information in the Brazilian Web - ARCHIVED
Articles and reports: 11-522-X201300014255
Description:
The Brazilian Network Information Center (NIC.br) has designed and carried out a pilot project to collect data from the Web in order to produce statistics about the characteristics of webpages. Studies on the characteristics and dimensions of the Web require collecting and analyzing information from a dynamic and complex environment. The core idea was to collect data automatically from a sample of webpages using software known as a web crawler. The motivation for this paper is to disseminate the methods and results of this study and to show current developments related to sampling techniques in a dynamic environment.
Issue Number: 2013000
Release date: October 31, 2014