Automated Text Extraction and Classification from Web Documents
Statistics Canada is developing the AI Website Analysis Tool (AiWAT) and AI Document Analysis Tool (AiDAT) to automatically extract and analyze textual information from PDF reports and documents found on websites. These complementary systems use artificial intelligence to identify, extract, and classify information from web-based sources, helping Government of Canada employees process large volumes of documents more efficiently.
Both tools are currently in development and are designed primarily for Government of Canada employees working in statistical analysis and research roles. Their use does not involve the processing of personal information. The tools use text extraction and classification models to organize and interpret the content of PDF documents without human review of each individual document.