This paper was written by Sagrario Hernández et al. Its abstract follows.
Abstract. Monitoring the internet for reports of pests and diseases is a key component of an early warning system: it locates and extracts documents that provide early information about pests that could spread within the national territory.
The National Service for Agrifood Health, Safety, and Quality (SENASICA) is the Mexican government institution responsible for protecting agricultural, aquaculture, and livestock resources from pests and diseases of quarantine importance.
A news extractor was developed that uses keyword-driven web scraping to download news articles about pests that may pose a risk to Mexico's food sector. Natural language processing then selects relevant data from each article, such as the title, the country, the pest mentioned, and the date of the event.
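The abstract does not specify how the fields are extracted, so the following is only a minimal sketch of the idea, assuming simple lexicon lookups and a date regular expression in place of a full natural language processing pipeline. The term lists, the ISO date format, and the function names `first_match`, `extract_fields`, and `is_relevant` are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical lexicons (assumptions); a real system would use curated lists.
PEST_TERMS = ["fruit fly", "locust", "fall armyworm"]
COUNTRIES = ["Mexico", "Guatemala", "Brazil", "Kenya"]

# Assumption: only ISO-style dates (YYYY-MM-DD) are recognized.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def first_match(terms, text):
    """Return the term that occurs earliest in text (case-insensitive), or None."""
    lowered = text.lower()
    hits = [(lowered.find(t.lower()), t) for t in terms if t.lower() in lowered]
    return min(hits)[1] if hits else None


def extract_fields(title, body):
    """Build one record with the fields named in the abstract."""
    text = f"{title}. {body}"
    date = DATE_RE.search(text)
    return {
        "title": title,
        "pest": first_match(PEST_TERMS, text),
        "country": first_match(COUNTRIES, text),
        "event_date": date.group(1) if date else None,
    }


def is_relevant(record):
    """Keyword filter: keep only articles that mention a known pest term."""
    return record["pest"] is not None


record = extract_fields(
    "Locust swarm reported in Kenya",
    "Officials confirmed the outbreak on 2021-03-14.",
)
print(record)
```

A lexicon lookup like this misses synonyms, scientific names, and non-English articles, which is where named-entity recognition or a trained classifier would normally take over.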
To assess the success of the extraction, a second tool presents the information visually, with statistics, so that SENASICA analysts can evaluate whether the pest data are sufficient and, supported by other indicators, determine whether there is a risk of spread within Mexico's territory.
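The kind of statistics such a view might draw on can be sketched as follows. This is an illustration only, assuming each extracted record is a dictionary with the keys `title`, `pest`, `country`, and `event_date`. The `summarize` function and the completeness measure are hypothetical and are not described in the paper.

```python
from collections import Counter

# Assumed record fields, mirroring the data items listed in the abstract.
FIELDS = ("title", "pest", "country", "event_date")


def summarize(records):
    """Count records per pest and per (pest, country) pair, plus completeness."""
    by_pest = Counter(r["pest"] for r in records if r.get("pest"))
    by_pest_country = Counter(
        (r["pest"], r["country"])
        for r in records
        if r.get("pest") and r.get("country")
    )
    # Share of records with every field filled: a rough proxy for whether
    # the extracted data are sufficient for an analyst to work with.
    complete = sum(1 for r in records if all(r.get(k) for k in FIELDS))
    completeness = complete / len(records) if records else 0.0
    return {
        "by_pest": by_pest,
        "by_pest_country": by_pest_country,
        "completeness": completeness,
    }


sample = [
    {"title": "a", "pest": "locust", "country": "Kenya", "event_date": "2021-03-14"},
    {"title": "b", "pest": "locust", "country": None, "event_date": None},
    {"title": "c", "pest": "fruit fly", "country": "Mexico", "event_date": "2021-04-01"},
    {"title": "d", "pest": "fruit fly", "country": "Guatemala", "event_date": None},
]
stats = summarize(sample)
print(stats["by_pest"].most_common())
print(f"complete records: {stats['completeness']:.0%}")
```

Counts like these could feed a bar chart or map, while the completeness ratio gives a quick signal of how often the extractor failed to fill a field.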
Keywords: Phytosanitary risk, monitoring, pests, web scraping, natural language processing, information retrieval.