A MULTIMODAL FRAMEWORK FOR CROP DISEASE DIAGNOSIS: INTEGRATING VISION-BASED CLASSIFICATION AND LARGE LANGUAGE MODEL REASONING
- School of Mathematics and Computer Science, Wuhan Polytechnic University, Wuhan 430023, China.
Early and accurate diagnosis of crop diseases is a critical challenge in precision agriculture, particularly in regions with limited access to agronomic expertise. Although deep-learning-based image classification has achieved high accuracy in controlled settings, its real-world deployment is hindered by variable image quality, visual ambiguity among symptoms, and the lack of interpretable, actionable recommendations. To address these limitations, we propose Crop Diag LLM, a novel multimodal diagnostic framework that integrates (1) a state-of-the-art YOLOv11-based vision module for lesion detection and classification and (2) a domain-adapted large language model (LLM) for evidence-based causal reasoning and treatment planning. A key innovation is our Structured Prompt Engineering (SPE) strategy, which formally aligns visual outputs with textual reasoning: the LLM receives image-derived evidence, including disease labels, confidence scores, crop type, and lesion location, and incorporates it into a logical Chain-of-Thought (CoT) inference process. Evaluated on a field-collected dataset of 3,842 images spanning 12 major crops and 47 disease types, the system achieves a top-1 accuracy of 93.1% in disease identification, an 8.0% improvement over vision-only baselines, and generates treatment suggestions with a 97.2% Expert Compliance Rate (ECR). These results indicate that augmenting vision systems with LLM-driven reasoning both improves diagnostic accuracy and meets the practical need for interpretable, actionable, and trustworthy decision support in agriculture.
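To make the SPE idea concrete, the following is a minimal sketch of how vision outputs might be serialized into a structured Chain-of-Thought prompt. The abstract does not give the actual prompt template, so everything here is an assumption for illustration: the `Detection` schema, the `build_structured_prompt` and `describe_location` helpers, the 0.25 confidence threshold, the coarse 3x3 location grid, and the wording of the reasoning steps are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One lesion detection from the vision module (hypothetical schema)."""
    disease_label: str
    confidence: float  # detector score in [0, 1]
    bbox: Tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)


def describe_location(bbox: Tuple[float, float, float, float]) -> str:
    """Map the centre of a normalized box to a coarse region name (3x3 grid)."""
    x_c = (bbox[0] + bbox[2]) / 2
    y_c = (bbox[1] + bbox[3]) / 2
    horiz = "left" if x_c < 1 / 3 else "right" if x_c > 2 / 3 else "center"
    vert = "upper" if y_c < 1 / 3 else "lower" if y_c > 2 / 3 else "middle"
    return f"{vert}-{horiz}"


def build_structured_prompt(
    crop_type: str,
    detections: List[Detection],
    min_confidence: float = 0.25,  # assumed threshold, not taken from the paper
) -> str:
    """Turn detector outputs into a text prompt with explicit reasoning steps."""
    # Keep only detections above the threshold, most confident first.
    kept = sorted(
        (d for d in detections if d.confidence >= min_confidence),
        key=lambda d: d.confidence,
        reverse=True,
    )

    lines = ["[Visual evidence]", f"Crop: {crop_type}"]
    if kept:
        for i, d in enumerate(kept, start=1):
            lines.append(
                f"{i}. label={d.disease_label}; "
                f"confidence={d.confidence:.2f}; "
                f"location={describe_location(d.bbox)}"
            )
    else:
        lines.append("No lesion detected above the confidence threshold.")

    # Illustrative Chain-of-Thought instructions (wording is an assumption).
    lines += [
        "",
        "[Task]",
        "Reason step by step:",
        "Step 1: Check whether each candidate label is plausible for this crop.",
        "Step 2: Weigh the confidence scores and lesion locations as evidence.",
        "Step 3: State the most likely diagnosis and the remaining uncertainty.",
        "Step 4: Recommend treatment actions and justify each one.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        Detection("rice blast", 0.91, (0.1, 0.1, 0.3, 0.3)),
        Detection("brown spot", 0.10, (0.6, 0.6, 0.9, 0.9)),
    ]
    print(build_structured_prompt("rice", example))
```

In this sketch the low-confidence detection is dropped before the prompt is built, so the language model only reasons over evidence the detector is reasonably sure about. Whether the actual system filters, ranks, or passes all candidates is not stated in the abstract.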
[Chen Xu, Deng Xintong, Li Zhiqing, Deng Anyi, Tang Chao and Cai Weiwei (2025); A MULTIMODAL FRAMEWORK FOR CROP DISEASE DIAGNOSIS: INTEGRATING VISION-BASED CLASSIFICATION AND LARGE LANGUAGE MODEL REASONING. Int. J. of Adv. Res. (Dec). 432-436] (ISSN 2320-5407). www.journalijar.com
Wuhan Polytechnic University
China






