A new AI system using Vision Transformers + a lightweight LLM classifies citrus fruit quality with 98.29% accuracy, explains its decisions with heatmaps and reports, and enables real-time, transparent, on-site quality control.
If you've ever picked up an orange at the market and thought, "Is this one fresh enough?", you've faced the same challenge citrus farmers and suppliers battle every day: fruit quality assessment. Traditionally, this work relied on human eyes, subjective judgment, and hours of sorting. But what if Artificial Intelligence (AI) could step in, classifying fruit with superhuman accuracy while explaining its reasoning clearly?
That's exactly what a team of researchers from Morocco set out to achieve. Their study introduces an AI pipeline that combines Vision Transformers (ViTs) with a lightweight Large Language Model (LLM). The goal? To automatically classify citrus fruits into good, medium, or bad quality, while also giving human-readable explanations for each decision.
The results were stunning: 98.29% classification accuracy, real-time performance on edge devices, and AI-generated reports explaining why a fruit was labeled as fresh, damaged, or rotten.
This work is a big leap for precision agriculture, and it shows how modern computer vision + language models can boost transparency in automated decision-making.
Before diving into the new system, let's trace the evolution of fruit classification methods: from hand-tuned color thresholding, to clustering-based segmentation, to convolutional neural networks (CNNs), and now to Transformer-based vision models.
The Moroccan team harnessed this latest shift, building their citrus quality pipeline around ViT-Base (patch size 16×16, 224×224 input) with ImageNet pre-training.
The researchers collected a diverse dataset of citrus fruit images.
Each fruit image was carefully labeled by experts according to international standards (USDA & UNECE). The dataset was also made publicly available on Kaggle, ensuring transparency and reproducibility.
Fruits were categorized into three classes: good, medium, and bad quality.
This structured dataset laid the foundation for training a robust AI model.
Instead of scanning pixels locally like CNNs, the ViT breaks an image into patches, embeds them, and applies multi-head self-attention. This lets it capture both the global appearance of a fruit and fine local details, such as small surface blemishes, in a single pass.
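To make the patch-and-attention idea concrete, here is a minimal NumPy sketch of the two core operations, using the ViT-Base numbers from the study (224×224 inputs, 16×16 patches). The random projections and the single attention head below are illustrative stand-ins for learned weights, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))            # one RGB citrus image

# 1) Patchify: 224/16 = 14 patches per side -> 14*14 = 196 patches,
#    each flattened to 16*16*3 = 768 raw values.
P = 16
patches = image.reshape(224 // P, P, 224 // P, P, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, P * P * 3)  # (196, 768)

# 2) Linear embedding (a random projection stands in for the learned one).
W_embed = rng.normal(0, 0.02, (768, 768))
tokens = patches @ W_embed                   # (196, 768) patch tokens

# 3) Scaled dot-product self-attention: every patch attends to every other,
#    which is how the ViT mixes global context with local detail.
def self_attention(x, d_head=64):
    Wq, Wk, Wv = (rng.normal(0, 0.02, (x.shape[1], d_head)) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d_head)       # (196, 196) patch-to-patch affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                       # (196, 64) attended features

out = self_attention(tokens)
print(out.shape)   # (196, 64)
```

A real ViT stacks twelve such blocks (with multiple heads, residuals, and normalization) and adds a classification token; the sketch only shows why each patch can "see" every other patch.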
In training, the team fine-tuned the ImageNet-pre-trained ViT-Base on the labeled citrus dataset, a classic transfer-learning setup, rather than training from scratch.
Performance quickly stabilized at high accuracy, showing the ViT's ability to generalize across citrus varieties.
On the test dataset, the ViT achieved 98.29% classification accuracy.
This is a major leap compared to older approaches (thresholding ~65%, clustering ~70%, CNNs ~90–95%).
Even in real-time tests with new, unseen fruits, the model performed flawlessly: all predictions were correct.
One challenge with AI in agriculture is trust. Farmers and distributors won't rely on a "black box" that just says "bad fruit" without explanation.
To solve this, the researchers added two interpretability layers: Grad-CAM heatmaps that highlight which image regions drove each prediction, and a lightweight LLM that turns the prediction into a plain-language report, for example:
"This fruit is of medium quality (confidence: 99%). Minor imperfections detected on the surface, recommended for processing rather than fresh markets."
And it does this fast: 0.3 seconds per report, with low power consumption (3.2 W), perfect for edge devices in farms or warehouses.
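The paper's actual prompts and LLM are not reproduced here, but the shape of the report-generation step can be sketched with a toy template that consumes the classifier's label and confidence. Every name below is a hypothetical stand-in, not the authors' implementation.

```python
# Hypothetical mapping from quality class to a recommendation phrase.
RECOMMENDATION = {
    "good": "suitable for fresh markets",
    "medium": "recommended for processing rather than fresh markets",
    "bad": "should be removed from the supply chain",
}

def quality_report(label: str, confidence: float, findings: str) -> str:
    """Assemble a human-readable report from the classifier's outputs."""
    return (
        f"This fruit is of {label} quality (confidence: {confidence:.0%}). "
        f"{findings}, {RECOMMENDATION[label]}."
    )

report = quality_report("medium", 0.99,
                        "Minor imperfections detected on the surface")
print(report)
# This fruit is of medium quality (confidence: 99%). Minor imperfections
# detected on the surface, recommended for processing rather than fresh markets.
```

The point of using an LLM instead of a fixed template like this one is flexibility: the model can phrase findings naturally and adapt the wording to what the heatmap actually shows.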
This research is more than a technical achievement: it has real agricultural impact.
In short: better profits + less waste + more trust in AI systems.
The study proves the pipeline works, but there's more ahead.
This could transform not just citrus, but the entire global fresh produce supply chain.
The combination of Vision Transformers and lightweight LLMs represents a new era in agricultural AI. This research shows how accurate vision models and human-readable explanations can be combined in a single real-time system.
In the near future, picking out a perfect orange might not rely on your eyes alone, but on an AI-powered assistant making sure only the best fruits reach your basket.
Vision Transformer (ViT): An AI model that breaks an image into patches and uses self-attention to see both the big picture and tiny details at once. - More about this concept in the article "Building a Smarter Wireless Future: How Transformers Revolutionize 6G Radio Technology".
Large Language Model (LLM): A text-based AI trained on huge amounts of data; it explains, summarizes, and writes like a human assistant. - More about this concept in the article "Dive Smarter: How AI Is Making Underwater Robots Super Adaptive!".
Grad-CAM: A heatmap tool that shows where the AI is "looking" in an image when making decisions (like highlighting a bruise on an orange).
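As a rough sketch of how such a heatmap is computed: Grad-CAM pools the gradients of the class score into one importance weight per feature map, takes a weighted sum of the maps, and keeps only the positive part. The arrays below are random stand-ins for a real network's activations and gradients; only the shapes and the weighting step are the point.

```python
import numpy as np

rng = np.random.default_rng(1)
activations = rng.random((8, 14, 14))      # 8 feature maps on a 14x14 spatial grid
gradients = rng.normal(size=(8, 14, 14))   # d(class score) / d(activations)

alpha = gradients.mean(axis=(1, 2))        # one importance weight per feature map
cam = np.maximum((alpha[:, None, None] * activations).sum(axis=0), 0.0)  # ReLU
cam /= cam.max() + 1e-8                    # normalize to [0, 1] for display

print(cam.shape)   # (14, 14) heatmap, upsampled onto the fruit photo in practice
```

Bright regions of `cam` mark the pixels that pushed the score up, which is how a bruise ends up highlighted on the orange.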
ImageNet Pre-training: Training an AI first on a giant image dataset (ImageNet) so it learns general vision skills, then fine-tuning it for citrus fruit quality.
Edge Devices: Small, portable computers (like phones, tablets, or IoT gadgets) that can run AI locally, with no need for constant internet. - More about this concept in the article "Smarter Forest Fire Detection in Real Time: F3-YOLO".
Classification Accuracy: How often the AI gets it right. Example: 98.29% accuracy means roughly 98 correct out of every 100 tries.
Precision & Recall: Precision asks, of the fruits the AI flagged as bad, how many really were bad; recall asks, of the truly bad fruits, how many the AI caught.
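A tiny worked example, on made-up labels rather than the paper's data, shows how precision and recall for the "bad" class differ from overall accuracy:

```python
# Hypothetical expert labels vs. model predictions for eight fruits.
y_true = ["good", "bad", "bad", "medium", "bad", "good", "medium", "bad"]
y_pred = ["good", "bad", "bad", "medium", "good", "good", "medium", "bad"]

tp = sum(t == "bad" and p == "bad" for t, p in zip(y_true, y_pred))  # caught bad fruit
fp = sum(t != "bad" and p == "bad" for t, p in zip(y_true, y_pred))  # false alarms
fn = sum(t == "bad" and p != "bad" for t, p in zip(y_true, y_pred))  # missed bad fruit

precision = tp / (tp + fp)   # of fruits flagged bad, how many really were bad
recall = tp / (tp + fn)      # of truly bad fruits, how many were caught
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(precision, recall, accuracy)   # 1.0 0.75 0.875
```

Here the model never cries wolf (precision 1.0) but misses one rotten fruit (recall 0.75), a distinction a single accuracy number hides.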
Confidence Score: How sure the AI is about its decision, shown as a number between 0 and 1 (e.g., "bad fruit, confidence 0.99").
Transfer Learning: Reusing what a model learned on one task (like cats/dogs) and applying it to another (like citrus quality).
Interpretability: Making AI decisions understandable to humans, so users can trust what the model says.
Source: Jrondi, Z.; Moussaid, A.; Hadi, M.Y. Interpretable Citrus Fruit Quality Assessment Using Vision Transformers and Lightweight Large Language Models. AgriEngineering 2025, 7, 286. https://doi.org/10.3390/agriengineering7090286
From: Ibn Tofail University; University Mohammed VI Polytechnic (UM6P).