From Images to Data: Building a Price Tag Recognition AI for Retail

Retail environments manage thousands of price changes across hundreds of products and locations. Verifying that shelf prices match system prices is still largely a manual process in most large-scale operations: staff walk aisles, scan products, and cross-reference labels against a database. For retailers with extensive floor space and frequent promotions, this creates a persistent pricing accuracy problem and an expensive labour cost. We built a computer vision system to automate the verification process.

Why Real-World Price Tag OCR Is Hard

Reading price tags from real-world retail images is harder than it sounds. In a controlled environment with a single, well-lit, flat-on product shot, basic OCR works fine. In a real store, images are captured at angles, under mixed lighting, with partial occlusion from other products, reflective surfaces, and hundreds of different tag designs across departments and brands. An OCR engine that works in a lab often fails immediately in production.

The specific problems we needed to solve were: reliably locating the price tag within a cluttered image (not running OCR on the entire scene), correctly reading the price despite image quality variations, and handling the wide diversity of tag formats — different fonts, sizes, layouts, promotional overlays, and unit price breakdowns alongside the main figure.

Three-Layer Architecture

  • Object detection — YOLO: The first layer identifies and localises price tags within the image, cropping the relevant region before passing it to the OCR layer. This step is critical: running OCR on a full store image produces far too much noise; OCR on a precisely cropped tag region produces clean, focused input
  • Primary OCR — EasyOCR and Tesseract: The cropped tag region is processed by a dual-engine OCR layer. EasyOCR handles handwritten or non-standard fonts effectively; Tesseract provides a secondary pass for structured text. When engine outputs differ, the system uses a voting mechanism to determine the final extraction
  • Fallback OCR — Azure Cognitive Services: Images that produce low-confidence results from the primary layer are escalated to Azure OCR, which handles degraded image quality and complex layouts with higher reliability at the cost of additional latency
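The escalation and voting logic across the two OCR layers can be sketched in a few lines. This is an illustrative skeleton, not the production code: the engine callables, the `confidence_floor` threshold, and the disagreement penalty are assumptions; in practice the callables would wrap EasyOCR, Tesseract, and the Azure OCR client.

```python
from typing import Callable, List, Tuple

# Each engine takes a cropped tag image (treated here as an opaque
# object) and returns (extracted_text, confidence in [0, 1]).
OcrEngine = Callable[[object], Tuple[str, float]]

def extract_price_text(
    tag_crop: object,
    primary_engines: List[OcrEngine],
    fallback_engine: OcrEngine,
    confidence_floor: float = 0.6,  # illustrative threshold
) -> Tuple[str, float]:
    """Run the primary engines, reconcile their outputs, and escalate
    to the fallback engine when confidence is too low."""
    results = [engine(tag_crop) for engine in primary_engines]

    texts = {text for text, _ in results}
    if len(texts) == 1:
        # Consensus: every engine read the same text; keep the
        # highest confidence any engine reported for it.
        text = results[0][0]
        confidence = max(c for _, c in results)
    else:
        # Disagreement: vote by confidence, but discount the result
        # so borderline reads are more likely to be escalated.
        text, confidence = max(results, key=lambda r: r[1])
        confidence *= 0.8  # illustrative disagreement penalty

    # Low-confidence extractions go to the fallback OCR, trading
    # latency for reliability on degraded images.
    if confidence < confidence_floor:
        return fallback_engine(tag_crop)
    return (text, confidence)
```

Keeping the engines behind a plain callable interface also makes the pipeline easy to test with stubs and to re-order or extend without touching the voting logic.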

Handling Real-World Conditions

  • Motion blur — frames from video feeds are pre-filtered; only frames above a sharpness threshold are processed, rejecting blurred captures before they enter the OCR pipeline
  • Lighting variation — adaptive contrast normalisation before the OCR pass improves extraction under fluorescent, natural, and mixed retail lighting conditions
  • Format diversity — a post-processing parser handles the variety of price formats encountered in practice: €1.99, 1,99 €, “WAS 2.49 NOW 1.99”, and unit prices per 100g or per litre
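A post-processing parser of the kind described above can be sketched with a single regular expression plus a promotional-overlay rule. The pattern and the euro-only currency handling are simplifications for illustration; the production ruleset would cover more formats and unit-price breakdowns.

```python
import re
from typing import Optional, Tuple

# Matches "€1.99", "1,99 €", and bare "2.49" / "2,49" forms.
_PRICE = re.compile(r"(?:€\s*)?(\d+)[.,](\d{2})(?:\s*€)?")

def parse_price(raw: str) -> Optional[Tuple[float, str]]:
    """Extract the effective price from raw OCR text.

    Handles plain prices ("€1.99", "1,99 €") and promotional
    overlays ("WAS 2.49 NOW 1.99"). Returns (price, currency),
    or None when no price-like token is found. Currency detection
    is simplified to euros here.
    """
    text = raw.upper()
    matches = _PRICE.findall(text)
    if not matches:
        return None
    # On "WAS x NOW y" tags, the effective price follows "NOW".
    if "NOW" in text:
        now_matches = _PRICE.findall(text.split("NOW", 1)[1])
        if now_matches:
            matches = now_matches
    whole, cents = matches[0]
    return (float(f"{whole}.{cents}"), "EUR")
```

Normalising every format to a single numeric value at this stage is what lets the downstream audit logic compare shelf prices against system records without caring how the tag was printed.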

Output and Use Cases

Each processed image returns a structured output: detected region coordinates, extracted price, currency, confidence score, and additional metadata visible on the tag. This enables several downstream use cases:

  • Retail pricing audits — compare extracted shelf prices against system records to identify discrepancies at scale, without manual checks
  • Inventory validation — confirm that product placements match their pricing labels during restocking cycles
  • Promotional compliance — verify that promotional prices are correctly applied across all locations after a campaign launches
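The structured output described above can be modelled as a small record type. The field names and the example values below are illustrative, not the system's actual schema:

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple

@dataclass
class PriceTagResult:
    """Structured output for one processed image."""
    region: Tuple[int, int, int, int]  # (x, y, width, height) of the detected tag
    price: float                       # extracted price value
    currency: str                      # e.g. "EUR"
    confidence: float                  # extraction confidence in [0, 1]
    unit_price: Optional[str] = None   # e.g. "0.40 / 100g", if visible on the tag
    promo_text: Optional[str] = None   # e.g. "WAS 2.49", if present

# Hypothetical result for one detected tag.
result = PriceTagResult(region=(120, 88, 240, 96), price=1.99,
                        currency="EUR", confidence=0.93)
```

A flat record like this serialises cleanly (via `asdict`) for the audit, inventory, and compliance consumers listed above, each of which only reads the fields it needs.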

This project illustrates how multi-engine AI systems solve real operational problems that single-model approaches consistently fail to handle. It aligns with TechZiel's work in workflow automation and intelligent data extraction. If you are evaluating AI for retail operations or other image-based use cases, get in touch to discuss what is achievable in your environment.
