Object Recognition for Price Matching

Author: Aleksandras Šulženko, Product Owner at Oxylabs


Pricing intelligence rests on two foundations: product matching and price matching. Extracting price data is relatively easy. It can usually be found directly in a page’s HTML, which makes web scraping well suited to collecting prices at scale.

Product matching is where the process becomes complicated. At first glance, it may not seem difficult at all. Simply match the titles across websites, and you’re done. Unfortunately, such an approach would work for only a few percent of the products across the ecommerce industry.

There’s no industry standard on how to create product titles. Additionally, on large user-generated marketplaces, SEO and other marketing considerations might come into play, making it even more challenging to find a perfect match.


Current solutions to product matching

As dynamic pricing is such a popular and important part of ecommerce, several solutions have emerged to tackle the problem. None of them, however, provides a foolproof detection method, so they are usually used in conjunction.

UPC, EAN, and GTIN comparisons are the most effective by far. They would be almost completely foolproof if not for the fact that few retailers ever publish them. Matching them is preferred to most other methods, but expectations are often shattered due to the scarce availability of such data.
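When identifiers are published, they can be validated and normalized before comparison. The following is a minimal sketch, not a production implementation. It relies on the standard GTIN check digit rule (weights alternating 3 and 1, starting from the digit next to the check digit) and on left-padding to 14 digits so that a UPC-A code and its EAN-13 form compare as equal. The function names are hypothetical.

```python
from typing import Optional


def gtin_check_digit_ok(code: str) -> bool:
    """Return True if `code` is an 8-, 12-, 13-, or 14-digit GTIN with a valid check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    digits = [int(c) for c in code]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def normalize_gtin(raw: str) -> Optional[str]:
    """Strip spaces and hyphens, validate, and left-pad to 14 digits; None if invalid."""
    code = raw.replace(" ", "").replace("-", "")
    return code.zfill(14) if gtin_check_digit_ok(code) else None
```

Two listings can then be treated as the same product when both normalized values are present and equal. A `None` result means the scraped identifier should not be trusted for matching.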

Scraping product specifications such as dimensions, models, production dates, etc., is another option. These values are usually static across retailers, as they come from manufacturers and cannot be changed. Issues arise because the way specifications are structured and displayed differs across retailers. Additionally, some retailers might not list all of the same details.


Finally, there’s the possibility of producing logic trees. Descriptive features (e.g., phone) extracted from categories can be continually matched by other important aspects to create a logic tree (e.g., phone -> iPhone -> iPhone 12 -> iPhone 12 256GB, etc.).

Logic trees greatly reduce the likelihood of false positives but have the drawback of providing fairly few true positives. So, in the end, all methods are usually combined to maximize the probability of matching products.
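A logic tree of this kind can be sketched as a path of attributes, ordered from broadest to most specific. The example below is only an illustration under assumed field names (`category`, `line`, `model`, `storage`); real listings would need their own extraction step. It declares a match only when both listings reach the leaf with identical paths, which is why false positives are rare and why listings with missing details produce no match at all.

```python
# Hypothetical attribute order, from broadest to most specific.
LEVELS = ("category", "line", "model", "storage")


def tree_path(product: dict) -> tuple:
    """Build the path through the logic tree, stopping at the first missing level."""
    path = []
    for level in LEVELS:
        value = product.get(level)
        if not value:
            break
        path.append(str(value).strip().lower())
    return tuple(path)


def same_product(a: dict, b: dict) -> bool:
    """Declare a match only when both listings reach the leaf with identical paths."""
    path_a, path_b = tree_path(a), tree_path(b)
    return len(path_a) == len(LEVELS) and path_a == path_b
```

A listing that stops at "iPhone 12" without a storage value never reaches the leaf, so it is left unmatched and has to be resolved by one of the other methods.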


Object recognition

An understudied area of ecommerce analytics is object recognition. AI of this sort made the rounds online about a decade ago as it could separate cats from dogs, the internet’s favorite image source. Since then, significant strides have been made in the development of AI object recognition.

It could have its fair uses in product matching for ecommerce. Most retailers are heavily invested in high-quality images (or, in some cases, required to provide them) with clearly stated branding. A fair part of boxed products will have the product’s name listed on them with some potential for a description.

Machine learning models can derive fairly accurate descriptions of objects without any additional descriptors. Fine-tuned ones would be able to sort objects into categories, and ones dedicated to specific categories would be able to differentiate between objects within them.

Since most products, however, have descriptors added to the packaging or images, such as the aforementioned titles or other words, those can also be extracted. Marketing practice dictates that essential, differentiating information should be displayed most prominently, allowing a machine learning model to bypass other data collection methods.

There are some caveats, however. Most prominently, not all products can be differentiated purely by image. For example, iPhone versions could be detected, but it’s impossible to extract built-in storage capacity (e.g., 256 GB vs. 512 GB) from the image. Therefore, in some cases, other sources will have to be used.

Additionally, some products may be extremely similar to one another (such as some of the IKEA range), which may make them hard to tell apart even with a well-trained and adapted machine learning model.

Increasing accuracy

There are some inherent benefits in ecommerce product object recognition. Retailers have the incentive to create crisp quality images that clearly showcase specific products as it improves conversion rates.

In many other settings, object recognition has to deal with images of highly variable quality, angle, and visibility of the primary object. In ecommerce, many of these issues will be less prevalent due to the reasons outlined above.

Yet, there are still plenty of reasons to improve accuracy, as every percentage point will have a compounding effect in the long run. One option is to collect more data, which always works. However, it’s not the only way.

Data augmentation, the practice of tinkering with existing data to create new data points, is perfectly suited for object recognition. Unlike text-based and numerical data, images can undergo a nearly infinite number of small changes while retaining their original meaning.

Common examples of data augmentation include object occlusion, photometric or geometric distortion (e.g., changing brightness, cropping, etc.), and superimposing two or more images on top of each other.

Object occlusion has shown promising results in making model predictions more accurate. The running theory is that by occluding certain parts of an object, the model is forced to focus on other parts to make a prediction, eliminating some possible skew.
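Occlusion can be approximated by blanking out a random rectangle in each training image. The snippet below is a minimal sketch using NumPy, assuming images are arrays shaped height x width or height x width x channels. The function name `occlude` and the parameters `max_fraction` and `fill` are illustrative choices, not part of any particular library.

```python
import numpy as np


def occlude(image: np.ndarray, rng: np.random.Generator,
            max_fraction: float = 0.3, fill: int = 0) -> np.ndarray:
    """Return a copy of `image` with one random rectangle replaced by `fill`."""
    height, width = image.shape[:2]
    # Patch size: at least 1 pixel, at most `max_fraction` of each side.
    patch_h = int(rng.integers(1, max(2, int(height * max_fraction) + 1)))
    patch_w = int(rng.integers(1, max(2, int(width * max_fraction) + 1)))
    # Pick a top-left corner so the patch stays fully inside the image.
    top = int(rng.integers(0, height - patch_h + 1))
    left = int(rng.integers(0, width - patch_w + 1))
    out = image.copy()
    out[top:top + patch_h, left:left + patch_w] = fill
    return out
```

Calling `occlude(image, np.random.default_rng())` several times on the same product photo yields several distinct training samples that all keep the original label.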

Outside of object recognition practices, the model can be integrated with existing product matching systems. Each prediction can be matched with an existing product in the database to see whether the specifications and other details also align.

For example, a prediction about a specific type of iPhone might turn out to be erroneous because the dimensions of that model don’t add up. In other words, there are some “hard facts” about products that never change. They can be used to hedge predictions to ensure that the system comes up with a higher level of accuracy.
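Such a check can be sketched as a simple comparison between scraped dimensions and a reference catalog. Everything named here is an assumption for illustration: the `catalog` mapping, the product keys, the tolerance, and the figures used in the usage note are placeholders rather than real product data.

```python
def confirm_prediction(predicted_id: str, scraped_dims_mm: tuple,
                       catalog: dict, tolerance_mm: float = 1.0) -> bool:
    """Accept a model prediction only if scraped dimensions agree with the catalog entry."""
    reference = catalog.get(predicted_id)
    if reference is None or len(reference) != len(scraped_dims_mm):
        return False
    # Sort both sides so that differing height/width/depth order between retailers is ignored.
    return all(abs(ref - seen) <= tolerance_mm
               for ref, seen in zip(sorted(reference), sorted(scraped_dims_mm)))
```

With a placeholder catalog such as `{"phone-a": (146.7, 71.5, 7.4), "phone-b": (131.5, 64.2, 7.4)}`, a prediction of `"phone-a"` for a listing whose scraped dimensions are `(131.5, 64.2, 7.4)` would be rejected and could be sent back for review.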

So far, object recognition may seem like simply another viable method for detecting and matching products. Yet, there is an important connection between machine learning and web scraping.


Web scraping builds models

One of the hardest parts of machine learning, if not the hardest, is getting all the data that a model needs. Typically, you’d have to scrape thousands of pages, label data, and keep feeding it into the algorithm.

But wherever pricing intelligence is already in place, the data is readily available. All the other methods of product matching rely on procuring data that could just as easily be used to build a machine learning model.

As such, the resource costs associated with creating one are minimized. Some sort of labeling would still have to be involved. However, even that could be automated. After all, the products are already matched, so the desired output is known.
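Automated labeling could look roughly like the sketch below. It assumes a hypothetical data shape: a dictionary that maps an internal product ID to the retailer listings already matched to it, where each listing carries an `image_urls` list. Neither the structure nor the field names come from any specific system.

```python
def build_training_rows(matched_groups: dict) -> list:
    """Turn already-matched listings into unique (image_url, label) pairs."""
    rows = []
    seen = set()
    for product_id, listings in matched_groups.items():
        for listing in listings:
            for url in listing.get("image_urls", []):
                if url not in seen:  # the same image may appear in several listings
                    seen.add(url)
                    rows.append((url, product_id))
    return rows
```

The product ID serves as the label, so no manual annotation is needed for products the existing pipeline has already matched.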

Since web scraping almost always downloads the entire HTML and parses it to deliver the necessary data, downloading images to feed into the algorithm isn’t much of a change to the regular course of action. One word of caution, however, is that image delivery would greatly increase traffic costs for proxies, which can affect overall operation costs.

Therefore, the supplementary model is almost already available. Most of the hard work required to create one is done by the requirements of pricing intelligence. As such, gathering a dataset for implementing object recognition for price and product matching is much simpler than it may seem at first glance.



Product matching is likely one of the most complicated tasks allotted to ecommerce analytics. While rarely used, object recognition is one way to increase the likelihood of true positive detection.

One question remains: how much should one trust the model’s output? Unfortunately, I believe there’s no easy answer, as accuracy depends on so many factors that any blanket answer would be meaningless.




