The scenario of shopping without checkout lines has been described many times since the introduction of radio frequency identification (RFID) technology, and many people believed that if it ever came true, RFID would be the technology to enable it. However, Amazon Go, announced on 5 December 2016, shows that other emerging technologies may realize it first, thanks to significant improvements in deep learning, computer vision and sensor fusion. This leads us to ask which technology will shape the future of the retail industry: computer vision, or RFID?
What is Amazon Go?
Amazon Go, located at 2131 7th Ave, Seattle, WA, USA, is a new kind of store without checkout lines. It is currently open only to Amazon employees in a beta program and will open to the public in early 2017. Customers use the Amazon Go app on their smart device, scan a 2D barcode to enter the store and then simply shop. Anything taken from the shelves is automatically added to the customer's virtual cart. When customers want to leave the store, they can just walk out: their Amazon account is automatically charged and a receipt is sent to their device.
The "just walk out" technology, used by Amazon Go, combines machine learning, computer vision, AI and sensor fusion technologies, much like those used in self-driving cars. It will be available to members only and no cash or credit cards will be accepted. This approach is quite different from RFID. The latter was raised as early as 15 years ago for this kind of application but has been experiencing difficulties in deployment due to the much higher costs of tags compared to barcodes printed on packages which cost almost nothing.
RFID was considered an exciting solution for stores without checkout lines
RFID is the use of radio waves to identify and track objects in a form of wireless communication. An RFID system consists of readers and tags that communicate with each other by radio waves. A tag is attached to the item to be tracked; it is usually made of a tag chip (also called an IC) connected to an antenna. The tag chip contains memory which stores the product's electronic product code (EPC) and other variable information. An RFID reader, in turn, is a network-connected device, fixed or mobile, with an antenna that sends power, data and commands to the tags.
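To make the reader-tag relationship concrete, here is a minimal sketch in Python of how an RFID inventory pass could be modelled: each tag carries an EPC plus a little user memory, and the reader collects the EPCs of every tag inside its field in one pass. The class and field names are invented for illustration and do not come from any RFID library or standard.

```python
# Minimal sketch of an RFID system: tags carry an EPC plus a little user memory,
# and a reader powers and polls every tag inside its field in one pass.
# All class and field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class RFIDTag:
    epc: str                                        # electronic product code stored in the tag chip
    user_memory: dict = field(default_factory=dict) # other variable information

@dataclass
class RFIDReader:
    tags_in_range: list                             # tags currently inside the reader's RF field

    def inventory(self):
        """Send power and commands to every tag in range and collect their EPCs."""
        return [tag.epc for tag in self.tags_in_range]

if __name__ == "__main__":
    shelf = [RFIDTag("urn:epc:id:sgtin:0614141.107346.2017"),
             RFIDTag("urn:epc:id:sgtin:0614141.107346.2018",
                     user_memory={"expiry": "2017-06-30"})]
    reader = RFIDReader(tags_in_range=shelf)
    print(reader.inventory())   # all tags are read in one pass, without line of sight
```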
Figure 2 An RFID system
RFID is already widely used in shops for anti-theft and security. There are different kinds of tags; one type is the security tag shown in the inset of the photo. These tags are removed or deactivated when the items are bought; otherwise a detection system (shown in the photo) sounds an alarm when it senses active tags, to prevent shoplifting.
RFID can also be used at the checkout, where tags mounted on multiple items are read at one time. Besides technical challenges such as the number of tags to be read, read distance and accuracy, the biggest obstacle is the cost of tags: compared with printed barcodes, which cost almost nothing, typical RFID tags today cost from several cents to tens of cents.
As early as the mid-2000s, IBM already pictured the application of RFID in the supermarket of the future: no lines, no checkout (https://www.youtube.com/watch?v=eob532iEpqk&feature=youtu.be). But that vision never had any follow-up.
Figure 1 Anti-theft systems widely used in shops are based on RFID technology
How the Amazon Go system works
While RFID is taking off, albeit much more slowly than initially expected, Amazon Go is already bringing the technologies used in autonomous driving into retail. So far Amazon has only disclosed that it combines deep learning, computer vision and sensor fusion, without giving any further details.
It is not certain exactly which technologies are used in the store, but two patents filed by Amazon in 2013 and 2014 can help us better understand it: "Detecting item interaction and movement" and "Transitioning items from the materials handling facility".
The former describes "a system for tracking removal or placement of items at inventory locations within a materials handling facility". The inventory management system may detect item removal and update a user item list associated with the user. If the item is placed back, the system updates the list to remove it again.
The latter describes a system that identifies items picked up by a customer and automatically associates them with that customer at or near the time of the pick: "When the user enters and/or passes through a transition area, the picked items are automatically transitioned to the user without affirmative input from or delay to the user."
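To make the logic of the two patents concrete, below is a minimal hypothetical sketch of the virtual cart they describe: pick events add items to a user's item list, place-back events remove them, and passing through the transition area settles the cart without a checkout step. The class, method and item names are assumptions for illustration; the patents do not disclose an API.

```python
# Hypothetical sketch of the "virtual cart" logic described in the two patents.
from collections import Counter

class InventoryManagementSystem:
    def __init__(self):
        self.item_lists = {}                      # user id -> Counter of picked items

    def on_pick(self, user_id, item_id):
        """Item removed from a shelf: add it to the user's item list."""
        self.item_lists.setdefault(user_id, Counter())[item_id] += 1

    def on_place_back(self, user_id, item_id):
        """Item returned to a shelf: remove it from the user's item list."""
        cart = self.item_lists.get(user_id, Counter())
        if cart[item_id] > 0:
            cart[item_id] -= 1

    def on_transition(self, user_id):
        """User passes through the transition area: settle the cart with no checkout step."""
        cart = self.item_lists.pop(user_id, Counter())
        receipt = {item: n for item, n in cart.items() if n > 0}
        # A real system would now charge the account and push a receipt to the app.
        return receipt

ims = InventoryManagementSystem()
ims.on_pick("alice", "orange-juice-1L")
ims.on_pick("alice", "yogurt-500g")
ims.on_place_back("alice", "yogurt-500g")
print(ims.on_transition("alice"))                 # {'orange-juice-1L': 1}
```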
According to the patents, the Amazon Go system can be divided into three parts, as shown in the table below. IDTechEx also speculates on how it works, as follows.
After the customer enters the store, or the "materials handling facility" in the patents' terms, the inventory management system can identify the customer via facial recognition, a user ID card or GPS. Once the customer is identified, information such as item retrieval history, view history and purchase history can be retrieved from a data store.
The customer can carry a portable device, in most cases a smartphone, with at least a wireless module to facilitate communication with the inventory management system. The smartphone can also be used to identify the customer. In some cases, other input/output devices such as projectors, displays (labelled 212 in the patent drawing), speakers and microphones can also be used to facilitate communication between the customer and the inventory management system over the network created by the wireless antennas.
Many image capture devices, such as cameras, are mounted on the ceiling, on the walls or on the shelves. The cameras can be RGB cameras, depth-sensing cameras or other types. In addition, multiple sensors such as pressure sensors, infrared sensors, scales, volume displacement sensors and light curtains are used as well.
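One way to picture how readings from all of these devices might be brought together is a single "shelf event" record that bundles the hand images with the shelf-sensor readings, as in the purely illustrative sketch below; every field name here is an assumption, not a disclosed Amazon data structure.

```python
# Illustrative record for fusing camera and shelf-sensor readings into one event.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ShelfEvent:
    timestamp: float                        # when the hand crossed the shelf plane
    shelf_id: str                           # which shelf location the event happened at
    frame_before: bytes                     # image of the hand entering the shelf
    frame_after: bytes                      # image of the hand leaving the shelf
    weight_delta_g: float                   # scale/pressure change, negative = item removed
    infrared_blob: Optional[bytes] = None   # optional IR reading to separate hand from items

# Downstream stages (motion, item and human recognition) would all consume such events.
event = ShelfEvent(timestamp=1480917600.0, shelf_id="aisle3-shelf2-slot7",
                   frame_before=b"<jpeg bytes>", frame_after=b"<jpeg bytes>",
                   weight_delta_g=-310.0)
print(event.shelf_id, event.weight_delta_g)
```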
To understand how the Amazon Go system works, it is important to recognise that its core technologies must be able to tell "who" did "what" to "which product". The problem can therefore be split into three parts: motion recognition, item recognition and human identification.
What? Motion recognition:
Image sensing devices take pictures of hands as they pass through a vertical plane set at the front of the shelf. By comparing the images of the hand entering the shelf and leaving it, the management system can tell whether the behaviour was removing an item or placing one back. In addition, a pressure sensor and/or a scale can be used to detect when an object is removed from or added to the shelf.
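A minimal sketch of how such a pick-versus-place decision could be made is shown below: the hand images captured on entering and leaving the shelf plane are compared (reduced here to a simple "hand is holding something" flag standing in for a real vision classifier), and the weight change reported by the scale confirms the direction. The helper names and thresholds are assumptions for illustration.

```python
# Sketch: decide whether a shelf interaction was a pick or a place.
def hand_holds_item(frame) -> bool:
    """Placeholder for a vision classifier that says whether the imaged hand holds an item."""
    return frame.get("holding", False)        # a frame is just a dict in this illustration

def classify_interaction(frame_entering, frame_leaving, weight_delta_g, tolerance_g=5.0):
    """Combine the two hand images with the shelf weight change."""
    took_something = (not hand_holds_item(frame_entering)) and hand_holds_item(frame_leaving)
    put_something = hand_holds_item(frame_entering) and not hand_holds_item(frame_leaving)
    if took_something and weight_delta_g < -tolerance_g:
        return "item_removed"
    if put_something and weight_delta_g > tolerance_g:
        return "item_placed"
    return "no_change"                        # ambiguous cases would fall back to other sensors

print(classify_interaction({"holding": False}, {"holding": True}, -310.0))  # item_removed
print(classify_interaction({"holding": True}, {"holding": False}, +310.0))  # item_placed
```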
Which product? Item recognition:
Firstly, since each item is placed at a particular location on the shelf, in most cases the sensors mounted at that shelf location can tell which product has been picked up.
The computer vision system can also tell which product is picked up. After Amazon acquired SnapTell in 2009, its image recognition technology was developed to identify a large number of Amazon products. The technology was integrated into Amazon's app "Showrooming", with which customers can take a picture of a product in a local store and immediately get a price comparison. With a large amount of product image data, the recognition accuracy can be very high. Even if an item cannot be uniquely recognized by computer vision, image recognition can still reduce the number of possible products, e.g. at least tell whether it is a bottle of orange juice or a yogurt. Pressure sensors, scales and infrared sensors can then help to distinguish between the remaining inventory items.
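As an illustration of this narrowing-down process, the sketch below intersects a hypothetical list of vision candidates with the products known to sit in that shelf slot and then uses the measured weight change to settle the ambiguity; the catalogue, planogram and helper names are invented for the example.

```python
# Sketch: narrow down which product was picked using shelf location, vision and weight.
CATALOGUE = {                               # hypothetical product data: id -> weight in grams
    "orange-juice-1L": 1060.0,
    "yogurt-500g": 520.0,
    "yogurt-150g": 165.0,
}
PLANOGRAM = {"aisle3-shelf2-slot7": ["orange-juice-1L", "yogurt-500g"]}  # what each slot holds

def identify_item(shelf_slot, vision_candidates, weight_delta_g, tolerance_g=20.0):
    """Intersect the slot's known contents with the vision candidates, then check the weight."""
    candidates = [p for p in PLANOGRAM.get(shelf_slot, []) if p in vision_candidates]
    if not candidates:                       # vision failed entirely: fall back to the slot contents
        candidates = PLANOGRAM.get(shelf_slot, [])
    removed_weight = abs(weight_delta_g)
    matches = [p for p in candidates if abs(CATALOGUE[p] - removed_weight) <= tolerance_g]
    return matches[0] if len(matches) == 1 else None   # None -> needs manual confirmation

# Vision can only tell it is "some yogurt or juice"; the scale settles the question.
print(identify_item("aisle3-shelf2-slot7",
                    ["orange-juice-1L", "yogurt-500g", "yogurt-150g"], -515.0))  # yogurt-500g
```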
In rare situations, customers can use manual controls to update their item list.
Who? Human identification:
It is highly likely that human identification relies on the customer's location, provided by his/her portable device with its wireless module. If the shelf is divided into multiple small areas and a customer standing within one of those areas picks up or places back an item, then it is that customer who is associated with the product. An infrared sensor can be used to distinguish between a customer's hand and the inventory items.
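A simple way to picture this association step is the sketch below, which assigns a shelf event to whichever tracked customer is reported closest to the shelf area at the time of the event, based on the position estimated from the phone's wireless module; the coordinates, names and distance threshold are made up for illustration.

```python
# Sketch: associate a shelf event with the nearest tracked customer.
import math

def nearest_customer(shelf_position, customer_positions, max_distance_m=1.5):
    """customer_positions: user id -> (x, y) estimated from the phone's wireless module."""
    best_user, best_dist = None, float("inf")
    for user_id, (x, y) in customer_positions.items():
        dist = math.hypot(x - shelf_position[0], y - shelf_position[1])
        if dist < best_dist:
            best_user, best_dist = user_id, dist
    # If nobody is close enough, leave the event unassigned for other sensors to resolve.
    return best_user if best_dist <= max_distance_m else None

positions = {"alice": (2.1, 4.9), "bob": (6.0, 1.2)}      # metres, in store coordinates
print(nearest_customer((2.0, 5.0), positions))             # alice
```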
Computer vision, or RFID, that is the question
In a clear departure from the initial IBM vision, the future stores realized by Amazon are based on computer vision, deep learning and sensor fusion instead of RFID technologies. It is widely known that the major barrier for the RFID solution is the high cost of the tags, including the cost of adding tags to item packages and the recurring tag cost, as tags are difficult to recycle. Then what about the computer vision solution?
As the Amazon Go system uses deep learning, it needs to learn the different products in the store, which is a big barrier if a third-party company wants to use the same technology. The shelves of a third-party store would have to be modified, with sensors and cameras installed on or near them. The initial capital investment for deploying the new technologies is high, although it can reduce future labour costs. In addition, the third party would have to open its own inventory to Amazon for training. Moreover, the new solution has so far only been tried by Amazon employees; issues may surface when it is opened to different people in different regions. These barriers may prevent other retailers from adopting similar technologies unless they are properly addressed and proven to work.
However, it seems that Amazon has won the first battle between computer vision and RFID, and it has set a good example of combining many fashionable emerging technologies to position Amazon as a high-tech company. RFID will continue to play an important role in warehouses, as well as in some retail sectors such as apparel, but it will remain difficult to use for common consumer goods. Computer vision may instead prove to be a workable solution for them.
Top image: Wikipedia