Tiny Machine Learning – The Next AI Revolution


Over the past decade, we have witnessed the size of machine learning algorithms grow exponentially due to improvements in processor speeds and the advent of big data. Initially, models were small enough to run on local machines using one or more cores within the central processing unit (CPU).

More recently, we have seen the development of specialised application-specific integrated circuits (ASICs) and tensor processing units (TPUs), which can pack the power of ~8 GPUs. These devices have been augmented with the ability to distribute learning across multiple systems in an attempt to grow larger and larger models.

[Figure: The hierarchy of the cloud]

Tiny Machine Learning

Tiny machine learning (tinyML) is the intersection of machine learning and embedded internet of things (IoT) devices. The field is an emerging engineering discipline that has the potential to revolutionise many industries.

The main industry beneficiaries of tinyML are in edge computing and energy-efficient computing. TinyML emerged from the concept of the internet of things (IoT). The traditional idea of IoT was that data would be sent from a local device to the cloud for processing.

The most obvious example of tinyML is within smartphones. These devices continuously listen for ‘wake words’, such as “Hey Google” on Android smartphones or “Hey Siri” on iPhones.

“Hey Siri” and “Hey Google” are examples of keywords (often used synonymously with hot-word or wake word). Such devices listen continuously to audio input from a microphone and are trained to only respond to specific sequences of sounds, which correspond with the learned keywords. 

Visual wake words work in a similar way for images: a binary classifier decides whether something is present in the image or not. For example, a smart lighting system may be designed so that it switches on when it detects a person and switches off when they leave.

[Figure: Machine learning use cases of tinyML]

How does it work?

TinyML algorithms work in much the same way as traditional machine learning models. Typically, the models are trained as usual on a user’s computer or in the cloud. Post-training is where the real tinyML work begins, in a process often referred to as deep compression.

[Figure: Diagram of the deep compression process]

Model Distillation & Pruning

The model is then altered via pruning and knowledge distillation to create a model with a more compact representation.

Model Distillation

Knowledge distillation is used to capture the same knowledge in a smaller network: a compact "student" network is trained to reproduce the outputs of the larger "teacher" network. This compresses the knowledge representation, and hence the size, of a neural network so that it can be used on more memory-constrained devices.
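A minimal sketch of the idea, assuming the commonly used formulation in which a small "student" network is trained to match the temperature-softened output probabilities of a larger "teacher" network. The logits, the temperature value and the function names are illustrative assumptions, not taken from the source.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# Hypothetical logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # disagrees with the teacher

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

A student whose outputs resemble the teacher's receives the lower loss, so minimising this quantity during training pushes the small network towards the large network's behaviour.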


Pruning

Pruning helps to make the model’s representation more compact. Weights with small magnitudes are removed, whereas larger weights are kept because they contribute more during inference. The network is then retrained on the pruned architecture to fine-tune the output.
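The weight-removal step can be sketched as follows, assuming simple magnitude-based pruning of a single flat list of weights. The `sparsity` parameter and the example values are hypothetical, and the retraining step is not shown.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

# Hypothetical weights of one small layer.
layer = [0.02, -0.9, 0.4, -0.01, 0.75, 0.05]
pruned_layer = magnitude_prune(layer, sparsity=0.5)
# pruned_layer == [0.0, -0.9, 0.4, 0.0, 0.75, 0.0]
```

The zeroed weights can then be stored in a sparse format, which is where the size saving comes from.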


Quantisation

The model is then quantised post-training into a format that is compatible with the architecture of the embedded device. This reduces the storage size of the weights by a factor of 4 (for example, when 32-bit floats are replaced by 8-bit integers), typically with little loss of accuracy.
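A minimal sketch of post-training quantisation, assuming an affine (scale and zero-point) mapping from 32-bit floats to signed 8-bit integers. The scheme, function names and example weights are illustrative assumptions; real toolchains differ in detail.

```python
import struct

def quantize_int8(values):
    """Map floats onto signed 8-bit integers using a scale and a zero point."""
    lo = min(min(values), 0.0)  # include 0.0 in the range so it stays representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:            # all-zero input: any positive scale works
        scale = 1.0
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the 8-bit representation."""
    return [(x - zero_point) * scale for x in q]

# Hypothetical weights.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage comparison: 4 bytes per float32 versus 1 byte per int8.
float_bytes = len(struct.pack(f"<{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"<{len(q)}b", *q))
```

Here `float_bytes` is four times `int8_bytes`, which is the factor-of-4 saving, while `max_error` stays within one quantisation step (`scale`).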

Huffman Encoding

This is an optional step that is sometimes taken to further reduce the model size. Huffman encoding stores the data more efficiently by giving frequently occurring values shorter codes than rare ones.
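A short sketch of Huffman coding applied to a list of quantised weights, using only the Python standard library. The example weights are hypothetical; the point is that frequent values receive shorter bit strings than rare ones.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code that gives frequent symbols shorter bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing their codes.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical quantised weights: zeros dominate after pruning.
quantised = [0] * 12 + [3] * 4 + [-2] * 3 + [7] * 1
code = huffman_code(quantised)
encoded_bits = sum(len(code[w]) for w in quantised)
fixed_bits = 8 * len(quantised)  # one byte per weight without encoding
```

For this input the encoded weights take 32 bits instead of 160. The code table must also be stored, so the saving is only worthwhile when the value distribution is strongly skewed.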


Compilation

Once the model has been quantised and encoded, it is converted to a format that can be interpreted by a lightweight neural network interpreter such as TensorFlow Lite (TF Lite) or TF Lite Micro.

The model is then compiled into C or C++ code and run by the interpreter on-device.
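One common way to get a converted model onto a microcontroller is to embed the serialised model file in the firmware as a C byte array. The sketch below shows that step only, under the assumption that this is the workflow in use; the array name `g_model` and the placeholder bytes are hypothetical, and the output is similar in spirit to what `xxd -i` produces.

```python
def to_c_array(data, name="g_model"):
    """Render raw bytes as a C source snippet containing a byte array."""
    lines = []
    for start in range(0, len(data), 12):  # 12 bytes per line for readability
        chunk = data[start:start + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )

# Placeholder bytes standing in for a serialised model file.
fake_model = bytes(range(16))
c_source = to_c_array(fake_model)
print(c_source)
```

The generated source can be compiled together with the rest of the firmware, so the interpreter reads the model directly from the device's memory.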

The Next AI Revolution

The ability to run machine learning models on resource-constrained devices opens up doors to many new possibilities.

Developments may help to make standard machine learning more energy-efficient, which will help to quell concerns about the impact of data science on the environment. In addition, tinyML allows embedded devices to be endowed with new intelligence based on data-driven algorithms, which could be used for anything from preventative maintenance to detecting bird sounds in forests.

Source: Matthew Stewart, PhD Researcher: Tiny Machine Learning: The New AI Revolution

