  • CPU: Central Processing Unit. A general-purpose processor that manages all the functions of a computer, handling its logic, calculations, and input/output.
    Some CPU vendors: Intel, AMD, ARM.
  • GPU: Graphics Processing Unit. An additional processor that enhances the computer's graphical performance and runs demanding, high-end tasks.
    Some GPU vendors: Intel, NVIDIA, AMD.
  • TPU: Tensor Processing Unit. A custom-built ASIC (application-specific integrated circuit) that accelerates TensorFlow projects.
    A powerful processor purpose-built to run projects written in a specific framework, i.e. TensorFlow.
    Some vendors of TPUs and comparable AI accelerators: Google, Huawei, Alibaba.

CPU Features Summary:

  • Has Several Cores
  • Low Latency
  • Specialized in Serial Processing
  • Capable of executing a handful of operations at once
  • Has the highest FLOPS utilization for RNNs (recurrent neural networks)
  • Supports the largest models thanks to its large memory capacity
  • Much more flexible and programmable for irregular computations (e.g., small batches, non-MatMul computations)
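
FLOPS utilization, mentioned in the list above, is the ratio of the floating-point operations a processor actually sustains to its theoretical peak. A minimal sketch of that arithmetic for a single matrix multiplication follows. The function names, the peak figure, and the timing are hypothetical values chosen for illustration, not measurements of any real chip.

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """FLOPs for an (m x k) @ (k x n) product.

    Uses the usual 2*m*k*n convention: one multiply and one add
    per term of each inner product.
    """
    return 2 * m * k * n


def flops_utilization(flops_done: float, seconds: float, peak_flops: float) -> float:
    """Fraction of the peak rate actually achieved (0.0 to 1.0)."""
    if seconds <= 0 or peak_flops <= 0:
        raise ValueError("seconds and peak_flops must be positive")
    return (flops_done / seconds) / peak_flops


# Hypothetical example: a 1024 x 1024 x 1024 MatMul that takes 10 ms
# on a device with an assumed peak of 1 TFLOPS (1e12 FLOP/s).
work = matmul_flops(1024, 1024, 1024)
utilization = flops_utilization(work, seconds=0.010, peak_flops=1e12)
print(f"{work} FLOPs, utilization = {utilization:.1%}")
```

A low figure here would mean the hardware spends most of its time waiting (for memory, for I/O, or on work that does not map well to its arithmetic units) rather than computing.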

GPU Features Summary:

  • Has thousands of cores
  • High throughput
  • Specialized for parallel processing
  • Capable of executing thousands of operations at once
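
One rough way to read the contrast between "a handful of operations at once" and "thousands of operations at once": for a workload made of independent operations, the number of sequential passes is the operation count divided by how many operations run at once, rounded up. The sketch below assumes an idealized model with no memory or scheduling overhead, and the `ops_at_once` values are hypothetical.

```python
def sequential_passes(num_ops: int, ops_at_once: int) -> int:
    """Passes needed when `ops_at_once` independent operations run per pass."""
    if num_ops < 0 or ops_at_once <= 0:
        raise ValueError("num_ops must be >= 0 and ops_at_once must be > 0")
    # Integer ceiling division, exact for arbitrarily large counts.
    return (num_ops + ops_at_once - 1) // ops_at_once


# Hypothetical workload: one million independent element-wise operations.
NUM_OPS = 1_000_000
cpu_passes = sequential_passes(NUM_OPS, ops_at_once=8)     # "a handful"
gpu_passes = sequential_passes(NUM_OPS, ops_at_once=4096)  # "thousands"
print(f"CPU-like: {cpu_passes} passes, GPU-like: {gpu_passes} passes")
```

The pass count alone does not decide which processor finishes first, since each pass on a low-latency serial processor may complete faster; it only illustrates why highly parallel workloads favor wide hardware.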

TPU Features Summary:

  • Special Hardware for Matrix Processing
  • High Latency (compared to CPU)
  • Very High Throughput
  • Compute with Extreme Parallelism
  • Highly optimized for large batches and CNNs (convolutional neural networks)
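
The pairing of high latency with very high throughput helps explain the preference for large batches: a fixed per-batch cost is amortized over more examples. The sketch below uses a deliberately simple assumed cost model (a fixed latency plus compute time proportional to batch size). The function name and all numbers are hypothetical.

```python
def effective_throughput(batch_size: int,
                         fixed_latency_s: float,
                         peak_examples_per_s: float) -> float:
    """Examples per second under the assumed cost model.

    Each batch pays `fixed_latency_s` once, plus
    batch_size / peak_examples_per_s seconds of compute.
    """
    if batch_size <= 0 or peak_examples_per_s <= 0 or fixed_latency_s < 0:
        raise ValueError("invalid batch size, peak rate, or latency")
    batch_time = fixed_latency_s + batch_size / peak_examples_per_s
    return batch_size / batch_time


# Hypothetical accelerator: 5 ms fixed latency, peak of 100,000 examples/s.
LATENCY_S = 0.005
PEAK = 100_000
for batch_size in (8, 128, 2048, 32768):
    rate = effective_throughput(batch_size, LATENCY_S, PEAK)
    print(f"batch {batch_size:>6}: {rate:>9.0f} examples/s ({rate / PEAK:.0%} of peak)")
```

Under this model small batches leave the device far below its peak rate, while very large batches approach it, which matches the "suited for large batch sizes" characterization.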

When To Use CPU, GPU, Or TPU To Run Your Machine Learning Models?

CPU: Several cores, low latency, serial processing, limited simultaneous operations, large memory capacity.

GPU: Thousands of cores, high data throughput, massive parallel computing, limited multitasking, low memory.

TPU: Matrix-based workloads, high latency, high data throughput, suited for large batch sizes and complex neural network models.


CPUs:

  • Prototypes that require the highest flexibility
  • Simple models that do not take long to train
  • Small models with small effective batch sizes
  • Models dominated by custom TensorFlow operations written in C++
  • Models limited by available I/O or by the host system’s networking bandwidth


GPUs:

  • Models whose source code does not exist or is too difficult to change
  • Models with numerous custom TensorFlow operations that the GPU must support
  • Models that use TensorFlow operations not available on Cloud TPU
  • Medium or larger models with bigger effective batch sizes


TPUs:

  • Training models dominated by matrix computations
  • Training models without custom TensorFlow operations inside the main training loop
  • Training models that require weeks or months to complete
  • Training huge models with very large effective batch sizes
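
The rules of thumb in the lists above can be collected into a small helper. This is only a sketch: the `ModelProfile` fields, the `suggest_processor` function, the category labels, and the order in which the checks are applied are all assumptions made for illustration, not an official selection procedure.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    """Hypothetical description of a training job."""
    mostly_matmul: bool                # dominated by matrix computations
    custom_ops_in_training_loop: bool  # custom TensorFlow ops in the main loop
    ops_available_on_tpu: bool         # every needed op exists on Cloud TPU
    effective_batch_size: str          # "small", "medium", or "very large"
    training_time: str                 # "short", "days", or "weeks+"


def suggest_processor(p: ModelProfile) -> str:
    """Return "CPU", "GPU", or "TPU" following the rules of thumb above."""
    # TPU: matrix-heavy, no custom ops in the loop, and either very large
    # batches or a training run measured in weeks or months.
    if (p.mostly_matmul
            and not p.custom_ops_in_training_loop
            and p.ops_available_on_tpu
            and (p.effective_batch_size == "very large"
                 or p.training_time == "weeks+")):
        return "TPU"
    # CPU: small batches and models that train quickly (e.g. prototypes).
    if p.effective_batch_size == "small" and p.training_time == "short":
        return "CPU"
    # GPU: everything in between, including models with ops a TPU lacks.
    return "GPU"


prototype = ModelProfile(False, True, False, "small", "short")
mid_size = ModelProfile(True, True, False, "medium", "days")
huge = ModelProfile(True, False, True, "very large", "weeks+")
for name, profile in [("prototype", prototype), ("mid_size", mid_size), ("huge", huge)]:
    print(name, "->", suggest_processor(profile))
```

Real decisions also depend on cost, availability, and framework support, none of which this sketch models.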

Credit: Kinnera Kiran
