As we already know, hume.ai can detect emotions in the human voice, but it can also perform additional functions: measuring facial expressions and classifying communication.

Below is an updated overview of its three main pillars, along with examples.


1. Emotion detection

Hume interprets emotional expressions and can generate empathic responses:
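A minimal sketch of the interpretation half, assuming Hume's legacy Python SDK (`pip install hume`) and its streaming interface; the placeholder API key, the example sentence, and the response layout are assumptions, so verify against the current docs:

```python
import asyncio

from hume import HumeStreamClient
from hume.models.config import LanguageConfig

async def main():
    # Placeholder key -- replace with your own from the Hume portal.
    client = HumeStreamClient("<YOUR_API_KEY>")

    # The language model scores emotional expression in written text.
    async with client.connect([LanguageConfig()]) as socket:
        result = await socket.send_text("I can't believe we finally shipped it!")

    # The layout below is an assumption based on older SDK examples:
    # each prediction carries a list of {name, score} emotion entries.
    emotions = result["language"]["predictions"][0]["emotions"]
    top = max(emotions, key=lambda e: e["score"])
    print(f"Strongest emotion: {top['name']} ({top['score']:.2f})")

asyncio.run(main())
```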

Categorizing emotions:
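For categorizing emotions in recorded audio, the batch API with the speech prosody model is the natural fit. A sketch under the same SDK assumptions, with a hypothetical audio URL:

```python
from hume import HumeBatchClient
from hume.models.config import ProsodyConfig

client = HumeBatchClient("<YOUR_API_KEY>")

# Hypothetical URL -- point this at any publicly hosted audio file.
urls = ["https://example.com/audio/customer_call.wav"]

# The prosody model categorizes emotions expressed in the voice.
job = client.submit_job(urls, [ProsodyConfig()])
job.await_complete()

# Writes per-segment emotion scores to a local JSON file.
job.download_predictions("voice_emotions.json")
```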


2. Measuring facial expressions
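The same batch workflow applies here, swapping in the facial expression model; a sketch with a hypothetical video URL, under the same SDK assumptions as above:

```python
from hume import HumeBatchClient
from hume.models.config import FaceConfig

client = HumeBatchClient("<YOUR_API_KEY>")

# Hypothetical URL -- any hosted video or image can serve as input.
urls = ["https://example.com/video/interview.mp4"]

# The face model measures expression intensities frame by frame.
job = client.submit_job(urls, [FaceConfig()])
job.await_complete()
job.download_predictions("face_expressions.json")
```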


3. Classifying communication

Example 1: classifying a meeting call as “attentive vs distracted”:
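Hume hosts these as trained custom models, but since their exact API surface isn't covered here, the sketch below only approximates the idea: run the face model over a meeting recording (as in section 2), then apply a crude threshold over attention-related expression scores. The emotion names, the flattened JSON layout, and the decision rule are all illustrative assumptions, not Hume's actual classifier:

```python
import json

def classify(predictions_path: str, positive: set, negative: set, labels: tuple) -> str:
    """Crude heuristic: sum scores for two opposing sets of expression labels."""
    with open(predictions_path) as f:
        frames = json.load(f)  # face-model output, assumed flattened to a frame list

    pos_total = neg_total = 0.0
    for frame in frames:
        for emotion in frame["emotions"]:
            if emotion["name"] in positive:
                pos_total += emotion["score"]
            elif emotion["name"] in negative:
                neg_total += emotion["score"]

    return labels[0] if pos_total >= neg_total else labels[1]

# "Attentive vs distracted" over a meeting recording's face predictions.
print(classify(
    "meeting_faces.json",
    positive={"Concentration", "Interest"},
    negative={"Boredom", "Tiredness"},
    labels=("attentive", "distracted"),
))
```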

Example 2: analysing a video as “self-confident vs self-doubting”:
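Under the same assumptions, the second example is the same helper with confidence-related labels swapped in (again stand-ins, not the features Hume's model actually learned):

```python
# Same classify() helper as above, applied to a video's face predictions.
print(classify(
    "speaker_faces.json",
    positive={"Determination", "Pride"},
    negative={"Doubt", "Anxiety"},
    labels=("self-confident", "self-doubting"),
))
```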

A full list of models (Models · Hume AI):

  1. Song Genre Prediction – Predicts the genre of a song using recordings from the GTZAN dataset. The dataset includes 100 samples per genre across 9 genres, excluding classical.
  2. Parkinson’s vs Non-Parkinson’s – Determines the likelihood of a Parkinson’s diagnosis based on voice recordings, using data from a 2017 study by Dimauro et al.
  3. Self-Confident vs Self-Doubting – Classifies individuals as self-confident or self-doubting using publicly available video clips.
  4. Alert vs Drowsy – Identifies whether a driver is alert or drowsy through video clips of real drivers and actors.
  5. Attentive vs Distracted – Detects attentiveness in individuals during video calls, trained on recordings of meetings and online classes.
  6. Father vs Not Father – Classifies individuals in a paternity test announcement scenario from a reality TV show based on reactions.
  7. Good vs Bad Call – Assesses customer service calls as good or bad using data from Lawyer.com.
  8. Toxic vs Not Toxic – Evaluates the toxicity in speech by video game streamers, identifying content as toxic or non-toxic.
  9. Best vs Worst Baker – Predicts the performance of amateur bakers in a reality TV show, determining if they received the ‘best’ or ‘worst’ award.
  10. Depressed vs Non-depressed Mood – Predicts depression based on public video diaries, including various clinical and personal sources.