Robot Brain · 06 of 06

Why Robots Need to Learn From Mistakes

Real work is full of small failures — the robot has to notice, stay safe, recover, and do better next time.

Robots will make mistakes.

They will miss grasps. Bump objects. Get stuck. Place a part slightly wrong. Misunderstand a scene. A humanoid may lose balance because the floor is uneven, the load shifts, or the foot lands where the robot did not expect.

The goal is not a robot that never fails. That is not realistic.

The goal is a robot that can fail in a small way, notice the problem, stay safe, recover if possible, and do better next time.

It sounds simple. It is one of the hardest parts of Physical AI.

A mistake is only useful if it becomes a learning signal

A failed attempt does not teach a robot by itself. A robot can drop a box a thousand times and learn very little if the system does not record the right details.

  1. Try to do something.
  2. Notice that the result is wrong.
  3. Stop or move to a safe state.
  4. Record what was seen, what was done, and what happened.
  5. Add a human correction or a useful label.
  6. Update how the robot acts.
  7. Test the same situation again.
The basic rule

Without that loop, a mistake is just a failure.
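The seven steps above can be sketched as a small loop. This is a toy sketch, not a real robot stack: the situation names, the `toy_execute` world, the operator who supplies the correction, and the lookup-table "policy" are all hypothetical stand-ins, and step 6 is a table update rather than actual retraining.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AttemptRecord:
    situation: str                    # what the robot saw (simplified to a label)
    action: str                       # what it did
    outcome: str                      # what happened
    correction: Optional[str] = None  # the human label added afterwards


@dataclass
class LearningLoop:
    policy: Dict[str, str]            # hypothetical: situation -> action lookup
    log: List[AttemptRecord] = field(default_factory=list)
    safe_stops: int = 0

    def attempt(self, situation: str, execute: Callable[[str, str], str],
                expected: str, correct: Callable[[str], str]) -> bool:
        action = self.policy.get(situation, "default_grasp")  # 1. try
        outcome = execute(situation, action)
        if outcome == expected:                                # 2. check the result
            return True
        self.safe_stops += 1                                   # 3. stand-in for a safe stop
        record = AttemptRecord(situation, action, outcome)     # 4. record
        record.correction = correct(situation)                 # 5. human correction
        self.log.append(record)
        self.policy[situation] = record.correction             # 6. update (lookup, not retraining)
        return False


def toy_execute(situation: str, action: str) -> str:
    # Hypothetical world: a shiny part slips unless it is grasped from the side.
    if situation == "shiny_part" and action != "side_grasp":
        return "slipped"
    return "placed"


def show_side_grasp(situation: str) -> str:
    # Hypothetical operator who demonstrates the right grasp.
    return "side_grasp"


loop = LearningLoop(policy={})
first = loop.attempt("shiny_part", toy_execute, "placed", show_side_grasp)
second = loop.attempt("shiny_part", toy_execute, "placed", show_side_grasp)  # 7. test again
print(first, second, len(loop.log))  # False True 1
```

The point of the structure is that the failed attempt is not discarded: it is stored with its outcome and correction, and the retest shows whether the update actually changed behaviour.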

The real world creates mistakes that training misses

A box may be dented. A handle may be loose. A shiny part may confuse a camera. A cable may move. A person may block the robot. A shelf may be slightly different from the one in training. A small object may slip in the gripper.

These are not rare details. They are normal life. A robot that only learns from clean examples can look good in a demo and still fail in normal work.

Small mistakes can become big mistakes

A small error early in a sequence can change everything that comes later. If the robot reaches from the wrong angle, the object may slide. If the camera view changes, the robot may choose a worse grasp. If that grasp fails, the robot may push the object farther away.

The DAgger paper explains this formally: if a learner makes a mistake, it may end up in states unlike the expert demonstrations, and more mistakes follow.
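A rough calculation shows why long sequences are fragile even before compounding starts. Assume, optimistically and only for illustration, that each of T steps succeeds independently with the same probability p:

```latex
P(\text{all } T \text{ steps succeed}) = p^{T},
\qquad \text{e.g.} \quad 0.99^{100} \approx 0.37
```

So a step that works 99% of the time still gives a 100-step task only about a 37% chance of a clean run. Compounding makes the real picture worse than this independent-step estimate, because one mistake moves the robot into states where the next step is less likely to succeed, so p itself drops after the first slip.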

The basic rule

A robot trained only on good examples may not know what to do after its first bad move.
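The DAgger idea from Ross, Gordon and Bagnell can be sketched in a toy setting: run the learner, ask the expert what it should have done in the states the learner actually reached, add those labels to the dataset, and retrain. Everything concrete here is an assumption for illustration: the one-dimensional world, the expert rule, the nearest-state "model" and the fixed pushes. Real DAgger trains a learned policy and can mix expert and learner actions during rollouts; here the first round of data is pure expert demonstrations and later rounds use the learner alone.

```python
from typing import Dict, List, Tuple

Policy = Dict[int, int]


def expert(x: int) -> int:
    # Hypothetical expert: always steer back toward the centre line (x = 0).
    if x > 0:
        return -1
    if x < 0:
        return 1
    return 0


def fit(data: List[Tuple[int, int]]) -> Policy:
    # Toy "training": memorise the expert's label for every state seen so far.
    return {x: a for x, a in data}


def act(policy: Policy, x: int) -> int:
    # For an unseen state, reuse the label of the nearest state in the data.
    nearest = min(policy, key=lambda s: abs(s - x))
    return policy[nearest]


def rollout(policy: Policy, steps: int = 6, pushes=(1, 3)) -> Tuple[List[int], int]:
    # Fixed small pushes at steps 1 and 3 stand in for real-world disturbances.
    x, visited = 0, []
    for t in range(steps):
        visited.append(x)
        x += act(policy, x)
        if t in pushes:
            x += 1
    return visited, x


# Behaviour cloning: the expert's demonstrations only cover the clean path.
demos = [(0, 0)] * 5
bc_policy = fit(demos)

# DAgger: label the states the learner actually reached, aggregate, retrain.
data = list(demos)
dagger_policy = fit(data)
for _ in range(3):
    visited, _final_x = rollout(dagger_policy)
    data += [(x, expert(x)) for x in visited]
    dagger_policy = fit(data)

print(rollout(bc_policy)[1], rollout(dagger_policy)[1])  # 2 0
```

The cloned policy never saw an off-centre state, so after the first push it copies the "do nothing" label and drifts further with each disturbance. The DAgger policy has been trained on its own off-path states and steers back.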

Robots need recovery, not just success

Many demos show the clean path. Real work also needs recovery. What if the robot misses the object by two centimetres? What if it picks up two items? What if a part is half-inserted and stuck? What if the humanoid shifts its weight and the box starts to slip?

A useful robot may retry, reset the object, change its grip, slow down, ask for help, or stop. Stopping safely is not a failure of intelligence — in robotics, it is often the right decision.
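One way to make the recovery options concrete is a small decision rule. This is a hand-written sketch: the failure names, the mapping to recovery moves and the retry budget of three are assumptions for illustration, not taken from any particular system. It encodes two ideas from the text: stopping is a valid answer, and retries should be bounded.

```python
from enum import Enum


class Recovery(Enum):
    RETRY = "retry"
    CHANGE_GRIP = "change grip"
    RESET_OBJECT = "reset the object"
    SLOW_DOWN = "slow down"
    ASK_FOR_HELP = "ask for help"
    STOP = "stop"


# Illustrative mapping; a real system would tune or learn these per task.
ROUTINE_RECOVERY = {
    "missed_grasp": Recovery.RETRY,
    "double_pick": Recovery.RESET_OBJECT,
    "stuck_insertion": Recovery.RESET_OBJECT,
    "weak_grip": Recovery.CHANGE_GRIP,
    "load_slipping": Recovery.SLOW_DOWN,
}
SAFETY_CRITICAL = {"unexpected_contact", "balance_loss", "person_too_close"}


def choose_recovery(failure: str, attempts: int, max_attempts: int = 3) -> Recovery:
    # Safety-critical events end the attempt immediately.
    if failure in SAFETY_CRITICAL:
        return Recovery.STOP
    # An unfamiliar failure is not a good moment to improvise.
    if failure not in ROUTINE_RECOVERY:
        return Recovery.STOP
    # A bounded retry budget stops the robot from looping on the same mistake.
    if attempts >= max_attempts:
        return Recovery.ASK_FOR_HELP
    return ROUTINE_RECOVERY[failure]


print(choose_recovery("missed_grasp", attempts=1).value)  # retry
```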

Why this is harder for Physical AI than digital AI

A chatbot can make a bad sentence. A robot acts with force. It can drop a part, scratch a panel, crush packaging, tear a cable, block a walkway, fall into a shelf, or pinch a person.

In software, you can run millions of tests cheaply. In robotics, each attempt takes time. Batteries drain. Motors heat up. Parts wear down. A failed action may damage the object.

The hard question is how to let robots make useful mistakes without making dangerous or expensive ones.
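A common software pattern for keeping trial and error bounded is to pass every exploratory command through a filter before it reaches the motors. The sketch below is an assumption-laden illustration: the limit values, the single workspace axis and the function name `bound_action` are hypothetical, and a software check like this sits alongside hardware measures such as emergency stops rather than replacing them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrialLimits:
    # Illustrative numbers only, not taken from any standard or real robot.
    max_speed: float = 0.25                       # m/s while learning
    max_force: float = 20.0                       # N measured at the gripper
    workspace: Tuple[float, float] = (-0.5, 0.5)  # allowed x range, metres


def bound_action(target_x: float, speed: float, measured_force: float,
                 estop_pressed: bool,
                 limits: TrialLimits = TrialLimits()) -> Optional[Tuple[float, float]]:
    """Return an allowed (target_x, speed) command, or None to stop."""
    # Hard stop: emergency stop pressed or force limit exceeded.
    if estop_pressed or measured_force > limits.max_force:
        return None
    low, high = limits.workspace
    safe_x = min(max(target_x, low), high)     # keep exploration inside the test cell
    safe_speed = min(speed, limits.max_speed)  # slow motion while learning
    return safe_x, safe_speed


print(bound_action(0.9, 1.0, 5.0, estop_pressed=False))  # (0.5, 0.25)
```

The learner is still free to try a new target, but the attempt it actually executes is slower and stays inside the cell.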

Four ways robots learn from mistakes

  1. Trial and error

    Reinforcement learning. The robot tries actions and receives a score. Powerful, but needs limits — open-ended trial and error is expensive and unsafe on real hardware.

  2. Self-supervised learning

    The robot creates labels from the result of an action. Pinto and Gupta collected 50,000+ grasp trials over 700 robot hours. Google scaled to ~800,000 grasp attempts across multiple arms.

  3. Human correction

    A person watches and steps in. The correction becomes data. Sirius reported a 27% real-hardware success-rate improvement. HIL-SERL reached high success rates after 1–2.5 hours of training.

  4. Retry and reset skills

    Sometimes the policy is fine — the robot just needs a recovery move. FLARE argues that success-heavy training leaves VLA models brittle after common errors, and proposes retry/reset mechanisms.
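Self-supervised labelling, the second approach above, can be sketched as turning raw grasp attempts into training examples with no human in the loop. The success test here is an assumption for illustration: a grasp counts as successful if the gripper is still held open by an object after lifting. The field names and the 2 mm threshold are hypothetical, not the exact criteria used by Pinto and Gupta or by Google.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GraspAttempt:
    image_id: str
    grasp_angle_deg: float
    gripper_width_after_lift_mm: float  # hypothetical sensor reading


# Assumption: if the fingers close almost fully, nothing was held.
EMPTY_GRIPPER_MM = 2.0


def self_label(attempts: List[GraspAttempt]) -> List[Tuple[str, float, int]]:
    """Turn raw attempts into (image, angle, label) examples without a human."""
    dataset = []
    for a in attempts:
        label = 1 if a.gripper_width_after_lift_mm > EMPTY_GRIPPER_MM else 0
        # Failures are kept as negative examples, not thrown away.
        dataset.append((a.image_id, a.grasp_angle_deg, label))
    return dataset


examples = self_label([
    GraspAttempt("img_001", 30.0, 41.5),   # object held
    GraspAttempt("img_002", 75.0, 0.4),    # fingers closed on nothing
    GraspAttempt("img_003", 120.0, 1.9),   # fingers closed on nothing
])
print([label for _, _, label in examples])  # [1, 0, 0]
```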

Humanoids need this even more

A fixed arm only needs to solve reach, grasp, and placement inside a work cell. A humanoid needs to walk, balance, turn, carry weight, use both arms, avoid people, fit through human spaces, manage battery limits, and recover from slips.

The basic rule

A humanoid cannot be useful if it needs a human reset after every small problem. Learning from mistakes is part of autonomy, not a bonus.

What people often misunderstand

  1. Mistake 01

    Mistakes are automatically good data.

    A mistake is useless if the robot does not know why it failed. The same failure can have different causes — vision, force, timing, wrong grasp point. Without the right record, the robot learns the wrong lesson.

  2. Mistake 02

    Robots should learn by breaking things.

    No. Real-world trial and error must be bounded — simulation, cheap objects, slow motion, test cells, supervision, force limits, emergency stops, safe fallbacks.

  3. Mistake 03

    A recovery demo proves autonomy.

    It shows recovery worked once. A claim of broader autonomy needs more: how many failure types were tested, how often recovery worked, whether it transferred to new situations, and whether human intervention dropped over time.

  4. Mistake 04

    Human help means the robot is not learning.

    Human help can be a crutch or a teacher. The honest metric is whether the need for help goes down while safety stays high.
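The first misunderstanding, that mistakes are automatically good data, is easier to see with a concrete failure log. In this hypothetical sketch the same symptom ("dropped_box") has several diagnosed causes, and some records have no diagnosis at all. The record fields, the cause names and the rule that only diagnosed failures feed a targeted fix are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailureRecord:
    task: str
    symptom: str                 # what went wrong, e.g. "dropped_box"
    cause: Optional[str] = None  # e.g. "vision", "force", "timing", "grasp_point"


def causes_by_symptom(records: List[FailureRecord], symptom: str) -> Counter:
    """Count diagnosed causes for one symptom; undiagnosed ones count as 'unknown'."""
    return Counter(r.cause or "unknown" for r in records if r.symptom == symptom)


def diagnosed(records: List[FailureRecord]) -> List[FailureRecord]:
    # Assumed policy: only failures with a known cause are used for a targeted fix.
    return [r for r in records if r.cause is not None]


log = [
    FailureRecord("pick_box", "dropped_box", "force"),
    FailureRecord("pick_box", "dropped_box", "vision"),
    FailureRecord("pick_box", "dropped_box", "force"),
    FailureRecord("pick_box", "dropped_box"),
    FailureRecord("insert_part", "stuck", "timing"),
]
print(causes_by_symptom(log, "dropped_box"))
```

Four dropped boxes look like one problem, but the counts show at least two different causes and one unexplained case. Training one fix for "dropped box" would blend them.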

Evidence from the real world and research

…is hard.
  1. Large-scale grasping (Pinto/Gupta, Google)

    Strong evidence that failed grasps become useful training data — for narrow manipulation. Not proof of humanoid work.

  2. DAgger (Ross/Gordon/Bagnell)

    A core idea: train on the states the robot actually reaches, including ones caused by its own mistakes. Don't train only on the clean path.

  3. Sirius / HIL-SERL

    Strong research evidence that correction loops improve policies on real hardware. Mostly lab and research tasks.

  4. RoboCat (DeepMind)

    A trained model generates data that is used in later training iterations. Promising, but not proof of safe, self-improving humanoid autonomy.

  5. DARPA Robotics Challenge

    Remembered for falls. The deeper lesson is the post-event analysis: why the fall happened, what the operator did, what the controller expected, what should change.

What is still hard

  • Detecting failure early — catching the slow slip before the crash.
  • Knowing why the failure happened — vision, planning, timing, force, wear, or assumption.
  • Learning without copying bad behaviour — different operators solve tasks differently.
  • Rare failures are hard to collect — but they are often the dangerous ones.
  • Recovery can create new risks — sometimes the best recovery is to stop.
  • Improvement must be measured — failure rate, intervention rate, recovery success, downtime.
The basic rule

More data is not always better. Better-used data is better.
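The four measures listed above can be computed from ordinary episode logs. This is a minimal sketch under stated assumptions: the `RunResult` fields, the two example batches and the definition of "improved" (every measure at least as good, and fewer interventions) are hypothetical choices, not a standard benchmark.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunResult:
    failed: bool              # task unfinished, or a safety or quality problem
    interventions: int        # times a person had to step in
    recovery_attempted: bool
    recovered: bool
    downtime_s: float


def summarize(runs: List[RunResult]) -> Dict[str, float]:
    n = len(runs)
    tried = [r for r in runs if r.recovery_attempted]
    return {
        "failure_rate": sum(r.failed for r in runs) / n,
        "interventions_per_run": sum(r.interventions for r in runs) / n,
        # NaN when no recovery was attempted; NaN comparisons below are False.
        "recovery_success": sum(r.recovered for r in tried) / len(tried) if tried else float("nan"),
        "downtime_s": sum(r.downtime_s for r in runs),
    }


def improved(before: Dict[str, float], after: Dict[str, float]) -> bool:
    # Fewer interventions only count if failures, recovery and downtime did not get worse.
    return (after["failure_rate"] <= before["failure_rate"]
            and after["interventions_per_run"] < before["interventions_per_run"]
            and after["recovery_success"] >= before["recovery_success"]
            and after["downtime_s"] <= before["downtime_s"])


week1 = [RunResult(True, 2, True, False, 120.0), RunResult(False, 1, True, True, 30.0),
         RunResult(False, 0, False, False, 0.0), RunResult(True, 1, False, False, 60.0)]
week2 = [RunResult(False, 0, True, True, 10.0), RunResult(False, 1, True, True, 20.0),
         RunResult(True, 0, True, False, 40.0), RunResult(False, 0, False, False, 0.0)]
print(improved(summarize(week1), summarize(week2)))  # True
```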

A simple test for any claim

When the robot gets it wrong, does the system actually learn — or just try again?

  • What counts as a mistake?
  • How does the robot detect it?
  • What does the robot do to stay safe?
  • What data is recorded?
  • Who or what gives the correction?
  • How is the policy updated?
  • What measured result improved afterward?
Robots need to learn from mistakes because the real world will not stay inside the training set.
So why does this matter so much?
Because a robot that cannot recover will always need people nearby. A robot that can learn from small, safe failures has a path to becoming more useful over time. Learning from mistakes is not a slogan — it is a feedback loop.
What to remember
  • Robots will make mistakes — the question is whether they can learn from them.
  • A mistake is useful only if it becomes a clean learning signal.
  • Long tasks make small errors worse because mistakes can compound.
  • Robots need recovery skills, not just success skills.
  • Real-world trial and error must be bounded — physical mistakes can cause harm.
  • Human correction can be a valid training method when used openly and measured.
  • More data is not always better — bad failure data teaches the wrong lesson.
  • Humanoids need mistake learning because whole-body work creates many more ways to fail.
Key terms
Mistake
A wrong action or result — missed grasp, bad placement, slip, collision, or unsafe plan.
Failure
A mistake that prevents the task from finishing or creates a safety, quality, or reliability problem.
Recovery
What the robot does after something goes wrong — retry, reset, ask for help, or stop.
Reinforcement learning
A way for robots to learn through trial and error using rewards or penalties.
Self-supervised learning
A way for the robot to create training labels from the result of its own actions.
Human-in-the-loop
A person monitors, corrects, labels, or approves robot behaviour during learning or deployment.
Correction
A human or system action that shows the robot what to do instead when it is wrong.
Policy
The part of the robot system that chooses actions.
Compounding error
A small early mistake that pushes the robot into a worse situation, causing more mistakes later.
Distribution shift
When the robot sees situations different from its training data.
Failure data
Records of failed or near-failed attempts, including what the robot saw, did, and what happened next.
Safe exploration
Letting a robot try new actions while limiting harm to people, hardware, and the environment.
Intervention rate
How often a person has to step in. Lower is usually better — if safety and quality stay high.
Sources and evidence notes
Evidence

What this essay leans on

| Claim | Evidence | Strength | Note |
| --- | --- | --- | --- |
| Small errors compound in sequential tasks. | Ross, Gordon, Bagnell: DAgger, AISTATS 2011. | Strong | Conceptual and theoretical evidence. |
| Trial-and-error learning works in robotics but needs care in the real world. | Kober, Bagnell, Peters RL survey; OpenAI safe exploration paper. | Strong | Standard references. |
| Failed grasp attempts can train robots at scale. | Pinto/Gupta (50k+ trials, 700 hrs); Google Research (~800k attempts, then 900k+). | Strong | Narrow grasping research. |
| Human-in-the-loop correction improves policies. | Sirius (IJRR): 27% real-hardware improvement; HIL-SERL (2024/2025). | Strong | Research evidence; not commercial proof. |
| Self-improvement loops are being explored. | DeepMind RoboCat, 2023. | Medium | Research; not proof of open-ended autonomy. |
| Failure recovery is an active research target. | FLARE, CVPR 2026. | Medium | Task-limited research. |
| Engineering analysis of failures improves robotics. | DARPA Robotics Challenge analysis (Atkeson et al.). | Strong | Historical engineering lessons. |