
June 2, 2017 • 32 mins
On March 28, 1979, an incident at Unit 2 of the Three Mile Island Nuclear Plant in the United States of America would lead to a partial reactor core meltdown. Many blamed the operators for stopping the reactor cooling system, but the real root causes were a known flaw in the design and alarm flooding that had blinded the operators to what was actually happening.
With John Chidgey.


Episode Gold Producer: 'r'.
Episode Silver Producers: Carsten Hansen, Eivind Hjertnes and Daniel Dudley.

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:00):
[Music]
Chain of events. Cause and effect. We
analyze what went right, and what went
wrong, as we discover that many outcomes
can be predicted, planned for and even
prevented. I'm John Chidgey and this is

(00:23):
Causality. Causality is part of The
Engineered Network. To support our shows
including this one, head over to our
Patreon page and for other great shows
visit https://engineered.network/ today. "Three
Mile Island" This is the first in a
series of episodes with a focus on
control system contributions to

(00:44):
disasters. Built on a sandbar in
Pennsylvania in the middle of the
Susquehanna River between 1968 and 1970
the Three Mile Island Nuclear Plant
consisted of 2 reactor cores, both
being a Pressurized Water Reactor design.

(01:05):
The reactors themselves were designed
and built by Babcock & Wilcox, who had
many reactors installed around the
United States at that time, and the plant was
operated by Metropolitan Edison,
whose parent company was General Public
Utilities. The energy shortages and the
energy crisis of the early 1970s where

(01:26):
oil prices jumped from $3USD a barrel to
$30USD a barrel led to fuel shortages
across the United States and that had
driven utilities to the lure of cheap
nuclear energy. A large number of
reactors were built in a relatively
short period of time and the Nuclear
Regulatory Commission had difficulty

(01:48):
keeping up with the demand for
certification and compliance of all of
these new reactors. The designers had
been producing several proof-of-concept
plants in the hundred-megawatt range and
then, once they'd proven them, scaled them
to nearly 1GW with essentially
the same design, scaled up with little

(02:08):
proof during operation at full size.
Nuclear reactors are
essentially big steam engines. The
nuclear fuel rods have a chain reaction
that is slowed down by control
rods that absorb neutrons that are
released from fission of those fuel rods
and heat is withdrawn (or extracted) from
the reactor core, by passing very clean

(02:30):
water through the reactor. The name
explains the basis of a Pressurised
Water Reactor design. The primary coolant
is kept under a higher pressure to stop
the cooling water from boiling and
turning into steam. Hence pressure
control is vital, and it is essential that
the safety systems that prevent
over-pressurization function correctly

(02:51):
to ensure the cooling remains under
control. The water in the clean water
circulation loop needs to be kept
extremely clean or it will damage and
prematurely wear the pipework inside
the high pressure and high temperature
sections of the boiler. In this design
each unit had 8 condensate polishers,
that filtered the clean water condensate

(03:14):
before being circulated back through the
high-temperature section of the boiler
or more specifically the steam generator
section of the loop. The secondary cooling
loop's purpose was to act as the heat
exchanger with the primary loop, with
waste heat evaporated through huge
cooling towers, which are commonplace in
any thermal electricity generating plant
that isn't alongside the ocean. The first

(03:37):
unit at Three Mile Island was capable
of generating 852MW and it came
online on the 19th of April, 1974,
followed by a second unit capable of
generating 906MW on the 30th of
December, 1978. Three Mile Island's Unit
2 had been operating for close to a

(03:59):
year but had only been online commercially
for about 3 months when on Wednesday
the 28th of March, 1979 Unit 2 would
have a partial meltdown. Unit 1 at the
time was offline and shut down for
refueling. At approximately 5:30pm on
Tuesday, the day before, in the early

(04:20):
evening plant operators had attempted to
rectify a blockage in one of the
aforementioned condensate polishers.
The usual practice for clearing the resin
from the filter was to use
compressed air, however in this case the
blockage was severe enough that this was
unsuccessful so the operators instead
chose to connect the compressed air to a

(04:43):
water line and then use the additional
water pressure generated by the air's
back pressure to force the resin out.
This turned out to be successful. The
unit was returned to normal operation
and no one thought any more of it.
At 4:37am Eastern Standard Time, Unit
2's secondary loop, which is the second

(05:04):
of 3 steam water loops, lost its
circulating water flow following a
series of valves that had tripped shut.
This led to an increase in the
temperature of the primary coolant
beyond a safety shutdown temperature
setpoint. This then caused the primary
reactor to shut down with a S.C.R.A.M. and
the pilot-operated relief valve opened

(05:27):
as designed, to briefly reduce the
pressure inside the vessel. The high
pressure injection pumps then
automatically injected top-up water into
the reactor's primary coolant systems, as
per design. Operators noticed that the
level in the pressuriser was rising from
the level indicator in the pressuriser.

(05:48):
This was the only indication of the
reactor's cooling water level,
although it was not a direct measurement
but rather an indirect measurement
from a system of pipework that was
normally hydraulically linked. Operators
were trained to ensure reactor coolant
wasn't overfilled because if it was,
there was a possibility of vessel

(06:08):
rupture. By this time the primary coolant
pumps were trying to pump both steam and
water due to the incident and since they
can only pump fluid, cavitation became
severe resulting in large knocking and
vibration of the primary coolant pumps.
For these reasons the operators decided

(06:29):
to override and stop the primary coolant
pumps from circulating water, believing
that the level in the pressurizer was a
correct reading of reactor coolant water
level, and
to protect the coolant pumps from any
damage. This ended forced cooling of the
reactor core as the decay heat continued to
build following the S.C.R.A.M. Refer to

(06:52):
episode 3 of this show about
Fukushima for the discussion about decay
heat and S.C.R.A.M.s. By 6:00am there were
about 50 people in the control room
trying to figure out what had happened.
One of the operators who had
arrived at that time was called into the
control room, examined the
readings and concluded that the

(07:13):
pilot-operated relief valve had not
closed: it was only supposed to
open momentarily, but for some reason
must still be open. At 6:22am a block
valve was manually closed to stop the
loss of coolant water through the faulty
stuck-open pilot-operated relief valve.

(07:34):
In the intervening 105 minutes, so much
water and steam had been lost through
the pilot-operated relief valve that
high-pressure steam had formed, creating
gas locks in sections of pipe work and
preventing convection cooling of the
reactor core. With no forced cooling
occurring the reactor temperature

(07:54):
continued to climb. At 6:57am a plant
supervisor declared a site area
emergency shortly after radiation was
detected in the control room. At 7:25am
Station Manager Gary Miller
declared a general emergency which is
defined as potential for serious
radiological consequences to the general

(08:15):
public.
There were only 2 phone lines into
Unit 2's control room, both of which
were constantly in use during the
incident due to a huge quantity of incoming
calls, and no direct line was actually
available to the Emergency Response
Center or to the engineers that had
designed the plant. Instead

(08:37):
representatives from Babcock & Wilcox,
unable to get through to the
control room of Unit 2, were
able to get through to Unit 1's control
room, and they had a runner carrying
messages between the two buildings
for Units 1 & 2, relaying
instructions,
getting printouts, and then running those
printouts back to Unit 1 to the phone

(08:58):
connection that was open to them. By
mid-afternoon operators gradually
recommenced high-pressure injection of
water into the reactor cooling system at
Babcock & Wilcox's direction, in an
attempt to increase pressure and force
any steam and gases back into
solution. Without this step the primary

(09:22):
cooling pumps would not be able to pump
and would cavitate as they had earlier
in the day and at 7:50pm that day,
some 16hrs after the incident had
begun, the designers instructed the
operators to begin circulating water
through the reactor once again. Once the
operators did this, the temperatures
began to drop...then the

(09:42):
pressures began to drop as well. Over the
following 2 days the gas build-up from
that incident incrementally accumulated
in the make-up tank of the auxiliary
building, and the operators used a
combination of compressors and pipe
reconfiguration to move out as much of
that gas as possible to the waste gas
decay tanks. Unfortunately the

(10:03):
compressors did not reliably seal and a
quantity of radioactive gas was released.
The following morning it was reported
that there was a radioactive gas release
and an evacuation plan was suggested for
the immediately affected area.
It wasn't until 10:00am that the governor
was informed of the actual amount of gas

(10:24):
released. The governor recommended
that pregnant women and pre-school-aged
children
evacuate from within a 5mi radius of the
Three Mile Island plant. This set off
somewhat of a panic. Before reaching the
environment the gases had passed through
a high-efficiency particulate air filter,
sometimes called a HEPA filter, as well

(10:47):
as an activated carbon charcoal filter
set. This filtration captured all of the
radionuclides with the only exception
being noble gases. The quantity of gas
released was not metered directly.
Estimates however following the incident
ranged from as little as 1.6PBq
(Peta-Becquerels) to a maximum of 480PBq.

(11:10):
1 Peta-Becquerel is about 27,000 Curies.
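The Becquerel-to-Curie conversion quoted above can be checked with a few lines of Python. This is only a sketch: the function name `pbq_to_curies` is made up for illustration, and the only outside fact assumed is the definition of the Curie as 3.7 x 10^10 decays per second.

```python
BQ_PER_CURIE = 3.7e10   # definition of the Curie, in decays per second
BQ_PER_PBQ = 1e15       # "Peta" prefix


def pbq_to_curies(pbq):
    """Convert an activity in Peta-Becquerels to Curies."""
    return pbq * BQ_PER_PBQ / BQ_PER_CURIE


# The unit value plus the low and high release estimates from the episode.
for pbq in (1, 1.6, 480):
    print(f"{pbq} PBq is roughly {pbq_to_curies(pbq):,.0f} Ci")
```

On those numbers the low estimate works out to roughly 43,000 Curies and the high estimate to roughly 13 million Curies.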
These figures are radioactive decay
events, not dose absorption figures. The
average dose after the incident from
this gas release was estimated at
8 millirems per person, with
a single maximum likely dosage of 100

(11:32):
millirems or 1 milliSievert. An average
background radiation dose in the United
States is about 360 millirems per
person per year, or 3.6 milliSieverts per
person per year. The noble gases released
had very short half-lives, weren't

(11:52):
absorbed by plants or animals (being so-called
biologically inert) and did not cause an
increase to the background radiation
dosage levels in the immediate or
extended area around the Three Mile
Island plant. On the 30th of March and
the 1st of April
an increase in pressure caused by the
exposed Zircaloy reaction (again refer

(12:14):
to Episode 3) at higher temperatures
created a Hydrogen bubble above the
reactor on top of the containment vessel.
On Saturday morning some of the
calculations suggested that a Hydrogen
explosion was an imminent possibility
and these were being seriously discussed
by the response personnel. By
late Saturday afternoon the possibility

(12:35):
of an explosion was leaked to the press
setting off a new wave of panic.
Operators however bled off the Hydrogen
build-up gradually by briefly opening
vent valves on the pressurizer,
periodically over several days until the
pressure had subsided. At the time there
were great fears the bubble could cause
an explosion however the pressure was

(12:56):
never allowed to get high enough and the
amount of Oxygen present was nowhere near
the level required to reach the
Lower Explosive Limit for an explosion to
take place. In an attempt to calm panic,
the President of the United States at
the time, Jimmy Carter, toured the
facility 4 days after the incident
had occurred. The tour group he was a

(13:18):
part of was
protected only by radiation boots, to
prevent radioactive water from being
absorbed into their shoes and feet.
Following this incident, lead bricks were
brought in to surround the base of the
reactor and the Hydrogen build-up was
gradually bled-off and contained. The
pressure vessel's pressure was reduced to
normal operating conditions. By the 27th

(13:42):
of April the decay heat had subsided
enough, such that natural convection flow
of cooling water was now possible and
the plant was in a cold shutdown, with
water now below boiling point at
standard atmospheric pressure.
It wasn't until 3 years after the
incident that a camera was able to be
lowered safely into the reactor core to

(14:03):
determine the full extent of the damage
from the incident. They found that the top
5ft of the reactor core
had melted away. That's about 1.5m.
Nearly half of the reactor core had partly or
fully melted down and had pooled at
the bottom head of the pressure vessel

(14:25):
in the reactor, where it now lay, solidified.
Approximately 19 tonnes of core material
in total had melted and flowed to the
bottom. 62 tonnes had partly or fully
melted which is 45% of the entire
reactor core. The reactor core of Unit 2

(14:45):
was within 30min of a complete
meltdown. Had a full meltdown occurred
it would have become so hot, the entire
core would have become a molten blob of
metal with self-sustaining heat melting
its way through the vessel, concrete
foundations and bedrock. Had it

(15:05):
progressed to a full meltdown there's
little doubt that the sand and water
layer beneath the plant would have
turned into a superheated radioactive
steam, sending a huge amount of radiation
through the water table and the local
area and atmosphere surrounding the
plant. Some disaster projections
suggested it had the potential to wipe

(15:26):
out an area from Washington DC to New
York City, although that eventuality is
hotly debated
by the nuclear industry. So what went
wrong at Three Mile Island? There were
both technical errors and human errors.
The trigger event was actually a mistake
introduced the previous night. In the

(15:49):
late afternoon of the preceding day when
the operators had attempted a
non-standard procedure to clear the resin
blockage in one of the
filters, the position of the air line
and the water line was very difficult to
physically access. It's not entirely
clear if its long-term connection was
intended or accidental, however the

(16:10):
process had allowed an amount of water
to enter the instrument air line.
Instrument air is used to actuate
control valves for a multitude of
reasons, the primary being that air can
be directed at a valve manifold and a
very low current, low voltage relay
can signal to open or close the valve or

(16:31):
move it to a position using the air as
the primary motive force to move the
valve physically. It's cleaner and
simpler than hydraulic valves because it
doesn't leak in the same way and leave
mess on the floor and it doesn't get as
hot nor does it require thick cabling or
take up as much physical space as an
all-electric actuator.

(16:53):
Unfortunately instrument air has a
rather fatal flaw, and that is moisture.
If too much moisture enters the valve
manifolds they will either actuate
without being directed to do so or they
will cease to actuate when they are
commanded to do so. In the case of Three
Mile Island a series of valves on

(17:14):
pipework connecting the feedwater pumps,
condensate pumps and the condensate
booster pumps all failed in quick
succession, with several key valves all
slamming shut...quickly. And this caused a
cessation of the secondary cooling water
flow into the primary vessel and
initiated the chain of events. Once the

(17:37):
chain of events had been set in motion
though, there were automated systems
designed to prevent a loss of cooling to
the reactor, as you'd expect:

(17:43):
it's a
nuclear reactor!
Basically the plant operators and
managers overrode the automatic safety
equipment.
It was those overrides that led to the
reactor core meltdown. Superficially
though it's easy to blame plant
operators for the Three Mile Island

(18:03):
incident. "Blame the operator" right? The
truth is that there was a long list of
reasons why they got it wrong. The actual
real root causes included contributions from

(18:16):
the utility company (Met-Ed), the
reactor vendor Babcock & Wilcox, the
architect engineer and the Nuclear
Regulatory Commission. They were all
responsible, either in whole or in part,
for deficiencies in training, control
room design, instrumentation and equipment
selection, the overall plant design and

(18:38):
emergency and evacuation procedures. Here
we'll be looking only at
the control system and equipment
selection. The control system in use was
a Bailey 855 Process Control Computer,
and had been widely used by Babcock &

(18:59):
Wilcox in their designs for nearly a
decade at that point. The Bailey 855 was
configured with Visual Annunciator
lights as well as a computer printout
from 1 of 2 printers: 1 for
on-request plant status and the other for
system alarms. Due to a limited physical
space in the annunciation system, many

(19:20):
alarms that were deemed to be less
critical only appeared on the computer
printout. The printers themselves were
electric typewriters, and they were not
high-speed. Though in
this day and age we'd refer to these as
printers, these were technically "Computer
Typewriters." And these computer

(19:40):
typewriters could print at most 14
alarms every minute. When the alarm rate
was greater than the printing rate the
system had a memory buffer and that
would hold those alarms until the
printer could catch up. During routine
plant trips the alarm printer, as
configured by the designers,
could actually take an hour to fully

(20:03):
print off all of the alarms that had
occurred during a routine plant trip. The
plant operators knew about this from
their experiences in Reactor 1, and
regularly ignored the alarm
printout and instead relied solely on
the on-demand system status printouts, and
alarm annunciator lights. High Water Level

(20:25):
in the containment sump was one such
alarm that only appeared on the printout
and not on its own annunciator. Had the
operators received this alarm in a
timely and clear fashion, they would have
realized that a large amount of water
was escaping containment much earlier
and it's likely that the block valve
would have been closed much, much sooner,

(20:47):
preventing such a big loss of primary
coolant flow and most likely preventing
the meltdown entirely. The unit had 1,200
alarms configured. A few hundred went off
in the first minutes of the incident
alone. After the incident some operators

(21:07):
went on record stating alarms were "...not

(21:09):
very helpful..." and they simply "...got in the
way." They went on to say they had

(21:15):
concluded prior to the incident that "...the
alarms would provide little, if any
immediate assistance..." when trying to
diagnose and prioritize actions during
an event. Poor instrument selection. The
reactor coolant drain tank indicators
weren't directly visible to the plant
operators from the main console in the
main control room. Worse than that there

(21:38):
were no strip chart recorders (this was
in the days before graphical trend displays
on a computer screen) for the reactor
coolant drain tank conditions, which
included pressure, temperature and water
level. So there were no strip chart
recorders for any of those. There were
no instruments that directly measured
the water level in the reactor vessel.

(21:59):
The level was intended to be surmised
from the water level in the pressuriser,
which during the incident could not have
been expected to give an accurate
reading, due to the plant conditions at
the time and didn't. Instrumentation
selection and
ranging for temperature and pressure
limits were designed primarily around
the normal operational envelope, rather

(22:20):
than extreme operating conditions like
those experienced during the time of the
incident. As a result of this choice most
of the instrumentation was flat-lined at
either maximum or minimum values and
operators essentially had no useful
information from which to attempt to
diagnose or resolve the situation. The

(22:42):
pilot valve. The pilot-operated relief
valve was found to have previously
failed on 11 occasions in the life of
that specific reactor. 9 of those
failures had been in the Failed Open
position. Every Failed Open failure had
resulted in a coolant leak within the
containment vessel. The exact failure

(23:05):
chain of events had in fact been
replicated, 1-1/2yrs before
the incident at Three Mile Island at
another Babcock and Wilcox reactor of
exactly the same design. In this instance
however operators determined the failed
open condition within 20min,
compared to 80min for Three Mile
Island. The Davis-Besse Nuclear Power

(23:26):
Station was only operating at 9% power
at the time its valve failed open, unlike
the near full power (97%) that Unit 2 at
Three Mile Island was producing at the
time of its incident. Babcock & Wilcox
did not clearly communicate this risk to
all of its customers that utilized their
reactor designs and not to Three Mile

(23:48):
Island prior to the incident. In addition
the valve itself did not have an
independent and direct indicator of its
position either open or closed. Its
position instead was inferred based on
its commanded output, and that is to say
the control system commanded the valve
to open and it displayed that the valve

(24:12):
was open on the control system, which
technically is control and indication by
inference, rather than control by
feedback or control by fact. In
programming control systems for decades
I've learned
it is always better to program based
on fact, not presumption. Timers waiting

(24:34):
for events that could happen or might
not happen, and assuming valves open or pumps
start without independent evidence
verifying that they have, are potentially
dangerous. Modern safety systems require
direct indication of safety equipment
position and loss of that indication
when the plant is in use leads to alarm
conditions and in some extreme cases

(24:55):
will even trip shut the plant. In my
experience lack of equipment feedback is
predominantly driven by cost. Whether
it's an I/O count reduction with less
test burden, a simplification of the
design or more commonly just that the cost of
the limit switches themselves on a valve
is considered too exorbitant and

(25:16):
unnecessary. It's not clear what drove
Babcock & Wilcox's decision to not
provide position feedback in this
instance. In the aftermath of Three Mile
Island the exact radiation dosages that
individuals received while they were
present during the
incident at the Three Mile Island

(25:36):
facility are unknown, since only 2 of
the 7 radiation monitors in the
plant were actually functioning. In
addition, many of the personal dosage
meters handed out weren't correctly
recorded during the lead-up to the
incident, and they weren't regularly
changed out, hence their state when they
were carried on people's persons
wasn't known when they were going in,

(25:57):
and hence the relative reading when they
came out wasn't known either. The
maximum dose officially was
estimated at 1 milliSievert. By comparison, a 100
milliSievert dose increases the
probability of radiation induced cancer
by 0.8%. A 1 to 2 Sievert dose will
increase the probability of a fatality

(26:19):
due to radiation dosage to up to 5%.
An 8 to 30 Sievert dose will
increase the probability of fatality to
an essential certainty. The fallout from
the incident has not shown a significant
increase in the number of cancers or
infant mortality rate in the area
surrounding Three Mile Island. One of the

(26:39):
interesting coincidences surrounding the
Three Mile Island incident was that the
movie "The China Syndrome," which was about
a nuclear meltdown had opened in the
local movie theater in Harrisburg on the
day of the incident. 2,000 gallons
of contaminated water was released into
the Susquehanna River as a result of the
incident. Radioactive rat droppings were

(27:00):
also found scattered throughout the
building following the events. General
Public Utilities said this wasn't an
issue because

(27:06):
"...none of the rats had left
the island," showing some indifference to
the fact there was no way to know that
for sure.
Following the incident legal
interventions were undertaken against
Met-Ed and GPU across multiple areas
including management competency. The
Atomic Safety and Licensing Board stated

(27:27):
the interventions had wasted time and
money. That said, 1 week after the
Atomic Safety and Licensing Board issued
General Public Utilities with a clean
bill of health for management competency,
2 operators were caught cheating on
their licensing tests and 4 operators
in fact failed the tests entirely. The
hearings were reopened to determine more

(27:47):
stringent tests and all operators re-sat
these more rigorous tests, and still
half of them failed. Evacuation plans
were required to be drawn up in full
detail, and far more thoroughly reviewed
including correcting oversights such as
putting the nearby city...halfway across
its bridge. New safety and training

(28:08):
measures were introduced following the
incident for nuclear reactors throughout
the United States. The clean-up
following Three Mile Island Unit 2 took
just under 12yrs to complete at a
cost of approximately $973M USD.
On the 22nd of October, 2009 the

(28:29):
US Nuclear Regulatory Commission renewed
the operating license for Three Mile
Island Unit 1 until the 19th of April,
2034. Unit 2 however remains mostly
disassembled, with its generator moved in
2 parts, refurbished and reused at the
Shearon Harris Nuclear Plant in New
Hill, North Carolina. So what do we learn

(28:50):
from this? No matter what process plant
you're designing, it's critically
important to think about what to show an
operator under normal operating
conditions naturally, but more
importantly, what to show them under
abnormal operating conditions, and that
includes critical system events. Having

(29:13):
an up-to-date training simulator with
regular training and refresher training
sessions for all operators is crucial to
ensure operators know the right way to
respond when critical events occur.
Critical events generally and hopefully
don't happen very often, so people need
regular re-visits of how to handle them
correctly or we, as humans under pressure,

(29:35):
will forget and make mistakes.
Beyond that, controlling by fact and not
by inference is crucial. And finally
having an alarm system is one thing, but
filling it with nuisance alarms,
incorrectly prioritising those alarms, so

(29:57):
they all appear to be equally important,
not conditionally muting alarms...and
having incorrectly set up consequential
alarms:

(30:06):
all of these things contribute to
rendering the alarm system completely
useless.
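To put rough numbers on that point, here is a minimal Python sketch of the alarm printer described earlier, which could print at most 14 alarms per minute. Everything else is an assumption made for illustration: the flood size of 300 alarms, the position of the containment sump alarm within it, and the 1-to-3 priority scheme are hypothetical, not figures from the incident.

```python
import math

PRINT_RATE_PER_MIN = 14  # printer limit quoted in the episode


def minutes_to_print(alarm_count, rate=PRINT_RATE_PER_MIN):
    """Whole minutes needed to print alarm_count buffered alarms."""
    return math.ceil(alarm_count / rate)


def minute_printed(queue, name, rate=PRINT_RATE_PER_MIN):
    """Minute (1-based) in which the named alarm reaches paper."""
    return queue.index(name) // rate + 1


# Hypothetical flood: 300 alarms, with the important one raised 251st.
SUMP = "containment sump level high"
flood = [(f"routine-{i}", 3) for i in range(300)]
flood[250] = (SUMP, 1)  # priority 1 = most urgent (assumed scheme)

# First-in-first-out, as a buffered printer would behave.
fifo = [name for name, _ in flood]
# Same alarms, but the most urgent are printed first (stable sort).
by_priority = [name for name, _ in sorted(flood, key=lambda a: a[1])]

print(minutes_to_print(300))              # 22
print(minute_printed(fifo, SUMP))         # 18
print(minute_printed(by_priority, SUMP))  # 1
```

Under these assumptions the one alarm that matters reaches paper 18 minutes in when printed first-in-first-out, but within the first minute once alarms are prioritised. At the same print rate, an hour of printing corresponds to a backlog of 840 alarms.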
Just like at Three Mile Island. They were
operating a nuclear reactor that could
kill tens of thousands of people if it
went wrong, with confusion,
misunderstanding and trying to control a

(30:29):
plant whose design had essentially left
them blind. It's a miracle we got off
that lightly. If you're enjoying
Causality and want to support the show
you can, like some of our backers

(30:42):
Eivind,
Daniel Dudley and Chris Stone. They and
many others are Patrons of the show via
Patreon and you can find it at
https://patreon.com/johnchidgey
so if you'd like to contribute something,
anything at all, it's all
much appreciated. Causality is part of
The Engineered Network and you can find
it at https://engineered.network/ and you can

(31:03):
follow me on Mastodon @chidgey@engineered.space
or our shows on
Twitter @Engineered_Net.
This was Causality. I'm John Chidgey. Thanks so
much for listening
[Music]

(32:20):
[Music]
[Music]