Regular Bugs
Easy bugs are just that: easy. Easy to produce, easy to find and easy to fix. These are the bugs you find in development and before release. They don't require much thinking to correct, as they just kind of jump out at you and say "silly you, you indexed by the wrong value here".
There is not much of a mindset or philosophy required to find and correct these types of issues.
Nasty Bugs
Really hard bugs are created either by code written on the project or by the interaction of the product with the computer system (and all its processes) that it runs on. These are the bugs that development can and should do something about.
That being said, much still affects the debugging process, in both positive and negative ways.
A mindset that something is wrong
When something goes wrong, the first step is acknowledging that something is wrong. The report could have come from the field, from a web user in Bulgaria or from the 6-year-old son of a tester, but something did or did not occur as expected. Whatever the source, the impact on the user was great enough that it was reported.
If the reported bug cannot be duplicated, it should be researched a little deeper. You never know when an opportunity will present itself to correct an error at the earliest possible time.
Just as important is gathering information about the configurations users have on their systems and discovering issues that may never have been thought of before. Users are more than happy to provide information that will make the product better. On the plus side is the personal rapport a user will feel, knowing that the product's company is concerned enough about the average user to research any problems they are having. Even if the issue is never resolved, this concern and goodwill will replicate itself in the product's community.
Downstream
The response back to the user should never be that it's "not a bug". It may not be a code bug, but rather a lack of clarity in the documentation, support, wording, placement in the window, etc.; either way, it is an issue for that user. This person has spent part of their precious time reporting what, to them, is a bug. This behavior should be rewarded, as it enhances insight into users' expectations and how the product is used. Any lesser response will reduce feedback from users, as well as injure future sales prospects.
Fault of the unknown
To paraphrase a political statement: "Don't fault you, don't fault me, fault that fella behind that tree".
Programmers are confident about their code. They have to be; otherwise great software could never be developed. There is a downside to this confidence, though, and it shows when bugs are found in a system during integration testing or later. If the issue cannot be tracked to a specific area (memory leak, delayed crashing, general instability), then this confident mindset can mislead and affect the debugging process. Logic is replaced by personal feeling, and the wild goose chase begins.
Confidence allows one to overcome struggles and difficulties to produce great code. It also clouds our view of our own weaknesses, fallibility being one of them. Programmers can be so confident in their own code that it's difficult for them to comprehend that they may be the source of current issues.
Run-of-the-mill bugs can be easily tracked through stack traces, exception information or stepping through the code while it runs. But general instability, intermittent failures and random faults are another matter. These are the unknown faults, and programmers have a hard time dealing with them.
It's during these times of stress and difficulty in debugging that differences in mindset are magnified. Normally everyone is coding happily until all code needs to be inspected for faults in logic or coding. In a perfect world, everyone's code would go through code reviews on a regular basis, but in small shops, or when delivery pressure is mounting, reviews are one of the first processes to get dropped. The code appears to be working and progress is being made, so the thought is "why disrupt coding when everything is working?". The reason is that code may work fine in unit testing but fail in mysterious places during integration testing or, worse, after release into the wild.
Programmer responses to these types of "unknown" errors range from denial, to "it's not my code", to "I think it's an OS error", to "it must be something we did". Only the last response is the logical one, and the only one that will lead to a solution. The others will eventually reach the same destination, but only after long delays, bruised egos, and new or widened chasms between staff members.
Denial is the most unfortunate response. People have reported one of these unknown faults, and unless it is found, understood and corrected, it will continue to dog the product. Denial does nothing but increase the cost of resolution and undermine confidence in the product. This mindset will continue until the evidence mounts and the continuing pain forces a rethinking of the denial strategy. At this point the shift can be to the "fault the unknown" mindset or to the "something we did" mindset.
The "fault the unknown" mindset still holds that the product code is OK, but that there is a mystery bug in some code (OS, development tools, etc.) outside the control of development. It's not that this never happens (it can), but more likely this mindset is just a temporary stop on the way to the "something we did" realization. It's hard to justify this position without hard evidence. Hearsay (newsgroups, forums, etc.), rumors and hunches are not hard evidence for blaming another company's product for your problem. This strategy will continue until the programmers' statements get so unbelievable that even they no longer believe what they are suggesting. At this point there is only one stop left in the mindset shuffle, and that's the "something we did" mindset.
Only with the "something we did" mindset will progress be made on the seemingly unknown error. But this will only work if everyone involved is at this level. Because of the unknown nature of the bug, every programmer who has a piece of code in the project must sincerely re-evaluate both their own code and each other's. A programmer who does not share this viewpoint will only evaluate their code for the aspects that work, not for side effects that could lead to instability or other errors. Heck, some programmers don't even know how their own code works, let alone what ripple effects it may cause.
At this level, each programmer should actively evaluate all code in the product. Some programmers may still hold on to the "it's not my code" mindset, but when things really go wrong, everyone's code must be evaluated for errors.
Whom do you trust?
For most errors the debugging process is simply: find the line of code and fix it, then repeat for every bug listed. Most bugs are easily fixed: a check here, more logic there, and that's all there is. Larger bugs that take a day or longer to understand expose the thought process of the developer. I call this thought process "whom do you trust".
The debugging mindset described above is really a thought process of looking at all of the code (yours, others', the OS, drivers, etc.) and developing a level of trust in each piece of code that interacts with the code being debugged.
The logical questions to ask of each section of code (including your own) are:
- What is my confidence that this code is free of the defect I am looking for?
- What is my confidence that this code has been completely tested?
- Are others confident in this code?
If you answer these questions honestly for your code, the operating system and the language runtime, the answers should point to your own code as the likely source of any errors. Yet it's amazing how many programmers will blame the OS or the language runtime before their own code.
Logic suggests that new code is the least tested and most likely to contain errors, rather than, say, the current Java VM. The search for your bug should therefore flow from the least tested code (newest) to the most tested code (mature).
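The newest-to-most-mature search order can be made concrete with a small sketch. Everything in it is a hypothetical illustration rather than a tool from this essay: the `Component` record, its `daysInUse` field as a crude stand-in for how well a piece of code has been exercised, and the sample component names are all assumptions. It simply sorts the suspects so the least-exercised code is examined first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SuspectOrder {
    // Hypothetical model of a piece of code under suspicion.
    // daysInUse is an assumed proxy for maturity: fewer days means less tested.
    record Component(String name, int daysInUse) {}

    // Returns a copy of the components ordered from least tested (newest)
    // to most tested (mature): the order in which to search for the bug.
    static List<Component> searchOrder(List<Component> components) {
        List<Component> ordered = new ArrayList<>(components);
        ordered.sort(Comparator.comparingInt(Component::daysInUse));
        return ordered;
    }

    public static void main(String[] args) {
        // Illustrative suspects; the numbers are made up for the example.
        List<Component> suspects = List.of(
                new Component("Java VM", 3650),
                new Component("edited server configuration", 1),
                new Component("our servlet code", 30),
                new Component("operating system", 5000));
        for (Component c : searchOrder(suspects)) {
            System.out.println(c.name());
        }
    }
}
```

Running `main` prints the edited server configuration first and the operating system last. Days in use is only a rough proxy; a team might weigh test coverage or field exposure instead, but the point of the sketch is the direction of the search, not the particular metric.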
Example
A programmer had just moved the web server from Tomcat to BEA's WebLogic for the first time. There were some port conflicts in the default settings, so the programmer went to work editing the configuration files to get things working. The application seemed to work, but then started to hang in random places after a few minutes. This was quite puzzling, and soon the statement arose: "maybe it's a bug in the WebLogic server".
Hmm: new code, a first-ever WebLogic installation, a move from a JSP/Servlet server to a leading J2EE application server, and you really think your code is more trustworthy? Whom do you trust more?