When my garage filled up with, well, who knows what, I bought a bigger
house. The result of this additional space wasn’t neater stuff, it was
MORE stuff, proving once again that the amount of stuff you want to
store will grow in direct proportion to the storage space available.
And so it is with data. IBM’s RAMAC 305 computer, introduced in 1956,
was the first to feature disk storage (replacing drum storage). Its
model 350 Disk File consisted of a stack of 50 (count ’em, 50) platters,
each an astounding two feet in diameter. With a total capacity of 5
million 7-bit characters (roughly 4.4M-bytes in today’s terms),
corporations at last had more storage capacity than they could fathom,
all for a lease fee of a mere $35,000 a year.
It’s hard to imagine 4.4M-bytes in a unit as big as a refrigerator.
Heck, Windows Server 2003 needs 1.5G-bytes of free space just to
install.
In 1973, IBM launched the legendary model 3340 “Winchester” storage
system, storing up to 70M-bytes in its washing-machine-size unit with
the industry’s first hermetically sealed disks, now just 14 inches in
diameter. But it was the follow-on model, the 3350, storing
317.5M-bytes on its stack of eight 14-inch disks, that changed the face
of data processing. Well into the 1980s, corporate data centers, with
their raised floors, plenum air conditioning, and white-coated systems
engineers, maintained row after row of 3350s, dozens or even hundreds of
them, sucking up enough electricity to power a small town as brigades of
programmers churned out programs written in good old, self-documenting
COBOL.
Today, you can hold hundreds of gigabytes in the palm of your hand, and
do it for just a couple of hundred dollars. At the enterprise level,
storage capacities of several terabytes are now common. (And, as we all
recall, a terabyte is 1,099,511,627,776 bytes, or 1,024G-bytes.)
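For anyone who wants to check the capacity figures quoted so far, here is a quick back-of-the-envelope sketch in Python. The function names are made up for illustration; the only inputs are numbers already given above (5 million 7-bit characters for the 350 Disk File, and binary units of 1,024).

```python
# Back-of-the-envelope check of the storage figures quoted in the text.
# Function and constant names are illustrative, not from any vendor tool.

BITS_PER_BYTE = 8


def chars_to_bytes(char_count: int, bits_per_char: int) -> float:
    """Convert a count of fixed-width characters to 8-bit bytes."""
    return char_count * bits_per_char / BITS_PER_BYTE


def binary_terabyte() -> int:
    """A terabyte in binary units: 1,024 x 1,024 x 1,024 x 1,024 bytes."""
    return 1024 ** 4


ramac_bytes = chars_to_bytes(5_000_000, 7)  # the 350 Disk File
tb = binary_terabyte()

print(f"RAMAC 350: {ramac_bytes / 1_000_000:.1f} million bytes")
print(f"1 terabyte = {tb:,} bytes = {tb // 1024 ** 3:,} G-bytes")
```

Note that the 4.4 figure counts in millions of bytes; divide by 1,024 twice instead and the same disk comes out closer to 4.2.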
Is it chicken or egg? Are we developing ever-greater storage capacity
because corporate demand grows, or do we just churn out oceans of data
because we have a cheap place to put it? Whichever it is, the growth is
mind-boggling: research firm Gartner Group says that in 2004,
enterprises will handle 30 times as much data as they did just four
years earlier. A new study by scientists at the University of California
at Berkeley reports that in 2002 alone, 5 million terabytes of new data
was created and stored.
You already know the opportunities in safeguarding such volumes of data
with backup technology; we don’t need to talk about that. But where are
you as a provider of technology solutions when it comes to day-to-day
management of this mountain? It’s not enough to simply bring in a
truckload of EMC, IBM, or Hitachi storage products. That’s because data
is useless. It’s INFORMATION we crave.
Transforming data into information through standard reporting
technologies isn’t much help. What’s called for is a vast explosion in
the use of data mining software, technology capable of slogging through
oceans of data, identifying the trends within that a human would never
be able to see unaided.
With data mining, information extraction leads to the discovery of
hidden facts, trends, or relationships contained in databases. Through a
variety of statistical and modeling techniques, data mining identifies
these subtle data relationships, analyzes them, and makes judgments,
laying the groundwork for decision support systems or prediction of
future trends. No wonder data mining is often referred to as “knowledge
discovery.”
A large retailer might discover that customers buying a certain product
tend to buy other types of products at the same time. Medical
researchers might learn that a certain medication produces beneficial
results in an area not being tested. Fraud detection, insurance claims
analysis, and dozens of other disciplines benefit from data mining. A
bank could identify and predict customer behavior, grouping customers
into clusters, each of which can be marketed to separately with an
appropriate array of products.
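To make the retailer example concrete, here is a toy sketch in Python of the simplest possible version of that idea: counting which pairs of products turn up in the same shopping basket. The baskets and product names are invented for illustration, and the helper names (`pair_counts`, `support`) are my own; commercial data mining packages use far more sophisticated statistics than this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"bread", "milk"},
    {"diapers", "milk", "beer"},
    {"bread", "chips"},
]


def pair_counts(transactions):
    """Count how many baskets contain each pair of products."""
    counts = Counter()
    for basket in transactions:
        # Sorting makes ("beer", "diapers") and ("diapers", "beer") one key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def support(pair, transactions):
    """Fraction of all baskets that contain both products in the pair."""
    key = tuple(sorted(pair))
    return pair_counts(transactions)[key] / len(transactions)


# The pair of products that most often lands in the same basket.
top_pair, top_count = pair_counts(baskets).most_common(1)[0]
print(top_pair, top_count, support(top_pair, baskets))
```

With five baskets a human can spot the pattern at a glance. The point of data mining is doing the same thing across millions of transactions and thousands of products, where nobody can.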
Privacy advocates don’t like some of these uses at all. They claim a
data mining system under consideration by the Transportation Security
Administration and FBI isn’t necessary, will lead to an invasion of
personal privacy, and may not achieve its intended goal of making air
travel safer.
Politics notwithstanding, data mining is a technology rapidly growing in
appeal, with an increasing number of vendors providing sophisticated
software solutions. It is a specialized discipline that isn’t right for
every solutions integrator. But that also means data mining is not a
commodity, making it a field where high-margin opportunities abound.