Very large corporate databases, coupled with effective data mining and applications, have always been "mother lodes of strategic economic gold". I was recently intrigued and motivated to find out about the latest developments and applications. I also wanted to speak to vendors and end users who put their companies' resources, and sometimes their own careers, on the line. This would help clarify how well these systems are actually working. Does IT-driven marketing work? What is the impact of electronic commerce, the Internet and worldwide competition, and how do newly discovered customer demographics function in the real world? Come enter the brave new world of VLDB and data mining for an overview of emerging directions and real-life experiences.
David Stodder, VLDB Conference Chair, states, "This conference is focused on the simple fact: Databases are getting much, much larger. The bigger they get, the more strategic they are to your business. Attendees have a tremendous and growing opportunity to become the hero---or the goat. To be the hero, you must master the VLDB rather than become its servant. After all, VLDBs can quickly consume the lion's share of an organization's resources. Performance must be unparalleled. No downtime."
Now in its fourth year, this summit brought together a rare collection of IT database professionals who have built and managed some of the world's largest databases. This year's summit focused on scaling up and maintaining performance, and speakers and attendees concentrated on the core issues that make or break a VLDB. The summits are not for everyone: attendees come from a select group that designs, develops and manages the largest, most complex and most visible databases in the world. The conference was also held at the elegant Beverly Hills Hilton, without COMDEX-size crowds.
Miller Freeman, Inc. organized this conference as part of its summit series to help provide real answers to real problems and opportunities for IT professionals. As technology keeps evolving at ever-increasing rates, the need for better insights and practical information to keep database systems up-to-the-minute is even more critical. Topics included high-performance scalability, data mining as the cutting edge of business intelligence, how future multimedia systems will move beyond terabytes into petabytes, how clustering technology is coming of age, the influence of the omnipresent Internet, and actual system performance reviews through case studies with tips and techniques for improvement.
World's Largest Databases
These giant databases are regularly tracked by the Winter Corporation team of Richard Winter and Kathy Auerbach, who try to better understand system size, operational issues, practices and evolving technology trends. Their findings indicate that mainframes are the predominant platform for VLDBs, that no Windows NT system is running a database over 50 GB in size, and that IBM S/390 hardware and DB2 are a popular combination in these very large systems. According to one of their recent studies, some of the largest VLDB systems include UPS (6,787 GB), Telstra (3,300 GB), U.S. Customs (2,800 GB), Experian (1,751 GB), Shin Han Bank (750 GB), IZB Software (696 GB) and State Street Bank (633 GB). The future could see a 9-terabyte system within one year and perhaps one in the tens of terabytes by 2000. Aside from the dazzling size of these systems, the researchers are gaining valuable insights into critical issues, technical directions and best practices of the leading installations.
Critical Information Discovery
Data mining is "a decision support process where large databases are analyzed for unknown and unexpected patterns of information." With interest in data mining growing, the myriad of technologies and products makes matching a specific technique to a business application challenging. According to Kamran Parsaye, "at times interesting and useful applications of some techniques are overlooked." He argues that data mining can be a key strategic weapon that will make or break many businesses, but that it is a complex weapon that is often aimed directly at the organization's own feet. Some of the key obstacles are overconfidence, or timidity and slow thinking, in some organizations. His bottom line is that "those who win are those with the best people and best theory."
Mr. Parsaye's thoughts for the year 2000 include, "Most Fortune 1000 companies are performing 5% to 10% of what they could be doing in relationship management. This will increase to 40% to 50% by 2000 for some companies."
Future Data Visualization
Data visualization tools that can improve the effectiveness and understanding of data mining are beginning to appear in the marketplace. Visualization can provide insights that statistics alone can't give most people. Peter Brooks of Coopers & Lybrand defines data visualization as "the use of graphical data presentation and interaction to elicit meaningful and actionable information."
In the future, data visualization will become more important because of increased communication with users. Market consolidation and greater user-machine interaction will take place, and virtual reality and the Internet will have significant influence.
Susan Osterfelt, Senior VP of NationsBanc Services Co. and co-author of Understanding Data Pattern Processing: the Key to Competitive Advantage, further elaborated that data visualization tools improve human comprehension of information by envisioning its relationships, which are critical to data mining. Visualization began in the scientific community and has now graduated to commercial business use. Her presentation highlighted how her company deployed the process and underscored how this will be an emerging trend. Her bottom line is "Where human decision makers need to be involved--where interpretation and judgments are required--visualization is the only way to manage information."
Big Isn't Necessarily Better
Pala Thornton, MCI information architect, concludes that loading and distributing data to and from a multi-terabyte database environment should not be a goal of warehousing (as if achieving such somehow implies a higher level of maturity or accomplishment). Many companies nonetheless have no choice but to face such a challenge head on; in doing so they are learning many lessons and coming up with new demands for the warehousing industry.
Daniel Hillis, current VP of research and development at Walt Disney, has been a catalyst in both the data mining and VLDB fields. He was also highly influential in pioneering the concept and application of parallel supercomputers. He has worked closely with customers to apply technology to many problems in areas such as astrophysics, aircraft design, financial analysis, genetics, computer graphics and neurobiology. Mr. Hillis's presentation framed advanced technology applications in many new fields and discussed his work at Disney and other organizations.
The following day's keynote, by KeyCorp's Stephen Cone, showed how VLDBs, data warehouses and data mining are revolutionizing the marketing battleground. He stated that a company's strategic vision can be more effectively exercised by developing new products customized for, and marketed to reach, its most profitable customers. He demonstrated how his company changed its marketing to exploit all those VLDB bytes, which led to a memorable and distinctive market personality as well as new information-based services for customers.
The final daily keynote was presented by Terrell Jones, President of the SABRE Group of AMR, who provided an overview of important electronic commerce trends and lessons. He focused on what consumers are asking for, what works and what doesn't, the difference between the physical and virtual worlds, and what the next phase of competition will include. He also described how Travelocity's VLDB database platform handles the transaction and DSS duties, and the role of the database in future electronic commerce architectures.
Technically oriented proceedings amplified and supported much of the daily keynote vision, and a great deal of useful "nuts and bolts" information was exchanged and many contacts were made. Herb Edelstein, President of Two Crows Corporation, explored what data mining products do and how they do it, who they are aimed at, and how they compare to traditional statistical modeling tools. He also included guidelines on how to evaluate and select tools, as well as a survey of many recently introduced tools.
Jim Gray, a Microsoft database specialist, provided insights on the trends and progress of scalable database systems for both transaction processing and data analysis. The results he presented show astonishing improvements. He feels there are still obstacles and substantial gaps in the open (Unix) and commodity (NT) approaches. Finally, he argued that most of the really large databases (the petabytes) will be multimedia databases. If that is true, VLDB systems must solve a host of issues that the current crop of DSS and OLAP systems ignore.
Ken Rudin, CEO of Emergent Corporation, talked about using clusters in the real world. He discussed how clustered database servers deliver many real-world benefits, such as scaling applications beyond traditional SMP limits and higher application availability. However, these benefits create additional complexity. Rudin offered a practical view of how clusters work, their benefits and challenges, and some techniques for addressing the challenges. He also discussed how new clustering technologies, such as NUMA, require a change in our thinking about clusters and scalability.
"A Survival Manual For VLDB Expeditions Into Oracle" was presented by Dave Ensor of BMC Software. The presentation stated that there is a series of good practice guidelines for the design of any application to run under the Oracle server. In VLDB and VMDB environments, the potential penalties for deviating from good practices are markedly increased, and a number of quite specific issues caused by size alone must be taken into consideration.
In the hardware department, "S/390 For Data Warehousing: Not Your Father's Mainframe" was a good presentation by Time Consumer Service Inc.'s Messrs. Sagar and Venkatesa. They used Sports Illustrated as a case study to explore the use of DB2 on the mainframe (DB2 V5 for OS/390), which enabled the publication to transform its business from a product-centric franchise to a customer-centric, database marketing-driven business. They explained their rationale for the selection of the mainframe, VLDB physical design considerations, and how they addressed performance challenges.
Lehman Brothers' Yuval Lirov provided an interesting and wide-ranging case study on mission-critical systems within the company across three continents. He said "the systems management challenge stems from the complexity of networked components and the variety of individual user configurations and application interdependencies." He added that this problem is compounded by the combined effect of continuously evolving technology and growing user demands. The Lehman environment comprises 3,200 Unix hosts, 360 data servers, and over 10,000 batches in North America, Europe and Asia. The presentation demonstrated how a new synergistic support methodology uses practical methods to improve systems availability and performance while lowering costs. Lehman Brothers reports both 96% client satisfaction and support productivity gains of over 100%.
Conclusion and Additional Information
This conference is one of the premier, most focused IT meetings covering data mining and VLDB systems for IT professionals. It presents powerful reasons to exploit the opportunities uncovered by the advanced business analysis discussed in these meetings. Such analysis is the driving force behind DSS and advanced business intelligence.
It was certainly worth my time to be updated on recent developments and to preview what will be available in the near future. Forward-thinking businesses and people need to be aware of these important developments and applications. They also need to make and maintain relationships with the other IT professionals they will meet at future conferences.
© 1998 Jim Bennett All rights reserved.