Computer Scientist Puts NSF Funding to Work for More Reliable Computing

San Diego, Calif., Sept. 9, 2010 — She specializes in making computers safer and more reliable. Yet Yuanyuan (YY) Zhou is also a maven of reliability in another sense: securing grant funding for the University of California, San Diego.

UC San Diego Professor of Computer Science and Engineering Yuanyuan (YY) Zhou specializes in making computers less vulnerable to attacks and malfunctions. NSF has awarded her more than $1.6 million in grant funding as a solo investigator, as well as a portion of a collaborative $10 million project.

Although she has been on the UC San Diego faculty for only a little more than a year, she has won National Science Foundation (NSF) support as principal investigator (PI) on four projects and as co-PI on a fifth. Three of the projects kicked off in just the past six weeks.

The recent projects aim to make computer systems more reliable by detecting software bugs more efficiently, creating automated logs to diagnose software issues, and using software and system components to adapt to the variability in manufactured computing systems. 

“People always ask why their systems fail, and the computer industry is starting to pay attention to this reliability issue,” explains Zhou, who joined the UCSD Department of Computer Science and Engineering in the summer of 2009. “Fundamentally, my research is about making computer systems less vulnerable to attacks so they crash less. When Windows crashes, it’s more of an inconvenience, but when an e-commerce site crashes it can really be a problem.”  

Zhou is the first holder of the Qualcomm Endowed Chair in Mobile Computing. The chair is one of four established in the Jacobs School of Engineering through Qualcomm’s original $15 million commitment to the California Institute for Telecommunications and Information Technology (Calit2).

After earning her Ph.D. from Princeton University, Zhou worked for two years as a research scientist at NEC. She then taught at the University of Illinois at Urbana-Champaign (UIUC) from 2002 to 2009. 

While at UIUC, Zhou co-founded her second startup company, Pattern Insight, with several members of her research team. Now based in Mountain View, Calif., the company has already begun shipping its first product: a search-and-analysis suite for source code that helps software development teams manage large code bases. Its solutions, based on Professor Zhou’s research at UIUC, have been deployed at large companies including Cisco, Qualcomm, Juniper and Tellabs. In 2008, Intel licensed an innovation related to multi-core processors developed by Zhou and her students.

Zhou remains Chief Technology Officer at Pattern Insight in her spare time — and with all of her work at UCSD to keep her busy, there's not much of it. Her UCSD grants as solo investigator total more than $1.6 million, and Zhou will be responsible for a portion of a $10 million, Calit2-based project just getting underway this month.

Bug Detector

Detecting computer bugs is crucial in the fight for system reliability, Zhou says. NSF awarded her $430,000 to study how software and hardware can be used to detect bugs, especially those in parallel and distributed programs.

“Right now,” Zhou says, “cell phones, laptops and desktops have multicore processors, but to take advantage of this kind of processing, programs need to be concurrent. Writing these programs is difficult and error-prone, and this has been a major headache for industry. Detecting and preventing these bugs from doing damage has become an increasingly important and urgent issue.”

To improve the correctness of parallel and distributed software, Zhou proposes a novel and widely applicable invariance, called data-flow invariance, which can be used to detect various types of software bugs and make software more reliable and secure.

“I strongly believe that this research can effectively improve our understanding of this challenge, provide substantial tools support to software development and greatly improve software quality and system robustness,” she adds.  

Zhou, a recipient of the Committee on the Status of Women in Computing Research (CRA-W) Anita Borg Career Award for her contribution to women in computer science, notes that her project also incorporates educational and outreach activities for students, especially for women in computer science programs.

Troubleshooter

Another strategy Zhou proposes for coping with computer crashes is to diagnose the problem at the source through automatic log inference and informative logging. NSF granted Zhou another $470,000 to research ways to enable developers to quickly troubleshoot production-run failures and shorten system downtime. 

“When a crash happens, you don’t want to have to send your cell phone or computer back to the manufacturer, because that takes valuable time and might compromise private data,” notes Zhou. Not to mention, she adds, the vendor “might not be able to replicate the problem in-house,” just as a patient often cannot reproduce a health problem on demand so that its root cause can be found.

“And customer support is expensive for the cell phone companies as well,” she adds. “With Motorola, every support call I make to them costs $300 on average, and if they can’t figure out the problem, they have to send a complete replacement.”

Instead, Zhou proposes a method for quickly identifying root causes of the system malfunction and releasing patches to fix it, consequently reducing the amount of time the system is down, and ideally sparing the consumer any hassle whatsoever.

She says that industry leaders like Motorola, Dell, Sony and Cisco Systems have already started to push so-called “call-home capability,” which equips cell phones, laptops, and desktop computers with the means to call support centers automatically when a problem arises.

“But right now, the support center still has to call the user, diagnose the problem, and then fix it,” she says. “Eventually, we hope the final step will be automated to the point where the computer predicts that a failure is happening, ‘calls home’, and then automatically self-heals without the user even noticing that anything was wrong at all.”

Expeditions in Computing

Zhou is also part of a large team of researchers at UC San Diego and five other universities on a third new NSF grant. The $10 million project is part of the foundation’s Expeditions in Computing program, and it is directed by CSE Professor (and Chair) Rajesh Gupta, with Zhou and four other co-PIs at UC San Diego, and eight co-PIs distributed across UCLA, UC Irvine, Stanford, University of Michigan, and UIUC. The so-called Variability Expedition proposes to rethink and enhance the role that software can play in a new class of computing machines that are adaptive and highly energy efficient. The idea is to use system components, led by proactive software, to routinely monitor, predict and adapt to the variability in manufactured computer systems.

Says Zhou: “It represents not only a way to deal with hardware reliability, but a chance to rethink software architecture. If software is designed in a way where the software can automatically adapt to the changing execution environment, including the underlying hardware, the software itself is more reliable, and is robust to errors and variations in not only hardware but also software itself. Cell phones, for example, need to adapt to constantly changing environments: not just to the physical environment like extreme heat or cold, but to various applications and devices manufactured by different companies. For this reason, it would be useful for the software stack to be adaptive.”

Zhou predicts that as a result of current research in the field, computer systems will become markedly more adaptive and reliable in as little as five years — and consequently, the nature of information technology support staffs will evolve as well.

“In the past, people focused a lot on the features and performance of computers, but it’s gotten to the point where the performance isn’t that bad for most apps,” she continues. “I think IT staffs will be consolidated because with automation, we’ll need fewer people to do basic level calls. The people doing the support side of things will become more expert.” 

She notes that since all the apps will be running ‘in the cloud’ instead of on individual cell phones, “there will still be a need for planning for resource allocation, only these companies won’t be dealing with separate individuals, but with datacenters.”

In addition to the three NSF grants awarded in the past six weeks, Zhou is currently working on two projects funded by NSF in summer 2009 after she arrived at UC San Diego. A $569,000 award is allowing her to work on a novel approach that automatically performs on-site diagnosis of a software failure right at the moment of the failure and provides programmers with a detailed diagnosis report. Also launched last summer: a project to improve storage system performance, dependability and manageability using system mining techniques.

Zhou will give a talk on cloud computing this month at the National Academy of Engineering’s “Frontiers of Engineering” conference, where she’ll discuss the impact of cloud computing on transparency. 

“With cloud computing, the data is no longer on the device itself, so there is less transparency, and these abstractions make application testing, diagnosing and troubleshooting much harder because it’s harder to see the physical layers. 

“Although the benefit of the cloud is elasticity — you can scale down or scale up and pay for the bandwidth you use — many apps are not designed for this and can easily break,” she continues. “We need to begin asking: What is the difference between traditional app development and what is needed now? Maybe the cloud infrastructure provider will need to start building in development, testing, deployment and diagnostics to enable more applications in clouds.”

by Tiffany Fox, tfox@ucsd.edu, (858) 246-0353