
Clusters, the OptIPuter, and the Grid: A Conversation with Stephen Jenks at UCI

Stephen Jenks

4.27.04 - "My research concerns how to design and use future generations of computers," says Stephen Jenks, an assistant professor in the Electrical Engineering and Computer Science department at UCI. Jenks also studies issues in parallel and distributed computing.



"I discovered - the hard way - that it's easier, not to mention less expensive, to create a distributed system in software than in hardware." Jenks constructed a 32-node AMDF Athlon (processor) system with dual-processor-capable motherboards. "Miricom donated a 2-Gbps/node Myrinet network," he says. "It was the fastest internal network at UCI - that is, until Charlie Zender installed a faster one for his Earth System Modeling Facility."



Jenks and his students use the cluster for their research. It is now split into two smaller clusters running different versions of SDSC's Rocks software: one is kept stable, while the other is used for experiments.



Jenks expects to use this cluster as an OptIPuter "node," so he's tracking that project closely with OptIPuter members Kane Kim and Michael Goodrich to ensure compatibility. In that context, the cluster will serve dual functions: supporting the research of Jenks and his students, and supporting the needs of the OptIPuter team - both the middleware effort, led by Andrew Chien, and applications such as the Biomedical Informatics Research Network, led by Mark Ellisman at UCSD and Steven Potkin at UCI.



In fact, other groups are lining up to use Jenks' cluster: a visualization group led by Falko Kuester wants to use it to drive high-resolution tiled displays, such as a five- or six-sided CAVE.



Parallel programming and debugging are outrageously hard, says Jenks, which makes parallel computing enormously difficult to do even after years of experimentation. "My goal, as a computer architect, is to make parallel computing accessible to scientists and engineers so that such machines are useful. I am studying software techniques to achieve better utilization of the hardware and networks."



Of course, many people have experimented with OpenMP - the shared-memory model - in which "hints" to the compiler are included in the code, for example, "this next loop can be parallelized." The code itself is not changed fundamentally. Doing this can speed up the code in some cases and works well on small shared-memory systems of, say, 32 processors, but it doesn't scale to giant machines.
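For readers unfamiliar with OpenMP, here is a minimal sketch of the kind of "hint" Jenks describes: a single directive tells the compiler that the loop which follows may be split across threads, while the loop body itself is left untouched. The arrays and sizes are hypothetical, chosen only for illustration.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N], c[N];

        /* Initialize the (hypothetical) input arrays. */
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;
            b[i] = i * 2.0;
        }

        /* The pragma is the "hint": the compiler may run this loop
           on several threads, but the loop body is unchanged. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[N-1] = %f\n", c[N - 1]);
        return 0;
    }

Compiled without OpenMP support, the pragma is simply ignored and the program runs serially, which is exactly why the approach is attractive to scientists who don't want to restructure their codes.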



One of the projects Jenks is working on is making better use of shared resources on parallel chips. He points to hyperthreading in Pentium 4 chips: "In effect, this technique makes one processor look like two," he says. IBM's POWER-series chips, similarly, already put two cores on a single chip.



"Parallelism in the old days was constrained by bandwidth-limited chips," says Jenks. "The problem we have today, with such things as these dual-core chips, is the constraint of memory bandwidth."


One of his students wrote an electromagnetic simulation that, at first, ran as slowly on two processors as it did on one because it was limited by exactly this problem. But the student devised a way to restructure the program so that some communication could take place in the on-chip cache, rather than in main memory, avoiding the memory bottleneck. The restructured code has one processor lead and the other follow, taking advantage of data already in the cache so the two processors aren't both competing for memory at the same time. "This is a very interesting issue to us right now because all vendors are going to have dual-core chips in their systems."
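The article does not include the student's code, but the general leader/follower idea can be sketched as below, assuming the work splits into a compute pass and a post-processing pass over cache-sized blocks. The block size, update rules, and array names are invented for illustration; they are not the actual simulation.

    /* Leader/follower sketch: the follower works on a block only after
       the leader has just produced it, so the block is likely still in
       the shared cache and the two threads do not both stream data
       from main memory at the same time. */

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define N (1 << 20)          /* total elements (hypothetical)       */
    #define BLOCK (1 << 12)      /* block small enough to stay in cache */
    #define NBLOCKS (N / BLOCK)

    static double field[N];
    static double result[N];
    static atomic_int blocks_done = 0;   /* leader's progress counter */

    /* Leader: the "number crunching" pass (placeholder update rule). */
    static void *leader(void *arg)
    {
        (void)arg;
        for (int b = 0; b < NBLOCKS; b++) {
            for (int i = b * BLOCK; i < (b + 1) * BLOCK; i++)
                field[i] = 0.5 * (field[i] + (double)i);
            atomic_store(&blocks_done, b + 1);   /* publish progress */
        }
        return NULL;
    }

    /* Follower: consumes each block right after the leader finishes it. */
    static void *follower(void *arg)
    {
        (void)arg;
        for (int b = 0; b < NBLOCKS; b++) {
            while (atomic_load(&blocks_done) <= b)
                ;   /* spin until block b is ready */
            for (int i = b * BLOCK; i < (b + 1) * BLOCK; i++)
                result[i] = field[i] * field[i];
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t lead, follow;
        pthread_create(&lead, NULL, leader, NULL);
        pthread_create(&follow, NULL, follower, NULL);
        pthread_join(lead, NULL);
        pthread_join(follow, NULL);
        printf("result[0] = %f\n", result[0]);
        return 0;
    }

The point of the restructuring is the staggering: at any moment only one thread is pulling fresh data from memory, while the other reuses data the first just touched.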



Working with another student, Jenks is also rethinking cluster-to-cluster communications. "Now we treat nodes on the Grid as equivalents," says Jenks. "We allocate a bunch of nodes that communicate through MPI tailored to the Grid, but this approach neither addresses latency issues nor takes advantage of the locality possible within a cluster."



Jenks thinks node specialization might be a more worthwhile approach, in which a computation is pipelined between clusters: the number crunching might be done on one cluster, postprocessing on another, and visualization on a third. "We need a different paradigm to program distributed clusters," says Jenks. "This might be a good topic for the OptIPuter project."



Jenks says people know how to do MPI communication between nodes and IP communication over long distances. "What we need to figure out," he says, "is how to automate mapping of data between clusters: One to many, many to one, and one to one - these are all still experimental situations."
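As an illustration of the one-to-many and many-to-one patterns Jenks mentions, the sketch below uses standard MPI collectives within a single job; automating such mappings between clusters over the Grid is the part he describes as still experimental. The array size and the trivial "work" step are made up for the example.

    /* One-to-many (MPI_Scatter) and many-to-one (MPI_Gather) inside
       one MPI job: the root distributes equal pieces of a data set,
       each rank does some work, and the results are collected back. */

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define PER_RANK 4   /* elements handed to each node (hypothetical) */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *full = NULL;
        double part[PER_RANK];

        if (rank == 0) {                      /* root holds the full data set */
            full = malloc(size * PER_RANK * sizeof(double));
            for (int i = 0; i < size * PER_RANK; i++)
                full[i] = (double)i;
        }

        /* One to many: distribute equal pieces to every rank. */
        MPI_Scatter(full, PER_RANK, MPI_DOUBLE,
                    part, PER_RANK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        for (int i = 0; i < PER_RANK; i++)    /* placeholder computation */
            part[i] *= 2.0;

        /* Many to one: collect the pieces back on the root. */
        MPI_Gather(part, PER_RANK, MPI_DOUBLE,
                   full, PER_RANK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            printf("full[last] = %f\n", full[size * PER_RANK - 1]);
            free(full);
        }
        MPI_Finalize();
        return 0;
    }

Within a cluster these primitives are routine; what Jenks calls experimental is doing the equivalent mapping automatically when the sending and receiving ends are different clusters connected over long-haul optical links.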



In the 1980s and 1990s, the world shied away from custom machines toward COTS [commercial off the shelf - Ed.] systems because the processors in the latter were getting fast enough. But one of the problems of general-purpose computers is that they generate so much heat per useful operation, especially in parallel systems, that they may not be the answer for scientific computing.



But the pendulum is beginning to swing back in the other direction: Jenks perceives a more recent trend back toward custom systems that can do specific things extremely well. The twist now is that such systems are low-power. "We've heard a lot about the Japanese Earth Simulator," he says. "If your job is to do one thing, you should 'go custom' because clusters don't offer the same performance per watt."



And what about the Grid? "Clusters are cheap," he says, "so if your organization desperately needs computational resources, it's probably still more cost-effective to buy a cluster. Companies are also concerned about protecting trade secrets and the security of their data. If you own the resource, you have more control over it."



But the Grid, to be sure, has a place, says Jenks, and it's likely to change the way we do everything. Think of the electric utility grid, where wall outlets have a standard plug design and voltage. Computers on the Grid don't have that level of standardization yet: they have different architectures, memory designs, and programming models. But that standardization will come. Some people even have a term for it: Grid Utility Computing.



"Now data Grids, though - that's where it's at," says Jenks. "We're just beginning to be able to stream data from CERN, for example, and ship it around the world. Formerly, we've had to move the computation to have locality with these kinds of giant data sets. In fact, migration was my Ph.D. thesis topic. Now I want to apply that understanding to migration for the Grid across longer distance and latencies. There's another topic that would lend itself usefully to the OptIPuter."