Analysis and Prediction of the Dynamic Behavior of Applications, Hosts, and Networks

Instructor: Peter A. Dinda
Time: Spring 2002, WF 10:30-12
Location: CS Small Classroom (342)
Course number: CS 395/495-0-21

Winter 2001 instance of this course

General Handouts

  • Syllabus (pdf)
  • Reading List (pdf)
Projects

  • Project Ideas (pdf)
  • Presentation Schedule (Monday, June 10, 2002, 9am-11am, CS Classroom, 1890 Maple Avenue)
  • 9-9:30: Yi Qiao on Trace-based Network Bandwidth Analysis and Prediction (ppt)
    This paper performs detailed time-series analysis and prediction of network bandwidth. Three sets of trace data were collected for this purpose: short-period WAN (NLANR) traces, long-period WAN (AUCKLAND) traces, and LAN (BC) traces. For each of the three data sets, we give a tree-based classification of the traces based on their distinguishing characteristics, such as the autocorrelation function, histogram, and power spectral density (PSD) of the bandwidth. We found that most long-period traces display some degree of long-range dependence, while only a small fraction of short-period traces do. We then performed bandwidth prediction on the three sets of traces using the RPS Toolkit. The performance of the different predictors on the different classes of traces is presented and analyzed in the paper. In almost all cases, the AR model gives the best or nearly the best prediction results among the predictors tested. The effect of bin size on bandwidth prediction is also illustrated and discussed.
  • 9:30-10: Jeff Kwiat and Luka Spoljaric on Workload Characterization of a First Person Shooter (ppt)
    Every day, first-person shooter games consume greater amounts of CPU resources. If we understood the workload they generate, programmers could develop more efficient games than are currently available. This would free CPU time that the machine could use more effectively elsewhere. The goals of this paper are to (1) present actual data from a first-person shooter; (2) extend our understanding of the generated workload; and (3) reveal the implications of our findings for future game development.
  • 10-10:30: Yevgeniy Vorobeychik on File Access Patterns in the Coda Distributed File System (ppt)
    Distributed file systems (DFSs) have long used file caching to improve performance. In many DFSs, clients are allowed to update their cached replicas of files, which requires a variety of mechanisms to keep the other replicas of those files consistent across the network. This problem becomes complicated if there are many unstable files on the network, and especially so if there are no central servers. Surprisingly, there has not been much research into the access patterns of shared files. While some researchers have found that there are relatively few unstable files, this claim has been disputed by others, and the last such study I am aware of dates back to 1992. Even more strikingly, I am aware of no detailed studies of file reading and writing patterns at all. In an attempt to fill this gap, I analyzed file access patterns in the Coda Distributed File System using traces collected at Carnegie Mellon University over a period of approximately two years. I found that (1) most files analyzed are stable; (2) most unstable files tend to be updated and read by only one computer, but the computer that reads a file tends to be different from the one that updates it; and (3) a vast majority of files are read by the same computer that created them.
  • 10:30-11: Dong Lu on Communication Networks of Parallel & Distributed Systems: Low Latency & High Bandwidth goes to Cluster & GRID
    Communication networks play a vital role in parallel & distributed systems. Modern high-performance communication networks can provide low-latency, high-bandwidth communication services. Low latency is often achieved through switched networks, DMA-based zero copy, and offloading of communication protocol processing to a second processor. All of these technologies originally came from parallel systems such as the IBM SP2 and Intel Paragon, but there is a trend for them to move from inside to outside, that is, into clusters and System Area Networks such as Infiniband and VIA, and even Gigabit Ethernet. Also, with the development of networking technologies, higher bandwidth and lower latency can be achieved on the GRID. This paper is organized as follows. Section two discusses the communication networks inside two typical parallel systems, namely the IBM SP2 and SGI Origin. Section three discusses communication networks in a cluster environment, including Gigabit Ethernet, Myrinet, VIA, and Infiniband. Through this discussion, the trend of technologies moving from inside parallel systems out to system area networks and clusters can be seen clearly. Section four discusses communication networks in the GRID. Because the GRID operates on top of the Internet, a huge heterogeneous distributed system whose physical networks differ greatly, the discussion focuses on protocols instead, including IPv6, high-performance TCP Reno, TCP tuning, RED, and TCP Vegas. Some work was also done on an aggressive TCP, obtained by hacking the Linux kernel TCP/IP stack to make TCP more aggressive in some way. Some evaluation tests were run to verify the effectiveness of aggressive TCP.
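The first abstract above describes binning bandwidth traces, examining their autocorrelation, and predicting them with AR models. The following is only a minimal sketch of that idea, not the RPS Toolkit and not the method used in the paper: it averages samples into bins, estimates the sample autocovariance, fits AR coefficients with the standard Yule-Walker equations (solved by the Levinson-Durbin recursion), and makes a one-step-ahead prediction. All function names, and the synthetic series standing in for a real trace, are assumptions made for illustration.

```python
import random


def bin_average(samples, bin_size):
    """Average consecutive samples into bins of bin_size (a partial tail is dropped)."""
    n_bins = len(samples) // bin_size
    return [sum(samples[i * bin_size:(i + 1) * bin_size]) / bin_size
            for i in range(n_bins)]


def autocovariance(x, max_lag):
    """Biased sample autocovariance of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def fit_ar(x, order):
    """Fit AR(order) coefficients by Yule-Walker, using Levinson-Durbin.

    Assumes x is not constant (its lag-0 autocovariance must be nonzero).
    """
    r = autocovariance(x, order)
    phi = []       # coefficients of the current lower-order model
    err = r[0]     # prediction error variance of the current model
    for k in range(1, order + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / err  # reflection coefficient for order k
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
    return phi


def predict_next(history, phi, mean):
    """One-step-ahead AR prediction from the most recent len(phi) values."""
    return mean + sum(c * (history[-1 - j] - mean) for j, c in enumerate(phi))


def synthetic_ar1(n, coeff, seed):
    """Hypothetical stand-in for a real trace: x[t] = coeff * x[t-1] + noise."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = coeff * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


if __name__ == "__main__":
    raw = synthetic_ar1(20000, 0.9, seed=0)
    for bin_size in (1, 4, 16):  # compare a few bin sizes
        series = bin_average(raw, bin_size)
        phi = fit_ar(series, 2)
        mean = sum(series) / len(series)
        print(bin_size, phi, predict_next(series, phi, mean))
```

With a real trace, `raw` would be replaced by the measured bandwidth samples; the loop then gives a rough feel for how the fitted coefficients change with bin size.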
Resources

  • Some time series analysis scripts
  • Statsoft Electronic Statistics Textbook
  • S-Plus 2000 Guide To Statistics (volumes 1 and 2)
  • BBN Prophet Statistical Analysis Software and Stat Guide (also available in the TLAB)
  • The TLAB
  • Matlab (available in the TLAB and site-licensed for Northwestern)
      • Statistics Toolbox - "help stats"
      • System Identification Toolbox - "help ident"
      • Control Systems Toolbox - "help control"
      • Wavelet Toolbox - "help wavelet"
  • Maple (available in the TLAB and site-licensed for Northwestern)
  • S-Plus, Mathematica, Systat, TableCurve, etc. (working on these)
  • Remos
  • NWS
  • RPS
  • Host Load Trace Archive
  • Internet Traffic Archive
  • Myers Anycast Data
  • NLANR Network Traces
  • CODA Project Traces

  • Peter Dinda
    Last modified: Mon Jul 17 14:12:45 CDT 2006