Correlating Instrumentation Data to System States: A Building Block for Automated Diagnosis and Control
Ira Cohen, Jeffrey S. Chase et al.

Introduction
- Networked systems continue to grow in scale
- Complex behavior stems from the interaction of
  - Workload
  - Software structure
  - Hardware
  - Traffic conditions
  - System goals
- A pervasive system is needed to manage such a system
  - Examples?
    - HP's OpenView
    - IBM's Tivoli
    - (aggregate and display data graphically)

Introduction
- Two approaches to building self-managing systems
- A priori models
  - Event-condition-action rules
  - Not based on real systems (Disadvantages?)
  - Difficult and costly
  - Unreliable, does not take account of all

Introduction
- Statistical learning techniques
  - Assume little to no domain knowledge
  - Hence "general"
- Problem!
  - Still have to identify techniques that are powerful enough to induce effective models that are:
    - Efficient
    - Accurate
    - Robust

Goals
- Automatically analyze instrumentation data from network services in order to forecast, diagnose, and repair failure conditions
- We use Tree-Augmented Naive Bayesian Networks (TANs) as the basis for diagnosis and forecasting, using system-level instrumentation in a 3-tier network service
- TANs are widely used in various fields, but have not been used in the context of computer systems

Goals
- Analyzed data from 124 metrics gathered from a 3-tier e-commerce site under synthetic load
  - httperf
  - Java PetStore as platform
- A TAN model selects a combination of metrics and threshold values that correlates with compliance with Service Level Objectives (SLOs) for average response time
- Results later

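As a rough illustration of the SLO state referred to in the Goals slides, the sketch below labels each measurement interval as compliant or violating by comparing its average response time to a threshold. The function name, the sample values, and the 100 ms threshold are hypothetical and are not taken from the paper.

```python
from statistics import mean


def label_slo_states(intervals, threshold_ms):
    """Return 1 (SLO violation) or 0 (SLO compliance) for each interval.

    `intervals` is a list of non-empty lists of response times in
    milliseconds. An interval counts as a violation when its average
    response time exceeds `threshold_ms` (an assumed SLO definition).
    """
    return [1 if mean(times) > threshold_ms else 0 for times in intervals]


# Hypothetical response-time samples (ms) for three intervals.
samples = [[40, 60, 80], [120, 150, 90], [100, 100, 100]]
states = label_slo_states(samples, threshold_ms=100)
print(states)
```

Each label can then be paired with the system metrics collected over the same interval to form one training example for a classifier.
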
What is a TAN?
- A Bayesian network is an annotated directed acyclic graph encoding a joint probability distribution
- Naive Bayesian network
  - State variable S is the only parent of all other vertices
  - Assumes all metrics are fully independent given S
- TANs consider relationships among the metrics themselves, with the constraint that each metric has at most one parent other than S

Why Use a TAN?
- Based on the premise that a relatively small subset of metrics and threshold values is sufficient to approximate the distribution accurately
- Outperforms generalized Bayesian networks and other alternatives in both
  - Cost
  - Accuracy

Why Use a TAN?
- Useful for forecasting failures and violations
  - Possible to induce models that predict SLO violations in the near future, even when the system is stable
- An automated controller can invoke the model directly
  - Identify impending violation
  - Respond
    - Loading
    - Adding resources
- Cheap model to induce
  - Possible to maintain multiple models
  - Periodic refresh

Setup
- System is a 3-tier web service
  - Apache
  - Middleware (BEA WebLogic)
  - Oracle database
- 3 servers with HP OpenView to collect statistics
- Load generator is httperf
- SLO indicator processes the logs to determine compliance

Interpretability and Modifiability
- TANs offer other advantages
  - Interpretability
  - Modifiability
- The influence of each metric can be quantified in a probabilistic model
- Analysis catalogs each type of violation according to the metrics and values that correlate with observed instances
- Strength of influence is given by the probability of a metric's value occurring in the different states
- Gives insight into the causes of violations and how to repair them

Workloads
- Vary several characteristics
  - Aggregate request rate
  - Number of concurrent connections
  - Fraction of data-intensive vs. app-intensive requests
- This exercises the model-induction methodology by providing it with a wide range of (M, P) pairs, where
  - M = sample of values for system metrics
  - P = vector of app-level performance measurements

Workloads
- RAMP: Increasing concurrency
- STEP: Background + step function
  - Background constant traffic
  - Bursty, hour-long bursts
- BUGGY: Increasing aggregate request rate

Results
- Varied SLO thresholds to explore the effect on induced models
  - To evaluate accuracy of models under varying conditions
- Trained and evaluated a TAN classifier for each of 31 different SLO definitions
- Baseline: accuracy of the 60th-percentile SLO classifier (MOD) and of CPU as the metric

Results
- Overall balanced accuracy (BA) of TAN is 87-94%
- 90+% for all experiments
- 6% false alarm rate for 2 experiments, 17% for BUGGY
- A single metric is not sufficient to capture the pattern of SLO violations (CPU)
- A small number of metrics is sufficient to capture the pattern (3-8)
- Sensitive to workload and SLO definition (MOD always has a high detection rate, but generates false alarms at an increasing rate as the SLO threshold increases)

Conclusion
- TANs are attractive for self-managing systems
  - Build system models automatically
  - No a priori knowledge required
  - Generalize to a wide range of conditions
  - Zero in on the most relevant metrics
  - Practical

Conclusion
- Possible future work: adapt this to changing conditions
- Close the loop for automated diagnosis and control
- Ultimately the most successful model is a hybrid of
  - Automatically induced models
  - A priori models

Questions?

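The Results slides report balanced accuracy (BA), detection rate, and false alarm rate. A minimal sketch of how these could be computed from actual and predicted SLO states follows, assuming BA is the average of the violation detection rate and the rate of correctly identified compliance (1 minus the false alarm rate). The function name and the sample labels are hypothetical.

```python
def balanced_accuracy(actual, predicted):
    """Return (BA, detection rate, false alarm rate).

    Labels: 1 = SLO violation, 0 = SLO compliance.
    Both classes must appear in `actual`; otherwise one rate is undefined.
    """
    pairs = list(zip(actual, predicted))
    on_violation = [p for a, p in pairs if a == 1]
    on_compliance = [p for a, p in pairs if a == 0]
    if not on_violation or not on_compliance:
        raise ValueError("need at least one violation and one compliant sample")
    # Fraction of real violations that the classifier flagged.
    detection = sum(on_violation) / len(on_violation)
    # Fraction of compliant intervals wrongly flagged as violations.
    false_alarm = sum(on_compliance) / len(on_compliance)
    ba = (detection + (1 - false_alarm)) / 2
    return ba, detection, false_alarm


# Hypothetical labels: 4 violations followed by 4 compliant intervals.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
ba, detection, false_alarm = balanced_accuracy(actual, predicted)
print(ba, detection, false_alarm)
```

Averaging the two per-class rates keeps the score informative when violations are rare, a case in which plain accuracy can look high for a classifier that never predicts a violation.
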
Slide deck metadata: authors listed as Chi Yin Cheung and Fabian E. Bustamante, Northwestern University.