> uwt}5@ eybjbj22 )XXqF &7&7&78^7L7$0j8L&9(N9N9N9M:M:M:iiiiiii$
lR\ni>I:M:>>i N9N9iFFF>j pN9N9iF>iFFGjX`>cN98 ?&7Awc2cj00jc2nDndc nc M:,y;FO<<M:M:M:iiD+&7F4&7Stylometric clustering: a comparison of data-driven and syntactic features
scott aaronson1
1Computer Science Department, University of California, Berkeley, Berkeley CA 94720-1776 (E-mail: aaronson@cs.berkeley.edu)
Tel: 510-548-0145 office, 510-664-2942 home
Fax: 510-642-5775
Full-length article, submitted on November 20, 2001
Abstract. We present evidence that statistics drawn from an automated parser can aid in stylometric analysis. The problem considered is that of clustering a collection of texts by author, without any baseline texts of known authorship. We use a feature based on relative frequencies of grammar rules as computed by an automated parser, the CMU Link Grammar parser. We compare this feature against standard data-driven features: letter, punctuation, and function word frequencies, and mean and variance of sentence length. On two corpora(1) the Federalist Papers and (2) selections from Twain, Hawthorne, and Melvilleour results indicate that syntactic and data-driven features combined yield accuracy as good as or better than data-driven features alone. We analyze the results using a cluster validity measure that may be of independent interest.
Keywords. author attribution, stylometry, automated parsing, clustering, cluster validity
Introduction
The goal of this paper is to study how differences in writing style manifest themselves statistically, and in particular, to evaluate whether syntactic features reveal information about style that data-driven features do not. By syntactic features, we mean statistics drawn from parses of a texts sentences; by data-driven features, we mean statistics drawn from a text itself (such as word and punctuation mark counts). The hypothesis is that data-driven features are more effective than syntactic ones at distinguishing style, but that the combination of both is more effective than either alone.
This hypothesis is of practical interest. For decades, stylometry has used data-driven features to help determine authorship of texts, both for historical research and for plagiarism and fraud cases. But there is an enormous variety of data-driven features, and finding ones that work for a specific set of texts is something of a black art. Thus, it is worth asking whether including syntactic features can improve the accuracy of author attribution.
There is intriguing evidence that the answer is yes. Baayen et al. (1996) found that syntactic features helped substantially in distinguishing between two writers of crime fiction. Their methods were somewhat different from ours: they used an annotated corpus and classified texts based on a sample with known authorship, whereas we used an automated parser and clustered texts by author. Still, our work is related to theirs, and may provide independent evidence for their claims.
The title of Baayen et al.s paper is Outside the Cave of Shadows: Using Syntactic Annotation to Enhance Authorship Attribution. In Baayen et al.s view, function words are analogous to the shadows on the cave wall in Platos parable. Function word frequencies are merely a reflection of what really separates one author from another, the authors differing use of syntax. Baayen et al. argue that by focusing on syntax we are tapping into the more abstract, largely unconscious and hence most revealing habits of our authors as we move from the relatively concrete domain of words into the more abstract domain of syntax, the use of elementary units becomes less subject to conscious manipulation and thematic development.
The cave metaphor suggests that clustering documents by style may be more difficult than clustering by subject. Given just an alphabetized list of the words occurring in a text, it is usually not difficult to discern the texts subject given enough time. Internet search engines such as GoogleTM rely on this property of texts. But if a texts subject resides largely in the words, then where does its style reside? The answer is unclear. One could probably learn little about style from an alphabetized list of a texts words, even though, as Mosteller and Wallace (1984) and others showed, differing frequencies of a few function words may suffice to distinguish one writers style from anothers. According to Baayen et al., differing function word frequencies are just an artifact of more fundamental differences in syntax.
Baayen et al.s paper ends with a note: We are not very optimistic about the use of fully automatic parsers, but follow-up research should not disregard this possibility. However, there is reason to doubt Baayen et al.s pessimism. Unlike other potential applications of automated parsing, stylometry does not require highly accurate parses. It only requires parses that yield statistically significant fingerprints for particular authors.
MACROBUTTON MTEditEquationSection2 Equation Section (Next) SEQ MTEqn \r \h \* MERGEFORMAT SEQ MTSec \h \* MERGEFORMAT
Summary of Results
In this paper, we give empirical results to support the hypothesis that automated parsing can aid author attribution.
The problem we consider is the following: given a collection of texts and an integer K(1, partition the texts into at most K clusters such that, to the greatest accuracy possible, all texts in a given cluster were written by the same author. The accuracy is defined in terms of the number of texts that must be moved from one cluster to another for the clustering to be perfect. In Section 5 we define the accuracy measure formally, and we prove some of its statistical properties in an appendix.
An important aspect of our work is that there is no training corpus; we simply want to cluster stylistically similar texts. This runs counter to what Baayen et al. (1996) calls the tradition, as exemplified in the study by Mosteller and Wallace, of comparing texts of unknown authorship with texts of which authorship is beyond doubt. If the clustering approach makes the problem more difficult, it also more closely reflects the reality of certain authorship controversies. For example, in applying stylometry to the bible, we do not have access to baselines of undisputed authorship, so the goal is generally to find clusters with stylistic similarities (as in Greenwood, 1992, which studies the Pauline epistles).
Our experiments compared how accurately texts are clustered using a variety of features, including data-driven features (letter, punctuation, and word frequencies, and mean and variance of sentence length), a syntactic feature (frequencies of grammar rules as computed by a parser), and combined data-driven and syntactic features. Section 4 explains and justifies the features. In all cases we used the K-means algorithm for clustering; see Section 4 for details of the algorithm. The test corpora were (1) a selection of 18 short passages from novels by Twain, Hawthorne, and Melville, and (2) a selection of 31 Federalist Papers.
The parser we used was the Carnegie Mellon Link Grammar parser (Temperley et al., 2001), version 3.0. Instead of building a parse tree, this parser simply tries to find syntactic links between words (see Grinberg et al., 1995 for details). As an example, the sentence James Madison wrote the disputed Federalist Papers is parsed to
+----------------------------Xp----------------------------+
| +---------------Osn---------------+ |
+-----Wd-----+ +------Op-----+ | |
| +---G--+---Ss--+ +---DD--+ +----G---+ |
| | | | | | | | |
///// James Madison wrote.v the disputed.a Federalist Papers .
The Link Grammar parser seems well suited to stylometric applications, in which one only cares about the relative frequencies of syntactic features, not about the logical structure of a particular sentence. Additional reasons we chose this parser were its good handling of fragments and ill-formed or ungrammatical sentences, its free availability as a C library, and its parsing efficiency.
The results, though inconclusive, support the hypothesis that data-driven and syntactic features combined can yield greater accuracy than either alone. Section 6 gives the results in detail, and Section 7 analyzes them.
Related Work
Besides the paper of Baayen et al. (1996) discussed previously, we mention two other papers relevant to the present work.
Forsyth and Holmes (1996) catalog the advantages and disadvantages of various data-driven features. They evaluated five features on a benchmark suite, establishing what they call a preliminary pecking order of features:
Letter frequencies, 69% accuracy
Frequencies of most frequent words, 73% accuracy (the principle is that the most frequent words tend to be function words)
Bigram frequencies, 74% accuracy
Progressive pairwise chunking, 75% accuracy
Monte-Carlo feature-finding, 79% accuracy
The accuracy is the proportion of texts assigned the correct author when a given feature is used for classification. The last two features were introduced by Forsyth and Holmes; we elected not to implement them.
Ledger and Merriam (1994) used cluster analysis, as well as multivariate analysis, to try to determine which acts of The Two Noble Kinsmen were written by Shakespeare and which by John Fletcher. Particularly noteworthy is that they achieved interesting results using only letter frequency as a feature.
There has been considerable work on clustering texts by genre (see, for example, Kessler et al., 1997). Here, though, our concern is clustering by author (or, more abstractly, by style), given a collection of texts that often belong to the same genre.
Features and Clustering Method
We consider each feature to be a set of real-valued components. For example, the letter frequency feature has 26 components, and the mean sentence length feature has only one component. For each text, we compute all components of all features being used and store the results as a vector. If more than one feature is being used, we normalize the components so that each feature counts equally. We then cluster the vectors using the K-means algorithm, with the number K of clusters given as input. (For a description of the K-means algorithm, see e.g. Mirkin, 1996) Originally we normalized the vectors by replacing components with their standard deviations, but we found that eliminating this step consistently improved results. We measure similarity using the L1 (or city-block) norm, which produced more accurate results than the L2 (or Euclidean) norm and several other p-norms that we tried. The centroid of a cluster is taken to be the mean of the points in it.
In the remainder of the section, we explain and justify the features used in our experiments.
Letter frequency (non-case-sensitive). We chose to include letter frequency for two reasons: first, it is a simple baseline for comparison against such features as word frequency, and second, several papers (such as Ledger and Merriam, 1994) reported unexpectedly good results with it.
Punctuation frequency. We count frequencies of 15 punctuation symbols commonly used in English text: ! $ % & ( ) - ; : , . / ?. The intuition is that frequencies of certain characters, such as exclamation points and parentheses, may reflect differences in writing styles.
Mean sentence length. Various heuristics for dealing with quotations, headings, and nonstandard punctuation are used to demarcate sentences. Since it is probably the simplest stylometric feature, we considered it crucial to include mean sentence length as a baseline. This feature might help to distinguish simple styles from more ponderous ones.
Standard deviation of sentence length. Intuitively, lively or informal writing might contain sentences of widely differing lengths, while formulaic writing might have less variation.
Word frequency. We compute the frequencies in a given text of the twenty most common words in the entire corpus, out of a list of 294 words chosen to be relatively content-neutral (mostly function words and pronouns). Word frequency is one of the cornerstones of stylometry; for example, it was key to Mosteller and Wallaces study of the Federalist Papers (Mosteller and Wallace, 1984). There are two reasons for using a pre-selected list of words. First, it allows one to make only one pass through the corpus rather than making two passes and maintaining a hash table. Second, we did not want to cluster texts based on common words signifying similar subject matter (such as names of characters in a novel).
Word frequency is harder to use in clustering than in classification with baselines of known authorship. The reason is that one cannot choose a set of words in advance (such as whilst and upon in the case of the Federalist Papers) with high potential to discriminate between specific authors.
Grammar rule frequency. This feature is identical to the word frequency feature, but is run on a corpus of grammar rules for sentences rather than on the sentences themselves. As mentioned in Section 1, we generated grammar rules using the Link Grammar parser. We attempted to parse only sentences 35 words long or less, since the parser almost never succeeded in parsing longer sentences. We allowed up to three seconds of processing time per sentence, and up to three null links (or errors) per parse. Sentences that could not be parsed within these limits were omitted. We were able to parse about 40% of sentences in both of the test corpora. The output was simply the sequence of grammar rules; structural information about the positions of the rules in sentences was ignored. We considered 239 distinct grammar rules (all of the Link Grammars basic link types, together with their extensions).
Bigram frequency. We obtained consistently poor results with bigram frequency (alone and in conjunction with other features), so we chose to remove it from the collection of features. If grammar rule frequency performed well compared to bigram frequency, it is not clear what would be demonstrated.
Cluster Matching
There are numerous ways to measure the validity of clusters (see for example Mirkin, 1996). But many measures seem to be specific to hierarchical clustering schemes, or deal with the structure of clusters themselves, rather than comparing them to external data. For this work, we elected to use cluster matching, a simple, intuitive measure that may be of independent interest. In this section we introduce the measure and state some theorems about its properties; the theorems are proved in an appendix.
We define the accuracy A to be 1 K( / ((K-1)m), where K ( 2 is the number of clusters, m ( K is the number of texts, and ( is the error, or minimum number of texts that must be moved from one cluster to another before the computed clusters are equivalent to the true clusters up to ordering. (We need up to ordering because the clusters returned by algorithms such as K-means are unlabeled.) We assume that both the true clustering and the computed clustering are partitions of the texts, which they always are in our experiments; clusters are not overlapping or hierarchical. We then have the following:
Theorem 1: 0 ( A ( 1.
Furthermore, one can check that a perfect clustering has A = 1, while a clustering in which each true cluster contributes an equal number of texts to each computed cluster has A = 0. The accuracy measure has another nice property:
Theorem 2: Given a computed clustering and a true clustering, A can be computed in polynomial time, by reduction to the maximum-weight bipartite matching problem.
The A values given in this paper were computed by hand; if the corpora were much larger, it would become worthwhile to implement, say, the Ford-Fulkerson algorithm (see Cormen et al., 2001) to perform the matching.
How can we prove that a given accuracy is statistically significant? We might first ask what the expected accuracy would be for a given list of true clusters, were each text placed into one of the K computed clusters uniformly at random. This is related to the classic urn model in statistics, and we can show the following:
Theorem 3: Suppose each true cluster has ( texts, ((K) = m / K is a function of K satisfying ( = ((K), and texts are placed uniformly at random into computed clusters. Then the expected accuracy is O(log K / K).
Given an error (, we might also want to know the probability p that placing texts into computed clusters uniformly at random would produce error less than or equal to (. We do not know a tight result, but can give an upper bound:
Theorem 4: EMBED Equation.DSMT4
This upper bound would not apply were the clustering algorithm given prior information about cluster sizes (for example, if it knew to find K clusters of roughly the same size). But in our experiments, the algorithm was not given any such information.
Experimental Results
The goal of the experiments was to compare syntactic and data-driven features effectiveness at clustering texts by author. We assembled two test corpora. The first consisted of 18 texts, three passages of approximately 2500 words each from six nineteenth-century American novels: The Adventures of Huckleberry Finn and A Connecticut Yankee in King Arthurs Court by Mark Twain; The Scarlet Letter and The House of The Seven Gables by Nathaniel Hawthorne; and Moby Dick and Typee by Herman Melville. We chose these by the following criteria: the texts should be by famous authors, they should all be from the same register, country, and historical period; they should span more than one work by each author; and they should be readily available in electronic form. The second corpus consisted of 31 Federalist Papers: ten accepted as Alexander Hamiltons on historical grounds (1, 6, 7, 30, 33, 34, 68, 71, 79, 85), ten accepted as James Madisons on historical grounds (10, 37, 38, 39, 40, 41, 42, 43, 44, 45), and eleven previously under dispute (49, 50, 51, 52, 53, 54, 55, 56, 57, 62, 63) but now accepted to be Madisons.
Tables 1 and 2 and Figure 1 summarize the results. Errors is the minimum number of texts that must be moved from one cluster to another; accuracy is as defined in Section 5; Max p is an upper bound on the result occurring by chance, as implied by Theorem 4; and Time (s) is the number of seconds needed for clustering, not including the time needed for the parsing program. Recall that, when features were tested in conjunction, they were always weighed equally. For the Federalist Papers corpus, disputed papers were counted as Madisons, in keeping with the current consensus.
Table 1: Results with 19th-century novels corpusFeature set testedErrorsAccuracyMax pTime (s)Letter frequency80.333Not sig.1.412Punctuation mark frequency10.9175.73(10-71.021Word frequency40.6678.69(10-41.810Letters and punctuation10.9175.73(10-72.003Punctuation and words20.8331.01(10-51.572Letters, punctuation, and words20.8331.01(10-52.704Mean sentence length70.417Not sig.0.490St. dev. of sentence length70.417Not sig.0.461Mean and st. dev. Of length70.417Not sig.0.540Grammar rule frequency20.8331.01(10-51.320Grammar rules and punctuation10.9175.73(10-71.700Grammar rules and words50.5835.12(10-31.870Grammar rules, punctuation, and words20.8331.01(10-52.300
Table 2: Results with Federalist Papers corpusFeature set testedErrorsAccuracyMax pTime (s)Letter frequency100.355Not sig.2.413Punctuation mark frequency90.4192.94(10-21.472Word frequency80.4841.07(10-22.520Letters and punctuation150.032Not sig.2.924Punctuation and words140.097Not sig.2.233Letters, punctuation, and words150.032Not sig.3.615Mean sentence length100.355Not sig.0.701St. dev. of sentence length110.290Not sig.0.752Mean and st. dev. Of length100.355Not sig.0.872Grammar rule frequency140.097Not sig.1.710Grammar rules and punctuation70.5483.33(10-32.200Grammar rules and words100.355Not sig.2.410Grammar rules, punctuation, and words120.281Not sig.3.570
Figure 1: Accuracy of clusters under different feature combinations
Analysis
In general, features generated reasonably accurate clusters for the 19th-century novels corpus, and poor clusters for the Federalist Papers corpus. This is unsurprising, since the Federalist Papers is a challenging benchmark problem in stylometry, and the novels corpus includes three writers with widely differing styles. (Twain, in particular, tended not to be clustered with Melville and Hawthorne.) What interests us here is the relative performance of the features, and what these performances indicate about the cave of shadows hypothesis of Baayen et al. (1996).
Letter frequency alone produced statistically insignificant results for both corpora. Punctuation frequency was highly effective, both by itself and in conjunction with other features. This supports the hypothesis that punctuation frequency is a more direct reflection of style than such features as letter frequency and bigram frequency. Mean sentence length and standard deviation of sentence length did not produce statistically significant results, which might be caused by their having only a single component each. In the novels corpus, features that worked well individually tended to work well togetherfor example, letter frequency combined with punctuation frequency was as effective as punctuation frequency alone. But strangely, this effect disappears in the Federalist Papers corpus; no combination of data-driven features produced statistically significant results.
The results for grammar rule frequency yield tentative support for our thesis. In the novels corpus, feature combinations involving grammar rules were competitive with those involving only data-driven features; neither was a clear winner. (Larger corpora would be required to reach a conclusion; we were limited by the processing time needed to parse the corpora.) In the Federalist Papers corpus, the best-performing feature set was grammar rule frequency combined with punctuation frequency, with 7 errors out of 31 texts and p<3.33(10-3.
All of these results seem poor compared to, for example, the apparently perfect accuracy that Mosteller and Wallace (1984) achieved. But since the clustering approach does not allow fine-tuning beforehand with texts of known authorship, statistically significant results, rather than near-perfect ones, may be the best achievable.
We hope that this paper will encourage further research on using automated parsing for stylometric applications. In particular, to address the usefulness of automated parsing for stylometry, one would ideally want a comparison of hand parses against machine-generated parses for some hand-parsed corpus, using the same set of syntactic features for both.
Acknowledgments
The author is grateful to Lillian Lee, Kiri Wagstaff, Daniel Sleator, Graeme Hirst, Neil Graham, and Charles Rackoff for helpful discussions. This work was supported in part by a National Science Foundation Graduate Fellowship.
Appendix: Proofs of Theorems in Section 5
Proof of Theorem 1: A ( 1 because K ( 2, ( ( 0, and m ( 1. For A ( 0, we use the probabilistic method. For any list C1 CK of computed clusters, consider the mean error over all K! assignments of computed clusters to true clusters. Let Ti be a true cluster; then Ti is equally likely to be assigned to any of the computed clusters. Letting Sij be the number of texts Ti shares with Ci (so that Si1 + + Sik = |Ti|), the mean error contributed by Ti then is
1 / K [(|Ti| - Si1) + + (|Ti| - Sik)]
= 1 / K [K|Ti| - (Si1 + + Sik)]
= (1 1 / K) |Ti|.
So adding the mean error contributed by all true clusters, the mean error over all assignments is
(1 1 / K) (|T1| + + |Tk|)
= (1 1 / K)m.
Therefore, there must exist at least one assignment for which the error is ( (1 1/K)m. So ( ( (1 1 / K)m, or rearranging, A = 1 Ke / (K-1) m ( 0. (
Proof of Theorem 2: We create a weighted bipartite graph G with the true clusters T1 TK as left vertices and the computed clusters C1 CK as right vertices. The weight of edge (Ti, Cj) equals the number of texts in Cj that are not in Ti. We then find a minimum-weight perfect matching for G, which can be done in polynomial time (Cormen et al. 2001). The matching thus obtained will minimize the sum, over all computed clusters, of the number of texts that do not belong to the corresponding true cluster and thus have to be moved to a different cluster. This is equivalent to minimizing the error. Summing the weights of edges in the matching yields (, from which we can compute A. (
Proof of Theorem 3: Let T be any true cluster. We have ( texts from T distributed uniformly at random into K computed clusters, and we would like to know, on average, the maximum number X(K,() of these texts that fall into any one computed cluster. In terms of the urn model, we would like to know the limit distribution of maximal occupancy if ( balls are thrown uniformly at random into K urns, a problem discussed in Kolchin et al. (1978), p. 94-115. Kolchin et al.s Theorem II.6.1 implies that if ( = o(K log K), then in the limit, the distribution of the maximum number of balls in an urn becomes concentrated at R-1 and R, where R > ( / K is a solution to the equation
R! (( / K)-R = K e-(/K.
This equation can give us an asymptotic upper bound on X(K,(): if ( = ((K), then X(K,() = O(( log K / K).
Let ( = ( / K = ((1), and R = ( log K / K = ( log K. We want to show that for any (, the inequality
R! (-R > K e-(
holds asymptotically (i.e., for all K > some K0). Using Stirlings approximation (and dropping the ((2(R), which can only make the inequality weaker), we obtain
(R/e)R (-R > Ke-(
Substituting R = ( log K and taking the logarithm of both sides:
( log K log[( log K / e] ( log ( log K > log K - (
Algebraic manipulation leads to
log log K + 1 / log K > 1/( + 1
which holds asymptotically since ( = ((1).
Now, for each computed cluster Cj, let tj be the maximum number of texts Cj has in common with any true cluster. Since all ( texts in Cj must be moved to the same true cluster, (-tj is the minimum error Cj could contribute. From the upper bound in the urn model, the expected value of tj is X(K,() = O(( log K / K), so Cjs expected error is ( - O(( log K / K). Since the random variable tj depends only on the distribution of Cjs texts, we can sum the expected error over all Cj, obtaining that the expected ( is
K( - O(( log K).
So the expected accuracy is
A = 1 K ( / ((K-1) m)
= 1 K[K( - O(( log K)] / ((K-1) K()
= O(log K / K). (
Proof of Theorem 4: Suppose the matching of computed clusters to true clusters were fixed. Then the probability q of the error being less than or equal to ( would just be the probability of from m-( to m successes out of m independent trials (assignments of texts into computed clusters), each one having success probability 1/K. From the binomial distribution,
EMBED Equation.DSMT4
There are K! possible matchings of computed clusters to true clusters, and were interested in the probability that any of them have error less than or equal to (. By the union bound, this is ( K! q. (
References
Baayen, H., H. van Halteren, and F. Tweedie (1996) Outside the Cave of Shadows: Using Syntactic Annotation to Enhance Authorship Attribution. Literary and Linguistic Computing 11(3).
Cormen, T.H., C.E. Leiserson, R.L. Rivest, and C. Stein (2001). Introduction to Algorithms, McGraw-Hill.
Forsyth, R.S. and D.I. Holmes (1996) Feature-Finding for Text Classification. Literary and Linguistic Computing 11(4).
Greenwood, H.H. (1992) St. Paul Revisiteda Computational Result. Literary and Linguistic Computing 7(1).
Grinberg, D., J. Lafferty and D. Sleator (1995) A robust parsing algorithm for link grammars. In Proceedings of the Fourth International Workshop on Parsing Technologies, Prague.
Kessler, B., G. Nunberg and H. Schtze (1997) Automatic Detection of Text Genre. In Proceedings of 35th Annual Meeting of the Association for Computational Linguistics (ACL).
Kolchin, V., B. Sevastyanov, and V. Chistyakov (1978) Random Allocations, Wiley.
Ledger, G. and T. Merriam (1994) Shakespeare, Fletcher, and the Two Noble Kinsmen. Literary and Linguistic Computing 9(3).
Mirkin, B. (1996) Mathematical Classification and Clustering (Nonconvex Optimization and its Applications Vol. 11), Kluwer.
Mosteller, F. and D. L. Wallace (1984) Applied Bayesian and Classical Inference: The Case of the Federalist Papers, Springer Series in Statistics.
Temperley, D., D. Sleator, and J. Lafferty (2001) The Link Grammar Parser (software). [http://www.link.cs.cmu.edu/link]
scott aaronson, stylometric clustering PAGE 17
PAGE 17
YZ[\J K V s
WQ\,201QSpruTU1#B#$ƾ}voh
c6] jh
ch
c0J"PJh
cmH sH jh
cUmH sH h
c6 h
c]h
c5CJ OJQJmH sH h
cB*OJQJphh
cB*phh
c5B*phh
c6OJQJ]h
cOJQJ h
c5
h
cCJh
c6CJ H* h
cH*h
c(K[ I J K L
h/4stu`
&FyVydy D#$$$%T%%%&&'x(y(z({(())*****+,
&F
&Fd#`$&{(())**B,W,-.N.X.11b1c1I2o2q2j3344566678@:Q:::!>2>Q?b?@@lAtAAAAAAAAAAACCCCCCDDGGG jh
c h
c5 jh
c jeh
ch
c6] h
cH* h
c6 h
c]h
cB*OJQJphh
c5CJ OJQJmH sH h
ch
c5CJOJQJ?,--.1H2I2h3j3445566g9:: >!>N?O?P?Q?b?^ACCD
&F`DcE:FGWH>IeIbJcJxJN/Q`QaQtlkd.$$Ifl4+""
04
laf4p
$d$Ifa$
&F#`
GGGGGGGGfHgHHH>IIIJIaIbIcIdIcJxJKKKKKLL)LFLOLTLYLMMPP/QGQIQ`QaQQQQQQR
RRR=R˿ h
cH* jh
ch
c5B*H*phh
c5B*phh
c6] h
c6h
c5CJ OJQJmH sH jh
c5EHUj?
h
cCJUVjh
c5U h
c5 jeh
c jWh
ch
c jah
c3aQtQ{QQQQQQQNkd$$Iflrx$~+"04
lad$IfQQQQQQQQQNkd$$Iflrx$~+"04
lad$IfQQQRRRRRNkd?$$Iflrx$~+"04
lad$IfRR1R3R9RCRIRYNNNNNd$Ifkd$$Iflrx$~+"04
la=R>R@RBRlRmRoRqRRRRRjSkSmSoSSSSSSSSSTTTTTT5TFTOTTTTTTTTTUVVVVWWWVWXWaWWWWWX'XYZZ h
c]h
c6\]h
cH*\ h
c\h
c5CJ OJQJmH sH h
c5\jh
cU h
c5h
c56B*]phh
c5B*ph h
c: h
cH*h
c jh
ccIcJc~ccccccddadcd|d}dddddddddddddddddd h
cH* jeh
c jh
c jh
c h
c5h
cOJQJh
cB*H*OJQJph jh
cB*OJQJphh
c6B*OJQJ]phh
cB*OJQJphh
c6]h
c;Y]6_`aaabbcdd)eFeeeezf{f0i1ikk\lllrmmm`ddeeee$e&eBeCeeeee*f+ff?fsftfxfyf{ffffffgggg1g2g5g6gWgXgjgkgii.i/i0i1iEiiijiiijj,k-kkkkkkkkkkk-l.l4l5l8l9lGl jWh
c jah
cH* h
cH* jah
c h
c5 jh
c jh
c jeh
c jh
ch
c h
cH*IGlHlNlOl\l]l`laldlelllmlzl{lllllllllllll4m5m7m8mwmxmymzm|mmmmmmmmmmmmmmmmm6n7n]n^nanbnnnnnnnnnnnoooo4o5ohh
cmHsH jph
c jh
c h
cH* jbh
cH* h
cH* jWh
c jbh
c h
c]h
c jah
cHmmnE2 lD`!d>E2 p2xڵVMLSA}-QhK!Z`
L8ChTyJD%ś 㑃FQN+ukڦ#~2gO.1-F[SҲv>&=:?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijkmnopqrs~vyz{|Root Entry F03?x Data
l[&WordDocument)ObjectPool{?03?_1067784066 F{?{?Ole
CompObjiObjInfo
!"#%
FMathType 5.0 EquationMathType EFEquation.DSMT49q$\DSMT5WinAllBasicCodePagesTimes New RomanSymbolCourier NewMT Extra!/ED/APG_APAPAE%B_AC_AE*_HA@AHA*_D_E_E_A
pd"K!Equation Native _1067784381F`?`?Ole
CompObj
iKm
pmi()i==0e
"
K"-1()i
.
FMathType 5.0 EquationMathType EFEquation.DSMT49q$\DSMT5WinAllBasicCodePagesObjInfo
Equation Native 1TableHoSummaryInformation(Times New RomanSymbolCourier NewMT Extra!/ED/APG_APAPAE%B_AC_AE*_HA@AHA*_D_E_E_A
q==1Km
pmi()i==0e
"
K"-1()i
.Oh+'0 "5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554$$If!vh55555#v#v#v:Vl0"5554Dd
]5}$P
SbA?"2$Y!ۺ:u+ST5`!$Y!ۺ:u+ST5.o@hZxZ]lUlw,'!PK?b6@JkXJiqh@@Pkb!`P&>B(!z;gwv3̹|3wa!\!B}08{FFFh/ k$i =-qd6vWzB s:z:ǾzBPHal^j+~Q)
BRl yӯ[V~jώv \l;ҀrTgˏFY5ϻ%4wK:e.3\?s';a3g.\=s-cns)W<*ѣNwGVjgad14,!gk0R{)95f0Ɏ|{<-B2H=+/({F[Vڟ(q;xBAMbO+s%t!JĶ,I$sk
}EaaPjy'Nc[&T];Z!.]v7a7 mSX;;j?jYGP5ΫuN+]]>i%d.B_ٵҚ=̫uNM@ ʯs=7^1\X?QM2hM2F L4vղ`vB3?W7ZޝISgg "?nq"vqnh`~@ٵ+ڥP38WGs/Δ]bu#1G|sUƶfIXO]wΧ9J{A+]OL=)|x
sUWm+S^ 1(10awqvp\!XJ8>P8WGpUL,t?~Շ</e5{ZX~#^xlsr_]z ֛;uB'py:-m*W+Pw3IN2wř3W\=s˘[\%seWt*?E"^!msIze:sיYژkc~0vs']g[B\̔vt҉(TјKx2oځq^G_̇uT-Wӈy(+Ut;\O̯v12g_`~]"7hގ4$HW+~@67Jk'4vuE[{"Dc"N$Cx!IDGs0H+O>O[\Dq)ќ:;"k9$kRQjm6©QۜۼC6vw&[l!^*@*h:!}F*g
|@wmxGX9K[:DkDG]LնF8"U':K:O
@HZjc{Qw=[V/hunc
AZ)v/8=&ٹsv%``o7vJ.Z*#t]54Z(:
J{wس-gneHfRXFDn
Hh?]\rpƱ[66mX~og8t?*Xէc؉C8{8XО1_|[ǵ}S1Qư+-#Oh)y-L&`KZѓnNEWL*1Be^:^Cu:_xB
Z<^_R.YWj*`\QgntK/KqqǍ-(<R+ɫh욖\Lߍ21צ$'xsf:x;(t,((9|bNhq=B3:
'A(6A~DT)g"^9b/ -sR้2CWYj =RiЌZw>4Z]r7__FMTEquationSection1Eq
FMicrosoft Word Document
MSWordDocWord.Document.89q$L@LNormal$dha$CJ_HmH sH tH v@v Heading 13$$
&Fhd$<@&^h`a$;CJKHPJtH u^@^ Heading 2-$$
&FB<@&^B`a$:T@T Heading 3#$$
&Fdx<@&a$PJz@z Heading 4?$$
&F^dx<1$7$8$@&H$^^`a$
B*PJphJ@J Heading 5
&F<@&
56CJH@H Heading 6
&F<@&5CJ@@@ Heading 7
&F<@&D@D Heading 8
&F<@&6L @L Heading 9
&F<@&CJOJQJDA@DDefault Paragraph FontVi@VTable Normal :V44
la(k@(No List<O<Author$da$:CJ R"RCaption$d$`a$CJPJtH uPP
Footnote Textd$CJPJtH uD>@"DTitle$x@&a$5CJ(KH@&1@Footnote ReferenceH*0U@A0 Hyperlink>*B*,@R,Header:4 @b4Footer
!.)@q.Page NumberF0@F
List BulletPCJPJtH uN+NEndnote TextdxCJPJtH u>*>Endnote ReferenceH*<B@< Body Texth B*ph@O@Section header5CJJOJAffiliation$da$6CJDOD
References7^7`CJ0O0AbstractCJ0O0Keywords CJdC@dBody Text Indent!d`B*OJPJQJphVO!VMTEquationSection5<B*PJmH phsH HR@2HBody Text Indent 2
#`
eq
eqK[IJKLh/
4stuDTx y z { !!"""""#$%%&)H*I*h+j+,,--..g122 6!6N7O7P7Q7b7^9;;<c=:>?W@>AeAbBcBxBF/I`IaItI{IIIIIIIIIIIIIIIIIJJJJJJ1J3J9JCJIJJJ`JbJhJrJxJyJJJJJJJJJJJJJJJKKKK/K1K7K@KFKGK^K`KfKpKvKwKKKKKKKKKKKKKLL
LLLLLNLOLbLiLrLxLLLLLLLLLLLLLLLLLLMMM M#M)M2M8M9MOMRMXMaMgMhMMMMMMMMMMMMMMMMMNN N#N)N2N8N9NPNSNYNbNhNiNNNNNNNNNNNNNNNO OOOOVOWOXOaOQU6WXYYYZZ[\\)]F]]]]z^{^0a1acc\dddreeeef@BCEFGHJKLMNOPQRSTUVWXYZ[\]^_`abdgjdy?0QRpqIAaAcA^jvjxjeq3@@::*147>AH!!8GH@G`0( u u
B
S ?eqP *"Q <#>R l*"S 0T *>U 4+>V +>W +>X $,>Y d,>Z ->[ ->\ tFA] FA^ GA_ GA` tJAa JAb KAc *urn:schemas-microsoft-com:office:smarttags
PostalCode8*urn:schemas-microsoft-com:office:smarttagsCity9*urn:schemas-microsoft-com:office:smarttagsState9*urn:schemas-microsoft-com:office:smarttagsplace""""!6'6_6e677IAdAeAiAjAoATDYDKK
NNRRZ"Z#Z+Z4Z;ZDZIZdZkZ`\c\\\\\\\]]#]&]]]f^h^4_6_V_X_eeeeeeeeeeee f#fffffffffgg3g5ggggggghhHhJhhhhhhhhhhhuk|kl%l,l2lwm~mn!n)n0nnnnnooopppqqqqq:qEqGqRqTqTqUqVqcqfqftI
Q
I*o*j++++-...//22!626;;I@K@`@f@DD[[]]4^6^J^L^aaaa0c2c)d+dddddeeeffj^jzjpqqqqqq9q:qEqGqRqTqTqfqqqqqqUqVqfq
Avaya User
Avaya User
Avaya User
Avaya User
Avaya User
Avaya User
Avaya UseraaronsonaaronsonScott Aaronson@
w oTNm +FM2{Y=BNDsClM DRl(ЄeHp5 %q yP>I!|B5X~bhh^h`OJQJo(*hh^h`OJQJo(hh^h`OJQJo(hh^h`OJQJo(hh^h`B*OJQJo(hh^h`OJQJo(hh^h`OJQJo(hh^h`B*OJQJo(hh^h`o(.hh^h`o(hh^h`o(.0^`0o(..0^`0o(... 0^`0o( .... 88^8`o(.....
88^8`o(
......
`^``o(.......
`^``o(........hh^h`OJQJo(hh^h`5CJ OJQJo(.88^8`.L^`L. ^ `.^`.xLx^x`L.HH^H`.^`.L^`L.hh^h`OJQJo(^`OJQJo(opp^p`OJQJo(@@^@`OJQJo(^`OJQJo(o^`OJQJo(^`OJQJo(^`OJQJo(oPP^P`OJQJo(^`OJQJo(^`OJQJo(opp^p`OJQJo(@@^@`OJQJo(^`OJQJo(o^`OJQJo(^`OJQJo(^`OJQJo(oPP^P`OJQJo(hh^h`o(.@@^@`o(.P^`P@@^@`.0^`0..``^``... ^` .... ^`.....
^`
......
`^``.......
L`00^0`........hh^h`o(.h^`o(()h^`.hpLp^p`L.h@@^@`.h^`.hL^`L.h^`.h^`.hPLP^P`L.hh^h`o(.hh^h`o(..0^`0o(...0^`0o(.... 88^8`o(
..... 88^8`o(......
88^8`o(.......
`^``o(........
`^``o(.........^`o(()^`.pLp^p`L.@@^@`.^`.L^`L.^`.^`.PLP^P`L.Hp
X~yH#k#k#kY=BD$k$kM2o!|+Nm%qClMwDRlT#k @CJ@OJQJo(" #k (@CJ8OJQJo( #k @CJ0OJQJo(" P$k @CJ(OJQJo(" $k`@CJ0OJQJo("
cBF/I`IaItI{IIIIIIIIIIIIIIIIIJJJJJJ1J3J9JCJIJJJ`JbJhJrJxJyJJJJJJJJJJJJJJJKKKK/K1K7K@KFKGK^K`KfKpKvKwKKKKKKKKKKKKKLL
LLLLLNLOLbLiLrLxLLLLLLLLLLLLLLLLLLMMM M#M)M2M8M9MOMRMXMaMgMhMMMMMMMMMMMMMMMMMNN N#N)N2N8N9NPNSNYNbNhNiNNNNNNNNNNNNNNNO OOO`opqqTqfq5q #?@p eq0@UnknownGz Times New Roman5Symbol3&z Arial3z Times?5 z Courier New;&z Helvetica;Wingdings71 Courier"1h%[F1{j0`90`9!4dpp2QH?NANCY IDE, LAURENT ROMARYVassarScott Aaronsond
__