Current Search: Statistics
Search results
 Title
 LUMPABILITY AND WEAK LUMPABILITY IN FINITE MARKOV CHAINS.
 Creator

ABDELMONEIM, ATEF MOHAMED., Florida State University
 Abstract/Description

Consider a Markov chain x(t), t = 0, 1, 2, ..., with a finite state space N = {1, 2, ..., n}, transition probability matrix P = (p_ij), i, j ∈ N, and an initial probability vector V = (v_i), i ∈ N. For m ≤ n let A = {A_1, A_2, ..., A_m} be a partition of the set N. Define the process y(t) by [equation omitted in DAI abstract]. The new process y(t), called a function of a Markov chain, need not be Markov. If y(t) is again Markov, whatever the initial probability vector of x(t), x(t) is said to be lumped to y(t) with respect to the partition A. If y(t) is again Markov for only certain initial probability vectors of x(t), x(t) is said to be weakly lumped to y(t) with respect to the partition A.

Conditions under which x(t) can be lumped or weakly lumped to y(t) with respect to A are introduced. Relationships between the two processes x(t) and y(t) and the properties of the new process y(t) are discussed.

Criteria are developed to determine whether a given Markov chain can be weakly lumped with respect to a given partition in terms of an analysis of systems of linear equations. Necessary and sufficient conditions on the transition probability matrix of a Markov chain, a partition A on N, and a subset S of probability vectors for weak lumpability to occur are given in terms of the solution classes of these systems of linear equations. Finally, given that weak lumping occurs, the class S of all initial probability vectors which allow weak lumping is determined, as is the transition probability matrix of the lumped process y(t).

Lumpability and weak lumpability are also studied for Markov chains which are not irreducible. This involves a study of the interplay between two partitions of the state space N: the partition C, induced by the closed sets of states of the Markov chain, and the partition A, with respect to which lumpability is to be considered. Under the assumption that lumpability occurs, the relationships which must exist between sets of the two partitions A and C are obtained in detail. It is found, for example, that if neither partition is a refinement of the other and (A, C) form an irreducible pair of partitions over N, then for each A ∈ A and C ∈ C, A ∩ C ≠ ∅. Further conditions which the transition probability matrix P must satisfy if lumpability is to hold are obtained, as are relationships which must exist between P and P*.

Suppose a process y(t) is known to arise as a result of a weak lumping or lumping from some unknown Markov chain x(t). Let χ(t) be the class of all Markov chains x(t) with n states which yield this weak lumping or lumping. The problem of characterizing this class and a class S of initial probability vectors which allow this lumping is considered. A complete solution is given when n = 3 and m = 2.

The importance of lumpability in application is discussed.
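The ordinary (strong) lumpability criterion the abstract refers to can be sketched with the standard row-sum condition of Kemeny and Snell: for every pair of blocks, the probability of jumping into a target block must be the same from every state of the source block. The transition matrix and partition below are invented for illustration, not taken from the dissertation.

```python
# Check strong lumpability of a transition matrix P with respect to a
# partition: sum_{j in B} p_ij must be constant over i within each block A.
def is_lumpable(P, partition, tol=1e-12):
    for block_from in partition:
        for block_to in partition:
            sums = [sum(P[i][j] for j in block_to) for i in block_from]
            if max(sums) - min(sums) > tol:
                return False
    return True

def lumped_matrix(P, partition):
    # Valid only when is_lumpable(P, partition) holds; each entry is the
    # common row sum taken from any representative state of the source block.
    return [[sum(P[block_from[0]][j] for j in block_to)
             for block_to in partition] for block_from in partition]

P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.2, 0.3, 0.5]]        # states 2 and 3 behave identically
A = [[0], [1, 2]]

print(is_lumpable(P, A))     # True: rows 2 and 3 give equal block sums
print(lumped_matrix(P, A))   # 2x2 transition matrix of the lumped chain y(t)
```

Weak lumpability is strictly more delicate: it holds only for certain initial vectors and is what the dissertation's linear-equation criteria address; the check above covers only the strong case.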
 Date Issued
 1980
 Identifier
 AAI8109927, 3084860, FSDT3084860, fsu:74361
 Format
 Document (PDF)
 Title
 Testing for a time-dependent covariate effect in the linear risk model.
 Creator

Amirsehi, Kourosh., Florida State University
 Abstract/Description

We propose two tests to identify a time-dependent covariate effect in the partly parametric linear risk model, and derive asymptotic distributions of the test statistics under the assumption that the covariate effect of interest is constant. One of the asymptotic distributions depends on unknown functions, and we devise a weighted bootstrap procedure to estimate its quantiles. We also derive rates of convergence of maximum likelihood estimators of regression coefficients in both the nonparametric and the partly parametric linear risk models using the method of sieves. We carry out a simulation study to assess the performance of the proposed test and apply it to real data from a clinical trial on myelomatosis.
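The weighted-bootstrap device mentioned in the abstract can be sketched in miniature: perturb each observation's contribution with i.i.d. mean-one positive weights and recompute the statistic to approximate quantiles of its fluctuation. The statistic (a weighted mean), the exponential weight law, and the data below are illustrative assumptions, not the dissertation's procedure.

```python
import random

def statistic(x, w=None):
    # weighted sample mean; with unit weights this is the plain mean
    w = w or [1.0] * len(x)
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def weighted_bootstrap_quantile(x, level, B=2000, seed=0):
    # quantile of |perturbed statistic - observed statistic| over B draws
    rng = random.Random(seed)
    center = statistic(x)
    reps = sorted(abs(statistic(x, [rng.expovariate(1.0) for _ in x]) - center)
                  for _ in range(B))
    return reps[min(B - 1, int(level * B))]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(weighted_bootstrap_quantile(x, 0.95))
```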
 Date Issued
 1995
 Identifier
 AAI9620872, 3088860, FSDT3088860, fsu:77659
 Format
 Document (PDF)
 Title
 Identifiability in the autopsy model of reliability theory.
 Creator

Antoine, Robin Michael., Florida State University
 Abstract/Description

Let S be a coherent system of m components acting independently. Two statistical models are considered. In the autopsy model, S is observed until it fails. The set of failed components and the failure time of the system are noted. The failure times of the dead components are not known. In the second model, which was considered by Doss, Freitag and Proschan (Ann. Statist., 1989), the failure times of the dead components are also known.

In the autopsy model, it is not always possible to estimate or identify the component lifelengths from the observed data. A sufficient condition for the identifiability of the component distributions is given for the case in which the distributions are assumed to be analytic. Necessary and sufficient conditions are given for the case in which the distributions are assumed to belong to certain parametric families.

The model of Doss, Freitag and Proschan is considered in two special cases. In the first of these the component distributions are known to be identical. In the second, the distributions are known to be exponential. Estimators of the component and system life lengths are given for each of these cases, and the asymptotic relative efficiency of each with respect to the corresponding estimator of Doss, Freitag and Proschan is calculated.
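The autopsy observation scheme described above can be sketched as follows: the system is watched only until it fails; we record the system failure time and which components are dead at that moment, but not when the dead ones died. A 2-component series system with exponential lifelengths is assumed here purely for illustration.

```python
import random

def autopsy_observation(rates, rng):
    # series structure (an assumption): the system fails at the first
    # component failure, so exactly one component is dead at autopsy
    times = [rng.expovariate(r) for r in rates]
    t_sys = min(times)
    dead = {i for i, t in enumerate(times) if t <= t_sys}
    return t_sys, dead

rng = random.Random(42)
t_sys, dead = autopsy_observation([1.0, 2.0], rng)
print(t_sys, dead)   # one dead component; its failure time equals t_sys
```

For more general coherent structures several components may already be dead at system failure with their individual failure times unrecorded, which is precisely what makes identifiability nontrivial in this model.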
 Date Issued
 1992
 Identifier
 AAI9222356, 3087814, FSDT3087814, fsu:76624
 Format
 Document (PDF)
 Title
 PARTIAL SEQUENTIAL TESTS FOR THE MEAN OF A NORMAL DISTRIBUTION.
 Creator

ARGHAMI, NASSER REZA., Florida State University
 Abstract/Description

Recently, Billard (1977) introduced a truncated partial sequential procedure for testing a null hypothesis about a normal mean with known variance against a two-sided alternative hypothesis. That procedure had the disadvantage that a large number of observations is necessary if the null hypothesis is to be accepted. A new procedure is introduced which reduces the expected sample size for all mean values, with considerable reductions for values near the null mean value. Theoretical operating characteristic and average sample number functions are derived, and the empirical distribution of the sample size in some special cases is obtained.

For the case of unknown variance and a one-sided alternative hypothesis, there are a number of tests, the best known of which are those of Wald (1947) and Barnard (1952). These tests have concerned themselves with hypotheses stated in units of μ/σ. In this work, a partial sequential test procedure is introduced for hypotheses concerned only with μ. An advantage of this new procedure is its relative simplicity and ease of execution when compared to the above tests. This is essentially due to the fact that in the present procedure the transformed observations follow a central t-distribution, as distinct from the noncentral t-distribution. The difficulties caused by the noncentral distribution explain the relative lack of progress in obtaining results about the properties, such as the operating characteristic and average sample number functions, of the tests of Barnard and Wald. The key element in the present procedure is that a number of observations is taken initially before any decision is made; subsequent observations are then taken in batches, the sizes of which depend on the estimate of the variance obtained from the initial set of observations. Some properties of the procedure are studied. In particular, an approximation to the theoretical operating characteristic function is derived and the sensitivity of the average sample number function to changes in some of the test parameters is investigated.

The ideas developed for the partial sequential t-test are extended to develop tests of hypotheses concerning the parameters of a simple linear regression equation, general linear hypotheses, and hypotheses about the mean of special cases of the multivariate normal.
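The two-stage device described above can be sketched in miniature: an initial sample is taken before any decision, its variance estimate is frozen, and later observations arrive in batches whose sizes depend on that estimate. The sizing rule and constants below are illustrative assumptions, not the dissertation's.

```python
import math, random, statistics

def batch_size(s2, delta, c=4.0):
    # larger estimated variance -> larger batches needed to resolve a mean
    # difference of size delta (an illustrative rule, not the thesis's)
    return max(1, math.ceil(c * s2 / delta ** 2))

rng = random.Random(1)
initial = [rng.gauss(0.0, 2.0) for _ in range(10)]   # initial sample
s2 = statistics.variance(initial)                     # frozen variance estimate
print(batch_size(s2, delta=1.0))
```

Because the variance estimate comes only from the initial sample and is never updated, subsequent standardized observations can be referred to a central t-distribution, which is the simplification the abstract emphasizes.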
 Date Issued
 1981
 Identifier
 AAI8125865, 3085070, FSDT3085070, fsu:74568
 Format
 Document (PDF)
 Title
 SOME RESULTS ON THE DISTRIBUTION OF GRUBBS ESTIMATORS.
 Creator

BRINDLEY, DENNIS ALFRED., Florida State University
 Abstract/Description

This dissertation is concerned with the estimation of error variances in a non-replicated two-way classification and with inferences based on the estimators so derived. The postulated model used throughout the present work is

y_ij = μ_i + β_j + ε_ij,

where y_ij is the observation in the i-th row and j-th column, μ_i is the parameter representing the mean of the i-th row, β_j is the parameter representing the additional effect of the j-th column [constraint omitted in DAI abstract], and the ε_ij are independent, zero-mean, normal variates with [variance structure omitted in DAI abstract].

A set of unbiased estimates [omitted in DAI abstract], developed in earlier work by Grubbs (J. Amer. Statist. Assoc. 43 (1948), 243-264), Ehrenberg (Biometrika 37 (1950), 347-357) and Russell and Bradley (Biometrika 45 (1958), 111-129), is considered.

The exact joint density of Q_1, ..., Q_r is obtained for r = 3, and two exact results are derived for testing the null hypothesis [omitted in DAI abstract], with the variance unknown, versus the two specific alternatives [omitted in DAI abstract] for at least some j, j = 1, 2, 3, and [omitted in DAI abstract].
 Date Issued
 1982
 Identifier
 AAI8229146, 3085401, FSDT3085401, fsu:74896
 Format
 Document (PDF)
 Title
 KLEMS translog cost estimates and energy elasticities.
 Creator

Campbell, Timothy Alan., Florida State University
 Abstract/Description

Data from the Bureau of Labor Statistics (BLS) for capital, labor, energy, materials, and business services (KLEMS) are used to estimate translog cost functions. Much of the work developing and testing production and cost functions has used the same Berndt and Wood (BW) data for total manufacturing. Results from the BLS data are compared with the BW data and considerable differences are found.

To improve the translog estimates, the Kalman filter and state space form are used in an effort to permit the time proxy for technological change to follow a random walk with drift. The general state space form provides a unified structure that subsumes other models. After smoothing, the Kalman filter model is equivalent to one including a time proxy.

An error-correction model (ECM) is used to make the translog specification more dynamic. Nested within the most general ECM specification are the more restrictive static, partial adjustment, and autoregressive models. Likelihood ratio tests reject the more restricted models in favor of the general ECM specification, but theoretical symmetry and adding-up restrictions are rejected for most two-digit Standard Industrial Classification industries using the general ECM specification. Elasticities are computed for total manufacturing and compared with those found in other studies, with a special emphasis on energy. Many violations of the monotonicity, own-price, and concavity theoretical requirements are found.
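The random-walk-with-drift specification for the time proxy can be illustrated with a minimal scalar Kalman filter. The observation equation y_t = x_t + noise, the noise variances, and the numbers below are illustrative assumptions, not the KLEMS system of equations.

```python
# Minimal Kalman filter for a scalar state following a random walk with drift.
def kalman_rw_drift(ys, drift, q, r, x0=0.0, p0=1.0):
    x, p = x0, p0
    filtered = []
    for y in ys:
        # predict: x_t = x_{t-1} + drift + w_t, Var(w_t) = q
        x_pred, p_pred = x + drift, p + q
        # update with observation y_t, Var(observation noise) = r
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        filtered.append(x)
    return filtered

print(kalman_rw_drift([1.0, 2.1, 2.9], drift=1.0, q=0.1, r=0.2))
```

With r near zero the filter tracks the observations; with r large it trusts the drifting prediction, which is the trade-off that lets the technology proxy evolve smoothly instead of entering as a fixed linear trend.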
 Date Issued
 1993
 Identifier
 AAI9410157, 3088225, FSDT3088225, fsu:77029
 Format
 Document (PDF)
 Title
 ON DETERMINING THE NUMBER OF PREDICTORS IN A REGRESSION EQUATION USED FOR PREDICTION.
 Creator

CARR, MEG BRADY., Florida State University
 Abstract/Description

It is generally recognized that all the available variables should not necessarily be used as predictors in a linear regression equation. The problems which may arise from using too many predictors become especially acute in a regression equation used for prediction with independent data. In this case, the skill of prediction may actually deteriorate with increasing numbers of predictors. However, there is no definitive explanation as to why this should be so. There is also no universally accepted procedure for determining the number of predictors to use. The various regression methods which do exist are logically contrived but are also largely based on subjective considerations.

The goal of this research is to develop and test a criterion that will indicate a priori the "optimum" number of predictors to use in a prediction equation. The mean square error statistic is used to evaluate the performance of a regression equation in both the dependent and independent samples. Selecting the "best" prediction equation consists of determining the equation with the minimum estimated independent-sample mean square error. Several approximations and estimators of the independent-sample mean square error which have appeared in the literature are discussed and two new estimators are derived.

These approximations and estimators are tested in Monte Carlo simulations to determine their skill in indicating the number of predictors which will yield the best prediction equation. The sample size, number of available predictors, correlations among the variables, distribution of the variables, and selection method are manipulated to explore how these various factors influence the performances of the mean square error estimators. It is found that the better estimators are capable of indicating a number of predictors to include in the regression equation for which the corresponding independent-sample mean square error is near the minimum value.

As a practical test, the various estimators of the independent-sample mean square error are applied to the data used in deriving the Model Output Statistics (MOS) maximum and minimum temperature forecast equations used by the National Weather Service. These prediction equations are linear regression equations derived using a forward selection method. The sequence of prediction equations corresponding to the forward trace of all the available predictors is derived for each of 192 cases and then applied to independent data. The forecasts made by the operational p = 10 predictor MOS equations are compared with those made by the equations determined by the estimators of the independent-sample mean square error. The operational equations have the best overall verification statistics. The estimators persistently underestimate the values of the independent-sample mean square error, but one of the new estimators is able to determine MOS forecast equations that perform as well as the operational equations. Furthermore, it is able to accomplish this without the use of an independent sample to help determine the optimum number of predictors.
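The kind of estimator the study compares can be sketched as follows: inflate the dependent-sample mean square error to predict independent-sample performance, then choose the predictor count that minimizes the estimate. The (n + p + 1)/(n − p − 1) inflation factor used here is one classical choice, an assumption for illustration rather than one of the dissertation's estimators.

```python
def independent_mse_estimate(mse_dep, n, p):
    # penalize the in-sample fit more heavily as p grows relative to n
    return mse_dep * (n + p + 1) / (n - p - 1)

def best_p(mse_by_p, n):
    # pick the predictor count minimizing the estimated independent MSE
    ests = {p: independent_mse_estimate(m, n, p) for p, m in mse_by_p.items()}
    return min(ests, key=ests.get)

# dependent-sample MSEs always shrink as predictors are added...
mse_by_p = {1: 2.0, 2: 1.6, 3: 1.5, 4: 1.49}
# ...but the penalized estimate turns back up, flagging overfitting
print(best_p(mse_by_p, n=20))
```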
 Date Issued
 1980
 Identifier
 AAI8026121, 3084691, FSDT3084691, fsu:74192
 Format
 Document (PDF)
 Title
 LARGE DEVIATION LOCAL LIMIT THEOREMS, WITH APPLICATIONS.
 Creator

CHAGANTY, NARASINGA RAO., Florida State University
 Abstract/Description

Let {X_n, n ≥ 1} be a sequence of i.i.d. random variables with E(X_1) = 0, Var(X_1) = 1. Let ψ(s) be the cumulant generating function (c.g.f.) and γ [defined by an expression omitted in DAI abstract] be the large deviation rate of X_1. Let S_n = X_1 + ... + X_n. Under some mild conditions on ψ, Richter (Theory Prob. Appl. (1957) 2, 206-219) showed that the probability density function f_n of S_n/√n has the asymptotic expression [omitted in DAI abstract] whenever x_n = o(√n) and √n x_n > 1. In this dissertation we obtain similar large deviation local limit theorems for arbitrary sequences of random variables, not necessarily sums of i.i.d. random variables, thereby increasing the applicability of Richter's theorem. Let {T_n, n ≥ 1} be an arbitrary sequence of nonlattice random variables with characteristic function (c.f.) φ_n. Let ψ_n, γ_n be the c.g.f. and the large deviation rate of T_n/n. The main theorem in Chapter II shows that under some standard conditions on ψ_n, which imply that T_n/n converges to a constant in probability, the density function K_n of T_n/n has the asymptotic expression [omitted in DAI abstract], where m_n is any sequence of real numbers and τ_n is defined by ψ_n'(τ_n) = m_n. When T_n is the sum of n i.i.d. random variables our result reduces to Richter's theorem. Similar theorems for lattice-valued random variables are also presented, which are useful in obtaining asymptotic probabilities for the Wilcoxon signed-rank test statistic and Kendall's tau.

In Chapter III we use the results of Chapter II to obtain a central limit theorem for sums of a triangular array of dependent random variables X_j^(n), j = 1, ..., n, with joint distribution given by z_n^(-1) exp{H_n(x_1, ..., x_n)} Π dP(x_j), where x_i ∈ R for all i ≥ 1. The function H_n(x_1, ..., x_n) is known as the Hamiltonian. Here P is a probability measure on R. When H_n(x_1, ..., x_n) = log φ_n(s_n/n), where s_n = x_1 + ... + x_n, and the probability measure P satisfies appropriate conditions, we show that there exists an integer r ≥ 1 and a sequence τ_n such that (S_n − nτ_n)/n^(1 − 1/(2r)) has a limiting distribution which is non-Gaussian if r ≥ 2. This result generalizes the theorems of Jong-Woo Jeon (Ph.D. Thesis, Dept. of Stat., F.S.U. (1979)) and Ellis and Newman (Z. Wahrscheinlichkeitstheorie und Verw. Gebiete (1978) 44, 117-139). Chapters IV and V extend the above to the multivariate case.
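A Richter-type (saddlepoint) density approximation of the kind this dissertation generalizes can be checked numerically. For the sample mean of i.i.d. Exp(1) variables (our choice, purely for illustration), the approximation sqrt(n / (2π ψ''(τ))) · exp(−n γ(x)) can be compared with the exact Gamma(n, n) density; the two agree up to a Stirling-type factor of order 1/(12n).

```python
import math

def saddlepoint_density(x, n):
    # For Exp(1): psi(s) = -log(1 - s); the saddle tau solves
    # psi'(tau) = x, giving tau = 1 - 1/x, psi''(tau) = x^2, and
    # large deviation rate gamma(x) = x - 1 - log(x).
    gamma = x - 1.0 - math.log(x)
    psi2 = x * x
    return math.sqrt(n / (2.0 * math.pi * psi2)) * math.exp(-n * gamma)

def exact_density(x, n):
    # the mean of n Exp(1) variables has a Gamma(shape n, rate n) density
    return n ** n * x ** (n - 1) * math.exp(-n * x) / math.gamma(n)

n, x = 10, 1.4
print(saddlepoint_density(x, n) / exact_density(x, n))  # close to 1
```

For this family the ratio is constant in x and equals the Stirling correction for Γ(n), so the approximation already has small uniform relative error at n = 10.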
 Date Issued
 1982
 Identifier
 AAI8225279, 3085419, FSDT3085419, fsu:74914
 Format
 Document (PDF)
 Title
 PARTIAL ORDERINGS, WITH APPLICATIONS TO RELIABILITY (PARTIAL ORDERINGS, SCHUR-OSTROWSKI THEOREM, INEQUALITIES).
 Creator

CHAN, WAI TAT., Florida State University
 Abstract/Description

This dissertation is a contribution to the use of inequalities in reliability theory. Specifically, we study three partial orderings, develop some useful properties of these orderings, and apply them to obtain several applications in reliability.

The first partial ordering is the notion of convex ordering among life distributions. This is in the spirit of Hardy, Littlewood, and Polya (1952), who introduced the concept of relative convexity. Many parametric families of distribution functions encountered in reliability theory are convex-ordered. Different coherent structures can also be compared with respect to this partial ordering.

The second partial ordering is the ordering of majorization among integrable functions. This ordering is a generalization of the majorization ordering of Hardy, Littlewood, and Polya (1952) for vectors in n-dimensional Euclidean spaces. The concept of majorization among vectors plays a fundamental role in establishing various inequalities. These inequalities can be recast as statements that certain functions are increasing with respect to the ordering of majorization. Such functions are called Schur-convex functions. An important result in the theory of majorization is the Schur-Ostrowski Theorem, which characterizes Schur-convex functions. A functional defined on the space of integrable functions is said to be Schur-convex if it is increasing with respect to the ordering of majorization. We obtain an analogue of the Schur-Ostrowski Theorem which characterizes Schur-convex functionals in terms of their Gateaux differentials.

The third partial ordering is the ordering of unrestricted majorization among integrable functions. This partial ordering is similar to majorization but does not involve the use of decreasing rearrangements. We establish another analogue of the Schur-Ostrowski Theorem for functionals increasing with respect to the partial ordering of unrestricted majorization.
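The finite-dimensional majorization ordering the dissertation generalizes can be sketched directly: x majorizes y when, after sorting both decreasingly, the partial sums of x dominate those of y and the totals agree; Schur-convex functions are exactly those increasing in this ordering. The vectors and the example function below are invented for illustration.

```python
def majorizes(x, y, tol=1e-12):
    # x majorizes y: equal totals, dominating decreasing partial sums
    if abs(sum(x) - sum(y)) > tol:
        return False
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    cx = cy = 0.0
    for a, b in zip(xs, ys):
        cx += a
        cy += b
        if cx < cy - tol:
            return False
    return True

def sum_sq(v):
    # sum of squares is a classical Schur-convex function
    return sum(t * t for t in v)

print(majorizes([3, 1, 0], [2, 1, 1]))          # True: more "spread out"
print(sum_sq([3, 1, 0]) >= sum_sq([2, 1, 1]))   # True, as Schur-convexity demands
```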
 Date Issued
 1985
 Identifier
 AAI8509841, 3086034, FSDT3086034, fsu:75520
 Format
 Document (PDF)
 Title
 ON SEQUENTIAL UNBIASED AND BAYES-TYPE ESTIMATES OF PARAMETERS IN A CONTINGENCY TABLE.
 Creator

CHEN, CHENGCHUNG., Florida State University
 Abstract/Description

Estimation of the probability parameters in a contingency table with linear and/or log-linear constraints on the parameters is the principal concern of this thesis. Sequential unbiased estimates of the cell probabilities as well as some Bayes posterior-mean-type estimates are considered.

Chapter I is a review of some earlier work on the sequential unbiased estimation of the probability parameter in a Bernoulli process. The review begins with the classical work of Girshick, Mosteller and Savage (1946) and some follow-up studies such as Wolfowitz (1946), Savage (1947), Blackwell (1947), Lehmann and Stein (1950), DeGroot (1959) and Kagan, Linnik and Rao (1973). In several cases the original proofs have been simplified and the arguments streamlined.

Chapter II deals with the problem of sequential unbiased estimation of the parameters in a contingency table with linear and/or log-linear constraints. Multinomial Girshick, Mosteller and Savage (GMS) type stopping rules are discussed and the corresponding unbiased estimates based on the minimal sufficient statistic are described. Consistency, in the sense of Wolfowitz (1947), of such estimates is demonstrated. Unbiased estimates of parametric functions such as log-contrasts are derived. Sufficient conditions for the completeness of the GMS-type stopping rules are given.

In Chapter III, the problem of sequential unbiased estimation of the probability parameters in the Bradley-Terry (1952) model of paired comparisons is studied. The Bradley-Terry model can be summarized as follows. Suppose that there are t treatments T_1, ..., T_t that can be pairwise compared. The Bradley-Terry model postulates that associated with treatment T_i is a "strength" parameter π_i > 0, i = 1, ..., t, such that if treatments T_i and T_j are compared, the probability that T_i is preferred to T_j is θ_ij = π_i/(π_i + π_j). The model imposes log-linear constraints on the θ_ij's, so that techniques similar to those in Chapter II may be used to obtain unbiased estimates based on a sufficient statistic.

In Chapter IV, two Bayes-type procedures for estimating the multinomial cell probability vector p, in the presence of linear constraints on the parameters, are proposed and illustrated with examples. A general prior is used, with the restriction that the moment generating function of the prior exists in a closed form. The estimators are shown to be strongly consistent. Estimation under log-linear constraints is also considered. Finally, Bayes-type estimators for the covariance matrix of the cell frequencies are presented for some special cases of linearly and log-linearly constrained problems.

Chapter V is concerned with a Bayesian approach to the estimation of parameters in the Bradley-Terry model of paired comparisons. It is assumed that the sum of the treatment parameters π_i is 1, and a Dirichlet prior for π = (π_1, ..., π_t) is used. Using the induced prior of θ_ij and Z_ij = π_i + π_j, an estimate π̂_ij of π_i, based on the data arising from the comparisons of treatments T_i and T_j, is obtained. An estimate of π_i based on all the data is a weighted combination of the π̂_ij's that minimizes a risk function. Similarly, estimates for log-contrasts of the π_i's are obtained. This technique of estimation is extended to the Luce model of multiple comparisons.
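The Bradley-Terry setup summarized above is easy to make concrete: strengths π_i > 0 give preference probabilities θ_ij = π_i/(π_i + π_j). The minorization-maximization (Zermelo-type) update below is the classical iterative point estimator, shown as an illustration; it is not the dissertation's sequential or Bayes procedure. The win counts are invented.

```python
def theta(pi, i, j):
    # probability that treatment i is preferred to treatment j
    return pi[i] / (pi[i] + pi[j])

def bt_mm(wins, iters=200):
    # wins[i][j] = number of times T_i beat T_j; classical MM iteration:
    # pi_i <- W_i / sum_{j != i} n_ij / (pi_i + pi_j), then renormalize
    t = len(wins)
    pi = [1.0 / t] * t
    for _ in range(iters):
        new = []
        for i in range(t):
            w_i = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                        for j in range(t) if j != i)
            new.append(w_i / denom if denom > 0 else pi[i])
        s = sum(new)
        pi = [v / s for v in new]
    return pi

wins = [[0, 8, 6],
        [2, 0, 5],
        [4, 5, 0]]
pi_hat = bt_mm(wins)
print(pi_hat)   # strengths sum to 1; T_1 is strongest on these counts
```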
 Date Issued
 1981
 Identifier
 AAI8125818, 3085061, FSDT3085061, fsu:74559
 Format
 Document (PDF)
 Title
 Identifying influential effects in factorial experiments with sixteen runs: Empirical Bayes approaches.
 Creator

Chen, ChingHsiang., Florida State University
 Abstract/Description

To identify influential effects in unreplicated (possibly fractionated) factorial experiments, the effect-sparsity assumption (Box and Meyer (1986), Technometrics 28, 11-18) has been adopted in many studies. Although this assumption has been traditionally used for outlier-detection problems, it may not be suitable for describing the effects from factorial experiments. In this research, we examine the effect-sparsity approach and propose empirical Bayes methods relaxing this assumption. The study also examines the identification of influential effects based on information about the design structure, such as the alias relationships, design resolution, and sizes of interactions. A simulation study, based primarily on the criterion of reducing the experimental cost of misidentifying factors, has been performed to compare different methods. The results show that when the number of factors is large and the factorial experiment is highly fractionated, incorporating information about the design structure into the analysis reduces the cost in a screening experiment compared to methods that do not consider design structure.
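The raw input to effect-identification methods of this kind is the set of contrast estimates from the 16-run two-level design. Here is a minimal sketch of computing main effects by orthogonal contrasts; the design size, active factors, and response values are invented, and no empirical Bayes step is shown.

```python
from itertools import product

def factorial_effects(y, k):
    # y in standard order: itertools.product varies the last factor fastest
    runs = list(product([-1, 1], repeat=k))
    effects = []
    for f in range(k):
        hi = [yi for yi, r in zip(y, runs) if r[f] == 1]
        lo = [yi for yi, r in zip(y, runs) if r[f] == -1]
        # main effect: mean response at +1 minus mean response at -1
        effects.append(sum(hi) / len(hi) - sum(lo) / len(lo))
    return effects

runs = list(product([-1, 1], repeat=4))         # 2^4 = 16 runs
y = [5.0 + 2.0 * r[0] - 1.0 * r[3] for r in runs]   # factors 1 and 4 active
print(factorial_effects(y, 4))
```

With no replication there is no pure error estimate, which is why methods such as Box-Meyer's posterior probabilities (or the empirical Bayes variants this dissertation proposes) are needed to decide which of these contrasts are influential.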
 Date Issued
 1994
 Identifier
 AAI9424751, 3088354, FSDT3088354, fsu:77159
 Format
 Document (PDF)
 Title
 ON NONPARAMETRIC ESTIMATION OF DENSITY AND REGRESSION FUNCTIONS.
 Creator

CHENG, PHILIP E., The Florida State University
 Abstract/Description

In the field of statistical estimation, nonparametric procedures have received increased attention for the past decade. In particular, various nonparametric estimates of probability density functions and regression curves have been extensively studied, with special attention to large sample pr...
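The kernel density estimate central to this area can be sketched in a few lines: f̂(x) = (1/(nh)) Σ K((x − X_i)/h). The Gaussian kernel, the bandwidth, and the data below are arbitrary choices for illustration.

```python
import math

def kde(x, data, h):
    # kernel density estimate with a standard Gaussian kernel K
    n = len(data)
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return sum(k((x - xi) / h) for xi in data) / (n * h)

data = [0.1, -0.4, 0.3, 0.0, 0.2]
print(kde(0.0, data, h=0.5))   # high: near the bulk of the sample
print(kde(5.0, data, h=0.5))   # near zero: far from every observation
```

Large-sample properties of such estimates (consistency, rates, asymptotic normality) are exactly the questions the abstract alludes to, with the bandwidth h → 0 and nh → ∞ as n grows.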
 Date Issued
 1980
 Identifier
 AAI8020329, 2989654, FSDT2989654, fsu:74161
 Format
 Document (PDF)
 Title
 A comparison of two methods of bootstrapping in a reliability model.
 Creator

Chiang, YuangChin., Florida State University
 Abstract/Description

We consider bootstrapping in the following reliability model, which was considered by Doss, Freitag, and Proschan (1987). Available for testing is a sample of iid systems, each having the same structure of m independent components. Each system is continuously observed until it fails. For every component in each system, either a failure time or a censoring time is recorded. A failure time is recorded if the component fails before or at the time of system failure; otherwise a censoring time is recorded. To estimate the distributions of the component lifelengths $F_1, \ldots, F_m$, one can formally compute the Kaplan-Meier estimates $\hat F_1, \ldots, \hat F_m$. Various quantities of interest, such as the probability that a new system will survive time $t_0$, may then be estimated by combining $\hat F_1, \ldots, \hat F_m$ in a suitable way. In this model, bootstrapping can be carried out in two different ways. One can resample n systems at random from the original n systems. Alternatively, one can construct artificial systems by generating independent random lifelengths from the Kaplan-Meier estimates $\hat F_j$, and from those form artificial data. The two methods are distinct. We show that asymptotically, bootstrapping by either method yields correct answers. We also compare the two methods via simulation studies.
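The two bootstrap schemes can be sketched for a series system (one plausible structure; the abstract does not fix one). All function names and parameter choices below are illustrative, not the dissertation's.

```python
import numpy as np

rng = np.random.default_rng(1)

def km_curve(times, events):
    """Kaplan-Meier survival estimate at the sorted observed times
    (no ties assumed, which holds for continuous lifetimes)."""
    order = np.argsort(times)
    t, d = np.asarray(times)[order], np.asarray(events)[order]
    n_at_risk = len(t) - np.arange(len(t))
    return t, np.cumprod(1.0 - d / n_at_risk)

def observe_systems(component_lifetimes):
    """Series system: the system fails at its first component failure;
    every other component is censored at the system failure time."""
    sys_fail = component_lifetimes.min(axis=1)
    obs = np.minimum(component_lifetimes, sys_fail[:, None])
    event = (component_lifetimes <= sys_fail[:, None]).astype(float)
    return obs, event

def bootstrap_systems(obs, event):
    """Method 1: resample n whole systems with replacement."""
    idx = rng.integers(0, obs.shape[0], obs.shape[0])
    return obs[idx], event[idx]

def bootstrap_artificial(obs, event):
    """Method 2: draw each component lifelength independently from its
    Kaplan-Meier estimate (leftover mass placed on the largest observed
    time), then re-impose the series-system censoring."""
    n, m = obs.shape
    comp = np.empty((n, m))
    for j in range(m):
        t, surv = km_curve(obs[:, j], event[:, j])
        cdf = 1.0 - surv
        idx = np.searchsorted(cdf, rng.random(n)).clip(max=len(t) - 1)
        comp[:, j] = t[idx]
    return observe_systems(comp)

# Simulate 200 two-component series systems with exponential lifetimes
obs, event = observe_systems(rng.exponential([1.0, 2.0], size=(200, 2)))
boot1 = bootstrap_systems(obs, event)
boot2 = bootstrap_artificial(obs, event)
```

Either resampled dataset can then be pushed through the same Kaplan-Meier machinery to get a bootstrap distribution for a quantity such as the system survival probability at a fixed time.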
 Date Issued
 1988
 Identifier
 AAI8906216, 3161719, FSDT3161719, fsu:77918
 Format
 Document (PDF)
 Title
 Ridge regression: Application to educational data.
 Creator

Churngchow, Chidchanok., Florida State University
 Abstract/Description

Ridge regression is a regression technique developed to remedy the problem of multicollinearity in regression analysis. The major problem with multicollinearity is that it causes high variances in the estimates of the regression coefficients. The ridge model introduces some bias into the regression equation in order to reduce the variance of the estimators. The purposes of this study were to demonstrate the application of the ridge regression model to educational data and to compare the characteristics and performance of the ridge method and the least squares method. In this study, four types of ridge regression were compared to the least squares method: ridge trace, generalized, ordinary, and directed ridge. The sample for this study consisted of 141 public schools in Dade County, Florida. The dependent variables were the students' average scores in mathematical computation and reading comprehension. Six variables representing teacher and student characteristics were employed as the predictors. The performance of ridge and least squares was compared in terms of the confidence interval of an individual estimator and predictive accuracy for the whole model. Since statistical inference for the ridge method has not been completely developed, the bootstrap technique, with a sample size of twenty, was used to calculate the confidence interval of each estimator. The study resulted in a successful application of ridge regression to school-level data, in which it was found that (1) ridge regression yielded a smaller confidence interval for every estimated regression coefficient and (2) ridge regression produced higher predictive accuracy than ordinary least squares. Since the results are based on one particular set of data, it cannot be guaranteed that ridge always outperforms the least squares method in all cases.
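A minimal simulation, with illustrative names and an arbitrary ridge penalty, shows the variance effect described above: with nearly collinear predictors, ridge coefficient estimates vary far less across replications than least squares estimates.

```python
import numpy as np

rng = np.random.default_rng(2)

def ols(X, y):
    # Ordinary least squares via the normal equations
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, lam):
    # Ridge adds lam * I to X'X, stabilizing the near-singular inverse
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta = np.array([1.0, 1.0])
ols_draws, ridge_draws = [], []
for _ in range(500):
    x1 = rng.normal(size=100)
    x2 = x1 + 0.05 * rng.normal(size=100)   # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = X @ beta + rng.normal(size=100)
    ols_draws.append(ols(X, y))
    ridge_draws.append(ridge(X, y, lam=1.0))

# Sampling variability of each coefficient across the 500 replications
ols_sd = np.std(ols_draws, axis=0)
ridge_sd = np.std(ridge_draws, axis=0)
```

The ridge standard deviations come out far below the least squares ones here, at the price of some shrinkage bias, which is exactly the trade-off the abstract describes.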
 Date Issued
 1988
 Identifier
 AAI8805652, 3086742, FSDT3086742, fsu:76217
 Format
 Document (PDF)
 Title
 A hypothesis test of cumulative sums of multinomial parameters.
 Creator

Clair, James Hunter., Florida State University
 Abstract/Description

Consider $N$ times to repair, $T_1, T_2, \ldots, T_N$, from a repair time distribution function $F(\cdot)$. Let $p_{01}, p_{02}, \ldots, p_{0K}$ be $K$ proportions with $\sum_{\nu=1}^{K} p_{0\nu} < 1$. We wish to have at least $100(\sum_{\nu=1}^{i} p_{0\nu})\%$ of items repaired by time $L_i$, $1 \le i \le K$, $K \ge 2$. Denote the unknown quantity $F(L_i) - F(L_{i-1})$ by $p_i$, $1 \le i \le K$. Thus we wish to test the hypothesis (UNFORMATTED TABLE OR EQUATION OMITTED). A simple procedure is to test this hypothesis with the $K$ statistics $N_1, \sum_{\nu=1}^{2} N_\nu, \ldots, \sum_{\nu=1}^{K} N_\nu$, where $\sum_{\nu=1}^{i} N_\nu$ is the number of repairs that take place on or before $L_i$, $1 \le i \le K$. Each $\sum_{\nu=1}^{i} N_\nu$ is a binomial random variable with unknown parameter $\sum_{\nu=1}^{i} p_\nu$. The hypothesis $H_0$ is rejected if any of the $\sum_{\nu=1}^{i} N_\nu \le n_i^0$, where the $n_i^0$ are chosen from binomial tables. This test is shown to have several deficiencies. We construct an alternative procedure with which to test this hypothesis. The Generalized Likelihood Ratio Test (GLRT) statistic is based on the multinomial random variable $(N_1, N_2, \ldots, N_K)$, with parameter $(p_1, p_2, \ldots, p_K)$. The parameter space is (UNFORMATTED TABLE OR EQUATION OMITTED). An algorithm is constructed and computer code supplied to calculate $\lambda(N)$ efficiently for any finite $N$. For small samples, computer code is given to calculate exactly $\delta$ or a p-value for an observed value of $\lambda(N(K))$, $2 \le K \le 5$ and $K \le N \le N(K)$. For large $N$, we apply a theorem of Feder (1968) to evaluate the asymptotic critical values and power. The GLRT statistic, $\lambda(N)$, is shown to be approximately a union-intersection test and thus is approximated by a collection of uniformly most powerful unbiased tests of binomial parameters. The GLRT is shown empirically, in the case of $K = 3$, to have higher power than competing union-intersection tests. Two power estimation techniques are described and compared empirically. Reference: Feder, Paul J. (1968), "On the distribution of the log likelihood ratio test statistic when the true parameter is 'near' the boundaries of the hypothesis region," Annals of Mathematical Statistics, 39, 2044-2055.
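The "simple procedure" the abstract criticizes can be sketched directly (the GLRT itself needs the constrained multinomial MLE, which is beyond a short example). The helper names and the worked numbers below are ours.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(Bin(n, p) <= x), computed from the exact binomial probabilities."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def critical_value(n, p0, level):
    """Largest x with P(Bin(n, p0) <= x) <= level, i.e. the n_i^0 one would
    read from binomial tables; returns -1 if no such x exists."""
    x = -1
    while binom_cdf(x + 1, n, p0) <= level:
        x += 1
    return x

def simple_test(cum_counts, n, cum_p0, level):
    """Reject H0 if any cumulative repair count sum_{nu<=i} N_nu falls at
    or below its binomial critical value."""
    return any(c <= critical_value(n, p, level)
               for c, p in zip(cum_counts, cum_p0))

# Illustration: n = 50 repairs, targets of 50% by L1 and 80% by L2,
# each marginal binomial test run at level 0.05
rejected = simple_test([15, 35], n=50, cum_p0=[0.5, 0.8], level=0.05)
```

Because each cumulative count is tested marginally, the overall size of this procedure is hard to control, which is one of the deficiencies that motivates the GLRT alternative.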
 Date Issued
 1988
 Identifier
 AAI8822443, 3161637, FSDT3161637, fsu:77837
 Format
 Document (PDF)
 Title
 TWOWAY CLUSTER ANALYSIS WITH NOMINAL DATA.
 Creator

COOPER, PAUL GAYLORD., Florida State University
 Abstract/Description

Consider an M by N data matrix X whose elements may assume values 0, 1, 2, ..., H. Denote the rows of X by $\alpha_1, \alpha_2, \ldots, \alpha_M$. A tree on the rows of X is a sequence of distinct partitions $\{P_i\}_{i=1}^{k}$ such that: (a) $P_1 = \{(\alpha_1), \ldots, (\alpha_M)\}$, (b) $P_i$ is a refinement of $P_{i+1}$ for $i = 1, \ldots, k-1$, and (c) $P_k = \{(\alpha_1, \ldots, \alpha_M)\}$. The two-way clustering problem consists of simultaneously constructing trees on the rows, columns, and elements of X. A generalization of a two-way joining algorithm (TWJA) introduced by J. A. Hartigan (1975) is used to construct the three trees. The TWJA requires the definition of measures of dissimilarity between row clusters and between column clusters, respectively. Two approaches are used in the construction of these dissimilarity coefficients: one based on intuition and one based on a formal prediction model. For matrices with binary elements (0 or 1), measures of dissimilarity between row or column clusters are based on the number of mismatching pairs. Consider two distinct row clusters $R_p$ and $R_q$ containing $m_p$ and $m_q$ rows, respectively. One measure of dissimilarity between them, $d_0(R_p, R_q)$, is (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI), where $b_{p\beta}$ and $b_{q\beta}$ are the numbers of ones in column $\beta$ of clusters $R_p$ and $R_q$, respectively. Two additional intuitive dissimilarity coefficients are also defined and studied. For matrices containing nominal-level data, dissimilarity coefficients are based on a formal prediction model. Analogous to the procedure of Cleveland and Relles (1974), for a given data matrix the model consists of a scheme for random selection of two rows (or columns) from the matrix and an identification rule for distinguishing between the two rows (or columns). A loss structure is defined for both rows and columns, and the expected loss due to incorrect row or column identification is computed. The dissimilarity between two (say) row clusters is then defined to be the increase in expected loss due to joining those two row clusters into a single cluster. Stopping criteria are suggested for both the intuitive and prediction-model approaches. For the intuitive approach, it is suggested that joining be stopped when the dissimilarity between the (say) row clusters to be joined next exceeds that expected by chance under the assumption that the (say) column totals of the matrix are fixed. For the prediction-model approach, the stopping criterion is based on a cluster prediction model in which the objective is to distinguish between row or column clusters. A cluster identification rule is defined based on the information in the partitioned data matrix, and the expected loss due to incorrect cluster identification is computed. The expected cluster loss is also computed when cluster identification is based on strict randomization. The relative decrease in expected cluster loss due to identification based on the partitioned matrix versus that based on randomization is suggested as a stopping criterion. Both contrived and real data examples are used to illustrate and compare the two clustering procedures. Computational aspects of the procedure are discussed, and it is concluded that the intuitive approach is less costly in terms of computation time. Further, five admissibility properties are defined and, for certain intuitive dissimilarity coefficients, the trees produced by the TWJA are shown to possess three of the five properties.
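The exact coefficient $d_0$ is omitted from the abstract, so the sketch below is an illustrative stand-in built from the same ingredients (the column one-counts $b_{p\beta}$, $b_{q\beta}$): the expected number of per-column (0,1) mismatches between a row drawn from each cluster.

```python
import numpy as np

def mismatch_dissimilarity(Rp, Rq):
    """Average number of mismatching (0,1) pairs, summed over columns,
    between a row drawn at random from binary cluster Rp and one drawn
    from Rq.  This is one natural coefficient of the kind described; the
    dissertation's exact d0 is not given in the abstract."""
    mp, mq = Rp.shape[0], Rq.shape[0]
    bp, bq = Rp.sum(axis=0), Rq.sum(axis=0)   # ones per column in each cluster
    # For column beta: a (1 in Rp, 0 in Rq) pair or a (0, 1) pair mismatches
    mismatches = bp * (mq - bq) + bq * (mp - bp)
    return mismatches.sum() / (mp * mq)
```

Identical single-row clusters give 0; fully complementary rows give the number of columns.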
 Date Issued
 1980
 Identifier
 AAI8026123, 3084693, FSDT3084693, fsu:74194
 Format
 Document (PDF)
 Title
 STOCHASTIC VERSIONS OF REARRANGEMENT INEQUALITIES WITH APPLICATIONS TO STATISTICS.
 Creator

D'ABADIE, CATHERINE ANNE., Florida State University
 Abstract/Description

In this dissertation we develop a theory which offers a unified approach to the problem of obtaining stochastic versions of deterministic rearrangement inequalities. To develop the theory we first define two new classes of functions and establish preservation properties of these functions under various statistical and mathematical operations. Next we introduce the notion of stochastically similarly arranged (SSA) pairs of random vectors. We prove that if the random vectors (X, Y) are SSA and the function f from $R^n \times R^n$ into $R^n$ is monotone with respect to a certain partial ordering on $R^n \times R^n$, then for every permutation $\pi$ the stochastic inequalities (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI) hold. This result yields a unified way of obtaining stochastic versions of rearrangement inequalities. We then show that many multivariate densities of interest in statistical practice govern pairs of random vectors which are SSA. Next we show that the SSA property is preserved under certain statistical operations on pairs of SSA random vectors. For example, we show that the rank order of SSA random variables is SSA. We also show that the SSA property is preserved under certain contamination models. Finally, we show how the results we obtain can be applied to problems in hypothesis testing.
 Date Issued
 1981
 Identifier
 AAI8205717, 3085181, FSDT3085181, fsu:74676
 Format
 Document (PDF)
 Title
 ESTIMATING MULTIDIMENSIONAL TABLES FROM SURVEY DATA: PREDICTING MAGAZINE AUDIENCES.
 Creator

DANAHER, PETER JOSEPH., Florida State University
 Abstract/Description

Suppose an advertiser constructs an advertising campaign by placing k advertisements in a magazine. He then estimates the proportion of the population which sees none, one, or up to all k advertisements (called the exposure distribution). Several criteria for evaluating the effectiveness of the campaign can be obtained directly from the exposure distribution. Two of them are reach, the proportion of the population which is exposed to at least one of the advertisements, and effective reach, the mean of the exposure distribution. We develop three exposure distribution models for the cases where advertising campaigns are comprised of one, two, or three or more magazines. The models build on each other in that the model for one magazine is used to improve the fit of the model for two magazines, and the model for two magazines is used to estimate the parameters of the model for three or more magazines. A thorough empirical test, using the AGB:McNair "National Media Survey", shows that each of our models outperforms the best currently available models. In addition, the three models are proved to have optimal asymptotic properties. The models are used to select a media schedule which maximizes either reach or effective reach subject to a budget constraint. A monotonicity property of reach and effective reach yields an algorithm for optimizing both that greatly reduces computation time over conventional methods used to solve integer programming problems. It is more useful to estimate the proportion of the population which sees the advertisements in a magazine rather than the proportion which sees the magazine. Often, however, no advertisement recall data are available, so we are forced to estimate the proportion which is exposed to just the magazines. If advertisement recall data are available, we give a natural and simple adjustment of the original magazine exposure data to obtain advertisement exposure data. Our models also give an excellent fit to these adjusted exposure data.
 Date Issued
 1987
 Identifier
 AAI8721837, 3086665, FSDT3086665, fsu:76140
 Format
 Document (PDF)
 Title
 Ultrafast Lattice Dynamics in Metal Thin Films and NanoParticles.
 Creator

Wang, Xuan, Cao, Jim, Yang, Wei, Bonesteel, Nicholas, Riley, Mark, Xiong, Peng, Department of Physics, Florida State University
 Abstract/Description

This thesis presents the development of the 3rd-generation femtosecond diffractometer (FED) in Professor Jim Cao's group and its application to the study of ultrafast structural dynamics of solid state materials. The 3rd-generation FED surpasses its predecessor and other similar FED instruments through a DC electron gun that can generate much higher-energy electron pulses and a more efficient imaging system. This combination, together with miscellaneous improvements, significantly boosts the signal-to-noise ratio and thus enables us to study more complex solid state materials. Two main thrusts are discussed in detail in this thesis. The first is the dynamics of coherent phonon generation by ultrafast heating in gold thin films and nanoparticles, which emphasizes the electronic thermal stress. The other is the ultrafast dynamics in nickel, which shows that the mutual interactions among the lattice, spin, and electron subsystems can significantly alter the ultrafast lattice dynamics. In these studies, we exploit the advantage of the FED instrument as an ideal tool that can directly and simultaneously monitor the coherent and random motion of the lattice.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1247
 Format
 Thesis
 Title
 AP Student Visual Preferences for Problem Solving.
 Creator

Swoyer, Liesl, Department of Statistics
 Abstract/Description

The purpose of this study is to explore the mathematical preferences of high school AP Calculus students by examining their tendencies toward differing modes of thought. A student's preferred mode of thinking was measured on a scale ranging from a preference for analytical thought to a preference for visual thought as they completed derivative and antiderivative tasks presented both algebraically and graphically. This relates to previous studies by continuing to analyze the factors that have been found to mediate students' performance and preference on a variety of calculus tasks. Data were collected by Dr. Erhan Haciomeroglu at the University of Central Florida. Students' preferences were not affected by gender. Students were found to approach graphical and algebraic tasks similarly, without any significant change with regard to the derivative or antiderivative nature of the tasks. Highly analytic and highly visual students revealed the same proportion of change in visuality as harmonic students when more difficult calculus tasks were encountered. Thus, a strong preference for visual thinking when completing algebraic tasks was not the determining factor of their preferred method of thinking when approaching graphical tasks.
 Date Issued
 2012
 Identifier
 FSU_migr_uhm0052
 Format
 Thesis
 Title
 Analysis of Multivariate Data with Random Cluster Size.
 Creator

Li, Xiaoyun, Sinha, Debajyoti, Zhou, Yi, McGee, Dan, Lipsitz, Stuart, Department of Statistics, Florida State University
 Abstract/Description

In this dissertation, we examine correlated binary data with a present/absent component or with missing data related to the binary responses of interest. Depending on the data structure, correlated binary data can be referred to as clustered data if the sampling unit is a cluster of subjects, or as longitudinal data when it involves repeated measurements of the same subject over time. We propose novel models for these two data structures and illustrate them with real data applications. In biomedical studies involving clustered binary responses, the cluster size can vary because some components of the cluster can be absent. When both the presence of a cluster component and the binary disease status of a present component are treated as responses of interest, we propose a novel two-stage random effects logistic regression framework. For ease of interpretation of the regression effects, both the marginal probability of presence/absence of a component and the conditional probability of disease status of a present component preserve approximate logistic regression forms. We present a maximum likelihood method of estimation implementable using standard statistical software. We compare our models and the physical interpretation of the regression effects with competing methods from the literature. We also present a simulation study to assess the robustness of our procedure to wrong specification of the random effects distribution and to compare the finite sample performance of the estimates with existing methods. The methodology is illustrated by analyzing a study of periodontal health status in a diabetic Gullah population. We extend this model to longitudinal studies with binary longitudinal responses and informative missing data. In longitudinal studies, when treating each subject as a cluster, the cluster size is the total number of observations for each subject. When data are informatively missing, the cluster size of each subject can vary and is related to the binary response of interest, and we are also interested in the missing-data mechanism. This is a modified version of the clustered binary data situation with present/absent components. We modify and adopt our proposed two-stage random effects logistic regression model so that both the marginal probabilities of the binary response and the missingness indicator and the corresponding conditional probabilities preserve logistic regression forms. We present a Bayesian framework for this model and illustrate it on an AIDS data example.
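The informative-cluster-size setting can be illustrated with a small simulation. The parameterization below is ours, not the dissertation's two-stage model: a single shared random effect drives both whether a component is present and its disease status, which is exactly what makes cluster size carry information about the response.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_clusters(n_clusters, max_size, alpha, beta, sigma_b):
    """Generate clustered binary data with informative cluster size
    (illustrative structure only): for cluster i, a random effect b_i
    raises or lowers both the probability a component is present
    (logit alpha + b_i) and, for present components, the probability
    of disease (logit beta + b_i)."""
    def expit(x):
        return 1.0 / (1.0 + np.exp(-x))

    records = []
    for i in range(n_clusters):
        b = rng.normal(0.0, sigma_b)
        present = rng.random(max_size) < expit(alpha + b)
        disease = rng.random(max_size) < expit(beta + b)
        for j in range(max_size):
            if present[j]:                     # absent components are unobserved
                records.append((i, j, int(disease[j])))
    return records

sample = simulate_clusters(200, 6, alpha=0.5, beta=-1.0, sigma_b=1.0)
```

Because both probabilities share $b_i$, large clusters tend to have higher disease prevalence, so an analysis that ignores cluster size would be biased; this is the coupling the two-stage model is built to handle.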
 Date Issued
 2011
 Identifier
 FSU_migr_etd1425
 Format
 Thesis
 Title
 A Statistical Approach for Information Extraction of Biological Relationships.
 Creator

Bell, Lindsey R., Zhang, Jinfeng, Niu, Xufeng, Tyson, Gary, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Vast amounts of biomedical information are stored in the scientific literature, easily accessed through publicly available databases. Relationships among biomedical terms constitute a major part of our biological knowledge. Acquiring such structured information from unstructured literature can be done through human annotation, but this is time- and resource-consuming. As this content continues to grow rapidly, the popularity and importance of text mining for obtaining information from unstructured text become increasingly evident. Text mining has four major components. First, relevant articles are identified through information retrieval (IR); next, important concepts and terms are flagged using entity recognition (ER); and then relationships between these entities are extracted from the literature in a process called information extraction (IE). Finally, text mining takes these elements and seeks to synthesize new information from the literature. Our goal is information extraction from unstructured literature concerning biological entities. To do this, we use the structure of triplets, where each triplet contains two biological entities and one interaction word. The biological entities may include terms such as protein names, disease names, genes, and small molecules. Interaction words describe the relationship between the biological terms. Under this framework we aim to combine the strengths of three classifiers in an ensemble approach. The three classifiers we consider are Bayesian networks, support vector machines, and a mixture of logistic models defined by interaction word. The three classifiers and the ensemble approach are evaluated on three benchmark corpora and one corpus that is introduced in this study. The evaluation includes cross-validation and cross-corpus validation to replicate an application scenario. The three classifiers are distinct, and we find that the performance of the individual classifiers varies depending on the corpus. An ensemble of classifiers therefore removes the need to choose one classifier and provides optimal performance.
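The ensemble idea can be sketched with a simple unweighted majority vote; the dissertation's actual combination rule may differ, and the classifier predictions below are made-up placeholders.

```python
import numpy as np

def majority_vote(*prediction_sets):
    """Combine binary (0/1) predictions from several classifiers: a triplet
    is labelled positive when more than half of the classifiers say so."""
    votes = np.sum(prediction_sets, axis=0)
    return (votes > len(prediction_sets) / 2).astype(int)

# Hypothetical predictions from the Bayesian network, SVM, and
# logistic-mixture classifiers on five candidate triplets
bn  = np.array([1, 0, 1, 1, 0])
svm = np.array([1, 1, 0, 1, 0])
lm  = np.array([0, 0, 1, 1, 1])
combined = majority_vote(bn, svm, lm)
```

With an odd number of classifiers the vote is never tied, so each triplet gets a definite label even when the individual classifiers disagree.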
 Date Issued
 2011
 Identifier
 FSU_migr_etd1314
 Format
 Thesis
 Title
 Variable Selection of Correlated Predictors in Logistic Regression: Investigating the DietHeart Hypothesis.
 Creator

Thompson, Warren R. (Warren Robert), McGee, Daniel, Eberstein, Isaac, Huffer, Fred, Sinha, Debajyoti, She, Yiyuan, Department of Statistics, Florida State University
 Abstract/Description

Variable selection is an important aspect of modeling. Its aim is to distinguish between the authentic variables, which are important in predicting the outcome, and the noise variables, which possess little to no predictive value. In other words, the goal is to find the variables that collectively best explain and predict changes in the outcome variable. The variable selection problem is exacerbated when correlated variables are included in the covariate set. This dissertation examines the variable selection problem in the context of logistic regression. Specifically, we investigated the merits of the bootstrap, ridge regression, the lasso, and Bayesian model averaging (BMA) as variable selection techniques when highly correlated predictors and a dichotomous outcome are considered. This dissertation also contributes to the literature on the diet-heart hypothesis. The diet-heart hypothesis has been around since the early twentieth century. Since then, researchers have attempted to isolate the nutrients in diet that promote coronary heart disease (CHD). After a century of research, there is still no consensus. In our current research, we used some of the more recent statistical methodologies mentioned above to investigate the effect of twenty dietary variables on the incidence of coronary heart disease. Logistic regression models were generated for the data from the Honolulu Heart Program, a study of CHD incidence in men of Japanese descent. Our results were largely method-specific. However, regardless of the method considered, there was strong evidence to suggest that alcohol consumption has a strong protective effect on the risk of coronary heart disease. Of the variables considered, dietary cholesterol and caffeine were the only variables that, at best, exhibited a moderately strong harmful association with CHD incidence. Further investigation that includes a broader array of food groups is recommended.
 Date Issued
 2009
 Identifier
 FSU_migr_etd1360
 Format
 Thesis
 Title
 Some New Methods for Design and Analysis of Survival Data.
 Creator

Wang, Wenting, Sinha, Debajyoti, Arjmandi, Bahram H., McGee, Dan, Niu, Xufeng, Yu, Kai, Department of Statistics, Florida State University
 Abstract/Description

For survival outcomes, statistical equivalence tests to show a new treatment therapeutically equivalent to a standard treatment are usually based on the Cox (1972) proportional hazards assumption. We present an alternative method based on the linear transformation model (LTM) for two treatment arms, and show the advantages of using this equivalence test instead of tests based on Cox's model. The LTM is a very general class of models, including models such as the proportional odds survival model (POSM). We present a sufficient condition for checking whether log-rank-based tests have inflated Type I error rates. We show that the POSM and some other commonly used survival models within the LTM class all satisfy this condition. Simulation studies show that using our test in place of log-rank-based tests is the safer statistical practice. Our second goal is to develop a practical Bayesian model for survival data with a high dimensional covariate vector. We develop the Information Matrix (IM) and Information Matrix Ridge (IMR) priors for commonly used survival models, including Cox's model and the cure rate model proposed by Chen et al. (1999), and examine many desirable theoretical properties, including sufficient conditions for the existence of the moment generating functions of these priors and the corresponding posterior distributions. The performance of these priors in practice is compared with that of some competing priors via a Bayesian analysis of a study investigating the relationship between lung cancer survival time and a large number of genetic markers.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1248
 Format
 Thesis
 Title
 Bayesian Generalized Polychotomous Response Models and Applications.
 Creator

Yang, Fang, Niu, XuFeng, Johnson, Suzanne B., McGee, Dan, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Polychotomous quantal response models are widely used in medical and econometric studies to analyze categorical or ordinal data. In this study, we apply Bayesian methodology through a mixed-effects polychotomous quantal response model. For the Bayesian polychotomous quantal response model, we assume uniform improper priors for the regression coefficients and explore sufficient conditions for a proper joint posterior distribution of the parameters in the model. Simulation results from Gibbs sampling estimates are compared to traditional maximum likelihood estimates to show the strength of using uniform improper priors for the regression coefficients. Motivated by an investigation of the relationship between BMI categories and several risk factors, we carry out application studies to examine the impact of risk factors on BMI categories, especially the categories "Overweight" and "Obese". By applying the mixed-effects Bayesian polychotomous response model with uniform improper priors, we obtain interpretations of the association between risk factors and BMI that are similar to findings in the literature.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1092
 Format
 Thesis
 Title
 Covariance on Manifolds.
 Creator

Balov, Nikolay H. (Nikolay Hristov), Srivastava, Anuj, Klassen, Eric, Patrangenaru, Victor, McGee, Daniel, Department of Statistics, Florida State University
 Abstract/Description

With the ever-increasing complexity of observational and theoretical data models, the sufficiency of classical statistical techniques, designed to be applied only to vector quantities, is being challenged. Nonlinear statistical analysis has become an area of intensive research in recent years. Despite impressive progress in this direction, a unified and consistent framework has not been reached. In this regard, the following work is an attempt to improve our understanding of random phenomena on non-Euclidean spaces. More specifically, the motivating goal of the present dissertation is to generalize the notion of distribution covariance, which in standard settings is defined only in Euclidean spaces, to arbitrary manifolds with a metric. We introduce a tensor field structure, named the covariance field, that is consistent with the heterogeneous nature of manifolds. It not only describes the variability imposed by a probability distribution but also provides alternative distribution representations. The covariance field combines the distribution density with geometric characteristics of its domain and thus fills the gap between the two. We present some of the properties of covariance fields and argue that they can be successfully applied to various statistical problems. In particular, we provide a systematic approach for defining parametric families of probability distributions on manifolds, parameter estimation for regression analysis, nonparametric statistical tests for comparing probability distributions, and interpolation between such distributions. We then present several application areas where this new theory may have potential impact. One of them is the branch of directional statistics, with a domain of influence ranging from the geosciences to medical image analysis. The fundamental level at which the covariance-based structures are introduced also opens a new area for future research.
 Date Issued
 2009
 Identifier
 FSU_migr_etd1045
 Format
 Thesis
 Title
 A Study of the Asymptotic Properties of Lasso Estimates for Correlated Data.
 Creator

Gupta, Shuva, Bunea, Florentina, Gert, Joshua, Hollander, Myles, Wegkamp, Marten, Department of Statistics, Florida State University
 Abstract/Description

In this thesis we investigate post-model-selection properties of L1-penalized weighted least squares estimators in regression models with a large number of variables M and correlated errors. We focus on correct subset selection and on the asymptotic distribution of the penalized estimators. In the simple case of AR(1) errors, we give conditions under which correct subset selection can be achieved via our procedure. We then provide a detailed generalization of this result to models with errors that have a weak-dependency structure (Doukhan 1996). In all cases, the number M of regression variables is allowed to exceed the sample size n. We further investigate the asymptotic distribution of our estimates when M < n, and show that under appropriate choices of the tuning parameters the limiting distribution is multivariate normal. This generalizes to the case of correlated errors the result of Knight and Fu (2000), obtained for regression models with independent errors.
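The subset-selection property discussed above comes from the lasso's soft-thresholding mechanism, which sets small coefficients exactly to zero. A minimal sketch, assuming (purely for illustration) an orthonormal design rather than the weighted procedure or correlated-error setting studied in the thesis; the coefficient values are invented:

```python
def soft_threshold(b, lam):
    """Lasso coefficient update for an orthonormal design: shrink the
    least-squares coefficient b toward zero by lam, and set it exactly
    to zero when |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

# OLS coefficients: two true signals plus two noise-level coefficients.
ols = [2.5, -0.3, 0.05, 1.1]
lasso = [soft_threshold(b, 0.5) for b in ols]
selected = [j for j, b in enumerate(lasso) if b != 0.0]  # recovered support
```

With the tuning parameter 0.5, only indices 0 and 3 survive; correct subset selection, in the sense of the thesis, amounts to choosing the tuning parameter so that this separation between signal and noise coefficients occurs with probability tending to one.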
 Date Issued
 2009
 Identifier
 FSU_migr_etd3896
 Format
 Thesis
 Title
 Nonparametric Estimation of Three Dimensional Projective Shapes with Applications in Medical Imaging and in Pattern Recognition.
 Creator

Crane, Michael, Patrangenaru, Victor, Liu, Xiuwen, Huffer, Fred W., Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

This dissertation is on the analysis of invariants of a 3D configuration from its 2D images in pictures of this configuration, without requiring any restriction on the camera positioning relative to the scene pictured. We briefly review some of the main results found in the literature. The methodology used is nonparametric and manifold based, combined with standard computer vision reconstruction techniques. More specifically, we use asymptotic results for the extrinsic sample mean and the extrinsic sample covariance to construct bootstrap confidence regions for mean projective shapes of 3D configurations. Chapters 4, 5 and 6 contain new results. In Chapter 4, we develop tests for coplanarity. Chapter 5 is on the reconstruction of 3D polyhedral scenes, including texture, from arbitrary partial views. In Chapter 6, we develop a nonparametric methodology for estimating the mean change for matched samples on a Lie group. We then notice that for k ≥ 4, a manifold of projective shapes of k-ads in general position in 3D has the structure of a 3k − 15 dimensional Lie group (P-Quaternions) that is equivariantly embedded in a Euclidean space; therefore, testing for mean 3D projective shape change amounts to a one-sample test for extrinsic mean P-Quaternion objects. The Lie group technique leads to a large-sample and nonparametric bootstrap test for one population extrinsic mean on a projective shape space, as recently developed by Patrangenaru, Liu and Sughatadasa. On the other hand, in the absence of occlusions, the 3D projective shape of a spatial configuration can be recovered from a stereo pair of images, thus allowing us to test for mean glaucomatous 3D projective shape change from standard stereo pairs of eye images.
 Date Issued
 2010
 Identifier
 FSU_migr_etd4607
 Format
 Thesis
 Title
 A Probabilistic and Graphical Analysis of Evidence in O.J. Simpson's Murder Case Using Bayesian Networks.
 Creator

Olumide, Kunle, Huffer, Fred, Shute, Valerie, Sinha, Debajyoti, Niu, Xufeng, Logan, Wayne, Department of Statistics, Florida State University
 Abstract/Description

This research work is an attempt to illustrate the versatility and wide applicability of statistical science. Specifically, it involves the application of statistics to the field of law. The application focuses on the subfields of evidence and criminal law, using one of the most celebrated cases in the history of American jurisprudence: the 1994 O.J. Simpson murder case in California. Our task here is to carry out a probabilistic and graphical analysis of the body of evidence in this case using Bayesian networks. We begin the analysis by first constructing our main hypothesis regarding the guilt or non-guilt of the accused; this main hypothesis is supplemented by a series of ancillary hypotheses. Using graphs and probability concepts, we evaluate the probative force, or strength, of the evidence and how well the body of evidence at hand proves our main hypothesis. We employ Bayes' rule, likelihoods, and likelihood ratios to carry out this evaluation. Some sensitivity analyses are carried out by varying our prior beliefs, or probabilities, and evaluating the effect of such variations on the conclusions regarding our main hypothesis.
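The Bayes rule and likelihood-ratio machinery described above can be sketched in a few lines: posterior odds equal prior odds times the product of the likelihood ratios of the evidence items. This toy illustration assumes conditionally independent evidence, and the prior and likelihood-ratio values are invented, not figures from the case analysis:

```python
def posterior_prob(prior, likelihood_ratios):
    """Posterior P(H | evidence) via the odds form of Bayes' rule:
    posterior odds = prior odds * product of likelihood ratios."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr          # each evidence item scales the odds by its LR
    return odds / (1.0 + odds)

# A sensitivity check: the same evidence items under two different priors.
weak_prior = posterior_prob(0.10, [4.0, 2.5])    # LRs > 1 favor the hypothesis
strong_prior = posterior_prob(0.50, [4.0, 2.5])
```

Varying the prior while holding the likelihood ratios fixed, as in the last two lines, is exactly the kind of sensitivity analysis the abstract describes.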
 Date Issued
 2010
 Identifier
 FSU_migr_etd2287
 Format
 Thesis