Current Search: Research Repository » Statistics » Huffer, Fred W. (Fred William)
Search results
 Title
 Bayesian Modeling and Variable Selection for Complex Data.
 Creator

Li, Hanning, Pati, Debdeep, Huffer, Fred W. (Fred William), Kercheval, Alec N., Sinha, Debajyoti, Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

As we routinely encounter high-throughput datasets in complex biological and environmental research, developing novel models and methods for variable selection has received widespread attention. In this dissertation, we address a few key challenges in Bayesian modeling and variable selection for high-dimensional data with complex spatial structures. a) Most Bayesian variable selection methods are restricted to mixture priors having separate components for characterizing the signal and the noise. However, such priors encounter computational issues in high dimensions. This has motivated continuous shrinkage priors, which resemble the two-component priors while facilitating computation and interpretability. While such priors are widely used for estimating high-dimensional sparse vectors, selecting a subset of variables remains a daunting task. b) Spatial/spatio-temporal data sets with complex structures are now commonly encountered in scientific fields ranging from atmospheric science and forestry to environmental, biological, and social science. Selecting important spatial variables that have significant influence on the occurrence of events is necessary and essential for providing insights to researchers. Self-excitation, the feature whereby the occurrence of an event increases the likelihood of further occurrences of the same type of event nearby in time and space, can be found in many natural and social events. Research on modeling data with the self-excitation feature has drawn increasing interest recently. However, the existing literature on self-exciting models that include high-dimensional spatial covariates is still underdeveloped. c) The Gaussian process is among the most powerful model frameworks for spatial data. Its major bottleneck is the computational complexity stemming from inversion of the dense matrices associated with a Gaussian process covariance. Hierarchical divide-and-conquer Gaussian process models have been investigated for ultra-large data sets. However, the computation required to scale the distributed computing algorithm to a large number of subgroups poses a serious bottleneck. In Chapter 2 of this dissertation, we propose a general approach for variable selection with shrinkage priors. The presence of very few tuning parameters makes our method attractive in comparison to ad hoc thresholding approaches. The applicability of the approach is not limited to continuous shrinkage priors; it can be used along with any shrinkage prior. Theoretical properties for near-collinear design matrices are investigated, and the method is shown to perform well in a wide range of synthetic data examples and in a real data example on selecting genes affecting survival due to lymphoma. In Chapter 3, we propose a new self-exciting model that allows the inclusion of spatial covariates. We develop algorithms that are effective in obtaining accurate estimation and variable selection results in a variety of synthetic data examples. Our proposed model is applied to Chicago crime data, where the influence of various spatial features is investigated. In Chapter 4, we focus on a hierarchical Gaussian process regression model for ultra-high-dimensional spatial datasets. By evaluating the latent Gaussian process on a regular grid, we propose an efficient computational algorithm based on circulant embedding. The latent Gaussian process borrows information across multiple subgroups, thereby yielding more accurate prediction. The hierarchical model and the proposed algorithm are studied through simulation examples.
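The circulant-embedding idea mentioned in this abstract can be illustrated in a few lines. The sketch below is a generic textbook version, not the dissertation's algorithm: a stationary Gaussian process observed on a regular 1-D grid is simulated in O(n log n) by embedding its Toeplitz covariance matrix in a circulant matrix and diagonalizing that matrix with the FFT. The grid size and squared-exponential covariance are illustrative assumptions.

```python
import numpy as np

def circulant_embedding_sample(n, cov, rng):
    """Sample a stationary GP on grid points 0..n-1 with covariance cov(h)."""
    m = 2 * n                                # embedding size (assumed large enough
                                             # for a nonnegative spectrum)
    # First row of the circulant embedding of the Toeplitz covariance matrix.
    row = np.array([cov(min(k, m - k)) for k in range(m)])
    lam = np.fft.fft(row).real               # eigenvalues of the circulant matrix
    lam = np.clip(lam, 0.0, None)            # guard against tiny negative values
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    w = np.fft.fft(np.sqrt(lam / m) * z)     # complex field with covariance 2C
    return w.real[:n]                        # real part: one GP sample on the grid

rng = np.random.default_rng(0)
cov = lambda h: np.exp(-(h / 10.0) ** 2)     # squared-exponential covariance
x = circulant_embedding_sample(256, cov, rng)
```

The FFT replaces the O(n^3) Cholesky factorization that makes dense Gaussian process covariances a bottleneck, which is the computational point the abstract is making.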
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Li_fsu_0071E_14159
 Format
 Thesis
 Title
 Bayesian Tractography Using Geometric Shape Priors.
 Creator

Dong, Xiaoming, Srivastava, Anuj, Klassen, E. (Eric), Wu, Wei, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Diffusion-weighted imaging (DWI) and tractography have been developed over decades and are key elements in recent large-scale efforts to map the human brain. The two techniques together provide a unique opportunity to access the macroscopic structure and connectivity of the human brain noninvasively and in vivo. The information obtained not only helps visualize brain connectivity and segment the brain into different functional areas, but also provides tools for understanding major cognitive diseases such as multiple sclerosis, schizophrenia, and epilepsy. Much effort has been put into this area. On the one hand, a vast spectrum of tractography algorithms has been developed in recent years, ranging from deterministic approaches through probabilistic methods to global tractography; on the other hand, various mathematical models, such as the diffusion tensor, multi-tensor models, spherical deconvolution, and Q-ball modeling, have been developed to better exploit the acquisition-dependent DWI signal. Despite considerable progress, current methods still face many challenges, such as sensitivity to noise, many false-positive/false-negative fibers, inability to handle complex fiber geometry, and expensive computational cost. More importantly, recent research has shown that, even with high-quality data, the results of current tractography methods may not improve, suggesting that it is unlikely that an anatomically accurate map of the human brain can be obtained solely from the diffusion profile. Motivated by these issues, this dissertation develops a global approach that incorporates anatomically validated geometric shape priors when reconstructing neuron fibers. The fiber tracts between regions of interest are initialized and updated via deformations based on gradients of the posterior energy defined in this work. This energy has contributions from the diffusion data, shape prior information, and a roughness penalty. The dissertation first describes and demonstrates the proposed method on a 2D dataset and then extends it to 3D phantom data and real brain data. The results show that the proposed method is relatively immune to issues such as noise, complicated fiber structures like fiber crossings and kissings, and false-positive fibers, and achieves more explainable tractography results.
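The posterior energy described above combines a data term, a shape prior, and a roughness penalty. As a toy illustration of the roughness component only (the dissertation's specific energy is not reproduced here), a discretized curve can be penalized by its summed squared second differences:

```python
import numpy as np

def roughness_penalty(curve):
    """Sum of squared second differences of a discretized curve.

    curve: (n, d) array of points along a fiber tract; smoother curves
    (nearly collinear consecutive points) receive a lower penalty.
    """
    second_diff = curve[2:] - 2 * curve[1:-1] + curve[:-2]
    return float(np.sum(second_diff ** 2))

line = np.column_stack([np.arange(10.0), np.zeros(10)])       # straight line
wiggle = line + np.array([[0.0, (-1) ** i] for i in range(10)])
assert roughness_penalty(line) == 0.0
assert roughness_penalty(wiggle) > roughness_penalty(line)
```

Minimizing such a term by gradient descent deforms a candidate tract toward smoother shapes, which is the role the abstract assigns to the roughness contribution.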
 Date Issued
 2019
 Identifier
 2019_Spring_DONG_fsu_0071E_15144
 Format
 Thesis
 Title
 Envelopes, Subspace Learning and Applications.
 Creator

Wang, Wenjing, Zhang, Xin, Tao, Minjing, Li, Wen, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The envelope model is a nascent dimension reduction technique. We focus on extending the envelope methodology to broader applications. In the first part of this thesis we propose a common reducing subspace model that can simultaneously estimate covariance matrices, precision matrices, and their differences across multiple populations. This model leads to substantial dimension reduction and efficient parameter estimation. We explicitly quantify the efficiency gain through an asymptotic analysis. In the second part, we propose a set of new mixture models called CLEMM (Clustering with Envelope Mixture Models) based on the widely used Gaussian mixture model assumptions. The proposed CLEMM framework and the associated envelope-EM algorithms provide the foundations for envelope methodology in unsupervised and semi-supervised learning problems. We also illustrate the performance of these models with simulation studies and empirical applications. In the third part of this thesis, we extend envelope discriminant analysis from vector data to tensor data. A study on copula-based models for forecasting realized volatility matrices is also included, an important financial application of estimating covariance matrices. We consider multivariate t and Clayton copulas, as well as bivariate t, Gumbel, and Clayton copulas, to model and forecast one-day-ahead realized volatility matrices. Empirical results show that copula-based models can achieve significant gains in both statistical precision and economic efficiency.
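The "common reducing subspace" in this abstract has a concrete matrix characterization: a subspace E reduces a symmetric matrix M when M = P·M·P + Q·M·Q, where P is the orthogonal projection onto E and Q = I − P. A minimal numerical check of that property (illustrative only, not the thesis's estimator):

```python
import numpy as np

def is_reducing(E, M, tol=1e-8):
    """Check whether span(E) is a reducing subspace of the symmetric matrix M.

    E: (p, u) matrix whose columns span the candidate subspace.
    """
    Q_basis, _ = np.linalg.qr(E)             # orthonormal basis for span(E)
    P = Q_basis @ Q_basis.T                  # projection onto span(E)
    Qc = np.eye(M.shape[0]) - P              # projection onto the complement
    return np.allclose(M, P @ M @ P + Qc @ M @ Qc, atol=tol)

# A diagonal M is reduced by the span of the first two coordinates...
M = np.diag([1.0, 2.0, 3.0])
assert is_reducing(np.eye(3)[:, :2], M)
# ...but not by a generic one-dimensional subspace.
assert not is_reducing(np.array([[1.0], [1.0], [0.0]]), M)
```

When one subspace reduces every population's covariance matrix simultaneously, all of the between-population differences live in that low-dimensional subspace, which is the source of the dimension reduction the abstract describes.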
 Date Issued
 2019
 Identifier
 2019_Spring_Wang_fsu_0071E_15085
 Format
 Thesis
 Title
 Four Methods for Combining Dependent Effects from Studies Reporting Regression Analysis.
 Creator

Gunter, Tracey Danielle, Becker, Betsy Jane, Huffer, Fred W. (Fred William), Almond, Russell G., Paek, Insu, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

Over the years a variety of indices have been proposed to summarize regression analyses. Unfortunately, the proposed indices are only appropriate when meta-analysts want to understand the role of a single predictor variable in predicting the outcome variable. However, sometimes meta-analysts want to understand the effect of a set of variables on an outcome variable. In this paper, four methods are presented for obtaining a composite effect for two focal predictor variables from a single regression model. The indices are the average of the standardized regression coefficients (ASC), the average of the standardized regression coefficients using Hedges and Olkin's (1985) approach (AHO), the sheaf coefficient (SC), and the squared multiple semi-partial correlation coefficient (MSP). A simulation study was conducted to examine the behavior of the indices and their variances when the number of predictor variables in the model, the sample size, the correlations between the focal predictor variables, and the correlations between the focal and non-focal predictor variables were manipulated. The results show that the average bias of the ASC and AHO estimates is small even when the sample size is small. Furthermore, the ASC and AHO estimates and their estimated variances are more precise than the other indices under all conditions examined. Therefore, when meta-analysts are interested in estimating the effect of a set of predictor variables on an outcome variable from a single regression model, the ASC or AHO procedures are preferred.
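Of the four indices, the ASC is the simplest to state: each focal slope is standardized by the ratio of the predictor's standard deviation to the outcome's, and the standardized slopes are averaged. A hedged sketch using the textbook standardization (variable names are illustrative; the paper's variance derivation is not reproduced):

```python
import numpy as np

def average_standardized_coefficients(X, y, focal):
    """ASC: average of standardized OLS slopes for the focal predictors.

    X: (n, p) predictor matrix; y: (n,) outcome; focal: focal column indices.
    """
    Xd = np.column_stack([np.ones(len(y)), X])           # add an intercept
    b = np.linalg.lstsq(Xd, y, rcond=None)[0][1:]        # raw OLS slopes
    beta = b * X.std(axis=0, ddof=1) / y.std(ddof=1)     # standardized slopes
    return beta[list(focal)].mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = X @ np.array([0.5, 0.5, 0.0, 0.0]) + rng.standard_normal(200)
asc = average_standardized_coefficients(X, y, focal=[0, 1])
```

With both focal predictors contributing equally here, the ASC is a single summary of their joint role, which is the composite-effect idea the abstract describes.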
 Date Issued
 2015
 Identifier
 FSU_2015fall_Gunter_fsu_0071E_12829
 Format
 Thesis
 Title
 The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring.
 Creator

Yun, Jiyeo, Becker, Betsy Jane, Huffer, Fred W. (Fred William), Paek, Insu, Zhang, Qian, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

Since researchers began investigating automatic scoring systems for writing assessments, they have examined the relationship between human and machine scoring and have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used to assess the relatedness of human and automated essay scoring, and to examine the impact of rater variability on inter-rater agreement. The study consists of two parts: an empirical study and a simulation study. Based on the results of the empirical study, the overall effects for inter-rater agreement were .63 and .99 for exact and adjacent proportions of agreement, .48 for kappas, and between .75 and .78 for correlations. Additionally, there were significant differences between 6-point scales and the other scales (i.e., 3-, 4-, and 5-point scales) for correlations, kappas, and proportions of agreement. Moreover, based on the results for the simulated data, the highest agreement and lowest discrepancies were achieved in the matched rater distribution pairs. Specifically, the means of the exact and adjacent proportions of agreement, kappa and weighted kappa values, and correlations were .58, .95, .42, .78, and .78, respectively, while the average standardized mean difference was .0005 in the matched rater distribution pairs. Acceptable values for inter-rater agreement as evaluation criteria for automated essay scoring, the impact of rater variability on inter-rater agreement, and relationships among inter-rater agreement indices are discussed.
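The agreement indices compared in the study can be computed directly from two raters' score vectors. A minimal sketch of exact and adjacent proportions of agreement and (unweighted) Cohen's kappa, under the usual definitions; the score vectors are invented for illustration:

```python
import numpy as np

def agreement_indices(a, b, categories):
    """Exact/adjacent agreement and Cohen's kappa for two integer score vectors."""
    a, b = np.asarray(a), np.asarray(b)
    exact = np.mean(a == b)                      # exact proportion of agreement
    adjacent = np.mean(np.abs(a - b) <= 1)       # agreement within one scale point
    # Chance agreement under independence, from the two marginal distributions.
    pa = np.array([np.mean(a == c) for c in categories])
    pb = np.array([np.mean(b == c) for c in categories])
    pe = float(pa @ pb)
    kappa = (exact - pe) / (1 - pe)              # chance-corrected agreement
    return float(exact), float(adjacent), float(kappa)

human = [1, 2, 2, 3, 4, 4, 2, 3]
machine = [1, 2, 3, 3, 4, 3, 2, 2]
exact, adjacent, kappa = agreement_indices(human, machine, categories=range(1, 5))
```

Kappa falls below the exact proportion because it subtracts the agreement expected by chance, which is why the study's kappas (.42-.48) run lower than its proportions of agreement.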
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Yun_fsu_0071E_14144
 Format
 Thesis
 Title
 Impact of Violations of Measurement Invariance in Longitudinal Mediation Modeling.
 Creator

Xu, Jie, Yang, Yanyun, Zhang, Qian, Huffer, Fred W. (Fred William), Becker, Betsy J., Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

Research has shown that cross-sectional mediation analysis cannot accurately reflect a true longitudinal mediated effect. To investigate longitudinal mediated effects, different longitudinal mediation models have been proposed, each focusing on different research questions related to longitudinal mediation. When fitting mediation models to longitudinal data, the assumption of longitudinal measurement invariance is usually made. However, the consequences of violating this assumption have not been thoroughly studied in mediation analysis. No studies have examined issues of measurement noninvariance in a latent cross-lagged panel mediation (LCPM) model with three or more measurement occasions. The goal of the current study is to investigate the impact of violations of measurement invariance on longitudinal mediation analysis. The focal model is the LCPM model suggested by Cole and Maxwell (2003). This model can be used to examine mediated effects among latent predictor, mediator, and outcome variables across time. In addition, it can account for measurement error and allows evaluation of longitudinal measurement invariance. Simulation methods were used, and the investigation was performed using population covariance matrices and sample data generated under various conditions. Eight design factors were considered for data generation: sample size, proportion of noninvariant items, position of latent factors with noninvariant items, type of noninvariant parameters, magnitude of noninvariance, pattern of noninvariance, size of the direct effect, and size of the mediated effect. Results from the population investigation were evaluated based on overall model fit and the calculated direct and mediated effects; results from the finite sample analysis were evaluated in terms of convergence and inadmissible solutions, overall model fit, bias/relative bias, coverage rates, and statistical power/Type I error rates. In general, results obtained from the finite sample analysis were consistent with those from the population investigation with respect to both model fit and parameter estimation. The Type I error rate for the mediated effects was inflated under the noninvariant conditions with a small sample size (200); power for the direct and mediated effects was excellent (1.0 or close to 1.0) across all investigated conditions. Type I error rates based on the chi-square test statistic were seriously inflated under the invariant conditions, especially when the sample size was relatively small. Power for detecting model misspecifications due to longitudinal noninvariance was excellent across all investigated conditions. Fit indices (CFI, TLI, RMSEA, and SRMR) were not sensitive in detecting misspecifications caused by violations of measurement invariance in the investigated LCPM model. The results also showed that as the magnitude of noninvariance, the proportion of noninvariant items, and the number of positions of latent variables with noninvariant items increased, estimation of the direct and mediated effects tended to be less accurate. The decreasing pattern of change in item parameters over measurement occasions resulted in the least accurate estimates of the direct and mediated effects. Parameter estimates were fairly accurate under the decreasing-then-increasing pattern and the mixed pattern of change in item parameters. Findings from this study can help empirical researchers better understand the potential impact of violating measurement invariance on longitudinal mediation analysis using the LCPM model.
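A mediated effect in models of this kind is the product of the path from predictor to mediator (a) and the path from mediator to outcome controlling for the predictor (b), so bias in either path propagates into the product. A toy sketch of the product-of-coefficients calculation with ordinary regression stand-ins (not the latent LCPM model itself; the data are simulated for illustration):

```python
import numpy as np

def product_of_coefficients(x, m, y):
    """Estimate a (x -> m) and b (m -> y given x); the mediated effect is a*b."""
    a = np.polyfit(x, m, 1)[0]                       # slope of m on x
    Xd = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(Xd, y, rcond=None)[0][2]     # slope of y on m, given x
    return a * b

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
m = 0.6 * x + rng.standard_normal(500)               # a = 0.6
y = 0.5 * m + rng.standard_normal(500)               # b = 0.5
ab = product_of_coefficients(x, m, y)                # true mediated effect is 0.30
```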
 Date Issued
 2019
 Identifier
 2019_Spring_Xu_fsu_0071E_14994
 Format
 Thesis
 Title
 Improvement of Quality Prediction in Inter-Connected Manufacturing System by Integrating Multi-Source Data.
 Creator

Ren, Jie, Wang, Hui, Vanli, Omer Arda, Park, Chiwoo, Huffer, Fred W. (Fred William), Florida State University, FAMU-FSU College of Engineering, Department of Industrial and Manufacturing Engineering
 Abstract/Description

With the development of advanced sensing and network technology, such as wireless data transmission and data storage and analytics on cloud platforms, the manufacturing plant is going through a new revolution in which different production units/components can communicate with each other, leading to interconnected manufacturing. The interconnection enables close coordination of process control actions among machines to improve product quality. Traditional quality prediction methods that focus on data from a single source are not sufficient for the variation modeling and quality prediction problems involved in interconnected manufacturing. Instead, new quality prediction methods that can integrate data from multiple sources are necessary. This research addresses the fundamental challenges in improving quality prediction by data fusion for interconnected manufacturing, including knowledge sharing and transfer among different machines and collaborative error monitoring. The methodology is demonstrated through surface machining and additive manufacturing processes. The first study concerns surface quality prediction for one machining process by fusing multi-resolution spatial data measured from multiple surfaces or different surface machining processes. The surface variation is decomposed into a global trend part, which characterizes the spatially varying relationship between selected process variables and surface height, and a zero-mean spatial Gaussian process part. Three models, including two varying-coefficient-based spatial models and an inference-rule-based spatial model, are proposed and compared. A transfer learning technique is also used to help train the model by transferring useful information from a data-rich surface to a data-lacking surface, which demonstrates the advantage of interconnected manufacturing. The second study deals with surface mating errors caused by surface variations from two interconnected surface machining processes. A model aggregating data from two surfaces is proposed to predict leak areas for surface assembly. Using measurements of leak areas and the profiles of the mated surfaces as training data, along with the Hagen–Poiseuille law, this study develops a novel diagnostic method to predict potential leak areas (leakage paths). The effectiveness and robustness of the proposed method are verified by an experiment and a simulation study. The approach provides practical guidance for the subsequent assembly process as well as for troubleshooting in manufacturing processes. The last study focuses on learning a quality prediction model in interconnected additive manufacturing systems, in which the different 3D printing processes involved are driven by similar printing mechanisms and can exchange quality data via a network. A quality prediction model that estimates printing widths along the printing paths for material-extrusion-based additive manufacturing (a.k.a. fused filament fabrication or fused deposition modeling) is established by leveraging between-printer quality data. The established mathematical model quantifies the printed line width along the printing paths based on kinematic parameters, e.g., printing speed and acceleration, while considering data from multiple printers that exhibit between-machine similarity. The method allows between-printer knowledge sharing to improve quality prediction, so that a printing process with limited historical data can quickly learn an effective quality model without intensive retraining, thus improving the system's responsiveness to product variety. In the long run, the outcome of this research can contribute to the development of highly efficient Internet-of-Things manufacturing services for personalized products.
 Date Issued
 2019
 Identifier
 2019_Spring_Ren_fsu_0071E_15160
 Format
 Thesis
 Title
 Investigating the Chi-Square-Based Model-Fit Indexes for WLSMV and ULSMV Estimators.
 Creator

Xia, Yan, Yang, Yanyun, Huffer, Fred W. (Fred William), Almond, Russell G., Becker, Betsy Jane, Paek, Insu, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

In structural equation modeling (SEM), researchers use the model chi-square statistic and model-fit indexes to evaluate model-data fit. The root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker-Lewis index (TLI) are widely applied model-fit indexes. When data are ordered and categorical, the most popular estimator is the diagonally weighted least squares (DWLS) estimator. Robust corrections have been proposed to adjust the uncorrected chi-square statistic from DWLS so that its first- and second-order moments are in alignment with the target central chi-square distribution under correctly specified models. DWLS with such a correction is called the mean- and variance-adjusted weighted least squares (WLSMV) estimator. An alternative to WLSMV is the mean- and variance-adjusted unweighted least squares (ULSMV) estimator, which has been shown to perform as well as, or slightly better than, WLSMV. Because the chi-square statistic is corrected, the chi-square-based RMSEA, CFI, and TLI are also corrected by replacing the uncorrected chi-square statistic with the robust chi-square statistic. The robust model-fit indexes calculated in this way are called the population-corrected robust (PR) model-fit indexes, following Brosseau-Liard, Savalei, and Li (2012). The PR model-fit indexes are currently reported in almost every application in which WLSMV or ULSMV is used. Nevertheless, previous studies have found that the PR model-fit indexes from WLSMV are sensitive to several factors, such as sample size, model size, and the thresholds for categorization. The first focus of this dissertation is the dependency of model-fit indexes on the thresholds for ordered categorical data. Because the weight matrix in the WLSMV fit function and the correction factors for both WLSMV and ULSMV include the asymptotic variances of thresholds and polychoric correlations, the model-fit indexes are very likely to depend on the thresholds. This dependency is not a desirable property, because when the misspecification lies in the factor structure (e.g., cross-loadings are ignored or two factors are treated as a single factor), model-fit indexes should reflect that misspecification rather than the threshold values. As alternatives to the PR model-fit indexes, Brosseau-Liard et al. (2012), Brosseau-Liard and Savalei (2014), and Li and Bentler (2006) proposed the sample-corrected robust (SR) model-fit indexes. The PR fit indexes are found to converge to distorted asymptotic values, whereas the SR fit indexes converge to their definitions asymptotically. However, the SR model-fit indexes were proposed for continuous data and have been neither investigated nor implemented in SEM software when WLSMV and ULSMV are applied. This dissertation therefore investigates the PR and SR model-fit indexes for WLSMV and ULSMV. The first part of the simulation study examines the dependency of the model-fit indexes on the thresholds when the model misspecification results from omitting cross-loadings or collapsing factors in confirmatory factor analysis. The study is conducted on extremely large computer-generated datasets in order to approximate the asymptotic values of the model-fit indexes. The results show that only the SR fit indexes from ULSMV are independent of the population threshold values, given the other design factors. The PR fit indexes from ULSMV, and the PR and SR fit indexes from WLSMV, are influenced by thresholds, especially when data are binary and the hypothesized model is greatly misspecified. The second part of the simulation varies the sample size from 100 to 1000 to investigate whether the SR fit indexes in finite samples are more accurate estimates of the defined values of RMSEA, CFI, and TLI, compared with the uncorrected model-fit indexes without robust correction and the PR fit indexes. Results show that the SR fit indexes are the most accurate in general. However, when the thresholds differ across items, data are binary, and the sample size is less than 500, all versions of these indexes can be very inaccurate; in such situations, larger sample sizes are needed. In addition, the conventional cutoffs developed for continuous data with maximum likelihood (e.g., RMSEA < .06, CFI > .95, and TLI > .95; Hu & Bentler, 1999) have been applied to WLSMV and ULSMV despite arguments against such a practice (e.g., Marsh, Hau, & Wen, 2004). For comparison purposes, this dissertation reports the RMSEA, CFI, and TLI based on continuous data using maximum likelihood before the variables are categorized to create ordered categorical data. Results show that the model-fit indexes from maximum likelihood are very different from those from WLSMV and ULSMV, suggesting that the conventional rules should not be applied to WLSMV and ULSMV.
 Date Issued
 2016
 Identifier
 FSU_2016SU_Xia_fsu_0071E_13379
 Format
 Thesis
 Title
 Marked Determinantal Point Processes.
 Creator

Feng, Yiming, Nolder, Craig, Niu, Xufeng, Bradley, Jonathan R., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Determinantal point processes (DPPs), which can be defined by their correlation kernels with known moments, are useful models for point patterns where nearby points exhibit repulsion. They have many nice properties, such as closed-form densities, tractable estimation of parameterized families, and no edge effects. Univariate DPPs have been well studied, both in discrete and continuous settings, although their statistical applications are fairly recent and still rather limited, whereas multivariate DPPs, the so-called multi-type marked DPPs, have been little explored. In this thesis, we propose a class of multivariate DPPs based on a block kernel construction. For the marked DPP, we show that the conditions for existence of the DPP can easily be satisfied. The block construction allows us to model the individually marked DPPs as well as to control the scale of repulsion between points having different marks. Unlike other researchers who model the kernel function of a DPP directly, we model its spectral representation, which not only guarantees the existence of the multivariate DPP but also makes simulation-based estimation methods readily available. We adopt a bivariate complex Fourier basis, which yields nice properties such as constant intensity and approximate isotropy at short distances between nearby points. The parameterized block kernels can approximate commonly used covariance functions via Fourier expansion. The parameters can be estimated by maximum likelihood, a Bayesian approach, or minimum contrast estimation.
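The determinantal repulsion property can be illustrated with a tiny discrete L-ensemble; the 3-point kernel below is a made-up example, not the thesis's block construction. Joint inclusion probabilities are principal minors of K = L(I + L)^(-1), so similar points co-occur less often than independence would predict.

```python
import numpy as np

# Toy discrete DPP (L-ensemble) on 3 points; points 0 and 1 are very similar,
# point 2 is dissimilar to both.  The marginal kernel is K = L (I + L)^{-1},
# and P(i and j both appear) = det of the 2x2 submatrix of K.
L = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
K = L @ np.linalg.inv(np.eye(3) + L)

def joint(i, j):
    """P(points i and j are both in the sample) = det(K_{ij})."""
    return np.linalg.det(K[np.ix_([i, j], [i, j])])

# Repulsion: det(K_{ij}) = K_ii K_jj - K_ij^2 <= K_ii K_jj,
# with a bigger gap for the similar pair (0, 1) than the dissimilar pair (0, 2).
p_both_near = joint(0, 1)
indep_near = K[0, 0] * K[1, 1]
```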
 Date Issued
 2019
 Identifier
 2019_Spring_Feng_fsu_0071E_15011
 Format
 Thesis
 Title
 Median Regression for Complex Survey Data.
 Creator

Fraser, Raphael André, Sinha, Debajyoti, Lipsitz, Stuart, Carlson, Elwood, Slate, Elizabeth H., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics such as means, proportions, and totals. Using a model-based approach, complex surveys can be used to evaluate the effectiveness of treatments and to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or resampling are often not valid with survey data due to design features such as stratification, multistage sampling, and unequal selection probabilities. In this work, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides estimating equations approach to estimate the median regression parameters of the highly skewed response; the double-transform-both-sides method applies the same transformation twice to both the response and the regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations. Furthermore, the double-transform-both-sides estimator is relatively robust to the true underlying distribution, and has much smaller mean square error than the least absolute deviations estimator. The method is motivated by an analysis of laboratory data on urinary iodine concentration from the National Health and Nutrition Examination Survey.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Fraser_fsu_0071E_12825
 Format
 Thesis
 Title
 MetaAnalysis of Factor Analyses: Comparison of Univariate and Multivariate Approaches Using Correlation Matrices and Factor Loadings.
 Creator

Cho, Kyunghwa, Becker, Betsy Jane, Huffer, Fred W. (Fred William), Paek, Insu, Yang, Yanyun, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

Increasingly, sophisticated techniques such as factor analyses are applied in primary research and thus may need to be meta-analyzed. This topic has received little attention in the past due to its complexity. Because factor analysis is becoming more popular in many areas, including education, social work, and the social sciences, the study of methods for the meta-analysis of factor analyses is also becoming more important. The first main purpose of this dissertation is to compare the results of seven different approaches to meta-analysis of confirmatory factor analyses. Specifically, five approaches are based on univariate meta-analysis methods; the other two use multivariate meta-analysis to obtain the factor loadings and their standard errors. The results from each approach are compared. Given that factor analyses are commonly used in many areas, the second purpose of this dissertation is to identify the appropriate approach or approaches for the meta-analysis of factor analyses, especially confirmatory factor analysis (CFA). When the average sample size was small, the IRD, WMC, WMFL, and GLSMFL approaches estimated the parameters better than the UMC, MFL, and GLSMC approaches. With large average sample sizes (larger than 150), all seven approaches performed similarly. Based on my simulation results, researchers who want to conduct meta-analytic confirmatory factor analysis can apply any of these approaches to synthesize the results from primary studies if their studies have n > 150.
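As a generic illustration of the univariate route, a single correlation can be pooled across studies with Fisher's z and inverse-variance weights. This fixed-effect sketch is not any of the dissertation's specific estimators (UMC, WMC, IRD, etc.); the correlations and sample sizes are made up.

```python
import math

def pool_correlation(rs, ns):
    """Fixed-effect pooling of one correlation across studies via
    Fisher's z transform; study i gets inverse-variance weight n_i - 3."""
    zs = [0.5 * math.log((1 + r) / (1 - r)) for r in rs]
    ws = [n - 3 for n in ns]
    zbar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    # back-transform the pooled z to the correlation scale
    return (math.exp(2 * zbar) - 1) / (math.exp(2 * zbar) + 1)

r_pooled = pool_correlation([0.30, 0.40, 0.35], [100, 200, 150])
```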
 Date Issued
 2015
 Identifier
 FSU_migr_etd9570
 Format
 Thesis
 Title
 Parameter Sensitive Feature Selection for Learning on Large Datasets.
 Creator

Gramajo, Gary, Barbu, Adrian G. (Adrian Gheorghe), Piyush, Kumar, Huffer, Fred W. (Fred William), She, Yiyuan, Zhang, Jinfeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Though there are many feature selection methods for learning, they might not scale well to very large datasets, such as those generated in computer vision. Furthermore, it can be beneficial to capture and model the variability inherent in data such as face detection, where a plethora of face poses (i.e. parameters) are possible. We propose a parameter sensitive learning method that can learn effectively on datasets that would otherwise be prohibitively large. Our contributions are the following. First, we propose an efficient feature selection algorithm that optimizes a differentiable loss with sparsity constraints. Any differentiable loss can be used, and the choice will vary with the application. The iterative algorithm alternates parameter updates with tightening the sparsity constraints by gradually removing variables based on the coefficient magnitudes and a schedule. Second, we show how to train a single parameter sensitive classifier that models the wide range of class variability. A sole classifier is important because it reduces the amount of training data needed compared to methods where a separate classifier is trained for each parameter value. Third, we show how to use nonlinear univariate response functions to obtain a nonlinear decision boundary with feature selection; an important characteristic, since the separation of classes in real world datasets is very challenging. Fourth, we show that it is possible to mine hard negatives with feature selection, though it is more difficult. This is vital in computer vision, where 10^5 training examples can be generated per image. Fifth, we propose an approach to perform face detection using a 3D model on a number of face keypoints. We modify binary face features from the literature (generated using random forests) to fit into our 3D model framework.
Experiments on detecting the face keypoints and on face detection using the proposed 3D models and modified face features show that the feature selection dramatically improves performance and comes close to the state of the art on two standard datasets for face detection. We also apply our parameter sensitive learning method with feature selection to detect malicious websites, a dataset with approximately 2.4 million websites and 3.3 million features per website. We outperform other batch algorithms and obtain results close to a high performing online algorithm while using far fewer features.
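The first contribution, alternating gradient updates with a schedule that gradually removes small-magnitude coefficients, can be sketched as follows. The logistic loss, learning rate, and linear annealing schedule here are illustrative assumptions, not the dissertation's exact settings.

```python
import numpy as np

def anneal_select(X, y, k_final, iters=200, lr=0.1):
    """Alternate gradient steps on logistic loss with gradually dropping the
    features of smallest |coefficient| until only k_final remain."""
    n, p = X.shape
    keep = np.arange(p)          # indices of currently active features
    w = np.zeros(p)
    for t in range(iters):
        # gradient step on the logistic negative log-likelihood
        z = X[:, keep] @ w[keep]
        prob = 1.0 / (1.0 + np.exp(-z))
        w[keep] -= lr * (X[:, keep].T @ (prob - y) / n)
        # annealing schedule: shrink the active set linearly over iterations
        k_t = max(k_final, p - int((p - k_final) * (t + 1) / iters))
        order = np.argsort(-np.abs(w[keep]))
        keep = keep[order[:k_t]]
    return keep, w

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
# only features 0 and 1 carry signal; the other 18 are pure noise
y = (X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(500) > 0).astype(float)
selected, w = anneal_select(X, y, k_final=2)
```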
 Date Issued
 2015
 Identifier
 FSU_migr_etd9604
 Format
 Thesis
 Title
 Predictive Accuracy Measures for Binary Outcomes: Impact of Incidence Rate and Optimization Techniques.
 Creator

Scolnik, Ryan, McGee, Daniel, Slate, Elizabeth H., Eberstein, Isaac W., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Evaluating the performance of models predicting a binary outcome can be done using a variety of measures. While some measures describe the model's overall fit, others describe the model's ability to discriminate between the two outcomes. If a model fits well but doesn't discriminate well, what does that tell us? Given two models, if one discriminates well but has poor fit while the other fits well but discriminates poorly, which of the two should we choose? The measures of interest for our research include the area under the ROC curve, Brier score, discrimination slope, log-loss, R-squared, and F-score. To examine the underlying relationships among all of the measures, real data and simulation studies are used. The real data come from multiple cardiovascular research studies, and the simulation studies are run under general conditions and also for incidence rates ranging from 2% to 50%. The results of these analyses provide insight into the relationships among the measures and raise concern for scenarios where the measures may yield different conclusions. The impact of incidence rate on the relationships provides a basis for exploring alternative maximization routines to logistic regression. While most of the measures are easily optimized using the Newton-Raphson algorithm, maximizing the area under the ROC curve requires optimization of a nonlinear, nondifferentiable function. Use of the Nelder-Mead simplex algorithm and close connections to economics research yield unique parameter estimates and general asymptotic conditions. Using real and simulated data to compare optimizing the area under the ROC curve to logistic regression further reveals the impact of incidence rate on the relationships, significant increases in achievable areas under the ROC curve, and differences in conclusions about including a variable in a model.
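Two of the listed measures are easy to state concretely. The sketch below computes the area under the ROC curve via the rank (pairwise-comparison) formula and the Brier score; the toy labels and predicted probabilities are illustrative.

```python
def auc(y, p):
    """Area under the ROC curve via pairwise comparisons: the probability
    that a random positive case is scored above a random negative case
    (ties count one half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def brier(y, p):
    """Brier score: mean squared difference between predicted probability
    and the observed 0/1 outcome (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
a = auc(y, p)      # 8 of the 9 positive/negative pairs are correctly ordered
b = brier(y, p)
```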
 Date Issued
 2016
 Identifier
 FSU_2016SP_Scolnik_fsu_0071E_13146
 Format
 Thesis
 Title
 Random Walks over Point Processes and Their Application in Finance.
 Creator

Salehy, Seyyed Navid, Kercheval, Alec N., Ewald, Brian, Fahim, Arash, Ökten, Giray, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Mathematics
 Abstract/Description

In continuous-time models in finance, it is common to assume that prices follow a geometric Brownian motion. More precisely, it is assumed that the price at time t ≥ 0 is given by Z_t = Z_0 exp(σB_t + mt), where Z_0 is the initial price, B is standard Brownian motion, σ is the volatility, and m is the drift. We discuss how Z can be viewed as the limit of a sequence of discrete price models based on random walks. We note that in the usual random walks, jumps can only happen at deterministic times. We first construct a natural simple model for price by considering a random walk in which jumps can happen at random times following a counting process N. We then develop a sequence of discrete price models using random walks over point processes. The limit process gives the new price model Z_t = Z_0 exp(σB_{Λ_t} + mΛ_t), where Λ is the compensator for the counting process N. If N is a Poisson process with intensity 1, this model coincides with the geometric Brownian motion model for the price. But the new model provides more flexibility, as we can choose N to be many other well-known counting processes, including not only homogeneous and inhomogeneous Poisson processes, which have deterministic compensators, but also Hawkes processes, which have stochastic compensators. We also prove many properties of the process B_Λ. For example, we show that B_Λ is a continuous square integrable martingale, and we discuss when B_Λ has uncorrelated increments and when it has independent increments. Finally, we investigate how the Black-Scholes pricing formula changes if the price of the risky asset follows this new model when N is an inhomogeneous Poisson process; the usual Black-Scholes formula is recovered when N is a Poisson process with intensity 1.
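When N is an inhomogeneous Poisson process its compensator Λ is deterministic, so a path of the proposed price model can be simulated directly: the time-changed Brownian motion B_Λ has independent Gaussian increments with variance equal to the compensator increments. The step size and rate function below are illustrative assumptions.

```python
import math
import random

random.seed(1)

def simulate_price(z0, sigma, m, rate, T, steps):
    """One path of Z_t = z0 * exp(sigma * B_{Lam_t} + m * Lam_t), where
    Lam_t is the integral of rate(s) ds (the deterministic compensator of an
    inhomogeneous Poisson process).  B_Lam is simulated from its Gaussian
    increments: Var(B_Lam(t+dt) - B_Lam(t)) = Lam(t+dt) - Lam(t)."""
    dt = T / steps
    lam, b, path = 0.0, 0.0, [z0]
    for i in range(steps):
        dlam = rate(i * dt) * dt                  # compensator increment
        b += random.gauss(0.0, math.sqrt(dlam))   # increment of B_Lam
        lam += dlam
        path.append(z0 * math.exp(sigma * b + m * lam))
    return path

# With rate ≡ 1 the compensator is Lam_t = t and the model reduces to plain
# geometric Brownian motion; a time-varying rate stretches or compresses time.
path = simulate_price(z0=100.0, sigma=0.2, m=0.05,
                      rate=lambda t: 1.0, T=1.0, steps=252)
```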
 Date Issued
 2019
 Identifier
 2019_Spring_Salehy_fsu_0071E_15152
 Format
 Thesis
 Title
 SemiParametric Generalized Estimating Equations with Kernel Smoother: A Longitudinal Study in Financial Data Analysis.
 Creator

Yang, Liu, Niu, Xufeng, Cheng, Yingmei, Huffer, Fred W. (Fred William), Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Longitudinal studies are widely used in various fields, such as public health, clinical trials, and financial data analysis. A major challenge in longitudinal studies is repeated measurements from each subject, which cause time-dependent correlation within subjects. Generalized estimating equations (GEE) can handle correlated outcomes for longitudinal data through marginal effects. Our model is based on generalized estimating equations with a semiparametric approach, providing a flexible structure for regression models: coefficients for parametric covariates are estimated, and nuisance covariates are fitted with kernel smoothers in the nonparametric part. The profile kernel estimator and the seemingly unrelated kernel estimator (SUR) are used to deliver consistent and efficient semiparametric estimators compared to parametric models. We provide simulation results for estimating semiparametric models with one or multiple nonparametric terms. In the application, we focus on the financial market: a credit card loan dataset with payment information for each customer across 6 months is used to investigate whether gender, income, age, or other factors significantly influence payment status. Furthermore, we propose model comparisons to evaluate whether our model should be fitted separately for different levels of factors, such as male and female, or with different estimation methods, such as parametric or semiparametric estimation.
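The kernel smoothing used for the nonparametric part can be sketched with a Nadaraya-Watson estimator; the Gaussian kernel, bandwidth, and toy data below are illustrative choices, not the dissertation's.

```python
import math

def nw_smooth(x0, xs, ys, h):
    """Nadaraya-Watson estimate at x0: a locally weighted average of ys,
    with Gaussian kernel weights decaying in |x - x0| / h."""
    ws = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(ws, ys)) / sum(ws)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.1, 2.9, 4.2]   # roughly y = x plus noise
yhat = nw_smooth(2.0, xs, ys, h=0.5)
```

In a profile estimator, a smoother like this is applied to partial residuals at each candidate value of the parametric coefficients, so the nonparametric component is profiled out of the estimating equations.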
 Date Issued
 2017
 Identifier
 FSU_FALL2017_YANG_fsu_0071E_14219
 Format
 Thesis
 Title
 Shape Based Function Estimation.
 Creator

Dasgupta, Sutanoy, Srivastava, Anuj, Pati, Debdeep, Klassen, E. (Eric), Huffer, Fred W. (Fred William), Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Estimation of functions is a rich and well-researched area with broad applications spanning several scientific fields. We develop a shape based framework for probability density and general function modelling. The framework encompasses both shape constrained and unconstrained estimation, and can accommodate a much broader notion of shape constraints than has been considered in the literature. The estimation approach is a two step process: the first step creates a template or an initial guess, and the second, important step "improves" the estimate according to an appropriate objective function. We derive asymptotic properties of the estimators in different scenarios, and illustrate the performance of the estimates through several simulated as well as real data examples.
 Date Issued
 2019
 Identifier
 2019_Summer_Dasgupta_fsu_0071E_15347
 Format
 Thesis
 Title
 Sparse Feature and Element Selection in HighDimensional Vector Autoregressive Models.
 Creator

Huang, Xue, Niu, Xufeng, She, Yiyuan, Cheng, Yingmei, Huffer, Fred W. (Fred William), Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

This thesis aims to identify the underlying structures of multivariate time series and to propose a methodology for constructing predictive VAR models. Due to the complexity of high dimensions in multivariate time series, forecasting a target series with many predictors in VAR models poses a challenge in statistical learning and modeling. The quadratically increasing dimension of the parameter space, known as the "curse of dimensionality", poses considerable challenges to multivariate time series models. Meanwhile, two facts motivate dimension reduction in multivariate time series: first, some nuisance time series exist and are better removed; second, a target time series is typically driven by a few dependent elements constructed from some indices. To address these challenges, our approach is to reduce both the number of series and the features involved in each series simultaneously. As a result, the original high dimensional structure can be modeled using a lower dimensional time series, and subsequently the forecasting performance will be improved. The methodology introduced in this work is called Sparse Feature and Element Selection (SFES). It employs an "L1 + group L1" penalty to conduct group selection and variable selection within each group simultaneously. Our contributions in this thesis are twofold. First, the doubly-constrained regularization in SFES is a convex mathematical problem, and we optimize it using a fast but simple-to-implement algorithm. We evaluate this algorithm on a large-scale dataset and theoretically prove that it has guaranteed strict iterative convergence and global optimality. Second, we present nonasymptotic results based on combined statistical and computational analysis. A sharp oracle inequality is proved to reveal its power in predictive learning.
We compare SFES with the related work on Sparse Group Lasso (SGL) to show that the proposed method is both computationally efficient and theoretically justified. Experiments using simulated data and real-world macroeconomic time series data demonstrate the efficiency and efficacy of the proposed SFES in practice.
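An "L1 + group L1" penalty has a closed-form proximal operator, the building block of proximal-gradient algorithms for such problems: soft-threshold each coordinate (within-group sparsity), then shrink each group's norm (whole groups die). This sketch shows only the operator, not the SFES algorithm itself; the vector and group layout are made up.

```python
import numpy as np

def prox_sparse_group(v, groups, lam1, lam2):
    """Proximal operator of lam1*||w||_1 + lam2*sum_g ||w_g||_2."""
    # elementwise soft-threshold: the L1 part (within-group sparsity)
    u = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)
    w = np.zeros_like(u)
    for g in groups:                       # group-L2 part: group selection
        norm = np.linalg.norm(u[g])
        if norm > lam2:                    # otherwise the whole group is zero
            w[g] = (1.0 - lam2 / norm) * u[g]
    return w

v = np.array([3.0, 0.2, -2.0, 0.1, 0.05, -0.1])
groups = [[0, 1], [2, 3], [4, 5]]
w = prox_sparse_group(v, groups, lam1=0.3, lam2=0.5)
# the weak third group is removed entirely; weak coordinates inside the
# surviving groups are zeroed by the L1 threshold
```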
 Date Issued
 2016
 Identifier
 FSU_FA2016_Huang_fsu_0071E_13659
 Format
 Thesis
 Title
 Spatial Statistics and Its Applications in Biostatistics and Environmental Statistics.
 Creator

Hu, Guanyu, Huffer, Fred W. (Fred William), Paek, Insu, Sinha, Debajyoti, Slate, Elizabeth H., Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

This dissertation presents some topics in spatial statistics and their applications in biostatistics and environmental statistics. The field of spatial statistics is an energetic area of statistics. In Chapters 2 and 3, the goal is to build subregion models under the assumption that the responses or the parameters are spatially correlated. For regression models, considering spatially varying coefficients is a reasonable way to build subregion models. There are two different techniques for exploring spatially varying coefficients: geographically weighted regression (Brunsdon et al. 1998), and a spatially varying coefficients model which assumes a stationary Gaussian process for the regression coefficients (Gelfand et al. 2003). Based on the ideas of these two techniques, we introduce techniques for exploring subregion models in survival analysis, an important area of biostatistics. In Chapter 2, we introduce modified versions of the Kaplan-Meier and Nelson-Aalen estimators which incorporate geographical weighting. We use ideas from counting process theory to obtain these modified estimators, to derive variance estimates, and to develop associated hypothesis tests. In Chapter 3, we introduce a Bayesian parametric accelerated failure time model with spatially varying coefficients. These two techniques can explore subregion models in survival analysis using both nonparametric and parametric approaches. In Chapter 4, we introduce Bayesian parametric covariance regression analysis for a response vector. The proposed method defines a regression model between the covariance matrix of a p-dimensional response vector and auxiliary variables. We propose a constrained Metropolis-Hastings algorithm to obtain the estimates. Simulation results are presented to show the performance of both the regression and covariance matrix estimates. Furthermore, in a more realistic simulation experiment our Bayesian approach performs better than the MLE.
Finally, we illustrate the usefulness of our model by applying it to the Google Flu data. In Chapter 5, we give a brief summary of future work.
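A Kaplan-Meier estimator with observation weights (for instance, kernel weights decaying with distance from a location of interest) can be sketched as below. The weighting scheme, variance estimates, and tests of Chapter 2 are not reproduced; with equal weights this reduces to the usual estimator, and the toy data are made up.

```python
def weighted_km(times, events, weights, t):
    """Weighted Kaplan-Meier survival estimate at time t: at each distinct
    event time, multiply by (1 - weighted deaths / weighted at-risk)."""
    data = list(zip(times, events, weights))
    event_times = sorted({ti for ti, di, _ in data if di == 1 and ti <= t})
    surv = 1.0
    for ti in event_times:
        at_risk = sum(w for tj, _, w in data if tj >= ti)
        deaths = sum(w for tj, dj, w in data if tj == ti and dj == 1)
        surv *= 1.0 - deaths / at_risk
    return surv

times = [2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 0, 1, 1, 0]          # 0 = censored observation
weights = [1.0] * 5               # equal weights recover the usual estimator
s = weighted_km(times, events, weights, t=4.5)
```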
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Hu_fsu_0071E_14205
 Format
 Thesis
 Title
 TimeVarying Mixture Models for Financial Risk Management.
 Creator

Zhang, Shuguang, Niu, Xufeng, Cheng, Yingmei, Huffer, Fred W. (Fred William), Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Motivated by the devastating financial crisis of 2008, which was partially caused by underestimation of financial risk, we propose a class of time-varying mixture models for risk analysis and management. There are various metrics for financial risk, including value at risk (VaR), expected shortfall, and expected/unexpected loss. In this study we focus on VaR. One commonly used method to estimate VaR is the variance-covariance method, in which a normal distribution is usually assumed for asset returns; this may underestimate the real risk. To address this issue, we propose a series of two-component mixture models: one component is a normal distribution and the other is a fat-tailed distribution such as the Cauchy distribution, Student's t-distribution, or Gumbel distribution. Instead of assuming the distribution parameters and weights to be constant, we allow them to change over time, which guarantees flexibility of our models. Monte Carlo Expectation-Maximization and Monte Carlo maximum likelihood estimation are used for parameter estimation. Simulation studies are conducted and the models are applied to stock market price data.
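A Monte Carlo sketch of VaR under a two-component normal plus Student-t mixture follows. The fixed weight and parameters here are illustrative (the dissertation lets them vary over time), and the t draw uses the standard normal-over-chi-square construction.

```python
import random

random.seed(7)

def simulate_mixture_return(w, mu, sigma, df, scale):
    """Draw one return: with probability w from N(mu, sigma^2), otherwise
    from a scaled Student-t with df degrees of freedom (fat-tailed part)."""
    if random.random() < w:
        return random.gauss(mu, sigma)
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return scale * z / (chi2 / df) ** 0.5   # t_df = N(0,1) / sqrt(chi2/df)

def var_95(draws):
    """Empirical 95% value at risk: the loss exceeded in 5% of scenarios."""
    losses = sorted(-r for r in draws)
    return losses[int(0.95 * len(losses))]

returns = [simulate_mixture_return(w=0.9, mu=0.0, sigma=0.01, df=3, scale=0.02)
           for _ in range(20000)]
v = var_95(returns)
```

The heavy-tailed component pushes the 95% VaR above the pure-normal value of about 1.645 sigma, which is exactly the underestimation the mixture is meant to fix.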
 Date Issued
 2016
 Identifier
 FSU_2016SP_Zhang_fsu_0071E_13150
 Format
 Thesis
 Title
 Tools for Statistical Analysis on Shape Spaces of ThreeDimensional Object.
 Creator

Xie, Qian, Srivastava, Anuj, Klassen, E. (Eric), Huffer, Fred W. (Fred William), Wu, Wei, Zhang, Jinfeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

With the increasing popularity of information technology, especially electronic imaging techniques, large amounts of high-dimensional data such as 3D shapes have become pervasive in science, engineering, and even daily life in recent years. Though the quantity of data is huge, the extraction of relevant knowledge from it is still limited; how to understand data in a meaningful way is generally an open problem. The specific challenges include finding adequate mathematical representations of data and designing proper algorithms to process them. The existing tools for analyzing high-dimensional data, including 3D shape data, are often insufficient, as they suffer from factors such as misalignments, noise, and clutter. This thesis develops a framework for processing, analyzing, and understanding high-dimensional data, especially 3D shapes, by proposing a set of statistical tools including theory, algorithms, and optimization applied to practical problems. In particular, the following aspects of shape analysis are considered: 1. A framework adopting the SRNF representation, based on parallel transport of deformations across surfaces in the shape space, leads to statistical analysis of shape data. Three main analyses are conducted under this framework: (1) computing geodesics when either two end surfaces or the starting surface and an initial deformation are given; (2) parallel transporting deformations across surfaces; and (3) sampling random surfaces. 2. Computational efficiency plays an important role in performing statistical shape analysis on large datasets of 3D objects. To speed up the previous method, a numerical framework is introduced that approximates the inverse mapping and reduces the computational cost by an order of magnitude. 3. The geometrical and morphological information, or shapes, of 3D objects can be analyzed explicitly using boundaries extracted from the original image scans.
An alternative idea is to consider variability in shapes directly from their embedding images. A novel framework is proposed to unify three important tasks, registering, comparing and modeling images. 4. Finally, the spatial deformations learned from registering images are modeled using the GRID based decomposition. This specific model provides a way to decompose a large deformation into local and fundamental ones so that shape differences between images are easily interpretable. We conclude this thesis with conclusions drawn in this research and discuss potential future directions of statistical shape analysis in the last chapter, both from methodological and application aspects.
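As a rough illustration of the SRNF representation mentioned in item 1, the sketch below computes the square-root normal field q = n / sqrt(|n|) of a grid-parameterized surface with NumPy, where n is the cross product of the partial derivatives. The helper name `srnf` and the finite-difference normals are assumptions for illustration, not the dissertation's implementation.

```python
import numpy as np

def srnf(f):
    """Square-root normal field (SRNF) of a grid-parameterized surface.

    f : array of shape (H, W, 3), samples of a surface f(u, v) in R^3.
    Returns q of shape (H, W, 3) with q = n / sqrt(|n|), where n = f_u x f_v.
    """
    fu = np.gradient(f, axis=0)              # partial derivative along u
    fv = np.gradient(f, axis=1)              # partial derivative along v
    n = np.cross(fu, fv)                     # unscaled normal field
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.sqrt(np.maximum(norm, 1e-12))

# Example: SRNF distance between a unit sphere and a scaled copy.
u = np.linspace(0.1, np.pi - 0.1, 50)        # avoid the degenerate poles
v = np.linspace(0.0, 2 * np.pi, 50)
U, V = np.meshgrid(u, v, indexing="ij")
sphere = np.stack([np.sin(U) * np.cos(V),
                   np.sin(U) * np.sin(V),
                   np.cos(U)], axis=-1)
q1, q2 = srnf(sphere), srnf(1.2 * sphere)
dist = np.linalg.norm(q1 - q2)               # flat L2 distance between SRNFs
```

Because q scales linearly with a uniform rescaling of the surface, the flat L2 distance between the two SRNFs here is simply 0.2 times the norm of q1; geodesic computations in the actual shape space are considerably more involved.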
 Date Issued
 2015
 Identifier
 FSU_migr_etd9495
 Format
 Thesis
 Title
 Univariate and Multivariate Volatility Models for Portfolio Value at Risk.
 Creator

Xiao, Jingyi, Niu, Xufeng, Ökten, Giray, Wu, Wei, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In modern financial risk management, modeling and forecasting stock return movements via their conditional volatilities, particularly predicting the Value at Risk (VaR), has become increasingly important for a healthy economic environment. In this dissertation, we evaluate and compare the two main families of models for conditional volatilities, GARCH and Stochastic Volatility (SV), in terms of their VaR prediction performance on 5 major US stock indices. We estimate GARCH-type model parameters via Quasi-Maximum Likelihood Estimation (QMLE), while for SV we employ MCMC with the Ancillarity-Sufficiency Interweaving Strategy. We use the forecast volatilities from each model to predict the VaR of the 5 indices. We test the predictive performance of the estimated models by a two-stage backtesting procedure and then compare them via the Lopez loss function. The results indicate that even though SV is more computationally demanding than GARCH-type models, it dominates them in forecasting VaR. Since financial volatilities move together across assets and markets, modeling the volatilities in a multivariate framework appears more appropriate. However, existing studies in the literature do not present compelling evidence for a strong preference between univariate and multivariate models. In this dissertation we also address the problem of forecasting portfolio VaR via multivariate versus univariate GARCH models. We construct 3 portfolios from the stock returns of 3 major US stock indices, 6 major banks, and 6 major technology companies, respectively. For each portfolio, we model the portfolio conditional covariances with GARCH, EGARCH, MGARCH-BEKK, MGARCH-DCC, and GO-GARCH models. For each estimated model, the forecast portfolio volatilities are further used to calculate the portfolio VaR.
The ability of each model to capture the portfolio volatilities is evaluated by MAE and RMSE; the VaR prediction performance is tested through a two-stage backtesting procedure and compared in terms of the Lopez loss function. The results of our study indicate that even though MGARCH models are better at predicting the volatilities of some portfolios, GARCH models can perform as well as their multivariate (and computationally more demanding) counterparts.
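A minimal univariate sketch of the workflow described above (Gaussian QMLE for a GARCH(1,1) model, a one-step-ahead 95% VaR forecast, and the Lopez loss) might look as follows. The data are simulated and all function names are illustrative; the dissertation's actual models and estimation code are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def garch11_filter(r, omega, alpha, beta):
    """GARCH(1,1) variance recursion; returns in-sample conditional
    variances plus the one-step-ahead forecast as the last element."""
    s2 = np.empty(len(r) + 1)
    s2[0] = r.var()                           # initialize at sample variance
    for t in range(len(r)):
        s2[t + 1] = omega + alpha * r[t] ** 2 + beta * s2[t]
    return s2

def garch11_qmle(r):
    """Gaussian quasi-maximum likelihood estimates of (omega, alpha, beta)."""
    def nll(theta):
        s2 = garch11_filter(r, *theta)[:-1]   # variances aligned with r
        return 0.5 * np.sum(np.log(s2) + r ** 2 / s2)
    res = minimize(nll, x0=[0.05 * r.var(), 0.05, 0.90],
                   bounds=[(1e-8, None), (1e-6, 0.999), (1e-6, 0.999)])
    return res.x

def lopez_loss(r, var):
    """Lopez loss: 1 + (r - VaR)^2 on days the VaR is violated, else 0."""
    return np.where(r < var, 1.0 + (r - var) ** 2, 0.0).sum()

# Simulate a GARCH(1,1) return series standing in for an index.
T, omega0, alpha0, beta0 = 2000, 0.05, 0.08, 0.90
r = np.empty(T)
s2 = omega0 / (1 - alpha0 - beta0)            # unconditional variance
for t in range(T):
    r[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = omega0 + alpha0 * r[t] ** 2 + beta0 * s2

omega, alpha, beta = garch11_qmle(r)
sigma_next = np.sqrt(garch11_filter(r, omega, alpha, beta)[-1])
var_95 = -1.645 * sigma_next                  # one-day 95% VaR, zero-mean returns
```

In a backtest, `lopez_loss` would be summed over out-of-sample days for each competing model, and the model with the smaller loss preferred; the two-stage backtesting procedure additionally checks that the violation frequency is consistent with the nominal 5% level.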
 Date Issued
 2019
 Identifier
 2019_Spring_Xiao_fsu_0071E_15172
 Format
 Thesis