 Tests for equivalence of two survival functions: Alternative to the tests under proportional hazards.
 Creator

Martinez, Elvis E, Sinha, Debajyoti, Wang, Wenting, Lipsitz, Stuart R, Chappell, Richard J
 Abstract/Description

For either the equivalence trial or the noninferiority trial with survivor outcomes from two treatment groups, the most popular testing procedure is the extension (e.g., Wellek, A logrank test for equivalence of two survivor functions, Biometrics, 1993; 49: 877-881) of logrank based test under proportional hazards model. We show that the actual type I error rate for the popular procedure of Wellek is higher than the intended nominal rate when survival responses from two treatment arms satisfy the proportional odds survival model. When the true model is proportional odds survival model, we show that the hypothesis of equivalence of two survival functions can be formulated as a statistical hypothesis involving only the survival odds ratio parameter. We further show that our new equivalence test, formulation, and related procedures are applicable even in the presence of additional covariates beyond treatment arms, and the associated equivalence test procedures have correct type I error rates under the proportional hazards model as well as the proportional odds survival model. These results show that use of our test will be a safer statistical practice for equivalence trials of survival responses than the commonly used logrank based tests.
2017-02-01
 Identifier
 FSU_pmch_24925887, 10.1177/0962280214539282, PMC5557049, 24925887, 24925887, 0962280214539282
 Format
 Citation
 Title
 Approximate median regression for complex survey data with skewed response.
 Creator

Fraser, Raphael André, Lipsitz, Stuart R, Sinha, Debajyoti, Fitzmaurice, Garrett M, Pan, Yi
 Abstract/Description

The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features, that is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS) based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey.
2016-12-01
 Identifier
 FSU_pmch_27062562, 10.1111/biom.12517, PMC5055849, 27062562, 27062562
 Format
 Citation
 Title
 Exact Bayesian p-values for a test of independence in a 2 × 2 contingency table with missing data.
 Creator

Lin, Yan, Lipsitz, Stuart R, Sinha, Debajyoti, Fitzmaurice, Garrett, Lipshultz, Steven
 Abstract/Description

Altham (Altham PME. Exact Bayesian analysis of a 2 × 2 contingency table, and Fisher's "exact" significance test. J R Stat Soc B 1969; 31: 261-269) showed that a one-sided p-value from Fisher's exact test of independence in a 2 × 2 contingency table is equal to the posterior probability of negative association in the 2 × 2 contingency table under a Bayesian analysis using an improper prior. We derive an extension of Fisher's exact test p-value in the presence of missing data, assuming the missing data mechanism is ignorable (i.e., missing at random or completely at random). Further, we propose Bayesian p-values for a test of independence in a 2 × 2 contingency table with missing data using alternative priors; we also present results from a simulation study exploring the Type I error rate and power of the proposed exact test p-values. An example, using data on the association between blood pressure and a cardiac enzyme, is presented to illustrate the methods.
2018-11-01
 Identifier
 FSU_pmch_28633606, 10.1177/0962280217702538, PMC5799034, 28633606, 28633606
 Format
 Citation
 Title
 Bias-corrected estimates for logistic regression models for complex surveys with application to the United States' Nationwide Inpatient Sample.
 Creator

Rader, Kevin A, Lipsitz, Stuart R, Fitzmaurice, Garrett M, Harrington, David P, Parzen, Michael, Sinha, Debajyoti
 Abstract/Description

For complex surveys with a binary outcome, logistic regression is widely used to model the outcome as a function of covariates. Complex survey sampling designs are typically stratified cluster samples, but consistent and asymptotically unbiased estimates of the logistic regression parameters can be obtained using weighted estimating equations (WEEs) under the naive assumption that subjects within a cluster are independent. Despite the relatively large samples typical of many complex surveys, with rare outcomes, many interaction terms, or analysis of subgroups, the logistic regression parameter estimates from WEEs can be markedly biased, just as with independent samples. In this paper, we propose bias-corrected WEEs for complex survey data. The proposed method is motivated by a study of postoperative complications in laparoscopic cystectomy, using data from the 2009 United States' Nationwide Inpatient Sample complex survey of hospitals.
2017-10-01
 Identifier
 FSU_pmch_26265769, 10.1177/0962280215596550, PMC5799008, 26265769, 26265769, 0962280215596550
 Format
 Citation
 Title
 Efficient Computation of Reduced Regression Models.
 Creator

Lipsitz, Stuart R, Fitzmaurice, Garrett M, Sinha, Debajyoti, Hevelone, Nathanael, Giovannucci, Edward, Trinh, Quoc-Dien, Hu, Jim C
 Abstract/Description

We consider settings where it is of interest to fit and assess regression submodels that arise as various explanatory variables are excluded from a larger regression model. The larger model is referred to as the full model; the submodels are the reduced models. We show that a computationally efficient approximation to the regression estimates under any reduced model can be obtained from a simple weighted least squares (WLS) approach based on the estimated regression parameters and covariance matrix from the full model. This WLS approach can be considered an extension to unbiased estimating equations of a first-order Taylor series approach proposed by Lawless and Singhal. Using data from the 2010 Nationwide Inpatient Sample (NIS), a 20% weighted, stratified, cluster sample of approximately 8 million hospital stays from approximately 1000 hospitals, we illustrate the WLS approach when fitting interval-censored regression models to estimate the effect of type of surgery (robotic versus nonrobotic surgery) on hospital length-of-stay while adjusting for three sets of covariates: patient-level characteristics, hospital characteristics, and zip-code level characteristics. Ordinarily, standard fitting of the reduced models to the NIS data takes approximately 10 hours; using the proposed WLS approach, the reduced models take seconds to fit.
2017-01-01
 Identifier
 FSU_pmch_29104296, 10.1080/00031305.2017.1296375, PMC5664962, 29104296, 29104296
 Format
 Citation
 Title
 Association Models for Clustered Data with Binary and Continuous Responses.
 Creator

Lin, Lanjia, Sinha, Debajyoti, Hurt, Myra, Lipsitz, Stuart R., McGee, Daniel, Department of Statistics, Florida State University
 Abstract/Description

This dissertation develops novel single random effect models as well as a bivariate correlated random effects model for clustered data with bivariate mixed responses. Logit and identity link functions are used for the binary and continuous responses. For ease of interpretation of the regression effects, the random effect of the binary response has a bridge distribution, so that the marginal model of the mean of the binary response, after integrating out the random effect, preserves the logistic form, and the marginal regression function of the continuous response preserves the linear form. Within-cluster and within-subject associations can be measured by our proposed models. For the bivariate correlated random effects model, we illustrate how different levels of the association between two random effects induce different Kendall's tau values for the association between the binary and continuous responses from the same cluster. Fully parametric and semiparametric Bayesian methods as well as the maximum likelihood method are illustrated for model analysis. In the semiparametric Bayesian model, the normality assumption of the regression error for the continuous response is relaxed by using a nonparametric Dirichlet process prior. Robustness of the bivariate correlated random effects model using the ML method to misspecification of the regression function as well as the random effect distribution is investigated by simulation studies. The Bayesian and likelihood methods are applied to a developmental toxicity study of ethylene glycol in mice.
2009
 Identifier
 FSU_migr_etd1330
 Format
 Thesis
 Title
 Influence Measures for Bayesian Data Analysis.
 Creator

De Oliveira, Melaine C. (Melaine Cristina), Sinha, Debajyoti, Panton, Lynn B., Bradley, Jonathan R., Linero, Antonio Ricardo, Lipsitz, Stuart, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Identifying influential observations in the data is desired to ensure proper inference and statistical analysis. Modern methods to identify influential cases use cross-validation diagnostics based on the effect of deletion of the ith observation on inference. A popular method to identify influential observations is to use the Kullback-Leibler divergence measure between the posterior distribution of the parameter of interest given full data and the posterior distribution given the cross-validated data, where the cross-validated data has the ith observation removed. Although, in Bayesian inference, the posterior distribution contains all the relevant information about a parameter of interest, when the goal is prediction, perhaps the predictive distribution should be used to identify influential observations. So, we extended our method to the comparison of the posterior predictive distributions given full data and cross-validated data. We generalize and extend existing popular Bayesian cross-validated influence diagnostics using a Bregman divergence (BD) based measure. We derive useful properties of these BD based measures of the influence of each observation on the posterior distribution and we show that they can be extended to the predictive distribution. We show that these BD based measures allow interpretable calibration and that they can be computed via Markov chain Monte Carlo (MCMC) samples from a single posterior based on full data. We illustrate how our new measure of influence of observations has more useful practical roles for data analysis than popular Bayesian residual analysis tools (CPO) in an example of meta-analysis with binary response and in other cases of interval-censored data.
2018
 Identifier
 2018_Su_DeOliveira_fsu_0071E_14712
 Format
 Thesis
 Title
 Semiparametric Bayesian Regression Models for Skewed Responses.
 Creator

Bhingare, Apurva Chandrashekhar, Sinha, Debajyoti, Shanbhag, Sachin, Linero, Antonio Ricardo, Bradley, Jonathan R., Pati, Debdeep, Lipsitz, Stuart, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

It is common to encounter skewed response data in medicine, epidemiology and health care studies. Methodology needs to be devised to overcome the natural difficulties that occur in analyzing such data, particularly when it is multivariate. Existing Bayesian statistical methods to deal with skewed data are mostly fully parametric. We propose novel semiparametric Bayesian methods to model and analyze such data. These methods make minimal assumptions about the true form of the distribution and structure of the observed data. Through examples from real-life studies, we demonstrate practical advantages of our semiparametric Bayesian methods over the existing methods. For many real-life studies with skewed multivariate responses, the level of skewness and association structure assumptions are essential for evaluating the covariate effects on the response and its predictive distribution. First, we present a novel semiparametric multivariate model class leading to a theoretically justifiable semiparametric Bayesian analysis of multivariate skewed responses. Like the multivariate Gaussian densities, this multivariate model is closed under marginalization, allows a wide class of multivariate associations, and has meaningful physical interpretations of skewness levels and covariate effects on the marginal density. Compared to existing models, our model enjoys several desirable practical properties, including Bayesian computing via available software, and assurance of consistent Bayesian estimates of parameters and the nonparametric error density under a set of plausible prior assumptions. We introduce a particular parametric version of the model as an alternative to various parametric skew-symmetric models available in the literature. We illustrate the practical advantages of our methods over existing parametric alternatives via application to a clinical study to assess periodontal disease and through a simulation study.
Unlike most of the models existing in the literature, this class of models advocates a latent variable approach, making implementation under the Bayesian paradigm via standard software for MCMC computation like WinBUGS/JAGS straightforward. Although JAGS and WinBUGS are flexible MCMC engines, for complex model structures they tend to be rather slow. We offer an alternative tool to implement the aforementioned parametric version of the models using PROC MCMC in SAS. Our goal is to facilitate and encourage more extensive implementation of these models. To achieve this goal, we illustrate the implementation using PROC MCMC in SAS via examples from real life and provide fully annotated SAS code. In large-scale national surveys, we often come across skewed data as well as semicontinuous data, that is, data characterized by a point mass at zero (degenerate) and a right-skewed continuous distribution on positive support. For example, in the Medical Expenditure Panel Survey (MEPS), the variable total health care expenditure (i.e., the response) for nonusers of the health care services is zero, whereas for the users it has a continuous distribution typically skewed towards the right. We provide an overview of the existing models and methods to analyze such data.
2018
 Identifier
 2018_Sp_Bhingare_fsu_0071E_14468
 Format
 Thesis