Department of Statistics

Permalink: https://diginole.lib.fsu.edu/islandora/object/fsu:department_of_statistics

An investigation of the association of genetic susceptibility risk with somatic mutation burden in breast cancer
Background: Genome-wide association studies have reported nearly 100 common germline susceptibility loci associated with the risk for breast cancer. Tumour sequencing studies have characterised somatic mutation profiles in breast cancer patients. The relationship between breast cancer susceptibility loci and somatic mutation patterns in breast cancer remains largely unexplored. Methods: We used single-nucleotide polymorphism (SNP) genotyping array data and tumour exome sequencing data available from 638 breast cancer patients of European ancestry from The Cancer Genome Atlas (TCGA) project. We analysed both genotype data and, when necessary, imputed genotypes for 90 known breast cancer susceptibility loci. We fitted linear regression models to investigate possible associations between germline risk variants and total somatic mutation count (TSMC), as well as specific mutation types. We examined individual SNP genotypes, as well as a multi-SNP polygenic risk score (PRS). Models were statistically adjusted for age at diagnosis, stage, oestrogen-receptor (ER) and progesterone-receptor (PR) status of breast cancer. We also performed stratified analyses by ER and PR status. Results: We observed a significant inverse association (P = 8.75 × 10^-6; FDR = 0.001) between the risk allele in rs2588809 of the gene RAD51B and TSMC across all breast cancer patients, for both ER+ and ER- tumours. This association was also evident for different types of mutations. The PRS analysis for all patients, with or without rs2588809, showed a significant inverse association (P = 0.01 and 0.04, respectively) with TSMC. This inverse association was significant in ER+ patients with the ER+-specific PRS (P = 0.02), but not among ER- patients for the ER--specific PRS (P = 0.39).
Conclusions: We observed an inverse association between common germline risk variants and TSMC, which, if confirmed, could provide new insights into how germline variation informs our understanding of somatic mutation patterns in breast cancer., Keywords: association study, breast cancer, common, exome sequencing, genome-wide association, landscape, loci, polygenic risk score, single-nucleotide polymorphisms, somatic mutations, Publication Note: The publisher’s version of record is available at http://www.dx.doi.org/10.1038/bjc.2016.223
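The two quantities named in this abstract, a multi-SNP polygenic risk score and a linear regression of mutation count on that score, can be illustrated with a minimal sketch. The abstract does not state how the score was weighted, so the per-SNP weights below are an assumption (a weighted sum of risk-allele counts is one common construction), the toy genotypes and outcome values are made up purely for illustration, and the covariate adjustment described in the abstract is omitted.

```python
from statistics import mean

def polygenic_risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP).
    The weighting scheme is an assumption, not taken from the abstract."""
    return sum(c * w for c, w in zip(allele_counts, weights))

def ols_slope(x, y):
    """Slope of a simple (one-predictor) least-squares regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical toy data: 4 patients, 3 SNPs, made-up weights and outcomes.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [2, 2, 2]]
weights = [0.10, 0.05, 0.20]
scores = [polygenic_risk_score(g, weights) for g in genotypes]
log_mutation_count = [4.0, 3.9, 3.6, 3.1]

slope = ols_slope(scores, log_mutation_count)
print(scores, slope)
```

A negative slope in this toy setting corresponds to the direction of an inverse association; a real analysis would also need standard errors and the adjustment variables listed in the abstract.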
Approximate median regression for complex survey data with skewed response.
The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features, that is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey., Keywords: Complex survey, Median regression, Quantile regression, Sandwich estimator, Transform-both-sides, Grant Number: R01 GM029745, R03 CA205018, R01 CA069222, R01 CA160679, R01 AI060373, P01 CA068484, R01 CA074015, Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5055849.
Are screening methods useful in feature selection?
Filter or screening methods are often used as a preprocessing step for reducing the number of variables used by a learning algorithm in obtaining a classification or regression model. While there are many such filter methods, there is a need for an objective evaluation of these methods. Such an evaluation is needed to compare them with each other and also to answer whether they are at all useful, or a learning algorithm could do a better job without them. For this purpose, many popular screening methods are partnered in this paper with three regression learners and five classification learners and evaluated on ten real datasets to obtain accuracy criteria such as R-square and area under the ROC curve (AUC). The obtained results are compared through curve plots and comparison tables in order to find out whether screening methods help improve the performance of learning algorithms and how they fare with each other. Our findings revealed that the screening methods were useful in improving the prediction of the best learner on two regression and two classification datasets out of the ten datasets evaluated., Keywords: filter methods, screening methods, feature selection, machine learning, high dimensional data
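As an illustration of what a filter or screening step does before a learner is trained, the sketch below ranks features by absolute Pearson correlation with the response and keeps the top k. Correlation ranking is only one possible filter and is an assumption here; the abstract does not list which screening methods were evaluated. The function names and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def screen_features(columns, y, k):
    """Return the (sorted) indices of the k features most correlated with y."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Hypothetical toy data: three candidate features, one response.
y = [1, 2, 3, 4, 5]
columns = [
    [2, 4, 6, 8, 10],   # perfectly correlated with y
    [1, -1, 1, -1, 1],  # uncorrelated with y
    [5, 4, 3, 2, 2],    # strongly negatively correlated with y
]
selected = screen_features(columns, y, k=2)
print(selected)
```

Only the selected columns would then be passed to the downstream regression or classification learner, which is the pipeline whose usefulness the paper evaluates.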
Automatic extraction of protein-protein interactions using grammatical relationship graph.
Relationships between bio-entities (genes, proteins, diseases, etc.) constitute a significant part of our knowledge. Most of this information is documented as unstructured text in different forms, such as books, articles and on-line pages. Automatic extraction of such information and storing it in structured form could help researchers more easily access such information and also make it possible to incorporate it in advanced integrative analysis. In this study, we developed a novel approach to extract bio-entity relationship information using Natural Language Processing (NLP) and a graph-theoretic algorithm. Our method, called GRGT (Grammatical Relationship Graph for Triplets), not only extracts the pairs of terms that have certain relationships, but also extracts the type of relationship (the word describing the relationships). In addition, the directionality of the relationship can also be extracted. Our method is based on the assumption that a triplet exists for a pair of interactions. A triplet is defined as two terms (entities) and an interaction word describing the relationship of the two terms in a sentence. We first use a sentence parsing tool to obtain the sentence structure represented as a dependency graph where words are nodes and edges are typed dependencies. The shortest paths among the pairs of words in the triplet are then extracted, which form the basis for our information extraction method. A flexible pattern-matching scheme was then used to match a triplet graph with unknown relationship to those triplet graphs with labels (True or False) in the database. We applied the method to three benchmark datasets to extract the protein-protein-interactions (PPIs), and obtained better precision than the top performing methods in the literature. We have developed a method to extract the protein-protein interactions from biomedical literature.
PPIs extracted by our method have higher precision than those from other methods, suggesting that our method can be used to effectively extract PPIs and deposit them into databases. Beyond extracting PPIs, our method could be easily extended to extracting relationship information between other bio-entities., Keywords: Graph-theoretic algorithm, Information extraction, Natural language processing, Protein-protein-interactions, Relationship extraction, Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6069288.
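The shortest-path step described in this abstract can be sketched with a breadth-first search over a dependency graph in which words are nodes and typed dependencies are edges. This is a generic illustration, not the GRGT implementation: the example sentence, the dependency labels and the function name are all hypothetical, and the graph is treated as undirected for path finding.

```python
from collections import deque

def shortest_path(edges, source, target):
    """Breadth-first search over an undirected view of a dependency graph.
    `edges` holds (head, dependent, label) tuples; labels are ignored here."""
    neighbours = {}
    for head, dependent, _label in edges:
        neighbours.setdefault(head, []).append(dependent)
        neighbours.setdefault(dependent, []).append(head)
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in neighbours.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # the two words are not connected

# Hypothetical parse of "ProtA strongly activates ProtB in cells".
edges = [
    ("activates", "ProtA", "nsubj"),
    ("activates", "ProtB", "dobj"),
    ("activates", "strongly", "advmod"),
    ("activates", "cells", "nmod"),
]
path = shortest_path(edges, "ProtA", "ProtB")
print(path)
```

In this toy parse the path between the two entities passes through the interaction word, which is the kind of triplet structure the abstract describes matching against labelled examples.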
Automatic stage identification of Drosophila egg chamber based on DAPI images
The Drosophila egg chamber, whose development is divided into 14 stages, is a well-established model for developmental biology. However, visual stage determination can be a tedious, subjective and time-consuming task prone to errors. Our study presents an objective, reliable and repeatable automated method for quantifying cell features and classifying egg chamber stages based on DAPI images. The proposed approach is composed of two steps: 1) a feature extraction step and 2) a statistical modeling step. The egg chamber features used are egg chamber size, oocyte size, egg chamber ratio and distribution of follicle cells. Methods for determining the onset of the polytene stage and centripetal migration are also discussed. The statistical model uses linear and ordinal regression to explore the stage-feature relationships and classify egg chamber stages. Combined with machine learning, our method has great potential to enable discovery of hidden developmental mechanisms., Keywords: endocycle, follicle cell-differentiation, melanogaster, morphogenesis, notch pathway, oogenesis, pattern-formation, polarity, Proliferation, watershed segmentation, Publication Note: The publisher’s version of record is available at http://www.dx.doi.org/10.1038/srep18850
Automatic stage identification of Drosophila egg chamber based on DAPI images.
The Drosophila egg chamber, whose development is divided into 14 stages, is a well-established model for developmental biology. However, visual stage determination can be a tedious, subjective and time-consuming task prone to errors. Our study presents an objective, reliable and repeatable automated method for quantifying cell features and classifying egg chamber stages based on DAPI images. The proposed approach is composed of two steps: 1) a feature extraction step and 2) a statistical modeling step. The egg chamber features used are egg chamber size, oocyte size, egg chamber ratio and distribution of follicle cells. Methods for determining the onset of the polytene stage and centripetal migration are also discussed. The statistical model uses linear and ordinal regression to explore the stage-feature relationships and classify egg chamber stages. Combined with machine learning, our method has great potential to enable discovery of hidden developmental mechanisms., Grant Number: R01 GM072562, R01GM072562, Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702167.
Bayesian Approaches For Missing Not At Random Outcome Data
Missing data is almost always present in real datasets, and introduces several statistical issues. One fundamental issue is that, in the absence of strong uncheckable assumptions, effects of interest are typically not non-parametrically identified. In this article, we review the generic approach of the use of identifying restrictions from a likelihood-based perspective, and provide points of contact for several recently proposed methods. An emphasis of this review is on restrictions for nonmonotone missingness, a subject that has been treated sparingly in the literature. We also present a general, fully Bayesian, approach which is widely applicable and capable of handling a variety of identifying restrictions in a uniform manner., Keywords: multiple imputation, longitudinal data, Missing data, global sensitivity-analysis, incomplete data, informative drop-out, mixture models, mnar, multiple-imputation, multivariate data, nonignorable missingness, nonparametric Bayes, nonresponse models, pattern-mixture-models, regression-models, shared-parameter models, Publication Note: The publisher’s version of record is available at https://doi.org/10.1214/17-STS630
Bayesian Semiparametric Multivariate Density Deconvolution.
We consider the problem of multivariate density deconvolution when interest lies in estimating the distribution of a vector-valued random variable but precise measurements on it are not available, observations being contaminated by measurement errors. The existing sparse literature on the problem assumes the density of the measurement errors to be completely known. We propose robust Bayesian semiparametric multivariate deconvolution approaches when the measurement error density is not known but replicated proxies are available for at least some individuals. Additionally, we allow the variability of the measurement errors to depend on the associated unobserved values of the variable of interest through unknown relationships, which also automatically includes the case of multivariate multiplicative measurement errors. Basic properties of finite mixture models, multivariate normal kernels and exchangeable priors are exploited in novel ways to meet modeling and computational challenges. Theoretical results showing the flexibility of the proposed methods in capturing a wide variety of data generating processes are provided. We illustrate the efficiency of the proposed methods in recovering the density of the variable of interest through simulation experiments. The methodology is applied to estimate the joint consumption pattern of different dietary components from contaminated 24 hour recalls. Supplementary Material presents substantive additional details., Keywords: B-splines, Conditional heteroscedasticity, Latent factor analyzers, Measurement errors, Mixture models, Multivariate density deconvolution, Regularization, Shrinkage, Grant Number: R01 CA057030, R01 CA194391, U01 CA057030, Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6075844.
Bayesian Variable Selection For Pareto Regression Models With Latent Multivariate Log Gamma Process With Applications To Earthquake Magnitudes
Generalized linear models are routinely used in many environmental statistics problems, such as earthquake magnitude prediction. Hu et al. proposed Pareto regression with spatial random effects for earthquake magnitudes. In this paper, we propose Bayesian spatial variable selection for Pareto regression based on Bradley et al. and Hu et al. to tackle the variable selection issue in generalized linear regression models with spatial random effects. A Bayesian hierarchical latent multivariate log gamma model framework is applied to account for spatial random effects to capture spatial dependence. We use two Bayesian model assessment criteria for variable selection: the Conditional Predictive Ordinate (CPO) and the Deviance Information Criterion (DIC). Furthermore, we show that these two Bayesian criteria have analytic connections with conditional AIC under the linear mixed model setting. We examine the empirical performance of the proposed method via a simulation study and further demonstrate the applicability of the proposed method in an analysis of the earthquake data obtained from the United States Geological Survey (USGS)., Keywords: model selection, cpo, dic, earthquake hazard, predictive approach, Publication Note: The publisher's version of record is available at https://doi.org/10.3390/geosciences9040169
Bayesian mixture model for missing data in marine mammal growth analysis.
Much of what is known about bottlenose dolphin (Tursiops truncatus) anatomy and physiology is based on necropsies from stranding events. Measurements of total body length, total body mass, and age are used to estimate growth. It is more feasible to retrieve and transport smaller animals for total body mass measurement than larger animals, introducing a systematic bias in sampling. Adverse weather events, volunteer availability, and other unforeseen circumstances also contribute to incomplete measurement. We have developed a Bayesian mixture model to describe growth in detected stranded animals using data from both those that are fully measured and those not fully measured. Our approach uses a shared random effect to link the missingness mechanism (i.e. full/partial measurement) to distinct growth curves in the fully and partially measured populations, thereby enabling borrowing of strength for estimation. We use simulation to compare our model to complete case analysis and two common multiple imputation methods according to model mean square error. Results indicate that our mixture model provides better fit both when the two populations are present and when they are not. The feasibility and utility of our new method are demonstrated by application to South Carolina strandings data., Keywords: Gibbs sampler, Growth, Necropsy sampling, Selection bias, Tursiops truncatus, Grant Number: T32 GM074934, Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5425172.
Bias Caused By Sampling Error In Meta-analysis With Small Sample Sizes
Background: Meta-analyses frequently include studies with small sample sizes. Researchers usually fail to account for sampling error in the reported within-study variances; they model the observed study-specific effect sizes with the within-study variances and treat these sample variances as if they were the true variances. However, this sampling error may be influential when sample sizes are small. This article illustrates that the sampling error may lead to substantial bias in meta-analysis results. Methods: We conducted extensive simulation studies to assess the bias caused by sampling error. Meta-analyses with continuous and binary outcomes were simulated with various ranges of sample size and extents of heterogeneity. We evaluated the bias and the confidence interval coverage for five commonly-used effect sizes (i.e., the mean difference, standardized mean difference, odds ratio, risk ratio, and risk difference). Results: Sampling error did not cause noticeable bias when the effect size was the mean difference, but the standardized mean difference, odds ratio, risk ratio, and risk difference suffered from this bias to different extents. The bias in the estimated overall odds ratio and risk ratio was noticeable even when each individual study had more than 50 samples under some settings. Also, Hedges' g, which is a bias-corrected estimate of the standardized mean difference within studies, might lead to larger bias than Cohen's d in meta-analysis results. Conclusions: Caution is needed when performing meta-analyses with small sample sizes.
The reported within-study variances may not be simply treated as the true variances, and their sampling error should be fully considered in such meta-analyses., Keywords: difference, absolute risk, binary outcomes, detect publication bias, heterogeneity variance, odds ratio, random-effects models, rare events, relative risk, sparse data, Publication Note: The publisher’s version of record is available at https://doi.org/10.1371/journal.pone.0204056
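For readers unfamiliar with the two standardized mean difference estimators compared in this abstract, the sketch below computes Cohen's d with a pooled standard deviation and Hedges' g using the common approximation to the small-sample correction factor, 1 - 3/(4·df - 1). The function names and the example inputs are hypothetical, and the sketch shows only the within-study effect sizes, not the meta-analytic pooling or the simulations reported in the paper.

```python
from math import sqrt

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Cohen's d multiplied by the approximate small-sample correction factor."""
    df = n1 + n2 - 2
    correction = 1 - 3 / (4 * df - 1)
    return correction * cohens_d(m1, s1, n1, m2, s2, n2)

# Hypothetical study: two arms of 10 subjects, unit standard deviations.
d = cohens_d(1.0, 1.0, 10, 0.0, 1.0, 10)
g = hedges_g(1.0, 1.0, 10, 0.0, 1.0, 10)
print(d, g)
```

The correction shrinks d toward zero, and the shrinkage grows as the degrees of freedom fall, which is why the two estimators diverge most in the small-sample settings the abstract is concerned with.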
Bias caused by sampling error in meta-analysis with small sample sizes.
Meta-analyses frequently include studies with small sample sizes. Researchers usually fail to account for sampling error in the reported within-study variances; they model the observed study-specific effect sizes with the within-study variances and treat these sample variances as if they were the true variances. However, this sampling error may be influential when sample sizes are small. This article illustrates that the sampling error may lead to substantial bias in meta-analysis results. We conducted extensive simulation studies to assess the bias caused by sampling error. Meta-analyses with continuous and binary outcomes were simulated with various ranges of sample size and extents of heterogeneity. We evaluated the bias and the confidence interval coverage for five commonly-used effect sizes (i.e., the mean difference, standardized mean difference, odds ratio, risk ratio, and risk difference). Sampling error did not cause noticeable bias when the effect size was the mean difference, but the standardized mean difference, odds ratio, risk ratio, and risk difference suffered from this bias to different extents. The bias in the estimated overall odds ratio and risk ratio was noticeable even when each individual study had more than 50 samples under some settings. Also, Hedges' g, which is a bias-corrected estimate of the standardized mean difference within studies, might lead to larger bias than Cohen's d in meta-analysis results. Caution is needed when performing meta-analyses with small sample sizes. The reported within-study variances may not be simply treated as the true variances, and their sampling error should be fully considered in such meta-analyses., Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6136825.
Bias-corrected estimates for logistic regression models for complex surveys with application to the United States' Nationwide Inpatient Sample.
For complex surveys with a binary outcome, logistic regression is widely used to model the outcome as a function of covariates. Complex survey sampling designs are typically stratified cluster samples, but consistent and asymptotically unbiased estimates of the logistic regression parameters can be obtained using weighted estimating equations (WEEs) under the naive assumption that subjects within a cluster are independent. Despite the relatively large samples typical of many complex surveys, with rare outcomes, many interaction terms, or analysis of subgroups, the logistic regression parameter estimates from WEE can be markedly biased, just as with independent samples. In this paper, we propose bias-corrected WEEs for complex survey data. The proposed method is motivated by a study of postoperative complications in laparoscopic cystectomy, using data from the 2009 United States' Nationwide Inpatient Sample complex survey of hospitals., Keywords: Binary responses, Bladder cancer, Population survey, Stratified cluster sampling, Weighted estimating equations, Grant Number: R01 CA160679, Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5799008.
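A minimal sketch of ordinary (uncorrected) weighted estimating equations for logistic regression may help make the starting point of this abstract concrete. It solves the weighted score equations, the sum over subjects of w_i x_i (y_i - p_i) = 0, by Newton-Raphson for one covariate plus an intercept. The bias correction proposed in the paper is not implemented here, and the sampling weights and data are hypothetical.

```python
from math import exp, log

def weighted_score(x, y, w, b0, b1):
    """Weighted score vector for a logistic model with intercept and one covariate."""
    u0 = u1 = 0.0
    for xi, yi, wi in zip(x, y, w):
        p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
        u0 += wi * (yi - p)
        u1 += wi * (yi - p) * xi
    return u0, u1

def fit_weighted_logistic(x, y, w, iterations=25):
    """Newton-Raphson solution of the weighted score equations (no bias correction)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        u0, u1 = weighted_score(x, y, w, b0, b1)
        i00 = i01 = i11 = 0.0  # weighted information matrix entries
        for xi, wi in zip(x, w):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
            v = wi * p * (1.0 - p)
            i00 += v
            i01 += v * xi
            i11 += v * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: inverse of the 2x2 information matrix times the score.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

# Hypothetical data with made-up sampling weights.
x = [0, 0, 1, 1, 2, 2]
y = [0, 1, 0, 1, 0, 1]
w = [3, 1, 2, 2, 1, 3]
b0, b1 = fit_weighted_logistic(x, y, w)
print(b0, b1, weighted_score(x, y, w, b0, b1))
```

With these toy inputs the weighted fitted probabilities are 0.25, 0.5 and 0.75 at x = 0, 1 and 2, so the slope equals log 3. Variance estimation that respects strata and clusters is a separate step not shown here.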
Chromatin structure profile data from DNS-seq
Presented here are data from Next-Generation Sequencing of differential micrococcal nuclease digestions of formaldehyde-crosslinked chromatin in selected tissues of maize (Zea mays) inbred line B73. Supplemental materials include a wet-bench protocol for making DNS-seq libraries and the DNS-seq data processing pipeline for producing genome browser tracks. This report also includes the peak-calling pipeline using the iSeg algorithm to segment positive and negative peaks from the DNS-seq difference profiles. The data repository for the sequence data is the NCBI SRA, BioProject Accession 8., Grant Number: R01 GM126558, Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6117953.
Coordinated Upregulation of Mitochondrial Biogenesis and Autophagy in Breast Cancer Cells
Overactive mitochondrial fission was shown to promote cell transformation and tumor growth. It remains elusive how mitochondrial quality is regulated in such conditions. Here, we show that upregulation of mitochondrial fission protein, dynamin related protein-1 (Drp1), was accompanied with increased mitochondrial biogenesis markers (PGC1 alpha, NRF1, and Tfam) in breast cancer cells. However, mitochondrial number was reduced, which was associated with lower mitochondrial oxidative capacity in breast cancer cells. This contrast might be owing to enhanced mitochondrial turnover through autophagy, because an increased population of autophagic vacuoles engulfing mitochondria was observed in the cancer cells. Consistently, BNIP3 (a mitochondrial autophagy marker) and autophagic flux were significantly upregulated, indicative of augmented mitochondrial autophagy (mitophagy). The upregulation of Drp1 and BNIP3 was also observed in vivo (human breast carcinomas). Importantly, inhibition of Drp1 significantly suppressed mitochondrial autophagy, metabolic reprogramming, and cancer cell viability. Together, this study reveals coordinated increase of mitochondrial biogenesis and mitophagy in which Drp1 plays a central role regulating breast cancer cell metabolism and survival. Given the emerging evidence of PGC1 alpha contributing to tumor growth, it will be of critical importance to target both mitochondrial biogenesis and mitophagy for effective cancer therapeutics., Keywords: bnip3, fission, fusion, growth, in-situ, Liver, metabolic-regulation, mitophagy, Progression, skeletal-muscle, Publication Note: The publisher’s version of record is available at http://www.dx.doi.org/10.1155/2016/4085727
Coordinated Upregulation of Mitochondrial Biogenesis and Autophagy in Breast Cancer Cells
Overactive mitochondrial fission was shown to promote cell transformation and tumor growth. It remains elusive how mitochondrial quality is regulated in such conditions. Here, we show that upregulation of mitochondrial fission protein, dynamin related protein-1 (Drp1), was accompanied with increased mitochondrial biogenesis markers (PGC1 alpha, NRF1, and Tfam) in breast cancer cells. However, mitochondrial number was reduced, which was associated with lower mitochondrial oxidative capacity in breast cancer cells. This contrast might be owing to enhanced mitochondrial turnover through autophagy, because an increased population of autophagic vacuoles engulfing mitochondria was observed in the cancer cells. Consistently, BNIP3 (a mitochondrial autophagy marker) and autophagic flux were significantly upregulated, indicative of augmented mitochondrial autophagy (mitophagy). The upregulation of Drp1 and BNIP3 was also observed in vivo (human breast carcinomas). Importantly, inhibition of Drp1 significantly suppressed mitochondrial autophagy, metabolic reprogramming, and cancer cell viability. Together, this study reveals coordinated increase of mitochondrial biogenesis and mitophagy in which Drp1 plays a central role regulating breast cancer cell metabolism and survival. Given the emerging evidence of PGC1 alpha contributing to tumor growth, it will be of critical importance to target both mitochondrial biogenesis and mitophagy for effective cancer therapeutics., Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5056295.
Distance-Guided Forward and Backward Chain-Growth Monte Carlo Method for Conformational Sampling and Structural Prediction of Antibody CDR-H3 Loops.
Antibodies recognize antigens through the complementarity-determining regions (CDRs) formed by six-loop hypervariable regions crucial for the diversity of antigen specificities. Among the six CDR loops, the H3 loop is the most challenging to predict because of its much higher variation in sequence length and identity, resulting in a much larger and more complex structural space, compared to the other five loops. We developed a novel method based on a chain-growth sequential Monte Carlo method, called distance-guided sequential chain-growth Monte Carlo for H3 loops (DiSGro-H3). The new method samples protein chains in both forward and backward directions. It can efficiently generate low energy, near-native H3 loop structures using the conformation types predicted from the sequences of H3 loops. DiSGro-H3 performs significantly better than another ab initio method, RosettaAntibody, in both sampling and prediction, while taking less computational time. It performs comparably to template-based methods. As an ab initio method, DiSGro-H3 offers satisfactory accuracy while being able to predict any H3 loop without templates., Grant Number: R01 GM079804, R01 GM115442, Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5565776.
Distinct tissue-specific transcriptional regulation revealed by gene regulatory networks in maize.
Transcription factors (TFs) are proteins that can bind to DNA sequences and regulate gene expression. Many TFs are master regulators in cells that contribute to tissue-specific and cell-type-specific gene expression patterns in eukaryotes. Maize has been a model organism for over one hundred years, but little is known about its tissue-specific gene regulation through TFs. In this study, we used a network approach to elucidate gene regulatory networks (GRNs) in four tissues (leaf, root, SAM and seed) in maize. We utilized GENIE3, a machine-learning algorithm, combined with a large quantity of RNA-Seq expression data to construct four tissue-specific GRNs. Unlike some other techniques, this approach is not limited by the availability of high-quality Position Weight Matrices (PWMs), and can therefore predict GRNs for over 2000 TFs in maize. Although many TFs were expressed across multiple tissues, a multi-tiered analysis predicted tissue-specific regulatory functions for many transcription factors. Some well-studied TFs emerged within the four tissue-specific GRNs, and the GRN predictions matched expectations based upon published results for many of these examples. Our GRNs were also validated by ChIP-Seq datasets (KN1, FEA4 and O2). Key TFs were identified for each tissue and matched expectations for key regulators in each tissue, including GO enrichment and identity with known regulatory factors for that tissue. We also found functional modules in each network by clustering analysis with the MCL algorithm. By combining publicly available genome-wide expression data and network analysis, we can uncover GRNs at tissue-level resolution in maize. Since ChIP-Seq and PWMs are still limited in several model organisms, our study provides a uniform platform that can be adapted to any species with genome-wide expression data to construct GRNs. We also present a publicly available database, maize tissue-specific GRN (mGRN, https://www.bio.fsu.edu/mcginnislab/mgrn/ ), for easy querying.
All source code and data are available at Github ( https://github.com/timedreamer/maize_tissue-specific_GRN )., Keywords: Bioinformatics, Database, Gene expression, Machine learning, Maize, Network, Systems biology, Transcription factor, Transcriptional regulation, Grant Number: 035919, Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6040155.
Efficient Computation of Reduced Regression Models.
We consider settings where it is of interest to fit and assess regression submodels that arise as various explanatory variables are excluded from a larger regression model. The larger model is referred to as the full model; the submodels are the reduced models. We show that a computationally efficient approximation to the regression estimates under any reduced model can be obtained from a simple weighted least squares (WLS) approach based on the estimated regression parameters and covariance matrix from the full model. This WLS approach can be considered an extension to unbiased estimating equations of a first-order Taylor series approach proposed by Lawless and Singhal. Using data from the 2010 Nationwide Inpatient Sample (NIS), a 20% weighted, stratified, cluster sample of approximately 8 million hospital stays from approximately 1000 hospitals, we illustrate the WLS approach when fitting interval censored regression models to estimate the effect of type of surgery (robotic versus nonrobotic surgery) on hospital length-of-stay while adjusting for three sets of covariates: patient-level characteristics, hospital characteristics, and zip-code level characteristics. Ordinarily, standard fitting of the reduced models to the NIS data takes approximately 10 hours; using the proposed WLS approach, the reduced models take seconds to fit., Keywords: Complementary log–log regression, Weighted estimating equations, Weighted least squares, C survey, Grant Number: R01 CA069222, R01 CA160679, Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5664962.
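The WLS idea in the abstract above can be illustrated with a minimal sketch. It is one reading of the abstract, not the authors' code: the full-model estimate is treated as the "response", the inverse of its estimated covariance matrix as the weight, and the excluded coefficients are constrained to zero. The function name `reduced_model_wls` and the simulated data are assumptions made for illustration only. Ordinary least squares is used as the full model because the approximation is exact in that case, which makes the result easy to check against a direct refit.

```python
import numpy as np


def reduced_model_wls(beta_full, cov_full, keep):
    """Approximate reduced-model estimates from full-model output (hypothetical helper).

    beta_full : (p,) estimated coefficients from the full model
    cov_full  : (p, p) estimated covariance matrix of beta_full
    keep      : indices of the coefficients retained in the reduced model
    """
    beta_full = np.asarray(beta_full, dtype=float)
    cov_full = np.asarray(cov_full, dtype=float)
    keep = np.asarray(keep, dtype=int)
    p = beta_full.shape[0]

    # Z maps the reduced coefficient vector into the full one,
    # with the excluded coefficients fixed at zero.
    Z = np.zeros((p, keep.size))
    Z[keep, np.arange(keep.size)] = 1.0

    # Weighted least squares of beta_full on Z with weight inv(cov_full).
    W = np.linalg.inv(cov_full)
    cov_reduced = np.linalg.inv(Z.T @ W @ Z)
    beta_reduced = cov_reduced @ (Z.T @ W @ beta_full)
    return beta_reduced, cov_reduced


# Simulated data (illustrative only): intercept plus three covariates.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# Fit the full model once by ordinary least squares.
xtx_inv = np.linalg.inv(X.T @ X)
beta_full = xtx_inv @ X.T @ y
resid = y - X @ beta_full
sigma2 = resid @ resid / (n - X.shape[1])
cov_full = sigma2 * xtx_inv

# Reduced model: keep the intercept and the first covariate only.
keep = [0, 1]
beta_red, cov_red = reduced_model_wls(beta_full, cov_full, keep)

# Direct refit of the reduced model, for comparison.
beta_direct = np.linalg.lstsq(X[:, keep], y, rcond=None)[0]
print(np.allclose(beta_red, beta_direct))
```

In this sketch each reduced model costs only a few small matrix operations on the full-model output, with no second pass over the data. For nonlinear models such as the interval censored regressions described in the abstract, the result would be an approximation rather than an exact refit.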
Elevated Resistin Gene Expression in African American Estrogen and Progesterone Receptor Negative Breast Cancer.
African American (AA) women diagnosed with breast cancer are more likely to have aggressive subtypes. Investigating differentially expressed genes between patient populations may help explain racial health disparities. Resistin, one such gene, is linked to inflammation, obesity, and breast cancer risk. Previous studies indicated that resistin expression is higher in serum and tissue of AA breast cancer patients compared to Caucasian American (CA) patients. However, resistin expression levels have not been compared between AA and CA patients in a stage- and subtype-specific context. Breast cancer prognosis and treatments vary by subtype. This work investigates differential resistin gene expression in human breast cancer tissues of specific stages, receptor subtypes, and menopause statuses in AA and CA women. Differential gene expression analysis was performed using human breast cancer gene expression data from The Cancer Genome Atlas. We compared resistin gene expression levels between AA and CA samples by receptor status and by stage. DESeq was run to test for differential expression of resistin. Resistin RNA was higher in AA women overall, with the highest values in receptor-negative subtypes. Estrogen-, progesterone-, and human epidermal growth factor receptor 2-negative groups showed statistically significant elevated resistin levels in Stage I and II AA women compared to CA women. In inter-racial comparisons, AA women had significantly higher levels of resistin regardless of menopause status. In whole-population comparisons, resistin expression was higher among Stage I and III estrogen receptor-negative cases. In comparisons of molecular subtypes, resistin levels were significantly higher in triple-negative than in luminal A breast cancer. Resistin gene expression levels were significantly higher in receptor-negative subtypes, especially estrogen receptor-negative cases in AA women. Resistin may serve as an early breast cancer biomarker and possible therapeutic target for AA breast cancer., Publication Note: This NIH-funded author manuscript originally appeared in PubMed Central at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4912107.
