Next-generation sequencing can analyze entire genomes in just hours. However, the sequencing process introduces errors that limit the accuracy of the reads obtained. Fortunately, modern sequencing technologies associate a quality score with each read, derived from the sequencing procedure, which represents our confidence in each nucleotide call. Currently, these quality scores are used as a criterion for removing or modifying reads in a data set. Such methods discard the information contained in those sequences and rely on somewhat arbitrary parameters, which may lead to a biased sample and inaccurate analyses. I propose an alternative method that incorporates sequencing error without discarding poor-quality reads, by including the error probabilities of the reads in the likelihood calculations used for sequence analysis. It was found that, despite introducing variability, the error-informed likelihood method improved analyses compared with those that ignored the error altogether. While this method will likely produce less definite results than analyses whose data were cleaned by a preprocessing technique, it utilizes all of the provided data and is better grounded in reality, since it accounts for the uncertainty we have in our sequenced samples.
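The core idea of weighting likelihood calculations by per-base error probabilities can be sketched as follows. This is a minimal illustration, not the author's actual implementation: the function names (`phred_to_error_prob`, `read_likelihood`) are hypothetical, and it assumes standard Phred-scaled quality scores (error probability p = 10^(-Q/10)) with sequencing errors equally likely to produce any of the three incorrect bases.

```python
import math

def phred_to_error_prob(q):
    # Standard Phred scaling: Q = -10 * log10(p), so p = 10^(-Q/10).
    return 10 ** (-q / 10)

def read_likelihood(read, quals, candidate):
    """Log-likelihood of an observed read given a candidate true sequence.

    Each base contributes log(1 - p) if it matches the candidate, and
    log(p / 3) if it mismatches (assuming an error is equally likely to
    be any of the other three bases). Hypothetical sketch, not the
    thesis's actual likelihood model.
    """
    log_lik = 0.0
    for base, q, true_base in zip(read, quals, candidate):
        p = phred_to_error_prob(q)
        log_lik += math.log(1 - p) if base == true_base else math.log(p / 3)
    return log_lik

# A mismatch at a low-quality position is penalized far less than one at
# a high-quality position, so poor-quality reads can stay in the analysis
# while contributing proportionally weaker evidence.
print(read_likelihood("ACGT", [30, 30, 30, 30], "ACGT"))
print(read_likelihood("ACGA", [30, 30, 30, 5], "ACGT"))
```

Under this scheme, no read is discarded outright; a low-quality base simply carries less weight in the likelihood, which is the trade-off the abstract describes between definiteness and using all of the data.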