Software systems contain an immense amount of information captured in a variety of documents, such as source code files, user documentation, use and test cases, bug reports, and system requirements, among others. Relationships between these pieces of information, called traceability links, give stakeholders broader knowledge about a system's constituent pieces and support many aspects of the software's development, maintenance, and evolution. Ideally, traceability links would be documented as software artifacts are produced. For instance, as they work, developers would document which test cases exercise which code segments or which code classes implement which use cases. However, this is typically not the case. Due to organizational issues such as tight timelines for product delivery and lack of buy-in by project managers, software traceability is often a secondary concern. To address this situation and improve traceability for a system post hoc, stakeholders can perform Traceability Link Recovery (TLR). TLR is a software engineering task that fills in missing traceability information by establishing (i.e., recovering) links between related artifacts. Through this process, software traceability can be improved to support tasks such as program comprehension, concept localization, verifying test coverage, and ensuring that system and legal requirements are met. Unfortunately, performing TLR manually is an extremely time- and resource-intensive task. Therefore, even though prior work suggests that traceability directly improves software maintenance and evolution, few systems have sufficient traceability to realize these benefits. The few that do are mainly safety-critical systems subject to tight regulatory requirements, where traceability is legally required for quality assurance and risk mitigation. We address this cost in two ways. First, we seek to reduce the cost of establishing traceability links through TLR by improving automatic approaches to TLR that are based on artifact similarity.
Second, we seek to reduce the cost of maintaining existing traceability information by applying supervised machine learning. This technique mines statistical patterns from historical traceability information to build a predictive model that infers artifact relationships without the need for a human operator. As a result, software teams can realize the hitherto cost-prohibitive benefits of traceability even for projects where there is no legal requirement for traceability to exist.