Director, Gynecologic Oncology Laboratory, Brigham and Women's Hospital, Massachusetts, United States
Background Machine learning has opened new opportunities for biomarker discovery, but this analytic approach requires very large datasets for proper model construction. Additionally, unbiased discovery necessitates consideration of a large number of input variables from each sample, while minimizing required sample volumes. All of these issues must be addressed so that model development is not constrained by insufficient statistical power. We present our experience designing a high-throughput analytic pipeline for evaluation of circulating microRNAs as novel biomarkers for diagnosis of women’s cancers.
Methods The Mass General Brigham BioBank was interrogated using the Research Patient Data Registry, a centralized clinical data registry/warehouse, for participants with known gynecologic histories, at least three years of clinical follow-up data, electronic health record data available for query, and known genotype profiles. Barcoded, deidentified samples were transferred to the Gynecologic Oncology Laboratory at Brigham and Women’s Hospital. Working with Hamilton Storage, we designed a high-capacity automated sample storage, retrieval, and processing pipeline for aliquoting and analysis of the study samples. To measure the miRNAs, we worked with Abcam, Inc. to design a custom panel of 180 microRNA probes using the Fireplex® particle technology, which is based on a porous, bio-inert hydrogel that allows target capture throughout the 3D volume, with spatial resolution sufficient to cover 68 distinct microRNAs per 50 µl sample well. The Fireplex® platform was adapted to a Hamilton Starlet® liquid handling robot by adding a positive pressure vacuum manifold to the deck to automate plate processing.
Results A cohort of 10,000 study subjects was assembled from the Mass General Brigham BioBank. Samples were subaliquoted into low-profile 0.6 ml 2D-barcoded Hamilton tubes and placed into barcoded racks for storage in a Hamilton SamHD® automated storage system. After retrieval, tubes were scanned, inventoried, and decapped using a Hamilton LabElite ID® capper/decapper, then placed onto the Starlet® liquid handling robot. The Fireplex® panel was designed to be distributed over 3 sets of plates. With full implementation of the system, we are now completing circulating miRNA profiles of 200 study subjects per week, which comprises 36,000 separate data points, and we will complete the 10,000-patient cohort in just over a year.
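The throughput figures above can be checked with a short calculation. The following is a minimal sketch rather than part of the study pipeline: the function name and dictionary keys are hypothetical, the even split of the 180 probes across the 3 plate sets is an assumption not stated in the abstract, and the week count assumes uninterrupted processing at the reported rate, with no allowance for downtime.

```python
import math

# Figures taken from the abstract.
COHORT_SIZE = 10_000       # study subjects
SUBJECTS_PER_WEEK = 200    # reported profiling rate
PROBES_PER_SUBJECT = 180   # custom microRNA panel size
PLATE_SETS = 3             # panel is distributed over 3 sets of plates


def throughput_summary(cohort=COHORT_SIZE,
                       per_week=SUBJECTS_PER_WEEK,
                       probes=PROBES_PER_SUBJECT,
                       plate_sets=PLATE_SETS):
    """Return basic throughput arithmetic for the profiling pipeline.

    Hypothetical helper for illustration only.
    """
    return {
        # One data point per probe per subject.
        "data_points_per_week": per_week * probes,
        # ASSUMPTION: probes are split evenly across plate sets.
        "probes_per_plate_set": math.ceil(probes / plate_sets),
        # ASSUMPTION: uninterrupted processing at the nominal weekly rate.
        "weeks_at_nominal_rate": math.ceil(cohort / per_week),
        "total_data_points": cohort * probes,
    }


if __name__ == "__main__":
    for key, value in throughput_summary().items():
        print(f"{key}: {value}")
```

Under the even-split assumption, each plate set would carry a number of probes below the 68-microRNA per-well capacity described in the Methods, and the weekly data-point count matches the reported 36,000.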
Conclusions A coordinated approach to sample identification, inventory management, and high-dimensionality analysis is required to apply machine learning to biomarker discovery. Automating each step of the process increases efficiency while allowing accurate sample tracking and maximal use of each sample.