Introduction: Videourodynamics (VUDS) provides functional insight into bladder storage and micturition, particularly in children with neurogenic bladder who are at risk of upper tract injury. These studies can guide both medical and surgical decision-making, as well as escalation of intervention. However, VUDS interpretation is subject to both intra- and inter-observer variability. Efforts to improve standardization exist, but the extent to which surgeons' interpretations vary remains unclear. In this study, we evaluated variability in the interpretation of upper tract deterioration secondary to bladder dysfunction in a sample of children with spina bifida.

Methods: A total of 303 VUDS studies from 256 unique patients aged 2 months to 28 years were evaluated. In an initial pilot, 5 fellowship-trained urologists assessed 10 VUDS studies to align on risk stratification criteria for 3 graded categories (mild, moderate, or severe). Criteria for assessing upper tract deterioration included bladder compliance, elevated detrusor leak point pressure, and detrusor-sphincter dyssynergia, evaluated from pressure-volume tracings and fluoroscopic images. Subsequently, the 303 VUDS studies were randomized among the 5 reviewers such that each study received at least 3 reviews. Intraclass correlation coefficients (ICCs) were used to assess the reliability of single and average raters. Reviewers also reassessed a portion (range: 5-16%) of the same studies within 6 months of their first review. Percent agreement was determined for each surgeon between initial and repeat reviews, as well as between the initial review and group consensus.

Results: Among the 303 VUDS studies, the single-rater and average-rater ICCs were 0.67 (95% CI: 0.62-0.71) and 0.91 (95% CI: 0.89-0.92), respectively. At least one reviewer disagreed with the other reviewers in 37.3% (113/303) of studies. In only one study (0.3%, 1/303) were 3 different risk categories assigned.
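The abstract reports single-rater and average-rater ICCs but does not state which ICC model was used. As a minimal sketch, assuming a one-way random-effects model (different reviewers may rate different studies), a balanced design in which every study has exactly k ratings, and risk categories coded as mild = 1, moderate = 2, severe = 3, ICC(1,1) and ICC(1,k) can be computed from one-way ANOVA mean squares. The helper `icc_oneway` and the example ratings are hypothetical; because some studies here received more than 3 reviews, the actual analysis would need a method that handles unbalanced data.

```python
from statistics import fmean


def icc_oneway(ratings: list[list[float]]) -> tuple[float, float]:
    """Return (ICC(1,1), ICC(1,k)) for a complete n x k ratings matrix.

    Assumption: one-way random-effects model with a balanced design, i.e. every
    study has exactly k ratings, possibly from different reviewers.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 studies, each with the same number (>= 2) of ratings")

    grand_mean = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]

    # Between-study mean square: how far each study's mean rating sits from the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-study mean square: disagreement among the ratings of the same study.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    if ms_between == 0:
        raise ValueError("ICC is undefined when all study means are identical")

    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average


# Hypothetical example: 4 studies x 3 reviewers, coded mild=1, moderate=2, severe=3.
single, average = icc_oneway([[1, 1, 1], [2, 2, 3], [3, 3, 3], [1, 2, 1]])
print(f"{single:.3f} {average:.3f}")  # prints 0.825 0.934
```

Under this model the two coefficients are linked by the Spearman-Brown formula, ICC(1,k) = k * ICC(1,1) / (1 + (k - 1) * ICC(1,1)), which is why averaging several raters yields a higher coefficient than any single rater.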
Among the studies reassessed for intra-rater reliability, the median agreement of a single urologist with their own previous risk score was 76.25% (IQR: 68.75-78.525). The median agreement of a single urologist with group consensus was 80.4% (IQR: 75-81.6).

Conclusions: Reliability was higher for the average of raters than for a single rater, corresponding to good and moderate reliability, respectively. Inter- and intra-rater variation in single-rater interpretation may compound across serial urodynamic evaluations in both clinical and research uses of VUDS to assess upper tract deterioration.

Source of Funding: NA
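Percent agreement, as described in the Methods, is the share of studies on which two sets of ratings assign the same category. The sketch below is illustrative only: `percent_agreement` and `majority_consensus` are hypothetical helpers, the data are invented, and "group consensus" is assumed to mean a plurality vote across all reviewers of a study (including the urologist being compared), with ties left unresolved. The abstract does not specify how consensus was defined.

```python
from collections import Counter
from typing import Optional


def percent_agreement(first: list[int], second: list[int]) -> float:
    """Percentage of studies assigned the same risk category in both lists."""
    if not first or len(first) != len(second):
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


def majority_consensus(scores: list[int]) -> Optional[int]:
    """Most frequent category for one study; None if the top count is tied.

    Assumption: "group consensus" is treated as a plurality vote, which the
    abstract does not specify.
    """
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(scores).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical data, coded mild=1, moderate=2, severe=3.
initial = [1, 2, 2, 3]  # one urologist's first review of 4 studies
repeat = [1, 2, 3, 3]   # the same urologist's repeat review
print(percent_agreement(initial, repeat))  # prints 75.0

# All reviewers' scores for the same 4 studies (includes this urologist's score).
panel = [[1, 1, 2], [2, 2, 2], [3, 2, 3], [1, 2, 3]]
consensus = [majority_consensus(s) for s in panel]  # [1, 2, 3, None]

# Compare with consensus only where a consensus exists.
kept = [(a, c) for a, c in zip(initial, consensus) if c is not None]
print(percent_agreement([a for a, _ in kept], [c for _, c in kept]))  # about 66.7
```

Each urologist's percentages could then be summarized across urologists with `statistics.median`; the abstract does not state which quantile method was used for the reported IQRs.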