Introduction: Developing intelligence systems for bladder tumor detection is cost- and labor-intensive. In particular, image annotation is one of the most expensive tasks in building such systems. Moreover, previous studies relied on selected screenshots or frame sequences for model development and validation, even though cystoscopy is a dynamic visual inspection affected by random noise. The current work proposes an efficient strategy for developing an augmented intelligence system ready for real-time bladder tumor detection as a computer-aided assistance tool for clinicians.

Methods: We used a previously published educational cystoscopy atlas (n=312 images) and our deep learning models (ConvNeXt, PlexusNet, MobileNet, SwinTransformer), spanning a range of model complexities and computational efficiencies, to estimate the ratio between cancer and normal confidence scores. We applied an image augmentation strategy called RandAugment to populate the training set and externally validated the models on video recordings of the initial diagnostic cystoscopy prior to TURBT from 68 cases with benign and malignant bladder tumors (i.e., regions of interest, ROIs) at a single center. Each video frame was labeled by ROI status. ROI status was confirmed by pathology examination and the Delphi method, and was predicted from the confidence-score ratio: if the ratio exceeded 1, the frame was classified as ROI-positive; otherwise, as negative. For external validation, only adequately illuminated areas of the video frames were used as model input. Performance measures included specificity and sensitivity at the frame level, frame-sequence (block) level, and ROI level for each case; for block-level analysis, each full-length video was split into short segments according to ROI status.

Results: Specificity was statistically comparable across the four models at the frame level (range: 30.0–44.8%) and block level (56–67%; P>0.05).
While frame-level sensitivity (range: 81.4–88.1%) differed statistically between the models, sensitivity at the block level (100%) and ROI level (100%) was comparable across all models, indicating that every ROI was detectable by each model and that frame-level performance is impacted by random noise. MobileNet and PlexusNet were computationally more efficient (22 and 19 frames per second, respectively) and better suited to real-time detection than ConvNeXt and SwinTransformer (13 and 17 frames per second, respectively).

Conclusions: An educational cystoscopy atlas combined with a cost-effective model development strategy facilitates the development of accurate and efficient intelligence systems with potential use for real-time bladder tumor detection.

Source of Funding: The work was supported by National Institutes of Health R01 CA260426 to JCL.
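The frame-level decision rule described in Methods (a frame is ROI-positive when the ratio of the cancer confidence score to the normal confidence score exceeds 1) can be sketched as follows. This is a minimal illustration only: the function names and the any-positive-frame block aggregation are assumptions for clarity, not the authors' implementation.

```python
def classify_frame(cancer_score: float, normal_score: float) -> bool:
    """Predict ROI status for a single frame.

    A frame is ROI-positive when the cancer/normal confidence
    ratio exceeds 1, i.e. cancer_score > normal_score.
    """
    return cancer_score / normal_score > 1.0


def block_is_positive(frame_scores: list[tuple[float, float]]) -> bool:
    """Aggregate frame predictions over a block (frame sequence).

    Assumption for illustration: a block counts as ROI-positive
    if any of its frames is predicted positive.
    """
    return any(classify_frame(c, n) for c, n in frame_scores)


# Example: a block where one noisy frame misses the tumor but
# another frame detects it is still flagged at the block level,
# mirroring the reported 100% block-level sensitivity pattern.
block = [(0.3, 0.7), (0.8, 0.2), (0.45, 0.55)]
print(block_is_positive(block))
```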