<?xml version="1.0" encoding="UTF-8"?><rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel rdf:about="https://opara.zih.tu-dresden.de/xmlui/handle/123456789/5960">
<title>Comparative validation of machine learning algorithms for surgical workflow and skill analysis with the HeiChole benchmark</title>
<link>https://opara.zih.tu-dresden.de/xmlui/handle/123456789/5960</link>
<description>Purpose&#13;
Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance, or improve the training of surgeons via data-driven feedback. In surgical workflow analysis, up to 91% average precision has been reported for phase recognition on an open single-center video dataset. In this work, we investigated the generalizability of phase recognition algorithms in a multicenter setting, including more difficult recognition tasks such as surgical action and surgical skill.&#13;
&#13;
Methods&#13;
To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers, with a total operation time of 22 h, was created. Labels included framewise annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories, and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 international Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 research teams trained and submitted their machine learning algorithms for recognition of phase, action and instrument, and/or for skill assessment.&#13;
&#13;
Results&#13;
F1-scores ranged from 23.9% to 67.7% for phase recognition (n = 9 teams) and from 38.5% to 63.8% for instrument presence detection (n = 8 teams), but only from 21.8% to 23.3% for action recognition (n = 5 teams). The average absolute error for skill assessment was 0.78 (n = 1 team).&#13;
&#13;
Conclusion&#13;
Surgical workflow and skill analysis are promising technologies to support the surgical team, but, as our comparison of machine learning algorithms shows, there is still considerable room for improvement. The novel HeiChole benchmark can be used for comparable evaluation and validation of future work. For future studies, it is of utmost importance to create more open, high-quality datasets in order to enable the development of artificial intelligence and cognitive robotics in surgery.</description>
<items>
<rdf:Seq>
<rdf:li rdf:resource="https://opara.zih.tu-dresden.de/xmlui/handle/123456789/6085"/>
</rdf:Seq>
</items>
<dc:date>2026-04-06T10:17:17Z</dc:date>
</channel>
<item rdf:about="https://opara.zih.tu-dresden.de/xmlui/handle/123456789/6085">
<title>Comparative validation of machine learning algorithms for surgical workflow and skill analysis with the HeiChole benchmark</title>
<link>https://opara.zih.tu-dresden.de/xmlui/handle/123456789/6085</link>
<description>Comparative validation of machine learning algorithms for surgical workflow and skill analysis with the HeiChole benchmark
van der Linden, Lize Mari; Wagner, Martin; Bodenstedt, Sebastian; Speidel, Stefanie
The data consists of endoscopic videos from general surgery operating rooms. The data was obtained during laparoscopic surgeries at the University Hospital of Heidelberg and its affiliate hospitals, which form a joint center of excellence for minimally invasive surgery. All surgeries were annotated framewise for surgical phases by surgical experts. Furthermore, certain surgical actions, instrument usage and surgical skill levels were annotated. The recorded surgeries are laparoscopic gallbladder removals (cholecystectomies).&#13;
&#13;
The dataset consists of at least 30 different recorded surgeries from three hospitals. For each surgery, the video captured by the endoscope is provided. To ensure anonymity, frames corresponding to extra-abdominal views are censored by entirely white (RGB 255, 255, 255) frames.&#13;
&#13;
The data will be released in three sets: two training sets (each containing at least 12 videos), which include framewise annotations of surgical phase, instrument usage, actions of the surgeon and assistant, as well as surgical skill; and a testing set consisting of at least 6 videos.
</description>
<dc:date>2023-01-01T00:00:00Z</dc:date>
</item>
</rdf:RDF>
