TY - JOUR
T1 - A vision transformer for decoding surgeon activity from surgical videos
AU - Kiyasseh, Dani
AU - Ma, Runzhuo
AU - Haque, Taseen F.
AU - Miles, Brian J.
AU - Wagner, Christian
AU - Donoho, Daniel A.
AU - Anandkumar, Animashree
AU - Hung, Andrew J.
N1 - Funding Information:
We are grateful to T. Chu for the annotation of videos with gestures. We also thank J. Laca and J. Nguyen for early feedback on the presentation of the manuscript. A.J.H. discloses support for the research described in this study from the National Cancer Institute under award no. R01CA251579-01A1 and a multi-year Intuitive Surgical Clinical Research Grant.
Publisher Copyright:
© 2023, The Author(s).
PY - 2023
AB - The intraoperative activity of a surgeon has a substantial impact on postoperative outcomes. However, for most surgical procedures, the details of intraoperative surgical actions, which can vary widely, are not well understood. Here we report a machine learning system that leverages a vision transformer and supervised contrastive learning to decode elements of intraoperative surgical activity from videos commonly collected during robotic surgeries. The system accurately identified surgical steps, actions performed by the surgeon, the quality of these actions and the relative contribution of individual video frames to the decoding of the actions. Through extensive testing on data from three hospitals on two continents, we show that the system generalizes across videos, surgeons, hospitals and surgical procedures, and that it can provide information on surgical gestures and skills from unannotated videos. Decoding intraoperative activity via accurate machine learning systems could be used to provide surgeons with feedback on their operating skills, and may allow for the identification of optimal surgical behaviour and for the study of relationships between intraoperative factors and postoperative outcomes.
UR - http://www.scopus.com/inward/record.url?scp=85151367335&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85151367335&partnerID=8YFLogxK
DO - 10.1038/s41551-023-01010-8
M3 - Article
C2 - 36997732
AN - SCOPUS:85151367335
JO - Nat. Biomed. Eng.
JF - Nature Biomedical Engineering
SN - 2157-846X
ER -
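
Editor's note: the abstract names two core techniques, a vision transformer backbone and supervised contrastive learning. The sketch below is a minimal, hypothetical PyTorch rendering of a supervised contrastive loss (in the style of Khosla et al., 2020), not the authors' published implementation; the function name, the temperature default and the idea of pooling per-clip video features are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Pull same-label embeddings together and push others apart.

    embeddings: (N, D) feature vectors, e.g. pooled video-clip features.
    labels:     (N,) integer class labels, e.g. gesture or step IDs.
    """
    z = F.normalize(embeddings, dim=1)               # unit-norm features
    sim = z @ z.T / temperature                      # pairwise cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # exclude self-pairs
    # Positives share a label with the anchor (self excluded).
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Log-softmax over each row; the -inf diagonal drops out of the sum.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability of the positives for each anchor.
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    has_pos = pos_mask.any(dim=1)                    # anchors with >=1 positive
    if not has_pos.any():                            # degenerate batch: no positives
        return embeddings.new_zeros(())
    return per_anchor[has_pos].mean()
```

In a training loop of this general shape, `embeddings` would come from a video encoder (for example, a vision transformer pooled over frames) and `labels` from step or gesture annotations; such a contrastive term is commonly combined with a standard cross-entropy classification loss, though how the paper weights or schedules the two is not stated in this record.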