Whenever possible, we release software for our publications on the lab’s GitHub.
Notable software releases include:
LLaCIE
An information extraction pipeline that specializes in running large language models across many clinical notes to abstract new variables. Written as a Python package; easiest to deploy using Docker.
Pak TR, Kanjilal S, McKenna CS, Hoffner-Heinike A, Rhee C, Klompas M. Syndromic analysis of sepsis cohorts using large language models. JAMA Netw Open. 2025;8(10):e2539267. doi:10.1001/jamanetworkopen.2025.39267.
PathoSPOT
An open-source bioinformatics pipeline that turns pathogen genome sequences sampled from patients in a healthcare environment into interactive visualizations of probable transmission scenarios. It has been used to characterize an under-the-radar outbreak of MRSA affecting 16 patients and spreading over 5 hospital wards:
Berbel Caban A,* Pak TR,* Obla A, Dupper A, Chacko KI, Fox L, Mills A, Ciferri B, Oussenko I, Beckford C, Chung M, Sebra R, Smith M, Connolly S, Patel G, Kasarskis A, Sullivan MJ, Altman DR, van Bakel H. PathoSPOT genomic epidemiology reveals under-the-radar nosocomial outbreaks. Genome Med. 2020 Nov 16;12(1):96. doi:10.1186/s13073-020-00798-3
It was also used to unravel an outbreak of influenza A that started in one emergency department and ultimately involved 43 healthcare workers, 17 inpatients, and 6 other individuals across two hospitals:
Javaid W, Ehni J, Gonzalez-Reiche AS, Carreno JM, Hirsch E, Tan J, Khan Z, Kriti D, Ly T, Kranitzky B, Barnett B, Cera F, Prespa L, Moss M, Albrecht RA, Mustafa A, Herbison I, Hernandez MM, Pak TR, Alshammary H, Sebra R, Smith M, Krammer F, Gitman M, Sordillo EM, Simon V, van Bakel H. Real-time investigation of a large nosocomial influenza A outbreak informed by genomic epidemiology. Clin Infect Dis. 2020 Nov 30;ciaa1781. doi:10.1093/cid/ciaa1781