Theodore Pak's Lab

Software

Whenever possible, we release software for our publications on the lab’s GitHub.

Notable software releases include:

GitHub repo

An information extraction pipeline that specializes in running large language models across many clinical notes to abstract new variables. It is written as a Python package and is easiest to deploy using Docker.


Pak TR, Kanjilal S, McKenna CS, Hoffner-Heinike A, Rhee C, Klompas M. Syndromic analysis of sepsis cohorts using large language models. JAMA Netw Open. 2025;8(10):e2539267. doi:10.1001/jamanetworkopen.2025.39267.

GitHub repo

An open-source bioinformatics pipeline that turns pathogen genome sequences sampled from patients in a healthcare environment into interactive visualizations of probable transmission scenarios. It has been used to characterize an under-the-radar outbreak of MRSA affecting 16 patients and spreading over 5 hospital wards:

Berbel Caban A,* Pak TR,* Obla A, Dupper A, Chacko KI, Fox L, Mills A, Ciferri B, Oussenko I, Beckford C, Chung M, Sebra R, Smith M, Connolly S, Patel G, Kasarskis A, Sullivan MJ, Altman DR, van Bakel H. PathoSPOT genomic epidemiology reveals under-the-radar nosocomial outbreaks. Genome Med. 2020 Nov 16;12(1):96. doi:10.1186/s13073-020-00798-3.

The pipeline was also used to unravel an outbreak of influenza A that started in one emergency department and ultimately involved 43 healthcare workers, 17 inpatients, and 6 other individuals across two hospitals:

Javaid W, Ehni J, Gonzalez-Reiche AS, Carreno JM, Hirsch E, Tan J, Khan Z, Kriti D, Ly T, Kranitzky B, Barnett B, Cera F, Prespa L, Moss M, Albrecht RA, Mustafa A, Herbison I, Hernandez MM, Pak TR, Alshammary H, Sebra R, Smith M, Krammer F, Gitman M, Sordillo EM, Simon V, van Bakel H. Real-time investigation of a large nosocomial influenza A outbreak informed by genomic epidemiology. Clin Infect Dis. 2020 Nov 30;ciaa1781. doi:10.1093/cid/ciaa1781.