Software

PARSE develops software that turns complex analytical methodologies into accessible, high-performance tools for healthcare research. Our team of R and Python developers builds custom software packages that implement current statistical and machine learning approaches with robust architecture, comprehensive documentation, and rigorous testing. We also build interactive Shiny applications that pair data visualization with intuitive interfaces, so that researchers without programming expertise can use sophisticated analytical models.

Technology Stack

R
Shiny
Python
JavaScript
PostgreSQL
GitHub
Docker
Amazon Web Services (AWS)

Development Support

CELEHS: PARSE serves as the primary software development partner for Harvard Medical School's Translational Data Science Center for a Learning Health System (CELEHS). We architect, develop, and maintain a suite of packages and tools that implement CELEHS's novel statistical methodologies and machine learning algorithms, with a focus on computational efficiency, reproducibility, and interoperability. Our team turns theoretical and methodological frameworks into software tools that advance CELEHS's mission of democratizing access to health data science capabilities.

CIPHER: PARSE engineers the Shiny front-end user interfaces that power the VA's Centralized Interactive Phenomics Resource (CIPHER). Our custom data visualization tools let researchers navigate, query, and analyze CIPHER's extensive EHR-based phenotype knowledgebase. These web applications feature optimized data pipelines, interactive visualizations, and responsive designs that support CIPHER's goal of accelerating health data innovation through integrated knowledge sharing.

Interactive Applications

PARSE has developed a suite of interactive applications that empower researchers with user-friendly interfaces to complex analytical tools. Our featured applications are listed below, with additional visualization tools available through the CIPHER Data Visualization Tools.

ARCH

A large-scale knowledge graph system that analyzes codified and narrative electronic health records to generate semantic representations of clinical concepts.

Citation

ARRTLE

A privacy-preserving algorithm that learns time-to-event prediction models by aggregating information from multiple healthcare centers without sharing patient-level data.

Citation

MUGS

A multi-view network visualization tool that enables comparison of clinical concepts across different healthcare systems through interactive network analysis.

Citation

KOMAP

A multimodal automated phenotyping system that trains algorithms using summary statistics rather than patient-level data to enable privacy-preserving multi-center collaborations.
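The general idea of fitting a model from site-level summary statistics, rather than from patient-level rows, can be illustrated with a deliberately simple case. The sketch below is not KOMAP's algorithm: it assumes an ordinary least-squares line with a single predictor, and the names `SiteSummary`, `site_summary`, and `fit_from_summaries`, along with the toy data, are hypothetical. Each site shares only five aggregate numbers, and the coordinator recovers the same fit it would obtain from the pooled data.

```python
from typing import NamedTuple, Sequence, Tuple


class SiteSummary(NamedTuple):
    """Aggregates a site shares in place of patient-level rows (hypothetical)."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def site_summary(x: Sequence[float], y: Sequence[float]) -> SiteSummary:
    """Computed locally at each site; the raw x and y values never leave it."""
    return SiteSummary(
        n=len(x),
        sum_x=sum(x),
        sum_y=sum(y),
        sum_xx=sum(v * v for v in x),
        sum_xy=sum(a * b for a, b in zip(x, y)),
    )


def fit_from_summaries(summaries: Sequence[SiteSummary]) -> Tuple[float, float]:
    """Return (intercept, slope) of the pooled least-squares line."""
    # Sufficient statistics are additive, so the coordinator just sums them.
    n = sum(s.n for s in summaries)
    sx = sum(s.sum_x for s in summaries)
    sy = sum(s.sum_y for s in summaries)
    sxx = sum(s.sum_xx for s in summaries)
    sxy = sum(s.sum_xy for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical toy data for two sites (not real patient data).
site_a = site_summary([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
site_b = site_summary([3.0, 4.0], [7.0, 9.0])
intercept, slope = fit_from_summaries([site_a, site_b])
```

Because the sums are additive across sites, the coordinator's estimate matches a fit on the pooled rows. Real multi-center methods handle richer models and additional privacy considerations that this sketch leaves out.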

Citation

ONCE

A feature search engine that identifies relevant clinical concepts for disease phenotyping by leveraging knowledge graphs, medical literature, and EHR data.

Citation

Packages and Tools

PheCAP

A comprehensive R framework that implements surrogate-assisted feature extraction with machine learning to train and validate phenotyping models from electronic health records.

Citation

PheNorm

An unsupervised algorithm that combines and normalizes EHR features to create accurate disease classification models without requiring manual chart review.

Citation

MAP

An automated phenotyping method that integrates ICD codes and NLP-extracted narrative data to yield patient phenotype probabilities using ensemble latent mixture models.

Citation

sureLDA

A multi-disease automated phenotyping method that simultaneously models multiple conditions to improve efficiency and accuracy in large-scale clinical data analysis.

Citation

OptimalSurrogate

A model-free approach that quantifies the proportion of treatment effect explained by surrogate markers for evaluating potential biomarkers in clinical research.

Citation

PanelCurrentStatus

A statistical framework for developing risk prediction models with panel current status data to analyze time-to-event outcomes in longitudinal health records.

Citation

SCORNET

A semi-supervised survival curve estimator optimized for efficient use of electronic health record data with limited current status labels.

Citation

SRAT

A rank-based approach to test associations between genetic variants and outcomes while accounting for within-family correlation with minimal assumptions about distributions.

Citation

LATTE

A label-efficient algorithm that accurately determines the timing of clinical events from longitudinal electronic health records by leveraging pre-trained semantic embeddings and visit attention learning.

Citation