2 Comments
Austin Morrissey:

What is the computational burden of an analysis this size? How long does it take, and on what type of hardware?

Ben Miller:

We deploy workloads to AWS and GCP. Converting a 1.5B-read BCL file into an ~800M-row binding count job (BCJ) takes about two hours with our current parallelization configuration, though it could go faster. Behind that, we use BigQuery to store the data for QC analysis and hit calling. Generating hit calling jobs (HCJs) takes about 10 minutes per target from there, assuming 6-10 BCJs per HCJ. Lastly, we load plots in our browsers using Dash and Streamlit apps, which can take another 10 minutes per target to review all the plots and write up a report. The key is being smart about when and where to pre-calculate, when to cache data, and how to visualize it in the browser.
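
To make the query-then-cache pattern concrete, here is a minimal sketch of how a Streamlit app might pull pre-aggregated counts per target from BigQuery and cache the result across reruns. The table name, column names, and target list are illustrative placeholders, not details from the pipeline described above.

```python
import streamlit as st
from google.cloud import bigquery


@st.cache_data(ttl=3600)  # cache per target so page reloads don't re-scan BigQuery
def load_counts(target: str):
    client = bigquery.Client()
    # Hypothetical table and columns; substitute your own schema.
    query = """
        SELECT compound_id, barcode_count, enrichment
        FROM `project.dela.binding_counts`
        WHERE target = @target
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("target", "STRING", target)
        ]
    )
    # Pull the pre-aggregated rows into a DataFrame for plotting.
    return client.query(query, job_config=job_config).to_dataframe()


target = st.selectbox("Target", ["TARGET_A", "TARGET_B"])  # placeholder names
df = load_counts(target)
st.scatter_chart(df, x="barcode_count", y="enrichment")
```

The design choice this illustrates: the expensive aggregation happens once upstream (in BigQuery), and the browser layer only fetches and caches the small, plot-ready slice per target, which is what keeps the per-target review loop down to minutes.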
