Annotated and Grouped Publication List – Murphy Group – January 30, 2017

Subjects

· Generative Models of Subcellular Organization

· Active Learning for Experimental Biology and Drug Development

· Subcellular Pattern Unmixing

· Intelligent Acquisition for Fluorescence Microscopy

· Subcellular Pattern Analysis

o Tissues

o Cultured Cells

o Yeast

· Learning Subcellular Sorting Pathways

· Content-based Image Retrieval

· Reviews and Commentaries

Generative Models of Subcellular Organization

Y. Li, T. D. Majarian, A. W. Naik, G. R. Johnson, and R. F. Murphy (2016) Point process models for localization and interdependence of punctate cellular structures. Cytometry Part A 89:633-643

This paper builds on the Johnson et al study of punctate patterns by constructing point process models using different subcellular components as references. It shows that models built on cell and nuclear geometry and microtubule distribution are not improved by adding information on the spatial distribution of the endoplasmic reticulum.

K. T. Roybal, T. E. Buck, X. Ruan, B. H. Cho, D. J. Clark, R. Ambler, H. M. Tunbridge, J. Zhang, P. Verkade, C. Wülfing, and R. F. Murphy (2016) Computational spatiotemporal analysis identifies WAVE2 and Cofilin as joint regulators of costimulation-mediated T cell actin dynamics. Science Signaling 9:rs3.

This close collaboration with Christoph Wülfing’s group involved constructing 4D “spatiotemporal maps” of the concentration of actin and eight of its regulators in T cells as they undergo immunological synapse formation. This was done by morphing each cell in each frame of thousands of movies into a standardized template.

R. M. Donovan, J.-J. Tapia, D. P. Sullivan, J. R. Faeder, R. F. Murphy, M. Dittrich, and D. M. Zuckerman (2016) Unbiased Rare Event Sampling in Spatial Stochastic Systems Biology Models Using A Weighted Ensemble Of Trajectories. PLoS Computational Biology 12(2):e1004611.

This paper is the result of a collaboration between investigators in the National Center for Multiscale Modeling of Biological Systems. It describes using various cell geometries, including those generated by CellOrganizer, to efficiently carry out spatially accurate simulations of cell biochemistry.

G. R. Johnson, J. Li, A. Shariff, G. K. Rohde, and R. F. Murphy (2015) Automated Learning of Subcellular Pattern Variation among Punctate Proteins and of a Generative Model of their Distributions in Relation to Microtubules. PLoS Computational Biology 11(12):e1004614.

This paper uses images from the Human Protein Atlas to construct generative models of the relationships between eleven punctate structures and microtubules and shows that they can be used to accurately distinguish the eleven patterns in three cell lines.

G. R. Johnson, T. E. Buck, D. P. Sullivan, G. K. Rohde and R. F. Murphy (2015) Joint Modeling of Cell and Nuclear Shape Variation. Mol. Biol. Cell, 26:4046-4056.

This paper provides the first statistical evidence for a relationship between cell and nuclear shape, and shows that this relationship can be altered by gene alteration or drug addition. We also construct the first joint generative model of the dynamics of cell and nuclear shape.

T.E. Buck, J. Li, G.K. Rohde, and R.F. Murphy (2012) Towards the virtual cell: Automated approaches to building models of subcellular organization 'learned' from microscopy images. Bioessays 34:791-799.

J. Li, A. Shariff, M. Wiking, E. Lundberg, G.K. Rohde and R.F. Murphy (2012) Estimating microtubule distributions from 2D immunofluorescence microscopy images reveals differences among human cultured cell lines. PLoS ONE 7:e0050292.

This paper builds generative models of microtubule patterns from 2D images for different cultured cell lines using images from the Human Protein Atlas and compares them.

R. F. Murphy (2012) CellOrganizer: Image-derived Models of Subcellular Organization and Protein Distribution. Methods in Cell Biology 110: 179-193.

R. F. Murphy (2011) An active role for machine learning in drug development. Nature Chemical Biology 7:327-330.

T. Peng and R.F. Murphy (2011) Image-derived, Three-dimensional Generative Models of Cellular Organization. Cytometry Part A 79A:383-391.

This paper describes extension of the initial 2D models of Zhao and Murphy (2007) to 3D.

A. Shariff, R.F. Murphy, and G. Rohde (2011) Automated Estimation of Microtubule Model Parameters from 3-D Live Cell Microscopy Images. Proceedings of the 2011 IEEE International Symposium on Biomedical Imaging (ISBI 2011), pp. 1330-1333.

This paper describes modification of the microtubule model described below in order to allow for estimation of free tubulin, and applies the model to images of untreated cells and of cells treated with nocodazole to depolymerize microtubules. The results are consistent with expectation.

R. F. Murphy (2010) Communicating Subcellular Distributions. Cytometry Part A 77A:686-692.

This review provides a perspective on methods for estimating pattern fractions and learning generative models. It addresses the critical problem of representing information learned about subcellular organization for comparison between cell and tissue types and for use in systems simulations.

A. Shariff, G. K. Rohde and R. F. Murphy (2010) A Generative Model of Microtubule Distributions, and Indirect Estimation of its Parameters from Fluorescence Microscopy Images. Cytometry 77A:457-466.

Methods have been described previously for learning models of cell organization from microscope images in order to be able to synthesize and combine subcellular distributions. These methods involve direct estimation of the model parameters but for some subcellular patterns (such as those of microtubules or microfilaments), direct estimation is difficult due to large numbers of tangled fibers. We describe the first method for indirectly learning a microtubule model and show that it produces results consistent with current knowledge.

T. Peng, W. Wang, G. K. Rohde, and R. F. Murphy (2009) Instance-Based Generative Biological Shape Modeling. Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging (ISBI 2009), pp. 690-693.

G. K. Rohde, W. Wang, T. Peng, and R.F. Murphy (2008). Deformation-Based Nonlinear Dimension Reduction: Applications To Nuclear Morphometry. Proceedings of the 2008 IEEE International Symposium on Biomedical Imaging (ISBI 2008), pp. 500-503.

G. K. Rohde, A. Ribeiro, K. N. Dahl, and R. F. Murphy (2008). Deformation-based nuclear morphometry: capturing nuclear shape variation in HeLa Cells. Cytometry, 73A:341-350.

T. Zhao and R.F. Murphy (2007). Automated Learning of Generative Models for Subcellular Location: Building Blocks for Systems Biology. Cytometry 71A:978-990.

This was the first paper to describe the construction of generative models of cell architecture directly from microscope images. It constructed models of cell and nuclear shape and vesicular organelle size, shape and position.

Active Learning for Experimental Biology and Drug Development

A.W. Naik, J.D. Kangas, D. P. Sullivan, and R. F. Murphy (2016) Active Machine Learning-driven Experimentation to Determine Compound Effects on Protein Patterns. eLife 5:e10047. doi:10.7554/eLife.10047.

This paper describes the first prospective study to construct a predictive model of multiple drug and target interactions using experiments selected solely under computer control. The experiments were carried out using liquid handling robotics and an automated microscope, and performing only 28% of the experiments led to a model that was 92% accurate at predicting the results of experiments (whether it had done them or not).

M. Temerinac-Ott, A. W. Naik, and R. F. Murphy (2015) Deciding when to stop: Efficient experimentation to learn to predict drug-target interactions. BMC Bioinformatics 16:213 (also selected for oral presentation in the Proceedings track of RECOMB 2015).

This paper uses four existing drug-target interaction datasets to show that accurate models of these interactions can be constructed by active learning without needing to do all experiments. It also provides the first evidence on real datasets that the stopping rule algorithm of the Naik et al paper below can be used to estimate the accuracy of an actively learned model and decide when experimentation can be stopped.

J.D. Kangas, A.W. Naik, and R.F. Murphy (2014) Prediction of Biological Responses Using Protein and Compound Features and their Discovery using Active Learning. BMC Bioinformatics 15:143. doi:10.1186/1471-2105-15-143

This paper describes the design of efficient regression models to predict protein target responses to chemical compounds and shows that active machine learning permits accurate models to be learned without doing all experiments. We use a subset of PubChem data to test our combined approach, and use existing features to describe the similarity among compounds and among protein targets. The results show that 60% of the “hits” in the PubChem data could be discovered while “doing” only 3% of the possible experiments. This approach is complementary to that in the Naik et al. paper below, which handles the case where features are not available or reliable.

A. W. Naik, J. D. Kangas, C. J. Langmead and R. F. Murphy (2013) Efficient Modeling and Active Learning Discovery of Biological Responses. PLoS ONE 8: e83996. doi:10.1371/journal.pone.0083996

This paper characterizes new algorithms for active learning for drug discovery in the absence of compound or target features. The algorithms seek to learn the effects of many compounds on many targets, and address the case in which the effect of a given compound on a given target is represented as one of a number of different categorical phenotypes (rather than just as a score measuring extent of an expected effect). We introduce measures of uniqueness and responsiveness to characterize the nature of a given experimental space, and show in simulated experiments that our active learner shows significant improvement over random choice and does so for essentially all values of uniqueness and responsiveness. We also introduce a stopping rule approach for estimating the lower limit of the true accuracy of an actively learned model, permitting decisions to be made about when to stop a campaign of active learning-driven experimentation. Lastly, we show using Connectivity Map data that accurate models of the effects of drugs on gene expression in various cell lines can be constructed without the need to perform experiments for all possible combinations of drugs and cell lines.

R. F. Murphy (2011) An active role for machine learning in drug development. Nature Chemical Biology 7:327-330.

This commentary provides a perspective on two critical areas in which machine learning methods are projected to contribute to drug development and broader experimental biology: building image-derived models of subcellular organization, and using active learning to avoid exhaustive experimentation of large experimental spaces. It contains the first proposal of active learning as a solution to the problem of considering all possible interactions of many drugs and many targets.

Subcellular Pattern Unmixing

R. F. Murphy (2010) Communicating Subcellular Distributions. Cytometry Part A 77A:686-692.

L. P. Coelho, T. Peng, and R. F. Murphy (2010) Quantifying the distribution of probes between subcellular locations using unsupervised pattern unmixing. Bioinformatics 26:i7-i12 (Proceedings of 18th Annual International Conference on Intelligent Systems in Molecular Biology; only 19% of submitted papers accepted).

Supervised approaches to pattern unmixing require examples of images for proteins that are found in only one fundamental subcellular pattern (e.g., organelle). When analyzing protein images on a proteome scale, the patterns may not all be known, and proteins present in only one of these patterns may not be available. This paper described the first system for unsupervised unmixing of patterns, that is, simultaneously finding the underlying patterns and estimating the fraction of each protein in each.

T. Peng, G.M.C. Bonamy, E. Glory-Afshar, D. R. Rines, S. K. Chanda, and R. F. Murphy (2010) Determining the distribution of probes between different subcellular locations through automated unmixing of subcellular patterns. Proc. Natl. Acad. Sci. U.S.A. 107:2944-2949.

Proteins may be found in more than one subcellular location, but previous automated systems to classify images by their patterns could not estimate the amount in each. This paper is the first demonstration of the ability to unmix subcellular patterns in microscope images. It was chosen for a Highlights Track presentation at ISMB 2010.

T. Zhao, M. Velliste, M.V. Boland, and R.F. Murphy (2005). Object Type Recognition for Automated Analysis of Protein Subcellular Location. IEEE Trans. Image Proc. 14:1351-1359

Intelligent Acquisition for Fluorescence Microscopy

C. Jackson, E. Glory, R. F. Murphy and J. Kovacevic (2011) Model building and intelligent acquisition with application to protein subcellular location classification. Bioinformatics 27:1854-1859.

This paper describes a model of object dynamics and an algorithm for acquiring images of a given sample to efficiently learn the model parameters.

C. Jackson, R. F. Murphy, and J. Kovacevic (2009) Intelligent Acquisition and Learning of Fluorescence Microscope Data Models. IEEE Trans Image Proc. 18:2071-2084.

C. Jackson, R.F. Murphy and J. Kovacevic (2007). Efficient Acquisition and Learning of Fluorescence Microscopy Data Models. Proceedings of 2007 IEEE International Conference on Image Processing, pp. VI-245-VI-248.

Subcellular Pattern Analysis – Tissues

A. Kumar, A. Rao, S. Bhavani, J.Y. Newberg, R. F. Murphy (2014) Automated Analysis of Immunohistochemical Images Identifies Candidate Location Biomarkers for Cancers. Proc. Natl. Acad. Sci. U.S.A. 111:18249-18254.

This paper describes the construction of a system for measuring changes in subcellular location between normal and cancerous tissues (the subject of U.S. patent number 9,092,850) and uses it to identify proteins that are “location biomarkers” of different types of cancers.

A. Rao and R.F. Murphy (2011) Determination of Protein Location Diversity Via Analysis of Immunohistochemical Images from the Human Protein Atlas. Proceedings of the 2011 IEEE International Symposium on Biomedical Imaging (ISBI 2011), 1727-1729.

E. Glory-Afshar, E. Garcia Osuna, B. Granger, and R. F. Murphy (2010) A Graphical Model To Determine The Subcellular Protein Location In Artificial Tissues. Proceedings of the 2010 IEEE International Symposium on Biomedical Imaging (ISBI 2010), pp. 1037-1040.

E. Glory, J. Newberg, and R.F. Murphy (2008). Automated Comparison Of Protein Subcellular Location Patterns Between Images Of Normal And Cancerous Tissues. Proceedings of the 2008 IEEE International Symposium on Biomedical Imaging (ISBI 2008), pp. 304-307.

J. Newberg and R.F. Murphy (2008). A Framework for the Automated Analysis of Subcellular Patterns in Human Protein Atlas Images. J. Proteome Res. 7: 2300-2308.

Subcellular Pattern Analysis – Cultured Cells

L. P. Coelho, J. D. Kangas, A. Naik, E. Osuna-Highley, E. Glory-Afshar, M. Fuhrman, R. Simha, P. B. Berget, J. W. Jarvik, and R. F. Murphy (2013) Local Features Provide Better Generalization of Subcellular Location Classifiers to New Proteins. Bioinformatics 29: 2343-2349.

This paper provides a new perspective on the problem of classifying subcellular patterns. Previous work, beginning with our initial framing of the problem in Boland et al (1997), Boland et al (1998) and Boland & Murphy (2001), considered the problem of recognizing organelle patterns by measuring recognition of new images of the same marker proteins that had been used for training. However, this does not provide a good estimate of performance for classifying images of new proteins localized to the same organelle, which may not have exactly the same pattern as those used for training. Since previous image collections do not allow assessment of this performance, we describe new open access image collections containing multiple proteins that localize to each major organelle. We also describe modifications to local feature methods that incorporate information from reference channels and show that these new features, combined with previously described features, provide improved performance on this task.

J. Li, J.Y. Newberg, M. Uhlén, E. Lundberg, and R.F. Murphy (2012) Automated Analysis and Reannotation of Subcellular Locations in Confocal Images from the Human Protein Atlas. PLoS ONE 7:e0050514.

J. Li, L. Xiong, J. Schneider, and R.F. Murphy (2012) Protein Subcellular Location Pattern Classification in Cellular Images Using Latent Discriminative Models. Bioinformatics 28:i32-i39.

Y. Hu, E. Garcia Osuna, J. Hua, T. S. Nowicki, R. Stolz, C. McKayle and R. F. Murphy (2010) Automated Analysis of Protein Subcellular Locations in Time Series Images. Bioinformatics 26:1630-1636.

Most work on automatically classifying subcellular patterns uses static images and is unable to distinguish proteins by their dynamic behavior. This paper describes a number of approaches for calculating features to describe variation in location over time, and shows that these features allow better discrimination between protein patterns.

J. Y. Newberg, J. Li, A. Rao, E. Lundberg, F. Ponten, M. Uhlen and R. F. Murphy (2009) Automated Analysis Of Human Protein Atlas Immunofluorescence Images. Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging (ISBI 2009), pp. 1023-1026.

Subcellular Pattern Analysis – Yeast

S. Huh, D. Lee and R. F. Murphy (2009) Efficient framework for automated classification of subcellular patterns in budding yeast. Cytometry 75A:934-940.

S.-C. Chen, T. Zhao, G. J. Gordon, and R. F. Murphy (2007). Automated Image Analysis of Protein Localization in Budding Yeast. Bioinformatics 23:i66-i71

Learning Subcellular Sorting Pathways

T. Lin, Z. Bar-Joseph, and R. F. Murphy (2011) Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs. Journal of Computational Biology 18: 1709-1722.

T. Lin, Z. Bar-Joseph, and R. F. Murphy (2011) Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs. Lecture Notes in Bioinformatics (Proceedings of RECOMB 2011) 6577:204-221.

This paper (presented at RECOMB and published in slightly edited form in an issue of the Journal of Computational Biology featuring selected papers) uses both known motifs and motifs learned as described in Lin et al (2010) (below), in combination with data on protein-protein interaction, to learn a sorting model for subcellular localization.

T. Lin, R.F. Murphy, and Z. Bar-Joseph (2010) Discriminative Motif Finding for Predicting Protein Subcellular Localization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 8:441-451.

Many systems for predicting subcellular location of proteins using sequence motifs have been described, but this paper describes the first approaches for learning these motifs given just sequences and locations. The system can achieve results comparable to the best current predictors but on the much harder task of learning motifs as well.

Content-based Image Retrieval

B.H. Cho, I. Cao-Berg, J.A. Bakal, and R.F. Murphy (2012) OMERO.searcher: Content-based image search for microscope images. Nature Methods 9:633-634.

This paper showed that Subcellular Location Features described previously (see Subcellular Pattern Analysis) can be used to find images in an OMERO database that match the pattern of a user-supplied image of an unknown cell pattern. The open source software is now included in the OMERO distribution.

Reviews and Commentaries

R. F. Murphy (2016) Building Cell Models and Simulations from Microscope Images. Methods 96:33-39

K.T. Roybal, P. Sinai, P. Verkade, R. F. Murphy, and C. Wülfing (2013) The actin-driven spatiotemporal organization of signaling in T cells activated by antigen presenting cells. Immunological Reviews 256:133-147.

K.W. Eliceiri, M.R. Berthold, I.G. Goldberg, L. Ibanez, B.S. Manjunath, M.E. Martone, R.F. Murphy, H. Peng, A.L. Plant, B. Roysam, N. Stuurman, J.R. Swedlow, P. Tomancak, and A.E. Carpenter (2012) Biological Imaging Software Tools. Nature Methods 9:697-710.

R. F. Murphy (2012) CellOrganizer: Image-derived Models of Subcellular Organization and Protein Distribution. Methods in Cell Biology 110: 179-193.

R. F. Murphy (2011) An active role for machine learning in drug development. Nature Chemical Biology 7:327-330.

This commentary provides a perspective on two critical areas in which machine learning methods are projected to contribute to drug development and broader experimental biology: building image-derived models of subcellular organization, and using active learning to avoid exhaustive experimentation of large experimental spaces. Our work on these areas is described in separate sections above.

R. F. Murphy (2010) Communicating Subcellular Distributions. Cytometry Part A 77A:686-692.

A. Shariff, J. Kangas, L.P. Coelho, S. Quinn and R.F. Murphy (2010) Automated Image Analysis for High Content Screening and Analysis. J. Biomolec. Screening 15:726-734.

L. P. Coelho, E. Glory-Afshar, J. Kangas, S. Quinn, A. Shariff, and R. F. Murphy (2010) Principles of Bioimage Informatics: Focus on machine learning of cell patterns. Lecture Notes in Computer Science 6004:8-18.