Unfolding and identification of membrane proteins in situ

Abstract
Editor's evaluation
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Single-molecule force spectroscopy (SMFS) uses the cantilever tip of an atomic force microscope (AFM) to apply a force able to unfold a single protein. The obtained force-distance curve encodes the unfolding pathway, and from its analysis it is possible to characterize the folded domains. SMFS has been mostly used to study the unfolding of purified proteins, in solution or reconstituted in a lipid bilayer. Here, we describe a pipeline for analyzing membrane proteins based on SMFS, which involves the isolation of the plasma membrane of single cells and the harvesting of force-distance curves directly from it. We characterized and identified the embedded membrane proteins combining, within a Bayesian framework, the information of the shape of the obtained curves, with the information from mass spectrometry and proteomic databases. The pipeline was tested with purified/reconstituted proteins and applied to five cell types where we classified the unfolding of their most abundant membrane proteins. We validated our pipeline by overexpressing four constructs, and this allowed us to gather structural insights of the identified proteins, revealing variable elements in the loop regions. Our results set the basis for the investigation of the unfolding of membrane proteins in situ, and for performing proteomics from a membrane fragment.

Editor's evaluation

This paper presents a method to identify membrane proteins in native cell membranes based on a combination of single molecule AFM and an unsupervised clustering procedure to identify clusters of single-protein curves. This original approach represents a definitive step forward for AFM technology and methodology, which can generally only be used to characterize purified biomolecules of known identity.

https://doi.org/10.7554/eLife.77427.sa0

Introduction

Mapping and recovering the structure of membrane proteins is a challenging aim. Arguably, the most successful tool for this purpose is cryo electron microscopy (cryo-EM) – but it requires frozen samples, and precise single particle measurements can be achieved only with purification. Cryo-EM and now AlphaFold Jumper et al., 2021 have changed structural biology since the molecular structure of possibly all proteins can be determined with reasonable effort. However, these tools determine only the most probable structure and – at the moment – are not able to identify the various configurations visited by the proteins or their mechanical properties in physiological conditions.

In fact, much of what we know about the mechanics and the structure at room temperature (RT) of cell membranes and membrane proteins comes from atomic force microscopy (AFM) (Al-Rekabi and Contera, 2018; Casuso et al., 2012; García-Sáez et al., 2007; Zuttion et al., 2018) which can operate in liquid environments with nanometric resolution.

AFM-based single-molecule force spectroscopy (SMFS) uses the tip of an AFM to apply a stretching force to unfold a single protein. SMFS provides a 1D force-distance (F-D) curve which encodes the unfolding pathway so that from the analysis of the sequence of force peaks it is possible to identify the folded domains and their variability (Engel and Gaub, 2008). It has been recently confirmed that from the 1D amino acid sequence of a protein it is possible to accurately determine its 3D structure (Jumper et al., 2021), therefore it is tempting to continue exploring which information can be recovered from totally different 1D spectra – the F-D curves provided by SMFS.

Hitherto SMFS has been mostly used to study the unfolding mechanics of purified proteins, in solution or reconstituted in lipid bilayers. Although the information that is possible to obtain at room temperature (RT) is of great interest (e.g. it allows studying mechanical stability [Sumbul et al., 2018a; Thoma et al., 2015] or structural heterogeneity [Hinczewski et al., 2016]), unfolding experiments have been performed only in less than 20 different membrane proteins in the last 20 years, most likely because of the difficulties involved in purification and reconstitution. Moreover, studying membrane proteins in their native membrane instead of purifying them would be preferable because their folding state highly depends on the physical and chemical properties of the cell membrane and on their molecular partners that might cooperatively function nearby (e.g. in case of oligomers; Maity et al., 2015; Thoma et al., 2018).

In this manuscript we aim to bridge the gap between all these recent breakthroughs by attempting to identify and recover structural information of membrane proteins embedded in biological samples, namely in their native environment. In this way we would like to obtain information on mechanical properties and on the possible structural heterogeneity at RT of a wide range of proteins, overcoming the limiting factor of purification that hindered the wide application SMFS on membrane proteins.

We describe a complete pipeline, including the experimental methods and the data analysis, which represent a first step forward in this direction. The pipeline allows to identify membrane proteins obtained from SMFS on single cells. First, we developed a technique to extract a piece of the membrane suitable for SMFS so to obtain millions of F-D curves from native biological membranes. Second, we used an unsupervised clustering procedure to detect sets of similar unfolding curves. Finally, we implemented a Bayesian meta-analysis using information from mass spectrometry and protein structure databases that allows to identify a limited list of candidates of the unfolded proteins, which can then be confirmed with specifically engineered constructs. We first characterized the unfolding of purified membrane proteins reconstituted in vitro in lipid bilayers. Then, we focused on the native cell membranes of five cell types (primary cells and cell lines).

Our protocol allowed us to obtain the combined protein profile of the isolated membrane fragments, and to gather specific structural information on variable segments of the identified proteins. We expanded the number of known unfolding spectra by more than 40, and we provided the molecular identification of four mammalian membrane proteins. Unexpectedly, the distribution of the membrane protein population found with mass spectrometry on thousands of cells can also be recovered with our F-D curves obtained from 3 to 10 cells, suggesting that membrane proteomic may be possible at the single-cell level.

Even if the proposed protocol cannot compete with the identification accuracy of mass spectroscopy performed on thousands of cells, or with the data quality of a SMFS experiment performed on a single purified protein, it accomplishes a different task – it allows gathering biologically meaningful information (i.e. structural properties at RT and in the native cell membrane, protein profiling) at the single-cell level that cannot be captured otherwise.

Results

Unfolding proteins from isolated cell membranes

The first ingredient of our pipeline is an unroofing method (Galvanetto, 2018a) to isolate the apical part of single-cell membranes (see Figure 1—figure supplement 1) containing membrane proteins with negligible contamination of cytoplasmic proteins (Figure 1—figure supplements 2–4). We sandwiched a single cell between two glass plates, the culture coverslip and another plate mounted on the AFM itself (see Figure 1B–C, triangular coverslip). The triangular coverslip is coated with polylysine which favors membrane adhesion. A rapid separation of the plates permits the isolation of the apical membrane of the cell (see Figure 1D–E). The method is reliable (n=105, ~80% success rate) with cell types grown on coverslips. With non-adherent cells, like freshly isolated rods, membranes were isolated with a lateral flux of medium (Clarke et al., 1975) (see Methods). During the unroofing process, we verified the absence of any adsorption of cytoplasmic material like tubulin, actin, mitochondria, or free cytoplasmic proteins from the unroofed membrane patch where only the membrane proteins that are hold by the lipids are present (see Figure 1—figure supplements 3–5).

Figure 1 with 9 supplements see all

Download asset Open asset

Experimental method for single-cell membrane isolation and protein unfolding.

(A) Workflow of the method in four steps: isolation of the apical membrane of single cells; atomic force microscopy (AFM)-based protein unfolding of native membrane proteins; identification of the persistent patterns of unfolding and characterization of the population of unfolding curves; Bayesian protein identification thanks to mass spectrometry data, Uniprot and PDB. (B) Side view and (C) top view of the cell culture and the triangular coverslip approaching the target cell (red arrow) to be unroofed. (D) Positioning of the AFM tip in the region of unroofing. (E) AFM topography of the isolated cell membrane with profile. (F) Cartoon of the process that leads to single-molecule force spectroscopy (SMFS) on native membranes. Examples of force-distance (F-D) curves of (G) no binding events; (H) constant viscous force produced by membrane tethers during retraction; (I) sawtooth-like patterns, typical sign of the unfolding of a membrane protein.

We verified with the AFM (Figure 1F) that the isolated membrane patches have a height of 5–8 nm with roughness in the order of 1 nm. Then, we performed conventional SMFS (Oesterhelt et al., 2000) collecting more than 2 millions of F-D traces from all the samples (hippocampal neurons, DRG neurons, neuroblastoma, rods, and rod discs, see below). Among the obtained curves, ~95% shows no binding (Figure 1G), ~3% shows plateau ascribable to membrane tethers (Figure 1H), while the remaining ~2% displays the sawtooth-like shape that characterizes the unfolding of proteins (Oesterhelt et al., 2000; Figure 1I) with each tooth fitting the worm-like chain (WLC) model with a persistence length of ~0.4 nm (Figure 1—figure supplement 6; Li et al., 2002). We point out that for membrane proteins, the term unfolding refers broadly to the tertiary structure, because the specific timing and dynamics of the pullout of a transmembrane segment and the unfolding of its secondary structure is not resolved yet. The AFM tips becomes attached to membrane proteins mainly by hydrophobic and hydrophilic interactions (Müller and Engel, 2007). For purely kinetic reasons it will be more likely the AFM will get attached to the C-terminus or the N-terminus, since those typically exert less resistance to traction than a loop. In other words, if the tip gets attached to a loop, the trace will be very short, and not detected by our analysis. This mechanism is non-specific and agnostic of the specific nature of the protein. We decided not to functionalize in any manner the AFM tip in order to avoid to introduce biases. These ~50,000 traces are dramatically heterogeneous, differing in total length, number of peaks, maximum force, etc. This is not surprising, since cell membranes contain a large number of different proteins, each hosted in different local environment. In the following we describe a procedure which allows, to some extent, disentangling this bundle.

In SMFS it is assumed that the binding between the cantilever and the protein occurs either at the C- or at the N-terminus, and that the protein is fully unfolded by the tip. However, some traces suggest that also other events take place: (i) the simultaneous attachment of two or more proteins to the tip (Walder et al., 2017), (ii) the incomplete unfolding of the attached protein (Tanuj Sapra et al., 2006), (iii) the binding of the AFM tip to a loop of the protein instead of to a terminus end (Figure 1—figure supplement 7C-F). (i) Attachment of multiple proteins (Figure 1—figure supplement 7D): the resulting F-D curves will not have a recurrent pattern; if two proteins form a complex, the resulting spectrum is the sum of the two individual spectra, which causes deviations of the measured persistence length (Figure 1—figure supplement 8). The simultaneous unfolding of multiple proteins is also characterized by force changes and varying persistence length (Figure 1—figure supplement 7D,G and Figure 1—figure supplement 8). (ii) Incomplete unfolding of the protein (Figure 1—figure supplement 7E): if the tip prematurely detaches from the terminus, the F-D curve has similar but shorter pattern compared to a complete unfolding (Figure 1—figure supplement 7C). The fraction of curves that prematurely detaches has been reported to be ~23% of the fully unfolded protein (Tanuj Sapra et al., 2006).

(iii) Binding of the AFM tip to a loop (Figure 1—figure supplement 7F): this case is equivalent to the attachment of multiple proteins. However, if the attachment of the cantilever tip to a loop occurs with some consistency, we will obtain a recurrent pattern with the features described in case (i) (deviation of persistence length during intersection, two major levels of unfolding force, see Figure 1—figure supplement 8).

Therefore, there is no doubt that a large fraction of F-D curves will not represent the proper unfolding of a single protein. But inspired by the successes of single particle cryo electron microscopy (cryo-EM), which can produce atomic resolution structures, despite selecting less than 20% of the protein images (Yi et al., 2019), we attempted a similar approach on SMFS data, under the same hypothesis that ‘bad’ events are likely to produce non-recurrent patterns of unfolding.

The key tool to recognize recurrent patterns is an automated clustering method developed in our group (Ilieva et al., 2020) based on density peak clustering (Rodriguez and Laio, 2014; Figure 1—figure supplement 9, see Methods for a detailed description of the algorithm). In short, this approach detects statistically dense F-D patterns occurring often in the sample (clusters). It is based on a previously established ‘similarity distance’ (Marsico et al., 2007) in the context of SMFS and, remarkably, it does not require to pre-set neither the shape nor the number of clusters.

Benchmarking the analysis with mixtures of purified proteins reconstituted in lipid bilayers

We first tested the consistency of our pipeline with three known membrane proteins. We reconstituted in lipid bilayers (Shen et al., 2013) highly purified Channelrhodopsin (ChR1) (Nagel et al., 2002), Bacteriorhodopsin (bR) (Oesterhelt et al., 2000), and the cyclic adenosine monophosphate gated channel SthK (Marchesi et al., 2018; Figure 2A), and we performed SMFS experiments (see Supplementary file 2 for sample statistics). bR, ChR1, and SthK were reconstituted in vitro one at a time. Since we did not have control of the protein orientation in the bilayer, we expected two main F-D patterns, and therefore two clusters for each sample: one representing the unfolding of the protein from the N-terminus and the other from the C-terminus.

Figure 2 with 3 supplements see all

Download asset Open asset

Unfolding of reconstituted mixtures of membrane proteins.

(A) Scheme of the structure of the three purified proteins used in the in vitro preliminary step: Channelrhodopsin (ChR1), SthK, and Bacteriorhodopsin (bR) (cylinders represent α-helices). (B) Superimposition of 101 unfolding curves (density plot) of the full unfolding of ChR1 from the N-terminus. (C) Density plot of the full unfolding of Sthk from the C-terminus. (D) Density plot of the full unfolding of bR. (E) Protein profile, that is, the histogram of the maximal contour length (*L_c*_max) of all the force-distance (F-D) curves in the clusters (black line), and of all the F-D curves collected from the sample (gray line, not in scale), from samples with mixtures of bR, ChR1, and SthK with a relative abundance of 1:1:1, 1:7:20, and 1:0:1 (bR-SthK only). Arrows indicate where the F-D curves of (B), (C), and (D) accumulate in the histogram. The peak at ~90 nm indicated by bR+short also contains the shorter clusters of ChR1 and SthK shown in Figure 2—figure supplement 1, which also appear in the mixtures.

The clustering algorithm identified two almost indistinguishable clusters for bR (see Methods), and the result is in agreement with the fact that the unfolding of bR from the two termini is almost symmetric as previously reported (Kessler and Gaub, 2006; Figure 2D). In the case of ChR1 we obtained three clusters where the second cluster – cluster C2 – represents the partial unfolding of ChR1 from the C terminus because the peaks of C2 match perfectly the first three peaks of the full unfolding (cluster C1), while cluster C3 perfectly match the unfolding from the N-terminus (Figure 2B). Sthk generated two clusters, clusters S1 and S2 (Figure 2C), as expected (all the clusters are shown in Figure 2—figure supplement 1). For all the three proteins, we observed that: (i) the value of L_c_max – that is, the estimated length of the segment of a.a. stretched in our SMFS experiments – is within 5% equal to the total length of the protein minus the free length of the N-terminus (C-terminus) domain if the cantilever tip attached to the C-terminus (N-terminus) (see Figure 2B and C and Figure 2—figure supplement 1B) and (ii) the force peaks of the clusters of F-D curves colocalize with the unfolded regions or loops of their structures (see Figure 2—figure supplement 1A and C).

Next, we simultaneously reconstituted two proteins in lipid bilayers (see Methods) obtaining the same clusters associated with the unfolding patterns observed in the single protein reconstitution (Figure 2—figure supplement 2). We finally mixed all the three proteins together, with different relative abundances: 1:1:1 and 1:7:20 respectively for bR, ChR1, and SthK. We observed that the number of traces per cluster scaled approximately with the abundance of the mix (Figure 2—figure supplement 3).

The histogram of the maximal contour length (L_cmax) of all the F-D curves (Figure 2E) encodes the protein content of the sample and allows a rapid detection of the presence and of the abundance of a given protein in sub-femtogram samples, also providing a qualitative indication on the concentration of a given protein. The success of this preliminary in vitro investigation prompted us to move to native membranes.

Clustering SMFS data from native membrane proteins

Next, we applied our pipeline to DRG, hippocampal neurons, neuroblastoma, rod outer segments, and rod discs membranes (Figure 3A–E) and we obtained 301,654 curves from the hippocampal neurons, 413,468 from DRG neurons, 394,118 from neuroblastoma, 386,128 from rods, and 106,528 from rod discs. From these F-D traces we applied our clustering procedure that identified 15, 10, 11, 8, and 5 statistically dense pattern of unfolding present in the dataset (Figure 3F–J, the clusters are named with the sample identifier and a unique number for the cluster, for example, H3 refers to the third cluster from hippocampal cell membranes). These traces show that clean unfolding patterns can be detected even in patches of native membranes.

Figure 3 with 1 supplement see all

Download asset Open asset

Unfolding clusters in native cell membranes.

Bright-field image of (A) dorsal root ganglia neuron; (B) hippocampal neuron; (C) rod before unroofing (scale bar 15 µm). (D) Atomic force microscopy (AFM) error image of an isolated disc (scale bar 1 µm), (E) NG108-15 cells. (**F, G, H, I, J**) Examples of obtained clusters from the native membranes shown as density plots, that is, superposition of the n unfolding curves agglomerated and colored as a heatmap. (K) Comparison of the protein profiles obtained with mass spectrometry vs. the force-distance (F-D) curves obtained with single-molecule force spectroscopy (SMFS) shows a good correlation. The protein profile from mass spectrometry is the normalized histogram of the total number of proteins found in the sample, considering their abundance. The protein profile from SMFS is the histogram of the maximal contour length (*L_cmax*) of all the F-D curves selected with our clustering procedure. (**L, M, N, O, P**) Representation of all the clustered F-D curves in 2D: x-axis is the maximum contour length; y-axis is the average unfolding force (DRG: n=1255; hippocampus: n=563; rod: n=1039; disc: n=703, NG108-15 cells: n=1591).

We identified four major classes of clusters: Short curves with increasing forces: clusters DRG12, H5, H8, and R3 show repeated peaks (ΔL_c 10–20 nm, distance between consecutive peaks) of increasing force up to 400 pN. Long and periodic curves: R6, H7, and DRG10 display periodic peaks of ~100 pN and with a ΔL_c of 30–40 nm whose unfolding patterns are similar to what seen when unfolding LacY (Serdiuk et al., 2016). Conventional short curves: the majority of the identified clusters like DRG1, H3, R8, and all clusters from the rod discs have F-D curves with a total length less than 120 nm with constant or descending force peaks. These F-D curves share various features with the opsin family unfolded in purified conditions (Engel and Gaub, 2008), for example, a conserved unfolding peak at the beginning (at contour length <20 nm) associated to the initiation of the denaturation of the protein. We also found unconventional clusters such as DRG7, DRG8, and R7: DRG8 exhibits initial high forces and with variable peaks followed by more periodic low forces, while cluster R7 has a conserved flat plateau at the end of the curve of unknown origin.

A compact representation of clustered F-D curves becomes more important in native samples where the information stored in the SMFS data is more complex (see Figure 3L–P). The cancerous NG108-15 cells have few short and not particularly stable membrane proteins compared to neural cells which instead have many clusters of proteins under 100 nm that unfold even above 200 pN. On the contrary, NG108-15 cells have a higher fraction of long and stable proteins than neurons.

Furthermore, in native membranes we can directly compare the expected protein profile obtained from mass spectrometry (usually obtained from thousands of cells, see Methods) with the F-D curves profile of just few cells (Figure 3K) which shows a good agreement in particular in the region above 100 nm of protein length.

Molecular identification of detected clusters

Having identified clusters of F-D curves from native membranes that approximate the overall population of membrane proteins (Figure 3K), the next question is: can we identify the membrane proteins whose unfolding corresponds to the identified clusters in Figure 3?

For this purpose, we developed a Bayesian method providing candidate proteins on the basis of the information present in the data from mass spectrometry (ProteomeXchange) and in other proteomic databases (Uniprot, PDB) – combined with the empirical results obtained from SMFS literature (Bosshart et al., 2012; Cisneros et al., 2005; Ge et al., 2016; Kawamura et al., 2013; Kedrov et al., 2004; Klyszejko et al., 2008; Maity et al., 2015; Möller et al., 2003; Oesterhelt et al., 2000; Sapra et al., 2009; Serdiuk et al., 2016; Thoma et al., 2017; Thoma et al., 2012). The Bayesian identification (Figure 4A) is based on two steps: first, the crossing of information between the clusters with the results of mass spectrometry analysis of the sample under investigation (hippocampal neurons, discs, etc.); second, a refinement of the preliminary candidates using additional information (structural and topological) present in the PDB and Uniprot databases. The first step relies on the comparison of L_c_max of each cluster with the length of proteins identified in the mass spectrometry data (Chen et al., 2006; Kwok et al., 2008; Panfoli et al., 2008). By using the protein abundance included in the mass spectrometry data as the prior, a first list of candidate proteins is obtained as well as their probabilities (Figure 4A III). The SMFS literature contains 14 examples of unfolded membrane proteins allowing a comparison between the L_c_max of the measured F-D curves and the real length of the same protein completely stretched (Figure 4B). Therefore, we extrapolated the first conditional probability of our Bayesian inference, showing that on average, the value of L_c_max corresponds to 89% of the real length of the protein (R²=0.98). When the structure of candidate proteins is known (helices, sheets), these probabilities are further refined considering for example, the force of unfolding or the position of the loops which are known to constrain the possible unfolding patterns (Figure 4C–F, see Methods for the formal description).

Figure 4 with 1 supplement see all

Download asset Open asset

Bayesian identification and conditional probabilities.

(A) Logical workflow of the Bayesian steps: selection due to total length and abundance (mass spectrometry), refinement with structural and topological information (PDB and Uniprot). (B) Comparison of the real length of the protein vs. the measured maximal contour length of the force-distance (F-D) curves in 14 single-molecule force spectroscopy (SMFS) experiments on membrane proteins (Pearson coefficient r=0.991). (C) Conditional probability of the observed maximal length of the clusters obtained from (B). (D) Comparison of the force necessary to unfold β-sheets and α-helices in 22 SMFS experiments. (E) Conditional probability of the observed unfolding forces obtained from (D). (F) Conditional probability for the occurrence of unfolding peaks extracted from SMFS literature (see Methods). Unfolding peaks occurs most likely in the loops (82%) than in transmembrane domains (n_peaks = 54, from 11 SMFS experiments of different membrane proteins). The points in the green (yellow) region represents unfolding peaks occurred in a cytoplasmic (extracellular) loop, the point in the pink area occurred in a transmembrane domain. The points above the green and below the yellow regions occurred in cytoplasmic and extracellular domains, respectively. The scale is approximate because in rare occasion loops are longer than 10 nm.

Like in mass spectrometry, this identification is probabilistic (Figure 5), and it allows to reduce the number of possible candidates from hundreds to one to four proteins.

Figure 5

Download asset Open asset

Bayesian identification of the unfolding clusters.

Most probable candidates for the unfolding clusters found in (A) hippocampal neurons; (B) rods; (C) rod discs; (D) NG108-15 cells.

To prove the reliability of the proposed pipeline protocol, we focused on four clusters found in NG108-15 cells where the Bayesian method assigned a high probability to the membrane proteins TMEM16F, TRPC1, TRPC5, and TRPC6. Reverse transcription polymerase chain reaction and western blot analysis (Wu et al., 2007; Figure 6—figure supplement 1) confirm that these membrane proteins are abundantly expressed in neuroblastoma NG108-15 cells. We overexpressed the construct 6xHis-N2B-protein-GFP, where protein is the molecular candidate to be validated (Figure 6A–B, Supplementary file 1). The N2B is a chain of 204 aa which can be unfolded with a force less than 10 pN providing a well-known signature (Linke and Grützner, 2008), and GFP is the green fluorescent protein used to identify which NG108-15 cell was successfully transfected, and if the proteins were present in the unroofed membrane patch (Figure 6C).

Figure 6 with 1 supplement see all

Download asset Open asset

Validation of the method with a comparison between native clusters and those obtained with overexpression of a construct with the N2B signature at the N-terminal and GFP at the C-terminal.

(A) Four clusters found in the dataset of force-distance (F-D) traces pulled from NG108-15 cells wild type. (B) The Bayesian identification of the clusters in (A) with the candidate membrane proteins. (C) Confocal images of NG108-15 cells overexpressing the construct N2b-Protein-GFP, as reported by the green fluorescence emitted by GFP. (D) From left to right, color density plots (blue indicating a colocalization among F-D traces of more than 80%) of the clusters superimposed to an F-D curve bearing the N2B signature (85 nm segment with a flat force below 10 pN) obtained in the sample overexpressing the candidate protein TMEM16F, TRPC1, TRPC5, and TRPC6, respectively. (E) Global histogram of *L_c* of the clusters in A and the clusters with the N2B signature from cells transfected with the corresponding construct. (F) Position of the most likely rupture regions of the proteins according to (E) superimposed to the cartoon of the cryo electron microscopy (cryo-EM) structure available in the PDB.

We compared the clusters obtained without overexpression (NG8, NG4, NG5, and NG11) with the new clusters obtained from the unfolding of the constructs N2B-TMEM16F-GFP, N2B-TRPC1-GFP, N2B-TRCP5-GFP, and N2B-TRPC6-GFP from the transfected cells (Figure 6D and E). These F-D traces had the well-known signature of the N2B domain consisting in a flat portion with a length of ~85 nm, which was followed by force peaks almost exactly matching those observed in the corresponding density plots (Figure 6D). This matching was also confirmed by the correspondence of the cumulative peaks observed in the global histogram of the contour length of the NG8, NG4, NG5, and NG11 clusters with those obtained from neuroblastoma cells transfected with the constructs N2B-TMEM16F-GFP, N2B-TRPC1-GFP, N2B-TRCP5-GFP, and N2B-TRPC6-GFP translated of ~85 nm because of the presence of the N2B (Figure 6E).

This pipeline demonstrates that from the obtained clusters (Figure 3) it is possible to derive the unfolding spectra of membrane proteins in their almost-native conditions without the need of purification. The superposition of the peaks (Figure 6D–E) confirms the identification and allows to map the unfolding positions along the tertiary structure of the proteins (Figure 6F).

Structural insights from SMFS

The confirmation of the molecular identity of the unfolded proteins allows a better understanding of their molecular structures obtained from cryo-EM, in particular of the not well-folded regions. SMFS detects the unfolded domains of a protein and can characterize their variability where the more powerful cryo-EM precisely determines the position of the atoms of the well-folded domains. The unfolding of the construct N2B-TRPC6-GFP as well as the endogenous TRPC6 show force peaks at 143, 193, and 230 nm with different properties (Figure 7). The peak at 143 nm is always well defined, but the force peaks at 193 and 230 nm – occurring at the extracellular loop E1 between S1 and S2 and at the extracellular loop E2 between S3 and S4, respectively (see positions in Figure 7B) – show variability in the location and strength of the secondary peaks (Figure 7D and E). The unfolding behavior observed at 193 and 230 nm reflects the presence of small structural elements (Yu et al., 2017) which are present in some but not all instances (see inserts in Figure 7D–E). The peak at 143 nm in contrast shows a clear and reproducible unfolding behavior (Figure 7C) typical of a defined and well-folded structure (Rico et al., 2013; Takahashi et al., 2018).

Figure 7 with 1 supplement see all

Download asset Open asset

Structural segmentation in the loops of TRPC6.

(A) Density plot of force-distance (F-D) curves of TRPC6 (validated from cluster NG11). (B) 3D structure of TRPC6 (PDB: 6UZA) with highlighted the position corresponding to the unfolding peaks in (A). (C) Representative F-D curves showing the single peak at 143 nm. Insert: comparison of the agreement with the three proposed structures of TRPC6 (PDB ID, green: 5YX9; purple: 6UZA; yellow: 6UZ8). (D) Representative F-D curves showing the variable behavior at 193 nm where ~20% of the F-D curves shows a single jump that suggests no structural fragmentation of the loop like in the green/purple structures. Right inserts: comparison of the structure of the loop obtained by cryo electron microscopy (cryo-EM) where only the yellow structure shows the presence of a 2-turn helix. (E) Representative F-D curves at 230 nm where ~30% of the F-D curves shows a single jump that suggests no structural fragmentation of the loop like in the green/yellow structures. Right inserts: comparison of the structure of the loop obtained by cryo-EM where only the purple structure shows the presence of a 1-turn helix.

In the light of these results, we analyzed the available cryo-EM structures of TRPC6. There are three published structures (Bai et al., 2020; Tang et al., 2018); these three cryo-EM structures are very similar – and almost identical – particularly in the long transmembrane α-helices, but differ in the connecting loops where short α-helices are present only in some of these structures (Figure 7C–E, inserts). These short α-helices in the loops E1 and E2 are the small structural elements at the origin of the segmentation observed in the F-D traces (Figure 7D–E). However, as shown in the structures in yellow, the α-helices are not always observed as the single-peak F-D curves suggest. Therefore, the most likely conclusion from our SMFS data is that, at RT and in their native environment, the different cryo-EM structures coexist and the connecting loops E1 and E2 have a variable structure.

The observed variability is not restricted only to loop E1 and E2 of TRPC6: also the loop pre-S1 and S1 of TRPC5 shows a similarly complex unfolding behavior as opposed to the unfolding of the loop between S2 and S3. Clusters DRG5 or H3 show an even more exotic behavior. Their unfolding curves show a dual nature: secondary peaks and unfolding humps (Figure 7—figure supplement 1).

Towards membrane proteomics on single cells

One of the most important results of this work are the protein profiles of Figure 2E and Figure 3K. SMFS experimentalists are aware that the collected dataset are very heterogeneous even with highly pure samples. And this is indeed reflected in the gray profiles of Figure 2E where no clear data structure is present, and where the three samples are indistinguishable from each other. But after the clustering procedure described in this manuscript, that requires minimal interventions by the user, we could obtain the black profiles in Figure 2E which clearly show the presence of structures in the data that have a straightforward correspondence with the samples. This protein profile can be considered as the minimal 2D representation of the SMFS dataset, but more importantly a ‘first-hand information’ on the mass/length of the proteins in the sample. This positive outcome encouraged us to analyze the SMFS dataset from native cells in the same way. Figure 3K shows the comparison of the distribution of membrane proteins obtained with mass spectrometry (broken lines) against the distribution of the F-D curves that survived the clustering protocol (solid lines). The solid lines show broader peaks and the distributions are not matching perfectly, but we think that the similarity is still remarkable considering the several orders of magnitude difference on the amount of sample used by the two techniques. To become quantitatively reliable, this approach still needs appropriate improvements, but these results suggest that protein length detection of SMFS – restricted to clustered traces – provides a semi-quantitative proxy for mass detection at the single-cell membrane level. These results may pave the way to single-cell proteomics and phenotyping, with the potential to advance our understanding of disease development, progress, and treatment effects.

Discussion

Given the high diversity of membrane proteins in cells, the unequivocal identification of unlabeled proteins from single cells is a daunting task. The present manuscript shows that by combining SMFS on native membrane patches with information present in databases, a probabilistic identification of groups of proteins is possible. Our approach provides usually less than four candidates (Figure 5) for most of the identified clusters of F-D traces (p>95%), and we were able to prove the validity of the identification by properly engineered constructs (Figure 6).

Advantages of the method

The proposed pipeline offers the possibility to enrich the structural information usually derived from cryo-EM, with insights obtained by unfolding the proteins in their native environment. SMFS cannot distinguish between a β-sheet and an α-helix, but it can determine with a good accuracy the unfolded domains of a membrane protein at RT (see Figure 7; Engel and Gaub, 2008). Cryo-EM cannot capture the structural heterogeneity of the poorly folded regions, while SMFS can provide important complementary information, for instance about the mechanical stability of connecting loops.

The pipeline, in addition, offers the possibility to obtain the unfolding profile of the membrane proteins from a limited amount of native material (membranes isolated from 1 to 10 cells). We have shown that the unfolding profile of membrane proteins can be used as a fingerprint to characterize the sample under investigation (see Figures 2–3 and Figure 3—figure supplement 1). We foresee that this approach can be extended to characterize, and eventually distinguish, membranes in cells in healthy and sick conditions where other methodology cannot be applied because of sample scarcity (e.g. samples from patients).

General hallmarks of SMFS of membrane proteins

The unfolding of a protein is stochastic in nature because it can be viewed as Brownian diffusion of a particle in a tilted energy landscape (Engel and Gaub, 2008; Yu et al., 2017), therefore variability in F-D curves is intrinsic to the technique. But this variability is not unconstrained: the energy landscape of a membrane protein has obligatory intermediates that allow to cluster the F-D curves coming from the same protein (Marsico et al., 2007). The variation of an obligatory intermediate, or, in other words, a new force peak/a shift of a force peak, is a strong evidence of a conformational change in the system (Maity et al., 2015). Minor unfolding features can also provide evidence of mechanical plasticity (Takahashi et al., 2018) or structural segmentations of apparently continuous structures (Yu et al., 2017).

Here, we want to stress that we are referring to the ‘system’ protein plus the surrounding environment and not to the protein only. The result of a pulling experiment of a membrane protein, as opposed to globular proteins, is strictly speaking not a single biomolecule event. A membrane protein does not exist in isolation because at least it needs lipids, but very often it forms oligomers with other proteins or even more complex structures like G-protein-coupled receptors. Therefore, to study the mechanical stability of a membrane protein it is preferable to have it in its physiological environment, whether it is composed of only lipids or other partner proteins that concur in the stabilization of the structure (like in the CNG tetramer; Maity et al., 2015; Napolitano et al., 2021).

Technical limitations of the method and future directions

A major limitation of the proposed pipeline – in its present form – is the possibility to merge in the same cluster unfolding traces of proteins with a different molecular identity: indeed, from the mass spectrometry data it is clear that different proteins can have the same – or approximately the same – molecular weight, a similar unfolding pattern and a similar total unfolded length L_c. This issue is rather significant for short proteins, that is, those with values of L_c between 50 and 150 nm. In order to overcome this limitation, it will be desirable to couple SMFS with a device (e.g. solid-state nanopores; Lee et al., 2018) able to identify, even qualitatively, specific amino acids during the unfolding process.

In the range between 50 and 150 nm, we also expect a number of F-D curves resulting from arbitrary attachment or detachment to pass the quality filter (see Methods Block 2). But in this case, given their arbitrariness, the similarity distance will position them at the edge of dense clusters so they will not be central in the analysis.

An underlying technical problem is that it is not trivial to assess a priori the expected maximal contour length (max L_c) simply by knowing the length of the protein measured by mass spectrometry (which usually corresponds to the nominal length: Number of amino acids * 0.4 nm). The difference between the two lengths depends on the specific anchoring points of the protein: the measured max L_c corresponds to the length of the chain between the tip of the AFM and the last amino acid anchored to the membrane during the pulling. In our experiments on reconstituted proteins (see Figure 2A–D), and in many reports from the literature (Kawamura et al., 2013; Kessler and Gaub, 2006; Möller et al., 2003), the last anchored point is the last amino acid of the last transmembrane segment (within error). However, applying these heuristics to all the proteins present in the mass spectrometry dataset is not feasible because the structure of many of those proteins is not known, and even if known, the error on this estimate is difficult to be assessed. We decided to use a Bayesian framework that is more flexible and that naturally provides an error estimate of the results. In our framework we rescaled the nominal length with the probability distribution shown in Figure 4C. Based on the 14 references found in the literature, the max L_c is equal to 89% of the length measured by mass spectroscopy on average. The Pearson correlation coefficient is r=0.991, reflecting a very good correlation between the two quantities (see Figure 4B). Therefore, we think that this approach is more scalable, and easier to be updated when new data will become available from the literature.

The Bayesian approach we used is also robust in case of a large conformational change of a protein as long as the last anchoring point does not change more than 10% of the max L_c. The max L_c has the higher weight in the probability estimation while the contribution of the correlation of the peaks (see Figure 4—figure supplement 1) is a second-order correction. This means that the same protein with different conformation is still properly classified.

Abundancy of the membrane proteins in the isolated membrane is also a limiting factor: the number of distinct membrane proteins that are present, for example, in hippocampal neurons is believed to be in the order of hundreds but with only few of them representing the majority of the expressed proteins. With our approach indeed we identify only 10 clusters that are then assigned to some of the most abundant proteins in the respective contour length range (see Methods for a first-order assessment of the relation between the number of collected traces and the predicted dimension of a cluster).

A way to mitigate this issue would be to functionalize the cantilever tip to increase the yield of unfolding specific target proteins. The specific functionalization strategy really depends on the goal of the investigation. The goal of the present manuscript is to perform unbiased SMFS and we think that the relatively low yield is an advantage in our case. A high attachment rate would cause a higher probability of binding two proteins at the same time making the F-D curves analysis more complicated if not impossible to interpret. On the contrary, attachment rates of 2% means that in first approximation double binding is also the 2% of the binding events so the 0.04% overall: this makes the analysis less error-prone.

Finally, an inherent characteristic of the isolation technique here presented is that only the cytoplasmic side of the membrane is accessible, and a method to reverse the membrane in order to perform the experiments from the extracellular side would be desired. This would require a purely technical improvement. We can speculate thinking about a device with multiple pipettes with which to perform parallel suctions on the cell membrane, to break the membrane as in a patch clamp inside-out configuration and then release the membrane on a clean surface coated with poly-lysine. The proof of principle of such an improvement is not demonstrated yet.

This work is a first step for the use of SMFS in native samples, but despite the limitations here reported, we were still able to classify tens of unfolding pathways, identify with high accuracy part of them, and we could also reconcile some contrasting cryo-EM structures. We therefore believe that the proposed methodology represents a genuine step in the direction of single-cell membrane proteomics.

Materials and methods

Key resources table

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Gene (Mus musculus)	TMEM16F	PMID:34445284	N/A
Gene (Mus musculus)	TMEM16A	PMID:34089532	N/A
Gene (Spirochaeta thermophila)	SthK	PMID:30266906	N/A	pCGFP-BC vector (modified by deleting the GFP and four out of the eight histidines)
Gene (Mus musculus)	Mouse TRPC1	MiaoLing Plasmid Company (NM_011643.4)	N/A
Gene (Mus musculus)	Mouse TRPC5	MiaoLing Plasmid Company (NM_009428.3)	N/A
Gene (Mus musculus)	Mouse TRPC6	MiaoLing Plasmid Company (NM_013838.2)	N/A
Gene (Mus musculus)	N2B	PMID:25963832	N/A
Strain, strain background (Escherichia coli)	DH5α	Thermo Fisher Scientific	Cat#18258012	Competent cells
Strain, strain Background (Escherichia coli)	Stbl3	Thermo Fisher Scientific	Cat#C7373-03	Competent cells
Cell line (Mouse × Rat Hybrid)	NG108-15 cells	Sigma-Aldrich (China)	Cat#88112302
Transfected construct (Mus musculus)	TMEM16A-GFP	PMID:34089532	N/A	peGFP-N1 backbone
Transfected construct (Mus musculus)	TMEM16F-GFP	PMID:34445284	N/A	peGFP-N1 backbone
Transfected construct (Mus musculus)	6xHis-N2B- mTMEM16F-GFP	GENEWIZ (China)	N/A	peGFP-N1 backbone
Transfected construct (Mus musculus)	6xHis-N2B- mTRPC1-GFP	GENEWIZ (China)	N/A	peGFP-N1 backbone
Transfected construct (Mus musculus)	6xHis-N2B- mTRPC5-GFP	GENEWIZ (China)	N/A	peGFP-N1 backbone
Transfected construct (Mus musculus)	6xHis-N2B- mTRPC6-GFP	GENEWIZ (China)	N/A	peGFP-N1 backbone
Transfected construct (Mus musculus)	plentiCRISPR V2-TRPC1- sgRNA1	MiaoLing Plasmid Company	N/A	sgRNA sequence ccgtaagcccacctgtaaga
Transfected construct (Mus musculus)	plentiCRISPR V2-TRPC1- sgRNA2	MiaoLing Plasmid Company	N/A	sgRNA sequence acgcttgtagcagaagggct
Transfected construct (Mus musculus)	plentiCRISPR V2-TRPC5- sgRNA1	MiaoLing Plasmid Company	N/A	sgRNA sequence attactctacgccatccgca
Transfected construct (Mus musculus)	plentiCRISPR V2-TRPC5- sgRNA2	MiaoLing Plasmid Company	N/A	sgRNA sequence ggagtgtgtatccagttcgg
Transfected construct (Mus musculus)	plentiCRISPR V2-TRPC6- sgRNA1	MiaoLing Plasmid Company	N/A	sgRNA sequence gcggcagacgattcttcgtg
Transfected construct (Mus musculus)	plentiCRISPR V2-TRPC6- sgRNA2	MiaoLing Plasmid Company	N/A	sgRNA sequence taaaggttatgtacggattg
Transfected construct (Mus musculus)	plentiCRISPR V2-TMEM16F- sgRNA1	MiaoLing Plasmid Company	N/A	sgRNA sequence agcgagcgttacctcctgta
Transfected construct (Mus musculus)	plentiCRISPR V2-TMEM16F- sgRNA2	MiaoLing Plasmid Company	N/A	sgRNA sequence ctctcgggtcaaataccaag
Transfected construct (Mus musculus)	plentiCRISPR V2-control-sgRNA	MiaoLing Plasmid Company	N/A	sgRNA sequence tcttgagtttgtaacagctg
Transfected construct (Mus musculus)	mCherry-Lifeact-7	Addgene	Plasmid #54491	Actin labeled with mCherry
Biological sample (Rat)	Hippocampal and DRG neurons	Home-made	N/A	Freshly isolated from Wistar rats
Biological sample (Xenopus laevis)	Rod cells	Home-made	N/A	Freshly isolated from male Xenopus laevis
Antibody	Anti-TMEM16F (Rabbit polyclonal)	Alomone Labs	Cat#ACL-016	WB(1:200)
Antibody	Anti-TRPC1 (Rabbit polyclonal)	Alomone Labs	Cat#ACC-010	WB(1:500)
Antibody	Anti-TRPC5 (Rabbit polyclonal)	Alomone Labs	Cat#ACC-020	WB(1:500)
Antibody	Anti-TRPC6 (Rabbit polyclonal)	Alomone Labs	Cat#ACC-120	WB(1:500)
Antibody	Anti-α-Tubulin (Mouse monoclonal)	Sigma	Cat#T8203	WB(1:5000)
Antibody	Goat Anti-Rabbit HRP (Goat polyclonal)	Dako	Cat#P0448	WB(1:5000)
Antibody	Goat Anti-Mouse HRP (Goat polyclonal)	Dako	Cat#P0447	WB(1:5000)
Peptide, recombinant protein	SthK	Home-made	N/A
Peptide, recombinant protein	Hs Bacteriorhodopsin (bR)	Cube-Biotech	Cat#28903
Peptide, recombinant protein	Channelrhodopsin 1_Ca (ChR1)	Cube-Biotech	Cat#28941
Chemical compound, drug	TCEP	Sigma	Cat#C4706
Software, algorithm	Matlab2017a	MathWorks	N/A
Software, algorithm	ImageJ 1.47v	NIH	RRID:SCR_003070
Other	Hoechst	Life Technologies	Cat#33342	Stain the DNA in live cells without the need of permeabilization

All experimental procedures were in accordance with the guidelines of the Italian Animal Welfare Act, and their use was approved by the SISSA Ethics Committee board and the National Ministry of Health (Permit Number: 630-III/14) in accordance with the European Union guidelines for animal care (d.1.116/92; 86/609/C.E.).

Share this article

Cite this article

Experimental method for single-cell membrane isolation and protein unfolding.

Unfolding of reconstituted mixtures of membrane proteins.

Unfolding clusters in native cell membranes.

Bayesian identification and conditional probabilities.

Bayesian identification of the unfolding clusters.

Validation of the method with a comparison between native clusters and those obtained with overexpression of a construct with the N2B signature at the N-terminal and GFP at the C-terminal.

Structural segmentation in the loops of TRPC6.

Author details

Nicola Galvanetto

Present address

Contribution

Contributed equally with

For correspondence

Competing interests

Zhongjie Ye

Contribution

Contributed equally with

Competing interests

Arin Marchesi

Contribution

Competing interests

Simone Mortal

Contribution

Competing interests

Sourav Maity

Contribution

Competing interests

Alessandro Laio

Contribution

Competing interests

Vincent Torre

Contribution

Competing interests

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism

Further reading