Protein sequencing presents different challenges than nucleic acid sequencing, and proteomics has not benefited as much from the next-generation sequencing revolution as genomics has. However, the ability to sequence proteins as readily as nucleic acids would be extremely beneficial. Thus, scientists are drawing on high-throughput nucleic acid sequencing for ideas to improve existing protein sequence analysis techniques and to develop new ones.1
Jeff Nivara uses nanopores to read proteins with single amino acid sensitivity.
Jeff Hagen
Jeff Nivara, a molecular engineer at the University of Washington, sees nanopore technology as the way forward for single-molecule protein sequencing and beyond. In this interview, Nivara discusses a new technique in which he used the enzyme ClpX to unfold and ratchet long protein chains through nanopores, allowing them to be read with single-amino-acid sensitivity.2
What sparked your interest in using nanopore technology for protein sequencing?
A major breakthrough in nucleic acid sequencing using nanopores was the discovery of motor proteins that could ratchet a strand through the pore, nucleotide by nucleotide. When I started my graduate studies, I tried to find a similar motor for proteins. Fortunately, around the same time, a study was published in Cell that characterized how the unfoldase ClpX functions at the single-molecule level.3 The details presented in that study allowed us to envision how this motor protein could be applied to nanopore protein sequencing, and we put the two together.4
What are the biggest differences between using nanopore technology for protein sequencing versus nucleic acid sequencing?
Protein sequencing is much more difficult. Nucleic acids have a uniform negatively charged backbone, so electrophoretic forces alone can move them through a nanopore. Proteins, on the other hand, are non-uniformly charged, so this doesn’t work well and the signal is noisy. It’s also much more complicated when dealing with 20 amino acids compared to 4 nucleotides, and you have to take into account tertiary structure, folded domains, etc.
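The contrast in backbone charge can be illustrated with a minimal sketch. It assumes simplified side-chain charges near neutral pH (aspartate and glutamate at -1, lysine and arginine at +1, everything else treated as neutral, termini ignored), and the peptide sequence and the helper names `charge_profile` and `dna_charge_profile` are hypothetical, chosen only for illustration.

```python
# Simplified side-chain charges near neutral pH (assumption: histidine and
# the chain termini are ignored; all other residues count as neutral).
RESIDUE_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def charge_profile(sequence, window=5):
    """Net charge of each consecutive window of residues along a protein."""
    charges = [RESIDUE_CHARGE.get(aa, 0) for aa in sequence]
    return [sum(charges[i:i + window]) for i in range(len(charges) - window + 1)]


def dna_charge_profile(length, window=5):
    """Each nucleotide carries one negative backbone phosphate, so every
    window of a nucleic acid strand has the same net charge."""
    return [-window] * (length - window + 1)


if __name__ == "__main__":
    peptide = "KKAAADDD"  # hypothetical sequence, for illustration only
    print("protein:", charge_profile(peptide, window=3))
    print("DNA:    ", dna_charge_profile(len(peptide), window=3))
```

In this toy example the protein's net charge changes sign along the chain, while the nucleic acid profile is constant, which is why an electric field alone pulls DNA steadily in one direction but not a protein.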
How hard is it to get a protein through a nanopore?
ClpX can process synthetic proteins fairly easily because they are unfolded. Natural proteins with fully folded domains are more difficult because they must be unfolded before they can pass through the nanopore. Proteins can also refold on the trans side of the pore after passing through. Another study using this technique actually had to unfold the protein twice: once before it passed through the nanopore under electrophoretic force, and once again before it was pulled back up. Much remains to be learned about how well the motor works with specific proteins.
How difficult is it to distinguish between different amino acids?
The signal we observe arises from sensing a sliding window of approximately 20 amino acids passing through the pore at once. This makes it much harder to detect single amino acid differences; however, the longer the sequence, the greater the likelihood of generating distinct signal elements. When these individual elements are combined, they form a unique collective signature that can be used to identify the protein through a fingerprint-based approach.
These sequence differences can be very subtle, making them difficult to resolve with traditional statistical and other analytical methods; this is where machine learning comes in. We train machine learning programs to extract the signal differences between different proteins, learn which features are associated with which amino acids, and map how surrounding amino acids contribute to the signal observed at a particular position.
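The sliding-window and fingerprint ideas described above can be sketched with a toy model. This is not the model used in the study: the per-residue contributions are arbitrary placeholders rather than measured values, the signal is assumed to be a plain average over a 20-residue window, and a nearest-reference lookup stands in for the machine learning step. The sequences and the function names `simulate_trace`, `distance`, and `identify` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical per-residue signal contributions: arbitrary placeholders
# spread evenly over [0, 1], NOT measured nanopore values.
CONTRIBUTION = {aa: i / (len(AMINO_ACIDS) - 1) for i, aa in enumerate(AMINO_ACIDS)}


def simulate_trace(sequence, window=20):
    """Toy signal: the mean contribution of the residues inside the pore's
    sensing window as the chain advances one residue at a time."""
    values = [CONTRIBUTION[aa] for aa in sequence]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def distance(trace_a, trace_b):
    """Euclidean distance between two traces of equal length."""
    return sum((x - y) ** 2 for x, y in zip(trace_a, trace_b)) ** 0.5


def identify(trace, references):
    """Fingerprint lookup: name of the reference trace closest to `trace`."""
    return min(references, key=lambda name: distance(trace, references[name]))


# Two hypothetical 40-residue "proteins" and their reference fingerprints.
protein_a = "A" * 20 + "Y" * 20
protein_b = "Y" * 20 + "A" * 20
references = {
    "protein_a": simulate_trace(protein_a),
    "protein_b": simulate_trace(protein_b),
}

# A single substitution (A -> W at index 5) in protein_a.
variant = protein_a[:5] + "W" + protein_a[6:]
variant_trace = simulate_trace(variant)

# Windows whose signal differs from the unmodified protein_a trace.
changed = [i for i, (x, y) in enumerate(zip(references["protein_a"], variant_trace))
           if abs(x - y) > 1e-9]

if __name__ == "__main__":
    print("windows affected by one substitution:", changed)
    print("closest reference:", identify(variant_trace, references))
```

In this toy setup a single substitution shifts several consecutive windows by a small amount instead of producing one sharp change, which illustrates why individual residues are hard to call directly while the trace as a whole still works as a fingerprint.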
What are your short-term and long-term goals for this technology?
We are expanding the size of our dataset by looking at more complex amino acid sequences, allowing us to train better models. Currently, most of our data comes from synthetic proteins, so we are adding more natural molecules to build our dataset. Our ultimate goal is to have a model that can recognize any protein in the human proteome.
As we continue down this path, we expect new challenges to emerge. For example, will improved motor proteins be required for certain types of protein sequences? Will different pore sizes improve the sensitivity of this technique, given that current nanopores are optimized for DNA sequencing? We foresee many refinements that will dramatically improve this technique in the future.
This interview has been condensed and edited for clarity.