The Bioinformatics Manual
Dec 14, 2021
The TMHMM server is used to predict transmembrane domains (transmembrane helices) in proteins, for example in newly identified proteins. The most recent release of the TMHMM server is TMHMM 2.0. In this blog, we discuss how to use this server to make transmembrane predictions and how to analyze the results it returns. And let's do this whole process in just 3 simple steps! What are we waiting for? Let's go!
Prerequisites: We will be using the insulin-like growth factor 1 receptor protein (IGF1R) from UniProt for this blog.
Open the UniProt website and type Insulin in the search box. Go to the top result, IGF1R_HUMAN from Homo sapiens, and click on its entry number, i.e., P08069. This opens the entry page for the protein.
Scroll down on the page to find the sequence section.
Click on the blue FASTA button with the down arrow to get a text file of the sequence.
Select the whole sequence using Ctrl+A (Windows)/Cmd+A (Mac) and copy it using Ctrl+C (Windows)/Cmd+C (Mac). Alternatively, you could download the sequence by right-clicking the text page and saving it as a text file on your computer.
Once you've copied the sequence, paste it into the TMHMM server's input box using Ctrl+V (Windows)/Cmd+V (Mac).
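If you'd rather fetch the sequence from a script than copy and paste it, UniProt's REST API serves each entry as FASTA at https://rest.uniprot.org/uniprotkb/P08069.fasta. Below is a minimal sketch that relies only on Python's standard library. It assumes you have internet access, and the function names are our own.

```python
import urllib.request

# UniProt REST endpoint that returns a single entry in FASTA format.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_uniprot_fasta(accession: str) -> str:
    """Download the FASTA record for a UniProt accession (needs internet access)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs, joining wrapped sequence lines."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Calling `parse_fasta(fetch_uniprot_fasta("P08069"))` should return a single (header, sequence) pair. The sequence part is what you paste into TMHMM, and its length should match the 1367 that TMHMM reports in Step 3.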
This is how the sequence looks after pasting. Awesome! We've completed Step 1, so let's head to Step 2! Woohoo!
Once we've pasted our sequence, the TMHMM server gives us some options for viewing our results.
We can choose one of three output formats: extensive, with graphics; extensive, no graphics; and one line per protein.
The extensive, with graphics format gives us a plot of the transmembrane region, showing where it starts and ends in the protein.
The extensive, no graphics format doesn't provide the plot but gives the same detailed summary of where the transmembrane domain lies.
The one line per protein format gives just a one-line summary of where the transmembrane domains lie. It is really helpful when you submit many FASTA sequences at once and want to quickly see where the transmembrane regions of each protein lie.
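Several sequences can go into a single FASTA text, with one `>` header line per protein followed by its sequence. Below is a minimal sketch that assumes your sequences are already in Python as (header, sequence) pairs. `to_multi_fasta` and the 60-residue line width are our own choices. The sketch does not check any limits the server may place on input size.

```python
def to_multi_fasta(records: list[tuple[str, str]], width: int = 60) -> str:
    """Join (header, sequence) pairs into one multi-FASTA text.

    Whitespace inside each sequence is removed, letters are upper-cased, and
    sequence lines are wrapped at `width` characters.
    """
    lines = []
    for header, sequence in records:
        lines.append(">" + header)
        residues = "".join(sequence.split()).upper()
        for i in range(0, len(residues), width):
            lines.append(residues[i:i + width])
    return "\n".join(lines) + "\n"
```

For example, `to_multi_fasta([("protein_A", "MKV LL"), ("protein_B", "AC")], width=3)` returns `">protein_A\nMKV\nLL\n>protein_B\nAC\n"`. You can then paste the result into the server or save it as a file to upload.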
For this blog, we choose the extensive, with graphics option, and we will also have a look at the one-line option later. Under the other options, we don't want to use the first version of the TMHMM model, so we leave the "use old model" box unchecked.
This is how our options look now. We click on Submit to send our sequence. It takes a couple of minutes to predict the transmembrane region, and then you have your results! Awesome! And with that, we're done with Step 2! Let's go to Step 3! Yay!
In Step 3, we analyze the results we have obtained from the TMHMM server, and this is undoubtedly the most important part of the whole blog. So let's begin!
Let's go through the output line by line to make things easier, and then move on to the graph to understand what it depicts.
Line 1: The first line shows the length of the protein sequence we submitted to the server. In this case, the length is 1367 amino acids.
Line 2: The second line gives the number of predicted transmembrane helices (TMHs) in the protein. In this case, only one transmembrane domain was found.
Line 3: The third line gives the expected number of amino acids in transmembrane helices, summed over the whole protein. If this number is above 18, the protein is very likely a transmembrane protein (or it has an N-terminal signal peptide). In our case, the number is about 31, and since 31 > 18, our protein very likely has a transmembrane domain. Note that this is an expected total rather than the length of the helix: the predicted helix itself spans 23 amino acids (positions 936–958).
Line 4: This shows the expected number of amino acids in transmembrane helices within the first 60 amino acids of the protein. If this number is large, a transmembrane helix might be predicted somewhere in positions 1–60, and we have to check whether that helix is really a signal peptide. This is likely because the hydrophobic core of a signal peptide is easily mistaken for a transmembrane helix.
Line 5: The "Total prob of N-in" term is the total probability that the N-terminus lies on the cytoplasmic (inside) side of the membrane. In our case, this probability is 0.15024. A value close to 1 would mean the N-terminus is almost certainly cytoplasmic, so a value this low indicates that the N-terminus is most likely not on the cytoplasmic side.
Lines 6, 7, 8: Line 6 gives the positions predicted to lie outside (the non-cytoplasmic side), line 7 gives the positions of the transmembrane helix, and line 8 gives the positions predicted to lie inside (the cytoplasmic side), on the other side of the membrane.
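If you save the extensive output as plain text, you can read these lines programmatically. Below is a minimal sketch for a single protein. It assumes the TMHMM 2.0 text layout: summary lines start with `#` and take the form `# <name> <label>: <value>`, followed by one whitespace-separated row per region (`<name> TMHMM2.0 <outside|TMhelix|inside> <start> <end>`). `parse_tmhmm_long` is our own name, not part of TMHMM.

```python
def parse_tmhmm_long(text: str):
    """Parse TMHMM 2.0 extensive (text) output for a single protein.

    Returns (summary, regions, warnings):
      summary  - maps each '#' label (e.g. "Total prob of N-in") to its value string
      regions  - list of (label, start, end) tuples, positions 1-based and inclusive
      warnings - '#' lines without a value, e.g. "POSSIBLE N-term signal sequence"
    """
    summary, regions, warnings = {}, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Drop '#', then split off the protein name (first word).
            _name, _, rest = line[1:].strip().partition(" ")
            label, colon, value = rest.partition(":")
            if colon:
                summary[label.strip()] = value.strip()
            else:
                warnings.append(rest.strip())
        else:
            # Region row: name, source, label, start, end.
            fields = line.split()
            regions.append((fields[2], int(fields[3]), int(fields[4])))
    return summary, regions, warnings
```

Applied to the IGF1R result, `summary["Total prob of N-in"]` would hold the string `"0.15024"`, and `regions` would list the outside, TMhelix, and inside rows in order.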
Now that we have a good understanding of what the lines mean, let's try to understand what the graph in the result depicts.
Looking at the graph, you can see that positions 0–1367 are marked on the X-axis and probabilities on the Y-axis. The probability of being in the transmembrane domain is shown in solid red, the probability of being inside in blue, and the probability of being outside in pink.
The lines plotted across the graph show, for each position, the probability of that position being inside, outside, or in the transmembrane domain. At every position, these three probabilities add up to 1. The bars drawn along the top of the plot show the overall predicted topology, that is, which stretches TMHMM finally calls inside, outside, or transmembrane.
For example, over the first few positions the probability of being outside is around 0.82, with the remaining probability split between inside and transmembrane. Therefore, these positions are predicted to lie outside the membrane, on the non-cytoplasmic side.
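Reading a single position the way the example above does (take the label with the highest probability) can be sketched as follows. `most_likely_state` is our own name, and the probabilities in the usage line are made up for illustration, not read from this plot. This is only a per-position reading of the graph: TMHMM's final prediction is a single topology decoded for the whole sequence, so it can differ from a position-by-position choice.

```python
def most_likely_state(p_inside: float, p_tm: float, p_outside: float,
                      tol: float = 0.05) -> str:
    """Return the label with the highest probability at one position.

    The three states are mutually exclusive, so their probabilities should add
    up to about 1; `tol` allows for values read approximately off a plot.
    """
    probs = {"inside": p_inside, "TMhelix": p_tm, "outside": p_outside}
    total = sum(probs.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"probabilities sum to {total:.2f}, expected about 1")
    return max(probs, key=probs.get)
```

For example, `most_likely_state(0.10, 0.08, 0.82)` returns `"outside"`. The sum check catches readings that cannot all be right at once, such as three values that add up to well over 1.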
Awesome!
Bonus: When the expected number of transmembrane amino acids in the first 60 positions is greater than 10, you may get the warning "POSSIBLE N-term signal sequence", which means there might be a signal peptide in the first 60 positions.
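The rules of thumb from the walkthrough above can be collected into one small helper. Below is a minimal sketch. The thresholds of 18 (expected transmembrane amino acids) and 10 (first 60 positions) are the ones described in this blog. The 0.5 cutoff for "Total prob of N-in" is our own simplifying assumption, and `interpret_tmhmm` is a hypothetical name.

```python
def interpret_tmhmm(exp_aa: float, first60: float, prob_n_in: float) -> list[str]:
    """Turn TMHMM summary numbers into plain-language notes.

    exp_aa    - expected number of AAs in transmembrane helices (whole protein)
    first60   - the same quantity restricted to the first 60 positions
    prob_n_in - total probability that the N-terminus is cytoplasmic
    The 0.5 cutoff for prob_n_in is an assumption, not a TMHMM rule.
    """
    notes = []
    if exp_aa > 18:
        notes.append("likely a transmembrane protein (or it has a signal peptide)")
    else:
        notes.append("no strong transmembrane signal")
    if first60 > 10:
        notes.append("POSSIBLE N-term signal sequence: check the first 60 residues")
    if prob_n_in >= 0.5:
        notes.append("N-terminus more likely inside (cytoplasmic)")
    else:
        notes.append("N-terminus more likely outside (non-cytoplasmic)")
    return notes
```

With the IGF1R values, `interpret_tmhmm(31.46, 3.25, 0.15024)` gives two notes: the protein is likely a transmembrane protein, and its N-terminus is more likely outside. There is no signal-sequence warning.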
One-liner result interpretation:
Now, if you use the extensive, no graphics option instead, you won't miss much: apart from the graph, the output is the same. But if you use the one-line option, you will see a few new terms that might be difficult to understand. So, let's look at a one-line result from the TMHMM server as well.
Looking at the one-liner, we first have the name of the protein, "sp|P08069|IGF1R_HUMAN", and then the length of the protein, 1367.
ExpAA: This is the expected number of amino acids in transmembrane helices (line 3 of the extensive output), and in our case it is 31.46.
First60: This is the expected number of amino acids in transmembrane helices within the first 60 amino acids of the protein, and in our case it is 3.25.
PredHel: This is the number of transmembrane helices predicted in the protein, the same value as line 2 of the extensive output. In our case, only one transmembrane helix is predicted.
Topology: This lists each transmembrane helix by its start and end positions. Each helix is bounded by an "i" or "o" that marks whether the neighboring stretch lies inside (cytoplasmic) or outside. In our case, the topology is "o936-958i", which reads as follows: positions 1–935 lie outside, 936–958 form the transmembrane helix, and 959–1367 lie inside.
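Expanding a topology string into explicit segments is a small parsing job. Below is a minimal sketch. It assumes the one-line result consists of the protein name followed by whitespace-separated `key=value` fields (`len`, `ExpAA`, `First60`, `PredHel`, `Topology`), as in the example above. The function names are hypothetical.

```python
import re

SIDES = {"i": "inside", "o": "outside"}


def parse_one_line(line: str) -> dict[str, str]:
    """Split a TMHMM one-line result into the protein name and its key=value fields."""
    name, *fields = line.split()
    result = {"name": name}
    for field in fields:
        key, _, value = field.partition("=")
        result[key] = value
    return result


def topology_segments(topology: str, length: int) -> list[tuple[str, int, int]]:
    """Expand a topology string such as 'o936-958i' into (label, start, end) segments.

    Positions are 1-based and inclusive. The first letter gives the side of the
    N-terminus; the letter after each 'start-end' helix gives the side that follows it.
    """
    segments = []
    side = SIDES[topology[0]]
    pos = 1
    for start, end, after in re.findall(r"(\d+)-(\d+)([io])", topology):
        start, end = int(start), int(end)
        if start > pos:
            segments.append((side, pos, start - 1))  # stretch before this helix
        segments.append(("TMhelix", start, end))
        side = SIDES[after]
        pos = end + 1
    if pos <= length:
        segments.append((side, pos, length))  # stretch after the last helix
    return segments
```

For our protein, `topology_segments("o936-958i", 1367)` returns `[("outside", 1, 935), ("TMhelix", 936, 958), ("inside", 959, 1367)]`, matching the reading above.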
Easy, right?
I guess with this we are done with the third step, result analysis, as well! Yay! This is amazing! We’ve concluded from our results that our transmembrane domain lies at positions 936–958 of the protein!
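To pull out the actual residues of the helix, note that TMHMM positions are 1-based and inclusive, while Python slices are 0-based and exclude the end index. Below is a minimal sketch that assumes the protein sequence is already loaded as a plain string. `extract_segment` is a hypothetical helper name.

```python
def extract_segment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end of a protein sequence.

    TMHMM reports 1-based, inclusive positions; Python slices are 0-based and
    end-exclusive, so positions start..end map to sequence[start - 1:end].
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(
            f"segment {start}-{end} is outside a sequence of length {len(sequence)}"
        )
    return sequence[start - 1:end]
```

With the IGF1R sequence stored in a string `seq`, `extract_segment(seq, 936, 958)` would return the 23 residues of the predicted transmembrane helix.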
Awesome! Amazing! This is cool! We’ve found out where our transmembrane domain is! This calls for some celebrations! Woohoo!
But I guess that should be it for today! Congratulations on making it to the end! You’re a hard worker! Keep learning new software every day! And thanks for reading!
Simple. Concise. Precise. That’s the motto.
Note from the Author: Hey there! How you doin’? Hope you enjoyed my writing! Let me know if you liked it! You can always write to me if you want me to write the manual for a particular software, give me feedback, or reach out regarding anything in general. I’ll be happy! Reach out to me at snippetsbio@gmail.com. Thank you!