deglycosylated protein amino acid sequence help!!

deglycosylated protein amino acid sequence help!!

Helen Pagett
Hello,
I'm not sure if this is the best forum to post this in, but I'm hoping that someone might be able to help!!

I'm a marine biologist by training but have veered into the dark art of proteomics/glycobiology for my PhD...I'm feeling a little out of my depth due to a complete lack of knowledge of basic principles!

So, basically I have purified a glycoprotein which is the same as the one used in this paper (An α2-macroglobulin-like protein is the cue to gregarious settlement of the barnacle Balanus amphitrite. Dreanno, Matsumura, Dohmae, Takio, Hirota, Kirby and Clare. PNAS 2006, vol. 103, no. 39, 14396-14401), so I have the amino acid sequence as found in that paper (GenBank accession number AY423545). I have been concentrating on the sugars side of the glycoprotein but am curious about the protein structure too.

I know that it is possible to predict potential glycosylation sites on the protein from the amino acid sequence (there are 7), but is it possible to find out other things?

1) when treated with mercaptoethanol the native protein separates into 3 subunits (98, 88 and 76 kDa) on an SDS-PAGE gel. This means they are held together by disulphide bonds (correct me if I'm wrong?!). Can the amino acid sequence tell me where these subunits split (i.e. where the disulphide bonds are)?...this would mean I can find out which of the possible glycosylation sites is on which subunit.
2) any other structural information that can be found from the sequence?
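
For anyone reading along, the site prediction mentioned above usually comes down to scanning for the N-glycosylation sequon Asn-X-Ser/Thr, where X is any residue except Pro. Here is a minimal sketch of that scan. The function name and the toy sequence are made up for illustration (it is not the real AY423545 sequence), and a matching sequon is only a potential site, not proof that it is glycosylated.

```python
import re

# N-glycosylation sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead means overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return the 1-based positions of the Asn in each N-X-S/T sequon."""
    seq = sequence.upper()
    return [match.start() + 1 for match in SEQUON.finditer(seq)]


# Toy sequence for illustration only.
toy = "MKNGSAANPTLLNNTSQ"
print(find_sequons(toy))  # -> [3, 13, 14]
```

The "NPT" at position 8 is skipped because of the proline. Once the chain boundaries are known, the same positions can be compared against them to see which subunit each site falls on.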

I've been doing some TEM work on the deglycosylated protein too, to get some more info about its structural characteristics. I have found another protein to compare it to on the RCSB website, which is TEP1r (accession number AF291654). They are ~26% similar. In regard to this I would like to know:
1) Where the 3 disulphide bonds are in this sequence (I think it's the last 6 Cys's of the sequence, going by the RCSB PDB sequence details, as it shows green lines between them...how has this been worked out?)
2) Where is the thioester bond found in the sequence? And can I display it in a PyMOL type image?

If I can combine all of this info into one neat paragraph for my thesis discussion it would be fantastic!!
If anyone is able to help (even if it's to tell me I can't find out this information from the amino acid sequence!) I would be very very very grateful!!

Thanks, Helen

--------------------------------------------------------------
Miss Helen E Pagett
PhD Researcher
School of Marine Science and Technology
Biofouling Group
Newcastle University
Ridley Building, Rm 375
Claremont Road
Newcastle upon Tyne
NE1 7RU
Office 0191 222 5048
Mobile 07970 848802
Web www.ncl.ac.uk/barnacles/
http://www.nerc.ac.uk/using/schemes/yes/winners.asp

NUSAC Expeditions Officer

_______________________________________________
Proteins mailing list
[hidden email]
http://www.bio.net/biomail/listinfo/proteins

re: deglycosylated protein amino acid sequence help!!

Matthew Connelly
Hi Helen,
Welcome to the Dark Arts of Proteins!
(And secondly, sorry if any of the information below is stuff you either already know, or is just plain unhelpful!)

Firstly, a good place to go to work with sequence data is Expasy.ch

Secondly, the UniProt ID for your protein (Q0IKU9) will be more useful on Expasy than the GenBank accession number AY423545.

Thirdly, at first glance it seems the amino acid sequence you are using might not be representative of the protein sequence in vivo. Looking at the UniProt entry page at http://www.uniprot.org/uniprot/Q0IKU9.html there are two things which look suspicious. First, there is no region annotated as CHAIN (this marks the actual chain in vivo and is not assigned by computer). Second, the first amino acid in the sequence is a methionine; the initiator methionine often doesn't make it into the final polypeptide because of post-translational cleavage.

Also, the sequence only codes for a single polypeptide chain, which is not consistent with producing multiple bands on SDS-PAGE when you treat with a reducing agent.
Some explanations for the multiple bands could be:

        1) the disulphide bonds are all intramolecular, and post-translational cleavage is turning one chain into several (see this page on insulin secretion for an example of what I mean: http://www.vivo.colostate.edu/hbooks/pathphys/endocrine/pancreas/insulin.html)
        2) the disulphide bonds are inter-molecular, and there is another coding sequence at work
        3) there are some impurities, but that would likely give multiple bands in a non-reducing gel.

Post-translational cleavage doesn't seem likely at first glance: the full amino acid sequence for your protein only comes to about 170 kDa, but the total mass of your bands is around 260 kDa (98 + 88 + 76 = 262 kDa). Try using the ProtParam tool on Expasy to calculate the mass of your protein Q0IKU9; it is a good example of one of the tools available there.
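
If you are curious what a mass calculation like this involves, here is a rough sketch of the arithmetic: add up the average residue masses and add one water for the free termini. This is not ProtParam's own code. The function name is hypothetical and the table holds standard average residue masses in daltons, so expect small rounding differences from the tool.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain for the free N- and C-termini


def protein_mass(sequence):
    """Average mass in daltons of an unmodified polypeptide chain."""
    seq = sequence.upper()
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# The three reduced bands reported in the original question, in kDa.
band_total_kda = 98 + 88 + 76
print(band_total_kda)                 # 262
print(round(protein_mass("GG"), 2))   # glycylglycine, 132.12
```

Paste the real sequence in as a string to compare the predicted polypeptide mass against the 262 kDa band total. The gap between the two is what the cleavage and glycosylation explanations have to account for.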

However, some other things could be going on to explain the mass difference:

1) Options 2 or 3 above.
2) Glycosylation weirdness: your protein (thanks to cleavages) could be a lot smaller than 170 kDa, but the subunits will give inaccurate molecular weights due to the presence of glycans. Glycans do not bind SDS, so the mass/charge ratio goes out of whack and glycosylated bands look heavier than they really are. Try treating everything with PNGase F first (NEB sell a nice kit); on deglycosylation some of the bands should move to a lighter position. This will not only help identify glycosylated bands, but also help with getting more accurate weights.

N.B. Also, knowing the band weight under non-reducing conditions will help rule out a lot of options in both lists.

The InterPro annotations (such as the thioester region) are computer annotations, and might not be correct. You can get a better idea of what you should expect in your protein by looking at the annotations for some similar ones which have been annotated by hand.
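
One quick sanity check on a computer-assigned thioester annotation is to look for the motif itself. In α2-macroglobulins and the related complement proteins, the thioester forms between the Cys and Gln of a conserved Cys-Gly-Glu-Gln (CGEQ) stretch. The sketch below just searches for that string. The function name and toy sequence are illustrative, and finding the motif does not prove the thioester is intact in your protein.

```python
def find_motif(sequence, motif="CGEQ"):
    """Return 1-based start positions of every occurrence of the motif."""
    seq = sequence.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)
        # Continue one residue further on so overlapping hits are not missed.
        start = seq.find(motif, start + 1)
    return positions


# Toy sequence for illustration only.
toy = "AAPSGCGEQNMVLF"
print(find_motif(toy))  # -> [6], i.e. the Cys is residue 6
```

The reported position is that of the Cys, which is the residue you would then select in a structure viewer.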

If you want to look at similar proteins, try using the BLAST tool on Expasy. This gave me lots of possible matching proteins,
for example:

 B8R3M2   _IXORI Alpha-2-macroglobiln splicing variant 1 precurso...  642   0.0
 Q8IT76   _ORNMO Alpha-2-macroglobulin splice variant 1 precursor...  617 e-174
 O01717   _9CHEL Alpha-2-macroglobulin [Limulus sp]                   607 e-171
 A3QX15   _LITVA Alpha 2 macroglobulin [Litopenaeus vannamei (Whi...  557 e-156
 Q60486   _CAVPO Alpha-macroglobulin precursor [Cavia porcellus (...  513 e-143
 A0T1M1   _MACRS Alpha-2-macroglobulin [Macrobrachium rosenbergii...  498 e-138
 D3YW52   _MOUSE Putative uncharacterized protein Pzp [Pzp] [Mus ...  489 e-135
 Q61838   A2M_MOUSE Alpha-2-macroglobulin precursor (Alpha-2-M) ...   488 e-135
 Q641C5   _XENLA MGC82112 protein [MGC82112] [Xenopus laevis (Afr...  475 e-131
 Q76DK1   _PENJP Alpha2-macroglobulin homolog [Penaeus japonicus ...  468 e-129
 A0T1M0   _LITVA Alpha-2-macroglobulin [Litopenaeus vannamei (Whi...  448 e-123
 Q6TL26   _9BIVA Alpha macroglobulin [Chlamys farreri]                443 e-122
 B5ACH4   _9BIVA Alpha2-macroglobulin [Cristaria plicata]             439 e-120
 A0A1G5   _9BIVA Alpha-2-macroglobulin [A2M] [Hyriopsis cumingii]     434 e-119

You can then perform another analysis immediately after the BLAST search using an algorithm called CLUSTALW, which will show you exactly how these other α2-macroglobulins compare to yours by lining the similar regions up together. If these have chain regions, glycosylation sites etc. annotated in their UniProt entries, it can give you a good idea of what is consistent within the family of proteins (an estimate of molecular weight, for example).

Nothing here is gospel, but it can help you get a feel for the kind of protein you are working on.

Try looking at this one: http://www.uniprot.org/uniprot/Q8IT76. It shows that there are 2 chains resulting from a single initial gene (just like insulin). Try using ProtParam on each chain region and see whether the molecular weights are anywhere near a match for your bands.

I hope some of this helps. Email me if you have any other questions or queries and I would be glad to help further if possible.

Good luck,

Matt

================
Matthew Connelly
Senior Scientist
Lab901 Ltd.







Re: deglycosylated protein amino acid sequence help!!

Dr Engelbert Buxbaum
In reply to this post by Helen Pagett

There is another group, bionet.glycosci, where you may ask about the
sweet things ;-).

Now about the protein part of your questions:

When you treat your protein with βME, you indeed split disulphide bonds.
That you get 3 bands in SDS-PAGE means that your protein is a
hetero-oligomer with 3 subunits of different molecular mass (not:
weight!). There may be just 3 subunits, or the 3 subunits may form a
protomer, of which several come together in the functional protein (if
you want to read up on this, the textbook example is hemoglobin, a
diprotomer; each protomer consists of an alpha- and a beta-subunit).
Assuming a monoprotomer, the molecular mass would be around 262 kDa
(98 + 88 + 76). You should be able to check the mass of the complete
protein by doing the SDS-PAGE in the absence of βME (just leave it out
of the sample buffer, then proceed as usual). Obviously, you need a
lower percentage gel to resolve such big proteins.

None of your bands corresponds to AY423545, which has a protein
molecular mass of 170 kDa (check the UniProt link in the GenBank
entry), to which the sugar chains would have to be added. Unless the
molecular mass differences are caused solely by differences in
glycosylation, the 3 bands correspond to 3 different protein sequences.
These may, however, be produced by proteolytic splitting of AY423545.
The difference between the 262 kDa found and the 170 kDa expected might
be the contribution of the sugar chains to the electrophoretic mobility
(which, however, is not the molecular mass of the bound sugar!). You
could blot the bands onto a PVDF membrane and give it to a protein
science core facility for N-terminal sequencing. That gives you the
first 30 or so amino acids of each protein, which you can use for a
BLAST search.

In order to locate the disulphide bonds, you take the native protein and
label all free Cys with iodoacetamide. Those Cys which take part in
disulphide bonds are not modified at this step. Then you split the
disulphides with TCEP and label the now free disulphide-forming Cys with
a fluorescent or radioactive SH-reagent, e.g. IAEDANS. The protein is
then fragmented by protease digestion, the peptides isolated
(chromatography and/or electrophoresis) and the fluorescent ones
sequenced. Again, you locate the position by BLAST search. The next step
would be to find out which of those Cys reacts with which other, but you
can come back here once you got that far ;-).
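
Before doing the digestion at the bench, it can help to predict which
peptides could carry a labelled Cys at all. The sketch below assumes
trypsin as the protease (the method above does not fix the enzyme) and
uses the usual simplified rule: cut after Lys or Arg unless the next
residue is Pro. Function names and the toy sequence are illustrative
only.

```python
def tryptic_peptides(sequence):
    """Digest in silico: cleave after K or R, but not when followed by P.

    Returns a list of (1-based start position, peptide) tuples.
    """
    seq = sequence.upper()
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        at_end = i == len(seq) - 1
        cleave = aa in "KR" and not at_end and seq[i + 1] != "P"
        if cleave or at_end:
            peptides.append((start + 1, seq[start:i + 1]))
            start = i + 1
    return peptides


def cys_peptides(sequence):
    """Keep only the peptides that contain at least one Cys."""
    return [(pos, pep) for pos, pep in tryptic_peptides(sequence) if "C" in pep]


# Toy sequence for illustration only.
toy = "AKCRPGKC"
print(tryptic_peptides(toy))  # [(1, 'AK'), (3, 'CRPGK'), (8, 'C')]
print(cys_peptides(toy))      # [(3, 'CRPGK'), (8, 'C')]
```

Since the full sequence is already known here, the start positions give
the location of each sequenced fluorescent peptide directly. Real
digests also produce missed cleavages, which this sketch ignores.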

The other method of course would be to crystallize your protein and
determine the structure by X-ray diffraction. There you would actually
see the disulphide bonds. This is how most of the data in PDB were
obtained (some also by NMR, but your proteins are too big for that). To
view PDB files, use DeepView (http://spdbv.vital-it.ch/).

Either way, obtaining such structural information on a protein is not a
task done between now and lunch, and deserves more than a paragraph in
your thesis!
 