Python Distance Pairwise Alignment

Phylogenetic analysis is often used to guide library design during protein engineering. Homologous sequences with known activities are collected and aligned. Residues that are critical for activity tend to be conserved, whereas residues that are poorly conserved among homologs often affect rate or alter specificity. These variable residues are then mutagenized during library construction. Unfortunately, pairwise and multiple sequence alignments contain no structural context, so researchers often have to consult crystal structures to choose mutagenesis sites.

Because cleavage rates for IgG and elastin have only been demonstrated for human MMP3, 7 and 12, multiple sequence alignments of other known but less characterized mammalian MMPs may not be relevant. However, there are likely residues in MMP12 that facilitate IgG cleavage and residues in MMP7 that hinder it. To facilitate phylogenetic analysis of MMPs, I have developed a Python script that takes EMBOSS NEEDLE pairwise alignments and determines the distance of each mismatch from the active-site zinc. Initially, MMP7 was aligned to MMP12 and MMP3 (Fig. 3). The output can easily be divided into sets of residues based on their distance from the active site in the MMP7 crystal structure. Choosing a cutoff distance equal to the length of the substrate of interest removes residues unlikely to be involved in substrate binding.
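
The idea of combining a pairwise alignment with distances from the zinc can be illustrated with a minimal sketch. This is not the script itself: the function names, the toy sequences, and the coordinates are all hypothetical, and it assumes the aligned strings and the C-alpha coordinates of the reference structure have already been extracted (for example from the NEEDLE output and a PDB file).

```python
import math

def mismatches(ref_aln, other_aln, ref_start=1):
    """Return (reference residue number, reference aa, other aa) for each substituted column."""
    pos = ref_start - 1
    subs = []
    for a, b in zip(ref_aln, other_aln):
        if a != "-":
            pos += 1  # advance reference numbering only on non-gap columns
        if a == "-" or b == "-":
            continue  # gaps are skipped; only substitutions are reported
        if a != b:
            subs.append((pos, a, b))
    return subs

def within_cutoff(subs, ca_coords, zinc_xyz, cutoff):
    """Keep substitutions whose reference C-alpha lies within `cutoff` angstroms of the zinc."""
    kept = []
    for pos, a, b in subs:
        xyz = ca_coords.get(pos)
        if xyz is None:
            continue  # residue not resolved in the structure
        d = math.dist(xyz, zinc_xyz)
        if d <= cutoff:
            kept.append((f"{a}{pos}{b}", round(d, 1)))
    return kept

# Toy data (hypothetical, not real MMP sequences or coordinates)
ref_aln = "MKT-AYG"
other_aln = "MRTLAHG"
ca_coords = {2: (3.0, 4.0, 0.0), 5: (30.0, 0.0, 0.0)}
zinc_xyz = (0.0, 0.0, 0.0)

subs = mismatches(ref_aln, other_aln)
near = within_cutoff(subs, ca_coords, zinc_xyz, cutoff=15.0)
print(subs)
print(near)
```

Measuring from the C-alpha atom is a simplification chosen for this sketch; the closest side-chain atom would be an equally reasonable choice and would give somewhat shorter distances.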

Within 15 Å of the active site, there were fewer mismatches between MMP12 and MMP7 than between MMP3 and MMP7. The MMP12/7 mismatches within 15 Å of the active site are Y172H, G179N, T180I, V214A, V215T and T216A. The MMP3/7 mismatches within 15 Å are R101S, T102L, L179N, A218T, F228G, T233P, T240L, H242G and N257K. Only residue 180 was conserved between MMP3 and MMP12 but not in MMP7 (Fig. 4).
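
Finding positions where two homologs agree with each other but differ from the reference amounts to intersecting two mismatch lists. The sketch below is one way this could be done, not necessarily how the script does it. It assumes both lists are numbered against the same reference sequence; the function name and the toy substitutions are hypothetical.

```python
def shared_substitutions(subs_a, subs_b):
    """Positions where both homologs differ from the reference and carry the same residue.

    Each input is a list of (reference residue number, reference aa, homolog aa).
    """
    alt_by_pos = {pos: alt for pos, _ref, alt in subs_b}
    return [
        (pos, ref, alt)
        for pos, ref, alt in subs_a
        if alt_by_pos.get(pos) == alt
    ]

# Toy data (hypothetical): two homologs compared against one reference
subs_homolog_a = [(10, "T", "I"), (12, "G", "N")]
subs_homolog_b = [(10, "T", "I"), (12, "G", "L"), (20, "A", "S")]

shared = shared_substitutions(subs_homolog_a, subs_homolog_b)
print(shared)
```

Position 12 is excluded in the toy data because the two homologs differ from the reference in different ways; only an identical replacement counts as shared here.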

Distance alignment analysis of MMP1, 17, 19 and 23 was also performed. These MMPs are members of the four clades of hMMPs. The analysis reveals that the number of residues that change within 11 Å of the active site varies from 3 to 10 and is not necessarily an indication of phylogenetic distance. A hydrophobic residue at position 180 was seen in the phylogenetically distant MMP19, further suggesting that T180 in MMP7 may be deleterious to elastin and IgG activity, since it occurs in MMP7 and an MMP with no activity for IgG.
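
Counting the mismatches that fall inside a cutoff for several homologs at once is a simple tally. The following sketch assumes the per-residue distances from the zinc have already been computed for the reference structure; the homolog names, positions and distances are made up for illustration.

```python
def count_within(subs_by_homolog, distance_by_pos, cutoff):
    """Count, per homolog, the substitutions at reference positions within `cutoff` angstroms."""
    counts = {}
    for name, subs in subs_by_homolog.items():
        counts[name] = sum(
            1
            for pos, _ref, _alt in subs
            # positions with no known distance are treated as outside the cutoff
            if distance_by_pos.get(pos, float("inf")) <= cutoff
        )
    return counts

# Toy data (hypothetical): distance of each reference residue from the zinc, in angstroms
distance_by_pos = {5: 4.2, 9: 10.5, 14: 12.0, 30: 25.0}
subs_by_homolog = {
    "homolog_A": [(5, "A", "S"), (30, "K", "R")],
    "homolog_B": [(5, "A", "G"), (9, "L", "F"), (14, "T", "V")],
}

counts = count_within(subs_by_homolog, distance_by_pos, cutoff=11.0)
print(counts)
```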

Fig. 3. Alignment output. Bars, periods and semicolons indicate amino acid matches, mismatches, and residues with similar physical characteristics, respectively.

Fig. 4. Mismatches between MMP7/12 and MMP7/3 within 15 Å of the active-site zinc. MMP7/3 has many more mismatches than MMP7/12.

Fig. 5. Number of mismatches between MMP7 and other MMPs.