Vol. 165, No. 1
JOURNAL OF BACTERIOLOGY, Jan. 1986, p. 161-166
0021-9193/86/010161-06$02.00/0
Copyright © 1986, American Society for Microbiology

Nucleotide Sequence Corresponding to Five Chemotaxis Genes in Escherichia coli

NORIHIRO MUTOH AND MELVIN I. SIMON*
Division of Biology, California Institute of Technology, Pasadena, California 91125

Received 10 September 1985/Accepted 24 October 1985

The nucleotide sequence of DNA which contains five chemotaxis-related genes of Escherichia coli, cheW, cheR, cheB, cheY, and cheZ, and part of the cheA gene was determined. Molecular weights of the polypeptides encoded by these genes were calculated from translated amino acid sequences, and they were 18,100 for cheW, 32,700 for cheR, 37,500 for cheB, 14,100 for cheY, and 24,000 for cheZ. Nucleotide sequences which could act as ribosome-binding sites were found in the upstream region of each gene. After the termination codon of the cheW gene, a typical rho-independent transcription termination signal was observed. There are no other open reading frames long enough to encode polypeptides in this region except those which code for the two previously reported genes tar and tap.

Many of the genes that are required for bacterial chemotaxis have been identified, and their gene products have been characterized in both Escherichia coli and Salmonella typhimurium (1, 4, 10, 12, 17, 22). Two operons that are adjacent to each other on the bacterial genome encode 10 chemotaxis-related functions. The Mocha operon includes the motA and motB genes that are responsible for coupling flagellar rotation to energy supplied by the electrochemical gradient across the cell membrane. Adjacent to these genes are the cheA and cheW genes that are required for chemotaxis (1, 12, 17).
The second operon includes the tar and tap genes, which are responsible for the synthesis of transmembrane receptor proteins (2, 7), the cheR and cheB genes, whose products reversibly methylate the transmembrane proteins and are responsible for adaptation (5, 20, 23), and the cheY and cheZ genes, which are thought to play a central role in generating a signal that regulates bacterial flagellar rotation (1, 3, 12, 15, 17). In the past 10 years, we have learned how to measure and analyze components of the chemotaxis system in a variety of sophisticated ways; however, we still do not understand the basis for the signal transduction process, i.e., how the binding of a specific attractant molecule to a receptor generates a signal that regulates flagellar rotation. One approach to this problem involves the manipulation of the levels of the chemotaxis gene products and the isolation and purification of the gene products to study their biochemical properties. This approach would be greatly facilitated by the availability of the nucleic acid sequence of the genes responsible for chemotaxis. The tar and tap gene sequences have been published (8). In this paper, we report the DNA sequence and the derived amino acid sequences for the cheW, cheR, cheB, cheY, and cheZ genes and for part of the cheA gene.

* Corresponding author.

MATERIALS AND METHODS

Enzymes and chemicals. All restriction enzymes used in this study were purchased from Bethesda Research Laboratories, Inc., and New England BioLabs, Inc. T4 DNA ligase was obtained from Bethesda Research Laboratories. M13-mp7 replicative-form DNA was from Bethesda Research Laboratories, and M13-mp8 and M13-mp9 were purchased from P-L Biochemicals, Inc. Dideoxy nucleotide triphosphates and primers were purchased from Bethesda Research Laboratories. [α-32P]dATP (3,000 Ci/mmol) was from Amersham Corp. DNA fragments were obtained by gel electrophoresis after digestion with restriction enzymes. Agarose gel (0.8%) was used to separate fragments more than 600 base pairs long, and 5% polyacrylamide gels were used for fragments that were smaller than 600 base pairs. DNA sequencing was done by the procedures described by Heidecker et al. (7).

DNA sequencing. A 9.6-kilobase EcoRI fragment of pAK108 (2) was purified by gel electrophoresis. This fragment was cut with a variety of restriction enzymes, and the resulting fragments were cloned into M13-mp7, -mp8, or -mp9 phage. The hybrid phages were grown on Escherichia coli JM103 in L broth (1% tryptone [Difco Laboratories], 0.5% yeast extract, 0.5% sodium chloride). Phage DNA was purified by phenol extraction and ethanol precipitation. Sequencing was done with the purified phage DNA templates and the 26-base-pair primer.

RESULTS

The plasmid pAK108 (2) carries an EcoRI fragment of 9.6 kilobases. This DNA includes the sequences corresponding to half of the cheA gene, the complete cheW gene, and the chemotaxis operon that includes the two genes encoding the chemosensory transducers, tar and tap, and four other chemotaxis-related genes, cheR, cheB, cheY, and cheZ. Figure 1 shows the strategy used to determine the nucleotide sequence. Appropriate restriction fragments were cloned into the M13 vectors, and initial sequencing was done at random. Overlapping sequences were matched by scanning the sequences and compiling them with a computer program. All of the sequencing was done so that there were at least two separate determinations for each segment, and the ends of each sequence were overlapped by a sequence determined for a separate fragment. The numbering scheme chosen was essentially arbitrary, with number one corresponding to the middle of the EcoRI site that defined the start of the fragment carrying the cheA gene (Fig. 1A and 2). Nucleotide 1198 of this segment corresponded to the first nucleotide of the sequence that Krikos et al. (8) previously published for the tar gene. In the same way, the sequence of the second segment that encoded the cheR, cheB, cheY, and cheZ genes was numbered from 1 to 3,063 (Fig. 1B, 3, 4). Nucleotide 1 of this sequence corresponded to nucleotide 3450 of the tap sequence that was published previously (8). These data therefore link up with the sequence previously published (8) and provide continuous sequence information covering seven chemotaxis genes and 7.7 kilobase pairs of DNA.

[Figure 1: restriction maps of (A) the cheA-cheW fragment and (B) the cheR-cheB-cheY-cheZ region, with arrows marking the sequenced fragments. The drawing is not reproduced here.]

FIG. 1. Strategy for DNA sequencing. (A) Schematic drawing of the fragment extending from the EcoRI site in the middle of the cheA gene to the end of the cheW gene. The cheW gene ends at nucleotide 1201, with nucleotide number 1 corresponding to the G residue in the EcoRI site. The restriction fragments that were sequenced were derived by digestion with the enzymes shown on the right side. The arrows indicate the direction and the extent of sequencing. (B) Map of the region downstream from the tap gene. The numbering is arbitrary, with nucleotide number 30 corresponding to the A in the first codon of the cheR gene. The vertical lines indicate the position of specific restriction sites, and the horizontal arrows indicate the extent of sequencing of fragments obtained with specific restriction enzymes.

Figure 2 shows the last part of the sequence of the cheA gene and the complete sequence of the cheW gene. The cheW gene is the last gene in the Mocha operon, which comprises the motA, motB, cheA, and cheW genes. Downstream of this operon is the next operon in the chemotaxis series, which includes the tar, tap, cheR, cheB, cheY, and cheZ genes.
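The two numbering schemes described above are each tied to the previously published tar and tap sequence (8) by a single anchor point, so converting between them is a matter of adding a fixed offset. The following Python sketch illustrates that bookkeeping. It is not part of the original analysis: the names `OFFSETS`, `to_published`, and `from_published` are hypothetical, and the sketch assumes that all coordinates are 1-based and that the published numbering is continuous.

```python
# Minimal sketch of the coordinate bookkeeping described in the text.
# All names are hypothetical; only the two anchor points come from the paper:
#   segment 1, nucleotide 1198  <->  published nucleotide 1 (tar sequence)
#   segment 2, nucleotide 1     <->  published nucleotide 3450 (tap sequence)
OFFSETS = {
    "segment1": 1 - 1198,  # cheA fragment and cheW (Fig. 2)
    "segment2": 3450 - 1,  # cheR, cheB, cheY, and cheZ (Fig. 3 and 4)
}


def to_published(segment: str, position: int) -> int:
    """Convert a position in this paper's numbering to the published numbering."""
    return position + OFFSETS[segment]


def from_published(segment: str, published_position: int) -> int:
    """Convert a published position back to this paper's numbering."""
    return published_position - OFFSETS[segment]


if __name__ == "__main__":
    # Anchor point of segment 1, then the first base of the cheR start codon.
    print(to_published("segment1", 1198))
    print(to_published("segment2", 30))
```

Under these assumptions, the A of the cheR start codon (nucleotide 30 of the second segment) would correspond to position 3479 of the published numbering.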
In screening the sequence corresponding to cheW, we found a single open reading frame including nucleotides 701 to 1201. This open reading frame specified a polypeptide of 167 amino acid residues with a calculated molecular weight of 18,000. This value is only slightly larger than the molecular weight of the cheW gene product estimated from sodium dodecyl sulfate-polyacrylamide gel electrophoresis of the labeled cheW gene product (10, 18). The ATG start codon was selected because it gave the longest open reading frame and because 11 base pairs upstream from this ATG codon there is a sequence, AAGG, which could act as part of the ribosome-binding site for the translation of the cheW gene product. At the end of the cheW open reading frame, there is an inverted repeat sequence followed by a cluster of T residues. This is typical of a rho-independent transcription termination signal (14) and could represent the signal for the termination of the Mocha operon transcript. Preceding the open reading frame that represents the cheW sequence, there is another long, contiguous, uninterrupted open reading frame which extends to the end of the fragment. Figure 2 shows the translated product of this open reading frame; it corresponds to the C-terminal end of the cheA gene product.

Figure 3 shows the sequence starting at the end of the tap gene. The nucleotides TGA at positions 18, 19, and 20 correspond to the termination codon of the tap gene. There is a long open reading frame that starts with an ATG codon at nucleotide 30 and terminates with a TAA codon at nucleotide 890. This open reading frame is preceded by the sequence AAGG, which could be part of the ribosome-binding site (16). The polypeptide encoded by this open reading frame has a calculated molecular weight of 32,700 and consists of 286 amino acids.
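The relationship between the coordinates of an open reading frame and the number of residues it encodes can be checked with simple arithmetic. The sketch below is an illustration only, not the procedure used in this work. The function name `residues_in_orf` is hypothetical, and the sketch assumes that a coordinate quoted for a termination codon refers to the last base of that codon.

```python
def residues_in_orf(first_nt: int, last_nt: int, includes_stop: bool) -> int:
    """Return the number of amino acid residues encoded by an inclusive,
    1-based nucleotide span. If the span includes the termination codon,
    that codon is not counted as a residue."""
    length = last_nt - first_nt + 1
    if length % 3 != 0:
        raise ValueError(f"span of {length} nt is not a whole number of codons")
    codons = length // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # cheR: ATG at nucleotide 30, TAA assumed to end at nucleotide 890.
    print(residues_in_orf(30, 890, includes_stop=True))
    # cheW: span 701 to 1201, read here as excluding the termination codon.
    print(residues_in_orf(701, 1201, includes_stop=False))
```

Under these assumptions, the cheR span of 861 nucleotides gives 286 residues, in agreement with the text. The cheW span of 501 nucleotides gives the stated 167 residues only if it is read as excluding the termination codon, so the two spans may have been quoted under different conventions.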
The molecular weight of the cheR gene product has been reported to be 28,000 (10, 18) and is very similar to the molecular weight derived from the product encoded by this open reading frame. There is no other long, extensive open reading frame that we could discern that covers this region, and therefore, we have assigned this open reading frame to the cheR gene.

The next long open reading frame begins at an ATG codon at nucleotide 893 and ends with a TAA codon at nucleotide 1942. This open reading frame specifies a polypeptide of 349 amino acid residues with a molecular weight of 37,500, which agrees with the 38,000 molecular weight reported for the cheB gene product. Interestingly, this open reading frame is preceded by a sequence, AAGGA, which could act as part of the ribosome-binding site (16). However, this sequence is also a portion of the coding region of the cheR gene. Thus, there is an apparent overlap between the sequences that encode the cheR gene and sequences that might act as the ribosome-binding site for the beginning of translation of the large open reading frame corresponding to the cheB gene. Further evidence to support the contention that this open reading frame corresponds to the cheB gene comes from the finding of an SstII restriction endonuclease recognition site (CCGCGG, nucleotides 1070 to 1075), which was reported to exist in the cheB gene by Slocum and Parkinson (19). In previous labeling experiments in which CHEMOTAXIS GENE SEQUENCES VOL.
165, 1986 CA ATT CTC GCA AAA CCC CCC TCC C CGCT TTC ACT CTC ACC CM MC ATC ACC CAC GAC GA CTC CCC ATG CTG ATA TTT Ile Lou Ala Lys Ala Ala Bar Cln Cly Leu Thr Val Ser Clu Agn Met Ser Asp Asp Clu Val Ala Met Leu Ile Phe 80 81 GCA CCT CCC TTC TCC ACC CCA CAC CAC CTC ACC CAC GTC TCC GCC CCC CCC CTC CCC ATC CAC GTC CTT AM CCT AAT ATC Ala Pro Cly Phe Ber Thr Ala Clu Gln Val Thr Asp Val Ser Cly Arg Cly Val Gly Not Asp Vol Val Lys Arg Asn Ile 161 162 CAC AC ATC CGC CCT CAT CTC CM ATC CAC TCC AAC CAG CCT ACT CCC ACT ACC ATC CCC ATT TTA CTC CCC CTC ACC CTC Cln Lys Not Cly Gly His Vol Clu Ile CGln Sr Lys Cln Cly Thr Cly Thr Thr Ile Arg Ile Leu Leu Pro Leu Thr Leu 242 243 CCC ATC CTC CAC CCC ATC TCC CTA- CCC CTT CCC CAT CAA CTT TTC ATT CTC CCC CTC MT CCT CTT ATC CM TCA CTC CM Ala Ile Lou Asp Cly Not Ser Val Arg Val Ala Asp Clu Val Phe Ile Leu Pro Lou Asn Ala Va' Met Glu Ser Leu Gln 323 324 CCC CGT CM CCC CAT CTC CAT CCA CTC CCC GGC CCG ACC CCG TGC TGG AAC TGC CGG GTC AMT ATC TGC CCA TCC TCC AAC Pro Arg Clu Ala Asp Lou His Pro Leu Ala Gly Ala Ser Cly Cys Trp Lys Cys Gly Vol Asn Ile Cys Pro Ser Ber Asn 404 405 TCT ACC CAG GCA ATT CTC GTG ATC TTA CAA Auf CCC GGT CCC Ala Thr Gln Gly Ile Vol Vol Ile Leu Cln Ser Cly Cly Arg 485 486 CCC TAC CCC TTC CTC CTC CAT CM TTA ATT CCT CM CAC CAC CTT CCC CTT AM MAC CTT GM ACT MC TAT CCC AM CTC Arg Tyr Ala Leu Leu Vol Asp Cln Leu Ile Cly Cln His Gln Vol Ala Vol Lys Asn Leu Glu Ser Asn Tyr Arg Lys Val 566 567 CCC CCC ATT TCT GCT CCC ACC ATT CTT GGC CAC CCC ACC CTC CCA CTC ATT CTT CAT GTC TCC CCC TTC CAG CCC ATA MC Pro Cly Ile Ser Ala Ala Thr Ile Leu Cly Asp Gly Ser Vol Ala Leu Ile Val Asp Vol Ser Ala Leu Gln Ala Ile Asn 647 648 CCC CM CM CCT ATC CCC MC ACC CCC CCC TGA ATCACTAMA AGCTMCMT ATC ACC CCT ATC ACC MT GTA ACA AAC CTG Net Thr Gly Met Thr Asn Val Thr Lys Leu Arg Clu Cln Arg Met Ala Asn Thr Alo Ala 730 731 CCC AGC GAG CCC TCA CCC CAG CM TTT CTC GTA TTT ACC CTT GGT CAT GM GAG TAC GGT ATT CAT ATC CTC MA GTC CAC .Ala Ser Clu Pro Ser Cly Gln Clu 
Phe Leu Vol Phe Thr Lou Cly Asp Clu Clu Tyr Cly Ile Asp Ile Leu Lys Val Gln 811 ATC CCT CCC TAC CAT CAG CTA ACA COC ATT GCCC C ACC CCA CCC TTT ATC MA CCC CTC ACC MT CTC CCC CCC CTT Clu Ile Arg Gly Tyr Asp Cln Vol Thr Arg Ile Ala Asn Thr Pro Ala Phe Ile Lys Cly Vol Thr Asn Leu Arg Cly Val 892 893 ATT CTC CCC ATI CTT CAC TTA CGA ATT MC TTC ACC CAG CTC CAT CTC CAC TAT MC CAC MC ACG CTA GTT ATC CTC CTC Ile Vol Pro Ile Vol Asp Leu Arg Ile Lys Phe Ser Gln Vol Asp Vol Asp Tyr Asn Asp Asn Thr Vol Val Ile Val Leu 973 AAT CTC CCA CAG CCC CTC CTC CCC ATC CTC CTT CAC CCC CTC TCA CAC CTG CTT TCA TTC ACC GCC GAG CM ATT CCT CCC Asn Lou Cly Cln Arg Val Vol Cly Ile Vol Val Asp Cly Val Ser Asp Vol Lou Ser Leu Thr Ala Clu Gln Ile Ar8 Pro 1054 1055 CCA CCC CM TTT CCC CTC ACC CTT TCA ACA CM TAT CTC ACT CCA CTC CCC CCA CTC CCC GAC CGG ATC TTC ATT CTC CTC Ala Pro Clu Ph. Ala Vol Thr Lou Ser Thr Clu Tyr Leu Thr Cly Lou Cly Ala Leu Cly Asp Arg Net Leu Ile Leu Vol 1135 1 812 974 CCA AAC CTG TTC MC GTC CCG CCC CCC MA ACC Cys Gly Lys Vol Phe Asn Vol Ala Cly Ala Lys Thr CCC Clu CAC 1136 MC ATC CM AA CTC CTC MC ACC CM CAG ATC CCC CTC TTA CAT ACC CCC CCC TCA CA GTC GCG TM TTCTCC CCATTTCCTC Asn Ile Clu Lys Lou Leu Asn Ber Clu Clu Net Ala Lou Leu Asp Ser Ala Ala Ser Clu Vol Ala 1221 MTTCMATG MCCCCATGA TCTGCGCATC CCCTTTTITA TTTCAATTTC CCGCCGCCTC CCATCACCAA TAAMCTTTCC CCCCTCCTTC 1311 CCCATMCCA GATCMCTTC TTTTCACGM GCTCCCTTAT CATTMCCCT 163 1220 1310 1360 FIG. 2. Nucleotide sequence of the cheA fragment and the cheW gene. The sequences are shown, with the reading frame indicated by the spacing of the codons. The corresponding amino acids are listed below each codon. The underlined sequences represent the cluster of thymidylate residues that may mark the termination of the mocha transcription unit. the cheR and cheB gene products were synthesized and labeled with [35S]methionine, the cheR gene product always appeared to be synthesized at lower levels than the cheB gene product (18). 
One reason for this observation may come from the finding that the open reading frame corresponding to the cheB gene product included approximately 20 methionine residues, while the open reading frame for the cheR gene product included only 6 methionine residues. Thus, the specific activity of the two gene products could vary by more than a factor of three (20/6 ≈ 3.3) even if they were synthesized at exactly the same rates.

Figure 4 shows the sequences corresponding to the cheY and cheZ genes. The product of the cheY gene was reported to be a small polypeptide of molecular weight 8,000 (10, 18). The open reading frame corresponding to this polypeptide was found to start at nucleotide 1957 and extend to nucleotide 2343. This reading frame encoded a polypeptide comprising 129 amino acid residues with a molecular weight of 14,100. The calculated molecular weight was slightly higher than that estimated from the sodium dodecyl sulfate-polyacrylamide gel electrophoresis data. The putative ribosome-binding sequence was AGGAG, corresponding to nucleotides 1946 to 1950. Slocum and Parkinson (19) indicated the presence of a SalI restriction endonuclease recognition site in the cheY gene. A SalI recognition site was found at nucleotides 2074 to 2079 in this open reading frame. The cheB gene ended at nucleotide 1943 and, thus, there was a space of 15 base pairs between the translated regions of the two genes. This space included a sequence that could serve as a ribosome-binding site. Matsumura et al. (9) reported the nucleotide sequence of the cheY gene. The sequence shown here is identical to the one that they reported.

After the cheY gene, there was another long open reading frame, which we assigned to the cheZ gene. The cheY gene ended with the termination codon TGA, corresponding to nucleotide 2347. There was then a space of 11 nucleotides before the next ATG, which started the long open reading frame extending from nucleotide 2359 to 2999.
This could encode a polypeptide corresponding to 214 amino acid residues with a molecular weight of 24,000. This molecular weight corresponded well to that of the cheZ gene product estimated by polyacrylamide gel electrophoresis. There was a PvuII restriction endonuclease recognition site 38 to 43 nucleotides downstream of the ATG start codon, and this again was consistent with the report of Slocum and Parkinson (19) that a PvuII recognition site is found in the early part of the cheZ gene. A ribosome-binding site could correspond to the sequence AGGA which is 11 base pairs from the start of the cheZ gene. There would be a 1-base overlap between the termination codon of the open reading frame corresponding to che Y and the putative ribosome-binding site for the cheZ gene product. We explored the sequence further down- 164 MUTOH AND SIMON J. BACTERIOL. cho R 1 TCGTATCCTG AAGTGATTGA GMACGCGCT ATC ACT TCA TCT CTC CCC TGT GGC CAA ACG TCT TTA TTC TTA CAG ATG ACC GAG CGC Met Thr Ser Ser Leu Pro Cys Gly Gln Thr Ser Leu Leu Leu Gln Met Thr Glu Arg 86 87 CTG GCG CTT TCC GAC GCG CAT TTT CCC CCC ATA ACT CAA TTC ATC TAT CM CGA GCC GGG ATC GTT CTG CCT GAC CAT AAA Leu Ala Leu Ser Asp Ala His Phe Arg Arg Ile Ser Gln Leu Ile Tyr Gln Arg Ala Gly Ile Val Leu Ala Asp His Lys 167 168 CCC CAC ATC GTT TAC MC CGA CTG GTT CGT CCT TTG CGT TCG CTG GGA CTG ACG GAT TTC GGT CAT TAT CTG AAC TTG CTG Arg Asp Met Vol Tyr Asn Arg Leu Val Arg Arg Leu Arg Ser Leu Gly Leu Thr Asp Phe Gly His Tyr Leu Asn Leu Leu 248 249 GM TCT MT CAG CAC AGC GGT GAG TGG CAG GCG TTT ATC MT TCG CTG ACC ACG MT CTG ACG GCA TTT i rc CGT GAG GCA Glu Ser Asn Gln His Ser Gly Glu Trp Gln Ala Phe Ile Asn Ser Leu Thr Thr Asn Leu Thr Ala Phe Phe Arg Glu Ala 329 330 CAT CAT TTC CCT CTG CTC CCC CAT CAC GCA CGT CCC CCT TCT CCC CAG TAT CCC GTA TCG AGC CCC CCC CCT TCG ACC GGC His His Phe Pro Leu Leu Ala Asp His Ala Arg Arg Cly Ser Gly Clu Tyr Arg Val Trp Ser Ala Ala Ala Ser Thr Gly 410 GM GAG CCC TAC AGC ATT GCC ATG ACC CTG GCT CAC ACA 
TTG GGC ACC CCG CCC GGA CGC TGG AAA GTG TTT GCC AGT CAT Glu Glu Pro Tyr Ser Ile Ala Met Thr Leu Ala Asp Thr Leu Gly Thr Ala Pro Gly Arg Trp Lys Val Phe Ala Ser Asp 491 492 ATC GAC ACC GM GTG CTG GM AAA CCC AGA ACC GGT ATC TAT CCC CAT GM GAG TTC AM MC CTG ACG CCG CAG CM CTG Ile Asp Thr Glu Val Leu Glu Lys Ala Arg Ser Gly Ile Tyr Arg His Glu Glu Leu Lys Asn Leu Thr Pro Gln Gln Leu 572 573 CM CGG TAT TTC ATG CGA GCG ACG GGG CCC CAT CM CGC CTG GTA CGC GTG CCT CAG GAG CTG GCC AAC TAT GTT GAT TTT Gln Arg Tyr Phe Mot Arg Gly Thr Gly Pro His Glu Gly Leu Val Arg Val Arg Gln Glu Leu Ala Asn Tyr Val Asp Phe 653 654 CCC CCC CTC MT CTA CTC CCC AAA CAC TAC ACC GTG CCC GGG CCG TTT GAT GCG ATC TTC TCT CGT MC GTC ATC ATC TAC Ala Pro Leu Asn Leu Leu Ala Lys Gln Tyr Thr Vol Pro Gly Pro Phe Asp Ala Ile Phe Cys Arg Asn Val Met Ile Tyr 734 735 TTC CAT CM ACT ACC CAG CAG GAG ATT TTG CCC CCC TTT CTT CCC CTC CTT AM CCC CAC GGA TTC CTG TTT GCG GCT CAC Phe Asp Cln Thr Thr Cln Cln Clu Ile Leu Arg Arg Phe Val Pro Leu Leu Lys Pro Asp Gly Leu Leu Phe Ala Gly His 815 411 816 TCT GM MC TTT ACC CAC CTT GAG CGC CCC TTC ACG CTG CCT GCT CAG ACC GTG TAT CCC CTA AGT AAC GAT TM Ser Glu Asn Phe Ser His Leu Glu Arg Arg Phe Thr Leu Arg Gly Gln Thr Val Tyr Ala Leu Ser Lys Asp 891 CGATGAGCM MTCAGCCTC TTATCTGTCG 865 890 920 che B CGGTGTATC CGCTMCTM GGATTM CC ATC ACC AM ATC AGG CTC TTA TCT GTC GAT GAT TCG GCA CTG ATG CGC CAG ATC ATC Met Ser Lys Ile Arg Val Leu Ser Val Asp Asp Ser Ala Leu Met Arg Gln Ile Met 949 950 ACA GM ATC ATC MC ACC CAT AGC GAC ATG CM ATG CTG GCC ACC GCG CCT GAT CCC CTG CTC GCC CCT GAC TTG ATT AAC Thr Glu Ile Ile Asn Ser His Ser Asp Met Glu Met Val Ala Thr Ala Pro Asp Pro Leu Val Ala Arg Asp Leu Ile Lys 1030 1031 AM TTC MAT CCC GAT CTC CTC ACC CTC CAT CTT GM ATC CCG CGG ATG GAC GGA CTC GAT TTC CTC CM AM TTA ATG CCT Lys Phe Asn Pro Asp Val Leu Thr Leu Asp Vol Clu Met Pro Arg Met Asp Gly Leu Asp Phe Leu Clu Lys Leu Met Arg 1111 1112 TTG CGT CCA ATG CCC GTT GTG ATG GTT TCT TCC CTG ACC GGC AAA 
GGG TCA GM GTC ACC CTG CCC CCC CTG GAG CTG GGG Leu Arg Pro Met Pro Vol Val Met Val Ser Ser Leu Thr Gly Lys Gly Ser Glu Val Thr Leu Arg Ala l eu Glu Leu Cly 1192 1193 CCC ATA CAT TTT CTC ACC AAA CCC CM CTC CCT ATT CCC CM CCT ATC CTC GCC TAT AAC CM ATG ATT CCT CAA AAC GTG Ala Ile Asp Phe Vol Thr Lys Pro Gln Leu Gly Ile Arg Glu Gly Met Leu Ala Tyr Asn Glu Met lie Ala Glu Lys Val 1273 1274 CCT ACC GCA GCA AAG CCC AGC CTT CCA CCA CAT AAC CCA TCG TCC GCA CCC ACA ACC CTG AAC CCC GCC CCC TTC TTC ACT Arg Thr Al& Ala Lys Ala Ser Leu Ala Ala His Lys Pro Lei Ser Ala Pro Thr Thr Leu Lys Ala Gly Pro Leu Leu Ser 1354 1355 TCT GM AM CTG ATT CCC ATT GGT GCT TCA ACG GCT GGA ACT GAG GCA ATT CGT CAC GTA CTG CM CCC TTG CCG CTT TCC 1435 Ser Glu Lys Leu Ile Ala Ile Cly Ala Ser Thr Gly Gly Thr Glu Ala Ile Arg His Val Leu Gln Pro Leu Pro Leu Ser 1436 AGC CCC GCA CTG TTA ATT ACC CAG CAT ATG CCC CCC GCT TTC ACC CGC TCT TTT GCC GAC AGA CTT MT AAG CTT TGC CAG Ser Pro Ala Leu Leu Ile Thr Gln His Met Pro Pro Gly Phe Thr Arg Ser Phe Ala Asp Arg Leu Asn Lys Leu Cys Gln 1516 1517 ATC GGG GTT AAA GM GCC GAA GAC GGA GM CGT GTC TTC CCG GGC CAT GCC TAT ATT GCG CCC GGC GAT CGG CAT ATG GAG Ile Gly Val Lys Clu Ala Glu Asp Gly Glu Arg Val Leu Pro Cly His Ala Tyr Ile Ala Pro Gly Asp Arg His Met Glu 1597 1598 CTC TCG CCT ACT GGC GCA MT TAC CM ATC AM ATT CACGAT CGC CCG GCG GTT AAC CGT CAT CGGCCT TCG GTA CAT GTG Leu Ser Arg Ser Gly Ala Asn Tyr Gln Ile Lys Ile His Asp Gly Pro Ala Val Asn Arg His Arg Pro Ser Val Asp Val 1678 1679 TTC TTC CAT TCT GTC GCC AA CAG CCC GGG CGT MT GCG CTT CCG GTG ATC CTG ACC GCT ATG GGC AAC GAC GGC GCG GCG Leu Phe His Ser Vol Ala Lys Gln Ala Gly Arg Asn Ala Val Gly Val Ile Leu Thr Gly Met Gly Asn Asp Gly Ala Ala 1759 1760 CGA ATC TTG CCC ATC CGT CAG GCC CCC GCA TCC ACC CTT CCG CM AMC GM GCA ACT TGC CTC GTG TTC CCC ATC CCG CGC Gly Met Leu Ala Met Arg Gln Ala Gly Ala Trp Thr Leu Ala Gln Asn Glu Ala Ser Cys Val Val Phe Gly Met Pro Arg 1840 1841 GAG GCC ATC MT ATG GCT GCT GTC TGC CM GTG 
GTC GAT CTT ACC CAG GTA AGC CAG CM ATC TTG GCA AM ATf AGT CCC Glu Ala Ile Asn Mot Gly Gly Vol Cys Glu Val Val Asp Leu Ser Gln Vol Ser Gln Gln Met Leu Ala Lys le Ser Ala 1921 1922 CGA CAG CCC ATA CGT ATT TM ATCAGGAG TCTGCAMATCC CCCATMAGA Gly Gln Ala Ile Arg Ile 1970 FIG. 3. Nucleotide sequence of the cheR and cheB genes. The TAA codon that corresponds to the termination of the cheB gene is underlined to show that the ribosome-binding site for the cheB gene overlaps with the translated sequence of the cheR gene. stream from the cheZ open reading frame and did not discover a long stretch of T residues or an inverted repeat sequence that could correspond to transcription signals similar to those found at the end of the Mocha operon. On the other hand, we did not find any other long, extensive open reading frames. We suggest that the cheZ gene may represent that last gene product in the cotranscribed unit corresponding to the chemotaxis genes. CHEMOTAXIS GENE SEQUENCES VOL. 165, 1986 165 che Y 1928 CCGATACGTA TTTAAMTCAC GAGTCTCAA ATG GCC GAT A GCM CTT AM TTT TTC CTT GTG CAT GAC TTT TCC ACC ATC CGA CCC Met Ala Asp Lys Clu Leu Lys Phe Leu Val Val Asp Asp Phe Ser Thr Met Arg Arg 2013 2014 ATA GTG CCT MC CTG CTG AMA GAC CTC GGA TTC AAT MT GTT GAG GM GCG GAA GAT GGC GTC GAC GCT CTC AAT AAC TTC I e Val Arg Asn Leu Leu Lys Glu Leu Gly Phe Asn Asn Val Glu Clu Ala Glu Asp Gly Val Asp Ala Leu Aun Lys Leu 2094 2095 CAC CCA CCC GCT TAT CGA TTT CTT ATC TCC CAC TCC AAC ATC CCC AAT ATC CAT CCC CTC CM TTC CTC AM ACA ATT CCT Gln Ala Cly Cly Tyr Gly Phe Vol Ile Ser Asp Trp Asn Met Pro Asn Met Asp Gly Leu Glu Leu Leu Lys Thr Ile Arg 2175 2176 GCC GAT CCC CCC ATC TCG GCA TTG CCA GTG TTA ATC GTG ACT GCA GM GCC AAG AM GAG MC ATC ATT CCT CCG GCC CM Ala Asp Gly Ala Met Ser Ala Leu Pro Vol Leu Met Val Thr Ala Glu Ala Lys Lys Glu Asn lle Ile Ala Ala Ala Gln 2256 2257 GCG GGG GCC ACT GCC TAT GTC GTC AAC CCA TTT ACC CCC CCG ACC CTC GAC CM AM CTC MC AM ATC TTT CAC AM CTG 2337 Ala Cly Ala Ser Cly 
Tyr Val Val Lys Pro Phe Thr Ala Ala Thr Lei Clu Clu Lys Leu Asn Lys Ile Phe Glu Lys Lev 2338 GGC ATC TGA GCAT GCGACTATGA TGCMCCATC MTCAACCT Gly 2380 Met che Z 2328 TCACAAACTC CCCATGTCAC CATCCGACT ATC ATC CM CCA TCA ATC AM CCT CCT GAC CAC CAT TCA GCT GCC CAT ATC ATT CCC 2413 Met Met Cln Pro Ser Ile Lys Pro Ala Asp Glu His Ser Ala Cly Asp Ile Ile Ala 2414 CCC ATC CCC ACC CTC ACC CCT ATC CTC CCC GAC ACT TTC CCC CM CTC CCC CTC CAT CAC GCC ATT CCC CM CCC CCC CM Arg Ile Gly Ser Leu Thr Arg Met Leu Arg Asp Ser Leu Arg Clu Leu Gly Leu Asp Cln Ala le Ala Clu Ala Ala Clu 2494 2495 CCC ATC CCC CAT GCC CCC CAT CCT TTC TAC TAT CTT CTC CAC ATC ACC CCC CAC CCT CCC CAC CCC CCC CTC AAC ACT CTT 2575 Ala Ile Pro Asp Ala Arg Asp Arg Leu Tyr Tyr Vol Val Gln Met Thr Ala Gln Ala Ala Clu Arg Ala Leu Asn Ser Vol 2576 GAG CCC TCA CM CCG CAT CAC GAT CM ATC GAG AM TCA CCA AA GCG TTA ACC CM CCT TGG GAT CAC TGG TTT CCC GAT 2656 Glu Ala Ser Cln Pro His Gln Asp Gln Met Clu Lys Ser Ala Lys Ala Leu Thr Cln Arg Trp Asp Asp Trp Phe Ala Asp 2657 CCG ATT GAC CTT GCC GAC GCC CGT GM CTC CTA ACA CAT ACA CGA CM TTT CTG CCA CAT GTA CCC CCG CAT ACC AGC TTT 2737 Pro Ile Asp Leu Ala Asp Ala Arg Clu Leu Vol Thr Asp Thr Arg Gln Phe Leu Ala Asp Vol Pro Ala His Thr Ser Phe 2738 ACT AAC GCG CM CTC CTG CAA ATC ATC ATC CCC CAG GAT TTT CAG CAT CTC ACC CCC CAG GTC ATT AAC CCC ATC ATC GAT 2818 Thr Asn Ala Cln Leu Leu Glu Ile Met Met Ala Cln Asp Phe Gln Asp Leu Thr Gly Gln Val Ile Lys Arg Met Met Asp 2819 GTC ATT CAG GAG ATC CM CCC CAC TTG CTC ATC CTC CTC TTC CM MC ATC CCC CAA CAG GAG TCG CCT CCA AAA CCT CM 2899 Vol Ile Cln Glu Ile Glu Arg Gln Leu Leu Met Val Leu Leu Clu Asn Ile Pro Clu Cln Clu Ser Arg Pro Lys Arg Clu 2900 MC CAG ACT TTC CTT MT CCA CCT CAG GTC CAT ACC AGC AM CCC CCT CTC CTA GCC ACT CAG GAT CAC CTC GAC GAT TTG 2980 Asn Cln Ser Leu Leu Asn Gly Pro Cln Vol Asp Thr Ser Lys Ala Cly Vol Vol Ala Ser Cln Asp Gln Val Asp Asp Leu 2981 TTC GAT ACT CTT CCA TTT TCA TTTCTATTC CCTCATCTCG CGTCACCACC TCATATCAGG 
CCTTCTCATA ACCCCATCAC GCC 3063 Leu Asp Ser Leu Gly Phe

FIG. 4. Nucleotide sequence of the cheY and cheZ genes. In the lower part of the figure, the TGA sequence corresponding to the termination of the cheY gene is underlined to indicate the relationship and spacing between the cheY and cheZ genes.

DISCUSSION

One of the best-characterized sensory transducing systems is the one involved in bacterial chemotaxis. Thus far, many of the experiments dealing with bacterial chemotaxis have focused on describing its components. However, the system is ideally suited to both genetic and biochemical manipulation. Experiments can be designed to modify genes and gene products and to study their interaction both in whole organisms and in the test tube. The final stage in the characterization of the chemotaxis system is the description of the nucleotide sequence of the genes that encode the proteins involved in generating intracellular signals. The DNA sequence allows for precise gene manipulation, the preparation of overproducers, and site-specific mutagenesis. In this paper, we have described the DNA sequence of five genes that are centrally involved in regulating signal transduction in E. coli chemotaxis.

All of the sequences presented here have been tested for homology with an extensive collection of known protein sequences in the Bionet collection and by R. Doolittle of the University of California, San Diego. Thus far, no significant homology has been found between the cheW, cheR, cheY, or cheZ sequences and the other sequences in the libraries. However, the C-terminal fragment of the cheA gene did show a significant match with the C-terminal portion of the sequence corresponding to the envZ gene (11). When the stretch of amino acids from Phe 335 to Thr 391 of the envZ sequence was aligned with amino acids Phe 30 to Met 86 of the cheA C-terminal sequence shown in Fig. 2, without introducing any gaps, 15 amino acid identities were found.
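The ungapped comparison described above amounts to counting the positions at which two equal-length stretches carry the same residue. The Python sketch below shows that count and the window arithmetic. It is a hedged illustration, not the program used for the original analysis: the function names are hypothetical, and the two short peptide strings are invented placeholders rather than the actual EnvZ and CheA sequences.

```python
def window_length(first_residue: int, last_residue: int) -> int:
    """Length of an inclusive stretch of residues."""
    return last_residue - first_residue + 1


def count_identities(seq_a: str, seq_b: str) -> int:
    """Count identical residues in an ungapped, position-by-position alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("an ungapped alignment needs stretches of equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


if __name__ == "__main__":
    # Both stretches quoted in the text span the same number of residues.
    envz_window = window_length(335, 391)  # Phe 335 to Thr 391
    chea_window = window_length(30, 86)    # Phe 30 to Met 86
    print(envz_window, chea_window)

    # Fraction of identical positions for the 15 identities reported.
    print(round(100 * 15 / envz_window, 1))

    # Invented placeholder peptides, used only to exercise the function.
    print(count_identities("FAKTLGDM", "FGKSLADM"))
```

Both stretches are 57 residues long, so the 15 identities reported correspond to roughly 26% identical positions.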
While this level of identity is not sufficient homology to suggest a direct relationship between the function of the envZ gene product and the product of the cheA gene, it is sufficient to suggest that these proteins may be ancestrally related. The envZ gene product is a component of a pathway of information processing. It is thought to be in part responsible for sensing changes in osmolarity and transducing these changes into changes in levels of different outer membrane proteins in E. coli (11). It is thus possible that the C-terminal portion of the EnvZ protein and the CheA protein may have evolved from a common ancestral protein that played some role in a primitive pathway for sensory transduction.

Hydropathy profiles of all of the sequences were determined, and they do not indicate any special regions that might be membrane associated. We would perhaps have predicted this result, since all of these gene products were found to be soluble cytoplasmic proteins in previous work (13). The codon usage in general was found to be fairly typical of that for E. coli proteins with the exception of the GGA and GGG codons, which were used significantly more frequently than expected (6).

The cheR and cheB gene products are involved in mediating the reversible methylation of the chemotaxis receptors. It is interesting that their coding regions apparently overlap so that the termination of the cheR translation may be coordinated with the initiation of translation of cheB. The cheY gene product is also produced at higher stoichiometric levels than the other gene products. This may be the result of the apparent ribosome-binding site preceding cheY. This sequence has very good correspondence to the canonical ribosome-binding sequence (14). There may be other structural features of the mRNA gene product that enhance translation. They are not apparent from our analysis thus far.
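A hydropathy profile of the kind mentioned above is a sliding-window average of per-residue hydropathy values along a protein sequence. The Python sketch below shows one way to compute such a profile. The text does not state which scale or window size was used, so the Kyte-Doolittle scale and the window of nine residues here are assumptions, and the function name `hydropathy_profile` is hypothetical. The example peptide is the first 15 residues of the cheR product as printed in Fig. 3.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the paper does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list:
    """Return the mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # First 15 residues of the cheR product as printed in Fig. 3.
    for value in hydropathy_profile("MTSSLPCGQTSLLLQ", window=9):
        print(round(value, 2))
```

A long run of strongly positive window averages would point to a candidate membrane-spanning segment. The absence of such runs is consistent with the soluble, cytoplasmic location reported for these proteins (13).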
This work, together with previously published sequences, provides the complete nucleotide sequence of the Meche operon and the complete sequence of the cheW gene.

ACKNOWLEDGMENTS

This work was supported by a Public Health Service grant from the National Institutes of Health.

We thank R. Doolittle for analyzing the protein sequences presented in this paper for sequence homology with other proteins. His computer analyses found the homologies between the cheA and the envZ genes.

LITERATURE CITED

1. Aswad, D., and D. E. Koshland, Jr. 1975. Isolation, characterization and complementation of Salmonella typhimurium chemotaxis mutants. J. Mol. Biol. 97:225-235.
2. Boyd, A., A. Krikos, and M. Simon. 1981. Sensory transducers of E. coli are encoded by homologous genes. Cell 26:333-343.
3. Clegg, D. O., and D. E. Koshland, Jr. 1984. The role of a signaling protein in bacterial sensing: behavioral effects of increased gene expression. Proc. Natl. Acad. Sci. USA 81:5056-5060.
4. DeFranco, A. L., and D. E. Koshland, Jr. 1981. Molecular cloning of chemotaxis genes and overproduction of gene products in the bacterial sensing system. J. Bacteriol. 147:390-400.
5. Goy, M. F., M. S. Springer, and J. Adler. 1977. Sensory transduction in Escherichia coli: role of a protein methylation reaction in sensory adaptation. Proc. Natl. Acad. Sci. USA 74:4964-4968.
6. Grosjean, H., and W. Fiers. 1982. Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene 18:199-209.
7. Heidecker, G., J. Messing, and B. Gronenborn. 1980. A versatile primer for DNA sequencing in the M13mp2 cloning system. Gene 10:69-73.
8. Krikos, A., N. Mutoh, A. Boyd, and M. I. Simon. 1983. Sensory transducers of E. coli are composed of discrete structural and functional domains. Cell 33:615-622.
9. Matsumura, P., J. J. Rydel, R. Linzmeier, and D. Vacante. 1984. Overexpression and sequence of the Escherichia coli cheY gene and biochemical activities of the CheY protein. J. Bacteriol. 160:36-41.
10. Matsumura, P., M. Silverman, and M. Simon. 1977. Synthesis of mot and che gene products of Escherichia coli programmed by hybrid ColE1 plasmids in minicells. J. Bacteriol. 132:996-1002.
11. Mizuno, T., E. T. Wurtzel, and M. Inouye. 1982. Osmoregulation of gene expression. II. DNA sequence of envZ gene of the ompB operon of Escherichia coli and characterization of its gene product. J. Biol. Chem. 257:13692-13698.
12. Parkinson, J. S. 1978. Complementation analysis and deletion mapping of Escherichia coli mutants defective in chemotaxis. J. Bacteriol. 135:45-53.
13. Ridgway, H. F., M. Silverman, and M. I. Simon. 1977. Localization of proteins controlling motility and chemotaxis in Escherichia coli. J. Bacteriol. 132:657-665.
14. Rosenberg, M., and D. Court. 1979. Regulatory sequences involved in the promotion and termination of RNA transcription. Annu. Rev. Genet. 13:319-353.
15. Segall, J. E., A. Ishihara, and H. C. Berg. 1985. Chemotactic signaling in filamentous cells of Escherichia coli. J. Bacteriol. 161:51-59.
16. Shine, J., and L. Dalgarno. 1974. The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosomal binding sites. Proc. Natl. Acad. Sci. USA 71:1342-1346.
17. Silverman, M., and M. Simon. 1976. Operon controlling motility and chemotaxis in E. coli. Nature (London) 264:577-580.
18. Silverman, M., and M. Simon. 1977. Identification of polypeptides necessary for chemotaxis in Escherichia coli. J. Bacteriol. 130:1317-1325.
19. Slocum, M. K., and J. S. Parkinson. 1983. Genetics of methyl-accepting chemotaxis proteins in Escherichia coli: organization of the tar region. J. Bacteriol. 155:565-577.
20. Springer, W. R., and D. E. Koshland, Jr. 1977. Identification of a protein methyltransferase as the cheR gene product in the bacterial sensing system. Proc. Natl. Acad. Sci. USA 74:533-537.
21. Stock, J. B., and D. E. Koshland, Jr. 1978. A protein methylesterase involved in bacterial sensing. Proc. Natl. Acad. Sci. USA 75:3659-3663.
22. Tsui Collins, A. L., and B. A. D. Stocker. 1976. Salmonella typhimurium mutants generally defective in chemotaxis. J. Bacteriol. 128:754-765.
23. Yonekawa, H., H. Hayashi, and J. S. Parkinson. 1983. Requirement of the cheB function for sensory adaptation in Escherichia coli. J. Bacteriol. 156:1228-1235.