BLAST-OER

The FASTA format

FASTA is a text-based format for representing nucleotide or protein sequences. Each nucleotide or amino acid is written as a single-letter code. A simple example looks like this:

>P01013 GENE X PROTEIN (FIRST PART)
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE
KMKILELPFASGDLSMLVLLPD

Each record starts with a comment line (often called the header or description line) that begins with a > character. It gives an identifier for the sequence (here a UniProt ID) and other experimentally relevant information. The lines that follow contain the actual sequence, which may be wrapped across several lines.
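As a rough illustration of this layout, the following Python sketch splits the example record into its parts. The function name parse_record is hypothetical, and treating the first whitespace-separated word of the > line as the identifier is a common convention assumed here, not a rule of the format.

```python
def parse_record(text):
    """Split one FASTA record into (identifier, description, sequence).

    Assumption: the identifier is the first word after '>' and the rest
    of that line is free-text description.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with a '>' line")
    header = lines[0][1:].strip()
    identifier, _, description = header.partition(" ")
    # The sequence may be wrapped over several lines; join them back together.
    sequence = "".join(lines[1:])
    return identifier, description, sequence


record = """>P01013 GENE X PROTEIN (FIRST PART)
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE
KMKILELPFASGDLSMLVLLPD"""

identifier, description, sequence = parse_record(record)
print(identifier)
print(description)
print(len(sequence))
```

The sequence is returned as one string without line breaks, which is usually the form needed for further processing.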

The pattern can be repeated to create a multi-sequence FASTA file, in which each comment line marks the start of the next sequence.
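Reading such a file could be sketched in Python as below. The function name read_fasta and the two short example sequences are made up for illustration; for real work an established library such as Biopython is probably a better choice than a hand-written reader.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    A minimal sketch: a line starting with '>' opens a new record, and
    all following lines up to the next '>' are joined into its sequence.
    Sequence lines that appear before the first '>' are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # A new '>' line closes the previous record.
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        # Emit the last record, which has no following '>' line.
        yield header, "".join(chunks)


# Hypothetical two-record example (not real database entries).
example = """>seq1 first hypothetical sequence
ACGTACGT
ACGT
>seq2 second hypothetical sequence
MKTAYIAK
"""

records = list(read_fasta(example.splitlines()))
for header, sequence in records:
    print(header, len(sequence))
```

Because read_fasta accepts any iterable of lines, an open file object can be passed in directly, so large files do not have to be loaded into memory at once.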

More info

NCBI BLAST Help page
Wikipedia: FASTA format
