
A guide to Greengenes: A chimera-checked 16S rRNA gene database with multiple taxonomies



Greengenes Database Download: A Guide for Microbiome Researchers




Microbiome research is the study of the microbial communities that inhabit various environments, such as the human body, soil, water, plants and animals. Microbiome research can reveal the diversity, function and interactions of microorganisms, as well as their impact on health, disease, ecology and evolution.








One of the main approaches for microbiome research is to sequence DNA directly from a mixed sample of microorganisms without prior cultivation, either as shotgun metagenomics or as amplicon sequencing of a marker gene. These approaches can provide insights into the composition, structure and dynamics of microbial communities, and shotgun metagenomics can also reveal their metabolic potential. Measuring gene expression requires a separate technique, metatranscriptomics.


However, metagenomics also poses many computational challenges, such as read assignment, assembly, annotation, comparison and visualization. To address these challenges, microbiome researchers need reliable reference databases that contain genomic information of microorganisms.


One of the most widely used reference databases for microbiome research is Greengenes, a curated collection of 16S ribosomal RNA (rRNA) gene sequences from bacteria and archaea. The 16S rRNA gene is a marker gene that can be used to identify and classify microorganisms based on their evolutionary relationships.


In this article, we will explain what Greengenes database is and why it is useful for microbiome analysis. We will also show you how to download Greengenes database and use it for microbiome analysis. We will then discuss the advantages and limitations of Greengenes database, as well as some alternatives to it.


What is Greengenes database and why is it useful for microbiome analysis?




Greengenes database is a curated collection of 16S rRNA gene sequences




Greengenes database was created in 2006 by a team of researchers from the Lawrence Berkeley National Laboratory (LBL) in California. The aim of Greengenes database was to provide a comprehensive and consistent reference for 16S rRNA gene sequences from bacteria and archaea.


Greengenes database contains over one million 16S rRNA gene sequences collected from public sequence repositories such as GenBank, including sequences from Sanger sequencing projects and environmental surveys. Many of these sequences come from uncultured microorganisms that are known only from PCR amplification of environmental DNA.




Greengenes database applies several quality control steps to ensure the accuracy and reliability of the 16S rRNA gene sequences. These steps include:



  • Chimera checking: removing chimeric sequences, which are PCR artifacts formed when fragments of two or more different template sequences are joined into a single amplicon



  • Standard alignment: aligning sequences to a common framework using the NAST (Nearest Alignment Space Termination) algorithm



  • Taxonomic classification: assigning sequences to a hierarchical taxonomy based on phylogenetic methods and expert curation



  • Clustering: grouping sequences into operational taxonomic units (OTUs) based on sequence similarity thresholds
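
The clustering step in the list above can be illustrated with a toy greedy scheme. This is only a sketch under simplifying assumptions, not the procedure Greengenes itself uses: it assumes the sequences are already aligned to the same length, measures identity as the fraction of matching columns, and uses hypothetical function names (identity, greedy_cluster). Production tools work on unaligned reads and use faster heuristics.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Assign each sequence to the first centroid it matches at >= threshold.

    Sequences that match no existing centroid become new centroids.
    The result maps centroid ID -> list of member IDs (centroid included).
    """
    clusters: Dict[str, List[str]] = {}
    for seq_id, seq in seqs.items():
        for centroid_id in clusters:
            if identity(seq, seqs[centroid_id]) >= threshold:
                clusters[centroid_id].append(seq_id)
                break
        else:
            # No centroid was close enough, so this sequence starts a new OTU.
            clusters[seq_id] = [seq_id]
    return clusters


if __name__ == "__main__":
    base = "ACGT" * 25                      # 100-nt toy sequence
    seqs = {
        "a": base,
        "b": "TG" + base[2:],               # 2 mismatches -> 98% identity to "a"
        "c": "T" * 10 + base[10:],          # 8 mismatches -> 92% identity to "a"
    }
    print(greedy_cluster(seqs))
```

Because the scheme is greedy, the result depends on input order: a sequence joins the first centroid that is close enough, which is not necessarily the closest one.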



Greengenes database provides several files and formats for downloading and using the 16S rRNA gene sequences, such as FASTA, ARB, tree and OTU tables. Greengenes database also provides a web interface for browsing, searching and visualizing the 16S rRNA gene sequences and their taxonomic annotations.


Greengenes database provides a consistent taxonomy and alignment for bacterial and archaeal taxa




One of the main features of Greengenes database is that it provides a consistent and comprehensive taxonomy for bacterial and archaeal taxa. The taxonomy is derived from a phylogenetic tree of the 16S rRNA gene sequences: names from existing taxonomies are mapped onto the tree, and the result is refined by expert curation and manual revision.


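The taxonomy of Greengenes database uses seven explicit ranks: domain (written as kingdom, with the prefix k__), phylum, class, order, family, genus and species. OTUs are not a taxonomic rank. They are clusters of sequences at a fixed similarity threshold, and they represent the finest level of resolution that can be achieved with 16S rRNA gene sequences. Greengenes distributes OTU sets at several thresholds, and clustering at 97% similarity is conventionally treated as a rough approximation of the species level.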
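
Greengenes lineages are commonly distributed as semicolon-separated strings in which each field carries a rank prefix (k__, p__, c__, o__, f__, g__, s__), and an empty field such as "s__" means the rank is unassigned. The sketch below parses such a string into a dictionary. The function name parse_lineage and the example lineage are illustrative assumptions, not part of any Greengenes tool.

```python
from typing import Dict

# Rank prefixes as they appear in Greengenes lineage strings.
# The top rank is written with a "k__" prefix.
RANKS = {
    "k": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_lineage(lineage: str) -> Dict[str, str]:
    """Turn 'k__Bacteria; p__Firmicutes; ...' into {'domain': 'Bacteria', ...}.

    Ranks with an empty name (for example 's__') are left out of the result.
    """
    result: Dict[str, str] = {}
    for field in lineage.split(";"):
        field = field.strip()
        if not field:
            continue
        prefix, _, name = field.partition("__")
        if prefix in RANKS and name:
            result[RANKS[prefix]] = name
    return result


if __name__ == "__main__":
    example = (
        "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
        "f__Lactobacillaceae; g__Lactobacillus; s__"
    )
    print(parse_lineage(example))
```

Dropping empty ranks rather than storing empty strings makes it easy to ask for the deepest rank that was actually assigned.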


The taxonomy of Greengenes database is largely compatible with the NCBI taxonomy, but also includes additional taxa that are not recognized by NCBI. For example, Greengenes database recognizes candidate phyla that were discovered through culture-independent sequence surveys, such as TM7, OP11 and SR1. Greengenes database also assigns names to unnamed taxa based on their phylogenetic position or environmental origin.


Another feature of Greengenes database is that it provides a standard alignment for 16S rRNA gene sequences. The alignment is produced by the NAST algorithm, which aligns each sequence to a core set of reference sequences that represent the diversity of bacteria and archaea. Greengenes admits near-full-length sequences (at least 1,250 nucleotides), and NAST places each of them into a fixed-width alignment of 7,682 columns, so that all sequences can be compared position by position.


Greengenes database can be used for taxonomic classification and phylogenetic inference of microbiome samples




Greengenes database can be used for various applications in microbiome research, such as taxonomic classification and phylogenetic inference of microbiome samples. Taxonomic classification is the process of assigning 16S rRNA gene sequences to their corresponding taxa based on their similarity or distance to reference sequences. Phylogenetic inference is the process of reconstructing the evolutionary relationships among 16S rRNA gene sequences based on their alignment and tree models.
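
Classification by similarity to reference sequences can be sketched with a toy nearest-reference classifier. This is an illustration only: it compares k-mer sets with the Jaccard index, which is an assumption chosen for brevity and not the method of any particular tool, and the sequences, lineages and function names below are made up.

```python
from typing import Dict, Set, Tuple


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(
    query: str,
    references: Dict[str, str],
    taxonomy: Dict[str, str],
    k: int = 4,
) -> Tuple[str, str]:
    """Return (reference ID, lineage) of the reference most similar to the query."""
    query_kmers = kmers(query, k)
    best_id = max(
        references,
        key=lambda ref_id: jaccard(query_kmers, kmers(references[ref_id], k)),
    )
    return best_id, taxonomy[best_id]


if __name__ == "__main__":
    # Made-up reference sequences and lineages, for illustration only.
    references = {
        "ref1": "ACGTACGTGGCCTTAAGGCC",
        "ref2": "TTTTGGGGCCCCAAAATTTT",
    }
    taxonomy = {
        "ref1": "k__Bacteria; p__Firmicutes",
        "ref2": "k__Bacteria; p__Proteobacteria",
    }
    print(classify("ACGTACGTGGCCTTAAGGCA", references, taxonomy))
```

A practical classifier would also report a confidence value and decline to assign ranks that the evidence does not support, which this sketch omits.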


Greengenes database can be used for taxonomic classification and phylogenetic inference of microbiome samples with various bioinformatics tools and pipelines, such as QIIME, mothur, RDP Classifier, phyloseq and MG-RAST. These tools and pipelines can perform different steps of microbiome analysis, such as quality filtering, OTU picking, diversity estimation, statistical testing and visualization.


For example, QIIME (Quantitative Insights Into Microbial Ecology) is a popular pipeline for microbiome analysis that can use Greengenes database as a reference. QIIME 1 can perform taxonomic classification of 16S rRNA gene sequences using different methods, such as BLAST, UCLUST or RDP Classifier. It can also perform phylogenetic inference of 16S rRNA gene sequences using different methods, such as FastTree or RAxML.


How to download Greengenes database and use it for microbiome analysis?




Greengenes database can be downloaded from the Second Genome website




Greengenes database can be downloaded from the Second Genome website. The Second Genome website has been the official repository of Greengenes database since 2016, when it was transferred from the LBL website. It provides access to the latest version of Greengenes database (13_8), which was released in August 2013.


The Second Genome website offers different options for downloading Greengenes database, depending on the user's needs and preferences. The user can download the entire Greengenes database or only specific files or formats. The user can also download different versions or subsets of Greengenes database, such as 13_5 or 13_1.


The Second Genome website provides a detailed description of each file and format of Greengenes database, as well as instructions on how to download them. The user can also find useful information about Greengenes database on the Second Genome website, such as publications, tutorials, FAQs and contact details.


Greengenes database can be used with various bioinformatics tools and pipelines




Greengenes database can be used with various bioinformatics tools and pipelines for microbiome analysis, as mentioned in the previous section. However, before using Greengenes database with these tools and pipelines, the user may need to perform some preprocessing steps, such as decompressing, formatting, indexing or converting the files.


For example, if the user wants to use Greengenes database with QIIME, the user may need to do the following steps:



  • Download the Greengenes database files from the Second Genome website, such as the 16S rRNA gene sequences (gg_13_8_99.fasta.gz), the taxonomy (gg_13_8_99.taxonomy.gz) and the tree (gg_13_8_99.tre.gz).



  • Decompress the files using gzip (or gunzip), or another tool that can read .gz archives, such as 7-Zip.



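  • Check that the files are in the form QIIME expects: a FASTA file in which each header line begins with ">" followed by the sequence ID, and a tab-separated taxonomy file that maps each of those IDs to a semicolon-delimited lineage. The Greengenes files are generally distributed in this form already, so this step is usually a check rather than a conversion.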



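  • Pass the files to the relevant QIIME scripts. A separate indexing step is not normally needed: in QIIME 1, for example, assign_taxonomy.py and pick_closed_reference_otus.py take the reference FASTA and taxonomy files as arguments, and make_phylogeny.py builds a tree from aligned sequences if the supplied Greengenes tree is not used.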



  • Convert the files to other formats if needed, such as BIOM for OTU tables or Newick for trees.
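
The decompression and format checks in the steps above can be sketched in Python before any pipeline is run. The sketch assumes a gzip-compressed FASTA file and a two-column, tab-separated taxonomy file (sequence ID, then lineage); these layouts and the function names are assumptions for illustration. To stay self-contained it builds tiny in-memory stand-ins rather than reading the real download.

```python
import gzip
import io
from typing import Dict, Iterable, List, Set


def read_fasta_ids(handle: Iterable[str]) -> Set[str]:
    """Collect sequence IDs: the first word after '>' on each header line."""
    ids: Set[str] = set()
    for line in handle:
        if line.startswith(">"):
            ids.add(line[1:].split()[0])
    return ids


def read_taxonomy(handle: Iterable[str]) -> Dict[str, str]:
    """Read 'ID<TAB>lineage' lines into a dictionary."""
    taxonomy: Dict[str, str] = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        seq_id, lineage = line.split("\t", 1)
        taxonomy[seq_id] = lineage
    return taxonomy


def missing_taxonomy(fasta_ids: Set[str], taxonomy: Dict[str, str]) -> List[str]:
    """Return the sorted sequence IDs that have no taxonomy entry."""
    return sorted(fasta_ids - set(taxonomy))


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the downloaded .gz files (made-up content).
    fasta_gz = gzip.compress(b">1111\nACGT\n>2222\nGGCC\n")
    tax_gz = gzip.compress(b"1111\tk__Bacteria; p__Firmicutes\n")

    # gzip.open accepts a file name or a file object; "rt" decodes to text.
    with gzip.open(io.BytesIO(fasta_gz), "rt") as fasta_handle:
        ids = read_fasta_ids(fasta_handle)
    with gzip.open(io.BytesIO(tax_gz), "rt") as tax_handle:
        taxonomy = read_taxonomy(tax_handle)

    print("sequences without taxonomy:", missing_taxonomy(ids, taxonomy))
```

With real files, the io.BytesIO objects would be replaced by the paths of the downloaded archives.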



The user can find more details and examples on how to use Greengenes database with QIIME on the QIIME website. The user can also find similar information on how to use Greengenes database with other tools and pipelines on their respective websites or manuals.


Greengenes database can be mapped to other taxonomies using tax2tree software




Greengenes database can be mapped to other taxonomies using the tax2tree software. Tax2tree transfers taxonomic names onto a phylogenetic tree: given a tree of 16S rRNA gene sequences and an input taxonomy for some of its tips, it labels the internal nodes with the names that fit them best, producing a taxonomy that is consistent with the tree.


Tax2tree can be useful for microbiome researchers who want to compare or integrate different taxonomies, such as Greengenes, SILVA, RDP or NCBI. Tax2tree can also be useful for microbiome researchers who want to update or refine their taxonomies based on new data or knowledge.


Tax2tree works by scoring, for each taxon name, how well each node of the tree matches the set of tips that carry that name, and then placing the name on the best-scoring node. This resolves conflicts and inconsistencies between the input taxonomy and the tree. Its main outputs are a tree decorated with taxon names and a consensus lineage for each sequence.


The user can find more details and examples on how to use tax2tree software on the tax2tree website. The user can also find a tutorial on how to use tax2tree with Greengenes database on the QIIME website.


What are the advantages and limitations of Greengenes database?




Greengenes database has a high coverage and quality of 16S rRNA gene sequences




One of the advantages of Greengenes database is that it has a high coverage and quality of 16S rRNA gene sequences from bacteria and archaea. Greengenes database contains over one million 16S rRNA gene sequences that represent a wide range of microbial diversity and environments. Greengenes database also applies rigorous quality control steps to remove chimeras, errors and redundancies from the 16S rRNA gene sequences.


The high coverage and quality of Greengenes database can enable microbiome researchers to perform accurate and comprehensive analysis of their microbiome samples. For example, microbiome researchers can use Greengenes database to identify rare or novel taxa that may not be present in other databases. Microbiome researchers can also use Greengenes database to compare their microbiome samples with other samples from different habitats or hosts.


Greengenes database has a robust and comprehensive taxonomy based on phylogenetic methods




Another advantage of Greengenes database is that it has a robust and comprehensive taxonomy based on phylogenetic methods. Greengenes database builds a phylogenetic tree from the 16S rRNA gene sequences and assigns sequences to taxa according to their position in that tree, that is, according to their evolutionary relationships. Greengenes database also uses expert curation and manual revision to ensure the consistency and accuracy of the taxonomy.


The robust and comprehensive taxonomy of Greengenes database can enable microbiome researchers to perform reliable and meaningful analysis of their microbiome samples. For example, microbiome researchers can use Greengenes database to infer the phylogenetic diversity and structure of their microbiome samples. Microbiome researchers can also use Greengenes database to explore the evolutionary history and ecological roles of their microbiome samples.


Greengenes database has not been updated since 2013 and may not reflect the latest taxonomic revisions




One of the limitations of Greengenes database is that it has not been updated since 2013 and may not reflect the latest taxonomic revisions. Greengenes database was last updated in August 2013, when version 13_8 was released, and no further releases have been made in that series. A successor, Greengenes2, was released in 2022 as a separate resource, but the 13_8 files described in this article remain frozen.


The lack of updates of Greengenes database may affect the accuracy and completeness of the 16S rRNA gene sequences and their taxonomic annotations. For example, Greengenes database may not include new 16S rRNA gene sequences that have been discovered or deposited in other databases. Greengenes database may also not reflect the latest taxonomic changes that have been proposed or accepted by the scientific community.


The lack of updates of Greengenes database may limit the applicability and relevance of Greengenes database for microbiome research. For example, microbiome researchers may not be able to identify or classify some taxa that are present in their microbiome samples using Greengenes database. Microbiome researchers may also not be able to compare or integrate their microbiome samples with other samples that use different or updated taxonomies.


What are some alternatives to Greengenes database for microbiome analysis?




SILVA, RDP, NCBI and OTT are other popular 16S rRNA gene databases




Greengenes database is not the only reference for 16S rRNA gene sequences from bacteria and archaea. Other popular resources can be used for microbiome analysis, such as the sequence databases SILVA, RDP and NCBI, and the unified taxonomy OTT.


SILVA is a comprehensive and quality-controlled database of ribosomal RNA (rRNA) gene sequences from all three domains of life (Bacteria, Archaea and Eukarya), covering both small-subunit and large-subunit rRNA genes. Its releases contain several million rRNA gene sequences, most of them 16S rRNA gene sequences from bacteria and archaea. SILVA provides a consistent and hierarchical taxonomy for rRNA gene sequences based on phylogenetic methods and manual curation.


RDP (the Ribosomal Database Project) is a curated and annotated database of 16S rRNA gene sequences from bacteria and archaea. RDP contains over three million 16S rRNA gene sequences that have been obtained from various sources, such as GenBank, Sanger sequencing projects and environmental surveys. RDP provides a hierarchical taxonomy for 16S rRNA gene sequences, and its naive Bayesian RDP Classifier assigns new sequences to that taxonomy.


NCBI (the National Center for Biotechnology Information) hosts GenBank, a comprehensive archive of nucleotide sequences from all domains of life. GenBank holds hundreds of millions of nucleotide records, among them a very large number of 16S rRNA gene sequences from bacteria and archaea, although most of these are submitted by users rather than curated. NCBI also maintains a hierarchical taxonomy that is built from sequence data and the taxonomic literature.


OTT (the Open Tree of Life Taxonomy) is not a sequence database but a unified taxonomy that integrates multiple sources of information, such as taxonomies, phylogenies and publications. OTT contains several million taxa from all domains of life, including bacteria and archaea. OTT provides a consistent and comprehensive taxonomy based on synthesis methods and expert curation.


SILVA, RDP and NCBI have more frequent updates and larger sizes than Greengenes




One of the advantages of SILVA, RDP and NCBI over Greengenes is that they have more frequent updates and larger sizes than Greengenes. SILVA, RDP and NCBI are regularly updated with new data and knowledge from various sources, such as GenBank, Sanger sequencing projects, environmental surveys and scientific publications. SILVA, RDP and NCBI also have larger sizes than Greengenes, as they contain more 16S rRNA gene sequences from bacteria and archaea.


The more frequent updates and larger sizes of SILVA, RDP and NCBI can enable microbiome researchers to perform more accurate and comprehensive analysis of their microbiome samples. For example, microbiome researchers can use SILVA, RDP and NCBI to identify and classify more taxa that are present in their microbiome samples using the latest 16S rRNA gene sequences and taxonomic revisions. Microbiome researchers can also use SILVA, RDP and NCBI to compare and integrate their microbiome samples with other samples that use the same or similar reference databases.


OTT is a unified taxonomy that integrates multiple sources of information




One of the advantages of OTT over Greengenes is that it is a unified taxonomy that integrates multiple sources of information, such as taxonomies, phylogenies and publications. OTT synthesizes information from several source taxonomies, such as NCBI, SILVA and others, to generate a consensus taxonomy that reflects the best available knowledge and evidence. OTT also incorporates information from phylogenetic studies and scientific publications to resolve conflicts and uncertainties among different taxonomies.


The unified taxonomy of OTT can enable microbiome researchers to perform consistent and comprehensive analysis of their microbiome samples. For example, microbiome researchers can use OTT to assign their 16S rRNA gene sequences to a single and coherent taxonomy that covers all domains of life. Microbiome researchers can also use OTT to explore the evolutionary relationships and ecological roles of their microbiome samples based on multiple sources of information.


Conclusion




Greengenes database is a curated collection of 16S rRNA gene sequences from bacteria and archaea that can be used for microbiome analysis. Greengenes database provides a consistent taxonomy and alignment for bacterial and archaeal taxa based on phylogenetic methods and expert curation. Greengenes database can be used for taxonomic classification and phylogenetic inference of microbiome samples with various bioinformatics tools and pipelines.


However, Greengenes database has not been updated since 2013 and may not reflect the latest taxonomic revisions. Greengenes database may also have some limitations in terms of coverage and quality of 16S rRNA gene sequences. Therefore, microbiome researchers may want to consider some alternatives to Greengenes database for microbiome analysis, such as SILVA, RDP, NCBI and OTT.


SILVA, RDP and NCBI are other popular 16S rRNA gene databases that have more frequent updates and larger sizes than Greengenes. SILVA, RDP and NCBI provide different taxonomies and alignments for bacterial and archaeal taxa based on different methods and sources. OTT is a unified taxonomy that integrates multiple sources of information, such as taxonomies, phylogenies and publications. OTT provides a consistent and comprehensive taxonomy for all domains of life based on synthesis methods and expert curation.


In conclusion, Greengenes database is a useful reference for microbiome analysis, but it may not be the best or the only option. Microbiome researchers should evaluate the advantages and limitations of Greengenes database and its alternatives before choosing the most suitable reference for their microbiome analysis.


FAQs




What is the difference between 16S rRNA gene sequences and 16S rRNA sequences?




16S rRNA gene sequences are the DNA sequences that encode the 16S rRNA molecules. 16S rRNA sequences are the RNA sequences that are transcribed from the 16S rRNA genes. The rRNA molecules are far more abundant in a cell than the gene copies, but DNA is chemically more stable and can be amplified and sequenced directly, whereas RNA must first be reverse-transcribed. This makes the gene sequences more convenient for routine sequencing and analysis.


What is the difference between OTUs and taxa?




OTUs are operational taxonomic units that are defined by clustering 16S rRNA gene sequences at a certain similarity threshold, such as 97%. Taxa are taxonomic units that are defined by assigning 16S rRNA gene sequences to a hierarchical taxonomy based on their evolutionary relationships. OTUs are more objective and reproducible than taxa, but taxa are more informative and meaningful than OTUs.
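
The relationship between the two can be made concrete by collapsing OTU counts into genus-level counts: several OTUs may share one genus, and OTUs without a genus assignment fall into a catch-all group. The sketch below assumes Greengenes-style lineage strings with a "g__" field. The OTU IDs, counts and function names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict


def genus_of(lineage: str) -> str:
    """Extract the genus name from a lineage string, or 'unclassified'."""
    for field in lineage.split(";"):
        field = field.strip()
        if field.startswith("g__") and len(field) > 3:
            return field[3:]
    return "unclassified"


def collapse_to_genus(
    otu_counts: Dict[str, int],
    otu_lineage: Dict[str, str],
) -> Dict[str, int]:
    """Sum the counts of all OTUs that share the same genus."""
    totals: Dict[str, int] = defaultdict(int)
    for otu_id, count in otu_counts.items():
        totals[genus_of(otu_lineage.get(otu_id, ""))] += count
    return dict(totals)


if __name__ == "__main__":
    # Made-up OTU table for one sample.
    otu_counts = {"otu1": 10, "otu2": 5, "otu3": 2}
    otu_lineage = {
        "otu1": "k__Bacteria; p__Bacteroidetes; g__Bacteroides; s__",
        "otu2": "k__Bacteria; p__Bacteroidetes; g__Bacteroides; s__fragilis",
        "otu3": "k__Bacteria; p__Firmicutes; g__; s__",
    }
    print(collapse_to_genus(otu_counts, otu_lineage))
```

Collapsing loses resolution (otu1 and otu2 become indistinguishable) but gives labels that can be compared across studies that used different OTU definitions.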


What is the difference between NAST and MAFFT algorithms?




NAST (Nearest Alignment Space Termination) is an algorithm that aligns 16S rRNA gene sequences to a core set of reference sequences that represent the diversity of bacteria and archaea. NAST is fast and accurate, but it may not align novel or divergent sequences well. MAFFT (Multiple Alignment using Fast Fourier Transform) is an algorithm that aligns 16S rRNA gene sequences to each other using a progressive method. MAFFT is more flexible and sensitive, but it may introduce more errors or gaps in the alignment.


What is the difference between FastTree and RAxML algorithms?




FastTree and RAxML are programs that infer phylogenetic trees from aligned 16S rRNA gene sequences. Both are heuristic, and neither is guaranteed to find the optimal tree. FastTree builds an approximately maximum likelihood tree using fast heuristics and local rearrangements, which lets it handle very large alignments. RAxML performs a more thorough maximum likelihood search, which is slower but generally finds trees with a higher likelihood. FastTree is more efficient and scalable, while RAxML is usually more accurate.


What is the difference between BIOM and Newick formats?




BIOM (Biological Observation Matrix) is a format that stores OTU tables in a compact and standardized way. BIOM can include metadata, such as sample names, OTU IDs, taxonomic annotations and environmental variables. Newick is a format that stores phylogenetic trees as nested parentheses in a single line of text. Newick can include branch lengths, node labels and bootstrap values.
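
The two formats can be compared with a small sketch. It assumes the JSON flavour of BIOM (version 1.0), in which a sparse table lists its non-zero cells as [row, column, value] triples; newer BIOM versions use HDF5 and need a dedicated library. The example table is cut down to the fields the code reads, the Newick reader ignores quoted labels, and the function names are illustrative.

```python
import json
import re
from typing import List


def biom_to_dense(table: dict) -> List[List[float]]:
    """Expand a BIOM 1.0 JSON table into a dense rows-by-columns matrix."""
    n_rows, n_cols = table["shape"]
    if table["matrix_type"] == "sparse":
        dense = [[0] * n_cols for _ in range(n_rows)]
        for row, col, value in table["data"]:
            dense[row][col] = value
        return dense
    # A dense table already stores one list per row.
    return [list(row) for row in table["data"]]


def newick_leaves(newick: str) -> List[str]:
    """Return leaf labels: the names that directly follow '(' or ','.

    Internal node labels follow ')' and are therefore skipped.
    Quoted labels are not supported in this sketch.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


if __name__ == "__main__":
    biom_json = """
    {
      "rows": [{"id": "otu1", "metadata": null}, {"id": "otu2", "metadata": null}],
      "columns": [{"id": "sampleA", "metadata": null}, {"id": "sampleB", "metadata": null}],
      "matrix_type": "sparse",
      "shape": [2, 2],
      "data": [[0, 0, 5], [1, 1, 3]]
    }
    """
    print(biom_to_dense(json.loads(biom_json)))
    print(newick_leaves("((otu1:0.1,otu2:0.2)0.95:0.05,otu3:0.3);"))
```

The OTU table says how many times each OTU was seen in each sample, while the tree says how the OTUs are related. Phylogenetic diversity measures need both.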

