Help and FAQ docs for MolEvolvR

This website is free and open to all users, no login required.

This help page shows how to use MolEvolvR to its fullest potential.

Coming Soon: Videos demonstrating what you can do with MolEvolvR, and how to set up custom analyses and navigate the app after loading your results.

The UI: workflow and usage

Proteins are the functional units of cellular processes. The goal of MolEvolvR is to characterize proteins by their sequence, structure, function, and phylogeny by using sequence similarity, domain architecture, lineages/phyletic spread, and more.

Published use cases/testing

You can explore a sample set of phage shock proteins (PSPs) (e.g., lia operon from Bacillus subtilis here), and the full set of PSPs here. We analyzed the homology, domain architectures, and phylogeny of these proteins (and their genomic contexts) to show their prevalence in other organisms and to detail how variations of this phage shock stress response system are present across many lineages.

We, and others, have applied the approach underlying MolEvolvR to study diverse systems, including:

Nutrient acquisition systems in Staphylococcus aureus [tcyABCP, gis-gt]
A novel phage defense system in Vibrio cholerae [Vch1]
Surface layer proteins in Bacillus anthracis [SLPs]
Helicase operators in bacteria [DciA]
Internalins in Listeria [InlP]
Eukaryotic stress response protein involved in ROS signaling [RACK1]
Cyanobacterial light adaptation proteins [ProchLight]
Antimicrobial resistance (AMR) genes from the CARD database and uncharacterized proteins associated with AMR using machine learning approaches [AMR]

How to use MolEvolvR

You can provide a variety of protein inputs, including:

Protein sequence(s) in FASTA format
Protein accession number(s) in NCBI and/or UniProt format
Protein multiple sequence alignment (MSA) in FASTA/Pearson format
Protein BLAST output in .csv format
InterProScan output in .tsv format

With any of these inputs, proteins of interest will be analyzed to identify homologs, determine domain architectures, and delineate phyletic spreads. These analyses provide insights into the biological role(s) of the protein(s) of interest within organisms, as well as trace their evolution.

MolEvolvR can perform four types of analyses:

Domain architecture, which allows identification of protein domains, exploration of domain interactions, and domain co-occurrences
Identification of homologs, which reveals patterns within and across species
Phylogenetic analysis, which shows the phyletic spread of proteins across the tree of life, a multiple sequence alignment, and a phylogenetic tree
Visualization and analysis of results from BLAST suite, InterProScan, and multiple sequence alignments

Enter data

Accession numbers and FASTA (full analysis)

To begin, enter the amino acid FASTA sequence(s) or accession numbers of your protein(s) of interest into the Start Analysis tab. You can also upload a file containing multiple FASTA sequences (.fa, .faa, .fasta) or accession numbers (.csv). Up to 100 protein sequences per job are accepted. For analyses with more than 100 proteins, please contact us.

MolEvolvR is designed for single proteins or small multi-protein queries (e.g., operons of 2–10 proteins); runtimes scale with query count, database size, and analysis type, ranging from ~15–30 minutes for a single-protein full analysis to several hours for larger multi-protein submissions. Phylogenetic analyses typically run much faster (e.g., 2–3 min for ~25 homologs). Users with large query sets are encouraged to consider subsetting their inputs to ensure optimal performance and visualization, or to contact us. Visualizations are optimized for datasets up to ~1,000 homologs; larger datasets (e.g., ~20,000 homologs as in the PSP case study) are supported but may require additional filtering for optimal interactivity.

Multiple accessions/FASTA of homologs

If you have a pre-existing set of homologous proteins, you can enter/upload the multiprotein FASTA or list of accession numbers. MolEvolvR can also use an MSA in FASTA/Pearson format generated through external programs such as Clustal Omega, ClustalW, Kalign, or MUSCLE.

Advanced options

Advanced options allow you to customize your analysis.

Phylogenetic analysis

Selecting Phylogenetic Analysis will analyze a set of known homologous proteins. Because this type of analysis already uses homologs, the homology search option will be disabled.

Homology search

Selecting Homology Search will identify homologs (related proteins) for each input protein. This pairs well with the Domain Architecture option, which can characterize the domain architectures of all homologous hits for each query.

Domain architecture

Selecting Domain Architecture will characterize the domain architecture of each protein using InterProScan, combining: profile matching against domain and orthology databases (InterPro/Pfam, CDD, Gene3D, COG); prediction of signal peptides (SignalP), transmembrane regions and cellular localization (Phobius, TMHMM), and disorder (MobiDBlite); and secondary structure annotations (Hamap, Coils). If selected alone, no other analysis will be performed.

Run analysis by domain

This option enables a domain-sensitive, divide-and-conquer homology search. MolEvolvR first runs InterProScan on the full-length query to delineate its constituent domains, then uses each identified domain^# — in addition to the full-length sequence — for independent BLAST+/DELTA-BLAST searches across the tree of life. By searching with individual domains rather than the full-length sequence alone, this strategy captures remote homologs that share only a subset of domains with the query, which full-length searches routinely miss. Phylogenetic analysis, domain architecture, and characterization are then performed on all hits.

^#Domains (e.g., Pfam) identified within each query protein, including their START and STOP coordinates, that are used to initiate new homology searches will be listed in the Query tab.

BLAST parameters

For analyses that include a homology search, you can adjust parameters like database (default refseq), maximum hits (default 100), and E-value (default 0.00001).

Organism(s) to include/exclude

You can filter your homology search results to specific organisms via the Organism(s) to Include/Exclude dropdowns. Enter either organism names or taxon IDs and the list will be dynamically filtered to show only matching organisms, with your query highlighted in each list item. The full taxonomic classification for each item is shown and is searchable. Multiple selections are possible.

The first dropdown filters organisms to include; if empty, all organisms/taxa will be eligible for inclusion. The second dropdown filters organisms to exclude; if empty, no organisms/taxa will be excluded. If both dropdowns are used, the “include” filter will be applied first, followed by the “exclude” filter.

The filters are applied to your currently selected BLAST database (e.g., refseq, nr).

In the backend, these options are passed to BLAST as -taxids <comma-delimited list of IDs> for the inclusion filter and -negative_taxids <comma-delimited list of taxon IDs> for exclusions. See Limiting a Search by taxonomy for NCBI’s documentation on this feature.
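
The include-then-exclude order described above can be sketched in Python as follows. This is an illustration only, not MolEvolvR's backend code: the function names, the hit records, and the assumption that each hit carries the taxon IDs of its lineage are all hypothetical.

```python
def format_taxid_list(taxids):
    """Join taxon IDs into the comma-delimited form used by -taxids / -negative_taxids."""
    return ",".join(str(t) for t in taxids)


def apply_taxon_filters(hits, include=(), exclude=()):
    """Apply the include filter first, then the exclude filter.

    hits: list of (accession, lineage_taxids) pairs (hypothetical record shape).
    An empty include list keeps everything; an empty exclude list removes nothing.
    """
    include, exclude = set(include), set(exclude)
    kept = [h for h in hits if not include or include & set(h[1])]
    return [h for h in kept if not (exclude & set(h[1]))]


# Made-up accessions; lineages list only genus and species taxon IDs for brevity.
hits = [
    ("hit_A", [1279, 1280]),  # Staphylococcus > S. aureus
    ("hit_B", [1279, 1282]),  # Staphylococcus > S. epidermidis
    ("hit_C", [1386, 1423]),  # Bacillus > B. subtilis
]
selected = apply_taxon_filters(hits, include=[1279], exclude=[1282])
print([acc for acc, _ in selected])
print(format_taxid_list([1279]), format_taxid_list([1282]))
```

Here the inclusion filter keeps both Staphylococcus hits, and the exclusion filter then removes the S. epidermidis hit.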

Customizing your analysis

Find homologs

A homology search requires you to enter/upload protein FASTA sequence(s) or accession number(s).

If given an accession number, MolEvolvR will find its corresponding FASTA sequence to run through DELTA-BLAST, a variation of BLASTP. DELTA-BLAST searches the pre-constructed position-specific scoring matrices (PSSMs) of the conserved domain database (CDD) to build a PSSM for the query, which it then uses to search the sequence database. Once the BLAST homology search completes, MolEvolvR clusters the resulting homolog sequences with BLASTClust and adds metadata on lineage and domain architecture (when the Domain Architecture option is selected).

Fully characterize proteins of interest

You can start your analysis with a full list of accession numbers or FASTA files for protein(s) of interest. MolEvolvR gathers homologs of your input protein(s), and then performs domain architecture analysis on all homolog and query sequences. You have the option to perform only Phylogenetic Analysis or Domain Architecture if you don’t need both.

Analyze external data

You can start your analysis from uploaded NCBI BLAST or InterProScan results. Web-BLAST results allow you to determine homolog similarity, the domain architecture and/or phylogeny. InterProScan results summarize and visualize domains and (if accession numbers are provided) phylogeny.

BLAST outputs from the NCBI BLAST web-interface

BLAST is available through NCBI’s website.

You can start your analysis with data from a previous BLAST run. These data are run through BLASTClust to cluster similar sequences among the retrieved homologs. The Phylogenetic Analysis and Domain Architecture options are then applied.

To ensure compatibility with the MolEvolvR Start Analysis tab, follow these guidelines.

Step 1: Enter accession numbers/FASTA sequences and choose parameters

First, enter your accession number(s) or FASTA sequence(s) into the “Enter Query Sequence” box.

For the database parameter, we support either the non-redundant database (nr) or the reference sequence collection (refseq_protein). The refseq_protein dataset is a high-quality, non-redundant subset of protein records curated by NCBI staff. Meanwhile, nr is a larger, non-redundant set that includes many more sequences but is not necessarily vetted for quality and accuracy. If you would like to further filter your results based on lineages (e.g., species, genus, family, kingdom), enter the name/taxID in the Organism field and toggle the box to include/exclude those results in your search.

Next, select which algorithm to run. If you don’t know details of your protein, ‘BLASTP’ is a great place to start. If you are interested in identifying remote homologs, we suggest using ‘PSI-BLAST’. If your protein has domains of interest, ‘DELTA-BLAST’ works very well.

Creating a job title for the run is optional and for your personal convenience.

Under the expandable “Algorithm parameters” section, the defaults for max target sequences are typically sufficient. The expect threshold value, or E-value, represents the number of matches expected by pure random chance; hits with E-values greater than the threshold are filtered out. We suggest 1e-5 (1×10^-5 or 0.00001) for general searches. Double check your parameters across the page, then click the BLAST button.

Summary of NCBI BLAST submission parameters

Accession Number(s) or FASTA sequence(s)
Database

RefSeq. This database contains only NCBI-curated, high quality, non-redundant protein sequences.
NR. This database contains a much larger pool of uncurated, variable quality, non-redundant protein sequences.

Step 2: Downloading BLAST results

Once your BLAST search is complete, at the end of the RID row towards the top of the page, there will be a Download All option with a dropdown menu to download results. Click on the Download All button and select the Hit Table (csv) option. You can directly upload these .csv result files to MolEvolvR. If the first column of the results .csv does not include accession numbers, you will also need to provide the query sequence(s) that you used to run BLAST as a second file (.fa, .faa, or .fasta format).

If you are performing a PSI-BLAST, you will have the option to run additional iterative searches upon each search’s completion. Further iterations will find more remote homologs, so it is recommended you run several iterations before downloading the Hit Table.

BLAST provides information in many formats for your protein homologs, which we encourage you to review. However, MolEvolvR requires the Hit Table (csv) for analysis.

Alternatively, you may upload command line BLAST results with these columns specified:

query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score, % positives
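
If you want to check a hit table before uploading it, a minimal Python sketch along these lines may help. It assumes a comma-separated file without a header row whose 13 columns appear in the order listed above; the snake_case column names, the function name, and the example rows are made up for illustration and are not part of MolEvolvR.

```python
import csv
import io

# Column names are our own labels for the 13 fields listed above, in the same order.
COLUMNS = [
    "query_acc_ver", "subject_acc_ver", "pct_identity", "alignment_length",
    "mismatches", "gap_opens", "q_start", "q_end", "s_start", "s_end",
    "evalue", "bit_score", "pct_positives",
]


def read_hit_table(text, max_evalue=1e-5):
    """Parse a headerless BLAST hit table and keep hits with E-value <= max_evalue."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        if not rec or rec[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(rec) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(rec)}")
        row = dict(zip(COLUMNS, rec))
        row["evalue"] = float(row["evalue"])
        if row["evalue"] <= max_evalue:
            rows.append(row)
    return rows


# Made-up example rows (not real BLAST output).
EXAMPLE_CSV = (
    "QUERY_1,SUBJ_1.1,98.5,200,3,0,1,200,1,200,3e-120,410,99.0\n"
    "QUERY_1,SUBJ_2.1,31.2,150,95,4,10,159,22,170,0.002,45.1,48.7\n"
)
hits = read_hit_table(EXAMPLE_CSV)
print([h["subject_acc_ver"] for h in hits])
```

With the suggested threshold of 1e-5, only the first example row is kept.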

Check out the BLAST tutorials to learn more about BLAST.

InterProScan outputs from the IPRscan5 web-interface

InterProScan is available through EBI’s website.

If you have already identified your protein’s domains through InterProScan, you can upload the output to MolEvolvR for a customizable visual summary of the information.

To ensure compatibility with the MolEvolvR Start Analysis tab, follow these guidelines.

Step 1: Enter FASTA sequence

Input your protein’s FASTA sequence by copy/pasting into the box or uploading the FASTA file with the Choose file button. If the sequence is valid, InterProScan will display a green check mark in the bottom right corner of the input box. You can use the Advanced options dropdown to select specific databases. When finished, click the Search button to begin.

When the search completes (another green check mark will appear under Status), click on your job submission to view the output.

Step 2: Download InterProScan results & upload into MolEvolvR

Under the blue Export dropdown menu, download the results in the .tsv format. Upload the .tsv file to MolEvolvR for visualization and further analysis. If the first column of the results .tsv does not include accession numbers, you will also need to provide the query protein sequence that you used to run InterProScan as a second file (amino acid sequence in .fa, .faa, or .fasta format).
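
To inspect a downloaded .tsv file yourself, the following Python sketch groups domains by protein and orders them by start position. It assumes the standard InterProScan 5 TSV column order (protein accession, MD5, length, analysis, signature accession, signature description, start, stop, and so on); the function name and the example rows are hypothetical placeholders, not real InterProScan output.

```python
import csv
import io
from collections import defaultdict


def domains_by_protein(tsv_text, analysis="Pfam"):
    """Return {protein accession: [(start, stop, name), ...]} for one analysis type."""
    found = defaultdict(list)
    for rec in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(rec) < 8 or rec[3] != analysis:
            continue  # skip short rows and rows from other analyses
        name = rec[5] or rec[4]  # fall back to the signature accession
        found[rec[0]].append((int(rec[6]), int(rec[7]), name))
    return {acc: sorted(doms) for acc, doms in found.items()}


# Placeholder rows for illustration only.
ROWS = [
    ["PROT_1", "md5", "200", "Pfam", "PF00000", "DomainB", "80", "150", "1e-20", "T", "01-01-2024", "-", "-"],
    ["PROT_1", "md5", "200", "Pfam", "PF00001", "DomainA", "5", "60", "1e-15", "T", "01-01-2024", "-", "-"],
    ["PROT_1", "md5", "200", "Coils", "Coil", "Coil", "160", "190", "-", "T", "01-01-2024", "-", "-"],
]
EXAMPLE_TSV = "\n".join("\t".join(r) for r in ROWS)
result = domains_by_protein(EXAMPLE_TSV)
print(result)
```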

Check out the InterProScan tutorials to learn more about their algorithms and search parameters.

Load analysis results

After submitting proteins to MolEvolvR, take a break! Job runtimes depend on server load and on the complexity of your submission (Full Analysis taking the most time), but you can expect typical runs to take 10 minutes or more. Phylogenetic analyses run much faster.

You will receive a six-character alphanumeric analysis code after submission. We recommend saving this code before you close the app. You will need to enter it later on the Retrieve Results tab to view your results.

Before submitting, you can provide an email to receive a link to your analysis. The app will also save any analyses you’ve submitted on your current device (laptop, phone, etc.), and will list them under the Retrieve Results tab.

Results summary

The Result Summary tab provides a high-level overview/snapshot of your analysis results. The top of the page includes tiles that surface key results of the analysis, including input and job metrics, homologs found, and lineages/domain architectures that were identified.

From this page, you can also generate a downloadable HTML report of your entire analysis by clicking the Generate Report button (see Report Generation section below for details). This report can be used offline, shared with collaborators, or archived for reproducibility, and includes interactive graphics and tables similar to those displayed on the MolEvolvR site.

You can explore your analysis fully with the detailed results and visualizations under the other tabs, as follows.

Query data

Data table

The Data Table tab shows the processed input data in tabular form. The default view includes query names, species, lineage, and their constituent Pfam domain architectures. The table can be customized using the Add/remove column(s) button to access the full list of available columns. Columns can be filtered by particular species, lineages, percentage similarities, etc., and the entire table can be searched with plain text or regex queries (see Regex section for advanced search patterns). Filters persist across tabs, so the same filter settings apply when you explore different visualizations. If you see a notification that “visualizations and data shown below are being filtered,” this reflects your active filters; it is expected behavior, not an error. Dismiss the notification or clear your filters to reset. The full data table (or any filtered subset) can be downloaded in .csv format with the Download as csv button.

FASTA

All FASTA sequences for the query protein(s) are provided for ease of access.

Query heatmap

A heatmap shows the occurrence of query protein(s) by taxonomic presence, which may be useful for multi-FASTA input of homologs.

Query domain architecture

A customizable domain architecture visualization shows the query protein(s) grouped by analysis or query. You can modify the domain plot by selecting the Analysis box and adding or removing results to display (e.g., Pfam, Phobius, Coils analyses).

Homolog data

The Homolog Data page contains detailed information on all homologs identified for each query protein.

Homolog data table

The Homolog Data table lists the best hits from all superkingdoms of life (queried across all refseq or nr genomes). As in Query Data > Data Table described above, tabular details are provided across all homologs, including genome, species, lineage, and domain architecture information. Many homology-specific columns (BLAST-derived values such as percent identity and cluster ID) are available under Add/remove column(s). The accession number for each homolog is hyperlinked to its corresponding NCBI protein page, and domain names are linked to EBI’s InterProScan details pages for easy reference.

Length distribution plot

The length distribution plot is a box-and-whisker plot of homolog lengths (in amino acids), grouped by lineage.

The “Superkingdom Filter” dropdown allows lineages to be filtered by the superkingdom (or taxonomic domain) to which each belongs. Multiple superkingdoms can be selected, resulting in all matching lineages being displayed.

The “Query Protein Filter” restricts the plot to only homologs found for the selected query protein(s). As with the superkingdom filter, multiple query proteins can be selected, which causes all matching homologs to be displayed.

Checking the “Show Median” box renders a line at the median length across all homologs. Checking the “By Query” box changes the plot to show the length distributions of homologs per query protein.
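
For readers who want to reproduce the numbers behind such a plot from a downloaded table, here is a small Python sketch of a per-lineage summary (minimum, quartiles, median, maximum). The function name, the lineage labels, and the lengths are hypothetical, and the quartile method is Python's default, which may differ from the one used by the app.

```python
from statistics import median, quantiles


def length_summary(lengths_by_lineage):
    """Compute box-plot statistics of homolog lengths for each lineage."""
    summary = {}
    for lineage, lengths in lengths_by_lineage.items():
        values = sorted(lengths)
        if not values:
            continue  # nothing to summarize for this lineage
        if len(values) >= 2:
            q1, _, q3 = quantiles(values, n=4)
        else:
            q1 = q3 = values[0]  # a single homolog has no spread
        summary[lineage] = {
            "min": values[0], "q1": q1, "median": median(values),
            "q3": q3, "max": values[-1],
        }
    return summary


# Made-up lengths in amino acids.
example = {
    "Bacteria>Firmicutes": [100, 200, 300, 400, 500],
    "Archaea>Euryarchaeota": [250],
}
stats = length_summary(example)
print(stats["Bacteria>Firmicutes"]["median"])
```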

Domain architecture

A protein’s domain architecture (DA) refers to the order of specific functional regions of a protein. Currently, MolEvolvR uses databases and prediction algorithms integrated with InterProScan to characterize the domain architecture of protein queries and their homologs. We summarize the data with a set of useful visualizations below. Results from Pfam, Phobius, Gene3D, SignalP_Gram_positive, SignalP_Gram_negative, MobiDBlite, Hamap, and Coils are available.

Table

The table provides summary statistics on the domain architecture data across all homologs, with the top (most frequent) domain architectures by query protein (or across all queries) and the frequencies of occurrence and lineages in which they occur. Click each row to view the domain architecture spread across lineages in an interactive popup. The popup demonstrates the ‘LineageCount’ by showing the frequencies of occurrence by individual lineage for the selected domain architecture, allowing you to quickly identify lineage-specific vs. broadly conserved domain architectures.
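
The kind of summary this table presents can be sketched in Python as a frequency count of domain architectures plus a per-lineage breakdown. The record shape, the function name, and the "+"-joined architecture strings are assumptions made for illustration, not a description of MolEvolvR's internals.

```python
from collections import Counter, defaultdict


def summarize_architectures(records, top_n=10):
    """records: iterable of (domain_architecture, lineage) pairs, one per homolog.

    Returns [(architecture, total_count, {lineage: count}), ...], most frequent first.
    """
    totals = Counter()
    per_lineage = defaultdict(Counter)
    for architecture, lineage in records:
        totals[architecture] += 1
        per_lineage[architecture][lineage] += 1
    return [
        (arch, count, dict(per_lineage[arch]))
        for arch, count in totals.most_common(top_n)
    ]


# Made-up homolog records.
records = [
    ("DomainA+DomainB", "Bacteria>Firmicutes"),
    ("DomainA+DomainB", "Bacteria>Proteobacteria"),
    ("DomainA+DomainB", "Bacteria>Firmicutes"),
    ("DomainA", "Archaea>Euryarchaeota"),
]
table = summarize_architectures(records)
print(table[0])
```

An architecture found in only one lineage shows up with a single-entry breakdown, while a broadly conserved one is spread over many lineages.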

Lineage bar chart

This stacked bar chart shows the frequency of occurrence of the top domain architectures per lineage; the domain architectures for each lineage-wise split are displayed on the left.

Heatmap

A color gradient heatmap across the query protein(s) indicates the number of homologs identified within each lineage per domain architecture.

Rows: Predominant domain architectures. Columns: Key lineages from across the superkingdoms of life.

Network

A network visualization summarizes domain architectures across query protein(s) and their homologs. Each node represents a domain, and an edge connects two domains that co-occur within a protein (domain architecture). Node size and edge thickness are proportional to the relative occurrences of domains and co-occurrences across homologs (or query proteins).
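
As a rough illustration of how such a network can be derived, the Python sketch below counts domains (node weights) and co-occurring domain pairs (edge weights) from a list of domain architectures. It assumes architectures are written as "+"-joined domain names; that convention and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(architectures, sep="+"):
    """Return (node_counts, edge_counts) for a list of domain architecture strings."""
    nodes, edges = Counter(), Counter()
    for architecture in architectures:
        # Count each domain once per protein; sort so that pairs have a stable order.
        domains = sorted({d for d in architecture.split(sep) if d})
        nodes.update(domains)
        edges.update(combinations(domains, 2))
    return nodes, edges


# Made-up architectures, one per protein.
nodes, edges = cooccurrence_network(["A+B", "A+B+C", "C"])
print(dict(nodes))
print(dict(edges))
```

Node size would then scale with the node counts and edge thickness with the edge counts.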

InterProScan visualization

Each column of this visualization is organized by the database the domains were obtained from. The rows represent select query protein(s) and/or homologs (if a homology search was performed) with the lineage added to the front of the accession number. You can select specific proteins with the dropdown box, and choose to group rows by analysis or query to organize the visualization in different ways. The visualization can also be updated by toggling available database options under Analysis, or by adjusting the Total Cutoff Count slider to filter the number of proteins displayed based on the frequency of their domain architectures. The x-axis shows residue positions relative to the start of the protein (full-length search) or the start of the domain (when Run Analysis by Domain is used); axis labels are therefore relative, not absolute sequence positions.

UpSet plot

An UpSet plot is a helpful summary visualization that shows the frequencies of domains and domain architectures across all homologs. It shows the distribution of constituent domains underlying all homologs in a histogram (to the left). The combination matrix displays the various combinations of domains present across the domain architectures. The adjoining second histogram (on top) shows the frequency of occurrences of the indicated domain architectures (combinations).
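
The two histograms of an UpSet plot correspond to two simple tallies, sketched below in Python: how often each domain occurs, and how often each combination of domains occurs. The "+"-joined architecture strings and the function name are assumptions for illustration only.

```python
from collections import Counter


def upset_counts(architectures, sep="+"):
    """Return (domain_counts, combination_counts) for the two UpSet histograms."""
    domain_counts, combination_counts = Counter(), Counter()
    for architecture in architectures:
        domains = frozenset(d for d in architecture.split(sep) if d)
        if not domains:
            continue  # skip proteins without any annotated domain
        domain_counts.update(domains)      # left histogram: per-domain frequency
        combination_counts[domains] += 1   # top histogram: per-combination frequency
    return domain_counts, combination_counts


# Made-up architectures, one per homolog.
domain_counts, combination_counts = upset_counts(["A+B", "A+B", "A", "B+C"])
print(dict(domain_counts))
print({"+".join(sorted(k)): v for k, v in combination_counts.items()})
```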

Phylogeny

Phylogenetic analysis of proteins provides key insights into their development and evolution. The conservation of certain portions through lineages or across domains of life could indicate the importance of the protein in certain biological processes.

Sunburst

An interactive sunburst plot shows the phyletic spread of the query protein (selected with the Protein dropdown) across life. Hovering over each section of the plot displays the lineage. The depth of displayed taxonomic levels can be adjusted with Number of Levels to add more detail to the sunburst plot (default is 3 levels for the result summary sunburst).
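
The effect of the Number of Levels setting can be pictured with a short Python sketch that truncates each lineage to a given depth and counts homologs per ring segment. The ">"-delimited lineage strings and the function name are assumptions for illustration.

```python
from collections import Counter


def lineage_counts(lineages, levels=3, sep=">"):
    """Count homologs for every lineage prefix up to `levels` taxonomic levels."""
    counts = Counter()
    for lineage in lineages:
        parts = [p for p in lineage.split(sep) if p][:levels]
        # Each prefix corresponds to one segment in one ring of the sunburst.
        for depth in range(1, len(parts) + 1):
            counts[tuple(parts[:depth])] += 1
    return counts


# Made-up lineages, one per homolog.
example = ["Bacteria>Firmicutes>Bacilli", "Bacteria>Proteobacteria", "Archaea"]
counts = lineage_counts(example, levels=2)
print(dict(counts))
```

Increasing `levels` adds outer rings with finer taxonomic detail; the inner rings stay the same.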

Tree

This visualization is constructed from a multiple sequence alignment of representative homologs. Tree leaves are labeled by lineage, species (three-letter abbreviation), and accession numbers.

You can adjust the tree generation in two important ways: based on whether homologs are reduced to representative sequences (e.g., by lineage, species, or domain architecture), and based on the multiple sequence alignment (MSA) algorithm chosen (including Clustal Omega, ClustalW, and MUSCLE). The size of the tree can be adjusted by selecting the desired number of sequences to include, allowing you to balance detail with readability. To the right of the tree is a visualization of the multiple sequence alignment, colored by amino acid and showing overall conservation of sequence and structure of the homologs used in tree construction.

You also have the option of downloading the tree as a raw Newick tree file (.nwk). This export format is a common standard and is compatible with most other software for viewing or customizing phylogenetic trees.
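
If you only need the leaf labels from a downloaded .nwk file, a small Python sketch like the one below may be enough. It is a simplification that assumes unquoted labels and is not a full Newick parser; the example tree and its label format are made up. For anything beyond listing leaves, a dedicated phylogenetics library is the safer choice.

```python
import re

# A leaf label directly follows "(" or "," and ends at ":", ",", ")" or ";".
LEAF_PATTERN = re.compile(r"[(,]\s*([^():,;\s][^():,;]*)")


def leaf_labels(newick):
    """Return leaf labels from a Newick string with unquoted labels."""
    return [label.strip() for label in LEAF_PATTERN.findall(newick)]


# Made-up tree with hypothetical lineage_species_accession labels.
example = "((Firmicutes_Bsu_ABC12345.1:0.1,Proteobacteria_Eco_XYZ67890.1:0.2):0.3,Actinobacteria_Mtu_DEF11111.1:0.4);"
labels = leaf_labels(example)
print(labels)
```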

MSA

You can customize and download a multiple sequence alignment (MSA) as a searchable .pdf file, including a user-specified number of representative sequences among the homologs. This searchable PDF format makes it easy to find specific sequences or regions of interest within large alignments. You also have the option of downloading the MSA in FASTA format that is compatible with other external MSA readers and phylogenetic tree generators.

InterPro + Tree

This visualization combines the InterProScan results from the ‘Domain Architecture’ page with the phylogenetic tree described above. The two visualizations are aligned such that the tips of the tree correspond to the rows of the InterProScan results.

The tree can be customized via the parameters described in the section above.

The InterProScan panel can be customized by selecting the Analysis box and adding or removing InterProScan analysis types to display (e.g., Pfam, Phobius, Coils, Gene3D). Note that even if an analysis is selected, its corresponding column will only appear if the corresponding InterProScan rows that align to the tree have data for the selected analysis. Finally, following any changes to the domain architecture or tree/leaf attributes, select Generate to render the plot.

Explore your results

Report generation

You can generate a comprehensive report of your analysis in HTML format, which includes interactive figures for all the visualizations shown in the app. Click the Generate Report button to create the report. (This may take several minutes depending on the size of your analysis.)

Once the report is generated, you will receive a modal popup with a link to the results summary page. You can then click “Download Report” to download an HTML report named <job_code>_report.html.

If you wish to customize the parameters of the figures included in the report, you can first navigate to the other tabs after loading an analysis and then modify the visualizations as desired. Once you have customized the visualizations, return to the Result Summary tab and click Generate Report again to create a report with your customized visualizations.

Filters

Data tables are filterable via global or column-specific search boxes and controls. Filters are applied across the Phylogeny and Domain Architecture tabs (indicated by a small, dismissible notification), allowing users to fine-tune their analysis.

Columns are searched appropriately based on the data they contain. For example, the AccNum column is text searchable, while PcPositive provides sliders to specify a range of values.

Regex

Table-wide search boxes support JavaScript-flavored regular expressions. These can be used to make advanced searches, e.g., Staphylococcus\saureus|Klebsiella\spneumoniae (search for Staphylococcus aureus OR Klebsiella pneumoniae).
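
You can try a pattern outside the app before using it in a search box. The Python sketch below applies the example pattern to a made-up list of species names; the search boxes use JavaScript regular expressions, but \s (whitespace) and | (alternation) behave the same way in Python for a pattern like this one.

```python
import re

# \s matches a single whitespace character; | separates the two alternatives.
pattern = re.compile(r"Staphylococcus\saureus|Klebsiella\spneumoniae")

# Made-up table values.
species = [
    "Staphylococcus aureus",
    "Staphylococcus epidermidis",
    "Klebsiella pneumoniae",
    "Bacillus subtilis",
]
matches = [name for name in species if pattern.search(name)]
print(matches)
```

Note that the pattern matches the full species names only, so Staphylococcus epidermidis is not returned.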

Compatibility

This web-app is regularly tested on the following:

Google Chrome/Brave, Mozilla Firefox, Apple Safari
Windows, macOS, iOS, Android
Desktop, tablet, phone/mobile

We only use standardized and widely supported HTML, CSS, and JavaScript features, so any other modern, standards-compliant browser such as Opera or Microsoft Edge should also work, even if not explicitly tested.

The following are NOT supported, and may result in unexpected appearance or behavior:

Microsoft Internet Explorer.
Smartwatches, or any device with a screen width < ~250px.
Browsers without JavaScript enabled (interactive features won’t work).

If you encounter a bug, please let us know!

Dependencies

Tools

We use the following tools to perform analyses and generate visualizations:

R v4.2.1
InterProScan v5.64-96
NCBI BLAST+ (with deltablast) v2.9.0
NCBI Entrez Direct v12.0
FastTree v2.1.11
MUSCLE v3.8.31
Phobius v1.01
TMHMM v2.0c
HMMER v2.3.2

Data

We use the following databases to characterize proteins and their homologs:

NCBI Taxonomy, NCBI GenBank/RefSeq, InterPro (incl. Pfam, Gene3D)

Databases, versions, and compilation dates:

BLAST refseq protein database v5 (Jun 20, 2026)
BLAST nr database v5 (Jun 15, 2026)
NCBI Taxonomy v2.3.2 (Jun 20, 2026)

R packages

ape, biomartr, cowplot, d3r, DT, gganimate, gggenes, ggraph, ggsci, ggthemes, ggtree, ggvis, gh, gridExtra, heatmap3, heatmaply, htmlwidgets, httr, igraph, knitr, latexpdf, pdftools, phangorn, phylogram, phylotools, phytools, plotly, rentrez, reutils, rmarkdown, seqinr, seqRFLP, shiny, shinydashboard, sunburstR, tidytext, tidytree, tidyverse, tinytex, UpSetR, viridis, visNetwork, wordcloud, wordcloud2

Software, database bibliography

Tools

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL.
BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421.
https://pubmed.ncbi.nlm.nih.gov/20003500
Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al.
InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30(9):1236-1240.
https://doi.org/10.1093/bioinformatics/btu031
Kans J.
Entrez Direct: E-utilities on the Unix Command Line.
In: The NCBI Handbook [Internet]. National Center for Biotechnology Information (US); 2021.
https://www.ncbi.nlm.nih.gov/books/NBK179288/
Price MN, Dehal PS, Arkin AP.
FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE. 2010;5(3):e9490.
https://pmc.ncbi.nlm.nih.gov/articles/PMC2693737/
Edgar RC.
MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 2004;32(5):1792-1797.
https://pubmed.ncbi.nlm.nih.gov/15034147
Käll L, Krogh A, Sonnhammer ELL.
A combined transmembrane topology and signal peptide prediction method. Journal of Molecular Biology. 2004;338(5):1027-1036.
https://pubmed.ncbi.nlm.nih.gov/15111065/
Krogh A, Larsson B, von Heijne G, Sonnhammer ELL.
Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology. 2001;305(3):567-580.
https://pubmed.ncbi.nlm.nih.gov/11152613/
Eddy SR.
Accelerated Profile HMM Searches. PLoS Computational Biology. 2011;7(10):e1002195.
https://pubmed.ncbi.nlm.nih.gov/22039361

Databases

Schoch CL, Ciufo S, Domrachev M, Hotton CL, Kannan S, Khovanskaya R, et al.
NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database (Oxford). 2020;2020:baaa062.
https://pubmed.ncbi.nlm.nih.gov/32761142/
O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, et al.
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Research. 2016;44(D1):D733–D745.
https://pubmed.ncbi.nlm.nih.gov/26553804/
Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, et al.
InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Research. 2019;47(D1):D351–D360.
https://pubmed.ncbi.nlm.nih.gov/30398656

How to cite

If you have used our web-app to generate any results for your publication or presentations, please cite us as follows:

MolEvolvR: a web-app for characterizing proteins using molecular evolution and phylogeny. Faisal S Alquaddoomi^#, Joseph T Burke^#, Lo Sosinski^#, Evan P Brenner, David A Mayer, Samuel Z Chen, Jacob D Krol, Vince P Rubinetti, Ethan P Wolfe, Shaddai Amolitos, Kellen M Reason, John B Johnston, Janani Ravi. [^#Co-primary authors] bioRxiv 2022.02.18.461833 (revised 2026); doi: https://doi.org/10.1101/2022.02.18.461833; web-app: https://jravilab.org/molevolvr

Current maintainers

Faisal S Alquaddoomi | faisal.alquaddoomi@cuanschutz.edu | falquaddoomi
Evan P Brenner | evan.brenner@cuanschutz.edu | epbrenner
Vince P Rubinetti | vincent.rubinetti@cuanschutz.edu | vincerubinetti
Janani Ravi | janani.ravi@cuanschutz.edu | @jananiravi | (Corresponding author)

Contact

Questions? Email us at thejravilab+molevolvr@gmail.com.

Funding

We would like to thank our funding sources: University of Colorado Anschutz start-up funds, Endowed Research Funds from the College of Veterinary Medicine, Michigan State University, and NSF-funded BEACON funding support awarded to JR; NSF-funded REU-ACRES summer scholarship to SZC; NIH NIAID U01AI176414 to JR; NIH NLM T15LM009451 to EPB.

Q: Will I receive an email when the job is done?

Yes, if you supplied an (optional) email on the submission page, then an email will be sent to confirm when a job is ready.

Q: How do I paste/upload protein sequences?

Acceptable formats

NCBI FASTA

>OHS91782.1 16S rRNA pseudouridine(516) synthase [Staphylococcus aureus]
MRIDKFLANMGVGTRNEVKQLLKKGLVNVNEQVIKSPKTHIEPENDKITVRGELIEYIENVYIMLNKPKG
YISATEDHHSKTVIDLIPEYQHLNIFPVGRLDKDTEGLLLITNDGDFNHELMSPNKHVSKKYEVISANPI
TEDDIQAFKEGVTLTDGKVKPAILTYIDNQTSHVTIYEGKYHQVKRMFHSIQNEVLHLRRIKIADLELDS
NLDSGEYRLLTENDFDKLNYK

UniProt FASTA

>sp|P01189|COLI_HUMAN Pro-opiomelanocortin OS=Homo sapiens OX=9606 GN=POMC PE=1 SV=2
MPRSCCSRSGALLLALLLQASMEVRGWCLESSQCQDLTTESNLLECIRACKPDLSAETPM
FPGNGDEQPLTENPRKYVMGHFRWDRFGRRNSSSSGSSGAGQKREDVSAGEDCGPLPEGG
PEPRSDGAKPGPREGKRSYSMEHFRWGKPVGKKRRPVKVYPNGAEDESAEAFPLEFKREL
TGQRLREGDGPDGPADDGAGAQADLEHSLLVAAEKKDEGPYRMEHFRWGSPPKDKRYGGF
MTSEKSQTPLVTLFKNAIIKNAYKKGE

Custom FASTA header (not recommended)

>SEQUENCE154 UNKNOWN 
MPRSCCSRSGALLLALLLQASMEVRGWCLESSQCQDLTTESNLLECIRACKPDLSAETPM
FPG

MolEvolvR uses NCBI or UniProt accessions to retrieve taxonomy information for query proteins. We therefore recommend including a valid protein accession number in each header whenever possible.
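To check headers before submitting, the recommendation above can be sketched as a small script. This is a minimal illustration, not MolEvolvR's own parser: the function name guess_accession and both regular expressions are assumptions made for this example. The patterns are deliberately loose and only cover header shapes like the ones shown above; in particular, the NCBI pattern does not distinguish protein accessions from other NCBI identifiers.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper for illustration only; MolEvolvR's real header
# handling is not documented here and may differ.

# UniProt-style header, e.g. ">sp|P01189|COLI_HUMAN ..." (sp = Swiss-Prot, tr = TrEMBL).
UNIPROT_HEADER = re.compile(r"^>(?:sp|tr)\|([^|]+)\|")

# Simplified NCBI-style header: 2-3 capital letters, an optional underscore,
# at least five digits, and a version suffix, e.g. ">OHS91782.1 ...".
# This is an assumption for the sketch, not the full NCBI accession grammar.
NCBI_HEADER = re.compile(r"^>([A-Z]{2,3}_?\d{5,}\.\d+)(?=\s|$)")


def guess_accession(header: str) -> Optional[Tuple[str, str]]:
    """Return (source, accession) if the header starts with a recognizable
    accession, or None for custom headers."""
    match = UNIPROT_HEADER.match(header)
    if match:
        return ("UniProt", match.group(1))
    match = NCBI_HEADER.match(header)
    if match:
        return ("NCBI", match.group(1))
    return None
```

A header for which guess_accession returns None (such as the custom header above) carries no accession in a form this sketch recognizes, so it is worth replacing with an NCBI or UniProt header if one exists for that protein.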

Common mistakes

No header lines (missing > header delimiter)

MRIDKFLANMGVGTRNEVKQLLKKGLVNVNEQVIKSPKTHIEPENDKITVRGELIEYIENVYIMLNKPKG

MPRSCCSRSGALLLALLLQASMEVRGWCLESSQCQDLTTESNLLECIRACKPDLSAETPM

Duplicate headers/accession numbers

>GCF_000013425.1
MVPEEKGSITLSKEAAIIFAIAKFKPFKNRIKNNPQKTNPFLKLHENKKS
>GCF_000013425.1
MKQKKSKNIFWVFSILAVVFLVLFSFAVGASNVPMMILTFILLVATFGIGFTTKKKYRENDWL
>protein
MKLTLMKFFVGGFAVLLSYIVSVTLPWKEFGGIFATFPAVFLVSMFITGMQYGDKVAVHVSRGAVFGMTGVLVCILVTWM
MLHMTHMWLISIVVGFLSWFISAVCIFEAVEFIAQKRLEKHSWKAGKSNSK
>protein
MVKRTYQPNKRKHSKVHGFRKRMSTKNGRKVLARRRRKGRKVLSA
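Both mistakes above can be caught before submission with a short pre-check. The following is a minimal sketch under stated assumptions: check_fasta is a hypothetical helper written for this page, not a MolEvolvR function, and it treats the first whitespace-delimited token after ">" as the sequence ID.

```python
from collections import Counter
from typing import List


def check_fasta(text: str) -> List[str]:
    """Return human-readable problems found in FASTA-formatted text.

    Hypothetical pre-submission check; it only looks for the two common
    mistakes listed above (missing headers and duplicate header IDs).
    """
    problems = []
    ids = []
    seen_header = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            seen_header = True
            fields = line[1:].split()
            # Assumption: the ID is the first token of the header line.
            ids.append(fields[0] if fields else "")
        elif not seen_header:
            # Sequence text appeared before any '>' header line.
            problems.append(f"line {lineno}: sequence without a '>' header")
    # Report every ID used by more than one record, in order of first use.
    for seq_id, count in Counter(ids).items():
        if count > 1:
            problems.append(f"duplicate header ID '{seq_id}' appears {count} times")
    return problems
```

An empty list means neither problem was found. It does not guarantee that the submission will be accepted, since this sketch does not validate residues or accession formats.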

Q: Is my job still running? Did it complete?

Upon submission, a URL for retrieving the results is displayed. The linked page shows job progress and, once the job finishes, loads the results.

Recommendations:

Bookmark the link
Supply an optional email to receive the link

Q: How long will my submission take to process? When can I expect my results?

Typical runtimes range from ~15–30 minutes for a single-protein full analysis to several hours for larger multi-protein submissions (e.g., a full analysis with >25 proteins queried against the nr database). Phylogenetic analyses typically run much faster (e.g., 2–3 minutes for ~25 homologs).

Key factors that impact job duration:

Number of sequences submitted (contact us for >100 protein submissions)
Type of analysis selected (e.g., Full vs. Phylogenetic)
Number of homologs to search for each sequence (Advanced Options > Maximum Hits)
Length and complexity of sequences

Visualization performance is optimized for up to ~1,000 homologs. Larger datasets are supported but may require additional filtering (via the data table search/filter controls) for optimal interactivity.

Case studies

The computational methods underlying MolEvolvR have helped elucidate fundamental biological systems and protein evolution.

This section provides companion MolEvolvR jobs for the proteins studied in these publications, so that users can explore them.
