TCR_Explore Shiny R application

TCR_Explore was designed as an open-access web server that analyses and visualises TCR repertoire data without the need for coding expertise. TCR_Explore introduces multiple pipelines using an automated process that includes pairing of αβ or γδ chains, and facilitates interrogation of linked flow cytometric index data for immunophenotyping analyses. Additionally, an automated summarisation process from a single input file enables the creation of a variety of publication-ready analytical plots.

There are three main sections:

Quality control (QC) processes
- Uses output files generated from IMGT¹
- Workflow → QC tab
- Creates a universal input file for TCR repertoire data analysis
- Tutorial video available
TCR analysis
- User uploads the paired file generated from TCR_Explore QC process
- Several analytical graph features available including Treemap, Chord diagram, Pie chart, Motif analysis, Diversity and chain usage, and Overlap for comparison of multiple datasets (Heatmap and Upset plots)
- For more information on the functions, see the TCR analysis information tab
Paired TCR with Index data
- User uploads the paired file generated from TCR_Explore QC process and a corresponding .fcs (FACS index data) file
- The merged file undergoes further QC process in the 'data cleaning steps'
  1. Converts negative flow cytometric values to small positive values
  2. User can filter by clone count for colouring purposes (0 = all values included)
- This cleaned file is then used to create the dot plot, which has over 20 customisable features
- For more information on the functions, see the Paired TCR with Index data information tab

Please contact: Kerry.Mullan1@monash.edu or Nicole.Mifsud@monash.edu to report errors.

Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia

Citation:

Mullan KA, Zhang JB, Jones CM, Goh SJR, Revote J, Illing PT, Purcell AW, La Gruta NL, Li C, Mifsud NA. TCR_Explore: A novel webtool for T cell receptor repertoire analysis. Comput Struct Biotechnol J. 2023 Feb 3;21:1272-1282. doi: 10.1016/j.csbj.2023.01.046. PMID: 36814721; PMCID: PMC9939424.


References:

Lefranc MP, Giudicelli V, Duroux P, Jabado-Michaloud J, Folch G, Aouinti S, et al. IMGT®, the international ImMunoGeneTics information system® 25 years on. Nucleic Acids Res. 2015;43(Database issue):D413-22.

Local installation

Download this GitHub repository.

To use TCR_Explore, first install R, RStudio and the other required programs:

Mac:

Install R
Install RStudio Version 2022.07.2-576, as later versions have an issue opening R Shiny apps in the window.
Install XQuartz

Windows:

Install R
Install RStudio
Install Rtools

Installing the packages

Step 1. Open the TCR_Explore.Rproj file

Step 2. Open the install.packages.TCR_Explore.R file

Step 3. Run each line (Run button, top right)

Shortcut key (run the current line):
- Mac: Command-Enter
- Windows: Ctrl-Enter

Possible installation prompts

When you see this line: “These packages have more recent versions available. It is recommended to update all of them. Which would you like to update? Enter one or more numbers, or an empty line to skip updates:”, answer with 1 and hit Enter.

On a Mac, if you see “Do you want to install from sources the package which needs compilation? (Yes/no/cancel)”, answer with no and hit Enter. The same message will appear as a pop-up on Windows.

Once all packages are installed, the final section of the installation output will look like this:

* installing *source* package ‘TCR_explore.R’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (TCR_explore.R)

Step 4. Open TCR_Explore_v1.0.R and click Run App (top-right corner)

Tutorial video of Quality control processes

Quality control process of T cell receptor from nested PCR experiments.

Paired Sanger sequencing QC process

Step 1. Convert the .seq to .fasta files (QC -> SEQ to FASTA file merger)

Add the required file naming convention, e.g. Individual.groupChain-initialwell
If more identifiers are required for your analysis, they can be added later.
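The merger in step 1 can be pictured with the Python sketch below. It is not the app's code (TCR_Explore is written in R): it assumes each .seq file holds only the base calls, uses the file name as the FASTA header, and relies on a hypothetical regular expression `NAME_PATTERN` for the Individual.groupChain-initialwell convention, where the chain is assumed to be a single trailing letter (A, B, G or D) and an optional plate number may precede the well.

```python
import re
from pathlib import Path

# Hypothetical pattern for Individual.groupChain-initialwell, e.g. "E10630.CD8A-A1".
# Assumption: chain = one trailing letter (A/B/G/D); a plate number may precede the well.
NAME_PATTERN = re.compile(
    r"^(?P<indiv>[^.]+)\.(?P<group>.+?)(?P<chain>[ABGD])-(?P<well>\d*[A-H]\d{1,2})$"
)


def parse_name(name):
    """Split a sample name into individual, group, chain and well."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow Individual.groupChain-initialwell")
    return match.groupdict()


def merge_seq_files(seq_paths, out_path):
    """Write all .seq files into one FASTA file, one record per input file."""
    records = []
    for path in map(Path, seq_paths):
        parse_name(path.stem)  # fail early on a badly named file
        # Assumption: the .seq file contains only bases, possibly over several lines
        sequence = "".join(path.read_text().split())
        records.append(f">{path.stem}\n{sequence}\n")
    Path(out_path).write_text("".join(records))
    return len(records)
```

For example, `merge_seq_files(sorted(Path("plate1").glob("*.seq")), "plate1.fasta")` would merge one plate's files; validating names up front catches files that would otherwise be hard to pair later.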

Step 2. Upload the .fasta file to IMGT/V-QUEST

Select the species (e.g. Homo sapiens) and receptor type or locus (e.g. TR)
Upload the .fasta file using 'Choose file'
Under 'C. Excel file', uncheck all, then check 1. (Summary) and 6. (Junction)
- Alternatively, download all 12 tabs and move the Junction tab to position 2
Download the vquest.xls file by clicking the Start button
Repeat this process for every .fasta file

Step 3. Downloading TCR_Explore QC file

Upload the vquest.xls file to QC -> IMGT (Sanger sequencing) -> tab 1. Create QC file.
Download the .csv file called “IMGT_onlyQC.csv”
The program extracts the columns required for the filtering QC process, TCR_Explore or TCRdist

NOTE: If steps 1 and 2 were completed previously and the Junction sheet is missing, the remaining process can be completed with the 'Summary' sheet only. However, without the Junction sheet, the TCRdist3 file cannot be downloaded from 'TCR analysis -> overview of TCR pairing -> summary table', as the JUNCTION nucleotide sequences will be missing.

Step 4. Fill in the QC file

Copy all sequences into one .csv file. If more than one plate is present, add a number in front of the initial well.

The program adds three columns to the end of the file

'V.sequencing.quality.check'
- Will list the following issues: No alignment, Unproductive issue, V identity issue, J identity issue or No issue flagged by IMGT
'clone_quality'
- If V.sequencing.quality.check reports 'No issue flagged by IMGT', the program pre-fills the row as 'pass', as in the test data all 151 sequences had high-quality chromatograms
- The remaining rows will be filled in as NA. These NA values need to be replaced with either 'pass' or 'fail' based on chromatogram quality
- Chromatograms can be viewed in QC -> Check .ab1 files
- Alternatively, the .ab1 files can be viewed in programs such as FinchTV (Mac) or Chromas (Windows)
- We recommend also checking the chromatograms of the pre-filled 'pass' sequences
'comments'
- Fill in for the NA rows, e.g. High quality sequence (pass), two sequences (fail), cannot resolve frameshift/stop sequence (fail), messy sequences (fail)
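A minimal Python sketch of the pre-fill rule for 'clone_quality', assuming each row is a dict keyed by the column names above. The helper `rows_to_review` is hypothetical; it lists rows that still need a lowercase 'pass' or 'fail' after chromatogram review.

```python
PASS_FLAG = "No issue flagged by IMGT"


def prefill_quality(rows):
    """Pre-fill 'clone_quality' ('pass' or 'NA') and an empty 'comments' field."""
    for row in rows:
        flagged_ok = row["V.sequencing.quality.check"] == PASS_FLAG
        row["clone_quality"] = "pass" if flagged_ok else "NA"
        row.setdefault("comments", "")
    return rows


def rows_to_review(rows):
    """Rows whose 'clone_quality' is not yet a lowercase 'pass' or 'fail'."""
    return [row for row in rows if row["clone_quality"] not in ("pass", "fail")]
```

Checking for exact lowercase values means entries such as 'Pass' are reported as still needing attention.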

Step 5. Creating the file needed for paired TCR_Explore

Upload the filled in IMGT_onlyQC.csv to QC -> IMGT (Sanger sequencing) -> tab 2. Paired chain file -> Completed QC file (.csv)
Select if the data is alpha-beta (ab) or gamma-delta (gd)
Select if the data was created from either “summary+JUNCTION” or “Summary”
This will render two tables in this section:
- Table 1. summary of QC process
- Table 2. paired TCR file
Download the paired chain file.
If needed, also download the TCRdist output file.
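To illustrate what pairing means, here is a hypothetical Python sketch, not TCR_Explore's implementation. It assumes each QC row has already been split into 'indiv', 'group', 'well', 'chain', 'V', 'J' and 'CDR3' fields (for example from the file name), keeps only 'pass' rows, and pairs the two chains sorted into the same well. Output column names follow the style of the paired file (TRAV, CDR3a.Sequence, TRAJ and so on).

```python
from collections import defaultdict


def chain_columns(chain, row):
    """Columns for one chain, e.g. TRAV, CDR3a.Sequence, TRAJ for chain 'A'."""
    return {
        f"TR{chain}V": row["V"],
        f"CDR3{chain.lower()}.Sequence": row["CDR3"],
        f"TR{chain}J": row["J"],
    }


def pair_chains(rows, chains=("A", "B")):
    """Pair two chains sorted into the same well; use ("G", "D") for gamma-delta."""
    wells = defaultdict(dict)
    for row in rows:
        if row["clone_quality"] != "pass":
            continue  # only QC-passed sequences are paired
        wells[(row["indiv"], row["group"], row["well"])][row["chain"]] = row
    paired = []
    for (indiv, group, well), by_chain in sorted(wells.items()):
        if all(chain in by_chain for chain in chains):  # skip wells missing a chain
            record = {"Indiv": indiv, "group": group, "well": well}
            for chain in chains:
                record.update(chain_columns(chain, by_chain[chain]))
            paired.append(record)
    return paired
```

A well whose second chain failed QC produces no paired row in this sketch; such sequences would instead be handled as single-chain data.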

Other data types

Sanger sequencing (single chain)

Complete steps 1-4

Step 5. Downloading filtered file

Upload the filled in IMGT_onlyQC.csv to QC -> IMGT (Sanger sequencing) -> tab 2. Paired chain file -> Completed QC file (.csv)
Go to tab 4. Single chain file, which does not require specifying whether the data is alpha-beta or gamma-delta
Download single chain file

Non-Sanger sequencing data

Data that has been aligned with other tools (e.g. ImmunoSEQ, MiXCR) will require conversion to the TCR_Explore format.
Go to QC -> Convert to TCR_Explore file format
There are two input types: ImmunoSEQ and Other.
The user can upload a .tsv, .csv or .txt file. Headers must be in row 1.
Select the columns containing the counts, the variable (V), diversity (D) and junction (J) genes, and the CDR3 amino acid sequence
Remove all unnecessary columns
This process re-orders the datasheet so the countColumn is in column A of the .csv file, and adds TRV, TRJ, TRVJ and TRVJ_CDR3 columns to the end of the file to aid in producing the graphs
A video explaining this process is available.
For ImmunoSEQ-processed data, TCR_Explore will remove rows with missing information (e.g. NA in both V and J genes). NOTE: For other sources, the user will have to manually remove non-functional sequences. Contact Kerry.mullan@monash.edu if you have a specific filtering requirement.
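The re-ordering described above can be sketched in Python with the standard `csv` module. This is an illustration under assumptions: the helper name `convert_rows` is hypothetical, rows missing a V or J gene are dropped, the optional in-frame filter assumes the status column (e.g. sequenceStatus) holds "In" for in-frame sequences, and TRVJ and TRVJ_CDR3 are joined with '.', which may not match the app's separator.

```python
import csv
import io

MISSING = {"", "NA"}


def convert_rows(text, count_col, v_col, j_col, cdr3_col,
                 drop_cols=(), inframe_col=None, inframe_value="In", delimiter=","):
    """Return CSV text with count_col first and TRV/TRJ/TRVJ/TRVJ_CDR3 appended."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    kept = [c for c in reader.fieldnames if c != count_col and c not in drop_cols]
    header = [count_col] + kept + ["TRV", "TRJ", "TRVJ", "TRVJ_CDR3"]
    out = io.StringIO()
    # extrasaction="ignore" silently drops the columns listed in drop_cols
    writer = csv.DictWriter(out, fieldnames=header, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Optional in-frame filter (assumed ImmunoSEQ-style status values)
        if inframe_col is not None and row[inframe_col] != inframe_value:
            continue
        v, j, cdr3 = (row[c].strip() for c in (v_col, j_col, cdr3_col))
        if v in MISSING or j in MISSING:
            continue  # drop rows with a missing V or J gene
        row.update(TRV=v, TRJ=j, TRVJ=f"{v}.{j}", TRVJ_CDR3=f"{v}.{j}.{cdr3}")
        writer.writerow(row)
    return out.getvalue()
```

For a .tsv export, pass `delimiter="\t"`; the output is always comma-separated.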

TCR repertoire analysis

TCR analysis section
Motif analysis section
Diversity and chain interrogation
Group overlap analysis

Side panel

Upload the file. This can be from our QC section or alternative sources. The other features in the side panel are:

'Type of group'
- This is used to change the comparison. We recommend using “group”, “indiv” or “group.indiv”.
'Type of data'
- Specifies whether the original file was 'raw' or 'summarized'
'Type of font'
- Specifies the font for the figures. The R default fonts are serif, sans and mono. Additional fonts were sourced from https://fonts.google.com (email Kerry if there is a specific font you would like to use).

TCR analysis section

summary table
Treemap
Chord diagram
Pie chart

Overview of TCR pairing

summary table

The user can specify the type of summary table to download.

They can either select their own columns (general summary) or download a TCRdist3 .csv output.

The TCRdist3 output requires a file produced by our QC process, as it relies on the IMGT column names.

The user must also select whether the input data is alpha-beta (ab) or gamma-delta (gd) for the TCRdist3 column selection.

Treemap

The user can specify:

The order of the groups (e.g. CD8 and IFNg)
Colour choices include: default, rainbow, random or one colour (specified in side panel; e.g. grey)
- The colour can be altered afterwards
Whether labels appear on the graph
The column to colour by, as well as the column used to separate panels
This plot can be downloaded as a PNG or PDF

Chord diagram

There are several features the user can specify:

Sub-group to display
The two columns displayed in the chord diagram
The transparency of 'Label' and 'no label', which applies to the entire dataset
There is also an option to selectively label one or more entries, for which transparency and lines can be added
Colour choices: default, rainbow, random or one colour (specified in side panel)
Labels can be added or removed if needed
Legend is not displayed for any of the graphs
This plot can be downloaded as a PNG or PDF

Pie chart

There are several features the user can specify:

Displays one chain
The user can choose what is displayed: either group or indiv.group
The number of rows can be specified
The legend location can be altered as well as the size of the text
Colour choices: default, random or one colour (specified in side panel)
This plot can be downloaded as a PNG or PDF

Motif analysis section

CDR3 length distribution
Single length motif analysis
Aligned motif analysis

CDR3 length distribution

The length distribution is calculated from the unique CDR3 sequences.

The user can specify:

Type of graphs available:
- histogram, which can be coloured by specific chains (e.g. AVJ)
- density plot, coloured by 'Column of group' (side panel)
The CDR3 sequences (amino acid or nucleotide)
The size of the text
x-axis range (default 1 to 30) and tick mark interval
The user can also download the summarised table with the lengths or colours that were used
This plot can be downloaded as a PNG or PDF
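For readers who want to reproduce this table outside the app, the Python sketch below counts unique CDR3 sequences by length within each group over the default x-axis range of 1 to 30. Treating uniqueness per group is an assumption, and `cdr3_length_distribution` is a hypothetical helper.

```python
from collections import Counter


def cdr3_length_distribution(rows, cdr3_col, group_col, x_range=(1, 30)):
    """Number of unique CDR3 sequences at each length, per group."""
    # De-duplicate within each group so expanded clones are counted once
    unique = {(row[group_col], row[cdr3_col]) for row in rows if row[cdr3_col]}
    counts = Counter((group, len(seq)) for group, seq in unique)
    low, high = x_range
    return {
        group: [counts[(group, length)] for length in range(low, high + 1)]
        for group in sorted({group for group, _ in unique})
    }
```

Each list has one entry per length in the range, which maps directly onto histogram bars.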

Single length motif analysis

The nucleotide and amino acid plots show the unique sequences of a certain length (e.g. 15)

These are displayed as 'Motif (amino acid)' and 'Motif (nucleotide)'

The 'Motif (amino acid)' plot can also compare two groups with sequences of the same length.
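A motif logo of this kind is usually built from a position frequency matrix. The Python sketch below, assuming plain lists of CDR3 strings, keeps the unique sequences of one length and reports the fraction of each residue at each position; pass `alphabet="ACGT"` for nucleotide sequences. It illustrates the idea rather than the app's plotting code.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(sequences, length, alphabet=AMINO_ACIDS):
    """Per-position residue fractions for the unique sequences of one length."""
    chosen = sorted({seq for seq in sequences if len(seq) == length})
    if not chosen:
        return []
    matrix = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in chosen)
        # Fraction of unique sequences carrying each residue at this position
        matrix.append({res: counts[res] / len(chosen) for res in alphabet})
    return matrix
```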

Aligned motif analysis

This section can align the sequences using the 'muscle' package.

Diversity and chain interrogation

Chain usage
Diversity of TCR sequence

Chain usage

There are three types of graphs available in the chain bar graph section.

Diversity of TCR sequence

The top panel shows the Inverse Simpson Diversity Index (SDI) table. This table can be downloaded for use in more complex designs (e.g. ANOVA).

The bottom panel shows the graphical outputs and a simple t-test.
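For reference, the Inverse Simpson index for a sample with clonotype proportions p_i is 1 / Σ p_i², so a sample of N equally sized clonotypes scores N and a monoclonal sample scores 1. The Python sketch below computes it, together with Welch's t statistic, which is what R's `t.test()` uses by default; whether the app uses Welch's or Student's test is an assumption, and no p-value is computed here.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def inverse_simpson(clonotypes):
    """1 / sum(p_i^2), where p_i is the proportion of clonotype i."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return 1.0 / sum((n / total) ** 2 for n in counts.values())


def welch_t(a, b):
    """Welch's t statistic comparing two lists of per-sample index values."""
    # statistics.variance uses the sample (n - 1) denominator
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error
```

In practice one index value would be computed per individual, and the per-individual values of two groups passed to `welch_t`.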

Group overlap analysis

There are two graphs in this overlap section.

Heatmap

The Heatmap plot can display data of a specific group/individual (Select specific groups=yes; selected group=“E10630.CD8”). For example, the user may wish to plot AV against BV on the x and y axes.

If 'Select specific groups=no', the user can show multiple individuals on either the x or y axis.
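The counts behind such a heatmap can be sketched as a cross-tabulation. In this hypothetical Python helper, `x_col` and `y_col` are the columns placed on each axis (for example TRAV and TRBV), and `selected_group` mirrors 'Select specific groups=yes'; with no group selected, all rows are counted, so an individual column can be placed on one axis.

```python
from collections import Counter


def pair_count_matrix(rows, x_col, y_col, group_col=None, selected_group=None):
    """Count rows for every (x, y) combination; returns x labels, y labels, matrix."""
    if selected_group is not None:
        rows = [row for row in rows if row[group_col] == selected_group]
    counts = Counter((row[x_col], row[y_col]) for row in rows)
    xs = sorted({x for x, _ in counts})
    ys = sorted({y for _, y in counts})
    # One matrix row per y label, one column per x label
    return xs, ys, [[counts[(x, y)] for x in xs] for y in ys]
```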

Upset plot

The UpSet plot highlights whether specific clonotypes overlap between groups.
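UpSet plots show each exact combination of groups and the clonotypes found only in that combination. The Python sketch below computes those exclusive intersections, assuming rows are dicts with a clonotype column and a group column; the function name is hypothetical.

```python
from collections import defaultdict


def upset_intersections(rows, clone_col, group_col):
    """Map each exact set of groups to the clonotypes seen in exactly those groups."""
    groups_per_clone = defaultdict(set)
    for row in rows:
        groups_per_clone[row[clone_col]].add(row[group_col])
    intersections = defaultdict(set)
    for clone, groups in groups_per_clone.items():
        # frozenset makes the group combination usable as a dictionary key
        intersections[frozenset(groups)].add(clone)
    return dict(intersections)
```

The size of each set is the bar height for that combination in the plot.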

Paired TCR with FACS index data

Please contact: Nicole.Mifsud@monash.edu or Kerry.Mullan@monash.edu to report errors.

Upload the FACS file and unsummarised clone file

The first tab is used to merge the paired TCR file with the .fcs FACS file.

The user needs to enter the group and individual into “Group of data” (e.g. other) and “Individual of data” (e.g. 780). There is also the option to specify whether multiple plates were used. If there is only one group and individual, only the header will show (i.e. group and Indiv), but the data will still pair correctly (see test-data example)

The merged file is based on an 80-well sorted plate (A1-H10). Columns 11 and 12 are not included, based on the experimental setup.
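The well layout and the merge can be sketched in Python as follows. This is a hypothetical illustration that assumes the index data have already been read from the .fcs file into dicts keyed by 'well', that a plate number is prefixed to the well as in the QC naming convention, and that only wells present in both files and within A1-H10 are kept.

```python
ROW_LETTERS = "ABCDEFGH"


def plate_wells(n_cols=10, plate=None):
    """Wells A1-H10 (columns 11 and 12 excluded), optionally prefixed by a plate number."""
    prefix = "" if plate is None else str(plate)
    return [f"{prefix}{r}{c}" for r in ROW_LETTERS for c in range(1, n_cols + 1)]


def merge_index(tcr_rows, index_rows, group, indiv, plate=None):
    """Attach group/individual labels and join TCR and index rows on the well."""
    layout = set(plate_wells(plate=plate))
    tcr_by_well = {row["well"]: row for row in tcr_rows}
    merged = []
    for index_row in index_rows:
        well = index_row["well"]
        if well in layout and well in tcr_by_well:
            merged.append({"group": group, "Indiv": indiv,
                           **tcr_by_well[well], **index_row})
    return merged
```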

Data cleaning steps

Upload the merged index paired TCR data file.

Things to do before uploading the file

Rename headers as desired (e.g. CD69 APC)
Headers can only contain letters or numbers (no special characters)

Recommended columns to select for ab TCR data: Indiv, group, TRBV, CDR3b.Sequence, TRBJ, TRAV, CDR3a.Sequence, TRAJ, AJ, BJ and AJBJ. Do not select fluorochrome columns or cloneCount.

Creating the files

Select the gene column and corresponding CDR3 column (repeat for both chains)
Select the clone-count cut-off (e.g. >1 clone, or 0 for all clones)
Download the file

Note: I would recommend leaving the clonal filter at 0 or 1, then copying these columns in Excel and removing unwanted clones, rather than having to redo this step.
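The two cleaning steps listed in the overview (negative values made small and positive, clone-count cut-off) can be sketched in Python as below. The replacement value `floor=1.0`, treating zero like a negative value, and dropping clones at or below the cut-off (rather than only recolouring them) are all assumptions; `clean_index_data` is a hypothetical helper.

```python
from collections import Counter


def clean_index_data(rows, marker_cols, clone_cols, cutoff=0, floor=1.0):
    """Replace non-positive marker values and filter clones by clone count."""
    keys = [tuple(row[col] for col in clone_cols) for row in rows]
    clone_counts = Counter(keys)
    cleaned = []
    for row, key in zip(rows, keys):
        if clone_counts[key] <= cutoff:
            continue  # cutoff=0 keeps every row; cutoff=1 keeps clones seen more than once
        new_row = dict(row)
        for col in marker_cols:
            value = float(new_row[col])
            new_row[col] = value if value > 0 else floor  # assumption: floor value
        new_row["cloneCount"] = clone_counts[key]
        cleaned.append(new_row)
    return cleaned
```

A positive floor keeps every value plottable on a log-scaled axis.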

The dot plot of selected clones

User defined variables include:

x-axis
y-axis
Adding in a histogram
Column to colour by
Size, location and number of columns for the legend
- If using histograms, place legend below or to the left
Type of colouring scheme: Default, random or grey
- All colours can be altered
x- and y-axis cut-off lines (default = 1000 or 10^3); default colour is grey
Download as either a .png or PDF

Fill in the 'clone_quality' column with lowercase: pass or fail

A high-quality chromatogram will display few mismatches (blue) between the primary and secondary sequences.

Showcasing the heterogeneous sequences in the .ab1 file

Options: number of sequences per row, trim 5′ sequences and trim 3′ sequences. The plot can be downloaded as a PDF (width and height) or PNG (width, height and resolution).

Overlap of the primary and secondary sequences.

Checking for heterozygosity in sequence overlap.

Check the heterogeneity of sequences with a 0.33 ratio cut-off, as recommended by the 'sangerseqR' package
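The ratio test can be illustrated with a simplified Python sketch, loosely modelled on the idea behind the `ratio` argument of sangerseqR's `makeBaseCalls()`: at each called position, if the second-highest signal is at least 0.33 of the highest, the position is flagged and the secondary base is recorded. The input format (one dict of A/C/G/T signal heights per position) is an assumption, and real .ab1 parsing is not shown.

```python
def heterozygous_positions(peaks, ratio=0.33):
    """Return primary and secondary sequences and the positions flagged as mixed."""
    primary, secondary, flagged = [], [], []
    for i, signals in enumerate(peaks):
        # Rank bases by signal height (the stable sort keeps input order on ties)
        ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
        (top_base, top), (second_base, second) = ranked[0], ranked[1]
        primary.append(top_base)
        if top > 0 and second / top >= ratio:
            secondary.append(second_base)
            flagged.append(i)
        else:
            secondary.append(top_base)
    return "".join(primary), "".join(secondary), flagged
```

Positions where the primary and secondary sequences differ correspond to the mismatches highlighted in the overlap plot.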

Converting to TCR_Explore
Video of conversion process

Upload either .tsv, .csv or .txt files to convert to TCR_Explore format

Rows with missing values in the V and J gene columns are removed

If using ImmunoSEQ data, there is an additional filtering step to only keep in-frame sequences

Input options: input type, count column, in-frame column (e.g. sequenceStatus), CDR3 amino acid column, whether a D chain is present, variable gene column, diversity gene column, junction gene column and columns to remove

Merge multiple files: group of data, individual of data, multiple plates and plate #

Conversion to biexponential
UMAP reduction

Select fluorochrome columns for dimension reduction (UMAP)

Lower and upper number of clusters

Average of each fluorochrome per cluster (log10 transformed)

Plot options: type of plot, add histogram, add ellipse to clusters, colour, shape, size and file name prefix

Download as PDF (width and height) or PNG (width, height and resolution)

If you see the error 'not enough space for cells at track index '1'' in the chord diagram, adjust the text size (cex).

The amino acid CDR3 columns are called AA.JUNCTION, JUNCTION..AA. or CDR3_IMGT, with the suffixes _A (alpha), _B (beta), _G (gamma) and _D (delta).
