List of Options of SeqCode Services

NGS Tools

Compare2Peaks

Application to compare two sets of genomic regions and discriminate the common overlapping regions from the specific ones.

1 First set of peaks (BED format)

Tab-separated plain text file that contains one list of peaks. Each peak is described by its chromosome, start position and end position in BED format.

Example of the format of a file with peaks in BED format:
chr   pos1   pos2
chr   pos1   pos2
...
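
As an illustration of the format above, the following minimal Python sketch reads such a file into a list of (chromosome, start, end) tuples. The function name parse_bed is hypothetical and is not part of SeqCode; skipping lines that start with "#", "track" or "browser" is an assumption about possible header lines.

```python
def parse_bed(text):
    """Parse tab-separated BED text into a list of (chrom, start, end) tuples."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and header lines are ignored.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # Only the first three columns are needed; extra columns are ignored.
        peaks.append((fields[0], int(fields[1]), int(fields[2])))
    return peaks


if __name__ == "__main__":
    example = "chr1\t3669878\t3672606\nchr1\t4489349\t4498634\n"
    for chrom, start, end in parse_bed(example):
        print(chrom, start, end)
```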

2 Second set of peaks (BED format)

Tab-separated plain text file that contains one list of peaks. Each peak is described by its chromosome, start position and end position in BED format.

Example of the format of a file with peaks in BED format:
chr   pos1   pos2
chr   pos1   pos2
...

3 Names of each set of peaks

Text labels that will be used to identify the peaks from each set when performing the comparison and building the final lists of overlapping and specific peaks. Spaces and special characters will be replaced by the symbol "_".
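
The substitution described above can be sketched in Python as follows. This is only an illustration: the function name sanitize_label is hypothetical, and the exact set of characters that the web service treats as "special" is not documented here, so the sketch assumes that everything other than letters, digits and underscores is replaced.

```python
import re


def sanitize_label(label):
    """Replace spaces and special characters in a label by "_"."""
    # Assumption: any character outside [A-Za-z0-9_] counts as special.
    return re.sub(r"[^A-Za-z0-9_]", "_", label)


if __name__ == "__main__":
    print(sanitize_label("H3K27me3 (serum)"))
```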

4 Definition of the overlap between two peaks

Minimum size of the overlap between two coinciding peaks for them to be considered a successful match.
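
A minimal Python sketch of how such a threshold could be applied is shown below. It is not the SeqCode implementation: the function names are hypothetical, peaks are assumed to be (chromosome, start, end) tuples with half-open coordinates, and the minimum overlap is assumed to be expressed in positions (at least 1).

```python
def overlap_size(peak_a, peak_b):
    """Number of positions shared by two peaks (0 if they do not overlap)."""
    chrom_a, start_a, end_a = peak_a
    chrom_b, start_b, end_b = peak_b
    if chrom_a != chrom_b:
        return 0
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def split_peaks(set_a, set_b, min_overlap=1):
    """Split the peaks of set_a into those matching set_b and the specific ones."""
    common, specific = [], []
    for peak in set_a:
        # A peak is a match if it shares enough positions with any peak of set_b.
        if any(overlap_size(peak, other) >= min_overlap for other in set_b):
            common.append(peak)
        else:
            specific.append(peak)
    return common, specific


if __name__ == "__main__":
    first = [("chr1", 100, 200), ("chr1", 500, 600)]
    second = [("chr1", 150, 300)]
    print(split_peaks(first, second, min_overlap=50))
```

The quadratic comparison is chosen for readability only; sorted or indexed intervals would be preferable for large peak sets.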

5 Graphical parameters of each set

Definition of graphical parameters to customize the Venn diagram between both sets of elements.

- Background color is used to fill the circles inside.
- Circle line color is used for the line around each circle.

Colors must be defined in the R software as shown in the following table.

- Transparency degree: alpha channel value related to the opacity in overlapping areas.
- Extract the number: this option is useful when there is not enough space to put a number inside an area of the diagram.
- Distance of the set name to circle: this option is useful to increase the space between the label of the set and the corresponding circle.

6 Global parameters of the Venn diagram

Definition of graphical parameters to customize the Venn diagram global appearance.

- Font size of gene names is useful to change the size of labels outside the diagram.
- Font size of numbers is useful to change the size of numbers inside the diagram.
- Line width of circles can be used to adapt the thickness of the Venn diagram circles.
- Figure size (small/normal): this option adapts the margins of the page and generates smaller plots (if necessary).
- Extract the common number allows the user to extract the value in the intersection of the diagram.
- Distance of external numbers, together with the previous option, changes the distance between the extracted numbers and the diagram.
- Proportional sizes: this option allows the users to generate proportional sets in the final diagram according to their sizes.
- Font family (helvetica/times) defines the family of the font (sans-serif or serif).

ComputeChIPlevels

Application to determine the amount of normalized reads of a sequencing experiment within a set of genomic regions.

1 Catalog of ChIPseq available experiments

List of ChIPseq experiments that are available in our web site. Samples are classified into several main groups:

- Mouse embryonic stem cells (serum)
- Mouse embryonic stem cells (2i+LIF)
- Mouse HPC7 cells (hematopoietic precursor)
- Human K562 cells (chronic myelogenous leukemia)
- Human DU145 cells (prostate cancer)
- Drosophila wing imaginal discs (L3)

Users will choose one sample from the list to plot the trend exhibited by the subsets of genes provided at the same time. Optionally, it is possible to select a control experiment from the second list (e.g. Input or IgG samples) to display the background level in the resulting image.

2 Number of subsets of genes to be included

Number of genesets that will be employed in the final graphical representation. For each list of genes provided by the user, the application will generate the resulting profile using the same sequencing experiment and the whole collection of profiles will be gathered into the same picture.

3 Subsets of target genes

Plain text file that contains a list of elements for further characterization. Only one element is stored on each line. The names of the elements are used to index another file of information in which the features of all elements have been previously uploaded as well. The lines of the full set of values that correspond to each key here will be graphically represented afterwards.

Example of the format of a file with the elements that form a subset:
element1
element2
element3
...

4 Captions and titles

Text labels that will be used to characterize the resulting boxplot of ChIPseq levels. Spaces and special characters will be replaced by the symbol "_".

5 Graphical parameters of Boxplots

Definition of graphical parameters to customize the global appearance of the resulting boxplot of ChIPseq levels.

- Color palette is the combination of colors that is used for the boxes.
- Color style for boxes can be used to determine whether the boxes are filled in a solid style, drawn using colors for lines, or shown in black and white.
- Log scale converts the distributions of values submitted by the user into log scale distributions.
- Violin plots or violin plots with boxplots inside can be included in the final image instead of canonical boxplots.
- Line width can be used to adapt the thickness of the boxes of the boxplot.
- Outliers of each distribution can be included into the final boxplot.
- Each individual observation can be plotted as a point (optional).
- The size of the individual observations can be customized (see above for observations).
- The value on the X axis and the Y axis for the labels can be changed depending on the final boxplot.

MACSAnnotator

Application to attach putative target genes to the set of peaks called by MACS in the resulting XLS file.

1 List of peaks (MACS format)

Tab-separated plain text file generated by MACS that contains one list of peaks. Each peak is described by its location plus a series of attributes and quality scores.

Example of the format of a file with peaks in MACS format:
chr    start     end       length   summit   tags   -10*log10(pvalue)   fold_enrichment   FDR(%)
chr1   3669878   3672606   2729     1433     123    193.84              6.05              0.00
chr1   4489349   4498634   9286     2549     566    169.78              3.69              0.00
chr1   4570559   4572646   2088     1400     100    73.15               2.79              1.02
...
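
To make the column layout above concrete, the following Python sketch parses one data line of such a file. The class and function names are hypothetical and not part of SeqCode or MACS; the only conversion applied is the one implied by the column name -10*log10(pvalue), that is, pvalue = 10^(-score/10).

```python
from typing import NamedTuple


class MacsPeak(NamedTuple):
    chrom: str
    start: int
    end: int
    length: int
    summit: int
    tags: int
    score: float            # -10*log10(pvalue)
    fold_enrichment: float
    fdr_percent: float


def parse_macs_line(line):
    """Parse one whitespace/tab-separated data line of a MACS peak file."""
    f = line.split()
    return MacsPeak(f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                    int(f[5]), float(f[6]), float(f[7]), float(f[8]))


def pvalue_from_score(score):
    """Invert the -10*log10(pvalue) transformation."""
    return 10 ** (-score / 10)


if __name__ == "__main__":
    peak = parse_macs_line("chr1\t4570559\t4572646\t2088\t1400\t100\t73.15\t2.79\t1.02")
    print(peak.chrom, peak.start, peak.end, pvalue_from_score(peak.score))
```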

2 Genome assembly

To match ChIPseq peaks or regions of other classes to genes, users must select the appropriate catalog of RefSeq transcripts. In the current version of SeqCodeWEB, there are five available collections of genes:

- Mouse (mm10)
- Mouse (mm9)
- Human (hg38)
- Human (hg19)
- Drosophila (dm3)

Users can run the stand-alone version of SeqCode to perform the same operation on any RefSeq catalog of transcripts.

3 Name of the experiment

Text label to characterize the peaks when matching them to genes in the output files. Spaces and special characters will be replaced by the symbol "_".

PeakAnnotator

Application to determine the frequency of a set of peaks in each class of gene features (exon, intron, ...).

1 List of peaks (BED format)

Tab-separated plain text file that contains one list of peaks. Each peak is described by its chromosome, start position and end position in BED format.

Example of the format of a file with peaks in BED format:
chr   pos1   pos2
chr   pos1   pos2
...

2 Genome assembly

To match ChIPseq peaks or regions of other classes to genes, users must select the appropriate catalog of RefSeq transcripts. In the current version of SeqCodeWEB, there are five available collections of genes:

- Mouse (mm10)
- Mouse (mm9)
- Human (hg38)
- Human (hg19)
- Drosophila (dm3)

Users can run the stand-alone version of SeqCode to perform the same operation on any RefSeq catalog of transcripts.

3 Name of the experiment

Text label to characterize the peaks when matching them to genes in the output files. Spaces and special characters will be replaced by the symbol "_".

4 Rules of association between genes and peaks

Users can define the region of the RefSeq transcripts that is used to calculate the overlap with the ChIPseq peaks. There are three classes of definitions:

- From a position upstream of the TSS to the TSS (excluding the gene body) or to the TES (including the gene body)
- From the TES (excluding the gene body) or from the TSS (including the gene body) to a position downstream of the TES
- A region around the TSS (indicating the number of positions upstream and downstream of the TSS)

In all these cases, it is necessary to set the length of the region that will be scanned when searching for overlaps between peaks and transcripts.
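
The third class of rule (a region around the TSS) can be sketched in Python as follows. This is an illustration under stated assumptions, not the SeqCode implementation: the function name tss_window is hypothetical, "upstream" and "downstream" are assumed to be interpreted relative to the strand of the transcript, and coordinates are clamped at 0.

```python
def tss_window(tss, strand, upstream, downstream):
    """Return (start, end) of the region scanned around a TSS."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the reverse strand, upstream lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(0, start), end


if __name__ == "__main__":
    print(tss_window(10000, "+", 2500, 500))
    print(tss_window(10000, "-", 2500, 500))
```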

5 Graphical parameters of the pie charts

Definition of graphical parameters to customize the global appearance of the resulting piecharts of genome annotations.

- High/low detail is useful to switch between two lists of genomic features to characterize the peaks:

(Detailed)
* Distal promoters: the region between 2.5 Kb and 0.5 Kb upstream of the TSS of genes
* Proximal promoters: the region between the TSS and 0.5 Kb upstream of the TSS of genes
* 5'UTR and 3'UTR: untranslated regions upstream/downstream of transcripts (UTR exons)
* CDS: protein coding sequence part of transcripts (coding exons)
* Introns: spliced part of the genes that is not included in the transcripts
* Intergenic regions: genomic regions that do not belong to any of the previous classes

(Simple)
* Promoter region: the region between 2.5 Kb upstream of the TSS and the TSS
* Intragenic: exons and introns of genes
* Intergenic: genomic regions that are not classified as promoters or intragenic


Depending on their size, peaks can overlap with more than one class of genomic region. Thus, the total number of peaks shown in the title of the pie chart will reflect these cases (this value will be equal to or higher than the actual number of peaks provided by the user).
The corresponding spie chart, which represents how significant the results are on the particular genome, will be generated only when the High detail option is active. The circular grid can be optionally hidden.

- The color to depict each class of genomic region in the pie charts can be selected by the user
Colors must be defined in the R software coloring scheme as shown in the following table.

ProduceGENEplots

Application to generate the aggregated meta-gene plot of a ChIPseq experiment for a list of genes.

1 Catalog of ChIPseq available experiments

List of ChIPseq experiments that are available in our web site. Samples are classified into several main groups:

- Mouse embryonic stem cells (serum)
- Mouse embryonic stem cells (2i+LIF)
- Mouse HPC7 cells (hematopoietic precursor)
- Human K562 cells (chronic myelogenous leukemia)
- Human DU145 cells (prostate cancer)
- Drosophila wing imaginal discs (L3)

Users will choose one sample from the list to plot the trend exhibited by the subsets of genes provided at the same time. Optionally, it is possible to select a control experiment from the second list (e.g. Input or IgG samples) to display the background level in the resulting image.

2 Number of subsets of genes to be included

Number of genesets that will be employed in the final graphical representation. For each list of genes provided by the user, the application will generate the resulting profile using the same sequencing experiment and the whole collection of profiles will be gathered into the same picture.

3 Subsets of target genes

Plain text file that contains a list of elements for further characterization. Only one element is stored on each line. The names of the elements are used to index another file of information in which the features of all elements have been previously uploaded as well. The lines of the full set of values that correspond to each key here will be graphically represented afterwards.

Example of the format of a file with the elements that form a subset:
element1
element2
element3
...

4 Graphical parameters of each list of genes

Definition of graphical parameters to customize the appearance of the multiple lists of genes in the resulting metaplot.

For each list of genes, users can define the following values:
- The color to represent the list of genes in the metaplot
- The style of the line: solid, dashed or dotted
- The width or thickness of the line: from 1 to 8
- The level of transparency when overlapping other lines

Colors must be defined in the R software coloring scheme as shown in the following table.

5 General graphical parameters

Definition of graphical parameters to customize the global appearance of the resulting metaplot.

Users can change the values of the following parameters:
- The background and foreground colors of the metaplot
- The font size of the legend
- The style of the line depicting the TSS/TES of genes
- The width of the line depicting the TSS/TES of genes
- The color of the line depicting the TSS/TES of genes
- The minimum and maximum value of the Y axis to crop the image

Colors must be defined in the R software coloring scheme as shown in the following table.

ProduceTSSmaps

Application to generate the heatmap of ChIPseq signal intensities of a selected experiment for a list of genes.

1 List of genes

Plain text file that contains a list of elements for further characterization. Only one element is stored on each line. The names of the elements are used to index another file of information in which the features of all elements have been previously uploaded as well. The lines of the full set of values that correspond to each key here will be graphically represented afterwards.

Example of the format of a file with the elements that form a subset:
element1
element2
element3
...

2 Number of ChIPseq experiments to be included

Number of ChIPseq experiments that will be employed in the final graphical representation. For each sample selected by the user, the application will generate the resulting heat map using the same set of genes provided before and the whole collection of maps will be gathered into the same picture.

3 Catalog of ChIPseq available experiments

List of ChIPseq experiments that are available in our web site. Samples are classified into several main groups:

- Mouse embryonic stem cells (serum)
- Mouse embryonic stem cells (2i+LIF)
- Mouse HPC7 cells (hematopoietic precursor)
- Human K562 cells (chronic myelogenous leukemia)
- Human DU145 cells (prostate cancer)
- Drosophila wing imaginal discs (L3)

Users can choose up to five different samples from the list to plot the heat maps corresponding to the same set of genes provided at this moment. It is possible to select a control experiment from the same list (e.g. Input or IgG samples) to display the background level in the resulting heat map.

4 Captions and titles

Text labels that will be used to characterize the resulting heatmap (title). Spaces and special characters will be replaced by the symbol "_".

5 Graphical parameters of each heatmap

Definition of graphical parameters to customize the appearance of the multiple heat maps in the resulting image.

For each ChIPseq sample, users can define the following colors:
- Foreground color to represent the presence of signal at each gene
- Background color to represent the absence of signal at each gene

Colors must be defined in the R software coloring scheme as shown in the following table.

6 General graphical parameters

Definition of graphical parameters to customize the appearance of the image containing the full set of heat maps.

Users can change the values of the following parameters:
- The background and foreground colors of the multiple heatmap
- Uniform heat map: the regions associated with each gene in the map display only presence (foreground color) or absence (background color) of signal. In other words, a binary heat map is generated, without a gradient of colors proportional to the strength of the ChIP signal
- Normalization of all the heat maps using the same value, to ease the comparison among them

Colors must be defined in the R software coloring scheme as shown in the following table.

ProduceTSSplots

Application to generate the aggregated plot of a ChIPseq experiment around the TSS of a list of genes.

1 Catalog of ChIPseq available experiments

List of ChIPseq experiments that are available in our web site. Samples are classified into several main groups:

- Mouse embryonic stem cells (serum)
- Mouse embryonic stem cells (2i+LIF)
- Mouse HPC7 cells (hematopoietic precursor)
- Human K562 cells (chronic myelogenous leukemia)
- Human DU145 cells (prostate cancer)
- Drosophila wing imaginal discs (L3)

Users will choose one sample from the list to plot the trend exhibited by the subsets of genes provided at the same time. Optionally, it is possible to select a control experiment from the second list (e.g. Input or IgG samples) to display the background level in the resulting image.

2 Number of subsets of genes to be included

Number of genesets that will be employed in the final graphical representation. For each list of genes provided by the user, the application will generate the resulting profile using the same sequencing experiment and the whole collection of profiles will be gathered into the same picture.

3 Subsets of target genes

Plain text file that contains a list of elements for further characterization. Only one element is stored on each line. The names of the elements are used to index another file of information in which the features of all elements have been previously uploaded as well. The lines of the full set of values that correspond to each key here will be graphically represented afterwards.

Example of the format of a file with the elements that form a subset:
element1
element2
element3
...

4 Graphical parameters of each list of genes

Definition of graphical parameters to customize the appearance of the multiple lists of genes in the resulting metaplot.

For each list of genes, users can define the following values:
- The color to represent the list of genes in the metaplot
- The style of the line: solid, dashed or dotted
- The width or thickness of the line: from 1 to 8
- The level of transparency when overlapping other lines

Colors must be defined in the R software coloring scheme as shown in the following table.

5 General graphical parameters

Definition of graphical parameters to customize the global appearance of the resulting metaplot.

Users can change the values of the following parameters:
- The background and foreground colors of the metaplot
- The font size of the legend
- The style of the line depicting the TSS/TES of genes
- The width of the line depicting the TSS/TES of genes
- The color of the line depicting the TSS/TES of genes
- The minimum and maximum value of the Y axis to crop the image

Colors must be defined in the R software coloring scheme as shown in the following table.

Data Sets

BarPlotter

Application to generate barplots of term enrichment analysis introduced manually or generated by Enrichr.

1 Set the working mode

Users can generate barplots from a list of terms provided directly through the web form or by uploading an Enrichr file of results. Depending on the selected mode, users must provide the information in the specific box of parameters below: (1) Manual mode requires the user to input the name, the score and the percentage of genes belonging to each class in comparison to the whole genome; (2) Enrichr file mode requires the user to upload the file of scores as provided by Enrichr for a particular ontology and gene set (Table option).

2 Choose the number of enrichment terms

Number of terms of the ontology enrichment analysis to be included in the resulting barplot. For the Manual mode, the first N terms in the order generated by sorting all terms using the score provided by the user will be selected. For the Enrichr mode, all terms in the file will be ranked by the scoring attribute selected by the user and the TOP N in the current ranking will be selected for the barplot.

3 Names of each term and numerical values

(Manual mode only) For each term to be included in the output barplot, users must provide its name, a positive numerical score and a percentage value. For scores, P values must be converted into positive numbers (e.g. -log P value) by the user before being provided here. Percentages correspond to the fraction of the genes of a particular class that belong to this term in comparison to the total number of genes of the same class annotated in the whole genome. Percentages must be provided without the % symbol.

For example, this is a valid format for a term in this section:
MY TERM1   300   50
...
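
The following Python sketch illustrates how a line in this format could be read and how a P value could be converted into a score beforehand. It is only an illustration: the function names are hypothetical, the logarithm base 10 is an assumption (the text above only says "-log P value"), and the term name is allowed to contain spaces by splitting the two numeric fields from the right.

```python
import math


def neg_log10(p_value):
    """Convert a P value into a positive score (assumption: base-10 logarithm)."""
    return -math.log10(p_value)


def parse_term_line(line):
    """Split a manual-mode line into (name, score, percentage)."""
    # The last two whitespace-separated fields are numeric; the rest is the name.
    name, score, percentage = line.strip().rsplit(None, 2)
    return name, float(score), float(percentage)


if __name__ == "__main__":
    print(parse_term_line("MY TERM1   300   50"))
    print(neg_log10(0.001))
```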

4 Upload the Enrichr file

Users must upload the Enrichr file with the analysis of a given ontology over a given gene set. By clicking the "Export entries to table" function in Enrichr for a particular ontology term enrichment analysis, users can download this file, which comprises the full set of terms and scores as calculated by Enrichr.

5 Choose Enrichr score

Users will choose the Enrichr scoring attribute that will be used to rank the whole set of terms from the input file. The top N terms according to this attribute will be extracted to be displayed in the final barplot. Three different scoring schemes are available: P value, adjusted P value and combined score (see Enrichr documentation for further details).
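
The ranking step can be sketched in Python as follows. This is a minimal illustration, not the code of the web service: the function name top_terms and the dictionary keys are hypothetical. It relies on the fact that smaller P values (raw or adjusted) indicate stronger enrichment, whereas a larger combined score does.

```python
def top_terms(terms, attribute, n):
    """Return the n best terms according to the chosen scoring attribute."""
    # P values are ranked in ascending order, the combined score in descending order.
    descending = attribute == "combined_score"
    ranked = sorted(terms, key=lambda term: term[attribute], reverse=descending)
    return ranked[:n]


if __name__ == "__main__":
    terms = [
        {"name": "A", "p_value": 0.01, "adjusted_p_value": 0.05, "combined_score": 40.0},
        {"name": "B", "p_value": 0.001, "adjusted_p_value": 0.02, "combined_score": 30.0},
        {"name": "C", "p_value": 0.2, "adjusted_p_value": 0.4, "combined_score": 3.0},
    ]
    print([t["name"] for t in top_terms(terms, "p_value", 2)])
    print([t["name"] for t in top_terms(terms, "combined_score", 2)])
```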

6 Captions and titles

Text labels that will be used to characterize the resulting barplot (title, X and Y axis). Spaces and special characters will be replaced by the symbol "_".

7 Graphical parameters

Definition of graphical parameters to customize the global appearance of the resulting barplots.

- The color of bars in the barplot.
- The color of borders of the bars in the barplot.
- Showing the term names on the left or inside the bars (hidden is also possible).
- Showing the percentages of members of each class inside the bars or not.
- Drawing barplots or bubbleplots in which the size of the circles is proportional to the percentage of each term.
- A grid in grey or black and white style can be integrated as a background of the barplot.
- Set the size of fonts for terms, axes, and title.
- Line width can be used to adapt the thickness of the bars of the barplot.
- The background and foreground colors of the barplot.

BoxPlotter

Application to generate the boxplot of the distribution of multiple values and perform statistical testing for a list of genes.

1 Full list of elements and values

Tab-separated plain text file that contains the list of elements and features. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Commas (if any) are internally replaced by decimal points. Not available values (NA) are included in the treatment.

Example of the format of a file with elements characterized by two features:
element1   value1   value2
element2   value1   value2
element3   value1   value2
...
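
As an illustration of this format, the following Python sketch loads such a file into a dictionary indexed by the key in column 1. The function names are hypothetical; decimal commas are converted as described above, and representing NA as None is an assumption made for this sketch only.

```python
def parse_value(text):
    """Convert one cell into a float, accepting decimal commas and NA."""
    text = text.strip()
    if text == "NA":
        return None  # Assumption: NA is kept as a missing value.
    return float(text.replace(",", "."))


def load_table(text):
    """Map each key (column 1) to its list of values (columns 2..N)."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, *values = line.split("\t")
        table[key] = [parse_value(v) for v in values]
    return table


if __name__ == "__main__":
    print(load_table("element1\t1,5\t2.0\nelement2\tNA\t3\n"))
```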

2 Subset of elements to be studied

Plain text file that contains a list of elements for further characterization. Only one element is stored on each line. The names of the elements are used to index another file of information in which the features of all elements have been previously uploaded as well. The lines of the full set of values that correspond to each key here will be graphically represented afterwards.

Example of the format of a file with the elements that form a subset:
element1
element2
element3
...
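
The indexing step described above can be sketched in Python as follows. The function name select_subset is hypothetical, the full set of values is assumed to be already available as a dictionary keyed by element name, and reporting the names that are absent from the full set is an addition of this sketch.

```python
def select_subset(table, subset_names):
    """Return the rows of the full table whose keys appear in the subset."""
    found = {name: table[name] for name in subset_names if name in table}
    # Names without a matching row in the full table are reported separately.
    missing = [name for name in subset_names if name not in table]
    return found, missing


if __name__ == "__main__":
    full = {"element1": [1.0, 2.0], "element2": [3.0, 4.0], "element3": [5.0, 6.0]}
    print(select_subset(full, ["element1", "element3", "element9"]))
```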

3 Select the features, the input names and the color

Selection of the columns of the file of features uploaded before for the full set of elements. For those items included in the subset, a graphical representation will be generated from their characteristics. Users must label each column with a description that will be inserted in the picture. Spaces will be replaced by the symbol "_". One color defined in the R software must be assigned to each category as well.

Colors must be defined in the R software as shown in the following table.

4 Captions and titles

Text labels that will be used to characterize the resulting boxplot (title and Y axis). Spaces and special characters will be replaced by the symbol "_".

5 Graphical parameters of Boxplots

Definition of graphical parameters to customize the global appearance of the resulting boxplots.

- Font family (helvetica/times) defines the family of the font (sans-serif or serif).
- Font size of the title is useful to change the size of the title above the boxplot.
- Font size of the axis Y is useful to change the size of values on this axis of the boxplot.
- Font size of the labels is useful to change the size of labels of the boxplot.
- Angle of the labels can be used to rotate (0/45/90 degrees) the labels of sets in the boxplot.
- Position of the labels allows the user to set the location of the X axis labels on a defined position.
- Line width can be used to adapt the thickness of the boxes of the boxplot.
- Show outliers must be active to include the outliers of the distributions into the boxplot.
- Log scale converts the distributions of values submitted by the user into log scale distributions.

6 Graphical parameters of Histograms

Definition of graphical parameters to customize the graphical appearance of the complementary histograms.

- Font size of the title is useful to change the size of the title above the boxplot.
- Font size of the axis is useful to change the size of values on the axes of the boxplot.
- Font family (helvetica/times) defines the family of the font (sans-serif or serif).
- Line width can be used to adapt the thickness of the boxes of the boxplot.

BoxPlotter2

Application to generate the boxplot of the distribution of one value and perform statistical testing for multiple lists of genes.

1 Full dataset (element,value)

Tab-separated plain text file that contains the list of elements and features. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Commas (if any) are internally replaced by decimal points. Not available values (NA) are included in the treatment.

Example of the format of a file with elements characterized by two features:
element1   value1   value2
element2   value1   value2
element3   value1   value2
...
The user will select in the next step which column must be used to generate the boxplot.

2 Select column of data file

Selection of one column of the file of features uploaded in the first step for the full set of elements. For the subsets of genes that will be included in the third step, a graphical representation will be generated from the value of this column.

3 Subsets of target genes

Plain text file that contains a list of elements for further characterization. Only one element is stored on each line. The names of the elements are used to index another file of information in which the features of all elements have been previously uploaded as well. The lines of the full set of values that correspond to each key here will be graphically represented afterwards.

Example of the format of a file with the elements that form a subset:
element1
element2
element3
...
Up to five different subsets can be incorporated into the same boxplot.
Users will provide a name for each subset that will be used to characterize them in the resulting plot.

4 Captions and titles

Text labels that will be used to characterize the resulting boxplot (title and Y axis). Spaces and special characters will be replaced by the symbol "_".

5 Graphical parameters of Boxplots

Definition of graphical parameters to customize the global appearance of the resulting boxplots.

- Font family (helvetica/times) defines the family of the font (sans-serif or serif).
- Font size of the title is useful to change the size of the title above the boxplot.
- Font size of the axis Y is useful to change the size of values on this axis of the boxplot.
- Font size of the labels is useful to change the size of labels of the boxplot.
- Angle of the labels can be used to rotate (0/45/90 degrees) the labels of sets in the boxplot.
- Position of the labels allows the user to set the location of the X axis labels on a defined position.
- Line width can be used to adapt the thickness of the boxes of the boxplot.
- Show outliers must be active to include the outliers of the distributions into the boxplot.
- Log scale converts the distributions of values submitted by the user into log scale distributions.

One color will be assigned to each box representing a subset in the boxplot.
Colors must be defined in the R software as shown in the following table.

6 Graphical parameters of Histograms

Definition of graphical parameters to customize the graphical appearance of the complementary histograms.

- Font size of the title is useful to change the size of the title above the boxplot.
- Font size of the axis is useful to change the size of values on the axes of the boxplot.
- Font family (helvetica/times) defines the family of the font (sans-serif or serif).
- Line width can be used to adapt the thickness of the boxes of the boxplot.

BoxPlotter3

Application to generate the boxplot of the distribution of multiple values for multiple lists of genes allowing a wide range of graphical options.

1 Full list of elements and values

Tab-separated plain text file that contains the list of elements and features. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Commas (if any) are internally replaced by decimal points. Not available values (NA) are included in the treatment.

Example of the format of a file with elements characterized by two features:
element1   value1   value2
element2   value1   value2
element3   value1   value2
...

2 Subset of elements to be studied

Plain text file that contains a list of elements for further characterization. Only one element is stored on each line. The names of the elements are used to index another file of information in which the features of all elements have been previously uploaded as well. The lines of the full set of values that correspond to each key here will be graphically represented afterwards.

Example of the format of a file with the elements that form a subset:
element1
element2
element3
...

3 Select the features and input names

Selection of the columns of the file of features uploaded in the first step for the full set of elements. For those items included in the subset during the second step, a graphical representation will be generated from their characteristics. Users must label each column with a description that will be inserted in the picture. Spaces will be replaced by the symbol "_".

4 Captions and titles

Text labels that will be used to characterize the resulting boxplot (title and both axes). Spaces and special characters will be replaced by the symbol "_".

5 Graphical parameters

Definition of graphical parameters to customize the global appearance of the resulting boxplots.

- Color palette is the combination of colors that is used for the boxes.
- Color style for boxes can be used to determine whether the boxes are filled in a solid style, drawn using colors for lines, or shown in black and white.
- Log scale converts the distributions of values submitted by the user into log scale distributions.
- Pseudocount value is added to each value of the distributions to avoid the calculation of log 0.
- Violin plots or violin plots with boxplots inside can be included in the final image instead of canonical boxplots.
- A grid in grey or black and white style can be integrated as a background of the boxplot.
- Line width can be used to adapt the thickness of the boxes of the boxplot.
- Outliers of each distribution can be included into the final boxplot.
- Each individual observation can be plotted as a point (optional).
- The size of the individual observations can be customized (see above for observations).
- The value on the X axis and the Y axis for the labels can be changed depending on the final boxplot.
- The color of lines in the line plot of each individual
- The graphical appearance of lines in the line plot of each individual
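
The role of the pseudocount in the log scale option can be illustrated with the following Python sketch. The base of the logarithm (2) and the default pseudocount (1.0) are assumptions made for the example, not documented values of the application.

```python
import math


def log_transform(values, pseudocount=1.0):
    """Convert values into log2 scale, adding a pseudocount first."""
    # Without the pseudocount, a value of 0 would raise an error (log 0).
    return [math.log2(value + pseudocount) for value in values]


transformed = log_transform([0, 1, 3])
```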

HeatMapper

Application to generate the heatmap for a list of genes and values from multiple conditions.

1 Full set of records (element,value1,...,valueN)

Tab-separated plain text file that contains the list of elements and features. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Decimal commas (if any) are internally replaced with decimal points. Not available values (NA) are included in the treatment.

Example about the format on a file with elements characterized using two features:
element1   value1   value2
element2   value1   value2
element3   value1   value2
...
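
As an illustration of this input format, the following Python sketch parses one record, replacing decimal commas with points and keeping NA values. Representing NA as None is an assumption of the example, and the names parse_value and parse_record are hypothetical.

```python
def parse_value(text):
    """Convert one attribute into a number; NA is kept as None."""
    text = text.strip()
    if text == "NA":
        return None
    # Decimal commas are replaced with decimal points before conversion.
    return float(text.replace(",", "."))


def parse_record(line):
    """Split a tab-separated line into its key and its list of values."""
    key, *values = line.rstrip("\n").split("\t")
    return key, [parse_value(value) for value in values]


record = parse_record("geneA\t1,5\tNA\t2.25")
```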

2 Number of conditions/experiments (columns)

Number of values (starting from the second column) that will be graphically represented for each gene in the heat map.

3 Captions and titles

Text labels that will be used to represent each condition in the gene heat map and to characterize the resulting image (main title and legend features). Spaces and special characters will be substituted for the symbol "_".

4 Graphical parameters

Definition of graphical parameters to customize the global appearance of the resulting gene heat map.

- Color palette is the combination of colors that is used for the boxes.
- Log scale converts the distributions of values submitted by the user into log scale distributions.
- Defining a max value allows for a direct color normalization of the gene heat map.
- It is possible to perform hierarchical clustering on the genes and/or the conditions (dendrograms are optional).
- A grid can be superimposed on the final image and its color is configurable.
- It is possible to show or hide the names of genes and/or conditions (depending on the number of elements, hiding them is recommended).
- Optionally, the actual value assigned to each gene and condition can also be displayed inside each cell.
- Showing the legend is useful to include the distribution of values in the upper corner of the gene heat map.

MAplotter

Application to generate MA plots from differential gene expression analysis using replicates.

1 Full list of elements and features

Tab-separated plain text file that contains the statistics of each gene. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Decimal commas (if any) are internally replaced with decimal points. There is a mandatory header line (in which the gene descriptor is omitted), usually generated by the DESeq2 software. However, users can create their own tabular file containing this information as long as the field descriptors of the key columns coincide with the ones defined elsewhere in the web application.

Example about the output format of the DESeq2 analysis in R:
baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
0610005C13Rik	6.70	-1.09	0.53	-2.05	0.03	0.11
...
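
Because the gene descriptor is omitted from the header, each data line has one more column than the header line. The following Python sketch shows one way to read such a table; the function names are hypothetical and the handling of NA values as None is an assumption.

```python
def to_number(text):
    """Convert a field into a float; NA is kept as None."""
    text = text.strip()
    return None if text == "NA" else float(text.replace(",", "."))


def read_deseq2_table(lines):
    """Map each gene to a dictionary of its statistics, keyed by header field."""
    header = lines[0].rstrip("\n").split("\t")
    table = {}
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        # Data lines carry the gene name plus one value per header field.
        if len(fields) != len(header) + 1:
            continue
        table[fields[0]] = dict(zip(header, map(to_number, fields[1:])))
    return table


lines = [
    "baseMean\tlog2FoldChange\tlfcSE\tstat\tpvalue\tpadj",
    "0610005C13Rik\t6.70\t-1.09\t0.53\t-2.05\t0.03\t0.11",
]
table = read_deseq2_table(lines)
```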

2 Set the working mode

Users can provide tabular files directly generated by the DESeq2 R library or independently created. In all cases, an input file must be tabular (tab separated) and include a header line with the information about the attributes to be used for the plot generation.

3 Choose the number of genes to assign labels

Number of genes for which the user intends to highlight their name using labels overlapping their points in the MA plot.

4 List of elements to be highlighted

Names of the genes for which a label containing their name will be included in the plot. The number of genes from these boxes that will be effectively added depends on the previous parameter.

5 Names of the fields for X and Y axis

When users provide their own custom tabular file, its header line must contain the field descriptors indicated here, which will be employed to find the values to build the MA plot.

6 Captions and titles

Text labels that will be used as the title and axis labels in the resulting MA plot. Spaces and special characters will be substituted for the symbol "_".

7 Graphical parameters

Definition of graphical parameters to customize the graphical appearance of the resulting MA plot:

- FC cutoff to define areas in the MA plot.
- FDR cutoff to define areas in the MA plot.
- Size of the points (genes) shown in the plot.
- Alpha transparency coefficient for points.
- Size of the labels (if any) for genes in the plot.
- Colors for each of the three gene sets defined by the FC and FDR cutoffs.
Colors must be defined in the R software as shown in the following table.

- Label style (bold/italics/boxed) if any.
- Number of labels to be shown in the MA plot.
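
How the FC and FDR cutoffs can split the genes into three sets is sketched below in Python. The example assumes that the three sets are up-regulated, down-regulated and not significant genes, and that the FC cutoff is applied to the absolute log2 fold-change; both points are assumptions, as are the default cutoff values.

```python
def classify_gene(log2_fc, padj, fc_cutoff=1.0, fdr_cutoff=0.05):
    """Assign a gene to one of three sets using the FC and FDR cutoffs."""
    # Genes without an adjusted p-value (NA) are treated as not significant.
    if padj is None or padj >= fdr_cutoff or abs(log2_fc) < fc_cutoff:
        return "not_significant"
    return "up" if log2_fc > 0 else "down"


classes = [classify_gene(2.0, 0.01), classify_gene(-1.5, 0.001), classify_gene(2.0, 0.2)]
```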

PCAplotter

Application to perform the PCA analysis for a list of genes and values from multiple conditions.

1 Full list of elements and features

Tab-separated plain text file that contains the list of elements and features. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Decimal commas (if any) are internally replaced with decimal points. Not available values (NA) are included in the treatment.

Example about the format on a file with elements characterized using two features:
element1   value1   value2
element2   value1   value2
element3   value1   value2
...

2 Select the features to associate names and groups

From the list of experiments (values) provided above, the user will select several of them (up to nine) in order to perform the PCA analysis.

Once one column (condition) is included, it is necessary to use a text label for the identification and indicate the group of this feature. The group is useful in the final PCA plot to show in the same color those features in the space belonging to the same class.

3 Captions and titles

Text label that will be used as a title in the resulting PCA plot. Spaces and special characters will be substituted for the symbol "_".

4 Graphical parameters

Definition of graphical parameters to customize the graphical appearance of the resulting PCA plot:

- One label and its color will be assigned to each group of conditions to highlight them in the bidimensional space plot.
Colors must be defined in the R software as shown in the following table.

- Log scale converts the distributions of values submitted by the user into log scale distributions.
- ncRNAs can be filtered out from the initial list of elements to help improve the PCA analysis.

Scatterplotter

Application to draw the scatterplot of points for a set of genes using two distributions of values.

1 Full set of records (element,value1,value2)

Tab-separated plain text file that contains the list of elements and features. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Decimal commas (if any) are internally replaced with decimal points. Not available values (NA) are included in the treatment.

Example about the format on a file with elements characterized using two features:
element1   value1   value2
element2   value1   value2
element3   value1   value2
...

2 Highlight one subset of points [optional]

Tab-separated plain text file that contains a subset of the main list of elements including the values of features that contribute to their characterization. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Decimal commas (if any) are internally replaced with decimal points. Not available values (NA) are included in the treatment.

Example about the format on a file with elements characterized using two features:
element1   value1   value2
element2   value1   value2
element3   value1   value2
...
These elements will be included in the final plot, superimposed on the points generated from the whole list of values and highlighted using a distinct color. Up to two different subsets can be integrated into the scatterplot.

3 Highlight another subset of points [optional]

Tab-separated plain text file that contains a subset of the main list of elements including the values of features that contribute to their characterization. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Decimal commas (if any) are internally replaced with decimal points. Not available values (NA) are included in the treatment.

Example about the format on a file with elements characterized using two features:
element1   value1   value2
element2   value1   value2
element3   value1   value2
...
These elements will be included in the final plot, superimposed on the points generated from the whole list of values and highlighted using a distinct color. Up to two different subsets can be integrated into the scatterplot.

4 Captions and titles

Text labels that will be used to provide the title and the names of the two conditions being compared in the final scatterplot. Spaces and special characters will be substituted for the symbol "_".

5 Graphical parameters

Definition of graphical parameters to customize the global appearance of the resulting scatterplot.

- Log scale converts the distributions of values submitted by the user into log scale distributions.
- Null lines that contain 0 in both conditions can be excluded from the scatterplot.
- Color palette is the combination of colors that is used for the points and the background.
- Both optional subsets of points can be customized with a distinct color.
- Several classes of guidelines can be included (diagonal y=x, lines to denote several fold-change conditions).
- The regression line can be integrated into the picture (optional), customizing the color and the thickness.
- Show lowest density area points is useful to highlight the areas with fewer points (potential outliers).
- Binarization values are useful to adjust the balance between image smoothness and size of the scatterplot.
- It is possible to crop the image by defining particular values for X and Y.
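
The exclusion of null lines mentioned in the list above can be sketched in Python as follows. The function name and the sample points are hypothetical; the sketch only removes the elements whose value is 0 in both conditions.

```python
def drop_null_points(points):
    """Discard the elements that contain 0 in both conditions."""
    return [(name, x, y) for name, x, y in points if not (x == 0 and y == 0)]


points = [("gene1", 0.0, 0.0), ("gene2", 0.0, 3.5), ("gene3", 2.0, 1.0)]
kept = drop_null_points(points)
```

Elements with 0 in only one of the two conditions, such as gene2 here, are kept.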

Colors must be defined in the R software as shown in the following table.

Volcanoplotter

Application to generate volcano plots from differential gene expression analysis files.

1 Full list of elements and features

Tab-separated plain text file that contains the statistics of each gene. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Decimal commas (if any) are internally replaced with decimal points. There is a mandatory header line (in which the gene descriptor is omitted), usually generated by the DESeq2 software. However, users can create their own tabular file containing this information as long as the field descriptors of the key columns coincide with the ones defined elsewhere in the web application.

Example about the output format of the DESeq2 analysis in R:
baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
0610005C13Rik	6.70	-1.09	0.53	-2.05	0.03	0.11
...

2 Set the working mode

Users can provide tabular files directly generated by the DESeq2 R library or independently created. In all cases, an input file must be tabular (tab separated) and include a header line with the information about the attributes to be used for the plot generation.

3 Choose the number of genes to assign labels

Number of genes for which the user intends to highlight their name using labels overlapping their points in the volcano plot.

4 List of elements to be highlighted

Names of the genes for which a label containing their name will be included in the plot. The number of genes from these boxes that will be effectively added depends on the previous parameter.

5 Names of fields for X axis and Y axis

When users provide their own tabular files, these field names must be found in the header line that describes the content of the file. Both fields will be used to generate the custom volcano plot.

6 Captions and titles

Text that will be used as a title, subtitles and axis labels in the resulting volcano plot. Spaces and special characters will be substituted for the symbol "_".

7 Graphical parameters

Definition of graphical parameters to customize the graphical appearance of the resulting Volcano plot:

- FC cutoff to define areas in the volcano plot.
- P value cutoff to define areas in the volcano plot.
- Size of the points (genes) shown in the plot.
- Size of the labels (if any) for genes in the plot.
- Colors for each of the four areas defined by the FC and P value cutoffs.
Colors must be defined in the R software as shown in the following table.

- Label style (bold/italics/boxed) if any.
- Legend position and size (for the four areas/classes of points of the plot).
- Show connectors or not for gene labels (if any).
- Horizontal volcano plot (X and Y axis are switched, respectively).
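
One possible reading of the four areas defined by the two cutoffs is sketched below in Python. The class names, the default cutoffs, the use of the absolute log2 fold-change and the -log10 transformation of the P value for the vertical axis are all assumptions of this example, not documented behavior of the application.

```python
import math


def volcano_class(log2_fc, pvalue, fc_cutoff=1.0, p_cutoff=0.05):
    """Assign a gene to one of four areas using the FC and P value cutoffs."""
    passes_fc = abs(log2_fc) >= fc_cutoff
    passes_p = pvalue < p_cutoff
    if passes_fc and passes_p:
        return "fc_and_p"
    if passes_fc:
        return "fc_only"
    if passes_p:
        return "p_only"
    return "not_significant"


def volcano_point(log2_fc, pvalue):
    """Coordinates of a gene: fold-change against -log10 of the P value."""
    return log2_fc, -math.log10(pvalue)


areas = [volcano_class(2.0, 0.001), volcano_class(2.0, 0.5),
         volcano_class(0.2, 0.001), volcano_class(0.2, 0.5)]
x, y = volcano_point(1.0, 0.01)
```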

Gene Sets

AlluvialPlotter

Application to generate the alluvial diagram from a list of gene class assignments and multiple lists of annotations.

1 Membership of each gene in the other lists

Tab-separated plain text file that contains one list of genes. Each gene is described by its membership in a user-defined category.

Genes belonging to the gene sets provided by the user will be annotated and classified following both types of information.

Example about the format on a file with genes and classes:
gene1   classx
gene2   classy
...

2 Choose the number of sets

Number of input sets of elements to be represented as columns in the alluvial diagram. According to this number, the same amount of columns will appear in the same order in the final plot.

3 Upload the files

Plain text files that contain a list of elements for further characterization. Only one element is stored at each line. The name of the elements is used to be compared against the rest of files in order to identify the elements in common and those that are specific of each particular combination of lists.

Example about the format of one file with the elements that form a set.
element1
element2
element3
...

4 Names of each set of genes

Text labels that will be used to identify each set in the resulting alluvial diagram. Spaces and special characters will be substituted for the symbol "_".

5 Graphical parameters of each class

Definition of graphical parameters to customize the Alluvial diagram of elements.

Each class as defined by the user in the membership file will be assigned a color in the same order.

Colors must be defined in the R software as shown in the following table.

Compare2Genes

Application to draw the Venn diagram between two sets with proportional sizes and provide each list of common and specific elements.

1 First set of genes

Plain text file that contains a list of elements for further characterization. Only one element is stored at each line. The name of the elements is used to be compared against another file in order to identify the elements in common and those that are specific of each list.

Example about the format of one file with the elements that form a set.
element1
element2
element3
...

2 Second set of genes

Plain text file that contains a list of elements for further characterization. Only one element is stored at each line. The name of the elements is used to be compared against another file in order to identify the elements in common and those that are specific of each list.

Example about the format of one file with the elements that form a set.
element1
element2
element3
...
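
The comparison of both files can be sketched with Python sets as follows. The function names and the sample elements are hypothetical; the sketch only illustrates how the common and the specific elements of each list are separated.

```python
def read_elements(lines):
    """Build a set from a file with one element per line."""
    return {line.strip() for line in lines if line.strip()}


def compare_sets(first, second):
    """Split two sets into common elements and elements specific to each."""
    return {
        "common": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


first = read_elements(["element1", "element2", "element3"])
second = read_elements(["element2", "element3", "element4"])
result = compare_sets(first, second)
```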

3 Names of each set of genes

Text labels that will be used to identify each set in the resulting Venn diagram. Spaces and special characters will be substituted for the symbol "_".

4 Graphical parameters of each set

Definition of graphical parameters to customize the Venn diagram between both sets of elements.

- Background color is used to fill the circles inside.
- Circle line color is used for the line around each circle.

Colors must be defined in the R software as shown in the following table.

- Transparency degree: alpha channel value related to the opacity in overlapping areas.
- Extract the number: this option is useful when there is not enough space to put a number inside an area of the diagram.
- Distance of the set name to circle: this option is useful to increase the space between the label of the set and the corresponding circle.

5 Global parameters of the Venn diagram

Definition of graphical parameters to customize the Venn diagram global appearance.

- Font size of gene names is useful to change the size of labels outside the diagram.
- Font size of numbers is useful to change the size of numbers inside the diagram.
- Line width of circles can be used to adapt the thickness of the Venn diagram circles.
- Figure size (small/normal): this option adapts the margins of the page and generates smaller plots (if necessary).
- Extract the common number permits the user to extract the value in the intersection of the diagram.
- Distance of external numbers together with the previous option is able to change the distance of numbers to the diagram.
- Proportional sizes: this option allows the users to generate proportional sets in the final diagram according to their sizes.
- Font family (helvetica/times) defines the family of the font (sans-serif or serif).

6 Statistical significance of the overlap

Total number of genes in the genome to compute the significance of the overlap between both sets.
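
The statistical test applied by the application is not specified here. A common choice for this kind of comparison is the hypergeometric test, which is sketched below in Python as an assumption: it returns the probability of observing an overlap at least as large as the one found when both sets are drawn at random from the genome.

```python
from math import comb


def overlap_p_value(total, size_a, size_b, overlap):
    """Upper-tail hypergeometric probability of an overlap >= the observed one."""
    largest = min(size_a, size_b)
    favourable = sum(
        comb(size_a, k) * comb(total - size_a, size_b - k)
        for k in range(overlap, largest + 1)
    )
    return favourable / comb(total, size_b)


# Hypothetical example: genome of 10 genes, two sets of 5 genes sharing all 5.
p_value = overlap_p_value(total=10, size_a=5, size_b=5, overlap=5)
```

In this small example only one of the 252 possible second sets reproduces the full overlap, so the probability is 1/252.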

UpSetPlotter

Application to generate the UpSet chart of multiple sets of elements.

1 Choose the number of sets

Number of input sets of elements to be compared (between 2 and 10). According to this number, the same number of files in the section below will be processed in the same order.

2 Upload the files

Plain text files that contain a list of elements for further characterization. Only one element is stored at each line. The name of the elements is used to be compared against the rest of files in order to identify the elements in common and those that are specific of each particular combination of lists.

Example about the format of one file with the elements that form a set.
element1
element2
element3
...
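
The identification of the elements that are specific to each combination of lists can be sketched in Python as follows. The function name and the sample sets are hypothetical; each element is assigned to the exact combination of sets that contain it, and the combinations are then counted.

```python
from collections import Counter


def count_combinations(named_sets):
    """Count the elements that belong to each exact combination of sets."""
    membership = {}
    for name, elements in named_sets.items():
        for element in elements:
            membership.setdefault(element, []).append(name)
    # Set names within a combination follow the order of the input sets.
    return Counter(tuple(names) for names in membership.values())


sets = {"A": {"g1", "g2", "g3"}, "B": {"g2", "g3", "g4"}, "C": {"g3", "g5"}}
counts = count_combinations(sets)
```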

3 Choose the species

This option indicates the species, to be used when generating the table with the comparison codes.

4 Names of each set of genes

Text labels that will be used to identify each set in the resulting plot. Spaces and special characters will be substituted for the symbol "_".

5 Graphical parameters

Definition of graphical parameters to customize the global appearance of the resulting UpSet diagram.

- Font size of the title is useful to change the size of the title above the plot
- Point size is useful to change the size of the points denoting presence of an element in a particular combination
- Font size of the top and bottom axis is useful to change the size of values on both parts of the plot
- Intersections can be ranked by the number of elements that belong to each class in the dataset. Alternatively, this can be done using a fixed ranking based on the class of combinations irrespective of the example
- Font size of the number of intersections is useful to change the size of these values in the plot
- Highlight the perfect combination is useful to paint in a different color the option that contains elements of every class
- Bar color is useful to change the color of the bars that denote the size of each list of elements

Colors must be defined in the R software as shown in the following table.

VennPlotter

Application to draw the Venn diagram of multiple sets of elements.

1 Choose the number of sets

Number of input sets of elements to be compared (between 2 and 5). According to this number, the same number of files in the section below will be processed in the same order.

2 Upload the files

Plain text files that contain a list of elements for further characterization. Only one element is stored at each line. The name of the elements is used to be compared against the rest of files in order to identify the elements in common and those that are specific of each particular combination of lists.

Example about the format of one file with the elements that form a set.
element1
element2
element3
...

3 Names of each set of genes

Text labels that will be used to identify each set in the resulting Venn diagram. Spaces and special characters will be substituted for the symbol "_".

4 Graphical parameters of each set

Definition of graphical parameters to customize the Venn diagram among multiple sets of elements.

- Background color is used to fill the circles inside.
- Circle line color is used for the line around each circle.
Colors must be defined in the R software as shown in the following table.

5 Global parameters of the Venn diagram

Definition of graphical parameters to customize the Venn diagram global appearance.

- Font size of gene names is useful to change the size of labels outside the diagram.
- Font size of numbers is useful to change the size of numbers inside the diagram.
- Figure size (small/normal): this option adapts the margins of the page and generates smaller plots (if necessary).
- Transparency degree: alpha channel value related to the opacity in overlapping areas.
- Show percentages: this option shows percentages instead of totals in the labels of the Venn diagram areas.

Regulatory Sets

MatScan

Application to predict the location of TF binding sites on a genomic sequence.

1 Upload the FASTA sequence file

Select a FASTA file to be processed by MatScan. FASTA format consists of a header line (labeled by the symbol ">") and a genomic sequence divided into lines of the same length.
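
The FASTA format described above can be read with a few lines of Python. This sketch assumes a file with a single sequence, so only the first record is returned; read_fasta is a hypothetical name and the sequence is an invented example.

```python
def read_fasta(lines):
    """Return the header and the sequence of the first FASTA record."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop reading
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    return header, "".join(chunks)


header, sequence = read_fasta([">seq1 example", "ACGT", "TTGA"])
```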

2 Select the TF matrices

From the JASPAR database of predictive models for vertebrates, users can choose up to 5 models.

3 Genome mapping

If the location of the sequence to be scanned for TF binding sites is known over a particular chromosome, this information can be provided here to get a screenshot of the putative sites in the genome using the UCSC genome browser.

4 Captions and titles

Text label that will be used to characterize the resulting list of values. Spaces and special characters will be substituted for the symbol "_".

5 Graphical parameters of each set

Definition of graphical parameters to customize the MatScan maps of binding sites.

- Palette of colors (list of viridis palettes).
- A grid in grey or black and white style can be integrated as a background of the map plot.
- Point size of TF binding sites in the map of predictions can be customized.
- Point shape of TF binding sites in the map of predictions can be customized.
- The position of each hit in the map of TF sites can be included or hidden.
- Show title: this option shows the main title above in the top of the map plot.
- Background color is used in the background of the map plot.
- Foreground color is used in the foreground elements of the map plot.

List Operations

FCAnalysis

Application to extract the elements of a list that present a fold-change increase/decrease between conditions.

1 Full list of elements and values

Tab-separated plain text file that contains the list of elements and features. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Decimal commas (if any) are internally replaced with decimal points. Not available values (NA) are included in the treatment.

Example about the format on a file with elements characterized using two features:
element1   value1   value2
element2   value1   value2
element3   value1   value2
...

2 Select the two features to be compared

Selection of two columns of the file of features uploaded in the first step for the full set of elements. The ratio between both values will be used to establish the fold-change.

3 Captions and titles

Text label that will be used to characterize the resulting list of values. Spaces and special characters will be substituted for the symbol "_".

4 Conditions about the FC

Users can apply two filters on the ratio calculated between both selected values:

- Fold-change: the ratio must be higher/lower than this proportion (up or down elements)
- Minimum value: useful to dismiss the elements under this minimum threshold before calculating the ratio
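
Both filters can be combined as in the following Python sketch. Several details are assumptions of the example: the ratio is computed as the second value divided by the first one, an element is dismissed when either value is under the minimum threshold, and down elements are those with a ratio at or below the inverse of the fold-change.

```python
def fold_change_filter(records, fold_change=2.0, minimum=0.0):
    """Return the names of the up and down elements between two conditions."""
    up, down = [], []
    for name, first, second in records:
        # Elements under the minimum threshold are dismissed before the ratio.
        if first < minimum or second < minimum:
            continue
        if first == 0:
            continue  # the ratio is undefined when dividing by 0
        ratio = second / first
        if ratio >= fold_change:
            up.append(name)
        elif ratio <= 1 / fold_change:
            down.append(name)
    return up, down


records = [("a", 1.0, 4.0), ("b", 8.0, 2.0), ("c", 3.0, 3.5), ("d", 0.1, 5.0)]
up, down = fold_change_filter(records, fold_change=2.0, minimum=0.5)
```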

FilterValues

Application to extract the elements of a list that fit into a particular condition or filter rule.

1 Full list of elements and values

Tab-separated plain text file that contains the list of elements and features. Each line of the list contains the same number of columns: the key that identifies the current element is placed in column 1, while the attributes are stored from column 2 up to column N. Decimal commas (if any) are internally replaced with decimal points. Not available values (NA) are included in the treatment.

Example about the format on a file with elements characterized using two features:
element1   value1   value2
element2   value1   value2
element3   value1   value2
...

2 Define the filters on basic conditions

To filter out those lines that do not match certain rules, users are able to check, on up to three fields, whether the corresponding attributes are lower than, higher than or equal to particular values.
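
A minimal Python sketch of these filter rules follows. Each rule is written as a hypothetical triple (column, condition, value), and the example assumes that a line is kept only when it satisfies all the rules at once; how the application combines the rules is not stated here.

```python
import operator

# The three basic conditions that a rule can check.
CONDITIONS = {"lower": operator.lt, "higher": operator.gt, "equal": operator.eq}


def apply_filters(rows, rules):
    """Keep the keys of the rows whose values satisfy every rule."""
    if len(rules) > 3:
        raise ValueError("up to three fields can be filtered")
    kept = []
    for key, values in rows:
        if all(CONDITIONS[cond](values[col], limit) for col, cond, limit in rules):
            kept.append(key)
    return kept


rows = [("element1", [5.0, 0.2]), ("element2", [1.0, 0.9]), ("element3", [7.0, 0.8])]
# First value higher than 2 and second value lower than 0.5.
kept = apply_filters(rows, [(0, "higher", 2.0), (1, "lower", 0.5)])
```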

3 Captions and titles

Text label that will be used to characterize the resulting list of values. Spaces and special characters will be substituted for the symbol "_".

Join2Lists

Application to extract the elements of two lists in common using the value of a particular attribute.

1 First list of elements

Plain text file that contains a list of elements and values for further comparison. Only one element is stored at each line (together with its set of features). One particular column defined arbitrarily by the user (the index or key) is used to be compared against another file in order to identify the elements in common and those that are specific of each list.

Example about the format of one file with the elements that form a set.
value11 ... value1N
value21 ... value2N
value31 ... value3N
...

2 Second list of elements

Plain text file that contains a list of elements and values for further comparison. Only one element is stored at each line (together with its set of features). One particular column defined arbitrarily by the user (the index or key) is used to be compared against another file in order to identify the elements in common and those that are specific of each list.

Example about the format of one file with the elements that form a set.
value11 ... value1N
value21 ... value2N
value31 ... value3N
...

3 Captions and titles

Text label that will be used to characterize the elements of each list in the comparison. Spaces and special characters will be substituted for the symbol "_".

4 Define the common column of each set

Users must indicate which column/field of each file will be used as the key or index for the comparison.
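
The comparison by a common column can be sketched in Python as follows. The function name is hypothetical, column positions are counted from 0 in this example, and the output format (both lines concatenated) is an assumption.

```python
def join_lists(first_rows, second_rows, first_key, second_key):
    """Join the rows of two lists that share the same value in the key column."""
    index = {}
    for row in second_rows:
        index.setdefault(row[second_key], []).append(row)
    joined = []
    for row in first_rows:
        for match in index.get(row[first_key], []):
            joined.append(row + match)
    return joined


first = [["gene1", "10"], ["gene2", "20"]]
second = [["chr1", "gene2"], ["chr2", "gene3"]]
# The key is column 0 in the first list and column 1 in the second one.
joined = join_lists(first, second, first_key=0, second_key=1)
```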