POSIF is a tool for identifying small RNAs (sRNAs) in bacterial genomes from sRNA-seq data. It applies the isolation forest machine learning algorithm to read counts from sRNA sequencing, supplied as per-base coverage file(s), and predicts putative sRNA-encoding regions in the genome. The output lists the genomic locations of the predicted sRNAs.

List of Organisms supported by POSIF

• Escherichia coli K-12
• Salmonella enterica serovar Typhimurium LT2
• Pseudomonas aeruginosa PAO1
• Staphylococcus aureus NCTC 8325
• Bacillus subtilis 168
• Vibrio cholerae RFB16
• Listeria monocytogenes EGD-e
• Clostridium difficile S-0253
• Mycobacterium tuberculosis H37Rv
• Streptococcus pneumoniae Hu17

Instructions

1. Organism: Select your organism of interest from the dropdown list.
2. Select data type: Choose Strand-Specific or Non-Strand-Specific. The tool takes per-base coverage file(s) as input.

Procedure to create per base coverage file:

a. Sort the BAM file.
Command: samtools sort <bam file> -o <output sorted bam file>
b. Strand-specific sRNA-seq data requires splitting the BAM file into forward and reverse files. Non-strand-specific sRNA-seq data uses a single BAM file.
c. Convert each sorted BAM file into a per-base coverage (.bed) file using BEDTools. The genomecov subcommand writes to standard output, so redirect it to a file.
Command: bedtools genomecov -ibam <sorted bam file> -d > <output bed file>
d. The per-base (.bed) file consists of three columns: Chromosome, Position, and Read Count. This file format serves as the input for POSIF. Strand-specific data requires two separate input files, Forward.bed and Reverse.bed. Non-strand-specific data requires a single .bed file.

3. Contamination: The expected proportion of outliers in the data set. It is used during fitting to set the threshold on the sample scores. The value must lie in the open interval (0, 0.5), endpoints excluded.
4. Output File Name: Provide the output file name.
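Before uploading, it can help to check that a per-base coverage file really has the three-column layout described in the procedure above. The following is a minimal sketch, not POSIF's own code: the function name `read_per_base_coverage` is hypothetical, and it assumes the tab-separated output that `bedtools genomecov -d` produces, with 1-based positions.

```python
import csv
import io


def read_per_base_coverage(handle):
    """Parse three-column per-base coverage: chromosome, position, read count.

    Hypothetical helper for illustration; assumes tab-separated input.
    """
    rows = []
    for line_no, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(fields)}")
        chrom, pos, count = fields[0], int(fields[1]), int(fields[2])
        if pos < 1 or count < 0:
            raise ValueError(f"line {line_no}: position must be >= 1 and count >= 0")
        rows.append((chrom, pos, count))
    return rows


# Tiny in-memory example; a real file would be opened with open(path, newline="").
example = "chr\t1\t0\nchr\t2\t5\nchr\t3\t7\n"
coverage = read_per_base_coverage(io.StringIO(example))
print(coverage)
```

For strand-specific data, the same check would be run once on Forward.bed and once on Reverse.bed.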

Example run

– Select an organism from the Select an Organism dropdown.
– For strand-specific data, select the Strand Specific option and upload both the Forward Strand Per Base File and the Reverse Strand Per Base File. (Download sample input here)
– For non-strand-specific data, select the Non-Strand Specific option and upload the Per Base File as input.
– Set the contamination factor to 0.005.
– Enter the output file name and submit the form.
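To get a feel for the contamination factor of 0.005 used in the example run, the sketch below checks that a value lies in the open interval (0, 0.5) and estimates how many samples that proportion corresponds to. It is an illustration only: both function names are hypothetical, the genome length is a made-up round number, and it assumes each genomic position is scored as one sample, which the page does not state.

```python
def validate_contamination(value):
    """Return value as a float if it lies strictly between 0 and 0.5."""
    value = float(value)
    if not 0.0 < value < 0.5:
        raise ValueError("contamination must be in the open interval (0, 0.5)")
    return value


def expected_outlier_count(n_samples, contamination):
    """Approximate number of samples flagged as outliers at this contamination."""
    return round(n_samples * validate_contamination(contamination))


# Hypothetical genome of 4,600,000 positions, one sample per position (assumption).
flagged = expected_outlier_count(4_600_000, 0.005)
print(flagged)  # 23000
```

A larger contamination value flags more of the data as outliers, so it is likely to yield more candidate regions at the cost of more false positives.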

Contact Information

For any inquiries related to POSIF, please write to Dr. Shubhada Hegde (shubhada@ibab.ac.in), or Ms. Upasana Maity (upasana.megha1999@gmail.com).

Cite POSIF

If you use POSIF, kindly acknowledge it by citing the following source.

POSIF - Bacterial sRNA Detection Tool

• Organism: select an organism from the list.
• sRNA-seq data: Non-Strand Specific or Strand Specific.
• Forward Strand Per Base File and Reverse Strand Per Base File (strand-specific data): .bed files only.
• Per Base File (non-strand-specific data): .bed file only.
• Contamination Factor: proportion of the total data set to be detected as outliers. Must be in the range (0, 0.5).
• Output File Name: the final output file will be named as provided. Allowed characters: A-Z a-z _ 0-9
