SEQ Mapper Web Application

About SEQ Mapper

SEQ Mapper is a .NET application that can be installed as a stand-alone Windows 64-bit desktop application or deployed as a web application. At the present time only the web version is available for non-commercial use.

System and Methods

The user is expected to submit a DNA sequence data file containing a set of reads in FASTA or FASTQ format, together with a set of reference locus files in comma-separated values (CSV) format. Each reference locus file contains the allele IDs, alleles and STRs of a specific locus and serves as the reference data. Each FASTA or FASTQ read in the sequence data file must be preceded by a valid FASTA or FASTQ header. Users may optionally add a FASTA-like header as the group header for the entire collection of reads in the sample data file; if no group header is provided, the header of the first read is used as the group header. The group header is shown in the summary report to help the user identify the source data used for that report. Please refer to the Help for more information on the format of the source data files. The sample data used in the performance test below can be downloaded for reference.
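
The read-file rules above can be illustrated with a minimal Python sketch. It is not SEQ Mapper's own parser: the function names are hypothetical, FASTQ records are assumed to be unwrapped four-line records, and because this page does not say how a group header is distinguished from a read header inside the file, the sketch takes the group header as an optional argument and otherwise falls back to the first read's header.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Read = Tuple[str, str]  # (header without '>' or '@', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, sequence) pairs; a FASTA sequence may span several lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before any FASTA header")
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, sequence) pairs, assuming unwrapped 4-line FASTQ records."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % 4:
        raise ValueError("FASTQ input is not a whole number of 4-line records")
    for i in range(0, len(rows), 4):
        head, seq, plus, qual = rows[i:i + 4]
        if not head.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at row {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at row {i + 1}")
        yield head[1:], seq.upper()


def group_header(reads: List[Read], explicit: Optional[str] = None) -> str:
    """Hypothetical rule: use the supplied group header, else the first read's header."""
    if explicit:
        return explicit
    if not reads:
        raise ValueError("the sample file contains no reads")
    return reads[0][0]


if __name__ == "__main__":
    sample = [">read1", "ACGT", "ac", ">read2", "TTGA"]
    reads = list(parse_fasta(sample))
    print(reads)                # [('read1', 'ACGTAC'), ('read2', 'TTGA')]
    print(group_header(reads))  # read1
```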

SEQ Mapper provides a user interface that lets the user enter the begin index and the length of the flanking sequences. SEQ Mapper uses these options to determine the 5' and 3' flanking sequences of all reference alleles. The reverse allele sequence, STR, and primer sequences are generated automatically and used in the search.
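
As an illustration only, the sketch below shows one way a begin index and length could be turned into 5' and 3' flanking sequences. It assumes 1-based begin indices, with the 5' index counted from the first base of the allele and the 3' index counted backwards from the last base, that the two flanks must not overlap, and that the "reverse" sequence is the reverse complement. These conventions and the helper names are assumptions, not SEQ Mapper's documented behavior.

```python
from typing import Tuple

# Assumption: "reverse" means reverse complement; N (unknown base) maps to N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def flanks(allele: str, begin5: int, size5: int,
           begin3: int, size3: int) -> Tuple[str, str]:
    """Return the (5', 3') flanking sequences of a reference allele.

    Assumed convention (hypothetical): begin indices are 1-based; the 5'
    index counts from the first base, the 3' index counts back from the
    last base, and the two flanks must not overlap.
    """
    if min(begin5, size5, begin3, size3) < 1:
        raise ValueError("begin indices and sizes must be at least 1")
    n = len(allele)
    if (begin5 - 1 + size5) + (begin3 - 1 + size3) > n:
        raise ValueError("allele is too short for the requested flanks")
    start5 = begin5 - 1
    end3 = n - (begin3 - 1)  # exclusive end of the 3' flank
    return allele[start5:start5 + size5], allele[end3 - size3:end3]


def trim_allele(allele: str, begin5: int, begin3: int) -> str:
    """Drop the bases before the 5' begin index and after the 3' begin index."""
    return allele[begin5 - 1:len(allele) - (begin3 - 1)]
```

Under this convention, a begin index of 1 with a size of 10 on both sides would simply take the first and last ten bases of each allele.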

SEQ Mapper was designed to map reads generated by massively parallel sequencing (MPS) to reference sequences. Three definitions of an STR locus are used: the STR repeat region only; the STR region plus the two primer sequences; and the entire STR locus spanning the two primers and the flanking DNA. Using these three definitions, SEQ Mapper identifies STR alleles in complex DNA data obtained from MPS, with the sequences of the STR alleles serving as search references. Four search levels of decreasing stringency are used to detect alleles matching the reads:

  1. Allele Search: the strictest search. It requires a full match of the entire reference allele sequence, trimmed by the user-specified primer indices, against the FASTA/Q read.
  2. STR & Primers Search: the next level. It requires the STR and the user-specified 5' and 3' flanking sequences to match the read individually.
  3. STR Search: the next level. It requires only the STR to match.
  4. Primers Search: the least stringent search. It requires only the 5' and 3' flanking sequences to match. Two Primers Search reports are generated: one excludes reads that matched the allele, and the other excludes reads that matched the STR & primers. In the Primers Search reports, the difference between the allele and the FASTA/Q read is calculated as the Levenshtein Distance; a lower Levenshtein Distance indicates that the read is more similar to the reference allele.
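
The four search levels can be sketched as a cascade that reports the strictest level at which a read matches. This is a hypothetical illustration: the Reference fields are invented names, "match" is taken to mean exact substring containment, and each read is also checked as its reverse complement. The page does not state any of these details.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Reference:
    """Hypothetical container for one reference allele's search targets."""
    allele_id: str
    trimmed: str   # allele trimmed by the primer begin indices
    str_seq: str   # STR repeat region
    five: str      # 5' flanking sequence (primer)
    three: str     # 3' flanking sequence (primer)


def search_level(read: str, ref: Reference) -> Optional[str]:
    """Return the strictest search level the read satisfies, or None.

    Assumption: a "match" is exact substring containment, tested on the
    read and on its reverse complement.
    """
    orientations = (read.upper(), reverse_complement(read))
    levels: Tuple[Tuple[str, Callable[[str], bool]], ...] = (
        ("Allele", lambda r: ref.trimmed in r),
        ("STR & Primers",
         lambda r: ref.str_seq in r and ref.five in r and ref.three in r),
        ("STR", lambda r: ref.str_seq in r),
        ("Primers", lambda r: ref.five in r and ref.three in r),
    )
    for name, matches in levels:
        if any(matches(r) for r in orientations):
            return name
    return None  # would fall into the No Match report
```

Keeping the levels in one ordered table makes the cascade easy to audit: the first level that matches wins, mirroring the decreasing stringency of the list above.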

At the end of the search process, multiple reports are generated and saved in CSV format. Please refer to the Help for more details on the sample reports.

  1. A Summary Report shows the number of FASTA/Q reads found in the Allele Search Report, the STR & Primers Search Report and the STR Search Report, as well as the number of reads found in the two Primer Search Reports (excluding matched STR & Primers and excluding matched Allele, respectively).
  2. Allele Search Report shows the number of FASTA/Q reads matching the entire allele of a specific reference locus. The matched allele sequence, trimmed according to the user-specified primer begin indices, is included in the report.
  3. STR & Primers Search Report shows the number of FASTA/Q reads matching the STR and the 5' and 3' flanking sequences individually for an allele of a specific reference locus. The matched STR and the matched allele sequence, trimmed according to the user-specified primer begin indices, are included in the report.
  4. STR Search Report shows the number of FASTA/Q reads matching the STR of an allele of a specific reference locus. The matched STR sequence is included in the report.
  5. Primer Search Report - exclude matched Allele: this report shows the number of FASTA/Q reads matching the primers only, excluding all reads that matched the entire allele. Both the trimmed and untrimmed reads for the matching primers are included in the report; the trimmed read is bounded by the matching primers at both ends, primers included. In addition, the number of differences between the allele and the FASTA/Q read is calculated; a lower Difference value indicates that the read is more similar to the reference allele.

    The Difference value is the Levenshtein Distance: the minimum number of single-character edits (insertions, deletions or substitutions) required to change one sequence into the other. The Levenshtein Distance is therefore well suited to measuring how similar two sequences are; a distance of zero means the sequences are identical.
  6. Primer Search Report - exclude matched STR & Primers: similar to the above report, this report shows the number of FASTA/Q reads matching the primers only, excluding all reads that matched the STR and primers.
  7. Allele Sequence Report: all reference alleles, including allele ID, forward sequence, forward STR, forward 3', forward 5', reverse sequence, reverse STR, reverse 3' and reverse 5', are collected in this report.
  8. Skipped FASTA Read Sequence: all FASTA/Q reads excluded from matching are collected in this report. A possible cause is that a read is too short to satisfy the primer begin index and length specified by the user; in this case, the user is advised to review the data source.
  9. No Match FASTA Read Sequence: as a last resort, all FASTA/Q reads that match no reference allele are collected in this report.

    The begin index and length of the primers determine how the primers are extracted from the reference alleles. Because the outcome depends on how the reference data was prepared in the lab, the results may not always meet the user's expectations. The No Match FASTA Read Sequence report gives the user a second chance to "tune" or better organize the reference data for better results.
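
The Difference value in the Primer Search Reports is the Levenshtein Distance, which can be computed with the standard dynamic-programming recurrence. The sketch below keeps only two rows of the table, so memory grows with the shorter sequence. The rank_alleles helper and its allele IDs are hypothetical, and exactly which portions of the read and allele SEQ Mapper compares is not specified on this page.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def rank_alleles(read: str, alleles: Dict[str, str]) -> List[Tuple[int, str]]:
    """Hypothetical helper: (distance, allele ID) pairs, closest allele first."""
    return sorted((levenshtein(read, seq), allele_id)
                  for allele_id, seq in alleles.items())
```

For example, a read of two repeat units compared with a three-unit allele differs by the four bases of the missing unit, so levenshtein("TCTATCTA", "TCTATCTATCTA") is 4.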

Performance

SEQ Mapper accepts reads in FASTA or FASTQ format. Search parameters, including the begin index and length of the 5' and 3' flanking sequences, can be defined by the user. SEQ Mapper can also be used to search for SNP alleles if the STR blocks are replaced by SNP blocks. SEQ Mapper is useful for detecting STR or SNP alleles generated by MPS in forensic genetics.

Click here to download the performance test sample data.
Click here to download the performance test reports.

The following flanking sequence options were used in the performance test:

  1. 5’ Flanking Sequence Begin Index: 1
  2. 5’ Flanking Sequence Size: 10
  3. 3’ Flanking Sequence Begin Index: 1
  4. 3’ Flanking Sequence Size: 10
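
Assuming the begin indices count from each end of the sequence and the two flanks may not overlap, the options above would require a read of at least 10 + 10 = 20 bases, and shorter reads would end up in the Skipped FASTA Read Sequence report. The sketch below encodes that assumption; the class and function names are hypothetical, and SEQ Mapper's real rule for skipping reads may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

Read = Tuple[str, str]  # (header, sequence)


@dataclass(frozen=True)
class FlankOptions:
    """Flanking-sequence options; defaults are the performance-test values."""
    begin5: int = 1
    size5: int = 10
    begin3: int = 1
    size3: int = 10

    def min_length(self) -> int:
        # Assumption: begin indices count from each end and flanks may not overlap.
        return (self.begin5 - 1 + self.size5) + (self.begin3 - 1 + self.size3)


def split_skipped(reads: List[Read], opts: FlankOptions) -> Tuple[List[Read], List[Read]]:
    """Separate reads long enough to search from reads that would be skipped."""
    kept: List[Read] = []
    skipped: List[Read] = []
    for read in reads:
        (kept if len(read[1]) >= opts.min_length() else skipped).append(read)
    return kept, skipped
```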

Account Registration

Users are required to create a free account with us. Each user's e-mail address is verified during the registration process. All e-mail addresses are kept confidential and are used only to track requests submitted by the specific user. We may use a user's e-mail address to notify them if a request cannot be completed for any reason. The application becomes accessible once the user completes the registration process and logs in to the site. Click here to create a new account.

SEQ Mapper Limitation

Depending on the size of the data files submitted and the speed of the user's internet connection, the upload time may vary from a few seconds to hours. SEQ Mapper limits each user's overall upload size to 2 GB and the upload duration to 5 hours or less. We will monitor our server and adjust these limits as necessary.

Please note that the SEQ Mapper search process can be very resource- and time-consuming. We therefore limit each user to one request at a time: the user must wait for the current SEQ Mapper process to complete before submitting the next request. Right after a request is submitted, the Submit button changes to Check Status. Users may click the Check Status button to find out the status of the SEQ Mapper matching and report generation.

SEQ Mapper compresses the final reports and places them in a personal folder for the user to download. We encourage all users to download their reports as soon as possible, since our server storage is limited. We may periodically remove users' uploaded and downloadable files without notice. Thank you in advance for your understanding.

Browser Support

SEQ Mapper relies on HTML5 and CSS3, mainly for its multiple-file upload functionality. Internet Explorer is currently not supported because it lacks multiple-file upload support. We will continue looking for ways to support IE and will add support once a feasible solution becomes available.

Final Remark

As a final note, it is common knowledge that applications often need to be revised in order to become more stable and useful. We intend to make this application better and truly useful, so we would appreciate hearing about any issues you encounter.

Contact Us

Please send an e-mail to SEQMapper@ntu.edu.tw if you have any technical questions or comments, or if you encounter a programming-related bug. We will do our best to get back to you in a timely manner. Thank you.