About SEQ Mapper
SEQ Mapper
is a .NET application that can be installed as a stand-alone Windows 64-bit desktop
application or deployed as a web application. At the present time only the web version
is available for non-commercial use.
System and Methods
The user submits a DNA sequence data file containing a group of reads in FASTA
or FASTQ format, together with a set of reference locus files in comma-separated
values (CSV) format. Each reference file contains the allele IDs, alleles, and STRs
of a specific locus. Each read in the DNA sequence data file must be preceded by
a valid FASTA or FASTQ header. Users may optionally add a FASTA-like header as the
group header for the entire collection of reads in the sample data file. If no group
header is provided, the header of the first read is used as the group header. Please
refer to the Help for
more information on the format of the source data files. The group header is shown
in the Summary Report to help users identify the source data used for the report.
The sample data used in the performance test below can be downloaded for reference.
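The group-header rule described above can be illustrated with a minimal sketch. It is not SEQ Mapper's own parser: the function name `parse_reads` is hypothetical, and the way a group header is recognized (a `>` line immediately followed by another header line rather than by sequence data) is an assumption, since the exact file layout is documented in the Help rather than here.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, str]  # (header text without its marker, sequence)


def parse_reads(text: str) -> Tuple[str, List[Read]]:
    """Return (group_header, reads) parsed from FASTA or FASTQ text.

    Assumption: an optional group header is a '>' line that is immediately
    followed by another header line ('>' or '@') instead of sequence data.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    group_header: Optional[str] = None
    if len(lines) >= 2 and lines[0].startswith(">") and lines[1][0] in ">@":
        group_header = lines[0][1:].strip()
        lines = lines[1:]

    reads: List[Read] = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(">"):
            # FASTA record: one header line, then one or more sequence lines.
            header = line[1:].strip()
            i += 1
            parts = []
            while i < len(lines) and not lines[i].startswith(">"):
                parts.append(lines[i])
                i += 1
            reads.append((header, "".join(parts).upper()))
        elif line.startswith("@"):
            # FASTQ record: header, sequence, '+' separator, quality string.
            if i + 3 >= len(lines) or not lines[i + 2].startswith("+"):
                raise ValueError(f"malformed FASTQ record: {line!r}")
            reads.append((line[1:].strip(), lines[i + 1].upper()))
            i += 4
        else:
            raise ValueError(f"read data without a header: {line!r}")

    if not reads:
        raise ValueError("no reads found")
    if group_header is None:
        # No explicit group header: fall back to the first read's header.
        group_header = reads[0][0]
    return group_header, reads
```

Calling `parse_reads` on a file that starts with two consecutive header lines returns the first as the group header; on a file without one, the header of the first read is returned instead, matching the fallback described above.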
SEQ Mapper
provides a user interface in which the user enters the begin index and the length
of the flanking sequences.
SEQ Mapper
uses these options to determine the 5' and 3' flanking sequences for all reference
alleles. The reverse allele, STR, and primer sequences are generated automatically
and used in the search.
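As an illustration of how a begin index and a length could select the flanking sequences, here is a minimal sketch. It rests on several assumptions that the text does not confirm: begin indices are 1-based, the 5' index counts from the start of the allele and the 3' index counts from its end, and the "reverse" sequence is the reverse complement. The names `flanks` and `reverse_complement` are hypothetical.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (assumed meaning of 'reverse')."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def flanks(allele: str, begin5: int, size5: int,
           begin3: int, size3: int) -> Tuple[str, str, str]:
    """Return (5' flank, 3' flank, trimmed allele).

    Assumptions: indices are 1-based; the 5' begin index is counted from the
    start of the allele and the 3' begin index from its end.
    """
    if min(begin5, begin3) < 1 or min(size5, size3) < 1:
        raise ValueError("begin indices and sizes must be at least 1")
    start5 = begin5 - 1                 # convert to 0-based offset
    end3 = len(allele) - (begin3 - 1)   # exclusive end of the 3' flank
    start3 = end3 - size3
    if start5 + size5 > start3:
        raise ValueError("allele too short for the requested flanks")
    five = allele[start5:start5 + size5]
    three = allele[start3:end3]
    trimmed = allele[start5:end3]       # allele trimmed at the flank indices
    return five, three, trimmed
```

With a begin index of 1 for both flanks, the trimmed allele is the whole reference allele; larger begin indices cut bases off the corresponding end. An allele that is too short for the requested flanks raises an error, which is the kind of case the Skipped FASTA Read Sequence report covers for reads.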
SEQ Mapper
is designed to map reads generated by massively parallel sequencing (MPS) against
reference sequences. Three representations of an STR locus are used: the STR repeat
region only, the STR region plus the two primer sequences, and the entire STR locus
spanning the two primers and flanking DNA.
SEQ Mapper
identifies STR alleles in complex DNA data obtained from MPS using these three
representations. The sequences of the STR alleles serve as the search references.
Four levels of search, in decreasing stringency, are used to detect alleles matching
the reads:
- Allele Search: the strictest search. It requires a full match of the entire
reference allele sequence, trimmed at the user-specified primer indices, within
the FASTA/Q read.
- STR & Primers Search: the next level. It requires the STR and the user-specified
5' and 3' flanking sequences to each match individually.
- STR Search: the next level. It requires a match on the STR only.
- Primers Search: the least strict search. It requires a match on the 5' and 3'
flanking sequences only. Two Primers Search reports are generated: one excludes
reads that matched the allele and the other excludes reads that matched the STR &
primers. In the Primers Search reports, the number of differing bases between the
allele and the FASTA/Q read is calculated. This Difference value is the Levenshtein
Distance. A lower Levenshtein Distance suggests that the read is more similar to
the reference allele.
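The four search levels above can be sketched as a simple cascade. This is an illustration under stated assumptions, not SEQ Mapper's implementation: matching is modeled as plain substring search, the reverse sequences are ignored, the order of the 5' and 3' flanks within the read is not checked, and only the strictest level that matches is returned. The `Reference` class and the `classify` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    allele_id: str
    allele: str      # reference allele, already trimmed at the primer indices
    five: str        # 5' flanking sequence
    str_block: str   # STR repeat region
    three: str       # 3' flanking sequence


def classify(read: str, ref: Reference) -> str:
    """Return the strictest search level at which the read matches ref."""
    if ref.allele in read:
        return "allele"                      # Allele Search
    has_str = ref.str_block in read
    has_primers = ref.five in read and ref.three in read
    if has_str and has_primers:
        return "str_and_primers"             # STR & Primers Search
    if has_str:
        return "str"                         # STR Search
    if has_primers:
        return "primers"                     # Primers Search
    return "no_match"
```

Returning a single level per read is a simplification for readability. Counting the reads per level across all references would give figures comparable to those in a summary, but whether the actual reports overlap between levels is not stated here.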
At the end of the search process, multiple reports are generated and saved in CSV
format. Please refer to the Help for
more details on the sample reports.
- A Summary Report shows the number of FASTA/Q reads
found in the Allele Search Report, the STR & Primers Search Report, and the STR
Search Report, as well as the number of reads found in each Primer Search Report
(excluding matched Allele and excluding matched STR & Primers, respectively).
- Allele Search Report shows the number of FASTA/Q
reads found matching the entire allele of a specific reference locus. The matched
allele sequence, trimmed according to the user-specified begin index of the primers,
is included in the report.
- STR & Primers Search Report shows the number of
FASTA/Q reads in which the STR, the 5' flanking sequence, and the 3' flanking sequence
of an allele of a specific reference locus each match individually. The matched
STR and the allele sequence, trimmed according to the user-specified begin index
of the primers, are included in the report.
- STR Search Report shows the number of FASTA/Q reads
found matching the STR of an allele of a specific reference locus. The matched STR
sequence will be included in the report.
- Primer Search Report - exclude matched Allele:
this report shows the number of FASTA/Q reads that match the primers only, excluding
all reads that matched the entire allele. Both the trimmed and the untrimmed read
for the matching primers are included in the report. The trimmed read spans from
one matching primer to the other, including the primers themselves. In addition,
the number of differing bases between the allele and the FASTA/Q read is calculated.
A lower Difference value suggests that the read is more similar to the reference allele.
The Difference value is the Levenshtein Distance. The Levenshtein Distance
between two sequences is the minimum number of single-character edits (insertions,
deletions, or substitutions) required to change one sequence into the other. It is
therefore well suited to measuring how similar two sequences are; a distance of
zero means the sequences are identical.
- Primer Search Report - exclude matched STR & Primers:
similar to the above report, this report shows the number of FASTA/Q reads that
match the primers only, excluding all reads that matched the STR & primers.
- Allele Sequence Report: all reference alleles,
including allele ID, forward sequence, forward STR, forward 3', forward 5', reverse
sequence, reverse STR, reverse 3' and reverse 5', are collected in this report.
- Skipped FASTA Read Sequence: all FASTA/Q reads
excluded from matching are collected in this report. A possible cause is that the
read is too short to satisfy the begin index and length of the primers specified
by the user. In this case, the user is advised to review the data source.
- No Match FASTA Read Sequence: all FASTA/Q reads
that match no reference allele are collected in this report.
The begin index and length of the primers determine how the primers are extracted
from the reference alleles. Depending on how the reference data was prepared in
the lab, the results may not fully meet the user’s expectations. The No Match FASTA
Read Sequence report gives the user a second chance to “tune” or better organize
the reference data for better results.
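The Levenshtein Distance reported as the Difference value in the Primer Search reports can be computed with the standard dynamic-programming algorithm. The sketch below is a textbook version kept to two rows of the table; it is offered as an illustration and is not taken from SEQ Mapper's source. The function name `levenshtein` is chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, or substitutions
    needed to turn sequence a into sequence b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, base_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            current.append(min(
                previous[j] + 1,         # delete base_a
                current[j - 1] + 1,      # insert base_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

A trimmed read that is identical to the reference allele yields 0, and a read that differs by one missing base yields 1, so sorting reads by this value puts the closest candidates first.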
Performance
SEQ Mapper
accepts reads in FASTA or FASTQ format. Search parameters, including the begin index
and length of the 5’ and 3’ flanking sequences, can be defined by users.
SEQ Mapper
can also be used to search for SNP alleles if the STR blocks are replaced by SNP blocks.
SEQ Mapper
is useful for detecting STR or SNP alleles in MPS data in forensic genetics.
Click
here to download the performance test sample data.
Click
here to download the performance test reports.
The flanking sequence options entered are listed below:
- 5’ Flanking Sequence Begin Index: 1
- 5’ Flanking Sequence Size: 10
- 3’ Flanking Sequence Begin Index: 1
- 3’ Flanking Sequence Size: 10
Account Registration
Users are required to create a free account with us. Each user's e-mail address is
verified during registration. All e-mail addresses are kept confidential and are
used only to track requests submitted by the respective users. We may use the e-mail
address to notify a user if a request cannot be completed for any reason. The
application becomes accessible once the user completes the registration process
and logs in to the site. Click
here to create a new account.
SEQ Mapper Limitations
Depending on the size of the data files submitted and the speed of the user's
internet connection, the upload time may vary from a few seconds to hours.
SEQ Mapper
limits each user's overall upload size to
2GB
for a duration of
5 hours
or less. We will monitor our server and adjust this limit as necessary.
Please note that the
SEQ Mapper
search process can be very resource-intensive and time-consuming. We therefore limit
each user to one request at a time, and the user must wait for the current
SEQ Mapper
process to complete before submitting the next request. The Submit button changes
to Check Status right after a request is submitted. Users may click the Check Status
button to find out the status of the
SEQ Mapper
matching and report generation.
SEQ Mapper
compresses the final reports and places them in a personalized folder for the user
to download. We encourage all users to download their reports as soon as possible,
since storage on our server is limited. We may periodically remove users' uploaded
files and downloadable reports without notice. We thank you in advance for your
understanding.
Browser Support
SEQ Mapper
relies on HTML5 and CSS3, mainly for its multiple-file upload functionality.
At this point, Internet Explorer is not supported because it lacks multiple-file
upload support. We will continue looking for ways to support IE and will add support
as soon as a feasible solution becomes available.
Final Remark
As a final note, applications often need to be revised in order to become more
stable and useful. We intend to make this application better and truly useful, so
we would appreciate being notified of any issues you encounter.
Contact Us
Please send an e-mail to
jimlee@ntu.edu.tw if you have any technical questions
or comments, or if you encounter a programming bug. We will do our best to
get back to you in a timely manner. Thank you.