lakeview.SequenceAlignment
lakeview.SequenceAlignment#
- class lakeview.SequenceAlignment
Bases:
object
Plot sequence alignment from BAM files.
- __init__(reference_name, segments, pileup_depths, pileup_bases)
- Parameters
reference_name (str) –
segments (list[AlignedSegment]) –
pileup_bases (dict[int, collections.Counter[str]] | None) –
- segments: list[AlignedSegment]
A list of aligned segments loaded, in the order that appears in the source file.
Note
For advanced users, it is possible to modify this attribute prior to plotting to apply complex filtering/sorting operations.
- pileup_bases: dict[int, collections.Counter[str]] | None
A dict mapping reference positions to a counter, which in turn maps each base to its count.
- reference_name: str
Name of the reference sequence.
- classmethod from_file(file_path, region, *, load_pileup=True, **kw)
Load sequence alignment from a local BAM file.
Note
Python file objects are not supported due to a known limitation of pysam.
Note
The SAM format is not supported as it does not allow random access.
- Parameters
file_path (str) – Path to the BAM file.
region (str | tuple[str, tuple[float, float]] | tuple[str, None]) – region to load alignment records from. See
lakeview.region_notation
for additional details.load_pileup (bool) – Whether to load pileup data.
kw – Keyword arguments passed to
pysam.AlignmentFile
.
- classmethod from_remote(url, region, *, index_url=None, load_pileup=True, **kw)
Load sequence alignment from a remote BAM file via HTTP, HTTPS, or FTP protocol.
Note
The SAM format is not supported as it does not allow random access.
- Parameters
url (str) – URL to the remote BAM file
region (str | tuple[str, tuple[float, float]] | tuple[str, None]) – region to load alignment records from. See
lakeview.region_notation
for additional details.index_url (str | None) – URL to the remote BAM index (.bai) file. The default is
url + '.bai'
load_pileup (bool) – Whether to load pileup data.
kw – keyword arguments passed to
pysam.AlignmentFile
- draw_alignment(ax, *, filter_by=None, sort_by=None, link_by=None, group_by=None, color_by=None, group_labels=None, height=None, max_rows=200, min_spacing=None, show_backbones=True, show_arrowheads=True, show_links=True, show_insertions=True, min_insertion_size=10, show_deletions=True, min_deletion_size=10, show_mismatches=True, show_modified_bases=False, show_soft_clippings=True, min_soft_clipping_size=10, show_hard_clippings=True, min_hard_clipping_size=10, show_reference_skips=True, show_group_labels=None, show_group_separators=True, backbones_kw={}, arrowheads_kw={}, links_kw={}, insertions_kw={}, deletions_kw={}, mismatches_kw={}, modified_bases_kw={}, soft_clipping_kw={}, hard_clipping_kw={}, reference_skips_kw={}, letters_kw={}, group_labels_kw={}, group_separators_kw={})
Draw sequence alignment patterns, in a style similar to IGV alignment track.
Segments are drawn from top to bottom on the given
ax
.- Parameters
ax (Axes) – Matplotlib
matplotlib.axes.Axes
instance to plot on.filter_by (Callable[[AlignedSegment], bool] | Iterable[bool] | str | None) – Specify which aligned segments plotted. The default is to plot all aligned segments.
sort_by (Callable[[AlignedSegment], Identifier] | Iterable[Identifier] | Literal[('length', 'start')] | None) – Specify how aligned segments are sorted before plotting. The default is to keep the order of segments in the source file.
link_by (Callable[[AlignedSegment], LinkIdentifier] | Iterable[LinkIdentifier] | Literal[('name', 'pair')] | None) – Specify which segments are linked together before plotting. When two or more segments are linked, they will be plotted in the same row, even if they overlap with each other. The default is to link no segments together.
group_by (Callable[[AlignedSegment], GroupIdentifier] | Iterable[GroupIdentifier] | Literal[('name', 'proper_pair', 'strand')] | None) – Specify which segments are groupped together before plotting. Groupped segments are plotted next to each other. The default is not to link any segments. The default is to group no segments together.
color_by (Callable[[AlignedSegment], Color] | Iterable[Color] | Literal[('proper_pair', 'strand')] | None) – Specify segments colors. The default is to color all segments in light gray.
group_labels (Callable[[GroupIdentifier], str] | Mapping[GroupIdentifier, str] | None) – Text labels for each group. The default is to use group identifiers as labels. This parameter is only valid when a custom value for group_by is used.
height (Optional[float]) – Segment height in points. The default is to infer automatically.
min_spacing (Optional[float]) – The minimum horizontal spacing between two adjacent segments in the same row, in terms of number of bases. The default is to infer automatically.
max_rows (int) – The maximum number of rows to layout segments. Excess segments will not be drawn. If multiple segment groups exist, this parameter limits the maximum number of rows per group.
show_backbones (bool) – Whether to show segment backbones (i.e. horizontal bars representing aligned positions of each segment).
show_arrowheads (bool) – Whether to show triangular arrowheads to denote segment orientations.
show_links (bool) – Whether to show horizontal lines connecting linked segments. Only has effect when
link_by
is specified.show_insertions (bool) – Whether to show the “I”-shaped markers to denote insertions.
min_insertion_size (float) – Insertions below
min_insertion_size
will not be shown.show_deletions (bool) – Whether to show the horizontal dash markers to denote deletions.
min_deletion_size (float) – Deletions below
min_deletion_size
will not be shown.show_mismatches (bool) – Whether to show vertical lines denoting mismatched bases. Requires the CIGAR X (BAM_CDIFF) operation or the MD tag. See
lakeview.alignment.AlignedSegment.mismatched_bases
for details.show_modified_bases (bool) – Whether to show vertical lines denoting DNA modification. Requires the Ml and Mm tags. See
lakeview.alignment.AlignedSegment.modified_bases
for details.show_soft_clippings (bool) – Whether to show markers at segment ends denoting soft-clipped bases.
min_soft_clipping_size (float) – Soft-clipping below
min_soft_clipping_size
will not be shown.show_hard_clippings (bool) – Whether to show markers at segment ends denoting hard-clipped bases.
min_hard_clipping_size (float) – Hard-clipping below
min_hard_clipping_size
will not be shown.show_reference_skips (bool) – Whether to show horizontal lines denoting reference skips. Reference skips are commonly found in intron regions of RNA-seq reads.
show_group_labels (Optional[bool]) – Whether to show a text label for each segment group. If
None
, group labels will only be shown whengroup_by
is specified.show_group_separators (bool) – Whether to show horizontal separator lines between adjacent segment groups.
Note
For a detailed explaination on the usage of
filter_by
,sort_by
,link_by
,group_by
, andcolor_by
, see Customising alignment layout.
- draw_pileup(ax, *, show_mismatches=True, min_alt_frequency=0.2, min_alt_depth=2, window_size=1, pileup_kw={}, mismatch_kw={})
Draw pileup depths for alignment, in a style similar to IGV coverage track.
- Parameters
ax (matplotlib.axes._axes.Axes) – Matplotlib
matplotlib.axes.Axes
instance to plot on.show_mismatches (bool) – If
True
, mismatched bases will be shown as colored bars.min_alt_frequency (float) – Mismatched bases with frequency less than min_alt_frequency will not be shown.
min_alt_depth (float) – Mismatched bases with depth less than min_alt_frequency will not be shown.
window_size (int) – Show mean depth values in non-overlapping windows. Windows start from integer multiples of window_size.
pileup_kw – Keyword arguments passed to
matplotlib.axes.Axes.fill_between()
.mismatch_kw – Keyword arguments passed to
matplotlib.axes.Axes.bar()
.
Test docstring for annotation.py
- class lakeview.alignment.AlignedSegment#
Bases:
object
A wrapper around pysam.AlignedSegment
- __init__(wrapped)#
- Parameters
wrapped (pysam.libcalignedsegment.AlignedSegment) – an instance of
pysam.AlignedSegment
- wrapped: pysam.libcalignedsegment.AlignedSegment#
the wrapped
pysam.AlignedSegment
instance.
- reference_start: int#
0-based coordinate of the first aligned reference position of the segment; alias of
pysam.AlignedSegment.reference_start
- reference_end: int#
0-based coordinate of one past the last aligned reference position of the segment; alias of
pysam.AlignedSegment.reference_end
- query_name: str#
alias of
pysam.AlignedSegment.query_name
- is_forward: bool#
alias of
pysam.AlignedSegment.is_forward
- is_reverse: bool#
alias of
pysam.AlignedSegment.is_reverse
- is_proper_pair: bool#
alias of
pysam.AlignedSegment.is_proper_pair
- is_secondary: bool#
alias of
pysam.AlignedSegment.is_secondary
- is_supplementary: bool#
alias of
pysam.AlignedSegment.is_secondary
- is_mapped: bool#
alias of
pysam.AlignedSegment.is_mapped
- mapping_quality: int#
alias of
pysam.AlignedSegment.mapping_quality
- has_tag(tag)#
alias of
pysam.AlignedSegment.has_tag()
- get_tag(tag)#
alias of
pysam.AlignedSegment.get_tag()
- Parameters
tag (str) –
- property query_sequence#
alias of
pysam.AlignedSegment.query_sequence
- get_aligned_pairs(*args, **kw)#
- property reference_sequence: str#
reference sequence in the wrapped
pysam.AlignedSegment
; requires the MD tag; seepysam.AlignedSegment.get_reference_sequence()
- get_aligned_reference_position(query_position)#
Given the query_position, returns the corresponding reference position , or None if the query_position is part of an insertion.
- get_aligned_query_sequence(reference_position)#
Given the reference_position, returns the corresponding query sequence if
self.reference_start <= reference_position < self.reference_end
, orNone
otherwise.If the reference_position is part of a deletion or reference skip,
"-"
is returned.This function is particularly useful for phasing sequencing reads into halpotype groups.
- property insertions: Sequence[Insertion]#
insertions identified from CIGAR
- property deletions: Sequence[Deletion]#
deletions identified from CIGAR
- property soft_clipping: Sequence[SoftClippedBases]#
soft clipping identified from CIGAR
- property hard_clipping: Sequence[HardClippedBases]#
hard clipping identified from CIGAR
- property mismatched_bases: Sequence[_MismatchedBase]#
mismatched bases identified from CIGAR X (BAM_CDIFF) operation or the MD tag; empty if neither is available; see
pysam.AlignedSegment.cigartuples
andpysam.AlignedSegment.get_aligned_pairs()
Note
For Minimap2, use
--eqx
to include mismatch information in the CIGAR string,--MD
to include mismatch information in the MD tag.
- property modified_bases: list[ModifiedBase]#
modified bases identified from the Ml and Mm tags; see
pysam.AlignedSegment.modified_bases