lakeview.SequenceAlignment#

class lakeview.SequenceAlignment

Bases: object

Plot sequence alignment from BAM files.

__init__(reference_name, segments, pileup_depths, pileup_bases)
Parameters
segments: list[AlignedSegment]

A list of aligned segments loaded, in the order that appears in the source file.

Note

For advanced users, it is possible to modify this attribute prior to plotting to apply complex filtering/sorting operations.

pileup_depths: dict[int, int] | None

A dict mapping reference positions to pileup depths.

pileup_bases: dict[int, collections.Counter[str]] | None

A dict mapping reference positions to a counter, which in turn maps each base to its count.

reference_name: str

Name of the reference sequence.

classmethod from_file(file_path, region, *, load_pileup=True, **kw)

Load sequence alignment from a local BAM file.

Note

Python file objects are not supported due to a known limitation of pysam.

Note

The SAM format is not supported as it does not allow random access.

Parameters
classmethod from_remote(url, region, *, index_url=None, load_pileup=True, **kw)

Load sequence alignment from a remote BAM file via HTTP, HTTPS, or FTP protocol.

Note

The SAM format is not supported as it does not allow random access.

Parameters
draw_alignment(ax, *, filter_by=None, sort_by=None, link_by=None, group_by=None, color_by=None, group_labels=None, height=None, max_rows=200, min_spacing=None, show_backbones=True, show_arrowheads=True, show_links=True, show_insertions=True, min_insertion_size=10, show_deletions=True, min_deletion_size=10, show_mismatches=True, show_modified_bases=False, show_soft_clippings=True, min_soft_clipping_size=10, show_hard_clippings=True, min_hard_clipping_size=10, show_reference_skips=True, show_group_labels=None, show_group_separators=True, backbones_kw={}, arrowheads_kw={}, links_kw={}, insertions_kw={}, deletions_kw={}, mismatches_kw={}, modified_bases_kw={}, soft_clipping_kw={}, hard_clipping_kw={}, reference_skips_kw={}, letters_kw={}, group_labels_kw={}, group_separators_kw={})

Draw sequence alignment patterns, in a style similar to IGV alignment track.

Segments are drawn from top to bottom on the given ax.

Parameters
  • ax (Axes) – Matplotlib matplotlib.axes.Axes instance to plot on.

  • filter_by (Callable[[AlignedSegment], bool] | Iterable[bool] | str | None) – Specify which aligned segments plotted. The default is to plot all aligned segments.

  • sort_by (Callable[[AlignedSegment], Identifier] | Iterable[Identifier] | Literal[('length', 'start')] | None) – Specify how aligned segments are sorted before plotting. The default is to keep the order of segments in the source file.

  • link_by (Callable[[AlignedSegment], LinkIdentifier] | Iterable[LinkIdentifier] | Literal[('name', 'pair')] | None) – Specify which segments are linked together before plotting. When two or more segments are linked, they will be plotted in the same row, even if they overlap with each other. The default is to link no segments together.

  • group_by (Callable[[AlignedSegment], GroupIdentifier] | Iterable[GroupIdentifier] | Literal[('name', 'proper_pair', 'strand')] | None) – Specify which segments are groupped together before plotting. Groupped segments are plotted next to each other. The default is not to link any segments. The default is to group no segments together.

  • color_by (Callable[[AlignedSegment], Color] | Iterable[Color] | Literal[('proper_pair', 'strand')] | None) – Specify segments colors. The default is to color all segments in light gray.

  • group_labels (Callable[[GroupIdentifier], str] | Mapping[GroupIdentifier, str] | None) – Text labels for each group. The default is to use group identifiers as labels. This parameter is only valid when a custom value for group_by is used.

  • height (Optional[float]) – Segment height in points. The default is to infer automatically.

  • min_spacing (Optional[float]) – The minimum horizontal spacing between two adjacent segments in the same row, in terms of number of bases. The default is to infer automatically.

  • max_rows (int) – The maximum number of rows to layout segments. Excess segments will not be drawn. If multiple segment groups exist, this parameter limits the maximum number of rows per group.

  • show_backbones (bool) – Whether to show segment backbones (i.e. horizontal bars representing aligned positions of each segment).

  • show_arrowheads (bool) – Whether to show triangular arrowheads to denote segment orientations.

  • show_links (bool) – Whether to show horizontal lines connecting linked segments. Only has effect when link_by is specified.

  • show_insertions (bool) – Whether to show the “I”-shaped markers to denote insertions.

  • min_insertion_size (float) – Insertions below min_insertion_size will not be shown.

  • show_deletions (bool) – Whether to show the horizontal dash markers to denote deletions.

  • min_deletion_size (float) – Deletions below min_deletion_size will not be shown.

  • show_mismatches (bool) – Whether to show vertical lines denoting mismatched bases. Requires the CIGAR X (BAM_CDIFF) operation or the MD tag. See lakeview.alignment.AlignedSegment.mismatched_bases for details.

  • show_modified_bases (bool) – Whether to show vertical lines denoting DNA modification. Requires the Ml and Mm tags. See lakeview.alignment.AlignedSegment.modified_bases for details.

  • show_soft_clippings (bool) – Whether to show markers at segment ends denoting soft-clipped bases.

  • min_soft_clipping_size (float) – Soft-clipping below min_soft_clipping_size will not be shown.

  • show_hard_clippings (bool) – Whether to show markers at segment ends denoting hard-clipped bases.

  • min_hard_clipping_size (float) – Hard-clipping below min_hard_clipping_size will not be shown.

  • show_reference_skips (bool) – Whether to show horizontal lines denoting reference skips. Reference skips are commonly found in intron regions of RNA-seq reads.

  • show_group_labels (Optional[bool]) – Whether to show a text label for each segment group. If None, group labels will only be shown when group_by is specified.

  • show_group_separators (bool) – Whether to show horizontal separator lines between adjacent segment groups.

Note

For a detailed explaination on the usage of filter_by, sort_by, link_by, group_by, and color_by, see Customising alignment layout.

draw_pileup(ax, *, show_mismatches=True, min_alt_frequency=0.2, min_alt_depth=2, window_size=1, pileup_kw={}, mismatch_kw={})

Draw pileup depths for alignment, in a style similar to IGV coverage track.

Parameters
  • ax (matplotlib.axes._axes.Axes) – Matplotlib matplotlib.axes.Axes instance to plot on.

  • show_mismatches (bool) – If True, mismatched bases will be shown as colored bars.

  • min_alt_frequency (float) – Mismatched bases with frequency less than min_alt_frequency will not be shown.

  • min_alt_depth (float) – Mismatched bases with depth less than min_alt_frequency will not be shown.

  • window_size (int) – Show mean depth values in non-overlapping windows. Windows start from integer multiples of window_size.

  • pileup_kw – Keyword arguments passed to matplotlib.axes.Axes.fill_between().

  • mismatch_kw – Keyword arguments passed to matplotlib.axes.Axes.bar().

Test docstring for annotation.py

class lakeview.alignment.AlignedSegment#

Bases: object

A wrapper around pysam.AlignedSegment

__init__(wrapped)#
Parameters

wrapped (pysam.libcalignedsegment.AlignedSegment) – an instance of pysam.AlignedSegment

wrapped: pysam.libcalignedsegment.AlignedSegment#

the wrapped pysam.AlignedSegment instance.

reference_start: int#

0-based coordinate of the first aligned reference position of the segment; alias of pysam.AlignedSegment.reference_start

reference_end: int#

0-based coordinate of one past the last aligned reference position of the segment; alias of pysam.AlignedSegment.reference_end

query_name: str#

alias of pysam.AlignedSegment.query_name

is_forward: bool#

alias of pysam.AlignedSegment.is_forward

is_reverse: bool#

alias of pysam.AlignedSegment.is_reverse

is_proper_pair: bool#

alias of pysam.AlignedSegment.is_proper_pair

is_secondary: bool#

alias of pysam.AlignedSegment.is_secondary

is_supplementary: bool#

alias of pysam.AlignedSegment.is_secondary

is_mapped: bool#

alias of pysam.AlignedSegment.is_mapped

mapping_quality: int#

alias of pysam.AlignedSegment.mapping_quality

query_alignment_length: int#

alias of pysam.AlignedSegment.query_alignment_length

has_tag(tag)#

alias of pysam.AlignedSegment.has_tag()

Parameters

tag (str) –

Return type

bool

get_tag(tag)#

alias of pysam.AlignedSegment.get_tag()

Parameters

tag (str) –

property query_sequence#

alias of pysam.AlignedSegment.query_sequence

get_aligned_pairs(*args, **kw)#

alias of pysam.AlignedSegment.get_aligned_pairs()

property reference_sequence: str#

reference sequence in the wrapped pysam.AlignedSegment; requires the MD tag; see pysam.AlignedSegment.get_reference_sequence()

get_aligned_reference_position(query_position)#

Given the query_position, returns the corresponding reference position , or None if the query_position is part of an insertion.

Parameters

query_position (int) – 0-based query position

Return type

int | None

get_aligned_query_sequence(reference_position)#

Given the reference_position, returns the corresponding query sequence if self.reference_start <= reference_position < self.reference_end, or None otherwise.

If the reference_position is part of a deletion or reference skip, "-" is returned.

This function is particularly useful for phasing sequencing reads into halpotype groups.

Parameters

reference_position (int) – 0-based reference position

Return type

str | None

property insertions: Sequence[Insertion]#

insertions identified from CIGAR

property deletions: Sequence[Deletion]#

deletions identified from CIGAR

property soft_clipping: Sequence[SoftClippedBases]#

soft clipping identified from CIGAR

property hard_clipping: Sequence[HardClippedBases]#

hard clipping identified from CIGAR

property mismatched_bases: Sequence[_MismatchedBase]#

mismatched bases identified from CIGAR X (BAM_CDIFF) operation or the MD tag; empty if neither is available; see pysam.AlignedSegment.cigartuples and pysam.AlignedSegment.get_aligned_pairs()

Note

For Minimap2, use --eqx to include mismatch information in the CIGAR string, --MD to include mismatch information in the MD tag.

property modified_bases: list[ModifiedBase]#

modified bases identified from the Ml and Mm tags; see pysam.AlignedSegment.modified_bases

property reference_skips: list[ReferenceSkip]#

reference skips identified from CIGAR

Note

Reference skips often appear in RNAseq read alignment to represent introns.