lakeview.region_notation

In Lakeview, the genomic region of interest can be specified in one of the following two formats:

  • a region notation string (e.g. "chr1:15,000-20,000"), with optional commas as thousands separators

  • a nested tuple in the form (sequence_name, (start, end)) (e.g. ('chr1', (15_000, 20_000))), with optional underscores as thousands separators, following standard Python numeric-literal syntax.

In both cases, the start and end coordinates can be omitted together to include the entire sequence (e.g. "chr1" or ('chr1', (None, None))).
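The "omitted together" rule can be illustrated with a small check that accepts either format. This is a hypothetical sketch, not part of lakeview: the helper name is_whole_sequence and the type aliases are invented for illustration, and it assumes sequence names never contain ":".

```python
from typing import Optional, Union

# Hypothetical aliases for the two accepted region formats.
Interval = tuple[Optional[int], Optional[int]]
Region = Union[str, tuple[str, Interval]]


def is_whole_sequence(region: Region) -> bool:
    """Return True if the region covers the entire sequence (no coordinates)."""
    if isinstance(region, str):
        # String form: a bare sequence name has no ":start-end" suffix.
        # Assumption: sequence names themselves contain no ":".
        return ":" not in region
    _, (start, end) = region
    # Tuple form: start and end must be omitted together, never just one.
    if (start is None) != (end is None):
        raise ValueError("start and end must be omitted together")
    return start is None
```

For example, is_whole_sequence("chr1") and is_whole_sequence(('chr1', (None, None))) are both True, while the coordinate-bearing examples above give False.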

This module contains functions for parsing and converting valid region notations.

lakeview.region_notation.get_region_notation(sequence_name, interval=None)

Returns a samtools-compatible region string built from sequence_name and the optional interval (start, end). Note that interval uses 0-based, half-open coordinates, while the samtools-compatible notation represents a 1-based, closed interval; see the pysam documentation. If interval is None, only the sequence name is returned.

>>> get_region_notation("chr1", (15000, 20000))
'chr1:15001-20000'
>>> get_region_notation("chr1")
'chr1'
>>> get_region_notation("chr1", (234.2, 480.8))
'chr1:235-481'
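
Parameters

sequence_name – name of the sequence (e.g. "chr1")

interval (Optional[tuple[float, float]]) – 0-based, half-open (start, end) coordinates; if None, the region covers the entire sequence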

Return type

str
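The coordinate shift described above can be sketched as follows. This is an illustrative re-implementation, not lakeview's code: the name to_samtools_region is hypothetical, and rounding non-integer coordinates with Python's round() is an assumption chosen because it reproduces the documented (234.2, 480.8) example.

```python
from typing import Optional


def to_samtools_region(
    sequence_name: str,
    interval: Optional[tuple[float, float]] = None,
) -> str:
    """Convert a 0-based, half-open interval into a 1-based, closed region string."""
    if interval is None:
        # No coordinates: the region covers the whole sequence.
        return sequence_name
    start, end = interval
    # Assumption: non-integer coordinates are rounded with Python's round().
    start_1based = round(start) + 1  # 0-based start -> 1-based start
    end_1based = round(end)  # a half-open end equals the closed, 1-based end
    return f"{sequence_name}:{start_1based}-{end_1based}"
```

Only the start moves: position 15000 in 0-based terms is position 15001 in 1-based terms, while the exclusive 0-based end 20000 is already the inclusive 1-based end.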

lakeview.region_notation.parse_region_notation(region_notation)

Parse region_notation into (sequence_name, (start, end)), where start and end are 0-based, half-open coordinates. If region_notation contains only the sequence name, the interval is None, i.e. the result is (sequence_name, None).

Parameters

region_notation (str) – region notation string to be parsed

Returns

a two-element tuple (sequence_name, (start, end)), or (sequence_name, None) if no coordinates are given

Raises

InvalidRegionNotationError – if region_notation is not correctly formatted

Return type

tuple[str, Optional[tuple[int, int]]]

>>> parse_region_notation("chr1:15001-20000")
('chr1', (15000, 20000))
>>> parse_region_notation("chr14:104,586,347-107,043,718")
('chr14', (104586346, 107043718))
>>> parse_region_notation("chr14")
('chr14', None)
>>> parse_region_notation("chr14:200")
Traceback (most recent call last):
    ...
lakeview.region_notation.InvalidRegionNotationError: Unable to parse region: 'chr14:200'.
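A minimal regex-based parser with the same input/output behavior as the examples above might look like this. It is a sketch under stated assumptions, not lakeview's implementation: the name parse_region is hypothetical, it raises a plain ValueError instead of InvalidRegionNotationError, and it assumes sequence names contain neither ":" nor whitespace (some assemblies use colons in contig names, which this pattern would reject).

```python
import re
from typing import Optional

# Assumption: a sequence name without ":" or whitespace, optionally followed by
# ":start-end", where the coordinates may use commas as thousands separators.
_REGION_RE = re.compile(r"([^:\s]+)(?::([\d,]+)-([\d,]+))?")


def parse_region(region: str) -> tuple[str, Optional[tuple[int, int]]]:
    """Parse a 1-based, closed region string into 0-based, half-open coordinates."""
    match = _REGION_RE.fullmatch(region)
    if match is None:
        raise ValueError(f"Unable to parse region: {region!r}.")
    name, start, end = match.groups()
    if start is None:
        # Bare sequence name: the whole sequence, so there is no interval.
        return name, None
    # Strip commas, then shift the 1-based start down by one; the end is unchanged.
    return name, (int(start.replace(",", "")) - 1, int(end.replace(",", "")))
```

A single coordinate such as "chr14:200" does not match the optional ":start-end" group, so fullmatch fails and the error is raised, mirroring the last example.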

lakeview.region_notation.normalize_region_notation(region_notation)

Normalize region_notation to be samtools-compatible: spaces are removed, as are the thousands-separator commas in the coordinates.

Raises

InvalidRegionNotationError – if region_notation is not correctly formatted

Parameters

region_notation (str) – region notation string to be normalized

Return type

str

>>> normalize_region_notation("chr14:104,586,347-107,043,718")
'chr14:104586347-107043718'
>>> normalize_region_notation("chrX: 26,23,922 - 26,235,150")
'chrX:2623922-26235150'
>>> normalize_region_notation("chr14")
'chr14'
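The normalization steps can be sketched in a few lines. This is a hypothetical helper (normalize_region), not lakeview's code; the final validity check and the use of ValueError are assumptions standing in for the library's own error handling.

```python
import re


def normalize_region(region: str) -> str:
    """Remove spaces, and thousands-separator commas from the coordinates."""
    compact = region.replace(" ", "")
    # Split off the sequence name so that only the coordinates lose their commas.
    name, sep, coords = compact.partition(":")
    normalized = name + sep + coords.replace(",", "")
    # Assumption: a valid result is "name" or "name:start-end" with integer coordinates.
    if not re.fullmatch(r"[^:]+(?::\d+-\d+)?", normalized):
        raise ValueError(f"Unable to parse region: {region!r}.")
    return normalized
```

Because commas are simply deleted, irregular grouping such as "26,23,922" collapses to 2623922, matching the second example.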

exception lakeview.region_notation.InvalidRegionNotationError

Bases: TypeError

Exception raised for invalid region of interest format.

__init__(region)

Parameters

region (Any) – the region that could not be parsed
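A class with the documented base and constructor can be sketched as below. This is a hypothetical re-creation, not lakeview's source: the message format is inferred from the parse_region_notation traceback, and storing the input on a region attribute is an assumption.

```python
from typing import Any


class InvalidRegionNotationError(TypeError):
    """Exception raised for an invalid region of interest format."""

    def __init__(self, region: Any) -> None:
        # Message format inferred from the documented traceback, e.g.
        # "Unable to parse region: 'chr14:200'."
        super().__init__(f"Unable to parse region: {region!r}.")
        # Assumption: keep the offending input for callers that want to inspect it.
        self.region = region
```

Since the documented base is TypeError, callers can catch it either by name or with a broader except TypeError clause.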