lakeview.region_notation
In Lakeview, the genomic region of interest can be specified in one of the following two formats:
- a region notation string (e.g. "chr1:15,000-20,000"), with optional commas as thousands separators
- a nested tuple in the form of (sequence_name, (start, end)) (e.g. ('chr1', (15_000, 20_000))), with optional underscores as thousands separators following the Python standard syntax
In both cases, the start and end coordinates can be omitted together to include the entire sequence (e.g. "chr1" or ('chr1', (None, None))).
This module contains functions for the parsing and conversion of valid region notations.
- lakeview.region_notation.get_region_notation(sequence_name, interval=None)
Returns a samtools-compatible region string from sequence_name and the start and end coordinates given in interval. Note that start and end describe a 0-based half-open interval, while the samtools-compatible notation instead represents a 1-based closed interval. See pysam documentation.
>>> get_region_notation("chr1", (15000, 20000))
'chr1:15001-20000'
>>> get_region_notation("chr1")
'chr1'
>>> get_region_notation("chr1", (234.2, 480.8))
'chr1:235-481'
- lakeview.region_notation.parse_region_notation(region_notation)
Parse region_notation into (sequence_name, (start, end)). If region_notation only contains the sequence name, both start and end will be None.
- Parameters
region_notation (str) – region notation string to be parsed
- Returns
a two-element tuple (sequence_name, (start, end))
- Raises
ValueError – if region_notation is not correctly formatted
- Return type
>>> parse_region_notation("chr1:15001-20000")
('chr1', (15000, 20000))
>>> parse_region_notation("chr14:104,586,347-107,043,718")
('chr14', (104586346, 107043718))
>>> parse_region_notation("chr14")
('chr14', None)
>>> parse_region_notation("chr14:200")
Traceback (most recent call last):
...
lakeview.region_notation.InvalidRegionNotationError: Unable to parse region: 'chr14:200'.
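The parsing step can be sketched with a regular expression. This is an illustration under stated assumptions, not Lakeview's actual code: the name `parse_region_sketch` and the pattern are hypothetical, sequence names are assumed not to contain a colon, a plain ValueError is raised rather than the library-specific exception shown in the traceback above, and a name-only region is returned with (None, None) as described in the prose.

```python
import re

# Hypothetical pattern: a sequence name, optionally followed by
# ":start-end", where coordinates may contain commas.
_REGION_PATTERN = re.compile(
    r"^(?P<name>[^:]+)(?::(?P<start>[\d,]+)-(?P<end>[\d,]+))?$"
)


def parse_region_sketch(region_notation):
    """Parse a 1-based closed region string into 0-based half-open form."""
    match = _REGION_PATTERN.match(region_notation)
    if match is None:
        raise ValueError(f"Unable to parse region: {region_notation!r}.")
    name = match.group("name")
    if match.group("start") is None:
        # Only the sequence name was given (assumption: both set to None).
        return name, (None, None)
    # Strip thousands separators, then shift the start by one to go from
    # 1-based closed to 0-based half-open coordinates.
    start = int(match.group("start").replace(",", "")) - 1
    end = int(match.group("end").replace(",", ""))
    return name, (start, end)


print(parse_region_sketch("chr14:104,586,347-107,043,718"))
```

An input such as "chr14:200" fails to match because the optional coordinate group requires both a start and an end.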
- lakeview.region_notation.normalize_region_notation(region_notation)
Normalize region_notation to be samtools-compatible. Spaces are removed. Commas are removed from the coordinates.
- Raises
RegionNotationError – if region_notation is not correctly formatted
- Parameters
region_notation (str) –
- Return type
>>> normalize_region_notation("chr14:104,586,347-107,043,718")
'chr14:104586347-107043718'
>>> normalize_region_notation("chrX: 26,23,922 - 26,235,150")
'chrX:2623922-26235150'
>>> normalize_region_notation("chr14")
'chr14'
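The two normalization rules (drop spaces, drop commas from the coordinates) can be sketched as follows. The helper `normalize_region_sketch` is hypothetical and deliberately skips the format validation that the documented function performs; it also assumes the sequence name contains no colon.

```python
def normalize_region_sketch(region_notation):
    """Hypothetical helper: remove spaces, then commas in the coordinates."""
    compact = region_notation.replace(" ", "")
    # Split at the first colon; `sep` is empty when no coordinates follow.
    name, sep, coordinates = compact.partition(":")
    if not sep:
        return name
    # Commas are stripped only from the coordinate part, not the name.
    return f"{name}:{coordinates.replace(',', '')}"


print(normalize_region_sketch("chrX: 26,23,922 - 26,235,150"))
```

Because commas are simply deleted, an unevenly grouped coordinate such as "26,23,922" becomes 2623922, which matches the behavior shown in the second example above.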