binary_search
- giant.catalogues.ucac.binary_search(file, label, column=0, separator=None, column_conversion=<class 'float'>, order=ColumnOrder.ASCENDING, start=0, stop=None, line_length=None)
This helper function does a binary search on a sorted file with fixed width lines.
The binary search is performed by repeatedly checking the line at the midpoint of the block of the file currently under consideration and using it to decide whether to search to the left or to the right of that midpoint in the next iteration. This requires the lines to be sorted on the column that is being searched. It also requires that the column being searched is orderable (implements comparison operators) after conversion from a string.
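The midpoint search described above can be sketched as follows. This is a minimal illustration, not the GIANT implementation: the name fixed_width_search and its convert parameter are hypothetical, and it assumes an ascending sort and that every line, including the last, has the same byte length with the newline included.

```python
import io


def fixed_width_search(file, label, column=0, separator=None, convert=float):
    """Hypothetical sketch: return the line whose column equals label, else None."""
    # Measure one line; every line is assumed to have this exact byte length.
    file.seek(0)
    line_length = len(file.readline())
    if line_length == 0:
        return None  # empty file, nothing to search

    # Express the search window in whole lines rather than bytes.
    file.seek(0, io.SEEK_END)
    low, high = 0, file.tell() // line_length

    while low < high:
        mid = (low + high) // 2
        # Fixed-width lines let us jump straight to the start of line `mid`.
        file.seek(mid * line_length)
        line = file.read(line_length)
        value = convert(line.split(separator)[column])
        if value == label:
            return line
        if value < label:
            low = mid + 1  # label can only be to the right of the midpoint
        else:
            high = mid  # label can only be to the left of the midpoint
    return None


# Four 8-byte lines, sorted ascending on the first column.
data = io.BytesIO(b"001 1.5\n004 2.5\n009 3.5\n012 4.5\n")
found = fixed_width_search(data, 9.0)
missing = fixed_width_search(data, 5.0)
```

Because the file is read in binary mode, the matching line comes back as a bytes object, and a label that is absent yields None, mirroring the behaviour documented for binary_search.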
The conversion into an orderable type is controlled using the column_conversion keyword argument. This is applied to the specified column (controlled by the keyword arguments column and separator) to create an orderable object. This can be any callable, so long as it returns an orderable object, but typically it is a Python type like int or float. Note that strings are orderable as well, so you can set column_conversion to str; however, be aware that the ordering of strings can be confusing when white space is involved (for instance, '10' is less than '2' according to string comparisons). Therefore, unless your numbers are 0 padded (i.e. '02'), we recommend using a numeric type for the column_conversion.
If the searched-for label is found in the column, then the line in which it is found is returned (as a bytes object). If it is not found, then None is returned.
- Parameters:
file (BinaryIO) – The file object to search. This should be opened in binary read mode so that we can seek.
label (Any) – The label we are searching for in the file object. This must support equality comparison (==) with the type that is returned by column_conversion.
column (int) – The column index that is to be searched.
separator (str | None) – The separator spec for splitting the file. If None then defaults to white space. This is passed directly to str.split.
column_conversion (Callable) – The callable to convert the column into an orderable object. Typically this should be one of the Python builtin types (like float or int), but it can be any callable so long as the return value supports the less than/greater than operators. This is applied as column_conversion(line.split(sep=separator)[column]) where line is the current line under consideration.
order (ColumnOrder | str) – How the column being searched is sorted. This should be either ASCENDING or DESCENDING (one of the ColumnOrder enum values).
start (int) – Where to start in the file in bytes. Typically this is unused unless you know you can skip part of the file.
stop (int | None) – Where to stop the search in bytes. If this is None then it will be set to the length of the file. Typically this is unused unless you know you can skip part of the file.
line_length (int | None) – The number of bytes in each line. If None then this will be computed from the file.
- Returns:
- Return type:
bytes | None
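The caveat about using str as the column_conversion can be checked directly in plain Python. The snippet below is only an illustration of string versus numeric ordering; it does not call binary_search, and the sample labels are made up.

```python
labels = ["2", "10", "1"]

# Strings compare character by character, so "10" sorts before "2".
as_strings = sorted(labels)

# Converting to a numeric type gives the ordering most people expect.
as_numbers = sorted(labels, key=int)

# Zero padding to a common width makes string order agree with numeric order.
padded = sorted(s.zfill(2) for s in labels)

print(as_strings)  # ['1', '10', '2']
print(as_numbers)  # ['1', '2', '10']
print(padded)      # ['01', '02', '10']
```

A file sorted numerically on an unpadded column is therefore not sorted under string comparison, which is why a numeric column_conversion is the safer choice in that case.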