data_handling Module
Functions for handling data.
data_handling.request
Data handling for request step.
- gtexquery.data_handling.request.gtex_request(region: str, gene: str, output: str) -> None
Make a thread-safe GTEx request against the mediantranscriptexpression endpoint.
If gene starts with “ENSG”, a query is made to GTEx. If it does not, no file is created. This is designed to be used with snakemake checkpoints.
A thread-local session is provided by a call to _get_session. This allows sessions to be reused, which, among other things, provides significant speedups.
- Parameters
region (str) – The GTEx region to query.
gene (str) – The Ensembl gene ID (ENSG) to query.
output (str) – Where to save the output file.
- Raises
requests.HTTPError – When the GET request returns an error.
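The thread-local session pattern described for gtex_request can be sketched as follows. This is a minimal illustration, not the package's actual _get_session: the name get_session and its factory argument are hypothetical, and in the pipeline the factory would presumably be requests.Session.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# Each thread sees its own attributes on this object.
_thread_local = threading.local()


def get_session(factory: Callable[[], T]) -> T:
    """Return the calling thread's session, creating it on first use.

    `factory` is a hypothetical parameter used to keep this sketch
    free of third-party imports; e.g. get_session(requests.Session).
    """
    if not hasattr(_thread_local, "session"):
        _thread_local.session = factory()
    return _thread_local.session
```

Repeated calls from the same thread return the same object, so connection pooling can be reused, while separate threads never share a session.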
- gtexquery.data_handling.request.lut_check(gene: str, lut: pandas.core.frame.DataFrame) -> str
Check that a gene is found in the Gencode annotations.
If the gene is found, then it is converted to its Ensembl ID. If it is not found, then the gene name is returned. The found status can be queried by seeing if the resulting string starts with “ENSG”, a pattern that will only occur for Ensembl IDs.
Note
It’s likely that your gene is in Gencode even if it is not found. Common reasons (at least for me!) that a gene might not be found include spelling errors and naming errors (e.g., using NGN2 instead of NEUROG2).
- Parameters
gene (str) – The gene name to be queried
lut (pd.DataFrame) – The dataframe containing the name-to-id conversion for the genes
- Returns
The Ensembl ID if the gene is found in lut, otherwise the gene name unchanged.
- Return type
str
Example
>>> lut = pd.DataFrame.from_dict({"name": ["ASCL1"], "id": ["ENSG00000139352.3"]})
>>> lut_check("ASCL1", lut)
'ENSG00000139352.3'
>>> lut_check("NotAGene", lut)
'NotAGene'
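Because the found status is encoded in the "ENSG" prefix, downstream code can separate resolved genes from unresolved ones with a plain string check. The helper below is a hypothetical sketch (split_by_lookup_status is not part of the package) that operates on strings of the kind lut_check returns.

```python
from typing import Iterable, List, Tuple


def split_by_lookup_status(results: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition lut_check-style results into (found, missing).

    A result counts as found when it starts with "ENSG", the prefix
    that only occurs for Ensembl gene IDs.
    """
    found: List[str] = []
    missing: List[str] = []
    for result in results:
        (found if result.startswith("ENSG") else missing).append(result)
    return found, missing


found, missing = split_by_lookup_status(["ENSG00000139352.3", "NotAGene"])
```

Here found holds the Ensembl ID and missing holds the unresolved name, which is a convenient place to report likely spelling or naming errors.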
data_handling.biomart
Data handling for biomart step.
- gtexquery.data_handling.biomart.XML_QUERY
A lambda function encapsulating the unwieldy XML query string required by Biomart. The list of transcripts is joined to form the ensembl_transcript_id field.
- Type
Callable[[list[str]], str]
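A callable of this shape might look like the sketch below. It follows the general BioMart XML query layout, but the dataset name, the requested attributes, and the name build_query are assumptions made for illustration; the actual XML_QUERY may request different fields.

```python
from typing import List

# Assumed template: dataset and attribute names are illustrative only.
_QUERY_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<!DOCTYPE Query>"
    '<Query virtualSchemaName="default" formatter="TSV" header="1" '
    'uniqueRows="0" datasetConfigVersion="0.6">'
    '<Dataset name="hsapiens_gene_ensembl" interface="default">'
    '<Filter name="ensembl_transcript_id" value="{ids}"/>'
    '<Attribute name="ensembl_transcript_id"/>'
    '<Attribute name="external_gene_name"/>'
    "</Dataset>"
    "</Query>"
)


def build_query(transcripts: List[str]) -> str:
    """Join transcript IDs with commas and place them in the filter value."""
    return _QUERY_TEMPLATE.format(ids=",".join(transcripts))


query = build_query(["ENST00000001", "ENST00000002"])
```

Keeping the template in one place means callers only ever pass a list of transcript IDs and never touch the XML directly.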
- gtexquery.data_handling.biomart.biomart_request(infile: str, output: str) -> None
Query Biomart with a list of transcripts.
Instantiates a thread-local requests.Session before querying Biomart with a list of transcript IDs. Should an error occur, it is logged using the logging.exception method.
- Parameters
infile (str) – The input file. This is expected to be the output of the GTEx query, and will fail if the expected columns are not present.
output (str) – Where to save results
- Raises
requests.HTTPError – When the GET request fails
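The log-then-raise behaviour described for biomart_request can be illustrated in isolation. This sketch assumes the error is re-raised after being logged, which the Raises entry suggests but does not state outright; run_logged and the logger name are hypothetical, and action stands in for the actual HTTP call.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("biomart_sketch")  # hypothetical logger name


def run_logged(action: Callable[..., Any], *args: Any) -> Any:
    """Run `action`, logging any exception with its traceback before re-raising."""
    try:
        return action(*args)
    except Exception:
        # logging.exception logs at ERROR level and attaches the active traceback;
        # it is only meaningful when called from inside an except block.
        logger.exception("Biomart request failed")
        raise
```

Re-raising keeps the failure visible to the caller (for example, a Snakemake rule) while the log retains the full traceback.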
data_handling.process
Data handling for process step.
- gtexquery.data_handling.process.merge_data(gtex_path: Union[pathlib.Path, str], bm_path: Union[pathlib.Path, str], mane: pandas.core.frame.DataFrame, out_path: Union[pathlib.Path, str]) -> None
Merge the data from previous pipeline queries.
- Parameters
gtex_path (Union[Path, str]) – Path to the file containing GTEx query data.
bm_path (Union[Path, str]) – Path to the file containing BioMart query data.
mane (pd.DataFrame) – A DataFrame containing MANE annotations.
out_path (Union[Path, str]) – Path to the output file.