# Pileup analysis

#### Initialize SeQuiLaSession and download sample data (check Initialize section for details)

In [None]:
%run initialize.ipynb

## `pileup`
`pileup(tableName: String, sampleId: String, referencePath: String, includeAlts: Boolean, includeBaseQual: Boolean = False)`

##### Compute a read pileup from alignment data

#### input parameters
* tableName - registered table over alignment files (see File formats for details)
* sampleId - name of the sample, corresponding to the name of the alignment file without its file extension
* referencePath - a local path to the reference in FASTA format (it should be indexed and available on all computing nodes when running in distributed mode). It can also be distributed at runtime using the `--files` parameter to `pyspark-shell`
* includeAlts - determines whether alts should be included in the output
* includeBaseQual - determines whether base qualities should be computed and included in the output (defaults to False). Please note that calculating base qualities has a **significant** impact on performance (see Benchmark page for details).
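
The parameters above map positionally onto the arguments of the `pileup` table function in SQL. As a minimal sketch, the query text could be assembled with a small helper. `build_pileup_query` is a hypothetical name and not part of SeQuiLa, and the table name, sample name, and path in the example are placeholders.

```python
def build_pileup_query(table_name, sample_id, ref_path,
                       include_alts, include_base_qual=False, limit=None):
    """Build the SQL text for a pileup() call.

    Hypothetical helper for illustration only; it is not part of SeQuiLa.
    """
    # Render Python booleans as SQL boolean literals.
    alts_flag = "true" if include_alts else "false"
    quals_flag = "true" if include_base_qual else "false"
    query = (
        "SELECT * "
        f"FROM pileup('{table_name}', '{sample_id}', '{ref_path}', "
        f"{alts_flag}, {quals_flag})"
    )
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


# Placeholder values, not real data.
example_query = build_pileup_query(
    "reads", "sample1", "/data/ref.fasta", include_alts=True, limit=10
)
print(example_query)
```

The resulting string can be passed to `ss.sql(...)` in the same way as the hand-written queries in this notebook. Leaving `include_base_qual` at its default avoids the performance cost noted above.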

#### returned columns
Blocks of pileup with the following columns:

* sample_id - name of the sample, corresponding to the name of the alignment file without its file extension
* contig - contig name
* pos_start - start position of a block
* pos_end - end position of a block
* ref - reference bases for a block
* coverage - depth of coverage for a block or single position
* countRef - depth of coverage of reads whose base at a given position equals the reference
* alts - map of alts in format `(ASCII code of alt: coverage)`
* quals - map of base qualities in format `(base: Array[qualities])`
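
Because the output is organised in blocks, one row can stand for a run of adjacent positions that share the same coverage. The sketch below expands such blocks into per-position records in plain Python. It assumes that `pos_end` is inclusive, which this section does not state, and the block values are made up for illustration.

```python
def expand_blocks(blocks):
    """Yield (contig, position, coverage) for every position in each block."""
    for contig, pos_start, pos_end, coverage in blocks:
        # Assumption: pos_end is inclusive, so a single-position block
        # has pos_start == pos_end.
        for pos in range(pos_start, pos_end + 1):
            yield contig, pos, coverage


# Made-up blocks: (contig, pos_start, pos_end, coverage).
example_blocks = [("chr1", 100, 102, 5), ("chr1", 103, 103, 7)]
per_position = list(expand_blocks(example_blocks))
print(per_position)
```

For real data the same expansion would normally be done in Spark or pandas; the generator is only meant to make the block semantics concrete.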



In [None]:
ss.sql(f'''SELECT contig, pos_start, pos_end, ref, coverage, countRef, alts \
 FROM pileup('{table_name}', '{sample_id}', '{ref_path}', true, true) LIMIT 10''').toPandas()

## `to_charmap`
`to_charmap(quals: Map(Base: Short, Array[BaseQuality]))`

##### Convert binary representation of base qualities into human-readable map

#### example:

In [None]:
ss.sql(f'''SELECT quals, to_charmap(quals) AS quals_decoded \
 FROM pileup('{table_name}', '{sample_id}', '{ref_path}', true, true) LIMIT 10''').toPandas()

## `alts_to_char`
`alts_to_char(alts: Map(Alt: Short, coverage: Short))`

##### Convert binary representation of alts into human-readable map with strand information encoded as lower/upper case

#### example:

In [None]:
ss.sql(f'''SELECT alts, alts_to_char(alts) AS alts_decoded \
 FROM pileup('{table_name}', '{sample_id}', '{ref_path}', true, true) LIMIT 10''').toPandas()
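
To make the conversion concrete, here is a minimal plain-Python sketch of the idea, not SeQuiLa's implementation. It assumes the keys of `alts` are ASCII codes, as described under returned columns, and it groups the decoded bases by letter case. This section does not say which case corresponds to which strand, so the two groups are left unlabeled. The function names and input values are made up for illustration.

```python
def decode_alts(alts):
    """Turn {ASCII code: coverage} into {character: coverage}."""
    return {chr(code): count for code, count in alts.items()}


def split_by_case(decoded):
    """Separate decoded alts into upper-case and lower-case groups."""
    upper = {base: n for base, n in decoded.items() if base.isupper()}
    lower = {base: n for base, n in decoded.items() if base.islower()}
    return upper, lower


# Made-up input: 67 is the ASCII code of 'C', 116 of 't'.
example_alts = {67: 3, 116: 1}
decoded = decode_alts(example_alts)
upper_case, lower_case = split_by_case(decoded)
print(decoded, upper_case, lower_case)
```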

In [None]:
ss.stop()