# Coverage analysis

## Initialize SeQuiLaSession and download sample data (see the Initialize section for details)

[1]:

%run initialize.ipynb


## coverage

coverage(tableName: String, sampleId: String, refPath: String)

## input parameters

• tableName - name of the table registered over the alignment files (see File formats for details)

• sampleId - name of the sample, corresponding to the name of the alignment file without its file extension

• refPath - path to the reference file
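The query used later in this notebook passes the three arguments to coverage as single-quoted SQL string literals built with an f-string. As a minimal sketch, assuming you want to assemble that query text in one place, the hypothetical helper coverage_query below (not part of SeQuiLa) builds the same statement and rejects any argument containing a single quote, since such a value would end the literal early. The table name, sample ID and path in the usage line are placeholders.

```python
from typing import Optional


# Hypothetical helper, not part of SeQuiLa: builds the SQL text that calls
# the coverage(tableName, sampleId, refPath) table-valued function.
def coverage_query(
    table_name: str, sample_id: str, ref_path: str, limit: Optional[int] = None
) -> str:
    args = {"table_name": table_name, "sample_id": sample_id, "ref_path": ref_path}
    for name, value in args.items():
        # Each argument is embedded in a single-quoted string literal,
        # so a quote inside the value would break the statement.
        if "'" in value:
            raise ValueError(f"{name} must not contain a single quote: {value!r}")
    query = f"SELECT * FROM coverage('{table_name}', '{sample_id}', '{ref_path}')"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


# Placeholder values; in the notebook they come from initialize.ipynb.
print(coverage_query("reads", "sample1", "/data/reference.fasta", limit=5))
# SELECT * FROM coverage('reads', 'sample1', '/data/reference.fasta') LIMIT 5
```

The resulting string would then be passed to ss.sql(...) exactly like the f-string in the cells below.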

## returned columns

Coverage computed as blocks or at base level, with the following columns:

• sample_id - name of the sample, corresponding to the name of the alignment file without its file extension

• contig - contig name

• pos_start - start position of a block

• pos_end - end position of a block

• coverage - depth of coverage for a block or single position
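Because pos_start and pos_end delimit a block, one row can stand for several positions. The following is a minimal sketch in plain Python that expands block rows into base-level (contig, position, coverage) tuples. It assumes both block ends are inclusive, which matches the single-position rows in the output below where pos_start equals pos_end. expand_blocks is a hypothetical name, and the two sample rows are copied from that output.

```python
from typing import Iterable, Iterator, Tuple

# A coverage row as (contig, pos_start, pos_end, coverage).
Block = Tuple[str, int, int, int]


def expand_blocks(blocks: Iterable[Block]) -> Iterator[Tuple[str, int, int]]:
    """Yield (contig, pos, coverage) for every position inside each block.

    Assumes pos_start and pos_end are both inclusive.
    """
    for contig, pos_start, pos_end, depth in blocks:
        for pos in range(pos_start, pos_end + 1):
            yield contig, pos, depth


rows = [("1", 36, 37, 3), ("1", 38, 40, 4)]
print(list(expand_blocks(rows)))
# [('1', 36, 3), ('1', 37, 3), ('1', 38, 4), ('1', 39, 4), ('1', 40, 4)]
```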

[2]:

ss.sql(f"SELECT * FROM coverage('{table_name}','{sample_id}', '{ref_path}') LIMIT 5").toPandas()

[2]:

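| | contig | pos_start | pos_end | ref | coverage |
|---|---|---|---|---|---|
| 0 | 1 | 34 | 34 | R | 1 |
| 1 | 1 | 35 | 35 | R | 2 |
| 2 | 1 | 36 | 37 | R | 3 |
| 3 | 1 | 38 | 40 | R | 4 |
| 4 | 1 | 41 | 49 | R | 5 |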
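Since the rows are blocks of different lengths, averaging the coverage column directly weights a 9-base block the same as a 1-base block; for the five rows above that plain average is 3.0. A minimal sketch of a length-weighted mean depth over the same rows, assuming inclusive block ends, is shown below. mean_depth is a hypothetical helper, and on a full result you would more likely do this aggregation in Spark. Positions with zero coverage only count if they appear as rows.

```python
# The five rows shown above as (contig, pos_start, pos_end, coverage).
rows = [
    ("1", 34, 34, 1),
    ("1", 35, 35, 2),
    ("1", 36, 37, 3),
    ("1", 38, 40, 4),
    ("1", 41, 49, 5),
]


def mean_depth(blocks):
    """Length-weighted mean depth over the positions covered by the blocks.

    Assumes inclusive ends, so a block spans pos_end - pos_start + 1 bases.
    """
    total_bases = 0
    weighted_sum = 0
    for _contig, pos_start, pos_end, depth in blocks:
        length = pos_end - pos_start + 1
        total_bases += length
        weighted_sum += depth * length
    return weighted_sum / total_bases if total_bases else 0.0


print(mean_depth(rows))  # 4.125
```

Here the 16 bases carry a summed depth of 66, so the weighted mean (4.125) is noticeably higher than the naive per-row average.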

To include positions (organized in blocks or at base level) with a depth of coverage equal to 0, set the following parameter:

[3]:

ss.sql("SET spark.biodatageeks.coverage.allPositions=true")
ss.sql(f"SELECT * FROM coverage('{table_name}','{sample_id}', '{ref_path}') LIMIT 5").toPandas()

[3]:

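| | contig | pos_start | pos_end | ref | coverage |
|---|---|---|---|---|---|
| 0 | 1 | 34 | 34 | R | 1 |
| 1 | 1 | 35 | 35 | R | 2 |
| 2 | 1 | 36 | 37 | R | 3 |
| 3 | 1 | 38 | 40 | R | 4 |
| 4 | 1 | 41 | 49 | R | 5 |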
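Conceptually, including positions with zero depth means that the gaps between covered blocks become blocks of their own with coverage 0. The sketch below illustrates that idea in plain Python; it is not how SeQuiLa implements spark.biodatageeks.coverage.allPositions. fill_zero_blocks is hypothetical. It assumes non-overlapping blocks on a single contig, inclusive 1-based coordinates and a known contig length, and the input blocks are made up for the example.

```python
from typing import List, Tuple


def fill_zero_blocks(
    contig: str, blocks: List[Tuple[int, int, int]], contig_length: int
) -> List[Tuple[str, int, int, int]]:
    """Return (contig, pos_start, pos_end, coverage) rows with 0-depth gaps filled.

    blocks holds (pos_start, pos_end, coverage) tuples for one contig,
    assumed non-overlapping, with inclusive 1-based coordinates.
    """
    result = []
    next_pos = 1  # first position not yet accounted for
    for pos_start, pos_end, depth in sorted(blocks):
        if pos_start > next_pos:
            # Uncovered gap before this block becomes a coverage-0 block.
            result.append((contig, next_pos, pos_start - 1, 0))
        result.append((contig, pos_start, pos_end, depth))
        next_pos = pos_end + 1
    if next_pos <= contig_length:
        # Trailing uncovered positions up to the end of the contig.
        result.append((contig, next_pos, contig_length, 0))
    return result


print(fill_zero_blocks("1", [(3, 4, 2), (8, 8, 1)], contig_length=10))
# [('1', 1, 2, 0), ('1', 3, 4, 2), ('1', 5, 7, 0), ('1', 8, 8, 1), ('1', 9, 10, 0)]
```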

For details on other coverage-related parameters, including read filtering, refer to the SeQuiLa documentation.

[4]:

ss.stop()