Genomic intervals
Initialize SeQuiLaSession and download sample data (check Initialize section for details)
[1]:
%run initialize.ipynb
Create an additional table with features
[2]:
ss.sql(f'''
CREATE TABLE IF NOT EXISTS targets(contig String,pos_start Integer,pos_end Integer) \
USING csv \
OPTIONS (path "file:///{bed_path}", delimiter "\t")''')
[2]:
DataFrame[]
[3]:
ss.sql('SELECT replace(contig,"chr","") AS contig, pos_start, pos_end FROM targets LIMIT 5').toPandas()
[3]:
contig | pos_start | pos_end | |
---|---|---|---|
0 | 1 | 4505 | 4745 |
1 | 1 | 4806 | 4926 |
2 | 1 | 5614 | 5853 |
3 | 1 | 6462 | 6582 |
4 | 1 | 6697 | 6937 |
Run an interval join query with grouping between reads
and targets
tables
[4]:
ss.sql(f'''
SELECT replace(targets.contig,"chr","") AS contig,targets.pos_start,targets.pos_end, count(*) AS cnt FROM reads JOIN targets
ON (replace(targets.contig,"chr","") = reads.contig
AND
CAST(reads.pos_end AS INTEGER)>=CAST(targets.pos_start AS INTEGER)
AND
CAST(reads.pos_start AS INTEGER)<=CAST(targets.pos_end AS INTEGER)
)
GROUP BY replace(targets.contig,"chr",""),targets.pos_start,targets.pos_end LIMIT 5''').toPandas()
[4]:
contig | pos_start | pos_end | cnt | |
---|---|---|---|---|
0 | 1 | 4806 | 4926 | 16 |
1 | 1 | 5614 | 5853 | 7 |
2 | 1 | 7074 | 7194 | 13 |
3 | 1 | 6697 | 6937 | 18 |
4 | 1 | 6462 | 6582 | 5 |
[ ]: