pySeQuiLa: Distributed analytics for genomics!

Learn More | Source code | PyPI package | Helm chart | Research papers | Try it in Google Colab

Analyze population-scale datasets seamlessly - in the cloud!

pySeQuiLa is a Python wrapper for SeQuiLa, an ANSI SQL-compliant solution for distributed processing of next-generation sequencing (NGS) data, built on top of Apache Spark.

pySeQuiLa extends Apache Spark with highly efficient implementations of common bioinformatics operations such as interval joins, depth of coverage, and pileup.
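To make one of these operations concrete, here is a minimal single-machine sketch of what a depth-of-coverage computation produces. It is plain Python for illustration only, not pySeQuiLa's distributed implementation; the function name `depth_of_coverage` and the 1-based, inclusive `(start, end)` read representation are assumptions made for this example.

```python
from collections import Counter


def depth_of_coverage(reads):
    """Return {position: depth} for 1-based, inclusive (start, end) intervals.

    Hypothetical helper for illustration; not part of the pySeQuiLa API.
    """
    deltas = Counter()
    for start, end in reads:
        deltas[start] += 1      # coverage rises where a read starts
        deltas[end + 1] -= 1    # and falls just past where it ends

    coverage = {}
    depth = 0
    positions = sorted(deltas)
    # Sweep left to right, carrying the running depth between change points.
    for pos, next_pos in zip(positions, positions[1:]):
        depth += deltas[pos]
        if depth > 0:
            for p in range(pos, next_pos):
                coverage[p] = depth
    return coverage


reads = [(1, 4), (3, 6), (5, 5)]
coverage = depth_of_coverage(reads)
print(coverage)
```

The sweep only visits positions where coverage changes, which is the same idea that lets coverage be summarized as blocks of constant depth rather than per base.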

It combines the analytical power of Python with SQL syntax for flexible querying and processing of NGS data.
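As a rough illustration of the Python-plus-SQL style, the sketch below expresses an interval (overlap) join in standard SQL and runs it from Python. It uses the standard-library `sqlite3` module as a stand-in engine so that it runs anywhere; it does not use pySeQuiLa or Spark, and the table names (`reads`, `targets`) and column names (`contig`, `pos_start`, `pos_end`, `name`) are assumptions chosen for the example.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reads   (contig TEXT, pos_start INTEGER, pos_end INTEGER);
    CREATE TABLE targets (contig TEXT, pos_start INTEGER, pos_end INTEGER, name TEXT);
    INSERT INTO reads VALUES
        ('chr1', 100, 150), ('chr1', 400, 450), ('chr2', 100, 150);
    INSERT INTO targets VALUES
        ('chr1', 140, 200, 'exonA'), ('chr2', 300, 400, 'exonB');
""")

# Two closed intervals on the same contig overlap when each one starts
# no later than the other one ends.
overlaps = conn.execute("""
    SELECT r.contig, r.pos_start, r.pos_end, t.name
    FROM reads AS r
    JOIN targets AS t
      ON r.contig = t.contig
     AND r.pos_start <= t.pos_end
     AND t.pos_start <= r.pos_end
    ORDER BY r.contig, r.pos_start
""").fetchall()

print(overlaps)
conn.close()
```

A general-purpose engine typically evaluates such a range condition by comparing many candidate pairs, which is why specialized interval-join implementations matter at population scale.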

Big data ready, cloud-native

Join us on Slack!

Contributions welcome!

We use a pull request workflow for contributions on GitHub. New contributors are always welcome!

Follow us on Twitter!

For announcements of the latest features and other news.