Opportunity: Bioinformatician -- Cancer Genomics Pipeline @ SeCore Biotech Limited -- London, UK

Alan Katt • March 30, 2026
BACKGROUND:

SeCore Biotech Limited is building an AI-powered platform for early canine cancer detection and personalised neoantigen vaccine design. The platform analyses four genomic signals from a routine canine blood draw – cell-free DNA concentration, fragmentomics entropy, copy number variation, and methylation deviation – and returns a breed-stratified cancer risk score on a 0-10 scale within 24-48 hours.

For high-risk cases, the platform automatically initiates a Phase 2 pipeline that identifies tumour-specific neoantigen peptide candidates using somatic variant calling, DLA allele typing, NetMHCpan binding prediction, AlphaFold-Multimer structural validation, and a proprietary QSA composite ranking formula. The top-ranked candidates are dispatched as a personalised vaccine design pack to a GMP peptide manufacturer.

The company holds a UK patent application covering the detection and vaccine design methodology (31 claims filed). The clinical software platform is live at app.secorebiotech.ai. The Python IP pipeline modules are production-ready with 109 tests passing. The FastAPI backend is fully authenticated and rate-limited.

What does not yet exist – and what this role is specifically hired to build – is the bioinformatics pipeline infrastructure that takes real canine blood sample sequencing data and processes it through the nine analytical stages to produce the feature vector that feeds the ML scoring model. That is the critical path to the first real clinical detection result.

RESPONSIBILITIES:

The SeCore platform has two phases. Phase 1 is the detection pipeline – nine stages that process whole genome sequencing data from a canine blood sample. Phase 2 is the vaccine design pipeline – eight stages that identify personalised neoantigen candidates. This role covers both.

PHASE 1 – DETECTION PIPELINE (9 STAGES)

Stage | Tool | Your Responsibility

S1 | FastQC | Containerise, set canine-specific QC thresholds (Q30 ≥ 85%, mapping ≥ 90%)
S2 | Trimmomatic | Containerise, configure Illumina adapter removal for 150bp PE reads
S3 | BWA-MEM2 | Containerise, index CanFam4 reference genome, tune alignment parameters
S4 | GATK MarkDuplicates | Containerise, set optical duplicate distance (2500px for NovaSeq)
S5 | pysam + ichorCNA | Implement cfDNA quantification and tumour fraction estimation for canine
S6 | Custom Python (SeCore IP) | Implement fragmentomics entropy scorer – fragment length distribution analysis
S7 | CNVkit | Containerise, configure for canine genome, produce copy number burden score
S8 | Bismark | Containerise, configure canine bisulphite alignment, methylation deviation scoring
S9 | Custom Python (SeCore IP) | Implement signal normalisation using Cancer Risk Library breed-age baselines
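
To give candidates a feel for the S6 work, here is a minimal sketch of one common way to summarise a fragment length distribution: Shannon entropy over a length histogram. The function name, the 1 bp binning, and the 50-500 bp window are assumptions made for illustration only; the SeCore scorer is proprietary and is not described in this posting, so it may differ.

```python
import math
from collections import Counter
from typing import Iterable


def fragment_length_entropy(
    lengths: Iterable[int],
    min_len: int = 50,    # hypothetical window bound, not a SeCore value
    max_len: int = 500,   # hypothetical window bound, not a SeCore value
) -> float:
    """Return the Shannon entropy (in bits) of the fragment length histogram."""
    # Count fragments per exact length (1 bp bins) inside the window.
    counts = Counter(n for n in lengths if min_len <= n <= max_len)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fragments inside the length window")
    # H = -sum(p * log2(p)) over observed lengths.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

In a real run the lengths would probably come from the absolute template length of properly paired reads in the deduplicated BAM from S4, read with pysam.
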

PHASE 2 – VACCINE DESIGN PIPELINE (8 STAGES)

Stage | Tool | Your Responsibility

V1 | GATK Mutect2 | Implement tumour-normal somatic variant calling, tune filtering parameters
V2 | Ensembl VEP 111 | Configure CanFam4 VEP cache, annotate variants with protein consequences
V3 | SeCore Python (built) | Integration only – peptide generator already written and tested
V4 | OptiType (adapted) | Adapt DLA allele typing for canine DLA-88 and DLA-DQ alleles
V5 | NetMHCpan 4.1 | Containerise with canine DLA pseudosequences (already assembled in V5 module)
V5b | AlphaFold-Multimer | Configure GPU pipeline, model weights, pLDDT and RMSD threshold validation
V6 | PyTorch model | Collaborate with ML engineer on immunogenicity model architecture and training
V7/V8 | SeCore Python (built) | Integration only – manufacturability scorer and QSA ranker already written

ORCHESTRATION AND INFRASTRUCTURE
- Write the Nextflow pipeline definition (.nf file) chaining all stages S1-S9 and V1-V8
- Configure AWS Batch job queues, compute environments, and spot instance strategies
- Store and manage the CanFam4 reference genome and indexes in S3
- Write Dockerfile for each stage using official bioconda or tool-specific base images
- Push all containers to Amazon ECR
- Write the S3 manifest format passed between stages
- Connect the pipeline output to the FastAPI genomics service (already built)
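
The S3 manifest format is still to be designed by the person in this role. Purely as a starting point for discussion, the sketch below shows one possible shape: a small JSON document listing a stage's outputs and metrics. Every field name here (sample_id, stage, tool_version, outputs, metrics) and the example bucket path are hypothetical and not taken from the SeCore codebase.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class StageManifest:
    """Hypothetical manifest handed from one pipeline stage to the next."""

    sample_id: str
    stage: str                      # e.g. "S3"
    tool_version: str               # pinned tool version inside the container
    outputs: Dict[str, str] = field(default_factory=dict)    # logical name -> s3:// URI
    metrics: Dict[str, float] = field(default_factory=dict)  # stage-level QC numbers

    def to_json(self) -> str:
        # Reject anything that is not an S3 URI before serialising.
        for name, uri in self.outputs.items():
            if not uri.startswith("s3://"):
                raise ValueError(f"output {name!r} is not an S3 URI: {uri}")
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "StageManifest":
        return cls(**json.loads(text))


manifest = StageManifest(
    sample_id="dog-001",
    stage="S3",
    tool_version="2.2.1",
    outputs={"bam": "s3://example-bucket/dog-001/S3/aligned.bam"},
    metrics={"mapping_rate": 0.97},
)
round_tripped = StageManifest.from_json(manifest.to_json())
```

Keeping the manifest as plain JSON means both the Nextflow processes and the FastAPI service can read it without sharing code.
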

DETAILED SCOPE OF WORK

1. Reference Data Setup
- Download CanFam4 reference genome (GCA_011100685.1) and store in S3
- Generate BWA-MEM2 index (~15GB), GATK sequence dictionary, samtools fai index
- Download Ensembl VEP 111 cache for CanFam4 (~8GB)
- Configure dbSNP canine variant database for GATK BQSR
- Document all S3 bucket paths and versioning strategy
2. Docker Container Development
- One container per pipeline stage – 9 for Phase 1, 5 for Phase 2 (V1, V2, V4, V5, V5b)
- Each container: tool pre-installed at a pinned version, health check, entrypoint script
- Containers must accept S3 input paths and write outputs back to S3
- All containers published to Amazon ECR with semantic version tags
- Total estimated containers: 14
3. Nextflow Pipeline
- Write pipeline.nf covering the full S1-S9 detection workflow
- Write vaccine.nf covering the V1-V8 vaccine design workflow
- Handle stage retries, failure logging, and QC gate logic (Q30/mapping thresholds)
- Write nextflow.config for AWS Batch executor with spot interruption handling
- Test with synthetic canine FASTQ input before any real samples
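
The QC gate mentioned above could be as simple as the following sketch. The two thresholds come from the S1 row of this posting (Q30 ≥ 85%, mapping ≥ 90%); the function and class names are hypothetical, and in the real pipeline this logic might instead live inside the Nextflow workflow itself.

```python
from dataclasses import dataclass
from typing import List

Q30_MIN = 0.85      # Q30 threshold stated in this posting
MAPPING_MIN = 0.90  # mapping-rate threshold stated in this posting


@dataclass(frozen=True)
class QcResult:
    passed: bool
    failures: List[str]


def qc_gate(q30_fraction: float, mapping_rate: float) -> QcResult:
    """Decide whether a sample may proceed past the QC gate."""
    failures = []
    if q30_fraction < Q30_MIN:
        failures.append(f"Q30 {q30_fraction:.1%} is below the {Q30_MIN:.0%} threshold")
    if mapping_rate < MAPPING_MIN:
        failures.append(f"mapping {mapping_rate:.1%} is below the {MAPPING_MIN:.0%} threshold")
    # The sample passes only if no threshold was violated.
    return QcResult(passed=not failures, failures=failures)
```

Returning the list of failure messages, rather than a bare boolean, gives the failure log something specific to record for each rejected sample.
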
4. Parameter Tuning – Canine-Specific
- GATK Mutect2 somatic calling: minimum VAF, read depth, strand bias filters for canine
- ichorCNA tumour fraction: adapt the training panel of normals for canine cfDNA
- CNVkit: build canine reference coverage baseline from normal samples
- Methylation thresholds: define deviation scoring relative to breed-age baselines
- These parameters require biological judgement – this is the most critical part of the role
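
For readers less familiar with the first item, variant allele fraction (VAF) is the share of reads at a site that support the alternate allele. The sketch below shows the arithmetic and a basic depth and VAF cut-off. The default values (5% VAF, 20x depth) are placeholders chosen for illustration, not canine-tuned parameters; choosing the real values is exactly the judgement this role calls for.

```python
def variant_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """VAF = alt-supporting reads / (ref-supporting + alt-supporting reads)."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def passes_basic_filters(
    ref_reads: int,
    alt_reads: int,
    min_vaf: float = 0.05,  # placeholder, not a tuned canine value
    min_depth: int = 20,    # placeholder, not a tuned canine value
) -> bool:
    """Apply a minimum depth check first, then a minimum VAF check."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return False
    return variant_allele_fraction(ref_reads, alt_reads) >= min_vaf
```
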
5. Validation
- Validate Q30/mapping/duplicate rates against published canine WGS benchmarks
- Validate somatic variant calls against known canine cancer driver mutations (e.g. TP53, BRCA2)
- Validate cfDNA quantification against published canine liquid biopsy literature
- Document sensitivity and specificity estimates for each signal
- Write a validation report suitable for inclusion in the patent continuation and regulatory submissions
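
The sensitivity and specificity estimates follow the standard definitions: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). A minimal sketch, assuming each signal has already been reduced to a yes/no call per sample and that ground-truth labels are available (the function name is hypothetical):

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for paired truth and predicted calls."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)
```

With small validation cohorts the report would also need confidence intervals around these point estimates, which this sketch does not compute.
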
6. Cancer Risk Library
- Define the data schema for the breed-age population baseline library
- Implement the z-score normalisation formula in Stage S9
- Seed initial baselines from published canine WGS literature where available
- Design the library update process as retrospective cohort data accumulates
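
As an illustration of the S9 normalisation, a z-score expresses a raw signal as the number of standard deviations it sits from the baseline mean for dogs of the same breed and age band: z = (value - mean) / sd. The schema below (a dictionary keyed by breed, age band, and signal name) and all numbers in the usage example are hypothetical; defining the real schema is part of the role.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Baseline:
    mean: float
    sd: float


# Hypothetical library layout: (breed, age_band, signal) -> Baseline
BaselineLibrary = Dict[Tuple[str, str, str], Baseline]


def z_score(value: float, baseline: Baseline) -> float:
    """Standard deviations between value and the baseline mean."""
    if baseline.sd <= 0:
        raise ValueError("baseline standard deviation must be positive")
    return (value - baseline.mean) / baseline.sd


def normalise_signals(
    signals: Dict[str, float], breed: str, age_band: str, library: BaselineLibrary
) -> Dict[str, float]:
    """Z-score every signal against its breed-age baseline.

    Raises KeyError if the library has no baseline for a signal.
    """
    return {
        name: z_score(value, library[(breed, age_band, name)])
        for name, value in signals.items()
    }


# Made-up numbers, for illustration only.
library = {("labrador", "5-8y", "cfdna_ng_per_ml"): Baseline(mean=10.0, sd=2.0)}
scores = normalise_signals({"cfdna_ng_per_ml": 14.0}, "labrador", "5-8y", library)
```

A missing breed-age cell raises a KeyError here; a production version would need an agreed fallback, for example pooling across breeds, which is a design decision for the library update process.
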
7. Integration with SeCore Platform
- The FastAPI genomics service (already built) expects a webhook from the lab and then submits an AWS Batch job
- Connect the Nextflow pipeline to the FastAPI job submission endpoint
- Ensure SSE status updates (already built) reflect real pipeline stage completion
- Ensure the S3 manifest from Stage S9 maps to the feature vector expected by the SageMaker ML scorer
REQUIREMENTS:

ESSENTIAL – MUST HAVE
- PhD or MSc in Bioinformatics, Computational Biology, Genomics, or a closely related field
- Minimum 3 years of hands-on experience running NGS pipelines in a research or clinical setting
- Direct experience with BWA, GATK, and Samtools in production – not just academic coursework
- Experience writing Nextflow or Snakemake workflow definitions
- Strong Python – able to write and debug bioinformatics scripts independently
- Experience developing Docker containers for bioinformatics tools
- Familiarity with AWS (S3, Batch, EC2) or equivalent cloud compute platform
- Experience with somatic variant calling (tumour-normal or tumour-only)
- Understanding of cfDNA biology and liquid biopsy methodology
HIGHLY DESIRABLE – STRONG PREFERENCE
- Experience with canine genomics or veterinary bioinformatics
- Experience with NetMHCpan or other MHC binding prediction tools
- Experience with AlphaFold or structural protein prediction
- Experience with CNVkit or ichorCNA copy number analysis
- Experience with Bismark or WGBS methylation analysis
- Experience in a clinical or regulated environment (GCP, CLIA, or equivalent)
- Experience with neoantigen identification for personalised cancer vaccine programmes
- Knowledge of canine MHC (DLA) allele biology
NICE TO HAVE
- Experience with Ensembl VEP annotation
- Experience with Nextflow Tower or Seqera Platform for pipeline monitoring
- Familiarity with AWS Batch spot instance configuration
- Published research in cancer genomics, liquid biopsy, or immunogenomics
TERMS:
- Remote-first. UK timezone strongly preferred for overlap with clinical partners.
- Direct access to the founder and software team via Slack and weekly calls.
- Full access to all three GitHub repositories (Secore-Platform, secore-api, secore-pipeline).
- AWS account provided with appropriate IAM permissions and a budget for compute.
- NetMHCpan commercial licence being applied for – available before V5 work begins.
- CanFam4 reference genome download and S3 storage costs covered by SeCore.
- AlphaFold model weights and reference databases (~500GB) download and GPU instance costs covered by SeCore.
- Technical blueprint and full architecture documentation provided on day one.
COMPENSATION:

Contract-based, $300-600 per day

HOW TO APPLY:

Please send the following to alan[at]secorebiotech.ai with the subject line: Bioinformatician Application – SCB-BIO-001
- Your CV or LinkedIn profile
- A brief paragraph (4-6 sentences) describing a Nextflow or Snakemake pipeline you have built, what tools it ran, and how you validated the biological outputs
- Links to any relevant GitHub repositories or published papers
- Your availability and preferred engagement structure (contract rate, hours per week, start date)
Shortlisted candidates will be asked to complete a short technical assessment: given canine tumour-normal alignment files (BAM), call somatic variants using GATK Mutect2 and produce a filtered VCF. The assessment takes approximately 2 hours and is paid.
