Equixly is an Italian cybersecurity company founded to revolutionise API security through proprietary artificial intelligence and machine learning algorithms. Our platform automates the API Security Testing process, enabling companies to identify deep vulnerabilities rapidly, continuously, and at scale.
On a strong growth trajectory, we are building out our AI Team with world-class engineers who want to work on real, production-grade LLM systems and high-complexity technical challenges at the cutting edge of applied AI.
The role
We are looking for a Senior AI Performance Engineer to join the AI Team and contribute to the development, optimisation, and productionisation of high-performance Large Language Model systems. You will work across model efficiency, inference, throughput, latency, quantisation, speculative decoding, and LLM training/post-training.
This is a high-impact role with direct influence on Equixly's AI strategy - improving the performance, scalability, and efficiency of the models that power our platform. You will report to the Head of AI.
What you will do
Your key responsibilities will include:
- Optimise LLM inference pipelines, improving throughput, latency, tokens/second, GPU utilisation, and cost per request.
- Design, implement, and evaluate model quantisation strategies - including FP8, NVFP4, INT8/INT4, PTQ, and QAT - balancing performance, memory, and accuracy.
- Experiment with and integrate speculative decoding techniques (draft models, MTP layers, EAGLE, PARD, MLP speculators) to accelerate model generation.
- Work with inference and serving optimisation frameworks such as vLLM, TensorRT-LLM, Hugging Face Transformers, PyTorch, and related tools.
- Define reproducible benchmarks covering latency, throughput, TTFT, TPOT, GPU memory usage, speculative decoding acceptance rate, and model quality regressions.
- Train, fine-tune, and evaluate LLMs using supervised fine-tuning, LoRA/QLoRA, distillation, reinforcement learning, and other post-training paradigms.
- Reduce model overthinking by optimising response and reasoning-trace length without compromising accuracy, reliability, or output quality.
- Curate training, validation, and test datasets; define robust experimental protocols, evaluation metrics, ablation studies, and regression testing procedures.
- Collaborate with the engineering team to bring optimised models to production, ensuring stability, observability, scalability, and integration with the Equixly platform.
- Stay current with the state of the art in LLM inference, model compression, speculative decoding, scaling laws, efficient training, and AI systems engineering.
What we are looking for
- Proven experience in the development, training, fine-tuning, or deployment of Large Language Models in applied research or production settings.
- Master's degree or PhD in Computer Science, Computer Engineering, AI, Machine Learning, Data Science, Applied Mathematics, Physics, Electronic Engineering, or an equivalent discipline.
- Solid command of PyTorch, Hugging Face Transformers, and modern tools for generative model training and inference.
- Hands-on experience with inference serving and optimisation frameworks: vLLM, TensorRT-LLM, SGLang, ONNX Runtime, or equivalents.
- Practical knowledge of model quantisation and compression techniques, including several of the following: FP8, NVFP4, INT8, INT4, GPTQ, AWQ, SmoothQuant, PTQ, or QAT.
- Familiarity with speculative decoding architectures: draft models, EAGLE, PARD, Medusa, MTP, MLP speculators, or equivalent approaches.
- Experience evaluating LLM performance through metrics such as throughput, latency, tokens/second, memory footprint, quality degradation, accuracy, and cost per inference.
- Knowledge of major training and post-training paradigms: supervised fine-tuning, reinforcement learning, distillation, preference optimisation, and data curation.
- Ability to set up rigorous experimental protocols, with clear train/validation/test separation, reproducible logging, and quantitative analysis of results.
- Good written and spoken English.
Nice to have:
- Experience in cybersecurity, API Security, agentic AI, or reasoning models.
- Experience with GPU profiling, CUDA, distributed training, or model deployment in cloud or on-premises environments.
What we offer
- A stimulating and dynamic work environment in an innovative startup at the intersection of cybersecurity and applied AI.
- Real opportunities for professional growth and impact: you will join a fast-evolving AI Team working on production-grade LLM systems and high-complexity technical problems.
- Direct contribution to Equixly's AI strategy - improving the performance, scalability, and efficiency of the models at the core of our platform.
- Competitive compensation commensurate with experience, plus company benefits.