Research Engineer

Vishal Yadav

Building the infrastructure that tells you when AI is wrong - and working to understand why.


Published Research · Explainable AI · AI Safety · London

I design and evaluate AI systems with a focus on measurable safety, reliability, and bias reduction.

Current focus

Making AI measurably safer through better evaluation, observability, and bias mitigation.

I build evaluation frameworks and observability infrastructure for deployed AI systems. At Arva AI, I redesigned benchmarking pipelines and built end-to-end agentic tracing - replacing unstructured logs with structured visibility across the full agent lifecycle.

My research background is in clinical AI - specifically demographic bias in pediatric mental health records. That work, developed during two years as a Research Assistant at Queen Mary University of London, was published in Nature (2026).

Open to research roles in AI safety, evaluation, and interpretability.

Selected work on clinical AI bias, evaluation methodology, and research communication.


Privacy-Preserving Behaviour of Chatbot Users: Steering Through Trust Dynamics

arXiv · November 2024

Preprint

Traceability Solution for SMEs

IJSREM · June 2021

Peer reviewed

Deep Neural Network Compiler

IRISS 2020 · March 2020

Conference

Selected Talks & Presentations

2024 - Poster, AI4H Conference, Italy - Bias mitigation for pediatric EHR notes

2024 - Alan Turing Data Science Conference - EHR bias research

2023 - ICRA 2023 - Volunteer

2022 - Intelligent Sensing Winter School, QMUL - Explainable AI in Computer Vision

2020 - IRISS, IIT Gandhinagar - Deep Neural Network Compiler

Roles focused on AI evaluation infrastructure, safety research, and applied model quality.

Mar 16 2026 / Mar 21 2026

Technical AI Safety Course

BlueDot Impact · Remote

  • Completed an intensive technical program focused on practical AI safety concepts and risk-aware system design.
  • Worked through hands-on exercises covering evaluation, failure analysis, and mitigation-oriented thinking for modern AI systems.
  • Collaborated in discussion-based sessions to apply safety principles to real-world deployment scenarios.

Dec 2025 / Mar 2026

Research Engineer

Arva AI · London

  • Redesigned benchmarking framework - decomposed a flawed combined metric into factor-specific accuracy scores for verdict prediction and discounting.
  • Built full agentic observability with Langfuse - structured tracing of all agent calls across offline and online environments.
  • Developed complete evaluation infrastructure: custom evaluators, golden dataset, scoring logic, and construction guidelines.
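The factor-specific scoring idea above can be sketched in a few lines: instead of one combined number, each factor gets its own accuracy against a golden dataset. This is a minimal illustration only; the field names ("verdict", "discount") and data are hypothetical examples, not Arva AI's actual schema.

```python
def factor_accuracies(predictions, gold, factors):
    """Score each factor separately instead of one combined metric."""
    scores = {}
    for factor in factors:
        correct = sum(
            1 for pred, ref in zip(predictions, gold)
            if pred[factor] == ref[factor]
        )
        scores[factor] = correct / len(gold)
    return scores

# Hypothetical golden dataset and model outputs.
predictions = [
    {"verdict": "approve", "discount": 0.1},
    {"verdict": "reject",  "discount": 0.0},
    {"verdict": "approve", "discount": 0.2},
]
gold = [
    {"verdict": "approve", "discount": 0.1},
    {"verdict": "approve", "discount": 0.0},
    {"verdict": "approve", "discount": 0.1},
]

print(factor_accuracies(predictions, gold, ["verdict", "discount"]))
```

A per-factor breakdown like this makes it clear which part of the pipeline is failing, which a single blended score hides.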

Mar 2025 / Nov 2025

LLM Trainer & Evaluator

Mercor Intelligence · Remote

  • Designed complex scenarios to stress-test conversational AI - identifying edge cases and failure modes systematically.
  • Built evaluation rubrics quantifying performance across accuracy, contextual alignment, and user experience.

Nov 2023 / Feb 2025

Research Assistant

Queen Mary University of London · London

  • Developed bias detection and mitigation algorithms for clinical AI; published in Nature (2026).
  • Improved anxiety detection by 10% using time-series-based medical NER.
  • Co-developed a clinical AI platform with NHS DialogPlus on Azure GPU infrastructure.
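A simple form of the kind of audit that surfaces demographic bias is to compare model accuracy across groups and report the largest gap. The sketch below is illustrative only, with made-up data; it is not the published method.

```python
def group_accuracy_gap(records):
    """Per-group accuracy and the max pairwise gap between groups.

    records: iterable of (group, prediction, label) triples.
    """
    by_group = {}
    for group, pred, label in records:
        by_group.setdefault(group, []).append(pred == label)
    acc = {g: sum(hits) / len(hits) for g, hits in by_group.items()}
    gap = max(acc.values()) - min(acc.values())
    return acc, gap

# Hypothetical example: group A is scored less accurately than group B.
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 0),
    ("B", 1, 1), ("B", 1, 1),
]
acc, gap = group_accuracy_gap(records)
```

Tracking a gap like this over time is one way to check that a mitigation actually narrows the disparity rather than just shifting overall accuracy.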

Jul 2021 / Jul 2022

Product Engineer, AI Technology

AI Technology & Systems · California (Remote)

  • Built a Deep Neural Network Compiler using Eigen, ONNX, and Caffe for edge devices.
  • Supervised 45 interns building TinyML applications.

Nov 2019 / Mar 2021

Research Intern

Indian Institute of Technology · Indore

  • Developed AR applications and a traceability app for Android - published.
  • Contributed to an Intelligent AGV for smart manufacturing (Industry 4.0).

Practical builds spanning evaluation tooling, multimodal AI, and production-oriented research systems.

Multimodal Hate Speech Detection

Compared early, late, and cross-attention fusion across BERT, ViT, and VisualBERT for multimodal classification.

NLP · Multimodal · Transformers

May - Sep 2023
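As a toy contrast between two of the fusion strategies compared in this project: late fusion combines the outputs of independent unimodal models, while early fusion concatenates features before a single classifier. The numbers below are made up, standing in for real BERT/ViT outputs.

```python
def late_fusion(text_probs, image_probs, w=0.5):
    """Weighted average of per-class probabilities from unimodal models."""
    return [w * t + (1 - w) * i for t, i in zip(text_probs, image_probs)]

def early_fusion_features(text_feats, image_feats):
    """Concatenate features so one classifier sees both modalities."""
    return text_feats + image_feats

# Hypothetical two-class outputs (e.g. hateful vs. not-hateful).
text_probs = [0.8, 0.2]
image_probs = [0.4, 0.6]
fused = late_fusion(text_probs, image_probs)  # argmax picks class 0
```

Cross-attention fusion (as in VisualBERT) goes further by letting tokens from one modality attend to the other inside the model, rather than merging only at the feature or output level.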

Article Person Verification Agent

Autonomous adverse media screening using LangGraph and Gemini - with MLflow tracing and multilingual support across 6+ scripts.

LangGraph · MLflow · Gemini

2025

Brain Tumor Segmentation

3D MRI segmentation using NVIDIA deep learning libraries and SAM.

Medical AI · Computer Vision · PyTorch

Sep - Dec 2023

Traceability App for SMEs

End-to-end supply chain traceability; launched on Google Play Store.

Android · Industry 4.0

2021

DNN Compiler for Edge Devices

Compiled high-level DNN specs to optimised machine code for constrained hardware using C++ and Eigen.

C++ · Edge AI · ONNX

2020

Core stack for building, evaluating, and shipping reliable AI systems.

AI & Research

Evaluation Frameworks, Bias Mitigation, Benchmarking, Explainability (LIME, GradCAM), RLHF, Agentic AI, RAG, Generative AI, Reinforcement Learning, NLP, Computer Vision, Deep Learning

Infra & MLOps

Langfuse, MLflow, Vertex AI, AWS, Azure, Terraform, CUDA, GPU Optimisation, Docker

Languages

Python, C/C++, SQL, TypeScript, Bash, R

Frameworks

PyTorch, TensorFlow, HuggingFace, Scikit-Learn, SpaCy, OpenCV, NumPy, Pandas, Playwright

Databases

PostgreSQL, MongoDB, Firebase, Redis

Let's talk.

I'm open to research collaborations, full-time roles in AI safety and evaluation, and conversations about the field. Best reached by email.

visdav8@gmail.com
Contact