Dyson’s new Data Intelligence team is seeking a specialized Data Intelligence Machine Learning Engineer in Dubai to design and implement in-house tools that automate data labelling pipelines. This role reduces reliance on manual annotation by leveraging Active Learning, Weak Supervision, and Synthetic Data Generation, bridging the gap between raw data collection and model-ready datasets to ensure high-quality labels at scale for Dyson’s next generation of connected, intelligent products.
About Dyson — Engineering, AI & Robotics Innovation
Innovation Driven: Relentless pursuit of innovation across engineering, AI, and robotics
Data Intelligence Team: A new team shaping Dyson’s future through data strategy and pipelines
Global Collaboration: Work alongside Dyson’s global engineering team and external software/hardware partners
Culture of Exploration: An environment built for exploration, discovery, delivery, and impact
Why This Role Matters at Dyson
Cutting-Edge Focus: Architect automated labelling pipelines using Snorkel, Cleanlab, or custom active learning loops
Human-in-the-Loop Systems: Build interfaces where models pre-label data and humans intervene only on high-uncertainty samples
Cross-Functional Impact: Collaborate with Data Scientists, Software Engineers, and Product teams
Global Reach: Power data strategy for connected devices used across Dyson’s worldwide product portfolio
Position Overview
This Data Intelligence Machine Learning Engineer role focuses on designing and deploying end-to-end automated labelling systems to reduce manual annotation effort. You will build Human-in-the-Loop (HITL) workflows, implement algorithmic checks to detect and correct noisy or mislabelled data, integrate labelling tools with data lakes and ML training infrastructure, fine-tune teacher models for high-quality pseudo-labels, and maintain robust data preparation infrastructure optimized for quality, speed, and seamless MLOps integration.
Why This Role Matters: As a Data Intelligence Machine Learning Engineer at Dyson, you will build the automated labelling infrastructure that powers next-generation connected products. You will work with frameworks such as Snorkel and Cleanlab, gain hands-on exposure to Active Learning and Weak Supervision at scale, collaborate with global engineering and product teams, and grow your career within a company defined by engineering, AI, and robotics innovation.
Key Responsibilities
Automated Labelling Pipeline Design
- Architect and deploy end-to-end automated labelling systems using Snorkel, Cleanlab, or custom active learning loops
- Develop Human-in-the-Loop (HITL) systems where models pre-label data and humans intervene on high-uncertainty samples
- Implement algorithmic checks to identify and correct mislabelled or noisy data within existing datasets
Tooling, Integration & Model Optimization
- Collaborate with software engineers to integrate labelling tools with data lakes and ML training infrastructure
- Fine-tune “teacher” models to generate high-quality pseudo-labels for “student” models
- Set up and maintain robust data preparation infrastructure optimized for quality, speed, and MLOps integration
Data Analysis & Cross-Functional Collaboration
- Perform data visualization and in-depth analysis using advanced data and feature engineering techniques
- Transform raw data into actionable insights that support both research and deployment
- Work closely with Data Scientists, Software Engineers, and Product teams to ensure high data quality and usability
Qualifications & Requirements
Educational Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, Mathematics, Data Science, or related field
Experience Requirements
- At least 3 years of professional experience in Machine Learning engineering, focused on data-centric AI or computer vision/NLP pipelines
- Proven experience with Weak Supervision (labelling functions) or Active Learning strategies (uncertainty sampling, diversity sampling)
- Hands-on expertise in building auto-labelling solutions or large-scale data annotation workflows
Essential Skills
- Proficiency in Python and the ML stack: PyTorch or TensorFlow, NumPy, Pandas, Scikit-learn
- Experience with SQL and NoSQL databases, and with managing large-scale unstructured data (images, text, or audio)
- Familiarity with AWS SageMaker Ground Truth, GCP Vertex AI, or Azure ML labelling services
- Experience with DVC (Data Version Control) or similar tools to track dataset iterations
- Strong background in feature engineering, data analysis, and visualization using Jupyter, Tableau, or Power BI
- Strong communication skills, with the ability to document solutions clearly for technical and non-technical teams
About Dyson — Engineering Without Limits
Join Dyson, a company driven by a relentless pursuit of innovation across engineering, AI, and robotics. Dyson’s Data Intelligence team sits at the heart of this mission, shaping the company’s future through data strategy and pipelines that power intelligent, connected products. Dyson is an equal opportunity employer, welcoming applications from all backgrounds, with employment decisions made without regard to race, religion, gender, age, disability, or any other dimension of diversity.
Who Should Apply?
- ML Engineers: With 3+ years of experience in data-centric AI or computer vision/NLP pipelines
- Active Learning Specialists: Skilled in uncertainty sampling, diversity sampling, or weak supervision
- Data Engineers: Experienced in building scalable, cloud-integrated data pipelines
- MLOps Practitioners: Comfortable maintaining data preparation infrastructure for production systems
- AI & Robotics Enthusiasts: Seeking to shape data strategy for next-generation connected devices