Nathan Claire Group is a leading Information Technology Consulting company that specializes in services supporting digital transformation. We are committed to delivering innovative solutions and cutting-edge technologies to our clients. As an Information Technology Consultant at Nathan Claire Group, you will have the opportunity to gain hands-on experience and work alongside industry professionals in a dynamic and fast-paced environment.
We are seeking an EXPERIENCED Data Lake Implementation Specialist to lead the setup and integration of on-premises and cloud data lakes that enable real-time analytics and AI for medium-to-large digital businesses. Experience with Apache Doris is an advantage.
Core Skills & Expertise
Data Lake Architecture (Hybrid & Multi-Cloud)
- Designing modern data lakehouses with raw + curated layers, unified batch + streaming ingestion
- Integration with enterprise systems and support for schema-on-read
- Familiarity with lakehouse table formats: Delta Lake, Apache Iceberg, Apache Hudi
Real-Time Data Processing
- Expertise with streaming architectures: Apache Kafka, Flink, Spark Streaming
- Experience with event-driven design, CDC, and real-time ETL tools
- Tools: Debezium, StreamSets, Apache NiFi
- Delivered at least one large-scale OLAP system in production, based on Apache Doris or a comparable engine
Cloud & On-Prem Data Services
- Cloud: AWS (S3, Glue, EMR, Kinesis), Azure (ADLS Gen2, Synapse), GCP (BigLake, Dataflow)
- On-prem: Hadoop, Cloudera, MapR, private cloud environments
AI/ML Enablement
Data Preparation for AI/ML
- Building pipelines for feature extraction and dataset versioning
- Integration with feature stores and data quality enforcement
- Integration with ML pipelines (Kubeflow, MLflow, SageMaker)
- Model deployment, tuning, and monitoring at scale
Analytics & BI Integration
- Support for BI tools (Power BI, Tableau) and fast querying layers (Presto, Trino)
- Enabling near-real-time dashboards
Governance, Observability, and Security
Enterprise Data Governance
- Implementing data ownership, lineage, and access policies
- Use of catalogs: Collibra, Apache Atlas, AWS Glue Catalog
Observability & Monitoring
- End-to-end pipeline visibility, logs, and metrics
- Tools: Prometheus, Grafana, OpenTelemetry, Monte Carlo
Security & Compliance
- Encryption, tokenization, and data masking
- Adherence to regulations and compliance frameworks: GDPR, HIPAA, SOC 2
Execution Experience
Large-Scale Implementations
- Hands-on delivery of hybrid data lake architectures
- Synchronizing on-prem and cloud data systems
Cross-Functional Leadership
- Working with data scientists, product teams, and security teams
- Leading data platform teams or Centers of Excellence
Agility at Scale
- Agile delivery models for data initiatives
- Delivering data products and ML capabilities incrementally
Ideal Candidate Profile
A hands-on, strategic data lake architect/engineer with deep knowledge of hybrid and multi-cloud systems, proven experience with streaming data and ML enablement, and the leadership to organize teams around real-time analytics and decision intelligence at digital-enterprise scale.
Bonus: Certifications & Tools
Certifications
- AWS/GCP/Azure Data Engineer or ML Engineer
- Databricks Lakehouse Accreditation
- CDMP or DAMA certification
Tools Stack
- Airflow, dbt, Spark, Flink, Kafka
- Terraform, GitOps, CI/CD
- MLflow, Feature Store, SageMaker, Vertex AI
- Apache Ranger, Apache Atlas, AWS Lake Formation