Responsibilities:
- Design and develop data ingestion pipelines that extract data from various sources, including databases, data lakes, and streaming platforms, and load it into cloud-based data repositories.
- Build and maintain ETL/ELT processes using cloud-native services and tools (e.g., Google Cloud Dataflow, Azure Data Factory, or AWS Glue) to transform and load data into data warehouses or data lakes.
- Implement and optimize data processing workflows using distributed computing frameworks such as Apache Spark and Apache Beam, or cloud-native services such as Azure Databricks, Google Dataproc, or Amazon EMR (a minimal sketch of this kind of pipeline appears after this list).
- Develop and maintain data storage solutions, such as data lakes (e.g., Azure Data Lake Storage, Google Cloud Storage, Amazon S3) and data warehouses (e.g., Azure Synapse Analytics, Google BigQuery, Amazon Redshift).
- Collaborate with data architects, data scientists, and analysts to understand data requirements and implement efficient data models and schemas.
- Ensure data quality, integrity, and security by implementing data validation, monitoring, and governance processes.
- Automate and orchestrate data pipelines using cloud-native tools (e.g., Azure Data Factory, Google Cloud Composer, AWS Step Functions) for efficient and reliable data processing.
- Optimize data pipelines and infrastructure for performance, scalability, and cost-effectiveness, leveraging cloud-native services and best practices.
- Provide technical support and documentation for data processing solutions and infrastructure.
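As a rough picture of the pipeline work described above, the sketch below shows a minimal batch ETL job using the Apache Beam Python SDK, reading raw JSON from object storage and loading it into BigQuery. The bucket, project, dataset, table, and field names are hypothetical placeholders, not part of this posting.

```python
# Minimal sketch of a batch ETL pipeline, assuming the Apache Beam Python SDK.
# All storage locations and field names below are illustrative placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(line):
    """Parse one JSON line and keep only the fields the warehouse needs."""
    record = json.loads(line)
    return {
        "event_id": record["id"],
        "user_id": record["user"],
        "event_ts": record["timestamp"],
        "amount": float(record.get("amount", 0)),
    }


def run():
    # DirectRunner runs locally; on GCP the same pipeline would typically use DataflowRunner.
    options = PipelineOptions(runner="DirectRunner",
                              temp_location="gs://example-bucket/tmp")

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadRawEvents" >> beam.io.ReadFromText("gs://example-bucket/raw/events-*.json")
            | "ParseAndClean" >> beam.Map(parse_event)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.events",
                schema="event_id:STRING,user_id:STRING,event_ts:TIMESTAMP,amount:FLOAT",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```

The same Beam code runs unchanged on Cloud Dataflow by switching the runner, which is one reason orchestration and runner choice are listed separately from pipeline logic in the responsibilities above.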
Qualifications:
- Bachelor’s or master’s degree in computer science, engineering, or a related field.
- Minimum of 2 years of experience as a Data Engineer, with a strong focus on cloud technologies.
- Proven expertise in at least one major cloud platform (Azure, GCP, or AWS) and its data services and tools, preferably Google Cloud Platform (GCP) with services such as Cloud Dataflow, Cloud Dataprep, Cloud Dataproc, and BigQuery.
- Proficiency in Python, SQL, and scripting languages such as Bash or PowerShell.
- Experience with data processing frameworks and libraries such as Apache Spark, Apache Beam, and pandas.
- Knowledge of data warehousing concepts, data modeling techniques (e.g., star schema, dimensional modeling), and ETL/ELT processes (see the sketch below).
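As a toy illustration of the dimensional modeling mentioned in the last item, the sketch below splits a flat extract into a product dimension and a sales fact table (a simple star schema) using pandas. All column names and values are invented for the example.

```python
# Toy star-schema example with pandas: split a flat extract into a product
# dimension and a sales fact table. Data and column names are made up.
import pandas as pd

raw = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product_sku": ["A-1", "B-2", "A-1"],
    "product_name": ["Widget", "Gadget", "Widget"],
    "quantity": [2, 1, 5],
    "unit_price": [9.99, 24.50, 9.99],
})

# Dimension table: one row per product, with a surrogate key.
dim_product = (
    raw[["product_sku", "product_name"]]
    .drop_duplicates()
    .reset_index(drop=True)
    .rename_axis("product_key")
    .reset_index()
)

# Fact table: measures plus a foreign key into the product dimension.
fact_sales = (
    raw.merge(dim_product, on=["product_sku", "product_name"])
       .assign(revenue=lambda df: df["quantity"] * df["unit_price"])
       [["order_id", "product_key", "quantity", "revenue"]]
)

print(dim_product)
print(fact_sales)
```

In a warehouse such as BigQuery, Synapse, or Redshift the same split would usually be expressed in SQL as part of an ELT step, with the fact table keyed on the dimension's surrogate key as shown here.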