Apache Spark Jobs

50+ Apache Spark Jobs in India

Apply to 50+ Apache Spark Jobs on CutShort.io. Find your next job, effortlessly. Browse Apache Spark Jobs and apply today!

SDE - II / III (Java, Kafka, Data Engineering)

at Talent Pro

Posted by Mayank choudhary

Bengaluru (Bangalore)

4 - 8 yrs

₹26L - ₹35L / yr

Java

Spring Boot

Google Cloud Platform (GCP)

Distributed Systems

Microservices

+3 more

Role & Responsibilities

Ensure that the architecture and design of the platform remain top-notch with respect to scalability, availability, reliability, and maintainability.

Act as a key technical contributor as well as a hands-on contributing member of the team.

Own end-to-end availability and performance of features, driving rapid product innovation while ensuring a reliable service.

Work closely with stakeholders such as Program Managers, Product Managers, the Reliability and Continuity Engineering (RCE) team, and the QE team to estimate and execute features/tasks independently.

Maintain and drive tech backlog execution for non-functional requirements of the platform required to keep the platform resilient

Assist in release planning and prioritization based on technical feasibility and engineering constraints

Continually find new ways to improve architecture and design while ensuring timely delivery and high quality.

Sr. Data Engineer

at Trellissoft Inc.

3 candid answers

Posted by Nikita Sinha

Bengaluru (Bangalore)

6 - 9 yrs

Up to ₹25L / yr (Varies)

Data Warehouse (DWH)

SQL

ETL

Amazon Web Services (AWS)

Google Cloud Platform (GCP)

+3 more

We’re looking for an experienced Senior Data Engineer to lead the design and development of scalable data solutions at our company. The ideal candidate will have extensive hands-on experience in data warehousing, ETL/ELT architecture, and cloud platforms like AWS, Azure, or GCP. You will work closely with both technical and business teams, mentoring engineers while driving data quality, security, and performance optimization.

Responsibilities:

Lead the design of data warehouses, lakes, and ETL workflows.
Collaborate with teams to gather requirements and build scalable solutions.
Ensure data governance, security, and optimal performance of systems.
Mentor junior engineers and drive end-to-end project delivery.

Requirements:

6+ years of experience in data engineering, including at least 2 full-cycle data warehouse projects.
Strong skills in SQL, ETL tools (e.g., Pentaho, dbt), and cloud platforms.
Expertise in big data tools (e.g., Apache Spark, Kafka).
Excellent communication skills and leadership abilities.

Preferred: Experience with workflow orchestration tools (e.g., Airflow), real-time data, and DataOps practices.

Azure Data Engineer

NA

Agency job

via Method Hub by Sampreetha Pai

Anywhere in India

4 - 5 yrs

₹18L - ₹22L / yr

SQL Azure

Apache Spark

DevOps

PySpark

Python

+1 more

Azure DE

Primary Responsibilities -

Create and maintain data storage solutions including Azure SQL Database, Azure Data Lake, and Azure Blob Storage.
Design, implement, and maintain data pipelines for data ingestion, processing, and transformation in Azure
Create data models for analytics purposes
Create and maintain ETL (Extract, Transform, Load) operations using Azure Data Factory or comparable technologies
Use Azure Data Factory and Databricks to assemble large, complex data sets
Implement data validation and cleansing procedures to ensure the quality, integrity, and dependability of the data
Ensure data security and compliance
Collaborate with data engineers and other stakeholders to understand requirements and translate them into scalable and reliable data platform architectures

Required skills:

Blend of technical expertise, analytical problem-solving, and collaboration with cross-functional teams
Azure DevOps
Apache Spark, Python
SQL proficiency
Azure Databricks knowledge
Big data technologies

The DEs should be well versed in coding, Spark Core, and data ingestion using Azure. They also need good communication skills, core Azure DE skills, and coding skills (PySpark, Python, and SQL).

Of the 7 open positions, 5 require Azure Data Engineers with a minimum of 5 years of relevant data engineering experience.

Data Engineer

at NonStop io Technologies Pvt Ltd

6 recruiters

Posted by Kalyani Wadnere

Pune

2 - 4 yrs

Best in industry

AWS Lambda

Databricks

Database migration

Apache Kafka

Apache Spark

+3 more

About NonStop io Technologies:

NonStop io Technologies is a value-driven company with a strong focus on process-oriented software engineering. We specialize in Product Development and have a decade's worth of experience in building web and mobile applications across various domains. NonStop io Technologies follows core principles that guide its operations and believes in staying invested in a product's vision for the long term. We are a small but proud group of individuals who believe in the 'givers gain' philosophy and strive to provide value in order to seek value. We are committed to and specialize in building cutting-edge technology products and serving as trusted technology partners for startups and enterprises. We pride ourselves on fostering innovation, learning, and community engagement. Join us to work on impactful projects in a collaborative and vibrant environment.

Brief Description:

We are looking for a talented Data Engineer to join our team. In this role, you will design, implement, and manage data pipelines, ensuring the accessibility and reliability of data for critical business processes. This is an exciting opportunity to work on scalable solutions that power data-driven decisions.

Skillset:

Here is a list of some of the technologies you will work with (the list is not set in stone):

Data Pipeline Orchestration and Execution:

● AWS Glue

● AWS Step Functions

● Databricks

Change Data Capture:

● Amazon Database Migration Service

● Amazon Managed Streaming for Apache Kafka with Debezium Plugin

Batch:

● AWS Step Functions (and Glue Jobs)

● Asynchronous queueing of batch job commands with RabbitMQ to various “ETL Jobs”

● Cron and supervisord processing on dedicated job server(s): Python & PHP

Streaming:

● Real-time processing via AWS MSK (Kafka), Apache Hudi, & Apache Flink

● Near real-time processing via workers (listeners) spread over AWS Lambda and custom servers (daemons) written in Python and PHP Symfony

● Languages: Python & PySpark, Unix Shell, PHP Symfony (with Doctrine ORM)

● Monitoring & Reliability: Datadog & CloudWatch

Things you will do:

● Build dashboards using Datadog and CloudWatch to ensure system health and user support

● Build schema registries that enable data governance

● Partner with end-users to resolve service disruptions and evangelize our data product offerings

● Vigilantly oversee data quality and alert upstream data producers of issues

● Support and contribute to the data platform architecture strategy, roadmap, and implementation plans to support the company’s data-driven initiatives and business objectives

● Work with Business Intelligence (BI) consumers to deliver enterprise-wide fact and dimension data product tables to enable data-driven decision-making across the organization.

● Other duties as assigned

Senior Data Engineer

at Talent Pro

Posted by Mayank choudhary

Bengaluru (Bangalore)

3 - 5 yrs

₹20L - ₹25L / yr

ETL

SQL

Apache Spark

Apache Kafka

Role & Responsibilities

About the Role:

We are seeking a highly skilled Senior Data Engineer with 5-7 years of experience to join our dynamic team. The ideal candidate will have a strong background in data engineering, with expertise in data warehouse architecture, data modeling, ETL processes, and building both batch and streaming pipelines. The candidate should also possess advanced proficiency in Spark, Databricks, Kafka, Python, SQL, and Change Data Capture (CDC) methodologies.

Key responsibilities:

Design, develop, and maintain robust data warehouse solutions to support the organization's analytical and reporting needs.

Implement efficient data modeling techniques to optimize performance and scalability of data systems.

Build and manage data lakehouse infrastructure, ensuring reliability, availability, and security of data assets.

Develop and maintain ETL pipelines to ingest, transform, and load data from various sources into the data warehouse and data lakehouse.

Utilize Spark and Databricks to process large-scale datasets efficiently and in real time.

Implement Kafka for building real-time streaming pipelines and ensure data consistency and reliability.

Design and develop batch pipelines for scheduled data processing tasks.

Collaborate with cross-functional teams to gather requirements, understand data needs, and deliver effective data solutions.

Perform data analysis and troubleshooting to identify and resolve data quality issues and performance bottlenecks.

Stay updated with the latest technologies and industry trends in data engineering and contribute to continuous improvement initiatives.