PySpark Jobs in Mumbai


Apply to 10+ PySpark Jobs in Mumbai on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.

Data Engineer

at ZeMoSo Technologies

11 recruiters

Agency job

via TIGI HR Solution Pvt. Ltd. by Vaidehi Sarkar

Mumbai, Bengaluru (Bangalore), Hyderabad, Chennai, Pune

4 - 8 yrs

₹10L - ₹15L / yr

Data engineering

Python

SQL

Data Warehouse (DWH)

Amazon Web Services (AWS)


Work Mode: Hybrid

Candidates must hold a B.Tech, BE, M.Tech or ME degree (mandatory).

Must-Have Skills:

● Educational qualification: B.Tech, BE, M.Tech or ME in any field.

● Minimum of 3 years of proven experience as a Data Engineer.

● Strong proficiency in Python and SQL.

● Experience with Databricks and with setting up and managing data pipelines and data warehouses/lakes.

● Good comprehension and critical thinking skills.

● Note: the salary bracket varies with the candidate's experience:

- 4 to 6 years of experience: salary up to 22 LPA

- 5 to 8 years of experience: salary up to 30 LPA

- More than 8 years of experience: salary up to 40 LPA

Data Engineer

at Deqode

1 recruiter

Posted by Alisha Das

Bengaluru (Bangalore), Delhi, Gurugram, Noida, Ghaziabad, Faridabad, Mumbai, Pune, Hyderabad, Indore, Jaipur, Kolkata

4 - 5 yrs

₹2L - ₹18L / yr

Python

PySpark

We are looking for a skilled and passionate Data Engineer with a strong foundation in Python programming and hands-on experience working with APIs, AWS cloud, and modern development practices. The ideal candidate will have a keen interest in building scalable backend systems and working with big data tools like PySpark.

Key Responsibilities:

Write clean, scalable, and efficient Python code.
Work with Python frameworks such as PySpark for data processing.
Design, develop, update, and maintain RESTful APIs.
Deploy and manage code using GitHub CI/CD pipelines.
Collaborate with cross-functional teams to define, design, and ship new features.
Work on AWS cloud services for application deployment and infrastructure.
Perform basic database design and work with MySQL or DynamoDB.
Debug and troubleshoot application issues and performance bottlenecks.

Required Skills & Qualifications:

4+ years of hands-on experience with Python development.
Proficient in Python basics with a strong problem-solving approach.
Experience with AWS Cloud services (EC2, Lambda, S3, etc.).
Good understanding of API development and integration.
Knowledge of GitHub and CI/CD workflows.
Experience in working with PySpark or similar big data frameworks.
Basic knowledge of MySQL or DynamoDB.
Excellent communication skills and a team-oriented mindset.

Nice to Have:

Experience in containerization (Docker/Kubernetes).
Familiarity with Agile/Scrum methodologies.

Data Engineer

IT Service company

Agency job

via Vinprotoday by Vikas Gaur

Mumbai

4 - 10 yrs

₹8L - ₹30L / yr

Google Cloud Platform (GCP)

Workflow

TensorFlow

Deployment management

PySpark


Key Responsibilities:

Design, develop, and optimize scalable data pipelines and ETL processes.

Work with large datasets using GCP services like BigQuery, Dataflow, and Cloud Storage.

Implement real-time data streaming and processing solutions using Pub/Sub and Dataproc.

Collaborate with cross-functional teams to ensure data quality and governance.

Technical Requirements:

4+ years of experience in Data Engineering.

Strong expertise in GCP services such as Workflow, Dataproc and Cloud Storage, as well as TensorFlow.

Proficiency in SQL and in programming languages such as Python or Java.

Experience in designing and implementing data pipelines and working with real-time data processing.

Familiarity with CI/CD pipelines and cloud security best practices.

Data Engineer (AWS + PySpark + SQL)

at ProtoGene Consulting Private Limited

1 recruiter

Posted by Anand Singh

Mumbai

3 - 8 yrs

₹7L - ₹18L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark


Data Engineer + Integration Engineer + Support Specialist (Experience: 5-8 years)

Necessary Skills:

· SQL & Python / PySpark

· AWS Services: Glue, AppFlow, Redshift

· Data warehousing

· Data modelling

Job Description:

· Experience implementing and delivering data solutions and pipelines on the AWS Cloud Platform. Design, implement, and maintain the data architecture for all AWS data services

· A strong understanding of data modelling, data structures, databases (Redshift), and ETL processes

· Work with stakeholders to identify business needs and requirements for data-related projects

· Strong SQL and/or Python or PySpark knowledge

· Create data models that can be used to extract information from various sources and store it in a usable format

· Optimize data models for performance and efficiency

· Write SQL queries to support data analysis and reporting

· Monitor and troubleshoot data pipelines

· Collaborate with software engineers to design and implement data-driven features

· Perform root cause analysis on data issues

· Maintain documentation of the data architecture and ETL processes

· Identify opportunities to improve performance by improving database structure or indexing methods

· Maintain existing applications by updating existing code or adding new features to meet new requirements

· Design and implement security measures to protect data from unauthorized access or misuse

· Recommend infrastructure changes to improve capacity or performance

Experience in the process industry

Data Engineer + Integration Engineer + Support Specialist (Experience: 3-5 years)

Necessary Skills:

· SQL & Python / PySpark

· AWS Services: Glue, AppFlow, Redshift

· Data warehousing basics

· Data modelling basics

Job Description:

· Experience implementing and delivering data solutions and pipelines on the AWS Cloud Platform.

· A strong understanding of data modelling, data structures, databases (Redshift)

· Strong SQL and/or Python or PySpark knowledge

· Design and implement ETL processes to load data into the data warehouse

· Create data models that can be used to extract information from various sources and store it in a usable format

· Optimize data models for performance and efficiency

· Write SQL queries to support data analysis and reporting

· Collaborate with the team to design and implement data-driven features

· Monitor and troubleshoot data pipelines

· Perform root cause analysis on data issues

· Maintain documentation of the data architecture and ETL processes

· Maintain existing applications by updating existing code or adding new features to meet new requirements

· Design and implement security measures to protect data from unauthorized access or misuse

· Identify opportunities to improve performance by improving database structure or indexing methods

· Recommend infrastructure changes to improve capacity or performance

AWS Data Engineer (Contractual)

at Forward Eye Technologies

Posted by Jaya S

Bengaluru (Bangalore), Mumbai, Delhi, Gurugram, Pune, Hyderabad, Ahmedabad, Chennai

3 - 7 yrs

₹8L - ₹15L / yr

AWS Lambda

Amazon S3

Amazon VPC

Amazon EC2

Amazon Redshift


Technical Skills:

Ability to understand and translate business requirements into design.
Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
Experience in creating ETL jobs using Python/PySpark.
Proficiency in creating AWS Lambda functions for event-based jobs.
Knowledge of automating ETL processes using AWS Step Functions.
Competence in building data warehouses and loading data into them.

Responsibilities:

Understand business requirements and translate them into design.
Assess AWS infrastructure needs for development work.
Develop ETL jobs using Python/PySpark to meet requirements.
Implement AWS Lambda for event-based tasks.
Automate ETL processes using AWS Step Functions.
Build data warehouses and manage data loading.
Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.

Azure Developer

at Numantra Technologies

2 recruiters

Posted by Vandana Saxena

Mumbai, Navi Mumbai

2 - 8 yrs

₹5L - ₹12L / yr

Microsoft Windows Azure

ADF

NumPy

PySpark

Databricks


- Experience and expertise with Azure cloud services; Azure certification is a plus.

- Experience and expertise in Python development and libraries such as PySpark, pandas and NumPy

- Expertise in ADF (Azure Data Factory) and Databricks.

- Creating and maintaining data interfaces across a number of different protocols (file, API).

- Creating and maintaining internal business process solutions to keep our corporate system data in sync and reduce manual processes where appropriate.

- Creating and maintaining monitoring and alerting workflows to improve system transparency.

- Facilitate the development of our Azure cloud infrastructure for data and application systems.

- Design and lead development of our data infrastructure including data warehouses, data marts, and operational data stores.

- Experience with Azure services such as ADLS Gen2, Azure Functions, Azure messaging services, Azure SQL Server, Azure Key Vault, Azure Cognitive Services, etc.

Hadoop Senior Developer/Data Engineering Developer/ETL Developer

payments bank

Agency job

via Mavin RPO Solutions Pvt. Ltd. by Mahesh Iyer

Navi Mumbai

3 - 5 yrs

₹7L - ₹18L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark


Proficiency in shell scripting
Proficiency in task automation
Proficiency in PySpark/Python
Proficiency in writing and understanding Sqoop
Understanding of Cloudera Manager
Good understanding of RDBMS
Good understanding of Excel

PySpark Lead/PySpark Developer

at Virtusa

2 recruiters

Agency job

via Response Informatics by Anupama Lavanya Uppala

Chennai, Bengaluru (Bangalore), Mumbai, Hyderabad, Pune

3 - 10 yrs

₹10L - ₹25L / yr

PySpark

Python

Minimum 1 year of relevant experience in PySpark (mandatory)
Hands-on experience developing, testing, deploying, maintaining and improving data integration pipelines in an AWS cloud environment is an added plus
Ability to play a lead role and independently manage a PySpark development team of 3-5 members
EMR, Python and PySpark are mandatory.
Knowledge and awareness of working with AWS Cloud technologies such as Apache Spark, Glue, Kafka, Kinesis and Lambda, in S3, Redshift and RDS

Data Engineer with Expertise in ADF

at Numantra Technologies

2 recruiters

Posted by nisha mattas

Remote, Mumbai, Powai

2 - 12 yrs

₹8L - ₹18L / yr

ADF

PySpark

Jupyter Notebook

Big Data

Windows Azure


Data pre-processing, data transformation, data analysis, and feature engineering
Performance optimization of scripts and productionizing of code (SQL, pandas, Python or PySpark, etc.)

Required skills:

Bachelor's degree in Computer Science, Data Science, Computer Engineering, IT or equivalent
Fluency in Python (pandas), PySpark, SQL, or similar
Azure Data Factory experience (minimum 12 months)
Able to write efficient code using traditional and object-oriented (OO) concepts and modular programming, following the SDLC process.
Experience in production optimization and end-to-end performance tracing (technical root cause analysis)
Ability to work independently with demonstrated experience in project or program management
Azure experience, and the ability to translate data scientists' Python code into efficient, production-ready code for cloud deployment

Azure Data Engineer

at Fragma Data Systems

8 recruiters

Posted by Evelyn Charles

Remote, Bengaluru (Bangalore), Hyderabad, Chennai, Mumbai, Pune

8 - 15 yrs

₹16L - ₹28L / yr

PySpark

SQL Azure

Azure Synapse

Windows Azure

Azure Data Engineer


Technology Skills:

Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions: Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hubs/IoT Hub.
Experience in migrating on-premises data warehouses to data platforms on the Azure cloud.
Designing and implementing data engineering, ingestion, and transformation functions

Good to Have:

Experience with Azure Analysis Services
Experience in Power BI
Experience with third-party solutions such as Attunity/StreamSets and Informatica
Experience with presales activities (responding to RFPs, executing quick POCs)
Capacity Planning and Performance Tuning on Azure Stack and Spark.

Get to hear about interesting companies hiring right now


Why apply via Cutshort?

Connect with actual hiring teams and get their fast response. No spam.
