10+ PySpark Jobs in Mumbai | PySpark Job openings in Mumbai
Apply to 10+ PySpark Jobs in Mumbai on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.


Work Mode: Hybrid
Need B.Tech, BE, M.Tech, ME candidates - Mandatory
Must-Have Skills:
● Educational Qualification :- B.Tech, BE, M.Tech, ME in any field.
● Minimum of 3 years of proven experience as a Data Engineer.
● Strong proficiency in Python programming language and SQL.
● Experience with Databricks and with setting up and managing data pipelines and data warehouses/lakes.
● Good comprehension and critical thinking skills.
● Please note that the salary bracket varies with the candidate's experience:
- Experience from 4 yrs to 6 yrs - Salary up to 22 LPA
- Experience from 5 yrs to 8 yrs - Salary up to 30 LPA
- Experience more than 8 yrs - Salary up to 40 LPA

We are looking for skilled and passionate Data Engineers with a strong foundation in Python programming and hands-on experience working with APIs, AWS cloud, and modern development practices. The ideal candidate will have a keen interest in building scalable backend systems and working with big data tools like PySpark.
Key Responsibilities:
- Write clean, scalable, and efficient Python code.
- Work with Python frameworks such as PySpark for data processing.
- Design, develop, update, and maintain APIs (RESTful).
- Deploy and manage code using GitHub CI/CD pipelines.
- Collaborate with cross-functional teams to define, design, and ship new features.
- Work on AWS cloud services for application deployment and infrastructure.
- Basic database design and interaction with MySQL or DynamoDB.
- Debugging and troubleshooting application issues and performance bottlenecks.
Required Skills & Qualifications:
- 4+ years of hands-on experience with Python development.
- Proficient in Python basics with a strong problem-solving approach.
- Experience with AWS Cloud services (EC2, Lambda, S3, etc.).
- Good understanding of API development and integration.
- Knowledge of GitHub and CI/CD workflows.
- Experience in working with PySpark or similar big data frameworks.
- Basic knowledge of MySQL or DynamoDB.
- Excellent communication skills and a team-oriented mindset.
Nice to Have:
- Experience in containerization (Docker/Kubernetes).
- Familiarity with Agile/Scrum methodologies.
Key Responsibilities:
Design, develop, and optimize scalable data pipelines and ETL processes.
Work with large datasets using GCP services like BigQuery, Dataflow, and Cloud Storage.
Implement real-time data streaming and processing solutions using Pub/Sub and Dataproc.
Collaborate with cross-functional teams to ensure data quality and governance.
Technical Requirements:
4+ years of experience in Data Engineering.
Strong expertise in GCP services like Workflow, TensorFlow, Dataproc, and Cloud Storage.
Proficiency in SQL and programming languages such as Python or Java.
Experience in designing and implementing data pipelines and working with real-time data processing.
Familiarity with CI/CD pipelines and cloud security best practices.
Data Engineer + Integration Engineer + Support Specialist
Exp – 5-8 years
Necessary Skills:
· SQL & Python / PySpark
· AWS Services: Glue, Appflow, Redshift
· Data warehousing
· Data modelling
Job Description:
· Experience implementing and delivering data solutions and pipelines on AWS Cloud Platform. Design, implement, and maintain the data architecture for all AWS data services
· A strong understanding of data modelling, data structures, databases (Redshift), and ETL processes
· Work with stakeholders to identify business needs and requirements for data-related projects
· Strong SQL and/or Python or PySpark knowledge
· Creating data models that can be used to extract information from various sources & store it in a usable format
· Optimize data models for performance and efficiency
· Write SQL queries to support data analysis and reporting
· Monitor and troubleshoot data pipelines
· Collaborate with software engineers to design and implement data-driven features
· Perform root cause analysis on data issues
· Maintain documentation of the data architecture and ETL processes
· Identifying opportunities to improve performance by improving database structure or indexing methods
· Maintaining existing applications by updating existing code or adding new features to meet new requirements
· Designing and implementing security measures to protect data from unauthorized access or misuse
· Recommending infrastructure changes to improve capacity or performance
· Experience in the process industry
Data Engineer + Integration Engineer + Support Specialist
Exp – 3-5 years
Necessary Skills:
· SQL & Python / PySpark
· AWS Services: Glue, Appflow, Redshift
· Data warehousing basics
· Data modelling basics
Job Description:
· Experience implementing and delivering data solutions and pipelines on AWS Cloud Platform.
· A strong understanding of data modelling, data structures, databases (Redshift)
· Strong SQL and/or Python or PySpark knowledge
· Design and implement ETL processes to load data into the data warehouse
· Creating data models that can be used to extract information from various sources & store it in a usable format
· Optimize data models for performance and efficiency
· Write SQL queries to support data analysis and reporting
· Collaborate with team to design and implement data-driven features
· Monitor and troubleshoot data pipelines
· Perform root cause analysis on data issues
· Maintain documentation of the data architecture and ETL processes
· Maintaining existing applications by updating existing code or adding new features to meet new requirements
· Designing and implementing security measures to protect data from unauthorized access or misuse
· Identifying opportunities to improve performance by improving database structure or indexing methods
· Recommending infrastructure changes to improve capacity or performance
Technical Skills:
- Ability to understand and translate business requirements into design.
- Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
- Experience in creating ETL jobs using Python/PySpark.
- Proficiency in creating AWS Lambda functions for event-based jobs.
- Knowledge of automating ETL processes using AWS Step Functions.
- Competence in building data warehouses and loading data into them.
Responsibilities:
- Understand business requirements and translate them into design.
- Assess AWS infrastructure needs for development work.
- Develop ETL jobs using Python/PySpark to meet requirements.
- Implement AWS Lambda for event-based tasks.
- Automate ETL processes using AWS Step Functions.
- Build data warehouses and manage data loading.
- Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.
- Experience and expertise in Python development and its libraries such as PySpark, pandas, and NumPy
- Expertise in ADF, Databricks.
- Creating and maintaining data interfaces across a number of different protocols (file, API).
- Creating and maintaining internal business process solutions to keep our corporate system data in sync and reduce manual processes where appropriate.
- Creating and maintaining monitoring and alerting workflows to improve system transparency.
- Facilitate the development of our Azure cloud infrastructure relative to Data and Application systems.
- Design and lead development of our data infrastructure including data warehouses, data marts, and operational data stores.
- Experience in using Azure services such as ADLS Gen2, Azure Functions, Azure messaging services, Azure SQL Server, Azure Key Vault, Azure Cognitive Services, etc.
- Proficiency in shell scripting
- Proficiency in automation of tasks
- Proficiency in PySpark/Python
- Proficiency in writing and understanding Sqoop commands
- Understanding of Cloudera Manager
- Good understanding of RDBMS
- Good understanding of Excel

- Minimum 1 year of relevant experience in PySpark (mandatory)
- Hands-on experience developing, testing, deploying, maintaining, and improving data integration pipelines in an AWS cloud environment is an added plus
- Ability to play a lead role and independently manage a PySpark development team of 3-5 members
- EMR, Python, and PySpark are mandatory.
- Knowledge of and experience working with AWS Cloud technologies such as Apache Spark, Glue, Kafka, Kinesis, and Lambda, along with S3, Redshift, and RDS
- Data pre-processing, data transformation, data analysis, and feature engineering
- Performance optimization of scripts (code) and productionizing of code (SQL, Pandas, Python or PySpark, etc.)
Required skills:
- Bachelor's degree in Computer Science, Data Science, Computer Engineering, IT, or equivalent
- Fluency in Python (Pandas), PySpark, SQL, or similar
- Azure Data Factory experience (min 12 months)
- Able to write efficient code using traditional and object-oriented concepts and modular programming, following the SDLC process.
- Experience in production optimization and end-to-end performance tracing (technical root cause analysis)
- Ability to work independently with demonstrated experience in project or program management
- Azure experience, with the ability to translate data scientists' Python code and make it efficient (production-ready) for cloud deployment
- Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions: Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hubs/IoT Hub.
- Experience in migrating on-premises data warehouses to data platforms on the Azure cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
- Experience with Azure Analysis Services
- Experience in Power BI
- Experience with third-party solutions like Attunity/StreamSets, Informatica
- Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
- Capacity Planning and Performance Tuning on Azure Stack and Spark.