30+ PySpark Jobs in Pune | PySpark Job openings in Pune
Apply to 30+ PySpark Jobs in Pune on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.


Work Mode: Hybrid
Need B.Tech, BE, M.Tech, ME candidates - Mandatory
Must-Have Skills:
● Educational Qualification :- B.Tech, BE, M.Tech, ME in any field.
● Minimum of 3 years of proven experience as a Data Engineer.
● Strong proficiency in Python programming language and SQL.
● Experience in DataBricks and setting up and managing data pipelines, data warehouses/lakes.
● Good comprehension and critical thinking skills.
● Kindly note Salary bracket will vary according to the exp. of the candidate -
- Experience from 4 yrs to 6 yrs - Salary upto 22 LPA
- Experience from 5 yrs to 8 yrs - Salary upto 30 LPA
- Experience more than 8 yrs - Salary upto 40 LPA


About Data Axle:
Data Axle Inc. has been an industry leader in data, marketing solutions, sales and research for over 45 years in the USA. Data Axle has set up a strategic global center of excellence in Pune. This center delivers mission critical data services to its global customers powered by its proprietary cloud-based technology platform and by leveraging proprietary business & consumer databases. Data Axle is headquartered in Dallas, TX, USA.
Roles and Responsibilities:
- Design, implement, and manage scalable analytical data infrastructure, enabling efficient access to large datasets and high-performance computing on Google Cloud Platform (GCP).
- Develop and optimize data pipelines using GCP-native services like BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Data Fusion, and Cloud Storage.
- Work with diverse data sources to extract, transform, and load data into enterprise-grade data lakes and warehouses, ensuring high availability and reliability.
- Implement and maintain real-time data streaming solutions using Pub/Sub, Dataflow, and Kafka.
- Research and integrate the latest big data and visualization technologies to enhance analytics capabilities and improve efficiency.
- Collaborate with cross-functional teams to implement machine learning models and AI-driven analytics solutions using Vertex AI and BigQuery ML.
- Continuously improve existing data architectures to support scalability, performance optimization, and cost efficiency.
- Enhance data security and governance by implementing industry best practices for access control, encryption, and compliance.
- Automate and optimize data workflows to simplify reporting, dashboarding, and self-service analytics using Looker and Data Studio.
Basic Qualifications
- 7+ years of experience in data engineering, software development, business intelligence, or data science, with expertise in large-scale data processing and analytics.
- Strong proficiency in SQL and experience with BigQuery for data warehousing.
- Hands-on experience in designing and developing ETL/ELT pipelines using GCP services (Cloud Composer, Dataflow, Dataproc, Data Fusion, or Apache Airflow).
- Expertise in distributed computing and big data processing frameworks, such as Apache Spark, Hadoop, or Flink, particularly within Dataproc and Dataflow environments.
- Experience with business intelligence and data visualization tools, such as Looker, Tableau, or Power BI.
- Knowledge of data governance, security best practices, and compliance requirements in cloud environments.
Preferred Qualifications:
- Degree/Diploma in Computer Science, Engineering, Mathematics, or a related technical field.
- Experience working with GCP big data technologies, including BigQuery, Dataflow, Dataproc, Pub/Sub, and Cloud SQL.
- Hands-on experience with real-time data processing frameworks, including Kafka and Apache Beam.
- Proficiency in Python, Java, or Scala for data engineering and pipeline development.
- Familiarity with DevOps best practices, CI/CD pipelines, Terraform, and infrastructure-as-code for managing GCP resources.
- Experience integrating AI/ML models into data workflows, leveraging BigQuery ML, Vertex AI, or TensorFlow.
- Understanding of Agile methodologies, software development life cycle (SDLC), and cloud cost optimization strategies.


Roles & Responsibilities:
We are looking for a Data Scientist to join the Data Science Client Services team to continue our success of identifying high quality target audiences that generate profitable marketing return for our clients. We are looking for experienced data science, machine learning and MLOps practitioners to design, build and deploy impactful predictive marketing solutions that serve a wide range of verticals and clients. The right candidate will enjoy contributing to and learning from a highly talented team and working on a variety of projects.
We are looking for a Manager Data Scientist who will be responsible for
- Ownership of design, implementation, and deployment of machine learning algorithms in a modern Python-based cloud architecture
- Design or enhance ML workflows for data ingestion, model design, model inference and scoring 3. Oversight on team project execution and delivery
- Establish peer review guidelines for high quality coding to help develop junior team members’ skill set growth, cross-training, and team efficiencies
- Visualize and publish model performance results and insights to internal and external audiences
Qualifications:
- Masters in a relevant quantitative, applied field (Statistics, Econometrics, Computer Science, Mathematics, Engineering)
- Minimum of 12+ years of work experience in the end-to-end lifecycle of ML model development and deployment into production within a cloud infrastructure (Databricks is highly preferred) 3. Proven ability to manage the output of a small team in a fast-paced environment and to lead by example in the fulfilment of client requests
- Exhibit deep knowledge of core mathematical principles relating to data science and machine learning (ML Theory + Best Practices, Feature Engineering and Selection, Supervised and Unsupervised ML, A/B Testing, etc.)
- Proficiency in Python and SQL required; PySpark/Spark experience a plus
- Ability to conduct a productive peer review and proper code structure in Github
- Proven experience developing, testing, and deploying various ML algorithms (neural networks, XGBoost, Bayes, and the like)
- Working knowledge of modern CI/CD methods
This position description is intended to describe the duties most frequently performed by an individual in this position. It is not intended to be a complete list of assigned duties but to describe a position level.

We are looking for a skilled and passionate Data Engineers with a strong foundation in Python programming and hands-on experience working with APIs, AWS cloud, and modern development practices. The ideal candidate will have a keen interest in building scalable backend systems and working with big data tools like PySpark.
Key Responsibilities:
- Write clean, scalable, and efficient Python code.
- Work with Python frameworks such as PySpark for data processing.
- Design, develop, update, and maintain APIs (RESTful).
- Deploy and manage code using GitHub CI/CD pipelines.
- Collaborate with cross-functional teams to define, design, and ship new features.
- Work on AWS cloud services for application deployment and infrastructure.
- Basic database design and interaction with MySQL or DynamoDB.
- Debugging and troubleshooting application issues and performance bottlenecks.
Required Skills & Qualifications:
- 4+ years of hands-on experience with Python development.
- Proficient in Python basics with a strong problem-solving approach.
- Experience with AWS Cloud services (EC2, Lambda, S3, etc.).
- Good understanding of API development and integration.
- Knowledge of GitHub and CI/CD workflows.
- Experience in working with PySpark or similar big data frameworks.
- Basic knowledge of MySQL or DynamoDB.
- Excellent communication skills and a team-oriented mindset.
Nice to Have:
- Experience in containerization (Docker/Kubernetes).
- Familiarity with Agile/Scrum methodologies.
About NonStop io Technologies:
NonStop io Technologies is a value-driven company with a strong focus on process-oriented software engineering. We specialize in Product Development and have a decade's worth of experience in building web and mobile applications across various domains. NonStop io Technologies follows core principles that guide its operations and believes in staying invested in a product's vision for the long term. We are a small but proud group of individuals who believe in the 'givers gain' philosophy and strive to provide value in order to seek value. We are committed to and specialize in building cutting-edge technology products and serving as trusted technology partners for startups and enterprises. We pride ourselves on fostering innovation, learning, and community engagement. Join us to work on impactful projects in a collaborative and vibrant environment.
Brief Description:
We are looking for a talented Data Engineer to join our team. In this role, you will design, implement, and manage data pipelines, ensuring the accessibility and reliability of data for critical business processes. This is an exciting opportunity to work on scalable solutions that power data-driven decisions
Skillset:
Here is a list of some of the technologies you will work with (the list below is not set in stone)
Data Pipeline Orchestration and Execution:
● AWS Glue
● AWS Step Functions
● Databricks Change
Data Capture:
● Amazon Database Migration Service
● Amazon Managed Streaming for Apache Kafka with Debezium Plugin
Batch:
● AWS step functions (and Glue Jobs)
● Asynchronous queueing of batch job commands with RabbitMQ to various “ETL Jobs”
● Cron and subervisord processing on dedicated job server(s): Python & PHP
Streaming:
● Real-time processing via AWS MSK (Kafka), Apache Hudi, & Apache Flink
● Near real-time processing via worker (listeners) spread over AWS Lambda, custom server (daemons) written in Python and PHP Symfony
● Languages: Python & PySpark, Unix Shell, PHP Symfony (with Doctrine ORM)
● Monitoring & Reliability: Datadog & Cloudwatch
Things you will do:
● Build dashboards using Datadog and Cloudwatch to ensure system health and user support
● Build schema registries that enable data governance
● Partner with end-users to resolve service disruptions and evangelize our data product offerings
● Vigilantly oversee data quality and alert upstream data producers of issues
● Support and contribute to the data platform architecture strategy, roadmap, and implementation plans to support the company’s data-driven initiatives and business objective
● Work with Business Intelligence (BI) consumers to deliver enterprise-wide fact and dimension data product tables to enable data-driven decision-making across the organization.
● Other duties as assigned

We are looking for a Senior Data Engineer with strong expertise in GCP, Databricks, and Airflow to design and implement a GCP Cloud Native Data Processing Framework. The ideal candidate will work on building scalable data pipelines and help migrate existing workloads to a modern framework.
- Shift: 2 PM 11 PM
- Work Mode: Hybrid (3 days a week) across Xebia locations
- Notice Period: Immediate joiners or those with a notice period of up to 30 days
Key Responsibilities:
- Design and implement a GCP Native Data Processing Framework leveraging Spark and GCP Cloud Services.
- Develop and maintain data pipelines using Databricks and Airflow for transforming Raw → Silver → Gold data layers.
- Ensure data integrity, consistency, and availability across all systems.
- Collaborate with data engineers, analysts, and stakeholders to optimize performance.
- Document standards and best practices for data engineering workflows.
Required Experience:
- 7-8 years of experience in data engineering, architecture, and pipeline development.
- Strong knowledge of GCP, Databricks, PySpark, and BigQuery.
- Experience with Orchestration tools like Airflow, Dagster, or GCP equivalents.
- Understanding of Data Lake table formats (Delta, Iceberg, etc.).
- Proficiency in Python for scripting and automation.
- Strong problem-solving skills and collaborative mindset.
⚠️ Please apply only if you have not applied recently or are not currently in the interview process for any open roles at Xebia.
Looking forward to your response!
Best regards,
Vijay S
Assistant Manager - TAG
Here is the Job Description -
Location -- Viman Nagar, Pune
Mode - 5 Days Working
Required Tech Skills:
● Strong at PySpark, Python
● Good understanding of Data Structure
● Good at SQL query/optimization
● Strong fundamentals of OOPs programming
● Good understanding of AWS Cloud, Big Data.
● Data Lake, AWS Glue, Athena, S3, Kinesis, SQL/NoSQL DB

About the Role:
We are seeking a skilled Python Backend Developer to join our dynamic team. This role focuses on designing, building, and maintaining efficient, reusable, and reliable code that supports both monolithic and microservices architectures. The ideal candidate will have a strong understanding of backend frameworks and architectures, proficiency in asynchronous programming, and familiarity with deployment processes. Experience with AI model deployment is a plus.
Overall 5+ years of IT experience with minimum of 5+ Yrs of experience on Python and in Opensource web framework (Django) with AWS Experience.
Key Responsibilities:
- Develop, optimize, and maintain backend systems using Python, Pyspark, and FastAPI.
- Design and implement scalable architectures, including both monolithic and microservices.
-3+ Years of working experience in AWS (Lambda, Serverless, Step Function and EC2)
-Deep Knowledge on Python Flask/Django Framework
-Good understanding of REST API’s
-Sound Knowledge on Database
-Excellent problem-solving and analytical skills
-Leadership Skills, Good Communication Skills, interested to learn modern technologies
- Apply design patterns (MVC, Singleton, Observer, Factory) to solve complex problems effectively.
- Work with web servers (Nginx, Apache) and deploy web applications and services.
- Create and manage RESTful APIs; familiarity with GraphQL is a plus.
- Use asynchronous programming techniques (ASGI, WSGI, async/await) to enhance performance.
- Integrate background job processing with Celery and RabbitMQ, and manage caching mechanisms using Redis and Memcached.
- (Optional) Develop containerized applications using Docker and orchestrate deployments with Kubernetes.
Required Skills:
- Languages & Frameworks:Python, Django, AWS
- Backend Architecture & Design:Strong knowledge of monolithic and microservices architectures, design patterns, and asynchronous programming.
- Web Servers & Deployment:Proficient in Nginx and Apache, and experience in RESTful API design and development. GraphQL experience is a plus.
-Background Jobs & Task Queues: Proficiency in Celery and RabbitMQ, with experience in caching (Redis, Memcached).
- Additional Qualifications: Knowledge of Docker and Kubernetes (optional), with any exposure to AI model deployment considered a bonus.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in backend development using Python and Django and AWS.
- Demonstrated ability to design and implement scalable and robust architectures.
- Strong problem-solving skills, attention to detail, and a collaborative mindset.
Preferred:
- Experience with Docker/Kubernetes for containerization and orchestration.
- Exposure to AI model deployment processes.


About Data Axle:
Data Axle Inc. has been an industry leader in data, marketing solutions, sales and research for over 50 years in the USA. Data Axle now as an established strategic global centre of excellence in Pune. This centre delivers mission critical data services to its global customers powered by its proprietary cloud-based technology platform and by leveraging proprietary business & consumer databases.
Data Axle Pune is pleased to have achieved certification as a Great Place to Work!
Roles & Responsibilities:
We are looking for a Senior Data Scientist to join the Data Science Client Services team to continue our success of identifying high quality target audiences that generate profitable marketing return for our clients. We are looking for experienced data science, machine learning and MLOps practitioners to design, build and deploy impactful predictive marketing solutions that serve a wide range of verticals and clients. The right candidate will enjoy contributing to and learning from a highly talented team and working on a variety of projects.
We are looking for a Senior Data Scientist who will be responsible for:
- Ownership of design, implementation, and deployment of machine learning algorithms in a modern Python-based cloud architecture
- Design or enhance ML workflows for data ingestion, model design, model inference and scoring
- Oversight on team project execution and delivery
- Establish peer review guidelines for high quality coding to help develop junior team members’ skill set growth, cross-training, and team efficiencies
- Visualize and publish model performance results and insights to internal and external audiences
Qualifications:
- Masters in a relevant quantitative, applied field (Statistics, Econometrics, Computer Science, Mathematics, Engineering)
- Minimum of 5 years of work experience in the end-to-end lifecycle of ML model development and deployment into production within a cloud infrastructure (Databricks is highly preferred)
- Proven ability to manage the output of a small team in a fast-paced environment and to lead by example in the fulfilment of client requests
- Exhibit deep knowledge of core mathematical principles relating to data science and machine learning (ML Theory + Best Practices, Feature Engineering and Selection, Supervised and Unsupervised ML, A/B Testing, etc.)
- Proficiency in Python and SQL required; PySpark/Spark experience a plus
- Ability to conduct a productive peer review and proper code structure in Github
- Proven experience developing, testing, and deploying various ML algorithms (neural networks, XGBoost, Bayes, and the like)
- Working knowledge of modern CI/CD methods This position description is intended to describe the duties most frequently performed by an individual in this position.
It is not intended to be a complete list of assigned duties but to describe a position level.
Job Description :
Job Title : Data Engineer
Location : Pune (Hybrid Work Model)
Experience Required : 4 to 8 Years
Role Overview :
We are seeking talented and driven Data Engineers to join our team in Pune. The ideal candidate will have a strong background in data engineering with expertise in Python, PySpark, and SQL. You will be responsible for designing, building, and maintaining scalable data pipelines and systems that empower our business intelligence and analytics initiatives.
Key Responsibilities:
- Develop, optimize, and maintain ETL pipelines and data workflows.
- Design and implement scalable data solutions using Python, PySpark, and SQL.
- Collaborate with cross-functional teams to gather and analyze data requirements.
- Ensure data quality, integrity, and security throughout the data lifecycle.
- Monitor and troubleshoot data pipelines to ensure reliability and performance.
- Work on hybrid data environments involving on-premise and cloud-based systems.
- Assist in the deployment and maintenance of big data solutions.
Required Skills and Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or related field.
- 4 to 8 Years of experience in Data Engineering or related roles.
- Proficiency in Python and PySpark for data processing and analysis.
- Strong SQL skills with experience in writing complex queries and optimizing performance.
- Familiarity with data pipeline tools and frameworks.
- Knowledge of cloud platforms such as AWS, Azure, or GCP is a plus.
- Excellent problem-solving and analytical skills.
- Strong communication and teamwork abilities.
Preferred Qualifications:
- Experience with big data technologies like Hadoop, Hive, or Spark.
- Familiarity with data visualization tools and techniques.
- Knowledge of CI/CD pipelines and DevOps practices in a data engineering context.
Work Model:
- This position follows a hybrid work model, with candidates expected to work from the Pune office as per business needs.
Why Join Us?
- Opportunity to work with cutting-edge technologies.
- Collaborative and innovative work environment.
- Competitive compensation and benefits.
- Clear career progression and growth opportunities.

TVARIT GmbH develops and delivers solutions in the field of artificial intelligence (AI) for the Manufacturing, automotive, and process industries. With its software products, TVARIT makes it possible for its customers to make intelligent and well-founded decisions, e.g., in forward-looking Maintenance, increasing the OEE and predictive quality. We have renowned reference customers, competent technology, a good research team from renowned Universities, and the award of a renowned AI prize (e.g., EU Horizon 2020) which makes Tvarit one of the most innovative AI companies in Germany and Europe.
We are looking for a self-motivated person with a positive "can-do" attitude and excellent oral and written communication skills in English.
We are seeking a skilled and motivated Data Engineer from the manufacturing Industry with over two years of experience to join our team. As a data engineer, you will be responsible for designing, building, and maintaining the infrastructure required for the collection, storage, processing, and analysis of large and complex data sets. The ideal candidate will have a strong foundation in ETL pipelines and Python, with additional experience in Azure and Terraform being a plus. This role requires a proactive individual who can contribute to our data infrastructure and support our analytics and data science initiatives.
Skills Required
- Experience in the manufacturing industry (metal industry is a plus)
- 2+ years of experience as a Data Engineer
- Experience in data cleaning & structuring and data manipulation
- ETL Pipelines: Proven experience in designing, building, and maintaining ETL pipelines.
- Python: Strong proficiency in Python programming for data manipulation, transformation, and automation.
- Experience in SQL and data structures
- Knowledge in big data technologies such as Spark, Flink, Hadoop, Apache and NoSQL databases.
- Knowledge of cloud technologies (at least one) such as AWS, Azure, and Google Cloud Platform.
- Proficient in data management and data governance
- Strong analytical and problem-solving skills.
- Excellent communication and teamwork abilities.
Nice To Have
- Azure: Experience with Azure data services (e.g., Azure Data Factory, Azure Databricks, Azure SQL Database).
- Terraform: Knowledge of Terraform for infrastructure as code (IaC) to manage cloud.

Greetings , Wissen Technology is Hiring for the position of Data Engineer
Please find the Job Description for your Reference:
JD
- Design, develop, and maintain data pipelines on AWS EMR (Elastic MapReduce) to support data processing and analytics.
- Implement data ingestion processes from various sources including APIs, databases, and flat files.
- Optimize and tune big data workflows for performance and scalability.
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions.
- Manage and monitor EMR clusters, ensuring high availability and reliability.
- Develop ETL (Extract, Transform, Load) processes to cleanse, transform, and store data in data lakes and data warehouses.
- Implement data security best practices to ensure data is protected and compliant with relevant regulations.
- Create and maintain technical documentation related to data pipelines, workflows, and infrastructure.
- Troubleshoot and resolve issues related to data processing and EMR cluster performance.
Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience in data engineering, with a focus on big data technologies.
- Strong experience with AWS services, particularly EMR, S3, Redshift, Lambda, and Glue.
- Proficiency in programming languages such as Python, Java, or Scala.
- Experience with big data frameworks and tools such as Hadoop, Spark, Hive, and Pig.
- Solid understanding of data modeling, ETL processes, and data warehousing concepts.
- Experience with SQL and NoSQL databases.
- Familiarity with CI/CD pipelines and version control systems (e.g., Git).
- Strong problem-solving skills and the ability to work independently and collaboratively in a team environment

Sr. Data Engineer (Data Warehouse-Snowflake)
Experience: 5+yrs
Location: Pune (Hybrid)
As a Senior Data engineer with Snowflake expertise you are a subject matter expert who is curious and an innovative thinker to mentor young professionals. You are a key person to convert Vision and Data Strategy for Data solutions and deliver them. With your knowledge you will help create data-driven thinking within the organization, not just within Data teams, but also in the wider stakeholder community.
Skills Preferred
- Advanced written, verbal, and analytic skills, and demonstrated ability to influence and facilitate sustained change. Ability to convey information clearly and concisely to all levels of staff and management about programs, services, best practices, strategies, and organizational mission and values.
- Proven ability to focus on priorities, strategies, and vision.
- Very Good understanding in Data Foundation initiatives, like Data Modelling, Data Quality Management, Data Governance, Data Maturity Assessments and Data Strategy in support of the key business stakeholders.
- Actively deliver the roll-out and embedding of Data Foundation initiatives in support of the key business programs advising on the technology and using leading market standard tools.
- Coordinate the change management process, incident management and problem management process.
- Ensure traceability of requirements from Data through testing and scope changes, to training and transition.
- Drive implementation efficiency and effectiveness across the pilots and future projects to minimize cost, increase speed of implementation and maximize value delivery
Knowledge Preferred
- Extensive knowledge and hands on experience with Snowflake and its different components like User/Group, Data Store/ Warehouse management, External Stage/table, working with semi structured data, Snowpipe etc.
- Implement and manage CI/CD for migrating and deploying codes to higher environments with Snowflake codes.
- Proven experience with Snowflake Access control and authentication, data security, data sharing, working with VS Code extension for snowflake, replication, and failover, optimizing SQL, analytical ability to troubleshoot and debug on development and production issues quickly is key for success in this role.
- Proven technology champion in working with relational, Data warehouses databases, query authoring (SQL) as well as working familiarity with a variety of databases.
- Highly Experienced in building and optimizing complex queries. Good with manipulating, processing, and extracting value from large, disconnected datasets.
- Your experience in handling big data sets and big data technologies will be an asset.
- Proven champion with in-depth knowledge of any one of the scripting languages: Python, SQL, Pyspark.
Primary responsibilities
- You will be an asset in our team bringing deep technical skills and capabilities to become a key part of projects defining the data journey in our company, keen to engage, network and innovate in collaboration with company wide teams.
- Collaborate with the data and analytics team to develop and maintain a data model and data governance infrastructure using a range of different storage technologies that enables optimal data storage and sharing using advanced methods.
- Support the development of processes and standards for data mining, data modeling and data protection.
- Design and implement continuous process improvements for automating manual processes and optimizing data delivery.
- Assess and report on the unique data needs of key stakeholders and troubleshoot any data-related technical issues through to resolution.
- Work to improve data models that support business intelligence tools, improve data accessibility and foster data-driven decision making.
- Ensure traceability of requirements from Data through testing and scope changes, to training and transition.
- Manage and lead technical design and development activities for implementation of large-scale data solutions in Snowflake to support multiple use cases (transformation, reporting and analytics, data monetization, etc.).
- Translate advanced business data, integration and analytics problems into technical approaches that yield actionable recommendations, across multiple, diverse domains; communicate results and educate others through design and build of insightful presentations.
- Exhibit strong knowledge of the Snowflake ecosystem and can clearly articulate the value proposition of cloud modernization/transformation to a wide range of stakeholders.
Relevant work experience
Bachelors in a Science, Technology, Engineering, Mathematics or Computer Science discipline or equivalent with 7+ Years of experience in enterprise-wide data warehousing, governance, policies, procedures, and implementation.
Aptitude for working with data, interpreting results, business intelligence and analytic best practices.
Business understanding
Good knowledge and understanding of Consumer and industrial products sector and IoT.
Good functional understanding of solutions supporting business processes.
Skill Must have
- Snowflake 5+ years
- Overall different Data warehousing techs 5+ years
- SQL 5+ years
- Data warehouse designing experience 3+ years
- Experience with cloud and on-prem hybrid models in data architecture
- Knowledge of Data Governance and strong understanding of data lineage and data quality
- Programming & Scripting: Python, Pyspark
- Database technologies such as Traditional RDBMS (MS SQL Server, Oracle, MySQL, PostgreSQL)
Nice to have
- Demonstrated experience in modern enterprise data integration platforms such as Informatica
- AWS cloud services: S3, Lambda, Glue and Kinesis and API Gateway, EC2, EMR, RDS, Redshift and Kinesis
- Good understanding of Data Architecture approaches
- Experience in designing and building streaming data ingestion, analysis and processing pipelines using Kafka, Kafka Streams, Spark Streaming, Stream sets and similar cloud native technologies.
- Experience with implementation of operations concerns for a data platform such as monitoring, security, and scalability
- Experience working in DevOps, Agile, Scrum, Continuous Delivery and/or Rapid Application Development environments
- Building mock and proof-of-concepts across different capabilities/tool sets exposure
- Experience working with structured, semi-structured, and unstructured data, extracting information, and identifying linkages across disparate data sets
Technical Skills:
- Ability to understand and translate business requirements into design.
- Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
- Experience in creating ETL jobs using Python/PySpark.
- Proficiency in creating AWS Lambda functions for event-based jobs.
- Knowledge of automating ETL processes using AWS Step Functions.
- Competence in building data warehouses and loading data into them.
Responsibilities:
- Understand business requirements and translate them into design.
- Assess AWS infrastructure needs for development work.
- Develop ETL jobs using Python/PySpark to meet requirements.
- Implement AWS Lambda for event-based tasks.
- Automate ETL processes using AWS Step Functions.
- Build data warehouses and manage data loading.
- Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.
Publicis Sapient Overview:
The Senior Associate People Senior Associate L1 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
.
Job Summary:
As Senior Associate L2 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
The role requires a hands-on technologist who has strong programming background like Java / Scala / Python, should have experience in Data Ingestion, Integration and data Wrangling, Computation, Analytics pipelines and exposure to Hadoop ecosystem components. You are also required to have hands-on knowledge on at least one of AWS, GCP, Azure cloud platforms.
Role & Responsibilities:
Your role is focused on Design, Development and delivery of solutions involving:
• Data Integration, Processing & Governance
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Implement scalable architectural models for data processing and storage
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
# Competency
1.Overall 5+ years of IT experience with 3+ years in Data related technologies
2.Minimum 2.5 years of experience in Big Data technologies and working exposure in at least one cloud platform on related data services (AWS / Azure / GCP)
3.Hands-on experience with the Hadoop stack – HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline.
4.Strong experience in at least of the programming language Java, Scala, Python. Java preferable
5.Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc
6.Well-versed and working knowledge with data platform related services on at least 1 cloud platform, IAM and data security
Preferred Experience and Knowledge (Good to Have):
# Competency
1.Good knowledge of traditional ETL tools (Informatica, Talend, etc) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands on experience
2.Knowledge on data governance processes (security, lineage, catalog) and tools like Collibra, Alation etc
3.Knowledge on distributed messaging frameworks like ActiveMQ / RabbiMQ / Solace, search & indexing and Micro services architectures
4.Performance tuning and optimization of data pipelines
5.CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality
6.Cloud data specialty and other related Big data technology certifications
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
Publicis Sapient Overview:
The Senior Associate People Senior Associate L1 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
.
Job Summary:
As Senior Associate L1 in Data Engineering, you will do technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
The role requires a hands-on technologist who has strong programming background like Java / Scala / Python, should have experience in Data Ingestion, Integration and data Wrangling, Computation, Analytics pipelines and exposure to Hadoop ecosystem components. Having hands-on knowledge on at least one of AWS, GCP, Azure cloud platforms will be preferable.
Role & Responsibilities:
Job Title: Senior Associate L1 – Data Engineering
Your role is focused on Design, Development and delivery of solutions involving:
• Data Ingestion, Integration and Transformation
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
# Competency
1.Overall 3.5+ years of IT experience with 1.5+ years in Data related technologies
2.Minimum 1.5 years of experience in Big Data technologies
3.Hands-on experience with the Hadoop stack – HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline. Working knowledge on real-time data pipelines is added advantage.
4.Strong experience in at least of the programming language Java, Scala, Python. Java preferable
5.Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc
Preferred Experience and Knowledge (Good to Have):
# Competency
1.Good knowledge of traditional ETL tools (Informatica, Talend, etc) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands on experience
2.Knowledge on data governance processes (security, lineage, catalog) and tools like Collibra, Alation etc
3.Knowledge on distributed messaging frameworks like ActiveMQ / RabbiMQ / Solace, search & indexing and Micro services architectures
4.Performance tuning and optimization of data pipelines
5.CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality
6.Working knowledge with data platform related services on at least 1 cloud platform, IAM and data security
7.Cloud data specialty and other related Big data technology certifications
Job Title: Senior Associate L1 – Data Engineering
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes


· The Objective:
You will play a crucial role in designing, implementing, and maintaining our data infrastructure, run tests and update the systems
· Job function and requirements
o Expert in Python, Pandas and Numpy with knowledge of Python web Framework such as Django and Flask.
o Able to integrate multiple data sources and databases into one system.
o Basic understanding of frontend technologies like HTML, CSS, JavaScript.
o Able to build data pipelines.
o Strong unit test and debugging skills.
o Understanding of fundamental design principles behind a scalable application
o Good understanding of RDBMS databases among Mysql or Postgresql.
o Able to analyze and transform raw data.
· About us
Mitibase helps companies find warm prospects every month that are most relevant, and then helps their team to act on those with automation. We do so by automatically tracking key accounts and contacts for job changes and relationships triggers and surfaces them as warm leads in your sales pipeline.
good exposure to concepts and/or technology across the broader spectrum. Enterprise Risk Technology
covers a variety of existing systems and green-field projects.
A Full stack Hadoop development experience with Scala development
A Full stack Java development experience covering Core Java (including JDK 1.8) and good understanding
of design patterns.
Requirements:-
• Strong hands-on development in Java technologies.
• Strong hands-on development in Hadoop technologies like Spark, Scala and experience on Avro.
• Participation in product feature design and documentation
• Requirement break-up, ownership and implantation.
• Product BAU deliveries and Level 3 production defects fixes.
Qualifications & Experience
• Degree holder in numerate subject
• Hands on Experience on Hadoop, Spark, Scala, Impala, Avro and messaging like Kafka
• Experience across a core compiled language – Java
• Proficiency in Java related frameworks like Springs, Hibernate, JPA
• Hands on experience in JDK 1.8 and strong skillset covering Collections, Multithreading with
For internal use only
For internal use only
experience working on Distributed applications.
• Strong hands-on development track record with end-to-end development cycle involvement
• Good exposure to computational concepts
• Good communication and interpersonal skills
• Working knowledge of risk and derivatives pricing (optional)
• Proficiency in SQL (PL/SQL), data modelling.
• Understanding of Hadoop architecture and Scala program language is a good to have.

consulting & implementation services in the area of Oil & Gas, Mining and Manufacturing Industry

- Data Engineer
Required skill set: AWS GLUE, AWS LAMBDA, AWS SNS/SQS, AWS ATHENA, SPARK, SNOWFLAKE, PYTHON
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting language - Python & pyspark
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Data ingestion from different data sources which exposes data using different technologies, such as: RDBMS, REST HTTP API, flat files, Streams, and Time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies
- Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform
- Develop automated data quality check to make sure right data enters the platform and verifying the results of the calculations
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring best industry practices
QUALIFICATIONS
- 5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge one of language: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with
- Data mining/programming tools (e.g. SAS, SQL, R, Python)
- Database technologies (e.g. PostgreSQL, Redshift, Snowflake. and Greenplum)
- Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence other related tools

- 5+ years of experience in a Data Engineering role on cloud environment
- Must have good experience in Scala/PySpark (preferably on data-bricks environment)
- Extensive experience with Transact-SQL.
- Experience in Data-bricks/Spark.
- Strong experience in Dataware house projects
- Expertise in database development projects with ETL processes.
- Manage and maintain data engineering pipelines
- Develop batch processing, streaming and integration solutions
- Experienced in building and operationalizing large-scale enterprise data solutions and applications
- Using one or more of Azure data and analytics services in combination with custom solutions
- Azure Data Lake, Azure SQL DW (Synapse), and SQL Database products or equivalent products from other cloud services providers
- In-depth understanding of data management (e. g. permissions, security, and monitoring).
- Cloud repositories for e.g. Azure GitHub, Git
- Experience in an agile environment (Prefer Azure DevOps).
Good to have
- Manage source data access security
- Automate Azure Data Factory pipelines
- Continuous Integration/Continuous deployment (CICD) pipelines, Source Repositories
- Experience in implementing and maintaining CICD pipelines
- Power BI understanding, Delta Lake house architecture
- Knowledge of software development best practices.
- Excellent analytical and organization skills.
- Effective working in a team as well as working independently.
- Strong written and verbal communication skills.
- Expertise in database development projects and ETL processes.
Hi,
Enterprise Minds is looking for Data Architect for Pune Location.
Req Skills:
Python,Pyspark,Hadoop,Java,Scala


We have urgent requirement of Data Engineer/Sr Data Engineer for reputed MNC company.
Exp: 4-9yrs
Location: Pune/Bangalore/Hyderabad
Skills: We need candidate either Python AWS or Pyspark AWS or Spark Scala
We have an urgent requirements of Big Data Developer profiles in our reputed MNC company.
Location: Pune/Bangalore/Hyderabad/Nagpur
Experience: 4-9yrs
Skills: Pyspark,AWS
or Spark,Scala,AWS
or Python Aws

Key Responsibilities : ( Data Developer Python, Spark)
Exp : 2 to 9 Yrs
Development of data platforms, integration frameworks, processes, and code.
Develop and deliver APIs in Python or Scala for Business Intelligence applications build using a range of web languages
Develop comprehensive automated tests for features via end-to-end integration tests, performance tests, acceptance tests and unit tests.
Elaborate stories in a collaborative agile environment (SCRUM or Kanban)
Familiarity with cloud platforms like GCP, AWS or Azure.
Experience with large data volumes.
Familiarity with writing rest-based services.
Experience with distributed processing and systems
Experience with Hadoop / Spark toolsets
Experience with relational database management systems (RDBMS)
Experience with Data Flow development
Knowledge of Agile and associated development techniques including:
n

- Minimum 1 years of relevant experience, in PySpark (mandatory)
- Hands on experience in development, test, deploy, maintain and improving data integration pipeline in AWS cloud environment is added plus
- Ability to play lead role and independently manage 3-5 member of Pyspark development team
- EMR ,Python and PYspark mandate.
- Knowledge and awareness working with AWS Cloud technologies like Apache Spark, , Glue, Kafka, Kinesis, and Lambda in S3, Redshift, RDS
- Design, create, test, and maintain data pipeline architecture in collaboration with the Data Architect.
- Build the infrastructure required for extraction, transformation, and loading of data from a wide variety of data sources using Java, SQL, and Big Data technologies.
- Support the translation of data needs into technical system requirements. Support in building complex queries required by the product teams.
- Build data pipelines that clean, transform, and aggregate data from disparate sources
- Develop, maintain and optimize ETLs to increase data accuracy, data stability, data availability, and pipeline performance.
- Engage with Product Management and Business to deploy and monitor products/services on cloud platforms.
- Stay up-to-date with advances in data persistence and big data technologies and run pilots to design the data architecture to scale with the increased data sets of consumer experience.
- Handle data integration, consolidation, and reconciliation activities for digital consumer / medical products.
Job Qualifications:
- Bachelor’s or master's degree in Computer Science, Information management, Statistics or related field
- 5+ years of experience in the Consumer or Healthcare industry in an analytical role with a focus on building on data pipelines, querying data, analyzing, and clearly presenting analyses to members of the data science team.
- Technical expertise with data models, data mining.
- Hands-on Knowledge of programming languages in Java, Python, R, and Scala.
- Strong knowledge in Big data tools like the snowflake, AWS Redshift, Hadoop, map-reduce, etc.
- Having knowledge in tools like AWS Glue, S3, AWS EMR, Streaming data pipelines, Kafka/Kinesis is desirable.
- Hands-on knowledge in SQL and No-SQL database design.
- Having knowledge in CI/CD for the building and hosting of the solutions.
- Having AWS certification is an added advantage.
- Having Strong knowledge in visualization tools like Tableau, QlikView is an added advantage
- A team player capable of working and integrating across cross-functional teams for implementing project requirements. Experience in technical requirements gathering and documentation.
- Ability to work effectively and independently in a fast-paced agile environment with tight deadlines
- A flexible, pragmatic, and collaborative team player with the innate ability to engage with data architects, analysts, and scientists
- Strong Python Coding skills and OOP skills
- Should have worked on Big Data product Architecture
- Should have worked with any one of the SQL-based databases like MySQL, PostgreSQL and any one of
- NoSQL-based databases such as Cassandra, Elasticsearch etc.
- Hands on experience on frameworks like Spark RDD, DataFrame, Dataset
- Experience on development of ETL for data product
- Candidate should have working knowledge on performance optimization, optimal resource utilization, Parallelism and tuning of spark jobs
- Working knowledge on file formats: CSV, JSON, XML, PARQUET, ORC, AVRO
- Good to have working knowledge with any one of the Analytical Databases like Druid, MongoDB, Apache Hive etc.
- Experience to handle real-time data feeds (good to have working knowledge on Apache Kafka or similar tool)
- Python and Scala (Optional), Spark / PySpark, Parallel programming
Job description
Role : Lead Architecture (Spark, Scala, Big Data/Hadoop, Java)
Primary Location : India-Pune, Hyderabad
Experience : 7 - 12 Years
Management Level: 7
Joining Time: Immediate Joiners are preferred
- Attend requirements gathering workshops, estimation discussions, design meetings and status review meetings
- Experience of Solution Design and Solution Architecture for the data engineer model to build and implement Big Data Projects on-premises and on cloud.
- Align architecture with business requirements and stabilizing the developed solution
- Ability to build prototypes to demonstrate the technical feasibility of your vision
- Professional experience facilitating and leading solution design, architecture and delivery planning activities for data intensive and high throughput platforms and applications
- To be able to benchmark systems, analyses system bottlenecks and propose solutions to eliminate them
- Able to help programmers and project managers in the design, planning and governance of implementing projects of any kind.
- Develop, construct, test and maintain architectures and run Sprints for development and rollout of functionalities
- Data Analysis, Code development experience, ideally in Big Data Spark, Hive, Hadoop, Java, Python, PySpark,
- Execute projects of various types i.e. Design, development, Implementation and migration of functional analytics Models/Business logic across architecture approaches
- Work closely with Business Analysts to understand the core business problems and deliver efficient IT solutions of the product
- Deployment sophisticated analytics program of code using any of cloud application.
Perks and Benefits we Provide!
- Working with Highly Technical and Passionate, mission-driven people
- Subsidized Meals & Snacks
- Flexible Schedule
- Approachable leadership
- Access to various learning tools and programs
- Pet Friendly
- Certification Reimbursement Policy
- Check out more about us on our website below!
www.datametica.com

- Sr. Data Engineer:
Core Skills – Data Engineering, Big Data, Pyspark, Spark SQL and Python
Candidate with prior Palantir Cloud Foundry OR Clinical Trial Data Model background is preferred
Major accountabilities:
- Responsible for Data Engineering, Foundry Data Pipeline Creation, Foundry Analysis & Reporting, Slate Application development, re-usable code development & management and Integrating Internal or External System with Foundry for data ingestion with high quality.
- Have good understanding on Foundry Platform landscape and it’s capabilities
- Performs data analysis required to troubleshoot data related issues and assist in the resolution of data issues.
- Defines company data assets (data models), Pyspark, spark SQL, jobs to populate data models.
- Designs data integrations and data quality framework.
- Design & Implement integration with Internal, External Systems, F1 AWS platform using Foundry Data Connector or Magritte Agent
- Collaboration with data scientists, data analyst and technology teams to document and leverage their understanding of the Foundry integration with different data sources - Actively participate in agile work practices
- Coordinating with Quality Engineer to ensure the all quality controls, naming convention & best practices have been followed
Desired Candidate Profile :
- Strong data engineering background
- Experience with Clinical Data Model is preferred
- Experience in
- SQL Server ,Postgres, Cassandra, Hadoop, and Spark for distributed data storage and parallel computing
- Java and Groovy for our back-end applications and data integration tools
- Python for data processing and analysis
- Cloud infrastructure based on AWS EC2 and S3
- 7+ years IT experience, 2+ years’ experience in Palantir Foundry Platform, 4+ years’ experience in Big Data platform
- 5+ years of Python and Pyspark development experience
- Strong troubleshooting and problem solving skills
- BTech or master's degree in computer science or a related technical field
- Experience designing, building, and maintaining big data pipelines systems
- Hands-on experience on Palantir Foundry Platform and Foundry custom Apps development
- Able to design and implement data integration between Palantir Foundry and external Apps based on Foundry data connector framework
- Hands-on in programming languages primarily Python, R, Java, Unix shell scripts
- Hand-on experience in AWS / Azure cloud platform and stack
- Strong in API based architecture and concept, able to do quick PoC using API integration and development
- Knowledge of machine learning and AI
- Skill and comfort working in a rapidly changing environment with dynamic objectives and iteration with users.
Demonstrated ability to continuously learn, work independently, and make decisions with minimal supervision
- Building and operationalizing large scale enterprise data solutions and applications using one or more of AZURE data and analytics services in combination with custom solutions - Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsights, Databricks, CosmosDB, EventHub/IOTHub.
- Experience in migrating on-premise data warehouses to data platforms on AZURE cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
- Experience with Azure Analysis Services
- Experience in Power BI
- Experience with third-party solutions like Attunity/Stream sets, Informatica
- Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
- Capacity Planning and Performance Tuning on Azure Stack and Spark.