PySpark Jobs in Pune

Apply to 30+ PySpark Jobs in Pune on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.

Data Engineer

at ZeMoSo Technologies

11 recruiters

Agency job

via TIGI HR Solution Pvt. Ltd. by Vaidehi Sarkar

Mumbai, Bengaluru (Bangalore), Hyderabad, Chennai, Pune

4 - 8 yrs

₹10L - ₹15L / yr

Data engineering

Python

SQL

Data Warehouse (DWH)

Amazon Web Services (AWS)

+3 more

Work Mode: Hybrid

Need B.Tech, BE, M.Tech, ME candidates - Mandatory

Must-Have Skills:

● Educational Qualification: B.Tech, BE, M.Tech, ME in any field.

● Minimum of 3 years of proven experience as a Data Engineer.

● Strong proficiency in Python programming language and SQL.

● Experience with Databricks and with setting up and managing data pipelines and data warehouses/lakes.

● Good comprehension and critical thinking skills.

● Kindly note that the salary bracket will vary according to the candidate's experience:

- Experience from 4 yrs to 6 yrs - Salary up to 22 LPA

- Experience from 5 yrs to 8 yrs - Salary up to 30 LPA

- Experience more than 8 yrs - Salary up to 40 LPA

Sr. Data Engineer (GCP)

at Data Axle

2 candid answers

Posted by Eman Khan

Pune

7 - 10 yrs

Best in industry

Google Cloud Platform (GCP)

ETL

Python

Java

Scala

+4 more

About Data Axle:

Data Axle Inc. has been an industry leader in data, marketing solutions, sales and research for over 45 years in the USA. Data Axle has set up a strategic global center of excellence in Pune. This center delivers mission critical data services to its global customers powered by its proprietary cloud-based technology platform and by leveraging proprietary business & consumer databases. Data Axle is headquartered in Dallas, TX, USA.

Roles and Responsibilities:

Design, implement, and manage scalable analytical data infrastructure, enabling efficient access to large datasets and high-performance computing on Google Cloud Platform (GCP).
Develop and optimize data pipelines using GCP-native services like BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Data Fusion, and Cloud Storage.
Work with diverse data sources to extract, transform, and load data into enterprise-grade data lakes and warehouses, ensuring high availability and reliability.
Implement and maintain real-time data streaming solutions using Pub/Sub, Dataflow, and Kafka.
Research and integrate the latest big data and visualization technologies to enhance analytics capabilities and improve efficiency.
Collaborate with cross-functional teams to implement machine learning models and AI-driven analytics solutions using Vertex AI and BigQuery ML.
Continuously improve existing data architectures to support scalability, performance optimization, and cost efficiency.
Enhance data security and governance by implementing industry best practices for access control, encryption, and compliance.
Automate and optimize data workflows to simplify reporting, dashboarding, and self-service analytics using Looker and Data Studio.

Basic Qualifications:

7+ years of experience in data engineering, software development, business intelligence, or data science, with expertise in large-scale data processing and analytics.
Strong proficiency in SQL and experience with BigQuery for data warehousing.
Hands-on experience in designing and developing ETL/ELT pipelines using GCP services (Cloud Composer, Dataflow, Dataproc, Data Fusion, or Apache Airflow).
Expertise in distributed computing and big data processing frameworks, such as Apache Spark, Hadoop, or Flink, particularly within Dataproc and Dataflow environments.
Experience with business intelligence and data visualization tools, such as Looker, Tableau, or Power BI.
Knowledge of data governance, security best practices, and compliance requirements in cloud environments.

Preferred Qualifications:

Degree/Diploma in Computer Science, Engineering, Mathematics, or a related technical field.
Experience working with GCP big data technologies, including BigQuery, Dataflow, Dataproc, Pub/Sub, and Cloud SQL.
Hands-on experience with real-time data processing frameworks, including Kafka and Apache Beam.
Proficiency in Python, Java, or Scala for data engineering and pipeline development.
Familiarity with DevOps best practices, CI/CD pipelines, Terraform, and infrastructure-as-code for managing GCP resources.
Experience integrating AI/ML models into data workflows, leveraging BigQuery ML, Vertex AI, or TensorFlow.
Understanding of Agile methodologies, software development life cycle (SDLC), and cloud cost optimization strategies.

Manager - Data Scientist

at Data Axle

2 candid answers

Posted by Eman Khan

Pune

12 - 17 yrs

Best in industry

databricks

Python

PySpark

Machine Learning (ML)

SQL

+1 more

Roles & Responsibilities:

We are looking for a Data Scientist to join the Data Science Client Services team to continue our success of identifying high quality target audiences that generate profitable marketing return for our clients. We are looking for experienced data science, machine learning and MLOps practitioners to design, build and deploy impactful predictive marketing solutions that serve a wide range of verticals and clients. The right candidate will enjoy contributing to and learning from a highly talented team and working on a variety of projects.

We are looking for a Manager Data Scientist who will be responsible for

Ownership of design, implementation, and deployment of machine learning algorithms in a modern Python-based cloud architecture
Design or enhance ML workflows for data ingestion, model design, model inference and scoring
Oversight on team project execution and delivery
Establish peer review guidelines for high quality coding to help develop junior team members’ skill set growth, cross-training, and team efficiencies
Visualize and publish model performance results and insights to internal and external audiences

Qualifications:

Masters in a relevant quantitative, applied field (Statistics, Econometrics, Computer Science, Mathematics, Engineering)
Minimum of 12 years of work experience in the end-to-end lifecycle of ML model development and deployment into production within a cloud infrastructure (Databricks is highly preferred)
Proven ability to manage the output of a small team in a fast-paced environment and to lead by example in the fulfilment of client requests
Exhibit deep knowledge of core mathematical principles relating to data science and machine learning (ML Theory + Best Practices, Feature Engineering and Selection, Supervised and Unsupervised ML, A/B Testing, etc.)
Proficiency in Python and SQL required; PySpark/Spark experience a plus
Ability to conduct productive peer reviews and maintain proper code structure in GitHub
Proven experience developing, testing, and deploying various ML algorithms (neural networks, XGBoost, Bayes, and the like)
Working knowledge of modern CI/CD methods

This position description is intended to describe the duties most frequently performed by an individual in this position. It is not intended to be a complete list of assigned duties but to describe a position level.

Data Engineer

at Deqode

1 recruiter

Posted by Alisha Das

Bengaluru (Bangalore), Delhi, Gurugram, Noida, Ghaziabad, Faridabad, Mumbai, Pune, Hyderabad, Indore, Jaipur, Kolkata

4 - 5 yrs

₹2L - ₹18L / yr

Python

PySpark

We are looking for skilled and passionate Data Engineers with a strong foundation in Python programming and hands-on experience working with APIs, AWS cloud, and modern development practices. The ideal candidate will have a keen interest in building scalable backend systems and working with big data tools like PySpark.

Key Responsibilities:

Write clean, scalable, and efficient Python code.
Work with Python frameworks such as PySpark for data processing.
Design, develop, update, and maintain APIs (RESTful).
Deploy and manage code using GitHub CI/CD pipelines.
Collaborate with cross-functional teams to define, design, and ship new features.
Work on AWS cloud services for application deployment and infrastructure.
Basic database design and interaction with MySQL or DynamoDB.
Debugging and troubleshooting application issues and performance bottlenecks.

Required Skills & Qualifications:

4+ years of hands-on experience with Python development.
Proficient in Python basics with a strong problem-solving approach.
Experience with AWS Cloud services (EC2, Lambda, S3, etc.).
Good understanding of API development and integration.
Knowledge of GitHub and CI/CD workflows.
Experience in working with PySpark or similar big data frameworks.
Basic knowledge of MySQL or DynamoDB.
Excellent communication skills and a team-oriented mindset.

Nice to Have:

Experience in containerization (Docker/Kubernetes).
Familiarity with Agile/Scrum methodologies.

Data Engineer

at NonStop io Technologies Pvt Ltd

6 recruiters

Posted by Kalyani Wadnere

Pune

2 - 4 yrs

Best in industry

AWS Lambda

databricks

Database migration

Apache Kafka

Apache Spark

+3 more

About NonStop io Technologies:

NonStop io Technologies is a value-driven company with a strong focus on process-oriented software engineering. We specialize in Product Development and have a decade's worth of experience in building web and mobile applications across various domains. NonStop io Technologies follows core principles that guide its operations and believes in staying invested in a product's vision for the long term. We are a small but proud group of individuals who believe in the 'givers gain' philosophy and strive to provide value in order to seek value. We are committed to and specialize in building cutting-edge technology products and serving as trusted technology partners for startups and enterprises. We pride ourselves on fostering innovation, learning, and community engagement. Join us to work on impactful projects in a collaborative and vibrant environment.

Brief Description:

We are looking for a talented Data Engineer to join our team. In this role, you will design, implement, and manage data pipelines, ensuring the accessibility and reliability of data for critical business processes. This is an exciting opportunity to work on scalable solutions that power data-driven decisions.

Skillset:

Here is a list of some of the technologies you will work with (the list below is not set in stone)

Data Pipeline Orchestration and Execution:

● AWS Glue

● AWS Step Functions

● Databricks

Change Data Capture:

● AWS Database Migration Service

● Amazon Managed Streaming for Apache Kafka with Debezium Plugin

Batch:

● AWS Step Functions (and Glue jobs)

● Asynchronous queueing of batch job commands with RabbitMQ to various “ETL Jobs”

● Cron and supervisord processing on dedicated job server(s): Python & PHP

Streaming:

● Real-time processing via AWS MSK (Kafka), Apache Hudi, & Apache Flink

● Near real-time processing via workers (listeners) spread across AWS Lambda and custom server daemons written in Python and PHP Symfony

● Languages: Python & PySpark, Unix Shell, PHP Symfony (with Doctrine ORM)

● Monitoring & Reliability: Datadog & CloudWatch

Things you will do:

● Build dashboards using Datadog and CloudWatch to ensure system health and user support

● Build schema registries that enable data governance

● Partner with end-users to resolve service disruptions and evangelize our data product offerings

● Vigilantly oversee data quality and alert upstream data producers of issues

● Support and contribute to the data platform architecture strategy, roadmap, and implementation plans to support the company’s data-driven initiatives and business objectives

● Work with Business Intelligence (BI) consumers to deliver enterprise-wide fact and dimension data product tables to enable data-driven decision-making across the organization.

● Other duties as assigned

GCP Senior Data Engineer

at Xebia IT Architects

2 recruiters

Posted by Vijay S

Bengaluru (Bangalore), Gurugram, Pune, Hyderabad, Chennai, Bhopal, Jaipur

10 - 15 yrs

₹30L - ₹40L / yr

Spark

Google Cloud Platform (GCP)

Python

Apache Airflow

PySpark

+1 more

We are looking for a Senior Data Engineer with strong expertise in GCP, Databricks, and Airflow to design and implement a GCP Cloud Native Data Processing Framework. The ideal candidate will work on building scalable data pipelines and help migrate existing workloads to a modern framework.

Shift: 2 PM to 11 PM
Work Mode: Hybrid (3 days a week) across Xebia locations
Notice Period: Immediate joiners or those with a notice period of up to 30 days

Key Responsibilities:

Design and implement a GCP Native Data Processing Framework leveraging Spark and GCP Cloud Services.
Develop and maintain data pipelines using Databricks and Airflow for transforming Raw → Silver → Gold data layers.
Ensure data integrity, consistency, and availability across all systems.
Collaborate with data engineers, analysts, and stakeholders to optimize performance.
Document standards and best practices for data engineering workflows.

Required Experience:

7-8 years of experience in data engineering, architecture, and pipeline development.
Strong knowledge of GCP, Databricks, PySpark, and BigQuery.
Experience with Orchestration tools like Airflow, Dagster, or GCP equivalents.
Understanding of Data Lake table formats (Delta, Iceberg, etc.).
Proficiency in Python for scripting and automation.
Strong problem-solving skills and collaborative mindset.

⚠️ Please apply only if you have not applied recently or are not currently in the interview process for any open roles at Xebia.

Looking forward to your response!

Best regards,

Vijay S

Assistant Manager - TAG

https://www.linkedin.com/in/vijay-selvarajan/

AWS Data engineer

at Deqode

1 recruiter

Posted by Shraddha Katare

Pune

2 - 5 yrs

₹3L - ₹10L / yr

PySpark

Amazon Web Services (AWS)

AWS Lambda

SQL

Data engineering

+2 more

Here is the Job Description -

Location -- Viman Nagar, Pune

Mode - 5 Days Working

Required Tech Skills:

● Strong at PySpark, Python

● Good understanding of data structures

● Good at SQL querying and query optimization

● Strong fundamentals of object-oriented programming (OOP)

● Good understanding of AWS Cloud, Big Data.

● Data Lake, AWS Glue, Athena, S3, Kinesis, SQL/NoSQL DB

Sr. Python Developer

at Nirmitee.io

4 recruiters

Posted by Gitashri K

Pune

5 - 10 yrs

₹8L - ₹15L / yr

Python

PySpark

Amazon Web Services (AWS)

CI/CD

GitHub

About the Role:

We are seeking a skilled Python Backend Developer to join our dynamic team. This role focuses on designing, building, and maintaining efficient, reusable, and reliable code that supports both monolithic and microservices architectures. The ideal candidate will have a strong understanding of backend frameworks and architectures, proficiency in asynchronous programming, and familiarity with deployment processes. Experience with AI model deployment is a plus.

Overall 5+ years of IT experience, including at least 5 years with Python and an open-source web framework (Django), plus AWS experience.

Key Responsibilities:

- Develop, optimize, and maintain backend systems using Python, PySpark, and FastAPI.

- Design and implement scalable architectures, including both monolithic and microservices.

- 3+ years of working experience in AWS (Lambda, Serverless, Step Functions, and EC2)

- Deep knowledge of the Python Flask/Django frameworks

- Good understanding of REST APIs

- Sound knowledge of databases

- Excellent problem-solving and analytical skills

- Leadership skills, good communication skills, and an interest in learning modern technologies

- Apply design patterns (MVC, Singleton, Observer, Factory) to solve complex problems effectively.

- Work with web servers (Nginx, Apache) and deploy web applications and services.

- Create and manage RESTful APIs; familiarity with GraphQL is a plus.

- Use asynchronous programming techniques (ASGI, WSGI, async/await) to enhance performance.

- Integrate background job processing with Celery and RabbitMQ, and manage caching mechanisms using Redis and Memcached.

- (Optional) Develop containerized applications using Docker and orchestrate deployments with Kubernetes.

Required Skills:

- Languages & Frameworks: Python, Django, AWS

- Backend Architecture & Design: Strong knowledge of monolithic and microservices architectures, design patterns, and asynchronous programming.

- Web Servers & Deployment: Proficient in Nginx and Apache, and experience in RESTful API design and development. GraphQL experience is a plus.

- Background Jobs & Task Queues: Proficiency in Celery and RabbitMQ, with experience in caching (Redis, Memcached).

- Additional Qualifications: Knowledge of Docker and Kubernetes (optional), with any exposure to AI model deployment considered a bonus.

Qualifications:

- Bachelor’s degree in Computer Science, Engineering, or a related field.

- 5+ years of experience in backend development using Python and Django and AWS.

- Demonstrated ability to design and implement scalable and robust architectures.

- Strong problem-solving skills, attention to detail, and a collaborative mindset.

Preferred:

- Experience with Docker/Kubernetes for containerization and orchestration.

- Exposure to AI model deployment processes.

Senior Data Scientist

at Data Axle

2 candid answers

Posted by Eman Khan

Pune

6 - 9 yrs

Best in industry

Azure

Machine Learning (ML)

databricks

Python

SQL

+2 more

About Data Axle:

Data Axle Inc. has been an industry leader in data, marketing solutions, sales and research for over 50 years in the USA. Data Axle now has an established strategic global centre of excellence in Pune. This centre delivers mission-critical data services to its global customers, powered by its proprietary cloud-based technology platform and by leveraging proprietary business & consumer databases.

Data Axle Pune is pleased to have achieved certification as a Great Place to Work!

Roles & Responsibilities:

We are looking for a Senior Data Scientist to join the Data Science Client Services team to continue our success of identifying high quality target audiences that generate profitable marketing return for our clients. We are looking for experienced data science, machine learning and MLOps practitioners to design, build and deploy impactful predictive marketing solutions that serve a wide range of verticals and clients. The right candidate will enjoy contributing to and learning from a highly talented team and working on a variety of projects.

We are looking for a Senior Data Scientist who will be responsible for:

Ownership of design, implementation, and deployment of machine learning algorithms in a modern Python-based cloud architecture
Design or enhance ML workflows for data ingestion, model design, model inference and scoring
Oversight on team project execution and delivery
Establish peer review guidelines for high quality coding to help develop junior team members’ skill set growth, cross-training, and team efficiencies
Visualize and publish model performance results and insights to internal and external audiences

Qualifications:

Masters in a relevant quantitative, applied field (Statistics, Econometrics, Computer Science, Mathematics, Engineering)
Minimum of 5 years of work experience in the end-to-end lifecycle of ML model development and deployment into production within a cloud infrastructure (Databricks is highly preferred)
Proven ability to manage the output of a small team in a fast-paced environment and to lead by example in the fulfilment of client requests
Exhibit deep knowledge of core mathematical principles relating to data science and machine learning (ML Theory + Best Practices, Feature Engineering and Selection, Supervised and Unsupervised ML, A/B Testing, etc.)
Proficiency in Python and SQL required; PySpark/Spark experience a plus
Ability to conduct productive peer reviews and maintain proper code structure in GitHub
Proven experience developing, testing, and deploying various ML algorithms (neural networks, XGBoost, Bayes, and the like)
Working knowledge of modern CI/CD methods

This position description is intended to describe the duties most frequently performed by an individual in this position. It is not intended to be a complete list of assigned duties but to describe a position level.

About Data Axle:

Data Axle Pune is pleased to have achieved certification as a Great Place to Work!

Data Engineer

at NeoGenCode Technologies Pvt Ltd

2 candid answers

Posted by Akshay Patil

Pune

4 - 8 yrs

₹1L - ₹12L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark

+4 more

Job Description :

Job Title : Data Engineer

Location : Pune (Hybrid Work Model)

Experience Required : 4 to 8 Years

Role Overview :

We are seeking talented and driven Data Engineers to join our team in Pune. The ideal candidate will have a strong background in data engineering with expertise in Python, PySpark, and SQL. You will be responsible for designing, building, and maintaining scalable data pipelines and systems that empower our business intelligence and analytics initiatives.

Key Responsibilities:

Develop, optimize, and maintain ETL pipelines and data workflows.
Design and implement scalable data solutions using Python, PySpark, and SQL.
Collaborate with cross-functional teams to gather and analyze data requirements.
Ensure data quality, integrity, and security throughout the data lifecycle.
Monitor and troubleshoot data pipelines to ensure reliability and performance.
Work on hybrid data environments involving on-premise and cloud-based systems.
Assist in the deployment and maintenance of big data solutions.

Required Skills and Qualifications:

Bachelor’s degree in Computer Science, Information Technology, or related field.
4 to 8 years of experience in Data Engineering or related roles.
Proficiency in Python and PySpark for data processing and analysis.
Strong SQL skills with experience in writing complex queries and optimizing performance.
Familiarity with data pipeline tools and frameworks.
Knowledge of cloud platforms such as AWS, Azure, or GCP is a plus.
Excellent problem-solving and analytical skills.
Strong communication and teamwork abilities.

Preferred Qualifications:

Experience with big data technologies like Hadoop, Hive, or Spark.
Familiarity with data visualization tools and techniques.
Knowledge of CI/CD pipelines and DevOps practices in a data engineering context.

Work Model:

This position follows a hybrid work model, with candidates expected to work from the Pune office as per business needs.

Why Join Us?

Opportunity to work with cutting-edge technologies.
Collaborative and innovative work environment.
Competitive compensation and benefits.
Clear career progression and growth opportunities.

Azure Data Engineer

at TVARIT GmbH

2 candid answers

Posted by Shivani Kawade

Remote, Pune

2 - 4 yrs

₹8L - ₹20L / yr

Python

PySpark

ETL

databricks

Azure

+6 more

TVARIT GmbH develops and delivers artificial intelligence (AI) solutions for the manufacturing, automotive, and process industries. With its software products, TVARIT enables its customers to make intelligent, well-founded decisions, e.g., in forward-looking maintenance, increasing OEE, and predictive quality. We have renowned reference customers, competent technology, a strong research team from renowned universities, and a renowned AI award (e.g., EU Horizon 2020), which makes TVARIT one of the most innovative AI companies in Germany and Europe.

We are looking for a self-motivated person with a positive "can-do" attitude and excellent oral and written communication skills in English.

We are seeking a skilled and motivated Data Engineer from the manufacturing industry with over two years of experience to join our team. As a Data Engineer, you will be responsible for designing, building, and maintaining the infrastructure required for the collection, storage, processing, and analysis of large and complex data sets. The ideal candidate will have a strong foundation in ETL pipelines and Python; additional experience in Azure and Terraform is a plus. This role requires a proactive individual who can contribute to our data infrastructure and support our analytics and data science initiatives.

Skills Required

Experience in the manufacturing industry (metal industry is a plus)
2+ years of experience as a Data Engineer
Experience in data cleaning & structuring and data manipulation
ETL Pipelines: Proven experience in designing, building, and maintaining ETL pipelines.
Python: Strong proficiency in Python programming for data manipulation, transformation, and automation.
Experience in SQL and data structures
Knowledge of big data technologies such as Spark, Flink, Hadoop, Apache and NoSQL databases.
Knowledge of cloud technologies (at least one) such as AWS, Azure, and Google Cloud Platform.
Proficient in data management and data governance
Strong analytical and problem-solving skills.
Excellent communication and teamwork abilities.

Nice To Have

Azure: Experience with Azure data services (e.g., Azure Data Factory, Azure Databricks, Azure SQL Database).
Terraform: Knowledge of Terraform for infrastructure as code (IaC) to manage cloud infrastructure.

Data Engineer

at Wissen Technology

4 recruiters

Posted by Sukanya Mohan

Pune, Bengaluru (Bangalore)

5 - 10 yrs

Best in industry

Amazon Web Services (AWS)

EMR

Python

GLUE

SQL

+1 more

Greetings, Wissen Technology is hiring for the position of Data Engineer.

Please find the Job Description for your Reference:

Design, develop, and maintain data pipelines on AWS EMR (Elastic MapReduce) to support data processing and analytics.
Implement data ingestion processes from various sources including APIs, databases, and flat files.
Optimize and tune big data workflows for performance and scalability.
Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions.
Manage and monitor EMR clusters, ensuring high availability and reliability.
Develop ETL (Extract, Transform, Load) processes to cleanse, transform, and store data in data lakes and data warehouses.
Implement data security best practices to ensure data is protected and compliant with relevant regulations.
Create and maintain technical documentation related to data pipelines, workflows, and infrastructure.
Troubleshoot and resolve issues related to data processing and EMR cluster performance.

Qualifications:

Bachelor’s degree in Computer Science, Information Technology, or a related field.
5+ years of experience in data engineering, with a focus on big data technologies.
Strong experience with AWS services, particularly EMR, S3, Redshift, Lambda, and Glue.
Proficiency in programming languages such as Python, Java, or Scala.
Experience with big data frameworks and tools such as Hadoop, Spark, Hive, and Pig.
Solid understanding of data modeling, ETL processes, and data warehousing concepts.
Experience with SQL and NoSQL databases.
Familiarity with CI/CD pipelines and version control systems (e.g., Git).
Strong problem-solving skills and the ability to work independently and collaboratively in a team environment.

Sr. Data Engineer (Data Warehouse-Snowflake)

at IntraEdge

1 recruiter

Posted by Karishma Shingote

Pune

5 - 11 yrs

₹5L - ₹15L / yr

SQL

snowflake

Enterprise Data Warehouse (EDW)

Python

PySpark

Sr. Data Engineer (Data Warehouse-Snowflake)

Experience: 5+yrs

Location: Pune (Hybrid)

As a Senior Data Engineer with Snowflake expertise, you are a curious subject matter expert and an innovative thinker who mentors young professionals. You play a key role in turning the vision and data strategy into data solutions and delivering them. With your knowledge, you will help create data-driven thinking within the organization, not just within data teams but also in the wider stakeholder community.

Skills Preferred

Advanced written, verbal, and analytic skills, and demonstrated ability to influence and facilitate sustained change. Ability to convey information clearly and concisely to all levels of staff and management about programs, services, best practices, strategies, and organizational mission and values.
Proven ability to focus on priorities, strategies, and vision.
Very good understanding of Data Foundation initiatives such as Data Modelling, Data Quality Management, Data Governance, Data Maturity Assessments, and Data Strategy in support of the key business stakeholders.
Actively deliver the roll-out and embedding of Data Foundation initiatives in support of the key business programs advising on the technology and using leading market standard tools.
Coordinate the change management process, incident management and problem management process.
Ensure traceability of requirements from Data through testing and scope changes, to training and transition.
Drive implementation efficiency and effectiveness across the pilots and future projects to minimize cost, increase speed of implementation, and maximize value delivery.

Knowledge Preferred

Extensive knowledge of and hands-on experience with Snowflake and its components, such as User/Group, Data Store/Warehouse management, External Stage/Table, working with semi-structured data, Snowpipe, etc.
Implement and manage CI/CD for migrating and deploying Snowflake code to higher environments.
Proven experience with Snowflake access control and authentication, data security, data sharing, the VS Code extension for Snowflake, replication and failover, and SQL optimization. The analytical ability to troubleshoot and debug development and production issues quickly is key to success in this role.
Proven technology champion in working with relational databases and data warehouses and in query authoring (SQL), with working familiarity with a variety of databases.
Highly experienced in building and optimizing complex queries. Good with manipulating, processing, and extracting value from large, disconnected datasets.
Your experience in handling big data sets and big data technologies will be an asset.
Proven champion with in-depth knowledge of any one of the scripting languages: Python, SQL, PySpark.

Primary responsibilities

You will be an asset to our team, bringing deep technical skills and capabilities to become a key part of projects defining the data journey in our company, and keen to engage, network, and innovate in collaboration with company-wide teams.
Collaborate with the data and analytics team to develop and maintain a data model and data governance infrastructure using a range of different storage technologies that enables optimal data storage and sharing using advanced methods.
Support the development of processes and standards for data mining, data modeling and data protection.
Design and implement continuous process improvements for automating manual processes and optimizing data delivery.
Assess and report on the unique data needs of key stakeholders and troubleshoot any data-related technical issues through to resolution.
Work to improve data models that support business intelligence tools, improve data accessibility and foster data-driven decision making.
Ensure traceability of requirements from Data through testing and scope changes, to training and transition.
Manage and lead technical design and development activities for implementation of large-scale data solutions in Snowflake to support multiple use cases (transformation, reporting and analytics, data monetization, etc.).
Translate advanced business data, integration and analytics problems into technical approaches that yield actionable recommendations, across multiple, diverse domains; communicate results and educate others through design and build of insightful presentations.
Exhibit strong knowledge of the Snowflake ecosystem and clearly articulate the value proposition of cloud modernization/transformation to a wide range of stakeholders.

Relevant work experience

Bachelor’s degree in a Science, Technology, Engineering, Mathematics, or Computer Science discipline, or equivalent, with 7+ years of experience in enterprise-wide data warehousing, governance, policies, procedures, and implementation.

Aptitude for working with data, interpreting results, business intelligence and analytic best practices.

Business understanding

Good knowledge and understanding of Consumer and industrial products sector and IoT.

Good functional understanding of solutions supporting business processes.

Skill Must have

Snowflake 5+ years
Overall experience across different data warehousing technologies 5+ years
SQL 5+ years
Data warehouse designing experience 3+ years
Experience with cloud and on-prem hybrid models in data architecture
Knowledge of Data Governance and strong understanding of data lineage and data quality
Programming & Scripting: Python, PySpark
Database technologies such as Traditional RDBMS (MS SQL Server, Oracle, MySQL, PostgreSQL)

Nice to have

Demonstrated experience in modern enterprise data integration platforms such as Informatica
AWS cloud services: S3, Lambda, Glue, Kinesis, API Gateway, EC2, EMR, RDS, and Redshift
Good understanding of Data Architecture approaches
Experience in designing and building streaming data ingestion, analysis, and processing pipelines using Kafka, Kafka Streams, Spark Streaming, StreamSets, and similar cloud-native technologies.
Experience with implementation of operations concerns for a data platform such as monitoring, security, and scalability
Experience working in DevOps, Agile, Scrum, Continuous Delivery and/or Rapid Application Development environments
Exposure to building mock-ups and proofs of concept across different capabilities and tool sets
Experience working with structured, semi-structured, and unstructured data, extracting information, and identifying linkages across disparate data sets

AWS Data Engineer (Contractual)

at Forward Eye Technologies

Posted by Jaya S

Bengaluru (Bangalore), Mumbai, Delhi, Gurugram, Pune, Hyderabad, Ahmedabad, Chennai

3 - 7 yrs

₹8L - ₹15L / yr

AWS Lambda

Amazon S3

Amazon VPC

Amazon EC2

Amazon Redshift

+3 more

Technical Skills:

Ability to understand and translate business requirements into design.
Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
Experience in creating ETL jobs using Python/PySpark.
Proficiency in creating AWS Lambda functions for event-based jobs.
Knowledge of automating ETL processes using AWS Step Functions.
Competence in building data warehouses and loading data into them.

Responsibilities:

Understand business requirements and translate them into design.
Assess AWS infrastructure needs for development work.
Develop ETL jobs using Python/PySpark to meet requirements.
Implement AWS Lambda for event-based tasks.
Automate ETL processes using AWS Step Functions.
Build data warehouses and manage data loading.
Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.

Senior Data Engineer (L2)

at Publicis Sapient

10 recruiters

Posted by Mohit Singh

Bengaluru (Bangalore), Pune, Hyderabad, Gurugram, Noida

5 - 11 yrs

₹20L - ₹36L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark

+7 more

Publicis Sapient Overview:

As Senior Associate L1 in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.

Job Summary:

As Senior Associate L2 in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.

The role requires a hands-on technologist with a strong programming background in Java, Scala, or Python, experience in data ingestion, integration and data wrangling, computation, and analytics pipelines, and exposure to Hadoop ecosystem components. You are also required to have hands-on knowledge of at least one of the AWS, GCP, or Azure cloud platforms.

Role & Responsibilities:

Your role is focused on Design, Development and delivery of solutions involving:

• Data Integration, Processing & Governance

• Data Storage and Computation Frameworks, Performance Optimizations

• Analytics & Visualizations

• Infrastructure & Cloud Computing

• Data Management Platforms

• Implement scalable architectural models for data processing and storage

• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode

• Build functionality for data analytics, search and aggregation

Experience Guidelines:

Mandatory Experience and Competencies:

# Competency

1. Overall 5+ years of IT experience with 3+ years in data-related technologies

2. Minimum 2.5 years of experience in Big Data technologies and working exposure to related data services on at least one cloud platform (AWS / Azure / GCP)

3. Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow, and other components required in building end-to-end data pipelines

4. Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferred

5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery, etc.

6. Well-versed in and working knowledge of data platform related services on at least one cloud platform, IAM, and data security

Preferred Experience and Knowledge (Good to Have):

# Competency

1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands-on experience

2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.

3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures

4. Performance tuning and optimization of data pipelines

5. CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality

6. Cloud data specialty and other related Big Data technology certifications

Personal Attributes:

• Strong written and verbal communication skills

• Articulation skills

• Good team player

• Self-starter who requires minimal oversight

• Ability to prioritize and manage multiple tasks

• Process orientation and the ability to define and set up processes

Senior Data Engineer (L1)

at Publicis Sapient

10 recruiters

Posted by Mohit Singh

Bengaluru (Bangalore), Gurugram, Pune, Hyderabad, Noida

4 - 10 yrs

Best in industry

PySpark

Data engineering

Big Data

Hadoop

Spark

+6 more

Publicis Sapient Overview:

Job Summary:

As Senior Associate L1 in Data Engineering, you will produce technical designs and implement components for data engineering solutions. You will use a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions, and you will independently drive design discussions to ensure the health of the overall solution.

The role requires a hands-on technologist with a strong programming background in Java / Scala / Python, experience in data ingestion, integration and wrangling, computation and analytics pipelines, and exposure to Hadoop ecosystem components. Hands-on knowledge of at least one of the AWS, GCP or Azure cloud platforms is preferred.

Role & Responsibilities:

Job Title: Senior Associate L1 – Data Engineering

Your role is focused on Design, Development and delivery of solutions involving:

• Data Ingestion, Integration and Transformation

• Data Storage and Computation Frameworks, Performance Optimizations

• Analytics & Visualizations

• Infrastructure & Cloud Computing

• Data Management Platforms

• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time

• Build functionality for data analytics, search and aggregation

Experience Guidelines:

Mandatory Experience and Competencies:

# Competency

1. Overall 3.5+ years of IT experience with 1.5+ years in data-related technologies

2. Minimum 1.5 years of experience in Big Data technologies

4. Strong experience in at least one of the programming languages Java, Scala, Python; Java preferred

5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery, etc.

Preferred Experience and Knowledge (Good to Have):

# Competency

1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands-on experience

2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.

3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures

4. Performance tuning and optimization of data pipelines

5. CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality

6. Working knowledge of data platform related services on at least one cloud platform, IAM and data security

7. Cloud data specialty and other related Big Data technology certifications

Personal Attributes:

• Strong written and verbal communication skills

• Articulation skills

• Good team player

• Self-starter who requires minimal oversight

• Ability to prioritize and manage multiple tasks

• Process orientation and the ability to define and set up processes

Data engineer

at Mitibase

Posted by Vaidehi Ghangurde

Pune

2 - 4 yrs

₹6L - ₹8L / yr

Vue.js

AngularJS (1.x)

React.js

Angular (2+)

Javascript

+6 more

· The Objective:

You will play a crucial role in designing, implementing, and maintaining our data infrastructure, as well as running tests and updating the systems.

· Job function and requirements

o Expert in Python, Pandas and NumPy, with knowledge of Python web frameworks such as Django and Flask.

o Able to integrate multiple data sources and databases into one system.

o Basic understanding of frontend technologies like HTML, CSS, JavaScript.

o Able to build data pipelines.

o Strong unit test and debugging skills.

o Understanding of fundamental design principles behind a scalable application

o Good understanding of an RDBMS such as MySQL or PostgreSQL.

o Able to analyze and transform raw data.

· About us

Mitibase helps companies find the most relevant warm prospects every month and then helps their teams act on them with automation. We do so by automatically tracking key accounts and contacts for job changes and relationship triggers, and surfacing them as warm leads in your sales pipeline.

Big Data developer

One of the world's leading multinational investment banks

Agency job

via HiyaMee by Lithin Raj

Pune

5 - 9 yrs

₹5L - ₹15L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark

+2 more

This role is for a developer with strong core application or system programming skills in Scala and Java, and good exposure to concepts and/or technology across the broader spectrum. Enterprise Risk Technology covers a variety of existing systems and green-field projects.
Full-stack Hadoop development experience with Scala development.
Full-stack Java development experience covering Core Java (including JDK 1.8) and a good understanding of design patterns.
Requirements:
• Strong hands-on development in Java technologies.
• Strong hands-on development in Hadoop technologies such as Spark and Scala, and experience with Avro.
• Participation in product feature design and documentation.
• Requirement break-up, ownership and implementation.
• Product BAU deliveries and Level 3 production defect fixes.
Qualifications & Experience
• Degree holder in a numerate subject
• Hands-on experience with Hadoop, Spark, Scala, Impala, Avro and messaging systems like Kafka
• Experience with a core compiled language – Java
• Proficiency in Java-related frameworks like Spring, Hibernate, JPA
• Hands-on experience in JDK 1.8 and a strong skillset covering Collections and Multithreading, with experience working on distributed applications
• Strong hands-on development track record with end-to-end development cycle involvement
• Good exposure to computational concepts
• Good communication and interpersonal skills
• Working knowledge of risk and derivatives pricing (optional)
• Proficiency in SQL (PL/SQL) and data modelling
• Understanding of Hadoop architecture and the Scala programming language is good to have

Data Engineer

consulting & implementation services in the area of Oil & Gas, Mining and Manufacturing Industry

Agency job

via Jobdost by Sathish Kumar

Ahmedabad, Hyderabad, Pune, Delhi

5 - 7 yrs

₹18L - ₹25L / yr

AWS Lambda

AWS Simple Notification Service (SNS)

AWS Simple Queuing Service (SQS)

Python

PySpark

+9 more

Data Engineer

Required skill set: AWS GLUE, AWS LAMBDA, AWS SNS/SQS, AWS ATHENA, SPARK, SNOWFLAKE, PYTHON

Mandatory Requirements 

Experience in AWS Glue
Experience in Apache Parquet 
Proficient in AWS S3 and data lake 
Knowledge of Snowflake
Understanding of file-based ingestion best practices.
Scripting languages: Python and PySpark

CORE RESPONSIBILITIES

Create and manage cloud resources in AWS 
Data ingestion from different data sources that expose data using different technologies, such as RDBMS, REST HTTP APIs, flat files, streams, and time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies
Data processing/transformation using various technologies such as Spark and cloud services. You will need to understand your part of the business logic and implement it using the language supported by the base data platform
Develop automated data quality checks to make sure the right data enters the platform and to verify the results of the calculations
Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
Define process improvement opportunities to optimize data collection, insights and displays.
Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible 
Identify and interpret trends and patterns from complex data sets 
Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders. 
Key participant in regular Scrum ceremonies with the agile teams  
Proficient at developing queries, writing reports and presenting findings 
Mentor junior members and bring in industry best practices

 QUALIFICATIONS

5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales) 
Strong background in math, statistics, computer science, data science or related discipline
Advanced knowledge of at least one of the following languages: Java, Scala, Python, C#
Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake  
Proficient with:
Data mining/programming tools (e.g. SAS, SQL, R, Python)
Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
Data visualization (e.g. Tableau, Looker, MicroStrategy)
Comfortable learning about and deploying new technologies and tools. 
Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines. 
Good written and oral communication skills and ability to present results to non-technical audiences 
Knowledge of business intelligence and analytical tools, technologies and techniques.

Familiarity and experience in the following is a plus: 

AWS certification
Spark Streaming 
Kafka Streaming / Kafka Connect 
ELK Stack 
Cassandra / MongoDB 
CI/CD: Jenkins, GitLab, Jira, Confluence and other related tools

Data Engineer

at GradMener Technology Pvt. Ltd.

Posted by Soni Jagwani

Pune, Chennai

5 - 9 yrs

₹15L - ₹20L / yr

Scala

PySpark

Spark

SQL Azure

Hadoop

+4 more

5+ years of experience in a Data Engineering role in a cloud environment

Must have good experience in Scala/PySpark (preferably in a Databricks environment)

Extensive experience with Transact-SQL.
Experience in Databricks/Spark.

Strong experience in data warehouse projects
Expertise in database development projects with ETL processes.
Manage and maintain data engineering pipelines

Develop batch processing, streaming and integration solutions
Experienced in building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions: Azure Data Lake, Azure SQL DW (Synapse), and SQL Database products, or equivalent products from other cloud service providers

In-depth understanding of data management (e.g. permissions, security, and monitoring).
Cloud repositories, e.g. Azure GitHub, Git
Experience in an agile environment (Azure DevOps preferred).

Good to have

Manage source data access security
Automate Azure Data Factory pipelines
Continuous Integration/Continuous deployment (CICD) pipelines, Source Repositories
Experience in implementing and maintaining CICD pipelines
Understanding of Power BI and Delta Lakehouse architecture
Knowledge of software development best practices.
Excellent analytical and organizational skills.
Effective at working in a team as well as independently.
Strong written and verbal communication skills.
Expertise in database development projects and ETL processes.

Data Architect (SG0601)

at EnterpriseMinds

2 recruiters

Posted by phani kalyan

Pune

9 - 14 yrs

₹20L - ₹40L / yr

Spark

Hadoop

Big Data

Data engineering

PySpark

+3 more

Job Id: SG0601

Hi,

Enterprise Minds is looking for a Data Architect for the Pune location.

Required skills:
Python, PySpark, Hadoop, Java, Scala

Big data developer

Persistent System Ltd

Agency job

via Milestone Hr Consultancy by Haina khan

Pune, Bengaluru (Bangalore), Hyderabad

4 - 9 yrs

₹8L - ₹27L / yr

Python

PySpark

Amazon Web Services (AWS)

Spark

Scala

Greetings,

We have an urgent requirement for a Data Engineer/Sr. Data Engineer at a reputed MNC.

Experience: 4-9 yrs

Location: Pune/Bangalore/Hyderabad

Skills: We need candidates with either Python and AWS, PySpark and AWS, or Spark and Scala

Big data Developer

at Persistent Systems

1 video

1 recruiter

Agency job

via Milestone Hr Consultancy by Haina khan

Pune, Bengaluru (Bangalore), Hyderabad, Nagpur

4 - 9 yrs

₹4L - ₹15L / yr

Spark

Hadoop

Big Data

Data engineering

PySpark

+3 more

Greetings,

We have an urgent requirement for Big Data Developer profiles at a reputed MNC.

Location: Pune/Bangalore/Hyderabad/Nagpur
Experience: 4-9 yrs

Skills: PySpark and AWS,
or Spark, Scala and AWS,
or Python and AWS

Big Data Engineer

Hiring for one of the MNC for India location

Agency job

via Natalie Consultants by Rahul Kumar

Gurugram, Pune, Bengaluru (Bangalore), Delhi, Noida, Ghaziabad, Faridabad

2 - 9 yrs

₹8L - ₹20L / yr

Python

Hadoop

Big Data

Spark

Data engineering

+3 more

Key Responsibilities (Data Developer: Python, Spark):

Experience: 2 to 9 yrs

Development of data platforms, integration frameworks, processes, and code.

Develop and deliver APIs in Python or Scala for Business Intelligence applications built using a range of web languages

Develop comprehensive automated tests for features via end-to-end integration tests, performance tests, acceptance tests and unit tests.

Elaborate stories in a collaborative agile environment (SCRUM or Kanban)

Familiarity with cloud platforms like GCP, AWS or Azure.

Experience with large data volumes.

Familiarity with writing REST-based services.

Experience with distributed processing and systems

Experience with Hadoop / Spark toolsets

Experience with relational database management systems (RDBMS)

Experience with Data Flow development

Knowledge of Agile and associated development techniques including:

Pyspark Lead/Pyspark Dev

at Virtusa

2 recruiters

Agency job

via Response Informatics by Anupama Lavanya Uppala

Chennai, Bengaluru (Bangalore), Mumbai, Hyderabad, Pune

3 - 10 yrs

₹10L - ₹25L / yr

PySpark

Python

Minimum 1 year of relevant experience in PySpark (mandatory)
Hands-on experience in developing, testing, deploying, maintaining and improving data integration pipelines in an AWS cloud environment is an added plus
Ability to play a lead role and independently manage a 3-5 member PySpark development team
EMR, Python and PySpark are mandatory.
Knowledge and awareness of working with AWS Cloud technologies like Apache Spark, Glue, Kafka, Kinesis, and Lambda with S3, Redshift, RDS

Data Engineer

Cloud infrastructure solutions and support company. (SE1)

Agency job

via Multi Recruit by Ranjini A R

Pune

2 - 6 yrs

₹12L - ₹16L / yr

SQL

ETL

Data engineering

Big Data

Java

+2 more

Design, create, test, and maintain data pipeline architecture in collaboration with the Data Architect.
Build the infrastructure required for extraction, transformation, and loading of data from a wide variety of data sources using Java, SQL, and Big Data technologies.
Support the translation of data needs into technical system requirements. Support in building complex queries required by the product teams.
Build data pipelines that clean, transform, and aggregate data from disparate sources
Develop, maintain and optimize ETLs to increase data accuracy, data stability, data availability, and pipeline performance.
Engage with Product Management and Business to deploy and monitor products/services on cloud platforms.
Stay up-to-date with advances in data persistence and big data technologies and run pilots to design the data architecture to scale with the increased data sets of consumer experience.
Handle data integration, consolidation, and reconciliation activities for digital consumer / medical products.

Job Qualifications:

Bachelor’s or master’s degree in Computer Science, Information Management, Statistics or a related field
5+ years of experience in the Consumer or Healthcare industry in an analytical role with a focus on building data pipelines, querying data, analyzing, and clearly presenting analyses to members of the data science team.
Technical expertise with data models and data mining.
Hands-on knowledge of programming languages such as Java, Python, R, and Scala.
Strong knowledge of big data tools like Snowflake, AWS Redshift, Hadoop, MapReduce, etc.
Knowledge of tools like AWS Glue, S3, AWS EMR, streaming data pipelines, and Kafka/Kinesis is desirable.
Hands-on knowledge of SQL and NoSQL database design.
Knowledge of CI/CD for building and hosting the solutions.
AWS certification is an added advantage.
Strong knowledge of visualization tools like Tableau and QlikView is an added advantage
A team player capable of working and integrating across cross-functional teams for implementing project requirements. Experience in technical requirements gathering and documentation.
Ability to work effectively and independently in a fast-paced agile environment with tight deadlines
A flexible, pragmatic, and collaborative team player with the innate ability to engage with data architects, analysts, and scientists

Data Engineer For Python

at A2Tech Consultants

3 recruiters

Posted by Dhaval B

Pune

4 - 12 yrs

₹6L - ₹15L / yr

Data engineering

Data Engineer

ETL

Spark

Apache Kafka

+5 more

We are looking for a smart candidate with:

Strong Python Coding skills and OOP skills
Should have worked on Big Data product Architecture
Should have worked with any one of the SQL-based databases like MySQL or PostgreSQL and any one of the NoSQL-based databases such as Cassandra, Elasticsearch, etc.
Hands-on experience with frameworks like Spark RDD, DataFrame, Dataset
Experience in developing ETL for data products
Candidate should have working knowledge of performance optimization, optimal resource utilization, parallelism and tuning of Spark jobs
Working knowledge of file formats: CSV, JSON, XML, PARQUET, ORC, AVRO
Good to have working knowledge of any one of the analytical databases like Druid, MongoDB, Apache Hive, etc.
Experience handling real-time data feeds (good to have working knowledge of Apache Kafka or a similar tool)

Key Skills:

Python and Scala (Optional), Spark / PySpark, Parallel programming

Bigdata Lead Architecture

at DataMetica

1 video

7 recruiters

Posted by Nikita Aher

Pune, Hyderabad

7 - 12 yrs

₹12L - ₹33L / yr

Big Data

Hadoop

Spark

Apache Spark

Apache Hive

+3 more

Job description

Role : Lead Architecture (Spark, Scala, Big Data/Hadoop, Java)

Primary Location : India-Pune, Hyderabad

Experience : 7 - 12 Years

Management Level: 7

Joining Time: Immediate Joiners are preferred

Attend requirements-gathering workshops, estimation discussions, design meetings, and status review meetings
Experience in solution design and solution architecture for the data engineering model, to build and implement Big Data projects on-premises and in the cloud
Align architecture with business requirements and stabilize the developed solution
Ability to build prototypes to demonstrate the technical feasibility of your vision
Professional experience facilitating and leading solution design, architecture, and delivery planning activities for data-intensive and high-throughput platforms and applications
Ability to benchmark systems, analyze system bottlenecks, and propose solutions to eliminate them
Ability to help programmers and project managers in the design, planning, and governance of implementation projects of any kind
Develop, construct, test, and maintain architectures, and run sprints for development and rollout of functionality
Data analysis and code development experience, ideally in Big Data technologies: Spark, Hive, Hadoop, Java, Python, PySpark
Execute projects of various types, i.e. design, development, implementation, and migration of functional analytics models/business logic across architecture approaches
Work closely with Business Analysts to understand the core business problems and deliver efficient IT solutions for the product
Deploy sophisticated analytics code using any cloud application

Perks and Benefits we Provide!

Working with Highly Technical and Passionate, mission-driven people
Subsidized Meals & Snacks
Flexible Schedule
Approachable leadership
Access to various learning tools and programs
Pet Friendly
Certification Reimbursement Policy
Check out more about us on our website below!

www.datametica.com

Sr Data Engineer

at Infogain

Agency job

via Technogen India PvtLtd by RAHUL BATTA

Bengaluru (Bangalore), Pune, Noida, NCR (Delhi | Gurgaon | Noida)

7 - 10 yrs

₹20L - ₹25L / yr

Data engineering

Python

SQL

Spark

PySpark

+10 more

Sr. Data Engineer:

Core Skills – Data Engineering, Big Data, Pyspark, Spark SQL and Python

Candidate with prior Palantir Cloud Foundry OR Clinical Trial Data Model background is preferred

Major accountabilities:

Responsible for Data Engineering, Foundry Data Pipeline Creation, Foundry Analysis & Reporting, Slate Application development, re-usable code development & management and Integrating Internal or External System with Foundry for data ingestion with high quality.
Have a good understanding of the Foundry Platform landscape and its capabilities
Performs data analysis required to troubleshoot data related issues and assist in the resolution of data issues.
Defines company data assets (data models) and the PySpark and Spark SQL jobs to populate data models.
Designs data integrations and data quality framework.
Design & Implement integration with Internal, External Systems, F1 AWS platform using Foundry Data Connector or Magritte Agent
Collaborate with data scientists, data analysts, and technology teams to document and leverage their understanding of the Foundry integration with different data sources
Actively participate in agile work practices
Coordinate with the Quality Engineer to ensure that all quality controls, naming conventions, and best practices have been followed

Desired Candidate Profile :

Strong data engineering background
Experience with Clinical Data Model is preferred
Experience in

SQL Server, Postgres, Cassandra, Hadoop, and Spark for distributed data storage and parallel computing
Java and Groovy for our back-end applications and data integration tools
Python for data processing and analysis
Cloud infrastructure based on AWS EC2 and S3

7+ years IT experience, 2+ years’ experience in Palantir Foundry Platform, 4+ years’ experience in Big Data platform
5+ years of Python and Pyspark development experience
Strong troubleshooting and problem solving skills
BTech or master's degree in computer science or a related technical field
Experience designing, building, and maintaining big data pipeline systems
Hands-on experience with the Palantir Foundry Platform and Foundry custom app development
Able to design and implement data integration between Palantir Foundry and external apps based on the Foundry data connector framework
Hands-on with programming languages, primarily Python, R, Java, and Unix shell scripts
Hands-on experience with the AWS / Azure cloud platform and stack
Strong in API-based architecture and concepts; able to build quick PoCs using API integration and development
Knowledge of machine learning and AI
Skill and comfort working in a rapidly changing environment with dynamic objectives and iteration with users.

Demonstrated ability to continuously learn, work independently, and make decisions with minimal supervision

Azure Data Engineer

at Fragma Data Systems

8 recruiters

Posted by Evelyn Charles

Remote, Bengaluru (Bangalore), Hyderabad, Chennai, Mumbai, Pune

8 - 15 yrs

₹16L - ₹28L / yr

PySpark

SQL Azure

azure synapse

Windows Azure

Azure Data Engineer

+3 more

Technology Skills:

Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions: Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hubs/IoT Hub.
Experience in migrating on-premises data warehouses to data platforms on the Azure cloud.
Designing and implementing data engineering, ingestion, and transformation functions

Good to Have:

Experience with Azure Analysis Services
Experience in Power BI
Experience with third-party solutions like Attunity/StreamSets, Informatica
Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
Capacity Planning and Performance Tuning on Azure Stack and Spark.
