PySpark jobs

50+ PySpark Jobs in India

Apply to 50+ PySpark Jobs on CutShort.io. Find your next job, effortlessly. Browse PySpark Jobs and apply today!

Manager - Data Scientist

at Data Axle

2 candid answers

Posted by Eman Khan

Pune

12 - 17 yrs

Best in industry

databricks

Python

PySpark

Machine Learning (ML)

SQL

+1 more

Roles & Responsibilities:

We are looking for a Data Scientist to join the Data Science Client Services team to continue our success of identifying high quality target audiences that generate profitable marketing return for our clients. We are looking for experienced data science, machine learning and MLOps practitioners to design, build and deploy impactful predictive marketing solutions that serve a wide range of verticals and clients. The right candidate will enjoy contributing to and learning from a highly talented team and working on a variety of projects.

We are looking for a Manager Data Scientist who will be responsible for:

Ownership of design, implementation, and deployment of machine learning algorithms in a modern Python-based cloud architecture
Design or enhance ML workflows for data ingestion, model design, model inference and scoring
Oversight on team project execution and delivery
Establish peer review guidelines for high quality coding to help develop junior team members’ skill set growth, cross-training, and team efficiencies
Visualize and publish model performance results and insights to internal and external audiences

Qualifications:

Master's degree in a relevant quantitative, applied field (Statistics, Econometrics, Computer Science, Mathematics, Engineering)
Minimum of 12 years of work experience in the end-to-end lifecycle of ML model development and deployment into production within a cloud infrastructure (Databricks is highly preferred)
Proven ability to manage the output of a small team in a fast-paced environment and to lead by example in the fulfilment of client requests
Exhibit deep knowledge of core mathematical principles relating to data science and machine learning (ML Theory + Best Practices, Feature Engineering and Selection, Supervised and Unsupervised ML, A/B Testing, etc.)
Proficiency in Python and SQL required; PySpark/Spark experience a plus
Ability to conduct productive peer reviews and maintain proper code structure in GitHub
Proven experience developing, testing, and deploying various ML algorithms (neural networks, XGBoost, Bayes, and the like)
Working knowledge of modern CI/CD methods

This position description is intended to describe the duties most frequently performed by an individual in this position. It is not intended to be a complete list of assigned duties but to describe a position level.

Data Engineer

at Deqode

1 recruiter

Posted by Alisha Das

Bengaluru (Bangalore), Delhi, Gurugram, Noida, Ghaziabad, Faridabad, Mumbai, Pune, Hyderabad, Indore, Jaipur, Kolkata

4 - 5 yrs

₹2L - ₹18L / yr

Python

PySpark

We are looking for skilled and passionate Data Engineers with a strong foundation in Python programming and hands-on experience working with APIs, AWS cloud, and modern development practices. The ideal candidate will have a keen interest in building scalable backend systems and working with big data tools like PySpark.

Key Responsibilities:

Write clean, scalable, and efficient Python code.
Work with Python frameworks such as PySpark for data processing.
Design, develop, update, and maintain APIs (RESTful).
Deploy and manage code using GitHub CI/CD pipelines.
Collaborate with cross-functional teams to define, design, and ship new features.
Work on AWS cloud services for application deployment and infrastructure.
Basic database design and interaction with MySQL or DynamoDB.
Debugging and troubleshooting application issues and performance bottlenecks.

Required Skills & Qualifications:

4+ years of hands-on experience with Python development.
Proficient in Python basics with a strong problem-solving approach.
Experience with AWS Cloud services (EC2, Lambda, S3, etc.).
Good understanding of API development and integration.
Knowledge of GitHub and CI/CD workflows.
Experience in working with PySpark or similar big data frameworks.
Basic knowledge of MySQL or DynamoDB.
Excellent communication skills and a team-oriented mindset.

Nice to Have:

Experience in containerization (Docker/Kubernetes).
Familiarity with Agile/Scrum methodologies.

Azure Data Engineer

NA

Agency job

via Method Hub by Sampreetha Pai

anywhere in India

4 - 5 yrs

₹18L - ₹22L / yr

SQL Azure

Apache Spark

DevOps

PySpark

Python

+1 more

Azure DE

Primary Responsibilities -

Create and maintain data storage solutions including Azure SQL Database, Azure Data Lake, and Azure Blob Storage.
Design, implement, and maintain data pipelines for data ingestion, processing, and transformation in Azure
Create data models for analytics purposes
Utilizing Azure Data Factory or comparable technologies, create and maintain ETL (Extract, Transform, Load) operations
Use Azure Data Factory and Databricks to assemble large, complex data sets
Implement data validation and cleansing procedures to ensure the quality, integrity, and dependability of the data
Ensure data security and compliance
Collaborate with data engineers and other stakeholders to understand requirements and translate them into scalable and reliable data platform architectures

Required skills:

Blend of technical expertise, analytical problem-solving, and collaboration with cross-functional teams
Azure DevOps
Apache Spark, Python
SQL proficiency
Azure Databricks knowledge
Big data technologies

The DEs should be well versed in coding, Spark Core, and data ingestion using Azure. They also need good communication skills, along with core Azure DE skills and coding skills (PySpark, Python, and SQL).

Out of the 7 open demands, 5 of the Azure Data Engineers should have a minimum of 5 years of relevant data engineering experience.

Data Engineer

at NonStop io Technologies Pvt Ltd

6 recruiters

Posted by Kalyani Wadnere

Pune

2 - 4 yrs

Best in industry

AWS Lambda

databricks

Database migration

Apache Kafka

Apache Spark

+3 more

About NonStop io Technologies:

NonStop io Technologies is a value-driven company with a strong focus on process-oriented software engineering. We specialize in Product Development and have a decade's worth of experience in building web and mobile applications across various domains. NonStop io Technologies follows core principles that guide its operations and believes in staying invested in a product's vision for the long term. We are a small but proud group of individuals who believe in the 'givers gain' philosophy and strive to provide value in order to seek value. We are committed to and specialize in building cutting-edge technology products and serving as trusted technology partners for startups and enterprises. We pride ourselves on fostering innovation, learning, and community engagement. Join us to work on impactful projects in a collaborative and vibrant environment.

Brief Description:

We are looking for a talented Data Engineer to join our team. In this role, you will design, implement, and manage data pipelines, ensuring the accessibility and reliability of data for critical business processes. This is an exciting opportunity to work on scalable solutions that power data-driven decisions.

Skillset:

Here is a list of some of the technologies you will work with (the list below is not set in stone)

Data Pipeline Orchestration and Execution:

● AWS Glue

● AWS Step Functions

● Databricks

Change Data Capture:

● Amazon Database Migration Service

● Amazon Managed Streaming for Apache Kafka with Debezium Plugin

Batch:

● AWS Step Functions (and Glue Jobs)

● Asynchronous queueing of batch job commands with RabbitMQ to various “ETL Jobs”

● Cron and supervisord processing on dedicated job server(s): Python & PHP

Streaming:

● Real-time processing via AWS MSK (Kafka), Apache Hudi, & Apache Flink

● Near real-time processing via workers (listeners) spread over AWS Lambda and custom servers (daemons) written in Python and PHP Symfony

● Languages: Python & PySpark, Unix Shell, PHP Symfony (with Doctrine ORM)

● Monitoring & Reliability: Datadog & CloudWatch

Things you will do:

● Build dashboards using Datadog and CloudWatch to ensure system health and user support

● Build schema registries that enable data governance

● Partner with end-users to resolve service disruptions and evangelize our data product offerings

● Vigilantly oversee data quality and alert upstream data producers of issues

● Support and contribute to the data platform architecture strategy, roadmap, and implementation plans to support the company’s data-driven initiatives and business objectives

● Work with Business Intelligence (BI) consumers to deliver enterprise-wide fact and dimension data product tables to enable data-driven decision-making across the organization.

● Other duties as assigned

Snowflake on AWS

at Risk Resources LLP hyd

Posted by susmitha o

Chennai, Kolkata, Hyderabad

6 - 10 yrs

₹7L - ₹20L / yr

snowflake

Amazon Web Services (AWS)

databricks

PySpark

Expert level fluency with Snowflake, SQL, Tableau, and Python is required.

• Experience with data modeling in Snowflake, query performance tuning, and optimization.

• Computational analysis using Snowflake, MySQL, Python, Tableau and Business Objects.

• Experience with building Cloud Native applications using AWS.

GCP Senior Data Engineer

at Xebia IT Architects

2 recruiters

Posted by Vijay S

Bengaluru (Bangalore), Gurugram, Pune, Hyderabad, Chennai, Bhopal, Jaipur

10 - 15 yrs

₹30L - ₹40L / yr

Spark

Google Cloud Platform (GCP)

Python

Apache Airflow

PySpark

+1 more

We are looking for a Senior Data Engineer with strong expertise in GCP, Databricks, and Airflow to design and implement a GCP Cloud Native Data Processing Framework. The ideal candidate will work on building scalable data pipelines and help migrate existing workloads to a modern framework.

Shift: 2 PM to 11 PM
Work Mode: Hybrid (3 days a week) across Xebia locations
Notice Period: Immediate joiners or those with a notice period of up to 30 days

Key Responsibilities:

Design and implement a GCP Native Data Processing Framework leveraging Spark and GCP Cloud Services.
Develop and maintain data pipelines using Databricks and Airflow for transforming Raw → Silver → Gold data layers.
Ensure data integrity, consistency, and availability across all systems.
Collaborate with data engineers, analysts, and stakeholders to optimize performance.
Document standards and best practices for data engineering workflows.

Required Experience:

7-8 years of experience in data engineering, architecture, and pipeline development.
Strong knowledge of GCP, Databricks, PySpark, and BigQuery.
Experience with Orchestration tools like Airflow, Dagster, or GCP equivalents.
Understanding of Data Lake table formats (Delta, Iceberg, etc.).
Proficiency in Python for scripting and automation.
Strong problem-solving skills and collaborative mindset.

⚠️ Please apply only if you have not applied recently or are not currently in the interview process for any open roles at Xebia.

Looking forward to your response!

Best regards,

Vijay S

Assistant Manager - TAG

https://www.linkedin.com/in/vijay-selvarajan/

Lead Data Engineer

Data Havn

Agency job

via Infinium Associate by Toshi Srivastava

Noida

5 - 9 yrs

₹40L - ₹60L / yr

Python

SQL

Data engineering

Snowflake

ETL

+5 more

About the Role:

We are seeking a talented Lead Data Engineer to join our team and play a pivotal role in transforming raw data into valuable insights. As a Data Engineer, you will design, develop, and maintain robust data pipelines and infrastructure to support our organization's analytics and decision-making processes.

Responsibilities:

Data Pipeline Development: Build and maintain scalable data pipelines to extract, transform, and load (ETL) data from various sources (e.g., databases, APIs, files) into data warehouses or data lakes.
Data Infrastructure: Design, implement, and manage data infrastructure components, including data warehouses, data lakes, and data marts.
Data Quality: Ensure data quality by implementing data validation, cleansing, and standardization processes.
Team Management: Lead and manage the data engineering team.
Performance Optimization: Optimize data pipelines and infrastructure for performance and efficiency.
Collaboration: Collaborate with data analysts, scientists, and business stakeholders to understand their data needs and translate them into technical requirements.
Tool and Technology Selection: Evaluate and select appropriate data engineering tools and technologies (e.g., SQL, Python, Spark, Hadoop, cloud platforms).
Documentation: Create and maintain clear and comprehensive documentation for data pipelines, infrastructure, and processes.

Skills:

Strong proficiency in SQL and at least one programming language (e.g., Python, Java).
Experience with data warehousing and data lake technologies (e.g., Snowflake, AWS Redshift, Databricks).
Knowledge of cloud platforms (e.g., AWS, GCP, Azure) and cloud-based data services.
Understanding of data modeling and data architecture concepts.
Experience with ETL/ELT tools and frameworks.
Excellent problem-solving and analytical skills.
Ability to work independently and as part of a team.

Preferred Qualifications:

Experience with real-time data processing and streaming technologies (e.g., Kafka, Flink).
Knowledge of machine learning and artificial intelligence concepts.
Experience with data visualization tools (e.g., Tableau, Power BI).
Certification in cloud platforms or data engineering.

AWS Data engineer

at Deqode

1 recruiter

Posted by Shraddha Katare

Pune

2 - 5 yrs

₹3L - ₹10L / yr

PySpark

Amazon Web Services (AWS)

AWS Lambda

SQL

Data engineering

+2 more

Here is the Job Description -

Location -- Viman Nagar, Pune

Mode - 5 Days Working

Required Tech Skills:

● Strong at PySpark, Python

● Good understanding of data structures

● Good at SQL queries and query optimization

● Strong fundamentals of object-oriented programming (OOP)

● Good understanding of AWS Cloud, Big Data.

● Data Lake, AWS Glue, Athena, S3, Kinesis, SQL/NoSQL DB

Sr. Python Developer

at Nirmitee.io

4 recruiters

Posted by Gitashri K

Pune

5 - 10 yrs

₹8L - ₹15L / yr

Python

PySpark

Amazon Web Services (AWS)

CI/CD

GitHub

About the Role:

We are seeking a skilled Python Backend Developer to join our dynamic team. This role focuses on designing, building, and maintaining efficient, reusable, and reliable code that supports both monolithic and microservices architectures. The ideal candidate will have a strong understanding of backend frameworks and architectures, proficiency in asynchronous programming, and familiarity with deployment processes. Experience with AI model deployment is a plus.

Overall 5+ years of IT experience, with a minimum of 5 years of experience in Python and an open-source web framework (Django), plus AWS experience.

Key Responsibilities:

- Develop, optimize, and maintain backend systems using Python, PySpark, and FastAPI.

- Design and implement scalable architectures, including both monolithic and microservices.

- 3+ years of working experience in AWS (Lambda, Serverless, Step Functions, and EC2)

- Deep knowledge of the Python Flask/Django frameworks

- Good understanding of REST APIs

- Sound knowledge of databases

- Excellent problem-solving and analytical skills

- Leadership skills, good communication skills, and an interest in learning modern technologies

- Apply design patterns (MVC, Singleton, Observer, Factory) to solve complex problems effectively.

- Work with web servers (Nginx, Apache) and deploy web applications and services.

- Create and manage RESTful APIs; familiarity with GraphQL is a plus.

- Use asynchronous programming techniques (ASGI, WSGI, async/await) to enhance performance.

- Integrate background job processing with Celery and RabbitMQ, and manage caching mechanisms using Redis and Memcached.

- (Optional) Develop containerized applications using Docker and orchestrate deployments with Kubernetes.

Required Skills:

- Languages & Frameworks: Python, Django, AWS

- Backend Architecture & Design: Strong knowledge of monolithic and microservices architectures, design patterns, and asynchronous programming.

- Web Servers & Deployment: Proficient in Nginx and Apache, and experience in RESTful API design and development. GraphQL experience is a plus.

- Background Jobs & Task Queues: Proficiency in Celery and RabbitMQ, with experience in caching (Redis, Memcached).

- Additional Qualifications: Knowledge of Docker and Kubernetes (optional), with any exposure to AI model deployment considered a bonus.

Qualifications:

- Bachelor’s degree in Computer Science, Engineering, or a related field.

- 5+ years of experience in backend development using Python and Django and AWS.

- Demonstrated ability to design and implement scalable and robust architectures.

- Strong problem-solving skills, attention to detail, and a collaborative mindset.

Preferred:

- Experience with Docker/Kubernetes for containerization and orchestration.

- Exposure to AI model deployment processes.

Senior Data Scientist

at Data Axle

2 candid answers

Posted by Eman Khan

Pune

6 - 9 yrs

Best in industry

Azure

Machine Learning (ML)

databricks

Python

SQL

+2 more

About Data Axle:

Data Axle Inc. has been an industry leader in data, marketing solutions, sales and research for over 50 years in the USA. Data Axle now has an established strategic global centre of excellence in Pune. This centre delivers mission-critical data services to its global customers, powered by its proprietary cloud-based technology platform and by leveraging proprietary business & consumer databases.

Data Axle Pune is pleased to have achieved certification as a Great Place to Work!

Roles & Responsibilities:

We are looking for a Senior Data Scientist to join the Data Science Client Services team to continue our success of identifying high quality target audiences that generate profitable marketing return for our clients. We are looking for experienced data science, machine learning and MLOps practitioners to design, build and deploy impactful predictive marketing solutions that serve a wide range of verticals and clients. The right candidate will enjoy contributing to and learning from a highly talented team and working on a variety of projects.

We are looking for a Senior Data Scientist who will be responsible for:

Ownership of design, implementation, and deployment of machine learning algorithms in a modern Python-based cloud architecture
Design or enhance ML workflows for data ingestion, model design, model inference and scoring
Oversight on team project execution and delivery
Establish peer review guidelines for high quality coding to help develop junior team members’ skill set growth, cross-training, and team efficiencies
Visualize and publish model performance results and insights to internal and external audiences

Qualifications:

Master's degree in a relevant quantitative, applied field (Statistics, Econometrics, Computer Science, Mathematics, Engineering)
Minimum of 5 years of work experience in the end-to-end lifecycle of ML model development and deployment into production within a cloud infrastructure (Databricks is highly preferred)
Proven ability to manage the output of a small team in a fast-paced environment and to lead by example in the fulfilment of client requests
Exhibit deep knowledge of core mathematical principles relating to data science and machine learning (ML Theory + Best Practices, Feature Engineering and Selection, Supervised and Unsupervised ML, A/B Testing, etc.)
Proficiency in Python and SQL required; PySpark/Spark experience a plus
Ability to conduct productive peer reviews and maintain proper code structure in GitHub
Proven experience developing, testing, and deploying various ML algorithms (neural networks, XGBoost, Bayes, and the like)
Working knowledge of modern CI/CD methods

This position description is intended to describe the duties most frequently performed by an individual in this position. It is not intended to be a complete list of assigned duties but to describe a position level.

About Data Axle:

Data Axle Pune is pleased to have achieved certification as a Great Place to Work!

Sr. Data Engineer

at Koantek

Posted by Bhoomika Varshney

Remote only

4 - 8 yrs

₹10L - ₹30L / yr

Python

databricks

SQL

Spark

PySpark

+3 more

The Sr AWS/Azure/GCP Databricks Data Engineer at Koantek will use comprehensive modern data engineering techniques and methods with Advanced Analytics to support business decisions for our clients. Your goal is to support the use of data-driven insights to help our clients achieve business outcomes and objectives. You will collect, aggregate, and analyze structured/unstructured data from multiple internal and external sources and present patterns, insights, and trends to decision-makers. You will help design and build data pipelines, data streams, reporting tools, information dashboards, data service APIs, data generators, and other end-user information portals and insight tools. You will be a critical part of the data supply chain, ensuring that stakeholders can access and manipulate data for routine and ad hoc analysis to drive business outcomes using Advanced Analytics. You are expected to function as a productive member of a team, working and communicating proactively with engineering peers, technical leads, project managers, product owners, and resource managers.

Requirements:

• Strong experience as an AWS/Azure/GCP Data Engineer; AWS/Azure/GCP Databricks experience is a must
• Expert proficiency in Spark Scala, Python, and Spark
• Must have data migration experience from on-prem to cloud
• Hands-on experience with Kinesis, Event/IoT Hubs, and Cosmos to process and analyze streaming data
• In-depth understanding of Azure/AWS/GCP cloud and of data lake and analytics solutions on Azure
• Expert-level, hands-on experience designing and developing applications on Databricks
• Extensive hands-on experience implementing data migration and data processing using AWS/Azure/GCP services
• In-depth understanding of Spark architecture, including Spark Streaming, Spark Core, Spark SQL, DataFrames, RDD caching, and Spark MLlib
• Hands-on experience with the technology stack available in the industry for data management, ingestion, capture, processing, and curation: Kafka, StreamSets, Attunity, GoldenGate, MapReduce, Hadoop, Hive, HBase, Cassandra, Spark, Flume, Impala, etc.
• Hands-on knowledge of data frameworks, data lakes, and open-source projects such as Apache Spark, MLflow, and Delta Lake
• Good working knowledge of code versioning tools (such as Git, Bitbucket, or SVN)
• Hands-on experience using Spark SQL with various data sources such as JSON, Parquet, and key-value pairs
• Experience preparing data for data science and machine learning, with exposure to model selection, model lifecycle, hyperparameter tuning, model serving, deep learning, etc.
• Demonstrated experience preparing data and automating and building data pipelines for AI use cases (text, voice, image, IoT data, etc.)
• Good to have: programming language experience with .NET or Spark/Scala
• Experience in creating tables, partitioning, bucketing, loading, and aggregating data using Spark Scala, Spark SQL/PySpark
• Knowledge of AWS/Azure/GCP DevOps processes like CI/CD as well as Agile tools and processes including Git, Jenkins, Jira, and Confluence
• Working experience with Visual Studio, PowerShell scripting, and ARM templates
• Able to build ingestion to ADLS and enable the BI layer for analytics
• Strong understanding of data modeling and of defining conceptual, logical, and physical data models
• Big data/analytics/information analysis/database management in the cloud
• IoT/event-driven/microservices in the cloud
• Experience with private and public cloud architectures, pros/cons, and migration considerations
• Ability to remain up to date with industry standards and technological advancements that will enhance data quality and reliability to advance strategic initiatives
• Working knowledge of RESTful APIs, the OAuth2 authorization framework, and security best practices for API gateways
• Guide customers in transforming big data projects, including development and deployment of big data and AI applications
• Guide customers on data engineering best practices, provide proofs of concept, architect solutions, and collaborate when needed
• 2+ years of hands-on experience designing and implementing multi-tenant solutions using AWS/Azure/GCP Databricks for data governance, data pipelines for near real-time data warehouses, and machine learning solutions
• Overall 5+ years of experience in software development, data engineering, or data analytics using Python, PySpark, Scala, Spark, Java, or equivalent technologies
• Hands-on expertise in Apache Spark™ (Scala or Python)
• 3+ years of experience in query tuning, performance tuning, troubleshooting, and debugging Spark and other big data solutions
• Bachelor's or Master's degree in Big Data, Computer Science, Engineering, Mathematics, or a similar area of study, or equivalent work experience
• Ability to manage competing priorities in a fast-paced environment
• Ability to resolve issues
• Basic experience with or knowledge of agile methodologies
• AWS Certified: Solutions Architect Professional
• Databricks Certified Associate Developer for Apache Spark
• Microsoft Certified: Azure Data Engineer Associate
• GCP Certified: Professional Google Cloud Certified