Senior Data Engineer- Blackstraw- Work from office- Chennai

10.0 - 15.0 years

15.0 - 30.0 Lacs P.A.

Pallavaram

Posted:4 days ago| Platform: Naukri logo

Apply Now

Skills Required

PysparkAzure DatabricksSparkData BricksPython

Work Mode

Work from Office

Job Type

Full Time

Job Description

Data Engineering Lead Company Name: Blackstraw.ai Oce Location: Chennai (Work from Office) Job Type: Full-time Experience: 10 - 15 Years Candidates who can join immediately will be preferred. Job Description: As a lead data engineer you will oversee data architecture, ETL processes, and analytics pipelines, ensuring efficiency, scalability, and quality. Key Responsibilities: Working with clients to understand their data. Based on the understanding you will be building the data structures and pipelines. You will be working on the application from end to end collaborating with UI and other development teams. You will be working with various cloud providers such as Azure & AWS. You will be engineering data using the Hadoop/Spark ecosystem. You will be responsible for designing, building, optimizing and supporting new and existing data pipelines. Orchestrating jobs using various tools such Oozie, Airflow, etc. Developing programs for cleaning and processing data. You will be responsible for building the data pipelines to migrate and load the data into the HDFS either on-prem or in the cloud. Developing Data ingestion/process/integration pipelines effectively. Creating Hive data structures,metadata and loading the data into data lakes / BigData warehouse environments. Optimized (Performance tuning) many data pipelines effectively to minimize cost. Code versioning control and git repository is up to date. You should be able to explain the data pipeline to internal and external stakeholders. You will be responsible for building and maintaining CI/CD of the data pipelines. You will be managing the unit testing of all data pipelines. Tech Stack: Minimum of 5+ years working experience with Spark, Hadoop eco systems. Minimum of 4+ years working experience on designing data streaming pipelines. Should be an expert in either Python/Scala/Java. Should have experience in Data Ingestion and Integration into data lake using hadoop ecosystem tools such as Sqoop, Spark, SQL, Hive, Airflow, etc.. Should have experience optimizing (Performance tuning) data pipelines. Should have minimum experience of 3+ years on NoSQL and Spark Streaming. Knowledge of Kubernetes and Docker is a plus. Should have experience with Cloud services either Azure/AWS. Should have experience with on-prem distribution such as Cloudera/HortonWorks/MapR. Basic understanding of CI/CD pipelines. Basic knowledge of Linux environment and commands. Preferred Qualifications: Bachelors degree in computer science or related field. Proven experience with big data ecosystem tools such as Sqoop, Spark, SQL, API, Hive, Oozie, Airflow, etc.. Solid experience in all phases of SDLC with 10+ years of experience (plan, design, develop, test, release, maintain and support) Hands-on experience using Azures data engineering stack. Should have implemented projects using programming languages such as Scala or Python. Working experience on SQL complex data merging techniques such as windowing functions etc.. Hands-on experience with on-prem distribution tools such as Cloudera/HortonWorks/MapR. Should have excellent communication, presentation and problem solving skills. Key Traits: Should have excellent communication skills. Should be self motivated and willing to work as part of a team. Should be able to collaborate and coordinate with on shore and offshore teams. Be a problem solver and be proactive to solve the challenges that come his way.

Information Technology & Services
Silicon Valley

RecommendedJobs for You

Hyderabad, Pune, Bengaluru

Pune, Chennai, Bengaluru

Bhubaneswar, Pune, Bengaluru