PySpark Developer - Complex XML Data Processing

5 - 9 years

7.0 - 11.0 Lacs P.A.

Chennai, Pune, Delhi, Mumbai, Bengaluru, Hyderabad, Kolkata

Posted: 2 months ago | Platform: Naukri


Skills Required

Computer Science, Orchestration, Memory Management, XML, Machine Learning, Schema, Scala, Data Processing, JSON, Analytics

Work Mode

Work from Office

Job Type

Full Time

Job Description

Senior PySpark Developer - Complex XML Data Processing

Key Responsibilities:

- Design and develop scalable PySpark pipelines to ingest, parse, and process XML datasets with extreme hierarchical complexity.
- Implement efficient XPath expressions, recursive parsing techniques, and custom schema definitions to extract data from nested XML structures.
- Optimize Spark jobs through partitioning, caching, and parallel processing to handle terabytes of XML data efficiently.
- Transform raw hierarchical XML data into structured DataFrames for analytics, machine learning, and reporting use cases.
- Collaborate with data architects and analysts to define data models for nested XML schemas.
- Troubleshoot performance bottlenecks and ensure reliability in distributed environments (e.g., AWS, Databricks, Hadoop).
- Document parsing logic, data lineage, and optimization strategies for maintainability.

Qualifications:

- 5+ years of hands-on experience with PySpark and Spark XML libraries (e.g., `spark-xml`) in production environments.
- Proven track record of parsing XML data with 20+ levels of nesting using recursive methods and schema inference.
- Expertise in XPath, XQuery, and DataFrame transformations (e.g., `explode`, `struct`, `selectExpr`) for hierarchical data.
- Strong understanding of Spark optimization techniques: partitioning strategies, broadcast variables, and memory management.
- Experience with distributed computing frameworks (e.g., Hadoop, YARN) and cloud platforms (AWS, Azure, GCP).
- Familiarity with big data file formats (Parquet, Avro) and orchestration tools (Airflow, Luigi).
- Bachelor's degree in Computer Science, Data Engineering, or a related field.

Preferred Skills:

- Experience with schema evolution and versioning for nested XML/JSON datasets.
- Knowledge of Scala or Java for extending Spark XML libraries.
- Exposure to Databricks, Delta Lake, or similar platforms.
- Certifications in AWS/Azure big data technologies.
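For candidates unfamiliar with the workflow this role describes, here is a minimal sketch of the kind of pipeline involved: reading nested XML with `spark-xml`, declaring an explicit schema rather than relying on inference, flattening one level with `explode`, and applying the partition/cache optimizations the posting mentions. The file paths, tag names, field names, package version, and partition count below are illustrative assumptions, not part of the job description.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode
from pyspark.sql.types import (
    ArrayType, DoubleType, StringType, StructField, StructType,
)

spark = (
    SparkSession.builder
    .appName("xml-flatten-sketch")
    # spark-xml is an external package; the version here is an assumption
    # and only takes effect when the session is first created.
    .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.18.0")
    .getOrCreate()
)

# Explicit schema for a hypothetical <order> element containing a repeated
# <item> child. On deeply nested documents (20+ levels), a declared schema
# avoids the cost and fragility of schema inference over the raw XML.
item_schema = StructType([
    StructField("sku", StringType()),
    StructField("price", DoubleType()),
])
order_schema = StructType([
    StructField("orderId", StringType()),
    StructField("items", StructType([
        StructField("item", ArrayType(item_schema)),
    ])),
])

orders = (
    spark.read.format("xml")
    .option("rowTag", "order")                 # each <order> becomes one row
    .schema(order_schema)
    .load("s3://example-bucket/orders/*.xml")  # illustrative path
)

# Flatten one level of nesting: one output row per <item>,
# keeping the parent key alongside the exploded struct fields.
flat = (
    orders
    .select("orderId", explode(col("items.item")).alias("item"))
    .select("orderId", "item.sku", "item.price")
)

# Repartition on a downstream join/write key and cache before reuse, in the
# spirit of the optimization bullet; the column and count are assumptions.
flat = flat.repartition(200, "orderId").cache()
flat.write.mode("overwrite").parquet("s3://example-bucket/orders_flat/")
```

In practice, each additional nesting level is handled the same way: select into the struct, `explode` the repeated element, and carry parent keys forward so lineage survives the flattening.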

Technology / Software Development
