ML Operations Engineer (Data & AI) - Bengaluru, Hybrid

7 - 12 years

20.0 - 35.0 Lacs P.A.

Bengaluru

Posted: 2 months ago | Platform: Naukri

Skills Required

TensorFlow, Machine Learning, DevOps, Hugging Face, AWS SageMaker, Vertex AI, LangChain, Bash Scripting, DeepSpeed, Shell Scripting, LLMOps, Hadoop, CI/CD, Jenkins, OpenAI API, AI/ML, Amazon Bedrock, ML Pipelines, Python, Kubernetes

Work Mode

Work from Office

Job Type

Full Time

Job Description

Location: Bangalore / Hybrid
Department: Data & AI
Company: Resolve Tech Solutions / Juno Labs

About Juno Labs:

Juno Labs is at the forefront of AI-driven cloud solutions, helping businesses unlock the power of data with scalable, intelligent, high-performance architectures. We specialize in building next-generation data platforms, leveraging cloud technologies, AI/ML, vector databases, and advanced frameworks to drive real-time insights and intelligent decision-making.

Job Description:

We are looking for an experienced MLOps Engineer to join our Data & AI team. The role focuses on building, deploying, and optimizing end-to-end machine learning systems, with an emphasis on LLMOps (operationalizing large language models). The ideal candidate has strong expertise in MLOps, LLMOps, and DevOps, plus hands-on experience deploying and managing large-scale models, particularly LLMs, in both cloud and on-premises environments. Beyond building robust MLOps pipelines, the role involves self-hosting models, optimizing GPU usage, and applying quantization to reduce deployment cost.

Key Responsibilities:

- Design and implement scalable MLOps pipelines to deploy, monitor, and manage machine learning models, with a particular focus on LLMOps.
- Integrate, fine-tune, and optimize Hugging Face models (e.g., Transformers, BART, GPT-2/3) for NLP tasks such as text generation, text classification, and NER, and deploy them in production-scale systems.
- Use LangChain to build LLM-driven applications, enabling seamless model workflows for NLP and decision-making tasks.
- Optimize and manage LLMOps pipelines for large-scale models using technologies such as the OpenAI API, Amazon Bedrock, DeepSpeed, and the Hugging Face Hub.
- Develop and scale self-hosted LLM solutions (e.g., fine-tuning and serving models on-premises or in a hybrid cloud) to meet performance, reliability, and cost-effectiveness goals.
- Leverage cloud-native tools such as Amazon SageMaker and Vertex AI on AWS and GCP to scale large language models and keep them optimized in distributed cloud environments.
- Apply GPU-based optimization for large-scale model training and deployment, ensuring high performance and efficient resource allocation in cloud or on-premises environments.
- Deploy models via containerized solutions using Docker, Kubernetes, and Helm for seamless scaling and management across cloud and on-premises infrastructure.
- Implement model quantization and pruning to reduce the resource footprint of deployed models while maintaining high performance (a minimal sketch follows this list).
- Monitor model performance in production with Prometheus, Grafana, the ELK Stack, and other observability tools, tracking metrics such as inference latency, accuracy, and throughput.
- Automate the end-to-end model development and deployment workflow via CI/CD pipelines with tools like GitLab CI, Jenkins, and CircleCI.
- Integrate vector databases (e.g., Pinecone, FAISS, Milvus) for efficient storage, retrieval, and querying of model-generated embeddings.
- Stay current with advances in MLOps, LLMOps, and machine learning, adopting best practices in model development, deployment, and optimization.
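As an illustration of the quantization work described above, here is a minimal sketch of loading a Hugging Face causal LM with 4-bit weights via the Transformers/bitsandbytes route; the model ID, prompt, and settings are placeholders, and the TensorFlow Lite, ONNX, or DeepSpeed paths mentioned later would be equally valid.

```python
# Minimal sketch: serve a Hugging Face causal LM with 4-bit quantized
# weights to cut GPU memory use. Assumes transformers, accelerate, and
# bitsandbytes are installed and a CUDA GPU is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "gpt2"  # placeholder; any causal LM on the Hugging Face Hub

# 4-bit NF4 weights with fp16 compute: a common cost/quality trade-off.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on available GPUs
)

inputs = tokenizer("MLOps pipelines should", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the quantization happens at load time, the same call drops unchanged into a serving wrapper (e.g., a FastAPI endpoint or a SageMaker container), making it a deployment-time decision rather than a retraining one.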
Required Skills & Qualifications:

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in MLOps, LLMOps, DevOps, or related roles, with a focus on deploying and managing machine learning models in production environments.
- Experience with cloud platforms such as AWS, GCP, and Azure, and services such as Amazon SageMaker, Vertex AI, TensorFlow Serving, DeepSpeed, and Amazon Bedrock.
- Expertise in Hugging Face models and the Transformers library, including fine-tuning, deployment, and optimization of NLP models for large-scale production.
- Experience with LangChain for building and deploying LLM-based applications that handle dynamic, real-time tasks.
- Strong experience self-hosting LLMs in cloud or on-premises environments on GPU-based infrastructure for training and inference (e.g., NVIDIA GPUs, CUDA).
- Expertise in GPU utilization and optimization for large-scale model training, inference, and cost-effective deployment.
- Hands-on experience with model quantization techniques, such as TensorFlow Lite, ONNX, or DeepSpeed, to reduce memory footprint and inference time.
- Familiarity with distributed ML frameworks such as Kubeflow, Ray, Dask, and MLflow for managing end-to-end ML workflows and large-scale model training and evaluation.
- Proficiency with containerization and orchestration tools such as Kubernetes, Docker, Helm, and Terraform for infrastructure automation.
- Knowledge of vector databases such as Pinecone, Milvus, or FAISS for fast, scalable retrieval of model-generated embeddings (a minimal sketch follows this section).
- Expertise in setting up and managing CI/CD pipelines for model training, validation, testing, and deployment with tools like Jenkins, GitLab CI, and CircleCI.
- Strong programming skills in Python, Bash, and shell scripting.
- Solid understanding of monitoring and logging tools such as Prometheus, Grafana, and the ELK Stack for system performance, error detection, and model health tracking.

Preferred Qualifications:

- Proven experience deploying and managing large-scale LLMs such as GPT-3, BERT, T5, and BLOOM in production, using cloud-native solutions and on-premises hosting.
- Deep expertise in quantization, model compression, and pruning to optimize deployed models for lower latency and reduced resource consumption.
- Strong understanding of NLP tasks and deep learning concepts such as transformers, attention mechanisms, and fine-tuning of pretrained models.
- Experience with Kedro for building reproducible ML pipelines, with a focus on data engineering, workflow orchestration, and modularity.
- Familiarity with Apache Spark and Hadoop for big data processing, especially in real-time AI workloads.
- Familiarity with advanced data engineering pipelines and data lakes for managing the large datasets required to train LLMs.

Why Join Us:

- Work with cutting-edge technologies in AI, MLOps, and LLMOps, including self-hosting and optimizing large-scale language models.
- Be part of an innovative, fast-growing team working on the future of AI-driven cloud solutions.
- Flexible work style in a hybrid environment that promotes work-life balance.
- Competitive salary and benefits package, with opportunities for personal and professional growth.
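For the vector-database requirement above, here is a minimal sketch of embedding storage and retrieval with FAISS, one of the libraries the posting names; the dimensions and data are synthetic placeholders.

```python
# Minimal sketch: store and query model-generated embeddings with FAISS.
# Assumes faiss-cpu (or faiss-gpu) and numpy are installed; all data here
# is synthetic stand-in for real embeddings.
import numpy as np
import faiss

dim = 384  # placeholder: output size of a typical small sentence embedder

# Exact L2 index; for large corpora an IVF or HNSW index trades a little
# recall for much faster search.
index = faiss.IndexFlatL2(dim)

# Stand-ins for embeddings of 1,000 documents (FAISS expects float32).
rng = np.random.default_rng(0)
doc_embeddings = rng.random((1000, dim), dtype=np.float32)
index.add(doc_embeddings)

# Retrieve the 5 nearest documents for a single query embedding.
query = rng.random((1, dim), dtype=np.float32)
distances, ids = index.search(query, 5)
print(ids[0], distances[0])
```

The same add-then-search pattern carries over to managed stores such as Pinecone or Milvus; broadly, only the client API and the persistence story change.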

IT Services and IT Consulting
Addison, TX
