Role Summary
We are looking for a mid-level Python Developer (NLP, ML, Gen AI) with combined experience in data engineering and AI/NLP engineering. The candidate will build NLP pipelines using Flair, BERT-based models, and LLM frameworks, and will work on large-scale data processing with PySpark, Pandas, and related data tools. The role also includes developing APIs, integrating with platform services, and supporting CI/CD deployments using GitHub and LightSpeed Enterprise.
Key Responsibilities
- Develop and optimize ETL/data processing jobs using PySpark, Pandas, PyArrow, and related libraries.
- Read and write Parquet files efficiently using fastparquet or pyarrow.parquet.
- Implement data parsing and serialization using json, ujson, or orjson for high-performance JSON handling.
- Build and maintain NLP pipelines using Flair, BERT, and LLM-based models.
- Develop scalable ingestion and data transformation pipelines for AI and analytics use cases.
- Build and maintain Flask-based APIs for model inference and service integrations.
- Use regular expressions for text cleaning, parsing, and NLP preprocessing.
- Integrate caching and fast lookups using Redis.
- Manage and deploy ML models using MLflow for tracking and versioning.
- Support CI/CD workflows using GitHub, LightSpeed Enterprise, and deployment pipelines.
- Create and maintain Autosys JILs for job scheduling and automation.
- Use basic Linux commands for troubleshooting, operations, and deployment tasks.
- Monitor application and system health using ITRS Geneos.
- Write unit tests with pytest or unittest and improve automated test coverage.
- Work with REST APIs, microservices, and basic shell scripting.
- Work with cloud services (ECS), including through boto3.
Required Skills
- 3–5 years of hands‑on Python programming experience.
- Strong fundamentals in Python, OOP, and design patterns.
- Experience with NLP libraries and models such as Flair, BERT, Hugging Face Transformers, or similar.
- Solid experience with PySpark, Pandas, PyArrow, and distributed data pipelines.
- Proficiency in working with Parquet using fastparquet or pyarrow.parquet.
- Familiarity with JSON libraries, including the standard json module and faster alternatives such as ujson and orjson.
- Experience building APIs using Flask (FastAPI is a plus).
- Experience with MLflow for model tracking and deployment.
- Good understanding of CI/CD practices and Git workflows.
- Experience working with Redis or similar in‑memory stores.
- Experience with Autosys JILs for job scheduling.
- Comfortable with Linux command line and shell scripting.
- Solid debugging, problem‑solving, and teamwork skills.
- Exposure to cloud services; AWS boto3 experience is an asset.
Nice-to-Have
- Experience with Polars or Dask for high‑performance data processing.
- Experience with PyTorch or TensorFlow for model training.
- Experience with Docker, Kubernetes, or containerized deployments.
- Experience with monitoring tools such as ITRS Geneos.
- Experience with FastAPI, Airflow, or Prefect.