About the Role
Our company operates in the energy domain, processing large volumes of utility bill data (primarily PDFs) to identify and recover overcharged payments. This role is critical in transforming raw, unstructured data into reliable, scalable, and actionable datasets that power analytics, operations, and future AI initiatives.
You will work closely with leadership and cross-functional teams to design data models, build pipelines, and enable data-driven decision-making across the organization.
What You’ll Do
Design, build, and maintain scalable data pipelines (ETL/ELT) to process structured and unstructured data (including PDF-based data extraction workflows)
Model and transform data into clean, reliable datasets optimized for analytics and business intelligence
Ensure data quality, consistency, and availability across systems
Partner with business and executive teams to understand data needs and enable data-driven decision-making
Develop and maintain dashboards and reports using tools like Power BI
Automate data workflows and reporting processes to improve efficiency and scalability
Collaborate with backend and engineering teams to integrate data systems and APIs
Contribute to the definition of data architecture and best practices
Support exploratory analysis and provide high-quality datasets for analytics and future machine learning use cases
Communicate technical concepts and data insights clearly to both technical and non-technical stakeholders
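To make the first bullet concrete: a large part of this work is turning text extracted from bill PDFs into structured fields. The snippet below is a toy illustration only; the field label, regex, and function name are hypothetical and do not reflect the company's actual bill formats.

```python
import re

def parse_total_charge(bill_text: str):
    """Return the 'Total Charges' amount from extracted bill text, or None.

    Hypothetical example: assumes the extraction step has already produced
    plain text, and that the bill labels its total as 'Total Charges: $...'.
    """
    match = re.search(r"Total Charges:\s*\$?([\d,]+\.\d{2})", bill_text)
    if match:
        # Strip thousands separators before converting to a number.
        return float(match.group(1).replace(",", ""))
    return None

sample = "Account 123\nTotal Charges: $1,234.56\nDue Date: 2024-01-31"
print(parse_total_charge(sample))  # 1234.56
```

Real pipelines layer OCR, layout analysis, and validation on top of this kind of parsing step, but the regex-over-extracted-text pattern is the core loop.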
What Your Day Might Look Like
Building pipelines to extract and structure data from energy bills
Designing data models in PostgreSQL for analytics consumption
Creating reliable datasets to support financial recovery analysis
Developing dashboards to track recovered amounts, anomalies, and operational KPIs
Working with leadership to prioritize high-impact data initiatives
Improving data reliability and reducing manual processes through automation
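The financial-recovery analysis above boils down to checks like the following sketch: flag bills whose charged amount exceeds what usage and the contracted rate predict. The field names and the 1% tolerance are illustrative assumptions, not the company's actual business logic.

```python
def flag_overcharge(usage_kwh: float, rate_per_kwh: float, billed: float,
                    tolerance: float = 0.01) -> bool:
    """Flag a bill as a potential overcharge.

    Hypothetical rule: billed amount exceeds usage * contracted rate by
    more than the given tolerance (1% by default, an assumed threshold).
    """
    expected = usage_kwh * rate_per_kwh
    return billed > expected * (1 + tolerance)

print(flag_overcharge(1000, 0.12, 135.00))  # billed $135 vs ~$120 expected -> True
print(flag_overcharge(1000, 0.12, 120.50))  # within the 1% tolerance -> False
```

Rows flagged this way would feed the recovery dashboards and anomaly KPIs described above.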
Required Qualifications:
Advanced English for technical collaboration with the client's global team
Strong Python experience (ideally 8 to 10 years), including automation, PDF manipulation, regular expressions, and testing
Hands-on use of AI tools such as Claude or Codex, with a focus on prompt engineering
Solid, senior-level experience in OCR projects
Experience with data pipelines, modeling, and integration of results (e.g., JSON, CSV, APIs)
QA experience
AWS experience
Familiarity with agentic workflows
Strong analytical skills and attention to detail to ensure high accuracy
Nice-to-Have Qualifications:
Previous experience in document automation projects within energy, utilities, or finance companies
Knowledge of Machine Learning applied to OCR (e.g., layout analysis, entity recognition)
Familiarity with Google Vision, AWS Textract, or Azure Cognitive Services
Experience with Gemini
Background in data or software architecture