Senior OCR Engineer

PJ, Jaraguá do Sul, Remote

About the Role 

Our company operates in the energy sector, processing large volumes of utility bill data (primarily PDFs) to identify and recover overcharges. This role is critical to transforming raw, unstructured data into reliable, scalable, and actionable datasets that power analytics, operations, and future AI initiatives.

You will work closely with leadership and cross-functional teams to design data models, build pipelines, and enable data-driven decision-making across the organization.

What You’ll Do

Design, build, and maintain scalable data pipelines (ETL/ELT) to process structured and unstructured data (including PDF-based data extraction workflows)
Model and transform data into clean, reliable datasets optimized for analytics and business intelligence
Ensure data quality, consistency, and availability across systems
Partner with business and executive teams to understand data needs and enable data-driven decision-making
Develop and maintain dashboards and reports using tools such as Power BI
Automate data workflows and reporting processes to improve efficiency and scalability
Collaborate with backend and other engineering teams to integrate data systems and APIs
Contribute to the definition of data architecture and best practices
Support exploratory analysis and provide high-quality datasets for analytics and future machine learning use cases
Communicate technical concepts and data insights clearly to both technical and non-technical stakeholders
 

What Your Day Might Look Like

Building pipelines to extract and structure data from energy bills
Designing data models in PostgreSQL for analytics consumption
Creating reliable datasets to support financial recovery analysis
Developing dashboards to track recovered amounts, anomalies, and operational KPIs
Working with leadership to prioritize high-impact data initiatives
Improving data reliability and reducing manual processes through automation
 

Requirements

Advanced English for technical collaboration with the client’s global team
Strong Python experience (ideally 8 to 10 years), including automation, PDF manipulation, regular expressions, and testing
Hands-on experience with AI tools such as Claude or Codex, particularly prompt engineering
Solid, senior-level experience in OCR projects
Experience with data pipelines, data modeling, and integrating results (e.g., via JSON, CSV, or APIs)
QA experience
AWS experience
Familiarity with agentic workflows
Strong analytical skills and attention to detail to ensure high accuracy


Nice-to-Have Qualifications

Previous experience in document automation projects at energy, utility, or finance companies
Knowledge of Machine Learning applied to OCR (e.g., layout analysis, entity recognition)
Familiarity with Google Cloud Vision, Amazon Textract, or Azure Cognitive Services
Experience with Gemini
Background in data or software architecture

Benefits
  • 100% remote work 👨🏻‍💻
  • Home office allowance 💻
  • Regular feedback 💬
  • Referral program 🏅
  • Psychological support 🙋🏻‍♂️
  • Workplace exercise sessions 🏋️
  • Knowledge academy 🧠
  • Partnership with an English school 🔤
  • Monthly transparency meetings 🔃
  • Online happy hours 🍻
  • Welcome kit 🎁