Do you want to be at the forefront of engineering big data solutions that take Transportation models to the next generation? Do you have strong analytical thinking and metrics-driven decision-making skills, and do you want to solve problems with solutions that will meet growing worldwide needs?
We are seeking an experienced Data Engineer to work in a large and complex data warehouse environment. You will bring together disparate datasets to answer business questions. You will collaborate with Business Intelligence Engineers, Data Scientists, and business teams across the organization.
In this role, you will own the end-to-end development of data engineering solutions, playing a critical role in strategic decision-making. You will work with relational database management systems, develop key business questions, onboard new datasets, and analyze data.
The ideal candidate is a self-starter, comfortable with ambiguity, and able to create and maintain automated processes efficiently. You should have strong analytical thinking, metrics-driven decision-making, and enjoy working with large volumes of data in complex technical contexts. Expertise in data modeling, ETL design, business intelligence tools, SQL, data warehousing, and building ETL pipelines is essential.
Key job responsibilities
- Ensure compliance with data governance and regulatory requirements, implementing data lineage tracking, auditing, and monitoring mechanisms.
- Collaborate with legal and compliance teams to adhere to applicable laws.
- Monitor and manage running queries on the cluster, optimize resource utilization, and develop solutions for Redshift cluster optimization.
- Evaluate data retention requirements, implement archiving and purging strategies, manage dataset migrations, and design solutions for backfilling new datasets.
- Identify and create curated datasets, ensuring data quality and documentation, and provide tools for stakeholders to utilize these datasets.
- Standardize data inputs for tools under development and provide Data Engineering support.
- Establish access controls and guardrails to prevent unauthorized or inefficient use of cluster resources.
- Generate complete, reusable metadata and dataset documentation.
- Engage in all phases of the development life cycle, including design, implementation, testing, delivery, documentation, support, and maintenance.
- Determine best practices for creating data lineage across a range of data sources by analyzing source data systems.
- Build and support a modern data architecture that integrates a wide range of available AWS data technologies.
- Design and implement scalable data infrastructure and pipelines on AWS to provide efficient data access and storage for LLM-powered chatbot development.