San Francisco, CA | Contract
6+ months
Responsibilities:
- Analyze, transform, and integrate high-volume, complex data sources in SQL
- Integrate data sets into Hive, Vertica, and third-party (3P) APIs, and implement systems that track data quality and consistency (see the sketch after this list)
- Develop tools that support self-service data pipeline (ETL) management
- Analyze and tune SQL queries and MapReduce jobs to improve data processing performance
- Communicate with business users and data scientists to understand business objectives and engineering needs, and continuously evolve the data model and schema
- Work with data SMEs and stakeholders to elicit requirements and develop real-time business metrics, analytical products, and analytical insights
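
As an illustration of the data-quality tracking responsibility above, here is a minimal sketch in Python. It assumes a generic DB-API 2.0 connection (for example PyHive for Hive or vertica-python for Vertica); the table names, key column, and 1% threshold are hypothetical placeholders, not part of the role description.

```python
# Minimal data-quality sketch: compare row counts between a source and
# target table, and measure a key column's null rate. Works over any
# DB-API 2.0 connection (e.g. PyHive for Hive, vertica-python for Vertica).
# Table and column names below are hypothetical placeholders.

def row_count(conn, table: str) -> int:
    """Return the number of rows in `table`."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

def null_rate(conn, table: str, column: str) -> float:
    """Return the fraction of NULL values in `table`.`column`."""
    cur = conn.cursor()
    cur.execute(
        f"SELECT SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), COUNT(*) "
        f"FROM {table}"
    )
    nulls, total = cur.fetchone()
    return (nulls or 0) / total if total else 0.0

def check_consistency(src_conn, dst_conn, table: str, key_column: str) -> None:
    """Flag row-count drift between source and target, and excessive NULL keys."""
    src, dst = row_count(src_conn, table), row_count(dst_conn, table)
    if src != dst:
        print(f"[WARN] {table}: source has {src} rows, target has {dst}")
    rate = null_rate(dst_conn, table, key_column)
    if rate > 0.01:  # hypothetical 1% threshold
        print(f"[WARN] {table}.{key_column}: null rate {rate:.2%} exceeds 1%")
```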
Requirements:
- Advanced SQL and Python skills
- Extensive experience with the Hadoop ecosystem (or similar)
- Experience designing and developing highly scalable, end-to-end processes to consume large volumes of complex data
- Experience with workflow management tools (Airflow, Oozie, Azkaban, UC4); a minimal Airflow example is sketched after this list
- Strong Python skills for building processes around data structures
- Strong skills in high-volume data analytics to identify deliverables, gaps, and inconsistencies
- Good understanding of SQL engine internals and the ability to conduct advanced performance tuning (see the query-plan sketch below)
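
For the workflow-management requirement, here is a minimal Airflow 2.x DAG sketch (Airflow is one of the tools listed above). The DAG id, schedule, and task bodies are hypothetical placeholders, not a prescribed pipeline.

```python
# Minimal sketch of a self-service ETL pipeline as an Airflow 2.x DAG.
# The DAG id, schedule, and task bodies are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_events(**context):
    """Pull the day's partition from an upstream source (placeholder)."""

def load_to_hive(**context):
    """Write transformed rows into the Hive warehouse (placeholder)."""

with DAG(
    dag_id="daily_events_etl",          # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_events)
    load = PythonOperator(task_id="load", python_callable=load_to_hive)
    extract >> load                     # load runs only after extract succeeds
```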
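
For the performance-tuning requirement, a common first step is inspecting the engine's query plan. The sketch below assumes a generic DB-API 2.0 connection; both Hive and Vertica accept an EXPLAIN prefix, though the plan output format differs by engine, and the example query is hypothetical.

```python
# Minimal sketch: fetch and print a query plan as a first tuning step.
# Works over any DB-API 2.0 connection; both Hive and Vertica accept an
# EXPLAIN prefix, though the row shape of the plan differs by engine.

def explain(conn, sql: str) -> None:
    """Print the engine's plan for `sql`."""
    cur = conn.cursor()
    cur.execute("EXPLAIN " + sql)
    for row in cur.fetchall():
        print(row)

# Hypothetical usage:
# explain(hive_conn, "SELECT user_id, COUNT(*) FROM events GROUP BY user_id")
```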