San Francisco, CA 94105
- Analyze, transform, and integrate high-volume, complex data sources in SQL
- Integrate data sets into Hive, Vertica, and third-party (3P) APIs, and implement systems that track data quality and consistency
- Tune SQL queries and MapReduce jobs to improve data processing performance
- Communicate with business users and data scientists to understand business objectives and engineering needs, and continuously evolve the data model and schema
- Work with data SMEs and stakeholders to elicit requirements and develop real-time business metrics, analytical products, and analytical insights
- Must have advanced Python skills
- Experience with Python web frameworks (Pylons, Tornado, Django, etc.)
- Experience with the Hadoop (or similar) ecosystem
- Experience designing and developing highly scalable, end-to-end processes for consuming large-volume, complex data
- Experience with workflow management tools (Airflow, Oozie, Azkaban, UC4)
- Strong Python skills for building processes around data structures
- Strong skills in high-volume data analytics to identify deliverables, gaps, and inconsistencies
- Good understanding of SQL engines and the ability to conduct advanced performance tuning