Job Responsibilities:
Build distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time.
Collaborate with other teams to design, develop, and deploy data tools that support both operations and product use cases.
Perform offline analysis of large data sets using components from the Hadoop ecosystem.
Work with the project lead to evaluate and advise on technical aspects of open work requests in the product backlog.
Own product features from the development phase through to production deployment.
Evaluate big data technologies and prototype solutions to improve our data processing architecture.
Candidate Profile:
BS in Computer Science or a related area
10-12 years of software development experience
Minimum 2 years of experience on a big data platform
Proficiency with Java, Python, Scala, HBase, Hive, MapReduce, ETL, Kafka, MongoDB, PostgreSQL, visualization technologies, etc.
Flair for data, schemas, and data modeling, and for bringing efficiency to the big data life cycle
Understanding of automated QA needs related to big data
Understanding of various visualization platforms (Tableau, D3.js, others)
Proficiency with agile or lean development practices
Strong object-oriented design and analysis skills
Excellent technical and organizational skills
Excellent written and verbal communication skills
Desired Skills and Experience:
Top skill sets / technologies in the ideal candidate:
* Programming languages -- Java (must), Python, Scala, Ruby
* Batch processing -- Hadoop MapReduce, Cascading/Scalding, Apache Spark
* Stream processing -- Apache Storm, Akka, Samza, Spark Streaming
* NoSQL -- HBase, MongoDB, Cassandra, Riak