Cloudera Jobs in Bangalore (Bengaluru)
Job responsibilities
- You will partner with teammates to create complex data processing pipelines in order to solve our clients' most complex challenges
- You will pair to write clean and iterative code based on TDD
- Leverage various continuous delivery practices to deploy, support and operate data pipelines
- Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
- Develop and operate modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
- Create data models and speak to the tradeoffs of different modeling approaches
- Seamlessly incorporate data quality into your day-to-day work as well as into the delivery process
- Encouraging open communication and advocating for shared outcomes
Technical skills
- You have a good understanding of data modelling and experience with data engineering tools and platforms such as Spark (Scala) and Hadoop
- You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (Hbase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting
- Hands on experience in MapR, Cloudera, Hortonworks and/or cloud (AWS EMR, Azure HDInsights, Qubole etc.) based Hadoop distributions
- You are comfortable taking data-driven approaches and applying data security strategy to solve business problems
- Working with data excites you: you can build and operate data pipelines, and maintain data storage, all within distributed systems
- You're genuinely excited about data infrastructure and operations with a familiarity working in cloud environments
Professional skills
- You're resilient and flexible in ambiguous situations and enjoy solving problems from technical and business perspectives
- An interest in coaching, sharing your experience and knowledge with teammates
- You enjoy influencing others and always advocate for technical excellence while being open to change when needed
- Presence in the external tech community: you willingly share your expertise with others via speaking engagements, contributions to open source, blogs and more
Responsibilities :
- Provide Support Services to our Gold & Enterprise customers using our flagship product suits. This may include assistance provided during the engineering and operations of distributed systems as well as responses for mission-critical systems and production customers.
- Lead end-to-end delivery and customer success of next-generation features related to scalability, reliability, robustness, usability, security, and performance of the product
- Lead and mentor others about concurrency, parallelization to deliver scalability, performance, and resource optimization in a multithreaded and distributed environment
- Demonstrate the ability to actively listen to customers and show empathy to the customer’s business impact when they experience issues with our products
Requires Skills :
- 10+ years of Experience with a highly scalable, distributed, multi-node environment (100+ nodes)
- Hadoop operation including Zookeeper, HDFS, YARN, Hive, and related components like the Hive metastore, Cloudera Manager/Ambari, etc
- Authentication and security configuration and tuning (KNOX, LDAP, Kerberos, SSL/TLS, second priority: SSO/OAuth/OIDC, Ranger/Sentry)
- Java troubleshooting, e.g., collection and evaluation of jstacks, heap dumps
- Linux, NFS, Windows, including application installation, scripting, basic command line
- Docker and Kubernetes configuration and troubleshooting, including Helm charts, storage options, logging, and basic kubectl CLI
- Experience working with scripting languages (Bash, PowerShell, Python)
- Working knowledge of application, server, and network security management concepts
- Familiarity with virtual machine technologies
- Knowledge of databases like MySQL and PostgreSQL,
- Certification on any of the leading Cloud providers (AWS, Azure, GCP ) and/or Kubernetes is a big plus