This role will be responsible for developing and deploying a game-changing and highly disruptive advertising technology platform. This person would also take on the following responsibilities: Gather and process raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, etc.). Work closely with our engineering team to integrate your amazing innovations and algorithms into our production systems. Support business decisions with ad hoc analysis as needed. Propose and investigate new techniques. Troubleshoot production issues and identify practical solutions. Perform routine check-ups, backups, and monitoring of the entire MySQL and Hadoop ecosystem. Take end-to-end responsibility for the traditional databases (MySQL) and the Big Data ETL, analysis, and processing life cycle in the organization. Build, deploy, and maintain real-time streaming pipelines and real-time analytics. Manage deployments of big-data clusters across private and public cloud platforms.
Responsibilities: Design and develop the ETL framework and data pipelines in Python 3. Orchestrate complex data flows from various data sources (such as RDBMS, REST APIs, etc.) to the data warehouse and vice versa. Develop app modules (in Django) for enhanced ETL monitoring. Devise technical strategies for making data seamlessly available to the BI and Data Science teams. Collaborate with engineering, marketing, sales, and finance teams across the organization and help Chargebee develop complete data solutions. Serve as a subject-matter expert for available data elements and analytic capabilities. Qualifications: Expert programming skills with the ability to write clean and well-designed code. Expertise in Python, with knowledge of at least one Python web framework. Strong SQL knowledge and high proficiency in writing advanced SQL queries. Hands-on experience in modeling relational databases. Experience integrating with third-party platforms is an added advantage. Genuine curiosity, proven problem-solving ability, and a passion for programming and data.
About the job: - You will work with data scientists to architect, code and deploy ML models - You will solve problems of storing and analyzing large-scale data in milliseconds - You will architect and develop data processing and warehouse systems - You will code, drink, breathe and live Python, sklearn and pandas. It’s good to have experience in these but not a necessity, as long as you’re super comfortable in a language of your choice. - You will develop tools and products that give analysts ready access to the data About you: - Strong CS fundamentals - You have strong experience working with production environments - You write code that is clean, readable and tested - Instead of doing something a second time, you automate it - You have worked with some of the commonly used databases and computing frameworks (Psql, S3, Hadoop, Hive, Presto, Spark, etc.) - It will be great if you have one of the following to share: a Kaggle or a GitHub profile - You are an expert in one or more programming languages (Python preferred). Experience with Python-based application development and data science libraries is also good to have. - Ideally, you have 2+ years of experience in tech and/or data. - Degree in CS/Maths from a Tier-1 institute.
JOB DESCRIPTION: We are looking for a Data Engineer with a solid background in scalable systems to work with our engineering team to improve and optimize our platform. You will have significant input into the team’s architectural approach and execution. We are looking for a hands-on programmer who enjoys designing and optimizing data pipelines for large-scale data. This is NOT a "data scientist" role, so please don't apply if you're looking for that. RESPONSIBILITIES: 1. Build, maintain and test performant, scalable data pipelines 2. Work with data scientists and application developers to implement scalable pipelines for data ingest, processing, machine learning and visualization 3. Build interfaces for ingest across various data stores MUST-HAVE: 1. A track record of building and deploying data pipelines as a part of work or side projects 2. Ability to work with an RDBMS such as MySQL or Postgres 3. Ability to deploy over cloud infrastructure, at least AWS 4. Demonstrated ability and hunger to learn GOOD-TO-HAVE: 1. Computer Science degree 2. Expertise in at least one of: Python, Java, Scala 3. Expertise and experience in deploying solutions based on Spark and Kafka 4. Knowledge of container systems like Docker or Kubernetes 5. Experience with NoSQL / graph databases 6. Knowledge of Machine Learning. Kindly apply only if you are skilled in building data pipelines.
Data Architect who leads a team of 5. Required skills: Spark, Scala, Hadoop
The candidate will be responsible for all aspects of data acquisition, data transformation, and analytics scheduling and operationalization to drive high-visibility, cross-division outcomes. Expected deliverables will include the development of Big Data ELT jobs using a mix of technologies, stitching together complex and seemingly unrelated data sets for mass consumption, and automating and scaling analytics into the GRAND's Data Lake. Key Responsibilities: - Create a GRAND Data Lake and Warehouse which pools all the data from the different regions and stores of GRAND in the GCC - Ensure source data quality measurement, enrichment, and reporting of data quality - Manage all ETL and data model update routines - Integrate new data sources into the DWH - Manage the DWH cloud (AWS/Azure/Google) and infrastructure Skills Needed: - Very strong in SQL. Demonstrated experience with databases (e.g., SQL, Postgres, MongoDB, etc.); Unix shell scripting preferred - Experience with UNIX and comfortable working with the shell (Bash or Korn preferred) - Good understanding of data warehousing concepts. Big data systems: Hadoop, NoSQL, HBase, HDFS, MapReduce - Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments. - Working with data delivery teams to set up new Hadoop users. This job includes setting up Linux users and setting up and testing HDFS, Hive, Pig and MapReduce access for the new users. - Cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Nagios, Cloudera Manager Enterprise, and other tools. - Performance tuning of Hadoop clusters and Hadoop MapReduce routines. - Screening Hadoop cluster job performance and capacity planning - Monitoring Hadoop cluster connectivity and security - File system management and monitoring. - HDFS support and maintenance.
- Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades when required. - Defining, developing, documenting and maintaining Hive-based ETL mappings and scripts