Roles and Responsibilities:
• Responsible for developing and maintaining applications with PySpark.
• Contribute to the overall design and architecture of the application developed and deployed.
• Performance tuning with respect to executor sizing and other environmental parameters, code optimization, partition tuning, etc.
• Interact with business users to understand requirements and troubleshoot issues.
• Implement projects based on functional specifications.

Must Have Skills:
• Good experience in PySpark, including DataFrame core functions and Spark SQL.
• Good experience in SQL DBs; able to write queries of fair complexity.
• Excellent experience in Big Data programming for data transformation and aggregations.
• Good at ETL architecture: business rules processing and data extraction from the data lake into data streams for business consumption.
• Good customer communication.
• Good analytical skills.
Required:
- 5-10 years of experience in the Application and/or Data Operations Support domain.
- Expertise in RCA (root-cause analysis) and in collaborating with development teams on CoE (correction of errors).
- Good communication and collaboration skills: liaise with product, operations and business teams to understand requirements and provide data extracts and reports as needed.
- Experience working in an enterprise environment, with good discipline and adherence to SLAs.
- Good understanding of ticketing tools (e.g. JIRA, ServiceNow, Rally, ChangeGear) for tracking requests and managing the lifecycle of multiple requests.
- Orientation towards addressing the root cause of any issue, i.e. collaborating and following up with development teams to ensure that permanent fixes and prevention are given high priority.
- Ability to create SOPs (standard operating procedures) in Confluence/Wiki so that the support team has a good reference to use.
- Self-starter and collaborator with the ability to independently acquire the knowledge required to succeed in the job.
- Ability to mentor and lead Data Ops team members to deliver a high-quality customer experience and timely resolution of issues.
- Adherence to a well-defined workflow process with partner teams.

Specifically for the Data Ops Engineer role, the following experience is required:
- BI, Reporting & Data Warehousing domain
- Experience in production support for data queries: monitoring, analysis and triage of issues
- Experience using BI tools like MicroStrategy, Qlik, Power BI, Business Objects
- Expertise in data analysis and writing SQL queries to provide insights into production data
- Experience with relational database (RDBMS) and data-mart technologies like DB2, Redshift, SQL Server, MySQL, Netezza etc.
- Ability to monitor ETL jobs in the AWS stack with tools like Tidal, Autosys etc.
- Experience with big data platforms like Amazon Redshift

Responsibilities:
- Production Support (Level 2)
- Job failure resolution: re-runs based on SOPs
- Report failures: root-cause analysis and resolution
- Address queries about existing reports and APIs
- Ad-hoc data requests for product and business stakeholders:
  - Transactions per day, per entity (merchant, card-type, card-category)
  - Custom extracts
- Ability to track and report the health of the system
- Create a matrix for issue volume
- Coordinate and set up an escalation workflow
- Provide status reports on a regular basis for stakeholder review
Our product is centered around lots of data being processed, ingested and read efficiently. The underlying systems need to provide capabilities to update and ingest data on the order of billions of records on a daily basis. Complex analytics queries need to run on tens of billions of rows, where a single query that can potentially touch 100+ million rows needs to finish within interactive SLAs. All of this processing happens on data with several hundred dimensions and tens of thousands of metrics.

This leads to a very interesting and challenging use case in the emerging field of large-scale distributed HTAP (hybrid transactional/analytical processing), which is still not mature enough to provide an out-of-the-box solution that works for our scale and SLAs. So we are building a solution that can handle the complexity of our use case and scale to several trillion rows. As a "Database Engineer", you will evolve, architect, build and scale the core data warehouse that sits at the heart of Clarisights, enabling large-scale distributed, interactive analytics on near-realtime data.

What you'll do
- Understand and gain expertise in the existing data warehouse.
- Use this knowledge to identify gaps in the current system and formulate strategies for filling them.
- Make KPIs around the data warehouse available.
- Find solutions to evolve and scale the data warehouse. This will involve a lot of technical research, benchmarking and testing of existing and candidate replacement systems.
- Build from scratch all or parts of the data warehouse to improve the KPIs.
- Ensure the SLAs and SLOs of the data warehouse are met, which will require assuming ownership and being on call for it.
- Gain a deep understanding of Linux and of the concepts that drive performance characteristics, like IO scheduling, paging, process scheduling, CPU instruction pipelining etc.
- Adopt/build tooling and tune the systems to extract maximum performance out of the underlying hardware.
- Build wrappers/microservices for improving visibility, control, adoption and ease of use of the data warehouse.
- Build tooling and automation for monitoring, debugging and deployment of the warehouse.
- Contribute to open source database technologies that are used at Clarisights or are potential candidates for use.

What you bring
We are looking for engineers with a strong passion for solving challenging engineering problems and a burning desire to learn and grow in a fast-growing startup. This is not an easy gig: it will require strong technical chops and an insatiable curiosity to make things better. We need passionate and mature engineers who can do wonders with some mentoring and don't need to be managed.
- Distributed systems: You have a good understanding of general patterns of scaling and fault-tolerance in large-scale distributed systems.
- Databases: You have a good understanding of database concepts like query optimization, indexing, transactions, sharding, replication etc.
- Data pipelines: You have a working knowledge of distributed data processing systems.
- Engineer at heart: You thrive on writing great code, have a strong appreciation for modular, testable and maintainable code, and make sure to document it. You have the ability to take new initiatives and question the status quo.
- Passion & Drive to learn and excel: You believe in our vision. You drive the product for the better, always looking to improve things, and soon become the go-to person on anything you have mastered along the way. You love dabbling in your own side projects and learning new skills that are not necessarily part of your normal day job.
- Inquisitiveness: You are curious to know how different modules on our platform work. You are not afraid to venture into unknown territories of code. You ask questions.
- Ownership: You are your own manager. You have the ability to implement engineering tasks on your own without a need for micro-management and take responsibility for any task that has been assigned to you.
- Teamwork: You are helpful and work well with teams. You're probably someone who enjoys sharing knowledge with teammates and asks for help when you need it.
- Open Source Contribution: Bonus.
Job Description
We are looking for a Data Engineer who will be responsible for collecting, storing, processing, and analyzing huge sets of data coming from different sources.

Responsibilities
- Work with Big Data tools and frameworks to provide requested capabilities
- Identify development needs in order to improve and streamline operations
- Develop and manage BI solutions
- Implement ETL processes and data warehousing
- Monitor performance and manage infrastructure

Skills
- Proficient understanding of distributed computing principles
- Proficiency with Hadoop and Spark
- Experience with building stream-processing systems, using solutions such as Kafka and Spark Streaming
- Good knowledge of data querying tools such as SQL and Hive
- Knowledge of various ETL techniques and frameworks
- Experience with at least one of Python, Java or Scala
- Experience with cloud services such as AWS or GCP
- Experience with NoSQL databases such as DynamoDB or MongoDB will be an advantage
- Excellent written and verbal communication skills
Skill Set
SQL, Python, NumPy, Pandas. Knowledge of Hive and data warehousing concepts will be a plus.

JD
- Strong analytical skills with the ability to collect, organise, analyse and interpret trends or patterns in complex data sets and provide reports and visualisations.
- Work with management to prioritise business KPIs and information needs.
- Locate and define new process improvement opportunities.
- Technical expertise with data models, database design and development, data mining and segmentation techniques.
- Proven success in a collaborative, team-oriented environment.
- Working experience with geospatial data will be a plus.
DESCRIPTION:
- We're looking for an experienced Data Engineer with strong cloud technology experience to join our team and help our big data team take our products to the next level.
- This is a hands-on role: you will be required to code and develop the product in addition to your leadership role. You need to have a strong software development background and love working with cutting-edge big data platforms.
- You are expected to bring extensive hands-on experience with Amazon Web Services (Kinesis streams, EMR, Redshift), Spark and other Big Data processing frameworks and technologies, as well as advanced knowledge of RDBMS and Data Warehousing solutions.

REQUIREMENTS:
- Strong background working on large-scale Data Warehousing and Data processing solutions.
- Strong Python and Spark programming experience.
- Strong experience in building big data pipelines.
- Very strong SQL skills are an absolute must.
- Good knowledge of OO, functional and procedural programming paradigms.
- Strong understanding of various design patterns.
- Strong understanding of data structures and algorithms.
- Strong experience with Linux operating systems.
- At least 2 years of experience working as a software developer or in a data-driven environment.
- Experience working in an agile environment.
- Lots of passion, motivation and drive to succeed!

Highly desirable:
- Understanding of agile principles, specifically Scrum.
- Exposure to Google Cloud Platform services such as BigQuery, Compute Engine etc.
- Docker, Puppet, Ansible, etc.
- Understanding of the digital marketing and digital advertising space would be advantageous.

BENEFITS:
Datalicious is a global data technology company that helps marketers improve customer journeys through the implementation of smart data-driven marketing strategies. Our team of marketing data specialists offers a wide range of skills suitable for any challenge and covers everything from web analytics to data engineering, data science and software development.

Experience: Join us at any level and we promise you'll feel up-levelled in no time, thanks to the fast-paced, transparent and aggressive growth of Datalicious.
Exposure: Work with ONLY the best clients in the Australian and SEA markets. Every problem you solve directly impacts millions of real people at a large scale across industries.
Work Culture: Voted one of the Top 10 Tech Companies in Australia. Never a boring day at work, and we walk the talk. The CEO organises nerf-gun bouts in the middle of a hectic day.
Money: We'd love to have a long-term relationship because long-term benefits are exponential. We encourage people to get technical certifications via online courses or digital schools.

So if you are looking for the chance to work for an innovative, fast-growing business that will give you exposure across a diverse range of the world's best clients, products and industry-leading technologies, then Datalicious is the company for you!