About Datalicious Pty Ltd
Datalicious (An Equifax Company) is a global data analytics agency that helps marketers improve customer journeys through the implementation of smart, data-driven marketing strategies. Our team of marketing data specialists offers a wide range of skills suitable for any challenge and covers everything from web analytics through data engineering and data science to software development.
- Designing and implementing fine-tuned, production-ready data/ML pipelines on the Hadoop platform.
- Driving optimization, testing and tooling to improve quality.
- Reviewing and approving high-level and detailed designs to ensure that the solution meets the business needs and aligns with the data & analytics architecture principles and roadmap.
- Understanding business requirements and solution design to develop and implement solutions that adhere to big data architectural guidelines and address business requirements.
- Following proper SDLC (Code review, sprint process).
- Identifying, designing, and implementing internal process improvements: automating manual processes, optimizing data delivery, etc.
- Building robust and scalable data infrastructure (both batch processing and real-time) to support needs from internal and external users.
- Understanding various data security standards and using data security tools to apply and adhere to the required data controls for user access on the Hadoop platform.
- Supporting and contributing to development guidelines and standards for data ingestion.
- Working with the data science and business analytics teams to assist with data ingestion and data-related technical issues.
- Designing and documenting the development & deployment flow.
- Experience in developing REST API services using one of the Scala frameworks.
- Ability to troubleshoot and optimize complex queries on the Spark platform.
- Expert in building and optimizing ‘big data’ data/ML pipelines, architectures and data sets.
- Knowledge of modelling unstructured data into structured data designs.
- Experience in Big Data access and storage techniques.
- Experience in doing cost estimation based on the design and development.
- Excellent debugging skills for the technical stack mentioned above, including analyzing server logs and application logs.
- Highly organized, self-motivated, proactive, and able to propose the best design solutions.
- Good time management and multitasking skills to meet deadlines, working both independently and as part of a team.
Job roles and responsibilities:
- Minimum 3 to 4 years of hands-on experience designing, building and operationalizing large-scale enterprise data solutions and applications using GCP data and analytics services such as Cloud Dataproc, Cloud Dataflow, Cloud BigQuery, Cloud Pub/Sub, and Cloud Functions.
- Hands-on experience in analyzing, re-architecting and re-platforming on-premise data warehouses to data platforms on GCP cloud using GCP/3rd party services.
- Experience in designing and building data pipelines within a hybrid big data architecture using Java, Python, Scala & GCP Native tools.
- Hands-on experience orchestrating and scheduling data pipelines using Composer and Airflow.
- Experience in performing detailed assessments of current-state data platforms and creating an appropriate transition path to GCP cloud.
Technical Skills Required:
- Strong experience in GCP data and analytics services
- Working knowledge of the big data ecosystem: Hadoop, Spark, HBase, Hive, Scala, etc.
- Experience in building and optimizing data pipelines in Spark
- Strong skills in orchestrating workflows with Composer/Apache Airflow
- Good knowledge of object-oriented scripting languages: Python (must have) and Java or C++.
- Good to have: knowledge of building CI/CD pipelines with GCP Cloud Build and native GCP services
Deliver plugins for our Python-based ETL pipelines
Deliver Python microservices for provisioning and managing cloud infrastructure
Implement algorithms to analyse large data sets
Draft design documents that translate requirements into code
Effectively manage the challenges of handling large volumes of data while working to tight deadlines
Manage expectations with internal stakeholders and context-switch in a fast-paced environment
Thrive in an environment that uses AWS and Elasticsearch extensively
Keep abreast of technology and contribute to the engineering strategy
Champion best development practices and provide mentorship to others
First and foremost you are a Python developer, experienced with the Python Data stack
You love and care about data
Your code is an artistic manifesto reflecting how elegant you are in what you do
You feel sparks of joy when a new abstraction or pattern arises from your code
You support the principles of DRY (Don’t Repeat Yourself) and KISS (Keep It Short and Simple)
You are a continuous learner
You have a natural willingness to automate tasks
You have critical thinking and an eye for detail
Excellent ability and experience of working to tight deadlines
Sharp analytical and problem-solving skills
Strong sense of ownership and accountability for your work and delivery
Excellent written and oral communication skills
Mature collaboration and mentoring abilities
We are keen to know your digital footprint (community talks, blog posts, certifications, courses you have taken or are keen to take, your personal projects, and any contributions to open-source communities)
Delivering complex software, ideally in a FinTech setting
Experience with CI/CD tools such as Jenkins, CircleCI
Experience with code versioning (git / mercurial / subversion)
Job Description - Sr Azure Data Engineer
Roles & Responsibilities:
- Hands-on programming in C# / .NET.
- Develop serverless applications using Azure Function Apps.
- Writing complex SQL Queries, Stored procedures, and Views.
- Creating Data processing pipeline(s).
- Develop / Manage large-scale Data Warehousing and Data processing solutions.
- Provide clean, usable data and recommend data efficiency, quality, and data integrity.
- Should have working experience with C# / .NET.
- Proficient with writing SQL queries, Stored Procedures, and Views
- Should have worked on Azure Cloud Stack.
- Should have working experience in developing serverless code.
- Must have worked with Azure Data Factory (mandatory).
- 4+ years of relevant experience
The Biostrap platform extracts many metrics related to health, sleep, and activity. Many algorithms are designed through research and often based on scientific literature, and in some cases they are augmented with or entirely designed using machine learning techniques. Biostrap is seeking a Data Scientist to design, develop, and implement algorithms to improve existing metrics and measure new ones.
As a Data Scientist at Biostrap, you will take on projects to improve or develop algorithms to measure health metrics, including:
- Research: search literature for starting points of the algorithm
- Design: decide on the general idea of the algorithm, in particular whether to use machine learning, mathematical techniques, or something else.
- Implement: program the algorithm in Python, and help deploy it.
The algorithms and their implementation will have to be accurate, efficient, and well-documented.
- A Master’s degree in a computational field, with a strong mathematical background.
- Strong knowledge of, and experience with, different machine learning techniques, including their theoretical background.
- Strong experience with Python
- Experience with Keras/TensorFlow, and preferably also with RNNs
- Experience with AWS or similar services for data pipelining and machine learning.
- Ability and drive to work independently on an open problem.
- Fluency in English.
- Responsible for developing and maintaining applications with PySpark
- Required to work individually or as part of a team on data science projects and work closely with lines of business to understand business problems and translate them into identifiable machine learning problems which can be delivered as technical solutions.
- Build quick prototypes to check feasibility and value to the business.
- Design, train, and deploy neural networks for computer vision and machine learning-related problems.
- Perform various complex activities related to statistical/machine learning.
- Coordinate with business teams to provide analytical support for developing, evaluating, implementing, monitoring, and executing models.
- Collaborate with technology teams to deploy the models to production.
- 2+ years of experience in solving complex business problems using machine learning.
- Understanding of, and modeling experience with, supervised, unsupervised, and deep learning models; hands-on knowledge of data wrangling, data cleaning/preparation, and dimensionality reduction is required.
- Experience in Computer Vision/Image Processing/Pattern Recognition, Machine Learning, Deep Learning, or Artificial Intelligence.
- Understanding of deep learning architectures such as InceptionNet, VGGNet, FaceNet, YOLO, SSD, R-CNN, Mask R-CNN, and ResNet.
- Experience with one or more deep learning frameworks e.g., TensorFlow, PyTorch.
- Knowledge of vector algebra, statistical and probabilistic modeling is desirable.
- Proficiency in programming skills involving Python, C/C++, and Python Data Science Stack (NumPy, SciPy, Pandas, Scikit-learn, Jupyter, IPython).
- Experience working with Amazon SageMaker or Azure ML Studio for deployments is a plus.
- Experience with data visualization software such as Tableau, ELK, etc., is a plus.
- Strong analytical, critical thinking, and problem-solving skills.
- B.E./B.Tech./M.E./M.Tech. in Computer Science, Applied Mathematics, Statistics, Data Science, or a related Engineering field.
- Minimum 60% in Graduation or Post-Graduation
- Great interpersonal and communication skills
Data Platform engineering at Uber is looking for a strong Technical Lead (Level 5a Engineer) who has built high-quality platforms and services that can operate at scale. A 5a Engineer at Uber exhibits the following qualities:
- Demonstrate tech expertise: Demonstrate technical skills to go very deep or broad in solving classes of problems or creating broadly leverageable solutions.
- Execute large scale projects: Define, plan and execute complex and impactful projects. You communicate the vision to peers and stakeholders.
- Collaborate across teams: Act as a domain resource for engineers outside your team and help them leverage the right solutions. Facilitate technical discussions and drive to a consensus.
- Coach engineers: Coach and mentor less experienced engineers and deeply invest in their learning and success. You give and solicit feedback, both positive and negative, to others you work with to help improve the entire team.
- Tech leadership: Lead the effort to define the best practices in your immediate team, and help the broader organization establish better technical or business processes.
What You’ll Do
- Build a scalable, reliable, operable and performant data analytics platform for Uber’s engineers, data scientists, products and operations teams.
- Work alongside the pioneers of big data systems such as Hive, Yarn, Spark, Presto, Kafka, and Flink to build out a highly reliable, performant, easy-to-use software system for Uber’s planet-scale data.
- Become proficient in the multi-tenancy, resource isolation, abuse prevention, and self-serve debuggability aspects of a high-performance, large-scale service while building these capabilities for Uber's engineers and operations staff.
What You’ll Need
- 7+ years of experience in building large-scale products, data platforms, and distributed systems in a high-caliber environment.
- Architecture: Identify and solve major architectural problems by going deep in your field or broad across different teams. Extend, improve, or, when needed, build solutions to address architectural gaps or technical debt.
- Software Engineering/Programming: Create frameworks and abstractions that are reliable and reusable. You have advanced knowledge of at least one programming language, and are happy to learn more. Our core languages are Java, Python, Go, and Scala.
- Data Engineering: Expertise in one of the big data analytics technologies we currently use, such as Apache Hadoop (HDFS and YARN), Apache Hive, Impala, Drill, Spark, Tez, Presto, Calcite, Parquet, Arrow, etc. Under-the-hood experience with similar systems such as Vertica, Apache Impala, Drill, Google Borg, Google BigQuery, Amazon EMR, Amazon RedShift, Docker, Kubernetes, Mesos, etc.
- Execution & Results: You tackle large technical projects/problems that are not clearly defined. You anticipate roadblocks and have strategies to de-risk timelines. You orchestrate work that spans multiple teams and keep your stakeholders informed.
- A team player: You believe that you can achieve more on a team and that the whole is greater than the sum of its parts. You rely on others’ candid feedback for continuous improvement.
- Business acumen: You understand requirements beyond the written word. Whether you’re working on an API used by other developers, an internal tool consumed by our operations teams, or a feature used by millions of customers, your attention to detail leads to a delightful user experience.
- 5+ years of experience in a Data Engineer role
- Graduate degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field.
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with relational SQL and NoSQL databases such as Cassandra.
- Experience with AWS cloud services: EC2, EMR, Athena
- Experience with object-oriented/object function scripting languages: Python, Java, C++, Scala, etc.
- Advanced SQL knowledge and experience working with relational databases, query authoring (SQL) as well as familiarity with unstructured datasets.
- Deep problem-solving skills to perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.