Job Description:
As an Azure Data Engineer, your role will involve designing, developing, and maintaining data solutions on the Azure platform. You will be responsible for building and optimizing data pipelines, ensuring data quality and reliability, and implementing data processing and transformation logic. Your expertise in Azure Databricks, Python, SQL, Azure Data Factory (ADF), PySpark, and Scala will be essential for performing the following key responsibilities:
Designing and developing data pipelines: You will design and implement scalable and efficient data pipelines using Azure Databricks, PySpark, and Scala. This includes data ingestion, data transformation, and data loading processes.
Data modeling and database design: You will design and implement data models to support efficient data storage, retrieval, and analysis. This may involve working with relational databases, data lakes, or other storage solutions on the Azure platform.
Data integration and orchestration: You will leverage Azure Data Factory (ADF) to orchestrate data integration workflows and manage data movement across various data sources and targets. This includes scheduling and monitoring data pipelines.
Data quality and governance: You will implement data quality checks, validation rules, and data governance processes to ensure data accuracy, consistency, and compliance with relevant regulations and standards.
Performance optimization: You will optimize data pipelines and queries to improve overall system performance and reduce processing time. This may involve tuning SQL queries, optimizing data transformation logic, and leveraging caching techniques.
Monitoring and troubleshooting: You will monitor data pipelines, identify performance bottlenecks, and troubleshoot issues related to data ingestion, processing, and transformation. You will work closely with cross-functional teams to resolve data-related problems.
Documentation and collaboration: You will document data pipelines, data flows, and data transformation processes. You will collaborate with data scientists, analysts, and other stakeholders to understand their data requirements and provide data engineering support.
Skills and Qualifications:
Strong experience with Azure Databricks, Python, SQL, ADF, PySpark, and Scala.
Proficiency in designing and developing data pipelines and ETL processes.
Solid understanding of data modeling concepts and database design principles.
Familiarity with data integration and orchestration using Azure Data Factory.
Knowledge of data quality management and data governance practices.
Experience with performance tuning and optimization of data pipelines.
Strong problem-solving and troubleshooting skills related to data engineering.
Excellent collaboration and communication skills to work effectively in cross-functional teams.
Understanding of cloud computing principles and experience with Azure services.
Similar jobs
Who We Are:
DeepIntent is leading the healthcare advertising industry with data-driven solutions built for the future. From day one, our mission has been to improve patient outcomes through the artful use of advertising, data science, and real-world clinical data.
What You’ll Do:
We are looking for a Senior Software Engineer based in Pune, India who can master both DeepIntent’s data architectures and pharma research and analytics methodologies to make significant contributions to how health media is analyzed by our clients. This role requires an Engineer who not only understands DBA functions but also how they impact research objectives and can work with researchers and data scientists to achieve impactful results.
This role will be in the Analytics Organization and will require integration and partnership with the Engineering Organization. The ideal candidate is a self-starter who is inquisitive who is not afraid to take on and learn from challenges and will constantly seek to improve the facets of the business they manage. The ideal candidate will also need to demonstrate the ability to collaborate and partner with others.
- Serve as the Engineering interface between Analytics and Engineering teams
- Develop and standardized all interface points for analysts to retrieve and analyze data with a focus on research methodologies and data based decisioning
- Optimize queries and data access efficiencies, serve as expert in how to most efficiently attain desired data points
- Build “mastered” versions of the data for Analytics specific querying use cases
- Help with data ETL, table performance optimization
- Establish formal data practice for the Analytics practice in conjunction with rest of DeepIntent
- Build & operate scalable and robust data architectures
- Interpret analytics methodology requirements and apply to data architecture to create standardized queries and operations for use by analytics teams
- Implement DataOps practices
- Master existing and new Data Pipelines and develop appropriate queries to meet analytics specific objectives
- Collaborate with various business stakeholders, software engineers, machine learning engineers, analysts
- Operate between Engineers and Analysts to unify both practices for analytics insight creation
Who You Are:
- Adept in market research methodologies and using data to deliver representative insights
- Inquisitive, curious, understands how to query complicated data sets, move and combine data between databases
- Deep SQL experience is a must
- Exceptional communication skills with ability to collaborate and translate with between technical and non technical needs
- English Language Fluency and proven success working with teams in the U.S.
- Experience in designing, developing and operating configurable Data pipelines serving high volume and velocity data
- Experience working with public clouds like GCP/AWS
- Good understanding of software engineering, DataOps, and data architecture, Agile and DevOps methodologies
- Experience building Data architectures that optimize performance and cost, whether the components are prepackaged or homegrown
- Proficient with SQL,Python or JVM based language, Bash
- Experience with any of Apache open source projects such as Spark, Druid, Beam, Airflow etc.and big data databases like BigQuery, Clickhouse, etc
- Ability to think big, take bets and innovate, dive deep, hire and develop the best talent, learn and be curious
- Comfortable to work in EST Time Zone
Science)
Have 2 to 6 years of experience working in a similar role in a startup environment
SQL and Excel have no secrets for you
You love visualizing data with Tableau
Any experience with product analytics tools (Mixpanel, Clevertap) is a plus
You solve math puzzles for fun
A strong analytical mindset with a problem-solving attitude
Comfortable with being critical and speaking your mind
You can easily switch between coding (R or Python) and having a business
discussion
Be a team player who thrives in a fast-paced and constantly changing environment
Role Summary
As a Data Engineer, you will be an integral part of our Data Engineering team supporting an event-driven server less data engineering pipeline on AWS cloud, responsible for assisting in the end-to-end analysis, development & maintenance of data pipelines and systems (DataOps). You will work closely with fellow data engineers & production support to ensure the availability and reliability of data for analytics and business intelligence purposes.
Requirements:
· Around 4 years of working experience in data warehousing / BI system.
· Strong hands-on experience with Snowflake AND strong programming skills in Python
· Strong hands-on SQL skills
· Knowledge with any of the cloud databases such as Snowflake,Redshift,Google BigQuery,RDS,etc.
· Knowledge on debt for cloud databases
· AWS Services such as SNS, SQS, ECS, Docker, Kinesis & Lambda functions
· Solid understanding of ETL processes, and data warehousing concepts
· Familiarity with version control systems (e.g., Git/bit bucket, etc.) and collaborative development practices in an agile framework
· Experience with scrum methodologies
· Infrastructure build tools such as CFT / Terraform is a plus.
· Knowledge on Denodo, data cataloguing tools & data quality mechanisms is a plus.
· Strong team player with good communication skills.
Overview Optisol Business Solutions
OptiSol was named on this year's Best Companies to Work for list by Great place to work. We are a team of about 500+ Agile employees with a development center in India and global offices in the US, UK (United Kingdom), Australia, Ireland, Sweden, and Dubai. 16+ years of joyful journey and we have built about 500+ digital solutions. We have 200+ happy and satisfied clients across 24 countries.
Benefits, working with Optisol
· Great Learning & Development program
· Flextime, Work-at-Home & Hybrid Options
· A knowledgeable, high-achieving, experienced & fun team.
· Spot Awards & Recognition.
· The chance to be a part of next success story.
· A competitive base salary.
More Than Just a Job, We Offer an Opportunity To Grow. Are you the one, who looks out to Build your Future & Build your Dream? We have the Job for you, to make your dream comes true.
About Us:
Cognitio Analytics is an award-winning, niche service provider that offers digital transformation solutions powered by AI and machine learning. We help clients realize the full potential of their data assets and the investments made in related technologies, be it analytics and big data platforms, digital technologies or your people. Our solutions include Health Analytics powered by Cognitio’s Health Data Factory that drives better care outcomes and higher ROI on care spending. We specialize in providing Data Governance solutions that help effectively use data in a consistent and compliant manner. Additionally, our smart intelligence solutions enable a deeper understanding of operations through the use of data science and advanced solutions like process mining technologies. We have offices in New Jersey and Connecticut in the USA and in Gurgaon in India.
What we're looking for:
- Ability in data modelling, design, build and deploy DW/BI systems for Insurance, Health Care, Banking, etc.
- Performance tuning of ETL process and SQL queries and recommend & implement ETL and query tuning techniques.
- Develop and create transformation queries, views, and stored procedures for ETL processes, and process automations.
- Translate business needs into technical specifications, evaluate, and improve existing BI systems.
- Use business intelligence and visualization software (e.g., Tableau, Qlik Sense, Power BI etc.) to empower customers to drive their analytics and reporting
- Develop and update technical documentation for BI systems.
Key Technical Skills:
- Hands on experience in MS SQL Server & MSBI (SSIS/SSRS/SSAS), with understanding of database concepts, star schema, SQL Tuning, OLAP, Databricks, Hadoop, Spark, cloud technologies.
- Experience in designing and building complete ETL / SSIS processes and transforming data for ODS, Staging, and Data Warehouse
- Experience building self-service reporting solutions using business intelligence software (e.g., Tableau, Qlik Sense, Power BI etc.)
Work Timing: 5 Days A Week
Responsibilities include:
• Ensure right stakeholders gets right information at right time
• Requirement gathering with stakeholders to understand their data requirement
• Creating and deploying reports
• Participate actively in datamarts design discussions
• Work on both RDBMS as well as Big Data for designing BI Solutions
• Write code (queries/procedures) in SQL / Hive / Drill that is both functional and elegant,
following appropriate design patterns
• Design and plan BI solutions to automate regular reporting
• Debugging, monitoring and troubleshooting BI solutions
• Creating and deploying datamarts
• Writing relational and multidimensional database queries
• Integrate heterogeneous data sources into BI solutions
• Ensure Data Integrity of data flowing from heterogeneous data sources into BI solutions.
Minimum Job Qualifications:
• BE/B.Tech in Computer Science/IT from Top Colleges
• 1-5 years of experience in Datawarehousing and SQL
• Excellent Analytical Knowledge
• Excellent technical as well as communication skills
• Attention to even the smallest detail is mandatory
• Knowledge of SQL query writing and performance tuning
• Knowledge of Big Data technologies like Apache Hadoop, Apache Hive, Apache Drill
• Knowledge of fundamentals of Business Intelligence
• In-depth knowledge of RDBMS systems, Datawarehousing and Datamarts
• Smart, motivated and team oriented
Desirable Requirements
• Sound knowledge of software development in Programming (preferably Java )
• Knowledge of the software development lifecycle (SDLC) and models
Job responsibilities
- You will partner with teammates to create complex data processing pipelines in order to solve our clients' most complex challenges
- You will collaborate with Data Scientists in order to design scalable implementations of their models
- You will pair to write clean and iterative code based on TDD
- Leverage various continuous delivery practices to deploy, support and operate data pipelines
- Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
- Develop and operate modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
- Create data models and speak to the tradeoffs of different modeling approaches
- Seamlessly incorporate data quality into your day-to-day work as well as into the delivery process
- Assure effective collaboration between Thoughtworks' and the client's teams, encouraging open communication and advocating for shared outcomes
- You have a good understanding of data modelling and experience with data engineering tools and platforms such as Kafka, Spark, and Hadoop
- You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (Hbase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting
- Hands on experience in MapR, Cloudera, Hortonworks and/or cloud (AWS EMR, Azure HDInsights, Qubole etc.) based Hadoop distributions
- You are comfortable taking data-driven approaches and applying data security strategy to solve business problems
- Working with data excites you: you can build and operate data pipelines, and maintain data storage, all within distributed systems
- You're genuinely excited about data infrastructure and operations with a familiarity working in cloud environments
- Professional skills
- You're resilient and flexible in ambiguous situations and enjoy solving problems from technical and business perspectives
- An interest in coaching, sharing your experience and knowledge with teammates
- You enjoy influencing others and always advocate for technical excellence while being open to change when needed
- Presence in the external tech community: you willingly share your expertise with others via speaking engagements, contributions to open source, blogs and more
Our startup is building an AI based decentralized marketplace for the travel industry. Its founded by technologists, marketeers, and PhD/IITians from Google, Airbnb, Agoda etc.
We're looking to bring a Scala backend engineer with experience on reactive actor system using Akka/Akka-streams. The individual will be part of our Europe based core team where individual team members have 20+ years experience each. Individual must've been hands on with scalable platform development and capable of individual contribution as well.
● Frame ML / AI use cases that can improve the company’s product
● Implement and develop ML / AI / Data driven rule based algorithms as software items
● For example, building a chatbot that replies an answer from relevant FAQ, and
reinforcing the system with a feedback loop so that the bot improves
Must have skills:
● Data extraction and ETL
● Python (numpy, pandas, comfortable with OOP)
● Django
● Knowledge of basic Machine Learning / Deep Learning / AI algorithms and ability to
implement them
● Good understanding of SDLC
● Deployed ML / AI model in a mobile / web product
● Soft skills : Strong communication skills & Critical thinking ability
Good to have:
● Full stack development experience
Required Qualification:
B.Tech. / B.E. degree in Computer Science or equivalent software engineering
Key Responsibilities:
- Partnering with clients and internal business owners (product, marketing, edit, etc.) to understand needs and develop models and products for Kaleidofin business line.
- Good understanding of the underlying business and workings of cross functional teams for successful execution
- Design and develop analyses based on business requirement needs and challenges.
- Leveraging statistical analysis on consumer research and data mining projects, including segmentation, clustering, factor analysis, multivariate regression, predictive modeling, hyperparameter tuning, ensembling etc.
- Providing statistical analysis on custom research projects and consult on A/B testing and other statistical analysis as needed. Other reports and custom analysis as required.
- Identify and use appropriate investigative and analytical technologies to interpret and verify results.
- Apply and learn a wide variety of tools and languages to achieve results
- Use best practices to develop statistical and/ or machine learning techniques to build models that address business needs.
- Collaborate with the team to improve the effectiveness of business decisions using data and machine learning/predictive modeling.
- Innovate on projects by using new modeling techniques or tools.
- Utilize effective project planning techniques to break down complex projects into tasks and ensure deadlines are kept.
- Communicate findings to team and leadership to ensure models are well understood and incorporated into business processes.
Skills:
- 2+ year experience in advanced analytics, model building, statistical modeling, optimization, and machine learning algorithms.
- Machine Learning Algorithms: Crystal clear understanding, coding, implementation, error analysis, model tuning knowledge on Linear Regression, Logistic Regression, SVM, shallow Neural Networks, clustering, Decision Trees, Random forest, Boosting trees, Recommender Systems, ARIMA and Anomaly Detection. Feature selection, hyper parameters tuning, model selection and error analysis, ensemble methods.
- Strong with programming languages like Python and data processing using SQL or equivalent and ability to experiment with newer open source tools
- Experience in normalizing data to ensure it is homogeneous and consistently formatted to enable sorting, query and analysis.
- Experience designing, developing, implementing and maintaining a database and programs to manage data analysis efforts.
- Experience with big data and cloud computing viz. Spark, Hadoop (MapReduce, PIG, HIVE)
- Experience in risk and credit scoring domains preferred
1)Machine learning development using Python or Scala Spark
2)Knowledge of multiple ML algorithms like Random forest, XG boost, RNN, CNN, Transform learning etc..
3)Aware of typical challenges in machine learning implementation and respective applications
Good to have
1)Stack development or DevOps team experience
2)Cloud service (AWS, Cloudera), SAAS, PAAS
3)Big data tools and framework
4)SQL experience