- Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions: Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hub/IoT Hub.
- Experience in migrating on-premises data warehouses to data platforms on the Azure cloud.
- Designing and implementing data engineering, ingestion, and transformation functions (a minimal ingestion sketch follows this list)
- Azure Synapse or Azure SQL Data Warehouse
- Spark on Azure is available in HDInsight and Databricks
- Experience with Azure Analysis Services
- Experience in Power BI
- Experience with third-party solutions like Attunity/Stream sets, Informatica
- Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
- Capacity Planning and Performance Tuning on Azure Stack and Spark.
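To ground the ingestion item above, here is a minimal sketch of publishing events to Azure Event Hubs with the azure-eventhub Python SDK. The connection string, hub name, and payload shape are placeholders, not details from this posting.

```python
# Minimal Event Hubs ingestion sketch; connection string and hub name are placeholders.
from azure.eventhub import EventHubProducerClient, EventData

CONN_STR = "<your-event-hubs-connection-string>"  # placeholder
EVENTHUB_NAME = "telemetry"                       # hypothetical hub name

def send_readings(readings):
    """Send a list of JSON payload strings to Event Hubs as one batch."""
    producer = EventHubProducerClient.from_connection_string(
        CONN_STR, eventhub_name=EVENTHUB_NAME
    )
    with producer:
        batch = producer.create_batch()
        for payload in readings:
            # A real pipeline would start a new batch when this one fills up.
            batch.add(EventData(payload))
        producer.send_batch(batch)

send_readings(['{"device": "d1", "temp": 21.4}'])
```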
Mandatory Skills: Azure Data Lake Storage, Azure SQL databases, Azure Synapse, Databricks (PySpark/Spark), Python, SQL, Azure Data Factory.
Good to have: Power BI, Azure IaaS services, Azure DevOps, Microsoft Fabric
- Very strong understanding of ETL and ELT (see the sketch after this list)
- Very strong understanding of lakehouse architecture
- Very strong knowledge of PySpark and Spark architecture
- Good knowledge of Azure Data Lake architecture and access controls
- Good knowledge of Microsoft Fabric architecture
- Good knowledge of Azure SQL databases
- Good knowledge of T-SQL
- Good knowledge of CI/CD processes using Azure DevOps
- Power BI
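As referenced in the ETL/ELT item above, here is a minimal ELT sketch on a Databricks-style lakehouse: raw files are landed first, then transformed inside the platform into a Delta table (ELT rather than ETL). The storage paths, schema, and column names are hypothetical.

```python
# ELT sketch: land raw data in the lake, then transform in place to a Delta table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Extract + Load: raw CSV already landed in the data lake's bronze zone (hypothetical path)
raw = (spark.read
       .option("header", True)
       .csv("abfss://bronze@yourlake.dfs.core.windows.net/sales/"))

# Transform inside the lake: deduplicate, type, and filter, then write a silver Delta table
clean = (raw
         .dropDuplicates(["order_id"])
         .withColumn("order_ts", F.to_timestamp("order_ts"))
         .filter(F.col("amount").cast("double") > 0))

clean.write.format("delta").mode("overwrite").saveAsTable("silver.sales_orders")
```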
Company - Tekclan Software Solutions
Position – SQL Developer
Experience – Minimum of 4 years of experience in MS SQL Server, SQL programming, and ETL development.
Location - Chennai
We are seeking a highly skilled SQL Developer with expertise in MS SQL Server, SSRS, SQL programming, writing stored procedures, and proficiency in ETL using SSIS. The ideal candidate will have a strong understanding of database concepts, query optimization, and data modeling.
Responsibilities:
1. Develop, optimize, and maintain SQL queries, stored procedures, and functions for efficient data retrieval and manipulation (a short example follows this list).
2. Design and implement ETL processes using SSIS for data extraction, transformation, and loading from various sources.
3. Collaborate with cross-functional teams to gather business requirements and translate them into technical specifications.
4. Create and maintain data models, ensuring data integrity, normalization, and performance.
5. Generate insightful reports and dashboards using SSRS to facilitate data-driven decision making.
6. Troubleshoot and resolve database performance issues, bottlenecks, and data inconsistencies.
7. Conduct thorough testing and debugging of SQL code to ensure accuracy and reliability.
8. Stay up-to-date with emerging trends and advancements in SQL technologies and provide recommendations for improvement.
9. Work independently as an individual contributor.
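For the stored-procedure item above, a minimal sketch of calling a parameterized MS SQL Server stored procedure from Python via pyodbc. The server, credentials, column names, and the procedure dbo.usp_GetTopCustomers are hypothetical.

```python
# Calling a parameterized stored procedure on MS SQL Server via pyodbc.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-server;DATABASE=SalesDB;UID=user;PWD=password"  # placeholders
)
cursor = conn.cursor()

# ODBC call syntax keeps parameters typed and reuses the cached procedure plan.
cursor.execute("{CALL dbo.usp_GetTopCustomers (?)}", 10)
for row in cursor.fetchall():
    print(row.CustomerName, row.TotalSpend)  # hypothetical result columns

conn.close()
```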
Requirements:
1. Minimum of 4 years of experience in MS SQL Server, SQL programming, and ETL development.
2. Proven experience as a SQL Developer with a strong focus on MS SQL Server.
3. Proficiency in SQL programming, including writing complex queries, stored procedures, and functions.
4. In-depth knowledge of ETL processes and hands-on experience with SSIS.
5. Strong expertise in creating reports and dashboards using SSRS.
6. Familiarity with database design principles, query optimization, and data modeling.
7. Experience with performance tuning and troubleshooting SQL-related issues.
8. Excellent problem-solving skills and attention to detail.
9. Strong communication and collaboration abilities.
10. Ability to work independently and handle multiple tasks simultaneously.
Preferred Skills:
1. Certification in MS SQL Server or related technologies.
2. Knowledge of other database systems such as Oracle or MySQL.
3. Familiarity with data warehousing concepts and tools.
4. Experience with version control systems.
Experienced in writing complex SQL SELECT queries using advanced features such as window functions and CTEs, as shown in the example after this list
Should work as an individual contributor for the initial few months; a team will be aligned based on project movement
Strong in querying logic and data interpretation
Solid communication and articulation skills
Able to handle stakeholders independently with minimal intervention from the reporting manager
Develop strategies to solve problems in logical yet creative ways
Create custom reports and presentations accompanied by strong data visualization and storytelling
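A self-contained illustration of the window-function-plus-CTE pattern mentioned above, using Python's built-in sqlite3 module so it runs without a database server (window functions require SQLite 3.25+). The table and data are made up.

```python
# CTE + window function demo on an in-memory SQLite database.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (region TEXT, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('North', 'A', 120), ('North', 'B', 340),
        ('South', 'C', 200), ('South', 'D', 90);
""")

query = """
WITH regional AS (                           -- CTE: pre-aggregate per customer
    SELECT region, customer, SUM(amount) AS total
    FROM orders GROUP BY region, customer
)
SELECT region, customer, total,
       RANK() OVER (PARTITION BY region ORDER BY total DESC) AS rnk
FROM regional
"""
for row in con.execute(query):
    print(row)
```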
Who is IDfy?
IDfy is the Fintech ScaleUp of the Year 2021. We build technology products that identify people accurately. This helps businesses prevent fraud and engage with the genuine with the least amount of friction. If you have opened an account with HDFC Bank, ordered from Amazon or Zomato, transacted through Paytm or BharatPe, or played on Dream11 or MPL, you might have already experienced IDfy. Without even knowing it. Well… that's just how we roll. Global credit rating giant TransUnion is an investor in IDfy. So are international venture capitalists like MegaDelta Capital, BEENEXT, and Dream Incubator. Blume Ventures is an early investor and continues to place its faith in us. We have kept our 500 clients safe from fraud while helping the honest get the opportunities they deserve. Our 350-person strong family works and plays out of our offices in suburban Mumbai. IDfy has run verifications on 100 million people. In the next 2 years, we want to touch a billion users. If you wish to be part of this journey filled with lots of action and learning, we welcome you to the team!
What are we looking for?
As a senior software engineer in the Data Fabric POD, you would be responsible for producing and implementing functional software solutions. You will work with upper management to define software requirements and take the lead on operational and technical projects. You would be working on a data management and science platform which provides Data as a Service (DaaS) and Insight as a Service (IaaS) to internal employees and external stakeholders.
You are an eager-to-learn, technology-agnostic engineer who loves working with data and drawing insights from it. You have excellent organizational and problem-solving skills and are looking to build the tools of the future. You have exceptional communication and leadership skills and the ability to make quick decisions.
YOE: 3 - 10 yrs
Position: Sr. Software Engineer/Module Lead/Technical Lead
Responsibilities:
- Breaking down work and orchestrating the development of components for each sprint.
- Identifying risks and forming contingency plans to mitigate them.
- Liaising with team members, management, and clients to ensure projects are completed to standard.
- Inventing new approaches to detecting existing fraud. You will also stay ahead of the game by predicting future fraud techniques and building solutions to prevent them.
- Developing Zero Defect Software that is secure, instrumented, and resilient.
- Creating design artifacts before implementation.
- Developing Test Cases before or in parallel with implementation.
- Ensuring developed software passes static code analysis and performance and load tests.
- Developing various kinds of components (such as UI Components, APIs, Business Components, Image Processing, etc.) that define the IDfy Platforms, which drive cutting-edge Fraud Detection and Analytics.
- Developing software using Agile Methodology and tools that support the same.
Requirements:
- Experience with Apache Beam, ClickHouse, Grafana, InfluxDB, Elixir, BigQuery, and Logstash.
- An understanding of Product Development Methodologies.
- Strong understanding of relational databases, especially SQL, and hands-on experience with OLAP.
- Experience in creating data ingestion and ETL pipelines (Apache Beam or Apache Airflow experience is good to have; a minimal Beam sketch follows this list).
- Strong design skills in defining API Data Contracts / OOAD / Microservices / Data Models.
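As referenced above, a minimal Apache Beam (Python SDK) batch pipeline sketch covering read, parse, filter, and write. The file names and the "verified" field are hypothetical.

```python
# Minimal Beam batch pipeline: read newline-delimited JSON, keep verified events, write out.
import json
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | "Read" >> beam.io.ReadFromText("events.jsonl")                 # placeholder input
     | "Parse" >> beam.Map(json.loads)
     | "KeepVerified" >> beam.Filter(lambda e: e.get("verified") is True)
     | "Format" >> beam.Map(json.dumps)
     | "Write" >> beam.io.WriteToText("verified-events"))             # placeholder output
```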
Good to have:
- Experience with TimeSeries DBs (we use InfluxDB) and Alerting / Anomaly Detection Frameworks.
- Visualization layers: Metabase, Power BI, Tableau.
- Experience in developing software in the Cloud such as GCP / AWS.
- A passion to explore new technologies and express yourself through technical blogs.
About the Company
Blue Sky Analytics is a Climate Tech startup that combines the power of AI and satellite data to aid in the creation of a global environmental data stack. Our funders include Beenext and Rainmatter. Over the next 12 months, we aim to expand to 10 environmental datasets spanning water, land, heat, and more!
We are looking for a data scientist to join our growing team. This position will require you to think and act on the geospatial architecture and data needs of the company (specifically geospatial data). This position is strategic and will also require you to collaborate closely with data engineers, data scientists, software developers, and even colleagues from other business functions. Come save the planet with us!
Your Role
Manage: It goes without saying that you will be handling large amounts of image and location datasets. You will develop dataframes and automated pipelines of data from multiple sources. You are expected to know how to visualize them and use machine learning algorithms to be able to make predictions. You will be working across teams to get the job done.
Analyze: You will curate and analyze vast amounts of geospatial datasets like satellite imagery, elevation data, meteorological datasets, OpenStreetMap, demographic data, socio-economic data, and topography to extract useful insights about the events happening on our planet.
Develop: You will be required to develop processes and tools to monitor and analyze data and its accuracy. You will develop innovative algorithms which will be useful in tracking global environmental problems like depleting water levels, illegal tree logging, and even tracking of oil-spills.
Demonstrate: Familiarity with geospatial libraries such as GDAL/Rasterio for reading and writing data, and with QGIS for making visualizations (a short rasterio sketch follows this section). This also extends to using advanced statistical techniques and applying concepts like regression, properties of distributions, and conducting other statistical tests.
Produce: With all the hard work being put into data creation and management, it has to be used! You will be able to produce maps showing (but not limited to) spatial distribution of various kinds of data, including emission statistics and pollution hotspots. In addition, you will produce reports that contain maps, visualizations and other resources developed over the course of managing these datasets.
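A short sketch of the rasterio workflow mentioned above: open a GeoTIFF, read a band as a NumPy array, and summarize it. The file path is a placeholder.

```python
# Read one raster band with rasterio and summarize it with NumPy.
import numpy as np
import rasterio

with rasterio.open("scene.tif") as src:          # placeholder GeoTIFF path
    band = src.read(1).astype("float64")         # first band as a NumPy array
    if src.nodata is not None:
        band[band == src.nodata] = np.nan        # mask nodata pixels
    print("CRS:", src.crs)                       # coordinate reference system
    print("Pixel size:", src.res)                # (x, y) resolution
    print("Mean value:", np.nanmean(band))
```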
Requirements
These are must have skill-sets that we are looking for:
- Excellent coding skills in Python (including deep familiarity with NumPy, SciPy, pandas).
- Significant experience with git, GitHub, SQL, AWS (S3 and EC2).
- Experience working with GIS, including familiarity with geospatial libraries such as GDAL and rasterio for reading/writing data, a GIS application such as QGIS for visualization and querying, and basic machine learning algorithms for making predictions.
- Demonstrable experience implementing efficient neural network models and deploying them in a production environment.
- Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and proper usage, etc.) and experience with applications.
- Capable of writing clear and lucid reports and demystifying data for the rest of us.
- Be curious and care about the planet!
- Minimum 2 years of demonstrable industry experience working with large and noisy datasets.
Benefits
- Work from anywhere: Work by the beach or from the mountains.
- Open source at heart: We are building a community whose tools you can use, contribute to, and collaborate on.
- Own a slice of the pie: Possibility of becoming an owner by investing in ESOPs.
- Flexible timings: Fit your work around your lifestyle.
- Comprehensive health cover: Health cover for you and your dependents to keep you tension free.
- Work Machine of choice: Buy a device and own it after completing a year at BSA.
- Quarterly Retreats: Yes, there's work, but then there's also the non-work fun aspect, aka the retreat!
- Yearly vacations: Take time off to rest and get ready for the next big assignment by availing the paid leaves.
Title: Data Engineer (Azure) (Location: Gurgaon/Hyderabad)
Salary: Competitive as per Industry Standard
We are expanding our Data Engineering Team and hiring passionate professionals with extensive knowledge and experience in building and managing large enterprise data and analytics platforms. We are looking for creative individuals with strong programming skills who can understand complex business and architectural problems and develop solutions. The individual will work closely with the rest of our data engineering and data science team in implementing and managing Scalable Smart Data Lakes, Data Ingestion Platforms, Machine Learning and NLP based Analytics Platforms, Hyper-Scale Processing Clusters, Data Mining, and Search Engines.
What You’ll Need:
- 3+ years of industry experience in creating and managing end-to-end Data Solutions, Optimal Data Processing Pipelines, and Architecture dealing with large-volume big data sets of varied data types.
- Proficiency in Python, Linux and shell scripting.
- Strong knowledge of working with PySpark and pandas dataframes for writing efficient pre-processing and other data manipulation tasks.
- Strong experience in developing the infrastructure required for data ingestion and for optimal extraction, transformation, and loading of data from a wide variety of data sources using tools like Azure Data Factory and Azure Databricks (or Jupyter notebooks/Google Colab, or other similar tools).
- Working knowledge of GitHub or other version control tools.
- Experience with creating RESTful web services and API platforms.
- Work with data science and infrastructure team members to implement practical machine learning solutions and pipelines in production.
- Experience with cloud providers like Azure/AWS/GCP.
- Experience with SQL and NoSQL databases: MySQL, Azure Cosmos DB, HBase, MongoDB, Elasticsearch, etc.
- Experience with stream-processing systems such as Spark Streaming and Kafka, and working experience with event-driven architectures (a minimal streaming sketch follows this list).
- Strong analytical skills related to working with unstructured datasets.
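As referenced in the stream-processing item above, a minimal Spark Structured Streaming sketch that reads from Kafka. The broker address and topic are placeholders, and the job assumes the spark-sql-kafka connector package is on the classpath.

```python
# Structured Streaming from Kafka; broker and topic are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
          .option("subscribe", "clickstream")                 # hypothetical topic
          .load()
          .select(F.col("value").cast("string").alias("payload")))

query = (events.writeStream
         .format("console")      # swap for a durable sink such as Delta in production
         .outputMode("append")
         .start())
query.awaitTermination()
```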
Good to have (to filter or prioritize candidates)
- Experience with testing libraries such as pytest for writing unit tests for the developed code.
- Knowledge of machine learning algorithms and libraries would be good to have; implementation experience would be an added advantage.
- Knowledge and experience of Data Lake, Docker, and Kubernetes would be good to have.
- Knowledge of Azure Functions, Elasticsearch, etc. will be good to have.
- Experience with model versioning (MLflow) and data versioning will be beneficial.
- Experience with microservices libraries or with Python libraries such as Flask for hosting ML services and models would be great (a minimal Flask sketch follows this list).
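Following the Flask item above, a minimal sketch of hosting a model behind a REST endpoint. The pickled model file, route, and feature layout are hypothetical.

```python
# Minimal Flask service exposing a pickled model behind a /predict endpoint.
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)
with open("model.pkl", "rb") as f:        # hypothetical trained model artifact
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # e.g. [[5.1, 3.5, 1.4, 0.2]]
    prediction = model.predict(features).tolist()  # assumes a scikit-learn-style API
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```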
- Experience implementing large-scale ETL processes using Informatica PowerCenter.
- Design high-level ETL process and data flow from the source system to target databases.
- Strong experience with Oracle databases and strong SQL.
- Develop & unit test Informatica ETL processes for optimal performance utilizing best practices.
- Performance tune Informatica ETL mappings and report queries.
- Develop database objects like Stored Procedures, Functions, Packages, and Triggers using SQL and PL/SQL.
- Hands-on experience in Unix.
- Experience in Informatica Cloud (IICS).
- Work with appropriate leads and review high-level ETL design, source to target data mapping document, and be the point of contact for any ETL-related questions.
- Good understanding of project life cycle, especially tasks within the ETL phase.
- Ability to work independently and multi-task to meet critical deadlines in a rapidly changing environment.
- Excellent communication and presentation skills.
- Experience working effectively in an onsite/offshore delivery model.
- Responsible for developing and maintaining applications with PySpark
Must-Have Skills:
- Actively engage with internal business teams to understand their challenges and deliver robust, data-driven solutions.
- Work alongside global counterparts to solve data-intensive problems using standard analytical frameworks and tools.
- Be encouraged and expected to innovate and be creative in your data analysis, problem-solving, and presentation of solutions.
- Network and collaborate with a broad range of internal business units to define and deliver joint solutions.
- Work alongside customers to leverage cutting-edge technology (machine learning, streaming analytics, and ‘real’ big data) to creatively solve problems and disrupt existing business models.
In this role, we are looking for:
- A problem-solving mindset with the ability to understand business challenges and how to apply your analytics expertise to solve them.
- The unique person who can present complex mathematical solutions in a simple manner that most will understand, including customers.
- An individual excited by innovation and new technology and eager to find ways to employ these innovations in practice.
- A team mentality, empowered by the ability to work with a diverse set of individuals.
Basic Qualifications
- A Bachelor’s degree in Data Science, Math, Statistics, Computer Science or related field with an emphasis on analytics.
- 5+ Years professional experience in a data scientist/analyst role or similar.
- Proficiency in your statistics/analytics/visualization tool of choice, preferably the Microsoft Azure suite, including Azure ML Studio and Power BI, as well as R, Python, and SQL.
Preferred Qualifications
- Excellent communication, organizational transformation, and leadership skills
- Demonstrated excellence in Data Science, Business Analytics and Engineering
Only a solid grounding in computer engineering, Unix, data structures, and algorithms will enable you to meet this challenge.
- 7+ years of experience architecting, developing, releasing, and maintaining large-scale big data platforms on AWS or GCP
- Understanding of how big data tech and NoSQL stores like MongoDB, HBase/HDFS, and Elasticsearch synergize to power applications in analytics, AI, and knowledge graphs
- Understanding of how data processing models, data locality patterns, disk IO, network IO, and shuffling affect large-scale text processing such as feature extraction and searching
- Expertise with a variety of data processing systems, including streaming, event, and batch (Spark, Hadoop/MapReduce)
- 5+ years of proficiency in configuring and deploying applications on Linux-based systems
- 5+ years of experience with Spark, especially PySpark, for transforming large unstructured text data and creating highly optimized pipelines (a short text-processing sketch follows this list)
- Experience with RDBMS, ETL techniques and frameworks (Sqoop, Flume), and big data querying tools (Pig, Hive)
- A stickler for world-class best practices, uncompromising on the quality of engineering; understands standards and reference architectures, is deep in the Unix philosophy, and appreciates big data design patterns, orthogonal code design, and functional computation models
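As referenced in the PySpark item above, a short sketch of large-scale text feature extraction with pyspark.ml: tokenize free text and hash it into term-frequency vectors, a common first step in text pipelines. The sample documents are made up.

```python
# Tokenize free text and hash it into term-frequency vectors with pyspark.ml.
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, HashingTF

spark = SparkSession.builder.appName("text-features").getOrCreate()
docs = spark.createDataFrame(
    [(0, "fraud detected in transaction"), (1, "normal purchase at store")],
    ["id", "text"],
)

tokens = Tokenizer(inputCol="text", outputCol="words").transform(docs)
features = (HashingTF(inputCol="words", outputCol="tf", numFeatures=1 << 18)
            .transform(tokens))
features.select("id", "tf").show(truncate=False)
```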