Subodh Popalwar, Software Engineer, Memorres
About Intelliswift Software
We are hiring a Data Scientist in Bangalore.
- ML programming
- Model Deployment
- Experience processing unstructured data and building NLP models
- Experience with big data tools such as PySpark
- Pipeline orchestration using Airflow and model deployment experience is preferred
- Big data developer with 8+ years of professional IT experience with expertise in Hadoop ecosystem components in ingestion, Data modeling, querying, processing, storage, analysis, Data Integration and Implementing enterprise level systems spanning Big Data.
- A skilled developer with strong problem solving, debugging and analytical capabilities, who actively engages in understanding customer requirements.
- Expertise in Apache Hadoop ecosystem components like Spark, Hadoop Distributed File System (HDFS), MapReduce, Hive, Sqoop, HBase, ZooKeeper, YARN, Flume, Pig, NiFi, Scala and Oozie.
- Hands-on experience in creating real-time data streaming solutions using Apache Spark Core, Spark SQL & DataFrames, Kafka, Spark Streaming and Apache Storm.
- Excellent knowledge of Hadoop architecture and the daemons of Hadoop clusters, which include NameNode, DataNode, ResourceManager, NodeManager and JobHistory Server.
- Worked with both the Cloudera and Hortonworks Hadoop distributions. Experience in managing Hadoop clusters using the Cloudera Manager tool.
- Well versed in installation, configuration and management of Big Data and the underlying infrastructure of Hadoop clusters.
- Hands-on experience in coding MapReduce/YARN programs using Java, Scala and Python for analyzing Big Data.
- Exposure to Cloudera development environment and management using Cloudera Manager.
- Worked extensively with Spark using Scala on a cluster for computational analytics; installed it on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
- Implemented Spark using Python, utilizing DataFrames and the Spark SQL API for faster data processing; handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
- Hands-on experience with Spark MLlib, used for predictive intelligence, customer segmentation and smooth maintenance in Spark Streaming.
- Experience in using Flume to load log files into HDFS and Oozie for workflow design and scheduling.
- Experience in optimizing MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on creating data pipelines for different events to ingest, aggregate and load consumer response data into Hive external tables in an HDFS location, serving as a feed for Tableau dashboards.
- Hands on experience in using Sqoop to import data into HDFS from RDBMS and vice-versa.
- In-depth Understanding of Oozie to schedule all Hive/Sqoop/HBase jobs.
- Hands on expertise in real time analytics with Apache Spark.
- Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala and Python.
- Extensive experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS).
- Experience with Microsoft cloud and with setting up clusters on Amazon EC2 & S3, including automating the setup and extension of clusters on AWS.
- Worked extensively with Spark using Python on a cluster for computational analytics; installed it on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL.
- Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
- Knowledge of installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).
- Experienced in writing ad hoc queries using Cloudera Impala; also used Impala analytical functions.
- Experience in creating DataFrames using PySpark and performing operations on the DataFrames using Python.
- In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS and MapReduce Programming Paradigm, High Availability and YARN architecture.
- Established multiple connections to different Redshift clusters (Bank Prod, Card Prod, SBBDA Cluster) and provided access for pulling the information needed for analysis.
- Generated various kinds of knowledge reports using Power BI based on Business specification.
- Developed interactive Tableau dashboards to provide a clear understanding of industry specific KPIs using quick filters and parameters to handle them more efficiently.
- Well experienced in projects using JIRA, testing, and the Maven and Jenkins build tools.
- Experienced in designing, building, deploying and utilizing almost all of the AWS stack (including EC2 and S3), focusing on high availability, fault tolerance and auto-scaling.
- Good experience with use-case development, with Software methodologies like Agile and Waterfall.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Good working experience in importing data using Sqoop and SFTP from various sources like RDBMS, Teradata, Mainframes, Oracle and Netezza to HDFS, and in performing transformations on it using Hive, Pig and Spark.
- Extensive experience in Text Analytics, developing different Statistical Machine Learning solutions to various business problems and generating data visualizations using Python and R.
- Proficient in NoSQL databases including HBase, Cassandra and MongoDB, and their integration with Hadoop clusters.
- Hands on experience in Hadoop Big data technology working on MapReduce, Pig, Hive as Analysis tool, Sqoop and Flume data import/export tools.
Intuitive cloud (www.intuitive.cloud) is one of the fastest growing top-tier Cloud Solutions and SDx Engineering solution and service companies, supporting 80+ Global Enterprise Customers across the Americas, Europe and the Middle East.
Intuitive is a recognized professional and managed service partner for core superpowers in cloud (public/hybrid), security, GRC, DevSecOps, SRE, application modernization/containers/K8-as-a-service and cloud application delivery.
- 9+ years’ experience as a data engineer.
- Must have 4+ years of experience implementing data engineering solutions with Databricks.
- This is a hands-on role building data pipelines using Databricks. Hands-on technical experience with Apache Spark.
- Must have deep expertise in one of the programming languages for data processes (Python, Scala). Experience with Python, PySpark, Hadoop, Hive and/or Spark to write data pipelines and data processing layers
- Must have worked with relational databases like Snowflake. Good SQL experience for writing complex SQL transformations.
- Performance tuning of Spark SQL running on S3/Data Lake/Delta Lake storage, and strong knowledge of Databricks and cluster configurations.
- Hands on architectural experience
- Nice to have Databricks administration including security and infrastructure features of Databricks.
Mid / Senior Big Data Engineer
Role: Big Data Engineer
Number of open positions: 5
Location: Pune
At Clairvoyant, we're building a thriving big data practice to help enterprises enable and accelerate the adoption of big data and cloud services. In the big data space, we lead and serve as innovators, troubleshooters, and enablers. The big data practice at Clairvoyant focuses on solving our customers' business problems by delivering products designed with best-in-class engineering practices and a commitment to keeping the total cost of ownership to a minimum.
- 4-10 years of experience in software development.
- At least 2 years of relevant work experience on large scale Data applications.
- Strong coding experience in Java is mandatory
- Good aptitude, strong problem solving abilities, and analytical skills, ability to take ownership as appropriate
- Should be able to do coding, debugging, performance tuning and deploying the apps to Prod.
- Should have good working experience with:
  - Hadoop ecosystem (HDFS, Hive, YARN, file formats like Avro/Parquet)
  - Kafka
  - J2EE frameworks (Spring/Hibernate/REST)
  - Spark Streaming or any other streaming technology
- Ability to work on the sprint stories to completion along with Unit test case coverage.
- Experience working in Agile Methodology
- Excellent communication and coordination skills
- Knowledge of (and preferably hands-on experience with) UNIX environments and different continuous integration tools.
- Must be able to integrate quickly into the team and work independently towards team goals
- Take the complete responsibility of the sprint stories' execution
- Be accountable for the delivery of the tasks in the defined timelines with good quality.
- Follow the processes for project execution and delivery.
- Follow agile methodology
- Work with the team lead closely and contribute to the smooth delivery of the project.
- Understand/define the architecture and discuss the pros-cons of the same with the team
- Participate in brainstorming sessions and suggest improvements to the architecture/design.
- Work with other team leads to get the architecture/design reviewed.
- Work with the clients and counterparts (in the US) of the project.
- Keep all the stakeholders updated about the project/task status/risks/issues if there are any.
Experience: 4 to 9 years
Keywords: java, scala, spark, software development, hadoop, hive
Design, development and deployment of highly-available and fault-tolerant enterprise business software at scale.
Demonstrate tech expertise to go very deep or broad in solving classes of problems or creating broadly leverage-able solutions.
Execute large-scale projects - Provide technical leadership in architecting and building product solutions.
Collaborate across teams to deliver a result, from hardworking team members within your group, through smart technologists across lines of business.
Be a role model on acting with good judgment and responsibility, helping teams to commit and move forward.
Be a humble mentor and trusted advisor for both our talented team members and passionate leaders alike. Deal with differences in opinion in a mature and fair way.
Raise the bar by improving standard methodologies, producing best-in-class efficient solutions, code, documentation, testing, and monitoring.
• 15+ years of relevant engineering experience.
• Proven record of building and productionizing highly reliable products at scale.
• Experience with Java and Python.
• Experience with Big Data technologies is a plus.
• Ability to assess new technologies and make pragmatic choices that help guide us towards a long-term vision.
• Can collaborate well with several other engineering orgs to articulate requirements and system design.
• Team player!
• Great interpersonal skills, deep technical ability, and a portfolio of successful execution.
• Excellent written and verbal communication skills, including the ability to write detailed technical documents.
• Passionate about helping teams grow by inspiring and mentoring engineers.
- Extract and present valuable information from data
- Understand business requirements and generate insights
- Build mathematical models, validate and work with them
- Explain complex topics tailored to the audience
- Validate and follow up on results
- Work with large and complex data sets
- Establish priorities with clear goals and responsibilities to achieve a high level of performance.
- Work in an agile and iterative manner on solving problems
- Proactively evaluate different options and solve problems in innovative ways. Develop new solutions or combine existing methods to create new approaches.
- Good understanding of Digital & analytics
- Strong communication skills, both oral and written
As a Data Scientist, you will work in collaboration with our business and engineering people, on creating value from data. Often the work requires solving complex problems by turning vast amounts of data into business insights through advanced analytics, modeling, and machine learning. You have a strong foundation in analytics, mathematical modeling, computer science, and math - coupled with a strong business sense. You proactively fetch information from various sources and analyze it for better understanding of how the business performs. Furthermore, you model and build AI tools that automate certain processes within the company. The solutions produced will be implemented to impact business results.
- Develop an understanding of business obstacles, create solutions based on advanced analytics and draw implications for model development
- Combine, explore, and draw insights from data. Often large and complex data assets from different parts of the business.
- Design and build exploratory, predictive, or prescriptive models, utilizing optimization, simulation, and machine learning techniques
- Prototype and pilot new solutions and help ‘productize’ those valuable solutions that can have an impact at a global scale
- Guide and coach other chapter colleagues to help solve data/technical problems at an operational level, and in methodologies to help improve development processes
- Identify and interpret trends and patterns in complex data sets to enable the business to make data-driven decisions
- Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions: Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hubs/IoT Hub.
- Experience in migrating on-premises data warehouses to data platforms on the Azure cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
- Experience with Azure Analysis Services
- Experience in Power BI
- Experience with third-party solutions like Attunity/StreamSets and Informatica
- Experience with pre-sales activities (responding to RFPs, executing quick POCs)
- Capacity Planning and Performance Tuning on Azure Stack and Spark.