About Us:
Small businesses are the backbone of the US economy, comprising almost half of the GDP and the private workforce. Yet, big banks don’t provide the access, assistance and modern tools that owners need to successfully grow their business.
We started Novo to challenge the status quo—we’re on a mission to increase the GDP of the modern entrepreneur by creating the go-to banking platform for small businesses (SMBs). Novo is flipping the script of the banking world, and we’re excited to lead the small business banking revolution.
At Novo, we’re here to help entrepreneurs, freelancers, startups and SMBs achieve their financial goals by empowering them with an operating system that makes business banking as easy as iOS. We developed modern bank accounts and tools to help to save time and increase cash flow. Our unique product integrations enable easy access to tracking payments, transferring money internationally, managing business transactions and more. We’ve made a big impact in a short amount of time, helping thousands of organizations access powerfully simple business banking.
We are looking for a Senior Data Scientist who is enthusiastic about using data and technology to solve complex business problems. If you're passionate about leading and helping to architect and develop thoughtful data solutions, then we want to chat. Are you ready to revolutionize the small business banking industry with us?
About the Role: (specific to the role-- describe the role activities/duties, who they interact with, what they are accountable for, how the role operates in the team, department and organization)
- Build and manage predictive models focussed on credit risk, fraud, conversions, churn, consumer behaviour etc
- Provides best practices, direction for data analytics and business decision making across multiple projects and functional areas
- Implements performance optimizations and best practices for scalable data models, pipelines and modelling
- Resolve blockers and help the team stay productive
- Take part in building the team and iterating on hiring processes
Requirements for the Role: (these are specific to the role-- technical skills and requirements to fulfill the job duties, certifications, years of experience, degree)
- 4+ years of experience in data science roles focussed on managing data processes, modelling and dashboarding
- Strong experience in python, SQL and in-depth understanding of modelling techniques
- Experience working with Pandas, scikit learn, visualization libraries like plotly, bokeh etc.
- Prior experience with credit risk modelling will be preferred
- Deep Knowledge of Python to write scripts to manipulate data and generate automated reports
How We Define Success: (these are specific to the role-- should be tied to performance management, OKRs or general goals)
- Expand access to data driven decision making across the organization
- Solve problems in risk, marketing, growth, customer behaviour through analytics models that increase efficacy
Nice To Have, but Not Required:
- Experience in dashboarding libraries like Python Dash and exposure to CI/CD
- Exposure to big data tools like Spark, and some core tech knowledge around API’s, data streaming etc.
Novo values diversity as a core tenant of the work we do and the businesses we serve. We are an equal opportunity employer, indiscriminate of race, religion, ethnicity, national origin, citizenship, gender, gender identity, sexual orientation, age, veteran status, disability, genetic information or any other protected characteristic.
Similar jobs
Daily and monthly responsibilities
- Review and coordinate with business application teams on data delivery requirements.
- Develop estimation and proposed delivery schedules in coordination with development team.
- Develop sourcing and data delivery designs.
- Review data model, metadata and delivery criteria for solution.
- Review and coordinate with team on test criteria and performance of testing.
- Contribute to the design, development and completion of project deliverables.
- Complete in-depth data analysis and contribution to strategic efforts
- Complete understanding of how we manage data with focus on improvement of how data is sourced and managed across multiple business areas.
Basic Qualifications
- Bachelor’s degree.
- 5+ years of data analysis working with business data initiatives.
- Knowledge of Structured Query Language (SQL) and use in data access and analysis.
- Proficient in data management including data analytical capability.
- Excellent verbal and written communications also high attention to detail.
- Experience with Python.
- Presentation skills in demonstrating system design and data analysis solutions.
- 3+ years experience in practical implementation and deployment of ML based systems preferred.
- BE/B Tech or M Tech (preferred) in CS/Engineering with strong mathematical/statistical background
- Strong mathematical and analytical skills, especially statistical and ML techniques, with familiarity with different supervised and unsupervised learning algorithms
- Implementation experiences and deep knowledge of Classification, Time Series Analysis, Pattern Recognition, Reinforcement Learning, Deep Learning, Dynamic Programming and Optimisation
- Experience in working on modeling graph structures related to spatiotemporal systems
- Programming skills in Python
- Experience in developing and deploying on cloud (AWS or Google or Azure)
- Good verbal and written communication skills
- Familiarity with well-known ML frameworks such as Pandas, Keras, TensorFlow
- Big data developer with 8+ years of professional IT experience with expertise in Hadoop ecosystem components in ingestion, Data modeling, querying, processing, storage, analysis, Data Integration and Implementing enterprise level systems spanning Big Data.
- A skilled developer with strong problem solving, debugging and analytical capabilities, who actively engages in understanding customer requirements.
- Expertise in Apache Hadoop ecosystem components like Spark, Hadoop Distributed File Systems(HDFS), HiveMapReduce, Hive, Sqoop, HBase, Zookeeper, YARN, Flume, Pig, Nifi, Scala and Oozie.
- Hands on experience in creating real - time data streaming solutions using Apache Spark core, Spark SQL & DataFrames, Kafka, Spark streaming and Apache Storm.
- Excellent knowledge of Hadoop architecture and daemons of Hadoop clusters, which include Name node,Data node, Resource manager, Node Manager and Job history server.
- Worked on both Cloudera and Horton works in Hadoop Distributions. Experience in managing Hadoop clustersusing Cloudera Manager tool.
- Well versed in installation, Configuration, Managing of Big Data and underlying infrastructure of Hadoop Cluster.
- Hands on experience in coding MapReduce/Yarn Programs using Java, Scala and Python for analyzing Big Data.
- Exposure to Cloudera development environment and management using Cloudera Manager.
- Extensively worked on Spark using Scala on cluster for computational (analytics), installed it on top of Hadoop performed advanced analytical application by making use of Spark with Hive and SQL/Oracle .
- Implemented Spark using PYTHON and utilizing Data frames and Spark SQL API for faster processing of data and handled importing data from different data sources into HDFS using Sqoop and performing transformations using Hive, MapReduce and then loading data into HDFS.
- Used Spark Data Frames API over Cloudera platform to perform analytics on Hive data.
- Hands on experience in MLlib from Spark which are used for predictive intelligence, customer segmentation and for smooth maintenance in Spark streaming.
- Experience in using Flume to load log files into HDFS and Oozie for workflow design and scheduling.
- Experience in optimizing MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Working on creating data pipeline for different events of ingestion, aggregation, and load consumer response data into Hive external tables in HDFS location to serve as feed for tableau dashboards.
- Hands on experience in using Sqoop to import data into HDFS from RDBMS and vice-versa.
- In-depth Understanding of Oozie to schedule all Hive/Sqoop/HBase jobs.
- Hands on expertise in real time analytics with Apache Spark.
- Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala and Python.
- Extensive experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS).
- Experience in Microsoft cloud and setting cluster in Amazon EC2 & S3 including the automation of setting & extending the clusters in AWS Amazon cloud.
- Extensively worked on Spark using Python on cluster for computational (analytics), installed it on top of Hadoop performed advanced analytical application by making use of Spark with Hive and SQL.
- Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
- Knowledge in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4) distributions and on Amazon web services (AWS).
- Experienced in writing Ad Hoc queries using Cloudera Impala, also used Impala analytical functions.
- Experience in creating Data frames using PySpark and performing operation on the Data frames using Python.
- In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS and MapReduce Programming Paradigm, High Availability and YARN architecture.
- Establishing multiple connections to different Redshift clusters (Bank Prod, Card Prod, SBBDA Cluster) and provide the access for pulling the information we need for analysis.
- Generated various kinds of knowledge reports using Power BI based on Business specification.
- Developed interactive Tableau dashboards to provide a clear understanding of industry specific KPIs using quick filters and parameters to handle them more efficiently.
- Well Experience in projects using JIRA, Testing, Maven and Jenkins build tools.
- Experienced in designing, built, and deploying and utilizing almost all the AWS stack (Including EC2, S3,), focusing on high-availability, fault tolerance, and auto-scaling.
- Good experience with use-case development, with Software methodologies like Agile and Waterfall.
- Working knowledge of Amazon's Elastic Cloud Compute( EC2 ) infrastructure for computational tasks and Simple Storage Service ( S3 ) as Storage mechanism.
- Good working experience in importing data using Sqoop, SFTP from various sources like RDMS, Teradata, Mainframes, Oracle, Netezza to HDFS and performed transformations on it using Hive, Pig and Spark .
- Extensive experience in Text Analytics, developing different Statistical Machine Learning solutions to various business problems and generating data visualizations using Python and R.
- Proficient in NoSQL databases including HBase, Cassandra, MongoDB and its integration with Hadoop cluster.
- Hands on experience in Hadoop Big data technology working on MapReduce, Pig, Hive as Analysis tool, Sqoop and Flume data import/export tools.
TOP 3 SKILLS
Python (Language)
Spark Framework
Spark Streaming
Docker/Jenkins/ Spinakar
AWS
Hive Queries
He/She should be good coder.
Preff: - Airflow
Must have experience: -
Python
Spark framework and streaming
exposure to Machine Learning Lifecycle is mandatory.
Project:
This is searching domain project. Any searching activity which is happening on website this team create the model for the same, they create sorting/scored model for any search. This is done by the data
scientist This team is working more on the streaming side of data, the candidate would work extensively on Spark streaming and there will be a lot of work in Machine Learning.
INTERVIEW INFORMATION
3-4 rounds.
1st round based on data engineering batching experience.
2nd round based on data engineering streaming experience.
3rd round based on ML lifecycle (3rd round can be a techno-functional round based on previous
feedbacks otherwise 4th round will be a functional round if required.
● Create and maintain optimal data pipeline architecture.
● Assemble large, complex data sets that meet functional / non-functional
business requirements.
● Building and optimizing ‘big data’ data pipelines, architectures and data sets.
● Maintain, organize & automate data processes for various use cases.
● Identifying trends, doing follow-up analysis, preparing visualizations.
● Creating daily, weekly and monthly reports of product KPIs.
● Create informative, actionable and repeatable reporting that highlights
relevant business trends and opportunities for improvement.
Required Skills And Experience:
● 2-5 years of work experience in data analytics- including analyzing large data sets.
● BTech in Mathematics/Computer Science
● Strong analytical, quantitative and data interpretation skills.
● Hands-on experience with Python, Apache Spark, Hadoop, NoSQL
databases(MongoDB preferred), Linux is a must.
● Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.
● Experience with Google Cloud Data Analytics Products such as BigQuery, Dataflow, Dataproc etc. (or similar cloud-based platforms).
● Experience working within a Linux computing environment, and use of
command-line tools including knowledge of shell/Python scripting for
automating common tasks.
● Previous experience working at startups and/or in fast-paced environments.
● Previous experience as a data engineer or in a similar role.
• Working Knowledge of XML, JSON, Shell and other DBMS scripts
• Hands on Experience on Oracle 11G,12c. Working knowledge of Oracle 18 and 19c
• Analysis, design, coding, testing, debugging and documentation. Complete knowledge of
Software Development Life Cycle (SDLC).
• Writing Complex Queries, stored procedures, functions and packages
• Knowledge of REST Services, UTL functions, DBMS functions and data integration is required
• Good knowledge on table level partitions, row locks and experience in OLTP.
• Should be aware about ETL tools, Data Migration, Data Mapping functionalities
• Understand the business requirement, transform/design the same into business solutions.
Perform data modelling and implement the business rules using Oracle database objects.
• Define source to target data mapping and data transformation logic as per the business
need.
• Should have worked on Materialised views creation and maintenance. Experience in
Performance tuning, impact analysis required
• Monitoring and optimizing the performance of the database. Planning for backup and
recovery of database information. Maintaining archived data. Backing up and restoring
databases.
• Hands on Experience on SQL Developer
Data Platform engineering at Uber is looking for a strong Technical Lead (Level 5a Engineer) who has built high quality platforms and services that can operate at scale. 5a Engineer at Uber exhibits following qualities:
- Demonstrate tech expertise › Demonstrate technical skills to go very deep or broad in solving classes of problems or creating broadly leverageable solutions.
- Execute large scale projects › Define, plan and execute complex and impactful projects. You communicate the vision to peers and stakeholders.
- Collaborate across teams › Domain resource to engineers outside your team and help them leverage the right solutions. Facilitate technical discussions and drive to a consensus.
- Coach engineers › Coach and mentor less experienced engineers and deeply invest in their learning and success. You give and solicit feedback, both positive and negative, to others you work with to help improve the entire team.
- Tech leadership › Lead the effort to define the best practices in your immediate team, and help the broader organization establish better technical or business processes.
What You’ll Do
- Build a scalable, reliable, operable and performant data analytics platform for Uber’s engineers, data scientists, products and operations teams.
- Work alongside the pioneers of big data systems such as Hive, Yarn, Spark, Presto, Kafka, Flink to build out a highly reliable, performant, easy to use software system for Uber’s planet scale of data.
- Become proficient of multi-tenancy, resource isolation, abuse prevention, self-serve debuggability aspects of a high performant, large scale, service while building these capabilities for Uber's engineers and operation folks.
What You’ll Need
- 7+ years experience in building large scale products, data platforms, distributed systems in a high caliber environment.
- Architecture: Identify and solve major architectural problems by going deep in your field or broad across different teams. Extend, improve, or, when needed, build solutions to address architectural gaps or technical debt.
- Software Engineering/Programming: Create frameworks and abstractions that are reliable and reusable. advanced knowledge of at least one programming language, and are happy to learn more. Our core languages are Java, Python, Go, and Scala.
- Data Engineering: Expertise in one of the big data analytics technologies we currently use such as Apache Hadoop (HDFS and YARN), Apache Hive, Impala, Drill, Spark, Tez, Presto, Calcite, Parquet, Arrow etc. Under the hood experience with similar systems such as Vertica, Apache Impala, Drill, Google Borg, Google BigQuery, Amazon EMR, Amazon RedShift, Docker, Kubernetes, Mesos etc.
- Execution & Results: You tackle large technical projects/problems that are not clearly defined. You anticipate roadblocks and have strategies to de-risk timelines. You orchestrate work that spans multiple teams and keep your stakeholders informed.
- A team player: You believe that you can achieve more on a team that the whole is greater than the sum of its parts. You rely on others’ candid feedback for continuous improvement.
- Business acumen: You understand requirements beyond the written word. Whether you’re working on an API used by other developers, an internal tool consumed by our operation teams, or a feature used by millions of customers, your attention to details leads to a delightful user experience.