Responsibilities:
- Designing and implementing fine-tuned, production-ready data/ML pipelines on the Hadoop platform.
- Driving optimization, testing and tooling to improve quality.
- Reviewing and approving high-level and detailed design to ensure that the solution delivers to the business needs and aligns to the data & analytics architecture principles and roadmap.
- Understanding business requirements and solution design to develop and implement solutions that adhere to big data architectural guidelines and address business requirements.
- Following proper SDLC (Code review, sprint process).
- Identifying, designing, and implementing internal process improvements: automating manual processes, optimizing data delivery, etc.
- Building robust and scalable data infrastructure (both batch processing and real-time) to support needs from internal and external users.
- Understanding various data security standards and using data security tools to apply and adhere to the required data controls for user access on the Hadoop platform.
- Supporting and contributing to development guidelines and standards for data ingestion.
- Working with the data science and business analytics teams to assist with data ingestion and data-related technical issues.
- Designing and documenting the development & deployment flow.
Requirements:
- Experience in developing REST API services using one of the Scala frameworks.
- Ability to troubleshoot and optimize complex queries on the Spark platform.
- Expertise in building and optimizing ‘big data’ data/ML pipelines, architectures and data sets.
- Knowledge of modelling unstructured data into structured data designs.
- Experience in Big Data access and storage techniques.
- Experience in doing cost estimation based on the design and development.
- Excellent debugging skills for the technical stack mentioned above, including analyzing server logs and application logs.
- Highly organized, self-motivated, and proactive, with the ability to propose the best design solutions.
- Good time management and multitasking skills to work to deadlines by working independently and as a part of a team.
About Information Solution Provider Company
Similar jobs
About Databook:
- Great salespeople let their customers’ strategies do the talking.
Databook’s award-winning Strategic Relationship Management (SRM) platform uses advanced AI and NLP to empower the world’s largest B2B sales teams to create, manage, and maintain strategic relationships at scale. The platform ingests and interprets billions of financial and market data signals to generate actionable sales strategies that connect the seller’s solutions to a buyer’s financial pain and urgency.
The Opportunity
We're seeking Junior Engineers to support and develop Databook’s capabilities. Working closely with our seasoned engineers, you'll contribute to crafting new features and ensuring our platform's reliability. If you're eager about playing a part in building the future of customer intelligence, with a keen eye towards quality, we'd love to meet you!
Specifically, you'll
- Participate in various stages of the engineering lifecycle alongside our experienced engineers.
- Assist in maintaining and enhancing features of the Databook platform.
- Collaborate with various teams to comprehend requirements and aid in implementing technology solutions.
Please note: As you progress and grow with us, you might be introduced to on-call rotations to handle any platform challenges.
Working Arrangements:
- This position offers a hybrid work mode, allowing employees to work both remotely and in-office as mutually agreed upon.
What we're looking for
- 1-2+ years of experience as a Data Engineer
- Bachelor's degree in Engineering
- Willingness to work across different time zones
- Ability to work independently
- Knowledge of cloud (AWS or Azure)
- Exposure to distributed systems such as Spark, Flink or Kafka
- Fundamental knowledge of data modeling and optimizations
- Minimum of one year of experience using Python working as a Software Engineer
- Knowledge of SQL (Postgres) databases would be beneficial
- Experience building analytics dashboards
- Familiarity with RESTful APIs and/or GraphQL is welcome
- Hands-on experience with NumPy, pandas, and spaCy would be a plus
- Exposure to or working experience with GenAI (LLMs in general) and LLMOps would be a plus
- Highly fluent in both spoken and written English
Ideal candidates will also have:
- Self-motivation and great organizational skills.
- Ability to focus on small and subtle details.
- Willingness to learn and adapt in a rapidly changing environment.
- Excellent written and oral communication skills.
Join us and enjoy these perks!
- Competitive salary with bonus
- Medical insurance coverage
- 5 weeks leave plus public holidays
- Employee referral bonus program
- Annual learning stipend to spend on books, courses or other training materials that help you develop skills relevant to your role or professional development
- Complimentary subscription to Masterclass
- Core Java: advanced level competency; should have worked on projects involving core Java development.
- Linux shell: advanced level competency, with work experience in Linux shell scripting and knowledge of important shell commands.
- RDBMS, SQL: advanced level competency; should have expertise in SQL query syntax and be well versed in aggregations and joins.
- Data structures and problem solving: should be able to choose an appropriate data structure.
- AWS cloud: good to have experience with the AWS serverless toolset along with AWS infrastructure.
- Data engineering ecosystem: good to have experience and knowledge of data engineering, ETL, and data warehousing (any toolset).
- Hadoop, HDFS, YARN: should have an introduction to the internal workings of these toolsets.
- Hive, MapReduce, Spark: good to have experience developing transformations using Hive queries, MapReduce jobs, and Spark jobs. Spark implementation in Scala is a plus.
- Airflow, Oozie, Sqoop, ZooKeeper, Kafka: good to have knowledge of the purpose and working of these tools. Working experience is a plus.
XpressBees – a logistics company started in 2015 – is amongst the fastest growing
companies of its sector. While we started off rather humbly in the space of
ecommerce B2C logistics, the last 5 years have seen us steadily progress towards
expanding our presence. Our vision to evolve into a strong full-service logistics
organization reflects itself in our new lines of business like 3PL, B2B Xpress and cross
border operations. Our strong domain expertise and constant focus on meaningful
innovation have helped us rapidly evolve as the most trusted logistics partner of
India. We have progressively carved our way towards best-in-class technology
platforms, an extensive network reach, and a seamless last mile management
system. While on this aggressive growth path, we seek to become the one-stop-shop
for end-to-end logistics solutions. Our big focus areas for the very near future
include strengthening our presence as service providers of choice and leveraging the
power of technology to improve efficiencies for our clients.
Job Profile
As a Lead Data Engineer in the Data Platform Team at XpressBees, you will build the data platform
and infrastructure to support high quality and agile decision-making in our supply chain and logistics
workflows.
You will define the way we collect and operationalize data (structured / unstructured), and
build production pipelines for our machine learning models and for our (RT, NRT, Batch) reporting &
dashboarding requirements. As a Senior Data Engineer in the XB Data Platform Team, you will use
your experience with modern cloud and data frameworks to build products (with storage and serving
systems) that drive optimisation and resilience in the supply chain via data visibility, intelligent
decision making, insights, anomaly detection and prediction.
What You Will Do
• Design and develop data platform and data pipelines for reporting, dashboarding and
machine learning models. These pipelines would productionize machine learning models
and integrate with agent review tools.
• Meet the data completeness, correctness and freshness requirements.
• Evaluate and identify the data store and data streaming technology choices.
• Lead the design of the logical model and implement the physical model to support
business needs. Come up with logical and physical database designs across platforms (MPP,
MR, Hive/Pig) that are optimal for different use cases (structured/semi-structured).
Envision and implement the optimal data modelling, physical design, and
performance optimization techniques required for the problem.
• Support your colleagues by reviewing code and designs.
• Diagnose and solve issues in our existing data pipelines and envision and build their
successors.
Qualifications & Experience relevant for the role
• A bachelor's degree in Computer Science or related field with 6 to 9 years of technology
experience.
• Knowledge of Relational and NoSQL data stores, stream processing and micro-batching to
make technology & design choices.
• Strong experience in System Integration, Application Development, ETL, Data-Platform
projects. Talented across technologies used in the enterprise space.
• Software development experience using:
• Expertise in relational and dimensional modelling
• Exposure across the entire SDLC process
• Experience in cloud architecture (AWS)
• Proven track record in keeping existing technical skills current and developing new ones, so that
you can make strong contributions to deep architecture discussions around systems and
applications in the cloud (AWS).
• Characteristics of a forward thinker and self-starter who flourishes with new challenges
and adapts quickly to new knowledge
• Ability to work with cross-functional teams of consulting professionals across multiple
projects.
• Knack for helping an organization to understand application architectures and integration
approaches, to architect advanced cloud-based solutions, and to help launch the build-out
of those systems
• Passion for educating, training, designing, and building end-to-end systems.
- Hands-on experience in any Cloud Platform
- Microsoft Azure Experience
- Developing telemetry software to connect Junos devices to the cloud
- Fast prototyping and laying the SW foundation for product solutions
- Moving prototype solutions to a production cloud multitenant SaaS solution
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources
- Build analytics tools that utilize the data pipeline to provide significant insights into customer acquisition, operational efficiency and other key business performance metrics.
- Work with partners including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics specialists to strive for greater functionality in our data systems.
Qualification and Desired Experiences
- Master's degree in Computer Science, Electrical Engineering, Statistics, Applied Math or equivalent fields, with a strong mathematical background
- 5+ years of experience building data pipelines for data science-driven solutions
- Strong hands-on coding skills (preferably in Python) for processing large-scale data sets and developing machine learning models
- Familiarity with one or more machine learning or statistical modeling tools such as NumPy, scikit-learn, MLlib, TensorFlow
- Good team player with excellent interpersonal, written, verbal and presentation skills
- Create and maintain optimal data pipeline architecture.
- Assemble large, sophisticated data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Experience with AWS, S3, Flink, Spark, Kafka, Elasticsearch
- Previous work in a start-up environment
- 3+ years of experience building data pipelines for data science-driven solutions
- Master's degree in Computer Science, Electrical Engineering, Statistics, Applied Math or equivalent fields, with a strong mathematical background
- We are looking for a candidate with 9+ years of experience in a Data Engineer role, who has attained a Graduate degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field. They should also have experience using the following software/tools:
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift
- Experience with stream-processing systems: Storm, Spark Streaming, etc.
- Experience with object-oriented/object function scripting languages: Python, Java, C++, Scala, etc.
- Strong hands-on coding skills (preferably in Python) for processing large-scale data sets and developing machine learning models
- Familiarity with one or more machine learning or statistical modeling tools such as NumPy, scikit-learn, MLlib, TensorFlow
- Advanced working SQL knowledge and experience working with relational databases, query authoring (SQL) as well as working familiarity with a variety of databases.
- Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and find opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets.
- Build processes supporting data transformation, data structures, metadata, dependency and workload management.
- A successful history of manipulating, processing and extracting value from large disconnected datasets.
- Proven understanding of message queuing, stream processing, and highly scalable ‘big data’ data stores.
- Strong project management and interpersonal skills.
- Experience supporting and working with multi-functional teams in a multidimensional environment.
- You'd have to set up your own shop, work with design customers to find generalizable use cases, and build them out.
- Ability to collaborate with cross-functional teams to build and ship new features
- At least 2-5 years of experience
- Predictive Analytics – Machine Learning Algorithms, Logistic & Linear Regression, Decision Tree, Clustering.
- Exploratory Data Analysis – Data Preparation, Data Exploration, and Data Visualization.
- Analytics Tools – R, Python, SQL, Power BI, MS Excel.
Work days: Sun-Thu
Day shift
- 3-5 years of practical data science (DS) experience working with varied data sets. Experience in retail banking is preferred but not necessary.
- Need to be strong in concepts of statistical modelling – particularly looking for practical knowledge learnt from work experience (should be able to give "rule of thumb" answers)
- Strong problem solving skills and the ability to articulate really well.
- Ideally, the data scientist should have interfaced with data engineering and model deployment teams to bring models / solutions to "live" in production.
- Strong working knowledge of the Python ML stack is very important here.
- Willing to work on a diverse range of tasks in building ML-related capability on the Corridor Platform as well as client work.
- Someone with a strong interest in the data engineering aspect of ML is highly preferred, i.e. someone who can play the dual role of Data Scientist and engineer, writing robust code for a module on our Corridor Platform.
Structured ML techniques for candidates:
- GBM
- XGBoost
- Random Forest
- Neural Net
- Logistic Regression
(Hadoop, HDFS, Kafka, Spark, Hive)
Overall Experience - 8 to 12 years
Relevant exp on Big data - 3+ years in above
Salary: Up to 20 LPA
Job location - Chennai / Bangalore /
Notice Period - Immediate joiner / 15 to 20 days max
The Responsibilities of The Senior Data Engineer Are:
- Requirements gathering and assessment
- Break down complexity and translate requirements into specification artifacts and storyboards to build towards, using a test-driven approach
- Engineer scalable data pipelines using big data technologies including but not limited to Hadoop, HDFS, Kafka, HBase, Elastic
- Implement the pipelines using execution frameworks including but not limited to MapReduce, Spark, Hive, using Java/Scala/Python for application design.
- Mentoring juniors in a dynamic team setting
- Manage stakeholders with proactive communication upholding TheDataTeam's brand and values
A Candidate Must Have the Following Skills:
- Strong problem-solving ability
- Excellent software design and implementation ability
- Exposure and commitment to agile methodologies
- Detail oriented with willingness to proactively own software tasks as well as management tasks, and see them to completion with minimal guidance
- Minimum 8 years of experience
- Should have experience in full life-cycle of one big data application
- Strong understanding of various storage formats (ORC/Parquet/Avro)
- Should have hands-on experience in one of the Hadoop distributions (Hortonworks/Cloudera/MapR)
- Experience in at least one cloud environment (GCP/AWS/Azure)
- Should be well versed with at least one database (MySQL/Oracle/MongoDB/Postgres)
- Bachelor's in Computer Science, and preferably a Master's as well
- Should have good code review and debugging skills
Additional skills (Good to have):
- Experience in containerization (Docker/Heroku)
- Exposure to microservices
- Exposure to DevOps practices
- Experience in performance tuning of big data applications