Job description
Role : Lead Architecture (Spark, Scala, Big Data/Hadoop, Java)
Primary Location : India-Pune, Hyderabad
Experience : 7 - 12 Years
Management Level: 7
Joining Time: Immediate Joiners are preferred
- Attend requirements gathering workshops, estimation discussions, design meetings and status review meetings
- Experience in solution design and solution architecture for data engineering models, building and implementing Big Data projects both on-premises and in the cloud.
- Align architecture with business requirements and stabilize the developed solution.
- Ability to build prototypes to demonstrate the technical feasibility of your vision
- Professional experience facilitating and leading solution design, architecture and delivery planning activities for data intensive and high throughput platforms and applications
- Benchmark systems, analyze system bottlenecks, and propose solutions to eliminate them.
- Help programmers and project managers in the design, planning, and governance of projects of any kind.
- Develop, construct, test and maintain architectures and run Sprints for development and rollout of functionalities
- Data analysis and code-development experience, ideally in Big Data technologies: Spark, Hive, Hadoop, Java, Python, PySpark.
- Execute projects of various types, e.g. design, development, implementation, and migration of functional analytics models/business logic across architecture approaches.
- Work closely with Business Analysts to understand the core business problems and deliver efficient IT solutions for the product.
- Deploy sophisticated analytics programs on any cloud platform.
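The benchmarking and bottleneck-analysis duty listed above often starts with something as simple as timing candidate implementations against each other. A minimal sketch using Python's stdlib `timeit` (the function names here are illustrative, not from the posting):

```python
import timeit

def sum_loop(n):
    # Naive accumulation: an O(n) Python-level loop.
    total = 0
    for i in range(n):
        total += i
    return total

def sum_builtin(n):
    # Same result via the C-implemented builtin.
    return sum(range(n))

def benchmark(fn, n=100_000, repeats=5):
    # Best-of-N wall-clock time; taking the minimum reduces scheduler noise.
    return min(timeit.repeat(lambda: fn(n), number=10, repeat=repeats))

if __name__ == "__main__":
    for fn in (sum_loop, sum_builtin):
        print(f"{fn.__name__}: {benchmark(fn):.4f}s")
```

Comparing the two timings shows the kind of evidence a bottleneck analysis would present before proposing a fix.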
Perks and Benefits we Provide!
- Working with highly technical, passionate, mission-driven people
- Subsidized Meals & Snacks
- Flexible Schedule
- Approachable leadership
- Access to various learning tools and programs
- Pet Friendly
- Certification Reimbursement Policy
- Check out more about us on our website below!
www.datametica.com
About DataMetica
Similar jobs
Roles & Responsibilities:
- Adopt novel and breakthrough Deep Learning/Machine Learning technology to fully solve real-world problems for different industries.
- Develop prototypes of machine learning models based on existing research papers.
- Utilize published/existing models to meet business requirements. Tweak existing implementations to improve efficiencies and adapt for use-case variations.
- Optimize machine learning model training and inference time.
- Work closely with development and QA teams in transitioning prototypes to commercial products.
- Independently work end-to-end, from data collection and preparation/annotation to validation of outcomes.
- Define and develop ML infrastructure to improve the efficiency of ML development workflows.
Must Have:
- Experience in productizing and deploying ML solutions.
- AI/ML expertise areas: computer vision with deep learning. Experience with object detection, classification, and recognition; document layout and understanding tasks; OCR/ICR.
- Thorough understanding of the full ML pipeline, from data collection to model building to inference.
- Experience with Python, OpenCV, and at least a few frameworks/libraries (TensorFlow / Keras / PyTorch / spaCy / fastText / Scikit-learn, etc.).
- Years of relevant experience: 5+.
- Experience or knowledge in MLOps.
Good to Have:
- NLP: text classification, entity extraction, content summarization.
- AWS, Docker.
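The "full ML pipeline" requirement above, from data collection through inference, can be illustrated end-to-end even without a framework. A toy 1-nearest-neighbour classifier in plain Python, showing only the pipeline shape (in practice scikit-learn's `KNeighborsClassifier` or a deep model would be used; the data here is invented):

```python
import math

def euclidean(a, b):
    # Distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_1nn(train, query):
    # train: list of (feature_vector, label); returns the label of the
    # nearest training point -- the simplest possible "inference" step.
    nearest = min(train, key=lambda row: euclidean(row[0], query))
    return nearest[1]

# "Data collection / annotation" stands in for real labelled data here.
train = [((0.0, 0.0), "cat"), ((1.0, 1.0), "cat"), ((5.0, 5.0), "dog")]

if __name__ == "__main__":
    # "Inference": classify an unseen point.
    print(predict_1nn(train, (4.5, 5.2)))  # → dog
```

The same collect → model → infer shape holds whether the model is 1-NN or a deep network; only the middle step grows.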
- Extensive exposure to at least one Business Intelligence platform (ideally QlikView/Qlik Sense); failing Qlik, ETL tool knowledge, e.g. Informatica/Talend.
- At least one data query language: SQL/Python.
- Experience in creating breakthrough visualizations.
- Understanding of RDBMS, data architecture/schemas, data integrations, data models, and data flows is a must.
Role : Web Scraping Engineer
Experience : 2 to 3 Years
Job Location : Chennai
About OJ Commerce:
OJ Commerce (OJC), a rapidly expanding and profitable online retailer, is headquartered in Florida, USA, with a fully-functional office in Chennai, India. We deliver exceptional value to our customers by harnessing cutting-edge technology, fostering innovation, and establishing strategic brand partnerships to enable a seamless, enjoyable shopping experience featuring high-quality products at unbeatable prices. Our advanced, data-driven system streamlines operations with minimal human intervention.
Our extensive product portfolio encompasses over a million SKUs and more than 2,500 brands across eight primary categories. With a robust presence on major platforms such as Amazon, Walmart, Wayfair, Home Depot, and eBay, we directly serve consumers in the United States.
As we continue to forge new partner relationships, our flagship website, www.ojcommerce.com, has rapidly emerged as a top-performing e-commerce channel, catering to millions of customers annually.
Job Summary:
We are seeking a Web Scraping Engineer and Data Extraction Specialist who will play a crucial role in our data acquisition and management processes. The ideal candidate will be proficient in developing and maintaining efficient web crawlers capable of extracting data from large websites and storing it in a database. Strong expertise in Python, web crawling, and data extraction, along with familiarity with popular crawling tools and modules, is essential. Additionally, the candidate should demonstrate the ability to effectively utilize API tools for testing and retrieving data from various sources. Join our team and contribute to our data-driven success!
Responsibilities:
- Develop and maintain web crawlers in Python.
- Crawl large websites and extract data.
- Store data in a database.
- Analyze and report on data.
- Work with other engineers to develop and improve our web crawling infrastructure.
- Stay up to date on the latest crawling tools and techniques.
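The crawl → extract → store flow in the responsibilities above can be sketched with the stdlib alone. This is a hedged illustration, not OJC's actual stack: `html.parser` stands in for Scrapy/Beautiful Soup, an in-memory SQLite table stands in for the production database, and the HTML is a hard-coded sample rather than an HTTP fetch:

```python
import sqlite3
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags (the 'extract data' step)."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def store_links(links, conn):
    # "Store data in a database" step, using a SQLite table.
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT)")
    conn.executemany("INSERT INTO pages VALUES (?)", [(u,) for u in links])
    conn.commit()

if __name__ == "__main__":
    html = '<html><body><a href="/p/1">one</a> <a href="/p/2">two</a></body></html>'
    parser = LinkExtractor()
    parser.feed(html)  # in a real crawler, html comes from an HTTP fetch
    conn = sqlite3.connect(":memory:")
    store_links(parser.links, conn)
    print(conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0])  # → 2
```

A production crawler adds fetching, politeness (robots.txt, rate limiting), and retry logic around this same skeleton.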
Required Skills and Qualifications:
- Bachelor's degree in computer science or a related field.
- 2-3 years of experience with Python and web crawling.
- Familiarity with tools/modules such as Scrapy, Selenium, Requests, Beautiful Soup, etc.
- API tools such as Postman or equivalent.
- Working knowledge of SQL.
- Experience with web crawling and data extraction.
- Strong problem-solving and analytical skills.
- Ability to work independently and as part of a team.
- Excellent communication and documentation skills.
What we Offer
• Competitive salary
• Medical Benefits/Accident Cover
• Flexi Office Working Hours
• Fast-paced start-up environment
- Work closely with different Front Office and Support Function stakeholders, including but not restricted to Business Management, Accounts, Regulatory Reporting, Operations, Risk, Compliance, and HR, on all data collection and reporting use cases.
- Collaborate with Business and Technology teams to understand enterprise data and create an innovative narrative to explain, engage, and enlighten regular staff members as well as executive leadership with data-driven storytelling.
- Solve data consumption and visualization through a data-as-a-service distribution model.
- Articulate findings clearly and concisely for different target use cases, including through presentations, design solutions, and visualizations.
- Perform ad hoc / automated report-generation tasks using Power BI, Oracle BI, and Informatica.
- Perform data access/transfer and ETL automation tasks using Python, SQL, OLAP/OLTP, RESTful APIs, and IT tools (CFT, MQ-Series, Control-M, etc.).
- Provide support and maintain the availability of BI applications irrespective of the hosting location.
- Resolve issues escalated from Business and Functional areas on data quality, accuracy, and availability; provide incident-related communications promptly.
- Work to strict deadlines on high-priority regulatory reports.
- Serve as a liaison between business and technology to ensure that data-related business requirements for protecting sensitive data are clearly defined, communicated, well understood, and considered as part of operational prioritization and planning.
- Work for the APAC Chief Data Office and coordinate with a fully decentralized team across different locations in APAC and the global HQ (Paris).
General Skills:
- Excellent knowledge of RDBMS and hands-on experience with complex SQL is a must; some experience with NoSQL and Big Data technologies like Hive and Spark would be a plus.
- Experience with industrialized reporting on BI tools like Power BI and Informatica.
- Knowledge of data-related industry best practices in the highly regulated CIB industry; experience with regulatory report generation for financial institutions.
- Knowledge of industry-leading data access, data security, Master Data, and Reference Data Management, and of establishing data lineage.
- 5+ years of experience in Data Visualization / Business Intelligence / ETL developer roles.
- Ability to multi-task and manage various projects simultaneously.
- Attention to detail.
- Ability to present to Senior Management and ExCo; excellent written and verbal communication skills.
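The "complex SQL" and automated report-generation skills above reduce, at their smallest, to aggregation queries run from Python. A toy sketch using stdlib `sqlite3` in place of a production RDBMS (the table, columns, and data are invented for illustration, not from the posting):

```python
import sqlite3

def build_report(rows):
    # Toy 'regulatory report': total notional per desk, largest first.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (desk TEXT, notional REAL)")
    conn.executemany("INSERT INTO trades VALUES (?, ?)", rows)
    return conn.execute(
        """
        SELECT desk, SUM(notional) AS total
        FROM trades
        GROUP BY desk
        HAVING SUM(notional) > 0
        ORDER BY total DESC
        """
    ).fetchall()

if __name__ == "__main__":
    trades = [("rates", 100.0), ("fx", 250.0), ("rates", 50.0)]
    print(build_report(trades))  # → [('fx', 250.0), ('rates', 150.0)]
```

Scheduling this under Control-M or exposing it through a BI tool is what turns the query into the "industrialized reporting" the role describes.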
- Fix issues with plugins for our Python-based ETL pipelines.
- Help with automation of standard workflows.
- Deliver Python microservices for provisioning and managing cloud infrastructure.
- Take responsibility for any refactoring of code.
- Effectively manage the challenges of handling large volumes of data while working to tight deadlines.
- Manage expectations with internal stakeholders and context-switch in a fast-paced environment.
- Thrive in an environment that uses AWS and Elasticsearch extensively.
- Keep abreast of technology and contribute to the engineering strategy.
- Champion best development practices and provide mentorship to others.
- First and foremost, you are a Python developer, experienced with the Python data stack.
- You love and care about data.
- Your code is an artistic manifesto reflecting how elegant you are in what you do.
- You feel sparks of joy when a new abstraction or pattern arises from your code.
- You follow the DRY (Don’t Repeat Yourself) and KISS (Keep It Short and Simple) principles.
- You are a continuous learner.
- You have a natural willingness to automate tasks.
- You think critically and have an eye for detail.
- Excellent ability and experience working to tight deadlines.
- Sharp analytical and problem-solving skills.
- Strong sense of ownership and accountability for your work and delivery.
- Excellent written and oral communication skills.
- Mature collaboration and mentoring abilities.
- We are keen to know your digital footprint (community talks, blog posts, certifications, courses you have participated in or are keen to, your personal projects, as well as any contributions to open-source communities).
- Experience delivering complex software, ideally in a FinTech setting.
- Experience with CI/CD tools such as Jenkins or CircleCI.
- Experience with code versioning (git / mercurial / subversion).
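A common shape for the "plugins for Python-based ETL pipelines" mentioned above is a registry of transform functions applied in a configured order. A hedged sketch of the decorator-registry pattern (the plugin names and steps are invented, not this team's actual codebase):

```python
PLUGINS = {}

def register(name):
    # Decorator that records a transform function under a lookup key.
    def wrap(fn):
        PLUGINS[name] = fn
        return fn
    return wrap

@register("strip_nulls")
def strip_nulls(records):
    # Example plugin: drop missing records.
    return [r for r in records if r is not None]

@register("uppercase")
def uppercase(records):
    # Example plugin: normalise casing.
    return [r.upper() for r in records]

def run_pipeline(records, steps):
    # Apply the configured plugin steps in order (DRY: one loop, no per-step code).
    for step in steps:
        records = PLUGINS[step](records)
    return records

if __name__ == "__main__":
    print(run_pipeline(["a", None, "b"], ["strip_nulls", "uppercase"]))  # → ['A', 'B']
```

The appeal of this pattern is that fixing or adding a plugin never touches the pipeline driver, which keeps changes small under tight deadlines.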
Responsibilities
- Understanding the business requirements so as to formulate the problems to solve and to scope the slice of data to be explored.
- Collecting data from various sources.
- Performing cleansing, processing, and validation on the data to be analyzed, in order to ensure its quality.
- Exploring and visualizing data.
- Performing statistical analysis and experiments to derive business insights.
- Clearly communicating the findings from the analysis to turn information into something actionable through reports, dashboards, and/or presentations.
Skills
- Experience solving problems in the project’s business domain.
- Experience with data integration from multiple sources
- Proficiency in at least one query language, especially SQL.
- Working experience with NoSQL databases, such as MongoDB and Elasticsearch.
- Working experience with popular statistical and machine learning techniques, such as clustering, linear regression, KNN, decision trees, etc.
- Good scripting skills using Python, R or any other relevant language
- Proficiency in at least one data visualization tool, such as Matplotlib, Plotly, D3.js, ggplot, etc.
- Great communication skills.
- We are looking for a Data Engineer to build the next-generation mobile applications for our world-class fintech product.
- The candidate will be responsible for expanding and optimising our data and data pipeline architecture, as well as optimising data flow and collection for cross-functional teams.
- The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimising data systems and building them from the ground up.
- Looking for a person with a strong ability to analyse data and provide valuable insights to the product and business teams to solve daily business problems.
- You should be able to work in a high-volume environment and have outstanding planning and organisational skills.
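The "data pipeline builder" work described above is often prototyped with Python generators, which keep memory flat while a stream is transformed stage by stage. A hedged sketch (the stages and record format are invented; production pipelines would run on the Spark/Airflow-style tooling this posting lists):

```python
def read_events(raw_lines):
    # Extract: parse "user,amount" lines lazily, one at a time.
    for line in raw_lines:
        user, amount = line.strip().split(",")
        yield user, float(amount)

def only_positive(events):
    # Transform: drop refunds and zero rows without buffering the stream.
    for user, amount in events:
        if amount > 0:
            yield user, amount

def total_per_user(events):
    # Load/aggregate: materialise only one small dict at the very end.
    totals = {}
    for user, amount in events:
        totals[user] = totals.get(user, 0.0) + amount
    return totals

if __name__ == "__main__":
    raw = ["alice,10.0", "bob,-3.0", "alice,2.5"]
    print(total_per_user(only_positive(read_events(raw))))  # → {'alice': 12.5}
```

Because each stage is a generator, the same composition works unchanged whether `raw` is three lines or a multi-gigabyte file handle.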
Qualifications for Data Engineer
- Working SQL knowledge and experience working with relational databases, query authoring (SQL) as well as working familiarity with a variety of databases.
- Experience building and optimising ‘big data’ data pipelines, architectures, and data sets.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets. Build processes supporting data transformation, data structures, metadata, dependency and workload management.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- Looking for a candidate with 2-3 years of experience in a Data Engineer role who is a CS graduate or has equivalent experience.
What we're looking for?
- Experience with big data tools: Hadoop, Spark, Kafka, and alternatives.
- Experience with relational SQL and NoSQL databases, including MySQL/Postgres and MongoDB.
- Experience with data pipeline and workflow management tools: Luigi, Airflow.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift.
- Experience with stream-processing systems: Storm, Spark Streaming.
- Experience with object-oriented/object-function scripting languages: Python, Java, Scala.
- Understand various raw data input formats, build consumers on Kafka/ksqlDB for them, and ingest large amounts of raw data into Flink and Spark.
- Conduct complex data analysis and report on results.
- Build various aggregation streams for data and convert raw data into various logical processing streams.
- Build algorithms to integrate multiple sources of data and create a unified data model from all the sources.
- Build a unified data model on both SQL and NoSQL databases to act as a data sink.
- Communicate the designs effectively to the full-stack engineering team for development.
- Explore machine learning models that can be fitted on top of the data pipelines.
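The raw-to-aggregated streams described above (Kafka in, Flink/Spark aggregation out) reduce at their core to windowed aggregation. A framework-free Python sketch of a tumbling-window count (hedged: the real work would use Flink's or Spark's window operators in Scala/Java as required below; events here are invented (timestamp, key) pairs):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size):
    # events: iterable of (timestamp, key) pairs. Counts keys per fixed,
    # non-overlapping window, like a Flink tumbling event-time window.
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = (ts // window_size) * window_size
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

if __name__ == "__main__":
    events = [(1, "click"), (4, "view"), (6, "click"), (12, "click")]
    print(tumbling_window_counts(events, window_size=5))
    # → {0: {'click': 1, 'view': 1}, 5: {'click': 1}, 10: {'click': 1}}
```

What the streaming frameworks add on top of this logic is out-of-order handling (watermarks), state management, and fault tolerance.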
Mandatory Qualifications & Skills:
- Deep knowledge of the Scala and Java programming languages is mandatory.
- Strong background in streaming data frameworks (Apache Flink, Apache Spark) is mandatory.
- Good understanding of, and hands-on skills with, streaming messaging platforms such as Kafka.
- Familiarity with R, C, and Python is an asset.
- Analytical mind and business acumen, with strong math skills (e.g. statistics, algebra).
- Problem-solving aptitude.
- Excellent communication and presentation skills.
Roles and responsibilities:
- Responsible for the development and maintenance of applications built with Enterprise Java and distributed technologies.
- Experience with Hadoop, Kafka, Spark, Elasticsearch, SQL, Kibana, and Python; experience with machine learning and analytics, etc.
- Collaborate with developers, product managers, business analysts, and business users in conceptualizing, estimating, and developing new software applications and enhancements.
- Collaborate with the QA team to define test cases and metrics and to resolve questions about test results.
- Assist in the design and implementation process for new products; research and create POCs for possible solutions.
- Develop components based on business and/or application requirements.
- Create unit tests in accordance with team policies and procedures.
- Advise and mentor team members in specialized technical areas, as well as fulfill administrative duties as defined by the support process.
- Work with cross-functional teams during crises to address and resolve complex incidents and problems, in addition to assessment, analysis, and resolution of cross-functional issues.