
We are seeking an experienced Web Scraping Engineer to lead data extraction efforts for our enterprise clients. In this role, you will create and maintain robust, large-scale scraping systems for gathering structured data.
Responsibilities:
Develop and optimize custom web scraping tools and workflows.
Integrate scraping systems with data storage solutions like SQL and NoSQL databases.
Troubleshoot and resolve scraping challenges, including CAPTCHAs, rate limiting, and IP blocking.
Provide technical guidance on scraping best practices and standards.
Skills Required:
Expert in Python and scraping libraries such as Scrapy and BeautifulSoup.
Deep understanding of web scraping techniques and challenges (CAPTCHAs, anti-bot measures).
Experience with cloud platforms (AWS, Google Cloud).
Strong background in databases and data storage systems (SQL, MongoDB).
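To give a flavor of the parsing side of this work, here is a minimal BeautifulSoup sketch. The markup and CSS classes below are hypothetical stand-ins for a real target site, not any specific client's pages.

```python
# Minimal scraping/parsing sketch with BeautifulSoup.
# The HTML structure and class names here are hypothetical.
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<div class="product"><span class="title">Widget</span><span class="price">$9.99</span></div>
<div class="product"><span class="title">Gadget</span><span class="price">$4.50</span></div>
"""

def extract_products(html: str) -> list[dict]:
    """Turn a product listing page into structured records."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select("div.product"):
        title = item.select_one("span.title")
        price = item.select_one("span.price")
        if title and price:  # skip malformed entries rather than crash
            records.append({
                "title": title.get_text(strip=True),
                "price": price.get_text(strip=True),
            })
    return records

print(extract_products(SAMPLE_HTML))
```

A production crawler would wrap this kind of parser with fetching, retry/backoff for rate limits, and storage, but the parse step itself stays this small.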

Profile: Backend API Developer
Should be well versed in APIs used for social media scraping.
Full-time remote job


Responsibilities
· Develop Python-based APIs using FastAPI and Flask frameworks.
· Develop Python-based Automation scripts and Libraries.
· Develop Front End Components using VueJS and ReactJS.
· Writing and modifying Docker files for the Back-End and Front-End Components.
· Integrate CI/CD pipelines for Automation and Code quality checks.
· Writing complex ORM mappings using SQLAlchemy.
Required Skills:
· Strong experience with Python development in a full-stack environment is required, including NodeJS, VueJS/Vuex, Flask, etc.
· Experience with SQLAlchemy or similar ORM frameworks.
· Experience working with Geolocation APIs (e.g., Google Maps, Mapbox).
· Experience using Elasticsearch and Airflow is a plus.
· Strong knowledge of SQL, comfortable working with MySQL and/or PostgreSQL databases.
· Understand concepts of Data Modeling.
· Experience with REST.
· Experience with Git, GitFlow, and code review process.
· Good understanding of basic UI and UX principles.
· Excellent problem-solving and communication skills.
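As a small illustration of the REST API work described above, here is a minimal Flask endpoint sketch. The route and payload are purely illustrative; a real service would query a database (e.g. through SQLAlchemy, as the skills list mentions) rather than an in-memory list.

```python
# Minimal Flask REST endpoint sketch; route and data are hypothetical.
from flask import Flask, jsonify

app = Flask(__name__)

# Illustrative in-memory data standing in for a database query.
LOCATIONS = [
    {"id": 1, "name": "Warehouse A", "lat": 40.71, "lng": -74.01},
    {"id": 2, "name": "Warehouse B", "lat": 34.05, "lng": -118.24},
]

@app.route("/locations")
def list_locations():
    return jsonify({"locations": LOCATIONS})

# Flask's built-in test client exercises the route without running a server.
client = app.test_client()
response = client.get("/locations")
print(response.status_code, response.get_json())
```

The test-client pattern shown at the bottom is also the usual entry point for the CI/CD quality checks the responsibilities mention.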


JD / Skills Sets
1. Good knowledge of Python
2. Good knowledge of MySQL and MongoDB
3. Design patterns
4. OOP
5. Automation
6. Web scraping
7. Redis queues
8. Basic knowledge of the finance domain is beneficial
9. Git
10. AWS (EC2, RDS, S3)
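The Redis queue item above usually means a producer/consumer pattern (RPUSH on the producer side, BLPOP in the workers). Since a live Redis server can't be assumed here, this sketch shows the same pattern with the standard-library queue module; with redis-py you would swap the queue for rpush/blpop calls.

```python
# Producer/consumer queue sketch; stdlib stand-in for a Redis work queue.
import queue
import threading

def worker(tasks: "queue.Queue", results: list) -> None:
    """Consume jobs until a None sentinel arrives (mirrors a BLPOP loop)."""
    while True:
        job = tasks.get()
        if job is None:  # sentinel: no more work
            break
        results.append(job.upper())  # stand-in for real job processing
        tasks.task_done()

tasks: "queue.Queue" = queue.Queue()
results: list = []

t = threading.Thread(target=worker, args=(tasks, results))
t.start()
for job in ["parse page", "store record"]:  # producer side (mirrors RPUSH)
    tasks.put(job)
tasks.put(None)  # signal shutdown
t.join()
print(results)
```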


1. Proficient in Python, Flask, pandas, GitHub, and AWS
2. Good knowledge of both SQL and NoSQL databases
3. Strong experience with REST and SOAP APIs
4. Experience working on scalable interactive web applications
5. Basic knowledge of JavaScript and HTML
6. Automation and crawling tools and modules
7. Multithreading and multiprocessing
8. Good understanding of test-driven development
9. Preferred: exposure to the finance domain
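The multithreading/multiprocessing item above typically comes down to choosing threads for I/O-bound crawling and processes for CPU-bound parsing. A small sketch with concurrent.futures; the fetch function is a simulated stand-in for a network call:

```python
# Concurrency sketch with concurrent.futures; fetch() simulates I/O.
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    """Simulated fetch; a real crawler would issue an HTTP request here."""
    return f"<html>{url}</html>"

urls = [f"https://example.com/page/{i}" for i in range(5)]

# Threads suit I/O-bound work like fetching; for CPU-bound parsing,
# ProcessPoolExecutor offers the same interface.
with ThreadPoolExecutor(max_workers=4) as pool:
    pages = list(pool.map(fetch, urls))  # map preserves input order

print(len(pages), pages[0])
```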

Your Responsibilities would be to:
- Architect new and optimize existing software codebases and systems used to crawl, launch, run, and monitor the Anakin family of app crawlers
- Deeply own the lifecycle of software, including rolling out to operations, managing configurations, maintaining and upgrading, and supporting end-users
- Configure and optimize the automated testing and deployment systems used to maintain the 1,000+ crawlers across the company
- Analyze data and bugs that require in-depth investigations
- Interface directly with external customers including managing relationships and steering requirements
Basic Qualifications:
- Extremely effective, self-driven builder
- 2+ years of experience as a backend software engineer
- 2+ years of experience with Python
- 2+ years of experience with AWS services such as EC2, S3, Lambda, etc.
- Should have managed a team of software engineers
- Deep experience with network debugging across all OSI layers (Wireshark)
- Knowledge of networks or/and cybersecurity
Preferred Skills and Experience
- Broad understanding of the landscape of software engineering design patterns and principles
- Ability to work quickly and accurately under pressure, resolving runtime bugs within minutes
- Excellent communicator, both written and verbal
Additional Requirements
- Must be available to work extended hours and weekends when needed to meet critical deadlines
- Must have an aversion to politics; should let their work speak for itself.
- Must be comfortable with uncertainty. In almost all cases, your job will be to figure it out.
- Must not stay bound to a comfort zone. Often, you will need to challenge yourself to go above and beyond.

Roles and Responsibilities
- Apply knowledge set to fetch data from multiple online sources, cleanse it and build APIs on top of it
- Develop a deep understanding of our vast data sources on the web and know exactly how, when, and which data to scrape, parse, and store
- We're looking for people who will naturally take ownership of data products and who can bring a project all the way from a fast prototype to production.
- Integrating and maintaining Python services
- Developing robust microservices and applications
Desired Candidate Profile
- Strong relevant experience of at least 2-3 years.
- Strong coding experience in Python3.
- Should have good experience with Django and NodeJS.
- Proficient in modeling applications on both RDBMS and NoSQL databases
- Should have experience in web scraping
- Good understanding and hands-on with scheduling and managing tasks with cron
- Should have experience with microservice architecture
- Compile and analyze data, processes, and codes to troubleshoot problems and identify areas for improvement
- Should have experience in shell scripting, Git, and Docker
- Writing unit tests targeting 100% code coverage.
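As a small illustration of the unit-testing expectation above, a data-cleaning helper for scraped values and its tests might look like this. The helper and its name are hypothetical, chosen only to show the pattern.

```python
# Unit-testing sketch with the stdlib unittest module.
import unittest

def normalize_price(raw: str) -> float:
    """Convert a scraped price string like '$1,299.00' to a float."""
    return float(raw.replace("$", "").replace(",", "").strip())

class TestNormalizePrice(unittest.TestCase):
    def test_currency_and_commas(self):
        self.assertEqual(normalize_price("$1,299.00"), 1299.0)

    def test_surrounding_whitespace(self):
        self.assertEqual(normalize_price(" 42.5 "), 42.5)

# Run the suite programmatically instead of via unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalizePrice)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```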

Company Introduction –
- Information Security & Data Analytics Series A funded company.
- Working in cutting edge technologies - Using AI for predictive intelligence and Facial Biometrics.
- Among the top 5 cyber excellence companies globally (Holger Schulze awards)
- Bronze award for best startup of the year (Indian Express IT awards); the only cybersecurity company in the top 3.
- More than 100 clients in India.
Job Description:
Job Title: Python Developer
Key Requirements:-
- Mine data from structured and unstructured data sources.
- Extract data (text, images, and videos) from multiple documents in different formats.
- Extract information and intelligence from data.
- Extract data based on regular expressions.
- Collect data from structured RDBMS databases.
- Work closely with Project/Business/Research teams to provide mined data/intelligence for analysis.
- Should have strong exposure to core Python skills like multiprocessing, multithreading, file handling, and data structures such as JSON, data frames, and user-defined data structures.
- Should have excellent knowledge of classes, file handling, and memory manipulation.
- Strong knowledge of Python.
- Strong exposure to front-end languages like CSS, JavaScript, and Ajax.
- Should have exposure to requests, Frontera, scrapy-cluster, Elasticsearch, distributed computing tools like Kafka, HBase, Redis, and ZooKeeper, and REST APIs.
- Should be familiar with *nix development environment.
- Knowledge of Django will be added advantage.
- Excellent knowledge on Web Crawling/Web scraping.
- Should have used scraping modules like Selenium, Scrapy, and Beautiful Soup.
- Experience with text processing.
- Basics of databases. Good troubleshooting and debugging skills.
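The regex-based extraction mentioned in the requirements can be sketched like this. The document format and field names below are hypothetical, standing in for whatever semi-structured text the mining targets.

```python
# Regex extraction sketch; the invoice format here is hypothetical.
import re

DOCUMENT = """
Invoice: INV-1042 Date: 2023-05-01 Total: $250.00
Invoice: INV-1043 Date: 2023-05-02 Total: $99.95
"""

INVOICE = re.compile(
    r"Invoice:\s*(?P<number>INV-\d+)\s+"
    r"Date:\s*(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"Total:\s*\$(?P<total>[\d.]+)"
)

def extract_invoices(text: str) -> list[dict]:
    """Pull structured invoice records out of free-form text."""
    return [m.groupdict() for m in INVOICE.finditer(text)]

print(extract_invoices(DOCUMENT))
```

Named groups keep the extracted records self-describing, which makes the downstream cleansing and API layers easier to build.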
Experience: 1-4 years
Education
B.Tech, MCA, Computer Engineering

