Sr Data Engineer Job Description

About Us
DataWeave is a data platform that aggregates publicly available data from disparate sources and makes it available in the right format to enable companies to take strategic decisions using trans-firewall analytics. It's hard to tell what we love more, problems or solutions! Every day, we choose to address some of the hardest data problems there are. We are in the business of making sense of messy public data on the web. At serious scale!

Requirements:
- Building an intelligent and highly scalable crawling platform
- Data extraction and processing at scale
- Enhancing existing data stores/data models
- Building a low-latency API layer for serving data to power dashboards, reports, and analytics functionality
- Constantly evolving our data platform to support new features

Expectations:
- 4+ years of relevant industry experience
- Strong algorithms and problem-solving skills
- Software development experience in one or more general-purpose programming languages (e.g. Python, C/C++, Ruby, Java, C#)
- Exceptional coding abilities and experience building large-scale, high-availability applications
- Experience with search/information-retrieval platforms like Solr, Lucene, and Elasticsearch
- Experience building and maintaining large-scale web crawlers
- In-depth knowledge of SQL and NoSQL datastores
- Ability to design and build quick prototypes
- Experience working on cloud-based infrastructure like AWS or GCE

Growth at DataWeave
- Fast-paced growth opportunities at a dynamically evolving start-up
- The opportunity to work in many different areas and explore a wide variety of tools to figure out what really excites you
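To make the "data extraction" skill above concrete: the core of any crawler is pulling structured facts (such as outgoing links) out of fetched HTML. A minimal, stdlib-only sketch of that step — the class and function names are hypothetical, not part of any DataWeave codebase:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every anchor tag seen in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Return all hyperlinks found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

A production crawler would add fetching, politeness (robots.txt, rate limits), deduplication, and a frontier queue on top of this extraction core.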
JOB DESCRIPTION
We are looking for a Data Engineer with a solid background in scalable systems to work with our engineering team to improve and optimize our platform. You will have significant input into the team's architectural approach and execution. We are looking for a hands-on programmer who enjoys designing and optimizing data pipelines for large-scale data. This is NOT a "data scientist" role, so please don't apply if you're looking for that.

RESPONSIBILITIES:
1. Build, maintain, and test performant, scalable data pipelines
2. Work with data scientists and application developers to implement scalable pipelines for data ingest, processing, machine learning, and visualization
3. Build interfaces for ingest across various data stores

MUST-HAVE:
1. A track record of building and deploying data pipelines as part of work or side projects
2. Ability to work with an RDBMS such as MySQL or Postgres
3. Ability to deploy over cloud infrastructure, at least AWS
4. Demonstrated ability and hunger to learn

GOOD-TO-HAVE:
1. Computer Science degree
2. Expertise in at least one of: Python, Java, Scala
3. Expertise and experience in deploying solutions based on Spark and Kafka
4. Knowledge of container systems like Docker or Kubernetes
5. Experience with NoSQL / graph databases
6. Knowledge of machine learning

Kindly apply only if you are skilled in building data pipelines.
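The "building and deploying data pipelines" requirement above boils down to validate-transform-load stages. A toy sketch using only the standard library (SQLite standing in for the MySQL/Postgres the posting mentions; the `ingest` function and `events` schema are invented for illustration):

```python
import sqlite3

def ingest(records, conn):
    """Naive pipeline stage: validate, transform, and load rows into an RDBMS."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (user TEXT, amount REAL)")
    # Validate: drop rows missing a user or an amount; transform: normalize fields.
    cleaned = [(r["user"].strip().lower(), float(r["amount"]))
               for r in records
               if r.get("user") and r.get("amount") is not None]
    conn.executemany("INSERT INTO events VALUES (?, ?)", cleaned)
    conn.commit()
    return len(cleaned)
```

Real pipelines add incremental/batch scheduling, idempotent retries, and schema migration, but the stage structure is the same.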
Position Description
Demonstrates up-to-date expertise in software engineering and applies it to the development, execution, and improvement of action plans. Models compliance with company policies and procedures and supports the company mission, values, and standards of ethics and integrity. Provides and supports the implementation of business solutions. Provides support to the business. Troubleshoots business and production issues and provides on-call support.

Minimum Qualifications
- BS/MS in Computer Science or a related field
- 5+ years' experience building web applications
- Solid understanding of computer science principles
- Excellent soft skills
- Understanding of major algorithms, such as searching and sorting
- Strong skills in writing clean code using Java and J2EE technologies
- Understanding of how to engineer RESTful services and microservices, and knowledge of major software patterns such as MVC, Singleton, Facade, and Business Delegate
- Deep knowledge of web technologies such as HTML5, CSS, and JSON
- Good understanding of continuous-integration tools and frameworks like Jenkins
- Experience working in Agile environments, such as Scrum and Kanban
- Experience with performance tuning for very large-scale apps
- Scripting experience in Perl, Python, and shell
- Experience writing jobs using open-source cluster-computing frameworks like Spark
- Database design experience: relational (MySQL, Oracle), search (Solr), and NoSQL (Cassandra, MongoDB, Hive)
- Aptitude for writing clean, succinct, and efficient code
- Attitude to thrive in a fun, fast-paced, start-up-like environment
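Of the design patterns the posting names, Singleton is the simplest to show in a few lines. The posting's context is Java/J2EE, but the idea is language-independent; this Python sketch (class name invented) guarantees that every instantiation returns the same shared object:

```python
class ConfigRegistry:
    """Singleton: all instantiations return the same shared object."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.settings = {}   # shared state, created exactly once
        return cls._instance
```

In a web application this pattern typically backs process-wide resources such as configuration or connection pools.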
Description
- Deep experience with and understanding of Apache Hadoop and surrounding technologies required; experience with Spark, Impala, Hive, Flume, Parquet, and MapReduce
- Strong understanding of development languages including Java, Python, Scala, and shell scripting
- Expertise in Apache Spark 2.x framework principles and usage
- Proficient in developing Spark batch and streaming jobs in Python, Scala, or Java
- Proven experience in performance tuning of Spark applications, from both the application-code and configuration perspectives
- Proficient in Kafka and its integration with Spark
- Proficient in Spark SQL and data-warehousing techniques using Hive
- Very proficient in Unix shell scripting and operating on Linux
- Knowledge of cloud-based infrastructure
- Strong understanding of data-profiling concepts and the ability to operationalize analyses into design and development activities
- Experience with software development best practices: version control systems, automated builds, etc.
- Experienced in, and able to lead, the phases of the Software Development Life Cycle on any project (feasibility planning, analysis, development, integration, test, and implementation)
- Capable of working within a team or individually
- Experience creating technical documentation
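The MapReduce/Spark skills listed above rest on one programming model: map each input record to key-value pairs, then reduce by key. A plain-Python word count showing the model itself (no cluster needed; the function names mirror the two phases, not any Spark API):

```python
from collections import defaultdict
from itertools import chain

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in every line."""
    return chain.from_iterable(
        ((word.lower(), 1) for word in line.split()) for line in lines)

def reduce_phase(pairs):
    """Reducer: sum the counts for each key, as reduceByKey does in Spark."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)
```

Spark generalizes exactly this shape across a cluster, adding partitioning, shuffling, and fault tolerance between the two phases.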
We are building the AI core for a legal-workflow solution. You will be expected to build and train models to extract relevant information from contracts and other legal documents.

Required Skills/Experience:
- Python
- Basics of deep learning
- Experience with one ML framework (like TensorFlow, Keras, or Caffe)

Preferred Skills/Experience:
- Exposure to ML concepts like LSTMs, RNNs, and ConvNets
- Experience with NLP and the Stanford POS tagger
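For intuition on the extraction task described above: before any deep model, teams usually establish a rule-based baseline that pulls obvious fields out of contract text. A hypothetical regex baseline (patterns and field names are illustrative only; a trained model must beat this floor):

```python
import re

# Naive patterns for US-style dates and dollar amounts in contract text.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
AMOUNT_RE = re.compile(r"\$[\d,]+(?:\.\d{2})?")

def extract_fields(text):
    """Return every date and monetary amount mentioned in the text."""
    return {"dates": DATE_RE.findall(text),
            "amounts": AMOUNT_RE.findall(text)}
```

The LSTM/RNN approaches the posting prefers replace these brittle patterns with learned sequence labeling, but the input/output contract stays the same.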
Requirements:
- Minimum 4 years' work experience building, managing, and maintaining analytics applications
- B.Tech/BE in CS/IT from a Tier 1/2 institute
- Strong fundamentals of data structures and algorithms
- Good analytical and problem-solving skills
- Strong hands-on experience in Python
- In-depth knowledge of queueing systems (Kafka/ActiveMQ/RabbitMQ)
- Experience building data pipelines and real-time analytics systems
- Experience with SQL (MySQL) and NoSQL (Mongo/Cassandra) databases is a plus
- Understanding of service-oriented architecture
- A record of delivering high-quality work with significant contributions
- Expert in git, unit tests, technical documentation, and other development best practices
- Experience handling small teams
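The queueing systems named above (Kafka/ActiveMQ/RabbitMQ) all implement the same producer/consumer decoupling. A toy in-process version with the standard library's `queue` module — a sketch of the pattern only, not of any broker's API:

```python
import queue
import threading

def run_pipeline(events):
    """Toy producer/consumer: the decoupling Kafka or RabbitMQ provide at scale."""
    q = queue.Queue()
    results = []

    def consumer():
        while True:
            item = q.get()
            if item is None:          # sentinel: producer is done
                break
            results.append(item * 2)  # stand-in for real processing work
            q.task_done()

    worker = threading.Thread(target=consumer)
    worker.start()
    for event in events:              # producer side
        q.put(event)
    q.put(None)
    worker.join()
    return results
```

A real broker adds durability, partitioning, and consumer groups, which is what makes the same pattern work across machines.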
This is one of those exceptional roles where, unlike most data science roles that shape marketing/optimization efforts, data science here helps build a consumer-facing product from the ground up.

Key Skills Expected:
• Strong understanding of and experience in building efficient search and recommendation algorithms; experience in machine/deep learning would be beneficial
• Driving semantics through NLP of unstructured and semi-structured data
• Ability to independently build large-scale crawlers, clean and analyze the collected information, and use it as an input feed to ranking/recommendation algorithms
• Excellent back-end coding skills
• Strong knowledge of hosting web services on platforms like AWS
• Experience in Python/Django would be a plus

We are looking for self-starters who want to solve genuinely hard problems.
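As a pocket-sized illustration of the search/ranking skill the posting above leads with: the simplest relevance ranker scores documents by cosine similarity between bag-of-words vectors. A stdlib-only sketch (function names invented; real systems use TF-IDF weighting, inverted indexes, and learned rankers on top of this idea):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Return docs ordered by similarity to the query, most similar first."""
    q = Counter(query.lower().split())
    return sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                  reverse=True)
```

Recommendation systems swap the query vector for a user-preference vector, but the scoring-and-sorting skeleton is the same.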
Couture.ai is building a patent-pending AI platform targeted at vertical-specific solutions. The platform is already licensed by Reliance Jio and a few European retailers to power real-time experiences for their combined >200 million end users. For this role, a credible display of innovation in past projects (or academia) is a must. We are looking for a candidate who lives and talks data and algorithms, loves to play with big-data engineering, and is hands-on with Apache Spark, Kafka, RDBMS/NoSQL databases, big-data analytics, and Unix and production servers. A Tier-1 college (BE from the IITs, BITS Pilani, top NITs, or IIITs, or an MS from Stanford, Berkeley, CMU, or UW–Madison) or an exceptionally bright work history is a must. Let us know if this interests you and you would like to explore the profile further.
Couture.ai is building a patent-pending AI platform targeted at vertical-specific solutions. The platform is already licensed by Reliance Jio and a few European retailers to power real-time experiences for their combined >200 million end users. For this role, a credible display of innovation in past projects is a must.

We are looking for hands-on leaders in data engineering with 5-11 years of research or large-scale production implementation experience, including:
- Proven expertise in Spark, Kafka, and the Hadoop ecosystem
- Rock-solid algorithmic capabilities
- Production deployments of massively large-scale systems, real-time personalization, big-data analytics, and semantic search
- Expertise in containerization (Docker, Kubernetes) and cloud infrastructure, preferably OpenStack
- Experience with Spark ML, TensorFlow (and TF Serving), MXNet, Scala, Python, NoSQL DBs, Kubernetes, and Elasticsearch/Solr in production

A Tier-1 college (BE from the IITs, BITS Pilani, IIITs, top NITs, DTU, or NSIT, or an MS from Stanford, UC, MIT, CMU, UW–Madison, ETH, or other top global schools) or an exceptionally bright work history is a must. Let us know if this interests you and you would like to explore the profile further.