Description:
Deep experience with and understanding of Apache Hadoop and surrounding technologies required; experience with Spark, Impala, Hive, Flume, Parquet and MapReduce.
Strong understanding of development languages, including Java, Python, Scala and shell scripting.
Expertise in Apache Spark 2.x framework principles and usage.
Should be proficient in developing Spark batch and streaming jobs in Python, Scala or Java.
Should have proven experience in performance tuning of Spark applications, from both the application code and the configuration perspective.
Should be proficient in Kafka and its integration with Spark.
Should be proficient in Spark SQL and data warehousing techniques using Hive.
Should be very proficient in Unix shell scripting and in operating on Linux.
Should have knowledge of at least one cloud-based infrastructure.
Good experience in tuning Spark applications and making performance improvements.
Strong understanding of data profiling concepts and the ability to operationalize analyses into design and development activities.
Experience with software development best practices: version control systems, automated builds, etc.
Experienced in and able to lead the following phases of the Software Development Life Cycle on any project: feasibility planning, analysis, development, integration, test and implementation.
Capable of working within a team or individually.
Experience creating technical documentation.
Job Title: Distributed Systems Engineer - SDET
Job Location: Pune, India

Job Description:
Are you looking to put your computer science skills to use? Are you looking to work for one of the hottest start-ups in Silicon Valley? Are you looking to define the next generation data management platform based on Apache Spark? Are you excited by the idea of being a Spark committer? If you answered yes to all of the questions above, we definitely want to talk to you.

We are looking to add highly motivated engineers to work as QE software engineers in our product development team in Pune. We work on cutting-edge data management products that transform the way businesses operate. As a distributed systems engineer (if you are good), you will get to work on defining key elements of our real-time analytics platform, including:
1. Distributed in-memory data management
2. OLTP and OLAP querying in a single platform
3. Approximate Query Processing over large data sets
4. Online machine learning algorithms applied to streaming data sets
5. Streaming and continuous querying

Requirements:
1. Experience in testing modern SQL and NewSQL products is highly desirable
2. Experience with the SQL language, JDBC and end-to-end testing of databases
3. Hands-on experience in writing SQL queries
4. Experience with database performance benchmarks like TPC-H, TPC-C and TPC-E is a plus
5. Prior experience in benchmarking against Cassandra or MemSQL is a big plus
6. You should be able to program in Java or have some exposure to functional programming in Scala
7. You should care about performance, and by that, we mean performance optimizations in a JVM
8. You should be self-motivated and driven to succeed
9. If you are an open source committer on any project, especially an Apache project, you will fit right in
10. Experience working with Spark, Spark SQL and Spark Streaming is a BIG plus
11. Plan and author test plans, and ensure that development considers testability at all stages of the life cycle
12. Plan, schedule and track the creation of test plans and automation scripts using defined methodologies for manual and/or automated tests
13. Work as a QE team member in troubleshooting, isolating, reproducing and tracking bugs and verifying fixes
14. Analyze test results to confirm existing functionality and recommend corrective action. Document test results, and manage and maintain defect and test case databases to assist in process improvement and estimation of future releases
15. Assess and plan the test effort required to automate new functions and features under development. Influence design changes to improve quality and feature testability
16. If you have solved big complex problems, we want to talk to you
17. If you are a math geek with a background in statistics and mathematics, and you know what a linear regression is, this just might be the place for you
18. Exposure to stream data processing (Storm, Samza) is a plus

Open source contributors: Send us your GitHub ID.

Product:
SnappyData is a new real-time analytics platform that combines probabilistic data structures, approximate query processing and in-memory distributed data management to deliver powerful analytic querying and alerting capabilities on Apache Spark at a fraction of the cost of traditional big data analytics platforms. SnappyData fuses the Spark computational engine with a highly available, multi-tenanted in-memory database to execute OLAP and OLTP queries on streaming data. Further, SnappyData can store data in a variety of synopsis data structures to provide extremely fast responses with fewer resources. Finally, applications can either submit Spark programs or connect using JDBC/ODBC to run interactive or continuous SQL queries.

Skills:
1. Distributed Systems
2. Scala
3. Apache Spark
4. Spark SQL
5. Spark Streaming
6. Java
7. YARN/Mesos

What's in it for you:
1. Cutting-edge work that is ultra meaningful
2. Colleagues who are the best of the best
3. Meaningful startup equity
4. Competitive base salary
5. Full benefits
6. Casual, fun office

Company Overview:
SnappyData is a Silicon Valley funded startup founded by engineers who pioneered the distributed in-memory data business. It is advised by some of the legends of the computing industry who have been instrumental in creating multiple disruptions that have defined computing over the past 40 years. The engineering team that powers SnappyData built GemFire, one of the industry-leading in-memory data grids, which is used worldwide in mission-critical applications ranging from finance to retail.
We at InfoVision Labs are passionate about technology and about what our clients would like to accomplish. We continuously strive to understand business challenges, the changing competitive landscape and how cutting-edge technology can help position our clients at the forefront of the competition. We are a fun-loving team of Usability Experts and Software Engineers, focused on Mobile Technology, Responsive Web Solutions and Cloud Based Solutions.

Job Responsibilities:
◾ Minimum 3 years of experience in Big Data required.
◾ Complete life cycle experience with Big Data is highly preferred.
◾ Skills – Hadoop, Spark, “R”, Hive, Pig, HBase and Scala.
◾ Excellent communication skills.
◾ Ability to work independently with no supervision.
Crest (part of the Springer Nature group): Headquartered in Pune, Crest is a Springer Nature company that delivers cutting-edge IT and ITeS solutions to some of the biggest scientific content and database brands in the world. Our global teams work closely with our counterparts and clients in Europe, USA and New Zealand, leveraging the latest technology, marketing intelligence and subject matter expertise. With handpicked SMEs in a range of sciences and technology teams working on the latest ECM, Scala, SAP and MS Tech platforms, Crest not only develops quality STM content, but continuously enhances the channels through which it is delivered to the world. Crest is an ISO 9001 certified company, driven by over 1000 professionals in Technology, Research & Analysis and Marketing & BPM.

Specialties:
1. Technology
2. Research
3. Marketing Intelligence
4. Business Process Management