Job description
Position: Data Scientist
Location: Bangalore
Long Term Contract position
Remote Till Covid
Experience in applied data science, analytics, data storytelling.
- Write well-documented code that can be shared and used across teams, and can scale to be used in existing products.
- SQL, advanced Python or R (descriptive / predictive models), Tableau visualization. Working knowledge of Hadoop, BigQuery, Presto, Vertica.
- Apply your expertise in quantitative analysis, data mining, and the presentation of data to uncover unique actionable insights about customer service, health of public conversation and social media
- Inform, influence, support, and execute analysis that feeds into one of our many analytics domains - Customer analytics, product analytics, business operation analytics, cost analytics, media analytics, people analytics
- Select and deselect analytics priorities, insights and data based on ability to drive our desired outcomes
- Own the end to end process, from initiation to deployment, and through ongoing communication and collaboration, sharing of results to partners and leadership
- Mentor and create a sense of community and a learning environment for our global team of data analysts
Soft skills:
- Ability to communicate findings clearly to both technical and non-technical audiences and to effectively collaborate within cross-functional teams
- Working knowledge of agile framework and processes.
- You should be comfortable managing work plans, timelines and milestones
- You have a sense of urgency, move quickly and ship things
Bonus Points:
- You're experienced in metrics and experiment-driven development
- Experience in statistical methodology (multivariate, time-series, experimental design, data mining, etc.)
About BYGRAD
Similar jobs
RESPONSIBILITIES:
Understand and elicit requirements, analyze data and workflows, and contribute to product
projects and proofs of concept (POC).
Contribute to preparing design documents and effort estimates.
Develop AI/ML models using best-in-class ML models.
Building, testing, and deploying AI/ML solutions.
Work with Business Analysts and Product Managers to assist with defining functional user
stories.
Ensure deliverables across teams are of high quality and clearly documented.
Recommend best ML practices/Industry standards for any ML use case.
Proactively take up R&D and recommend solution options for any ML use case.
REQUIREMENTS:
Required Skills
Overall experience of 4 to 7 Years working on AI/ML framework development
Good programming knowledge in Python is a must.
Good knowledge of R and SAS is desired.
Good hands-on working knowledge of SQL, data modelling, and CRISP-DM.
Proficiency with Uni/multivariate statistics, algorithm design, and predictive AI/ML modelling.
Strong knowledge of machine learning algorithms, linear regression, logistic regression, KNN,
Random Forest, Support Vector Machines and Natural Language Processing.
Experience with NLP and deep neural networks using synthetic and artificial data.
Involved in different phases of the SDLC, with good working exposure to SDLC methodologies such as
Agile.
● Research and develop advanced statistical and machine learning models for
analysis of large-scale, high-dimensional data.
● Dig deeper into data, understand characteristics of data, evaluate alternate
models and validate hypotheses through theoretical and empirical approaches.
● Productize proven or working models into production-quality code.
● Collaborate with product management, marketing, and engineering teams in
Business Units to elicit & understand their requirements & challenges and
develop potential solutions
● Stay current with the latest research and technology ideas; share knowledge by
clearly articulating results and ideas to key decision-makers.
● File patents for innovative solutions that add to the company's IP portfolio
Requirements
● 4 to 6 years of strong experience in data mining, machine learning and
statistical analysis.
● BS/MS/Ph.D. in Computer Science, Statistics, Applied Math, or related areas
from premier institutes (only candidates from IITs / IISc / BITS / top NITs or top US
universities should apply)
● Experience in productizing models to code in a fast-paced start-up
environment.
● Fluency in analytical tools such as Matlab, R, Weka etc.
● Strong intuition for data and keen aptitude for large-scale data analysis
● Strong communication and collaboration skills.
Job Title – Data Scientist (Forecasting)
Anicca Data is seeking a Data Scientist (Forecasting) who is motivated to apply their skill set to solve complex and challenging problems. The role centers on applying deep learning models to real-world applications. The candidate should have experience in training and testing deep learning architectures, and is expected to work on existing codebases or write optimized code at Anicca Data. The ideal addition to our team is a self-motivated, highly organized team player who thrives in a fast-paced environment, learns quickly, and works independently.
Job Location: Remote (for time being) and Bangalore, India (post-COVID crisis)
Required Skills:
- At least 3 years of experience in a Data Scientist role
- Bachelor's/Master's degree in Computer Science, Engineering, Statistics, Mathematics, or similar quantitative discipline. A Ph.D. will add merit to the application process
- Experience with large data sets, big data, and analytics
- Exposure to statistical modeling, forecasting, and machine learning. Deep theoretical and practical knowledge of deep learning, machine learning, statistics, probability, time series forecasting
- Training Machine Learning (ML) algorithms in areas of forecasting and prediction
- Experience in developing and deploying machine learning solutions in a cloud environment (AWS, Azure, Google Cloud) for production systems
- Research and enhance existing in-house, open-source models, integrate innovative techniques, or create new algorithms to solve complex business problems
- Experience in translating business needs into problem statements, prototypes, and minimum viable products
- Experience managing complex projects including scoping, requirements gathering, resource estimations, sprint planning, and management of internal and external communication and resources
- Write C++ and Python code along with TensorFlow, PyTorch to build and enhance the platform that is used for training ML models
Preferred Experience
- Worked on forecasting projects – both classical and ML models
- Experience with training time series forecasting methods such as Moving Average (MA) and Autoregressive Integrated Moving Average (ARIMA), and with Neural Network (NN) models such as Feed-forward NN and Nonlinear Autoregressive
- Strong background in forecasting accuracy drivers
- Experience in Advanced Analytics techniques such as regression, classification, and clustering
- Ability to explain complex topics in simple terms, ability to explain use cases and tell stories
- Big data developer with 8+ years of professional IT experience with expertise in Hadoop ecosystem components in ingestion, Data modeling, querying, processing, storage, analysis, Data Integration and Implementing enterprise level systems spanning Big Data.
- A skilled developer with strong problem solving, debugging and analytical capabilities, who actively engages in understanding customer requirements.
- Expertise in Apache Hadoop ecosystem components like Spark, Hadoop Distributed File System (HDFS), MapReduce, Hive, Sqoop, HBase, Zookeeper, YARN, Flume, Pig, Nifi, Scala and Oozie.
- Hands on experience in creating real - time data streaming solutions using Apache Spark core, Spark SQL & DataFrames, Kafka, Spark streaming and Apache Storm.
- Excellent knowledge of Hadoop architecture and the daemons of Hadoop clusters, which include NameNode, DataNode, ResourceManager, NodeManager and JobHistory Server.
- Worked on both Cloudera and Hortonworks Hadoop distributions. Experience in managing Hadoop clusters using the Cloudera Manager tool.
- Well versed in installation, Configuration, Managing of Big Data and underlying infrastructure of Hadoop Cluster.
- Hands on experience in coding MapReduce/Yarn Programs using Java, Scala and Python for analyzing Big Data.
- Exposure to Cloudera development environment and management using Cloudera Manager.
- Worked extensively on Spark using Scala on a cluster for computational analytics, installed it on top of Hadoop, and built advanced analytical applications using Spark with Hive and SQL/Oracle.
- Implemented Spark using Python, utilizing DataFrames and the Spark SQL API for faster processing of data; handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading data into HDFS.
- Used Spark Data Frames API over Cloudera platform to perform analytics on Hive data.
- Hands-on experience with Spark MLlib, used for predictive intelligence, customer segmentation and smooth maintenance in Spark Streaming.
- Experience in using Flume to load log files into HDFS and Oozie for workflow design and scheduling.
- Experience in optimizing MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on creating data pipelines for ingestion, aggregation, and loading of consumer response data into Hive external tables in HDFS to serve as a feed for Tableau dashboards.
- Hands on experience in using Sqoop to import data into HDFS from RDBMS and vice-versa.
- In-depth Understanding of Oozie to schedule all Hive/Sqoop/HBase jobs.
- Hands on expertise in real time analytics with Apache Spark.
- Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala and Python.
- Extensive experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS).
- Experience in Microsoft cloud and setting cluster in Amazon EC2 & S3 including the automation of setting & extending the clusters in AWS Amazon cloud.
- Worked extensively on Spark using Python on a cluster for computational analytics, installed it on top of Hadoop, and built advanced analytical applications using Spark with Hive and SQL.
- Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
- Knowledge in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4) distributions and on Amazon web services (AWS).
- Experienced in writing Ad Hoc queries using Cloudera Impala, also used Impala analytical functions.
- Experience in creating Data frames using PySpark and performing operation on the Data frames using Python.
- In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS and MapReduce Programming Paradigm, High Availability and YARN architecture.
- Establishing multiple connections to different Redshift clusters (Bank Prod, Card Prod, SBBDA Cluster) and provide the access for pulling the information we need for analysis.
- Generated various kinds of knowledge reports using Power BI based on Business specification.
- Developed interactive Tableau dashboards to provide a clear understanding of industry specific KPIs using quick filters and parameters to handle them more efficiently.
- Well experienced in projects using JIRA, testing, and the Maven and Jenkins build tools.
- Experienced in designing, building, deploying and utilizing almost all of the AWS stack (including EC2 and S3), focusing on high availability, fault tolerance, and auto-scaling.
- Good experience with use-case development, with Software methodologies like Agile and Waterfall.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Good working experience in importing data using Sqoop and SFTP from various sources like RDBMS, Teradata, Mainframes, Oracle and Netezza to HDFS, and performing transformations on it using Hive, Pig and Spark.
- Extensive experience in Text Analytics, developing different Statistical Machine Learning solutions to various business problems and generating data visualizations using Python and R.
- Proficient in NoSQL databases including HBase, Cassandra, MongoDB and its integration with Hadoop cluster.
- Hands on experience in Hadoop Big data technology working on MapReduce, Pig, Hive as Analysis tool, Sqoop and Flume data import/export tools.
2. Hands-on experience using Python, SQL, Tableau
3. Data Analyst
About Amagi & Growth
Amagi Corporation is a next-generation media technology company that provides cloud broadcast and targeted advertising solutions to broadcast TV and streaming TV platforms. Amagi enables content owners to launch, distribute and monetize live linear channels on Free-Ad-Supported TV and video services platforms. Amagi also offers 24x7 cloud managed services bringing simplicity, advanced automation, and transparency to the entire broadcast operations. Overall, Amagi supports 500+ channels on its platform for linear channel creation, distribution, and monetization with deployments in over 40 countries. Amagi has offices in New York (Corporate office), Los Angeles, and London, broadcast operations in New Delhi, and our Development & Innovation center in Bangalore. Amagi is also expanding in Singapore, Canada and other countries.
Amagi has seen phenomenal growth as a global organization over the last 3 years. Amagi has been a profitable firm for the last 2 years, and is now looking at investing in multiple new areas. Amagi has been backed by 4 investors - Emerald, Premji Invest, Nadathur and Mayfield. As of the fiscal year ending March 31, 2021, the company witnessed stellar growth in the areas of channel creation, distribution, and monetization, enabling customers to extend distribution and earn advertising dollars while saving up to 40% in cost of operations compared to traditional delivery models. Some key highlights of this include:
· Annual revenue growth of 136%
· 44% increase in customers
· 50+ Free Ad Supported Streaming TV (FAST) platform partnerships and 100+ platform partnerships globally
· 250+ channels added to its cloud platform taking the overall tally to more than 500
· Approximately 2 billion ad opportunities every month supporting OTT ad-insertion for 1000+ channels
· 60% increase in workforce in the US, UK, and India to support strong customer growth (current headcount being 360 full-time employees + Contractors)
· 5-10x growth in ad impressions among top customers
- Minimum 1 year of relevant experience in PySpark (mandatory)
- Hands-on experience in developing, testing, deploying, maintaining and improving data integration pipelines in an AWS cloud environment is an added plus
- Ability to play a lead role and independently manage a PySpark development team of 3-5 members
- EMR, Python and PySpark are mandatory.
- Knowledge and awareness of working with AWS cloud technologies such as Apache Spark, Glue, Kafka, Kinesis, Lambda, S3, Redshift, and RDS
Strong knowledge in statistical and data mining techniques: GLM/Regression, Random Forest, Boosting, Trees, text mining, etc.
Sound knowledge of querying databases and using statistical computer languages: R, Python, SQL, etc.
Strong understanding of creating and using advanced machine learning algorithms and statistics: regression, simulation, scenario analysis, modeling, clustering, decision trees, neural networks, etc.
Big data Developer
Exp: 3yrs to 7 yrs.
Job Location: Hyderabad
Notice: Immediate / within 30 days
1. Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> QuickSight
2. Experience in developing lambda functions with AWS Lambda
3. Expertise with Spark/PySpark. Candidate should be hands-on with PySpark code and able to do transformations with Spark
4. Should be able to code in Python and Scala.
5. Snowflake experience will be a plus
Hadoop and Hive are good to have: an understanding of them is enough, and they are not treated as a desirable requirement.
Role: Data Engineer
Company: PayU
Location: Bangalore/ Mumbai
Experience : 2-5 yrs
About Company:
PayU is the payments and fintech business of Prosus, a global consumer internet group and one of the largest technology investors in the world. Operating and investing globally in markets with long-term growth potential, Prosus builds leading consumer internet companies that empower people and enrich communities.
The leading online payment service provider in 36 countries, PayU is dedicated to creating a fast, simple and efficient payment process for merchants and buyers. Focused on empowering people through financial services and creating a world without financial borders where everyone can prosper, PayU is one of the biggest investors in the fintech space globally, with investments totalling $700 million to date. PayU also specializes in credit products and services for emerging markets across the globe. We are dedicated to removing risks to merchants, allowing consumers to use credit in ways that suit them and enabling a greater number of global citizens to access credit services.
Our local operations in Asia, Central and Eastern Europe, Latin America, the Middle East, Africa and South East Asia enable us to combine the expertise of high growth companies with our own unique local knowledge and technology to ensure that our customers have access to the best financial services.
India is the biggest market for PayU globally and the company has already invested $400 million in this region in the last 4 years. PayU in its next phase of growth is developing a full regional fintech ecosystem providing multiple digital financial services in one integrated experience. We are going to do this through 3 mechanisms: build, co-build/partner, and select strategic investments.
PayU supports 350,000+ merchants and millions of consumers making payments online with over 250 payment methods and 1,800+ payment specialists. The markets in which PayU operates represent a potential consumer base of nearly 2.3 billion people and a huge growth potential for merchants.
Job responsibilities:
- Design infrastructure for data, especially for but not limited to consumption in machine learning applications
- Define database architecture needed to combine and link data, and ensure integrity across different sources
- Ensure performance of data systems ranging from machine learning, to customer-facing web and mobile applications using cutting-edge open-source frameworks, to highly available RESTful services, to back-end Java-based systems
- Work with large, fast, complex data sets to solve difficult, non-routine analysis problems, applying advanced data handling techniques if needed
- Build data pipelines, including implementing, testing, and maintaining infrastructural components related to the data engineering stack.
- Work closely with Data Engineers, ML Engineers and SREs to gather data engineering requirements to prototype, develop, validate and deploy data science and machine learning solutions
Requirements to be successful in this role:
- Strong knowledge and experience in Python, Pandas, Data wrangling, ETL processes, statistics, data visualisation, Data Modelling and Informatica.
- Strong experience with scalable compute solutions such as Kafka and Snowflake
- Strong experience with workflow management libraries and tools such as Airflow, AWS Step Functions etc.
- Strong experience with data engineering practices (i.e. data ingestion pipelines and ETL)
- A good understanding of machine learning methods, algorithms, pipelines, testing practices and frameworks
- (Preferred) MEng/MSc/PhD degree in computer science, engineering, mathematics, physics, or equivalent (preference: DS/AI)
- Experience with designing and implementing tools that support sharing of data, code, practices across organizations at scale
As an experienced Data Scientist you’ll join a team of data scientists, analysts, and software engineers
working to push the boundaries of data science in health care. We like to experiment, iterate, and
innovate with technology, from developing new algorithms specific to health care’s challenges, to
bringing the latest machine learning practices and applications developed in other industries into the
health care world. We know that algorithms are only valuable when powered by the right data, so we
focus on fully understanding the problems we need to solve, and truly understanding the data behind
them before launching into solutions – ensuring that the solutions we do land on are impactful and
powerful.
Essential functions
• Research, conceptualize, and implement analytical approaches and predictive modeling to
evaluate scenarios, predict utilization and clinical outcomes, and recommend actions to impact
results.
• Manage and execute on the entire model development process, including scope definition,
hypothesis formation, data cleaning and preparation, feature selection, model implementation
in production, validation and iteration, using multiple data sources.
• Provide guidance on necessary data and software infrastructure capabilities to deliver a scalable
solution across partners and support the implementation of the team’s algorithms and models
• Contribute to the development and publication of work in major journals and conferences, showcasing
leadership in healthcare data science.
• Work closely and collaborate with Data Scientists, Machine Learning engineers, IT teams and
Business stakeholders spread out across various locations in US and India to achieve business
goals
• Provide guidance to other Data Scientists and Machine Learning Engineers