Role and Responsibilities
- Execute data mining projects, training and deploying models, over a typical project duration of 2 to 12 months.
- The ideal candidate should be able to innovate, analyze customer requirements, develop a solution within the time box of the project plan, and execute and deploy that solution.
- Integrate the projects as embedded data mining applications on the FogHorn platform (on Docker or Android).
Candidates must meet ALL of the following qualifications:
- Have analyzed, trained and deployed at least three data mining models in the past. Candidates who did not deploy their own models directly should have worked with others who put those models into production. The models should have been validated as robust over at least an initial period.
- Three years of industry experience developing data mining models that were deployed and used.
- Core programming experience in Python with data mining libraries such as Scikit-Learn. Other relevant Python libraries include NumPy, SciPy and Pandas.
- Experience with at least three data mining algorithms across prediction (statistical regression, neural nets, deep learning, decision trees, SVM, ensembles), clustering (k-means, DBSCAN or other) or Bayesian networks.
Any of the following extra qualifications will make a candidate more competitive:
- Soft Skills
- Sets expectations, develops project plans and meets expectations.
- Experience adapting technical dialogue to the right level for the audience (e.g. executives) or to the specific jargon of a given vertical market and job function.
- Technical skills
- Commonly, candidates have an MS or Ph.D. in Computer Science, Math, Statistics or a technical engineering discipline. BS candidates with relevant experience are also considered.
- Have managed models in production over their full life cycle, until model replacement is needed; have developed automated model refreshing on newer data; and have developed model automation frameworks as a prototype for a product.
- Training or experience in Deep Learning, such as TensorFlow, Keras, convolutional neural networks (CNN) or Long Short-Term Memory (LSTM) neural network architectures. If you don’t have deep learning experience, we will train you on the job.
- Shrinking deep learning models and optimizing them to speed up scoring or inference.
- OpenCV or other image processing tools or libraries
- Cloud computing: Google Cloud, Amazon AWS or Microsoft Azure. We have integration with Google Cloud and are working on other integrations.
- Experience with tree-based ensembles such as XGBoost or Random Forests is helpful.
- Complex Event Processing (CEP) or other streaming data as a data source for data mining analysis
- Time series algorithms from ARIMA to LSTM to Digital Signal Processing (DSP).
- Bayesian Networks (BN), a.k.a. Bayesian Belief Networks (BBN) or Graphical Belief Networks (GBN)
- Experience with PMML is of interest (see www.DMG.org).
- Vertical experience in Industrial Internet of Things (IoT) applications:
- Energy: Oil and Gas, Wind Turbines
- Manufacturing: Motors, chemical processes, tools, automotive
- Smart Cities: Elevators, cameras monitoring people or cars, power grid
- Transportation: Cars, truck fleets, trains
About FogHorn Systems
FogHorn is a leading developer of “edge intelligence” software for industrial and commercial IoT application solutions. FogHorn’s Lightning software platform brings the power of advanced analytics and machine learning to the on-premise edge environment enabling a new class of applications for advanced monitoring and diagnostics, machine performance optimization, proactive maintenance and operational intelligence use cases. FogHorn’s technology is ideally suited for OEMs, systems integrators and end customers in manufacturing, power and water, oil and gas, renewable energy, mining, transportation, healthcare, retail, as well as Smart Grid, Smart City, Smart Building and connected vehicle applications.
- 2019 Edge Computing Company of the Year – Compass Intelligence
- 2019 Internet of Things 50: 10 Coolest Industrial IoT Companies – CRN
- 2018 IoT Platforms Leadership Award & Edge Computing Excellence – IoT Evolution World Magazine
- 2018 10 Hot IoT Startups to Watch – Network World. (Gartner estimated 20 billion connected things in use worldwide by 2020)
- 2018 Winner in Artificial Intelligence and Machine Learning – Globe Awards
- 2018 Ten Edge Computing Vendors to Watch – ZDNet & 451 Research
- 2018 The 10 Most Innovative AI Solution Providers – Insights Success
- 2018 The AI 100 – CB Insights
- 2017 Cool Vendor in IoT Edge Computing – Gartner
- 2017 20 Most Promising AI Service Providers – CIO Review
Our Series A round was for $15 million. Our Series B round, in October 2017, was for $30 million. Investors include Saudi Aramco Energy Ventures, Intel Capital, GE, Dell, Bosch, Honeywell and The Hive.
About the Data Science Solutions team
In 2018, our Data Science Solutions team grew from 4 to 9 people, and we are now growing again from a team of 11. We work on revenue-generating projects for clients, such as predictive maintenance, time to failure and manufacturing defects. About half of our projects have involved vision recognition or deep learning. Beyond consulting projects, we also develop vertical solution applications with embedded data mining that run on our Lightning platform.
Our data scientists like our team because:
- We care about “best practices”
- They have a direct impact on the company’s revenue
- They give or receive mentoring as part of the collaborative process
- Asking questions and challenging the status quo with data is safe
- We balance intellectual curiosity with humility
- They present papers or projects in our “Thought Leadership” meeting series, to support continuous learning
Who Are We?
Vahak (https://www.vahak.in) is India’s largest and most trusted online transport marketplace and directory for road transport businesses and individual owners of commercial vehicles (trucks, trailers, containers, Hyva, LCVs). It supports online truck and load booking, transport business branding and transport business network expansion. Lorry owners can find intercity and intracity loads from all over India and connect with other businesses to find trusted transporters and the best deals in the Indian logistics services market. With the Vahak app, users can book loads and lorries on a live transport marketplace with over 5 lakh (500,000) transporters and lorry owners in more than 10,000 locations for their daily transport requirements.
Vahak has raised $5 million in a “Pre Series A” round from RTP Global, with participation from Luxor Capital and Leo Capital. Other marquee angel investors include Kunal Shah, Founder and CEO, CRED; Jitendra Gupta, Founder and CEO, Jupiter; Vidit Aatrey and Sanjeev Barnwal, Co-founders, Meesho; Mohd Farid, Co-founder, Sharechat; Amrish Rau, CEO, Pine Labs; Harsimarbir Singh, Co-founder, Pristyn Care; Rohit and Kunal Bahl, Co-founders, Snapdeal; and Ravish Naresh, Co-founder and CEO, Khatabook.
Responsibilities for Data Analyst:
- Undertake preprocessing of structured and unstructured data
- Propose solutions and strategies to business challenges
- Present information using data visualization techniques
- Identify valuable data sources and automate collection processes
- Analyze large amounts of information to discover trends and patterns
- Mine and analyze data from company databases to drive optimization and improvement of product development, marketing techniques and business strategies.
- Proactively analyze data to answer key questions from stakeholders or out of self-initiated curiosity with an eye for what drives business performance.
Qualifications for Data Analyst
- Experience using business intelligence tools (e.g. Tableau, Power BI – not mandatory)
- Strong SQL or Excel skills with the ability to learn other analytic tools
- Conceptual understanding of various modelling techniques, pros and cons of each technique
- Strong problem solving skills with an emphasis on product development.
- Experience in advanced computing programming, algorithm development and predictive modeling
- Experience using statistical computer languages (R, Python, SQL, etc.) to manipulate data and draw insights from large data sets.
- Advantage - Knowledge of a variety of machine learning techniques (clustering, decision tree learning, artificial neural networks, etc.) and their real-world advantages/drawbacks.
- Demonstrated experience applying data analysis methods to real-world data problems
- Create data funnels that feed web, structured and unstructured data into models
- Maintain coding standards using SDLC, Git, AWS deployments, etc.
- Keep abreast of developments in the field
- Deploy models in production and monitor them
- Document processes and logic
- Take ownership of the solution from code to deployment and performance
Azure – Data Engineer
- At least 2 years of hands-on experience in an Agile data engineering team, building big data pipelines on Azure in a commercial environment.
- Dealing with senior stakeholders/leadership
- Understanding of Azure data security and encryption best practices. [ADFS/ACLs]
- Databricks – experience writing in and using Databricks, using Python to transform and manipulate data.
- Data Factory – experience using Data Factory in an enterprise solution to build data pipelines, including calling REST APIs.
- Synapse/data warehouse – experience using Synapse/data warehouse to present data securely and to build and manage data models.
- Microsoft SQL Server – we’d expect the candidate to have come from a SQL/data background and progressed into Azure.
- Power BI – experience with this is preferred.
- Experience using Git as a source control system
- Understanding of DevOps concepts and application
- Understanding of Azure Cloud costs/management and running platforms efficiently
- Design, build & test ETL processes using Python & SQL for the corporate data warehouse
- Inform, influence, support, and execute our product decisions
- Maintain advertising data integrity by working closely with R&D to organize and store data in a format that keeps it accurate and allows the business to identify issues quickly.
- Evaluate and prototype new technologies in the area of data processing
- Think quickly, communicate clearly and work collaboratively with product, data, engineering, QA and operations teams
- High energy level, strong team player and good work ethic
- Data analysis, understanding of business requirements and translation into logical pipelines & processes
- Identification, analysis & resolution of production & development bugs
- Support the release process including completing & reviewing documentation
- Configure data mappings & transformations to orchestrate data integration & validation
- Provide subject matter expertise
- Document solutions, tools & processes
- Create & support test plans with hands-on testing
- Peer-review work developed by other data engineers within the team
- Establish good working relationships & communication channels with relevant departments
Skills and Qualifications we look for
- University degree 2.1 or higher (or equivalent) in a relevant subject. Master’s degree in any data subject will be a strong advantage.
- 4–6 years of experience with data engineering.
- Strong coding ability and software development experience in Python.
- Strong hands-on experience with SQL and Data Processing.
- Google Cloud Platform (Cloud Composer, Dataflow, Cloud Functions, BigQuery, Cloud Storage, Dataproc)
- Good working experience with at least one ETL tool (Airflow preferred).
- Should possess strong analytical and problem solving skills.
- Good-to-have skills: PySpark, CircleCI, Terraform
- Motivated, self-directed, able to work with ambiguity and interested in emerging technologies, agile and collaborative processes.
- Understanding & experience of agile / scrum delivery methodology
- Gather information from multiple data sources to make approval decisions mechanically
- Read and interpret credit-related information about borrowers
- Interpret, analyze and assess all forms of complex information
- Carry out risk assessment analysis
- Keep the company’s credit exposure within a defined risk level, with set limits in mind
- Build strategies to minimize risk and increase approval rates
- Design champion/challenger tests, implement them and read the test results
- Build credit line assignment strategies
- Credit Risk Modeling
- Statistical Data Understanding and interpretation
- Basic Regression and Advanced Machine Learning Models
- Conversant with coding in Python using libraries such as scikit-learn
- Build and understand decision trees
Roles & Responsibilities
- Proven experience deploying and tuning open source components into enterprise-ready production tooling.
- Experience with datacentre (Metal as a Service – MAAS) and cloud deployment technologies (AWS or GCP Architect certification required).
- Deep understanding of Linux from kernel mechanisms through user space management
- Experience with CI/CD (continuous integration and deployment) systems such as Jenkins.
- Experience using monitoring tools (local and on public cloud platforms) such as Nagios, Prometheus, Sensu, ELK, CloudWatch, Splunk and New Relic to trigger instant alerts, reports and dashboards.
- Work closely with the development and infrastructure teams to analyze and design solutions with four nines (99.99%) uptime on globally distributed, clustered, production and non-production virtualized infrastructure.
- Wide understanding of IP networking as well as data centre infrastructure
- Expert with software development tools and source code management: understanding and managing issues and code changes, and grouping them into deployment releases in a stable, measurable way to maximize production.
- Must be expert at developing and using Ansible roles and configuring deployment templates with Jinja2.
- Solid understanding of data collection tools like Flume, Filebeat, Metricbeat, JMX Exporter agents.
- Extensive experience operating and tuning the Kafka streaming data platform, specifically as a message queue for big data processing
- Strong understanding of, and required experience with:
- Apache Spark framework, specifically Spark Core and Spark Streaming
- Orchestration platforms: Mesos and Kubernetes
- Data storage platforms: Elastic Stack, Carbon, ClickHouse, Cassandra, Ceph, HDFS
- Core presentation technologies: Kibana and Grafana
- Excellent scripting and programming skills (Bash, Python, Java, Go, Rust). Must have previous experience with Rust in order to support and improve in-house products.
- Red Hat Certified Architect certificate or equivalent required.
- CCNA certificate required.
- 3–5 years of experience running open source big data platforms.
1) Machine learning development using Python or Scala Spark
2) Knowledge of multiple ML algorithms such as Random Forest, XGBoost, RNN, CNN, transfer learning, etc.
3) Awareness of typical challenges in machine learning implementation and their respective applications
Good to have
1) Stack development or DevOps team experience
2) Cloud services (AWS, Cloudera), SaaS, PaaS
3) Big data tools and frameworks
- Previous experience of working in large scale data engineering
- 4+ years of experience in data engineering and/or backend technologies, with cloud experience (any provider), is mandatory.
- Previous experience of architecting and designing backend for large scale data processing.
- Familiarity and experience working with different technologies related to data engineering – different database technologies, Hadoop, Spark, Storm, Hive, etc.
- Hands-on and have the ability to contribute a key portion of data engineering backend.
- Self-inspired and motivated to drive for exceptional results.
- Familiarity and experience working with different stages of data engineering – data acquisition, data refining, large scale data processing, efficient data storage for business analysis.
- Familiarity and experience working with different DB technologies and how to scale them.
- End-to-end responsibility for the data engineering architecture, design, development and implementation.
- Build data engineering workflow for large scale data processing.
- Discover opportunities in data acquisition.
- Bring industry best practices for data engineering workflow.
- Develop data set processes for data modelling, mining and production.
- Take additional tech responsibilities for driving an initiative to completion
- Recommend ways to improve data reliability, efficiency and quality
- Goes out of their way to reduce complexity.
- Humble and outgoing - engineering cheerleaders.