Machine Learning 2.a: Applications

By Sagar Gandhi on

As promised last week, the question "Why should I care?" will be addressed in this post and the next, connected one.
In general, when do you think people care about something?
If something has a direct or indirect impact on their lives, they usually care. Machine Learning has advanced our ability to perform, in a smarter way, many of the tasks that GOFAI (Good Old-Fashioned Artificial Intelligence, i.e. rule-based symbolic AI) set out to solve.
Several factors help explain the rise of the Machine Learning era, such as increased computation speed, greater data availability, and affordable data storage. But the most prominent one is its practical applications. Simply put, it actually works in real life! It is used by tens of thousands of companies to build smarter machines.

Hence, rather than speculating about why people use Machine Learning, it is more fun to identify where and why they use it. Listing every place where ML is used is a massive task, and it can be approached from multiple perspectives, such as a researcher's view or an entrepreneur's view. This post focuses on the most common applications; though the list covers many, there are many more.
Each use case is presented simply: the problem solved and who is solving it. Note that only a single firm (not necessarily the best) is mentioned for each application, though there are many more players in each area.

Let us delve straight into the applications.

(1) Finance: Machine Learning has been applied in the finance sector since the nineties. As finance is a huge field, it has many applications.

i. Predictive Analysis for Loans using Credit Score
Task Done: Predicting bad loans early from credit scores, so that risk can be minimized.
Who: Lending Club, a platform that connects lenders to borrowers, uses machine learning heavily.
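To make the idea concrete, here is a minimal sketch of the kind of model involved, assuming a single feature (the credit score) and a plain logistic regression fitted by gradient descent. The data points and the function names `fit_logistic` and `default_probability` are hypothetical and for illustration only; this is not Lending Club's actual model.

```python
import math

# Hypothetical toy data: (credit_score, defaulted), where 1 = bad loan.
LOANS = [(580, 1), (600, 1), (620, 1), (640, 0), (660, 1),
         (680, 0), (700, 0), (720, 0), (750, 0), (780, 0)]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(data, lr=0.1, epochs=5000):
    """Fit P(default | score) = sigmoid(w * x + b) by batch gradient descent.
    Scores are standardized first so that gradient descent converges quickly."""
    xs = [s for s, _ in data]
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for score, label in data:
            x = (score - mean) / std
            err = sigmoid(w * x + b) - label  # gradient of log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, mean, std

def default_probability(model, score):
    """Predicted probability that a loan with this credit score goes bad."""
    w, b, mean, std = model
    return sigmoid(w * (score - mean) / std + b)

model = fit_logistic(LOANS)
for score in (590, 690, 790):
    print(score, round(default_probability(model, score), 2))
```

A real lender would use many more features (income, debt-to-income ratio, loan purpose, and so on), but the shape of the problem is the same: learn a mapping from applicant data to a probability of default, then act on that probability.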

ii. Trading Algorithm
Task Done: As the closely related term 'High Frequency Trading' suggests, trading algorithms must search massive amounts of data for a true signal representing market dynamics, which is a highly specialized task. Machine Learning can help with it.
Who: KFL Capital uses statistical models to make predictions on financial data.

iii. Fraud Detection
Task Done: Fraud is a huge problem for the finance industry, costing around $80 billion each year, so there is a definite need to analyze extremely large volumes of data in real time. Fraud detection comprises multiple aspects, such as authentication and behavioral analysis, each of which can be studied independently.
Who: Feedzai uses Machine Learning and Artificial Intelligence to keep businesses safe while preserving a good experience for customers.
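Behavioral analysis often starts with a simple question: does this transaction look unusual for this customer? Below is a minimal sketch, assuming we look only at transaction amounts and flag anything more than three standard deviations from the customer's historical mean. This z-score rule is a statistical baseline rather than a learned model, and the function name `flag_suspicious`, the threshold, and the amounts are hypothetical.

```python
from statistics import mean, stdev

def flag_suspicious(history, new_amounts, threshold=3.0):
    """Return the amounts in new_amounts whose z-score, relative to the
    customer's past transactions in history, exceeds threshold."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs >= 2 past amounts
    if sigma == 0:
        # All past amounts identical: anything different counts as unusual.
        return [a for a in new_amounts if a != mu]
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]

# Hypothetical purchase history for one customer (in dollars).
past = [42.0, 55.5, 38.2, 60.0, 47.3, 51.8, 44.9, 58.1]
incoming = [49.0, 61.2, 950.0]
print(flag_suspicious(past, incoming))
```

Production systems replace the fixed rule with models trained on labeled fraud cases and many more signals (location, device, merchant, time of day), but the underlying idea of scoring how far a transaction deviates from expected behavior carries over.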

iv. Information Extraction
Task Done: Using Machine Learning techniques to extract information from web content, such as articles, publications, and user-generated content (UGC, e.g. tweets), converting it into actionable signals and identifying trends.
Who: Dataminr, often described as a disruptive company, transforms Twitter streams and other data into must-know information in real time.

(2) Biology: Machine Learning is clearly easing the burden of solving many biological problems by saving cost and time and by providing predictions that guide new experiments. The reason is simple: the biological sciences are increasingly driven by large, information-rich datasets.

i. Evolution - Phylogenetic Tree Construction
Task Done: Phylogenetic analyses have become central to understanding biodiversity, evolution, ecology, and genomes. Phylogenetic trees are constructed from the available data by comparing how far different genomes (or sequences) have diverged from one another.
Who: Geneious provides phylogenetic analysis software that helps build and visualize trees on the fly.
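One of the simplest distance-based tree-building methods is UPGMA (Unweighted Pair Group Method with Arithmetic mean): repeatedly merge the two closest clusters, then set the new cluster's distance to every other cluster as the size-weighted average of its members' distances. The sketch below is a plain UPGMA, assuming the pairwise distances between sequences have already been computed; the distance values are made up, and this is not a description of how Geneious works internally. Note that UPGMA is a classical clustering algorithm rather than a learned model, and it assumes a roughly constant rate of evolution (a molecular clock).

```python
from itertools import combinations

def upgma(names, dist):
    """Build a rooted tree with UPGMA.

    names: list of unique taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b.
    Returns the tree topology as a Newick string (no branch lengths).
    """
    # Active clusters: Newick fragment -> number of leaves in that cluster.
    size = {n: 1 for n in names}
    d = dict(dist)
    while len(size) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the two merged clusters' distances.
        for c in size:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"

# Hypothetical pairwise distances between four sequences.
taxa = ["A", "B", "C", "D"]
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(p): v for p, v in pairs.items()}
print(upgma(taxa, dist))
```

With these distances, A and B are joined first, then C joins that pair, and D is the outgroup. Methods such as neighbor joining, maximum likelihood, and Bayesian inference relax UPGMA's clock assumption at the cost of more computation.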

ii. Genomics
a. Gene Prediction and Annotation:
Task Done: Gene prediction is one of the key steps in genome annotation: it means finding the subsequences of bases that encode proteins. What better tool than machine learning to deal with such huge piles of data?
Who: Softberry helps annotate the genomes of multiple species by offering gene-finding services.
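The simplest computational version of this task, which predates machine learning, is scanning for open reading frames (ORFs): stretches that begin with the start codon ATG and run to the first in-frame stop codon (TAA, TAG, or TGA). Gene finders built on statistical models such as hidden Markov models go much further, but an ORF scan shows what "finding subsequences that encode proteins" means in practice. The sketch below checks only the forward strand, ignores nested start codons, and its minimum length and example sequence are arbitrary.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for forward-strand ORFs.

    An ORF here begins at ATG and ends with the first in-frame stop codon
    (stop included). min_codons counts codons including ATG and the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # the three possible reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None  # look for the next ATG in this frame
    return orfs

print(find_orfs("CCATGAAATTTGGGTAGCC"))
```

A real gene finder must also handle the reverse strand, introns and splice sites in eukaryotes, and the many random ORFs that never encode a protein, which is exactly where trained statistical models earn their keep.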

b. Motif Identification
Task Done: Exactly which functional connections exist within a complex organism is a prominent question, and answering it directly improves our understanding of such organisms. One of the first steps is to identify sequence motifs: short, recurring patterns in DNA or protein sequences that are presumed to have a biological function, such as transcription factor binding sites. This is what motif finding does.
Who: The MEME Suite is a collection of tools for the discovery and analysis of sequence motifs. It uses statistical modeling techniques to choose the parameters that best describe each motif.
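MEME itself discovers motifs with expectation maximization, which is too long to reproduce here. A smaller piece of the same machinery is the position weight matrix (PWM): a motif is summarized as per-position base probabilities, which are then used to score every window of a sequence. The sketch below builds a log-odds PWM from a few aligned example sites and scans a sequence for the best match, assuming a uniform background (0.25 per base) and a pseudocount of 1. The sites are invented variants loosely modeled on the bacterial -10 promoter consensus TATAAT, and the sequence is made up.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned sites.
    Returns a list (one entry per motif position) of {base: score} dicts."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm

def best_match(pwm, sequence):
    """Slide the PWM along sequence (A/C/G/T only) and return
    (start, window, score) for the highest-scoring window."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[2]:
            best = (start, window, score)
    return best

# Hypothetical aligned binding sites.
sites = ["TATAAT", "TATAAT", "TATGAT", "TAAAAT", "TATACT"]
pwm = build_pwm(sites)
print(best_match(pwm, "GGCGCTATAATGCCGA"))
```

Motif discovery tools solve the harder inverse problem: they are not given the aligned sites, and must find both the motif model and its occurrences from unaligned sequences at the same time.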

iii. Proteomics
Proteins perform a vast array of functions within living organisms. Proteomics is the large-scale study of such proteins.

a. Function Prediction:
Task Done: To understand exactly how complex systems work at higher levels, we need to understand life at the molecular level. However, experiments alone are not sufficient: they are inherently complex and expensive, so scalability is a huge issue. Computational annotation of protein function has therefore emerged as a problem at the forefront, and people are using Machine Learning to solve it.

b. Structure Prediction:
Task Done: Protein structure prediction is the prediction of the three-dimensional structure of a protein from its amino acid sequence - that is, the prediction of its folding and its secondary, tertiary, and quaternary structure from its primary structure.
Who: Many online servers are available to perform this task; see the list at https://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software

iv. Systems Biology
Task Done: A Genetic Regulatory Network (GRN) is a collection of molecular regulators that interact with each other and with other substances in the cell to govern gene expression levels. GRNs can be represented as large directed graphs, and inferring them is a central problem in bioinformatics. Because quality data is scarce and noisy, machine learning is essential for good, tractable inference of the underlying causal structure.
Who: GINsim (Gene Interaction Network simulation) is a computer tool for the modeling and simulation of genetic regulatory networks. It includes a simulator of qualitative models of genetic regulatory networks based on a discrete, logical formalism.
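A very simple, admittedly naive starting point for network inference is a relevance network: connect two genes if their expression profiles across samples are strongly correlated. The sketch below assumes the expression values are already normalized, uses Pearson correlation with an arbitrary 0.9 cutoff, and produces undirected edges only; real GRN inference methods try to recover direction and causality, which correlation alone cannot. It is unrelated to GINsim, which, as described above, models and simulates networks rather than inferring them from data. Gene names and values are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def relevance_network(expression, cutoff=0.9):
    """Undirected edges (gene1, gene2, r) for gene pairs with |r| >= cutoff."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= cutoff:
            edges.append((g1, g2, round(r, 3)))
    return edges

# Hypothetical expression levels of four genes across six samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],  # rises with geneA
    "geneC": [6.0, 5.1, 3.9, 3.2, 2.0, 0.9],   # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],   # no clear relationship
}
for edge in relevance_network(expression):
    print(edge)
```

Here geneA, geneB, and geneC end up fully connected while geneD stays isolated. Note that the network cannot tell whether geneA regulates geneB, the reverse, or whether both respond to a common regulator; that ambiguity is precisely why more sophisticated inference is needed.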

v. Microarray Data Analysis
Task Done: Microarray analysis techniques are used to interpret the data generated from experiments on DNA, RNA, and protein microarrays, which allow one to investigate the expression state of a large number of genes (in many cases, an organism's entire genome) in a single experiment. Such experiments can generate very large volumes of data, allowing researchers to assess the overall state of a cell or organism. Such large amounts of data can be difficult to analyze, and this is where Machine Learning techniques come in handy.
Who: Bioconductor has advanced facilities for analyzing data from microarray platforms, including Affymetrix, Illumina, Nimblegen, Agilent, and other one- and two-color technologies. Bioconductor includes extensive support for the analysis of expression arrays.
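A common first step in microarray analysis is finding genes whose expression differs between two conditions. Bioconductor packages such as limma do this in R with moderated statistics; the sketch below shows only the basic idea in Python, assuming log2-scale intensities and using the log2 fold change plus Welch's t-statistic for each gene. The cutoffs, gene names, and values are hypothetical, and no multiple-testing correction is applied, which a real analysis would need.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))

def differential_genes(control, treated, min_log2fc=1.0, min_abs_t=4.0):
    """Return (gene, log2 fold change, t) for genes passing both cutoffs.
    Inputs map gene -> replicate values already on a log2 scale, so the
    fold change is simply the difference of the group means."""
    hits = []
    for gene in control:
        lfc = mean(treated[gene]) - mean(control[gene])
        t = welch_t(treated[gene], control[gene])  # positive = up in treated
        if abs(lfc) >= min_log2fc and abs(t) >= min_abs_t:
            hits.append((gene, round(lfc, 2), round(t, 1)))
    return hits

# Hypothetical log2 intensities, three replicates per condition.
control = {"g1": [8.1, 8.0, 8.2], "g2": [5.0, 5.2, 4.9], "g3": [7.0, 7.1, 6.9]}
treated = {"g1": [8.2, 8.1, 8.0], "g2": [7.1, 7.3, 7.0], "g3": [5.1, 4.8, 5.0]}
print(differential_genes(control, treated))
```

With thousands of genes and only a few replicates, per-gene variance estimates are noisy, which is why tools like limma borrow strength across genes instead of treating each one in isolation as this sketch does.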

vi. Image Analysis
Task Done: Bio-imaging involves gathering, processing, and visualizing structural or functional images of complex objects. Examples include X-ray, CT, MRI and fMRI, PET and HRRT PET, SPECT, MEG, and so on. Medical imaging and microscope/fluorescence image processing are important parts of bio-imaging. Together, they help diagnose and examine diseases and, of course, study normal anatomy and physiology.
Who: It would be unfair to list just one or two firms working in this area, as the field itself is too large.

Well, that is enough to take in at one go, isn't it?

The information was gathered mostly from Wikipedia, but other material was also consulted to ensure accuracy.

This post covered Finance and Biology; if you know of other major domains, mention them in the comments. More areas will follow in upcoming posts...

Note: Special thanks to Vivek for enthusiastically reviewing the post and helping organize the content.