CISC 603-50- R-2017/Summer – Theory of Computation

A Survey on Domain-Specific Languages for Machine Learning

August 3, 2017

Student: Dileep Sharma

Instructor: Majid Shaalan

Contents

1 Statement
2 Abstract
3 Introduction
3.1 Big Data
3.2 Machine Learning
3.3 Domain Specific Language
4 DSL Feature Model
4.1 Language Features
4.2 Transformation Feature
4.3 DSL Tool Features
4.4 DSL Process Features
5 Languages Surveyed
5.1 OptiML
5.2 ScalOps
5.3 Scala
5.4 PIG LATIN
5.5 Breukervl
5.6 Possibility of survey of other language
5.7 Conclusions
6 Reference

1 Statement

The purpose of this paper is to identify, describe, and design Domain-Specific Languages (DSLs) applicable to machine learning in the big data space that can make the process faster and more efficient.

2 Abstract

In the last couple of decades, the data at our disposal has increased tremendously because of technological advances. These advances have helped us capture, store, analyze, and visualize data, and that has led to big data. We need better algorithms to read and analyze these large and complex datasets. Machine learning is turning out to be one of the most effective ways of analyzing these datasets and predicting future behavior.

To better analyze these datasets with machine learning, we need more computational power, which can be obtained through parallel processing on GPUs. Machine learning algorithms need to be adapted and optimized for specific applications. However, programming these devices to run efficiently and correctly is difficult and error-prone, and it results in software that is harder to read and maintain. This paper is primarily concerned with domain-specific languages that can help us write machine learning algorithms efficiently to analyze big data.

3 Introduction

Technological advances in the recent past have caused a data revolution. This high volume of data is called big data. Every second, smartphones, tablets, cars, websites, and systems generate a massive amount of data, and users and software engineers have access to a subset of that data to perform their activities.

3.1 Big Data

Apart from the large amount of data, big data also involves complex data, a property known as variety. Big data has also created new challenges in data management. Traditional approaches to data storage and analysis do not scale well to this amount of data, which can reach hundreds of terabytes or more, and new approaches are being developed to address these issues. Big data is commonly characterized by five V's:

• Volume

– Refers to the amount of data

– Big data does not sample

– Big Data observes and tracks what happens

• Velocity

– Speed of data processing

– Speed of data generation

– Big Data is often available in real-time

• Variety

– Number of types of data

• Variability

– Inconsistency of data

• Veracity

– Quality of data

3.2 Machine Learning

Machine learning is turning out to be one of the most advanced techniques for processing and making inferences from big data. It is widely used to identify trends and patterns, suggest actions, and optimize output. There are still many challenges in using machine learning to solve big data problems, such as memory and time constraints. To address these issues, we can use GPUs for parallel processing and distribute data across multiple machines. There are two basic kinds of machine learning:

• Supervised

– All data is labeled

– Both input variables and output variables are available

– Use an algorithm to learn the mapping function from the input to the output

• Unsupervised

– All data is unlabeled

– You only have input data and no corresponding output variables

– The algorithm tries to find patterns in the input data
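
The distinction above can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions, not code from any of the surveyed languages: the function names `fit_line` and `two_means` and the toy data are hypothetical. The supervised part learns a mapping from labeled pairs by least squares; the unsupervised part groups unlabeled one-dimensional points around two centers.

```python
def fit_line(xs, ys):
    """Supervised: learn y ~ a*x + b from labeled (input, output) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def two_means(points, iterations=10):
    """Unsupervised: group unlabeled 1-D points around two centers (k-means, k=2).

    Assumes the points contain at least two distinct values.
    """
    lo, hi = min(points), max(points)
    for _ in range(iterations):
        # Assign each point to the nearer center, then recompute the centers.
        left = [p for p in points if abs(p - lo) <= abs(p - hi)]
        right = [p for p in points if abs(p - lo) > abs(p - hi)]
        lo = sum(left) / len(left)
        hi = sum(right) / len(right)
    return lo, hi


# Labeled data: every input x comes with its output y (here y = 2x + 1).
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # a == 2.0, b == 1.0

# Unlabeled data: only inputs; the algorithm finds the two groups itself.
centers = two_means([1.0, 1.2, 0.8, 9.0, 9.2, 8.8])
```

In the first call the algorithm is told the correct outputs; in the second it receives no outputs at all and can only exploit structure in the inputs.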

3.3 Domain Specific Language

In model-driven engineering, a Domain-Specific Language (DSL) is a specialized language which, combined with a transformation function, serves to raise the abstraction level of software and ease software development. Machine learning implementations can be improved by techniques such as domain-specific languages. A DSL solves problems in a single domain, while a general-purpose language (GPL) solves problems across many domains. DSLs allow solutions to be expressed in the idiom and at the level of abstraction of the problem domain.

DSLs offer predefined abstractions to represent concepts from the application domain. This representation may be clearer and more intuitive. Moreover, DSL compilers may optimize code written for the specific domain, and they can perform error detection more efficiently. DSLs may also have more specific tool support that helps software engineers increase their productivity, and they tend to be easier to learn. There are three kinds of DSLs:

• Markup language

• Specification Language

• Programming Language

4 DSL Feature Model

The DSL feature model covers language, transformation, tooling, and process aspects (Figure 1). Language and transformation are mandatory features because they are parts of the DSL definition. Tool is also mandatory because it serves to automate the transformation from a domain, the problem space, down to lower abstraction levels, the solution space. Process is optional because it can be undefined or implicit.

4.1 Language Features

There are two language features (Figure 2):

• Abstract Syntax

– Characterizes elements of a domain and their relationships without implementation consideration

• Concrete Syntax

– Representation of a DSL in a human usable form
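
To make the two language features concrete, the following sketch separates them for a toy expression language. The node classes (`Var`, `Mul`, `Add`) and the two renderers are hypothetical names chosen for illustration. The dataclasses play the role of the abstract syntax (domain elements and their relationships), and each renderer gives one possible concrete syntax for the same model.

```python
from dataclasses import dataclass


# Abstract syntax: the elements of the domain and how they relate,
# with no commitment to how they are written down.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def to_text(node):
    """Concrete syntax 1: infix notation."""
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Mul):
        return f"({to_text(node.left)} * {to_text(node.right)})"
    if isinstance(node, Add):
        return f"({to_text(node.left)} + {to_text(node.right)})"
    raise TypeError(f"unknown node: {node!r}")


def to_prefix(node):
    """Concrete syntax 2: prefix notation for the same abstract syntax."""
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Mul):
        return f"(* {to_prefix(node.left)} {to_prefix(node.right)})"
    if isinstance(node, Add):
        return f"(+ {to_prefix(node.left)} {to_prefix(node.right)})"
    raise TypeError(f"unknown node: {node!r}")


# One abstract model, two human-usable representations.
model = Add(Mul(Var("w"), Var("x")), Var("b"))
infix = to_text(model)     # "((w * x) + b)"
prefix = to_prefix(model)  # "(+ (* w x) b)"
```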

Figure 1: Roots of DSL Feature Language

Figure 2: Language Feature

Figure 3: Root of the Transformation Features

4.2 Transformation Feature

The transformation feature ensures the correspondence from the problem to the solution. It takes into account the problem-to-solution element mapping and all design, implementation, platform, and architecture decisions. A transformation has to answer three questions (Figure 3):

• How is the transformation specified (Figure 4)?

• What assets are expected from the transformation (Figure 5)?

• How is the transformation realized to produce the expected assets (Figure 6)?
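
The three transformation questions can be illustrated with a small model-to-text generator. This is a sketch under stated assumptions: the dictionary layout of `model` and the function name `generate_predictor` are hypothetical. The specification is the template rule inside the function, the target asset is Python source text, and the realization consists of running the generator and compiling its output.

```python
def generate_predictor(model):
    """Specification: a template rule mapping a linear-model description to source text."""
    terms = [f"{weight!r} * {feature}" for feature, weight in model["weights"].items()]
    body = " + ".join(terms + [repr(model["bias"])])
    params = ", ".join(model["weights"])
    return f"def {model['name']}({params}):\n    return {body}\n"


# Problem space: a declarative description of the model (hypothetical layout).
model = {"name": "predict", "weights": {"x1": 2.0, "x2": -0.5}, "bias": 1.0}

# Target asset: executable Python source in the solution space.
source = generate_predictor(model)

# Realization: compile the generated text and retrieve the resulting function.
namespace = {}
exec(source, namespace)
predict = namespace["predict"]
result = predict(1.0, 2.0)  # 2.0 * 1.0 + -0.5 * 2.0 + 1.0
```

The design and platform decisions (plain Python, one function per model) are fixed inside the generator, so the author of the model description never has to state them.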

4.3 DSL Tool Features

There are three kinds of tool features: Respect of Abstraction, Assistance, and Quality Factor (Figure 7). The purpose of abstraction is to reduce software description. Abstraction can be intrusive or seamless. Assistance aims at guiding the DSL tool user during the definition and transformation of domain data. Assistance is adaptive when it changes according to the context of usage and static when it does not. Process guidance guides the user at the level of a process step or of the process workflow. Checking is mandatory in the feature model because a DSL tool must ensure the consistency and completeness of domain data. DSL checking can be performed on the fly or on user action. Quality Factor covers the non-functional aspects of a DSL tool.
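
The two checking modes can be sketched with a toy editor. Everything here is hypothetical and for illustration only: the required fields, `check_model`, and `ModelEditor` are invented names. The completeness check looks for missing fields, the consistency check verifies that weights are numeric, and the editor runs the check either after every change (on the fly) or only when asked (on user action).

```python
REQUIRED_FIELDS = ("name", "weights", "bias")


def check_model(model):
    """Return a list of problems; an empty list means the model passed the check."""
    # Completeness: every required field must be present.
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in model]
    # Consistency: every weight must be a number (weights is assumed to be a dict).
    for feature, weight in model.get("weights", {}).items():
        if not isinstance(weight, (int, float)):
            problems.append(f"weight for {feature} is not a number")
    return problems


class ModelEditor:
    """Toy DSL tool that supports both checking modes."""

    def __init__(self, check_on_the_fly):
        self.check_on_the_fly = check_on_the_fly
        self.model = {}
        self.problems = []

    def set_field(self, key, value):
        self.model[key] = value
        if self.check_on_the_fly:
            # On the fly: re-check after every edit.
            self.problems = check_model(self.model)

    def validate(self):
        """On user action: check only when explicitly requested."""
        self.problems = check_model(self.model)
        return self.problems


eager = ModelEditor(check_on_the_fly=True)
eager.set_field("name", "predict")  # problems are reported immediately

lazy = ModelEditor(check_on_the_fly=False)
lazy.set_field("weights", {"x1": "two"})  # nothing reported until validate()
```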

Figure 4: Specification Features

Figure 5: Target Asset Features

Figure 6: Operational Transformation Features

Figure 7: DSL Tool Features

Figure 8: DSL Process Features

4.4 DSL Process Features

A DSL process defines how development projects that use a DSL must be executed. This part of the feature model addresses Domain-Specific Software Development (DSSD). Process features are of three types: Work Definition, Role, and Guidance (Figure 8).

5 Languages Surveyed

5.1 OptiML

• Highly expressive textual programming language built on top of Scala

• Provides the link between ML applications and heterogeneous parallel hardware

• OptiML code outperformed explicitly parallelized MATLAB code on heterogeneous systems

• Requires no knowledge of the underlying embedding implementation

• No explicit parallelization

• No explicit code for the lower level programming models

• Statically typed language

• Declarative language

• Transative language

• Supports vector, matrix, and graph operations

• Does not support distributed or cloud computing
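
The idea of "no explicit parallelization" in the list above can be illustrated with a Python analogue. This is not OptiML code and does not reproduce its API: `sum_over` is a hypothetical helper, and Python threads are used only to show the shape of the interface, not to claim real speedups. The caller states what to sum; how the work is split and scheduled is decided behind the interface.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_over(start, end, term, workers=4):
    """Sum term(i) for i in [start, end); the caller never manages threads."""
    indices = range(start, end)
    chunk = max(1, len(indices) // workers)
    # Split the index range into contiguous chunks, one task per chunk.
    chunks = [indices[i:i + chunk] for i in range(0, len(indices), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda part: sum(term(i) for i in part), chunks)
        return sum(partials)


# Domain-level code: a dot product written as a declarative summation.
xs = [1, 2, 3]
ys = [4, 5, 6]
dot = sum_over(0, len(xs), lambda i: xs[i] * ys[i])  # 1*4 + 2*5 + 3*6 == 32
```

Because the chunking and scheduling live inside `sum_over`, the execution strategy could be changed without touching the domain-level code, which is the property the list attributes to OptiML.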

5.2 ScalOps

• Enables algorithms to run in the cloud

• Textual programming language

• A declarative language

• Statically typed

• Supports vector, matrix, and graph operations in both parallel and cloud computing environments

• Transative language

5.3 Scala

• It is concise

• Has broad applicability

• Textual programming language