# Statistics and Data Science

| Item | Detail |
|---|---|
| Parent programme | Bachelor of Science in Computing (Level 7 NFQ) |
| Module NFQ level | Level 7 |
| Module credit units | 5 ECTS |
| Module title | Statistics and Data Science |
| Reference code | M3.7 |
| Stage | Year 3, Spring Semester 2 |
| Duration | 12 weeks × 3.15 hours per week |

## Introduction to Statistics and Data Science

• The evolution and context of data science
• CRISP-DM approach and data mining tasks
• Descriptive vs. inferential statistics
• Population and sample; parameters and statistics
• Variables and observations
• Contextualising the course: careers in statistics, statistics in business and ICT
• Practical examples

## Exploratory data analysis (EDA)


• Types of variables: qualitative, quantitative
• Measurement scales (nominal, ordinal) and discrete vs. continuous data
• Types of graphs: bar charts, histograms, boxplots, time series plots, scatterplots, etc.
• Summarising distributions: measures of centre (mean, median and mode); measures of spread (variance, standard deviation, quartiles, interquartile range); measures of skewness
• Contextualising EDA: examples from Central Statistics Office (CSO) reports
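The sketch below illustrates the summary measures listed above using only Python's standard-library `statistics` module. It is a minimal example. The function name `summarise` and the sample data are hypothetical. Skewness is computed as the moment coefficient g1 = m3 / m2^1.5, which is one of several common definitions.

```python
import statistics

def summarise(data):
    """Return common EDA summary measures for a list of numbers (illustrative helper)."""
    n = len(data)
    mean = statistics.mean(data)
    # Quartiles via the default 'exclusive' method of statistics.quantiles
    q1, q2, q3 = statistics.quantiles(data, n=4)
    # Population central moments used for the moment coefficient of skewness
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": statistics.variance(data),  # sample variance (n - 1 denominator)
        "std_dev": statistics.stdev(data),      # sample standard deviation
        "quartiles": (q1, q2, q3),
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5,             # g1 > 0 indicates a right (positive) skew
    }

# Hypothetical sample data
print(summarise([2, 4, 4, 4, 5, 5, 7, 9]))
```

Packages use different conventions for quartiles. This sketch uses the default `method="exclusive"` of `statistics.quantiles`, so R or spreadsheet software may report slightly different quartiles for the same data.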

## Probability

• Experiments; outcomes; sample spaces; events; simple events; compound events
• Classical, relative frequency, and subjective probability approaches
• Calculating simple probabilities; marginal and conditional probabilities
• Mutually exclusive events; dependent and independent events; multiplication rule; addition rule; Bayes’ theorem
• Contextualising probability in industry: e.g., weather and climate examples
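Bayes' theorem, P(H | E) = P(E | H)P(H) / P(E), can be sketched in Python in the weather setting mentioned above. The marginal probability P(E) is expanded with the law of total probability. The function name `bayes_posterior` and all of the probabilities are hypothetical and chosen only for illustration. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def bayes_posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) via Bayes' theorem (illustrative helper)."""
    # Law of total probability: P(E) = P(E | H)P(H) + P(E | not H)P(not H)
    p_evidence = prior * p_evidence_given_h + (1 - prior) * p_evidence_given_not_h
    return prior * p_evidence_given_h / p_evidence

# Hypothetical figures: H = "it rains tomorrow", E = "the forecast says rain"
# P(H) = 0.3, P(E | H) = 0.9, P(E | not H) = 0.2
posterior = bayes_posterior(Fraction(3, 10), Fraction(9, 10), Fraction(2, 10))
print(posterior, float(posterior))  # 27/41, about 0.659
```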

## Distributions: Discrete


• Probability mass functions (PMFs)
• Bernoulli random variables
• Binomial distribution
• Poisson distribution
• Contextualising distributions using industry software: examples using R, Python etc.
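The following minimal sketch computes the probability mass functions listed above from their textbook formulas. It uses only Python's `math` module. A Bernoulli(p) variable is the Binomial(1, p) case. The function names are hypothetical. In practice, library functions such as `scipy.stats.binom.pmf` and `scipy.stats.poisson.pmf` in Python, or `dbinom()` and `dpois()` in R, perform the same calculations.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); Bernoulli(p) is the case n = 1."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), where lam is the mean rate."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Probability of exactly 2 heads in 4 fair coin tosses
print(binomial_pmf(2, 4, 0.5))  # 0.375
# Hypothetical rate: 2 events per hour on average; probability of none in an hour
print(poisson_pmf(0, 2))
```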

## Distributions: Continuous

• Cumulative distribution functions (CDFs) and probability density functions (PDFs)
• Exponential distribution
• Normal distribution: properties, z-scores, normal tables
• Contextualising distributions using industry software: examples using R, Python etc.
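As a small illustration of z-scores, normal tables and CDFs, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+). It computes the closed-form exponential CDF, 1 − e^(−λx), directly. The helper names and the N(60, 10²) marks example are hypothetical.

```python
import math
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardise x: the number of standard deviations it lies from the mean."""
    return (x - mu) / sigma

def exponential_cdf(x, rate):
    """P(X <= x) for X ~ Exponential(rate); zero for negative x."""
    return 1 - math.exp(-rate * x) if x >= 0 else 0.0

# Hypothetical example: marks modelled as Normal(mean 60, sd 10)
z = z_score(75, 60, 10)        # 1.5 standard deviations above the mean
p_below = NormalDist().cdf(z)  # standard normal CDF, the value a normal table gives
print(round(z, 2), round(p_below, 4))  # 1.5 0.9332
```

The same probability can be obtained without standardising, as `NormalDist(60, 10).cdf(75)`. The z-score route mirrors the normal-table method.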

## Hypothesis testing

• Testing theories; introduction to hypothesis testing
• Testing a mean or proportion (one-sample tests)
• Testing the difference between two means or two proportions (two-sample tests)
• Chi-squared tests
• Sample size and power
• Contextualising hypothesis tests: examples from business and finance
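A one-sample test of a proportion, one of the tests listed above, can be sketched with the normal approximation and the standard library's `NormalDist`. This is a minimal example and not a full testing framework. The function name and the customer-survey numbers are hypothetical. A common rule of thumb is to rely on the approximation only when n·p0 and n·(1 − p0) are both reasonably large (often taken as at least 10).

```python
import math
from statistics import NormalDist

def one_sample_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n                 # sample proportion
    se = math.sqrt(p0 * (1 - p0) / n)     # standard error under H0
    z = (p_hat - p0) / se                 # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical example: 60 of 100 sampled customers prefer a new product; H0: p = 0.5
z, p_value = one_sample_proportion_z_test(60, 100, 0.5)
print(round(z, 2), round(p_value, 4))  # z is 2.0; p-value about 0.0455
```

At the 5% significance level, this p-value would lead to rejecting H0. At the 1% level, it would not.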

## Making predictions

• Correlation
• Regression line; regression analysis; least squares fit; producing predictions
• Multiple regression and non-linear regression
• Model validation; outliers; influential observations
• Contextualising predictions: cryptocurrency and other financial examples, weather examples, measuring error, corrections to predictions, etc.
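The least-squares line and the correlation coefficient from the topics above can be written out directly from their defining sums. The sketch below does this in plain Python. The function name and data points are hypothetical. In practice, `statistics.linear_regression` and `statistics.correlation` (Python 3.10+) or R's `lm()` and `cor()` provide the same results. The residuals are the starting point for model validation and for spotting outliers.

```python
def least_squares(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares; also return Pearson's r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar     # the fitted line passes through (x_bar, y_bar)
    r = sxy / (sxx * syy) ** 0.5          # Pearson correlation coefficient
    return slope, intercept, r

# Hypothetical data points
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r = least_squares(xs, ys)
prediction = intercept + slope * 6        # predicted y at x = 6
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]  # observed minus fitted
print(slope, intercept, round(r, 4), prediction)
```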

## Data science in context

• Statistics in business: legal issues, ethics, GDPR, privacy

## Minimum Intended Module Learning Outcomes (MIMLOs)

Upon successful completion of this module, the learner should be able to:

• LO1: Evaluate the value of data science and its IT applications.
• LO2: Appraise how data science can benefit managers in decision making.
• LO3: Apply statistical and probability techniques to analysing data.
• LO4: Assess data in data warehouses to inform business intelligence.
• LO5: Combine legal requirements and ethics in decision making.

## Assessment

| MIMLOs | Assessment | Percentage |
|---|---|---|
| 1-3 | Continuous assessment | 60% |
| 1-5 | Exam | 40% |

## Aims & Objectives

The aim of this module is to equip learners with the tools and methods to find structure in data, gain deeper insight from it, and analyse and quantify uncertainty. The module shows learners how data science can help organisations reduce costs, make better-informed decisions, and develop new products and services.

This module will ensure learners meet the following objectives:

• Develop an understanding of data collection.
• Assemble data through data cleaning and transformation.
• Undertake preliminary data analysis using graphs and descriptive and inferential statistics.
• Identify and develop the model that best fits the problem requirements.