Analysis and Processing of Big Data

Master's in Computer Engineering - Internet of Things
7.5 ECTS; 1st Year, 1st Semester, 30.0 PL + 30.0 TP + 15.0 OT + 10.0 O

- Ricardo Nuno Taborda Campos

Not applicable

1. Become familiar with the 5 V's of big data;
2. Understand the data privacy risks of using big data;
3. Understand the lifecycle of a big data project and its architecture;
4. Get to know the query, storage and distributed systems behind big data;
5. Know how to extract information.

1. Introduction to Data Science
- What is Data Science?
- Data Analysis, Data Analytics, Big Data
- Skills to become a Data Scientist
- Data Science Lifecycle

2. Ethics and Data Privacy
- How can we avoid big data?
- Identity;
- Privacy;
- Ethics;
- Ownership;
- Reputation;

3. Introduction to Big Data
- What is big data?
- Who is using Big Data?
- Where is this data coming from?
- Why are they collecting this data?
- How does big data differ from traditional databases?
- Different types of data
- Become familiar with the 5 V's of Big Data: volume, velocity, variety, veracity and value;

4. Big Data Storage and Processing Frameworks: Apache Hadoop and Spark
- MapReduce;
- RDDs
- DataFrames
- Streaming

5. Text Analytics
- What is Text Analytics?
- Applications;
- Natural Language Processing (NLP) Architecture;
- NLP commercial solutions;
- Text Analytics with Python

Evaluation Methodology
Periodic Assessment: Research Project (RP) (50%) + Hands-on Lab (50%)
Students are excluded from the exam if they score below 4 points in either of the two assessment components or if they do not reach a minimum attendance of 70%.
Final Evaluation: RP (100%)

- Provost, F. and Fawcett, T. (2013). Data Science for Business. (pp. 1-386). USA: O'Reilly
- Witten, I., Frank, E. and Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. (pp. 1-629). USA: Elsevier
- Erl, T., Khattak, W. and Buhler, P. (2016). Big Data Fundamentals: Concepts, Drivers & Techniques. (pp. 1-235). USA: Prentice Hall
- Davis, K. (2012). Ethics of Big Data. (pp. 1-79). USA: O'Reilly

Method of interaction
Theoretical and practical teaching with audiovisual media, laboratory equipment and practical examples. Assessment: completion and presentation of group projects.

Software used in class
Apache Hadoop; Spark; Python: Anaconda and Jupyter Notebooks