Publication in the Diário da República: Despacho n.º 7043/2016 - 27/05/2016
7.5 ECTS; 1st Year, 1st Semester, 30.0 PL + 30.0 TP + 15.0 OT + 10.0 O
- Ricardo Nuno Taborda Campos
1. Become familiar with the 5 V's of big data
2. Understand the data privacy risks of using big data
3. Understand the lifecycle of a big data project and its architecture
4. Get to know the query, storage and distributed systems behind big data
5. Know how to extract information
1. Introduction to Data Science
- What is Data Science?
- Data Analysis, Data Analytics, Big Data
- Skills to become a Data Scientist
- Data Science Lifecycle
2. Ethics and Data Privacy
- How can we avoid big data?
3. Introduction to Big Data
- What is big data?
- Who is using Big Data?
- Where is this data coming from?
- Why are they collecting this data?
- How does big data differ from traditional databases?
- Different types of data
- Become familiar with the 5 V's of Big Data: volume, velocity, variety, veracity and value
4. Big Data Storage and Processing Framework: Apache Hadoop and Spark
5. Text Analytics
- What is Text Analytics?
- Natural Language Processing (NLP) Architecture
- NLP commercial solutions
- Text Analytics with Python and NLTK
Periodic: Project (100%)
Exam: Project (100%)
The project is required to pass the course. Students who do not submit it automatically fail and cannot sit the exam.
- Provost, F. and Fawcett, T. (2013). Data Science for Business. (pp. 1-386). USA: O'Reilly
- Witten, I., Frank, E. and Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. (pp. 1-629). USA: Elsevier
- Erl, T., Khattak, W. and Buhler, P. (2016). Big Data Fundamentals: Concepts, Drivers & Techniques. (pp. 1-235). USA: Prentice Hall
- Davis, K. (2012). Ethics of Big Data. (pp. 1-79). USA: O'Reilly
Theoretical and practical teaching with audiovisual media, laboratory equipment and practical examples. Assessment: completion and presentation of group projects.
Software used in class
Apache Hadoop; Spark; Python and NLTK