Publication in the Diário da República: Despacho n.º 7043/2016 - 27/05/2016
7.5 ECTS; 1st Year, 1st Semester, 30.0 PL (laboratory practice) + 30.0 TP (theoretical-practical) + 15.0 OT (tutorial guidance) + 10.0 O (other) contact hours
- Ricardo Nuno Taborda Campos
1. Become familiar with the 5 V's of big data;
2. Understand the risks that the use of big data poses to data privacy;
3. Understand the lifecycle of a big data project and its architecture;
4. Get to know the query, storage and distributed systems behind big data;
5. Know how to extract information.
1. Introduction to Data Science
- What is Data Science?
- Data Analysis, Data Analytics, Big Data
- Skills to become a Data Scientist
- Data Science Lifecycle
2. Ethics and Data Privacy
- How can we avoid big data?
3. Introduction to Big Data
- What is big data?
- Who is using Big Data?
- Where is this data coming from?
- Why are they collecting this data?
- How does big data differ from traditional databases?
- Different types of data
- The 5 V's of Big Data: volume, velocity, variety, veracity and value
4. Big Data Storage and Processing Frameworks: Apache Hadoop and Spark
5. Text Analytics
- What is Text Analytics?
- Natural Language Processing (NLP) Architecture
- Commercial NLP solutions
- Text Analytics with Python
Periodic Assessment: Research Project (RP) (50%) + Hands-on Lab (50%)
Students are excluded from the exam if they score below 4 points in either of the two assessment components, or if their attendance is below 70%.
Final Evaluation: RP (100%)
- Khattak, W., Erl, T. and Buhler, P. (2016). Big Data Fundamentals: Concepts, Drivers & Techniques. (pp. 1-235). USA: Prentice Hall
- Hall, M., Frank, E. and Witten, I. (2011). Data Mining: Practical Machine Learning Tools and Techniques. (pp. 1-629). USA: Elsevier
- Fawcett, T. and Provost, F. (2013). Data Science for Business. (pp. 1-386). USA: O'Reilly
- Davis, K. (2012). Ethics of Big Data. (pp. 1-79). USA: O'Reilly
Theoretical and practical teaching supported by audiovisual media, laboratory equipment and practical examples. Assessment: development and presentation of group projects.
Software used in class
Apache Hadoop; Spark; Python: Anaconda and Jupyter Notebooks