-
Presentation
This course covers data engineering from the perspective of a data scientist, from the variety of data sources to the structuring and provisioning of processed data for modeling and visualization. Students acquire the foundational concepts of data engineering needed to supply high-quality data to data science applications.
-
Class from course
-
Degree | Semesters | ECTS
Bachelor | One semester | 6
-
Year | Nature | Language
2 | Mandatory | Portuguese
-
Code
ULHT6634-24446
-
Prerequisites and corequisites
Not applicable
-
Professional Internship
No
-
Syllabus
The course is divided into the following units:
S1 - Introduction: What is a Data Engineer? What does a Data Scientist need to know about data engineering? Data engineering pipelines.
S2 - Relational Databases: SQL review. Relational concepts used in data models.
S3 - Data Modeling: Data sources. Data Lake. Data Warehouse. Data Lakehouse. Data models.
S4 - Fundamentals of Big Data: Hadoop. MapReduce.
S5 - Data Transformation: Necessary transformations for storing data for Data Science projects.
S6 - Data Visualization Tools
S7 - NoSQL Databases
S8 - Data Engineering Projects
-
Objectives
LG1. Learn fundamental concepts of data engineering.
LG2. Understand the data engineering lifecycle.
LG3. Use SQL to transform and query data.
LG4. Understand data modeling techniques for organizing and managing data.
LG5. Build pipelines to collect, transform, analyze and visualize data from operational source systems.
LG6. Apply the principles covered in class to build a simple data pipeline and visualize the data.
-
Teaching methodologies and assessment
Lectures are conducted in person and are primarily expository. Students are encouraged to participate actively by asking questions, which stimulates their interest in the subject matter. Where appropriate, specific problems that students are already familiar with are analysed before the content is presented. Some topics are introduced through the analysis of problems whose resolution naturally raises the questions and formulations to be studied. Whenever possible, examples and counterexamples are provided to illustrate the content. At the end of most lectures, problems are set for students to work on independently, so that they consolidate the concepts and techniques covered.
-
References
Lau, S., Gonzalez, J., Nolan, D. Learning Data Science. Available at: https://learningds.org/intro.html
Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly Media.
-
Office Hours
-
Mobility
No