diff --git a/Semester_2/Einheit_06/Uebung_5_LSG.ipynb b/Semester_2/Einheit_06/Uebung_5_LSG.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..c2dd61d9fe4e55873a0a6b114db534ec52c14765 --- /dev/null +++ b/Semester_2/Einheit_06/Uebung_5_LSG.ipynb @@ -0,0 +1,1931 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "03c85993", + "metadata": {}, + "source": [ + "# <font color='blue'>**Exercise 5 - Data Analysis - Pandas**</font>\n", + "(This exercise belongs to lecture unit 6)\n", + "\n", + "## <font color='blue'>**Problem: Analysis of Car Fuel Consumption Data**</font>\n", + "### <font color='blue'>**Problem Description**</font>\n", + "\n", + "The US government provides an extensive, freely available database of fuel consumption data for a good 46,000 car models at https://www.fueleconomy.gov/feg/ws/index.shtml (a slightly preprocessed version is already available as \"vehicles.csv\" in the directory of this exercise). This data is to be further prepared and examined using the pandas package. As is customary in the USA, consumption is given in **miles per gallon**. This is to be converted to **l/100km**. In addition, gaps in the data are to be filled sensibly. The **displacement** (displ) is already given in liters. Whenever this exercise speaks of **consumption** without further qualification, the **combined consumption** (city and highway) is meant. **Fueltype** in the provided data refers to the **primary fuel** (hybrid vehicles appear as combustion-engine vehicles). For electric vehicles, consumption is given in **miles per gallon gasoline equivalent** (background for those interested: https://www.caranddriver.com/research/a31863350/mpge/).\n", + "\n", + "After getting generally acquainted with the data, the following **questions** are to be answered:\n", + "1) Relationship between displacement and consumption, and between year and consumption, for all cars (e.g. 
correlation, scatter plot, line chart of median consumption over time)\n", + "2) Relationship between consumption and the fuel used (box plot with quartiles)\n", + "3) Comparison of the consumption data of the German makes Audi, BMW, Mercedes-Benz, Porsche and Volkswagen (box plot with quartiles)\n", + "4) Comparison of the consumption data of the 5 most frequent vehicle classes (box plot with quartiles, classes determined automatically)\n", + "5) Determining the 15 most fuel-efficient vehicles of a chosen manufacturer (or class) while excluding certain fuel types (e.g. excluding electric vehicles)\n", + "\n", + "### <font color='blue'>**Modeling and Algorithm Design**</font>\n", + "\n", + "The pandas package provides the methods for data storage and evaluation that we use, and in part have to combine, to answer the questions. This is explained separately for each question in the Implementation section. This fits the rather interactive way of working when evaluating a new data set. If data sets in the same format arrive regularly, the pandas methods can of course also be used inside algorithms (e.g. monthly statistics on sales figures).\n", + "\n", + "### <font color='blue'>**Implementation**</font>\n", + "\n", + "First we import the pandas package and read the CSV file into a Dataframe (since we will later mostly work on a copy, we call it e.g. `df_orig`). We first display its first entries to see how the data is structured." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "2d37eba9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>make</th>\n", + " <th>model</th>\n", + " <th>year</th>\n", + " <th>VClass</th>\n", + " <th>cylinders</th>\n", + " <th>displ</th>\n", + " <th>fuelType</th>\n", + " <th>city</th>\n", + " <th>highway</th>\n", + " <th>combined</th>\n", + " </tr>\n", + " <tr>\n", + " <th>id</th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>Alfa Romeo</td>\n", + " <td>Spider Veloce 2000</td>\n", + " <td>1985</td>\n", + " <td>Two Seaters</td>\n", + " <td>4.0</td>\n", + " <td>2.0</td>\n", + " <td>Regular Gasoline</td>\n", + " <td>19</td>\n", + " <td>25</td>\n", + " <td>21</td>\n", + " </tr>\n", + " <tr>\n", + " <th>1</th>\n", + " <td>Ferrari</td>\n", + " <td>Testarossa</td>\n", + " <td>1985</td>\n", + " <td>Two Seaters</td>\n", + " <td>12.0</td>\n", + " <td>4.9</td>\n", + " <td>Regular Gasoline</td>\n", + " <td>9</td>\n", + " <td>14</td>\n", + " <td>11</td>\n", + " </tr>\n", + " <tr>\n", + " <th>2</th>\n", + " <td>Dodge</td>\n", + " <td>Charger</td>\n", + " <td>1985</td>\n", + " <td>Subcompact Cars</td>\n", + " <td>4.0</td>\n", + " <td>2.2</td>\n", + " <td>Regular Gasoline</td>\n", + " <td>23</td>\n", + " <td>33</td>\n", + " <td>27</td>\n", + " </tr>\n", + " <tr>\n", + " <th>3</th>\n", + " 
<td>Dodge</td>\n", + " <td>B150/B250 Wagon 2WD</td>\n", + " <td>1985</td>\n", + " <td>Vans</td>\n", + " <td>8.0</td>\n", + " <td>5.2</td>\n", + " <td>Regular Gasoline</td>\n", + " <td>10</td>\n", + " <td>12</td>\n", + " <td>11</td>\n", + " </tr>\n", + " <tr>\n", + " <th>4</th>\n", + " <td>Subaru</td>\n", + " <td>Legacy AWD Turbo</td>\n", + " <td>1993</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>2.2</td>\n", + " <td>Premium Gasoline</td>\n", + " <td>17</td>\n", + " <td>23</td>\n", + " <td>19</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " make model year VClass cylinders displ \\\n", + "id \n", + "0 Alfa Romeo Spider Veloce 2000 1985 Two Seaters 4.0 2.0 \n", + "1 Ferrari Testarossa 1985 Two Seaters 12.0 4.9 \n", + "2 Dodge Charger 1985 Subcompact Cars 4.0 2.2 \n", + "3 Dodge B150/B250 Wagon 2WD 1985 Vans 8.0 5.2 \n", + "4 Subaru Legacy AWD Turbo 1993 Compact Cars 4.0 2.2 \n", + "\n", + " fuelType city highway combined \n", + "id \n", + "0 Regular Gasoline 19 25 21 \n", + "1 Regular Gasoline 9 14 11 \n", + "2 Regular Gasoline 23 33 27 \n", + "3 Regular Gasoline 10 12 11 \n", + "4 Premium Gasoline 17 23 19 " + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "df_orig = pd.read_csv(\"vehicles.csv\", index_col=0)\n", + "df_orig.head()" + ] + }, + { + "cell_type": "markdown", + "id": "eda1a1b9", + "metadata": {}, + "source": [ + "For each vehicle model, the entries **id** (sequential number), **make** (manufacturer), **model**, **year**, **VClass** (vehicle type/class), **cylinders** (number of cylinders), **displ** (displacement in l), **fuelType** (primary fuel), **city**, **highway** and **combined** (consumption in mpg for city, highway and combined) are available.\n", + "\n", + "#### <font color='blue'>**Preparation**</font>\n", + "\n", + "First, the data is to be prepared. To do this, we define a function that converts consumption from mpg to l/100km, rounded to one decimal place. For this we need the conversion factors:\n", + "\n", + "| US customary | Metric |\n", + "| :--- | :--- |\n", + "| 1 mile | 1.61 km |\n", + "| 1 gallon | 3.79 l | " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "801061e9", + "metadata": {}, + "outputs": [], + "source": [ + "def mpg_to_lp100km(mpg):\n", + "    # l/100km = 100 * 3.79 l per gallon / (mpg * 1.61 km per mile)\n", + "    return round(379./(mpg*1.61), 1)" + ] + }, + { + "cell_type": "markdown", + "id": "bf5e39b2", + "metadata": {}, + "source": [ + "We can now apply this function to the entries of the consumption columns with `apply`. To do so, we first copy the Dataframe we read in, so that the original data is not affected. The computed columns are stored in the copy and replace the original values. We inspect the data set to check whether it worked." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "b905c9a2", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>make</th>\n", + " <th>model</th>\n", + " <th>year</th>\n", + " <th>VClass</th>\n", + " <th>cylinders</th>\n", + " <th>displ</th>\n", + " <th>fuelType</th>\n", + " <th>city</th>\n", + " <th>highway</th>\n", + " <th>combined</th>\n", + " </tr>\n", + " <tr>\n", + " <th>id</th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " 
</tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>Alfa Romeo</td>\n", + " <td>Spider Veloce 2000</td>\n", + " <td>1985</td>\n", + " <td>Two Seaters</td>\n", + " <td>4.0</td>\n", + " <td>2.0</td>\n", + " <td>Regular Gasoline</td>\n", + " <td>12.4</td>\n", + " <td>9.4</td>\n", + " <td>11.2</td>\n", + " </tr>\n", + " <tr>\n", + " <th>1</th>\n", + " <td>Ferrari</td>\n", + " <td>Testarossa</td>\n", + " <td>1985</td>\n", + " <td>Two Seaters</td>\n", + " <td>12.0</td>\n", + " <td>4.9</td>\n", + " <td>Regular Gasoline</td>\n", + " <td>26.2</td>\n", + " <td>16.8</td>\n", + " <td>21.4</td>\n", + " </tr>\n", + " <tr>\n", + " <th>2</th>\n", + " <td>Dodge</td>\n", + " <td>Charger</td>\n", + " <td>1985</td>\n", + " <td>Subcompact Cars</td>\n", + " <td>4.0</td>\n", + " <td>2.2</td>\n", + " <td>Regular Gasoline</td>\n", + " <td>10.2</td>\n", + " <td>7.1</td>\n", + " <td>8.7</td>\n", + " </tr>\n", + " <tr>\n", + " <th>3</th>\n", + " <td>Dodge</td>\n", + " <td>B150/B250 Wagon 2WD</td>\n", + " <td>1985</td>\n", + " <td>Vans</td>\n", + " <td>8.0</td>\n", + " <td>5.2</td>\n", + " <td>Regular Gasoline</td>\n", + " <td>23.5</td>\n", + " <td>19.6</td>\n", + " <td>21.4</td>\n", + " </tr>\n", + " <tr>\n", + " <th>4</th>\n", + " <td>Subaru</td>\n", + " <td>Legacy AWD Turbo</td>\n", + " <td>1993</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>2.2</td>\n", + " <td>Premium Gasoline</td>\n", + " <td>13.8</td>\n", + " <td>10.2</td>\n", + " <td>12.4</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " make model year VClass cylinders displ \\\n", + "id \n", + "0 Alfa Romeo Spider Veloce 2000 1985 Two Seaters 4.0 2.0 \n", + "1 Ferrari Testarossa 1985 Two Seaters 12.0 4.9 \n", + "2 Dodge Charger 1985 Subcompact Cars 4.0 2.2 \n", + "3 Dodge B150/B250 Wagon 2WD 1985 Vans 8.0 5.2 \n", + "4 Subaru Legacy AWD Turbo 1993 Compact Cars 4.0 2.2 \n", + "\n", + " fuelType city highway combined \n", + "id \n", + "0 
Regular Gasoline 12.4 9.4 11.2 \n", + "1 Regular Gasoline 26.2 16.8 21.4 \n", + "2 Regular Gasoline 10.2 7.1 8.7 \n", + "3 Regular Gasoline 23.5 19.6 21.4 \n", + "4 Premium Gasoline 13.8 10.2 12.4 " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = df_orig.copy()\n", + "df['city'] = df_orig['city'].apply(mpg_to_lp100km)\n", + "df['highway'] = df_orig['highway'].apply(mpg_to_lp100km)\n", + "df['combined'] = df_orig['combined'].apply(mpg_to_lp100km)\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "e87738ba", + "metadata": {}, + "source": [ + "Next, gaps in the data are to be handled. The `info()` method helps to find them." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "90cc4bae", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "Int64Index: 46186 entries, 0 to 46185\n", + "Data columns (total 10 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 make 46186 non-null object \n", + " 1 model 46186 non-null object \n", + " 2 year 46186 non-null int64 \n", + " 3 VClass 46186 non-null object \n", + " 4 cylinders 45680 non-null float64\n", + " 5 displ 45682 non-null float64\n", + " 6 fuelType 46186 non-null object \n", + " 7 city 46186 non-null float64\n", + " 8 highway 46186 non-null float64\n", + " 9 combined 46186 non-null float64\n", + "dtypes: float64(5), int64(1), object(4)\n", + "memory usage: 3.9+ MB\n" + ] + } + ], + "source": [ + "df.info()" + ] + }, + { + "cell_type": "markdown", + "id": "c8413d12", + "metadata": {}, + "source": [ + "We can see that there are **46186 entries**, but only **45680** and **45682** for **number of cylinders** and **displacement**, respectively. 
Although it seems likely that these entries belong to electric vehicles, to which these engine data do not apply, we check this by displaying the affected entries. To do so, we use boolean selection and, with the help of `isnull()`, select only those entries from the Dataframe for which the value for the cylinders is missing. Since we will keep using this subset, we store it." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "3b66e4aa", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>make</th>\n", + " <th>model</th>\n", + " <th>year</th>\n", + " <th>VClass</th>\n", + " <th>cylinders</th>\n", + " <th>displ</th>\n", + " <th>fuelType</th>\n", + " <th>city</th>\n", + " <th>highway</th>\n", + " <th>combined</th>\n", + " </tr>\n", + " <tr>\n", + " <th>id</th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>7138</th>\n", + " <td>Nissan</td>\n", + " <td>Altra EV</td>\n", + " <td>2000</td>\n", + " <td>Midsize Station Wagons</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " <td>Electricity</td>\n", + " <td>2.9</td>\n", + " <td>2.6</td>\n", + " <td>2.8</td>\n", + " </tr>\n", + " <tr>\n", + " <th>7139</th>\n", + " <td>Toyota</td>\n", + " <td>RAV4 EV</td>\n", + " <td>2000</td>\n", + " <td>Sport Utility Vehicle - 2WD</td>\n", + " <td>NaN</td>\n", + " 
<td>NaN</td>\n", + " <td>Electricity</td>\n", + " <td>2.9</td>\n", + " <td>3.7</td>\n", + " <td>3.3</td>\n", + " </tr>\n", + " <tr>\n", + " <th>8143</th>\n", + " <td>Toyota</td>\n", + " <td>RAV4 EV</td>\n", + " <td>2001</td>\n", + " <td>Sport Utility Vehicle - 2WD</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " <td>Electricity</td>\n", + " <td>2.9</td>\n", + " <td>3.7</td>\n", + " <td>3.3</td>\n", + " </tr>\n", + " <tr>\n", + " <th>8144</th>\n", + " <td>Ford</td>\n", + " <td>Th!nk</td>\n", + " <td>2001</td>\n", + " <td>Two Seaters</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " <td>Electricity</td>\n", + " <td>3.2</td>\n", + " <td>4.1</td>\n", + " <td>3.6</td>\n", + " </tr>\n", + " <tr>\n", + " <th>8146</th>\n", + " <td>Ford</td>\n", + " <td>Explorer USPS Electric</td>\n", + " <td>2001</td>\n", + " <td>Sport Utility Vehicle - 2WD</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " <td>Electricity</td>\n", + " <td>5.2</td>\n", + " <td>7.1</td>\n", + " <td>6.0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>...</th>\n", + " <td>...</td>\n", + " <td>...</td>\n", + " <td>...</td>\n", + " <td>...</td>\n", + " <td>...</td>\n", + " <td>...</td>\n", + " <td>...</td>\n", + " <td>...</td>\n", + " <td>...</td>\n", + " <td>...</td>\n", + " </tr>\n", + " <tr>\n", + " <th>40253</th>\n", + " <td>Hyundai</td>\n", + " <td>Ioniq 6 Long range AWD (18 inch Wheels)</td>\n", + " <td>2023</td>\n", + " <td>Midsize Cars</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " <td>Electricity</td>\n", + " <td>1.8</td>\n", + " <td>2.1</td>\n", + " <td>1.9</td>\n", + " </tr>\n", + " <tr>\n", + " <th>40254</th>\n", + " <td>Hyundai</td>\n", + " <td>Ioniq 6 Long range AWD (20 inch Wheels)</td>\n", + " <td>2023</td>\n", + " <td>Midsize Cars</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " <td>Electricity</td>\n", + " <td>2.1</td>\n", + " <td>2.5</td>\n", + " <td>2.3</td>\n", + " </tr>\n", + " <tr>\n", + " <th>40255</th>\n", + " <td>Hyundai</td>\n", + " <td>Ioniq 6 Long range RWD 
(18 inch Wheels)</td>\n", + " <td>2023</td>\n", + " <td>Midsize Cars</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " <td>Electricity</td>\n", + " <td>1.5</td>\n", + " <td>1.9</td>\n", + " <td>1.7</td>\n", + " </tr>\n", + " <tr>\n", + " <th>40256</th>\n", + " <td>Hyundai</td>\n", + " <td>Ioniq 6 Long range RWD (20 inch Wheels)</td>\n", + " <td>2023</td>\n", + " <td>Midsize Cars</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " <td>Electricity</td>\n", + " <td>1.8</td>\n", + " <td>2.2</td>\n", + " <td>2.0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>40257</th>\n", + " <td>Hyundai</td>\n", + " <td>Ioniq 6 Standard Range RWD</td>\n", + " <td>2023</td>\n", + " <td>Midsize Cars</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " <td>Electricity</td>\n", + " <td>1.6</td>\n", + " <td>2.0</td>\n", + " <td>1.7</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "<p>506 rows × 10 columns</p>\n", + "</div>" + ], + "text/plain": [ + " make model year \\\n", + "id \n", + "7138 Nissan Altra EV 2000 \n", + "7139 Toyota RAV4 EV 2000 \n", + "8143 Toyota RAV4 EV 2001 \n", + "8144 Ford Th!nk 2001 \n", + "8146 Ford Explorer USPS Electric 2001 \n", + "... ... ... ... \n", + "40253 Hyundai Ioniq 6 Long range AWD (18 inch Wheels) 2023 \n", + "40254 Hyundai Ioniq 6 Long range AWD (20 inch Wheels) 2023 \n", + "40255 Hyundai Ioniq 6 Long range RWD (18 inch Wheels) 2023 \n", + "40256 Hyundai Ioniq 6 Long range RWD (20 inch Wheels) 2023 \n", + "40257 Hyundai Ioniq 6 Standard Range RWD 2023 \n", + "\n", + " VClass cylinders displ fuelType city \\\n", + "id \n", + "7138 Midsize Station Wagons NaN NaN Electricity 2.9 \n", + "7139 Sport Utility Vehicle - 2WD NaN NaN Electricity 2.9 \n", + "8143 Sport Utility Vehicle - 2WD NaN NaN Electricity 2.9 \n", + "8144 Two Seaters NaN NaN Electricity 3.2 \n", + "8146 Sport Utility Vehicle - 2WD NaN NaN Electricity 5.2 \n", + "... ... ... ... ... ... 
\n", + "40253 Midsize Cars NaN NaN Electricity 1.8 \n", + "40254 Midsize Cars NaN NaN Electricity 2.1 \n", + "40255 Midsize Cars NaN NaN Electricity 1.5 \n", + "40256 Midsize Cars NaN NaN Electricity 1.8 \n", + "40257 Midsize Cars NaN NaN Electricity 1.6 \n", + "\n", + " highway combined \n", + "id \n", + "7138 2.6 2.8 \n", + "7139 3.7 3.3 \n", + "8143 3.7 3.3 \n", + "8144 4.1 3.6 \n", + "8146 7.1 6.0 \n", + "... ... ... \n", + "40253 2.1 1.9 \n", + "40254 2.5 2.3 \n", + "40255 1.9 1.7 \n", + "40256 2.2 2.0 \n", + "40257 2.0 1.7 \n", + "\n", + "[506 rows x 10 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_nulls = df[ df[\"cylinders\"].isnull() ]\n", + "df_nulls" + ] + }, + { + "cell_type": "markdown", + "id": "5d830244", + "metadata": {}, + "source": [ + "It looks as if these are all electric vehicles. To be sure, we use `describe()` to display information about the fuelTypes that occur." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "474ac122", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "count 506\n", + "unique 2\n", + "top Electricity\n", + "freq 503\n", + "Name: fuelType, dtype: object" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_nulls[\"fuelType\"].describe()" + ] + }, + { + "cell_type": "markdown", + "id": "b33853f1", + "metadata": {}, + "source": [ + "Only 503 of the 506 entries have the fuelType \"Electricity\". To find out what the remaining entries are, we use `unique()` to obtain all values that occur in this column." 
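, + "\n", + "The next cell applies `unique()` as described. As a hedged aside, `value_counts()` would report each distinct value together with its frequency in a single call. A minimal sketch, assuming a small hypothetical Series `fuel` (illustrative values, not the vehicle data):\n", + "\n", + "```python\n", + "import pandas as pd\n", + "\n", + "# Hypothetical fuel types, for illustration only\n", + "fuel = pd.Series(['Electricity', 'Electricity', 'Regular Gasoline', 'Electricity'])\n", + "\n", + "# Distinct values with their counts, most frequent first\n", + "print(fuel.value_counts())\n", + "```\n", + "\n", + "Applied to the notebook's data, the call would be `df_nulls['fuelType'].value_counts()`." 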
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "6c2684f0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['Electricity', 'Regular Gasoline'], dtype=object)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_nulls[\"fuelType\"].unique()" + ] + }, + { + "cell_type": "markdown", + "id": "8321c2fa", + "metadata": {}, + "source": [ + "To display the 3 gasoline vehicles with missing engine information, we filter the Dataframe by fuelType." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "fc1c3f5f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>make</th>\n", + " <th>model</th>\n", + " <th>year</th>\n", + " <th>VClass</th>\n", + " <th>cylinders</th>\n", + " <th>displ</th>\n", + " <th>fuelType</th>\n", + " <th>city</th>\n", + " <th>highway</th>\n", + " <th>combined</th>\n", + " </tr>\n", + " <tr>\n", + " <th>id</th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>21410</th>\n", + " <td>Subaru</td>\n", + " <td>RX Turbo</td>\n", + " <td>1985</td>\n", + " <td>Subcompact Cars</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " <td>Regular Gasoline</td>\n", + " <td>10.7</td>\n", + " <td>8.4</td>\n", + " <td>9.8</td>\n", + " </tr>\n", + " <tr>\n", + " <th>21411</th>\n", + " 
<td>Subaru</td>\n", + " <td>RX Turbo</td>\n", + " <td>1985</td>\n", + " <td>Subcompact Cars</td>\n", + " <td>NaN</td>\n", + " <td>NaN</td>\n", + " <td>Regular Gasoline</td>\n", + " <td>11.2</td>\n", + " <td>8.7</td>\n", + " <td>10.2</td>\n", + " </tr>\n", + " <tr>\n", + " <th>21500</th>\n", + " <td>Mazda</td>\n", + " <td>RX-7</td>\n", + " <td>1986</td>\n", + " <td>Two Seaters</td>\n", + " <td>NaN</td>\n", + " <td>1.3</td>\n", + " <td>Regular Gasoline</td>\n", + " <td>15.7</td>\n", + " <td>10.7</td>\n", + " <td>13.1</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " make model year VClass cylinders displ \\\n", + "id \n", + "21410 Subaru RX Turbo 1985 Subcompact Cars NaN NaN \n", + "21411 Subaru RX Turbo 1985 Subcompact Cars NaN NaN \n", + "21500 Mazda RX-7 1986 Two Seaters NaN 1.3 \n", + "\n", + " fuelType city highway combined \n", + "id \n", + "21410 Regular Gasoline 10.7 8.4 9.8 \n", + "21411 Regular Gasoline 11.2 8.7 10.2 \n", + "21500 Regular Gasoline 15.7 10.7 13.1 " + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_nulls[ df_nulls[\"fuelType\"] == \"Regular Gasoline\" ]" + ] + }, + { + "cell_type": "markdown", + "id": "300b0882", + "metadata": {}, + "source": [ + "For now, we do not look into the missing displacement data.\n", + "\n", + "We now have to decide how to deal with the missing data so as to keep the data as meaningful as possible. \n", + "\n", + "One option (the one we use in this exercise) is to enter the value `0.0` for number of cylinders and displacement for all electric vehicles and to remove the few remaining records in which this information is missing. \n", + "\n", + "We start by removing all entries in which cylinders has no value even though they are not electric vehicles. 
We can remove entries with the Dataframe method `drop([list of indices], inplace = True)`. For this we need a list of indices. We can see them in the table from the previous step, but we can also determine them automatically. A Dataframe has the attribute `index`, which holds the index labels of all its rows. We can therefore use this attribute of the Dataframe filtered in the previous step to pass the indices to the drop method. We check the result with `info()`." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "7f4c738f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "Int64Index: 46183 entries, 0 to 46185\n", + "Data columns (total 10 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 make 46183 non-null object \n", + " 1 model 46183 non-null object \n", + " 2 year 46183 non-null int64 \n", + " 3 VClass 46183 non-null object \n", + " 4 cylinders 45680 non-null float64\n", + " 5 displ 45681 non-null float64\n", + " 6 fuelType 46183 non-null object \n", + " 7 city 46183 non-null float64\n", + " 8 highway 46183 non-null float64\n", + " 9 combined 46183 non-null float64\n", + "dtypes: float64(5), int64(1), object(4)\n", + "memory usage: 3.9+ MB\n" + ] + } + ], + "source": [ + "delete_df = df_nulls[ df_nulls[\"fuelType\"] == \"Regular Gasoline\" ]\n", + "df.drop( delete_df.index, inplace = True )\n", + "df.info()" + ] + }, + { + "cell_type": "markdown", + "id": "b3d9faef", + "metadata": {}, + "source": [ + "These 3 vehicles have now been removed from the Dataframe. So far we have not examined the missing displacement data. Since we already know that we will set the values to `0.0` for the electric vehicles, we only need to check whether there are still entries without a displacement that are not electric vehicles. 
To do so, we filter the Dataframe with a combined condition. (Note: when filtering Dataframes, the operators `&` and `|` are used in place of `and` and `or`, and each condition must be enclosed in parentheses.) " + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "c8521251", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>make</th>\n", + " <th>model</th>\n", + " <th>year</th>\n", + " <th>VClass</th>\n", + " <th>cylinders</th>\n", + " <th>displ</th>\n", + " <th>fuelType</th>\n", + " <th>city</th>\n", + " <th>highway</th>\n", + " <th>combined</th>\n", + " </tr>\n", + " <tr>\n", + " <th>id</th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + "Empty DataFrame\n", + "Columns: [make, model, year, VClass, cylinders, displ, fuelType, city, highway, combined]\n", + "Index: []" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[ ( df[\"displ\"].isnull() ) & ( df[\"fuelType\"] != \"Electricity\" ) ]" + ] + }, + { + "cell_type": "markdown", + "id": "fe26b2d4", + "metadata": {}, + "source": [ + "This Dataframe is empty. This means that all remaining missing data belong to electric vehicles. 
We can therefore apply the method `fillna(0.0, inplace = True)` to the entire Dataframe, knowing that it only affects electric vehicles. We check the result with `info()`." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "cd124b9b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "Int64Index: 46183 entries, 0 to 46185\n", + "Data columns (total 10 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 make 46183 non-null object \n", + " 1 model 46183 non-null object \n", + " 2 year 46183 non-null int64 \n", + " 3 VClass 46183 non-null object \n", + " 4 cylinders 46183 non-null float64\n", + " 5 displ 46183 non-null float64\n", + " 6 fuelType 46183 non-null object \n", + " 7 city 46183 non-null float64\n", + " 8 highway 46183 non-null float64\n", + " 9 combined 46183 non-null float64\n", + "dtypes: float64(5), int64(1), object(4)\n", + "memory usage: 3.9+ MB\n" + ] + } + ], + "source": [ + "df.fillna( 0.0, inplace = True )\n", + "df.info()" + ] + }, + { + "cell_type": "markdown", + "id": "c6cb248c", + "metadata": {}, + "source": [ + "#### <font color='blue'>**Data Analysis**</font>\n", + "\n", + "Now we can analyze the data as we like. In doing so, we follow the questions from the problem description.\n", + "\n", + "<font color='blue'>*1) Relationship between displacement and consumption, and between year and consumption, for all cars (e.g. correlation, scatter plot, line chart of median consumption over time)*</font>\n", + "\n", + "First we create a correlation matrix." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "3dca5c9e", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/tmp/ipykernel_4268/1134722465.py:1: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. 
In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.\n", + " df.corr()\n" + ] + }, + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>year</th>\n", + " <th>cylinders</th>\n", + " <th>displ</th>\n", + " <th>city</th>\n", + " <th>highway</th>\n", + " <th>combined</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>year</th>\n", + " <td>1.000000</td>\n", + " <td>0.001967</td>\n", + " <td>-0.032015</td>\n", + " <td>-0.256472</td>\n", + " <td>-0.325967</td>\n", + " <td>-0.288675</td>\n", + " </tr>\n", + " <tr>\n", + " <th>cylinders</th>\n", + " <td>0.001967</td>\n", + " <td>1.000000</td>\n", + " <td>0.910062</td>\n", + " <td>0.798197</td>\n", + " <td>0.684470</td>\n", + " <td>0.767443</td>\n", + " </tr>\n", + " <tr>\n", + " <th>displ</th>\n", + " <td>-0.032015</td>\n", + " <td>0.910062</td>\n", + " <td>1.000000</td>\n", + " <td>0.817058</td>\n", + " <td>0.739412</td>\n", + " <td>0.800452</td>\n", + " </tr>\n", + " <tr>\n", + " <th>city</th>\n", + " <td>-0.256472</td>\n", + " <td>0.798197</td>\n", + " <td>0.817058</td>\n", + " <td>1.000000</td>\n", + " <td>0.932662</td>\n", + " <td>0.985919</td>\n", + " </tr>\n", + " <tr>\n", + " <th>highway</th>\n", + " <td>-0.325967</td>\n", + " <td>0.684470</td>\n", + " <td>0.739412</td>\n", + " <td>0.932662</td>\n", + " <td>1.000000</td>\n", + " <td>0.971547</td>\n", + " </tr>\n", + " <tr>\n", + " <th>combined</th>\n", + " <td>-0.288675</td>\n", + " <td>0.767443</td>\n", + " <td>0.800452</td>\n", + " <td>0.985919</td>\n", + " 
<td>0.971547</td>\n", + " <td>1.000000</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " year cylinders displ city highway combined\n", + "year 1.000000 0.001967 -0.032015 -0.256472 -0.325967 -0.288675\n", + "cylinders 0.001967 1.000000 0.910062 0.798197 0.684470 0.767443\n", + "displ -0.032015 0.910062 1.000000 0.817058 0.739412 0.800452\n", + "city -0.256472 0.798197 0.817058 1.000000 0.932662 0.985919\n", + "highway -0.325967 0.684470 0.739412 0.932662 1.000000 0.971547\n", + "combined -0.288675 0.767443 0.800452 0.985919 0.971547 1.000000" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.corr(numeric_only=True)" + ] + }, + { + "cell_type": "markdown", + "id": "73b47921", + "metadata": {}, + "source": [ + "From these values we can already see a fairly **strong correlation between displacement and consumption** and a **weak negative correlation between year and consumption**. One interpretation is that cars developed in later years tend to consume less fuel than earlier ones, although there is a lot of scatter. We want to examine these relationships more closely using scatter plots. As a reference, we first create the scatter plot for displacement and consumption, for which a fairly strong correlation (0.8) was found." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "292e5918", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "<Axes: xlabel='displ', ylabel='combined'>" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1000x800 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "plt.rcParams.update({'font.size': 10, 'figure.figsize': (10, 8)}) # These two lines enlarge the plots that pandas generates\n", + "\n", + "df.plot( kind='scatter', x='displ', y='combined' )" + ] + }, + { + "cell_type": "markdown", + "id": "af04ff5d", + "metadata": {}, + "source": [ + "The trend is clearly visible: larger displacement tends to go along with higher consumption.\n", + "Now consider the plot for the development over the years, where the correlation is weaker, with an absolute value of 0.29:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "9c0601a7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "<Axes: xlabel='year', ylabel='combined'>" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1000x800 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "df.plot( kind='scatter', x='year', y='combined' )" + ] + }, + { + "cell_type": "markdown", + "id": "34f255df", + "metadata": {}, + "source": [ + "Here the **scatter is clearly larger**. Nevertheless, the trend is visible here as well. \n", + "\n", + "Another informative graphic is a plot of the median consumption. The median is the value that separates the upper 50% from the lower 50%. 
For such a plot, pandas has no dedicated method, so we have to create the plot manually.\n", + "\n", + "Using the method `groupby()` we can group the data by year (by default the groups are sorted in ascending order). With the index operator we select the consumption column, and `median()` then yields the median of all consumption values in each year. Combining these methods gives:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "6130ecbc", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "year\n", + "1984 12.4\n", + "1985 12.4\n", + "1986 12.4\n", + "1987 12.4\n", + "1988 12.4\n", + "1989 12.4\n", + "1990 13.1\n", + "1991 13.1\n", + "1992 13.1\n", + "1993 13.1\n", + "1994 13.1\n", + "1995 13.1\n", + "1996 12.4\n", + "1997 12.4\n", + "1998 12.4\n", + "1999 12.4\n", + "2000 12.4\n", + "2001 12.4\n", + "2002 12.4\n", + "2003 12.4\n", + "2004 12.4\n", + "2005 12.4\n", + "2006 12.4\n", + "2007 12.4\n", + "2008 12.4\n", + "2009 11.8\n", + "2010 11.8\n", + "2011 11.8\n", + "2012 11.2\n", + "2013 11.2\n", + "2014 10.7\n", + "2015 10.7\n", + "2016 10.2\n", + "2017 10.2\n", + "2018 10.2\n", + "2019 10.2\n", + "2020 10.2\n", + "2021 10.2\n", + "2022 10.2\n", + "2023 10.2\n", + "2024 9.4\n", + "Name: combined, dtype: float64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby(\"year\")[\"combined\"].median()" + ] + }, + { + "cell_type": "markdown", + "id": "10e21ceb", + "metadata": {}, + "source": [ + "The result is a pandas Series. Its `.keys()` method returns the years (the index of the Series), and converting the Series to a list gives the corresponding values. These can then be plotted with plt as usual." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "2b267cf1", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1000x800 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "temp_medians = df.groupby(\"year\")[\"combined\"].median()\n", + "years = temp_medians.keys()\n", + "combined_yearly_median = list(temp_medians)\n", + "\n", + "plt.plot(years, combined_yearly_median)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "d4057fb9", + "metadata": {}, + "source": [ + "This chart makes the downward trend in consumption visible once more. Note, however, that only vehicle models are evaluated here; no statement is made about sales figures or fleet size.\n", + "\n", + "<font color='blue'>*2) Relationship between consumption and the fuel used (boxplot with quartiles)*\n", + " \n", + "For this plot we can use the pandas method `boxplot` directly.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "e576a84c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "<Axes: title={'center': 'combined'}, xlabel='fuelType'>" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1000x800 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "df.boxplot(column='combined', by='fuelType')" + ] + }, + { + "cell_type": "markdown", + "id": "f49c89df", + "metadata": {}, + "source": [ + "<font color='blue'>*3) Comparison of the consumption data of the German makes Audi, BMW, Mercedes-Benz, Porsche and Volkswagen (boxplot with quartiles)*\n", + " \n", + "Here the boxplot can be used again, but the dataframe has to be filtered first. 
The criterion is whether the manufacturer is one of the 5 makes. As in the data preparation, one could build a combined comparison for this, but the code then quickly becomes long. Instead, the method `isin()` checks for each element whether it is contained in the list passed to it. So rather than checking for each make by hand, we create a list of all desired makes and use a single (non-combined) condition with `isin()`." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "76c128ad", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "<Axes: title={'center': 'combined'}, xlabel='make'>" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1000x800 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "german_makes = [\"Volkswagen\",\"BMW\",\"Mercedes-Benz\",\"Porsche\", \"Audi\"]\n", + "\n", + "df[ df['make'].isin(german_makes) ].boxplot(column='combined', by='make')" + ] + }, + { + "cell_type": "markdown", + "id": "3f7d3807", + "metadata": {}, + "source": [ + "<font color='blue'>*4) Comparison of the consumption data of the 5 most common vehicle classes (boxplot with quartiles, determine the classes automatically)*\n", + " \n", + "In principle the procedure is the same as for the makes. The only difference is that we do not write the list of vehicle classes ourselves but use pandas to determine the 5 most common ones. To do so, we apply the method `value_counts()` to the column `VClass`. Similar to `groupby()`, the result lists each category with its count, sorted in descending order. The 5 classes we are looking for are the first 5 keys." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "31415b78", + "metadata": {}, + "outputs": [], + "source": [ + "counts = df[\"VClass\"].value_counts()\n", + "# counts # uncomment to inspect the result" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "65ce7e2f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "<Axes: title={'center': 'combined'}, xlabel='VClass'>" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 1000x800 with 1 Axes>" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "classes = counts.keys()[:5]\n", + "\n", + "df[ df['VClass'].isin(classes) ].boxplot(column='combined', by='VClass')" + ] + }, + { + "cell_type": "markdown", + "id": "c69ceeac", + "metadata": {}, + "source": [ + "<font color='blue'> *5) Determine the 15 most fuel-efficient vehicles of a chosen manufacturer (or class), excluding certain fuel types (e.g. excluding electric vehicles)*\n", + " \n", + "Here we can again combine several functions. First, we use a combined condition to filter the dataframe for non-electric vehicles of the chosen makes. We then apply the method `sort_values()` to the result, passing consumption as the sort key; this sorts the dataframe by ascending consumption. Finally, we display the first entries of this result using `head()` (the cell below requests 20 rows, which include the 15 we are looking for)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "32c6d93f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "<div>\n", + "<style scoped>\n", + " .dataframe tbody tr th:only-of-type {\n", + " vertical-align: middle;\n", + " }\n", + "\n", + " .dataframe tbody tr th {\n", + " vertical-align: top;\n", + " }\n", + "\n", + " .dataframe thead th {\n", + " text-align: right;\n", + " }\n", + "</style>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>make</th>\n", + " <th>model</th>\n", + " <th>year</th>\n", + " <th>VClass</th>\n", + " <th>cylinders</th>\n", + " <th>displ</th>\n", + " <th>fuelType</th>\n", + " <th>city</th>\n", + " <th>highway</th>\n", + " <th>combined</th>\n", + " </tr>\n", + " <tr>\n", + " <th>id</th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " <th></th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>28193</th>\n", + " <td>Volkswagen</td>\n", + " <td>Jetta Hybrid</td>\n", + " <td>2015</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.4</td>\n", + " <td>Premium Gasoline</td>\n", + " <td>5.7</td>\n", + " <td>4.9</td>\n", + " <td>5.4</td>\n", + " </tr>\n", + " <tr>\n", + " <th>26546</th>\n", + " <td>Volkswagen</td>\n", + " <td>Jetta Hybrid</td>\n", + " <td>2014</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.4</td>\n", + " <td>Premium Gasoline</td>\n", + " <td>5.7</td>\n", + " <td>4.9</td>\n", + " <td>5.4</td>\n", + " </tr>\n", + " <tr>\n", + " <th>29450</th>\n", + " <td>Volkswagen</td>\n", + " <td>Jetta Hybrid</td>\n", + " <td>2016</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.4</td>\n", + " <td>Premium Gasoline</td>\n", + " <td>5.6</td>\n", + " <td>4.9</td>\n", + " <td>5.4</td>\n", + " </tr>\n", + " <tr>\n", + " 
<th>25661</th>\n", + " <td>Volkswagen</td>\n", + " <td>Jetta Hybrid</td>\n", + " <td>2013</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.4</td>\n", + " <td>Premium Gasoline</td>\n", + " <td>5.7</td>\n", + " <td>4.9</td>\n", + " <td>5.4</td>\n", + " </tr>\n", + " <tr>\n", + " <th>9770</th>\n", + " <td>Volkswagen</td>\n", + " <td>Jetta Wagon</td>\n", + " <td>2003</td>\n", + " <td>Small Station Wagons</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.7</td>\n", + " <td>5.2</td>\n", + " <td>6.0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>8698</th>\n", + " <td>Volkswagen</td>\n", + " <td>Jetta Wagon</td>\n", + " <td>2002</td>\n", + " <td>Small Station Wagons</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.7</td>\n", + " <td>5.2</td>\n", + " <td>6.0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>29788</th>\n", + " <td>Audi</td>\n", + " <td>A3 e-tron ultra</td>\n", + " <td>2016</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.4</td>\n", + " <td>Premium Gasoline</td>\n", + " <td>6.4</td>\n", + " <td>5.7</td>\n", + " <td>6.0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>27756</th>\n", + " <td>BMW</td>\n", + " <td>i3 REX</td>\n", + " <td>2014</td>\n", + " <td>Subcompact Cars</td>\n", + " <td>2.0</td>\n", + " <td>0.6</td>\n", + " <td>Premium Gasoline</td>\n", + " <td>5.7</td>\n", + " <td>6.4</td>\n", + " <td>6.0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>29889</th>\n", + " <td>BMW</td>\n", + " <td>i3 REX</td>\n", + " <td>2016</td>\n", + " <td>Subcompact Cars</td>\n", + " <td>2.0</td>\n", + " <td>0.6</td>\n", + " <td>Premium Gasoline</td>\n", + " <td>5.7</td>\n", + " <td>6.4</td>\n", + " <td>6.0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>28586</th>\n", + " <td>BMW</td>\n", + " <td>i3 REX</td>\n", + " <td>2015</td>\n", + " <td>Subcompact Cars</td>\n", + " <td>2.0</td>\n", + " <td>0.6</td>\n", + " <td>Premium Gasoline</td>\n", + " <td>5.7</td>\n", + " <td>6.4</td>\n", 
+ " <td>6.0</td>\n", + " </tr>\n", + " <tr>\n", + " <th>9392</th>\n", + " <td>Volkswagen</td>\n", + " <td>New Beetle</td>\n", + " <td>2003</td>\n", + " <td>Subcompact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.7</td>\n", + " <td>5.4</td>\n", + " <td>6.2</td>\n", + " </tr>\n", + " <tr>\n", + " <th>7297</th>\n", + " <td>Volkswagen</td>\n", + " <td>New Beetle</td>\n", + " <td>2001</td>\n", + " <td>Subcompact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.7</td>\n", + " <td>5.4</td>\n", + " <td>6.2</td>\n", + " </tr>\n", + " <tr>\n", + " <th>5625</th>\n", + " <td>Volkswagen</td>\n", + " <td>New Jetta</td>\n", + " <td>1999</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.7</td>\n", + " <td>5.4</td>\n", + " <td>6.2</td>\n", + " </tr>\n", + " <tr>\n", + " <th>9573</th>\n", + " <td>Volkswagen</td>\n", + " <td>Jetta</td>\n", + " <td>2003</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.7</td>\n", + " <td>5.4</td>\n", + " <td>6.2</td>\n", + " </tr>\n", + " <tr>\n", + " <th>7455</th>\n", + " <td>Volkswagen</td>\n", + " <td>Golf</td>\n", + " <td>2001</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.7</td>\n", + " <td>5.4</td>\n", + " <td>6.2</td>\n", + " </tr>\n", + " <tr>\n", + " <th>7465</th>\n", + " <td>Volkswagen</td>\n", + " <td>Jetta</td>\n", + " <td>2001</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.7</td>\n", + " <td>5.4</td>\n", + " <td>6.2</td>\n", + " </tr>\n", + " <tr>\n", + " <th>3107</th>\n", + " <td>Volkswagen</td>\n", + " <td>Jetta</td>\n", + " <td>1996</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.9</td>\n", + " <td>5.4</td>\n", + " 
<td>6.2</td>\n", + " </tr>\n", + " <tr>\n", + " <th>5487</th>\n", + " <td>Volkswagen</td>\n", + " <td>New Beetle</td>\n", + " <td>1999</td>\n", + " <td>Subcompact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.7</td>\n", + " <td>5.4</td>\n", + " <td>6.2</td>\n", + " </tr>\n", + " <tr>\n", + " <th>3103</th>\n", + " <td>Volkswagen</td>\n", + " <td>Golf/GTI</td>\n", + " <td>1996</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.9</td>\n", + " <td>5.4</td>\n", + " <td>6.2</td>\n", + " </tr>\n", + " <tr>\n", + " <th>8499</th>\n", + " <td>Volkswagen</td>\n", + " <td>Jetta</td>\n", + " <td>2002</td>\n", + " <td>Compact Cars</td>\n", + " <td>4.0</td>\n", + " <td>1.9</td>\n", + " <td>Diesel</td>\n", + " <td>6.7</td>\n", + " <td>5.4</td>\n", + " <td>6.2</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " make model year VClass cylinders \\\n", + "id \n", + "28193 Volkswagen Jetta Hybrid 2015 Compact Cars 4.0 \n", + "26546 Volkswagen Jetta Hybrid 2014 Compact Cars 4.0 \n", + "29450 Volkswagen Jetta Hybrid 2016 Compact Cars 4.0 \n", + "25661 Volkswagen Jetta Hybrid 2013 Compact Cars 4.0 \n", + "9770 Volkswagen Jetta Wagon 2003 Small Station Wagons 4.0 \n", + "8698 Volkswagen Jetta Wagon 2002 Small Station Wagons 4.0 \n", + "29788 Audi A3 e-tron ultra 2016 Compact Cars 4.0 \n", + "27756 BMW i3 REX 2014 Subcompact Cars 2.0 \n", + "29889 BMW i3 REX 2016 Subcompact Cars 2.0 \n", + "28586 BMW i3 REX 2015 Subcompact Cars 2.0 \n", + "9392 Volkswagen New Beetle 2003 Subcompact Cars 4.0 \n", + "7297 Volkswagen New Beetle 2001 Subcompact Cars 4.0 \n", + "5625 Volkswagen New Jetta 1999 Compact Cars 4.0 \n", + "9573 Volkswagen Jetta 2003 Compact Cars 4.0 \n", + "7455 Volkswagen Golf 2001 Compact Cars 4.0 \n", + "7465 Volkswagen Jetta 2001 Compact Cars 4.0 \n", + "3107 Volkswagen Jetta 1996 Compact Cars 4.0 \n", + "5487 Volkswagen New Beetle 
1999 Subcompact Cars 4.0 \n", + "3103 Volkswagen Golf/GTI 1996 Compact Cars 4.0 \n", + "8499 Volkswagen Jetta 2002 Compact Cars 4.0 \n", + "\n", + " displ fuelType city highway combined \n", + "id \n", + "28193 1.4 Premium Gasoline 5.7 4.9 5.4 \n", + "26546 1.4 Premium Gasoline 5.7 4.9 5.4 \n", + "29450 1.4 Premium Gasoline 5.6 4.9 5.4 \n", + "25661 1.4 Premium Gasoline 5.7 4.9 5.4 \n", + "9770 1.9 Diesel 6.7 5.2 6.0 \n", + "8698 1.9 Diesel 6.7 5.2 6.0 \n", + "29788 1.4 Premium Gasoline 6.4 5.7 6.0 \n", + "27756 0.6 Premium Gasoline 5.7 6.4 6.0 \n", + "29889 0.6 Premium Gasoline 5.7 6.4 6.0 \n", + "28586 0.6 Premium Gasoline 5.7 6.4 6.0 \n", + "9392 1.9 Diesel 6.7 5.4 6.2 \n", + "7297 1.9 Diesel 6.7 5.4 6.2 \n", + "5625 1.9 Diesel 6.7 5.4 6.2 \n", + "9573 1.9 Diesel 6.7 5.4 6.2 \n", + "7455 1.9 Diesel 6.7 5.4 6.2 \n", + "7465 1.9 Diesel 6.7 5.4 6.2 \n", + "3107 1.9 Diesel 6.9 5.4 6.2 \n", + "5487 1.9 Diesel 6.7 5.4 6.2 \n", + "3103 1.9 Diesel 6.9 5.4 6.2 \n", + "8499 1.9 Diesel 6.7 5.4 6.2 " + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[ ( df['fuelType'] != \"Electricity\" ) & df[\"make\"].isin(german_makes) ].sort_values(\"combined\").head(20)" + ] + }, + { + "cell_type": "markdown", + "id": "6ab19c5c", + "metadata": {}, + "source": [ + "All the questions from the problem statement have now been worked through.\n", + "\n", + "### <font color = \"blue\"> **Suggestions for your own programming**\n", + "\n", + "There are countless further ways to evaluate this data. Come up with interesting questions of your own and analyze them." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Semester_2/Einheit_07/Uebung_6.ipynb b/Semester_2/Einheit_07/Uebung_6.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..d8424894a04ba93363db3db9b0b6d2020cfadb46 --- /dev/null +++ b/Semester_2/Einheit_07/Uebung_6.ipynb @@ -0,0 +1,141 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "52dace14", + "metadata": {}, + "source": [ + "# <font color='blue'>**Exercise 6 - Data Analysis - Pandas**</font>\n", + "(This exercise belongs to lecture unit 7)\n", + "\n", + "## <font color='blue'>**Problem: Creating and modifying a pandas dataset**</font>\n", + "\n", + "In the previous exercise, a large dataset was read in and analyzed extensively. In contrast, this shorter exercise creates and modifies a small dataset and performs fewer evaluations. You should definitely try this exercise yourself before looking at the solution.\n", + "\n", + "### <font color='blue'>**Problem description**</font>\n", + "\n", + "The following survey results are available. This dataset is to be created manually as a dataframe. There are 6 people (to keep the typing manageable) and 3 questions. 
The first question is a simple yes/no question; the other questions are rated with 1 to 5 points.\n", + "\n", + "| Person | Question 1 | Question 2 | Question 3 |\n", + "|---|---|---|---|\n", + "| Person 1 | yes | 3 | 5 |\n", + "| Person 2 | yes | 1 | 1 |\n", + "| Person 3 | no | 3 | 4 |\n", + "| Person 4 | no | 2 | 4 |\n", + "| Person 5 | yes | 2 | 5 |\n", + "| Person 6 | yes | 4 | 2 |\n", + "\n", + "Create a pandas dataframe from the data given. The column Person should be the index, and the question labels should be the column names.\n", + "\n", + "Now use pandas to create evaluations: the average values of the questions, and charts showing how many people awarded each score (cf. histogram).\n", + "\n", + "Finally, create a new column/Series with the average rating of the two point-based questions (per person). You may need to pay attention to the index. Is there a correlation between the answer to Question 2 and the average rating?\n", + "\n", + "### <font color='blue'>**Implementation**</font>" + ] + }, + { + "cell_type": "markdown", + "id": "f118b640", + "metadata": {}, + "source": [ + "Dataframe:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f04c682a", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "cadd1033", + "metadata": {}, + "source": [ + "Averages:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ea4577ab", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "b06bb1f0", + "metadata": {}, + "source": [ + "Plot:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "552008f3", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "87a8674e", + "metadata": {}, + "source": [ + "Average score per person:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "197bf9e7", + "metadata": 
{}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "125e0f8f", + "metadata": {}, + "source": [ + "Correlation:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b6c2eaf4", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}