Year
-> This field contains the year in which the incident occurred.
Month
-> This field contains the number of the month in which the incident occurred.
Day
-> This field contains the numeric day of the month on which the incident occurred.
Country
-> This field identifies the country or location where the incident occurred.
State
-> This field identifies the state where the incident occurred.
Region
-> This field identifies the region in which the incident occurred.
City
-> Name of the city, village, or town in which the incident occurred.
Latitude
-> The latitude of the city in which the incident occurred.
Longitude
-> The longitude of the city in which the incident occurred.
AttackType
-> The general method of attack and broad class of tactics used.
Killed
-> The total number of confirmed fatalities in the incident.
Wounded
-> The total number of people confirmed wounded in the incident.
Target
-> The specific person, building, or installation that was targeted and/or victimized.
Summary
-> A short summary of what happened in the incident.
Group
-> The group that claimed responsibility for the attack.
Target_type
-> The general type of target/victim.
Weapon_type
-> General type of weapon used in the incident.
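The data dictionary above can double as a lightweight schema check when the CSV is loaded. The sketch below is an assumption, not part of the original workflow: `EXPECTED_COLUMNS` and `check_columns` are hypothetical names, and the check only compares column names, not types or value ranges.

```python
from __future__ import annotations

import pandas as pd

# Hypothetical: the 17 fields from the data dictionary, in file order
EXPECTED_COLUMNS = [
    "Year", "Month", "Day", "Country", "State", "Region", "City",
    "Latitude", "Longitude", "AttackType", "Killed", "Wounded",
    "Target", "Summary", "Group", "Target_type", "Weapon_type",
]


def check_columns(df: pd.DataFrame, expected: list[str]) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) column names relative to `expected`."""
    actual = set(df.columns)
    expected_set = set(expected)
    missing = [col for col in expected if col not in actual]
    unexpected = [col for col in df.columns if col not in expected_set]
    return missing, unexpected


# Usage on a tiny stand-in frame (not the real dataset)
sample = pd.DataFrame(columns=EXPECTED_COLUMNS[:-1] + ["Extra"])
missing, unexpected = check_columns(sample, EXPECTED_COLUMNS)
print(missing)     # ['Weapon_type']
print(unexpected)  # ['Extra']
```

On the real file, `check_columns(global_terrorism, EXPECTED_COLUMNS)` would be expected to return two empty lists.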
import pandas as pd
global_terrorism = pd.read_csv("terrorismdata.csv")
# Viewing the first 5 rows
global_terrorism.head()
| | Year | Month | Day | Country | State | Region | City | Latitude | Longitude | AttackType | Killed | Wounded | Target | Summary | Group | Target_type | Weapon_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1970 | 7 | 2 | Dominican Republic | NaN | Central America & Caribbean | Santo Domingo | 18.456792 | -69.951164 | Assassination | 1.0 | 0.0 | Julio Guzman | NaN | MANO-D | Private Citizens & Property | Unknown |
| 1 | 1970 | 0 | 0 | Mexico | Federal | North America | Mexico city | 19.371887 | -99.086624 | Hostage Taking (Kidnapping) | 0.0 | 0.0 | Nadine Chaval, daughter | NaN | 23rd of September Communist League | Government (Diplomatic) | Unknown |
| 2 | 1970 | 1 | 0 | Philippines | Tarlac | Southeast Asia | Unknown | 15.478598 | 120.599741 | Assassination | 1.0 | 0.0 | Employee | NaN | Unknown | Journalists & Media | Unknown |
| 3 | 1970 | 1 | 0 | Greece | Attica | Western Europe | Athens | 37.997490 | 23.762728 | Bombing/Explosion | NaN | NaN | U.S. Embassy | NaN | Unknown | Government (Diplomatic) | Explosives |
| 4 | 1970 | 1 | 0 | Japan | Fukouka | East Asia | Fukouka | 33.580412 | 130.396361 | Facility/Infrastructure Attack | NaN | NaN | U.S. Consulate | NaN | Unknown | Government (Diplomatic) | Incendiary |
# Shape of the Dataset
global_terrorism.shape
(181691, 17)
# Listing columns of the dataset
global_terrorism.columns
Index(['Year', 'Month', 'Day', 'Country', 'State', 'Region', 'City', 'Latitude', 'Longitude', 'AttackType', 'Killed', 'Wounded', 'Target', 'Summary', 'Group', 'Target_type', 'Weapon_type'], dtype='object')
# Checking Count & Data Types
global_terrorism.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181691 entries, 0 to 181690
Data columns (total 17 columns):
 #   Column       Non-Null Count   Dtype
---  ------       --------------   -----
 0   Year         181691 non-null  int64
 1   Month        181691 non-null  int64
 2   Day          181691 non-null  int64
 3   Country      181691 non-null  object
 4   State        181270 non-null  object
 5   Region       181691 non-null  object
 6   City         181257 non-null  object
 7   Latitude     177135 non-null  float64
 8   Longitude    177134 non-null  float64
 9   AttackType   181691 non-null  object
 10  Killed       171378 non-null  float64
 11  Wounded      165380 non-null  float64
 12  Target       181055 non-null  object
 13  Summary      115562 non-null  object
 14  Group        181691 non-null  object
 15  Target_type  181691 non-null  object
 16  Weapon_type  181691 non-null  object
dtypes: float64(4), int64(3), object(10)
memory usage: 23.6+ MB
# Describing integer and float columns
global_terrorism.describe()
| | Year | Month | Day | Latitude | Longitude | Killed | Wounded |
|---|---|---|---|---|---|---|---|
| count | 181691.000000 | 181691.000000 | 181691.000000 | 177135.000000 | 1.771340e+05 | 171378.000000 | 165380.000000 |
| mean | 2002.638997 | 6.467277 | 15.505644 | 23.498343 | -4.586957e+02 | 2.403272 | 3.167668 |
| std | 13.259430 | 3.388303 | 8.814045 | 18.569242 | 2.047790e+05 | 11.545741 | 35.949392 |
| min | 1970.000000 | 0.000000 | 0.000000 | -53.154613 | -8.618590e+07 | 0.000000 | 0.000000 |
| 25% | 1991.000000 | 4.000000 | 8.000000 | 11.510046 | 4.545640e+00 | 0.000000 | 0.000000 |
| 50% | 2009.000000 | 6.000000 | 15.000000 | 31.467463 | 4.324651e+01 | 0.000000 | 0.000000 |
| 75% | 2014.000000 | 9.000000 | 23.000000 | 34.685087 | 6.871033e+01 | 2.000000 | 2.000000 |
| max | 2017.000000 | 12.000000 | 31.000000 | 74.633553 | 1.793667e+02 | 1570.000000 | 8191.000000 |
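The `describe()` output shows a minimum Longitude of -8.618590e+07, which cannot be a real longitude: valid longitudes lie between -180 and 180 and valid latitudes between -90 and 90. The cleaning steps in this notebook do not filter on coordinate ranges, so the following is only a possible extra check, sketched with a hypothetical helper `invalid_coordinates`. Rows with NaN coordinates are deliberately not flagged, since the missing-value step handles them.

```python
import pandas as pd


def invalid_coordinates(df: pd.DataFrame) -> pd.Series:
    """Boolean mask of rows whose non-null Latitude/Longitude fall outside valid ranges."""
    # between() returns False for NaN, so guard with notna() to avoid flagging missing values
    bad_lat = df["Latitude"].notna() & ~df["Latitude"].between(-90, 90)
    bad_lon = df["Longitude"].notna() & ~df["Longitude"].between(-180, 180)
    return bad_lat | bad_lon


# Illustrative rows: the last longitude mimics the outlier seen in describe()
demo = pd.DataFrame({
    "Latitude": [18.456792, None, 37.997490],
    "Longitude": [-69.951164, -99.086624, -8.618590e7],
})
mask = invalid_coordinates(demo)
print(mask.tolist())  # [False, False, True]
```

On the full data, `global_terrorism[invalid_coordinates(global_terrorism)]` would list the affected rows so they can be inspected before deciding whether to drop or correct them.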
# Checking for null values
global_terrorism.isnull().sum()
Year               0
Month              0
Day                0
Country            0
State            421
Region             0
City             434
Latitude        4556
Longitude       4557
AttackType         0
Killed         10313
Wounded        16311
Target           636
Summary        66129
Group              0
Target_type        0
Weapon_type        0
dtype: int64
Completely dropping the Summary column, as it contains a significant number of missing values (66,129 of 181,691 rows, about 36%). It is also not needed for our analysis, so we remove it from the dataset.
For the remaining columns, missing values are few relative to the size of the dataset (the largest count, 16,311 in Wounded, is about 9% of rows). Because so few rows are affected, we remove every row that contains a missing value instead of trying to fill in the gaps.
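The decision above can be made explicit by measuring the share of missing values in each column. The sketch below is one way to do that, assuming a hypothetical cut-off of 30%: columns above the cut-off are dropped, then any row with a remaining null is removed. With the counts from `isnull().sum()`, only Summary (about 36% missing) would exceed that cut-off. The helper name `drop_sparse_then_incomplete` is made up for illustration.

```python
import pandas as pd


def drop_sparse_then_incomplete(df: pd.DataFrame, max_missing: float = 0.30) -> pd.DataFrame:
    """Drop columns whose null fraction exceeds `max_missing`, then drop rows with any null."""
    missing_share = df.isnull().mean()  # fraction of nulls per column
    sparse_cols = missing_share[missing_share > max_missing].index
    return df.drop(columns=sparse_cols).dropna()


demo = pd.DataFrame({
    "Killed": [1.0, None, 0.0, 2.0],            # 25% missing
    "Summary": [None, None, "text", None],      # 75% missing
    "City": ["Athens", "Fukouka", None, "Cairo"],  # 25% missing
})
cleaned = drop_sparse_then_incomplete(demo)
print(list(cleaned.columns))   # ['Killed', 'City']
print(cleaned.index.tolist())  # [0, 3]
```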
# Drop the mostly empty Summary column
global_terrorism.drop("Summary", axis = 1, inplace = True)
# Drop every row that still has at least one missing value
global_terrorism.dropna(inplace = True)
# Checking for any null values after cleaning
global_terrorism.isnull().sum()
Year           0
Month          0
Day            0
Country        0
State          0
Region         0
City           0
Latitude       0
Longitude      0
AttackType     0
Killed         0
Wounded        0
Target         0
Group          0
Target_type    0
Weapon_type    0
dtype: int64
Filtering out records whose "Month" or "Day" value is 0. These records are likely incomplete or incorrect, and since a month or day of 0 cannot form a valid calendar date, excluding them keeps the date-based visualizations consistent.
# Keep only rows with a non-zero Month and Day
global_terrorism = global_terrorism[global_terrorism["Month"] != 0]
global_terrorism = global_terrorism[global_terrorism["Day"] != 0]
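One concrete reason for this filter: a month or day of 0 cannot be turned into a calendar date. The sketch below assumes you want a single date column for time-series charts (the notebook itself does not create one) and assembles dates from Year, Month, and Day with `pd.to_datetime`. With `errors="coerce"`, impossible combinations become `NaT` instead of raising an error.

```python
import pandas as pd

demo = pd.DataFrame({"Year": [1970, 1970, 1970], "Month": [1, 0, 1], "Day": [2, 0, 0]})

# pd.to_datetime can assemble dates from year/month/day columns;
# lower-case names are used to match its documented column keys.
dates = pd.to_datetime(
    demo[["Year", "Month", "Day"]].rename(columns=str.lower),
    errors="coerce",
)
print(dates.isna().tolist())  # [False, True, True]
```

Only the first row, with a real month and day, yields a timestamp; the rows with 0 become missing dates, which is exactly what the filter above avoids.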
# Checking the shape again to see how much data is left after cleaning
global_terrorism.shape
(158946, 16)
# Renaming columns so that multi-word names follow one consistent style
col_rename = {"AttackType" : "Attack_Type", "Target_type" : "Target_Type", "Weapon_type" : "Weapon_Type"}
global_terrorism.rename(columns = col_rename, inplace = True)
# Replacing "Unknown" values with appropriate labels in specified columns
global_terrorism["Group"] = global_terrorism["Group"].replace({"Unknown" : "No Group took Responsibility"})
global_terrorism["Target_Type"] = global_terrorism["Target_Type"].replace({"Unknown" : "Other"})
global_terrorism["Weapon_Type"] = global_terrorism["Weapon_Type"].replace({"Unknown" : "Other"})
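The three `replace` calls above could also be written as a single `DataFrame.replace` call with a nested dictionary of the form `{column: {old_value: new_value}}`. This is an equivalent alternative, not what the notebook runs. Because the mapping names its columns, an "Unknown" elsewhere, such as in City, is left untouched. `UNKNOWN_LABELS` is a hypothetical name.

```python
import pandas as pd

# Column -> {old value: new value}; columns not listed are left untouched
UNKNOWN_LABELS = {
    "Group": {"Unknown": "No Group took Responsibility"},
    "Target_Type": {"Unknown": "Other"},
    "Weapon_Type": {"Unknown": "Other"},
}

demo = pd.DataFrame({
    "City": ["Unknown", "Athens"],
    "Group": ["Unknown", "MANO-D"],
    "Target_Type": ["Unknown", "Police"],
    "Weapon_Type": ["Explosives", "Unknown"],
})
relabelled = demo.replace(UNKNOWN_LABELS)
print(relabelled)
```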
Creating a Casualties column by adding the Killed and Wounded columns.
global_terrorism["Casualties"] = global_terrorism["Killed"] + global_terrorism["Wounded"]
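Plain `+` propagates missing values: if either Killed or Wounded were NaN, Casualties would be NaN too. That is not an issue here because `dropna()` already removed such rows. If the column were built before that step, `Series.add` with `fill_value=0` would treat a single missing side as 0; whether that is appropriate depends on how an unknown count should be interpreted, so the sketch below only contrasts the two behaviours on made-up values.

```python
import numpy as np
import pandas as pd

killed = pd.Series([1.0, np.nan, np.nan])
wounded = pd.Series([2.0, 3.0, np.nan])

plain_sum = killed + wounded                    # NaN if either side is NaN
filled_sum = killed.add(wounded, fill_value=0)  # NaN only if both sides are NaN

print(plain_sum.tolist())   # [3.0, nan, nan]
print(filled_sum.tolist())  # [3.0, 3.0, nan]
```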
# Resetting the index to ensure a clean & continuous index
global_terrorism = global_terrorism.reset_index(drop = True)
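Why the reset matters: `dropna()` and boolean filtering keep the original row labels, so the index ends up with gaps. A small illustration on made-up data:

```python
import pandas as pd

demo = pd.DataFrame({"Month": [1, 0, 3, 0, 5]})
filtered = demo[demo["Month"] != 0]
print(filtered.index.tolist())    # [0, 2, 4]

# drop=True discards the old labels instead of adding them back as a column
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]
print(list(renumbered.columns))   # ['Month']
```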
Finally, viewing the first rows of the dataset after all the cleaning and formatting.
global_terrorism.head()
| | Year | Month | Day | Country | State | Region | City | Latitude | Longitude | Attack_Type | Killed | Wounded | Target | Group | Target_Type | Weapon_Type | Casualties |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1970 | 1 | 1 | United States | Illinois | North America | Cairo | 37.005105 | -89.176269 | Armed Assault | 0.0 | 0.0 | Cairo Police Headquarters | Black Nationalists | Police | Firearms | 0.0 |
| 1 | 1970 | 1 | 2 | Uruguay | Montevideo | South America | Montevideo | -34.891151 | -56.187214 | Assassination | 0.0 | 0.0 | Juan Maria de Lucah/Chief of Directorate of in... | Tupamaros (Uruguay) | Police | Firearms | 0.0 |
| 2 | 1970 | 1 | 2 | United States | California | North America | Oakland | 37.791927 | -122.225906 | Bombing/Explosion | 0.0 | 0.0 | Edes Substation | No Group took Responsibility | Utilities | Explosives | 0.0 |
| 3 | 1970 | 1 | 2 | United States | Wisconsin | North America | Madison | 43.076592 | -89.412488 | Facility/Infrastructure Attack | 0.0 | 0.0 | R.O.T.C. offices at University of Wisconsin, M... | New Year's Gang | Military | Incendiary | 0.0 |
| 4 | 1970 | 1 | 3 | United States | Wisconsin | North America | Madison | 43.072950 | -89.386694 | Facility/Infrastructure Attack | 0.0 | 0.0 | Selective Service Headquarters in Madison Wisc... | New Year's Gang | Government (General) | Incendiary | 0.0 |
Saving the cleaned dataset to an Excel file for further visualization in Tableau.
# Writing .xlsx files requires an Excel engine such as openpyxl to be installed
global_terrorism.to_excel("cleaned_global_terrorism.xlsx", index = False)