Dataset Description

Year -> This field contains the year in which the incident occurred.

Month -> This field contains the number of the month in which the incident occurred.

Day -> This field contains the numeric day of the month on which the incident occurred.

Country -> This field identifies the country or location where the incident occurred.

State -> This field identifies the state where the incident occurred.

Region -> This field identifies the region in which the incident occurred.

City -> Name of the city, village, or town in which the incident occurred.

Latitude -> The latitude of the city in which the event occurred.

Longitude -> The longitude of the city in which the event occurred.

AttackType -> The general method of attack and broad class of tactics used.

Killed -> The number of total confirmed fatalities for the incident.

Wounded -> The number of total confirmed wounded for the incident.

Target -> The specific person, building, or installation that was targeted and/or victimized.

Summary -> A short summary of what happened during the incident.

Group -> The group that claimed responsibility for the attack.

Target_type -> The general type of target/victim.

Weapon_type -> General type of weapon used in the incident.

Importing the required libraries

In [1]:
import pandas as pd

Loading the Dataset

In [2]:
global_terrorism = pd.read_csv("terrorismdata.csv")

# Viewing the Top 5 rows
global_terrorism.head()
Out[2]:
Year Month Day Country State Region City Latitude Longitude AttackType Killed Wounded Target Summary Group Target_type Weapon_type
0 1970 7 2 Dominican Republic NaN Central America & Caribbean Santo Domingo 18.456792 -69.951164 Assassination 1.0 0.0 Julio Guzman NaN MANO-D Private Citizens & Property Unknown
1 1970 0 0 Mexico Federal North America Mexico city 19.371887 -99.086624 Hostage Taking (Kidnapping) 0.0 0.0 Nadine Chaval, daughter NaN 23rd of September Communist League Government (Diplomatic) Unknown
2 1970 1 0 Philippines Tarlac Southeast Asia Unknown 15.478598 120.599741 Assassination 1.0 0.0 Employee NaN Unknown Journalists & Media Unknown
3 1970 1 0 Greece Attica Western Europe Athens 37.997490 23.762728 Bombing/Explosion NaN NaN U.S. Embassy NaN Unknown Government (Diplomatic) Explosives
4 1970 1 0 Japan Fukouka East Asia Fukouka 33.580412 130.396361 Facility/Infrastructure Attack NaN NaN U.S. Consulate NaN Unknown Government (Diplomatic) Incendiary
In [3]:
# Shape of the Dataset

global_terrorism.shape
Out[3]:
(181691, 17)
In [4]:
# Listing columns of the dataset

global_terrorism.columns
Out[4]:
Index(['Year', 'Month', 'Day', 'Country', 'State', 'Region', 'City',
       'Latitude', 'Longitude', 'AttackType', 'Killed', 'Wounded', 'Target',
       'Summary', 'Group', 'Target_type', 'Weapon_type'],
      dtype='object')
In [5]:
# Checking Count & Data Types

global_terrorism.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181691 entries, 0 to 181690
Data columns (total 17 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   Year         181691 non-null  int64  
 1   Month        181691 non-null  int64  
 2   Day          181691 non-null  int64  
 3   Country      181691 non-null  object 
 4   State        181270 non-null  object 
 5   Region       181691 non-null  object 
 6   City         181257 non-null  object 
 7   Latitude     177135 non-null  float64
 8   Longitude    177134 non-null  float64
 9   AttackType   181691 non-null  object 
 10  Killed       171378 non-null  float64
 11  Wounded      165380 non-null  float64
 12  Target       181055 non-null  object 
 13  Summary      115562 non-null  object 
 14  Group        181691 non-null  object 
 15  Target_type  181691 non-null  object 
 16  Weapon_type  181691 non-null  object 
dtypes: float64(4), int64(3), object(10)
memory usage: 23.6+ MB
In [6]:
# Describing integer and float columns

global_terrorism.describe()
Out[6]:
                Year          Month            Day       Latitude      Longitude         Killed        Wounded
count  181691.000000  181691.000000  181691.000000  177135.000000   1.771340e+05  171378.000000  165380.000000
mean     2002.638997       6.467277      15.505644      23.498343  -4.586957e+02       2.403272       3.167668
std        13.259430       3.388303       8.814045      18.569242   2.047790e+05      11.545741      35.949392
min      1970.000000       0.000000       0.000000     -53.154613  -8.618590e+07       0.000000       0.000000
25%      1991.000000       4.000000       8.000000      11.510046   4.545640e+00       0.000000       0.000000
50%      2009.000000       6.000000      15.000000      31.467463   4.324651e+01       0.000000       0.000000
75%      2014.000000       9.000000      23.000000      34.685087   6.871033e+01       2.000000       2.000000
max      2017.000000      12.000000      31.000000      74.633553   1.793667e+02    1570.000000    8191.000000
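
The describe() output shows a minimum Longitude of about -8.6e+07, which is far outside the valid range of -180 to 180 and also distorts the Longitude mean and standard deviation. A range check like the one below could flag such rows. This is only a minimal sketch: the `coords` frame and its values are illustrative stand-ins for the real dataset, and the notebook itself does not perform this step.

```python
import pandas as pd

# Toy frame standing in for global_terrorism; the values are illustrative only.
coords = pd.DataFrame({
    "Latitude": [18.456792, 37.997490, 33.580412],
    "Longitude": [-69.951164, 23.762728, -86185896.0],
})

# Valid geographic ranges: latitude in [-90, 90], longitude in [-180, 180].
# Series.between is inclusive on both ends and returns False for NaN.
valid = coords["Latitude"].between(-90, 90) & coords["Longitude"].between(-180, 180)

# Rows outside the valid ranges (or with missing coordinates) deserve a closer look.
suspect_rows = coords[~valid]
print(suspect_rows)
```

Whether such rows should be corrected or dropped is a judgment call that depends on how the coordinates are used later.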
In [7]:
# Checking for null values

global_terrorism.isnull().sum()
Out[7]:
Year               0
Month              0
Day                0
Country            0
State            421
Region             0
City             434
Latitude        4556
Longitude       4557
AttackType         0
Killed         10313
Wounded        16311
Target           636
Summary        66129
Group              0
Target_type        0
Weapon_type        0
dtype: int64

Data Formatting and Cleaning

  • Dropping the Summary column entirely: 66,129 of its 181,691 values (about 36%) are missing, and it is unlikely to be essential for our analysis.

  • For the remaining columns with missing values, the null counts are small relative to the size of the dataset (at most 16,311 rows, about 9%, in Wounded). Rather than dropping these columns, we remove only the rows that contain missing values.
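
One way to back up the "drop the column" versus "drop the rows" decision is to look at the percentage of missing values per column before cleaning. The sketch below uses a small made-up frame named `toy` rather than the real data; only the column names are borrowed from the dataset.

```python
import pandas as pd

# Made-up rows; only the column names come from the dataset.
toy = pd.DataFrame({
    "City": ["Athens", None, "Madison", "Oakland"],
    "Killed": [1.0, 0.0, None, 0.0],
    "Summary": [None, None, None, "text"],
})

# Percentage of missing values per column, highest first.
missing_pct = toy.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)

# Drop the mostly-empty column first, then drop rows that still contain nulls.
cleaned = toy.drop(columns="Summary").dropna()

# Fraction of the original rows that survive the cleaning.
rows_kept = len(cleaned) / len(toy)
print(rows_kept)
```

Dropping the sparse column before calling dropna() matters: done in the other order, every row with a missing Summary would be removed as well.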

In [8]:
global_terrorism.drop("Summary", axis = 1, inplace = True)

global_terrorism.dropna(inplace = True)
In [9]:
# Checking for any null values after cleaning

global_terrorism.isnull().sum()
Out[9]:
Year           0
Month          0
Day            0
Country        0
State          0
Region         0
City           0
Latitude       0
Longitude      0
AttackType     0
Killed         0
Wounded        0
Target         0
Group          0
Target_type    0
Weapon_type    0
dtype: int64

Filtering out records whose "Month" or "Day" value is 0. Zero is not a valid month or day, so these records most likely have an unknown or incomplete date, and excluding them keeps the date-based visualizations consistent.
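
Once rows with a zero Month or Day are gone, the three date parts can be combined into a real date, which tends to be easier to work with in a visualization tool. The following is a sketch under assumptions: the `toy` frame is made up, and the `Date` column is a hypothetical addition that the notebook does not create.

```python
import pandas as pd

# Made-up rows; a 0 in Month or Day marks an incomplete date.
toy = pd.DataFrame({
    "Year": [1970, 1970, 1970, 1970],
    "Month": [7, 0, 1, 1],
    "Day": [2, 0, 0, 3],
})

# Keep only rows where both Month and Day are non-zero, using a single mask.
dated = toy[(toy["Month"] != 0) & (toy["Day"] != 0)].copy()

# pd.to_datetime can assemble dates from columns named year, month, and day.
parts = dated[["Year", "Month", "Day"]].rename(columns=str.lower)
dated["Date"] = pd.to_datetime(parts)
print(dated)
```

Without the filter, the conversion would fail on the zero values, because month 0 and day 0 do not exist in the calendar.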

In [10]:
global_terrorism = global_terrorism[global_terrorism["Month"] != 0]

global_terrorism = global_terrorism[global_terrorism["Day"] != 0]
In [11]:
# Checking the shape again to see how much data is left after cleaning

global_terrorism.shape
Out[11]:
(158946, 16)
In [12]:
# Renaming Columns to maintain consistency

col_rename = {"AttackType" : "Attack_Type", "Target_type" : "Target_Type", "Weapon_type" : "Weapon_Type"}
global_terrorism.rename(columns = col_rename, inplace = True)
In [13]:
# Replacing "Unknown" values with appropriate labels in specified columns

global_terrorism["Group"] = global_terrorism["Group"].replace({"Unknown" : "No Group took Responsibility"})
global_terrorism["Target_Type"] = global_terrorism["Target_Type"].replace({"Unknown" : "Other"})
global_terrorism["Weapon_Type"] = global_terrorism["Weapon_Type"].replace({"Unknown" : "Other"})

Creating a Casualties column by adding the Killed and Wounded columns.
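
Plain addition is safe here because rows with missing Killed or Wounded values were already removed by dropna(). The sketch below, on a made-up `toy` frame, shows what would happen otherwise and one possible alternative. Treating a missing count as 0 is an assumption about the data, not something the notebook does.

```python
import pandas as pd

# Made-up counts with gaps, to show how missing values behave in a sum.
toy = pd.DataFrame({
    "Killed": [1.0, None, 2.0],
    "Wounded": [0.0, 3.0, None],
})

# Plain addition propagates NaN: a missing value on either side gives NaN.
plain_sum = toy["Killed"] + toy["Wounded"]

# Series.add with fill_value=0 treats a value missing on one side as 0.
# If both sides are missing, the result is still NaN.
filled_sum = toy["Killed"].add(toy["Wounded"], fill_value=0)

print(plain_sum)
print(filled_sum)
```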

In [14]:
global_terrorism["Casualties"] = global_terrorism["Killed"] + global_terrorism["Wounded"]
In [15]:
# Resetting the index to ensure a clean & continuous index

global_terrorism = global_terrorism.reset_index(drop = True)

Finally, viewing the cleaned dataset after all the cleaning and formatting.

In [16]:
global_terrorism.head()
Out[16]:
Year Month Day Country State Region City Latitude Longitude Attack_Type Killed Wounded Target Group Target_Type Weapon_Type Casualties
0 1970 1 1 United States Illinois North America Cairo 37.005105 -89.176269 Armed Assault 0.0 0.0 Cairo Police Headquarters Black Nationalists Police Firearms 0.0
1 1970 1 2 Uruguay Montevideo South America Montevideo -34.891151 -56.187214 Assassination 0.0 0.0 Juan Maria de Lucah/Chief of Directorate of in... Tupamaros (Uruguay) Police Firearms 0.0
2 1970 1 2 United States California North America Oakland 37.791927 -122.225906 Bombing/Explosion 0.0 0.0 Edes Substation No Group took Responsibility Utilities Explosives 0.0
3 1970 1 2 United States Wisconsin North America Madison 43.076592 -89.412488 Facility/Infrastructure Attack 0.0 0.0 R.O.T.C. offices at University of Wisconsin, M... New Year's Gang Military Incendiary 0.0
4 1970 1 3 United States Wisconsin North America Madison 43.072950 -89.386694 Facility/Infrastructure Attack 0.0 0.0 Selective Service Headquarters in Madison Wisc... New Year's Gang Government (General) Incendiary 0.0

Saving the cleaned dataset to an Excel file for further visualization in Tableau.

In [17]:
global_terrorism.to_excel("cleaned_global_terrorism.xlsx", index = False)