Performing Analysis of Meteorological Data

Data analysis can be described as a process consisting of several steps in which raw data are transformed and processed in order to visualize them and make predictions. In this blog post, we will perform data analysis on meteorological data. You can find the data set here.

This dataset provides historical data on many meteorological parameters such as pressure, temperature, humidity, wind speed and visibility. It contains hourly records covering roughly 10 years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe.

We will carry out some basic tasks for our analysis, such as data cleaning, data normalization and testing the following hypothesis:

“Do the apparent temperature and humidity, compared monthly across 10 years of data, indicate an increase due to global warming?”

Prerequisites

If you already have Jupyter Notebook and all the necessary Python libraries (NumPy, pandas, Matplotlib, Seaborn) installed, you are ready to get started.

If not, you can use Google Colab instead!

Let’s get started with the basic steps:

Step 1: Import the necessary Python libraries

Step 2: Load the dataset

Step 3: Data preparation

Step 4: Cleaning the dataset

Step 5: Data formatting and resampling

Step 6: Analyse the data

Step 7: Data visualization

Implementation

Import the necessary Python libraries

Start by importing all the libraries we will need for the analysis.

You may be wondering what NumPy, pandas and Matplotlib are:

NumPy is a general-purpose array-processing package for scientific computing with Python.

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool.

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
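A minimal import cell might look like the sketch below. The aliases np, pd and plt are the usual community conventions, not something stated in this article, and the version printout is only a quick sanity check.

```python
import numpy as np               # array-based numerical computing
import pandas as pd              # tabular data loading and manipulation
import matplotlib.pyplot as plt  # plotting interface

# Print the installed versions to confirm the imports worked.
print("NumPy:", np.__version__)
print("pandas:", pd.__version__)
```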

Load the Dataset

To load the data from a CSV (comma-separated values) file into a pandas DataFrame, we use the read_csv() function in pandas. Here, I stored the DataFrame in a variable and viewed the first five rows using the head() method.
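As a rough sketch of this step, the snippet below reads a tiny in-memory CSV so that it runs on its own. The three rows are made up for illustration, the column name "Apparent Temperature (C)" is assumed from the text, and the file name in the comment is hypothetical.

```python
import io
import pandas as pd

# Stand-in for the real CSV file: a few made-up rows so the sketch runs as-is.
# With the downloaded dataset you would pass its path instead,
# e.g. pd.read_csv("weatherHistory.csv")  (hypothetical file name).
sample_csv = io.StringIO(
    "Formatted Date,Apparent Temperature (C),Humidity\n"
    "2006-04-01 00:00:00.000 +0200,7.4,0.89\n"
    "2006-04-01 01:00:00.000 +0200,7.2,0.86\n"
    "2006-04-01 02:00:00.000 +0200,9.4,0.89\n"
)

df = pd.read_csv(sample_csv)  # parse the CSV text into a DataFrame
print(df.head())              # first five rows (here, all three)
```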

So here’s a sneak peek of our dataset:

Data Preparation

Let’s describe our dataset, check whether any null values are present, and inspect the data types.

Here I used the shape, size, columns and dtypes attributes together with the describe() and info() methods to extract information about the dataset.

Tip: the dataframename.info() method alone gives us the dtypes, the shape and the non-null counts.
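The inspection calls mentioned above could be sketched as follows. The small DataFrame is made up (with one missing value on purpose) so the example is self-contained; with the real data you would run the same calls on the loaded DataFrame.

```python
import pandas as pd

# Made-up rows standing in for the loaded dataset (one missing value on purpose).
df = pd.DataFrame({
    "Formatted Date": ["2006-04-01 00:00:00.000 +0200",
                       "2006-04-01 01:00:00.000 +0200",
                       "2006-04-01 02:00:00.000 +0200"],
    "Apparent Temperature (C)": [7.4, 7.2, None],
    "Humidity": [0.89, 0.86, 0.89],
})

print(df.shape)       # (rows, columns)
print(df.size)        # total number of cells
print(df.columns)     # column labels
print(df.dtypes)      # data type of each column
print(df.describe())  # summary statistics for the numeric columns
df.info()             # dtypes, non-null counts and memory usage in one call

null_counts = df.isnull().sum()  # number of missing values per column
print(null_counts)
```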

Data Cleaning

Now we need to focus on the important factors, Apparent Temperature (C) and Humidity, so we can use the drop() function to drop the unwanted columns and clean the dataset.
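A minimal sketch of the cleaning step is shown below. The names of the dropped columns ("Summary", "Wind Speed (km/h)") are assumptions chosen for illustration, and the values are made up; adapt the list to whatever columns your copy of the dataset contains.

```python
import pandas as pd

# Made-up rows with two extra columns we do not need for the hypothesis.
df = pd.DataFrame({
    "Formatted Date": ["2006-04-01 00:00:00.000 +0200",
                       "2006-04-01 01:00:00.000 +0200"],
    "Summary": ["Partly Cloudy", "Mostly Cloudy"],
    "Apparent Temperature (C)": [7.4, 7.2],
    "Humidity": [0.89, 0.86],
    "Wind Speed (km/h)": [14.1, 14.3],
})

# Columns to discard (assumed names, for illustration only).
unwanted = ["Summary", "Wind Speed (km/h)"]
df = df.drop(columns=unwanted)  # returns a new DataFrame without those columns

print(df.columns.tolist())
```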

Data Formatting

Before starting the analysis, we need to normalize the dataset by converting the Formatted Date column to datetime objects. This is easily done using the pandas function to_datetime().
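The conversion might look like the sketch below. Passing utc=True is my own assumption: it is a common way to handle a column that mixes UTC offsets (for example +0200 and +0100), and it may not be needed if every row uses the same offset. The two rows are made up.

```python
import pandas as pd

# Made-up rows; the second uses a different UTC offset on purpose.
df = pd.DataFrame({
    "Formatted Date": ["2006-04-01 00:00:00.000 +0200",
                       "2006-11-01 00:00:00.000 +0100"],
    "Apparent Temperature (C)": [7.4, 3.1],
    "Humidity": [0.89, 0.93],
})

# utc=True converts every timestamp to UTC, so mixed offsets end up
# in a single timezone-aware datetime column.
df["Formatted Date"] = pd.to_datetime(df["Formatted Date"], utc=True)

# Use the timestamps as the index so time-based operations work later on.
df = df.set_index("Formatted Date")
print(df.index)
```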

Resampling the data

Since we have hourly data, we need to resample it to monthly data. Resampling is a convenient method for frequency conversion. The object must have a datetime-like index.
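Resampling to monthly averages can be sketched like this. The hourly data below is synthetic (the temperature is simply set to the month number so the result is easy to check), and the variable name monthly is just a suggestion.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for three months, standing in for the real dataset.
idx = pd.date_range("2006-04-01", "2006-06-30 23:00", freq="h", tz="UTC")
df = pd.DataFrame(
    {
        # Temperature equals the month number (4.0, 5.0 or 6.0).
        "Apparent Temperature (C)": np.asarray(idx.month, dtype=float),
        "Humidity": np.full(len(idx), 0.8),
    },
    index=idx,
)

# "MS" groups the rows by calendar month and labels each group with the
# first day of that month; mean() then averages every column per group.
monthly = df.resample("MS").mean()
print(monthly)
```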

Here is how the data looks after resampling:

Here “MS” denotes month start. We display the average apparent temperature and humidity using the mean() function.

Plotting the variation in Apparent Temperature and Humidity with time

We will be using Seaborn to plot the variation in apparent temperature and Humidity with time.

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

The above code gives the following output:

The above plot shows that humidity remained almost constant over these years. Even the average apparent temperature is almost the same (since the peaks lie on the same line).

Data Analysis

As we can see, both the peaks and the troughs are almost the same throughout the 10-year period. Let’s retrieve the data for a particular month from every year and check whether the average apparent temperature and the average humidity for that month have increased or not.

This monthly analysis has to be done for all 12 months over the 10-year period.

Use the same code to analyse each month from January to December by changing only the variable name and the index filter, i.e. index.month==1 for January, index.month==2 for February, and so on.
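Selecting a single calendar month from the resampled data could look like the sketch below, shown here for April. The monthly DataFrame is synthetic and the variable name april is only an example.

```python
import numpy as np
import pandas as pd

# Synthetic monthly averages for three years, standing in for the resampled data.
idx = pd.date_range("2006-01-01", "2008-12-01", freq="MS")
monthly = pd.DataFrame(
    {
        "Apparent Temperature (C)": np.asarray(idx.month, dtype=float),
        "Humidity": np.full(len(idx), 0.75),
    },
    index=idx,
)

# Boolean mask on the index: keep only the April rows, one per year.
april = monthly[monthly.index.month == 4]
print(april)
```

Changing the 4 to any value from 1 to 12 gives the corresponding month.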

Data Visualization

Use the code given below to plot the graphs for each month
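The original plotting cell is not reproduced here, so the following is only a sketch of how the per-month graphs might be drawn with Matplotlib. The helper plot_month is hypothetical, and the data is synthetic.

```python
import calendar
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_month(monthly, month):
    """Plot one calendar month's averages across all years (hypothetical helper)."""
    subset = monthly[monthly.index.month == month]
    years = subset.index.year
    fig, ax = plt.subplots(figsize=(10, 4))
    for column in subset.columns:
        ax.plot(years, subset[column], marker="o", label=column)
    ax.set_title(calendar.month_name[month])
    ax.set_xlabel("Year")
    ax.set_xticks(years)  # one tick per year instead of fractional ticks
    ax.legend()
    return ax


# Synthetic monthly averages for three years, standing in for the resampled data.
idx = pd.date_range("2006-01-01", "2008-12-01", freq="MS")
monthly = pd.DataFrame(
    {
        "Apparent Temperature (C)": np.asarray(idx.month, dtype=float),
        "Humidity": np.full(len(idx), 0.75),
    },
    index=idx,
)

# One figure per month, January (1) through December (12).
axes = [plot_month(monthly, m) for m in range(1, 13)]
```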

Here are the plots of average apparent temperature and humidity on a monthly basis over 10 years.

[Figures: one plot for each month, January through December]

Conclusion

As for our hypothesis:

“Do the apparent temperature and humidity, compared monthly across 10 years of data, indicate an increase due to global warming?”

No change in average humidity was observed over the ten years from 2006 to 2016.

We can observe an increase in average apparent temperature in 2009, then a drop in 2010, a slight increase in 2011, a significant drop in 2015 and finally an increase in 2016. Hence, global warming has caused uncertainty in temperature over the past 10 years.

Thank you for reading my article!

You can find the source code here.
