#6.1 Analyze Market Sector Similarities (Python Financial Analysis)

wsh
Python Financial Analysis
Aug 23, 2021

What is sector similarity?

One of the most common pieces of investment advice is to spread money across various kinds of companies and ETFs for safety. If the money is well distributed, we won’t lose much even if one or two companies fail.
Definition of Diversification (investopedia.com)
What Is a Diversified Portfolio? (thebalance.com)
Diversification (Wikipedia)

The difficulty with diversified investment is deciding how to distribute the money. There are several methods of diversification, such as foreign diversification: investing in companies from several countries is good practice. But when we choose companies, we should pay attention to their similarities, because investing in similar companies often doesn’t lead to good diversification. Since it’s not easy to evaluate the similarity of every pair of companies, in this article we’re going to look at the similarities of sectors instead.

Sector similarity is a measure of how strongly the performance changes of two sectors are correlated, or how similar their performance graphs look. In this article, we’re going to compare the performances of 16 sectors and numerically evaluate how similar/correlated they are in Python.

Compare similarities

The image below is a map of the similarities of all pairs of sectors. The larger the number inside a cell, the more strongly the two sectors are correlated. You may notice that the map is not symmetric about the diagonal line. This is due to our definition of sector similarity described below.

[NEW] View in an interactive heatmap

Here are the 3 most similar and the 3 least similar pairs of sectors. The graphs of the most similar pairs show that even small price changes look alike: when one sector has a performance jump, the other sector gains by the same magnitude at the same time. From the viewpoint of diversified investment, investing in two of these most similar sectors is almost equivalent to investing all the money in either one of them.

While some sectors have strong similarities, some pairs of sectors don’t have significant similarities at all. Look at the last graphs. They don’t look alike at all, except right after the COVID-19 impact.

You can download graphs of all 240 pairs of sectors from here (Google Drive):
https://drive.google.com/drive/folders/1rOPzYP_ZveHQpfGAdW0vzAi-LXOP9-OU?usp=sharing

Calculate similarities (in Python)

A sector similarity is a comparison of the average performances of the companies that belong to each sector. We take the weighted average of the prices of the companies as the performance of a sector, where the weights are the market caps of these companies. To make the sectors comparable, the performance of each sector is expressed relative to Jan 2020, with the performance at the beginning defined as 1.0.

4.1 Make custom market index — prerequisites (Python Financial Analysis)
4.2 Make custom market index — make your own index (Python Financial Analysis)
4.3 Make custom market index — market cap based index (Python Financial Analysis)
5.1 Analyze COVID-19 Impacts by Sector in Python — compare weighted average prices (Python Financial Analysis)

1. Import packages, read datasets

We first import the necessary packages. If you have experience with Python or have read the previous articles, you should already know most of the packages. The new one is “financialanalysis”. It’s a set of tools that makes financial data analysis much easier and less stressful. If you haven’t installed “financialanalysis”, you can install it with “pip install financialanalysis”. Note that “sklearn” is also included as a dependency of “financialanalysis”.

Then we read the datasets “pricedata_reshaped.csv” and “meta.csv”. You can download them from the links below. “meta.csv” is a collection of basic information, such as market caps, on all 5100 companies listed on the U.S. market. “pricedata_reshaped.csv” contains the closing prices of these companies over the last 5 years.
Download dataset from Google Drive
https://drive.google.com/drive/folders/1Ux2u1s5mctYiywS08sv7_3_PbnWd8v0G?usp=sharing
Download dataset from One Drive
https://1drv.ms/u/s!AtyojsH6G4d-iwjRiL5vtXjHxGme?e=inK2bL

The object returned by “read_csv” is a DataFrame. If you don’t know how to use a DataFrame, the following articles will help you:
2 Handling table-like data in Python with DataFrame (Python Financial Analysis)
Python DataFrame slicing in the easiest way (How to find a company from 5000 companies)
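
As a minimal sketch, reading the two files might look like the following. The column names (“ticker”, “sector”, “marketcap”, “date”) and the tiny in-memory CSV text are assumptions made for illustration, not the real layout of the downloaded files.

```python
import io

import pandas as pd

# Tiny stand-ins for "meta.csv" and "pricedata_reshaped.csv".
# Column names here are hypothetical; check the real files before reusing them.
META_CSV = """ticker,sector,marketcap
AAA,Technology,300
BBB,Technology,100
CCC,Energy,
"""
PRICE_CSV = """date,AAA,BBB,CCC
2020-01-02,10.0,20.0,5.0
2020-01-03,11.0,,5.5
2020-01-06,12.0,22.0,6.0
"""

# With the real files, pass the file names instead of the StringIO objects,
# e.g. pd.read_csv("meta.csv", index_col="ticker").
meta = pd.read_csv(io.StringIO(META_CSV), index_col="ticker")
price = pd.read_csv(io.StringIO(PRICE_CSV))

print(meta)
print(price)
```

Empty cells in the CSV text show up as NaN in the printed DataFrames, which is why a later step fills them.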

2. Convert date, and fill missing values

Because the dates in the CSV file are written in a text format, we must convert them into “datetime” objects. The code below does this for you, but the “stringToDatetime()” function of “financialanalysis” can do it instead and saves you from writing boring code.
Python “datetime” in the easiest way (how to handle dates in data science with Python)

Then we fill the missing values of the DataFrames “meta” and “price” with zeros. If a value is missing in a CSV file, Python and DataFrame fill it with a special value “NaN” (Not a Number). We must replace these with zeros because any arithmetic operation that involves NaN results in NaN, which becomes a problem later.
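
A rough sketch of these two operations with plain pandas could look like this. The “date” column name, the date format, and the sample values are assumptions; the sketch does not use “stringToDatetime()” because its exact signature is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: dates stored as text, with two missing prices.
price = pd.DataFrame({
    "date": ["2020-01-02", "2020-01-03", "2020-01-06"],
    "AAA": [10.0, np.nan, 12.0],
    "BBB": [20.0, 21.0, np.nan],
})

# Convert the text dates into datetime objects and use them as the index.
price["date"] = pd.to_datetime(price["date"], format="%Y-%m-%d")
price = price.set_index("date")

# NaN propagates through arithmetic (NaN + 1 is NaN), so replace it with zeros.
price = price.fillna(0)

print(price)
```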

3. Crop the “price”

We then select the date range over which we calculate the performance of each sector. We take the range from the beginning of the COVID-19 pandemic to the most recent data. You can manually crop the “price” DataFrame by specifying the range with datetime objects, but you can do the same operation in a single line of code using the “cropTimeseriesDataFrame()” function.
Python DataFrame slicing in the easiest way (How to find a company from 5000 companies)
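
The manual crop with datetime objects can be sketched as follows. The dates and the single “AAA” column are made up for illustration, and the sketch assumes the dates are already the index of “price”.

```python
import datetime

import pandas as pd

# Hypothetical daily prices from 2019-12-30 to 2020-01-08.
dates = pd.date_range("2019-12-30", periods=10, freq="D")
price = pd.DataFrame({"AAA": range(10)}, index=dates)

# Keep only the rows whose date falls inside [start, end].
start = datetime.datetime(2020, 1, 1)
end = datetime.datetime(2020, 1, 5)
cropped = price[(price.index >= start) & (price.index <= end)]

print(cropped)
```

Both ends of the range are inclusive here, so the cropped DataFrame starts on Jan 1 and ends on Jan 5.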

4. Calculate sector performances

We finally calculate the performance of each sector. After getting a list of all sectors (“SECTORS”), we iterate over the sector names. We first extract the meta information of the companies that belong to the sector “sector”. We then iterate over the tickers of those companies. “w” and “p” are the weight and price of the company. Because this is a weighted average based on market caps, “w” is the relative market cap of the company. If you don’t understand what’s written here, the following article may help you:
5.1 Analyze COVID-19 Impacts by Sector in Python — compare weighted average prices (Python Financial Analysis)
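
The loop described above might be sketched like this. The column names (“sector”, “marketcap”), the sample companies, and the choice to normalize the weighted average by its first value are assumptions for illustration; the original code may differ in these details.

```python
import pandas as pd

# Hypothetical meta information, indexed by ticker.
meta = pd.DataFrame(
    {"sector": ["Technology", "Technology", "Energy"],
     "marketcap": [300.0, 100.0, 50.0]},
    index=["AAA", "BBB", "CCC"],
)
# Hypothetical closing prices, one column per ticker.
dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
price = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0], "BBB": [20.0, 20.0, 24.0], "CCC": [5.0, 4.0, 6.0]},
    index=dates,
)

SECTORS = sorted(meta["sector"].unique())
performance = pd.DataFrame(index=price.index)

for sector in SECTORS:
    # Companies that belong to this sector.
    meta_sector = meta[meta["sector"] == sector]
    total_cap = meta_sector["marketcap"].sum()

    weighted = pd.Series(0.0, index=price.index)
    for ticker in meta_sector.index:
        w = meta_sector.loc[ticker, "marketcap"] / total_cap  # relative market cap
        p = price[ticker]                                      # price history
        weighted += w * p

    # Express the performance relative to the first day (1.0 at the beginning).
    performance[sector] = weighted / weighted.iloc[0]

print(performance)
```

In this toy data, “Technology” ends at about 1.2 because its weighted price rises from 12.5 to 15.0.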

5. Calculate sector similarities

We then calculate the similarity of each pair of sectors. After extracting the performances of sector1 and sector2, we divide one by the other to see their relative performance. If the “price_division” is close to a straight line, it means that the two sectors are similar. Thus we first make a straight line to compare the price division against.

We take the straight line from a linear regression of the price division. This method finds the line that best approximates the original data. The line here is “y”.
Linear regression on time series data like stock price
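
As a minimal sketch, the division and the regression line could be computed as below. It uses NumPy’s “polyfit” for the regression, which is an assumption; the original code may use a different tool such as “sklearn”. The two performance arrays are made-up values.

```python
import numpy as np

# Hypothetical performances of two sectors, both starting at 1.0.
sector1 = np.array([1.00, 1.02, 1.05, 1.03, 1.08])
sector2 = np.array([1.00, 1.01, 1.02, 1.04, 1.05])

# Relative performance of sector1 against sector2.
price_division = sector1 / sector2

# Least-squares straight line through price_division, using the day number as x.
x = np.arange(len(price_division))
slope, intercept = np.polyfit(x, price_division, deg=1)
y = slope * x + intercept

print(price_division)
print(y)
```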

After finding the approximation line “y”, we measure how close it is to the original graph “price_division”. To calculate this similarity, we first calculate how dissimilar they are with the Root Mean Square (RMS) of their difference. We then take its inverse as the similarity of sector1 and sector2.
Calculate graph similarities with Root Mean Square (RMS)
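
A small worked example of the RMS step, with hand-picked numbers so the arithmetic is easy to follow. The arrays are hypothetical, and in practice “y” would come from the regression rather than being typed in.

```python
import numpy as np

# Hypothetical price division and its approximation line.
price_division = np.array([1.00, 1.10, 0.90, 1.00])
y = np.array([1.00, 1.00, 1.00, 1.00])

# Dissimilarity: root mean square of the differences.
# Squared differences are 0, 0.01, 0.01, 0, so the mean is 0.005.
rms = np.sqrt(np.mean((price_division - y) ** 2))

# Similarity: the inverse of the dissimilarity.
# A perfect fit gives rms == 0, so real code should guard against dividing by zero.
similarity = 1.0 / rms

print(rms, similarity)
```

Here the RMS is about 0.07, so the similarity is about 14. A division that hugs its regression line more tightly gives a smaller RMS and a larger similarity.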

6. Display the similarities as a heatmap

Then we make a heatmap like the one shown at the top of this article. Unlike the code we have seen so far, there’s not much to explain here. You can just copy and paste the code below, or use the “dataframeToHeatmap()” function instead. If you don’t know the basics of Matplotlib, the following article may help you:
3 Make graphs of stock price in Python (Python Financial Analysis)
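
A heatmap of this kind can be sketched with Matplotlib’s “imshow” as follows. The three sector names, the similarity values, and the output file name are invented for illustration; this is not the “dataframeToHeatmap()” function.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use a window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical similarity table: rows are sector1, columns are sector2.
# The diagonal is left empty because a sector is not compared with itself.
sectors = ["Energy", "Technology", "Utilities"]
similarity = np.array([
    [np.nan, 12.0, 30.0],
    [10.0, np.nan, 8.0],
    [28.0, 9.0, np.nan],
])

fig, ax = plt.subplots(figsize=(5, 4))
image = ax.imshow(similarity, cmap="viridis")

# Label both axes with the sector names.
ax.set_xticks(range(len(sectors)))
ax.set_xticklabels(sectors, rotation=45, ha="right")
ax.set_yticks(range(len(sectors)))
ax.set_yticklabels(sectors)

# Write the number inside every off-diagonal cell.
for i in range(len(sectors)):
    for j in range(len(sectors)):
        if i != j:
            ax.text(j, i, f"{similarity[i, j]:.0f}",
                    ha="center", va="center", color="white")

fig.colorbar(image, ax=ax)
fig.tight_layout()
fig.savefig("sector_similarity_heatmap.png")
```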

7. Generate comparison graphs

You can also compare specific pairs of sectors with their performance history.

Full Python code

If you don’t like coding, you can just copy and paste this code. Don’t forget to download the datasets and place them in the same folder where the code is saved. Run it with “python sector_similarities.py”.

Other Links

Python Financial Analysis | Home
Python Data Analysis | Home
