## Data Presentation

Table 8.2 Growth (% p.a.) of 50 small pension funds

11.8

10.3

Presented as raw figures, the data in Table 8.2 make it very difficult to detect any pattern which may exist. Grouping the data in a frequency table will be helpful.

First, it is necessary to decide how many groups to use. Condensing the data into too few groups may obscure an important feature, but using too many groups will produce a table which is not much easier to read and interpret than the original data. Usually, between five and ten groups work well, but a large amount of data may need more groups.

Let us group the data in the intervals listed below.

0 ≤ x < 4, 4 ≤ x < 6, 6 ≤ x < 8, 8 ≤ x < 10, 10 ≤ x < 12, 12 ≤ x < 14 and 14 ≤ x < 16%

Defining the boundaries this way means that there is no doubt about which interval an observation belongs to. If we had written the intervals as: 0-4, 4-6, 6-8, etc., it would not have been clear to which interval an observation of 10% belonged. While defining the intervals as 0-3.9, 4-5.9, 6-7.9, etc., would have removed the problem of ambiguity, these intervals are discrete (there are gaps between them) and it is preferable to use continuous intervals for a continuous variable.

Now that the interval boundaries have been defined, each observation can be put into its interval and the frequency in each interval found, as in columns (1) and (2) of Table 8.3. Notice that the table has been given a title; this is essential to allow readers to interpret what is displayed.

Frequency tables are suitable for all types of data.

| (1) Interval | (2) Frequency | (3) Cumulative frequency | (4) Frequency density | (5) Units | (6) Adjusted frequency density | (7) Adjusted relative frequency |
|---|---|---|---|---|---|---|
| 0 ≤ x < 4 | 6 | 6 | 1.5 | 2 | 3 | 0.06 |
| 4 ≤ x < 6 | 12 | 18 | 6.0 | 1 | 12 | 0.26 |
| 6 ≤ x < 8 | 11 | 29 | 5.5 | 1 | 11 | 0.23 |
| 8 ≤ x < 10 | 9 | 38 | 4.5 | 1 | 9 | 0.19 |
| 10 ≤ x < 12 | 5 | 43 | 2.5 | 1 | 5 | 0.11 |
| 12 ≤ x < 14 | 4 | 47 | 2.0 | 1 | 4 | 0.09 |
| 14 ≤ x < 16 | 3 | 50 | 1.5 | 1 | 3 | 0.06 |
| Total | 50 | | | | | 1 |
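The grouping step can be sketched in Python. This is a minimal illustration, not the authors' method: the 50 growth figures are read off the stem and leaf plot in Figure 8.8a, and the names `growth`, `boundaries` and `frequency_table` are assumptions introduced here. Each interval is treated as closed at its lower end and open at its upper end, so an observation of exactly 10% falls in the interval starting at 10.

```python
from bisect import bisect_right

# Growth figures (% p.a.) as read from the stem and leaf plot in Figure 8.8a.
growth = [
    15.6, 14.2, 14.4, 13.5, 13.1, 12.9, 12.2, 11.8, 11.3, 11.2,
    10.6, 10.3, 9.4, 9.9, 9.4, 9.2, 9.2, 8.4, 8.0, 8.8,
    8.3, 7.5, 7.7, 7.3, 7.3, 7.6, 7.4, 6.0, 6.4, 6.3,
    6.1, 6.9, 5.7, 5.3, 5.6, 5.8, 5.7, 5.2, 5.5, 4.6,
    4.2, 4.7, 4.3, 4.9, 3.1, 2.6, 2.0, 2.3, 1.9, 1.2,
]

# Lower boundary of every interval, followed by the final upper boundary.
boundaries = [0, 4, 6, 8, 10, 12, 14, 16]


def frequency_table(values, edges):
    """Count the observations in each interval edges[i] <= x < edges[i + 1]."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        if not edges[0] <= x < edges[-1]:
            raise ValueError(f"{x} lies outside the intervals")
        # bisect_right places a value equal to a boundary in the upper interval.
        counts[bisect_right(edges, x) - 1] += 1
    return counts


frequencies = frequency_table(growth, boundaries)
for low, high, count in zip(boundaries, boundaries[1:], frequencies):
    print(f"{low:>2} <= x < {high:<2}  {count}")
```

The counts printed should agree with column (2) of Table 8.3.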

### 8.4.2 Cumulative Frequency Tables

Cumulative frequency tables are suitable for discrete and continuous data and show the number of observations below the end (or less commonly above the beginning) of the current interval. This is demonstrated in column (3) of Table 8.3. Column (3) allows us to see, for example, that 29 funds (the majority) reported levels of growth below 8%.
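A cumulative frequency column is simply a running total of the frequencies. The short sketch below uses the frequencies from column (2) of Table 8.3; the variable names are illustrative only.

```python
from itertools import accumulate

frequencies = [6, 12, 11, 9, 5, 4, 3]    # column (2) of Table 8.3
upper_ends = [4, 6, 8, 10, 12, 14, 16]   # upper end of each interval (%)

# Running total of the frequencies gives column (3).
cumulative = list(accumulate(frequencies))

for end, total in zip(upper_ends, cumulative):
    print(f"growth below {end:>2}%: {total} funds")

# Number of funds reporting growth below 8%.
below_8 = cumulative[upper_ends.index(8)]
```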

### 8.4.3 Bar Charts

A bar chart is an effective way of displaying discrete or categorical data. Suppose that you have records showing the number of months in the past two years that each member of a sample of 100 UK equity higher income funds outperformed the FTSE 350 High Yield index. These data are displayed in the frequency table, Table 8.4. The bar chart, Figure 8.1, is drawn so that the height of each bar represents frequency. Notice that the chart has a title and that each axis is labelled. To signify that the data are not continuous, the bars do not touch. A bar chart can be used with grouped data, but the groups (and bars) should be of equal width. Usually the label for each bar is placed at the centre of its base.

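Table 8.4 Number of months each member of a sample of 100 UK equity higher income funds outperformed the FTSE 350 High Yield index over the past two years

| Months | Frequency | Months | Frequency | Months | Frequency |
|---|---|---|---|---|---|
| 0 | 0 | 9 | 14 | 18 | 0 |
| 1 | 0 | 10 | 18 | 19 | 1 |
| 2 | 0 | 11 | 15 | 20 | 0 |
| 3 | 1 | 12 | 10 | 21 | 0 |
| 4 | 0 | 13 | 8 | 22 | 0 |
| 5 | 2 | 14 | 3 | 23 | 0 |
| 6 | 4 | 15 | 3 | 24 | 0 |
| 7 | 8 | 16 | 1 | Total | 100 |
| 8 | 12 | 17 | 0 | | |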

Figure 8.1 Funds outperforming the FTSE 350 High Yield index (horizontal axis: months)
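For readers without a charting package, a rough text version of the bar chart can be produced in a few lines of Python. This is only a sketch: the function name `text_bar_chart` is an assumption introduced here, and the bars run horizontally, so bar length rather than height represents frequency. The frequencies are those of Table 8.4.

```python
# Frequencies from Table 8.4, indexed by number of months (0 to 24).
frequency = [0, 0, 0, 1, 0, 2, 4, 8, 12, 14, 18, 15, 10,
             8, 3, 3, 1, 0, 0, 1, 0, 0, 0, 0, 0]


def text_bar_chart(freqs, title):
    """Return a titled text bar chart with one separate bar per discrete value."""
    lines = [title, "months | frequency"]
    for months, count in enumerate(freqs):
        # One '#' per fund; each value gets its own line, so bars never touch.
        lines.append(f"{months:>6} | {'#' * count} {count}")
    return "\n".join(lines)


chart = text_bar_chart(frequency, "Funds outperforming the FTSE 350 High Yield index")
print(chart)
```

Keeping the title and the axis heading in the output mirrors the advice above that every chart needs a title and labelled axes.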

The 100 funds could be divided by size and this information incorporated into a component bar chart, Figure 8.2. In Figure 8.2 it can be seen that smaller funds exhibit greater variability in performance. However, if there are too many components, the chart becomes difficult to read and a different form of presentation should be used.

Figure 8.2 Funds outperforming the FTSE 350 High Yield index (components: large, medium and small funds; horizontal axis: months)

One year's performance can be compared with another's using a multiple bar chart, Figure 8.3.

Figure 8.3 Comparison of fund performance, 2001, 2002

Excel Application 8.1

Bar charts can be constructed easily using the chart wizard within the Insert pull-down menu.

Figure 8.1

• Enter the frequencies from Table 8.4 that you wish to graph into the Excel spreadsheet.

• Select chart type "column" and sub-type "clustered column".

• Identify whether frequencies are listed in a column or a row, then identify the range of cells.

• The chart wizard expects two series of frequencies, but this figure only requires one, so the range given included an adjacent blank row.

• Work through the chart options page of the wizard adding or removing labels, gridlines, etc. until the desired appearance is achieved. This can always be edited later by using the Chart pull-down menu and selecting Chart Options.

• Finally, decide whether to save your chart as an object within your spreadsheet or as a separate chart attached to the sheet.

• Subsequent editing may be undertaken via the Chart pull-down menu or by double clicking on a particular feature within the chart.

Figure 8.2

• Select chart sub-type Stacked Column from the chart wizard.

• When defining the data ranges for your three series it is useful to include the column heading cells from your Excel spreadsheet. The wizard then automatically selects these as series labels within the legend or key.

Figure 8.3

• As for Figure 8.1, select chart sub-type clustered column.

• As for Figure 8.2, include heading cells in data range to provide series labels.

### 8.4.4 Histograms

Sometimes a histogram may appear to be the same as a bar chart, but there are important differences. In a histogram, frequency is represented by area rather than by height. For this reason it is not necessary to have intervals of equal width. Also, histograms nearly always depict continuous data, so the rectangles are drawn touching each other to signify the continuous horizontal scale. Figure 8.4 is a histogram of the data in Table 8.3.

Notice that the vertical scale of Figure 8.4 is frequency density; this is because in the histogram, area equals frequency. For example, the second rectangle has height 6 and a base of length 2, so its area is 6 × 2 = 12, the frequency of the second interval (since area of a rectangle = base × height).

$$\text{Frequency density} = \frac{\text{frequency}}{\text{class width}}$$

Calculating frequency density produces fractions which may not be convenient to work with. An alternative which leads to fewer fractions is to decide on a standard class width and call this one unit. Adjusted frequencies are then calculated by dividing by the number of standard units in each interval. This is shown in columns (5) and (6) of Table 8.3, using 2 as the standard class width.
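Both calculations can be checked with a short Python sketch. The names used here (`intervals`, `STANDARD_WIDTH` and so on) are illustrative assumptions; the figures are those of Table 8.3, with 2 taken as the standard class width.

```python
# Intervals (lower end, upper end) and frequencies from Table 8.3.
intervals = [(0, 4), (4, 6), (6, 8), (8, 10), (10, 12), (12, 14), (14, 16)]
frequencies = [6, 12, 11, 9, 5, 4, 3]
STANDARD_WIDTH = 2  # one "unit", as in column (5)

# Column (4): frequency density = frequency / class width.
density = [f / (high - low) for (low, high), f in zip(intervals, frequencies)]

# Column (5): number of standard units in each interval.
units = [(high - low) / STANDARD_WIDTH for low, high in intervals]

# Column (6): adjusted frequency = frequency / number of standard units.
adjusted = [f / u for f, u in zip(frequencies, units)]

# In a histogram, area (height x base) must recover the frequency.
areas = [d * (high - low) for d, (low, high) in zip(density, intervals)]

for (low, high), d, u, a in zip(intervals, density, units, adjusted):
    print(f"{low:>2} to {high:<2}  density {d:4.1f}  units {u:3.0f}  adjusted {a:4.0f}")
```

Only the first interval is affected by the adjustment, because it is the only one that is two standard units wide.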

Figure 8.4 Histogram of growth of 50 small pension funds

The importance of using a relative frequency measure, rather than actual frequencies, is shown by Figure 8.5, which is unadjusted. The first, wide rectangle gives a visual impression of more low performing funds than were actually observed.

Figure 8.5 Misleading chart for growth of 50 small pension funds

Histograms representing different total frequencies are difficult to compare because the graphs have different areas. The solution is to scale the histograms so that each has a total area of one unit. This involves calculating relative frequencies and also making any necessary adjustments for unequal class widths (column (7), Table 8.3).

The dotted lines on the adjusted relative frequency histogram, Figure 8.6, join the midpoints of the tops of the rectangles. They meet the axis half a standard class width from the end of the last rectangle on each side. The shape thus formed is called a frequency polygon. Imagine collecting more and more data and therefore splitting it into narrower and narrower intervals; eventually the lines joining the points in the frequency polygon would become very short and would begin to look like a single curve (Figure 8.7). Such a frequency curve can be thought of as the shape of the distribution which would result if it were possible to observe all members of a large population.

Figure 8.6 Adjusted relative frequency histogram with frequency polygon (class widths: 2 units for the first interval, 1 unit for each of the others)

Excel Application 8.2

Figures 8.5-8.8

The Excel software program produces bar charts rather than histograms. Although it is possible to build custom charts, it is very difficult to produce the appearance and features of a histogram. If you are going to use histograms frequently it may be worth investing time in creating a suitable template. For occasional use, a fine pencil and graph paper will produce good results more quickly.

### 8.4.5 Stem and Leaf Plots

A stem and leaf plot illustrates the distribution of data in a similar manner to a histogram. However, it offers the advantage of retaining the value of each observation. Figure 8.8a is a stem and leaf plot of the data from Table 8.2. It was constructed by dividing the value of each observation into two parts, the stem (in this case the whole number in the value) and the leaf (the remainder of the value). Each observation was recorded by placing its leaf to the right of the appropriate stem. In Figure 8.8b the leaves have been reordered within each stem branch to appear in order of magnitude. This is an ordered stem and leaf plot.
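The construction described above can be sketched in Python. This is a minimal illustration assuming values recorded to one decimal place; the function name `stem_and_leaf` and the small `sample` (a handful of the growth figures, in the order they appear in Figure 8.8a) are introduced here for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values, ordered=True):
    """Return {stem: [leaves]} for values recorded to one decimal place."""
    plot = defaultdict(list)
    for value in values:
        # Work in tenths to avoid floating point surprises: 11.8 -> stem 11, leaf 8.
        stem, leaf = divmod(round(value * 10), 10)
        plot[stem].append(leaf)
    if ordered:
        for leaves in plot.values():
            leaves.sort()
    return dict(plot)


# A few of the growth figures (% p.a.) from Figure 8.8a.
sample = [9.4, 9.9, 9.4, 9.2, 9.2, 8.4, 8.0, 8.8, 8.3, 11.8, 10.3]

for stem, leaves in sorted(stem_and_leaf(sample).items(), reverse=True):
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Calling `stem_and_leaf(sample, ordered=False)` keeps the leaves in the order the observations were recorded, which corresponds to the unordered plot.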

In instances when several stems have a large number of leaves, more may be revealed about the distribution of the data by dividing each stem.

Figure 8.7 Frequency polygon becomes a frequency curve

| Stem | Leaves (a) | Ordered leaves (b) |
|---|---|---|
| 15 | 6 | 6 |
| 14 | 2 4 | 2 4 |
| 13 | 5 1 | 1 5 |
| 12 | 9 2 | 2 9 |
| 11 | 8 3 2 | 2 3 8 |
| 10 | 6 3 | 3 6 |
| 9 | 4 9 4 2 2 | 2 2 4 4 9 |
| 8 | 4 0 8 3 | 0 3 4 8 |
| 7 | 5 7 3 3 6 4 | 3 3 4 5 6 7 |
| 6 | 0 4 3 1 9 | 0 1 3 4 9 |
| 5 | 7 3 6 8 7 2 5 | 2 3 5 6 7 7 8 |
| 4 | 6 2 7 3 9 | 2 3 6 7 9 |
| 3 | 1 | 1 |
| 2 | 6 0 3 | 0 3 6 |
| 1 | 9 2 | 2 9 |

Figure 8.8a Stem and leaf plot; Figure 8.8b Ordered stem and leaf plot

Annual growth (%) of 50 small pension funds

Key: 15 | 6 represents 15.6%

Stem and leaf plots offer some scope for comparing two data sets. Figure 8.9 demonstrates this.

| Leaves, large funds (a) | Stem | Leaves, small funds (b) |
|---:|:---:|:---|
| | 15 | 6 |
| 2 | 14 | 2 4 |
| 4 2 | 13 | 1 5 |
| 7 6 6 3 | 12 | 2 9 |
| 9 9 4 3 1 0 | 11 | 2 3 8 |
| 7 7 6 5 3 1 | 10 | 3 6 |
| 7 6 3 2 2 1 | 9 | 2 2 4 4 9 |
| 9 9 6 3 2 0 | 8 | 0 3 4 8 |
| 6 5 4 3 3 | 7 | 3 3 4 5 6 7 |
| 8 4 3 0 | 6 | 0 1 3 4 9 |
| 6 4 2 | 5 | 2 3 5 6 7 7 8 |
| 3 3 | 4 | 2 3 6 7 9 |
| 5 2 | 3 | 1 |
| | 2 | 0 3 6 |
| | 1 | 2 9 |

Figure 8.9 Back-to-back ordered stem and leaf plots: annual growth (%) of (a) large and (b) small pension funds

Key: 2 | 14 | 2 represents 14.2% for a large fund (left) and 14.2% for a small fund (right)

### 8.4.6 Pie Charts

Pie charts are most suitable for categorical data and highlight how the total frequency is split between the categories. The pie (a circle) is cut into slices (sectors) so that the area of a slice represents the frequency in that class. The angle at the centre of the sector is the same fraction of 360° (a full circle) as the class frequency is of the total frequency. The data in Table 8.5 have been displayed in Figure 8.10. Notice that the chart has a key to identify the sectors. An alternative is to place the labels in each sector, provided that the sectors are large enough to permit this. Pie charts become difficult to read if there are too many sectors or if several sectors are very small. When drawing one chart, the size of the circle is irrelevant.

| Asset class | Percentage | Angle |
|---|---|---|
| UK equities | 52.8 | 0.528 × 360° = 190° |
| UK fixed interest securities | 7.3 | 0.073 × 360° = 26° |
| Index-linked bonds | 5.5 | 0.055 × 360° = 20° |
| Overseas equities | 19.6 | 0.196 × 360° = 71° |
| Overseas bonds | 3.3 | 0.033 × 360° = 12° |
| Property | 4.7 | 0.047 × 360° = 17° |
| Cash and other | 6.6 | 0.066 × 360° = 24° |
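The angle calculation is easy to reproduce. The sketch below uses the percentages from the table above; the function name `sector_angles` is an assumption introduced for illustration.

```python
# Asset allocation percentages for UK pension funds, as tabulated above.
allocation = {
    "UK equities": 52.8,
    "UK fixed interest securities": 7.3,
    "Index-linked bonds": 5.5,
    "Overseas equities": 19.6,
    "Overseas bonds": 3.3,
    "Property": 4.7,
    "Cash and other": 6.6,
}


def sector_angles(percentages):
    """Map each category to its sector angle in degrees (its share of 360)."""
    return {name: pct / 100 * 360 for name, pct in percentages.items()}


angles = sector_angles(allocation)
for name, angle in angles.items():
    print(f"{name:<30} {round(angle):>3} degrees")
```

Because the percentages as printed sum to 99.8 rather than 100, the unrounded angles total slightly less than 360°.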

Sometimes an attempt is made to compare different-sized frequency distributions in drawing appropriately sized pie charts. Such diagrams are very difficult to read meaningfully and so a relative frequency or component bar chart is preferable. However, if pie charts are to be compared, it is essential to remember that frequency is represented by the area, not by the radius of the circle.
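Since frequency is represented by area, and the area of a circle is proportional to the square of its radius, the radius of a second pie must grow with the square root of the total it represents. A small sketch follows; the function name and the totals used in the example are hypothetical.

```python
from math import sqrt


def comparable_radius(base_radius, base_total, other_total):
    """Radius of a second pie whose area is in proportion to its total frequency."""
    if base_total <= 0 or other_total < 0:
        raise ValueError("totals must be positive")
    # Area scales with radius squared, so radius scales with the square root.
    return base_radius * sqrt(other_total / base_total)


# Hypothetical example: a second distribution four times the size of the first
# needs twice the radius, not four times.
print(comparable_radius(1.0, 100, 400))
```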

If a computer package is being used to draw a pie chart, it is usual to have the option of producing the diagram in three-dimensional perspective. Care should be taken with this since the result can be very misleading. This is demonstrated in Figure 8.11, which displays the same data as Figure 8.10. The perspective makes the proportion allocated to overseas bonds (3.3%) appear to be substantially less than half the proportion allocated to "cash and other" (6.6%). Figure 8.12 shows the computer package.

Figure 8.10 Asset allocation for UK pension funds

Figure 8.11 Misleading 3-D pie chart

Excel Application 8.3

Figure 8.10

• As in Excel Application 8.1, use the chart wizard from the Insert pull-down menu.

• Select "Pie" for both the chart type and sub-type.

• The simplest way to enter the data range is to use the Collapse Dialogue Button at the right of the data range box, then select cells in the worksheet. If you are unsure about this (or any other part of a wizard or menu), right-click on the part in question and a help box will appear. Figure 8.12 shows the screen when the help box for the Collapse Dialogue Button has been opened.

• By including cells A4:A10 (see Figure 8.12) in those selected for the data range, the labels seen in the key (legend) to the right of the pie chart are inserted automatically. This reduces the need for subsequent editing.

Figure 8.12 Creating a pie chart

### 8.4.7 Time Series Graphs

Time series graphs illustrate how a variable changes over time. The order of the observations is of paramount importance and must not be destroyed in the handling of the data. Several time series may be displayed together and Figure 8.13 shows an example of this. Notice that both axes are labelled and that the graph has a key. It is important to be very wary of extrapolating from time series data. For example, the time series in Figure 8.13 has been truncated at 1999. Can you really predict the next two years' figures? Your extrapolation can be checked against the figures quoted (see p. 152).

Figure 8.13 Multiple time series graph

Excel Application 8.4

Figure 8.13

• Select Chart ... from the Insert pull-down menu.

• Select chart type "Line" and chart sub-type "Line with markers" displayed at each data value.

• After inserting the Data Range on page 2 of the wizard, don't forget to click on the Series tab to insert series labels. Otherwise these will need to be edited later using the Chart pull-down menu and selecting Source Data ... (Figure 8.14). This dialogue box can also be used to add or delete series.

• The labels on the x-axis were rotated from the default position by double clicking on this part of the saved chart, then using the Alignment page of the resultant menu. Similarly, default background shading and gridlines, visible in Figure 8.14, were removed to give the final appearance of Figure 8.13.

Figure 8.14

### 8.4.8 Cumulative Frequency Graphs

To construct the cumulative frequency curve in Figure 8.15, the cumulative frequencies in column (3) of Table 8.3 were plotted against the appropriate interval end point and joined with a smooth curve. The cumulative frequency curve is also called an ogive.

Figure 8.15 Cumulative frequency curve for data in Table 8.3

Estimates of intermediate values can be made by drawing a perpendicular line from the value in which you are interested on one axis, until it meets the curve. A second line is drawn perpendicular to the first at this point and the level at which it crosses the second axis is read off. For example, the lines drawn on Figure 8.15 allow us to estimate that half the funds experienced an annual growth of less than 7.3%, or, reading in the opposite direction, five funds had growth exceeding 13%.
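The two readings quoted above can be approximated numerically. The sketch below is an assumption-laden stand-in for reading the graph: it joins the plotted points with straight lines rather than a smooth curve, adds a starting point of zero funds at 0% growth, and uses a hypothetical helper named `interpolate`.

```python
from bisect import bisect_left

# Interval end points (%) and cumulative frequencies from column (3) of
# Table 8.3, with the starting point (0, 0) added.
end_points = [0, 4, 6, 8, 10, 12, 14, 16]
cumulative = [0, 6, 18, 29, 38, 43, 47, 50]


def interpolate(x, xs, ys):
    """Linearly interpolate y at x along the points (xs[i], ys[i]); xs increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the plotted range")
    i = max(1, bisect_left(xs, x))  # index of the point at or just above x
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Read across from a cumulative frequency of 25 (half of the 50 funds).
median_growth = interpolate(25, cumulative, end_points)

# Read up from 13% growth, then subtract from the total to count funds above it.
funds_above_13 = 50 - interpolate(13, end_points, cumulative)

print(f"half the funds grew by less than about {median_growth:.1f}%")
print(f"about {funds_above_13:.0f} funds grew by more than 13%")
```

Straight-line interpolation will not match a hand-drawn smooth curve exactly, but here it gives results very close to the graphical estimates.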

### 8.4.9 Scatter Diagrams

Scatter diagrams help us to look for a relationship between two variables. For example, Figure 8.16 shows two scatter diagrams for the 20-year period 1980-99. Each point on the diagram represents one year's returns.

The scatter diagrams give a general impression that perhaps high values of the variables plotted on the x-axis are associated with high values of the variables plotted on the y-axis at least for this time period. There is obviously quite a lot of variation in the data, particularly in Figure 8.16b, and there are a few obvious exceptions to the suggested relationship. We will return to scatter diagrams in Chapter 11.

Figure 8.16 Scatter diagrams (axes include Retail Price Index change and equity return)

Excel Application 8.5

Figure 8.16

• Select chart type XY (Scatter) from chart wizard and chart sub-type "Scatter".

• You have the option to show more than one series on each scatter diagram, so Figure 8.16b could have been superimposed on Figure 8.16a. This makes sense in certain circumstances, but often just makes the graph difficult to read and interpret.

### 8.4.10 The Misrepresentation of Data

Earlier sections considered several ways in which data can be displayed to help us to read and interpret them. The aim was always to provide a simple summary which clearly showed the main features of the data. However, it is easy to mislead the reader (accidentally or intentionally) with poorly constructed diagrams. Misleading histograms and pie charts were noted in Sections 8.4.4 and 8.4.6. The dangers of extrapolation from time series were highlighted in Section 8.4.7. Further common misrepresentations are shown below to emphasise that one must always look very carefully before making inferences from a diagram.

In Figure 8.17a and b the same points have been plotted, but different scales have been used and in Figure 8.17a the vertical scale does not start from zero. The overall impression is quite different, as indicated in the figure captions.

Figure 8.17 Sales of personal pension policies, 1996-2000: (a) "Massive increase in sales"; (b) "A slight increase in sales"

The same scales have been used in Figure 8.18, but the use of the false zero in Figure 8.18b makes sales of policy D look more than twice as high as sales of policy A and sales of policy E almost insignificant. Despite the misleading visual impression of the diagram on the right, all the information we need to interpret it is actually given. However, had the vertical scale been missing, the diagram would have become meaningless.

Figure 8.18 Sales of five different policies
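The effect of a false zero can be quantified by comparing drawn bar heights with the true values. The sketch below uses hypothetical sales figures, not the values behind Figure 8.18, and the function name `apparent_ratio` is introduced only for illustration.

```python
def apparent_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn height of bar b to bar a when the axis starts at axis_start."""
    if axis_start >= min(value_a, value_b):
        raise ValueError("the axis must start below both values")
    return (value_b - axis_start) / (value_a - axis_start)


# Hypothetical sales figures for two policies (not taken from Figure 8.18).
sales_a, sales_d = 60.0, 75.0

true_ratio = apparent_ratio(sales_a, sales_d)                       # axis from zero
distorted_ratio = apparent_ratio(sales_a, sales_d, axis_start=50.0)  # false zero at 50

print(f"true ratio {true_ratio}, apparent ratio with false zero {distorted_ratio}")
```

With these assumed figures, a true difference of 25% is drawn as a bar two and a half times as tall.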

Figure 8.19 "Improved production techniques double the output of olive oil in a decade." (bottles shown for 1985, 1990 and 1995)

Another type of distortion can occur in the use of three-dimensional diagrams. In Figure 8.19 the third bottle is indeed twice as high as the first, but most people will, at a glance, tend to interpret the bottle sizes in terms of the amount of liquid held. The third bottle has 8 times the capacity of the first, greatly exaggerating the increase in output.
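As a worked check of the factor of 8, assume that the third bottle is the first bottle scaled by the same factor $k$ in height, width and depth. If the first bottle has volume $V_1$, then

$$V_3 = k^3 V_1, \qquad k = 2 \;\Rightarrow\; V_3 = 2^3 V_1 = 8 V_1.$$

Under the same assumption, a picture that honestly showed a doubling of output would scale the volume by 2, that is, each linear dimension by only $\sqrt[3]{2} \approx 1.26$.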

The examples above illustrate that it is important to make sure that when diagrams are drawn they are meaningful and contain all the information that is necessary for interpretation.