Alessio Trivella | past projects


MSc projects

During my MSc studies, I enjoyed developing and applying mathematical techniques and tools to solve real-life problems, spanning the boundaries of different disciplines. The following are examples of projects that I carried out as part of courses, either at the or at the .

Optimal asset allocation using exchange traded funds

Energy system investment and operation

Short-term electricity prices forecasting using learning models

The 3D bin-packing problem with optimal load balancing

A heuristic algorithm for multiple global alignment of sequences

The Hough transform for detecting straight lines

The synchronization phenomenon in neural networks

Heat conduction in heterogeneous materials

A special case of the Weber problem

Nonconforming finite element method for elliptic problems

Consulting

From 2012 to 2014 I worked as a business analyst and project manager at . My main customer was the , where I contributed to the analysis and management of around a dozen projects related to IT developments in the areas of transactional products (SWIFT and SEPA payments, credit cards, electronic statements) and internet banking applications.

Program "Camt XML Reporting"

Feasibility Study "Online Tool for Financial Institutions"

[Not available] Feasibility Study "UniCredit E-Banking Rationalization"

[Not available] Home Banking tool for Anti-Money Laundering

[Not available] E.Banking enhancement with SEPA functions

Optimal asset allocation using exchange traded funds
Course: Optimization in Finance

The goal of this project is to determine an optimal allocation strategy for Exchange Traded Funds (ETFs), a relatively new type of fund that replicates the performance of an index, allowing significant diversification at low operating cost.

Several thousand ETFs are currently traded worldwide, and determining a good investment portfolio composed of only a few of them can be challenging. Therefore, we restrict the universe of financial instruments by applying a sequence of techniques of increasing complexity:

First, simple database filtering and "due diligence" analysis (e.g., the ETF must be liquid, long only, and with sufficient historical data available) are used to reduce the set to about 100-200 assets.
Second, we use hierarchical clustering to find a set of weakly correlated, high-performing ETFs, on the order of 10-15. For clustering, Spearman's correlation between the log-returns of two ETFs is used as the distance measure, with complete linkage for the cluster-to-cluster distance. For each cluster, the ETF with the best Sharpe ratio over 3 years is chosen.
With a limited number of ETFs, more advanced scenario-based optimization methods such as the CVaR and Downside Regret are implemented to determine the optimal portfolio mix. Scenarios are generated using the bootstrap method in such a way that the statistical properties of the historical data are preserved with good approximation by the scenarios, that is, they can capture well the evolution of the asset dynamics. Portfolio updates and transaction costs have been considered too, and modeled by means of mixed-integer programming.
Finally, the investment strategies deriving from optimization are backtested against the simple 1/N strategy over a horizon which is several years long and includes a few major and minor economic crises (e.g. 2008). Any valid optimization-based strategy should, indeed, be able to beat the 1/N strategy and perform well enough on a sufficiently long test period.
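As a small illustration of the risk measure used in the third step, the following Python sketch computes an empirical CVaR as the average of the worst scenario losses. It is only a simplified tail-average estimator under assumed names (`cvar`, `portfolio_losses`) and toy numbers; the project itself used MATLAB and GAMS, where CVaR is embedded in an optimization model rather than evaluated for a fixed portfolio.

```python
import math


def portfolio_losses(weights, scenario_returns):
    """Loss (negative return) of a fixed-weight portfolio in each scenario."""
    return [-sum(w * r for w, r in zip(weights, row)) for row in scenario_returns]


def cvar(losses, alpha=0.95):
    """Average of the worst (1 - alpha) share of the scenario losses."""
    if not losses:
        raise ValueError("need at least one scenario")
    ordered = sorted(losses, reverse=True)  # worst losses first
    # Number of scenarios in the tail; rounding guards against float noise.
    k = max(1, math.ceil(round((1 - alpha) * len(ordered), 9)))
    return sum(ordered[:k]) / k


# Toy data (illustrative only): ten scenario losses, 80% confidence level.
toy_losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tail_risk = cvar(toy_losses, alpha=0.8)  # mean of the two worst losses: 9.5
```

In a scenario-based model, the rows of `scenario_returns` would be the bootstrapped scenarios, and the portfolio weights would be decision variables instead of fixed inputs.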

The algorithms were implemented in MATLAB and GAMS, and the backtest showed that our asset selection combined with an optimization strategy that embeds CVaR with a low risk aversion performed best. Apart from a drop during the 2008 crisis, the portfolio value kept increasing, ending up at more than twice the initial budget after 8 years.

Performance under different strategies

Energy system investment and operation
Course: Modelling and analysis of Sustainable Energy Systems using Operations Research

In this work, we develop a model to find the optimal investment and dispatch for a power system composed of wind farms, conventional power plants, a pumped hydro storage, and transmission lines connecting a set of neighbouring regions. The goal is to determine the optimal mix of technologies and their dispatch under different economic conditions, using mathematical programming and the modelling software GAMS.

As a case study, we consider the power system of Denmark. Data on electricity consumption in Denmark, on the investment and operating costs of the different technologies, and on their efficiency, fuel utilization, and emissions have been collected from several sources. Disregarding unit commitment considerations, the problem can be formulated as a linear program where the objective is to minimize the total investment and operational cost of the energy system. We model one year with hourly time resolution.
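A rough sketch of such a linear program is given below, for a simplified single-region system without storage or transmission. All symbols are illustrative assumptions rather than the notation of the project: K_g is the installed capacity of technology g, p_{g,t} its production in hour t, d_t the demand, a_{g,t} an availability factor between 0 and 1 (e.g., the wind profile), and c^inv_g, c^op_g the investment and operating cost coefficients.

```latex
\begin{aligned}
\min_{K,\,p}\quad & \sum_{g} c^{\mathrm{inv}}_{g} K_{g}
                    + \sum_{t=1}^{T} \sum_{g} c^{\mathrm{op}}_{g}\, p_{g,t} \\
\text{s.t.}\quad  & \sum_{g} p_{g,t} = d_{t} && \forall t, \\
                  & 0 \le p_{g,t} \le a_{g,t}\, K_{g} && \forall g,\ t, \\
                  & K_{g} \ge 0 && \forall g.
\end{aligned}
```

The full model would add storage balance equations and transmission flows between regions, which keep the problem linear.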

The program is run under different economic conditions including low/high production cost for the conventional power units as a consequence of fuel and CO₂ emission costs, and low/high state incentives for wind energy. We noticed that generally with no or low wind support, the system mainly relies on conventional power units and only a small part of the capacity is based on wind power. As the wind support increases, so does the installed capacity of wind power. Storage remains quite low in all scenarios.

Technology mix under different scenarios

Alternative investment approaches based on scenario optimization have also been formulated to make the model more robust against adverse scenarios. Given the increasing model size resulting from the implementation of scenarios, a time aggregation technique was necessary to keep the model tractable.

Short-term electricity prices forecasting using learning models
Course: Computational Data Analysis

In this work, we investigate the use of different machine learning methods to forecast short-term electricity prices. The goal is to forecast the "next-week" prices in DK-East (Zealand) using historical hourly data of the region. The task is challenging because these time series are non-stationary and therefore extremely hard to predict. As a consequence, traditional statistical modeling techniques appear too simple to capture electricity prices, especially when the system changes rapidly. In recent years, methods from machine learning have been widely applied to financial time series forecasting as they have many advantages compared to statistical models. We compare several methods, and eventually focus on Random Forest and Support Vector Machines.

Random Forest is an ensemble learning method which constructs a large collection of decision trees and returns the mean prediction of the individual trees. The main ideas behind Random Forest are the bagging framework and the random selection of features to grow decorrelated trees.
Support Vector Machines can be used for regression in a variant called Support Vector Regression (SVR). Given a set of training data (x_i,y_i), the goal of SVR is to find a function f(x)=<w,x>+b that is as close as possible to the targets y_i for all training data, and is as flat as possible at the same time.
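The trade-off between closeness to the targets and flatness is usually made precise with the standard linear ε-insensitive formulation sketched below. The tolerance ε, the penalty C, and the slack variables ξ_i, ξ_i^* are not mentioned in the text above and are introduced here as the usual textbook notation; flatness corresponds to a small norm of w.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C \sum_{i} \left(\xi_i + \xi_i^*\right) \\
\text{s.t.}\quad & y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i, \\
                 & \langle w, x_i\rangle + b - y_i \le \varepsilon + \xi_i^*, \\
                 & \xi_i,\ \xi_i^* \ge 0 \qquad \forall i.
\end{aligned}
```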

Comparing the RMSE, we can say that the models, if properly trained, perform similarly. Moreover, the methods capture different patterns hidden in the financial time series, so ensembles of regressors and hybrid models can be used to improve the overall prediction.
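The following Python sketch shows how the RMSE comparison and a very simple ensemble (a plain average of two forecasts) could be computed. The function names and the numbers are hypothetical and purely illustrative; they are not results from the project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def average_forecasts(*forecasts):
    """Simplest possible ensemble: the element-wise mean of the forecasts."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


# Synthetic toy prices: two forecasts with errors of opposite sign.
actual = [1.0, 2.0, 3.0]
forecast_a = [2.0, 2.0, 2.0]
forecast_b = [0.0, 2.0, 4.0]
combined = average_forecasts(forecast_a, forecast_b)

error_a = rmse(actual, forecast_a)
error_b = rmse(actual, forecast_b)
error_combined = rmse(actual, combined)  # errors cancel in this toy case
```

In this constructed example the two forecasts err in opposite directions, so their average is exact; real forecasts would only partially cancel.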

Random Forest

Support Vector Regression

In conclusion, machine learning methods such as Random Forest and Support Vector Machines can be successfully used for financial time series forecasting and can help investors, traders, and borrowers make better investment decisions. However, given the extremely complex behaviour of such series, the methods must be applied carefully in terms of variable selection, data matrix construction, and parameter tuning.

The 3D bin-packing problem with optimal load balancing
Course: MSc thesis in Operations Research

The work is divided into two parts. First, we consider the classical 3D Bin-Packing Problem (BPP) where a set of rectangular-shaped 3D boxes must be placed into the minimal number of identical bins. Second, we move to the 3D BPP with optimal load balancing, that is, boxes must be arranged into the minimal number of bins in such a way that the centers of mass of the loaded bins fall as close as possible to an ideal location.

The literature on the 3D Bin-Packing Problem is vast, so we have tried both to improve existing methods and to develop new ones. In contrast, the literature on load balancing of containers is scarce, so we build new mathematical models to describe the problem and heuristic methods to solve large instances. Some of the contributions are:

The constructive extreme point-based heuristics for the 3D BPP have been improved by iteratively generating greedy randomized initial orderings of the boxes to pack.
A column generation algorithm for the 3D BPP has been outlined. To solve the difficult pricing, a 3D Knapsack-Packing Problem, we still make use of randomized extreme-points heuristics adapted to the knapsack problem.
Two mixed-integer linear programs have been devised. The first model finds the most balanced arrangement of boxes into a single bin. The second, more elaborate, is able to determine the 3D BPP solution which minimizes the sum of displacements from the ideal barycentre of the used containers: it integrates packing and balancing.
A multi-scale local search heuristic for balancing a single bin has been developed. The algorithm is based on the characterization of packings by means of a set of three interval graphs, that is, graphs obtained from the intersection of intervals (projections of boxes) on the real line. The figure shows an example of interval graphs from a 2D packing. The heuristic incorporates an inner local search able to move among the transitive orientations of a graph, and an outer local search which works on the structure of the three graphs.

Example of 2D interval graphs
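To make the interval-graph characterization concrete, here is a minimal Python sketch for the 2D case. It builds one interval graph per axis from the projections of the boxes and uses the fact that two boxes overlap exactly when their projections overlap on every axis. The box format `(x, y, width, height)` and the function names are assumptions made for this illustration; the local search itself is not reproduced.

```python
from itertools import combinations


def interval_graph(intervals):
    """Edges (i, j) between intervals whose interiors intersect."""
    edges = set()
    for (i, (a1, b1)), (j, (a2, b2)) in combinations(enumerate(intervals), 2):
        if a1 < b2 and a2 < b1:  # open intervals overlap
            edges.add((i, j))
    return edges


def packing_graphs(boxes):
    """Interval graphs of the x- and y-projections of 2D boxes."""
    graph_x = interval_graph([(x, x + w) for x, y, w, h in boxes])
    graph_y = interval_graph([(y, y + h) for x, y, w, h in boxes])
    return graph_x, graph_y


def is_overlap_free(boxes):
    """A packing is overlap-free if no pair is adjacent in both graphs."""
    graph_x, graph_y = packing_graphs(boxes)
    return not (graph_x & graph_y)


# Three boxes that touch but do not overlap.
boxes = [(0, 0, 2, 2), (2, 0, 2, 1), (0, 2, 3, 1)]
graph_x, graph_y = packing_graphs(boxes)
```

In 3D a third graph for the z-projections is added, and the same condition applies to the intersection of all three edge sets.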

A heuristic algorithm for multiple global alignment of sequences
Course: Bioinformatics

One of the most important problems in bioinformatics is the comparison of biological sequences (nucleic acids and proteins). Results can be used, for instance, to establish the evolution of a species or to predict the function of a gene/protein by significant similarity with a known gene/protein. The comparison is performed by aligning the sequences one below the other, possibly inserting gaps, in such a way that the correspondence between aligned letters is maximized.

The alignment can be local or global, and between two sequences (pairwise) or more than two (multiple). This project consisted in implementing an algorithm for multiple global alignment of sequences. The alignment of two sequences of length ∼n can be computed efficiently with dynamic programming in O(n²) time, but the problem of aligning k sequences is NP-hard and cannot in general be solved to optimality in reasonable time. Consequently, heuristic algorithms should be used.
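The pairwise dynamic program can be sketched as follows in Python (the project itself used MATLAB). This is the classical global alignment recursion with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment by dynamic programming, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two letters
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row_a, row_b = global_align("ACGT", "AGT")
```

For the two short sequences in the example, the best alignment inserts a single gap opposite the letter C.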

Our heuristic algorithm first finds the two sequences with the best alignment score and aligns them to optimality. When this is done, we have k-1 elements: k-2 sequences and 1 block of 2 aligned sequences. Again, we find the two elements with the best alignment score, which can be two sequences or one sequence and the block, and align them. The process is repeated until we find a single block containing all the sequences (k-1 steps). However, we only know how to align a pair of sequences (by dynamic programming). We thus developed a heuristic method to align two blocks of already aligned sequences. The algorithm was implemented using MATLAB, and some results are shown below.

Ex. 1: DNA

Ex. 2: DNA

Ex. 3: proteins

The Hough transform for detecting straight lines
Course: Image Analysis

The Hough transform is a technique used in image analysis to extract features of particular shapes inside an image. This project consists in implementing the Hough transform for detecting straight lines.

The Hough transform can be applied to binary images only, so the image is first turned into a binary image using an edge detection process. Assume the edge pixels to be white and the background pixels black. Each straight line can be written in the Hesse form ρ = x cos(θ) + y sin(θ), thus using two parameters (ρ,θ). Think of (ρ,θ) as a parameter space and discretize it into a finite grid: what we obtain is a matrix, all of whose elements are initialized to 0. Now, for each white pixel consider the bundle of straight lines passing through it and add +1 to the cells of the matrix containing the values (ρ,θ) corresponding to the lines of the bundle. The cells of the matrix whose values exceed a certain threshold are turned back into straight lines in the original image.
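The voting procedure can be sketched in a few lines of Python (the project used MATLAB). The discretization chosen here, θ in steps of one degree over [0°, 180°) and ρ rounded to the nearest integer, is an assumption for illustration, as are the function name and the use of a sparse counter instead of a dense matrix.

```python
import math
from collections import Counter


def hough_lines(edge_pixels, threshold, theta_step_deg=1):
    """Vote in (rho, theta) space and keep the cells reaching the threshold.

    edge_pixels: iterable of (x, y) coordinates of white (edge) pixels.
    Returns a dict {(rho, theta_in_degrees): votes}.
    """
    votes = Counter()
    for x, y in edge_pixels:
        # One vote per discretized line of the bundle through (x, y).
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, theta_deg)] += 1
    return {cell: count for cell, count in votes.items() if count >= threshold}


# Fifty edge pixels on the horizontal line y = 5.
pixels = [(x, 5) for x in range(50)]
detected = hough_lines(pixels, threshold=50)  # only the cell (rho=5, theta=90)
```

Each detected cell (ρ, θ) corresponds to the line ρ = x cos(θ) + y sin(θ) in the image; here θ = 90° and ρ = 5 give back y = 5.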

The main idea consists in exploiting a duality between the image I and a proper parameter space P, so that each edge point in I corresponds to a set of points (a curve) in P and, vice versa, each point in P corresponds to a set of points (a straight line) in I. The algorithm was implemented using MATLAB, and an example of the outcome is shown below.

Original image

Binary image

Dual parameter space

Straight lines identified

The synchronization phenomenon in neural networks
Course: Laboratory of Mathematical Modelling

The goal of this project is to numerically study a neuronal network, and in particular to identify the conditions (such as network topology and synaptic weights) which could lead to a visible phenomenon of synchronization among neurons.

After understanding the anatomy of a biological neuron and how it works, we studied and implemented the Hodgkin-Huxley model: a mathematical model (system of ordinary differential equations) able to simulate the reaction of a neuron subject to different action potentials. Then, we created different types of networks by varying the degree of randomness in the connections (with 64 neurons).
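For reference, the Hodgkin-Huxley system is commonly written in the form below. The notation is the usual textbook one and is assumed here: V is the membrane potential, C_m the membrane capacitance, I_ext the injected current, m, h, n the gating variables, and α_x, β_x voltage-dependent rate functions whose specific expressions are omitted.

```latex
\begin{aligned}
C_m \frac{dV}{dt} &= I_{\mathrm{ext}}
  - \bar g_{\mathrm{Na}}\, m^{3} h\, (V - E_{\mathrm{Na}})
  - \bar g_{\mathrm{K}}\, n^{4} (V - E_{\mathrm{K}})
  - \bar g_{L}\, (V - E_{L}), \\
\frac{dx}{dt} &= \alpha_x(V)\,(1 - x) - \beta_x(V)\, x,
  \qquad x \in \{m, h, n\}.
\end{aligned}
```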

Network topologies

NEURON, a specialized software for simulating neuronal networks, is used to implement and run the model for the different networks. We find that totally randomly generated networks yield better synchronization. In fact, the presence of long links (i.e. links between distant neurons) allows the potential of neurons to propagate faster through the whole network. Finally, we changed the synaptic weights (generated from normal distributions N(m,v) with different mean and variance) to determine the best conditions for synchronization.

Synaptic weights ∼ N(0,1)
No visible synchronization

Synaptic weights ∼ N(1,0.1)
Clear synchronization!

Heat conduction in heterogeneous materials
Course: ECMI Modelling Week

Materials with a high temperature resistance are indispensable in many engineering applications and such materials are generally far from being homogeneous due to their multi-phase, porous micro-structure. This project consisted in the numerical study using MATLAB of heat conduction through a fire retardant wall with an aluminium honeycomb micro-structure, and was carried out with a team of international students.

Micro scale mesh

Studying the problem solely at the macro-scale would give highly inaccurate results since the micro-structure properties are not taken into account. Conversely, the problem cannot be solved entirely at the micro-scale due to the problem size.

The most promising way to proceed is therefore to adopt a multi-scale analysis: the two different scales, micro and macro, interact with each other and information goes from micro to macro and vice versa using, for example, averaging schemes. At the macro-scale, we consider the temperature evolution through the wall, and the heat balance equation takes the general time-dependent form. At the micro-scale, a microstructural representative volume element (RVE) is defined where characteristic physical and geometrical properties are embedded, and a 2D Boundary Value Problem (BVP) is solved. Problems at both scales are solved using a linear Finite Element Method (FEM); the picture illustrates our RVE and the mesh used for the FEM, showing where the different materials (aluminium and air) are located. Finally, a solution of the whole problem is obtained from the interaction of the two scales.
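For reference, the general time-dependent heat balance at the macro-scale is commonly written as below. The notation is assumed here rather than taken from the project: ρ is the density, c_p the specific heat, T the temperature, Q a volumetric heat source, and k_eff an effective conductivity tensor that, in a multi-scale scheme of this kind, would be supplied by the micro-scale RVE computations.

```latex
\rho\, c_p\, \frac{\partial T}{\partial t}
  = \nabla \cdot \left( k_{\mathrm{eff}}\, \nabla T \right) + Q
```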

Solution of a micro-scale problem using Dirichlet BC
(in practice we need anti-periodic BC)

A special case of the Weber problem
Course: Facilities Location

The Weber Problem is a classical problem in the field of Facilities Location. Given a set of n facilities A_j=(x_j,y_j) with weights w_j, we need to locate a new facility so that the weighted sum of Euclidean distances to the existing facilities is minimized, i.e., we need to solve:

min over (x,y) of Σ_{j=1..n} w_j √((x − x_j)² + (y − y_j)²)

A special case is the Fermat Problem: given a triangle ABC, find the point T which minimizes the sum of the three distances AT+BT+CT. The problem is solved simply by drawing the Simpson lines and taking their intersection (if no angle is greater than 120°; otherwise the solution is one of the vertices).
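For positive weights, the Weber Problem can also be solved numerically. The Python sketch below uses the classical Weiszfeld iteration, a standard method that is brought in here only for illustration and is not the geometric approach of the project; the function names, the tolerance, and the simplified handling of an iterate landing on a facility are assumptions.

```python
import math


def weber_objective(point, facilities, weights):
    """Weighted sum of Euclidean distances from point to the facilities."""
    px, py = point
    return sum(w * math.hypot(px - ax, py - ay)
               for (ax, ay), w in zip(facilities, weights))


def weiszfeld(facilities, weights, start=None, tol=1e-10, max_iter=10_000):
    """Weiszfeld iteration for the Weber Problem with positive weights."""
    if start is None:
        total = sum(weights)  # default start: weighted centroid
        start = (sum(w * a[0] for a, w in zip(facilities, weights)) / total,
                 sum(w * a[1] for a, w in zip(facilities, weights)) / total)
    x, y = start
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for (ax, ay), w in zip(facilities, weights):
            d = math.hypot(x - ax, y - ay)
            if d < 1e-12:
                # Simplification: stop if the iterate hits a facility.
                return (ax, ay)
            num_x += w * ax / d
            num_y += w * ay / d
            denom += w / d
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < tol:
            return (new_x, new_y)
        x, y = new_x, new_y
    return (x, y)


# Fermat Problem on an equilateral triangle: the optimum is the centroid.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
fermat_point = weiszfeld(triangle, [1, 1, 1], start=(0.2, 0.1))
```

Because the iteration relies on all weights being positive, it does not apply to problems with a negative weight, which need a dedicated analysis.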

Define now another special case of the Weber Problem, the Complementary Problem (CP): given a triangle ABC, find the point T which minimizes the quantity AT+BT-CT. In other words, the task is to place a facility as close as possible to two existing facilities and as far as possible from a third one. The project consists in providing the optimal solution to the CP in the general case and a critical analysis of the scientific article where the CP is defined. To solve the CP, a system of coordinates is introduced and an elaborate geometrical construction is developed. Different cases concerning the features of the triangle ABC are considered separately, and several regions of the space are studied one after the other. Thanks to the properties of the Simpson lines, algebraic manipulations, limit processes etc., we were able to get the optimal solution in all cases. Depending on the case, the optimal solution can be one point, two points, or a set of infinitely many points.

Geometrical construction 1

Geometrical construction 2

Nonconforming finite element method for elliptic problems
Course: Numerical Methods for Partial Differential Equations

Consider the following elliptic problem in divergence form:

where the domain is two-dimensional. Let T be a triangular mesh and E the set of internal edges of T. Define the space of L² functions which are linear on each triangle of T and continuous at the midpoints of the edges in E. Such a space is an example of a nonconforming finite element approximation. The project consists in implementing in MATLAB an algorithm to solve the elliptic problem using nonconforming elements, and in determining the convergence error.
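As a point of reference, a typical model problem of this kind and the nonconforming space just described can be written as below. The specific data are assumptions made for illustration (a coefficient matrix A, a source term f, homogeneous Dirichlet boundary conditions, and the symbol V_h for the space); K denotes a triangle of T and [v] the jump of v across an edge e. For piecewise linear functions, continuity at the midpoint of an edge is equivalent to a zero mean jump across that edge.

```latex
\begin{aligned}
-\nabla \cdot \left( A\, \nabla u \right) &= f \quad \text{in } \Omega \subset \mathbb{R}^{2},
\qquad u = 0 \quad \text{on } \partial\Omega, \\
V_h &= \Big\{ v \in L^{2}(\Omega) \;:\; v|_{K} \in P_{1}(K)\ \ \forall K \in T,
\quad \int_{e} [v]\, ds = 0\ \ \forall e \in E \Big\}.
\end{aligned}
```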

We start by examining the space. The degrees of freedom are represented by blue points in the first figure on the left (red points are boundary conditions). The second figure shows a basis function corresponding to a degree of freedom: notice that it is discontinuous, implying the final solution will be discontinuous too. Then, the implementation of the classical linear elements is adapted to this nonconforming case. We computed the error and found the convergence to be quadratic for the L² error, and linear for the broken H¹ semi-norm error. An advantage with respect to the classical linear elements is that, by connecting to each other only at the midpoints of edges, nonconforming elements seem to be more flexible and less stiff, hence preferable in several mechanical applications. The disadvantage is that the number of degrees of freedom increases: the number of edges S of a mesh is greater than the number of nodes N, and asymptotically S/N=3 (about 1.5 edges per triangle and 2 triangles per node). Below are some examples of our numerical results.

u(x,y) = e^(-(x²+y²))

u(x,y) = (x-0.5)²-(y-0.5)²

u(x,y) = sin(πx) sin(πy)
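Convergence orders such as the quadratic L² rate and the linear broken H¹ rate mentioned above are typically estimated from the errors measured on a sequence of refined meshes. The Python sketch below shows the usual computation; the function name is hypothetical and the error values are synthetic, generated from an assumed law e = C h^p rather than taken from the project.

```python
import math


def observed_orders(mesh_sizes, errors):
    """Estimated convergence order between consecutive mesh refinements."""
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(mesh_sizes[i] / mesh_sizes[i + 1])
        for i in range(len(errors) - 1)
    ]


# Synthetic data: errors behaving exactly like C * h**p (illustrative only).
mesh_sizes = [0.1, 0.05, 0.025]
l2_errors = [0.3 * h ** 2 for h in mesh_sizes]   # quadratic decay
h1_errors = [1.2 * h for h in mesh_sizes]        # linear decay

l2_orders = observed_orders(mesh_sizes, l2_errors)
h1_orders = observed_orders(mesh_sizes, h1_errors)
```

With measured errors, the estimated orders approach the theoretical values only as the mesh size becomes small.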

Program "Camt XML Reporting"
Business Integration Partners

The goal of this program was the development of the standard for the account reporting of corporate customers of UniCredit. The program was composed of several initiatives, each corresponding to the implementation of CAMT in a different bank within UniCredit Group (UniCredit Bank, HypoVereinsbank, Bank Austria, UniCredit Slovakia, UniCredit Hungary) and for different message types. In particular, some of the implemented messages were camt.052 (intra-day reporting), camt.053 (daily statement), camt.054 (bulk movements) and camt.086 (billing reporting).

I performed the functional analysis needed to define the solution process, which consisted in determining the banking applications involved in the process (e.g. a new Camt Generator platform, current account application, profiling system, payment systems etc.) and the way they should interact.
Moreover, I was responsible for defining the business mapping rules of the camt messages, i.e. associating the destination tags of the XML messages with fields of input files using Java and regular expressions, ensuring that both the ISO and the UniCredit guidelines were respected. The fact that different mapping rules are required for each camt type (052, 053, 054, 086), each transaction type (e.g. SEPA CT, SEPA DD, foreign CT, credit card etc.) and each movement direction (initiator, receiver, return) made the task particularly challenging.

Illustrative E2E architectural solution

Illustrative portion of mapping rules

Feasibility Study "Online Tool for Financial Institutions"
Business Integration Partners

The goal of this feasibility study was to analyze a new Online Tool for Financial Institution customers that could be used by the three main hubs of the UniCredit Group: Italy, Germany and Austria. The tool should offer a variety of functionalities, from data search functionalities (payment orders, collection instructions, cash letters, documentary credits, guarantees, related Swift message details) to payment initiations (payment order Swift MT 103, bank to bank transfer Swift MT 202, cancellation Swift MT 192, queries etc.), reporting, support and service functionalities.

A large part of the study was dedicated to the functional analysis needed to understand and define the new processes involved (e.g. profiling of new customers in the system, the connection between the tool and the banking applications during a payment initiation etc.). Many other aspects were also considered, such as internet banking security, the technological infrastructure (hardware, network, connections), the definition and sizing of the different working environments (development, validation, production), and business continuity and disaster recovery issues. An HTML demo showing the expected user interfaces of the tool was also created. The legal & compliance regulations differ among the countries, and finding a common feasible solution was particularly challenging. Finally, the new tool was priced through a request for information (RFI) and a request for proposal (RFP) involving different vendors.

Illustrative E2E architectural solution

Illustrative user interface page