Research Papers

2024
  • Data Augmentation for Traffic Classification
    C. Wang, A. Finamore, P. Michiardi, M. Gallo, D. Rossi
    Passive and Active Measurements (PAM)
    Data Augmentation (DA) -- enriching training data by adding synthetic samples -- is a technique widely adopted in Computer Vision (CV) and Natural Language Processing (NLP) tasks to improve model performance. Yet, DA has struggled to gain traction in networking contexts, particularly in Traffic Classification (TC) tasks. In this work, we fill this gap by benchmarking 18 augmentation functions applied to 3 TC datasets using packet time series as input representation and considering a variety of training conditions. Our results show that (i) DA can reap benefits previously unexplored, (ii) augmentations acting on time series sequence order and masking are better suited for TC than amplitude augmentations and (iii) basic analysis of the models' latent space can help in understanding the positive/negative effects of augmentations on classification performance.

    @inproceedings{AF:PAM24, title={Data Augmentation for Traffic Classification}, author={C. {Wang} and A. {Finamore} and P. {Michiardi} and M. {Gallo} and D. {Rossi}}, year={2024}, booktitle={Passive and Active Measurements (PAM)}, location={Virtual}, doi={10.1007/978-3-031-56249-5_7}, howpublished="https://afinamore.io/pubs/PAM24_data-augmentation.pdf" }
2023
  • Toward Generative Data Augmentation for Traffic Classification
    C. Wang, A. Finamore, P. Michiardi, M. Gallo, D. Rossi
    Student workshop at ACM Conference on emerging Networking Experiments and Technologies (CoNEXT)
    Data Augmentation (DA) -- augmenting training data with synthetic samples -- is widely adopted in Computer Vision (CV) to improve model performance. Conversely, DA has not yet been popularized in networking use cases, including Traffic Classification (TC). In this work, we present a preliminary study of 14 hand-crafted DAs applied on the MIRAGE19 dataset. Our results (i) show that DA can reap benefits previously unexplored in TC and (ii) foster a research agenda on the use of generative models to automate DA design.

    @inproceedings{AF:CoNEXT23, title={Toward Generative Data Augmentation for Traffic Classification}, author={C. {Wang} and A. {Finamore} and P. {Michiardi} and M. {Gallo} and D. {Rossi}}, year={2023}, booktitle={Conference on emerging Networking Experiments and Technologies (CoNEXT), Student workshop}, location={Paris, France}, doi={}, howpublished="https://afinamore.io/pubs/CoNEXT23_sw_handcrafted_da.pdf" }
  • Replication: Contrastive Learning and Data Augmentation in Traffic Classification Using a Flowpic Input Representation
    ACM Internet Measurement Conference (IMC)

    @inproceedings{AF:IMC23, title={Replication: Contrastive Learning and Data Augmentation in Traffic Classification Using a Flowpic Input Representation}, author={}, year={2023}, booktitle={Internet Measurement Conference (IMC)}, location={Montreal, Canada}, doi={}, howpublished="https://afinamore.io/pubs/IMC23_replication.pdf" }
  • Many or Few Samples? Comparing Transfer, Contrastive and Meta-Learning in Encrypted Traffic Classification
    I. Guarino, C. Wang, A. Finamore, A. Pescapè, D. Rossi
    IEEE/IFIP Traffic Measurement and Analysis (TMA)
    The popularity of Deep Learning (DL), coupled with network traffic visibility reduction due to the increased adoption of HTTPS, QUIC and DNS-SEC, re-ignited interest towards Traffic Classification (TC). However, to tame the dependency on task-specific large labeled datasets we need to find better ways to learn representations that are valid across tasks. In this work we investigate this problem comparing transfer learning, meta-learning and contrastive learning against reference Machine Learning (ML) tree-based and monolithic DL models (16 methods total). Using two publicly available datasets, namely MIRAGE19 (40 classes) and AppClassNet (500 classes), we show that (i) using large datasets we can obtain more general representations, (ii) contrastive learning is the best methodology and (iii) meta-learning the worst one, and (iv) while ML tree-based models cannot handle large tasks but fit small tasks well, DL methods that reuse learned representations are reaching the performance of tree-based models on small tasks too.

    @inproceedings{AF:TMA23, title={Many or Few Samples? Comparing Transfer, Contrastive and Meta-Learning in Encrypted Traffic Classification}, author={I. {Guarino} and C. {Wang} and A. {Finamore} and A. {Pescapè} and D. {Rossi}}, year={2023}, booktitle={Traffic Measurement and Analysis (TMA)}, location={Naples, Italy}, doi={10.48550/arXiv.2305.12432}, howpublished="https://afinamore.io/pubs/TMA23_manyorfew.pdf" }
  • "It's a Match!" -- A Benchmark of Task Affinity Scores for Joint Learning
    R. Azorin, M. Gallo, A. Finamore, D. Rossi, P. Michiardi
    International Workshop on Practical Deep Learning in the Wild (PracticalDL) - colocated with AAAI
    While the promises of Multi-Task Learning (MTL) are attractive, characterizing the conditions of its success is still an open problem in Deep Learning. Some tasks may benefit from being learned together while others may be detrimental to one another. From a task perspective, grouping cooperative tasks while separating competing tasks is paramount to reap the benefits of MTL, i.e., reducing training and inference costs. Therefore, estimating task affinity for joint learning is a key endeavor. Recent work suggests that the training conditions themselves have a significant impact on the outcomes of MTL. Yet, the literature lacks a benchmark to assess the effectiveness of task affinity estimation techniques and their relation with actual MTL performance. In this paper, we take a first step in closing this gap by (i) defining a set of affinity scores, both revisiting contributions from previous literature and presenting new ones, and (ii) benchmarking them on the Taskonomy dataset. Our empirical campaign reveals how, even in a small-scale scenario, task affinity scoring does not correlate well with actual MTL performance. Yet, some metrics can be more indicative than others.

    @inproceedings{AF:PracticalDL23, title={"It's a Match!" -- A Benchmark of Task Affinity Scores for Joint Learning}, author={R. {Azorin} and M. {Gallo} and A. {Finamore} and D. {Rossi} and P. {Michiardi}}, year={2023}, booktitle={International Workshop on Practical Deep Learning in the Wild (PracticalDL)}, location={Washington, US}, doi={10.48550/arXiv.2301.02873}, howpublished="https://afinamore.io/pubs/PracticalDL23_mtl.pdf" }
2022
  • Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching
    A. Finamore, J. Roberts, M. Gallo, D. Rossi
    IEEE International Conference on Computer Communications (INFOCOM)
    While Deep Learning (DL) technologies are a promising tool to solve networking problems that map to classification tasks, their computational complexity is still too high with respect to real-time traffic measurements requirements. To reduce the DL inference cost, we propose a novel caching paradigm, that we named approximate-key caching, which returns approximate results for lookups of selected input based on cached DL inference results. While approximate cache hits alleviate DL inference workload and increase the system throughput, they however introduce an approximation error. As such, we couple approximate-key caching with an error-correction principled algorithm, that we named auto-refresh. We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching -- testifying the practical interest of our proposal.

    @inproceedings{AF:INFOCOM22, title={Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching}, author={A. {Finamore} and J. {Roberts} and M. {Gallo} and D. {Rossi}}, year={2022}, booktitle={IEEE International Conference on Computer Communications (INFOCOM)}, location={Virtual Event}, doi={10.1109/INFOCOM48880.2022.9796677}, howpublished="https://afinamore.io/pubs/INFOCOM22_approximate_key_caching.pdf" }
  • Towards a systematic multi-modal representation learning for network data
    Z. B. Houidi, R. Azorin, M. Gallo, A. Finamore, D. Rossi
    ACM Workshop on Hot Topics in Networks (HotNets)
    Learning the right representations from complex input data is the key ability of successful machine learning (ML) models. The latter are often tailored to a specific data modality. For example, recurrent neural networks (RNNs) were designed having sequential data in mind, while convolutional neural networks (CNNs) were designed to exploit spatial correlation in images. Unlike computer vision (CV) and natural language processing (NLP), each of which targets a single well-defined modality, network ML problems often have a mixture of data modalities as input. Yet, instead of exploiting such abundance, practitioners tend to rely on sub-features thereof, reducing the problem to single modality for the sake of simplicity. In this paper, we advocate for exploiting all the modalities naturally present in network data. As a first step, we observe that network data systematically exhibits a mixture of quantities (e.g., measurements), and entities (e.g., IP addresses, names, etc.). Whereas the former are generally well exploited, the latter are often underused or poorly represented (e.g., with one-hot encoding). We propose to systematically leverage language models to learn entity representations, whenever significant sequences of such entities are historically observed. Through two diverse use-cases, we show that such entity encoding can benefit and naturally augment classic quantity-based features.

    @inproceedings{AF:HotNets22, title={Towards a systematic multi-modal representation learning for network data}, author={Z. B. {Houidi} and R. {Azorin} and M. {Gallo} and A. {Finamore} and D. {Rossi}}, year={2022}, booktitle={ACM Workshop on Hot Topics in Networks (HotNets)}, doi={10.1145/3563766.3564108}, howpublished="https://afinamore.io/pubs/HotNets22_representation.pdf" }
  • AppClassNet: a commercial-grade dataset for application identification research
    C. Wang, A. Finamore, L. Yang, K. Fauvel, D. Rossi
    ACM Computer Communication Review (CCR)
    The recent success of Artificial Intelligence (AI) is rooted in several concomitant factors, namely theoretical progress coupled with abundance of data and computing power. Large companies can take advantage of a deluge of data, typically withheld from the research community due to privacy or business sensitivity concerns, and this is particularly true for networking data. Therefore, the lack of high quality data is often recognized as one of the main factors currently limiting networking research from fully leveraging the potential of AI methodologies. Following numerous requests we received from the scientific community, we release AppClassNet, a commercial-grade dataset for benchmarking traffic classification and management methodologies. AppClassNet is significantly larger than the datasets generally available to the academic community in terms of both the number of samples and classes, and reaches scales similar to the popular ImageNet dataset commonly used in computer vision literature. To avoid leaking user- and business-sensitive information, we opportunely anonymized the dataset, while empirically showing that it still represents a relevant benchmark for algorithmic research. In this paper, we describe the public dataset and our anonymization process. We hope that AppClassNet can be instrumental for other researchers to address more complex commercial-grade problems in the broad field of traffic classification and management.

    @article{AF:CCR22, title={AppClassNet: a commercial-grade dataset for application identification research}, author={C. {Wang} and A. {Finamore} and L. {Yang} and K. {Fauvel} and D. {Rossi}}, year={2022}, journal={ACM Computer Communication Review (CCR)}, doi={10.1145/3561954.3561958}, howpublished="https://afinamore.io/pubs/CCR22_appclassnet.pdf" }
2021
  • A First Look at Class Incremental Learning in Deep Learning Mobile Traffic Classification
    G. Bovenzi, L. Yang, A. Finamore, G. Aceto, D. Ciuonzo, A. Pescapè, D. Rossi
    IEEE/IFIP Traffic Measurement and Analysis (TMA)
    The recent popularity growth of Deep Learning (DL) re-ignited the interest towards traffic classification, with several studies demonstrating the accuracy of DL-based classifiers to identify Internet applications' traffic. Even with the aid of hardware accelerators (GPUs, TPUs), DL model training remains expensive, and limits the ability to perform the frequent model updates needed to keep up with the ever-evolving nature of Internet traffic, and mobile traffic in particular. To address this pain point, in this work we explore Incremental Learning (IL) techniques to add new classes to models without a full retraining, hence speeding up the model update cycle. We consider iCarl, a state of the art IL method, and MIRAGE-2019, a public dataset with traffic from 40 Android apps, aiming to understand "if there is a case for incremental learning in traffic classification". By dissecting iCarl internals, we discuss ways to improve its design, contributing a revised version, namely iCarl+. Although our analysis reveals their infancy, IL techniques are a promising research area on the roadmap towards automated DL-based traffic analysis systems.

    @inproceedings{AF:CORR-21c, title={A First Look at Class Incremental Learning in Deep Learning Mobile Traffic Classification}, author={G. {Bovenzi} and L. {Yang} and A. {Finamore} and G. {Aceto} and D. {Ciuonzo} and A. {Pescapè} and D. {Rossi}}, year={2021}, booktitle={Traffic Measurement and Analysis (TMA)}, location={Virtual Event}, doi={10.48550/arXiv.2107.04464}, howpublished="https://afinamore.io/pubs/TMA21_icarl_plus.pdf" }
  • FENXI: Deep-learning Traffic Analytics at the Edge
    M. Gallo, A. Finamore, G. Simon, D. Rossi
    ACM/IEEE Symposium on Edge Computing (SEC)
    Live traffic analysis at the first aggregation point in the ISP network enables the implementation of complex traffic engineering policies but is limited by the scarce processing capabilities, especially for Deep Learning (DL) based analytics. The introduction of specialized hardware accelerators i.e., Tensor Processing Unit (TPU), offers the opportunity to enhance the processing capabilities of network devices at the edge. Yet, to date, no packet processing pipeline is capable of offering DL-based analysis capabilities in the data-plane, without interfering with network operations. In this paper, we present FENXI, a system to run complex analytics by leveraging TPU. The design of FENXI decouples forwarding operations and traffic analytics which operates at different granularities i.e., packet and flow levels. We conceive two independent modules that asynchronously communicate to exchange network data and analytics results, and design data structures to extract flow level statistics without impacting per-packet processing. We prototyped and evaluated FENXI on general-purpose servers considering both adversarial and realistic network conditions. Our analysis shows that FENXI can sustain 100 Gbps line rate traffic processing requiring only limited resources, while also dynamically adapting to variable network conditions.

    @inproceedings{AF:SEC-21, title={FENXI: Deep-learning Traffic Analytics at the Edge}, author={M. {Gallo} and A. {Finamore} and G. {Simon} and D. {Rossi}}, year={2021}, booktitle={Symposium on Edge Computing (SEC)}, location={San Jose, CA, U.S}, doi={10.1145/3453142.3491273}, howpublished="https://afinamore.io/pubs/SEC21_fenxi.pdf" }
  • Deep Learning and Zero-Day Traffic Classification: Lessons Learned From a Commercial-Grade Dataset
    L. Yang, A. Finamore, F. Jun, D. Rossi
    IEEE Transactions on Network and Service Management (TNSM)
    The increasing success of Machine Learning (ML) and Deep Learning (DL) has recently re-sparked interest towards traffic classification. While classification of known traffic is a well investigated subject, with supervised classification tools (such as ML and DL models) known to provide satisfactory performance, detection of unknown (or zero-day) traffic is more challenging and typically handled by unsupervised techniques (such as clustering algorithms). In this paper, we share our experience on a commercial-grade DL traffic classification engine that is able to (i) identify known applications from encrypted traffic, as well as (ii) handle unknown zero-day applications. In particular, our contribution for (i) is to perform a thorough assessment of state of the art traffic classifiers in commercial-grade settings comprising a few thousand very fine grained application labels, as opposed to the few tens of classes generally targeted in academic evaluations. Additionally, we contribute to the problem of (ii) detection of zero-day applications by proposing a novel technique, tailored for DL models, that is significantly more accurate and light-weight than the state of the art. Summarizing our main findings, we gather that (i) while ML and DL models are both equally able to provide a satisfactory solution for the classification of known traffic, (ii) the non-linear feature extraction process of the DL backbone provides sizeable advantages for the detection of unknown classes.

    @ARTICLE{AF:TNSM-21a, title={Deep Learning and Zero-Day Traffic Classification: Lessons Learned From a Commercial-Grade Dataset}, author={L. {Yang} and A. {Finamore} and F. {Jun} and D. {Rossi}}, year={2021}, journal={IEEE Transactions on Network and Service Management}, volume={18}, number={4}, pages={4103-4118}, doi={10.1109/TNSM.2021.3122940}, howpublished="https://afinamore.io/pubs/TNSM21_traffic_classification.pdf" }
  • Towards a Generic Deep Learning Pipeline for Traffic Measurements
    R. Azorin, M. Gallo, A. Finamore, M. Filippone, P. Michiardi, D. Rossi
    ACM CoNEXT Student Workshop
    As networks grow bigger, traffic measurements become more and more challenging. Common practices require specialized solutions tied to specific measurements. We aim at automating the design of generic top-down measurements tools thanks to Deep Learning. To this end, we focus our study on (i) researching an appropriate input traffic representation and (ii) comparing Deep Learning pipelines for several measurements. In this paper, we propose an empirical campaign to study a variety of modeling approaches for multiple traffic metrics predictions, with a strong focus on the trade-off between performance and cost that these approaches offer.

    @inproceedings{AF:CoNEXT-SW21, title={Towards a Generic Deep Learning Pipeline for Traffic Measurements}, author={R. {Azorin} and M. {Gallo} and A. {Finamore} and M. {Filippone} and P. {Michiardi} and D. {Rossi}}, year={2021}, booktitle={ACM CoNEXT Student Workshop}, location={Virtual Event}, doi={10.1145/3488658.3493785}, howpublished="https://afinamore.io/pubs/CoNEXT-student-workshop21.pdf" }
  • Are we breaking bubbles as we move? Using a large sample to explore the relationship between urban mobility and segregation
    S. Park, T. Oshan, A. El Ali, A. Finamore
    Journal of Computers, Environment and Urban Systems (CEUS)
    Segregation often dismantles common activity spaces and isolates people of different backgrounds, leading to irreconcilable inequalities that disfavour the poor and minorities and intensifies societal fragmentation. Therefore, segregation has become an increasing concern and topic of research with studies typically concentrating on the residential communities of a particular racial or socioeconomic group. This paper enhances the residential view of segregation and examines the topic in the context of urban mobility. Specifically, it expands upon prior research by employing large-sample, seamless telecommunication logs of London, UK to provide a holistic view of mobility across the entire socioeconomic spectrum. A method is developed to transform the data to flows between geographic areas with different socioeconomic statuses. Spatial interaction models are then calibrated to examine the impact of both geographical distance and socioeconomic distance on the deterrence of flows and the analysis is extended to analyze the interaction of the two factors. Overall, socioeconomic distance is found to have a subtle effect compared to geographical distance. However, different effects are observed depending on the socioeconomic distance between flows and the deterrence of mobility tends to be the greatest when both physical and socioeconomic distance are high, suggesting that both factors may play a role creating and maintaining segregation.

    @article{AF:CEUS-21, title={Are we breaking bubbles as we move? Using a large sample to explore the relationship between urban mobility and segregation}, author={S. {Park} and T. {Oshan} and A. {El Ali} and A. {Finamore}}, year={2021}, journal={Computers, Environment and Urban Systems (CEUS)}, volume={86}, pages={101585}, doi={10.1016/j.compenvurbsys.2020.101585}, howpublished="https://afinamore.io/pubs/CEUS21_mobility.pdf" }
2020
  • Where Things Roam: Uncovering Cellular IoT/M2M Connectivity
    A. Lutu, B. Jun, A. Finamore, F. Bustamante, D. Perino
    ACM Internet Measurement Conference (IMC)
    Support for "things" roaming internationally has become critical for Internet of Things (IoT) verticals, from connected cars to smart meters and wearables, and explains the commercial success of Machine-to-Machine (M2M) platforms. We analyze IoT verticals operating with connectivity via IoT SIMs, and present the first large-scale study of commercially deployed IoT SIMs for energy meters. We also present the first characterization of an operational M2M platform and the first analysis of the rather opaque associated ecosystem. For operators, the exponential growth of IoT has meant increased stress on the infrastructure shared with traditional roaming traffic. Our analysis quantifies the adoption of roaming by M2M platforms and the impact they have on the underlying visited Mobile Network Operators (MNOs). To manage the impact of massive deployments of devices operating with an IoT SIM, operators must be able to distinguish between the latter and traditional inbound roamers. We build a comprehensive dataset capturing the device population of a large European MNO over three weeks. With this, we propose and validate a classification approach that can allow operators to distinguish inbound roaming IoT devices.

    @inproceedings{AF:IMC-20, title={Where Things Roam: Uncovering Cellular IoT/M2M Connectivity}, author={A. {Lutu} and B. {Jun} and A. {Finamore} and F. {Bustamante} and D. {Perino}}, year={2020}, booktitle={Internet Measurement Conference (IMC)}, location={Virtual Event}, doi={10.1145/3419394.3423661}, howpublished="https://afinamore.io/pubs/IMC20_roaming.pdf" }
  • Opening the Deep Pandora Box: Explainable Traffic Classification (poster)
    C. Beliard, A. Finamore, D. Rossi
    IEEE International Conference on Computer Communications (INFOCOM)
    Fostered by the tremendous success in the image recognition field, recently there has been a strong push for the adoption of Convolutional Neural Networks (CNN) in networks, especially at the edge, assisted by low-power hardware equipment (known as “tensor processing units”) for the acceleration of CNN-related computations. The availability of such hardware has reignited the interest for traffic classification approaches that are based on Deep Learning. However, unlike tree-based approaches that are easy to interpret, CNNs are in essence represented by a large number of weights, whose interpretation is particularly obscure for human operators. Since human operators will need to deal with, troubleshoot, and maintain these automatically learned models, which will replace the more easily human-readable heuristic rules of DPI classification engines, there is a clear need to open the “deep pandora box”, and make it easily accessible for network domain experts. In this demonstration, we shed light on the inference process of a commercial-grade classification engine dealing with hundreds of classes, enriching the classification workflow with tools to enable better understanding of the inner mechanics of both the traffic and the models.

    @INPROCEEDINGS{AF:INFOCOM-20, title={Opening the Deep Pandora Box: Explainable Traffic Classification}, author={C. {Beliard} and A. {Finamore} and D. {Rossi}}, year={2020}, booktitle={IEEE International Conference on Computer Communications (INFOCOM)}, location={Virtual event}, doi={10.1109/INFOCOMWKSHPS50562.2020.9162704}, howpublished="https://afinamore.io/pubs/INFOCOM20_poster.pdf" }
  • Real-time Deep Learning based Traffic Analytics (poster)
    M. Gallo, A. Finamore, G. Simon, D. Rossi
    ACM Special Interest Group on Data Communications (SIGCOMM)
    The increased interest towards Deep Learning (DL) technologies has led to the development of a new generation of specialized hardware accelerators [4] such as Graphic Processing Unit (GPU) and Tensor Processing Unit (TPU) [1, 2]. The integration of such components in network routers is however not trivial. Indeed, routers typically aim to minimize the overhead of per-packet processing (e.g., Ethernet switching, IP forwarding, telemetry) and design choices (e.g., power, memory consumption) to integrate a new accelerator need to factor in these key requirements. The literature and benchmarks on DL hardware accelerators have overlooked specific router constraints (e.g., strict latency) and focused instead on cloud deployment [3] and image processing. Likewise, there is limited literature regarding DL application on traffic processing at line-rate. Among all hardware accelerators, we are interested in edge TPUs [1, 2]. Since their design focuses on DL inference, edge TPUs match the vision of operators, who consider running pre-trained DL models in routers with low power drain. Edge TPUs are expected to limit the amount of computational resources for inference and to yield a higher ratio of operations per watt footprint than GPUs. This demo aims to investigate the operational points at which edge TPUs become a viable option, using traffic classification as a use case. We sketch the design of a real-time DL traffic classification system, and compare inference speed (i.e., number of classifications per second) of a state-of-the-art Convolutional Neural Network (CNN) model running on different hardware (Central Processing Unit (CPU), GPU, TPU). To contrast their performance, we run stress tests based on synthetic traffic and under different conditions. We collect the results into a dashboard which enables network operators and system designers both to explore the stress test results with regard to their considered operational points and to trigger synthetic live tests on top of Ascend 310 TPUs [1].

    @INPROCEEDINGS{AF:SIGCOMM-20, title={Real-time Deep Learning based Traffic Analytics}, author={M. {Gallo} and A. {Finamore} and G. {Simon} and D. {Rossi}}, year={2020}, booktitle={ACM Special Interest Group on Data Communications (SIGCOMM)}, location={Virtual event}, doi={}, howpublished="https://afinamore.io/pubs/SIGCOMM20_poster.pdf" }
  • Back in control -- An extensible middle-box on your phone
    J. Newman, A. Razaghpanah, N. Vallina-Rodriguez, F. Bustamante, M. Allman, D. Perino, A. Finamore
    arXiv/CoRR

    @article{AF:CORR-20, title={Back in control -- An extensible middle-box on your phone}, author={J. {Newman} and A. {Razaghpanah} and N. {Vallina-Rodriguez} and F. {Bustamante} and M. {Allman} and D. {Perino} and A. {Finamore}}, year={2020}, journal={CoRR}, doi={}, howpublished="https://afinamore.io/pubs/CORR20_mbz.pdf" }
2019
  • Tackling Mobile Traffic Critical Path Analysis With Passive and Active Measurements
    G. Tangari, D. Perino, A. Finamore, M. Charalambides, G. Pavlou
    IEEE Network Traffic Measurement and Analysis Conference (TMA)
    Critical Path Analysis (CPA) studies the delivery of webpages to identify page resources, their interrelations, as well as their impact on the page loading latency. Despite CPA being a generic methodology, its mechanisms have been applied only to browsers and web traffic, and they do not directly apply to the study of generic mobile apps. Likewise, web browsing represents only a small fraction of the overall mobile traffic. In this paper, we take a first step towards filling this gap by exploring how CPA can be performed for generic mobile applications. We propose Mobile Critical Path Analysis (MCPA), a methodology based on passive and active network measurements that is applicable to a broad set of apps to expose a fine-grained view of their traffic dynamics. We validate MCPA on popular apps across different categories and usage scenarios. We show that MCPA can identify user interactions with mobile apps only based on traffic monitoring, and the relevant network activities that are bottlenecks. Overall, we observe that apps spend 60% of time and 84% of bytes on critical traffic on average, corresponding to +22% time and +13% bytes compared with what is observed for browsing.

    @INPROCEEDINGS{AF:TMA-19, title={Tackling Mobile Traffic Critical Path Analysis With Passive and Active Measurements}, author={G. {Tangari} and D. {Perino} and A. {Finamore} and M. {Charalambides} and G. {Pavlou}}, year={2019}, booktitle={IEEE Network Traffic Measurement and Analysis Conference (TMA)}, location={Paris, France}, doi={10.23919/TMA.2019.8784636}, howpublished="https://afinamore.io/pubs/TMA19_mcpa.pdf" }
  • Generalizing Critical Path Analysis on Mobile Traffic
    G. Tangari, D. Perino, A. Finamore
    arXiv/CoRR
    Critical Path Analysis (CPA) studies the delivery of webpages to identify page resources, their interrelations, as well as their impact on the page loading latency. Despite CPA being a generic methodology, its mechanisms have been applied only to browsers and web traffic, and they do not directly apply to the study of generic mobile apps. Likewise, web browsing represents only a small fraction of the overall mobile traffic. In this paper, we take a first step towards filling this gap by exploring how CPA can be performed for generic mobile applications. We propose Mobile Critical Path Analysis (MCPA), a methodology based on passive and active network measurements that is applicable to a broad set of apps to expose a fine-grained view of their traffic dynamics. We validate MCPA on popular apps across different categories and usage scenarios. We show that MCPA can identify user interactions with mobile apps only based on traffic monitoring, and the relevant network activities that are bottlenecks. Overall, we observe that apps spend 60% of time and 84% of bytes on critical traffic on average, corresponding to +22% time and +13% bytes compared with what is observed for browsing.

    @article{AF:CORR-19, title={Generalizing Critical Path Analysis on Mobile Traffic}, author={G. {Tangari} and A. {Finamore} and D. {Perino}}, year={2019}, journal={CoRR}, howpublished="https://afinamore.io/pubs/CORR19_mcpa.pdf" }
2018
  • CHIMP: Crowdsourcing Human Inputs for Mobile Phones PDF details
    M. Almeida, M. Bilal, A. Finamore, I. Leontiadis, Y. Grunenberger, M. Varvello, J. Blackburn
    ACM International World Wide Web Conference (WWW)
    While developing mobile apps is becoming easier, testing and characterizing their behavior is still hard. On the one hand, the de facto testing tool, called "Monkey," scales well due to being based on random inputs, but fails to gather inputs useful in understanding things like user engagement and attention. On the other hand, gathering inputs and data from real users requires distributing instrumented apps, or even phones with pre-installed apps, an expensive and inherently unscalable task. To address these limitations we present CHIMP, a system that integrates automated tools and large-scale crowdsourced inputs. CHIMP is different from previous approaches in that it runs apps in a virtualized mobile environment that thousands of users all over the world can access via a standard Web browser. CHIMP is thus able to gather the full range of real-user inputs, detailed run-time traces of apps, and network traffic. We thus describe CHIMP's design and demonstrate the efficiency of our approach by testing thousands of apps via thousands of crowdsourced users. We calibrate CHIMP with a large-scale campaign to understand how users approach app testing tasks. Finally, we show how CHIMP can be used to improve both traditional app testing tasks, as well as more novel tasks such as building a traffic classifier on encrypted network flows.

    @inproceedings{AF:WWW-18, title={CHIMP: Crowdsourcing Human Inputs for Mobile Phones}, author={M. {Almeida} and M. {Bilal} and A. {Finamore} and I. {Leontiadis} and Y. {Grunenberger} and M. {Varvello} and J. {Blackburn}}, year={2018}, booktitle={ACM International World Wide Web Conference (WWW)}, location={Lyon, France}, doi={10.1145/3178876.3186035}, howpublished="https://afinamore.io/pubs/WWW18_chimp.pdf" }
2017
  • Traffic Analysis with Off-the-Shelf Hardware: Challenges and Lessons Learned PDF details
    M. Trevisan, A. Finamore, M. Mellia, M. M. Munafo, D. Rossi
    IEEE Communications Magazine
    In recent years, progress in both hardware and software has allowed user-space applications to capture packets at 10 Gb/s line rate or more, with cheap COTS hardware. However, processing packets at such rates with software is still far from being trivial. In the literature, this challenge has been extensively studied for network intrusion detection systems, where per-packet operations are easy to parallelize with support of hardware acceleration. Conversely, the scalability of statistical traffic analyzers (STAs) is intrinsically complicated by the need to track per-flow state to collect statistics. This challenge has received less attention so far, and it is the focus of this work. We present and discuss design choices to enable an STA to collect hundreds of per-flow metrics at a multi-10-Gb/s line rate. We leverage a handful of hardware advancements proposed over the last years (e.g., RSS queues, NUMA architecture), and we provide insights on the trade-offs they imply when combined with state-of-the-art packet capture libraries and the multi-process paradigm. We outline the principles to design an optimized STA, and we implement them to engineer DPDKStat, a solution combining the Intel DPDK framework with the traffic analyzer Tstat. Using traces collected from real networks, we demonstrate that DPDKStat achieves 40 Gb/s of aggregated rate with a single COTS PC.

    @ARTICLE{AF:IEEECOMM-17, title={Traffic Analysis with Off-the-Shelf Hardware: Challenges and Lessons Learned}, author={M. {Trevisan} and A. {Finamore} and M. {Mellia} and M. {Munafo} and D. {Rossi}}, year={2017}, journal={IEEE Communications Magazine}, number={3}, pages={163-169}, doi={10.1109/MCOM.2017.1600756CM}, howpublished="https://afinamore.io/pubs/IEEECOM17_tstat-dpdk.pdf" }
  • Dissecting DNS Stakeholders in Mobile Networks PDF SLIDES details
    M. Almeida, A. Finamore, D. Perino, N. Vallina-Rodriguez, M. Varvello
    ACM Conference on emerging Networking EXperiments and Technologies (CoNEXT)
    The functioning of mobile apps involves a large number of protocols and entities, with the Domain Name System (DNS) acting as a predominant one. Despite being one of the oldest Internet systems, DNS still operates with semi-obscure interactions among its stakeholders: domain owners, network operators, operating systems, and app developers. The goal of this work is to holistically understand the dynamics of DNS in mobile traffic along with the role of each of its stakeholders. We use two complementary (anonymized) datasets: traffic logs provided by a European mobile network operator (MNO) with 19M customers, and traffic logs from 5,000 users of Lumen, a traffic monitoring app for Android. We complement such passive traffic analysis with active measurements at four European MNOs. Our study reveals that 10k domains (out of 198M) account for 87% of total network flows. The time to live (TTL) values for such domains are mostly short (< 1min), even though domain-to-IP mappings tend to change on a longer time scale. Further, depending on the operator's recursive resolver architecture, end-user devices receive even smaller TTL values, leading to suboptimal effectiveness of the on-device DNS cache. Despite a number of on-device and in-network optimizations available to minimize DNS overhead, which we find corresponds to 10% of page load time (PLT) on average, we have not found wide evidence of their adoption in the wild.

    @inproceedings{AF:CONEXT-17, title={Dissecting DNS Stakeholders in Mobile Networks}, author={M. {Almeida} and A. {Finamore} and D. {Perino} and N. {Vallina-Rodriguez} and M. {Varvello}}, year={2017}, booktitle={ACM Conference on Emerging Networking EXperiments and Technologies (CoNEXT)}, location={Incheon, Republic of Korea}, doi={10.1145/3143361.3143375}, howpublished="https://afinamore.io/pubs/CONEXT17_dns.pdf" }
  • The Good, the Bad, and the KPIs: How to Combine Performance Metrics to Better Capture Underperforming Sectors in Mobile Networks PDF details
    I. Leontiadis, J. Serrà, A. Finamore, G. Dimopoulos, K. Papagiannaki
    IEEE International Conference on Data Engineering (ICDE)
    Mobile network operators collect a humongous amount of network measurements. Among those, sector Key Performance Indicators (KPIs) are used to monitor the radio access, i.e., the "last mile" of mobile networks. Thresholding mechanisms and synthetic combinations of KPIs are used to assess the network health, and rank sectors to identify the underperforming ones. It follows that the available monitoring methodologies heavily rely on the fine-grained tuning of thresholds and weights, currently established through domain knowledge of both vendors and operators. In this paper, we study how to bridge sector KPIs to reflect Quality of Experience (QoE) ground-truth measurements, namely throughput, latency and video streaming stall events. We leverage one month of data collected in the operational network of a mobile network operator serving more than 10 million subscribers. We extensively investigate to what extent adopted methodologies efficiently capture QoE. Moreover, we challenge the current state of the art by presenting data-driven approaches based on Particle Swarm Optimization (PSO) metaheuristics and random forest regression algorithms, to better assess sector performance. Results show that the proposed methodologies outperform state-of-the-art solutions, improving the correlation with respect to the baseline by a factor of 3, and improving visibility on underperforming sectors. Our work opens new areas for research in monitoring solutions for enriching the quality and accuracy of the network performance indicators collected at the network edge.

    @INPROCEEDINGS{AF:ICDE-17, title={The Good, the Bad, and the KPIs: How to Combine Performance Metrics to Better Capture Underperforming Sectors in Mobile Networks}, author={I. {Leontiadis} and J. {Serrà} and A. {Finamore} and G. {Dimopoulos} and K. {Papagiannaki}}, year={2017}, booktitle={IEEE International Conference on Data Engineering (ICDE)}, location={San Diego, CA, USA}, doi={10.1109/ICDE.2017.89}, howpublished="https://afinamore.io/pubs/ICDE17_kpis.pdf" }
  • Mind the Gap Between HTTP and HTTPS in Mobile Networks PDF SLIDES details
    A. Finamore, M. Varvello, K. Papagiannaki
    Passive and Active Measurement Conference (PAM)
    Fueled by a plethora of applications and Internet services, mobile data consumption is on the rise. Over the years, mobile operators deployed webproxies to optimize HTTP content delivery. Webproxies also produce HTTP-logs which are a fundamental data source to understand network/services performance and user behavior. The recent surge of HTTPS is progressively reducing such wealth of information, to the point that it is unclear whether HTTP-logs are still representative of the overall traffic. Unfortunately, HTTPS monitoring is challenging and adds some extra cost, which deters operators from “turning on the switch”. In this work, we study the “gap” between HTTP and HTTPS, both quantifying their intrinsic traffic characteristics and investigating the usability of the information that can be logged from their transactions. We leverage a 24-hour dataset collected from a webproxy operated by a European mobile carrier with more than 10M subscribers. Our quantification of this gap suggests that its importance is strictly related to the target analysis.

    @INPROCEEDINGS{AF:PAM-17, title={Mind the Gap Between HTTP and HTTPS in Mobile Networks}, author={A. {Finamore} and M. {Varvello} and K. {Papagiannaki}}, year={2017}, booktitle={Passive and Active Measurements Conference (PAM)}, location={Sydney, Australia}, doi={10.1007/978-3-319-54328-4_16}, howpublished="https://afinamore.io/pubs/PAM17_https.pdf" }
  • Characterising users experience and critical path in mobile applications (poster) PDF details
    A. Finamore, J. Newman, D. Perino, N. Rattanavipanon, C. Soriente, N. Vallina-Rodriguez
    ACM Internet Measurement Conference (IMC)
    User Quality of Experience (QoE) analysis is paramount for telcos to drive business, and for users to verify their SLAs with the operators. Over the years both academia and industry have put a lot of effort into defining methodologies and creating tools to monitor QoE. In particular, web traffic QoE analysis is a well understood and "standardized" research area, and a varied set of tools and metrics are available to understand how content is delivered. For instance, it is easy to capture the interaction of a browser with a website (e.g., as HAR objects), to inspect the content delivery critical path via the download waterfall (e.g., via Chrome, or tools like WProf), or to quantify webpage retrieval performance via page load time (PLT) or Google speed index. These tools and techniques suit fixed access networks well but cannot be directly applied to analyze mobile user QoE. Indeed, most mobile traffic is generated by a large number of applications and does not resemble traditional web browsing traffic. Also, mobile architectures are more complex than fixed infrastructures and their main performance bottleneck is latency. In this work, we consider QoE for mobile applications by focusing on network level metrics and taking into account application level information. The main goal is to identify network activities (such as DNS lookup, TCP and TLS handshake, content download, etc.) which are bottlenecks for application performance and propose countermeasures. To achieve this objective we combine active and passive on-device measurements. On the one hand, we developed an app that leverages the Android VPN APIs (i.e., it does not require rooted devices) to monitor network traffic, while also collecting information about user activity with the device.
    This enables us to analyze mobile app traffic in different scenarios (e.g., application startup, application in background/foreground without user activity, user interaction) and develop techniques to reconstruct the networking waterfall for each of them, similarly to what is currently possible for generic web browsing traffic. However, we envision a generic mobile device system component capable of inspecting all traffic, and possibly tapping into user engagement (e.g., interactivity with the screen). By analysing the waterfall we aim to identify activities constituting the critical path for each scenario and application. On the other hand, the monitoring app allows us to schedule active experiments. This allows us to define a comparison baseline to contrast against other analyses, as well as to further diagnose problems when the waterfall analysis suggests there may be a bottleneck. In the poster, we present the methodology and preliminary results on a small set of applications in a testbed with real devices.

    @INPROCEEDINGS{AF:IMC-17, title={Characterising users experience and critical path in mobile applications}, author={A. {Finamore} and J. {Newman} and D. {Perino} and N. {Rattanavipanon} and C. {Soriente} and N. {Vallina-Rodriguez}}, year={2017}, booktitle={ACM Internet Measurement Conference (IMC)}, location={London, UK}, howpublished="https://afinamore.io/pubs/IMC17_vins.pdf" }
2016
  • Is the Web HTTP/2 Yet? PDF details
    M. Varvello, K. Schomp, D. Naylor, J. Blackburn, A. Finamore, K. Papagiannaki
    Passive and Active Measurement Conference (PAM)
    Version 2 of the Hypertext Transfer Protocol (HTTP/2) was finalized in May 2015 as RFC 7540. It addresses well-known problems with HTTP/1.1 (e.g., head of line blocking and redundant headers) and introduces new features (e.g., server push and content priority). Though HTTP/2 is designed to be the future of the web, it remains unclear whether the web will—or should—hop on board. To shed light on this question, we built a measurement platform that monitors HTTP/2 adoption and performance across the Alexa top 1 million websites on a daily basis. Our system is live and up-to-date results can be viewed at [1]. In this paper, we report findings from an 11 month measurement campaign (November 2014 – October 2015). As of October 2015, we find 68,000 websites reporting HTTP/2 support, of which about 10,000 actually serve content with it. Unsurprisingly, popular sites are quicker to adopt HTTP/2 and 31 % of the Alexa top 100 already support it. For the most part, websites do not change as they move from HTTP/1.1 to HTTP/2; current web development practices like inlining and domain sharding are still present. Contrary to previous results, we find that these practices make HTTP/2 more resilient to losses and jitter. In all, we find that 80 % of websites supporting HTTP/2 experience a decrease in page load time compared with HTTP/1.1 and the decrease grows in mobile networks.

    @INPROCEEDINGS{AF:PAM-16, title={Is the Web HTTP/2 Yet?}, author={M. {Varvello} and K. {Schomp} and D. {Naylor} and J. {Blackburn} and A. {Finamore} and K. {Papagiannaki}}, year={2016}, booktitle={Passive and Active Measurements Conference (PAM)}, location={Crete, Greece}, doi={10.1007/978-3-319-30505-9_17}, howpublished="https://afinamore.io/pubs/PAM16_http2.pdf" }
  • Statistical Network Monitoring: Methodology and Application to Carrier-Grade NAT PDF details
    E. Bocchi, A. Safari Khatouni, S. Traverso, A. Finamore, M. M. Munafo, M. Mellia, D. Rossi
    International Journal of Computer and Telecommunications Networking (ComNet)
    When passively collecting and then processing network traffic traces, the need to analyze raw data at several Gbps and to extract higher level indexes from the stream of packets poses typical BigData-like challenges. In this paper, we engineer a methodology to extract, collect and process passive traffic traces. In particular, we design and implement analytics that, based on a filtering process and on the building of empirical distributions, enable the comparison between two generic collections, e.g., data gathered from two different vantage points, from different populations, or at different times. The ultimate goal is to highlight statistically significant differences that could be useful to flag incidents to the network manager. After introducing the methodology, we apply it to assess the impact of Carrier-Grade NAT (CGN), a technology that Internet Service Providers (ISPs) deploy to limit the usage of expensive public IP addresses. Since CGN may introduce connectivity issues and performance degradation, we process a large dataset of passive measurements collected from an ISP using CGN for part of its customers. We first extract detailed per-flow information by processing packets from live links. Then, we derive higher level statistics that are significant for the end-users, e.g., TCP connection setup time, HTTP response time, or BitTorrent average download throughput. Finally, we contrast figures of customers being offered public or private addresses, and look for statistically significant differences. Results show that CGN does not impair quality of service in the analyzed ISP deployment. In addition, we use the collected data to derive useful figures for the proper dimensioning of the CGN and the configuration of its parameters in order to avoid impairments on end-users’ experience.

    @article{AF:COMNET-16, title={Statistical network monitoring: Methodology and application to carrier-grade NAT}, author={E. {Bocchi} and A. {Safari Khatouni} and S. {Traverso} and A. {Finamore} and M. {Munafò} and M. {Mellia} and D. {Rossi}}, year={2016}, journal={International Journal of Computer and Telecommunications Networking (ComNet)}, pages={20-35}, doi={10.1016/j.comnet.2016.06.018}, howpublished="https://afinamore.io/pubs/COMNET16_CGNAT.pdf" }
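    The comparison of two collections through empirical distributions, as described in the abstract above, can be illustrated with a minimal Python sketch. It builds the empirical CDF of each collection and reports the largest vertical gap between them (the two-sample Kolmogorov-Smirnov statistic). This is an assumption made for illustration: the abstract does not state which distance or test the paper uses, and the function names and sample values are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function x -> fraction of values in `sample` that are <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def ks_distance(sample_a, sample_b):
    """Largest vertical gap between the two empirical CDFs (assumed metric)."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # Both CDFs are step functions, so the gap can only change at observed values.
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


# Hypothetical TCP connection setup times (ms) for two customer groups.
public_ip = [20, 22, 25, 31, 40]
private_ip = [21, 23, 26, 30, 41]
print(ks_distance(public_ip, private_ip))
```

    A small distance suggests the two groups behave alike on that metric; deciding whether a gap is statistically significant would additionally require a threshold that depends on the sample sizes.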
  • A study of the impact of DNS Resolvers on CDN Performance Using a Causal Approach PDF details
    H. Hours, E. Biersack, P. Loiseau, A. Finamore, M. Mellia
    International Journal of Computer and Telecommunications Networking (ComNet)
    Resources such as Web pages or videos that are published in the Internet are referred to by their Uniform Resource Locator (URL). If a user accesses a resource via its URL, the host name part of the URL needs to be translated into a routable IP address. This translation is performed by the Domain Name System service (DNS). DNS also plays an important role when Content Distribution Networks (CDNs) are used to host replicas of popular objects on multiple servers that are located in geographically different areas. A CDN makes use of the DNS service to infer client location and direct the client request to the optimal server. While most Internet Service Providers (ISPs) offer a DNS service to their customers, clients may instead use a public DNS service. The choice of the DNS service can impact the performance of clients when retrieving a resource from a given CDN. In this paper we study the impact on download performance for clients using either the DNS service of their ISP or the public DNS service provided by Google DNS. We adopt a causal approach that exposes the structural dependencies of the different parameters impacted by the DNS service used and we show how to model these dependencies with a Bayesian network. The Bayesian network allows us to explain and quantify the performance benefits seen by clients when using the DNS service of their ISP. We also discuss how to further improve client performance.

    @article{AF:COMNET-16b, title={A study of the impact of DNS resolvers on CDN performance using a causal approach}, author={H. {Hours} and E. {Biersack} and P. {Loiseau} and A. {Finamore} and M. {Mellia}}, year={2016}, journal={International Journal of Computer and Telecommunications Networking (ComNet)}, pages={200-210}, doi={10.1016/j.comnet.2016.06.023} }
  • Lost in Space: Improving Inference of IPv4 Address Space Utilization PDF details
    A. Dainotti, K. Benson, A. King, B. Huffaker, E. Glatz, X. A. Dimitropoulos, P. Richter, A. Finamore, A. C. Snoeren
    IEEE Journal on Selected Areas in Communications (JSAC)
    One challenge in understanding the evolution of the Internet infrastructure is the lack of systematic mechanisms for monitoring the extent to which allocated IP addresses are actually used. In this paper, we advance the science of inferring IPv4 address space utilization by proposing a novel taxonomy and analyzing and correlating results obtained through different types of measurements. We have previously studied an approach based on passive measurements that can reveal used portions of the address space unseen by active approaches. In this paper, we study such passive approaches in detail, extending our methodology to new types of vantage points and identifying traffic components that most significantly contribute to discovering used IPv4 network blocks. We then combine the results we obtained through passive measurements together with data from active measurement studies, as well as measurements from Border Gateway Protocol and additional data sets available to researchers. Through the analysis of this large collection of heterogeneous data sets, we substantially improve the state of the art in terms of: 1) understanding the challenges and opportunities in using passive and active techniques to study address utilization and 2) knowledge of the utilization of the IPv4 space.

    @ARTICLE{AF:JSAC-16, title={Lost in Space: Improving Inference of IPv4 Address Space Utilization}, author={A. {Dainotti} and K. {Benson} and A. {King} and B. {Huffaker} and E. {Glatz} and X. {Dimitropoulos} and P. {Richter} and A. {Finamore} and A. C. {Snoeren}}, year={2016}, journal={IEEE Journal on Selected Areas in Communications (JSAC)}, number={6}, pages={1862-1876}, doi={10.1109/JSAC.2016.2559218}, howpublished="https://afinamore.io/pubs/JSAC14_lost-in-space.pdf" }
  • A First Characterization of Anycast Traffic from Passive Traces PDF details
    D. Giordano, D. Cicalese, A. Finamore, M. Mellia, M. M. Munafo, D. Zeaiter Joumblatt, D. Rossi
    IEEE Network Traffic Measurement and Analysis Conference (TMA)
    IP anycast routes packets to the topologically nearest server according to BGP proximity. In recent years, new players have started adopting this technology to serve web content via Anycast-enabled CDNs (A-CDN). To the best of our knowledge, in the literature, there are studies that focus on a specific A-CDN deployment, but little is known about the users and the services that A-CDNs are serving in the Internet at large. This prompted us to perform a passive characterization study, bringing out the principal A-CDN actors in our monitored setup, the services they offer, their penetration, etc. Results show a very heterogeneous picture, with A-CDN empowered services that are very popular (e.g., Twitter or Bing), serve a lot of different contents (e.g., Wordpress or adult content), and even include audio/video streaming (e.g., Soundcloud, or Vine). Our measurements show that the A-CDN technology is quite mature and popular, with more than 50% of web users accessing content served by an A-CDN during peak time.

    @INPROCEEDINGS{AF:TMA-16, title={A First Characterization of Anycast Traffic from Passive Traces}, author={D. {Giordano} and D. {Cicalese} and A. {Finamore} and M. {Mellia} and M. {Munafo} and D. {Joumblatt} and D. {Rossi}}, year={2016}, booktitle={IEEE Network Traffic Measurement and Analysis Conference (TMA)}, location={Louvain La Neuve, Belgium}, howpublished="https://afinamore.io/pubs/TMA16_anycast.pdf" }
2015
  • A Study of the Impact of DNS Resolvers on Performance Using a Causal Approach PDF details
    H. Hours, E. Biersack, P. Loiseau, A. Finamore, M. Mellia
    International Teletraffic Congress (ITC)
    For a user to access any resource on the Internet, it is necessary to first locate a server hosting the requested resource. The Domain Name System service (DNS) represents the first step in this process, translating a human readable name, the resource host name, into an IP address. With the expansion of Content Distribution Networks (CDNs), the DNS service has seen its importance increase. In a CDN, objects are replicated on different servers to decrease the distance from the client to a server hosting the object that needs to be accessed. The DNS service should improve user experience by directing its demand to the optimal CDN server. While most of the Internet Service Providers (ISPs) offer a DNS service to their customers, it is now common to see clients using a public DNS service instead. This choice may have an impact on Web browsing performance. In this paper we study the impact of choosing one DNS service instead of another and we compare the performance of a large European ISP DNS service with the one of a public DNS service, Google DNS. We propose a causal approach to expose the structural dependencies of the different parameters impacted by the DNS service used and we show how to model these dependencies with a Bayesian network. This model allows us to explain and quantify the benefits obtained by clients using their ISP DNS service and to propose a solution to further improve their performance.

    @INPROCEEDINGS{AF:ITC-15, title={A Study of the Impact of DNS Resolvers on Performance Using a Causal Approach}, author={H. {Hours} and E. {Biersack} and P. {Loiseau} and A. {Finamore} and M. {Mellia}}, year={2015}, booktitle={International Teletraffic Congress (ITC)}, location={Ghent, Belgium}, doi={10.1109/ITC.2015.9}, howpublished="https://afinamore.io/pubs/ITC15_DNS.pdf" }
  • Impact of Carrier-Grade NAT on Web Browsing PDF details
    E. Bocchi, A. Safari Khatouni, S. Traverso, A. Finamore, V. Di Gennaro, M. Mellia, M. Munafò, D. Rossi
    IEEE International Wireless Communications and Mobile Computing Conference (IWCMC)
    Public IPv4 addresses are a scarce resource. While IPv6 adoption is lagging, Network Address Translation (NAT) technologies have been deployed over the last years to alleviate IPv4 exiguity and their high rental cost. In particular, Carrier-Grade NAT (CGN) is a well known solution to mask a whole ISP network behind a limited amount of public IP addresses, significantly reducing expenses.

    @INPROCEEDINGS{AF:IWCMC-15, title={Impact of Carrier-Grade NAT on web browsing}, author={E. {Bocchi} and A. S. {Khatouni} and S. {Traverso} and A. {Finamore} and V. {Di Gennaro} and M. {Mellia} and M. {Munafò} and D. {Rossi}}, year={2015}, booktitle={IEEE International Wireless Communications and Mobile Computing Conference (IWCMC)}, location={Dubrovnik, Croatia}, pages={532-537}, doi={10.1109/IWCMC.2015.7289140}, howpublished="https://afinamore.io/pubs/TRAC15_CGNAT.pdf" }
  • Macroscopic View of Malware in Home Networks PDF details
    A. Finamore, S. Saha, G. Modelo-Howard, S.J. Lee, E. Bocchi, L. Grimaudo, M. Mellia, E. Baralis
    IEEE Consumer Communications and Networking Conference (CCNC)
    Malicious activities on the Web are increasingly threatening users in the Internet. Home networks are one of the prime targets of the attackers to host malware, commonly exploited as a stepping stone to further launch a variety of attacks. Due to diversification, existing security solutions often fail to detect malicious activities that remain hidden and pose threats to users' security and privacy. Characterizing behavioral patterns of known malware can help to improve the classification accuracy of threats. More importantly, as different malware might share commonalities, studying the behavior of known malware could help the detection of previously unknown malicious activities. We pose the research question if it is possible to characterize such behavioral patterns analyzing the traffic from known infected clients. We present our quest to discover such characterizations. Results show that commonalities arise but their identification may require some ingenuity. We also present our discovery of malicious activities that were left undetected by commercial IDS.

    @INPROCEEDINGS{AF:CCNC-15, title={Macroscopic view of malware in home networks}, author={A. {Finamore} and S. {Saha} and G. {Modelo-Howard} and S. {Lee} and E. {Bocchi} and L. {Grimaudo} and M. {Mellia} and E. {Baralis}}, year={2015}, booktitle={IEEE Consumer Communications and Networking Conference (CCNC)}, location={Las Vegas, NV, USA}, doi={10.1109/CCNC.2015.7157987}, howpublished="https://afinamore.io/pubs/CCNC15-malware.pdf" }
2014
  • The Cost of the 'S' in HTTPS PDF details
    D. Naylor, A. Finamore, I. Leontiadis, Y. Grunenberger, M. Mellia, M. M. Munafo, K. Papagiannaki, P. Steenkiste
    ACM Conference on emerging Networking EXperiments and Technologies (CoNEXT)
    Increased user concern over security and privacy on the Internet has led to widespread adoption of HTTPS, the secure version of HTTP. HTTPS authenticates the communicating end points and provides confidentiality for the ensuing communication. However, as with any security solution, it does not come for free. HTTPS may introduce overhead in terms of infrastructure costs, communication latency, data usage, and energy consumption. Moreover, given the opaqueness of the encrypted communication, any in-network value added services requiring visibility into application layer content, such as caches and virus scanners, become ineffective. This paper attempts to shed some light on these costs. First, taking advantage of datasets collected from large ISPs, we examine the accelerating adoption of HTTPS over the last three years. Second, we quantify the direct and indirect costs of this evolution. Our results show that, indeed, security does not come for free. This work thus aims to stimulate discussion on technologies that can mitigate the costs of HTTPS while still protecting the user's privacy.

    @inproceedings{AF:CONEXT-14, title={The Cost of the "S" in HTTPS}, author={D. {Naylor} and A. {Finamore} and I. {Leontiadis} and Y. {Grunenberger} and M. {Mellia} and M. {Munafo} and K. {Papagiannaki} and P. {Steenkiste}}, year={2014}, booktitle={ACM Conference on Emerging Networking Experiments and Technologies (CoNEXT)}, location={Sydney, Australia}, doi={10.1145/2674005.2674991}, howpublished="https://afinamore.io/pubs/CONEXT15_costofthes.pdf" }
  • Large-Scale Network Traffic Monitoring with DBStream, a System for Rolling Big Data Analysis PDF details
    A. Bar, A. Finamore, P. Casas, L. Golab, M. Mellia
    IEEE International Conference on Big Data (Big Data)
    The complexity of the Internet has rapidly increased, making it more important and challenging to design scalable network monitoring tools. Network monitoring typically requires rolling data analysis, i.e., continuously and incrementally updating (rolling-over) various reports and statistics over high-volume data streams. In this paper, we describe DBStream, which is an SQL-based system that explicitly supports incremental queries for rolling data analysis. We also present a performance comparison of DBStream with a parallel data processing engine (Spark), showing that, in some scenarios, a single DBStream node can outperform a cluster of ten Spark nodes on rolling network monitoring workloads. Although our performance evaluation is based on network monitoring data, our results can be generalized to other Big Data problems with high volume and velocity.

    @INPROCEEDINGS{AF:BIGDATA-14, title={Large-scale network traffic monitoring with DBStream, a system for rolling big data analysis}, author={A. {Bär} and A. {Finamore} and P. {Casas} and L. {Golab} and M. {Mellia}}, year={2014}, booktitle={IEEE International Conference on Big Data (Big Data)}, location={Washington, DC, USA}, doi={10.1109/BigData.2014.7004227}, howpublished="https://afinamore.io/pubs/BIGDATA14_dbstream.pdf" }
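    The notion of rolling data analysis in the abstract above, i.e., updating a statistic incrementally as new records arrive rather than recomputing it from scratch, can be sketched in a few lines of Python. This is not DBStream's interface (DBStream is SQL-based); the `RollingSum` class, its window semantics and the sample values are hypothetical and only illustrate the general idea.

```python
from collections import deque


class RollingSum:
    """Sum of the values seen in the last `window` time units, kept incrementally."""

    def __init__(self, window):
        self.window = window
        self.items = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0

    def add(self, timestamp, value):
        """Add one record (timestamps must be non-decreasing); return the current sum."""
        self.items.append((timestamp, value))
        self.total += value
        # Evict records that left the window instead of rescanning the whole stream.
        while self.items and self.items[0][0] <= timestamp - self.window:
            _, old_value = self.items.popleft()
            self.total -= old_value
        return self.total


# Hypothetical byte counts arriving at increasing timestamps (seconds).
bytes_last_10s = RollingSum(window=10)
for ts, size in [(0, 500), (4, 300), (10, 200), (21, 100)]:
    print(ts, bytes_last_10s.add(ts, size))
```

    Each record is appended once and evicted once, so the cost per update stays constant on average regardless of how long the stream has been running.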
  • Who to Blame when YouTube is not Working? Detecting Anomalies in CDN-Provisioned Services PDF details
    A. D'Alconzo, P. Casas, P. Fiadino, A. Bar, A. Finamore
    IEEE International Wireless Communications and Mobile Computing Conference (IWCMC)
    Internet-scale services like YouTube are provisioned by large Content Delivery Networks (CDNs), which push content as close as possible to the end-users to improve their Quality of Experience (QoE) and to pursue their own optimization goals. Adopting space and time variant traffic delivery policies, CDNs serve users' requests from multiple servers/caches at different physical locations and different times. CDN traffic distribution policies can have a relevant impact on the traffic routed through the Internet Service Provider (ISP), as well as unexpected negative effects on the end-user QoE. In the event of poor QoE due to faulty CDN server selection, a major problem for the ISP is to avoid being blamed by its customers. In this paper we show a real case study in which Google CDN server selection policies negatively impact the QoE of the customers of a major European ISP watching YouTube. We argue that it is extremely important for the ISP to rapidly and automatically detect such events to increase its visibility on the overall operation of the network, as well as to promptly answer possible customer complaints. We therefore present an Anomaly Detection (AD) system for detecting unexpected cache-selection changes in the traffic delivered by CDNs. The proposed algorithm improves over traditional AD approaches by analyzing the complete probability distribution of the monitored features, as well as by self-adapting its functioning to dynamic environments, providing better detection capabilities.

    @INPROCEEDINGS{AF:IWCMC-14a, title={Who to blame when YouTube is not working? detecting anomalies in CDN-provisioned services}, author={A. {D'Alconzo} and P. {Casas} and P. {Fiadino} and A. {Bär} and A. {Finamore}}, year={2014}, booktitle={IEEE International Wireless Communications and Mobile Computing Conference (IWCMC)}, location={Nicosia, Cyprus}, doi={10.1109/IWCMC.2014.6906396}, howpublished="https://afinamore.io/pubs/TRAC14_youtube.pdf" }
  • Energy Efficiency in Access And Aggregation Networks: From Current Traffic to Potential Savings PDF details
    E. Bonetto, A. Finamore, M. Mellia, R. Fiandra
    International Journal of Computer and Telecommunications Networking (ComNet)
    Access and aggregation networks account nowadays for a large share of the consumed energy in communication networks, and actions to ameliorate their energy cost are under investigation by the research community. In this work, we present a study of the possible savings that could be achieved if such technologies were in place. We take advantage of large datasets of measurements collected from the network of FASTWEB, a nationwide Internet Service Provider in Italy. We first perform a detailed characterization of the energy consumption of Points of Presence (PoPs), investigating how factors such as external temperature, cooling technology and traffic load influence the consumed energy. Our measurements precisely quantify how the power consumption in today's networks is practically independent of the traffic volume, while it is correlated only with the external temperature. We then narrow down our analysis to consider the traffic generated by each household. More specifically, by observing about 10,000 ADSL customers, we characterize the typical traffic patterns generated by users who access the Internet. Using the available real data, we thus investigate whether the energy consumption can be significantly reduced by applying simple energy-efficient policies that are currently under study. We investigate energy-to-traffic proportional and resource consolidation technologies for the PoP, while sleep mode policies are considered for the ADSL lines. All these energy-efficient policies, even if they are not yet available, are currently being widely investigated by both manufacturers and researchers. At the PoP level, our dataset shows that it would be possible to save up to 50% of energy, and that even simple mechanisms would easily save 30% of energy. Considering the ADSL lines, we find that sleep mode policies can be effectively implemented, reducing the energy consumption of ADSL modems with little or marginal impact on the Quality of Service offered to users. We make available all datasets used in this paper to allow other researchers to benchmark their proposals considering actual traffic traces.

    @article{AF:COMNET-14, title={Energy efficiency in access and aggregation networks: From current traffic to potential savings}, author={E. {Bonetto} and A. {Finamore} and M. {Mellia} and R. {Fiandra}}, year={2014}, journal={International Journal of Computer and Telecommunications Networking (ComNet)}, pages={151-166}, doi={10.1016/j.comnet.2014.03.008}, howpublished="https://afinamore.io/pubs/COMNET14_energy.pdf" }
  • When YouTube Does Not Work - Analysis of QoE-Relevant Degradation in Google CDN Traffic PDF details
    P. Casas, A. D'Alconzo, P. Fiadino, A. Bar, A. Finamore, T. Zseby
    IEEE Transactions on Network and Service Management (TNSM)
    YouTube is the most popular service in today's Internet. Google relies on its massive content delivery network (CDN) to push YouTube videos as close as possible to the end-users, both to improve their watching experience as well as to reduce the load on the core of the network, using dynamic server selection strategies. However, we show that such a dynamic approach can actually have negative effects on the end-user quality of experience (QoE). Through the comprehensive analysis of one month of YouTube flow traces collected at the network of a large European ISP, we report a real case study in which YouTube QoE-relevant degradation affecting a large number of users occurs as a result of Google's server selection strategies. We present an iterative and structured process to detect, characterize, and diagnose QoE-relevant anomalies in CDN distributed services such as YouTube. The overall process uses statistical analysis methodologies to unveil the root causes behind automatically detected problems linked to the dynamics of CDNs' server selection strategies.

    @ARTICLE{AF:TNSM-14, title={When YouTube Does not Work—Analysis of QoE-Relevant Degradation in Google CDN Traffic}, author={P. {Casas} and A. {D'Alconzo} and P. {Fiadino} and A. {Bär} and A. {Finamore} and T. {Zseby}}, year={2014}, journal={IEEE Transactions on Network and Service Management (TNSM)}, number={4}, pages={441-457}, doi={10.1109/TNSM.2014.2377691}, howpublished="https://afinamore.io/pubs/TNSM14_youtube.pdf" }
  • On the Analysis of QoE-Based Performance Degradation in YouTube Traffic PDF details
    P. Casas, A. D'Alconzo, P. Fiadino, A. Bar, A. Finamore
    IEEE International Conference on Network and Service Management (CNSM)
    YouTube is the most popular service in today's Internet. Google relies on its massive Content Delivery Network (CDN) to push YouTube videos as close as possible to the end-users to improve their Quality of Experience (QoE), using dynamic server selection strategies. Such traffic delivery policies can have a relevant impact on the traffic routed through the Internet Service Providers (ISPs) providing the access, but most importantly, they can have negative effects on the end-user QoE. In this paper we shed light on the problem of diagnosing QoE-based performance degradation events in YouTube's traffic. Through the analysis of one month of YouTube flow traces collected at the network of a large European ISP, we identify and drill down into a Google CDN server selection policy that negatively impacted the watching experience of YouTube users during several days at peak-load times. The analysis combines both the user-side perspective and the CDN perspective of the end-to-end YouTube delivery service to diagnose the problem. The main contributions of the paper are threefold. First, we provide a large-scale characterization of the YouTube service in terms of traffic characteristics and provisioning behavior of the Google CDN servers. Second, we introduce simple yet effective QoE-based KPIs to monitor YouTube videos from the end-user perspective. Finally, and most importantly, we analyze and provide evidence of the occurrence of QoE-based YouTube anomalies induced by CDN server selection policies, which are normally hidden from the end-user. This is a major issue for ISPs, who see their reputation degrade when such events occur, even if Google is the culprit.

    @INPROCEEDINGS{AF:CNSM-14, title={On the analysis of QoE-based performance degradation in YouTube traffic}, author={P. {Casas} and A. {D'Alconzo} and P. {Fiadino} and A. {Bär} and A. {Finamore}}, year={2014}, booktitle={IEEE International Conference on Network and Service Management (CNSM)}, location={Rio de Janeiro, Brazil}, pages={1-9}, doi={10.1109/CNSM.2014.7014135}, howpublished="https://afinamore.io/pubs/%3Cnil%3E" }
  • YouTube All Around: Characterizing YouTube From Mobile and Fixed-Line Network Vantage Points PDF details
    P. Casas, P. Fiadino, A. Bar, A. D'Alconzo, A. Finamore, M. Mellia
    IEEE European Conference on Networks and Communications (EuCNC)
    YouTube is the most popular service in today's Internet. Its own success forces Google to constantly evolve its functioning to cope with the ever growing number of users watching YouTube. Understanding the characteristics of YouTube's traffic as well as the way YouTube flows are served from the massive Google CDN is paramount for ISPs, especially for mobile operators, who must handle the huge surge of traffic with the capacity constraints of mobile networks. This paper presents a characterization of the YouTube traffic accessed through mobile and fixed-line networks. The analysis especially considers the YouTube content provisioning, studying the characteristics of the hosting servers as seen from both types of networks. To the best of our knowledge, this is the first paper presenting such a simultaneous characterization from mobile and fixed-line vantage points.

    @INPROCEEDINGS{AF:EUCNC-14, title={YouTube all around: Characterizing YouTube from mobile and fixed-line network vantage points}, author={P. {Casas} and P. {Fiadino} and A. {Bär} and A. {D'Alconzo} and A. {Finamore} and M. {Mellia}}, year={2014}, booktitle={European Conference on Networks and Communications (EuCNC)}, location={Bologna, Italy}, doi={10.1109/EuCNC.2014.6882697}, howpublished="https://afinamore.io/pubs/EUCNC14_youtube.pdf" }
  • DBStream: An Online Aggregation, Filtering and Processing System for Network Traffic Monitoring PDF details
    A. Bar, P. Casas, L. Golab, A. Finamore
    IEEE International Wireless Communications and Mobile Computing Conference (IWCMC)
    Network traffic monitoring systems generate high volumes of heterogeneous data streams which have to be processed and analyzed with different time constraints for daily network management operations. Some monitoring applications such as anomaly detection, performance tracking and alerting require fast processing of specific incoming real-time data. Other applications like fault diagnosis and trend analysis need to process historical data and perform deep analysis on generally heterogeneous sources of data. The Data Stream Warehousing (DSW) paradigm provides the means to handle both types of monitoring applications within a single system, providing fast and rich data analysis capabilities as well as data persistence. In this paper, we introduce DBStream, a novel online traffic monitoring system based on the DSW paradigm, which allows fast and flexible analysis across multiple heterogeneous data sources. DBStream provides a novel stream processing language for implementing data processing modules, as well as aggregation, filtering, and storage capabilities for further data analysis. We show multiple traffic monitoring applications running on DBStream, processing real traffic from operational ISPs.

    @INPROCEEDINGS{AF:IWCMC-14b, title={DBStream: An online aggregation, filtering and processing system for network traffic monitoring}, author={A. {Bär} and P. {Casas} and L. {Golab} and A. {Finamore}}, year={2014}, booktitle={International Wireless Communications and Mobile Computing Conference (IWCMC)}, location={Nicosia, Cyprus}, doi={10.1109/IWCMC.2014.6906426}, howpublished="https://afinamore.io/pubs/TRAC14_dbstream.pdf" }
  • On the Detection of Network Traffic Anomalies in Content Delivery Network Services PDF details
    P. Fiadino, A. D'Alconzo, A. Bar, A. Finamore, P. Casas
    IEEE International Teletraffic Congress (ITC)
    Today's Internet traffic is largely dominated by major content providers and highly distributed Content Delivery Networks (CDNs). Internet-scale applications like Facebook and YouTube are served by large CDNs like Akamai and Google CDN, which push content as close to end-users as possible to improve the overall performance of the applications, minimize the effects of peering point congestion and enhance the user experience. The load is balanced among multiple servers or caches according to non-disclosed CDN internal policies. As such, adopting space and time variant policies, users' requests are served from different physical locations at different times. Cache selection and load balancing policies can have a relevant impact on the traffic routed by the underlying transport network, as well as on the end-user experience. In this paper, we analyze the provisioning of two major Internet applications, namely Facebook and YouTube, in two datasets collected at major European Internet Service Providers (ISPs). First, we show how the cache selection performed by Akamai might result in higher transport costs for the ISP. Second, we present evidence on large-scale outages occurring in the Facebook traffic distribution. Finally, we characterize the variation of YouTube cache selection strategies and their impact on the users' quality of experience. We argue that it is important for the ISP to rapidly and automatically detect such events. Therefore, we present an Anomaly Detection (AD) system for detecting unexpected cache-selection events and changes in the traffic delivered by CDNs. The proposed algorithm improves over traditional AD approaches by analyzing the complete probability distribution of the monitored features, providing higher visibility and better detection capabilities.

    @INPROCEEDINGS{AF:ITC-14, title={On the detection of network traffic anomalies in content delivery network services}, author={P. {Fiadino} and A. {D'Alconzo} and A. {Bär} and A. {Finamore} and P. {Casas}}, year={2014}, booktitle={IEEE International Teletraffic Congress (ITC)}, location={Karlskrona, Sweden}, doi={10.1109/ITC.2014.6932930}, howpublished="https://afinamore.io/pubs/ITC14_anomalies.pdf" }
  • Gold Mining in a River of Internet Content Traffic PDF details
    Z. Ben-Houidi, G. Scavo, S. Ghamri-Doudane, A. Finamore, S. Traverso, M. Mellia
    TMA
    With the advent of Over-The-Top content providers (OTTs), Internet Service Providers (ISPs) saw their portfolio of services shrink to the low margin role of data transporters. In order to counter this effect, some ISPs started to follow big OTTs like Facebook and Google in trying to turn their data into a valuable asset. In this paper, we explore the questions of what meaningful information can be extracted from network data, and what interesting insights it can provide. To this end, we tackle the first challenge of detecting “user-URLs”, i.e., those links that were clicked by users as opposed to those objects automatically downloaded by browsers and applications. We devise algorithms to pinpoint such URLs, and validate them on manually collected ground truth traces. We then apply them on a three-day long traffic trace spanning more than 19,000 residential users that generated around 190 million HTTP transactions. We find that only 1.6% of these observed URLs were actually clicked by users. As a first application for our methods, we answer the question of which platforms participate most in promoting the Internet content. Surprisingly, we find that, despite its notoriety, only 11% of the user URL visits are coming from Google Search.

    @INPROCEEDINGS{AF:TMA-14, title={Gold Mining in a River of Internet Content Traffic}, author={Z. {Ben-Houidi} and G. {Scavo} and S. {Ghamri-Doudane} and A. {Finamore} and S. {Traverso} and M. {Mellia}}, year={2014}, booktitle={Traffic Monitoring and Analysis Conference (TMA)}, location={London, UK}, doi={10.1007/978-3-642-54999-1_8}, howpublished="https://afinamore.io/pubs/TMA14_content-curation.pdf" }
2013
  • Is There a Case for Mobile Phone Content Pre-staging? PDF details
    A. Finamore, M. Mellia, Z. Gilani, K. Papagiannaki, V. Erramilli, Y. Grunenberger
    ACM Conference on emerging Networking EXperiments and Technologies (CoNEXT)
    Content caching is a fundamental building block of the Internet. Caches are widely deployed at network edges to improve performance for end-users, and to reduce load on web servers and the backbone network. Considering mobile 3G/4G networks, however, the bottleneck is at the access link, where bandwidth is shared among all mobile terminals. As such, per-user capacity cannot grow to cope with the traffic demand. Unfortunately, caching policies would not reduce the load on the wireless link, which would have to carry multiple copies of the same object that is being downloaded by multiple mobile terminals sharing the same access link. In this paper we investigate whether it is worth pushing the caching paradigm even farther. We hypothesize a system in which mobile terminals implement a local cache, where popular content can be pushed/pre-staged. This exploits the peculiar broadcast capability of the wireless channels to replicate content "for free" on all terminals, saving the cost of transmitting multiple copies of those popular objects. Relying on a large data set collected from a European mobile carrier, we analyse the content popularity characteristics of mobile traffic, and quantify the benefit that the push-to-mobile system would produce. We found that content pre-staging, by proactively and periodically broadcasting "bundles" of popular objects to devices, can both i) greatly improve users' performance and ii) reduce the downloaded volume (number of requests) by up to 20% (40%) in optimistic scenarios with a bundle of 100 MB. However, some technical constraints and content characteristics could call into question the actual gain such a system would reach in practice.

    @inproceedings{AF:CONEXT-13, title={Is There a Case for Mobile Phone Content Pre-Staging?}, author={A. {Finamore} and M. {Mellia} and Z. {Gilani} and K. {Papagiannaki} and V. {Erramilli} and Y. {Grunenberger}}, year={2013}, booktitle={ACM Conference on Emerging Networking Experiments and Technologies (CoNEXT)}, location={Santa Barbara, California, USA}, doi={10.1145/2535372.2535414}, howpublished="https://afinamore.io/pubs/CONEXT13_prestaging.pdf" }
  • When Assistance Becomes Dependence: Characterizing the Costs and Inefficiencies of A-GPS PDF details
    N. Vallina-Rodriguez, J. Crowcroft, A. Finamore, Y. Grunenberger, K. Papagiannaki
    ACM SIGMOBILE Mobile Computing and Communications Review (MCCR)
    Location based services are a vital component of the mobile ecosystem. Among all the location technologies used behind the scenes, A-GPS (Assisted-GPS) is considered to be the most accurate. Unlike standalone GPS systems, A-GPS uses network support to speed up the position fix. However, it can be a dangerous strategy due to varying cell conditions which may impair performance, sometimes potentially negating the expected benefits of the original design. We present the characterization of the accuracy, location acquisition speed, energy cost, and network dependency of the state-of-the-art A-GPS receivers shipped in popular mobile devices. Our analysis is based on active measurements, an exhaustive on-device analysis, and cellular traffic trace processing. The results reveal a number of inefficiencies as a result of the strong dependence on the cellular network to obtain assisting data, implementation, and integration problems.

    @article{AF:MCCR-13, title={When Assistance Becomes Dependence: Characterizing the Costs and Inefficiencies of A-GPS}, author={N. {Vallina-Rodriguez} and J. {Crowcroft} and A. {Finamore} and Y. {Grunenberger} and K. {Papagiannaki}}, year={2013}, journal={SIGMOBILE Mobile Computing and Communications Review (MCCR)}, number={4}, pages={3–14}, doi={10.1145/2557968.2557970}, howpublished="https://afinamore.io/pubs/MC2R13_agps.pdf" }
  • I Tube, YouTube, P2PTube: Assessing ISP Benefits of Peer-Assisted Caching of YouTube Content (poster) PDF details
    Y. Nicolas, D. Wolff, D. Rossi, A. Finamore
    IEEE International Conference on Peer-to-Peer Computing (P2P)
    The last few years have seen an explosion of video on demand traffic carried over the Internet infrastructure. While P2P applications have been proposed to carry VoD and TV content, they have so far encountered limited adoption except in Asian countries. Part of the reason is that (i) the current asymmetric network infrastructure does not offer enough system capacity to let a fully P2P VoD/TV system be self-sustainable, (ii) the actual capacity at nominal peers is often smaller than the available one due to inefficiency in NAT punching[1], and (iii) the non-elastic nature of the service makes the system inherently less robust than elastic file-sharing to dynamic changes in the instantaneously available bandwidth. The other part of the story can be summarized with the success of CDN-managed services such as Netflix, Hulu and especially YouTube - according to [2], about 3 billion YouTube videos are viewed and hundreds of thousands of videos are uploaded every day, with independent research confirming YouTube to represent 20-30% of ISPs' incoming traffic[3].

    @INPROCEEDINGS{AF:P2P-13, title={I Tube, YouTube, P2PTube: Assessing ISP benefits of peer-assisted caching of YouTube content}, author={Y. {Nicolas} and D. {Wolff} and D. {Rossi} and A. {Finamore}}, year={2013}, booktitle={IEEE International Conference on Peer-to-Peer Computing (P2P)}, location={Trento, Italy}, doi={10.1109/P2P.2013.6688724}, howpublished="https://afinamore.io/pubs/P2P13_poster.pdf" }
  • Reviewing Traffic Classification PDF details
    S. Valenti, D. Rossi, A. Dainotti, A. Pescape, A. Finamore, M. Mellia
    Data Traffic Monitoring and Analysis [book]
    Traffic classification has received increasing attention in the last years. It aims at offering the ability to automatically recognize the application that has generated a given stream of packets from the direct and passive observation of the individual packets, or stream of packets, flowing in the network. This ability is instrumental to a number of activities that are of extreme interest to carriers, Internet service providers and network administrators in general. Indeed, traffic classification is the basic block that is required to enable any traffic management operations, from differentiating traffic pricing and treatment (e.g., policing, shaping, etc.), to security operations (e.g., firewalling, filtering, anomaly detection, etc.). Up to a few years ago, almost any Internet application was using well-known transport layer protocol ports that easily allowed its identification. More recently, the number of applications using random or non-standard ports has dramatically increased (e.g. Skype, BitTorrent, VPNs, etc.). Moreover, network applications are often configured to use well-known protocol ports assigned to other applications (e.g. TCP port 80 originally reserved for Web traffic) attempting to disguise their presence. For these reasons, and for the importance of correctly classifying traffic flows, novel approaches based respectively on packet inspection, statistical and machine learning techniques, and behavioral methods have been investigated and are becoming standard practice. In this chapter, we discuss the main trends in the field of traffic classification and we describe some of the main proposals of the research community. We complete this chapter by developing two examples of behavioral classifiers: both use supervised machine learning algorithms for classification, but each is based on different features to describe the traffic. After presenting them, we compare their performance using a large dataset, showing the benefits and drawbacks of each approach.

    @INPROCEEDINGS{AF:book-13, title={Reviewing Traffic Classification}, author={S. {Valenti} and D. {Rossi} and A. {Dainotti} and A. {Pescapè} and A. {Finamore} and M. {Mellia}}, year={2013}, booktitle={Lecture Notes in Computer Science}, doi={10.1007/978-3-642-36784-7_6}, howpublished="https://afinamore.io/pubs/%3Cnil%3E" }
  • Public DNS Resolvers: Friends or Foes PDF details
    A. Finamore, I. Bermudez, M. Mellia
    CoRR/arXiv

    , title=, author=, year=, doi=, howpublished="https://afinamore.io/pubs/techreport13_dns.pdf" }
2012
  • Characterization of ISP Traffic: Trends, User Habits, and Access Technology Impact PDF details
    J. L. Garcia-Dorado, A. Finamore, M. Mellia, M. Meo, M. M. Munafo
    IEEE Transactions on Network and Service Management (TNSM)
    In recent years, the research community has increased its focus on network monitoring, which is seen as a key tool to understand the Internet and the Internet users. Several studies have presented a deep characterization of a particular application, or a particular network, considering the point of view of either the ISP, or the Internet user. In this paper, we take a different perspective. We focus on three European countries where we have been collecting traffic for more than a year and a half through 5 vantage points with different access technologies. This humongous amount of information allows us not only to provide precise, multiple, and quantitative measurements of "what users do with the Internet" in each country but also to identify common/uncommon patterns and habits across different countries. Considering different time scales, we start by presenting the trend of application popularity; then we focus our attention on a one-month long period, and further drill into a typical daily characterization of users' activity. Results depict an evolving scenario due to the consolidation of new services such as Video Streaming and File Hosting and to the adoption of new P2P technologies. Despite the heterogeneity of the users, some common tendencies emerge that can be leveraged by the ISPs to improve their service.

    @ARTICLE{AF:TNSM-12, title={Characterization of ISP Traffic: Trends, User Habits, and Access Technology Impact}, author={J. L. {Garcia-Dorado} and A. {Finamore} and M. {Mellia} and M. {Meo} and M. {Munafo}}, year={2012}, journal={IEEE Transactions on Network and Service Management (TNSM)}, number={2}, pages={142-155}, doi={10.1109/TNSM.2012.022412.110184}, howpublished="https://afinamore.io/pubs/TON12_charisp.pdf" }
  • Breaking for Commercials: Characterizing Mobile Advertising PDF details
    N. Vallina-Rodriguez, J. Shah, A. Finamore, Y. Grunenberger, K. Papagiannaki, H. Haddadi, J. Crowcroft
    IMC
    Mobile phones and tablets can be considered as the first incarnation of the post-PC era. Their explosive adoption rate has been driven by a number of factors, with the most significant influence being applications (apps) and app markets. Individuals and organizations are able to develop and publish apps, and the most popular form of monetization is mobile advertising. The mobile advertisement (ad) ecosystem has been the target of prior research, but these works typically focused on a small set of apps or are from a user privacy perspective. In this work we make use of a unique, anonymized data set corresponding to one day of traffic for a major European mobile carrier with more than 3 million subscribers. We further take a principled approach to characterize mobile ad traffic along a number of dimensions, such as overall traffic, frequency, as well as possible implications in terms of energy on a mobile device. Our analysis demonstrates a number of inefficiencies in today's ad delivery. We discuss the benefits of well-known techniques, such as pre-fetching and caching, to limit the energy and network signalling overhead caused by current systems. A prototype implementation on Android devices demonstrates an improvement of 50% in terms of energy consumption for offline ad-sponsored apps while limiting the amount of ad related traffic.

    @inproceedings{AF:IMC-12, title={Breaking for Commercials: Characterizing Mobile Advertising}, author={Vallina-Rodriguez, Narseo and Shah, Jay and Finamore, Alessandro and Grunenberger, Yan and Papagiannaki, Konstantina and Haddadi, Hamed and Crowcroft, Jon}, year={2012}, booktitle={ACM Internet Measurement Conference (IMC)}, location={Boston, Massachusetts, USA}, doi={10.1145/2398776.2398812}, howpublished="https://afinamore.io/pubs/IMC12_mobileads.pdf" }
  • Uncovering the Big Players of the Web PDF details
    V. Gehlen, A. Finamore, M. Mellia, M. M. Munafo
    IEEE Traffic Monitoring and Analysis Conference (TMA)
    In this paper we aim at observing how today's large Internet organizations deliver web content to end users. Using one-week-long data sets collected at three vantage points aggregating more than 30,000 Internet customers, we characterize the offered services, precisely quantifying and comparing the performance of different players. Results show that today 65% of the web traffic is handled by the top 10 organizations. We observe that, while all of them serve the same type of content, different server architectures have been adopted considering load balancing schemes, number of servers and location: some organizations handle thousands of servers with the closest being a few milliseconds away from the end user, while others manage few data centers. Despite this, the bulk transfer rate offered to end users is typically good, but impairments can arise when content is not readily available at the server and has to be retrieved from the CDN back-end.

    @inproceedings{AF:TMA-12, title={Uncovering the Big Players of the Web}, author={V. {Gehlen} and A. {Finamore} and M. {Mellia} and M. M. {Munafo}}, year={2012}, booktitle={Traffic Monitoring and Analysis Conference (TMA)}, location={Boston, Massachusetts, USA}, doi={10.1007/978-3-642-28534-9_2}, howpublished="https://afinamore.io/pubs/TMA12_bigplayers.pdf" }
2011
  • Experiences of Internet Traffic Monitoring with Tstat PDF details
    A. Finamore, M. Mellia, M. Meo, M. M. Munafo, D. Rossi
    IEEE Network magazine
    Since the early days of the Internet, network traffic monitoring has always played a strategic role in understanding and characterizing users' activities. In this article, we present our experience in engineering and deploying Tstat, an open source passive monitoring tool that has been developed in the past 10 years. Started as a scalable tool to continuously monitor packets that flow on a link, Tstat has evolved into a complex application that gives network researchers and operators the possibility to derive extended and complex measurements thanks to advanced traffic classifiers. After discussing Tstat capabilities and internal design, we present some examples of measurements collected deploying Tstat at the edge of several ISP networks in past years. While other works report a continuous decline of P2P traffic with streaming and file hosting services rapidly increasing in popularity, the results presented in this article picture a different scenario. First, P2P decline has stopped, and in the last months of 2010 there was a counter tendency to increase P2P traffic over UDP, so the common belief that UDP traffic is negligible is not true anymore. Furthermore, streaming and file hosting applications have either stabilized or are experiencing decreasing traffic shares. We then discuss the scalability issues software-based tools have to cope with when deployed in real networks, showing the importance of properly identifying bottlenecks.

    @ARTICLE{AF:IEEENET-11, title={Experiences of Internet Traffic Monitoring with Tstat}, author={A. {Finamore} and M. {Mellia} and M. {Meo} and M. M. {Munafo} and D. {Rossi}}, year={2011}, journal={IEEE Network magazine}, number={3}, pages={8-14}, doi={10.1109/MNET.2011.5772055}, howpublished="https://afinamore.io/pubs/IEEENET11_tstat.pdf" }
  • Dissecting Video Server Selection Strategies in the YouTube CDN PDF details
    R. Torres, A. Finamore, J. Ryong Kim, M. Mellia, M. M. Munafo, S. G. Rao
    IEEE International Conference on Distributed Computing Systems (ICDCS)
    In this paper, we conduct a detailed study of the YouTube CDN with a view to understanding the mechanisms and policies used to determine which data centers users download video from. Our analysis is conducted using week-long datasets simultaneously collected from the edge of five networks - two university campuses and three ISP networks - located in three different countries. We employ state-of-the-art delay-based geolocation techniques to find the geographical location of YouTube servers. A unique aspect of our work is that we perform our analysis on groups of related YouTube flows. This enables us to infer key aspects of the system design that would be difficult to glean by considering individual flows in isolation. Our results reveal that while the RTT between users and data centers plays a role in the video server selection process, a variety of other factors may influence this selection including load-balancing, diurnal effects, variations across DNS servers within a network, limited availability of rarely accessed video, and the need to alleviate hot-spots that may arise due to popular video content.

    @INPROCEEDINGS{AF:CDCS-11, title={Dissecting Video Server Selection Strategies in the YouTube CDN}, author={R. {Torres} and A. {Finamore} and J. R. {Kim} and M. {Mellia} and M. M. {Munafo} and S. {Rao}}, year={2011}, booktitle={IEEE International Conference on Distributed Computing Systems (ICDCS)}, location={Minneapolis, MN, USA}, doi={10.1109/ICDCS.2011.43}, howpublished="https://afinamore.io/pubs/ICDS11_youtube.pdf" }
  • YouTube Everywhere: Impact of Device and Infrastructure Synergies on User Experience PDF details
    A. Finamore, M. Mellia, M. M. Munafo, R. Torres, S. G. Rao
    ACM Internet Measurement Conference (IMC)
    In this paper we present a complete measurement study that compares YouTube traffic generated by mobile devices (smart-phones, tablets) with traffic generated by common PCs (desktops, notebooks, netbooks). We investigate the users' behavior and correlate it with the system performance. Our measurements are performed using unique data sets which are collected from vantage points in nation-wide ISPs and University campuses from two countries in Europe and the U.S. Our results show that the user access patterns are similar across a wide range of user locations, access technologies and user devices. Users stick with default player configurations, e.g., not changing video resolution or rarely enabling full screen playback. Furthermore, it is very common that users abort video playback, with 60% of videos watched for no more than 20% of their duration. We show that the YouTube system is highly optimized for PC access and leverages aggressive buffering policies to guarantee excellent video playback. This, however, causes 25%-39% of data to be unnecessarily transferred, since users abort the playback very early. This waste of data transferred is even higher when mobile devices are considered. The limited storage offered by those devices makes the video download more complicated and overall less efficient, so that clients typically download more data than the actual video size. Overall, this result calls for better system optimization for both PC and mobile access.

    @inproceedings{AF:IMC-11, title={YouTube Everywhere: Impact of Device and Infrastructure Synergies on User Experience}, author={A. {Finamore} and M. {Mellia} and M. {Munafo} and R. {Torres} and S. {Rao}}, year={2011}, booktitle={ACM Internet Measurement Conference (IMC)}, location={Berlin, Germany}, doi={10.1145/2068816.2068849}, howpublished="https://afinamore.io/pubs/IMC11_youtube.pdf" }
  • Mining Unclassified Traffic Using Automatic Clustering Techniques PDF details
    A. Finamore, M. Mellia, M. Meo
    IEEE Traffic Monitoring and Analysis Conference (TMA)
    In this paper we present a fully unsupervised algorithm to identify classes of traffic inside an aggregate. The algorithm leverages the K-means clustering algorithm, augmented with a mechanism to automatically determine the number of traffic clusters. The signatures used for clustering are statistical representations of the application layer protocols. The proposed technique is extensively tested considering UDP traffic traces collected from operative networks. Performance tests show that it can cluster the traffic into a few tens of pure clusters, achieving an accuracy above 95%. Results are promising and suggest that the proposed approach might effectively be used for automatic traffic monitoring, e.g., to identify the birth of new applications and protocols, or the presence of anomalous or unexpected traffic.

    @inproceedings{AF:TMA-11, title={Mining Unclassified Traffic Using Automatic Clustering Techniques}, author={A. {Finamore} and M. {Mellia} and M. {Meo}}, year={2011}, booktitle={IEEE Traffic Monitoring and Analysis Conference (TMA)}, location={Wien, Austria}, doi={10.1007/978-3-642-20305-3_13}, howpublished="https://afinamore.io/pubs/TMA11_clustering.pdf" }
2010
  • KISS: Stochastic Packet Inspection Classifier for UDP Traffic PDF details
    A. Finamore, M. Mellia, M. Meo, D. Rossi
    IEEE/ACM Transactions on Networking (ToN)
    This paper proposes KISS, a novel Internet classification engine. Motivated by the expected rise of UDP traffic, which stems from the momentum of Peer-to-Peer (P2P) streaming applications, we propose a novel classification framework that leverages statistical characterization of payload. Statistical signatures are derived by means of a Chi-Square (χ²)-like test, which extracts the protocol “format,” but ignores the protocol “semantic” and “synchronization” rules. The signatures feed a decision process based either on the geometric distance among samples, or on Support Vector Machines. KISS is very accurate, and its signatures are intrinsically robust to packet sampling, reordering, and flow asymmetry, so that it can be used on almost any network. KISS is tested in different scenarios, considering traditional client-server protocols, VoIP, and both traditional and new P2P Internet applications. Results are astonishing. The average True Positive percentage is 99.6%, with the worst case equal to 98.1%, while results are almost perfect when dealing with new P2P streaming applications.

    @ARTICLE{AF:ToN-10, title={KISS: Stochastic Packet Inspection Classifier for UDP Traffic}, author={A. {Finamore} and M. {Mellia} and M. {Meo} and D. {Rossi}}, year={2010}, journal={IEEE/ACM Transactions on Networking (ToN)}, number={5}, pages={1505-1515}, doi={10.1109/TNET.2010.2044046}, howpublished="https://afinamore.io/pubs/TON10_kiss.pdf" }
  • Comparing P2PTV Traffic Classifiers PDF details
    N. Cascarano, F. Risso, A. Este, F. Gringoli, L. Salgarelli, A. Finamore, M. Mellia
    International Conference on Communications (ICC)
    Peer-to-Peer IP Television (P2PTV) applications represent one of the fastest growing application classes on the Internet, both in terms of their popularity and in terms of the amount of traffic they generate. While network operators require monitoring tools that can effectively analyze the traffic produced by these systems, few techniques have been tested on these mostly closed-source, proprietary applications. In this paper we examine the properties of three traffic classifiers applied to the problem of identifying P2PTV traffic. We report on extensive experiments conducted on traffic traces with reliable ground truth information, highlighting the benefits and shortcomings of each approach. The results show that not only their performance in terms of accuracy can vary significantly, but also that their usability features suggest different effective aspects that can be integrated.

    @INPROCEEDINGS{AF:ICC-10, title={Comparing P2PTV Traffic Classifiers}, author={N. {Cascarano} and F. {Risso} and A. {Este} and F. {Gringoli} and L. {Salgarelli} and A. {Finamore} and M. {Mellia}}, year={2010}, booktitle={International Conference on Communications (ICC)}, location={Cape Town, South Africa}, doi={10.1109/ICC.2010.5501744}, howpublished="https://afinamore.io/pubs/ICC10_ciscorfp.pdf" }
  • Stochastic Packet Inspection for TCP Traffic PDF details
    G. La Mantia, D. Rossi, A. Finamore, M. Mellia, M. Meo
    IEEE International Conference on Communications (ICC)
    In this paper, we extend the concept of Stochastic Packet Inspection (SPI) to support TCP traffic classification. SPI is a method based on the statistical fingerprint of the application-layer headers: by characterizing the frequencies of observed symbols, SPI can identify application protocol formats by automatically recognizing groups of bits that, e.g., take constant values, take random values, or are part of a counter. To correctly characterize symbol frequencies, SPI needs volumes of traffic to obtain statistically significant signatures. Earlier proposed for UDP traffic, SPI has to be modified to cope with the connection-oriented service offered by TCP, in which application-layer headers are only found at the beginning of a TCP connection. In this paper, we extend SPI to support TCP traffic, and analyze its performance on real network data. The key idea is to move the classification target from single flows to endpoints, which aggregate all traffic sent/received by the same IP address and TCP port pair. The first few packets of flows sent from (or destined to) the same endpoint are then aggregated to yield a single SPI signature. Results show that SPI is able to achieve remarkably good results, with an average true positive rate of about 98%.

    @INPROCEEDINGS{AF:ICC-10b, title={Stochastic Packet Inspection for TCP Traffic}, author={G. {La Mantia} and D. {Rossi} and A. {Finamore} and M. {Mellia} and M. {Meo}}, year={2010}, booktitle={IEEE International Conference on Communications (ICC)}, location={Cape Town, South Africa}, doi={10.1109/ICC.2010.5502280}, howpublished="https://afinamore.io/pubs/ICC10_kiss.pdf" }
  • Kiss to Abacus: A Comparison of P2P-TV Traffic Classifiers PDF details
    A. Finamore, M. Meo, D. Rossi, S. Valenti
    IEEE Traffic Monitoring and Analysis Conference (TMA)
    In the last few years the research community has proposed several techniques for network traffic classification. While the performance of these methods is promising, especially for specific classes of traffic and particular network conditions, the lack of accurate comparisons among them makes it difficult to choose between them and find the most suitable technique for given needs. Motivated also by the increase of P2P-TV traffic, this work compares Abacus, a novel behavioral classification algorithm specific for P2P-TV traffic, and Kiss, an extremely accurate statistical payload-based classifier. We first evaluate their performance on a common set of traces and later we analyze their requirements in terms of both memory occupation and CPU consumption. Our results show that the behavioral classifier can be as accurate as the payload-based one, with a substantial gain in terms of computational cost, although it can deal only with a very specific type of traffic.

    @INPROCEEDINGS{AF:TMA-10, title={Kiss to Abacus: A Comparison of P2P-TV Traffic Classifiers}, author={A. {Finamore} and M. {Meo} and D. {Rossi} and S. {Valenti}}, year={2010}, booktitle={IEEE Traffic Monitoring and Analysis Conference (TMA)}, location={Zurich, Switzerland}, doi={10.1007/978-3-642-12365-8_9}, howpublished="https://afinamore.io/pubs/TMA10_kiss-to-abacus.pdf" }
  • Live Traffic Monitoring with Tstat: Capabilities and Experiences PDF details
    A. Finamore, M. Mellia, M. Meo, M. M. Munafo, D. Rossi
    Wired/Wireless Internet Communications (WWIC)
    Network monitoring has always played a key role in understanding telecommunication networks since the pioneering time of the Internet. Today, monitoring traffic has become a key element to characterize network usage and users’ activities, to understand how complex applications work, to identify anomalous or malicious behaviors, etc. In this paper we present our experience in engineering and deploying Tstat, a passive monitoring tool that has been developed in the past ten years. Started as a scalable tool to continuously monitor packets that flow on a link, Tstat has evolved into a complex application that gives network researchers and operators the possibility to derive extended and complex measurements. Tstat offers the capability to track traffic flows, integrates advanced behavioral classifiers that identify the application that has generated a flow, and automatically derives performance indexes that make it easy to characterize both network usage and users’ activity. After describing Tstat capabilities and internal design, we present some examples of measurements collected deploying Tstat at the edge of our campus network over the past years.

    @INPROCEEDINGS{AF:WWIC-10, title={Live Traffic Monitoring with Tstat: Capabilities and Experiences}, author={A. {Finamore} and M. {Mellia} and M. {Meo} and M. {Munafo} and D. {Rossi}}, year={2010}, booktitle={Wired/Wireless Internet Communications (WWIC)}, location={Lulea, Sweden}, doi={10.1007/978-3-642-13315-2_24} }
2009
  • KISS: Stochastic Packet Inspection PDF details
    A. Finamore, M. Mellia, M. Meo, D. Rossi
    IEEE Traffic Monitoring and Analysis Conference (TMA)
    This paper proposes KISS, a new Internet classification method. Motivated by the expected rise of UDP traffic volume, which stems from the momentum of P2P streaming applications, we propose a novel statistical payload-based classification framework, targeted to UDP traffic. Statistical signatures are automatically inferred from training data by means of a Chi-Square-like test, which extracts the protocol “syntax”, but ignores the protocol semantic and synchronization rules. The signatures feed a decision engine based on Support Vector Machines. KISS is tested in different scenarios, considering data, VoIP, and traditional P2P Internet applications. Results are astonishing. The average True Positive percentage is 99.6%, with the worst case equal to 98.7%. Less than 0.05% of False Positives are detected.

    @INPROCEEDINGS{AF:TMA-09, title={KISS: Stochastic Packet Inspection}, author={A. {Finamore} and M. {Mellia} and M. {Meo} and D. {Rossi}}, year={2009}, booktitle={IEEE Traffic Monitoring and Analysis Conference (TMA)}, location={Aachen, Germany}, doi={10.1007/978-3-642-01645-5_14}, howpublished="https://afinamore.io/pubs/TMA09_kiss.pdf" }