Fisher information in flow size distribution estimation

Tune, P; Veitch, D

Publication Type: Journal Article
Citation: IEEE Transactions on Information Theory, 2011, 57 (10), pp. 7011 - 7035
Issue Date: 2011-10-01

Closed Access

	Filename	Description	Size	Format
	06034747.pdf	Published Version	5.64 MB	Adobe PDF

Full metadata record

Field	Value	Language
dc.contributor.author	Tune, P	en_US
dc.contributor.author	Veitch, D https://orcid.org/0000-0001-8163-3464	en_US
dc.date.issued	2011-10-01	en_US
dc.identifier.citation	IEEE Transactions on Information Theory, 2011, 57 (10), pp. 7011 - 7035	en_US
dc.identifier.issn	0018-9448	en_US
dc.identifier.uri	http://hdl.handle.net/10453/113141
dc.description.abstract	The flow size distribution is a useful metric for traffic modeling and management. Its estimation based on sampled data, however, is problematic. Previous work has shown that flow sampling (FS) offers enormous statistical benefits over packet sampling, but its high resource requirements preclude its use in routers. We present dual sampling (DS), a two-parameter family that, to a large extent, provides FS-like statistical performance by approaching FS continuously, with just packet-sampling-like computational cost. Our work utilizes a Fisher information-based approach recently used to evaluate a number of sampling schemes, excluding FS, for TCP flows. We revise and extend the approach to make rigorous and fair comparisons between FS, DS, and others. We show how DS significantly outperforms other packet-based methods, including Sample and Hold, the closest packet sampling-based competitor to FS. We describe a packet sampling-based implementation of DS and analyze its key computational costs to show that router implementation is feasible. Our approach offers insights into numerous issues, including the notion of "flow quality" for understanding the relative performance of methods, and how and when employing sequence numbers is beneficial. Our work is theoretical with some simulation support and case studies on Internet data. © 2011 IEEE.	en_US
dc.relation.ispartof	IEEE Transactions on Information Theory	en_US
dc.relation.isbasedon	10.1109/TIT.2011.2165150	en_US
dc.subject.classification	Networking & Telecommunications	en_US
dc.title	Fisher information in flow size distribution estimation	en_US
dc.type	Journal Article
utslib.citation.volume	10	en_US
utslib.citation.volume	57	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
utslib.for	1005 Communications Technologies	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
utslib.copyright.status	closed_access
pubs.issue	10	en_US
pubs.publication-status	Published	en_US
pubs.volume	57	en_US

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/113141