Ben Blamey


See Google Scholar for an up to date publication list.


Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit.

GigaScience, Volume 10, Issue 3, March 2021. Paper (OUP)

Differentiated assessments for advanced courses that reveal issues with prerequisite skills: A design investigation

ITiCSE-WGR '20: Proceedings of the Working Group Reports on Innovation and Technology in Computer Science Education. June 2020. Paper (ACM)

Resource- and message size-aware scheduling of stream processing at the edge with application to realtime microscopy


Adapting The Secretary Hiring Problem for Optimal Hot-Cold Tier Placement under Top-K Workloads

The paper examines analytic solutions to optimization problems in tiered (hierarchical) storage under Top-K queries with HASTE, and the relation of these problems to the classic discrete optimization problem known as the ‘Secretary Hiring Problem’. DBDM/CCGrid 2019. Paper (IEEE) | Pre-Print | Slides
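The classic problem mentioned above has a well-known optimal stopping rule: reject the first n/e candidates outright, then accept the first candidate who beats everyone seen so far. For large n this picks the overall best candidate with probability approaching 1/e. The sketch below shows only that textbook rule, not the adapted hot-cold tier placement policy developed in the paper; the function names are illustrative.

```python
import math
import random


def secretary_choice(scores):
    """Classic 1/e stopping rule for the secretary problem.

    Observe (and reject) the first floor(n/e) candidates, then accept the
    first candidate who beats every candidate seen so far. If nobody does,
    the last candidate is accepted by default.
    Returns the index of the chosen candidate.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one candidate")
    cutoff = int(n / math.e)  # size of the observation-only phase
    best_seen = max(scores[:cutoff], default=float("-inf"))
    for i in range(cutoff, n):
        if scores[i] > best_seen:
            return i
    return n - 1


def success_rate(n, trials=20000, seed=0):
    """Estimate how often the rule picks the overall best of n candidates."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        scores = list(range(n))
        rng.shuffle(scores)
        if scores[secretary_choice(scores)] == n - 1:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(secretary_choice([3, 1, 4, 1, 5, 9, 2, 6]))  # prints 2
    print(round(success_rate(50), 2))  # close to 1/e ≈ 0.37 for large n
```

In the first example, n = 8 gives a cutoff of 2, so the rule observes 3 and 1, then accepts 4 (index 2) as the first value exceeding 3, even though 9 appears later. That is the price of deciding online.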

Smart Resource Management for Data Streaming using an Online Bin-packing Strategy

2020 IEEE International Conference on Big Data (Big Data). Paper (IEEE)

Apache Spark Streaming, Kafka and HarmonicIO: a performance benchmark and architecture comparison for enterprise and scientific computing

Bench 2019: Benchmarking, Measuring, and Optimizing. pp 335-347. Paper (Springer) | Pre-Print

HarmonicIO: Scalable Data Stream Processing for Scientific Datasets

IEEE Services 2018. Paper (IEEE Xplore)

Lifelogging with SAESNEG: A System for the Automated Extraction of Social Network Event Groups (PhD Thesis, 2015)

This thesis presents SAESNEG, a System for the Automated Extraction of Social Network Event Groups: a pipeline that aggregates the personal social media footprint and partitions it into events (the "event clustering" problem). SAESNEG facilitates a reminiscence-friendly user experience, in which users can navigate their social media footprint. A range of socio-technical issues are explored: the challenges to reminiscence, lifelogging, ownership, and digital death. Whilst previous systems have focused on organising a single type of data, such as photos or Tweets, SAESNEG handles the variety of social network documents found in a typical footprint (e.g. photos, Tweets, check-ins), which carry differing combinations of image, text and other metadata. It is designed for this differently heterogeneous data, and adapted to the sparse, private events typical of the personal social media footprint.

The pipeline has two phases. Phase A extracts information, focusing on natural language processing. New techniques are developed, including a novel distributed approach to handling temporal expressions and a parser for social events (such as birthdays). Information is also extracted from images and metadata, and the resulting annotations feed the subsequent event clustering. Phase B performs event clustering by applying a number of pairwise similarity strategies, a mixture of new and existing algorithms. Clustering itself is achieved by combining machine learning with correlation clustering.

The main contributions of this thesis are the identification of the technical research task (and the associated social need), the development of novel algorithms and approaches, and the integration of these with existing algorithms to form the pipeline. Results demonstrate SAESNEG's capability to perform event clustering on a differently heterogeneous dataset, enabling users to achieve lifelogging in the context of their existing social media networks.

PhD Thesis (British Library) | PhD Thesis (PDF) | Bibtex
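The thesis abstract describes clustering via pairwise "same event?" decisions combined with correlation clustering. As a generic illustration only, and not SAESNEG's actual implementation, the randomized pivot heuristic for correlation clustering (often called KwikCluster) turns such pairwise decisions into groups. The toy documents and the three-hour rule below are hypothetical stand-ins for the learned pairwise similarity strategies.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def pivot_correlation_clustering(
    items: Sequence[T],
    same_event: Callable[[T, T], bool],
    seed: int = 0,
) -> List[List[T]]:
    """Randomized pivot heuristic (KwikCluster) for correlation clustering.

    Repeatedly pick a random unclustered item as the pivot, and put into its
    cluster every remaining item that the pairwise oracle says belongs to
    the same event. Items the pivot rejects wait for later rounds.
    """
    rng = random.Random(seed)
    remaining = list(items)
    clusters: List[List[T]] = []
    while remaining:
        pivot = remaining.pop(rng.randrange(len(remaining)))
        matched, unmatched = [], []
        for x in remaining:
            (matched if same_event(pivot, x) else unmatched).append(x)
        clusters.append([pivot] + matched)
        remaining = unmatched
    return clusters


if __name__ == "__main__":
    # Hypothetical toy footprint: (document id, hour of posting).
    docs = [("photo1", 10), ("tweet1", 11), ("checkin1", 12),
            ("photo2", 50), ("tweet2", 52)]

    # Hypothetical pairwise rule: posted within 3 hours => same event.
    def within_three_hours(a, b):
        return abs(a[1] - b[1]) <= 3

    for cluster in pivot_correlation_clustering(docs, within_three_hours):
        print(sorted(name for name, _ in cluster))
```

Correlation clustering needs no preset number of clusters, which suits a footprint where the number of events is unknown; the pivot heuristic is simply one standard, easy-to-follow way to approximate it.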

'The First Day of Summer': Parsing Temporal Expressions with Distributed Semantics

SGAI 2013: Research and Development in Intelligent Systems XXX. pp 389-402.

Detecting and understanding temporal expressions are key tasks in natural language processing (NLP), and are important for event detection and information retrieval. In existing approaches, temporal semantics are typically represented as discrete ranges or specific dates, and the task is restricted to text that conforms to this representation. We propose an alternative paradigm: distributed temporal semantics, where a probability density function models the relative probabilities of the various interpretations. We extend SUTime, a state-of-the-art NLP system, to incorporate our approach, and build definitions of new and existing temporal expressions.

Online Demo | Paper (Springer) | Paper (PDF) | Slides (PDF) | Bibtex | Source Code
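To illustrate the idea of a density over interpretations rather than a single date or range, the toy model below represents "the first day of summer" (Northern Hemisphere) as a weighted mixture of Gaussians over calendar days. The two candidate dates, their weights and their spreads are hypothetical values chosen for illustration; they are not taken from the paper or from SUTime.

```python
import math
from datetime import date, timedelta


def gaussian_pdf(x, mean, sd):
    """Normal probability density at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# Hypothetical interpretations of "the first day of summer":
# (candidate date, weight, spread in days). Weights sum to 1.
INTERPRETATIONS = [
    (date(2013, 6, 1), 0.4, 1.0),   # meteorological summer
    (date(2013, 6, 21), 0.6, 1.0),  # astronomical summer (solstice)
]


def density(day):
    """Mixture density over days: weighted sum of Gaussians."""
    return sum(w * gaussian_pdf(day.toordinal(), d.toordinal(), sd)
               for d, w, sd in INTERPRETATIONS)


def most_likely(start, end):
    """Return the date in [start, end] with the highest density."""
    days = (start + timedelta(days=n) for n in range((end - start).days + 1))
    return max(days, key=density)


if __name__ == "__main__":
    print(most_likely(date(2013, 5, 1), date(2013, 7, 31)))  # prints 2013-06-21
    # Relative probability of the solstice reading vs. 1 June:
    print(density(date(2013, 6, 21)) / density(date(2013, 6, 1)))
```

Keeping the full density, rather than collapsing it to the single most likely date, lets later steps such as event matching weigh both readings instead of committing to one.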

R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora

SGAI 2012: Research and Development in Intelligent Systems XXIX. pp 207-212.

Awarded the prize for Best Poster at BCS SGAI 2012.

Paper (Springer) | Paper (PDF) | Poster (PDF) | Bibtex | Data (Twitter only)

If you use the data, please cite the paper! ☺

Malmö University Profile | Google Scholar | GitHub | ORCID | dblp | ResearchGate | Publons | Semantic Scholar | bitbucket | dockerhub | Scopus