Computer Science
http://hdl.handle.net/10523/568
Updated: 2024-03-22T17:04:26Z

Accelerating deep neural network training on optical interconnect systems
http://hdl.handle.net/10523/16224
Updated: 2023-10-11T23:00:08Z
Published: 2023-10-11T22:58:33Z
2023
Dai, Fei
As deep learning (DL) algorithms evolve and data volumes expand, training deep neural networks (DNNs) has become essential across various domains, delivering unprecedented task accuracy. However, with the surge in dataset size and advancements in DNN models, the training process has become increasingly time-consuming. Traditionally, the acceleration of DNN training is tackled by adding more cores or nodes for parallel training in a chip or distributed system. This approach, however, encounters communication bottlenecks as it scales using electrical interconnect systems. A promising alternative is optical interconnection technology, which provides high bandwidth and parallel communication through wavelength-division multiplexing (WDM) at various integration levels. Yet, the fundamental difference between optical and electrical interconnects imposes challenges in directly applying existing parallel DNN training methods, necessitating the development of specific acceleration schemes for DNN training in optical interconnect systems. This thesis delves into such performance optimisation in optical network-on-chip (ONoC) and optical interconnect systems. It explores methods to harness the potential of optical communication for both accelerating DNN training on ONoC and optical interconnect systems, and optimising communication in distributed DNN training.
Fully connected neural networks (FCNNs) are pivotal in DL, and the fully connected layer is a critical component in both convolutional neural networks and Transformers. With this in mind, we first propose a fine-grained parallel computing model for accelerating FCNN training on ONoC. This model determines the optimal number of cores for each execution stage, thus minimising the time for one FCNN training epoch. We propose three mapping strategies for core allocation and compare their merits and drawbacks regarding hotspot level, memory requirement, and state transitions. By balancing computation and communication within the ONoC context, our scheme bridges the gap in optimising parallel FCNN training, providing a powerful tool for efficient FCNN training on ONoC.
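The trade-off behind "the optimal number of cores for each execution stage" can be illustrated with a toy cost model. This is a minimal sketch, not the model from the thesis: the cost function (compute time shrinking as 1/n, communication time growing linearly in n) and the names `epoch_time`, `best_core_count`, `compute_work`, and `comm_cost_per_core` are all assumptions made for illustration.

```python
def epoch_time(cores: int, compute_work: float, comm_cost_per_core: float) -> float:
    """Toy cost (an assumption): divisible compute plus communication that grows with cores."""
    return compute_work / cores + comm_cost_per_core * cores


def best_core_count(max_cores: int, compute_work: float, comm_cost_per_core: float) -> int:
    """Return the core count in 1..max_cores with the lowest toy epoch time."""
    return min(
        range(1, max_cores + 1),
        key=lambda n: epoch_time(n, compute_work, comm_cost_per_core),
    )


if __name__ == "__main__":
    n = best_core_count(max_cores=64, compute_work=1000.0, comm_cost_per_core=10.0)
    print(n, epoch_time(n, 1000.0, 10.0))
```

Under this toy model, adding cores beyond the minimum makes the stage slower, which is why a per-stage search over core counts can pay off even when more cores are available.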
As the size of datasets and the complexity of DNN models continue to increase, it becomes increasingly necessary to use distributed DNN training instead of relying on a single machine. Given the frequent use of collective communication algorithms (All-reduce and All-gather) in distributed DNN training, we propose two efficient algorithms to minimise communication time in optical interconnect systems. First, we introduce WRHT, an All-reduce algorithm for distributed data-parallel DNN training that groups nodes hierarchically and reuses wavelengths to reduce communication steps and time. Second, we present OpTree, an effective All-gather algorithm for optical interconnect systems that optimises communication time by calculating the ideal m-ary tree for optical routing. With WRHT and OpTree, the communication time of distributed DNN training in optical interconnect systems can be significantly reduced, enhancing the overall efficiency of the training process.
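To see why hierarchical grouping and m-ary trees can cut the number of communication steps, it helps to count steps for two generic schemes. The sketch below is not WRHT or OpTree: it compares the classic ring All-reduce with a plain reduce-then-broadcast over an m-ary tree, and it assumes a parent can receive from all of its children in a single step (the kind of parallelism WDM could provide). All function names are hypothetical.

```python
def tree_levels(num_nodes: int, arity: int) -> int:
    """Smallest depth d such that arity**d >= num_nodes (integer arithmetic only)."""
    if num_nodes < 1 or arity < 2:
        raise ValueError("need num_nodes >= 1 and arity >= 2")
    levels, reach = 0, 1
    while reach < num_nodes:
        reach *= arity
        levels += 1
    return levels


def ring_allreduce_steps(num_nodes: int) -> int:
    """Classic ring All-reduce: (N - 1) reduce-scatter steps plus (N - 1) all-gather steps."""
    return 2 * (num_nodes - 1)


def tree_allreduce_steps(num_nodes: int, arity: int) -> int:
    """Reduce up an m-ary tree, then broadcast back down: two passes over its levels.

    Assumes one step per level, i.e. a parent hears all its children at once.
    """
    return 2 * tree_levels(num_nodes, arity)


if __name__ == "__main__":
    for m in (2, 4, 8):
        print(m, tree_allreduce_steps(64, m), ring_allreduce_steps(64))
```

A larger arity gives fewer levels but demands more simultaneous transfers per step, so the best arity depends on how many wavelengths are available; step count alone also ignores message size.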
Finally, to tackle the challenges of high memory requirements and substantial communication overhead in distributed data-parallel DNN training, we present a layer-wise hybrid-parallel acceleration scheme (LHAS) to expedite distributed DNN training on optical interconnect systems. LHAS includes the analysis of intra-layer and inter-layer communication, a cost model for communication and computation, and solutions for group communication and handling DNNs with multiple branches. By determining the optimal configuration (including the parallel method and the optimal number of nodes for each layer), LHAS can minimise the total DNN training time. LHAS, a notable advancement in distributed DNN training on optical interconnect systems, proposes an innovative and efficient approach for DNN model training, potentially transforming the field and inspiring future research.

Digital transmit beamforming with low RF-complexity
http://hdl.handle.net/10523/15721
Updated: 2024-03-08T14:18:56Z
Published: 2023-08-17T03:54:38Z
2023
Ahmad, Waqas
The demand for high data rates is increasing day by day due to dramatic growth in wireless-connected devices. The shortage of bandwidth at sub-6 GHz frequency bands cannot fulfil this demand. Therefore, future wireless systems will use millimetre wave (mmWave) frequency bands due to the availability of large bandwidth at these frequencies. The mmWave-enabled systems are supposed to have large antenna arrays to compensate for the high pathloss of mmWave frequency bands, and beamforming is one of the key enabling technologies to achieve it by forming highly directive beams. Digital beamforming (DBF) is considered optimal since each antenna signal is independently preserved from the digital baseband until the antenna aperture. However, a conventional DBF system requires a separate radio-frequency (RF) chain per antenna, which creates a bottleneck for the DBF implementation of large antenna arrays due to high power consumption, hardware form factor, complexity, and cost. Therefore, sub-optimal analog beamforming and hybrid beamforming have gained more popularity because they can operate with a small number of RF chains while maintaining the same beamforming gain.
However, the RF-precoder in analog and hybrid beamforming becomes power-hungry when the number of transmit antennas or data streams is increased. In this dissertation, we have three major contributions.
Contribution 1: A low RF-complexity DBF architecture is proposed for future large-scale multi-antenna systems. This architecture ensures that the baseband unit maintains full control of each physical antenna at the aperture. Antennas at the aperture domain are divided into groups where each group transmits an independent data stream. All antennas in the same group share the same RF chain in a time-multiplexed manner to preserve the digitally weighted signal from baseband till the antenna aperture to enable a DBF system with reduced RF-complexity. Under ideal hardware conditions, the results presented in this dissertation show that the proposed architecture can achieve nearly the same performance as that of the conventional full RF-complexity DBF in terms of bandwidth and spectral efficiency.
Contribution 2: The performance of the proposed architecture has further been evaluated under non-ideal hardware conditions. For this purpose, mathematical models have been developed for non-linearities such as phase noise, inphase-quadrature (IQ) imbalance, and inter-RF-chain cross-talk. The presented results show that, in terms of phase noise and IQ imbalance, the proposed and full RF-complexity DBF architectures demonstrate equivalent performance. However, due to reduced inter-RF-chain cross-talk, the proposal significantly outperforms full RF-complexity DBF in terms of error-vector-magnitude (EVM), channel capacity, and energy efficiency. The comparison of the proposed low RF-complexity DBF with the analog and hybrid beamforming systems shows that the proposal is more robust because it does not require a power-hungry RF beamformer.
Contribution 3: In the final part of this work, a multiuser extension of the proposed DBF architecture is presented. A dynamic antenna grouping algorithm is proposed such that it adapts itself according to the varying channel conditions. Notice that in contrast to a single-user beamforming architecture, here one antenna group is used to serve only a single user. Different antenna groups at the antenna aperture are used to serve different users. The mutual orthogonality among the multiuser channels is confirmed by the channel block-diagonalisation technique. The contribution also focuses on some simplified fixed antenna grouping proposals. Results show that the proposed low RF-complexity multiuser beamformer performs nearly the same, in terms of sum-rate performance, as state-of-the-art hybrid beamforming and full RF-complexity DBF techniques. Notice that in terms of robustness, the proposed architecture remains superior to hybrid and analog beamforming due to digital control of the antenna aperture.

Capturing the hierarchical structure of sequential events with temporal pooling
http://hdl.handle.net/10523/15657
Updated: 2023-07-21T14:02:09Z
Published: 2023-07-20T21:10:46Z
2023
Slack, Daniel Robert
Hierarchical representations of temporal sequences are central to the way we process the world around us. Consider the example of a child reciting a nursery rhyme. The song is composed of multiple phrases, each of which is composed of multiple words, each in turn composed of a series of articulated phonemes. The child must remember which of these elements occurs where and in what order. Depending on the song, phonemes, words, phrases, and even verses may reappear in different contexts within the piece. Each of these elements is a chunk of the complete sequence. How sequence chunks are learned and then compiled hierarchically into high-level structures within the brain is still an open question. This question is the central focus of this thesis. We present two primary contributions with the aim of answering this question. Our first model is a sequence chunker inspired by Jeff Hawkins’ hierarchical temporal memory model. It is designed to encode commonly observed sequences as declaratively represented chunks, and thereafter support sequential execution of items in these learned chunks. Our second model provides a method for combining this sequence chunker with reinforcement learning, enabling the formation of hierarchical sequence representations through the interoperation of these two methods. We draw on research on the basal ganglia, especially Ann Graybiel’s work covering the striatum. In particular, we explore the natural flow from the action-outcome learning of the dorsomedial striatum to the stimulus-response learning of the dorsolateral striatum. We demonstrate how the combination of these two forms of learning can lead to a smooth transition from low-level sequence chunks to deep hierarchical representations of sequences. Our model builds on recent work in neuroscience and makes novel experimental predictions which warrant further investigation.

Energy-efficient collision avoidance algorithms for UAV swarms
http://hdl.handle.net/10523/15656
Updated: 2023-07-21T14:02:10Z
Published: 2023-07-20T04:18:44Z
2023
Huang, Shuangyao
Unmanned Aerial Vehicle (UAV) swarms can provide promising solutions for unmanned delivery, search and rescue, tracking, monitoring, and post-disaster communication recovery in terms of safety and cost. The key challenges in collision avoidance for UAV swarms are safety, energy efficiency, cooperation, and reaction time. Conventional solutions suffer from low energy efficiency, long reaction times, and ineffective cooperation. Recent methods based on machine learning, such as Multi-Agent Reinforcement Learning (MARL), can achieve high energy efficiency, effective cooperation, and fast reaction. However, they have high failure rates in collision avoidance. How to address all these challenges in collision avoidance remains an open and critical problem. This thesis proposes three solutions to address the challenges progressively.
The first solution is the E2Coop algorithm, which combines Artificial Potential Field (APF) and Particle Swarm Optimization (PSO) to achieve high energy efficiency and effective cooperation while ensuring safety. In E2Coop, APF provides environmental awareness and coordination to UAVs by constructing a potential field to represent the environment. At the same time, PSO searches for optimal trajectories for UAVs, considering safety and energy efficiency under the coordination of APF. The second solution is CoDe, which uses Multi-Agent Reinforcement Learning (MARL) to train cooperative policies for UAVs operating in a swarm. The key contribution in CoDe is a novel credit assignment scheme based on difference rewards and counterfactual policy gradients. The credit assignment scheme requires no assumption on value functions, has low computational complexity, and applies to continuous action spaces. CoDe is over 90% faster than E2Coop in execution while reducing energy consumption by over 20%. The third solution is CoDe+, which combines E2Coop and CoDe to reduce learning variances and improve sample efficiency. It achieves at least a 40% higher average score, and on average uses over 50% less energy than E2Coop and nearly 30% less than CoDe. The three solutions progressively address the key challenges of collision avoidance in UAV swarms, enabling more applications of UAV swarms in large-scale infrastructure-less and contact-less connections.
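The idea of a difference reward, on which the CoDe credit assignment scheme is said to be based, can be shown with a toy example. The sketch below is the textbook form (team reward with the agent present minus team reward with the agent removed), not CoDe itself; the one-dimensional positions, the `safe_distance` parameter, and the reward definition are assumptions chosen only to keep the example small.

```python
from itertools import combinations
from typing import List


def global_reward(positions: List[float], safe_distance: float) -> int:
    """Toy team reward (an assumption): minus the number of UAV pairs closer than safe_distance."""
    violations = sum(
        1 for a, b in combinations(positions, 2) if abs(a - b) < safe_distance
    )
    return -violations


def difference_reward(positions: List[float], agent: int, safe_distance: float) -> int:
    """G(all agents) - G(all agents except `agent`): that agent's marginal contribution."""
    without_agent = positions[:agent] + positions[agent + 1:]
    return global_reward(positions, safe_distance) - global_reward(without_agent, safe_distance)


if __name__ == "__main__":
    swarm = [0.0, 0.5, 5.0]  # UAVs 0 and 1 are too close; UAV 2 is far away
    for i in range(len(swarm)):
        print(i, difference_reward(swarm, i, safe_distance=1.0))
```

Every agent sees the same team reward, but the difference reward singles out the two UAVs involved in the violation and leaves the distant one with zero credit, which is the property that makes such signals useful for multi-agent learning.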

A tough nut to crack: Challenges in natural language processing
http://hdl.handle.net/10523/14786
Updated: 2023-02-03T13:02:09Z
Published: 2023-02-02T21:04:24Z
2023
Parameswaran, Pradeesh
Natural language processing (NLP) is the ability of computers to understand human language. Despite extensive research in NLP, some challenging problems still exist. In this thesis, we will examine three challenging areas of NLP: identifying the target of sarcasm, assessing human judgement, and appraising the quality of medical evidence. Our choice to focus on these specific problems was guided by challenges from the Australasian Language Technology Association (ALTA) Shared Tasks.
The main goal of this thesis is to examine how well machine learning approaches perform in these areas and to provide a deeper understanding of what makes tackling these areas challenging. Throughout our investigation, we conducted our experiments using various techniques and evaluated them on publicly available data sets. For two of the three challenges (sarcasm target detection and assessing human judgement), we have created state-of-the-art models.
Our experimental results show that deep learning classifiers could perform well in identifying the target of sarcasm and assessing human judgement because of the complexity of the models. However, when it comes to evaluating the quality of medical evidence, traditional machine learning classifiers perform better because of the use of handcrafted features. We also found that humans and machines face similar challenges when we measured the performance of humans against machine learning classifiers for evaluating human behaviour.
Nevertheless, we have successfully tackled the primary goal of this thesis through our experimentation with publicly available data sets and open-source machine learning frameworks.

A Context-aware Interface for Immersive Sports Spectating
http://hdl.handle.net/10523/14189
Updated: 2024-01-24T00:58:31Z
Published: 2022-11-27T09:34:51Z
2022-10
Lo, Wei Hong; Regenbrecht, Holger; Ens, Barrett; Zollmann, Stefanie
Novel Augmented Reality sports spectating interfaces allow on-site sports spectators to access game-related information by overlaying relevant digital data into their field of view. However, displaying all game-related information at once would overload the user. Therefore, it is important to develop a suitable interface that is aware of both the game context and the user’s context and can display relevant information at the right time. We developed a state inference model based on spectators’ behavior and game states to provide a context-aware sports spectating interface. The interface gradually reveals information using different levels of detail that are based on the context of the game. As an implementation of our model, we created a prototype featuring a context-aware adaptive interface for a sports spectating scenario. Although our implementation is just a preliminary prototype, the goal of this research is to begin the exploration of intelligent context-aware interfaces to be used in on-site sports spectating.

Biomarkers of emotion
http://hdl.handle.net/10523/14121
Updated: 2022-11-15T13:02:08Z
Published: 2022-11-14T21:56:23Z
2022
Greene, Nicholas
Emotion recognition is a burgeoning field in machine learning. Previous approaches have focused on classification of expressions from 2D facial images or voice acoustics. However, this focus on external body signals overlooks the rich source of internal physiological changes that occur within the body during an emotional event. In this work, we developed the Open Access data set PeakAffectDS, which contains physiological recordings that include external (facial electromyography) and internal (electrocardiogram and respiration) markers of emotion. Fifty-one participants were recorded while viewing evocative movie clips. We applied deep learning methods to classify induced emotion on PeakAffectDS, with accuracies of 0.24, 0.21, 0.18, and 0.18 for six-class classification on zygomaticus EMG, corrugator EMG, ECG, and respiration respectively. The modeling results are underwhelming, but preliminary analysis of participant responses is encouraging and validates the study design.
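One way to read the reported accuracies is against the chance level for six classes. The short calculation below is a sketch under a stated assumption: it treats the six classes as balanced, so that chance is 1/6, which the abstract does not confirm. The accuracy values are those quoted above; the names `CHANCE` and `margin_over_chance` are introduced here for illustration.

```python
NUM_CLASSES = 6
CHANCE = 1 / NUM_CLASSES  # assumes balanced classes, which the abstract does not state

# Accuracies as reported in the abstract.
reported = {
    "zygomaticus EMG": 0.24,
    "corrugator EMG": 0.21,
    "ECG": 0.18,
    "respiration": 0.18,
}


def margin_over_chance(accuracy: float, chance: float = CHANCE) -> float:
    """Absolute accuracy gain over uniform guessing."""
    return accuracy - chance


margins = {signal: round(margin_over_chance(acc), 3) for signal, acc in reported.items()}

if __name__ == "__main__":
    print(f"chance level: {CHANCE:.3f}")
    for signal, margin in margins.items():
        print(f"{signal}: +{margin:.3f} over chance")
```

Under that assumption all four signals sit only a few percentage points above guessing, with the facial EMG channels slightly ahead of ECG and respiration, which is consistent with the abstract's own description of the results as underwhelming.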

ARSpectator — Enriching on-site sports spectating with augmented reality
http://hdl.handle.net/10523/13783
Updated: 2022-11-16T13:02:24Z
Published: 2022-11-08T01:30:24Z
2022
Lo, Wei Hong
Recent technological advancements in sports broadcasting provide an enhanced experience for broadcast viewers through visualizations, statistics, commentary, and better viewpoints. Unfortunately, on-site spectators often do not have the same access to such information. In this thesis, we introduce a novel system, ARSpectator: an Augmented Reality (AR) approach that integrates event-related information into the on-site spectators' field of view. This thesis describes the overall system of ARSpectator but ultimately focuses on the visualization, interface, and user experience aspects of the research.
The thesis starts by describing the components of ARSpectator and their interactions. Due to limited on-site accessibility, we developed a prototyping framework that allows flexible extended reality (XR) prototyping. The framework includes the planning phase, characteristics, and components needed for the prototypes. The framework's modular design also allows for synchronization in changes across all prototypes. In total, we developed four prototypes, from on-site stadium usage to a virtual reality prototype, where development and evaluation are made possible off-site.
We then investigate the visualization aspect of ARSpectator. The main visualization technique we focused on is situated visualization — a method where we present visualizations in spatial relevance to their referents. Based on related frameworks, we developed a conceptual situated visualization framework for on-site sports spectating. Building on that, we implemented and evaluated two situated visualization methods — Situated Broadcast-styled Visualization and Situated Infographics. Both visualization methods received positive feedback during a user study that we conducted.
Experience from the development of the prototypes showed that technical factors, such as registration, latency, and jitter, impact the user experience. Based on previous work, we investigated three common technical factors (latency, registration accuracy, and jitter) to find out the noticeable and disruptive effects they have on user experience. We conducted an experiment that highlighted the importance of reducing the effects of these technical factors: when compounded, they cause a considerable disruption to the user experience.
During the development and evaluation of the visualizations, we realized that regardless of how intuitive the visualizations are, an advanced user interface is required for a good experience interacting with the visualizations. Hence, we proposed a context-aware state inference model to analyze the user context. We developed and evaluated a Manual Trigger Interface and a Context-aware Adaptive Interface with potential end-users. Although the concept of a context-aware interface is compelling to participants, our research shows that the interface would need to be well-designed to avoid distractions.
Finally, we proposed a Stadium of the Future vision that explains how ARSpectator will play a significant role. We also explore the potential of XR technology in providing an interactive experience not only on-site but also for remote spectating. We included ideas that were brainstormed but were not implemented. We then conclude with the future outlook of this research area.