new

Get trending papers in your email inbox!

Subscribe

Daily Papers

by AK and the research community

iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform

In recent text-to-speech synthesis and voice conversion systems, a mel-spectrogram is commonly applied as an intermediate representation, and the necessity for a mel-spectrogram vocoder is increasing. A mel-spectrogram vocoder must solve three inverse problems: recovery of the original-scale magnitude spectrogram, phase reconstruction, and frequency-to-time conversion. A typical convolutional mel-spectrogram vocoder solves these problems jointly and implicitly using a convolutional neural network, including temporal upsampling layers, when directly calculating a raw waveform. Such an approach allows skipping redundant processes during waveform synthesis (e.g., the direct reconstruction of high-dimensional original-scale spectrograms). By contrast, the approach solves all problems in a black box and cannot effectively employ the time-frequency structures existing in a mel-spectrogram. We thus propose iSTFTNet, which replaces some output-side layers of the mel-spectrogram vocoder with the inverse short-time Fourier transform (iSTFT) after sufficiently reducing the frequency dimension using upsampling layers, reducing the computational cost from black-box modeling and avoiding redundant estimations of high-dimensional spectrograms. During our experiments, we applied our ideas to three HiFi-GAN variants and made the models faster and more lightweight with a reasonable speech quality. Audio samples are available at https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet/.

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

Recent advances in generative models that iteratively synthesize audio clips sparked great success to text-to-audio synthesis (TTA), but with the cost of slow synthesis speed and heavy computation. Although there have been attempts to accelerate the iterative procedure, high-quality TTA systems remain inefficient due to hundreds of iterations required in the inference phase and large amount of model parameters. To address the challenges, we propose SpecMaskGIT, a light-weighted, efficient yet effective TTA model based on the masked generative modeling of spectrograms. First, SpecMaskGIT synthesizes a realistic 10s audio clip by less than 16 iterations, an order-of-magnitude less than previous iterative TTA methods.As a discrete model, SpecMaskGIT outperforms larger VQ-Diffusion and auto-regressive models in the TTA benchmark, while being real-time with only 4 CPU cores or even 30x faster with a GPU. Next, built upon a latent space of Mel-spectrogram, SpecMaskGIT has a wider range of applications (e.g., the zero-shot bandwidth extension) than similar methods built on the latent wave domain. Moreover, we interpret SpecMaskGIT as a generative extension to previous discriminative audio masked Transformers, and shed light on its audio representation learning potential. We hope our work inspires the exploration of masked audio modeling toward further diverse scenarios.

Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

Single-channel, speaker-independent speech separation methods have recently seen great progress. However, the accuracy, latency, and computational cost of such methods remain insufficient. The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency in calculating the spectrograms. To address these shortcomings, we propose a fully-convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation. Conv-TasNet uses a linear encoder to generate a representation of the speech waveform optimized for separating individual speakers. Speaker separation is achieved by applying a set of weighting functions (masks) to the encoder output. The modified encoder representations are then inverted back to the waveforms using a linear decoder. The masks are found using a temporal convolutional network (TCN) consisting of stacked 1-D dilated convolutional blocks, which allows the network to model the long-term dependencies of the speech signal while maintaining a small model size. The proposed Conv-TasNet system significantly outperforms previous time-frequency masking methods in separating two- and three-speaker mixtures. Additionally, Conv-TasNet surpasses several ideal time-frequency magnitude masks in two-speaker speech separation as evaluated by both objective distortion measures and subjective quality assessment by human listeners. Finally, Conv-TasNet has a significantly smaller model size and a shorter minimum latency, making it a suitable solution for both offline and real-time speech separation applications.

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space. The proposed network takes a sequence of consecutive spectrogram time-frames as input and maps it to two outputs in parallel. As the first output, the sound event detection (SED) is performed as a multi-label classification task on each time-frame producing temporal activity for all the sound event classes. As the second output, localization is performed by estimating the 3D Cartesian coordinates of the direction-of-arrival (DOA) for each sound event class using multi-output regression. The proposed method is able to associate multiple DOAs with respective sound event labels and further track this association with respect to time. The proposed method uses separately the phase and magnitude component of the spectrogram calculated on each audio channel as the feature, thereby avoiding any method- and array-specific feature extraction. The method is evaluated on five Ambisonic and two circular array format datasets with different overlapping sound events in anechoic, reverberant and real-life scenarios. The proposed method is compared with two SED, three DOA estimation, and one SELD baselines. The results show that the proposed method is generic and applicable to any array structures, robust to unseen DOA values, reverberation, and low SNR scenarios. The proposed method achieved a consistently higher recall of the estimated number of DOAs across datasets in comparison to the best baseline. Additionally, this recall was observed to be significantly better than the best baseline method for a higher number of overlapping sound events.

Taming Visually Guided Sound Generation

Recent advances in visually-induced audio generation are based on sampling short, low-fidelity, and one-class sounds. Moreover, sampling 1 second of audio from the state-of-the-art model takes minutes on a high-end GPU. In this work, we propose a single model capable of generating visually relevant, high-fidelity sounds prompted with a set of frames from open-domain videos in less time than it takes to play it on a single GPU. We train a transformer to sample a new spectrogram from the pre-trained spectrogram codebook given the set of video features. The codebook is obtained using a variant of VQGAN trained to produce a compact sampling space with a novel spectrogram-based perceptual loss. The generated spectrogram is transformed into a waveform using a window-based GAN that significantly speeds up generation. Considering the lack of metrics for automatic evaluation of generated spectrograms, we also build a family of metrics called FID and MKL. These metrics are based on a novel sound classifier, called Melception, and designed to evaluate the fidelity and relevance of open-domain samples. Both qualitative and quantitative studies are conducted on small- and large-scale datasets to evaluate the fidelity and relevance of generated samples. We also compare our model to the state-of-the-art and observe a substantial improvement in quality, size, and computation time. Code, demo, and samples: v-iashin.github.io/SpecVQGAN

Estimation of Classical Cepheid's Physical Parameters from NIR Light Curves

Recent space-borne and ground-based observations provide photometric measurements as time series. The effect of interstellar dust extinction in the near-infrared range is only 10% of that measured in the V band. However, the sensitivity of the light curve shape to the physical parameters in the near-infrared is much lower. So, interpreting these types of data sets requires new approaches like the different large-scale surveys, which create similar problems with big data. Using a selected data set, we provide a method for applying routines implemented in R to extract most information of measurements to determine physical parameters, which can also be used in automatic classification schemes and pipeline processing. We made a multivariate classification of 131 Cepheid light curves (LC) in J, H, and K colors, where all the LCs were represented in 20D parameter space in these colors separately. Performing a Principal Component Analysis (PCA), we got an orthogonal coordinate system and squared Euclidean distances between LCs, with 6 significant eigenvalues, reducing the 20-dimension to 6. We also estimated the optimal number of partitions of similar objects and found it to be equal to 7 in each color; their dependence on the period, absolute magnitude, amplitude, and metallicity are also discussed. We computed the Spearman rank correlations, showing that periods and absolute magnitudes correlate with the first three PCs significantly. The first two PC are also found to have a relationship with the amplitude, but the metallicity effects are only marginal. The method shown can be generalized and implemented in unsupervised classification schemes and analysis of mixed and biased samples. The analysis of our Classical Cepheid near-infrared LC sample showed that the J, H, K curves are insufficient for determination of stellar metallicity, with mass being the key factor shaping them.

A universal break in energy functions of three hyperactive repeating fast radio bursts

Fast radio bursts (FRBs) are millisecond-duration pulses occurring at cosmological distances with a mysterious origin. Observations show that at least some FRBs are produced by magnetars. All magnetar-powered FRB models require some triggering mechanisms, among which the most popular is the crust cracking of a neutron star, which is called starquake. However, so far there has been no decisive evidence for this speculation. Here we report the energy functions of the three most active repeating FRBs, which show a universal break around 10^{38} erg. Such a break is similar to that of the frequency-magnitude relationship of earthquakes. The break and change of the power-law indices below and above it can be well understood within the framework of FRBs triggered by starquakes in the magnetar models. The seed of weak FRBs can grow both on the magnetar surface and in the deeper crust. In contrast, the triggering of strong FRBs is confined by the crustal thickness and the seed of strong FRBs can only grow on the surface. This difference in dimensionality causes a break in the scaling properties from weak to strong FRBs, occurring at a point where the penetration depth of starquakes equals the crustal thickness. Our result, together with the earthquake-like temporal properties of these FRBs, strongly supports that FRBs are triggered by starquakes, providing a new opportunity to study the physical properties of the neutron star crust.

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

Text-to-speech (TTS) methods have shown promising results in voice cloning, but they require a large number of labeled text-speech pairs. Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations(semantic \& acoustic) and using two sequence-to-sequence tasks to enable training with minimal supervision. However, existing methods suffer from information redundancy and dimension explosion in semantic representation, and high-frequency waveform distortion in discrete acoustic representation. Autoregressive frameworks exhibit typical instability and uncontrollability issues. And non-autoregressive frameworks suffer from prosodic averaging caused by duration prediction models. To address these issues, we propose a minimally-supervised high-fidelity speech synthesis method, where all modules are constructed based on the diffusion models. The non-autoregressive framework enhances controllability, and the duration diffusion model enables diversified prosodic expression. Contrastive Token-Acoustic Pretraining (CTAP) is used as an intermediate semantic representation to solve the problems of information redundancy and dimension explosion in existing semantic coding methods. Mel-spectrogram is used as the acoustic representation. Both semantic and acoustic representations are predicted by continuous variable regression tasks to solve the problem of high-frequency fine-grained waveform distortion. Experimental results show that our proposed method outperforms the baseline method. We provide audio samples on our website.

KIC 4150611: A quadruply eclipsing heptuple star system with a g-mode period-spacing pattern Asteroseismic modelling of the g-mode period-spacing pattern

In this work, we aim to estimate the stellar parameters of the primary (Aa) by performing asteroseismic analysis on its period-spacing pattern. We use the C-3PO neural network to perform asteroseismic modelling of the g-mode period-spacing pattern of Aa, discussing the interplay of this information with external constraints from spectroscopy (T_{rm eff} and log(g)) and eclipse modelling (R). To estimate the level of uncertainty due to different frequency extraction and pattern identification processes, we consider four different variations on the period-spacing patterns. To better understand the correlations between and the uncertainty structure of our parameter estimates, we also employed a classical, parameter-based MCMC grid search on four different stellar grids. The best-fitting, externally constrained model to the period-spacing pattern arrives at estimates of the stellar properties for Aa of: M=1.51 pm 0.05 M_odot, X_c =0.43 pm 0.04, R=1.66 pm 0.1 R_odot, f_{rm ov}=0.010, Omega_c=1.58 pm 0.01 d^{-1} with rigid rotation to within the measurement errors, log(T_{rm eff})=3.856 pm 0.008 dex, log(g)=4.18 pm 0.04 dex, and log(L)=0.809 pm 0.005 dex, which agree well with previous measurements from eclipse modelling, spectroscopy, and the Gaia DR3 luminosity. We find that the near-core properties of the best-fitting asteroseismic models are consistent with external constraints from eclipse modelling and spectroscopy. Aa appears to be a typical example of a gamma Dor star, fitting well within existing populations. We find that Aa is quasi-rigidly rotating to within the uncertainties, and note that the asteroseismic age estimate for Aa (1100 pm 100 Myr) is considerably older than the young (35 Myr) age implied by previous isochrone fits to the B binary in the literature. Our MCMC parameter-based grid-search agrees well with our pattern-modelling approach.

Signatures of the Shock Interaction as an Additional Power Source in the Nebular Spectra of SN 2023ixf

Red supergiants may lose significant mass through steady winds and episodic eruptions in the final 100-1000 years before the core collapses, shaping their circumstellar environment. Interaction between supernova (SN) ejecta and distant circumstellar material (CSM) can generate shocks, which can energize the ejecta and serve as a key power source during the nebular phase of the SN. In the present work, we investigate the nebular spectrum of SN 2023ixf, observed one year post-explosion (at +363 d) with the recently commissioned WEAVE instrument on the 4.2m William Herschel Telescope. This marks the first supernova spectrum captured with WEAVE. In this spectrum, Halpha exhibits a peculiar evolution, flanked by blueward and redward broad components centred at simpm 5650,km,s^{-1} from the rest velocity of Halpha, which are seen for only a few SNe to date. These features indicate energy deposition from shocks generated by the interaction of ejecta with a CSM expelled nearly 350 - 640 years pre-explosion. Comparisons of the +363 d spectrum with model spectra from the literature, that include varying shock powers, suggest a shock power of at least sim 5 times 10 ^{40},erg,s^{-1} at this epoch. Additionally, analysis of the [O I] doublet, along with other prominent emission lines, provides evidence for clumpiness, dust formation, and asymmetry within the ejecta and/or the surrounding CSM. These emission lines also helped to constrain the oxygen mass (approx0.19^{scriptscriptstyle +0.08}_{scriptscriptstyle -0.04} M_odot), He-core mass (<3 M_odot) and the zero-age main sequence mass (lesssim 12 M_odot) of the progenitor of SN 2023ixf. The comparison with other Type II SNe highlights SN 2023ixf's unique shock interaction signatures and evidence of dust formation, setting it apart in terms of evolution and dynamics.

Revisiting the Classics: On the Optical Colours of Novae as Standard Crayons

We present a systematic study of the BVRI colours of novae over the course of their eruptions. Where possible, interstellar reddening was measured using the equivalent widths of Diffuse Interstellar Bands (DIBs). Some novae lack spectra with sufficient resolution and signal-to-noise ratios; therefore, we supplement as necessary with 3D and 2D dust maps. Utilising only novae with DIB- or 3D-map-based E(B-V), we find an average intrinsic (B-V)_0 colour of novae at V-band light curve peak of 0.18 with a standard deviation of 0.31, based on a sample of 23 novae. When the light curve has declined by 2 magnitudes (t_2), we find an average (B-V)_0 = -0.02 with a standard deviation of 0.19. These average colours are consistent with previous findings, although the spreads are larger than previously found due to more accurate reddening estimates. We also examined the intrinsic (R-I)_0 and (V-R)_0 colours across our sample. These colours behave similarly to (B-V)_0, except that the (V-R)_0 colour gets redder after peak, likely due to the contributions of emission line flux. We searched for correlations between nova colours and t_2, peak V-band absolute magnitude, and GeV gamma-ray luminosity, but find no statistically significant correlations. Nova colours can therefore be used as standard "crayons" to estimate interstellar reddening from photometry alone, with 0.2--0.3 mag uncertainty. We present a novel Bayesian strategy for estimating distances to Galactic novae based on these E(B-V) measurements, independent of assumptions about luminosity, built using 3D dust maps and a stellar mass model of the Milky Way.

PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

Recently, universal waveform generation tasks have been investigated conditioned on various out-of-distribution scenarios. Although GAN-based methods have shown their strength in fast waveform generation, they are vulnerable to train-inference mismatch scenarios such as two-stage text-to-speech. Meanwhile, diffusion-based models have shown their powerful generative performance in other domains; however, they stay out of the limelight due to slow inference speed in waveform generation tasks. Above all, there is no generator architecture that can explicitly disentangle the natural periodic features of high-resolution waveform signals. In this paper, we propose PeriodWave, a novel universal waveform generation model. First, we introduce a period-aware flow matching estimator that can capture the periodic features of the waveform signal when estimating the vector fields. Additionally, we utilize a multi-period estimator that avoids overlaps to capture different periodic features of waveform signals. Although increasing the number of periods can improve the performance significantly, this requires more computational costs. To reduce this issue, we also propose a single period-conditional universal estimator that can feed-forward parallel by period-wise batch inference. Additionally, we utilize discrete wavelet transform to losslessly disentangle the frequency information of waveform signals for high-frequency modeling, and introduce FreeU to reduce the high-frequency noise for waveform generation. The experimental results demonstrated that our model outperforms the previous models both in Mel-spectrogram reconstruction and text-to-speech tasks. All source code will be available at https://github.com/sh-lee-prml/PeriodWave.

Mel-RoFormer for Vocal Separation and Vocal Melody Transcription

Developing a versatile deep neural network to model music audio is crucial in MIR. This task is challenging due to the intricate spectral variations inherent in music signals, which convey melody, harmonics, and timbres of diverse instruments. In this paper, we introduce Mel-RoFormer, a spectrogram-based model featuring two key designs: a novel Mel-band Projection module at the front-end to enhance the model's capability to capture informative features across multiple frequency bands, and interleaved RoPE Transformers to explicitly model the frequency and time dimensions as two separate sequences. We apply Mel-RoFormer to tackle two essential MIR tasks: vocal separation and vocal melody transcription, aimed at isolating singing voices from audio mixtures and transcribing their lead melodies, respectively. Despite their shared focus on singing signals, these tasks possess distinct optimization objectives. Instead of training a unified model, we adopt a two-step approach. Initially, we train a vocal separation model, which subsequently serves as a foundation model for fine-tuning for vocal melody transcription. Through extensive experiments conducted on benchmark datasets, we showcase that our models achieve state-of-the-art performance in both vocal separation and melody transcription tasks, underscoring the efficacy and versatility of Mel-RoFormer in modeling complex music audio signals.

Music Source Separation in the Waveform Domain

Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song. Such components include voice, bass, drums and any other accompaniments.Contrarily to many audio synthesis tasks where the best performances are achieved by models that directly generate the waveform, the state-of-the-art in source separation for music is to compute masks on the magnitude spectrum. In this paper, we compare two waveform domain architectures. We first adapt Conv-Tasnet, initially developed for speech source separation,to the task of music source separation. While Conv-Tasnet beats many existing spectrogram-domain methods, it suffersfrom significant artifacts, as shown by human evaluations. We propose instead Demucs, a novel waveform-to-waveform model,with a U-Net structure and bidirectional LSTM.Experiments on the MusDB dataset show that, with proper data augmentation, Demucs beats allexisting state-of-the-art architectures, including Conv-Tasnet, with 6.3 SDR on average, (and up to 6.8 with 150 extra training songs, even surpassing the IRM oracle for the bass source).Using recent development in model quantization, Demucs can be compressed down to 120MBwithout any loss of accuracy.We also provide human evaluations, showing that Demucs benefit from a large advantagein terms of the naturalness of the audio. However, it suffers from some bleeding,especially between the vocals and other source.

Galaxy Spectra neural Networks (GaSNets). I. Searching for strong lens candidates in eBOSS spectra using Deep Learning

With the advent of new spectroscopic surveys from ground and space, observing up to hundreds of millions of galaxies, spectra classification will become overwhelming for standard analysis techniques. To prepare for this challenge, we introduce a family of deep learning tools to classify features in one-dimensional spectra. As the first application of these Galaxy Spectra neural Networks (GaSNets), we focus on tools specialized at identifying emission lines from strongly lensed star-forming galaxies in the eBOSS spectra. We first discuss the training and testing of these networks and define a threshold probability, PL, of 95% for the high quality event detection. Then, using a previous set of spectroscopically selected strong lenses from eBOSS, confirmed with HST, we estimate a completeness of ~80% as the fraction of lenses recovered above the adopted PL. We finally apply the GaSNets to ~1.3M spectra to collect a first list of ~430 new high quality candidates identified with deep learning applied to spectroscopy and visually graded as highly probable real events. A preliminary check against ground-based observations tentatively shows that this sample has a confirmation rate of 38%, in line with previous samples selected with standard (no deep learning) classification tools and follow-up by Hubble Space Telescope. This first test shows that machine learning can be efficiently extended to feature recognition in the wavelength space, which will be crucial for future surveys like 4MOST, DESI, Euclid, and the Chinese Space Station Telescope (CSST).

PhaseNet: A Deep-Neural-Network-Based Seismic Arrival Time Picking Method

As the number of seismic sensors grows, it is becoming increasingly difficult for analysts to pick seismic phases manually and comprehensively, yet such efforts are fundamental to earthquake monitoring. Despite years of improvements in automatic phase picking, it is difficult to match the performance of experienced analysts. A more subtle issue is that different seismic analysts may pick phases differently, which can introduce bias into earthquake locations. We present a deep-neural-network-based arrival-time picking method called "PhaseNet" that picks the arrival times of both P and S waves. Deep neural networks have recently made rapid progress in feature learning, and with sufficient training, have achieved super-human performance in many applications. PhaseNet uses three-component seismic waveforms as input and generates probability distributions of P arrivals, S arrivals, and noise as output. We engineer PhaseNet such that peaks in probability provide accurate arrival times for both P and S waves, and have the potential to increase the number of S-wave observations dramatically over what is currently available. This will enable both improved locations and improved shear wave velocity models. PhaseNet is trained on the prodigious available data set provided by analyst-labeled P and S arrival times from the Northern California Earthquake Data Center. The dataset we use contains more than seven million waveform samples extracted from over thirty years of earthquake recordings. We demonstrate that PhaseNet achieves much higher picking accuracy and recall rate than existing methods.

Promise and Peril: Stellar Contamination and Strict Limits on the Atmosphere Composition of TRAPPIST-1c from JWST NIRISS Transmission Spectra

Attempts to probe the atmospheres of rocky planets around M dwarfs present both promise and peril. While their favorable planet-to-star radius ratios enable searches for even thin secondary atmospheres, their high activity levels and high-energy outputs threaten atmosphere survival. Here, we present the 0.6--2.85\,mum transmission spectrum of the 1.1\,rm R_oplus, sim340\,K rocky planet TRAPPIST-1\,c obtained over two JWST NIRISS/SOSS transit observations. Each of the two spectra displays 100--500\,ppm signatures of stellar contamination. Despite being separated by 367\,days, the retrieved spot and faculae properties are consistent between the two visits, resulting in nearly identical transmission spectra. Jointly retrieving for stellar contamination and a planetary atmosphere reveals that our spectrum can rule out hydrogen-dominated, lesssim300times solar metallicity atmospheres with effective surface pressures down to 10\,mbar at the 3-sigma level. For high-mean molecular weight atmospheres, where O_2 or N_2 is the background gas, our spectrum disfavors partial pressures of more than sim10\,mbar for H_2O, CO, NH_3 and CH_4 at the 2-sigma level. Similarly, under the assumption of a 100\% H_2O, NH_3, CO, or CH_4 atmosphere, our spectrum disfavors thick, >1\,bar atmospheres at the 2-sigma level. These non-detections of spectral features are in line with predictions that even heavier, CO_2-rich, atmospheres would be efficiently lost on TRAPPIST-1\,c given the cumulative high-energy irradiation experienced by the planet. Our results further stress the importance of robustly accounting for stellar contamination when analyzing JWST observations of exo-Earths around M dwarfs, as well as the need for high-fidelity stellar models to search for the potential signals of thin secondary atmospheres.

Seismic Arrival-time Picking on Distributed Acoustic Sensing Data using Semi-supervised Learning

Distributed Acoustic Sensing (DAS) is an emerging technology for earthquake monitoring and subsurface imaging. The recorded seismic signals by DAS have several distinct characteristics, such as unknown coupling effects, strong anthropogenic noise, and ultra-dense spatial sampling. These aspects differ from conventional seismic data recorded by seismic networks, making it challenging to utilize DAS at present for seismic monitoring. New data analysis algorithms are needed to extract useful information from DAS data. Previous studies on conventional seismic data demonstrated that deep learning models could achieve performance close to human analysts in picking seismic phases. However, phase picking on DAS data is still a difficult problem due to the lack of manual labels. Further, the differences in mathematical structure between these two data formats, i.e., ultra-dense DAS arrays and sparse seismic networks, make model fine-tuning or transfer learning difficult to implement on DAS data. In this work, we design a new approach using semi-supervised learning to solve the phase-picking task on DAS arrays. We use a pre-trained PhaseNet model as a teacher network to generate noisy labels of P and S arrivals on DAS data and apply the Gaussian mixture model phase association (GaMMA) method to refine these noisy labels to build training datasets. We develop a new deep learning model, PhaseNet-DAS, to process the 2D spatial-temporal data of DAS arrays and train the model on DAS data. The new deep learning model achieves high picking accuracy and good earthquake detection performance. We then apply the model to process continuous data and build earthquake catalogs directly from DAS recording. Our approach using semi-supervised learning provides a way to build effective deep learning models for DAS, which have the potential to improve earthquake monitoring using large-scale fiber networks.

New Radio Observations of the Supernova Remnant CTA 1

We present new radio images of the supernova remnant (SNR) CTA 1 at 1420 and 408 MHz, and in the 21 cm line of H I observed with the Dominion Radio Astrophysical Observatory Synthesis Telescope and at 1420 MHz observed with the Effelsberg 100 m telescope. We confirm previously described continuum features and elaborate further on filamentary features identified using the high-resolution (1') maps from these new observations. We investigate the abrupt change in sign of rotation measure (RM) across the SNR, using the linear polarization observations in the four bands around 1420 MHz. Following X. H. Sun et al.'s (2011) investigation, we both confirm that the distribution of signs of the RMs for extragalactic sources in the area appears to match that of the shell, as well as combine the data from the four bands to estimate the relative depolarization and the intrinsic rotation measure of the SNR. We do not conclusively reject X. H. Sun et al.'s (2011) claim of a Faraday screen in the foreground causing the distribution of RMs that we observe; however, we do suggest an alternative explanation of a swept-up stellar wind from the progenitor star with a toroidal magnetic field. Finally, we expand on the analysis of the H I observations by applying the Rolling Hough Transform to isolate filamentary structure and better identify H I emission with the SNR. Further constraining the H I velocity channels associated with CTA 1, we use more recent Galactic rotation curves to calculate an updated kinematic distance of 1.09 +/- 0.2 kpc.

Galaxy Spectra neural Network (GaSNet). II. Using Deep Learning for Spectral Classification and Redshift Predictions

Large sky spectroscopic surveys have reached the scale of photometric surveys in terms of sample sizes and data complexity. These huge datasets require efficient, accurate, and flexible automated tools for data analysis and science exploitation. We present the Galaxy Spectra Network/GaSNet-II, a supervised multi-network deep learning tool for spectra classification and redshift prediction. GaSNet-II can be trained to identify a customized number of classes and optimize the redshift predictions for classified objects in each of them. It also provides redshift errors, using a network-of-networks that reproduces a Monte Carlo test on each spectrum, by randomizing their weight initialization. As a demonstration of the capability of the deep learning pipeline, we use 260k Sloan Digital Sky Survey spectra from Data Release 16, separated into 13 classes including 140k galactic, and 120k extragalactic objects. GaSNet-II achieves 92.4% average classification accuracy over the 13 classes (larger than 90% for the majority of them), and an average redshift error of approximately 0.23% for galaxies and 2.1% for quasars. We further train/test the same pipeline to classify spectra and predict redshifts for a sample of 200k 4MOST mock spectra and 21k publicly released DESI spectra. On 4MOST mock data, we reach 93.4% accuracy in 10-class classification and an average redshift error of 0.55% for galaxies and 0.3% for active galactic nuclei. On DESI data, we reach 96% accuracy in (star/galaxy/quasar only) classification and an average redshift error of 2.8% for galaxies and 4.8% for quasars, despite the small sample size available. GaSNet-II can process ~40k spectra in less than one minute, on a normal Desktop GPU. This makes the pipeline particularly suitable for real-time analyses of Stage-IV survey observations and an ideal tool for feedback loops aimed at night-by-night survey strategy optimization.

A Model RRNet for Spectral Information Exploitation and LAMOST Medium-resolution Spectrum Parameter Estimation

This work proposes a Residual Recurrent Neural Network (RRNet) for synthetically extracting spectral information, and estimating stellar atmospheric parameters together with 15 chemical element abundances for medium-resolution spectra from Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST). The RRNet consists of two fundamental modules: a residual module and a recurrent module. The residual module extracts spectral features based on the longitudinally driving power from parameters, while the recurrent module recovers spectral information and restrains the negative influences from noises based on Cross-band Belief Enhancement. RRNet is trained by the spectra from common stars between LAMOST DR7 and APOGEE-Payne catalog. The 17 stellar parameters and their uncertainties for 2.37 million medium-resolution spectra from LAMOST DR7 are predicted. For spectra with S/N >= 10, the precision of estimations Teff and log g are 88 K and 0.13 dex respectively, elements C, Mg, Al, Si, Ca, Fe, Ni are 0.05 dex to 0.08 dex, and N, O, S, K, Ti, Cr, Mn are 0.09 dex to 0.14 dex, while that of Cu is 0.19 dex. Compared with StarNet and SPCANet, RRNet shows higher accuracy and robustness. In comparison to Apache Point Observatory Galactic Evolution Experiment and Galactic Archaeology with HERMES surveys, RRNet manifests good consistency within a reasonable range of bias. Finally, this work releases a catalog for 2.37 million medium-resolution spectra from the LAMOST DR7, the source code, the trained model and the experimental data respectively for astronomical science exploration and data processing algorithm research reference.