Edge cases¶
Every weirdness here came from a real input that exposed it. Most are handled by the convert pipeline; a few are deliberate non-handlings explained at the bottom.
Multi-disc folder layouts¶
The discover step merges siblings with shared prefixes:
| Layout | Detected as |
|---|---|
Album/CD1, Album/CD2 |
one album, 2 discs |
Album/CD-1, Album/CD-2 |
one album |
Album/CD 1, Album/CD 2 |
one album |
Album/Disc 1, Album/Disc 2 |
one album |
Album/Disk3 |
(treated as disc 3 if siblings exist) |
Album (CD1)/, Album (CD2)/ |
one album, shared-prefix style |
Album/CD2 (Bonus Live CD) |
merges with CD1 even with trailing text |
Single Album/CD1/ (no CD2 sibling) is not promoted to multi-disc — treated as a regular single-disc album.
Mixed-content parents (Ultimate Queen layout: a folder with both CD1//CD2/ AND bare albums as siblings) only merge the matching-prefix disc pairs; bare albums stay as-is.
Parent-with-audio + disc subfolders (System Of A Down self-titled with duplicate Disc 1//Disc 2/ subfolders inside the album folder containing the same tracks): drops the disc subfolders, uses the parent's top-level tracks as one album.
Filename-encoded multi-disc¶
Some rips put the disc + track in the filename rather than a CD subfolder:
Zara Larsson — So Good (Deluxe)/
1-01. Ain't My Fault.flac
1-02. I Would Like.flac
...
2-01. So Good.flac
2-02. Lush Life.flac
_maybe_apply_filename_disc_track detects + applies disc/track from the filename pattern when no folder-disc structure exists. Triggers only when ≥2 distinct disc numbers appear (so single-disc 01-NN doesn't accidentally trip).
Continuous numbering across discs¶
Mega-comp convention: disc 1 has tracks 1-9, disc 2 has 10-18, etc. — track numbers continue rather than restart per disc.
A State Of Trance Classics Vol. 7/
01-01 - Chicane - Offshore.m4a
01-02 - Above & Beyond - ...
...
01-09 - G&M Project - ...
02-10 - BT - Flaming June.m4a # disc 2 starts at track 10
02-11 - Marcel Woods - ...
...
02-18 - Peter Martijn Wijnia - ...
03-19 - Marco V - GODD.m4a # disc 3 starts at 19
...
The convert pipeline detects this (each disc's min track == previous disc's max + 1) and resets to per-disc 1..N in the output tags. Otherwise Subsonic / Music.app would show "track 10" of "1 track" on disc 2, which displays weirdly.
Scene-encoded DTT track numbers¶
Now! / Absolute Music / Billboard compilations encode multi-disc structure into a 3-digit track number prefix on the filename (101_artist_-_title.mp3 = disc 1 track 1, 218_artist_-_title.mp3 = disc 2 track 18). The MP3 track tag often carries the in-disc number; the filename carries the encoded form. We read the filename because the tag is unreliable.
Trigger conditions (all required, conservative):
discoverdid NOT already merge this album from disc subfolders.- Every track's filename starts with a 3-digit number (100-999).
- The set of
tn // 100has ≥2 unique values. - Each disc cluster has ≥3 tracks.
A 100-track regular album with one track numbered 100 won't accidentally trigger this.
Various-Artists detection¶
Detected as compilation when:
album_artistis a VA alias (VA,Various,Various Artists,V.A.,V/A,Various Artist,Verschillende Artiesten, etc.)- The per-track artist majority is itself a VA alias (rips that leave album_artist empty but stamp every track artist as
VA) - No
album_artistand tracks span multiple different artists
Compilations route to Various Artists/ and set cpil (MP4) / TCMP (ID3) compilation flag.
Tagless VA filename parsing¶
Filename 01 - VA - Artist - Title.flac has a leading VA - segment. The filename parser splits out the real per-track artist; summary detects compilation by VA-marker artist_fallback even if album_artist is empty.
Tagless 100-track VA mix¶
Real example: a 7Os8Os9Os compilation with 100 MP3s named NN. Artist - Title.mp3 and no tags at all. Filename pre-fill in _process_album populates per-track artists before summarise → distinct artists triggers compilation flag → output lands under Various Artists/.
Disc-suffix in album tags¶
Roses (CD1) → Roses
Are You Ready: Best Of AC/DC [CD1] → Are You Ready: Best Of AC/DC
Roses (CD2) Live In Madrid → Roses (Cranberries layout — middle-of-string disc marker)
Echoes (1) → Echoes (bare-paren disc index)
clean_album_title strips them, including the bare-trailing (1)/(2) form.
Disc-1 bias in album-name vote¶
Multi-disc albums where bonus discs have polluted tags (e.g. Album (CD2) Live In Madrid) would otherwise win majority count over disc 1's clean Album tag. The summary biases toward disc 1 — disc 1 tracks vote first; only fall back to all tracks if disc 1 had no consensus.
Folder name junk¶
Stripped from the directory name when used as ALBUM fallback:
- Codec / quality bracketed tags:
[FLAC],[16Bit-44.1kHz],[lossless] - Year:
(2012)(extracted, not just dropped) - VA prefix:
VA -,Various - - Scene-domain tags:
[nextorrent.com],[example.org]— limited to known TLDs so we don't strip[Live]/[PMEDIA]/ catalog numbers
Edition annotations (added in a later commit):
(Remastered),(Remastered 2009),(2009 Remaster)(Deluxe Edition),(Super Deluxe Edition),(Expanded Edition)(40th Anniversary Edition),(10th Anniversary)(Bonus Tracks),[Bonus Disc](2018 Reissue),(Reissue)(Special Edition),(Limited Edition),(Collector's Edition)
Live annotations are kept — (Live), (Live in Madrid), (Live at Budokan 1979) — since live albums are distinct works from their studio counterparts.
Filesystem-safe sanitisation¶
/,:,*,?,",<,>,|all replaced with_R.E.M.trailing dots preserved (single-letter acronyms — distinct from disc-marker(1).)- NFC unicode normalisation
- ≤180-byte component cap (some filesystems cap at 255; we leave headroom)
Smart quotes / em-dashes (Sting 1984–1994): preserved verbatim.
Filename collisions¶
Two tracks with same track-no + title in one album → auto-suffix (2), (3). Reserves the FINAL filename, not just the planned one, so a third track titled Same (2) chains to (3) rather than colliding with the auto-renamed second.
Output dir collisions¶
Two source albums normalising to the same output path → skip second + warn in the summary. Surfaces in --dry-run too. Reservation happens BEFORE encode so dry-run and real-run agree on what would happen.
Source-side dedupe¶
Some rip groups ship every track twice under different filename conventions:
Match on (disc_no, track_no, title.casefold(), artist.casefold()) + duration within 0.5s. Without the duration check, a remix and its original at the same track_no would falsely dedupe; with it, only true duplicates collapse.
Atomic per-album writes¶
Hidden .staging sibling dir, swap on success, no half-replaced albums on mid-encode failure. Per-track failures caught at thread level, propagate to album-level error (album fails in summary, CLI exits non-zero, prior album content preserved by atomic swap).
Cover corruption¶
_measure returns (0, 0) on Pillow failure → candidate dropped from the picker. cover.normalize() exception → fall through to next-best candidate. CAA HTTP 5xx (Internet Archive CDN flaky) → caught + warning + fall back to local cover. Don't crash the album.
Lossy → lossy guard¶
--format aac/mp3 against an MP3/OGG source falls back to ALAC for that track unless --allow-lossy-recompress. AUTO never triggers this since MP3 explicitly transcodes to AAC by design.
MP3-in-MP4 deliberately avoided¶
When a source MP3 is processed under --format auto, the output is transcoded to AAC rather than stream-copied into an .m4a container. MP3-in-MP4 is a valid combination per the MP4 spec and mutagen reads its tags fine, but Finder / Music.app's metadata pipeline shortcuts based on the codec field and won't display tags reliably. Transcoding loses a tiny amount of audio fidelity (transparent on Bluetooth playback) in exchange for a tag schema every consumer can read.
Cases the pipeline does NOT handle (deliberate)¶
"Radio Show -" pseudo-artists¶
Some rips deliberately tag the album_artist as a programme rather than a person — Armin van Buuren's A State Of Trance weekly show ships with album_artist = "Radio Show - A State Of Trance" on every episode. We trust the tag; episodes land under output/Radio Show - A State Of Trance/2025 - A State Of Trance 1254 (...)/ rather than mixed into Armin van Buuren/.
This is the right default — keeps weekly shows from polluting the main artist library — but means searching for the actual artist won't find them. Workarounds: re-tag source files, or future --artist-override flag.
Classical-style compilations¶
Best Of/ wrapper folder with one sub-folder per composer — each containing tracks tagged only with the composer name, no ALBUM tag — produces one output album per composer-folder, not a single merged "Various Artists" comp. Tried merging once; concluded per-composer artist folders were the better default.