git subrepo pull (merge) --force deps/libchdr
[pcsx_rearmed.git] / deps / libchdr / include / dr_libs / dr_flac.h
CommitLineData
2ff0b512 1/*
2FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
648db22b 3dr_flac - v0.12.42 - 2023-11-02
2ff0b512 4
5David Reid - mackron@gmail.com
6
7GitHub: https://github.com/mackron/dr_libs
8*/
9
10/*
11RELEASE NOTES - v0.12.0
12=======================
13Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.
14
15
16Improved Client-Defined Memory Allocation
17-----------------------------------------
18The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
19existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE are still in place and will be used by default when no custom
20allocation callbacks are specified.
21
22To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:
23
24 void* my_malloc(size_t sz, void* pUserData)
25 {
26 return malloc(sz);
27 }
28 void* my_realloc(void* p, size_t sz, void* pUserData)
29 {
30 return realloc(p, sz);
31 }
32 void my_free(void* p, void* pUserData)
33 {
34 free(p);
35 }
36
37 ...
38
39 drflac_allocation_callbacks allocationCallbacks;
40 allocationCallbacks.pUserData = &myData;
41 allocationCallbacks.onMalloc = my_malloc;
42 allocationCallbacks.onRealloc = my_realloc;
43 allocationCallbacks.onFree = my_free;
44 drflac* pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);
45
46The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.
47
48Passing in null for the allocation callbacks object will cause dr_flac to use defaults which is the same as DRFLAC_MALLOC,
49DRFLAC_REALLOC and DRFLAC_FREE and the equivalent of how it worked in previous versions.
50
51Every API that opens a drflac object now takes this extra parameter. These include the following:
52
53 drflac_open()
54 drflac_open_relaxed()
55 drflac_open_with_metadata()
56 drflac_open_with_metadata_relaxed()
57 drflac_open_file()
58 drflac_open_file_with_metadata()
59 drflac_open_memory()
60 drflac_open_memory_with_metadata()
61 drflac_open_and_read_pcm_frames_s32()
62 drflac_open_and_read_pcm_frames_s16()
63 drflac_open_and_read_pcm_frames_f32()
64 drflac_open_file_and_read_pcm_frames_s32()
65 drflac_open_file_and_read_pcm_frames_s16()
66 drflac_open_file_and_read_pcm_frames_f32()
67 drflac_open_memory_and_read_pcm_frames_s32()
68 drflac_open_memory_and_read_pcm_frames_s16()
69 drflac_open_memory_and_read_pcm_frames_f32()
70
71
72
73Optimizations
74-------------
75Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
76improves performance over the brute force method which was used when no seek table was present. Seek table based seeking also takes
77advantage of the new binary search seeking system to further improve performance there as well. Note that this depends on CRC which
78means it will be disabled when DR_FLAC_NO_CRC is used.
79
80The SSE4.1 pipeline has been cleaned up and optimized. You should see some improvements with decoding speed of 24-bit files in
81particular. 16-bit streams should also see some improvement.
82
83drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed it's s32
84to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.
85
86A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
87channel reconstruction which is the last part of the decoding process.
88
89The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
90compiling with GCC and Clang which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
91compile time and the REV instruction requires ARM architecture version 6.
92
93An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.
94
95
96Removed APIs
97------------
98The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:
99
100 drflac_read_s32() -> drflac_read_pcm_frames_s32()
101 drflac_read_s16() -> drflac_read_pcm_frames_s16()
102 drflac_read_f32() -> drflac_read_pcm_frames_f32()
103 drflac_seek_to_sample() -> drflac_seek_to_pcm_frame()
104 drflac_open_and_decode_s32() -> drflac_open_and_read_pcm_frames_s32()
105 drflac_open_and_decode_s16() -> drflac_open_and_read_pcm_frames_s16()
106 drflac_open_and_decode_f32() -> drflac_open_and_read_pcm_frames_f32()
107 drflac_open_and_decode_file_s32() -> drflac_open_file_and_read_pcm_frames_s32()
108 drflac_open_and_decode_file_s16() -> drflac_open_file_and_read_pcm_frames_s16()
109 drflac_open_and_decode_file_f32() -> drflac_open_file_and_read_pcm_frames_f32()
110 drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
111 drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
112 drflac_open_and_decode_memory_f32() -> drflac_open_memroy_and_read_pcm_frames_f32()
113
114Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
115to the old per-sample APIs. You now need to use the "pcm_frame" versions.
116*/
117
118
119/*
120Introduction
121============
122dr_flac is a single file library. To use it, do something like the following in one .c file.
123
124 ```c
125 #define DR_FLAC_IMPLEMENTATION
126 #include "dr_flac.h"
127 ```
128
129You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:
130
131 ```c
132 drflac* pFlac = drflac_open_file("MySong.flac", NULL);
133 if (pFlac == NULL) {
134 // Failed to open FLAC file
135 }
136
137 drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
138 drflac_uint64 numberOfInterleavedSamplesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
139 ```
140
141The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
142should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
143a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
144
145You do not need to decode the entire stream in one go - you just specify how many samples you'd like at any given time and the decoder will give you as many
146samples as it can, up to the amount requested. Later on when you need the next batch of samples, just call it again. Example:
147
148 ```c
149 while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
150 do_something();
151 }
152 ```
153
154You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.
155
156If you just want to quickly decode an entire FLAC file in one go you can do something like this:
157
158 ```c
159 unsigned int channels;
160 unsigned int sampleRate;
161 drflac_uint64 totalPCMFrameCount;
162 drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
163 if (pSampleData == NULL) {
164 // Failed to open and decode FLAC file.
165 }
166
167 ...
168
169 drflac_free(pSampleData, NULL);
170 ```
171
172You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
173should be considered lossy.
174
175
176If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metdata()` or `drflac_open_memory_with_metadata()`.
177The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
178reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metdata()` returns.
179
180The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. The presents a problem in certain scenarios such as broadcast style
181streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:
182
183 `drflac_open_relaxed()`
184 `drflac_open_with_metadata_relaxed()`
185
186It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
187APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.
188
189
190
191Build Options
192=============
193#define these options before including this file.
194
195#define DR_FLAC_NO_STDIO
196 Disable `drflac_open_file()` and family.
197
198#define DR_FLAC_NO_OGG
199 Disables support for Ogg/FLAC streams.
200
201#define DR_FLAC_BUFFER_SIZE <number>
202 Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
203 Larger values means more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
204 you have a very efficient implementation of onRead(), or increase it if it's very inefficient. Must be a multiple of 8.
205
206#define DR_FLAC_NO_CRC
207 Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
208 be used if available. Otherwise the seek will be performed using brute force.
209
210#define DR_FLAC_NO_SIMD
211 Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.
212
9e052883 213#define DR_FLAC_NO_WCHAR
214 Disables all functions ending with `_w`. Use this if your compiler does not provide wchar.h. Not required if DR_FLAC_NO_STDIO is also defined.
215
2ff0b512 216
217
218Notes
219=====
220- dr_flac does not support changing the sample rate nor channel count mid stream.
221- dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
222- When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
223 to differences in corrupted stream recorvery logic between the two APIs.
224*/
225
226#ifndef dr_flac_h
227#define dr_flac_h
228
229#ifdef __cplusplus
230extern "C" {
231#endif
232
233#define DRFLAC_STRINGIFY(x) #x
234#define DRFLAC_XSTRINGIFY(x) DRFLAC_STRINGIFY(x)
235
236#define DRFLAC_VERSION_MAJOR 0
237#define DRFLAC_VERSION_MINOR 12
648db22b 238#define DRFLAC_VERSION_REVISION 42
2ff0b512 239#define DRFLAC_VERSION_STRING DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)
240
241#include <stddef.h> /* For size_t. */
242
648db22b 243/* Sized Types */
2ff0b512 244typedef signed char drflac_int8;
245typedef unsigned char drflac_uint8;
246typedef signed short drflac_int16;
247typedef unsigned short drflac_uint16;
248typedef signed int drflac_int32;
249typedef unsigned int drflac_uint32;
9e052883 250#if defined(_MSC_VER) && !defined(__clang__)
2ff0b512 251 typedef signed __int64 drflac_int64;
252 typedef unsigned __int64 drflac_uint64;
253#else
254 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
255 #pragma GCC diagnostic push
256 #pragma GCC diagnostic ignored "-Wlong-long"
257 #if defined(__clang__)
258 #pragma GCC diagnostic ignored "-Wc++11-long-long"
259 #endif
260 #endif
261 typedef signed long long drflac_int64;
262 typedef unsigned long long drflac_uint64;
263 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
264 #pragma GCC diagnostic pop
265 #endif
266#endif
9e052883 267#if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined(_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
2ff0b512 268 typedef drflac_uint64 drflac_uintptr;
269#else
270 typedef drflac_uint32 drflac_uintptr;
271#endif
272typedef drflac_uint8 drflac_bool8;
273typedef drflac_uint32 drflac_bool32;
274#define DRFLAC_TRUE 1
275#define DRFLAC_FALSE 0
648db22b 276/* End Sized Types */
2ff0b512 277
648db22b 278/* Decorations */
2ff0b512 279#if !defined(DRFLAC_API)
280 #if defined(DRFLAC_DLL)
281 #if defined(_WIN32)
282 #define DRFLAC_DLL_IMPORT __declspec(dllimport)
283 #define DRFLAC_DLL_EXPORT __declspec(dllexport)
284 #define DRFLAC_DLL_PRIVATE static
285 #else
286 #if defined(__GNUC__) && __GNUC__ >= 4
287 #define DRFLAC_DLL_IMPORT __attribute__((visibility("default")))
288 #define DRFLAC_DLL_EXPORT __attribute__((visibility("default")))
289 #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
290 #else
291 #define DRFLAC_DLL_IMPORT
292 #define DRFLAC_DLL_EXPORT
293 #define DRFLAC_DLL_PRIVATE static
294 #endif
295 #endif
296
297 #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
298 #define DRFLAC_API DRFLAC_DLL_EXPORT
299 #else
300 #define DRFLAC_API DRFLAC_DLL_IMPORT
301 #endif
302 #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
303 #else
304 #define DRFLAC_API extern
305 #define DRFLAC_PRIVATE static
306 #endif
307#endif
648db22b 308/* End Decorations */
2ff0b512 309
310#if defined(_MSC_VER) && _MSC_VER >= 1700 /* Visual Studio 2012 */
311 #define DRFLAC_DEPRECATED __declspec(deprecated)
312#elif (defined(__GNUC__) && __GNUC__ >= 4) /* GCC 4 */
313 #define DRFLAC_DEPRECATED __attribute__((deprecated))
314#elif defined(__has_feature) /* Clang */
315 #if __has_feature(attribute_deprecated)
316 #define DRFLAC_DEPRECATED __attribute__((deprecated))
317 #else
318 #define DRFLAC_DEPRECATED
319 #endif
320#else
321 #define DRFLAC_DEPRECATED
322#endif
323
324DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
325DRFLAC_API const char* drflac_version_string(void);
326
648db22b 327/* Allocation Callbacks */
328typedef struct
329{
330 void* pUserData;
331 void* (* onMalloc)(size_t sz, void* pUserData);
332 void* (* onRealloc)(void* p, size_t sz, void* pUserData);
333 void (* onFree)(void* p, void* pUserData);
334} drflac_allocation_callbacks;
335/* End Allocation Callbacks */
336
2ff0b512 337/*
338As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values means more speed,
339but also more memory. In my testing there is diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
340*/
341#ifndef DR_FLAC_BUFFER_SIZE
342#define DR_FLAC_BUFFER_SIZE 4096
343#endif
344
648db22b 345
346/* Architecture Detection */
2ff0b512 347#if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
348#define DRFLAC_64BIT
349#endif
350
648db22b 351#if defined(__x86_64__) || defined(_M_X64)
352 #define DRFLAC_X64
353#elif defined(__i386) || defined(_M_IX86)
354 #define DRFLAC_X86
355#elif defined(__arm__) || defined(_M_ARM) || defined(__arm64) || defined(__arm64__) || defined(__aarch64__) || defined(_M_ARM64)
356 #define DRFLAC_ARM
357#endif
358/* End Architecture Detection */
359
360
2ff0b512 361#ifdef DRFLAC_64BIT
362typedef drflac_uint64 drflac_cache_t;
363#else
364typedef drflac_uint32 drflac_cache_t;
365#endif
366
367/* The various metadata block types. */
368#define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO 0
369#define DRFLAC_METADATA_BLOCK_TYPE_PADDING 1
370#define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION 2
371#define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE 3
372#define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT 4
373#define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET 5
374#define DRFLAC_METADATA_BLOCK_TYPE_PICTURE 6
375#define DRFLAC_METADATA_BLOCK_TYPE_INVALID 127
376
377/* The various picture types specified in the PICTURE block. */
378#define DRFLAC_PICTURE_TYPE_OTHER 0
379#define DRFLAC_PICTURE_TYPE_FILE_ICON 1
380#define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON 2
381#define DRFLAC_PICTURE_TYPE_COVER_FRONT 3
382#define DRFLAC_PICTURE_TYPE_COVER_BACK 4
383#define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE 5
384#define DRFLAC_PICTURE_TYPE_MEDIA 6
385#define DRFLAC_PICTURE_TYPE_LEAD_ARTIST 7
386#define DRFLAC_PICTURE_TYPE_ARTIST 8
387#define DRFLAC_PICTURE_TYPE_CONDUCTOR 9
388#define DRFLAC_PICTURE_TYPE_BAND 10
389#define DRFLAC_PICTURE_TYPE_COMPOSER 11
390#define DRFLAC_PICTURE_TYPE_LYRICIST 12
391#define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION 13
392#define DRFLAC_PICTURE_TYPE_DURING_RECORDING 14
393#define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE 15
394#define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE 16
395#define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH 17
396#define DRFLAC_PICTURE_TYPE_ILLUSTRATION 18
397#define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE 19
398#define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE 20
399
400typedef enum
401{
402 drflac_container_native,
403 drflac_container_ogg,
404 drflac_container_unknown
405} drflac_container;
406
407typedef enum
408{
409 drflac_seek_origin_start,
410 drflac_seek_origin_current
411} drflac_seek_origin;
412
9e052883 413/* The order of members in this structure is important because we map this directly to the raw data within the SEEKTABLE metadata block. */
2ff0b512 414typedef struct
415{
416 drflac_uint64 firstPCMFrame;
417 drflac_uint64 flacFrameOffset; /* The offset from the first byte of the header of the first frame. */
418 drflac_uint16 pcmFrameCount;
419} drflac_seekpoint;
2ff0b512 420
421typedef struct
422{
423 drflac_uint16 minBlockSizeInPCMFrames;
424 drflac_uint16 maxBlockSizeInPCMFrames;
425 drflac_uint32 minFrameSizeInPCMFrames;
426 drflac_uint32 maxFrameSizeInPCMFrames;
427 drflac_uint32 sampleRate;
428 drflac_uint8 channels;
429 drflac_uint8 bitsPerSample;
430 drflac_uint64 totalPCMFrameCount;
431 drflac_uint8 md5[16];
432} drflac_streaminfo;
433
434typedef struct
435{
436 /*
437 The metadata type. Use this to know how to interpret the data below. Will be set to one of the
438 DRFLAC_METADATA_BLOCK_TYPE_* tokens.
439 */
440 drflac_uint32 type;
441
442 /*
443 A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
444 not modify the contents of this buffer. Use the structures below for more meaningful and structured
445 information about the metadata. It's possible for this to be null.
446 */
447 const void* pRawData;
448
449 /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
450 drflac_uint32 rawDataSize;
451
452 union
453 {
454 drflac_streaminfo streaminfo;
455
456 struct
457 {
458 int unused;
459 } padding;
460
461 struct
462 {
463 drflac_uint32 id;
464 const void* pData;
465 drflac_uint32 dataSize;
466 } application;
467
468 struct
469 {
470 drflac_uint32 seekpointCount;
471 const drflac_seekpoint* pSeekpoints;
472 } seektable;
473
474 struct
475 {
476 drflac_uint32 vendorLength;
477 const char* vendor;
478 drflac_uint32 commentCount;
479 const void* pComments;
480 } vorbis_comment;
481
482 struct
483 {
484 char catalog[128];
485 drflac_uint64 leadInSampleCount;
486 drflac_bool32 isCD;
487 drflac_uint8 trackCount;
488 const void* pTrackData;
489 } cuesheet;
490
491 struct
492 {
493 drflac_uint32 type;
494 drflac_uint32 mimeLength;
495 const char* mime;
496 drflac_uint32 descriptionLength;
497 const char* description;
498 drflac_uint32 width;
499 drflac_uint32 height;
500 drflac_uint32 colorDepth;
501 drflac_uint32 indexColorCount;
502 drflac_uint32 pictureDataSize;
503 const drflac_uint8* pPictureData;
504 } picture;
505 } data;
506} drflac_metadata;
507
508
509/*
510Callback for when data needs to be read from the client.
511
512
513Parameters
514----------
515pUserData (in)
516 The user data that was passed to drflac_open() and family.
517
518pBufferOut (out)
519 The output buffer.
520
521bytesToRead (in)
522 The number of bytes to read.
523
524
525Return Value
526------------
527The number of bytes actually read.
528
529
530Remarks
531-------
532A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
533you have reached the end of the stream.
534*/
535typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);
536
537/*
538Callback for when data needs to be seeked.
539
540
541Parameters
542----------
543pUserData (in)
544 The user data that was passed to drflac_open() and family.
545
546offset (in)
547 The number of bytes to move, relative to the origin. Will never be negative.
548
549origin (in)
550 The origin of the seek - the current position or the start of the stream.
551
552
553Return Value
554------------
555Whether or not the seek was successful.
556
557
558Remarks
559-------
560The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
561either drflac_seek_origin_start or drflac_seek_origin_current.
562
563When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
564and handled by returning DRFLAC_FALSE.
565*/
566typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);
567
568/*
569Callback for when a metadata block is read.
570
571
572Parameters
573----------
574pUserData (in)
575 The user data that was passed to drflac_open() and family.
576
577pMetadata (in)
578 A pointer to a structure containing the data of the metadata block.
579
580
581Remarks
582-------
583Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
584will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
585*/
586typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);
587
588
2ff0b512 589/* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
590typedef struct
591{
592 const drflac_uint8* data;
593 size_t dataSize;
594 size_t currentReadPos;
595} drflac__memory_stream;
596
597/* Structure for internal use. Used for bit streaming. */
598typedef struct
599{
600 /* The function to call when more data needs to be read. */
601 drflac_read_proc onRead;
602
603 /* The function to call when the current read position needs to be moved. */
604 drflac_seek_proc onSeek;
605
606 /* The user data to pass around to onRead and onSeek. */
607 void* pUserData;
608
609
610 /*
611 The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
612 stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
613 or not the bistreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
614 */
615 size_t unalignedByteCount;
616
617 /* The content of the unaligned bytes. */
618 drflac_cache_t unalignedCache;
619
620 /* The index of the next valid cache line in the "L2" cache. */
621 drflac_uint32 nextL2Line;
622
623 /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
624 drflac_uint32 consumedBits;
625
626 /*
627 The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
628 Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
629 */
630 drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
631 drflac_cache_t cache;
632
633 /*
634 CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
635 is reset to 0 at the beginning of each frame.
636 */
637 drflac_uint16 crc16;
638 drflac_cache_t crc16Cache; /* A cache for optimizing CRC calculations. This is filled when when the L1 cache is reloaded. */
639 drflac_uint32 crc16CacheIgnoredBytes; /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
640} drflac_bs;
641
642typedef struct
643{
644 /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
645 drflac_uint8 subframeType;
646
647 /* The number of wasted bits per sample as specified by the sub-frame header. */
648 drflac_uint8 wastedBitsPerSample;
649
650 /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
651 drflac_uint8 lpcOrder;
652
653 /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
654 drflac_int32* pSamplesS32;
655} drflac_subframe;
656
657typedef struct
658{
659 /*
660 If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
661 always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
662 */
663 drflac_uint64 pcmFrameNumber;
664
665 /*
666 If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
667 is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
668 */
669 drflac_uint32 flacFrameNumber;
670
671 /* The sample rate of this frame. */
672 drflac_uint32 sampleRate;
673
674 /* The number of PCM frames in each sub-frame within this frame. */
675 drflac_uint16 blockSizeInPCMFrames;
676
677 /*
678 The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
679 will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
680 */
681 drflac_uint8 channelAssignment;
682
683 /* The number of bits per sample within this frame. */
684 drflac_uint8 bitsPerSample;
685
686 /* The frame's CRC. */
687 drflac_uint8 crc8;
688} drflac_frame_header;
689
690typedef struct
691{
692 /* The header. */
693 drflac_frame_header header;
694
695 /*
696 The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
697 this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
698 */
699 drflac_uint32 pcmFramesRemaining;
700
701 /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
702 drflac_subframe subframes[8];
703} drflac_frame;
704
705typedef struct
706{
707 /* The function to call when a metadata block is read. */
708 drflac_meta_proc onMeta;
709
710 /* The user data posted to the metadata callback function. */
711 void* pUserDataMD;
712
713 /* Memory allocation callbacks. */
714 drflac_allocation_callbacks allocationCallbacks;
715
716
717 /* The sample rate. Will be set to something like 44100. */
718 drflac_uint32 sampleRate;
719
720 /*
721 The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
722 value specified in the STREAMINFO block.
723 */
724 drflac_uint8 channels;
725
726 /* The bits per sample. Will be set to something like 16, 24, etc. */
727 drflac_uint8 bitsPerSample;
728
729 /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
730 drflac_uint16 maxBlockSizeInPCMFrames;
731
732 /*
733 The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
734 the total PCM frame count is unknown. Likely the case with streams like internet radio.
735 */
736 drflac_uint64 totalPCMFrameCount;
737
738
739 /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
740 drflac_container container;
741
742 /* The number of seekpoints in the seektable. */
743 drflac_uint32 seekpointCount;
744
745
746 /* Information about the frame the decoder is currently sitting on. */
747 drflac_frame currentFLACFrame;
748
749
750 /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
751 drflac_uint64 currentPCMFrame;
752
753 /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
754 drflac_uint64 firstFLACFramePosInBytes;
755
756
757 /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
758 drflac__memory_stream memoryStream;
759
760
761 /* A pointer to the decoded sample data. This is an offset of pExtraData. */
762 drflac_int32* pDecodedSamples;
763
764 /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
765 drflac_seekpoint* pSeekpoints;
766
767 /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
768 void* _oggbs;
769
770 /* Internal use only. Used for profiling and testing different seeking modes. */
771 drflac_bool32 _noSeekTableSeek : 1;
772 drflac_bool32 _noBinarySearchSeek : 1;
773 drflac_bool32 _noBruteForceSeek : 1;
774
775 /* The bit streamer. The raw FLAC data is fed through this object. */
776 drflac_bs bs;
777
778 /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
779 drflac_uint8 pExtraData[1];
780} drflac;
781
782
783/*
784Opens a FLAC decoder.
785
786
787Parameters
788----------
789onRead (in)
790 The function to call when data needs to be read from the client.
791
792onSeek (in)
793 The function to call when the read position of the client data needs to move.
794
795pUserData (in, optional)
796 A pointer to application defined data that will be passed to onRead and onSeek.
797
798pAllocationCallbacks (in, optional)
799 A pointer to application defined callbacks for managing memory allocations.
800
801
802Return Value
803------------
804Returns a pointer to an object representing the decoder.
805
806
807Remarks
808-------
809Close the decoder with `drflac_close()`.
810
811`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
812
813This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
814without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.
815
816This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
817from a block of memory respectively.
818
819The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.
820
821Use `drflac_open_with_metadata()` if you need access to metadata.
822
823
824Seek Also
825---------
826drflac_open_file()
827drflac_open_memory()
828drflac_open_with_metadata()
829drflac_close()
830*/
831DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
832
833/*
834Opens a FLAC stream with relaxed validation of the header block.
835
836
837Parameters
838----------
839onRead (in)
840 The function to call when data needs to be read from the client.
841
842onSeek (in)
843 The function to call when the read position of the client data needs to move.
844
845container (in)
846 Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.
847
848pUserData (in, optional)
849 A pointer to application defined data that will be passed to onRead and onSeek.
850
851pAllocationCallbacks (in, optional)
852 A pointer to application defined callbacks for managing memory allocations.
853
854
855Return Value
856------------
857A pointer to an object representing the decoder.
858
859
860Remarks
861-------
862The same as drflac_open(), except attempts to open the stream even when a header block is not present.
863
864Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
865as that is for internal use only.
866
867Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
868force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.
869
870Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
871*/
872DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
873
874/*
875Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).
876
877
878Parameters
879----------
880onRead (in)
881 The function to call when data needs to be read from the client.
882
883onSeek (in)
884 The function to call when the read position of the client data needs to move.
885
886onMeta (in)
887 The function to call for every metadata block.
888
889pUserData (in, optional)
890 A pointer to application defined data that will be passed to onRead, onSeek and onMeta.
891
892pAllocationCallbacks (in, optional)
893 A pointer to application defined callbacks for managing memory allocations.
894
895
896Return Value
897------------
898A pointer to an object representing the decoder.
899
900
901Remarks
902-------
903Close the decoder with `drflac_close()`.
904
905`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
906
907This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
908metadata block except for STREAMINFO and PADDING blocks.
909
910The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
911pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to discriminate against
912the different metadata types.
913
914The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.
915
916Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
917the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
918metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
919returned depending on whether or not the stream is being opened with metadata.
920
921
922Seek Also
923---------
924drflac_open_file_with_metadata()
925drflac_open_memory_with_metadata()
926drflac_open()
927drflac_close()
928*/
929DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
930
931/*
932The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.
933
934See Also
935--------
936drflac_open_with_metadata()
937drflac_open_relaxed()
938*/
939DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
940
941/*
942Closes the given FLAC decoder.
943
944
945Parameters
946----------
947pFlac (in)
948 The decoder to close.
949
950
951Remarks
952-------
953This will destroy the decoder object.
954
955
956See Also
957--------
958drflac_open()
959drflac_open_with_metadata()
960drflac_open_file()
961drflac_open_file_w()
962drflac_open_file_with_metadata()
963drflac_open_file_with_metadata_w()
964drflac_open_memory()
965drflac_open_memory_with_metadata()
966*/
967DRFLAC_API void drflac_close(drflac* pFlac);
968
969
970/*
971Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.
972
973
974Parameters
975----------
976pFlac (in)
977 The decoder.
978
979framesToRead (in)
980 The number of PCM frames to read.
981
982pBufferOut (out, optional)
983 A pointer to the buffer that will receive the decoded samples.
984
985
986Return Value
987------------
988Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
989
990
991Remarks
992-------
993pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
994*/
995DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
996
997
998/*
999Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
1000
1001
1002Parameters
1003----------
1004pFlac (in)
1005 The decoder.
1006
1007framesToRead (in)
1008 The number of PCM frames to read.
1009
1010pBufferOut (out, optional)
1011 A pointer to the buffer that will receive the decoded samples.
1012
1013
1014Return Value
1015------------
1016Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1017
1018
1019Remarks
1020-------
1021pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1022
1023Note that this is lossy for streams where the bits per sample is larger than 16.
1024*/
1025DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
1026
1027/*
1028Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
1029
1030
1031Parameters
1032----------
1033pFlac (in)
1034 The decoder.
1035
1036framesToRead (in)
1037 The number of PCM frames to read.
1038
1039pBufferOut (out, optional)
1040 A pointer to the buffer that will receive the decoded samples.
1041
1042
1043Return Value
1044------------
1045Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1046
1047
1048Remarks
1049-------
1050pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1051
1052Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
1053*/
1054DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
1055
1056/*
1057Seeks to the PCM frame at the given index.
1058
1059
1060Parameters
1061----------
1062pFlac (in)
1063 The decoder.
1064
1065pcmFrameIndex (in)
1066 The index of the PCM frame to seek to. See notes below.
1067
1068
1069Return Value
1070-------------
1071`DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
1072*/
1073DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
1074
9e052883 1075
1076
1077#ifndef DR_FLAC_NO_STDIO
1078/*
1079Opens a FLAC decoder from the file at the given path.
1080
1081
1082Parameters
1083----------
1084pFileName (in)
1085 The path of the file to open, either absolute or relative to the current directory.
1086
1087pAllocationCallbacks (in, optional)
1088 A pointer to application defined callbacks for managing memory allocations.
1089
1090
1091Return Value
1092------------
1093A pointer to an object representing the decoder.
1094
1095
1096Remarks
1097-------
1098Close the decoder with drflac_close().
1099
1100
1101Remarks
1102-------
1103This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
1104at any given time, so keep this mind if you have many decoders open at the same time.
1105
1106
1107See Also
1108--------
1109drflac_open_file_with_metadata()
1110drflac_open()
1111drflac_close()
1112*/
1113DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1114DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1115
1116/*
1117Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.)
1118
1119
1120Parameters
1121----------
1122pFileName (in)
1123 The path of the file to open, either absolute or relative to the current directory.
1124
1125pAllocationCallbacks (in, optional)
1126 A pointer to application defined callbacks for managing memory allocations.
1127
1128onMeta (in)
1129 The callback to fire for each metadata block.
1130
1131pUserData (in)
1132 A pointer to the user data to pass to the metadata callback.
1133
1134pAllocationCallbacks (in)
1135 A pointer to application defined callbacks for managing memory allocations.
1136
1137
1138Remarks
1139-------
1140Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1141
1142
1143See Also
1144--------
1145drflac_open_with_metadata()
1146drflac_open()
1147drflac_close()
1148*/
1149DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1150DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1151#endif
1152
2ff0b512 1153/*
1154Opens a FLAC decoder from a pre-allocated block of memory
1155
1156
1157Parameters
1158----------
1159pData (in)
1160 A pointer to the raw encoded FLAC data.
1161
1162dataSize (in)
1163 The size in bytes of `data`.
1164
1165pAllocationCallbacks (in)
1166 A pointer to application defined callbacks for managing memory allocations.
1167
1168
1169Return Value
1170------------
1171A pointer to an object representing the decoder.
1172
1173
1174Remarks
1175-------
1176This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
1177
1178
1179See Also
1180--------
1181drflac_open()
1182drflac_close()
1183*/
1184DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
1185
1186/*
1187Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.)
1188
1189
1190Parameters
1191----------
1192pData (in)
1193 A pointer to the raw encoded FLAC data.
1194
1195dataSize (in)
1196 The size in bytes of `data`.
1197
1198onMeta (in)
1199 The callback to fire for each metadata block.
1200
1201pUserData (in)
1202 A pointer to the user data to pass to the metadata callback.
1203
1204pAllocationCallbacks (in)
1205 A pointer to application defined callbacks for managing memory allocations.
1206
1207
1208Remarks
1209-------
1210Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1211
1212
1213See Also
1214-------
1215drflac_open_with_metadata()
1216drflac_open()
1217drflac_close()
1218*/
1219DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1220
1221
1222
1223/* High Level APIs */
1224
1225/*
1226Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
1227pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
1228
1229You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
1230case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
1231
1232Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
1233read samples into a dynamically sized buffer on the heap until no samples are left.
1234
1235Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
1236*/
1237DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1238
1239/* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1240DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1241
1242/* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1243DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1244
9e052883 1245#ifndef DR_FLAC_NO_STDIO
1246/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
1247DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1248
1249/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1250DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1251
1252/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1253DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1254#endif
1255
2ff0b512 1256/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
1257DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1258
1259/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1260DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1261
1262/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1263DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1264
1265/*
1266Frees memory that was allocated internally by dr_flac.
1267
1268Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
1269*/
1270DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
1271
1272
1273/* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
1274typedef struct
1275{
1276 drflac_uint32 countRemaining;
1277 const char* pRunningData;
1278} drflac_vorbis_comment_iterator;
1279
1280/*
1281Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
1282metadata block.
1283*/
1284DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
1285
1286/*
1287Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
1288returned string is NOT null terminated.
1289*/
1290DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
1291
1292
1293/* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
1294typedef struct
1295{
1296 drflac_uint32 countRemaining;
1297 const char* pRunningData;
1298} drflac_cuesheet_track_iterator;
1299
9e052883 1300/* The order of members here is important because we map this directly to the raw data within the CUESHEET metadata block. */
2ff0b512 1301typedef struct
1302{
1303 drflac_uint64 offset;
1304 drflac_uint8 index;
1305 drflac_uint8 reserved[3];
1306} drflac_cuesheet_track_index;
2ff0b512 1307
1308typedef struct
1309{
1310 drflac_uint64 offset;
1311 drflac_uint8 trackNumber;
1312 char ISRC[12];
1313 drflac_bool8 isAudio;
1314 drflac_bool8 preEmphasis;
1315 drflac_uint8 indexCount;
1316 const drflac_cuesheet_track_index* pIndexPoints;
1317} drflac_cuesheet_track;
1318
1319/*
1320Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
1321block.
1322*/
1323DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
1324
1325/* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more comments. */
1326DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
1327
1328
1329#ifdef __cplusplus
1330}
1331#endif
1332#endif /* dr_flac_h */
1333
1334
1335/************************************************************************************************************************************************************
1336 ************************************************************************************************************************************************************
1337
1338 IMPLEMENTATION
1339
1340 ************************************************************************************************************************************************************
1341 ************************************************************************************************************************************************************/
1342#if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
1343#ifndef dr_flac_c
1344#define dr_flac_c
1345
1346/* Disable some annoying warnings. */
1347#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
1348 #pragma GCC diagnostic push
1349 #if __GNUC__ >= 7
1350 #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
1351 #endif
1352#endif
1353
1354#ifdef __linux__
1355 #ifndef _BSD_SOURCE
1356 #define _BSD_SOURCE
1357 #endif
1358 #ifndef _DEFAULT_SOURCE
1359 #define _DEFAULT_SOURCE
1360 #endif
1361 #ifndef __USE_BSD
1362 #define __USE_BSD
1363 #endif
1364 #include <endian.h>
1365#endif
1366
1367#include <stdlib.h>
1368#include <string.h>
1369
648db22b 1370/* Inline */
2ff0b512 1371#ifdef _MSC_VER
1372 #define DRFLAC_INLINE __forceinline
1373#elif defined(__GNUC__)
1374 /*
1375 I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
1376 the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
1377 case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
1378 command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
1379 I am using "__inline__" only when we're compiling in strict ANSI mode.
1380 */
1381 #if defined(__STRICT_ANSI__)
9e052883 1382 #define DRFLAC_GNUC_INLINE_HINT __inline__
1383 #else
1384 #define DRFLAC_GNUC_INLINE_HINT inline
1385 #endif
1386
1387 #if (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 2)) || defined(__clang__)
1388 #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT __attribute__((always_inline))
2ff0b512 1389 #else
9e052883 1390 #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT
2ff0b512 1391 #endif
1392#elif defined(__WATCOMC__)
1393 #define DRFLAC_INLINE __inline
1394#else
1395 #define DRFLAC_INLINE
1396#endif
648db22b 1397/* End Inline */
2ff0b512 1398
1399/*
1400Intrinsics Support
1401
1402There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with
1403
1404 "error: shift must be an immediate"
1405
1406Unfortuantely dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
1407*/
1408#if !defined(DR_FLAC_NO_SIMD)
1409 #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
1410 #if defined(_MSC_VER) && !defined(__clang__)
1411 /* MSVC. */
1412 #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2) /* 2005 */
1413 #define DRFLAC_SUPPORT_SSE2
1414 #endif
1415 #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41) /* 2010 */
1416 #define DRFLAC_SUPPORT_SSE41
1417 #endif
1418 #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
1419 /* Assume GNUC-style. */
1420 #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
1421 #define DRFLAC_SUPPORT_SSE2
1422 #endif
1423 #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
1424 #define DRFLAC_SUPPORT_SSE41
1425 #endif
1426 #endif
1427
1428 /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
1429 #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1430 #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
1431 #define DRFLAC_SUPPORT_SSE2
1432 #endif
1433 #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
1434 #define DRFLAC_SUPPORT_SSE41
1435 #endif
1436 #endif
1437
1438 #if defined(DRFLAC_SUPPORT_SSE41)
1439 #include <smmintrin.h>
1440 #elif defined(DRFLAC_SUPPORT_SSE2)
1441 #include <emmintrin.h>
1442 #endif
1443 #endif
1444
1445 #if defined(DRFLAC_ARM)
1446 #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1447 #define DRFLAC_SUPPORT_NEON
2ff0b512 1448 #include <arm_neon.h>
1449 #endif
1450 #endif
1451#endif
1452
1453/* Compile-time CPU feature support. */
1454#if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
1455 #if defined(_MSC_VER) && !defined(__clang__)
1456 #if _MSC_VER >= 1400
1457 #include <intrin.h>
1458 static void drflac__cpuid(int info[4], int fid)
1459 {
1460 __cpuid(info, fid);
1461 }
1462 #else
1463 #define DRFLAC_NO_CPUID
1464 #endif
1465 #else
1466 #if defined(__GNUC__) || defined(__clang__)
1467 static void drflac__cpuid(int info[4], int fid)
1468 {
1469 /*
1470 It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
1471 specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
1472 supporting different assembly dialects.
1473
1474 What's basically happening is that we're saving and restoring the ebx register manually.
1475 */
1476 #if defined(DRFLAC_X86) && defined(__PIC__)
1477 __asm__ __volatile__ (
1478 "xchg{l} {%%}ebx, %k1;"
1479 "cpuid;"
1480 "xchg{l} {%%}ebx, %k1;"
1481 : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1482 );
1483 #else
1484 __asm__ __volatile__ (
1485 "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1486 );
1487 #endif
1488 }
1489 #else
1490 #define DRFLAC_NO_CPUID
1491 #endif
1492 #endif
1493#else
1494 #define DRFLAC_NO_CPUID
1495#endif
1496
1497static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
1498{
1499#if defined(DRFLAC_SUPPORT_SSE2)
1500 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
1501 #if defined(DRFLAC_X64)
1502 return DRFLAC_TRUE; /* 64-bit targets always support SSE2. */
1503 #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
1504 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
1505 #else
1506 #if defined(DRFLAC_NO_CPUID)
1507 return DRFLAC_FALSE;
1508 #else
1509 int info[4];
1510 drflac__cpuid(info, 1);
1511 return (info[3] & (1 << 26)) != 0;
1512 #endif
1513 #endif
1514 #else
1515 return DRFLAC_FALSE; /* SSE2 is only supported on x86 and x64 architectures. */
1516 #endif
1517#else
1518 return DRFLAC_FALSE; /* No compiler support. */
1519#endif
1520}
1521
1522static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
1523{
1524#if defined(DRFLAC_SUPPORT_SSE41)
1525 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
9e052883 1526 #if defined(__SSE4_1__) || defined(__AVX__)
2ff0b512 1527 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE41 code we can assume support. */
1528 #else
1529 #if defined(DRFLAC_NO_CPUID)
1530 return DRFLAC_FALSE;
1531 #else
1532 int info[4];
1533 drflac__cpuid(info, 1);
1534 return (info[2] & (1 << 19)) != 0;
1535 #endif
1536 #endif
1537 #else
1538 return DRFLAC_FALSE; /* SSE41 is only supported on x86 and x64 architectures. */
1539 #endif
1540#else
1541 return DRFLAC_FALSE; /* No compiler support. */
1542#endif
1543}
1544
1545
1546#if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
1547 #define DRFLAC_HAS_LZCNT_INTRINSIC
1548#elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
1549 #define DRFLAC_HAS_LZCNT_INTRINSIC
1550#elif defined(__clang__)
1551 #if defined(__has_builtin)
1552 #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
1553 #define DRFLAC_HAS_LZCNT_INTRINSIC
1554 #endif
1555 #endif
1556#endif
1557
1558#if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
1559 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1560 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1561 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1562#elif defined(__clang__)
1563 #if defined(__has_builtin)
1564 #if __has_builtin(__builtin_bswap16)
1565 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1566 #endif
1567 #if __has_builtin(__builtin_bswap32)
1568 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1569 #endif
1570 #if __has_builtin(__builtin_bswap64)
1571 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1572 #endif
1573 #endif
1574#elif defined(__GNUC__)
1575 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
1576 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1577 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1578 #endif
1579 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
1580 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1581 #endif
9e052883 1582#elif defined(__WATCOMC__) && defined(__386__)
1583 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1584 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1585 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1586 extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
1587 extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
1588 extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
1589#pragma aux _watcom_bswap16 = \
1590 "xchg al, ah" \
1591 parm [ax] \
1592 value [ax] \
1593 modify nomemory;
1594#pragma aux _watcom_bswap32 = \
1595 "bswap eax" \
1596 parm [eax] \
1597 value [eax] \
1598 modify nomemory;
1599#pragma aux _watcom_bswap64 = \
1600 "bswap eax" \
1601 "bswap edx" \
1602 "xchg eax,edx" \
1603 parm [eax edx] \
1604 value [eax edx] \
1605 modify nomemory;
2ff0b512 1606#endif
1607
1608
1609/* Standard library stuff. */
1610#ifndef DRFLAC_ASSERT
1611#include <assert.h>
1612#define DRFLAC_ASSERT(expression) assert(expression)
1613#endif
1614#ifndef DRFLAC_MALLOC
1615#define DRFLAC_MALLOC(sz) malloc((sz))
1616#endif
1617#ifndef DRFLAC_REALLOC
1618#define DRFLAC_REALLOC(p, sz) realloc((p), (sz))
1619#endif
1620#ifndef DRFLAC_FREE
1621#define DRFLAC_FREE(p) free((p))
1622#endif
1623#ifndef DRFLAC_COPY_MEMORY
1624#define DRFLAC_COPY_MEMORY(dst, src, sz) memcpy((dst), (src), (sz))
1625#endif
1626#ifndef DRFLAC_ZERO_MEMORY
1627#define DRFLAC_ZERO_MEMORY(p, sz) memset((p), 0, (sz))
1628#endif
1629#ifndef DRFLAC_ZERO_OBJECT
1630#define DRFLAC_ZERO_OBJECT(p) DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
1631#endif
1632
1633#define DRFLAC_MAX_SIMD_VECTOR_SIZE 64 /* 64 for AVX-512 in the future. */
1634
648db22b 1635/* Result Codes */
2ff0b512 1636typedef drflac_int32 drflac_result;
1637#define DRFLAC_SUCCESS 0
1638#define DRFLAC_ERROR -1 /* A generic error. */
1639#define DRFLAC_INVALID_ARGS -2
1640#define DRFLAC_INVALID_OPERATION -3
1641#define DRFLAC_OUT_OF_MEMORY -4
1642#define DRFLAC_OUT_OF_RANGE -5
1643#define DRFLAC_ACCESS_DENIED -6
1644#define DRFLAC_DOES_NOT_EXIST -7
1645#define DRFLAC_ALREADY_EXISTS -8
1646#define DRFLAC_TOO_MANY_OPEN_FILES -9
1647#define DRFLAC_INVALID_FILE -10
1648#define DRFLAC_TOO_BIG -11
1649#define DRFLAC_PATH_TOO_LONG -12
1650#define DRFLAC_NAME_TOO_LONG -13
1651#define DRFLAC_NOT_DIRECTORY -14
1652#define DRFLAC_IS_DIRECTORY -15
1653#define DRFLAC_DIRECTORY_NOT_EMPTY -16
1654#define DRFLAC_END_OF_FILE -17
1655#define DRFLAC_NO_SPACE -18
1656#define DRFLAC_BUSY -19
1657#define DRFLAC_IO_ERROR -20
1658#define DRFLAC_INTERRUPT -21
1659#define DRFLAC_UNAVAILABLE -22
1660#define DRFLAC_ALREADY_IN_USE -23
1661#define DRFLAC_BAD_ADDRESS -24
1662#define DRFLAC_BAD_SEEK -25
1663#define DRFLAC_BAD_PIPE -26
1664#define DRFLAC_DEADLOCK -27
1665#define DRFLAC_TOO_MANY_LINKS -28
1666#define DRFLAC_NOT_IMPLEMENTED -29
1667#define DRFLAC_NO_MESSAGE -30
1668#define DRFLAC_BAD_MESSAGE -31
1669#define DRFLAC_NO_DATA_AVAILABLE -32
1670#define DRFLAC_INVALID_DATA -33
1671#define DRFLAC_TIMEOUT -34
1672#define DRFLAC_NO_NETWORK -35
1673#define DRFLAC_NOT_UNIQUE -36
1674#define DRFLAC_NOT_SOCKET -37
1675#define DRFLAC_NO_ADDRESS -38
1676#define DRFLAC_BAD_PROTOCOL -39
1677#define DRFLAC_PROTOCOL_UNAVAILABLE -40
1678#define DRFLAC_PROTOCOL_NOT_SUPPORTED -41
1679#define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED -42
1680#define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED -43
1681#define DRFLAC_SOCKET_NOT_SUPPORTED -44
1682#define DRFLAC_CONNECTION_RESET -45
1683#define DRFLAC_ALREADY_CONNECTED -46
1684#define DRFLAC_NOT_CONNECTED -47
1685#define DRFLAC_CONNECTION_REFUSED -48
1686#define DRFLAC_NO_HOST -49
1687#define DRFLAC_IN_PROGRESS -50
1688#define DRFLAC_CANCELLED -51
1689#define DRFLAC_MEMORY_ALREADY_MAPPED -52
1690#define DRFLAC_AT_END -53
648db22b 1691
1692#define DRFLAC_CRC_MISMATCH -100
1693/* End Result Codes */
1694
2ff0b512 1695
1696#define DRFLAC_SUBFRAME_CONSTANT 0
1697#define DRFLAC_SUBFRAME_VERBATIM 1
1698#define DRFLAC_SUBFRAME_FIXED 8
1699#define DRFLAC_SUBFRAME_LPC 32
1700#define DRFLAC_SUBFRAME_RESERVED 255
1701
1702#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE 0
1703#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1
1704
1705#define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT 0
1706#define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE 8
1707#define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE 9
1708#define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE 10
1709
9e052883 1710#define DRFLAC_SEEKPOINT_SIZE_IN_BYTES 18
1711#define DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES 36
1712#define DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES 12
1713
2ff0b512 1714#define drflac_align(x, a) ((((x) + (a) - 1) / (a)) * (a))
1715
1716
1717DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
1718{
1719 if (pMajor) {
1720 *pMajor = DRFLAC_VERSION_MAJOR;
1721 }
1722
1723 if (pMinor) {
1724 *pMinor = DRFLAC_VERSION_MINOR;
1725 }
1726
1727 if (pRevision) {
1728 *pRevision = DRFLAC_VERSION_REVISION;
1729 }
1730}
1731
1732DRFLAC_API const char* drflac_version_string(void)
1733{
1734 return DRFLAC_VERSION_STRING;
1735}
1736
1737
1738/* CPU caps. */
1739#if defined(__has_feature)
1740 #if __has_feature(thread_sanitizer)
1741 #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
1742 #else
1743 #define DRFLAC_NO_THREAD_SANITIZE
1744 #endif
1745#else
1746 #define DRFLAC_NO_THREAD_SANITIZE
1747#endif
1748
1749#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1750static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
1751#endif
1752
1753#ifndef DRFLAC_NO_CPUID
1754static drflac_bool32 drflac__gIsSSE2Supported = DRFLAC_FALSE;
1755static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;
1756
1757/*
1758I've had a bug report that Clang's ThreadSanitizer presents a warning in this function. Having reviewed this, this does
1759actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade off of
1760complicating internal API's by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
1761just going to disable these warnings. This is disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
1762*/
1763DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1764{
1765 static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;
1766
1767 if (!isCPUCapsInitialized) {
1768 /* LZCNT */
1769#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1770 int info[4] = {0};
1771 drflac__cpuid(info, 0x80000001);
1772 drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
1773#endif
1774
1775 /* SSE2 */
1776 drflac__gIsSSE2Supported = drflac_has_sse2();
1777
1778 /* SSE4.1 */
1779 drflac__gIsSSE41Supported = drflac_has_sse41();
1780
1781 /* Initialized. */
1782 isCPUCapsInitialized = DRFLAC_TRUE;
1783 }
1784}
1785#else
1786static drflac_bool32 drflac__gIsNEONSupported = DRFLAC_FALSE;
1787
1788static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
1789{
1790#if defined(DRFLAC_SUPPORT_NEON)
1791 #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
1792 #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1793 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate NEON code we can assume support. */
1794 #else
1795 /* TODO: Runtime check. */
1796 return DRFLAC_FALSE;
1797 #endif
1798 #else
1799 return DRFLAC_FALSE; /* NEON is only supported on ARM architectures. */
1800 #endif
1801#else
1802 return DRFLAC_FALSE; /* No compiler support. */
1803#endif
1804}
1805
1806DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1807{
1808 drflac__gIsNEONSupported = drflac__has_neon();
1809
1810#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
1811 drflac__gIsLZCNTSupported = DRFLAC_TRUE;
1812#endif
1813}
1814#endif
1815
1816
1817/* Endian Management */
1818static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
1819{
1820#if defined(DRFLAC_X86) || defined(DRFLAC_X64)
1821 return DRFLAC_TRUE;
1822#elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
1823 return DRFLAC_TRUE;
1824#else
1825 int n = 1;
1826 return (*(char*)&n) == 1;
1827#endif
1828}
1829
1830static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
1831{
1832#ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
1833 #if defined(_MSC_VER) && !defined(__clang__)
1834 return _byteswap_ushort(n);
1835 #elif defined(__GNUC__) || defined(__clang__)
1836 return __builtin_bswap16(n);
9e052883 1837 #elif defined(__WATCOMC__) && defined(__386__)
1838 return _watcom_bswap16(n);
2ff0b512 1839 #else
1840 #error "This compiler does not support the byte swap intrinsic."
1841 #endif
1842#else
1843 return ((n & 0xFF00) >> 8) |
1844 ((n & 0x00FF) << 8);
1845#endif
1846}
1847
1848static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
1849{
1850#ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
1851 #if defined(_MSC_VER) && !defined(__clang__)
1852 return _byteswap_ulong(n);
1853 #elif defined(__GNUC__) || defined(__clang__)
648db22b 1854 #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT) /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
2ff0b512 1855 /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
1856 drflac_uint32 r;
1857 __asm__ __volatile__ (
1858 #if defined(DRFLAC_64BIT)
1859 "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
1860 #else
1861 "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
1862 #endif
1863 );
1864 return r;
1865 #else
1866 return __builtin_bswap32(n);
1867 #endif
9e052883 1868 #elif defined(__WATCOMC__) && defined(__386__)
1869 return _watcom_bswap32(n);
2ff0b512 1870 #else
1871 #error "This compiler does not support the byte swap intrinsic."
1872 #endif
1873#else
1874 return ((n & 0xFF000000) >> 24) |
1875 ((n & 0x00FF0000) >> 8) |
1876 ((n & 0x0000FF00) << 8) |
1877 ((n & 0x000000FF) << 24);
1878#endif
1879}
1880
1881static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
1882{
1883#ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
1884 #if defined(_MSC_VER) && !defined(__clang__)
1885 return _byteswap_uint64(n);
1886 #elif defined(__GNUC__) || defined(__clang__)
1887 return __builtin_bswap64(n);
9e052883 1888 #elif defined(__WATCOMC__) && defined(__386__)
1889 return _watcom_bswap64(n);
2ff0b512 1890 #else
1891 #error "This compiler does not support the byte swap intrinsic."
1892 #endif
1893#else
1894 /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
1895 return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
1896 ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
1897 ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
1898 ((n & ((drflac_uint64)0x000000FF << 32)) >> 8) |
1899 ((n & ((drflac_uint64)0xFF000000 )) << 8) |
1900 ((n & ((drflac_uint64)0x00FF0000 )) << 24) |
1901 ((n & ((drflac_uint64)0x0000FF00 )) << 40) |
1902 ((n & ((drflac_uint64)0x000000FF )) << 56);
1903#endif
1904}
1905
1906
1907static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
1908{
1909 if (drflac__is_little_endian()) {
1910 return drflac__swap_endian_uint16(n);
1911 }
1912
1913 return n;
1914}
1915
1916static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
1917{
1918 if (drflac__is_little_endian()) {
1919 return drflac__swap_endian_uint32(n);
1920 }
1921
1922 return n;
1923}
1924
9e052883 1925static DRFLAC_INLINE drflac_uint32 drflac__be2host_32_ptr_unaligned(const void* pData)
1926{
1927 const drflac_uint8* pNum = (drflac_uint8*)pData;
1928 return *(pNum) << 24 | *(pNum+1) << 16 | *(pNum+2) << 8 | *(pNum+3);
1929}
1930
2ff0b512 1931static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
1932{
1933 if (drflac__is_little_endian()) {
1934 return drflac__swap_endian_uint64(n);
1935 }
1936
1937 return n;
1938}
1939
1940
1941static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
1942{
1943 if (!drflac__is_little_endian()) {
1944 return drflac__swap_endian_uint32(n);
1945 }
1946
1947 return n;
1948}
1949
9e052883 1950static DRFLAC_INLINE drflac_uint32 drflac__le2host_32_ptr_unaligned(const void* pData)
1951{
1952 const drflac_uint8* pNum = (drflac_uint8*)pData;
1953 return *pNum | *(pNum+1) << 8 | *(pNum+2) << 16 | *(pNum+3) << 24;
1954}
1955
2ff0b512 1956
1957static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
1958{
1959 drflac_uint32 result = 0;
1960 result |= (n & 0x7F000000) >> 3;
1961 result |= (n & 0x007F0000) >> 2;
1962 result |= (n & 0x00007F00) >> 1;
1963 result |= (n & 0x0000007F) >> 0;
1964
1965 return result;
1966}
1967
1968
1969
1970/* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
1971static drflac_uint8 drflac__crc8_table[] = {
1972 0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
1973 0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
1974 0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
1975 0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
1976 0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
1977 0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
1978 0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
1979 0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
1980 0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
1981 0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
1982 0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
1983 0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
1984 0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
1985 0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
1986 0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
1987 0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
1988};
1989
1990static drflac_uint16 drflac__crc16_table[] = {
1991 0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
1992 0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
1993 0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
1994 0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
1995 0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
1996 0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
1997 0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
1998 0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
1999 0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
2000 0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
2001 0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
2002 0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
2003 0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
2004 0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
2005 0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
2006 0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
2007 0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
2008 0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
2009 0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
2010 0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
2011 0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
2012 0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
2013 0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
2014 0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
2015 0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
2016 0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
2017 0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
2018 0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
2019 0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
2020 0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
2021 0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
2022 0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
2023};
2024
2025static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
2026{
2027 return drflac__crc8_table[crc ^ data];
2028}
2029
2030static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
2031{
2032#ifdef DR_FLAC_NO_CRC
2033 (void)crc;
2034 (void)data;
2035 (void)count;
2036 return 0;
9e052883 2037#else
2038#if 0
2039 /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
2040 drflac_uint8 p = 0x07;
2041 for (int i = count-1; i >= 0; --i) {
2042 drflac_uint8 bit = (data & (1 << i)) >> i;
2043 if (crc & 0x80) {
2044 crc = ((crc << 1) | bit) ^ p;
2045 } else {
2046 crc = ((crc << 1) | bit);
2047 }
2048 }
2049 return crc;
2ff0b512 2050#else
2051 drflac_uint32 wholeBytes;
2052 drflac_uint32 leftoverBits;
2053 drflac_uint64 leftoverDataMask;
2054
2055 static drflac_uint64 leftoverDataMaskTable[8] = {
2056 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2057 };
2058
2059 DRFLAC_ASSERT(count <= 32);
2060
2061 wholeBytes = count >> 3;
2062 leftoverBits = count - (wholeBytes*8);
2063 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2064
2065 switch (wholeBytes) {
2066 case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
2067 case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
2068 case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
2069 case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
2070 case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
2071 }
2072 return crc;
2073#endif
9e052883 2074#endif
2ff0b512 2075}
2076
2077static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
2078{
2079 return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
2080}
2081
2082static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
2083{
2084#ifdef DRFLAC_64BIT
2085 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
2086 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
2087 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2088 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2089#endif
2090 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2091 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2092 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2093 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2094
2095 return crc;
2096}
2097
2098static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
2099{
2100 switch (byteCount)
2101 {
2102#ifdef DRFLAC_64BIT
2103 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
2104 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
2105 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2106 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2107#endif
2108 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2109 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2110 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2111 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2112 }
2113
2114 return crc;
2115}
2116
9e052883 2117#if 0
2118static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
2119{
2120#ifdef DR_FLAC_NO_CRC
2121 (void)crc;
2122 (void)data;
2123 (void)count;
2124 return 0;
2125#else
2126#if 0
2127 /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
2128 drflac_uint16 p = 0x8005;
2129 for (int i = count-1; i >= 0; --i) {
2130 drflac_uint16 bit = (data & (1ULL << i)) >> i;
2131 if (r & 0x8000) {
2132 r = ((r << 1) | bit) ^ p;
2133 } else {
2134 r = ((r << 1) | bit);
2135 }
2136 }
2137
2138 return crc;
2139#else
2140 drflac_uint32 wholeBytes;
2141 drflac_uint32 leftoverBits;
2142 drflac_uint64 leftoverDataMask;
2143
2144 static drflac_uint64 leftoverDataMaskTable[8] = {
2145 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2146 };
2147
2148 DRFLAC_ASSERT(count <= 64);
2149
2150 wholeBytes = count >> 3;
2151 leftoverBits = count & 7;
2152 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2153
2154 switch (wholeBytes) {
2155 default:
2156 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
2157 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
2158 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
2159 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
2160 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2161 }
2162 return crc;
2163#endif
2164#endif
2165}
2166
2167static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
2168{
2169#ifdef DR_FLAC_NO_CRC
2170 (void)crc;
2171 (void)data;
2172 (void)count;
2173 return 0;
2174#else
2175 drflac_uint32 wholeBytes;
2176 drflac_uint32 leftoverBits;
2177 drflac_uint64 leftoverDataMask;
2178
2179 static drflac_uint64 leftoverDataMaskTable[8] = {
2180 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2181 };
2182
2183 DRFLAC_ASSERT(count <= 64);
2184
2185 wholeBytes = count >> 3;
2186 leftoverBits = count & 7;
2187 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2188
2189 switch (wholeBytes) {
2190 default:
2191 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits))); /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
2192 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
2193 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
2194 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
2195 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 ) << leftoverBits)) >> (24 + leftoverBits)));
2196 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 ) << leftoverBits)) >> (16 + leftoverBits)));
2197 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 ) << leftoverBits)) >> ( 8 + leftoverBits)));
2198 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF ) << leftoverBits)) >> ( 0 + leftoverBits)));
2199 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2200 }
2201 return crc;
2202#endif
2203}
2204
2205
2206static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
2207{
2208#ifdef DRFLAC_64BIT
2209 return drflac_crc16__64bit(crc, data, count);
2210#else
2211 return drflac_crc16__32bit(crc, data, count);
2212#endif
2213}
2214#endif
2215
2216
2ff0b512 2217#ifdef DRFLAC_64BIT
2218#define drflac__be2host__cache_line drflac__be2host_64
2219#else
2220#define drflac__be2host__cache_line drflac__be2host_32
2221#endif
2222
2223/*
2224BIT READING ATTEMPT #2
2225
2226This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
2227on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
2228is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
2229array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
2230from onRead() is read into.
2231*/
2232#define DRFLAC_CACHE_L1_SIZE_BYTES(bs) (sizeof((bs)->cache))
2233#define DRFLAC_CACHE_L1_SIZE_BITS(bs) (sizeof((bs)->cache)*8)
2234#define DRFLAC_CACHE_L1_BITS_REMAINING(bs) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
2235#define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount) (~((~(drflac_cache_t)0) >> (_bitCount)))
2236#define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
2237#define DRFLAC_CACHE_L1_SELECT(bs, _bitCount) (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
2238#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
2239#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
2240#define DRFLAC_CACHE_L2_SIZE_BYTES(bs) (sizeof((bs)->cacheL2))
2241#define DRFLAC_CACHE_L2_LINE_COUNT(bs) (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
2242#define DRFLAC_CACHE_L2_LINES_REMAINING(bs) (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
2243
2244
2245#ifndef DR_FLAC_NO_CRC
2246static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
2247{
2248 bs->crc16 = 0;
2249 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2250}
2251
2252static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
2253{
2254 if (bs->crc16CacheIgnoredBytes == 0) {
2255 bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
2256 } else {
2257 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
2258 bs->crc16CacheIgnoredBytes = 0;
2259 }
2260}
2261
2262static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
2263{
2264 /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
2265 DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);
2266
2267 /*
2268 The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
2269 by the number of bits that have been consumed.
2270 */
2271 if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
2272 drflac__update_crc16(bs);
2273 } else {
2274 /* We only accumulate the consumed bits. */
2275 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);
2276
2277 /*
2278 The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
2279 so we can handle that later.
2280 */
2281 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2282 }
2283
2284 return bs->crc16;
2285}
2286#endif
2287
2288static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
2289{
2290 size_t bytesRead;
2291 size_t alignedL1LineCount;
2292
2293 /* Fast path. Try loading straight from L2. */
2294 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
2295 bs->cache = bs->cacheL2[bs->nextL2Line++];
2296 return DRFLAC_TRUE;
2297 }
2298
2299 /*
2300 If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
2301 any left.
2302 */
2303 if (bs->unalignedByteCount > 0) {
2304 return DRFLAC_FALSE; /* If we have any unaligned bytes it means there's no more aligned bytes left in the client. */
2305 }
2306
2307 bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));
2308
2309 bs->nextL2Line = 0;
2310 if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
2311 bs->cache = bs->cacheL2[bs->nextL2Line++];
2312 return DRFLAC_TRUE;
2313 }
2314
2315
2316 /*
2317 If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
2318 means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
2319 and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
2320 the size of the L1 so we'll need to seek backwards by any misaligned bytes.
2321 */
2322 alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);
2323
2324 /* We need to keep track of any unaligned bytes for later use. */
2325 bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2326 if (bs->unalignedByteCount > 0) {
2327 bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
2328 }
2329
2330 if (alignedL1LineCount > 0) {
2331 size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
2332 size_t i;
2333 for (i = alignedL1LineCount; i > 0; --i) {
2334 bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
2335 }
2336
2337 bs->nextL2Line = (drflac_uint32)offset;
2338 bs->cache = bs->cacheL2[bs->nextL2Line++];
2339 return DRFLAC_TRUE;
2340 } else {
2341 /* If we get into this branch it means we weren't able to load any L1-aligned data. */
2342 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
2343 return DRFLAC_FALSE;
2344 }
2345}
2346
2347static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
2348{
2349 size_t bytesRead;
2350
2351#ifndef DR_FLAC_NO_CRC
2352 drflac__update_crc16(bs);
2353#endif
2354
2355 /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
2356 if (drflac__reload_l1_cache_from_l2(bs)) {
2357 bs->cache = drflac__be2host__cache_line(bs->cache);
2358 bs->consumedBits = 0;
2359#ifndef DR_FLAC_NO_CRC
2360 bs->crc16Cache = bs->cache;
2361#endif
2362 return DRFLAC_TRUE;
2363 }
2364
2365 /* Slow path. */
2366
2367 /*
2368 If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
2369 few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
2370 data from the unaligned cache.
2371 */
2372 bytesRead = bs->unalignedByteCount;
2373 if (bytesRead == 0) {
2374 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- The stream has been exhausted, so marked the bits as consumed. */
2375 return DRFLAC_FALSE;
2376 }
2377
2378 DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2379 bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;
2380
2381 bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
2382 bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs)); /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
2383 bs->unalignedByteCount = 0; /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */
2384
2385#ifndef DR_FLAC_NO_CRC
2386 bs->crc16Cache = bs->cache >> bs->consumedBits;
2387 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2388#endif
2389 return DRFLAC_TRUE;
2390}
2391
2392static void drflac__reset_cache(drflac_bs* bs)
2393{
2394 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs); /* <-- This clears the L2 cache. */
2395 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- This clears the L1 cache. */
2396 bs->cache = 0;
2397 bs->unalignedByteCount = 0; /* <-- This clears the trailing unaligned bytes. */
2398 bs->unalignedCache = 0;
2399
2400#ifndef DR_FLAC_NO_CRC
2401 bs->crc16Cache = 0;
2402 bs->crc16CacheIgnoredBytes = 0;
2403#endif
2404}
2405
2406
2407static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
2408{
2409 DRFLAC_ASSERT(bs != NULL);
2410 DRFLAC_ASSERT(pResultOut != NULL);
2411 DRFLAC_ASSERT(bitCount > 0);
2412 DRFLAC_ASSERT(bitCount <= 32);
2413
2414 if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2415 if (!drflac__reload_cache(bs)) {
2416 return DRFLAC_FALSE;
2417 }
2418 }
2419
2420 if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2421 /*
2422 If we want to load all 32-bits from a 32-bit cache we need to do it slightly differently because we can't do
2423 a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
2424 more optimal solution for this.
2425 */
2426#ifdef DRFLAC_64BIT
2427 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2428 bs->consumedBits += bitCount;
2429 bs->cache <<= bitCount;
2430#else
2431 if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2432 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2433 bs->consumedBits += bitCount;
2434 bs->cache <<= bitCount;
2435 } else {
2436 /* Cannot shift by 32-bits, so need to do it differently. */
2437 *pResultOut = (drflac_uint32)bs->cache;
2438 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
2439 bs->cache = 0;
2440 }
2441#endif
2442
2443 return DRFLAC_TRUE;
2444 } else {
2445 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
2446 drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2447 drflac_uint32 bitCountLo = bitCount - bitCountHi;
2448 drflac_uint32 resultHi;
2449
2450 DRFLAC_ASSERT(bitCountHi > 0);
2451 DRFLAC_ASSERT(bitCountHi < 32);
2452 resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);
2453
2454 if (!drflac__reload_cache(bs)) {
2455 return DRFLAC_FALSE;
2456 }
9e052883 2457 if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2458 /* This happens when we get to end of stream */
2459 return DRFLAC_FALSE;
2460 }
2ff0b512 2461
2462 *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
2463 bs->consumedBits += bitCountLo;
2464 bs->cache <<= bitCountLo;
2465 return DRFLAC_TRUE;
2466 }
2467}
2468
2469static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
2470{
2471 drflac_uint32 result;
2472
2473 DRFLAC_ASSERT(bs != NULL);
2474 DRFLAC_ASSERT(pResult != NULL);
2475 DRFLAC_ASSERT(bitCount > 0);
2476 DRFLAC_ASSERT(bitCount <= 32);
2477
2478 if (!drflac__read_uint32(bs, bitCount, &result)) {
2479 return DRFLAC_FALSE;
2480 }
2481
2482 /* Do not attempt to shift by 32 as it's undefined. */
2483 if (bitCount < 32) {
2484 drflac_uint32 signbit;
2485 signbit = ((result >> (bitCount-1)) & 0x01);
2486 result |= (~signbit + 1) << bitCount;
2487 }
2488
2489 *pResult = (drflac_int32)result;
2490 return DRFLAC_TRUE;
2491}
2492
2493#ifdef DRFLAC_64BIT
2494static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
2495{
2496 drflac_uint32 resultHi;
2497 drflac_uint32 resultLo;
2498
2499 DRFLAC_ASSERT(bitCount <= 64);
2500 DRFLAC_ASSERT(bitCount > 32);
2501
2502 if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
2503 return DRFLAC_FALSE;
2504 }
2505
2506 if (!drflac__read_uint32(bs, 32, &resultLo)) {
2507 return DRFLAC_FALSE;
2508 }
2509
2510 *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
2511 return DRFLAC_TRUE;
2512}
2513#endif
2514
9e052883 2515/* Function below is unused, but leaving it here in case I need to quickly add it again. */
2516#if 0
2517static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
2518{
2519 drflac_uint64 result;
2520 drflac_uint64 signbit;
2521
2522 DRFLAC_ASSERT(bitCount <= 64);
2523
2524 if (!drflac__read_uint64(bs, bitCount, &result)) {
2525 return DRFLAC_FALSE;
2526 }
2527
2528 signbit = ((result >> (bitCount-1)) & 0x01);
2529 result |= (~signbit + 1) << bitCount;
2530
2531 *pResultOut = (drflac_int64)result;
2532 return DRFLAC_TRUE;
2533}
2534#endif
2535
2ff0b512 2536static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
2537{
2538 drflac_uint32 result;
2539
2540 DRFLAC_ASSERT(bs != NULL);
2541 DRFLAC_ASSERT(pResult != NULL);
2542 DRFLAC_ASSERT(bitCount > 0);
2543 DRFLAC_ASSERT(bitCount <= 16);
2544
2545 if (!drflac__read_uint32(bs, bitCount, &result)) {
2546 return DRFLAC_FALSE;
2547 }
2548
2549 *pResult = (drflac_uint16)result;
2550 return DRFLAC_TRUE;
2551}
2552
9e052883 2553#if 0
2554static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
2555{
2556 drflac_int32 result;
2557
2558 DRFLAC_ASSERT(bs != NULL);
2559 DRFLAC_ASSERT(pResult != NULL);
2560 DRFLAC_ASSERT(bitCount > 0);
2561 DRFLAC_ASSERT(bitCount <= 16);
2562
2563 if (!drflac__read_int32(bs, bitCount, &result)) {
2564 return DRFLAC_FALSE;
2565 }
2566
2567 *pResult = (drflac_int16)result;
2568 return DRFLAC_TRUE;
2569}
2570#endif
2571
2ff0b512 2572static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
2573{
2574 drflac_uint32 result;
2575
2576 DRFLAC_ASSERT(bs != NULL);
2577 DRFLAC_ASSERT(pResult != NULL);
2578 DRFLAC_ASSERT(bitCount > 0);
2579 DRFLAC_ASSERT(bitCount <= 8);
2580
2581 if (!drflac__read_uint32(bs, bitCount, &result)) {
2582 return DRFLAC_FALSE;
2583 }
2584
2585 *pResult = (drflac_uint8)result;
2586 return DRFLAC_TRUE;
2587}
2588
2589static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
2590{
2591 drflac_int32 result;
2592
2593 DRFLAC_ASSERT(bs != NULL);
2594 DRFLAC_ASSERT(pResult != NULL);
2595 DRFLAC_ASSERT(bitCount > 0);
2596 DRFLAC_ASSERT(bitCount <= 8);
2597
2598 if (!drflac__read_int32(bs, bitCount, &result)) {
2599 return DRFLAC_FALSE;
2600 }
2601
2602 *pResult = (drflac_int8)result;
2603 return DRFLAC_TRUE;
2604}
2605
2606
2607static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
2608{
2609 if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2610 bs->consumedBits += (drflac_uint32)bitsToSeek;
2611 bs->cache <<= bitsToSeek;
2612 return DRFLAC_TRUE;
2613 } else {
2614 /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
2615 bitsToSeek -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2616 bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2617 bs->cache = 0;
2618
2619 /* Simple case. Seek in groups of the same number as bits that fit within a cache line. */
2620#ifdef DRFLAC_64BIT
2621 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2622 drflac_uint64 bin;
2623 if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2624 return DRFLAC_FALSE;
2625 }
2626 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2627 }
2628#else
2629 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2630 drflac_uint32 bin;
2631 if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2632 return DRFLAC_FALSE;
2633 }
2634 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2635 }
2636#endif
2637
2638 /* Whole leftover bytes. */
2639 while (bitsToSeek >= 8) {
2640 drflac_uint8 bin;
2641 if (!drflac__read_uint8(bs, 8, &bin)) {
2642 return DRFLAC_FALSE;
2643 }
2644 bitsToSeek -= 8;
2645 }
2646
2647 /* Leftover bits. */
2648 if (bitsToSeek > 0) {
2649 drflac_uint8 bin;
2650 if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
2651 return DRFLAC_FALSE;
2652 }
2653 bitsToSeek = 0; /* <-- Necessary for the assert below. */
2654 }
2655
2656 DRFLAC_ASSERT(bitsToSeek == 0);
2657 return DRFLAC_TRUE;
2658 }
2659}
2660
2661
2662/* This function moves the bit streamer to the first bit after the sync code (bit 15 of the of the frame header). It will also update the CRC-16. */
2663static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
2664{
2665 DRFLAC_ASSERT(bs != NULL);
2666
2667 /*
2668 The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
2669 thing to do is align to the next byte.
2670 */
2671 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2672 return DRFLAC_FALSE;
2673 }
2674
2675 for (;;) {
2676 drflac_uint8 hi;
2677
2678#ifndef DR_FLAC_NO_CRC
2679 drflac__reset_crc16(bs);
2680#endif
2681
2682 if (!drflac__read_uint8(bs, 8, &hi)) {
2683 return DRFLAC_FALSE;
2684 }
2685
2686 if (hi == 0xFF) {
2687 drflac_uint8 lo;
2688 if (!drflac__read_uint8(bs, 6, &lo)) {
2689 return DRFLAC_FALSE;
2690 }
2691
2692 if (lo == 0x3E) {
2693 return DRFLAC_TRUE;
2694 } else {
2695 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2696 return DRFLAC_FALSE;
2697 }
2698 }
2699 }
2700 }
2701
2702 /* Should never get here. */
2703 /*return DRFLAC_FALSE;*/
2704}
2705
2706
2707#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
2708#define DRFLAC_IMPLEMENT_CLZ_LZCNT
2709#endif
2710#if defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
2711#define DRFLAC_IMPLEMENT_CLZ_MSVC
2712#endif
9e052883 2713#if defined(__WATCOMC__) && defined(__386__)
2714#define DRFLAC_IMPLEMENT_CLZ_WATCOM
2715#endif
2716#ifdef __MRC__
2717#include <intrinsics.h>
2718#define DRFLAC_IMPLEMENT_CLZ_MRC
2719#endif
2ff0b512 2720
2721static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
2722{
2723 drflac_uint32 n;
2724 static drflac_uint32 clz_table_4[] = {
2725 0,
2726 4,
2727 3, 3,
2728 2, 2, 2, 2,
2729 1, 1, 1, 1, 1, 1, 1, 1
2730 };
2731
2732 if (x == 0) {
2733 return sizeof(x)*8;
2734 }
2735
2736 n = clz_table_4[x >> (sizeof(x)*8 - 4)];
2737 if (n == 0) {
2738#ifdef DRFLAC_64BIT
2739 if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n = 32; x <<= 32; }
2740 if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
2741 if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8; x <<= 8; }
2742 if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4; x <<= 4; }
2743#else
2744 if ((x & 0xFFFF0000) == 0) { n = 16; x <<= 16; }
2745 if ((x & 0xFF000000) == 0) { n += 8; x <<= 8; }
2746 if ((x & 0xF0000000) == 0) { n += 4; x <<= 4; }
2747#endif
2748 n += clz_table_4[x >> (sizeof(x)*8 - 4)];
2749 }
2750
2751 return n - 1;
2752}
2753
2754#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2755static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
2756{
2757 /* Fast compile time check for ARM. */
2758#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
2759 return DRFLAC_TRUE;
9e052883 2760#elif defined(__MRC__)
2761 return DRFLAC_TRUE;
2ff0b512 2762#else
2763 /* If the compiler itself does not support the intrinsic then we'll need to return false. */
2764 #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
2765 return drflac__gIsLZCNTSupported;
2766 #else
2767 return DRFLAC_FALSE;
2768 #endif
2769#endif
2770}
2771
2772static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
2773{
2774 /*
2775 It's critical for competitive decoding performance that this function be highly optimal. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
2776 to achieve good performance, however on GCC and Clang it's a little bit more annoying. The __builtin_clzl() and __builtin_clzll() intrinsics leave
2777 it undefined as to the return value when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether or not it's a 32- or
2778 64-bit build. To work around this we would need to add a conditional to check for the x = 0 case, but this creates unnecessary inefficiency. To work
2779 around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly which removes the need to include
2780 the conditional. This has worked well in the past, but for some reason Clang's MSVC compatible driver, clang-cl, does not seem to be handling this
2781 in the same way as the normal Clang driver. It seems that `clang-cl` is just outputting the wrong results sometimes, maybe due to some register
2782 getting clobbered?
2783
2784 I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl` or just a misunderstanding on my part with inline
2785 assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.
2786
2787 Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
2788 compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
2789 to know how to fix the inlined assembly for correctness sake, however.
2790 */
2791
2792#if defined(_MSC_VER) /*&& !defined(__clang__)*/ /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
2793 #ifdef DRFLAC_64BIT
2794 return (drflac_uint32)__lzcnt64(x);
2795 #else
2796 return (drflac_uint32)__lzcnt(x);
2797 #endif
2798#else
2799 #if defined(__GNUC__) || defined(__clang__)
2800 #if defined(DRFLAC_X64)
2801 {
2802 drflac_uint64 r;
2803 __asm__ __volatile__ (
2804 "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2805 );
2806
2807 return (drflac_uint32)r;
2808 }
2809 #elif defined(DRFLAC_X86)
2810 {
2811 drflac_uint32 r;
2812 __asm__ __volatile__ (
2813 "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2814 );
2815
2816 return r;
2817 }
648db22b 2818 #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT) /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
2ff0b512 2819 {
2820 unsigned int r;
2821 __asm__ __volatile__ (
2822 #if defined(DRFLAC_64BIT)
2823 "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
2824 #else
2825 "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
2826 #endif
2827 );
2828
2829 return r;
2830 }
2831 #else
2832 if (x == 0) {
2833 return sizeof(x)*8;
2834 }
2835 #ifdef DRFLAC_64BIT
2836 return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
2837 #else
2838 return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
2839 #endif
2840 #endif
2841 #else
2842 /* Unsupported compiler. */
2843 #error "This compiler does not support the lzcnt intrinsic."
2844 #endif
2845#endif
2846}
2847#endif
2848
2849#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2850#include <intrin.h> /* For BitScanReverse(). */
2851
2852static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
2853{
2854 drflac_uint32 n;
2855
2856 if (x == 0) {
2857 return sizeof(x)*8;
2858 }
2859
2860#ifdef DRFLAC_64BIT
2861 _BitScanReverse64((unsigned long*)&n, x);
2862#else
2863 _BitScanReverse((unsigned long*)&n, x);
2864#endif
2865 return sizeof(x)*8 - n - 1;
2866}
2867#endif
2868
9e052883 2869#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
2870static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
2871#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT
2872/* Use the LZCNT instruction (only available on some processors since the 2010s). */
2873#pragma aux drflac__clz_watcom_lzcnt = \
2874 "db 0F3h, 0Fh, 0BDh, 0C0h" /* lzcnt eax, eax */ \
2875 parm [eax] \
2876 value [eax] \
2877 modify nomemory;
2878#else
2879/* Use the 386+-compatible implementation. */
2880#pragma aux drflac__clz_watcom = \
2881 "bsr eax, eax" \
2882 "xor eax, 31" \
2883 parm [eax] nomemory \
2884 value [eax] \
2885 modify exact [eax] nomemory;
2886#endif
2887#endif
2888
2ff0b512 2889static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
2890{
2891#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2892 if (drflac__is_lzcnt_supported()) {
2893 return drflac__clz_lzcnt(x);
2894 } else
2895#endif
2896 {
2897#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2898 return drflac__clz_msvc(x);
9e052883 2899#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT)
2900 return drflac__clz_watcom_lzcnt(x);
2901#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
2902 return (x == 0) ? sizeof(x)*8 : drflac__clz_watcom(x);
2903#elif defined(__MRC__)
2904 return __cntlzw(x);
2ff0b512 2905#else
2906 return drflac__clz_software(x);
2907#endif
2908 }
2909}
2910
2911
2912static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
2913{
2914 drflac_uint32 zeroCounter = 0;
2915 drflac_uint32 setBitOffsetPlus1;
2916
2917 while (bs->cache == 0) {
2918 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2919 if (!drflac__reload_cache(bs)) {
2920 return DRFLAC_FALSE;
2921 }
2922 }
2923
9e052883 2924 if (bs->cache == 1) {
2925 /* Not catching this would lead to undefined behaviour: a shift of a 32-bit number by 32 or more is undefined */
2926 *pOffsetOut = zeroCounter + (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs) - 1;
2927 if (!drflac__reload_cache(bs)) {
2928 return DRFLAC_FALSE;
2929 }
2930
2931 return DRFLAC_TRUE;
2932 }
2933
2ff0b512 2934 setBitOffsetPlus1 = drflac__clz(bs->cache);
2935 setBitOffsetPlus1 += 1;
2936
9e052883 2937 if (setBitOffsetPlus1 > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2938 /* This happens when we get to end of stream */
2939 return DRFLAC_FALSE;
2940 }
2941
2ff0b512 2942 bs->consumedBits += setBitOffsetPlus1;
2943 bs->cache <<= setBitOffsetPlus1;
2944
2945 *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
2946 return DRFLAC_TRUE;
2947}
2948
2949
2950
2951static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
2952{
2953 DRFLAC_ASSERT(bs != NULL);
2954 DRFLAC_ASSERT(offsetFromStart > 0);
2955
2956 /*
2957 Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
2958 is intentional because it simplifies the implementation of the onSeek callbacks), however offsetFromStart is unsigned 64-bit.
2959 To resolve we just need to do an initial seek from the start, and then a series of offset seeks to make up the remainder.
2960 */
2961 if (offsetFromStart > 0x7FFFFFFF) {
2962 drflac_uint64 bytesRemaining = offsetFromStart;
2963 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
2964 return DRFLAC_FALSE;
2965 }
2966 bytesRemaining -= 0x7FFFFFFF;
2967
2968 while (bytesRemaining > 0x7FFFFFFF) {
2969 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
2970 return DRFLAC_FALSE;
2971 }
2972 bytesRemaining -= 0x7FFFFFFF;
2973 }
2974
2975 if (bytesRemaining > 0) {
2976 if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
2977 return DRFLAC_FALSE;
2978 }
2979 }
2980 } else {
2981 if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
2982 return DRFLAC_FALSE;
2983 }
2984 }
2985
2986 /* The cache should be reset to force a reload of fresh data from the client. */
2987 drflac__reset_cache(bs);
2988 return DRFLAC_TRUE;
2989}
2990
2991
2992static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
2993{
2994 drflac_uint8 crc;
2995 drflac_uint64 result;
2996 drflac_uint8 utf8[7] = {0};
2997 int byteCount;
2998 int i;
2999
3000 DRFLAC_ASSERT(bs != NULL);
3001 DRFLAC_ASSERT(pNumberOut != NULL);
3002 DRFLAC_ASSERT(pCRCOut != NULL);
3003
3004 crc = *pCRCOut;
3005
3006 if (!drflac__read_uint8(bs, 8, utf8)) {
3007 *pNumberOut = 0;
3008 return DRFLAC_AT_END;
3009 }
3010 crc = drflac_crc8(crc, utf8[0], 8);
3011
3012 if ((utf8[0] & 0x80) == 0) {
3013 *pNumberOut = utf8[0];
3014 *pCRCOut = crc;
3015 return DRFLAC_SUCCESS;
3016 }
3017
3018 /*byteCount = 1;*/
3019 if ((utf8[0] & 0xE0) == 0xC0) {
3020 byteCount = 2;
3021 } else if ((utf8[0] & 0xF0) == 0xE0) {
3022 byteCount = 3;
3023 } else if ((utf8[0] & 0xF8) == 0xF0) {
3024 byteCount = 4;
3025 } else if ((utf8[0] & 0xFC) == 0xF8) {
3026 byteCount = 5;
3027 } else if ((utf8[0] & 0xFE) == 0xFC) {
3028 byteCount = 6;
3029 } else if ((utf8[0] & 0xFF) == 0xFE) {
3030 byteCount = 7;
3031 } else {
3032 *pNumberOut = 0;
3033 return DRFLAC_CRC_MISMATCH; /* Bad UTF-8 encoding. */
3034 }
3035
3036 /* Read extra bytes. */
3037 DRFLAC_ASSERT(byteCount > 1);
3038
3039 result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
3040 for (i = 1; i < byteCount; ++i) {
3041 if (!drflac__read_uint8(bs, 8, utf8 + i)) {
3042 *pNumberOut = 0;
3043 return DRFLAC_AT_END;
3044 }
3045 crc = drflac_crc8(crc, utf8[i], 8);
3046
3047 result = (result << 6) | (utf8[i] & 0x3F);
3048 }
3049
3050 *pNumberOut = result;
3051 *pCRCOut = crc;
3052 return DRFLAC_SUCCESS;
3053}
3054
3055
9e052883 3056static DRFLAC_INLINE drflac_uint32 drflac__ilog2_u32(drflac_uint32 x)
3057{
3058#if 1 /* Needs optimizing. */
3059 drflac_uint32 result = 0;
3060 while (x > 0) {
3061 result += 1;
3062 x >>= 1;
3063 }
3064
3065 return result;
3066#endif
3067}
3068
3069static DRFLAC_INLINE drflac_bool32 drflac__use_64_bit_prediction(drflac_uint32 bitsPerSample, drflac_uint32 order, drflac_uint32 precision)
3070{
3071 /* https://web.archive.org/web/20220205005724/https://github.com/ietf-wg-cellar/flac-specification/blob/37a49aa48ba4ba12e8757badfc59c0df35435fec/rfc_backmatter.md */
3072 return bitsPerSample + precision + drflac__ilog2_u32(order) > 32;
3073}
3074
2ff0b512 3075
3076/*
3077The next two functions are responsible for calculating the prediction.
3078
3079When the bits per sample is >16 we need to use 64-bit integer arithmetic because otherwise we'll run out of precision. It's
3080safe to assume this will be slower on 32-bit platforms so we use a more optimal solution when the bits per sample is <=16.
3081*/
9e052883 3082#if defined(__clang__)
3083__attribute__((no_sanitize("signed-integer-overflow")))
3084#endif
2ff0b512 3085static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3086{
3087 drflac_int32 prediction = 0;
3088
3089 DRFLAC_ASSERT(order <= 32);
3090
3091 /* 32-bit version. */
3092
3093 /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. */
3094 switch (order)
3095 {
3096 case 32: prediction += coefficients[31] * pDecodedSamples[-32];
3097 case 31: prediction += coefficients[30] * pDecodedSamples[-31];
3098 case 30: prediction += coefficients[29] * pDecodedSamples[-30];
3099 case 29: prediction += coefficients[28] * pDecodedSamples[-29];
3100 case 28: prediction += coefficients[27] * pDecodedSamples[-28];
3101 case 27: prediction += coefficients[26] * pDecodedSamples[-27];
3102 case 26: prediction += coefficients[25] * pDecodedSamples[-26];
3103 case 25: prediction += coefficients[24] * pDecodedSamples[-25];
3104 case 24: prediction += coefficients[23] * pDecodedSamples[-24];
3105 case 23: prediction += coefficients[22] * pDecodedSamples[-23];
3106 case 22: prediction += coefficients[21] * pDecodedSamples[-22];
3107 case 21: prediction += coefficients[20] * pDecodedSamples[-21];
3108 case 20: prediction += coefficients[19] * pDecodedSamples[-20];
3109 case 19: prediction += coefficients[18] * pDecodedSamples[-19];
3110 case 18: prediction += coefficients[17] * pDecodedSamples[-18];
3111 case 17: prediction += coefficients[16] * pDecodedSamples[-17];
3112 case 16: prediction += coefficients[15] * pDecodedSamples[-16];
3113 case 15: prediction += coefficients[14] * pDecodedSamples[-15];
3114 case 14: prediction += coefficients[13] * pDecodedSamples[-14];
3115 case 13: prediction += coefficients[12] * pDecodedSamples[-13];
3116 case 12: prediction += coefficients[11] * pDecodedSamples[-12];
3117 case 11: prediction += coefficients[10] * pDecodedSamples[-11];
3118 case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
3119 case 9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
3120 case 8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
3121 case 7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
3122 case 6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
3123 case 5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
3124 case 4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
3125 case 3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
3126 case 2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
3127 case 1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
3128 }
3129
3130 return (drflac_int32)(prediction >> shift);
3131}
3132
3133static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3134{
3135 drflac_int64 prediction;
3136
3137 DRFLAC_ASSERT(order <= 32);
3138
3139 /* 64-bit version. */
3140
3141 /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
3142#ifndef DRFLAC_64BIT
3143 if (order == 8)
3144 {
3145 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3146 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3147 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3148 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3149 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3150 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3151 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3152 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3153 }
3154 else if (order == 7)
3155 {
3156 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3157 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3158 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3159 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3160 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3161 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3162 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3163 }
3164 else if (order == 3)
3165 {
3166 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3167 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3168 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3169 }
3170 else if (order == 6)
3171 {
3172 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3173 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3174 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3175 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3176 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3177 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3178 }
3179 else if (order == 5)
3180 {
3181 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3182 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3183 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3184 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3185 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3186 }
3187 else if (order == 4)
3188 {
3189 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3190 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3191 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3192 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3193 }
3194 else if (order == 12)
3195 {
3196 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3197 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3198 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3199 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3200 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3201 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3202 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3203 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3204 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3205 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3206 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3207 prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3208 }
3209 else if (order == 2)
3210 {
3211 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3212 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3213 }
3214 else if (order == 1)
3215 {
3216 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3217 }
3218 else if (order == 10)
3219 {
3220 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3221 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3222 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3223 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3224 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3225 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3226 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3227 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3228 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3229 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3230 }
3231 else if (order == 9)
3232 {
3233 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3234 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3235 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3236 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3237 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3238 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3239 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3240 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3241 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3242 }
3243 else if (order == 11)
3244 {
3245 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3246 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3247 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3248 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3249 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3250 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3251 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3252 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3253 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3254 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3255 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3256 }
3257 else
3258 {
3259 int j;
3260
3261 prediction = 0;
3262 for (j = 0; j < (int)order; ++j) {
3263 prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
3264 }
3265 }
3266#endif
3267
3268 /*
3269 VC++ optimizes this to a single jmp instruction, but only the 64-bit build. The 32-bit build generates less efficient code for some
3270 reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
3271 */
3272#ifdef DRFLAC_64BIT
3273 prediction = 0;
3274 switch (order)
3275 {
3276 case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
3277 case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
3278 case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
3279 case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
3280 case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
3281 case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
3282 case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
3283 case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
3284 case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
3285 case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
3286 case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
3287 case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
3288 case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
3289 case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
3290 case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
3291 case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
3292 case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
3293 case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
3294 case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
3295 case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
3296 case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3297 case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3298 case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
3299 case 9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
3300 case 8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
3301 case 7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
3302 case 6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
3303 case 5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
3304 case 4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
3305 case 3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
3306 case 2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
3307 case 1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
3308 }
3309#endif
3310
3311 return (drflac_int32)(prediction >> shift);
3312}
3313
2ff0b512 3314
9e052883 3315#if 0
3316/*
3317Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
3318sake of readability and should only be used as a reference.
3319*/
3320static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3321{
3322 drflac_uint32 i;
2ff0b512 3323
9e052883 3324 DRFLAC_ASSERT(bs != NULL);
3325 DRFLAC_ASSERT(pSamplesOut != NULL);
2ff0b512 3326
9e052883 3327 for (i = 0; i < count; ++i) {
3328 drflac_uint32 zeroCounter = 0;
3329 for (;;) {
3330 drflac_uint8 bit;
3331 if (!drflac__read_uint8(bs, 1, &bit)) {
3332 return DRFLAC_FALSE;
3333 }
2ff0b512 3334
9e052883 3335 if (bit == 0) {
3336 zeroCounter += 1;
3337 } else {
3338 break;
3339 }
3340 }
3341
3342 drflac_uint32 decodedRice;
3343 if (riceParam > 0) {
3344 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3345 return DRFLAC_FALSE;
3346 }
3347 } else {
3348 decodedRice = 0;
3349 }
3350
3351 decodedRice |= (zeroCounter << riceParam);
3352 if ((decodedRice & 0x01)) {
3353 decodedRice = ~(decodedRice >> 1);
3354 } else {
3355 decodedRice = (decodedRice >> 1);
3356 }
3357
3358
3359 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3360 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
3361 } else {
3362 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
3363 }
3364 }
3365
3366 return DRFLAC_TRUE;
3367}
3368#endif
3369
3370#if 0
3371static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3372{
3373 drflac_uint32 zeroCounter = 0;
3374 drflac_uint32 decodedRice;
3375
3376 for (;;) {
3377 drflac_uint8 bit;
3378 if (!drflac__read_uint8(bs, 1, &bit)) {
3379 return DRFLAC_FALSE;
3380 }
3381
3382 if (bit == 0) {
3383 zeroCounter += 1;
3384 } else {
3385 break;
3386 }
3387 }
3388
3389 if (riceParam > 0) {
3390 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3391 return DRFLAC_FALSE;
3392 }
3393 } else {
3394 decodedRice = 0;
3395 }
3396
3397 *pZeroCounterOut = zeroCounter;
3398 *pRiceParamPartOut = decodedRice;
3399 return DRFLAC_TRUE;
3400}
3401#endif
3402
3403#if 0
3404static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3405{
3406 drflac_cache_t riceParamMask;
3407 drflac_uint32 zeroCounter;
3408 drflac_uint32 setBitOffsetPlus1;
3409 drflac_uint32 riceParamPart;
3410 drflac_uint32 riceLength;
3411
3412 DRFLAC_ASSERT(riceParam > 0); /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */
3413
3414 riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);
3415
3416 zeroCounter = 0;
3417 while (bs->cache == 0) {
3418 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
3419 if (!drflac__reload_cache(bs)) {
3420 return DRFLAC_FALSE;
3421 }
3422 }
3423
3424 setBitOffsetPlus1 = drflac__clz(bs->cache);
3425 zeroCounter += setBitOffsetPlus1;
3426 setBitOffsetPlus1 += 1;
3427
3428 riceLength = setBitOffsetPlus1 + riceParam;
3429 if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3430 riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));
3431
3432 bs->consumedBits += riceLength;
3433 bs->cache <<= riceLength;
3434 } else {
3435 drflac_uint32 bitCountLo;
3436 drflac_cache_t resultHi;
3437
3438 bs->consumedBits += riceLength;
3439 bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1); /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */
3440
3441 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
3442 bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
3443 resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam); /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */
3444
3445 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3446#ifndef DR_FLAC_NO_CRC
3447 drflac__update_crc16(bs);
3448#endif
3449 bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3450 bs->consumedBits = 0;
3451#ifndef DR_FLAC_NO_CRC
3452 bs->crc16Cache = bs->cache;
3453#endif
3454 } else {
3455 /* Slow path. We need to fetch more data from the client. */
3456 if (!drflac__reload_cache(bs)) {
3457 return DRFLAC_FALSE;
3458 }
3459 if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3460 /* This happens when we get to end of stream */
3461 return DRFLAC_FALSE;
3462 }
3463 }
3464
3465 riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));
3466
3467 bs->consumedBits += bitCountLo;
3468 bs->cache <<= bitCountLo;
3469 }
3470
3471 pZeroCounterOut[0] = zeroCounter;
3472 pRiceParamPartOut[0] = riceParamPart;
3473
3474 return DRFLAC_TRUE;
3475}
3476#endif
3477
3478static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3479{
3480 drflac_uint32 riceParamPlus1 = riceParam + 1;
3481 /*drflac_cache_t riceParamPlus1Mask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
3482 drflac_uint32 riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
3483 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
3484
3485 /*
3486 The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
3487 no idea how this will work in practice...
3488 */
3489 drflac_cache_t bs_cache = bs->cache;
3490 drflac_uint32 bs_consumedBits = bs->consumedBits;
3491
3492 /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
3493 drflac_uint32 lzcount = drflac__clz(bs_cache);
3494 if (lzcount < sizeof(bs_cache)*8) {
3495 pZeroCounterOut[0] = lzcount;
3496
3497 /*
3498 It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3499 this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3500 outside of this function at a higher level.
3501 */
3502 extract_rice_param_part:
3503 bs_cache <<= lzcount;
3504 bs_consumedBits += lzcount;
3505
3506 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
f5b7bb83 3507 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3508 pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3509 bs_cache <<= riceParamPlus1;
3510 bs_consumedBits += riceParamPlus1;
2ff0b512 3511 } else {
f5b7bb83 3512 drflac_uint32 riceParamPartHi;
3513 drflac_uint32 riceParamPartLo;
3514 drflac_uint32 riceParamPartLoBitCount;
2ff0b512 3515
f5b7bb83 3516 /*
3517 Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
3518 line, reload the cache, and then combine it with the head of the next cache line.
3519 */
2ff0b512 3520
f5b7bb83 3521 /* Grab the high part of the rice parameter part. */
3522 riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
2ff0b512 3523
f5b7bb83 3524 /* Before reloading the cache we need to grab the size in bits of the low part. */
3525 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3526 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
2ff0b512 3527
f5b7bb83 3528 /* Now reload the cache. */
3529 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3530 #ifndef DR_FLAC_NO_CRC
3531 drflac__update_crc16(bs);
3532 #endif
3533 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3534 bs_consumedBits = riceParamPartLoBitCount;
3535 #ifndef DR_FLAC_NO_CRC
3536 bs->crc16Cache = bs_cache;
3537 #endif
3538 } else {
3539 /* Slow path. We need to fetch more data from the client. */
3540 if (!drflac__reload_cache(bs)) {
3541 return DRFLAC_FALSE;
3542 }
9e052883 3543 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3544 /* This happens when we get to end of stream */
3545 return DRFLAC_FALSE;
3546 }
2ff0b512 3547
f5b7bb83 3548 bs_cache = bs->cache;
3549 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3550 }
2ff0b512 3551
3552 /* We should now have enough information to construct the rice parameter part. */
3553 riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
3554 pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;
3555
3556 bs_cache <<= riceParamPartLoBitCount;
3557 }
3558 } else {
3559 /*
3560 Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3561 to drflac__clz() and we need to reload the cache.
3562 */
3563 drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
3564 for (;;) {
3565 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3566 #ifndef DR_FLAC_NO_CRC
3567 drflac__update_crc16(bs);
3568 #endif
3569 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3570 bs_consumedBits = 0;
3571 #ifndef DR_FLAC_NO_CRC
3572 bs->crc16Cache = bs_cache;
3573 #endif
3574 } else {
3575 /* Slow path. We need to fetch more data from the client. */
3576 if (!drflac__reload_cache(bs)) {
3577 return DRFLAC_FALSE;
3578 }
3579
3580 bs_cache = bs->cache;
3581 bs_consumedBits = bs->consumedBits;
3582 }
3583
3584 lzcount = drflac__clz(bs_cache);
3585 zeroCounter += lzcount;
3586
3587 if (lzcount < sizeof(bs_cache)*8) {
3588 break;
3589 }
3590 }
3591
3592 pZeroCounterOut[0] = zeroCounter;
3593 goto extract_rice_param_part;
3594 }
3595
3596 /* Make sure the cache is restored at the end of it all. */
3597 bs->cache = bs_cache;
3598 bs->consumedBits = bs_consumedBits;
3599
3600 return DRFLAC_TRUE;
3601}
3602
3603static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
3604{
3605 drflac_uint32 riceParamPlus1 = riceParam + 1;
3606 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
3607
3608 /*
3609 The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
3610 no idea how this will work in practice...
3611 */
3612 drflac_cache_t bs_cache = bs->cache;
3613 drflac_uint32 bs_consumedBits = bs->consumedBits;
3614
3615 /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
3616 drflac_uint32 lzcount = drflac__clz(bs_cache);
3617 if (lzcount < sizeof(bs_cache)*8) {
3618 /*
3619 It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3620 this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3621 outside of this function at a higher level.
3622 */
3623 extract_rice_param_part:
3624 bs_cache <<= lzcount;
3625 bs_consumedBits += lzcount;
3626
3627 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3628 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3629 bs_cache <<= riceParamPlus1;
3630 bs_consumedBits += riceParamPlus1;
3631 } else {
3632 /*
3633 Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
3634 line, reload the cache, and then combine it with the head of the next cache line.
3635 */
3636
3637 /* Before reloading the cache we need to grab the size in bits of the low part. */
3638 drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3639 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
3640
3641 /* Now reload the cache. */
3642 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3643 #ifndef DR_FLAC_NO_CRC
3644 drflac__update_crc16(bs);
3645 #endif
3646 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3647 bs_consumedBits = riceParamPartLoBitCount;
3648 #ifndef DR_FLAC_NO_CRC
3649 bs->crc16Cache = bs_cache;
3650 #endif
3651 } else {
3652 /* Slow path. We need to fetch more data from the client. */
3653 if (!drflac__reload_cache(bs)) {
3654 return DRFLAC_FALSE;
3655 }
3656
9e052883 3657 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3658 /* This happens when we get to end of stream */
3659 return DRFLAC_FALSE;
3660 }
3661
2ff0b512 3662 bs_cache = bs->cache;
3663 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3664 }
3665
3666 bs_cache <<= riceParamPartLoBitCount;
3667 }
3668 } else {
3669 /*
3670 Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3671 to drflac__clz() and we need to reload the cache.
3672 */
3673 for (;;) {
3674 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3675 #ifndef DR_FLAC_NO_CRC
3676 drflac__update_crc16(bs);
3677 #endif
3678 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3679 bs_consumedBits = 0;
3680 #ifndef DR_FLAC_NO_CRC
3681 bs->crc16Cache = bs_cache;
3682 #endif
3683 } else {
3684 /* Slow path. We need to fetch more data from the client. */
3685 if (!drflac__reload_cache(bs)) {
3686 return DRFLAC_FALSE;
3687 }
3688
3689 bs_cache = bs->cache;
3690 bs_consumedBits = bs->consumedBits;
3691 }
3692
3693 lzcount = drflac__clz(bs_cache);
3694 if (lzcount < sizeof(bs_cache)*8) {
3695 break;
3696 }
3697 }
3698
3699 goto extract_rice_param_part;
3700 }
3701
3702 /* Make sure the cache is restored at the end of it all. */
3703 bs->cache = bs_cache;
3704 bs->consumedBits = bs_consumedBits;
3705
3706 return DRFLAC_TRUE;
3707}
3708
3709
3710static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3711{
3712 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3713 drflac_uint32 zeroCountPart0;
3714 drflac_uint32 riceParamPart0;
3715 drflac_uint32 riceParamMask;
3716 drflac_uint32 i;
3717
3718 DRFLAC_ASSERT(bs != NULL);
2ff0b512 3719 DRFLAC_ASSERT(pSamplesOut != NULL);
3720
3721 (void)bitsPerSample;
3722 (void)order;
3723 (void)shift;
3724 (void)coefficients;
3725
3726 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3727
3728 i = 0;
3729 while (i < count) {
3730 /* Rice extraction. */
3731 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3732 return DRFLAC_FALSE;
3733 }
3734
3735 /* Rice reconstruction. */
3736 riceParamPart0 &= riceParamMask;
3737 riceParamPart0 |= (zeroCountPart0 << riceParam);
3738 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3739
3740 pSamplesOut[i] = riceParamPart0;
3741
3742 i += 1;
3743 }
3744
3745 return DRFLAC_TRUE;
3746}
3747
9e052883 3748static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
2ff0b512 3749{
3750 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3751 drflac_uint32 zeroCountPart0 = 0;
3752 drflac_uint32 zeroCountPart1 = 0;
3753 drflac_uint32 zeroCountPart2 = 0;
3754 drflac_uint32 zeroCountPart3 = 0;
3755 drflac_uint32 riceParamPart0 = 0;
3756 drflac_uint32 riceParamPart1 = 0;
3757 drflac_uint32 riceParamPart2 = 0;
3758 drflac_uint32 riceParamPart3 = 0;
3759 drflac_uint32 riceParamMask;
3760 const drflac_int32* pSamplesOutEnd;
3761 drflac_uint32 i;
3762
3763 DRFLAC_ASSERT(bs != NULL);
2ff0b512 3764 DRFLAC_ASSERT(pSamplesOut != NULL);
3765
9e052883 3766 if (lpcOrder == 0) {
3767 return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
2ff0b512 3768 }
3769
3770 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3771 pSamplesOutEnd = pSamplesOut + (count & ~3);
3772
9e052883 3773 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
2ff0b512 3774 while (pSamplesOut < pSamplesOutEnd) {
3775 /*
3776 Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
3777 against an array. Not sure why, but perhaps it's making more efficient use of registers?
3778 */
3779 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3780 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3781 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3782 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3783 return DRFLAC_FALSE;
3784 }
3785
3786 riceParamPart0 &= riceParamMask;
3787 riceParamPart1 &= riceParamMask;
3788 riceParamPart2 &= riceParamMask;
3789 riceParamPart3 &= riceParamMask;
3790
3791 riceParamPart0 |= (zeroCountPart0 << riceParam);
3792 riceParamPart1 |= (zeroCountPart1 << riceParam);
3793 riceParamPart2 |= (zeroCountPart2 << riceParam);
3794 riceParamPart3 |= (zeroCountPart3 << riceParam);
3795
3796 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3797 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3798 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3799 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3800
9e052883 3801 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3802 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
3803 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
3804 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
2ff0b512 3805
3806 pSamplesOut += 4;
3807 }
3808 } else {
3809 while (pSamplesOut < pSamplesOutEnd) {
3810 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3811 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3812 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3813 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3814 return DRFLAC_FALSE;
3815 }
3816
3817 riceParamPart0 &= riceParamMask;
3818 riceParamPart1 &= riceParamMask;
3819 riceParamPart2 &= riceParamMask;
3820 riceParamPart3 &= riceParamMask;
3821
3822 riceParamPart0 |= (zeroCountPart0 << riceParam);
3823 riceParamPart1 |= (zeroCountPart1 << riceParam);
3824 riceParamPart2 |= (zeroCountPart2 << riceParam);
3825 riceParamPart3 |= (zeroCountPart3 << riceParam);
3826
3827 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3828 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3829 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3830 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3831
9e052883 3832 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3833 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
3834 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
3835 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
2ff0b512 3836
3837 pSamplesOut += 4;
3838 }
3839 }
3840
3841 i = (count & ~3);
3842 while (i < count) {
3843 /* Rice extraction. */
3844 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3845 return DRFLAC_FALSE;
3846 }
3847
3848 /* Rice reconstruction. */
3849 riceParamPart0 &= riceParamMask;
3850 riceParamPart0 |= (zeroCountPart0 << riceParam);
3851 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3852 /*riceParamPart0 = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/
3853
3854 /* Sample reconstruction. */
9e052883 3855 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3856 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
2ff0b512 3857 } else {
9e052883 3858 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
2ff0b512 3859 }
3860
3861 i += 1;
3862 pSamplesOut += 1;
3863 }
3864
3865 return DRFLAC_TRUE;
3866}
3867
3868#if defined(DRFLAC_SUPPORT_SSE2)
3869static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
3870{
3871 __m128i r;
3872
3873 /* Pack. */
3874 r = _mm_packs_epi32(a, b);
3875
3876 /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
3877 r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));
3878
3879 /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
3880 r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3881 r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3882
3883 return r;
3884}
3885#endif
3886
3887#if defined(DRFLAC_SUPPORT_SSE41)
3888static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
3889{
3890 return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
3891}
3892
3893static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
3894{
3895 __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3896 __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
3897 return _mm_add_epi32(x64, x32);
3898}
3899
3900static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
3901{
3902 return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3903}
3904
3905static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
3906{
3907 /*
3908 To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
3909 is shifted with zero bits, whereas the right side is shifted with sign bits.
3910 */
3911 __m128i lo = _mm_srli_epi64(x, count);
3912 __m128i hi = _mm_srai_epi32(x, count);
3913
3914 hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0)); /* The high part needs to have the low part cleared. */
3915
3916 return _mm_or_si128(lo, hi);
3917}
3918
3919static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3920{
3921 int i;
3922 drflac_uint32 riceParamMask;
3923 drflac_int32* pDecodedSamples = pSamplesOut;
3924 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
3925 drflac_uint32 zeroCountParts0 = 0;
3926 drflac_uint32 zeroCountParts1 = 0;
3927 drflac_uint32 zeroCountParts2 = 0;
3928 drflac_uint32 zeroCountParts3 = 0;
3929 drflac_uint32 riceParamParts0 = 0;
3930 drflac_uint32 riceParamParts1 = 0;
3931 drflac_uint32 riceParamParts2 = 0;
3932 drflac_uint32 riceParamParts3 = 0;
3933 __m128i coefficients128_0;
3934 __m128i coefficients128_4;
3935 __m128i coefficients128_8;
3936 __m128i samples128_0;
3937 __m128i samples128_4;
3938 __m128i samples128_8;
3939 __m128i riceParamMask128;
3940
3941 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3942
3943 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3944 riceParamMask128 = _mm_set1_epi32(riceParamMask);
3945
3946 /* Pre-load. */
3947 coefficients128_0 = _mm_setzero_si128();
3948 coefficients128_4 = _mm_setzero_si128();
3949 coefficients128_8 = _mm_setzero_si128();
3950
3951 samples128_0 = _mm_setzero_si128();
3952 samples128_4 = _mm_setzero_si128();
3953 samples128_8 = _mm_setzero_si128();
3954
3955 /*
3956 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
3957 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
3958 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
3959 so I think there's opportunity for this to be simplified.
3960 */
3961#if 1
3962 {
3963 int runningOrder = order;
3964
3965 /* 0 - 3. */
3966 if (runningOrder >= 4) {
3967 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
3968 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
3969 runningOrder -= 4;
3970 } else {
3971 switch (runningOrder) {
3972 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
3973 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
3974 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
3975 }
3976 runningOrder = 0;
3977 }
3978
3979 /* 4 - 7 */
3980 if (runningOrder >= 4) {
3981 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
3982 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
3983 runningOrder -= 4;
3984 } else {
3985 switch (runningOrder) {
3986 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
3987 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
3988 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
3989 }
3990 runningOrder = 0;
3991 }
3992
3993 /* 8 - 11 */
3994 if (runningOrder == 4) {
3995 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
3996 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
3997 runningOrder -= 4;
3998 } else {
3999 switch (runningOrder) {
4000 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
4001 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
4002 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
4003 }
4004 runningOrder = 0;
4005 }
4006
4007 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4008 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
4009 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
4010 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
4011 }
4012#else
4013 /* This causes strict-aliasing warnings with GCC. */
4014 switch (order)
4015 {
4016 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
4017 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
4018 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
4019 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
4020 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
4021 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
4022 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
4023 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
4024 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
4025 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
4026 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
4027 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
4028 }
4029#endif
4030
4031 /* For this version we are doing one sample at a time. */
4032 while (pDecodedSamples < pDecodedSamplesEnd) {
4033 __m128i prediction128;
4034 __m128i zeroCountPart128;
4035 __m128i riceParamPart128;
4036
4037 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
4038 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
4039 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
4040 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
4041 return DRFLAC_FALSE;
4042 }
4043
4044 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
4045 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
4046
4047 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
4048 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
4049 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01))); /* <-- SSE2 compatible */
4050 /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/ /* <-- Only supported from SSE4.1 and is slower in my testing... */
4051
4052 if (order <= 4) {
4053 for (i = 0; i < 4; i += 1) {
4054 prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);
4055
4056 /* Horizontal add and shift. */
4057 prediction128 = drflac__mm_hadd_epi32(prediction128);
4058 prediction128 = _mm_srai_epi32(prediction128, shift);
4059 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4060
4061 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4062 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4063 }
4064 } else if (order <= 8) {
4065 for (i = 0; i < 4; i += 1) {
4066 prediction128 = _mm_mullo_epi32(coefficients128_4, samples128_4);
4067 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
4068
4069 /* Horizontal add and shift. */
4070 prediction128 = drflac__mm_hadd_epi32(prediction128);
4071 prediction128 = _mm_srai_epi32(prediction128, shift);
4072 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4073
4074 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4075 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4076 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4077 }
4078 } else {
4079 for (i = 0; i < 4; i += 1) {
4080 prediction128 = _mm_mullo_epi32(coefficients128_8, samples128_8);
4081 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
4082 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
4083
4084 /* Horizontal add and shift. */
4085 prediction128 = drflac__mm_hadd_epi32(prediction128);
4086 prediction128 = _mm_srai_epi32(prediction128, shift);
4087 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4088
4089 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
4090 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4091 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4092 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4093 }
4094 }
4095
4096 /* We store samples in groups of 4. */
4097 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4098 pDecodedSamples += 4;
4099 }
4100
4101 /* Make sure we process the last few samples. */
4102 i = (count & ~3);
4103 while (i < (int)count) {
4104 /* Rice extraction. */
4105 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4106 return DRFLAC_FALSE;
4107 }
4108
4109 /* Rice reconstruction. */
4110 riceParamParts0 &= riceParamMask;
4111 riceParamParts0 |= (zeroCountParts0 << riceParam);
4112 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
4113
4114 /* Sample reconstruction. */
4115 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4116
4117 i += 1;
4118 pDecodedSamples += 1;
4119 }
4120
4121 return DRFLAC_TRUE;
4122}
4123
4124static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4125{
4126 int i;
4127 drflac_uint32 riceParamMask;
4128 drflac_int32* pDecodedSamples = pSamplesOut;
4129 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4130 drflac_uint32 zeroCountParts0 = 0;
4131 drflac_uint32 zeroCountParts1 = 0;
4132 drflac_uint32 zeroCountParts2 = 0;
4133 drflac_uint32 zeroCountParts3 = 0;
4134 drflac_uint32 riceParamParts0 = 0;
4135 drflac_uint32 riceParamParts1 = 0;
4136 drflac_uint32 riceParamParts2 = 0;
4137 drflac_uint32 riceParamParts3 = 0;
4138 __m128i coefficients128_0;
4139 __m128i coefficients128_4;
4140 __m128i coefficients128_8;
4141 __m128i samples128_0;
4142 __m128i samples128_4;
4143 __m128i samples128_8;
4144 __m128i prediction128;
4145 __m128i riceParamMask128;
4146
4147 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4148
4149 DRFLAC_ASSERT(order <= 12);
4150
4151 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4152 riceParamMask128 = _mm_set1_epi32(riceParamMask);
4153
4154 prediction128 = _mm_setzero_si128();
4155
4156 /* Pre-load. */
4157 coefficients128_0 = _mm_setzero_si128();
4158 coefficients128_4 = _mm_setzero_si128();
4159 coefficients128_8 = _mm_setzero_si128();
4160
4161 samples128_0 = _mm_setzero_si128();
4162 samples128_4 = _mm_setzero_si128();
4163 samples128_8 = _mm_setzero_si128();
4164
4165#if 1
4166 {
4167 int runningOrder = order;
4168
4169 /* 0 - 3. */
4170 if (runningOrder >= 4) {
4171 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
4172 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
4173 runningOrder -= 4;
4174 } else {
4175 switch (runningOrder) {
4176 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
4177 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
4178 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
4179 }
4180 runningOrder = 0;
4181 }
4182
4183 /* 4 - 7 */
4184 if (runningOrder >= 4) {
4185 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
4186 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
4187 runningOrder -= 4;
4188 } else {
4189 switch (runningOrder) {
4190 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
4191 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
4192 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
4193 }
4194 runningOrder = 0;
4195 }
4196
4197 /* 8 - 11 */
4198 if (runningOrder == 4) {
4199 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
4200 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
4201 runningOrder -= 4;
4202 } else {
4203 switch (runningOrder) {
4204 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
4205 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
4206 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
4207 }
4208 runningOrder = 0;
4209 }
4210
4211 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4212 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
4213 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
4214 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
4215 }
4216#else
4217 switch (order)
4218 {
4219 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
4220 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
4221 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
4222 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
4223 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
4224 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
4225 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
4226 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
4227 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
4228 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
4229 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
4230 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
4231 }
4232#endif
4233
4234 /* For this version we are doing one sample at a time. */
4235 while (pDecodedSamples < pDecodedSamplesEnd) {
4236 __m128i zeroCountPart128;
4237 __m128i riceParamPart128;
4238
4239 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
4240 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
4241 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
4242 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
4243 return DRFLAC_FALSE;
4244 }
4245
4246 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
4247 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
4248
4249 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
4250 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
4251 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));
4252
4253 for (i = 0; i < 4; i += 1) {
4254 prediction128 = _mm_xor_si128(prediction128, prediction128); /* Reset to 0. */
4255
4256 switch (order)
4257 {
4258 case 12:
4259 case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
4260 case 10:
4261 case 9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
4262 case 8:
4263 case 7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
4264 case 6:
4265 case 5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
4266 case 4:
4267 case 3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
4268 case 2:
4269 case 1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
4270 }
4271
4272 /* Horizontal add and shift. */
4273 prediction128 = drflac__mm_hadd_epi64(prediction128);
4274 prediction128 = drflac__mm_srai_epi64(prediction128, shift);
4275 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4276
4277 /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. */
4278 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
4279 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4280 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4281
4282 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4283 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4284 }
4285
4286 /* We store samples in groups of 4. */
4287 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4288 pDecodedSamples += 4;
4289 }
4290
4291 /* Make sure we process the last few samples. */
4292 i = (count & ~3);
4293 while (i < (int)count) {
4294 /* Rice extraction. */
4295 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4296 return DRFLAC_FALSE;
4297 }
4298
4299 /* Rice reconstruction. */
4300 riceParamParts0 &= riceParamMask;
4301 riceParamParts0 |= (zeroCountParts0 << riceParam);
4302 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
4303
4304 /* Sample reconstruction. */
4305 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4306
4307 i += 1;
4308 pDecodedSamples += 1;
4309 }
4310
4311 return DRFLAC_TRUE;
4312}
4313
9e052883 4314static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
2ff0b512 4315{
4316 DRFLAC_ASSERT(bs != NULL);
2ff0b512 4317 DRFLAC_ASSERT(pSamplesOut != NULL);
4318
4319 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
9e052883 4320 if (lpcOrder > 0 && lpcOrder <= 12) {
4321 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4322 return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
2ff0b512 4323 } else {
9e052883 4324 return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
2ff0b512 4325 }
4326 } else {
9e052883 4327 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
2ff0b512 4328 }
4329}
4330#endif
4331
4332#if defined(DRFLAC_SUPPORT_NEON)
4333static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
4334{
4335 vst1q_s32(p+0, x.val[0]);
4336 vst1q_s32(p+4, x.val[1]);
4337}
4338
4339static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
4340{
4341 vst1q_u32(p+0, x.val[0]);
4342 vst1q_u32(p+4, x.val[1]);
4343}
4344
4345static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
4346{
4347 vst1q_f32(p+0, x.val[0]);
4348 vst1q_f32(p+4, x.val[1]);
4349}
4350
4351static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
4352{
4353 vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
4354}
4355
4356static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
4357{
4358 vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
4359}
4360
4361static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
4362{
4363 drflac_int32 x[4];
4364 x[3] = x3;
4365 x[2] = x2;
4366 x[1] = x1;
4367 x[0] = x0;
4368 return vld1q_s32(x);
4369}
4370
4371static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
4372{
4373 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4374
4375 /* Reference */
4376 /*return drflac__vdupq_n_s32x4(
4377 vgetq_lane_s32(a, 0),
4378 vgetq_lane_s32(b, 3),
4379 vgetq_lane_s32(b, 2),
4380 vgetq_lane_s32(b, 1)
4381 );*/
4382
4383 return vextq_s32(b, a, 1);
4384}
4385
4386static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
4387{
4388 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4389
4390 /* Reference */
4391 /*return drflac__vdupq_n_s32x4(
4392 vgetq_lane_s32(a, 0),
4393 vgetq_lane_s32(b, 3),
4394 vgetq_lane_s32(b, 2),
4395 vgetq_lane_s32(b, 1)
4396 );*/
4397
4398 return vextq_u32(b, a, 1);
4399}
4400
4401static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
4402{
4403 /* The sum must end up in position 0. */
4404
4405 /* Reference */
4406 /*return vdupq_n_s32(
4407 vgetq_lane_s32(x, 3) +
4408 vgetq_lane_s32(x, 2) +
4409 vgetq_lane_s32(x, 1) +
4410 vgetq_lane_s32(x, 0)
4411 );*/
4412
4413 int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
4414 return vpadd_s32(r, r);
4415}
4416
4417static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
4418{
4419 return vadd_s64(vget_high_s64(x), vget_low_s64(x));
4420}
4421
4422static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
4423{
4424 /* Reference */
4425 /*return drflac__vdupq_n_s32x4(
4426 vgetq_lane_s32(x, 0),
4427 vgetq_lane_s32(x, 1),
4428 vgetq_lane_s32(x, 2),
4429 vgetq_lane_s32(x, 3)
4430 );*/
4431
4432 return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
4433}
4434
4435static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
4436{
4437 return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
4438}
4439
4440static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
4441{
4442 return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
4443}
4444
4445static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4446{
4447 int i;
4448 drflac_uint32 riceParamMask;
4449 drflac_int32* pDecodedSamples = pSamplesOut;
4450 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4451 drflac_uint32 zeroCountParts[4];
4452 drflac_uint32 riceParamParts[4];
4453 int32x4_t coefficients128_0;
4454 int32x4_t coefficients128_4;
4455 int32x4_t coefficients128_8;
4456 int32x4_t samples128_0;
4457 int32x4_t samples128_4;
4458 int32x4_t samples128_8;
4459 uint32x4_t riceParamMask128;
4460 int32x4_t riceParam128;
4461 int32x2_t shift64;
4462 uint32x4_t one128;
4463
4464 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4465
9e052883 4466 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
2ff0b512 4467 riceParamMask128 = vdupq_n_u32(riceParamMask);
4468
4469 riceParam128 = vdupq_n_s32(riceParam);
4470 shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
4471 one128 = vdupq_n_u32(1);
4472
4473 /*
4474 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
4475 what's available in the input buffers. It would be conenient to use a fall-through switch to do this, but this results
4476 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4477 so I think there's opportunity for this to be simplified.
4478 */
4479 {
4480 int runningOrder = order;
4481 drflac_int32 tempC[4] = {0, 0, 0, 0};
4482 drflac_int32 tempS[4] = {0, 0, 0, 0};
4483
4484 /* 0 - 3. */
4485 if (runningOrder >= 4) {
4486 coefficients128_0 = vld1q_s32(coefficients + 0);
4487 samples128_0 = vld1q_s32(pSamplesOut - 4);
4488 runningOrder -= 4;
4489 } else {
4490 switch (runningOrder) {
4491 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4492 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4493 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4494 }
4495
4496 coefficients128_0 = vld1q_s32(tempC);
4497 samples128_0 = vld1q_s32(tempS);
4498 runningOrder = 0;
4499 }
4500
4501 /* 4 - 7 */
4502 if (runningOrder >= 4) {
4503 coefficients128_4 = vld1q_s32(coefficients + 4);
4504 samples128_4 = vld1q_s32(pSamplesOut - 8);
4505 runningOrder -= 4;
4506 } else {
4507 switch (runningOrder) {
4508 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4509 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4510 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4511 }
4512
4513 coefficients128_4 = vld1q_s32(tempC);
4514 samples128_4 = vld1q_s32(tempS);
4515 runningOrder = 0;
4516 }
4517
4518 /* 8 - 11 */
4519 if (runningOrder == 4) {
4520 coefficients128_8 = vld1q_s32(coefficients + 8);
4521 samples128_8 = vld1q_s32(pSamplesOut - 12);
4522 runningOrder -= 4;
4523 } else {
4524 switch (runningOrder) {
4525 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4526 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4527 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4528 }
4529
4530 coefficients128_8 = vld1q_s32(tempC);
4531 samples128_8 = vld1q_s32(tempS);
4532 runningOrder = 0;
4533 }
4534
4535 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4536 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4537 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4538 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4539 }
4540
4541 /* For this version we are doing one sample at a time. */
4542 while (pDecodedSamples < pDecodedSamplesEnd) {
4543 int32x4_t prediction128;
4544 int32x2_t prediction64;
4545 uint32x4_t zeroCountPart128;
4546 uint32x4_t riceParamPart128;
4547
4548 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4549 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4550 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4551 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4552 return DRFLAC_FALSE;
4553 }
4554
4555 zeroCountPart128 = vld1q_u32(zeroCountParts);
4556 riceParamPart128 = vld1q_u32(riceParamParts);
4557
4558 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4559 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4560 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4561
4562 if (order <= 4) {
4563 for (i = 0; i < 4; i += 1) {
4564 prediction128 = vmulq_s32(coefficients128_0, samples128_0);
4565
4566 /* Horizontal add and shift. */
4567 prediction64 = drflac__vhaddq_s32(prediction128);
4568 prediction64 = vshl_s32(prediction64, shift64);
4569 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4570
4571 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4572 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4573 }
4574 } else if (order <= 8) {
4575 for (i = 0; i < 4; i += 1) {
4576 prediction128 = vmulq_s32(coefficients128_4, samples128_4);
4577 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4578
4579 /* Horizontal add and shift. */
4580 prediction64 = drflac__vhaddq_s32(prediction128);
4581 prediction64 = vshl_s32(prediction64, shift64);
4582 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4583
4584 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4585 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4586 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4587 }
4588 } else {
4589 for (i = 0; i < 4; i += 1) {
4590 prediction128 = vmulq_s32(coefficients128_8, samples128_8);
4591 prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
4592 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4593
4594 /* Horizontal add and shift. */
4595 prediction64 = drflac__vhaddq_s32(prediction128);
4596 prediction64 = vshl_s32(prediction64, shift64);
4597 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4598
4599 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4600 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4601 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4602 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4603 }
4604 }
4605
4606 /* We store samples in groups of 4. */
4607 vst1q_s32(pDecodedSamples, samples128_0);
4608 pDecodedSamples += 4;
4609 }
4610
4611 /* Make sure we process the last few samples. */
4612 i = (count & ~3);
4613 while (i < (int)count) {
4614 /* Rice extraction. */
4615 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4616 return DRFLAC_FALSE;
4617 }
4618
4619 /* Rice reconstruction. */
4620 riceParamParts[0] &= riceParamMask;
4621 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4622 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4623
4624 /* Sample reconstruction. */
4625 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4626
4627 i += 1;
4628 pDecodedSamples += 1;
4629 }
4630
4631 return DRFLAC_TRUE;
4632}
4633
4634static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4635{
4636 int i;
4637 drflac_uint32 riceParamMask;
4638 drflac_int32* pDecodedSamples = pSamplesOut;
4639 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4640 drflac_uint32 zeroCountParts[4];
4641 drflac_uint32 riceParamParts[4];
4642 int32x4_t coefficients128_0;
4643 int32x4_t coefficients128_4;
4644 int32x4_t coefficients128_8;
4645 int32x4_t samples128_0;
4646 int32x4_t samples128_4;
4647 int32x4_t samples128_8;
4648 uint32x4_t riceParamMask128;
4649 int32x4_t riceParam128;
4650 int64x1_t shift64;
4651 uint32x4_t one128;
9e052883 4652 int64x2_t prediction128 = { 0 };
4653 uint32x4_t zeroCountPart128;
4654 uint32x4_t riceParamPart128;
2ff0b512 4655
4656 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4657
9e052883 4658 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
2ff0b512 4659 riceParamMask128 = vdupq_n_u32(riceParamMask);
4660
4661 riceParam128 = vdupq_n_s32(riceParam);
4662 shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
4663 one128 = vdupq_n_u32(1);
4664
4665 /*
4666 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
9e052883 4667 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
2ff0b512 4668 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4669 so I think there's opportunity for this to be simplified.
4670 */
4671 {
4672 int runningOrder = order;
4673 drflac_int32 tempC[4] = {0, 0, 0, 0};
4674 drflac_int32 tempS[4] = {0, 0, 0, 0};
4675
4676 /* 0 - 3. */
4677 if (runningOrder >= 4) {
4678 coefficients128_0 = vld1q_s32(coefficients + 0);
4679 samples128_0 = vld1q_s32(pSamplesOut - 4);
4680 runningOrder -= 4;
4681 } else {
4682 switch (runningOrder) {
4683 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4684 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4685 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4686 }
4687
4688 coefficients128_0 = vld1q_s32(tempC);
4689 samples128_0 = vld1q_s32(tempS);
4690 runningOrder = 0;
4691 }
4692
4693 /* 4 - 7 */
4694 if (runningOrder >= 4) {
4695 coefficients128_4 = vld1q_s32(coefficients + 4);
4696 samples128_4 = vld1q_s32(pSamplesOut - 8);
4697 runningOrder -= 4;
4698 } else {
4699 switch (runningOrder) {
4700 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4701 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4702 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4703 }
4704
4705 coefficients128_4 = vld1q_s32(tempC);
4706 samples128_4 = vld1q_s32(tempS);
4707 runningOrder = 0;
4708 }
4709
4710 /* 8 - 11 */
4711 if (runningOrder == 4) {
4712 coefficients128_8 = vld1q_s32(coefficients + 8);
4713 samples128_8 = vld1q_s32(pSamplesOut - 12);
4714 runningOrder -= 4;
4715 } else {
4716 switch (runningOrder) {
4717 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4718 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4719 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4720 }
4721
4722 coefficients128_8 = vld1q_s32(tempC);
4723 samples128_8 = vld1q_s32(tempS);
4724 runningOrder = 0;
4725 }
4726
4727 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4728 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4729 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4730 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4731 }
4732
4733 /* For this version we are doing one sample at a time. */
4734 while (pDecodedSamples < pDecodedSamplesEnd) {
2ff0b512 4735 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4736 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4737 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4738 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4739 return DRFLAC_FALSE;
4740 }
4741
4742 zeroCountPart128 = vld1q_u32(zeroCountParts);
4743 riceParamPart128 = vld1q_u32(riceParamParts);
4744
4745 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4746 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4747 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4748
4749 for (i = 0; i < 4; i += 1) {
4750 int64x1_t prediction64;
4751
4752 prediction128 = veorq_s64(prediction128, prediction128); /* Reset to 0. */
4753 switch (order)
4754 {
4755 case 12:
4756 case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
4757 case 10:
4758 case 9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
4759 case 8:
4760 case 7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
4761 case 6:
4762 case 5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
4763 case 4:
4764 case 3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
4765 case 2:
4766 case 1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
4767 }
4768
4769 /* Horizontal add and shift. */
4770 prediction64 = drflac__vhaddq_s64(prediction128);
4771 prediction64 = vshl_s64(prediction64, shift64);
4772 prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));
4773
4774 /* Our value should be sitting in prediction64[0]. We need to combine this with our SSE samples. */
4775 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4776 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4777 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);
4778
4779 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4780 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4781 }
4782
4783 /* We store samples in groups of 4. */
4784 vst1q_s32(pDecodedSamples, samples128_0);
4785 pDecodedSamples += 4;
4786 }
4787
4788 /* Make sure we process the last few samples. */
4789 i = (count & ~3);
4790 while (i < (int)count) {
4791 /* Rice extraction. */
4792 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4793 return DRFLAC_FALSE;
4794 }
4795
4796 /* Rice reconstruction. */
4797 riceParamParts[0] &= riceParamMask;
4798 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4799 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4800
4801 /* Sample reconstruction. */
4802 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4803
4804 i += 1;
4805 pDecodedSamples += 1;
4806 }
4807
4808 return DRFLAC_TRUE;
4809}
4810
9e052883 4811static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
2ff0b512 4812{
4813 DRFLAC_ASSERT(bs != NULL);
2ff0b512 4814 DRFLAC_ASSERT(pSamplesOut != NULL);
4815
4816 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
9e052883 4817 if (lpcOrder > 0 && lpcOrder <= 12) {
4818 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4819 return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
2ff0b512 4820 } else {
9e052883 4821 return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
2ff0b512 4822 }
4823 } else {
9e052883 4824 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
2ff0b512 4825 }
4826}
4827#endif
4828
9e052883 4829static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
2ff0b512 4830{
4831#if defined(DRFLAC_SUPPORT_SSE41)
4832 if (drflac__gIsSSE41Supported) {
9e052883 4833 return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
2ff0b512 4834 } else
4835#elif defined(DRFLAC_SUPPORT_NEON)
4836 if (drflac__gIsNEONSupported) {
9e052883 4837 return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
2ff0b512 4838 } else
4839#endif
4840 {
4841 /* Scalar fallback. */
9e052883 4842 #if 0
4843 return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4844 #else
4845 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4846 #endif
2ff0b512 4847 }
4848}
4849
4850/* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
4851static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
4852{
4853 drflac_uint32 i;
4854
4855 DRFLAC_ASSERT(bs != NULL);
2ff0b512 4856
4857 for (i = 0; i < count; ++i) {
4858 if (!drflac__seek_rice_parts(bs, riceParam)) {
4859 return DRFLAC_FALSE;
4860 }
4861 }
4862
4863 return DRFLAC_TRUE;
4864}
4865
9e052883 4866#if defined(__clang__)
4867__attribute__((no_sanitize("signed-integer-overflow")))
4868#endif
4869static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
2ff0b512 4870{
4871 drflac_uint32 i;
4872
4873 DRFLAC_ASSERT(bs != NULL);
2ff0b512 4874 DRFLAC_ASSERT(unencodedBitsPerSample <= 31); /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
4875 DRFLAC_ASSERT(pSamplesOut != NULL);
4876
4877 for (i = 0; i < count; ++i) {
4878 if (unencodedBitsPerSample > 0) {
4879 if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
4880 return DRFLAC_FALSE;
4881 }
4882 } else {
4883 pSamplesOut[i] = 0;
4884 }
4885
9e052883 4886 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4887 pSamplesOut[i] += drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
2ff0b512 4888 } else {
9e052883 4889 pSamplesOut[i] += drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
2ff0b512 4890 }
4891 }
4892
4893 return DRFLAC_TRUE;
4894}
4895
4896
4897/*
4898Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
4899when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
4900<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
4901*/
9e052883 4902static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
2ff0b512 4903{
4904 drflac_uint8 residualMethod;
4905 drflac_uint8 partitionOrder;
4906 drflac_uint32 samplesInPartition;
4907 drflac_uint32 partitionsRemaining;
4908
4909 DRFLAC_ASSERT(bs != NULL);
4910 DRFLAC_ASSERT(blockSize != 0);
4911 DRFLAC_ASSERT(pDecodedSamples != NULL); /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */
4912
4913 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4914 return DRFLAC_FALSE;
4915 }
4916
4917 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4918 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
4919 }
4920
4921 /* Ignore the first <order> values. */
9e052883 4922 pDecodedSamples += lpcOrder;
2ff0b512 4923
4924 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
4925 return DRFLAC_FALSE;
4926 }
4927
4928 /*
4929 From the FLAC spec:
4930 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4931 */
4932 if (partitionOrder > 8) {
4933 return DRFLAC_FALSE;
4934 }
4935
4936 /* Validation check. */
9e052883 4937 if ((blockSize / (1 << partitionOrder)) < lpcOrder) {
2ff0b512 4938 return DRFLAC_FALSE;
4939 }
4940
9e052883 4941 samplesInPartition = (blockSize / (1 << partitionOrder)) - lpcOrder;
2ff0b512 4942 partitionsRemaining = (1 << partitionOrder);
4943 for (;;) {
4944 drflac_uint8 riceParam = 0;
4945 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4946 if (!drflac__read_uint8(bs, 4, &riceParam)) {
4947 return DRFLAC_FALSE;
4948 }
4949 if (riceParam == 15) {
4950 riceParam = 0xFF;
4951 }
4952 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4953 if (!drflac__read_uint8(bs, 5, &riceParam)) {
4954 return DRFLAC_FALSE;
4955 }
4956 if (riceParam == 31) {
4957 riceParam = 0xFF;
4958 }
4959 }
4960
4961 if (riceParam != 0xFF) {
9e052883 4962 if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
2ff0b512 4963 return DRFLAC_FALSE;
4964 }
4965 } else {
4966 drflac_uint8 unencodedBitsPerSample = 0;
4967 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
4968 return DRFLAC_FALSE;
4969 }
4970
9e052883 4971 if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
2ff0b512 4972 return DRFLAC_FALSE;
4973 }
4974 }
4975
4976 pDecodedSamples += samplesInPartition;
4977
4978 if (partitionsRemaining == 1) {
4979 break;
4980 }
4981
4982 partitionsRemaining -= 1;
4983
4984 if (partitionOrder != 0) {
4985 samplesInPartition = blockSize / (1 << partitionOrder);
4986 }
4987 }
4988
4989 return DRFLAC_TRUE;
4990}
4991
4992/*
4993Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
4994when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be set to 0. The
4995<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
4996*/
4997static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
4998{
4999 drflac_uint8 residualMethod;
5000 drflac_uint8 partitionOrder;
5001 drflac_uint32 samplesInPartition;
5002 drflac_uint32 partitionsRemaining;
5003
5004 DRFLAC_ASSERT(bs != NULL);
5005 DRFLAC_ASSERT(blockSize != 0);
5006
5007 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
5008 return DRFLAC_FALSE;
5009 }
5010
5011 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
5012 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
5013 }
5014
5015 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
5016 return DRFLAC_FALSE;
5017 }
5018
5019 /*
5020 From the FLAC spec:
5021 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
5022 */
5023 if (partitionOrder > 8) {
5024 return DRFLAC_FALSE;
5025 }
5026
5027 /* Validation check. */
5028 if ((blockSize / (1 << partitionOrder)) <= order) {
5029 return DRFLAC_FALSE;
5030 }
5031
5032 samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
5033 partitionsRemaining = (1 << partitionOrder);
5034 for (;;)
5035 {
5036 drflac_uint8 riceParam = 0;
5037 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
5038 if (!drflac__read_uint8(bs, 4, &riceParam)) {
5039 return DRFLAC_FALSE;
5040 }
5041 if (riceParam == 15) {
5042 riceParam = 0xFF;
5043 }
5044 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
5045 if (!drflac__read_uint8(bs, 5, &riceParam)) {
5046 return DRFLAC_FALSE;
5047 }
5048 if (riceParam == 31) {
5049 riceParam = 0xFF;
5050 }
5051 }
5052
5053 if (riceParam != 0xFF) {
5054 if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
5055 return DRFLAC_FALSE;
5056 }
5057 } else {
5058 drflac_uint8 unencodedBitsPerSample = 0;
5059 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
5060 return DRFLAC_FALSE;
5061 }
5062
5063 if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
5064 return DRFLAC_FALSE;
5065 }
5066 }
5067
5068
5069 if (partitionsRemaining == 1) {
5070 break;
5071 }
5072
5073 partitionsRemaining -= 1;
5074 samplesInPartition = blockSize / (1 << partitionOrder);
5075 }
5076
5077 return DRFLAC_TRUE;
5078}
5079
5080
5081static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
5082{
5083 drflac_uint32 i;
5084
5085 /* Only a single sample needs to be decoded here. */
5086 drflac_int32 sample;
5087 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5088 return DRFLAC_FALSE;
5089 }
5090
5091 /*
5092 We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
5093 we'll want to look at a more efficient way.
5094 */
5095 for (i = 0; i < blockSize; ++i) {
5096 pDecodedSamples[i] = sample;
5097 }
5098
5099 return DRFLAC_TRUE;
5100}
5101
5102static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
5103{
5104 drflac_uint32 i;
5105
5106 for (i = 0; i < blockSize; ++i) {
5107 drflac_int32 sample;
5108 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5109 return DRFLAC_FALSE;
5110 }
5111
5112 pDecodedSamples[i] = sample;
5113 }
5114
5115 return DRFLAC_TRUE;
5116}
5117
5118static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5119{
5120 drflac_uint32 i;
5121
5122 static drflac_int32 lpcCoefficientsTable[5][4] = {
5123 {0, 0, 0, 0},
5124 {1, 0, 0, 0},
5125 {2, -1, 0, 0},
5126 {3, -3, 1, 0},
5127 {4, -6, 4, -1}
5128 };
5129
5130 /* Warm up samples and coefficients. */
5131 for (i = 0; i < lpcOrder; ++i) {
5132 drflac_int32 sample;
5133 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5134 return DRFLAC_FALSE;
5135 }
5136
5137 pDecodedSamples[i] = sample;
5138 }
5139
9e052883 5140 if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, 4, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
2ff0b512 5141 return DRFLAC_FALSE;
5142 }
5143
5144 return DRFLAC_TRUE;
5145}
5146
5147static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5148{
5149 drflac_uint8 i;
5150 drflac_uint8 lpcPrecision;
5151 drflac_int8 lpcShift;
5152 drflac_int32 coefficients[32];
5153
5154 /* Warm up samples. */
5155 for (i = 0; i < lpcOrder; ++i) {
5156 drflac_int32 sample;
5157 if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
5158 return DRFLAC_FALSE;
5159 }
5160
5161 pDecodedSamples[i] = sample;
5162 }
5163
5164 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5165 return DRFLAC_FALSE;
5166 }
5167 if (lpcPrecision == 15) {
5168 return DRFLAC_FALSE; /* Invalid. */
5169 }
5170 lpcPrecision += 1;
5171
5172 if (!drflac__read_int8(bs, 5, &lpcShift)) {
5173 return DRFLAC_FALSE;
5174 }
5175
5176 /*
5177 From the FLAC specification:
5178
5179 Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)
5180
5181 Emphasis on the "signed two's-complement". In practice there does not seem to be any encoders nor decoders supporting negative shifts. For now dr_flac is
5182 not going to support negative shifts as I don't have any reference files. However, when a reference file comes through I will consider adding support.
5183 */
5184 if (lpcShift < 0) {
5185 return DRFLAC_FALSE;
5186 }
5187
5188 DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
5189 for (i = 0; i < lpcOrder; ++i) {
5190 if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
5191 return DRFLAC_FALSE;
5192 }
5193 }
5194
9e052883 5195 if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
2ff0b512 5196 return DRFLAC_FALSE;
5197 }
5198
5199 return DRFLAC_TRUE;
5200}
5201
5202
5203static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
5204{
5205 const drflac_uint32 sampleRateTable[12] = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
5206 const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1}; /* -1 = reserved. */
5207
5208 DRFLAC_ASSERT(bs != NULL);
5209 DRFLAC_ASSERT(header != NULL);
5210
5211 /* Keep looping until we find a valid sync code. */
5212 for (;;) {
5213 drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
5214 drflac_uint8 reserved = 0;
5215 drflac_uint8 blockingStrategy = 0;
5216 drflac_uint8 blockSize = 0;
5217 drflac_uint8 sampleRate = 0;
5218 drflac_uint8 channelAssignment = 0;
5219 drflac_uint8 bitsPerSample = 0;
5220 drflac_bool32 isVariableBlockSize;
5221
5222 if (!drflac__find_and_seek_to_next_sync_code(bs)) {
5223 return DRFLAC_FALSE;
5224 }
5225
5226 if (!drflac__read_uint8(bs, 1, &reserved)) {
5227 return DRFLAC_FALSE;
5228 }
5229 if (reserved == 1) {
5230 continue;
5231 }
5232 crc8 = drflac_crc8(crc8, reserved, 1);
5233
5234 if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
5235 return DRFLAC_FALSE;
5236 }
5237 crc8 = drflac_crc8(crc8, blockingStrategy, 1);
5238
5239 if (!drflac__read_uint8(bs, 4, &blockSize)) {
5240 return DRFLAC_FALSE;
5241 }
5242 if (blockSize == 0) {
5243 continue;
5244 }
5245 crc8 = drflac_crc8(crc8, blockSize, 4);
5246
5247 if (!drflac__read_uint8(bs, 4, &sampleRate)) {
5248 return DRFLAC_FALSE;
5249 }
5250 crc8 = drflac_crc8(crc8, sampleRate, 4);
5251
5252 if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
5253 return DRFLAC_FALSE;
5254 }
5255 if (channelAssignment > 10) {
5256 continue;
5257 }
5258 crc8 = drflac_crc8(crc8, channelAssignment, 4);
5259
5260 if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
5261 return DRFLAC_FALSE;
5262 }
5263 if (bitsPerSample == 3 || bitsPerSample == 7) {
5264 continue;
5265 }
5266 crc8 = drflac_crc8(crc8, bitsPerSample, 3);
5267
5268
5269 if (!drflac__read_uint8(bs, 1, &reserved)) {
5270 return DRFLAC_FALSE;
5271 }
5272 if (reserved == 1) {
5273 continue;
5274 }
5275 crc8 = drflac_crc8(crc8, reserved, 1);
5276
5277
5278 isVariableBlockSize = blockingStrategy == 1;
5279 if (isVariableBlockSize) {
5280 drflac_uint64 pcmFrameNumber;
5281 drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
5282 if (result != DRFLAC_SUCCESS) {
5283 if (result == DRFLAC_AT_END) {
5284 return DRFLAC_FALSE;
5285 } else {
5286 continue;
5287 }
5288 }
5289 header->flacFrameNumber = 0;
5290 header->pcmFrameNumber = pcmFrameNumber;
5291 } else {
5292 drflac_uint64 flacFrameNumber = 0;
5293 drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
5294 if (result != DRFLAC_SUCCESS) {
5295 if (result == DRFLAC_AT_END) {
5296 return DRFLAC_FALSE;
5297 } else {
5298 continue;
5299 }
5300 }
5301 header->flacFrameNumber = (drflac_uint32)flacFrameNumber; /* <-- Safe cast. */
5302 header->pcmFrameNumber = 0;
5303 }
5304
5305
5306 DRFLAC_ASSERT(blockSize > 0);
5307 if (blockSize == 1) {
5308 header->blockSizeInPCMFrames = 192;
5309 } else if (blockSize <= 5) {
5310 DRFLAC_ASSERT(blockSize >= 2);
5311 header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
5312 } else if (blockSize == 6) {
5313 if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
5314 return DRFLAC_FALSE;
5315 }
5316 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
5317 header->blockSizeInPCMFrames += 1;
5318 } else if (blockSize == 7) {
5319 if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
5320 return DRFLAC_FALSE;
5321 }
5322 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
9e052883 5323 if (header->blockSizeInPCMFrames == 0xFFFF) {
5324 return DRFLAC_FALSE; /* Frame is too big. This is the size of the frame minus 1. The STREAMINFO block defines the max block size which is 16-bits. Adding one will make it 17 bits and therefore too big. */
5325 }
2ff0b512 5326 header->blockSizeInPCMFrames += 1;
5327 } else {
5328 DRFLAC_ASSERT(blockSize >= 8);
5329 header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
5330 }
5331
5332
5333 if (sampleRate <= 11) {
5334 header->sampleRate = sampleRateTable[sampleRate];
5335 } else if (sampleRate == 12) {
5336 if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
5337 return DRFLAC_FALSE;
5338 }
5339 crc8 = drflac_crc8(crc8, header->sampleRate, 8);
5340 header->sampleRate *= 1000;
5341 } else if (sampleRate == 13) {
5342 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5343 return DRFLAC_FALSE;
5344 }
5345 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5346 } else if (sampleRate == 14) {
5347 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5348 return DRFLAC_FALSE;
5349 }
5350 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5351 header->sampleRate *= 10;
5352 } else {
5353 continue; /* Invalid. Assume an invalid block. */
5354 }
5355
5356
5357 header->channelAssignment = channelAssignment;
5358
5359 header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
5360 if (header->bitsPerSample == 0) {
5361 header->bitsPerSample = streaminfoBitsPerSample;
5362 }
5363
9e052883 5364 if (header->bitsPerSample != streaminfoBitsPerSample) {
5365 /* If this subframe has a different bitsPerSample then streaminfo or the first frame, reject it */
5366 return DRFLAC_FALSE;
5367 }
5368
2ff0b512 5369 if (!drflac__read_uint8(bs, 8, &header->crc8)) {
5370 return DRFLAC_FALSE;
5371 }
5372
5373#ifndef DR_FLAC_NO_CRC
5374 if (header->crc8 != crc8) {
5375 continue; /* CRC mismatch. Loop back to the top and find the next sync code. */
5376 }
5377#endif
5378 return DRFLAC_TRUE;
5379 }
5380}
5381
5382static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
5383{
5384 drflac_uint8 header;
5385 int type;
5386
5387 if (!drflac__read_uint8(bs, 8, &header)) {
5388 return DRFLAC_FALSE;
5389 }
5390
5391 /* First bit should always be 0. */
5392 if ((header & 0x80) != 0) {
5393 return DRFLAC_FALSE;
5394 }
5395
5396 type = (header & 0x7E) >> 1;
5397 if (type == 0) {
5398 pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
5399 } else if (type == 1) {
5400 pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
5401 } else {
5402 if ((type & 0x20) != 0) {
5403 pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
5404 pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
5405 } else if ((type & 0x08) != 0) {
5406 pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
5407 pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
5408 if (pSubframe->lpcOrder > 4) {
5409 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5410 pSubframe->lpcOrder = 0;
5411 }
5412 } else {
5413 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5414 }
5415 }
5416
5417 if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
5418 return DRFLAC_FALSE;
5419 }
5420
5421 /* Wasted bits per sample. */
5422 pSubframe->wastedBitsPerSample = 0;
5423 if ((header & 0x01) == 1) {
5424 unsigned int wastedBitsPerSample;
5425 if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
5426 return DRFLAC_FALSE;
5427 }
5428 pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
5429 }
5430
5431 return DRFLAC_TRUE;
5432}
5433
5434static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
5435{
5436 drflac_subframe* pSubframe;
5437 drflac_uint32 subframeBitsPerSample;
5438
5439 DRFLAC_ASSERT(bs != NULL);
5440 DRFLAC_ASSERT(frame != NULL);
5441
5442 pSubframe = frame->subframes + subframeIndex;
5443 if (!drflac__read_subframe_header(bs, pSubframe)) {
5444 return DRFLAC_FALSE;
5445 }
5446
5447 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5448 subframeBitsPerSample = frame->header.bitsPerSample;
5449 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5450 subframeBitsPerSample += 1;
5451 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5452 subframeBitsPerSample += 1;
5453 }
5454
9e052883 5455 if (subframeBitsPerSample > 32) {
5456 /* libFLAC and ffmpeg reject 33-bit subframes as well */
5457 return DRFLAC_FALSE;
5458 }
5459
2ff0b512 5460 /* Need to handle wasted bits per sample. */
5461 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5462 return DRFLAC_FALSE;
5463 }
5464 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5465
5466 pSubframe->pSamplesS32 = pDecodedSamplesOut;
5467
5468 switch (pSubframe->subframeType)
5469 {
5470 case DRFLAC_SUBFRAME_CONSTANT:
5471 {
5472 drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5473 } break;
5474
5475 case DRFLAC_SUBFRAME_VERBATIM:
5476 {
5477 drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5478 } break;
5479
5480 case DRFLAC_SUBFRAME_FIXED:
5481 {
5482 drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5483 } break;
5484
5485 case DRFLAC_SUBFRAME_LPC:
5486 {
5487 drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5488 } break;
5489
5490 default: return DRFLAC_FALSE;
5491 }
5492
5493 return DRFLAC_TRUE;
5494}
5495
5496static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
5497{
5498 drflac_subframe* pSubframe;
5499 drflac_uint32 subframeBitsPerSample;
5500
5501 DRFLAC_ASSERT(bs != NULL);
5502 DRFLAC_ASSERT(frame != NULL);
5503
5504 pSubframe = frame->subframes + subframeIndex;
5505 if (!drflac__read_subframe_header(bs, pSubframe)) {
5506 return DRFLAC_FALSE;
5507 }
5508
5509 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5510 subframeBitsPerSample = frame->header.bitsPerSample;
5511 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5512 subframeBitsPerSample += 1;
5513 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5514 subframeBitsPerSample += 1;
5515 }
5516
5517 /* Need to handle wasted bits per sample. */
5518 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5519 return DRFLAC_FALSE;
5520 }
5521 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5522
5523 pSubframe->pSamplesS32 = NULL;
5524
5525 switch (pSubframe->subframeType)
5526 {
5527 case DRFLAC_SUBFRAME_CONSTANT:
5528 {
5529 if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
5530 return DRFLAC_FALSE;
5531 }
5532 } break;
5533
5534 case DRFLAC_SUBFRAME_VERBATIM:
5535 {
5536 unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
5537 if (!drflac__seek_bits(bs, bitsToSeek)) {
5538 return DRFLAC_FALSE;
5539 }
5540 } break;
5541
5542 case DRFLAC_SUBFRAME_FIXED:
5543 {
5544 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5545 if (!drflac__seek_bits(bs, bitsToSeek)) {
5546 return DRFLAC_FALSE;
5547 }
5548
5549 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5550 return DRFLAC_FALSE;
5551 }
5552 } break;
5553
5554 case DRFLAC_SUBFRAME_LPC:
5555 {
5556 drflac_uint8 lpcPrecision;
5557
5558 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5559 if (!drflac__seek_bits(bs, bitsToSeek)) {
5560 return DRFLAC_FALSE;
5561 }
5562
5563 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5564 return DRFLAC_FALSE;
5565 }
5566 if (lpcPrecision == 15) {
5567 return DRFLAC_FALSE; /* Invalid. */
5568 }
5569 lpcPrecision += 1;
5570
5571
5572 bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5; /* +5 for shift. */
5573 if (!drflac__seek_bits(bs, bitsToSeek)) {
5574 return DRFLAC_FALSE;
5575 }
5576
5577 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5578 return DRFLAC_FALSE;
5579 }
5580 } break;
5581
5582 default: return DRFLAC_FALSE;
5583 }
5584
5585 return DRFLAC_TRUE;
5586}
5587
5588
5589static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
5590{
5591 drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};
5592
5593 DRFLAC_ASSERT(channelAssignment <= 10);
5594 return lookup[channelAssignment];
5595}
5596
5597static drflac_result drflac__decode_flac_frame(drflac* pFlac)
5598{
5599 int channelCount;
5600 int i;
5601 drflac_uint8 paddingSizeInBits;
5602 drflac_uint16 desiredCRC16;
5603#ifndef DR_FLAC_NO_CRC
5604 drflac_uint16 actualCRC16;
5605#endif
5606
5607 /* This function should be called while the stream is sitting on the first byte after the frame header. */
5608 DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
5609
5610 /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
5611 if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
5612 return DRFLAC_ERROR;
5613 }
5614
5615 /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
5616 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5617 if (channelCount != (int)pFlac->channels) {
5618 return DRFLAC_ERROR;
5619 }
5620
5621 for (i = 0; i < channelCount; ++i) {
5622 if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
5623 return DRFLAC_ERROR;
5624 }
5625 }
5626
5627 paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
5628 if (paddingSizeInBits > 0) {
5629 drflac_uint8 padding = 0;
5630 if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
5631 return DRFLAC_AT_END;
5632 }
5633 }
5634
5635#ifndef DR_FLAC_NO_CRC
5636 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5637#endif
5638 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5639 return DRFLAC_AT_END;
5640 }
5641
5642#ifndef DR_FLAC_NO_CRC
5643 if (actualCRC16 != desiredCRC16) {
5644 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5645 }
5646#endif
5647
5648 pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5649
5650 return DRFLAC_SUCCESS;
5651}
5652
5653static drflac_result drflac__seek_flac_frame(drflac* pFlac)
5654{
5655 int channelCount;
5656 int i;
5657 drflac_uint16 desiredCRC16;
5658#ifndef DR_FLAC_NO_CRC
5659 drflac_uint16 actualCRC16;
5660#endif
5661
5662 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5663 for (i = 0; i < channelCount; ++i) {
5664 if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
5665 return DRFLAC_ERROR;
5666 }
5667 }
5668
5669 /* Padding. */
5670 if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
5671 return DRFLAC_ERROR;
5672 }
5673
5674 /* CRC. */
5675#ifndef DR_FLAC_NO_CRC
5676 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5677#endif
5678 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5679 return DRFLAC_AT_END;
5680 }
5681
5682#ifndef DR_FLAC_NO_CRC
5683 if (actualCRC16 != desiredCRC16) {
5684 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5685 }
5686#endif
5687
5688 return DRFLAC_SUCCESS;
5689}
5690
5691static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
5692{
5693 DRFLAC_ASSERT(pFlac != NULL);
5694
5695 for (;;) {
5696 drflac_result result;
5697
5698 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5699 return DRFLAC_FALSE;
5700 }
5701
5702 result = drflac__decode_flac_frame(pFlac);
5703 if (result != DRFLAC_SUCCESS) {
5704 if (result == DRFLAC_CRC_MISMATCH) {
5705 continue; /* CRC mismatch. Skip to the next frame. */
5706 } else {
5707 return DRFLAC_FALSE;
5708 }
5709 }
5710
5711 return DRFLAC_TRUE;
5712 }
5713}
5714
5715static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
5716{
5717 drflac_uint64 firstPCMFrame;
5718 drflac_uint64 lastPCMFrame;
5719
5720 DRFLAC_ASSERT(pFlac != NULL);
5721
5722 firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
5723 if (firstPCMFrame == 0) {
5724 firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
5725 }
5726
5727 lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5728 if (lastPCMFrame > 0) {
5729 lastPCMFrame -= 1; /* Needs to be zero based. */
5730 }
5731
5732 if (pFirstPCMFrame) {
5733 *pFirstPCMFrame = firstPCMFrame;
5734 }
5735 if (pLastPCMFrame) {
5736 *pLastPCMFrame = lastPCMFrame;
5737 }
5738}
5739
5740static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
5741{
5742 drflac_bool32 result;
5743
5744 DRFLAC_ASSERT(pFlac != NULL);
5745
5746 result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
5747
5748 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5749 pFlac->currentPCMFrame = 0;
5750
5751 return result;
5752}
5753
5754static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
5755{
5756 /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
5757 DRFLAC_ASSERT(pFlac != NULL);
5758 return drflac__seek_flac_frame(pFlac);
5759}
5760
5761
5762static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
5763{
5764 drflac_uint64 pcmFramesRead = 0;
5765 while (pcmFramesToSeek > 0) {
5766 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5767 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5768 break; /* Couldn't read the next frame, so just break from the loop and return. */
5769 }
5770 } else {
5771 if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
5772 pcmFramesRead += pcmFramesToSeek;
5773 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek; /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. */
5774 pcmFramesToSeek = 0;
5775 } else {
5776 pcmFramesRead += pFlac->currentFLACFrame.pcmFramesRemaining;
5777 pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
5778 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5779 }
5780 }
5781 }
5782
5783 pFlac->currentPCMFrame += pcmFramesRead;
5784 return pcmFramesRead;
5785}
5786
5787
5788static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5789{
5790 drflac_bool32 isMidFrame = DRFLAC_FALSE;
5791 drflac_uint64 runningPCMFrameCount;
5792
5793 DRFLAC_ASSERT(pFlac != NULL);
5794
5795 /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
5796 if (pcmFrameIndex >= pFlac->currentPCMFrame) {
5797 /* Seeking forward. Need to seek from the current position. */
5798 runningPCMFrameCount = pFlac->currentPCMFrame;
5799
5800 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
5801 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5802 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5803 return DRFLAC_FALSE;
5804 }
5805 } else {
5806 isMidFrame = DRFLAC_TRUE;
5807 }
5808 } else {
5809 /* Seeking backwards. Need to seek from the start of the file. */
5810 runningPCMFrameCount = 0;
5811
5812 /* Move back to the start. */
5813 if (!drflac__seek_to_first_frame(pFlac)) {
5814 return DRFLAC_FALSE;
5815 }
5816
5817 /* Decode the first frame in preparation for sample-exact seeking below. */
5818 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5819 return DRFLAC_FALSE;
5820 }
5821 }
5822
5823 /*
5824 We need to as quickly as possible find the frame that contains the target sample. To do this, we iterate over each frame and inspect its
5825 header. If based on the header we can determine that the frame contains the sample, we do a full decode of that frame.
5826 */
5827 for (;;) {
5828 drflac_uint64 pcmFrameCountInThisFLACFrame;
5829 drflac_uint64 firstPCMFrameInFLACFrame = 0;
5830 drflac_uint64 lastPCMFrameInFLACFrame = 0;
5831
5832 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
5833
5834 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
5835 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
5836 /*
5837 The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
5838 it never existed and keep iterating.
5839 */
5840 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
5841
5842 if (!isMidFrame) {
5843 drflac_result result = drflac__decode_flac_frame(pFlac);
5844 if (result == DRFLAC_SUCCESS) {
5845 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
5846 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
5847 } else {
5848 if (result == DRFLAC_CRC_MISMATCH) {
5849 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5850 } else {
5851 return DRFLAC_FALSE;
5852 }
5853 }
5854 } else {
5855 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
5856 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
5857 }
5858 } else {
5859 /*
5860 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
5861 frame never existed and leave the running sample count untouched.
5862 */
5863 if (!isMidFrame) {
5864 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
5865 if (result == DRFLAC_SUCCESS) {
5866 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
5867 } else {
5868 if (result == DRFLAC_CRC_MISMATCH) {
5869 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5870 } else {
5871 return DRFLAC_FALSE;
5872 }
5873 }
5874 } else {
5875 /*
5876 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
5877 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
5878 */
5879 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
5880 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5881 isMidFrame = DRFLAC_FALSE;
5882 }
5883
5884 /* If we are seeking to the end of the file and we've just hit it, we're done. */
5885 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
5886 return DRFLAC_TRUE;
5887 }
5888 }
5889
5890 next_iteration:
5891 /* Grab the next frame in preparation for the next iteration. */
5892 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5893 return DRFLAC_FALSE;
5894 }
5895 }
5896}
5897
5898
5899#if !defined(DR_FLAC_NO_CRC)
5900/*
5901We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
5902uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting
5903location.
5904*/
5905#define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
5906
5907static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
5908{
5909 DRFLAC_ASSERT(pFlac != NULL);
5910 DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
5911 DRFLAC_ASSERT(targetByte >= rangeLo);
5912 DRFLAC_ASSERT(targetByte <= rangeHi);
5913
5914 *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
5915
5916 for (;;) {
5917 /* After rangeLo == rangeHi == targetByte fails, we need to break out. */
5918 drflac_uint64 lastTargetByte = targetByte;
5919
5920 /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. */
5921 if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
5922 /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
5923 if (targetByte == 0) {
5924 drflac__seek_to_first_frame(pFlac); /* Try to recover. */
5925 return DRFLAC_FALSE;
5926 }
5927
5928 /* Halve the byte location and continue. */
5929 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5930 rangeHi = targetByte;
5931 } else {
5932 /* Getting here should mean that we have seeked to an appropriate byte. */
5933
5934 /* Clear the details of the FLAC frame so we don't misreport data. */
5935 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5936
5937 /*
5938 Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
5939 CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
5940 so it needs to stay this way for now.
5941 */
5942#if 1
5943 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5944 /* Halve the byte location and continue. */
5945 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5946 rangeHi = targetByte;
5947 } else {
5948 break;
5949 }
5950#else
5951 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5952 /* Halve the byte location and continue. */
5953 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5954 rangeHi = targetByte;
5955 } else {
5956 break;
5957 }
5958#endif
5959 }
5960
5961 /* We already tried this byte and there are no more to try, break out. */
5962 if(targetByte == lastTargetByte) {
5963 return DRFLAC_FALSE;
5964 }
5965 }
5966
5967 /* The current PCM frame needs to be updated based on the frame we just seeked to. */
5968 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
5969
5970 DRFLAC_ASSERT(targetByte <= rangeHi);
5971
5972 *pLastSuccessfulSeekOffset = targetByte;
5973 return DRFLAC_TRUE;
5974}
5975
5976static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
5977{
9e052883 5978 /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
5979#if 0
5980 if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
5981 /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
5982 if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
5983 return DRFLAC_FALSE;
5984 }
5985 }
5986#endif
5987
2ff0b512 5988 return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
5989}
5990
5991
5992static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
5993{
5994 /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */
5995
5996 drflac_uint64 targetByte;
5997 drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
5998 drflac_uint64 pcmRangeHi = 0;
5999 drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
6000 drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
6001 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
6002
6003 targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
6004 if (targetByte > byteRangeHi) {
6005 targetByte = byteRangeHi;
6006 }
6007
6008 for (;;) {
6009 if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
6010 /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
6011 drflac_uint64 newPCMRangeLo;
6012 drflac_uint64 newPCMRangeHi;
6013 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
6014
6015 /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
6016 if (pcmRangeLo == newPCMRangeLo) {
6017 if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
6018 break; /* Failed to seek to closest frame. */
6019 }
6020
6021 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
6022 return DRFLAC_TRUE;
6023 } else {
6024 break; /* Failed to seek forward. */
6025 }
6026 }
6027
6028 pcmRangeLo = newPCMRangeLo;
6029 pcmRangeHi = newPCMRangeHi;
6030
6031 if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
6032 /* The target PCM frame is in this FLAC frame. */
6033 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
6034 return DRFLAC_TRUE;
6035 } else {
6036 break; /* Failed to seek to FLAC frame. */
6037 }
6038 } else {
6039 const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6040
6041 if (pcmRangeLo > pcmFrameIndex) {
6042 /* We seeked too far forward. We need to move our target byte backward and try again. */
6043 byteRangeHi = lastSuccessfulSeekOffset;
6044 if (byteRangeLo > byteRangeHi) {
6045 byteRangeLo = byteRangeHi;
6046 }
6047
6048 targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
6049 if (targetByte < byteRangeLo) {
6050 targetByte = byteRangeLo;
6051 }
6052 } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
6053 /* We didn't seek far enough. We need to move our target byte forward and try again. */
6054
6055 /* If we're close enough we can just seek forward. */
6056 if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
6057 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
6058 return DRFLAC_TRUE;
6059 } else {
6060 break; /* Failed to seek to FLAC frame. */
6061 }
6062 } else {
6063 byteRangeLo = lastSuccessfulSeekOffset;
6064 if (byteRangeHi < byteRangeLo) {
6065 byteRangeHi = byteRangeLo;
6066 }
6067
6068 targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
6069 if (targetByte > byteRangeHi) {
6070 targetByte = byteRangeHi;
6071 }
6072
6073 if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
6074 closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
6075 }
6076 }
6077 }
6078 }
6079 } else {
6080 /* Getting here is really bad. We just recover as best we can, but moving to the first frame in the stream, and then abort. */
6081 break;
6082 }
6083 }
6084
6085 drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
6086 return DRFLAC_FALSE;
6087}
6088
6089static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6090{
6091 drflac_uint64 byteRangeLo;
6092 drflac_uint64 byteRangeHi;
6093 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
6094
6095 /* Our algorithm currently assumes the FLAC stream is currently sitting at the start. */
6096 if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
6097 return DRFLAC_FALSE;
6098 }
6099
6100 /* If we're close enough to the start, just move to the start and seek forward. */
6101 if (pcmFrameIndex < seekForwardThreshold) {
6102 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
6103 }
6104
6105 /*
6106 Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
6107 the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
6108 */
6109 byteRangeLo = pFlac->firstFLACFramePosInBytes;
6110 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6111
6112 return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
6113}
6114#endif /* !DR_FLAC_NO_CRC */
6115
6116static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6117{
6118 drflac_uint32 iClosestSeekpoint = 0;
6119 drflac_bool32 isMidFrame = DRFLAC_FALSE;
6120 drflac_uint64 runningPCMFrameCount;
6121 drflac_uint32 iSeekpoint;
6122
6123
6124 DRFLAC_ASSERT(pFlac != NULL);
6125
6126 if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
6127 return DRFLAC_FALSE;
6128 }
6129
9e052883 6130 /* Do not use the seektable if pcmFramIndex is not coverd by it. */
6131 if (pFlac->pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
6132 return DRFLAC_FALSE;
6133 }
6134
2ff0b512 6135 for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
6136 if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
6137 break;
6138 }
6139
6140 iClosestSeekpoint = iSeekpoint;
6141 }
6142
6143 /* There's been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
6144 if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
6145 return DRFLAC_FALSE;
6146 }
6147 if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
6148 return DRFLAC_FALSE;
6149 }
6150
6151#if !defined(DR_FLAC_NO_CRC)
6152 /* At this point we should know the closest seek point. We can use a binary search for this. We need to know the total sample count for this. */
6153 if (pFlac->totalPCMFrameCount > 0) {
6154 drflac_uint64 byteRangeLo;
6155 drflac_uint64 byteRangeHi;
6156
6157 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6158 byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
6159
6160 /*
6161 If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
6162 value for byteRangeHi which will clamp it appropriately.
6163
6164 Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
6165 have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
6166 */
6167 if (iClosestSeekpoint < pFlac->seekpointCount-1) {
6168 drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;
6169
6170 /* Basic validation on the seekpoints to ensure they're usable. */
6171 if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
6172 return DRFLAC_FALSE; /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
6173 }
6174
6175 if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
6176 byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. */
6177 }
6178 }
6179
6180 if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6181 if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6182 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
6183
6184 if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
6185 return DRFLAC_TRUE;
6186 }
6187 }
6188 }
6189 }
6190#endif /* !DR_FLAC_NO_CRC */
6191
6192 /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */
6193
6194 /*
6195 If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
6196 from the seekpoint's first sample.
6197 */
6198 if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
6199 /* Optimized case. Just seek forward from where we are. */
6200 runningPCMFrameCount = pFlac->currentPCMFrame;
6201
6202 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
6203 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
6204 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6205 return DRFLAC_FALSE;
6206 }
6207 } else {
6208 isMidFrame = DRFLAC_TRUE;
6209 }
6210 } else {
6211 /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
6212 runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
6213
6214 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6215 return DRFLAC_FALSE;
6216 }
6217
6218 /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
6219 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6220 return DRFLAC_FALSE;
6221 }
6222 }
6223
6224 for (;;) {
6225 drflac_uint64 pcmFrameCountInThisFLACFrame;
6226 drflac_uint64 firstPCMFrameInFLACFrame = 0;
6227 drflac_uint64 lastPCMFrameInFLACFrame = 0;
6228
6229 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
6230
6231 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
6232 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
6233 /*
6234 The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
6235 it never existed and keep iterating.
6236 */
6237 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
6238
6239 if (!isMidFrame) {
6240 drflac_result result = drflac__decode_flac_frame(pFlac);
6241 if (result == DRFLAC_SUCCESS) {
6242 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
6243 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
6244 } else {
6245 if (result == DRFLAC_CRC_MISMATCH) {
6246 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6247 } else {
6248 return DRFLAC_FALSE;
6249 }
6250 }
6251 } else {
6252 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
6253 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
6254 }
6255 } else {
6256 /*
6257 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
6258 frame never existed and leave the running sample count untouched.
6259 */
6260 if (!isMidFrame) {
6261 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
6262 if (result == DRFLAC_SUCCESS) {
6263 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
6264 } else {
6265 if (result == DRFLAC_CRC_MISMATCH) {
6266 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6267 } else {
6268 return DRFLAC_FALSE;
6269 }
6270 }
6271 } else {
6272 /*
6273 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
6274 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
6275 */
6276 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
6277 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
6278 isMidFrame = DRFLAC_FALSE;
6279 }
6280
6281 /* If we are seeking to the end of the file and we've just hit it, we're done. */
6282 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
6283 return DRFLAC_TRUE;
6284 }
6285 }
6286
6287 next_iteration:
6288 /* Grab the next frame in preparation for the next iteration. */
6289 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6290 return DRFLAC_FALSE;
6291 }
6292 }
6293}
6294
6295
6296#ifndef DR_FLAC_NO_OGG
6297typedef struct
6298{
6299 drflac_uint8 capturePattern[4]; /* Should be "OggS" */
6300 drflac_uint8 structureVersion; /* Always 0. */
6301 drflac_uint8 headerType;
6302 drflac_uint64 granulePosition;
6303 drflac_uint32 serialNumber;
6304 drflac_uint32 sequenceNumber;
6305 drflac_uint32 checksum;
6306 drflac_uint8 segmentCount;
6307 drflac_uint8 segmentTable[255];
6308} drflac_ogg_page_header;
6309#endif
6310
6311typedef struct
6312{
6313 drflac_read_proc onRead;
6314 drflac_seek_proc onSeek;
6315 drflac_meta_proc onMeta;
6316 drflac_container container;
6317 void* pUserData;
6318 void* pUserDataMD;
6319 drflac_uint32 sampleRate;
6320 drflac_uint8 channels;
6321 drflac_uint8 bitsPerSample;
6322 drflac_uint64 totalPCMFrameCount;
6323 drflac_uint16 maxBlockSizeInPCMFrames;
6324 drflac_uint64 runningFilePos;
6325 drflac_bool32 hasStreamInfoBlock;
6326 drflac_bool32 hasMetadataBlocks;
6327 drflac_bs bs; /* <-- A bit streamer is required for loading data during initialization. */
6328 drflac_frame_header firstFrameHeader; /* <-- The header of the first frame that was read during relaxed initalization. Only set if there is no STREAMINFO block. */
6329
6330#ifndef DR_FLAC_NO_OGG
6331 drflac_uint32 oggSerial;
6332 drflac_uint64 oggFirstBytePos;
6333 drflac_ogg_page_header oggBosHeader;
6334#endif
6335} drflac_init_info;
6336
6337static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6338{
6339 blockHeader = drflac__be2host_32(blockHeader);
6340 *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
6341 *blockType = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
6342 *blockSize = (blockHeader & 0x00FFFFFFUL);
6343}
6344
6345static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6346{
6347 drflac_uint32 blockHeader;
6348
6349 *blockSize = 0;
6350 if (onRead(pUserData, &blockHeader, 4) != 4) {
6351 return DRFLAC_FALSE;
6352 }
6353
6354 drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
6355 return DRFLAC_TRUE;
6356}
6357
6358static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
6359{
6360 drflac_uint32 blockSizes;
6361 drflac_uint64 frameSizes = 0;
6362 drflac_uint64 importantProps;
6363 drflac_uint8 md5[16];
6364
6365 /* min/max block size. */
6366 if (onRead(pUserData, &blockSizes, 4) != 4) {
6367 return DRFLAC_FALSE;
6368 }
6369
6370 /* min/max frame size. */
6371 if (onRead(pUserData, &frameSizes, 6) != 6) {
6372 return DRFLAC_FALSE;
6373 }
6374
6375 /* Sample rate, channels, bits per sample and total sample count. */
6376 if (onRead(pUserData, &importantProps, 8) != 8) {
6377 return DRFLAC_FALSE;
6378 }
6379
6380 /* MD5 */
6381 if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
6382 return DRFLAC_FALSE;
6383 }
6384
6385 blockSizes = drflac__be2host_32(blockSizes);
6386 frameSizes = drflac__be2host_64(frameSizes);
6387 importantProps = drflac__be2host_64(importantProps);
6388
6389 pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
6390 pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
6391 pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
6392 pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 0)) >> 16);
6393 pStreamInfo->sampleRate = (drflac_uint32)((importantProps & (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
6394 pStreamInfo->channels = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
6395 pStreamInfo->bitsPerSample = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
6396 pStreamInfo->totalPCMFrameCount = ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
6397 DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
6398
6399 return DRFLAC_TRUE;
6400}
6401
6402
6403static void* drflac__malloc_default(size_t sz, void* pUserData)
6404{
6405 (void)pUserData;
6406 return DRFLAC_MALLOC(sz);
6407}
6408
6409static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
6410{
6411 (void)pUserData;
6412 return DRFLAC_REALLOC(p, sz);
6413}
6414
6415static void drflac__free_default(void* p, void* pUserData)
6416{
6417 (void)pUserData;
6418 DRFLAC_FREE(p);
6419}
6420
6421
6422static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
6423{
6424 if (pAllocationCallbacks == NULL) {
6425 return NULL;
6426 }
6427
6428 if (pAllocationCallbacks->onMalloc != NULL) {
6429 return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
6430 }
6431
6432 /* Try using realloc(). */
6433 if (pAllocationCallbacks->onRealloc != NULL) {
6434 return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
6435 }
6436
6437 return NULL;
6438}
6439
6440static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
6441{
6442 if (pAllocationCallbacks == NULL) {
6443 return NULL;
6444 }
6445
6446 if (pAllocationCallbacks->onRealloc != NULL) {
6447 return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
6448 }
6449
6450 /* Try emulating realloc() in terms of malloc()/free(). */
6451 if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
6452 void* p2;
6453
6454 p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
6455 if (p2 == NULL) {
6456 return NULL;
6457 }
6458
6459 if (p != NULL) {
6460 DRFLAC_COPY_MEMORY(p2, p, szOld);
6461 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6462 }
6463
6464 return p2;
6465 }
6466
6467 return NULL;
6468}
6469
6470static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
6471{
6472 if (p == NULL || pAllocationCallbacks == NULL) {
6473 return;
6474 }
6475
6476 if (pAllocationCallbacks->onFree != NULL) {
6477 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6478 }
6479}
6480
6481
9e052883 6482static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeekpointCount, drflac_allocation_callbacks* pAllocationCallbacks)
2ff0b512 6483{
6484 /*
6485 We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
6486 we'll be sitting on byte 42.
6487 */
6488 drflac_uint64 runningFilePos = 42;
6489 drflac_uint64 seektablePos = 0;
6490 drflac_uint32 seektableSize = 0;
6491
6492 for (;;) {
6493 drflac_metadata metadata;
6494 drflac_uint8 isLastBlock = 0;
648db22b 6495 drflac_uint8 blockType = 0;
2ff0b512 6496 drflac_uint32 blockSize;
6497 if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
6498 return DRFLAC_FALSE;
6499 }
6500 runningFilePos += 4;
6501
6502 metadata.type = blockType;
6503 metadata.pRawData = NULL;
6504 metadata.rawDataSize = 0;
6505
6506 switch (blockType)
6507 {
6508 case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
6509 {
6510 if (blockSize < 4) {
6511 return DRFLAC_FALSE;
6512 }
6513
6514 if (onMeta) {
6515 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6516 if (pRawData == NULL) {
6517 return DRFLAC_FALSE;
6518 }
6519
6520 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6521 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6522 return DRFLAC_FALSE;
6523 }
6524
6525 metadata.pRawData = pRawData;
6526 metadata.rawDataSize = blockSize;
6527 metadata.data.application.id = drflac__be2host_32(*(drflac_uint32*)pRawData);
6528 metadata.data.application.pData = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
6529 metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
6530 onMeta(pUserDataMD, &metadata);
6531
6532 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6533 }
6534 } break;
6535
6536 case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
6537 {
6538 seektablePos = runningFilePos;
6539 seektableSize = blockSize;
6540
6541 if (onMeta) {
9e052883 6542 drflac_uint32 seekpointCount;
2ff0b512 6543 drflac_uint32 iSeekpoint;
6544 void* pRawData;
6545
9e052883 6546 seekpointCount = blockSize/DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
6547
6548 pRawData = drflac__malloc_from_callbacks(seekpointCount * sizeof(drflac_seekpoint), pAllocationCallbacks);
2ff0b512 6549 if (pRawData == NULL) {
6550 return DRFLAC_FALSE;
6551 }
6552
9e052883 6553 /* We need to read seekpoint by seekpoint and do some processing. */
6554 for (iSeekpoint = 0; iSeekpoint < seekpointCount; ++iSeekpoint) {
6555 drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
2ff0b512 6556
9e052883 6557 if (onRead(pUserData, pSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) != DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
6558 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6559 return DRFLAC_FALSE;
6560 }
2ff0b512 6561
9e052883 6562 /* Endian swap. */
2ff0b512 6563 pSeekpoint->firstPCMFrame = drflac__be2host_64(pSeekpoint->firstPCMFrame);
6564 pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
6565 pSeekpoint->pcmFrameCount = drflac__be2host_16(pSeekpoint->pcmFrameCount);
6566 }
6567
9e052883 6568 metadata.pRawData = pRawData;
6569 metadata.rawDataSize = blockSize;
6570 metadata.data.seektable.seekpointCount = seekpointCount;
6571 metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
6572
2ff0b512 6573 onMeta(pUserDataMD, &metadata);
6574
6575 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6576 }
6577 } break;
6578
6579 case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
6580 {
6581 if (blockSize < 8) {
6582 return DRFLAC_FALSE;
6583 }
6584
6585 if (onMeta) {
6586 void* pRawData;
6587 const char* pRunningData;
6588 const char* pRunningDataEnd;
6589 drflac_uint32 i;
6590
6591 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6592 if (pRawData == NULL) {
6593 return DRFLAC_FALSE;
6594 }
6595
6596 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6597 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6598 return DRFLAC_FALSE;
6599 }
6600
6601 metadata.pRawData = pRawData;
6602 metadata.rawDataSize = blockSize;
6603
6604 pRunningData = (const char*)pRawData;
6605 pRunningDataEnd = (const char*)pRawData + blockSize;
6606
9e052883 6607 metadata.data.vorbis_comment.vendorLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6608
6609 /* Need space for the rest of the block */
6610 if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6611 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6612 return DRFLAC_FALSE;
6613 }
6614 metadata.data.vorbis_comment.vendor = pRunningData; pRunningData += metadata.data.vorbis_comment.vendorLength;
9e052883 6615 metadata.data.vorbis_comment.commentCount = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6616
6617 /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */
6618 if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
6619 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6620 return DRFLAC_FALSE;
6621 }
6622 metadata.data.vorbis_comment.pComments = pRunningData;
6623
6624 /* Check that the comments section is valid before passing it to the callback */
6625 for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
6626 drflac_uint32 commentLength;
6627
6628 if (pRunningDataEnd - pRunningData < 4) {
6629 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6630 return DRFLAC_FALSE;
6631 }
6632
9e052883 6633 commentLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6634 if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6635 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6636 return DRFLAC_FALSE;
6637 }
6638 pRunningData += commentLength;
6639 }
6640
6641 onMeta(pUserDataMD, &metadata);
6642
6643 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6644 }
6645 } break;
6646
6647 case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
6648 {
6649 if (blockSize < 396) {
6650 return DRFLAC_FALSE;
6651 }
6652
6653 if (onMeta) {
6654 void* pRawData;
6655 const char* pRunningData;
6656 const char* pRunningDataEnd;
9e052883 6657 size_t bufferSize;
2ff0b512 6658 drflac_uint8 iTrack;
6659 drflac_uint8 iIndex;
9e052883 6660 void* pTrackData;
2ff0b512 6661
9e052883 6662 /*
6663 This needs to be loaded in two passes. The first pass is used to calculate the size of the memory allocation
6664 we need for storing the necessary data. The second pass will fill that buffer with usable data.
6665 */
2ff0b512 6666 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6667 if (pRawData == NULL) {
6668 return DRFLAC_FALSE;
6669 }
6670
6671 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6672 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6673 return DRFLAC_FALSE;
6674 }
6675
6676 metadata.pRawData = pRawData;
6677 metadata.rawDataSize = blockSize;
6678
6679 pRunningData = (const char*)pRawData;
6680 pRunningDataEnd = (const char*)pRawData + blockSize;
6681
6682 DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128); pRunningData += 128;
6683 metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
6684 metadata.data.cuesheet.isCD = (pRunningData[0] & 0x80) != 0; pRunningData += 259;
6685 metadata.data.cuesheet.trackCount = pRunningData[0]; pRunningData += 1;
9e052883 6686 metadata.data.cuesheet.pTrackData = NULL; /* Will be filled later. */
6687
6688 /* Pass 1: Calculate the size of the buffer for the track data. */
6689 {
6690 const char* pRunningDataSaved = pRunningData; /* Will be restored at the end in preparation for the second pass. */
2ff0b512 6691
9e052883 6692 bufferSize = metadata.data.cuesheet.trackCount * DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES;
2ff0b512 6693
9e052883 6694 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6695 drflac_uint8 indexCount;
6696 drflac_uint32 indexPointSize;
6697
6698 if (pRunningDataEnd - pRunningData < DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES) {
6699 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6700 return DRFLAC_FALSE;
6701 }
6702
6703 /* Skip to the index point count */
6704 pRunningData += 35;
6705
6706 indexCount = pRunningData[0];
6707 pRunningData += 1;
6708
6709 bufferSize += indexCount * sizeof(drflac_cuesheet_track_index);
6710
6711 /* Quick validation check. */
6712 indexPointSize = indexCount * DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
6713 if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
6714 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6715 return DRFLAC_FALSE;
6716 }
6717
6718 pRunningData += indexPointSize;
2ff0b512 6719 }
6720
9e052883 6721 pRunningData = pRunningDataSaved;
6722 }
6723
6724 /* Pass 2: Allocate a buffer and fill the data. Validation was done in the step above so can be skipped. */
6725 {
6726 char* pRunningTrackData;
6727
6728 pTrackData = drflac__malloc_from_callbacks(bufferSize, pAllocationCallbacks);
6729 if (pTrackData == NULL) {
2ff0b512 6730 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6731 return DRFLAC_FALSE;
6732 }
6733
9e052883 6734 pRunningTrackData = (char*)pTrackData;
6735
6736 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6737 drflac_uint8 indexCount;
6738
6739 DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES);
6740 pRunningData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1; /* Skip forward, but not beyond the last byte in the CUESHEET_TRACK block which is the index count. */
6741 pRunningTrackData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1;
6742
6743 /* Grab the index count for the next part. */
6744 indexCount = pRunningData[0];
6745 pRunningData += 1;
6746 pRunningTrackData += 1;
6747
6748 /* Extract each track index. */
6749 for (iIndex = 0; iIndex < indexCount; ++iIndex) {
6750 drflac_cuesheet_track_index* pTrackIndex = (drflac_cuesheet_track_index*)pRunningTrackData;
6751
6752 DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES);
6753 pRunningData += DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
6754 pRunningTrackData += sizeof(drflac_cuesheet_track_index);
6755
6756 pTrackIndex->offset = drflac__be2host_64(pTrackIndex->offset);
6757 }
2ff0b512 6758 }
9e052883 6759
6760 metadata.data.cuesheet.pTrackData = pTrackData;
2ff0b512 6761 }
6762
9e052883 6763 /* The original data is no longer needed. */
6764 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6765 pRawData = NULL;
6766
2ff0b512 6767 onMeta(pUserDataMD, &metadata);
6768
9e052883 6769 drflac__free_from_callbacks(pTrackData, pAllocationCallbacks);
6770 pTrackData = NULL;
2ff0b512 6771 }
6772 } break;
6773
6774 case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
6775 {
6776 if (blockSize < 32) {
6777 return DRFLAC_FALSE;
6778 }
6779
6780 if (onMeta) {
6781 void* pRawData;
6782 const char* pRunningData;
6783 const char* pRunningDataEnd;
6784
6785 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6786 if (pRawData == NULL) {
6787 return DRFLAC_FALSE;
6788 }
6789
6790 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6791 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6792 return DRFLAC_FALSE;
6793 }
6794
6795 metadata.pRawData = pRawData;
6796 metadata.rawDataSize = blockSize;
6797
6798 pRunningData = (const char*)pRawData;
6799 pRunningDataEnd = (const char*)pRawData + blockSize;
6800
9e052883 6801 metadata.data.picture.type = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6802 metadata.data.picture.mimeLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6803
6804 /* Need space for the rest of the block */
6805 if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6806 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6807 return DRFLAC_FALSE;
6808 }
9e052883 6809 metadata.data.picture.mime = pRunningData; pRunningData += metadata.data.picture.mimeLength;
6810 metadata.data.picture.descriptionLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6811
6812 /* Need space for the rest of the block */
6813 if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6814 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6815 return DRFLAC_FALSE;
6816 }
9e052883 6817 metadata.data.picture.description = pRunningData; pRunningData += metadata.data.picture.descriptionLength;
6818 metadata.data.picture.width = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6819 metadata.data.picture.height = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6820 metadata.data.picture.colorDepth = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6821 metadata.data.picture.indexColorCount = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6822 metadata.data.picture.pictureDataSize = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6823 metadata.data.picture.pPictureData = (const drflac_uint8*)pRunningData;
6824
6825 /* Need space for the picture after the block */
6826 if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
6827 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6828 return DRFLAC_FALSE;
6829 }
6830
6831 onMeta(pUserDataMD, &metadata);
6832
6833 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6834 }
6835 } break;
6836
6837 case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
6838 {
6839 if (onMeta) {
6840 metadata.data.padding.unused = 0;
6841
6842 /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
6843 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6844 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6845 } else {
6846 onMeta(pUserDataMD, &metadata);
6847 }
6848 }
6849 } break;
6850
6851 case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
6852 {
6853 /* Invalid chunk. Just skip over this one. */
6854 if (onMeta) {
6855 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6856 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6857 }
6858 }
6859 } break;
6860
6861 default:
6862 {
6863 /*
6864 It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
6865 can at the very least report the chunk to the application and let it look at the raw data.
6866 */
6867 if (onMeta) {
6868 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6869 if (pRawData == NULL) {
6870 return DRFLAC_FALSE;
6871 }
6872
6873 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6874 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6875 return DRFLAC_FALSE;
6876 }
6877
6878 metadata.pRawData = pRawData;
6879 metadata.rawDataSize = blockSize;
6880 onMeta(pUserDataMD, &metadata);
6881
6882 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6883 }
6884 } break;
6885 }
6886
6887 /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
6888 if (onMeta == NULL && blockSize > 0) {
6889 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6890 isLastBlock = DRFLAC_TRUE;
6891 }
6892 }
6893
6894 runningFilePos += blockSize;
6895 if (isLastBlock) {
6896 break;
6897 }
6898 }
6899
9e052883 6900 *pSeektablePos = seektablePos;
6901 *pSeekpointCount = seektableSize / DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
6902 *pFirstFramePos = runningFilePos;
2ff0b512 6903
6904 return DRFLAC_TRUE;
6905}
6906
6907static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
6908{
6909 /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */
6910
6911 drflac_uint8 isLastBlock;
6912 drflac_uint8 blockType;
6913 drflac_uint32 blockSize;
6914
6915 (void)onSeek;
6916
6917 pInit->container = drflac_container_native;
6918
6919 /* The first metadata block should be the STREAMINFO block. */
6920 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
6921 return DRFLAC_FALSE;
6922 }
6923
6924 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
6925 if (!relaxed) {
6926 /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
6927 return DRFLAC_FALSE;
6928 } else {
6929 /*
6930 Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
6931 for that frame.
6932 */
6933 pInit->hasStreamInfoBlock = DRFLAC_FALSE;
6934 pInit->hasMetadataBlocks = DRFLAC_FALSE;
6935
6936 if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
6937 return DRFLAC_FALSE; /* Couldn't find a frame. */
6938 }
6939
6940 if (pInit->firstFrameHeader.bitsPerSample == 0) {
6941 return DRFLAC_FALSE; /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
6942 }
6943
6944 pInit->sampleRate = pInit->firstFrameHeader.sampleRate;
6945 pInit->channels = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
6946 pInit->bitsPerSample = pInit->firstFrameHeader.bitsPerSample;
6947 pInit->maxBlockSizeInPCMFrames = 65535; /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
6948 return DRFLAC_TRUE;
6949 }
6950 } else {
6951 drflac_streaminfo streaminfo;
6952 if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
6953 return DRFLAC_FALSE;
6954 }
6955
6956 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
6957 pInit->sampleRate = streaminfo.sampleRate;
6958 pInit->channels = streaminfo.channels;
6959 pInit->bitsPerSample = streaminfo.bitsPerSample;
6960 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
6961 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames; /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
6962 pInit->hasMetadataBlocks = !isLastBlock;
6963
6964 if (onMeta) {
6965 drflac_metadata metadata;
6966 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
6967 metadata.pRawData = NULL;
6968 metadata.rawDataSize = 0;
6969 metadata.data.streaminfo = streaminfo;
6970 onMeta(pUserDataMD, &metadata);
6971 }
6972
6973 return DRFLAC_TRUE;
6974 }
6975}
6976
6977#ifndef DR_FLAC_NO_OGG
6978#define DRFLAC_OGG_MAX_PAGE_SIZE 65307
6979#define DRFLAC_OGG_CAPTURE_PATTERN_CRC32 1605413199 /* CRC-32 of "OggS". */
6980
6981typedef enum
6982{
6983 drflac_ogg_recover_on_crc_mismatch,
6984 drflac_ogg_fail_on_crc_mismatch
6985} drflac_ogg_crc_mismatch_recovery;
6986
6987#ifndef DR_FLAC_NO_CRC
6988static drflac_uint32 drflac__crc32_table[] = {
6989 0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
6990 0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
6991 0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
6992 0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
6993 0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
6994 0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
6995 0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
6996 0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
6997 0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
6998 0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
6999 0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
7000 0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
7001 0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
7002 0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
7003 0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
7004 0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
7005 0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
7006 0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
7007 0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
7008 0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
7009 0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
7010 0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
7011 0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
7012 0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
7013 0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
7014 0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
7015 0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
7016 0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
7017 0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
7018 0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
7019 0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
7020 0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
7021 0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
7022 0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
7023 0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
7024 0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
7025 0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
7026 0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
7027 0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
7028 0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
7029 0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
7030 0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
7031 0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
7032 0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
7033 0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
7034 0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
7035 0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
7036 0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
7037 0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
7038 0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
7039 0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
7040 0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
7041 0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
7042 0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
7043 0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
7044 0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
7045 0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
7046 0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
7047 0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
7048 0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
7049 0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
7050 0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
7051 0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
7052 0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
7053};
7054#endif
7055
7056static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
7057{
7058#ifndef DR_FLAC_NO_CRC
7059 return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
7060#else
7061 (void)data;
7062 return crc32;
7063#endif
7064}
7065
9e052883 7066#if 0
7067static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
7068{
7069 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
7070 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
7071 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 8) & 0xFF));
7072 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 0) & 0xFF));
7073 return crc32;
7074}
7075
7076static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
7077{
7078 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
7079 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 0) & 0xFFFFFFFF));
7080 return crc32;
7081}
7082#endif
7083
2ff0b512 7084static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
7085{
7086 /* This can be optimized. */
7087 drflac_uint32 i;
7088 for (i = 0; i < dataSize; ++i) {
7089 crc32 = drflac_crc32_byte(crc32, pData[i]);
7090 }
7091 return crc32;
7092}
7093
7094
7095static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
7096{
7097 return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
7098}
7099
7100static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
7101{
7102 return 27 + pHeader->segmentCount;
7103}
7104
7105static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
7106{
7107 drflac_uint32 pageBodySize = 0;
7108 int i;
7109
7110 for (i = 0; i < pHeader->segmentCount; ++i) {
7111 pageBodySize += pHeader->segmentTable[i];
7112 }
7113
7114 return pageBodySize;
7115}
7116
7117static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
7118{
7119 drflac_uint8 data[23];
7120 drflac_uint32 i;
7121
7122 DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
7123
7124 if (onRead(pUserData, data, 23) != 23) {
7125 return DRFLAC_AT_END;
7126 }
7127 *pBytesRead += 23;
7128
7129 /*
7130 It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
7131 us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
7132 like to have it map to the structure of the underlying data.
7133 */
7134 pHeader->capturePattern[0] = 'O';
7135 pHeader->capturePattern[1] = 'g';
7136 pHeader->capturePattern[2] = 'g';
7137 pHeader->capturePattern[3] = 'S';
7138
7139 pHeader->structureVersion = data[0];
7140 pHeader->headerType = data[1];
7141 DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
7142 DRFLAC_COPY_MEMORY(&pHeader->serialNumber, &data[10], 4);
7143 DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber, &data[14], 4);
7144 DRFLAC_COPY_MEMORY(&pHeader->checksum, &data[18], 4);
7145 pHeader->segmentCount = data[22];
7146
7147 /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
7148 data[18] = 0;
7149 data[19] = 0;
7150 data[20] = 0;
7151 data[21] = 0;
7152
7153 for (i = 0; i < 23; ++i) {
7154 *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
7155 }
7156
7157
7158 if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
7159 return DRFLAC_AT_END;
7160 }
7161 *pBytesRead += pHeader->segmentCount;
7162
7163 for (i = 0; i < pHeader->segmentCount; ++i) {
7164 *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
7165 }
7166
7167 return DRFLAC_SUCCESS;
7168}
7169
7170static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
7171{
7172 drflac_uint8 id[4];
7173
7174 *pBytesRead = 0;
7175
7176 if (onRead(pUserData, id, 4) != 4) {
7177 return DRFLAC_AT_END;
7178 }
7179 *pBytesRead += 4;
7180
7181 /* We need to read byte-by-byte until we find the OggS capture pattern. */
7182 for (;;) {
7183 if (drflac_ogg__is_capture_pattern(id)) {
7184 drflac_result result;
7185
7186 *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7187
7188 result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
7189 if (result == DRFLAC_SUCCESS) {
7190 return DRFLAC_SUCCESS;
7191 } else {
7192 if (result == DRFLAC_CRC_MISMATCH) {
7193 continue;
7194 } else {
7195 return result;
7196 }
7197 }
7198 } else {
7199 /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
7200 id[0] = id[1];
7201 id[1] = id[2];
7202 id[2] = id[3];
7203 if (onRead(pUserData, &id[3], 1) != 1) {
7204 return DRFLAC_AT_END;
7205 }
7206 *pBytesRead += 1;
7207 }
7208 }
7209}
7210
7211
7212/*
7213The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
7214in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
7215in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
7216dr_flac is supporting there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
7217the physical Ogg bitstream are converted and delivered in native FLAC format.
7218*/
7219typedef struct
7220{
7221 drflac_read_proc onRead; /* The original onRead callback from drflac_open() and family. */
7222 drflac_seek_proc onSeek; /* The original onSeek callback from drflac_open() and family. */
7223 void* pUserData; /* The user data passed on onRead and onSeek. This is the user data that was passed on drflac_open() and family. */
7224 drflac_uint64 currentBytePos; /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
7225 drflac_uint64 firstBytePos; /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
7226 drflac_uint32 serialNumber; /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
7227 drflac_ogg_page_header bosPageHeader; /* Used for seeking. */
7228 drflac_ogg_page_header currentPageHeader;
7229 drflac_uint32 bytesRemainingInPage;
7230 drflac_uint32 pageDataSize;
7231 drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
7232} drflac_oggbs; /* oggbs = Ogg Bitstream */
7233
7234static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
7235{
7236 size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
7237 oggbs->currentBytePos += bytesActuallyRead;
7238
7239 return bytesActuallyRead;
7240}
7241
7242static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
7243{
7244 if (origin == drflac_seek_origin_start) {
7245 if (offset <= 0x7FFFFFFF) {
7246 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
7247 return DRFLAC_FALSE;
7248 }
7249 oggbs->currentBytePos = offset;
7250
7251 return DRFLAC_TRUE;
7252 } else {
7253 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
7254 return DRFLAC_FALSE;
7255 }
7256 oggbs->currentBytePos = offset;
7257
7258 return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, drflac_seek_origin_current);
7259 }
7260 } else {
7261 while (offset > 0x7FFFFFFF) {
7262 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
7263 return DRFLAC_FALSE;
7264 }
7265 oggbs->currentBytePos += 0x7FFFFFFF;
7266 offset -= 0x7FFFFFFF;
7267 }
7268
7269 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) { /* <-- Safe cast thanks to the loop above. */
7270 return DRFLAC_FALSE;
7271 }
7272 oggbs->currentBytePos += offset;
7273
7274 return DRFLAC_TRUE;
7275 }
7276}
7277
7278static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
7279{
7280 drflac_ogg_page_header header;
7281 for (;;) {
7282 drflac_uint32 crc32 = 0;
7283 drflac_uint32 bytesRead;
7284 drflac_uint32 pageBodySize;
7285#ifndef DR_FLAC_NO_CRC
7286 drflac_uint32 actualCRC32;
7287#endif
7288
7289 if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7290 return DRFLAC_FALSE;
7291 }
7292 oggbs->currentBytePos += bytesRead;
7293
7294 pageBodySize = drflac_ogg__get_page_body_size(&header);
7295 if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
7296 continue; /* Invalid page size. Assume it's corrupted and just move to the next page. */
7297 }
7298
7299 if (header.serialNumber != oggbs->serialNumber) {
7300 /* It's not a FLAC page. Skip it. */
7301 if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
7302 return DRFLAC_FALSE;
7303 }
7304 continue;
7305 }
7306
7307
7308 /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
7309 if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
7310 return DRFLAC_FALSE;
7311 }
7312 oggbs->pageDataSize = pageBodySize;
7313
7314#ifndef DR_FLAC_NO_CRC
7315 actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
7316 if (actualCRC32 != header.checksum) {
7317 if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
7318 continue; /* CRC mismatch. Skip this page. */
7319 } else {
7320 /*
7321 Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
7322 go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
7323 seek did not fully complete.
7324 */
7325 drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
7326 return DRFLAC_FALSE;
7327 }
7328 }
7329#else
7330 (void)recoveryMethod; /* <-- Silence a warning. */
7331#endif
7332
7333 oggbs->currentPageHeader = header;
7334 oggbs->bytesRemainingInPage = pageBodySize;
7335 return DRFLAC_TRUE;
7336 }
7337}
7338
9e052883 7339/* Function below is unused at the moment, but I might be re-adding it later. */
7340#if 0
7341static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
7342{
7343 drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
7344 drflac_uint8 iSeg = 0;
7345 drflac_uint32 iByte = 0;
7346 while (iByte < bytesConsumedInPage) {
7347 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7348 if (iByte + segmentSize > bytesConsumedInPage) {
7349 break;
7350 } else {
7351 iSeg += 1;
7352 iByte += segmentSize;
7353 }
7354 }
7355
7356 *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
7357 return iSeg;
7358}
7359
7360static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
7361{
7362 /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
7363 for (;;) {
7364 drflac_bool32 atEndOfPage = DRFLAC_FALSE;
7365
7366 drflac_uint8 bytesRemainingInSeg;
7367 drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
7368
7369 drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
7370 for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
7371 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7372 if (segmentSize < 255) {
7373 if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
7374 atEndOfPage = DRFLAC_TRUE;
7375 }
7376
7377 break;
7378 }
7379
7380 bytesToEndOfPacketOrPage += segmentSize;
7381 }
7382
7383 /*
7384 At this point we will have found either the packet or the end of the page. If were at the end of the page we'll
7385 want to load the next page and keep searching for the end of the packet.
7386 */
7387 drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
7388 oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
7389
7390 if (atEndOfPage) {
7391 /*
7392 We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
7393 straddle pages.
7394 */
7395 if (!drflac_oggbs__goto_next_page(oggbs)) {
7396 return DRFLAC_FALSE;
7397 }
7398
7399 /* If it's a fresh packet it most likely means we're at the next packet. */
7400 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
7401 return DRFLAC_TRUE;
7402 }
7403 } else {
7404 /* We're at the next packet. */
7405 return DRFLAC_TRUE;
7406 }
7407 }
7408}
7409
7410static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
7411{
7412 /* The bitstream should be sitting on the first byte just after the header of the frame. */
7413
7414 /* What we're actually doing here is seeking to the start of the next packet. */
7415 return drflac_oggbs__seek_to_next_packet(oggbs);
7416}
7417#endif
7418
2ff0b512 7419static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
7420{
7421 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7422 drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
7423 size_t bytesRead = 0;
7424
7425 DRFLAC_ASSERT(oggbs != NULL);
7426 DRFLAC_ASSERT(pRunningBufferOut != NULL);
7427
7428 /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
7429 while (bytesRead < bytesToRead) {
7430 size_t bytesRemainingToRead = bytesToRead - bytesRead;
7431
7432 if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
7433 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
7434 bytesRead += bytesRemainingToRead;
7435 oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
7436 break;
7437 }
7438
7439 /* If we get here it means some of the requested data is contained in the next pages. */
7440 if (oggbs->bytesRemainingInPage > 0) {
7441 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
7442 bytesRead += oggbs->bytesRemainingInPage;
7443 pRunningBufferOut += oggbs->bytesRemainingInPage;
7444 oggbs->bytesRemainingInPage = 0;
7445 }
7446
7447 DRFLAC_ASSERT(bytesRemainingToRead > 0);
7448 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7449 break; /* Failed to go to the next page. Might have simply hit the end of the stream. */
7450 }
7451 }
7452
7453 return bytesRead;
7454}
7455
7456static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
7457{
7458 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7459 int bytesSeeked = 0;
7460
7461 DRFLAC_ASSERT(oggbs != NULL);
7462 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
7463
7464 /* Seeking is always forward which makes things a lot simpler. */
7465 if (origin == drflac_seek_origin_start) {
7466 if (!drflac_oggbs__seek_physical(oggbs, (int)oggbs->firstBytePos, drflac_seek_origin_start)) {
7467 return DRFLAC_FALSE;
7468 }
7469
7470 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7471 return DRFLAC_FALSE;
7472 }
7473
7474 return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
7475 }
7476
7477 DRFLAC_ASSERT(origin == drflac_seek_origin_current);
7478
7479 while (bytesSeeked < offset) {
7480 int bytesRemainingToSeek = offset - bytesSeeked;
7481 DRFLAC_ASSERT(bytesRemainingToSeek >= 0);
7482
7483 if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
7484 bytesSeeked += bytesRemainingToSeek;
7485 (void)bytesSeeked; /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
7486 oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
7487 break;
7488 }
7489
7490 /* If we get here it means some of the requested data is contained in the next pages. */
7491 if (oggbs->bytesRemainingInPage > 0) {
7492 bytesSeeked += (int)oggbs->bytesRemainingInPage;
7493 oggbs->bytesRemainingInPage = 0;
7494 }
7495
7496 DRFLAC_ASSERT(bytesRemainingToSeek > 0);
7497 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7498 /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
7499 return DRFLAC_FALSE;
7500 }
7501 }
7502
7503 return DRFLAC_TRUE;
7504}
7505
7506
7507static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
7508{
7509 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
7510 drflac_uint64 originalBytePos;
7511 drflac_uint64 runningGranulePosition;
7512 drflac_uint64 runningFrameBytePos;
7513 drflac_uint64 runningPCMFrameCount;
7514
7515 DRFLAC_ASSERT(oggbs != NULL);
7516
7517 originalBytePos = oggbs->currentBytePos; /* For recovery. Points to the OggS identifier. */
7518
7519 /* First seek to the first frame. */
7520 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
7521 return DRFLAC_FALSE;
7522 }
7523 oggbs->bytesRemainingInPage = 0;
7524
7525 runningGranulePosition = 0;
7526 for (;;) {
7527 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7528 drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
7529 return DRFLAC_FALSE; /* Never did find that sample... */
7530 }
7531
7532 runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
7533 if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
7534 break; /* The sample is somewhere in the previous page. */
7535 }
7536
7537 /*
7538 At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
7539 disregard any pages that do not begin a fresh packet.
7540 */
7541 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) { /* <-- Is it a fresh page? */
7542 if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
7543 drflac_uint8 firstBytesInPage[2];
7544 firstBytesInPage[0] = oggbs->pageData[0];
7545 firstBytesInPage[1] = oggbs->pageData[1];
7546
7547 if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) { /* <-- Does the page begin with a frame's sync code? */
7548 runningGranulePosition = oggbs->currentPageHeader.granulePosition;
7549 }
7550
7551 continue;
7552 }
7553 }
7554 }
7555
7556 /*
7557 We found the page that that is closest to the sample, so now we need to find it. The first thing to do is seek to the
7558 start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
7559 a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
7560 we find the one containing the target sample.
7561 */
7562 if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
7563 return DRFLAC_FALSE;
7564 }
7565 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7566 return DRFLAC_FALSE;
7567 }
7568
7569 /*
7570 At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
7571 looping over these frames until we find the one containing the sample we're after.
7572 */
7573 runningPCMFrameCount = runningGranulePosition;
7574 for (;;) {
7575 /*
7576 There are two ways to find the sample and seek past irrelevant frames:
7577 1) Use the native FLAC decoder.
7578 2) Use Ogg's framing system.
7579
7580 Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
7581 do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
7582 duplication for the decoding of frame headers.
7583
7584 Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
7585 bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
7586 standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
7587 the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
7588 using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), need to be re-implemented so as to
7589 avoid the use of the drflac_bs object.
7590
7591 Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
7592 1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
7593 2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
7594 3) Simplicity.
7595 */
7596 drflac_uint64 firstPCMFrameInFLACFrame = 0;
7597 drflac_uint64 lastPCMFrameInFLACFrame = 0;
7598 drflac_uint64 pcmFrameCountInThisFrame;
7599
7600 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
7601 return DRFLAC_FALSE;
7602 }
7603
7604 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
7605
7606 pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
7607
7608 /* If we are seeking to the end of the file and we've just hit it, we're done. */
7609 if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
7610 drflac_result result = drflac__decode_flac_frame(pFlac);
7611 if (result == DRFLAC_SUCCESS) {
7612 pFlac->currentPCMFrame = pcmFrameIndex;
7613 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
7614 return DRFLAC_TRUE;
7615 } else {
7616 return DRFLAC_FALSE;
7617 }
7618 }
7619
7620 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
7621 /*
7622 The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
7623 it never existed and keep iterating.
7624 */
7625 drflac_result result = drflac__decode_flac_frame(pFlac);
7626 if (result == DRFLAC_SUCCESS) {
7627 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
7628 drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount); /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
7629 if (pcmFramesToDecode == 0) {
7630 return DRFLAC_TRUE;
7631 }
7632
7633 pFlac->currentPCMFrame = runningPCMFrameCount;
7634
7635 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
7636 } else {
7637 if (result == DRFLAC_CRC_MISMATCH) {
7638 continue; /* CRC mismatch. Pretend this frame never existed. */
7639 } else {
7640 return DRFLAC_FALSE;
7641 }
7642 }
7643 } else {
7644 /*
7645 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
7646 frame never existed and leave the running sample count untouched.
7647 */
7648 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
7649 if (result == DRFLAC_SUCCESS) {
7650 runningPCMFrameCount += pcmFrameCountInThisFrame;
7651 } else {
7652 if (result == DRFLAC_CRC_MISMATCH) {
7653 continue; /* CRC mismatch. Pretend this frame never existed. */
7654 } else {
7655 return DRFLAC_FALSE;
7656 }
7657 }
7658 }
7659 }
7660}
7661
7662
7663
7664static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
7665{
7666 drflac_ogg_page_header header;
7667 drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7668 drflac_uint32 bytesRead = 0;
7669
7670 /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
7671 (void)relaxed;
7672
7673 pInit->container = drflac_container_ogg;
7674 pInit->oggFirstBytePos = 0;
7675
7676 /*
7677 We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
7678 stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
7679 any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
7680 */
7681 if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7682 return DRFLAC_FALSE;
7683 }
7684 pInit->runningFilePos += bytesRead;
7685
7686 for (;;) {
7687 int pageBodySize;
7688
7689 /* Break if we're past the beginning of stream page. */
7690 if ((header.headerType & 0x02) == 0) {
7691 return DRFLAC_FALSE;
7692 }
7693
7694 /* Check if it's a FLAC header. */
7695 pageBodySize = drflac_ogg__get_page_body_size(&header);
7696 if (pageBodySize == 51) { /* 51 = the lacing value of the FLAC header packet. */
7697 /* It could be a FLAC page... */
7698 drflac_uint32 bytesRemainingInPage = pageBodySize;
7699 drflac_uint8 packetType;
7700
7701 if (onRead(pUserData, &packetType, 1) != 1) {
7702 return DRFLAC_FALSE;
7703 }
7704
7705 bytesRemainingInPage -= 1;
7706 if (packetType == 0x7F) {
7707 /* Increasingly more likely to be a FLAC page... */
7708 drflac_uint8 sig[4];
7709 if (onRead(pUserData, sig, 4) != 4) {
7710 return DRFLAC_FALSE;
7711 }
7712
7713 bytesRemainingInPage -= 4;
7714 if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
7715 /* Almost certainly a FLAC page... */
7716 drflac_uint8 mappingVersion[2];
7717 if (onRead(pUserData, mappingVersion, 2) != 2) {
7718 return DRFLAC_FALSE;
7719 }
7720
7721 if (mappingVersion[0] != 1) {
7722 return DRFLAC_FALSE; /* Only supporting version 1.x of the Ogg mapping. */
7723 }
7724
7725 /*
7726 The next 2 bytes are the non-audio packets, not including this one. We don't care about this because we're going to
7727 be handling it in a generic way based on the serial number and packet types.
7728 */
7729 if (!onSeek(pUserData, 2, drflac_seek_origin_current)) {
7730 return DRFLAC_FALSE;
7731 }
7732
7733 /* Expecting the native FLAC signature "fLaC". */
7734 if (onRead(pUserData, sig, 4) != 4) {
7735 return DRFLAC_FALSE;
7736 }
7737
7738 if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
7739 /* The remaining data in the page should be the STREAMINFO block. */
7740 drflac_streaminfo streaminfo;
7741 drflac_uint8 isLastBlock;
7742 drflac_uint8 blockType;
7743 drflac_uint32 blockSize;
7744 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
7745 return DRFLAC_FALSE;
7746 }
7747
7748 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
7749 return DRFLAC_FALSE; /* Invalid block type. First block must be the STREAMINFO block. */
7750 }
7751
7752 if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
7753 /* Success! */
7754 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
7755 pInit->sampleRate = streaminfo.sampleRate;
7756 pInit->channels = streaminfo.channels;
7757 pInit->bitsPerSample = streaminfo.bitsPerSample;
7758 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
7759 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
7760 pInit->hasMetadataBlocks = !isLastBlock;
7761
7762 if (onMeta) {
7763 drflac_metadata metadata;
7764 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
7765 metadata.pRawData = NULL;
7766 metadata.rawDataSize = 0;
7767 metadata.data.streaminfo = streaminfo;
7768 onMeta(pUserDataMD, &metadata);
7769 }
7770
7771 pInit->runningFilePos += pageBodySize;
7772 pInit->oggFirstBytePos = pInit->runningFilePos - 79; /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
7773 pInit->oggSerial = header.serialNumber;
7774 pInit->oggBosHeader = header;
7775 break;
7776 } else {
7777 /* Failed to read STREAMINFO block. Aww, so close... */
7778 return DRFLAC_FALSE;
7779 }
7780 } else {
7781 /* Invalid file. */
7782 return DRFLAC_FALSE;
7783 }
7784 } else {
7785 /* Not a FLAC header. Skip it. */
7786 if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
7787 return DRFLAC_FALSE;
7788 }
7789 }
7790 } else {
7791 /* Not a FLAC header. Seek past the entire page and move on to the next. */
7792 if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
7793 return DRFLAC_FALSE;
7794 }
7795 }
7796 } else {
7797 if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
7798 return DRFLAC_FALSE;
7799 }
7800 }
7801
7802 pInit->runningFilePos += pageBodySize;
7803
7804
7805 /* Read the header of the next page. */
7806 if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7807 return DRFLAC_FALSE;
7808 }
7809 pInit->runningFilePos += bytesRead;
7810 }
7811
7812 /*
7813 If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
7814 packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
7815 Ogg bistream object.
7816 */
7817 pInit->hasMetadataBlocks = DRFLAC_TRUE; /* <-- Always have at least VORBIS_COMMENT metadata block. */
7818 return DRFLAC_TRUE;
7819}
7820#endif
7821
7822static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
7823{
7824 drflac_bool32 relaxed;
7825 drflac_uint8 id[4];
7826
7827 if (pInit == NULL || onRead == NULL || onSeek == NULL) {
7828 return DRFLAC_FALSE;
7829 }
7830
7831 DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
7832 pInit->onRead = onRead;
7833 pInit->onSeek = onSeek;
7834 pInit->onMeta = onMeta;
7835 pInit->container = container;
7836 pInit->pUserData = pUserData;
7837 pInit->pUserDataMD = pUserDataMD;
7838
7839 pInit->bs.onRead = onRead;
7840 pInit->bs.onSeek = onSeek;
7841 pInit->bs.pUserData = pUserData;
7842 drflac__reset_cache(&pInit->bs);
7843
7844
7845 /* If the container is explicitly defined then we can try opening in relaxed mode. */
7846 relaxed = container != drflac_container_unknown;
7847
7848 /* Skip over any ID3 tags. */
7849 for (;;) {
7850 if (onRead(pUserData, id, 4) != 4) {
7851 return DRFLAC_FALSE; /* Ran out of data. */
7852 }
7853 pInit->runningFilePos += 4;
7854
7855 if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
7856 drflac_uint8 header[6];
7857 drflac_uint8 flags;
7858 drflac_uint32 headerSize;
7859
7860 if (onRead(pUserData, header, 6) != 6) {
7861 return DRFLAC_FALSE; /* Ran out of data. */
7862 }
7863 pInit->runningFilePos += 6;
7864
7865 flags = header[1];
7866
7867 DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
7868 headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
7869 if (flags & 0x10) {
7870 headerSize += 10;
7871 }
7872
7873 if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
7874 return DRFLAC_FALSE; /* Failed to seek past the tag. */
7875 }
7876 pInit->runningFilePos += headerSize;
7877 } else {
7878 break;
7879 }
7880 }
7881
7882 if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
7883 return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7884 }
7885#ifndef DR_FLAC_NO_OGG
7886 if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
7887 return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7888 }
7889#endif
7890
7891 /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
7892 if (relaxed) {
7893 if (container == drflac_container_native) {
7894 return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7895 }
7896#ifndef DR_FLAC_NO_OGG
7897 if (container == drflac_container_ogg) {
7898 return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7899 }
7900#endif
7901 }
7902
7903 /* Unsupported container. */
7904 return DRFLAC_FALSE;
7905}
7906
7907static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
7908{
7909 DRFLAC_ASSERT(pFlac != NULL);
7910 DRFLAC_ASSERT(pInit != NULL);
7911
7912 DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
7913 pFlac->bs = pInit->bs;
7914 pFlac->onMeta = pInit->onMeta;
7915 pFlac->pUserDataMD = pInit->pUserDataMD;
7916 pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
7917 pFlac->sampleRate = pInit->sampleRate;
7918 pFlac->channels = (drflac_uint8)pInit->channels;
7919 pFlac->bitsPerSample = (drflac_uint8)pInit->bitsPerSample;
7920 pFlac->totalPCMFrameCount = pInit->totalPCMFrameCount;
7921 pFlac->container = pInit->container;
7922}
7923
7924
7925static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
7926{
7927 drflac_init_info init;
7928 drflac_uint32 allocationSize;
7929 drflac_uint32 wholeSIMDVectorCountPerChannel;
7930 drflac_uint32 decodedSamplesAllocationSize;
7931#ifndef DR_FLAC_NO_OGG
9e052883 7932 drflac_oggbs* pOggbs = NULL;
2ff0b512 7933#endif
7934 drflac_uint64 firstFramePos;
7935 drflac_uint64 seektablePos;
9e052883 7936 drflac_uint32 seekpointCount;
2ff0b512 7937 drflac_allocation_callbacks allocationCallbacks;
7938 drflac* pFlac;
7939
7940 /* CPU support first. */
7941 drflac__init_cpu_caps();
7942
7943 if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
7944 return NULL;
7945 }
7946
7947 if (pAllocationCallbacks != NULL) {
7948 allocationCallbacks = *pAllocationCallbacks;
7949 if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
7950 return NULL; /* Invalid allocation callbacks. */
7951 }
7952 } else {
7953 allocationCallbacks.pUserData = NULL;
7954 allocationCallbacks.onMalloc = drflac__malloc_default;
7955 allocationCallbacks.onRealloc = drflac__realloc_default;
7956 allocationCallbacks.onFree = drflac__free_default;
7957 }
7958
7959
7960 /*
7961 The size of the allocation for the drflac object needs to be large enough to fit the following:
7962 1) The main members of the drflac structure
7963 2) A block of memory large enough to store the decoded samples of the largest frame in the stream
7964 3) If the container is Ogg, a drflac_oggbs object
7965
7966 The complicated part of the allocation is making sure there's enough room the decoded samples, taking into consideration
7967 the different SIMD instruction sets.
7968 */
7969 allocationSize = sizeof(drflac);
7970
7971 /*
7972 The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
7973 we are supporting.
7974 */
7975 if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
7976 wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
7977 } else {
7978 wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
7979 }
7980
7981 decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;
7982
7983 allocationSize += decodedSamplesAllocationSize;
7984 allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE; /* Allocate extra bytes to ensure we have enough for alignment. */
7985
7986#ifndef DR_FLAC_NO_OGG
7987 /* There's additional data required for Ogg streams. */
7988 if (init.container == drflac_container_ogg) {
7989 allocationSize += sizeof(drflac_oggbs);
2ff0b512 7990
9e052883 7991 pOggbs = (drflac_oggbs*)drflac__malloc_from_callbacks(sizeof(*pOggbs), &allocationCallbacks);
7992 if (pOggbs == NULL) {
7993 return NULL; /*DRFLAC_OUT_OF_MEMORY;*/
7994 }
7995
7996 DRFLAC_ZERO_MEMORY(pOggbs, sizeof(*pOggbs));
7997 pOggbs->onRead = onRead;
7998 pOggbs->onSeek = onSeek;
7999 pOggbs->pUserData = pUserData;
8000 pOggbs->currentBytePos = init.oggFirstBytePos;
8001 pOggbs->firstBytePos = init.oggFirstBytePos;
8002 pOggbs->serialNumber = init.oggSerial;
8003 pOggbs->bosPageHeader = init.oggBosHeader;
8004 pOggbs->bytesRemainingInPage = 0;
2ff0b512 8005 }
8006#endif
8007
8008 /*
8009 This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
8010 consist of only a single heap allocation. To this, the size of the seek table needs to be known, which we determine when reading
8011 and decoding the metadata.
8012 */
9e052883 8013 firstFramePos = 42; /* <-- We know we are at byte 42 at this point. */
8014 seektablePos = 0;
8015 seekpointCount = 0;
2ff0b512 8016 if (init.hasMetadataBlocks) {
8017 drflac_read_proc onReadOverride = onRead;
8018 drflac_seek_proc onSeekOverride = onSeek;
8019 void* pUserDataOverride = pUserData;
8020
8021#ifndef DR_FLAC_NO_OGG
8022 if (init.container == drflac_container_ogg) {
8023 onReadOverride = drflac__on_read_ogg;
8024 onSeekOverride = drflac__on_seek_ogg;
9e052883 8025 pUserDataOverride = (void*)pOggbs;
2ff0b512 8026 }
8027#endif
8028
9e052883 8029 if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seekpointCount, &allocationCallbacks)) {
8030 #ifndef DR_FLAC_NO_OGG
8031 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
8032 #endif
2ff0b512 8033 return NULL;
8034 }
8035
9e052883 8036 allocationSize += seekpointCount * sizeof(drflac_seekpoint);
2ff0b512 8037 }
8038
8039
8040 pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
8041 if (pFlac == NULL) {
9e052883 8042 #ifndef DR_FLAC_NO_OGG
8043 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
8044 #endif
2ff0b512 8045 return NULL;
8046 }
8047
8048 drflac__init_from_info(pFlac, &init);
8049 pFlac->allocationCallbacks = allocationCallbacks;
8050 pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);
8051
8052#ifndef DR_FLAC_NO_OGG
8053 if (init.container == drflac_container_ogg) {
9e052883 8054 drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + (seekpointCount * sizeof(drflac_seekpoint)));
8055 DRFLAC_COPY_MEMORY(pInternalOggbs, pOggbs, sizeof(*pOggbs));
8056
8057 /* At this point the pOggbs object has been handed over to pInternalOggbs and can be freed. */
8058 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
8059 pOggbs = NULL;
2ff0b512 8060
8061 /* The Ogg bistream needs to be layered on top of the original bitstream. */
8062 pFlac->bs.onRead = drflac__on_read_ogg;
8063 pFlac->bs.onSeek = drflac__on_seek_ogg;
8064 pFlac->bs.pUserData = (void*)pInternalOggbs;
8065 pFlac->_oggbs = (void*)pInternalOggbs;
8066 }
8067#endif
8068
8069 pFlac->firstFLACFramePosInBytes = firstFramePos;
8070
8071 /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
8072#ifndef DR_FLAC_NO_OGG
8073 if (init.container == drflac_container_ogg)
8074 {
8075 pFlac->pSeekpoints = NULL;
8076 pFlac->seekpointCount = 0;
8077 }
8078 else
8079#endif
8080 {
8081 /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
8082 if (seektablePos != 0) {
9e052883 8083 pFlac->seekpointCount = seekpointCount;
2ff0b512 8084 pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);
8085
8086 DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
8087 DRFLAC_ASSERT(pFlac->bs.onRead != NULL);
8088
8089 /* Seek to the seektable, then just read directly into our seektable buffer. */
8090 if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
9e052883 8091 drflac_uint32 iSeekpoint;
8092
8093 for (iSeekpoint = 0; iSeekpoint < seekpointCount; iSeekpoint += 1) {
8094 if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints + iSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) == DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
8095 /* Endian swap. */
2ff0b512 8096 pFlac->pSeekpoints[iSeekpoint].firstPCMFrame = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
8097 pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
8098 pFlac->pSeekpoints[iSeekpoint].pcmFrameCount = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
9e052883 8099 } else {
8100 /* Failed to read the seektable. Pretend we don't have one. */
8101 pFlac->pSeekpoints = NULL;
8102 pFlac->seekpointCount = 0;
8103 break;
f5b7bb83 8104 }
f5b7bb83 8105 }
2ff0b512 8106
f5b7bb83 8107 /* We need to seek back to where we were. If this fails it's a critical error. */
8108 if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
8109 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
8110 return NULL;
8111 }
8112 } else {
8113 /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
8114 pFlac->pSeekpoints = NULL;
8115 pFlac->seekpointCount = 0;
8116 }
8117 }
2ff0b512 8118 }
8119
2ff0b512 8120
9e052883 8121 /*
8122 If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
8123 the first frame.
8124 */
8125 if (!init.hasStreamInfoBlock) {
8126 pFlac->currentFLACFrame.header = init.firstFrameHeader;
8127 for (;;) {
8128 drflac_result result = drflac__decode_flac_frame(pFlac);
8129 if (result == DRFLAC_SUCCESS) {
8130 break;
8131 } else {
8132 if (result == DRFLAC_CRC_MISMATCH) {
8133 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
8134 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
8135 return NULL;
8136 }
8137 continue;
8138 } else {
8139 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
8140 return NULL;
8141 }
8142 }
8143 }
8144 }
8145
8146 return pFlac;
8147}
8148
8149
8150
8151#ifndef DR_FLAC_NO_STDIO
8152#include <stdio.h>
8153#ifndef DR_FLAC_NO_WCHAR
8154#include <wchar.h> /* For wcslen(), wcsrtombs() */
8155#endif
8156
648db22b 8157/* Errno */
9e052883 8158/* drflac_result_from_errno() is only used for fopen() and wfopen() so putting it inside DR_WAV_NO_STDIO for now. If something else needs this later we can move it out. */
8159#include <errno.h>
8160static drflac_result drflac_result_from_errno(int e)
8161{
8162 switch (e)
8163 {
8164 case 0: return DRFLAC_SUCCESS;
8165 #ifdef EPERM
8166 case EPERM: return DRFLAC_INVALID_OPERATION;
8167 #endif
8168 #ifdef ENOENT
8169 case ENOENT: return DRFLAC_DOES_NOT_EXIST;
8170 #endif
8171 #ifdef ESRCH
8172 case ESRCH: return DRFLAC_DOES_NOT_EXIST;
8173 #endif
8174 #ifdef EINTR
8175 case EINTR: return DRFLAC_INTERRUPT;
8176 #endif
8177 #ifdef EIO
8178 case EIO: return DRFLAC_IO_ERROR;
8179 #endif
8180 #ifdef ENXIO
8181 case ENXIO: return DRFLAC_DOES_NOT_EXIST;
8182 #endif
8183 #ifdef E2BIG
8184 case E2BIG: return DRFLAC_INVALID_ARGS;
8185 #endif
8186 #ifdef ENOEXEC
8187 case ENOEXEC: return DRFLAC_INVALID_FILE;
8188 #endif
8189 #ifdef EBADF
8190 case EBADF: return DRFLAC_INVALID_FILE;
8191 #endif
8192 #ifdef ECHILD
8193 case ECHILD: return DRFLAC_ERROR;
8194 #endif
8195 #ifdef EAGAIN
8196 case EAGAIN: return DRFLAC_UNAVAILABLE;
8197 #endif
8198 #ifdef ENOMEM
8199 case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
8200 #endif
8201 #ifdef EACCES
8202 case EACCES: return DRFLAC_ACCESS_DENIED;
8203 #endif
8204 #ifdef EFAULT
8205 case EFAULT: return DRFLAC_BAD_ADDRESS;
8206 #endif
8207 #ifdef ENOTBLK
8208 case ENOTBLK: return DRFLAC_ERROR;
8209 #endif
8210 #ifdef EBUSY
8211 case EBUSY: return DRFLAC_BUSY;
8212 #endif
8213 #ifdef EEXIST
8214 case EEXIST: return DRFLAC_ALREADY_EXISTS;
8215 #endif
8216 #ifdef EXDEV
8217 case EXDEV: return DRFLAC_ERROR;
8218 #endif
8219 #ifdef ENODEV
8220 case ENODEV: return DRFLAC_DOES_NOT_EXIST;
8221 #endif
8222 #ifdef ENOTDIR
8223 case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
8224 #endif
8225 #ifdef EISDIR
8226 case EISDIR: return DRFLAC_IS_DIRECTORY;
8227 #endif
8228 #ifdef EINVAL
8229 case EINVAL: return DRFLAC_INVALID_ARGS;
8230 #endif
8231 #ifdef ENFILE
8232 case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
8233 #endif
8234 #ifdef EMFILE
8235 case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
8236 #endif
8237 #ifdef ENOTTY
8238 case ENOTTY: return DRFLAC_INVALID_OPERATION;
8239 #endif
8240 #ifdef ETXTBSY
8241 case ETXTBSY: return DRFLAC_BUSY;
8242 #endif
8243 #ifdef EFBIG
8244 case EFBIG: return DRFLAC_TOO_BIG;
8245 #endif
8246 #ifdef ENOSPC
8247 case ENOSPC: return DRFLAC_NO_SPACE;
8248 #endif
8249 #ifdef ESPIPE
8250 case ESPIPE: return DRFLAC_BAD_SEEK;
8251 #endif
8252 #ifdef EROFS
8253 case EROFS: return DRFLAC_ACCESS_DENIED;
8254 #endif
8255 #ifdef EMLINK
8256 case EMLINK: return DRFLAC_TOO_MANY_LINKS;
8257 #endif
8258 #ifdef EPIPE
8259 case EPIPE: return DRFLAC_BAD_PIPE;
8260 #endif
8261 #ifdef EDOM
8262 case EDOM: return DRFLAC_OUT_OF_RANGE;
8263 #endif
8264 #ifdef ERANGE
8265 case ERANGE: return DRFLAC_OUT_OF_RANGE;
8266 #endif
8267 #ifdef EDEADLK
8268 case EDEADLK: return DRFLAC_DEADLOCK;
8269 #endif
8270 #ifdef ENAMETOOLONG
8271 case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
8272 #endif
8273 #ifdef ENOLCK
8274 case ENOLCK: return DRFLAC_ERROR;
8275 #endif
8276 #ifdef ENOSYS
8277 case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
8278 #endif
8279 #ifdef ENOTEMPTY
8280 case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
8281 #endif
8282 #ifdef ELOOP
8283 case ELOOP: return DRFLAC_TOO_MANY_LINKS;
8284 #endif
8285 #ifdef ENOMSG
8286 case ENOMSG: return DRFLAC_NO_MESSAGE;
8287 #endif
8288 #ifdef EIDRM
8289 case EIDRM: return DRFLAC_ERROR;
8290 #endif
8291 #ifdef ECHRNG
8292 case ECHRNG: return DRFLAC_ERROR;
8293 #endif
8294 #ifdef EL2NSYNC
8295 case EL2NSYNC: return DRFLAC_ERROR;
8296 #endif
8297 #ifdef EL3HLT
8298 case EL3HLT: return DRFLAC_ERROR;
8299 #endif
8300 #ifdef EL3RST
8301 case EL3RST: return DRFLAC_ERROR;
8302 #endif
8303 #ifdef ELNRNG
8304 case ELNRNG: return DRFLAC_OUT_OF_RANGE;
8305 #endif
8306 #ifdef EUNATCH
8307 case EUNATCH: return DRFLAC_ERROR;
8308 #endif
8309 #ifdef ENOCSI
8310 case ENOCSI: return DRFLAC_ERROR;
8311 #endif
8312 #ifdef EL2HLT
8313 case EL2HLT: return DRFLAC_ERROR;
8314 #endif
8315 #ifdef EBADE
8316 case EBADE: return DRFLAC_ERROR;
8317 #endif
8318 #ifdef EBADR
8319 case EBADR: return DRFLAC_ERROR;
8320 #endif
8321 #ifdef EXFULL
8322 case EXFULL: return DRFLAC_ERROR;
8323 #endif
8324 #ifdef ENOANO
8325 case ENOANO: return DRFLAC_ERROR;
8326 #endif
8327 #ifdef EBADRQC
8328 case EBADRQC: return DRFLAC_ERROR;
8329 #endif
8330 #ifdef EBADSLT
8331 case EBADSLT: return DRFLAC_ERROR;
8332 #endif
8333 #ifdef EBFONT
8334 case EBFONT: return DRFLAC_INVALID_FILE;
8335 #endif
8336 #ifdef ENOSTR
8337 case ENOSTR: return DRFLAC_ERROR;
8338 #endif
8339 #ifdef ENODATA
8340 case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
8341 #endif
8342 #ifdef ETIME
8343 case ETIME: return DRFLAC_TIMEOUT;
8344 #endif
8345 #ifdef ENOSR
8346 case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
8347 #endif
8348 #ifdef ENONET
8349 case ENONET: return DRFLAC_NO_NETWORK;
8350 #endif
8351 #ifdef ENOPKG
8352 case ENOPKG: return DRFLAC_ERROR;
8353 #endif
8354 #ifdef EREMOTE
8355 case EREMOTE: return DRFLAC_ERROR;
8356 #endif
8357 #ifdef ENOLINK
8358 case ENOLINK: return DRFLAC_ERROR;
8359 #endif
8360 #ifdef EADV
8361 case EADV: return DRFLAC_ERROR;
8362 #endif
8363 #ifdef ESRMNT
8364 case ESRMNT: return DRFLAC_ERROR;
8365 #endif
8366 #ifdef ECOMM
8367 case ECOMM: return DRFLAC_ERROR;
8368 #endif
8369 #ifdef EPROTO
8370 case EPROTO: return DRFLAC_ERROR;
8371 #endif
8372 #ifdef EMULTIHOP
8373 case EMULTIHOP: return DRFLAC_ERROR;
8374 #endif
8375 #ifdef EDOTDOT
8376 case EDOTDOT: return DRFLAC_ERROR;
8377 #endif
8378 #ifdef EBADMSG
8379 case EBADMSG: return DRFLAC_BAD_MESSAGE;
8380 #endif
8381 #ifdef EOVERFLOW
8382 case EOVERFLOW: return DRFLAC_TOO_BIG;
8383 #endif
8384 #ifdef ENOTUNIQ
8385 case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
8386 #endif
8387 #ifdef EBADFD
8388 case EBADFD: return DRFLAC_ERROR;
8389 #endif
8390 #ifdef EREMCHG
8391 case EREMCHG: return DRFLAC_ERROR;
8392 #endif
8393 #ifdef ELIBACC
8394 case ELIBACC: return DRFLAC_ACCESS_DENIED;
8395 #endif
8396 #ifdef ELIBBAD
8397 case ELIBBAD: return DRFLAC_INVALID_FILE;
8398 #endif
8399 #ifdef ELIBSCN
8400 case ELIBSCN: return DRFLAC_INVALID_FILE;
8401 #endif
8402 #ifdef ELIBMAX
8403 case ELIBMAX: return DRFLAC_ERROR;
8404 #endif
8405 #ifdef ELIBEXEC
8406 case ELIBEXEC: return DRFLAC_ERROR;
8407 #endif
8408 #ifdef EILSEQ
8409 case EILSEQ: return DRFLAC_INVALID_DATA;
8410 #endif
8411 #ifdef ERESTART
8412 case ERESTART: return DRFLAC_ERROR;
8413 #endif
8414 #ifdef ESTRPIPE
8415 case ESTRPIPE: return DRFLAC_ERROR;
8416 #endif
8417 #ifdef EUSERS
8418 case EUSERS: return DRFLAC_ERROR;
8419 #endif
8420 #ifdef ENOTSOCK
8421 case ENOTSOCK: return DRFLAC_NOT_SOCKET;
8422 #endif
8423 #ifdef EDESTADDRREQ
8424 case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
8425 #endif
8426 #ifdef EMSGSIZE
8427 case EMSGSIZE: return DRFLAC_TOO_BIG;
8428 #endif
8429 #ifdef EPROTOTYPE
8430 case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
8431 #endif
8432 #ifdef ENOPROTOOPT
8433 case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
8434 #endif
8435 #ifdef EPROTONOSUPPORT
8436 case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
8437 #endif
8438 #ifdef ESOCKTNOSUPPORT
8439 case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
8440 #endif
8441 #ifdef EOPNOTSUPP
8442 case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
8443 #endif
8444 #ifdef EPFNOSUPPORT
8445 case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
8446 #endif
8447 #ifdef EAFNOSUPPORT
8448 case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
8449 #endif
8450 #ifdef EADDRINUSE
8451 case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
8452 #endif
8453 #ifdef EADDRNOTAVAIL
8454 case EADDRNOTAVAIL: return DRFLAC_ERROR;
8455 #endif
8456 #ifdef ENETDOWN
8457 case ENETDOWN: return DRFLAC_NO_NETWORK;
8458 #endif
8459 #ifdef ENETUNREACH
8460 case ENETUNREACH: return DRFLAC_NO_NETWORK;
8461 #endif
8462 #ifdef ENETRESET
8463 case ENETRESET: return DRFLAC_NO_NETWORK;
8464 #endif
8465 #ifdef ECONNABORTED
8466 case ECONNABORTED: return DRFLAC_NO_NETWORK;
8467 #endif
8468 #ifdef ECONNRESET
8469 case ECONNRESET: return DRFLAC_CONNECTION_RESET;
8470 #endif
8471 #ifdef ENOBUFS
8472 case ENOBUFS: return DRFLAC_NO_SPACE;
8473 #endif
8474 #ifdef EISCONN
8475 case EISCONN: return DRFLAC_ALREADY_CONNECTED;
8476 #endif
8477 #ifdef ENOTCONN
8478 case ENOTCONN: return DRFLAC_NOT_CONNECTED;
8479 #endif
8480 #ifdef ESHUTDOWN
8481 case ESHUTDOWN: return DRFLAC_ERROR;
8482 #endif
8483 #ifdef ETOOMANYREFS
8484 case ETOOMANYREFS: return DRFLAC_ERROR;
8485 #endif
8486 #ifdef ETIMEDOUT
8487 case ETIMEDOUT: return DRFLAC_TIMEOUT;
8488 #endif
8489 #ifdef ECONNREFUSED
8490 case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
8491 #endif
8492 #ifdef EHOSTDOWN
8493 case EHOSTDOWN: return DRFLAC_NO_HOST;
8494 #endif
8495 #ifdef EHOSTUNREACH
8496 case EHOSTUNREACH: return DRFLAC_NO_HOST;
8497 #endif
8498 #ifdef EALREADY
8499 case EALREADY: return DRFLAC_IN_PROGRESS;
8500 #endif
8501 #ifdef EINPROGRESS
8502 case EINPROGRESS: return DRFLAC_IN_PROGRESS;
8503 #endif
8504 #ifdef ESTALE
8505 case ESTALE: return DRFLAC_INVALID_FILE;
8506 #endif
8507 #ifdef EUCLEAN
8508 case EUCLEAN: return DRFLAC_ERROR;
8509 #endif
8510 #ifdef ENOTNAM
8511 case ENOTNAM: return DRFLAC_ERROR;
8512 #endif
8513 #ifdef ENAVAIL
8514 case ENAVAIL: return DRFLAC_ERROR;
8515 #endif
8516 #ifdef EISNAM
8517 case EISNAM: return DRFLAC_ERROR;
8518 #endif
8519 #ifdef EREMOTEIO
8520 case EREMOTEIO: return DRFLAC_IO_ERROR;
8521 #endif
8522 #ifdef EDQUOT
8523 case EDQUOT: return DRFLAC_NO_SPACE;
8524 #endif
8525 #ifdef ENOMEDIUM
8526 case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
8527 #endif
8528 #ifdef EMEDIUMTYPE
8529 case EMEDIUMTYPE: return DRFLAC_ERROR;
8530 #endif
8531 #ifdef ECANCELED
8532 case ECANCELED: return DRFLAC_CANCELLED;
8533 #endif
8534 #ifdef ENOKEY
8535 case ENOKEY: return DRFLAC_ERROR;
8536 #endif
8537 #ifdef EKEYEXPIRED
8538 case EKEYEXPIRED: return DRFLAC_ERROR;
8539 #endif
8540 #ifdef EKEYREVOKED
8541 case EKEYREVOKED: return DRFLAC_ERROR;
8542 #endif
8543 #ifdef EKEYREJECTED
8544 case EKEYREJECTED: return DRFLAC_ERROR;
8545 #endif
8546 #ifdef EOWNERDEAD
8547 case EOWNERDEAD: return DRFLAC_ERROR;
8548 #endif
8549 #ifdef ENOTRECOVERABLE
8550 case ENOTRECOVERABLE: return DRFLAC_ERROR;
8551 #endif
8552 #ifdef ERFKILL
8553 case ERFKILL: return DRFLAC_ERROR;
8554 #endif
8555 #ifdef EHWPOISON
8556 case EHWPOISON: return DRFLAC_ERROR;
8557 #endif
8558 default: return DRFLAC_ERROR;
8559 }
8560}
648db22b 8561/* End Errno */
9e052883 8562
648db22b 8563/* fopen */
9e052883 8564static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
8565{
8566#if defined(_MSC_VER) && _MSC_VER >= 1400
8567 errno_t err;
8568#endif
8569
8570 if (ppFile != NULL) {
8571 *ppFile = NULL; /* Safety. */
8572 }
8573
8574 if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
8575 return DRFLAC_INVALID_ARGS;
8576 }
8577
8578#if defined(_MSC_VER) && _MSC_VER >= 1400
8579 err = fopen_s(ppFile, pFilePath, pOpenMode);
8580 if (err != 0) {
8581 return drflac_result_from_errno(err);
8582 }
8583#else
8584#if defined(_WIN32) || defined(__APPLE__)
8585 *ppFile = fopen(pFilePath, pOpenMode);
8586#else
8587 #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
8588 *ppFile = fopen64(pFilePath, pOpenMode);
8589 #else
8590 *ppFile = fopen(pFilePath, pOpenMode);
8591 #endif
8592#endif
8593 if (*ppFile == NULL) {
8594 drflac_result result = drflac_result_from_errno(errno);
8595 if (result == DRFLAC_SUCCESS) {
8596 result = DRFLAC_ERROR; /* Just a safety check to make sure we never ever return success when pFile == NULL. */
8597 }
8598
8599 return result;
8600 }
8601#endif
8602
8603 return DRFLAC_SUCCESS;
8604}
8605
8606/*
8607_wfopen() isn't always available in all compilation environments.
8608
8609 * Windows only.
8610 * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
8611 * MinGW-64 (both 32- and 64-bit) seems to support it.
8612 * MinGW wraps it in !defined(__STRICT_ANSI__).
8613 * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).
8614
8615This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
8616fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
8617*/
8618#if defined(_WIN32)
8619 #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
8620 #define DRFLAC_HAS_WFOPEN
8621 #endif
8622#endif
8623
8624#ifndef DR_FLAC_NO_WCHAR
8625static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
8626{
8627 if (ppFile != NULL) {
8628 *ppFile = NULL; /* Safety. */
8629 }
8630
8631 if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
8632 return DRFLAC_INVALID_ARGS;
8633 }
8634
8635#if defined(DRFLAC_HAS_WFOPEN)
8636 {
8637 /* Use _wfopen() on Windows. */
8638 #if defined(_MSC_VER) && _MSC_VER >= 1400
8639 errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
8640 if (err != 0) {
8641 return drflac_result_from_errno(err);
8642 }
8643 #else
8644 *ppFile = _wfopen(pFilePath, pOpenMode);
8645 if (*ppFile == NULL) {
8646 return drflac_result_from_errno(errno);
8647 }
8648 #endif
8649 (void)pAllocationCallbacks;
8650 }
8651#else
8652 /*
8653 Use fopen() on anything other than Windows. Requires a conversion. This is annoying because
8654 fopen() is locale specific. The only real way I can think of to do this is with wcsrtombs(). Note
8655 that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
8656 maintaining state. I've checked this with -std=c89 and it works, but if somebody get's a compiler
8657 error I'll look into improving compatibility.
8658 */
8659
8660 /*
8661 Some compilers don't support wchar_t or wcsrtombs() which we're using below. In this case we just
8662 need to abort with an error. If you encounter a compiler lacking such support, add it to this list
8663 and submit a bug report and it'll be added to the library upstream.
8664 */
8665 #if defined(__DJGPP__)
8666 {
8667 /* Nothing to do here. This will fall through to the error check below. */
8668 }
8669 #else
8670 {
8671 mbstate_t mbs;
8672 size_t lenMB;
8673 const wchar_t* pFilePathTemp = pFilePath;
8674 char* pFilePathMB = NULL;
8675 char pOpenModeMB[32] = {0};
8676
8677 /* Get the length first. */
8678 DRFLAC_ZERO_OBJECT(&mbs);
8679 lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
8680 if (lenMB == (size_t)-1) {
8681 return drflac_result_from_errno(errno);
8682 }
8683
8684 pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
8685 if (pFilePathMB == NULL) {
8686 return DRFLAC_OUT_OF_MEMORY;
8687 }
8688
8689 pFilePathTemp = pFilePath;
8690 DRFLAC_ZERO_OBJECT(&mbs);
8691 wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);
8692
8693 /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
8694 {
8695 size_t i = 0;
8696 for (;;) {
8697 if (pOpenMode[i] == 0) {
8698 pOpenModeMB[i] = '\0';
8699 break;
8700 }
8701
8702 pOpenModeMB[i] = (char)pOpenMode[i];
8703 i += 1;
8704 }
8705 }
8706
8707 *ppFile = fopen(pFilePathMB, pOpenModeMB);
8708
8709 drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
8710 }
8711 #endif
8712
8713 if (*ppFile == NULL) {
8714 return DRFLAC_ERROR;
8715 }
8716#endif
8717
8718 return DRFLAC_SUCCESS;
8719}
8720#endif
648db22b 8721/* End fopen */
9e052883 8722
8723static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
8724{
8725 return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
8726}
8727
8728static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
8729{
8730 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
8731
8732 return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
8733}
8734
8735
8736DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8737{
8738 drflac* pFlac;
8739 FILE* pFile;
8740
8741 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8742 return NULL;
8743 }
8744
8745 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
8746 if (pFlac == NULL) {
8747 fclose(pFile);
8748 return NULL;
8749 }
8750
8751 return pFlac;
8752}
8753
8754#ifndef DR_FLAC_NO_WCHAR
8755DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8756{
8757 drflac* pFlac;
8758 FILE* pFile;
8759
8760 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8761 return NULL;
8762 }
8763
8764 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
8765 if (pFlac == NULL) {
8766 fclose(pFile);
8767 return NULL;
8768 }
8769
8770 return pFlac;
8771}
8772#endif
8773
8774DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8775{
8776 drflac* pFlac;
8777 FILE* pFile;
8778
8779 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8780 return NULL;
8781 }
8782
8783 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8784 if (pFlac == NULL) {
8785 fclose(pFile);
8786 return pFlac;
8787 }
8788
8789 return pFlac;
8790}
8791
8792#ifndef DR_FLAC_NO_WCHAR
8793DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8794{
8795 drflac* pFlac;
8796 FILE* pFile;
8797
8798 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8799 return NULL;
8800 }
8801
8802 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8803 if (pFlac == NULL) {
8804 fclose(pFile);
8805 return pFlac;
2ff0b512 8806 }
8807
8808 return pFlac;
8809}
9e052883 8810#endif
8811#endif /* DR_FLAC_NO_STDIO */
2ff0b512 8812
8813static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
8814{
8815 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8816 size_t bytesRemaining;
8817
8818 DRFLAC_ASSERT(memoryStream != NULL);
8819 DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);
8820
8821 bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
8822 if (bytesToRead > bytesRemaining) {
8823 bytesToRead = bytesRemaining;
8824 }
8825
8826 if (bytesToRead > 0) {
8827 DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
8828 memoryStream->currentReadPos += bytesToRead;
8829 }
8830
8831 return bytesToRead;
8832}
8833
8834static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
8835{
8836 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8837
8838 DRFLAC_ASSERT(memoryStream != NULL);
8839 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
8840
8841 if (offset > (drflac_int64)memoryStream->dataSize) {
8842 return DRFLAC_FALSE;
8843 }
8844
8845 if (origin == drflac_seek_origin_current) {
8846 if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
8847 memoryStream->currentReadPos += offset;
8848 } else {
8849 return DRFLAC_FALSE; /* Trying to seek too far forward. */
8850 }
8851 } else {
8852 if ((drflac_uint32)offset <= memoryStream->dataSize) {
8853 memoryStream->currentReadPos = offset;
8854 } else {
8855 return DRFLAC_FALSE; /* Trying to seek too far forward. */
8856 }
8857 }
8858
8859 return DRFLAC_TRUE;
8860}
8861
8862DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
8863{
8864 drflac__memory_stream memoryStream;
8865 drflac* pFlac;
8866
8867 memoryStream.data = (const drflac_uint8*)pData;
8868 memoryStream.dataSize = dataSize;
8869 memoryStream.currentReadPos = 0;
8870 pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
8871 if (pFlac == NULL) {
8872 return NULL;
8873 }
8874
8875 pFlac->memoryStream = memoryStream;
8876
8877 /* This is an awful hack... */
8878#ifndef DR_FLAC_NO_OGG
8879 if (pFlac->container == drflac_container_ogg)
8880 {
8881 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8882 oggbs->pUserData = &pFlac->memoryStream;
8883 }
8884 else
8885#endif
8886 {
8887 pFlac->bs.pUserData = &pFlac->memoryStream;
8888 }
8889
8890 return pFlac;
8891}
8892
8893DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8894{
8895 drflac__memory_stream memoryStream;
8896 drflac* pFlac;
8897
8898 memoryStream.data = (const drflac_uint8*)pData;
8899 memoryStream.dataSize = dataSize;
8900 memoryStream.currentReadPos = 0;
8901 pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
8902 if (pFlac == NULL) {
8903 return NULL;
8904 }
8905
8906 pFlac->memoryStream = memoryStream;
8907
8908 /* This is an awful hack... */
8909#ifndef DR_FLAC_NO_OGG
8910 if (pFlac->container == drflac_container_ogg)
8911 {
8912 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8913 oggbs->pUserData = &pFlac->memoryStream;
8914 }
8915 else
8916#endif
8917 {
8918 pFlac->bs.pUserData = &pFlac->memoryStream;
8919 }
8920
8921 return pFlac;
8922}
8923
8924
8925
8926DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8927{
8928 return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8929}
8930DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8931{
8932 return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
8933}
8934
8935DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8936{
8937 return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8938}
8939DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8940{
8941 return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
8942}
8943
8944DRFLAC_API void drflac_close(drflac* pFlac)
8945{
8946 if (pFlac == NULL) {
8947 return;
8948 }
8949
9e052883 8950#ifndef DR_FLAC_NO_STDIO
8951 /*
8952 If we opened the file with drflac_open_file() we will want to close the file handle. We can know whether or not drflac_open_file()
8953 was used by looking at the callbacks.
8954 */
8955 if (pFlac->bs.onRead == drflac__on_read_stdio) {
8956 fclose((FILE*)pFlac->bs.pUserData);
8957 }
8958
8959#ifndef DR_FLAC_NO_OGG
8960 /* Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. */
8961 if (pFlac->container == drflac_container_ogg) {
8962 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8963 DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);
8964
8965 if (oggbs->onRead == drflac__on_read_stdio) {
8966 fclose((FILE*)oggbs->pUserData);
8967 }
8968 }
8969#endif
8970#endif
8971
2ff0b512 8972 drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
8973}
8974
9e052883 8975
8976#if 0
8977static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8978{
8979 drflac_uint64 i;
8980 for (i = 0; i < frameCount; ++i) {
8981 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
8982 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
8983 drflac_uint32 right = left - side;
8984
8985 pOutputSamples[i*2+0] = (drflac_int32)left;
8986 pOutputSamples[i*2+1] = (drflac_int32)right;
8987 }
8988}
8989#endif
8990
2ff0b512 8991static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8992{
8993 drflac_uint64 i;
8994 drflac_uint64 frameCount4 = frameCount >> 2;
8995 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8996 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8997 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
8998 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
8999
9000 for (i = 0; i < frameCount4; ++i) {
9001 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
9002 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
9003 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
9004 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
9005
9006 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
9007 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
9008 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
9009 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
9010
9011 drflac_uint32 right0 = left0 - side0;
9012 drflac_uint32 right1 = left1 - side1;
9013 drflac_uint32 right2 = left2 - side2;
9014 drflac_uint32 right3 = left3 - side3;
9015
9016 pOutputSamples[i*8+0] = (drflac_int32)left0;
9017 pOutputSamples[i*8+1] = (drflac_int32)right0;
9018 pOutputSamples[i*8+2] = (drflac_int32)left1;
9019 pOutputSamples[i*8+3] = (drflac_int32)right1;
9020 pOutputSamples[i*8+4] = (drflac_int32)left2;
9021 pOutputSamples[i*8+5] = (drflac_int32)right2;
9022 pOutputSamples[i*8+6] = (drflac_int32)left3;
9023 pOutputSamples[i*8+7] = (drflac_int32)right3;
9024 }
9025
9026 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9027 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9028 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9029 drflac_uint32 right = left - side;
9030
9031 pOutputSamples[i*2+0] = (drflac_int32)left;
9032 pOutputSamples[i*2+1] = (drflac_int32)right;
9033 }
9034}
9035
9036#if defined(DRFLAC_SUPPORT_SSE2)
9037static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9038{
9039 drflac_uint64 i;
9040 drflac_uint64 frameCount4 = frameCount >> 2;
9041 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9042 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9043 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9044 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9045
9046 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9047
9048 for (i = 0; i < frameCount4; ++i) {
9049 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9050 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9051 __m128i right = _mm_sub_epi32(left, side);
9052
9053 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9054 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9055 }
9056
9057 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9058 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9059 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9060 drflac_uint32 right = left - side;
9061
9062 pOutputSamples[i*2+0] = (drflac_int32)left;
9063 pOutputSamples[i*2+1] = (drflac_int32)right;
9064 }
9065}
9066#endif
9067
9068#if defined(DRFLAC_SUPPORT_NEON)
9069static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9070{
9071 drflac_uint64 i;
9072 drflac_uint64 frameCount4 = frameCount >> 2;
9073 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9074 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9075 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9076 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9077 int32x4_t shift0_4;
9078 int32x4_t shift1_4;
9079
9080 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9081
9082 shift0_4 = vdupq_n_s32(shift0);
9083 shift1_4 = vdupq_n_s32(shift1);
9084
9085 for (i = 0; i < frameCount4; ++i) {
9086 uint32x4_t left;
9087 uint32x4_t side;
9088 uint32x4_t right;
9089
9090 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9091 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9092 right = vsubq_u32(left, side);
9093
9094 drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
9095 }
9096
9097 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9098 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9099 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9100 drflac_uint32 right = left - side;
9101
9102 pOutputSamples[i*2+0] = (drflac_int32)left;
9103 pOutputSamples[i*2+1] = (drflac_int32)right;
9104 }
9105}
9106#endif
9107
9108static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9109{
9110#if defined(DRFLAC_SUPPORT_SSE2)
9111 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9112 drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9113 } else
9114#elif defined(DRFLAC_SUPPORT_NEON)
9115 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9116 drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9117 } else
9118#endif
9119 {
9120 /* Scalar fallback. */
9e052883 9121#if 0
9122 drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9123#else
2ff0b512 9124 drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 9125#endif
2ff0b512 9126 }
9127}
9128
9129
9e052883 9130#if 0
9131static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9132{
9133 drflac_uint64 i;
9134 for (i = 0; i < frameCount; ++i) {
9135 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9136 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9137 drflac_uint32 left = right + side;
9138
9139 pOutputSamples[i*2+0] = (drflac_int32)left;
9140 pOutputSamples[i*2+1] = (drflac_int32)right;
9141 }
9142}
9143#endif
9144
2ff0b512 9145static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9146{
9147 drflac_uint64 i;
9148 drflac_uint64 frameCount4 = frameCount >> 2;
9149 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9150 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9151 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9152 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9153
9154 for (i = 0; i < frameCount4; ++i) {
9155 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
9156 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
9157 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
9158 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
9159
9160 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
9161 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
9162 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
9163 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
9164
9165 drflac_uint32 left0 = right0 + side0;
9166 drflac_uint32 left1 = right1 + side1;
9167 drflac_uint32 left2 = right2 + side2;
9168 drflac_uint32 left3 = right3 + side3;
9169
9170 pOutputSamples[i*8+0] = (drflac_int32)left0;
9171 pOutputSamples[i*8+1] = (drflac_int32)right0;
9172 pOutputSamples[i*8+2] = (drflac_int32)left1;
9173 pOutputSamples[i*8+3] = (drflac_int32)right1;
9174 pOutputSamples[i*8+4] = (drflac_int32)left2;
9175 pOutputSamples[i*8+5] = (drflac_int32)right2;
9176 pOutputSamples[i*8+6] = (drflac_int32)left3;
9177 pOutputSamples[i*8+7] = (drflac_int32)right3;
9178 }
9179
9180 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9181 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9182 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9183 drflac_uint32 left = right + side;
9184
9185 pOutputSamples[i*2+0] = (drflac_int32)left;
9186 pOutputSamples[i*2+1] = (drflac_int32)right;
9187 }
9188}
9189
9190#if defined(DRFLAC_SUPPORT_SSE2)
9191static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9192{
9193 drflac_uint64 i;
9194 drflac_uint64 frameCount4 = frameCount >> 2;
9195 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9196 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9197 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9198 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9199
9200 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9201
9202 for (i = 0; i < frameCount4; ++i) {
9203 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9204 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9205 __m128i left = _mm_add_epi32(right, side);
9206
9207 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9208 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9209 }
9210
9211 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9212 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9213 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9214 drflac_uint32 left = right + side;
9215
9216 pOutputSamples[i*2+0] = (drflac_int32)left;
9217 pOutputSamples[i*2+1] = (drflac_int32)right;
9218 }
9219}
9220#endif
9221
9222#if defined(DRFLAC_SUPPORT_NEON)
9223static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9224{
9225 drflac_uint64 i;
9226 drflac_uint64 frameCount4 = frameCount >> 2;
9227 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9228 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9229 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9230 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9231 int32x4_t shift0_4;
9232 int32x4_t shift1_4;
9233
9234 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9235
9236 shift0_4 = vdupq_n_s32(shift0);
9237 shift1_4 = vdupq_n_s32(shift1);
9238
9239 for (i = 0; i < frameCount4; ++i) {
9240 uint32x4_t side;
9241 uint32x4_t right;
9242 uint32x4_t left;
9243
9244 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9245 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9246 left = vaddq_u32(right, side);
9247
9248 drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
9249 }
9250
9251 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9252 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9253 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9254 drflac_uint32 left = right + side;
9255
9256 pOutputSamples[i*2+0] = (drflac_int32)left;
9257 pOutputSamples[i*2+1] = (drflac_int32)right;
9258 }
9259}
9260#endif
9261
9262static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9263{
9264#if defined(DRFLAC_SUPPORT_SSE2)
9265 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9266 drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9267 } else
9268#elif defined(DRFLAC_SUPPORT_NEON)
9269 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9270 drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9271 } else
9272#endif
9273 {
9274 /* Scalar fallback. */
9e052883 9275#if 0
9276 drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9277#else
2ff0b512 9278 drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 9279#endif
2ff0b512 9280 }
9281}
9282
9283
9e052883 9284#if 0
9285static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9286{
9287 for (drflac_uint64 i = 0; i < frameCount; ++i) {
9288 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9289 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9290
9291 mid = (mid << 1) | (side & 0x01);
9292
9293 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
9294 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
9295 }
9296}
9297#endif
9298
2ff0b512 9299static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9300{
9301 drflac_uint64 i;
9302 drflac_uint64 frameCount4 = frameCount >> 2;
9303 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9304 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9305 drflac_int32 shift = unusedBitsPerSample;
9306
9307 if (shift > 0) {
9308 shift -= 1;
9309 for (i = 0; i < frameCount4; ++i) {
9310 drflac_uint32 temp0L;
9311 drflac_uint32 temp1L;
9312 drflac_uint32 temp2L;
9313 drflac_uint32 temp3L;
9314 drflac_uint32 temp0R;
9315 drflac_uint32 temp1R;
9316 drflac_uint32 temp2R;
9317 drflac_uint32 temp3R;
9318
9319 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9320 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9321 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9322 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9323
9324 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9325 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9326 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9327 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9328
9329 mid0 = (mid0 << 1) | (side0 & 0x01);
9330 mid1 = (mid1 << 1) | (side1 & 0x01);
9331 mid2 = (mid2 << 1) | (side2 & 0x01);
9332 mid3 = (mid3 << 1) | (side3 & 0x01);
9333
9334 temp0L = (mid0 + side0) << shift;
9335 temp1L = (mid1 + side1) << shift;
9336 temp2L = (mid2 + side2) << shift;
9337 temp3L = (mid3 + side3) << shift;
9338
9339 temp0R = (mid0 - side0) << shift;
9340 temp1R = (mid1 - side1) << shift;
9341 temp2R = (mid2 - side2) << shift;
9342 temp3R = (mid3 - side3) << shift;
9343
9344 pOutputSamples[i*8+0] = (drflac_int32)temp0L;
9345 pOutputSamples[i*8+1] = (drflac_int32)temp0R;
9346 pOutputSamples[i*8+2] = (drflac_int32)temp1L;
9347 pOutputSamples[i*8+3] = (drflac_int32)temp1R;
9348 pOutputSamples[i*8+4] = (drflac_int32)temp2L;
9349 pOutputSamples[i*8+5] = (drflac_int32)temp2R;
9350 pOutputSamples[i*8+6] = (drflac_int32)temp3L;
9351 pOutputSamples[i*8+7] = (drflac_int32)temp3R;
9352 }
9353 } else {
9354 for (i = 0; i < frameCount4; ++i) {
9355 drflac_uint32 temp0L;
9356 drflac_uint32 temp1L;
9357 drflac_uint32 temp2L;
9358 drflac_uint32 temp3L;
9359 drflac_uint32 temp0R;
9360 drflac_uint32 temp1R;
9361 drflac_uint32 temp2R;
9362 drflac_uint32 temp3R;
9363
9364 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9365 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9366 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9367 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9368
9369 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9370 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9371 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9372 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9373
9374 mid0 = (mid0 << 1) | (side0 & 0x01);
9375 mid1 = (mid1 << 1) | (side1 & 0x01);
9376 mid2 = (mid2 << 1) | (side2 & 0x01);
9377 mid3 = (mid3 << 1) | (side3 & 0x01);
9378
9379 temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
9380 temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
9381 temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
9382 temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
9383
9384 temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
9385 temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
9386 temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
9387 temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
9388
9389 pOutputSamples[i*8+0] = (drflac_int32)temp0L;
9390 pOutputSamples[i*8+1] = (drflac_int32)temp0R;
9391 pOutputSamples[i*8+2] = (drflac_int32)temp1L;
9392 pOutputSamples[i*8+3] = (drflac_int32)temp1R;
9393 pOutputSamples[i*8+4] = (drflac_int32)temp2L;
9394 pOutputSamples[i*8+5] = (drflac_int32)temp2R;
9395 pOutputSamples[i*8+6] = (drflac_int32)temp3L;
9396 pOutputSamples[i*8+7] = (drflac_int32)temp3R;
9397 }
9398 }
9399
9400 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9401 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9402 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9403
9404 mid = (mid << 1) | (side & 0x01);
9405
9406 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
9407 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
9408 }
9409}
9410
9411#if defined(DRFLAC_SUPPORT_SSE2)
9412static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9413{
9414 drflac_uint64 i;
9415 drflac_uint64 frameCount4 = frameCount >> 2;
9416 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9417 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9418 drflac_int32 shift = unusedBitsPerSample;
9419
9420 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9421
9422 if (shift == 0) {
9423 for (i = 0; i < frameCount4; ++i) {
9424 __m128i mid;
9425 __m128i side;
9426 __m128i left;
9427 __m128i right;
9428
9429 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9430 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9431
9432 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
9433
9434 left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
9435 right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
9436
9437 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9438 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9439 }
9440
9441 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9442 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9443 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9444
9445 mid = (mid << 1) | (side & 0x01);
9446
9447 pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
9448 pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
9449 }
9450 } else {
9451 shift -= 1;
9452 for (i = 0; i < frameCount4; ++i) {
9453 __m128i mid;
9454 __m128i side;
9455 __m128i left;
9456 __m128i right;
9457
9458 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9459 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9460
9461 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
9462
9463 left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
9464 right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
9465
9466 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9467 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9468 }
9469
9470 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9471 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9472 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9473
9474 mid = (mid << 1) | (side & 0x01);
9475
9476 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
9477 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
9478 }
9479 }
9480}
9481#endif
9482
9483#if defined(DRFLAC_SUPPORT_NEON)
9484static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9485{
9486 drflac_uint64 i;
9487 drflac_uint64 frameCount4 = frameCount >> 2;
9488 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9489 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9490 drflac_int32 shift = unusedBitsPerSample;
9491 int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
9492 int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
9493 uint32x4_t one4;
9494
9495 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9496
9497 wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9498 wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9499 one4 = vdupq_n_u32(1);
9500
9501 if (shift == 0) {
9502 for (i = 0; i < frameCount4; ++i) {
9503 uint32x4_t mid;
9504 uint32x4_t side;
9505 int32x4_t left;
9506 int32x4_t right;
9507
9508 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
9509 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
9510
9511 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
9512
9513 left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
9514 right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
9515
9516 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9517 }
9518
9519 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9520 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9521 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9522
9523 mid = (mid << 1) | (side & 0x01);
9524
9525 pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
9526 pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
9527 }
9528 } else {
9529 int32x4_t shift4;
9530
9531 shift -= 1;
9532 shift4 = vdupq_n_s32(shift);
9533
9534 for (i = 0; i < frameCount4; ++i) {
9535 uint32x4_t mid;
9536 uint32x4_t side;
9537 int32x4_t left;
9538 int32x4_t right;
9539
9540 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
9541 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
9542
9543 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
9544
9545 left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
9546 right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
9547
9548 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9549 }
9550
9551 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9552 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9553 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9554
9555 mid = (mid << 1) | (side & 0x01);
9556
9557 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
9558 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
9559 }
9560 }
9561}
9562#endif
9563
9564static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9565{
9566#if defined(DRFLAC_SUPPORT_SSE2)
9567 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9568 drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9569 } else
9570#elif defined(DRFLAC_SUPPORT_NEON)
9571 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9572 drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9573 } else
9574#endif
9575 {
9576 /* Scalar fallback. */
9e052883 9577#if 0
9578 drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9579#else
2ff0b512 9580 drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 9581#endif
2ff0b512 9582 }
9583}
9584
9585
9e052883 9586#if 0
9587static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9588{
9589 for (drflac_uint64 i = 0; i < frameCount; ++i) {
9590 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
9591 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
9592 }
9593}
9594#endif
9595
2ff0b512 9596static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9597{
9598 drflac_uint64 i;
9599 drflac_uint64 frameCount4 = frameCount >> 2;
9600 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9601 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9602 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9603 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9604
9605 for (i = 0; i < frameCount4; ++i) {
9606 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
9607 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
9608 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
9609 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
9610
9611 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
9612 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
9613 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
9614 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
9615
9616 pOutputSamples[i*8+0] = (drflac_int32)tempL0;
9617 pOutputSamples[i*8+1] = (drflac_int32)tempR0;
9618 pOutputSamples[i*8+2] = (drflac_int32)tempL1;
9619 pOutputSamples[i*8+3] = (drflac_int32)tempR1;
9620 pOutputSamples[i*8+4] = (drflac_int32)tempL2;
9621 pOutputSamples[i*8+5] = (drflac_int32)tempR2;
9622 pOutputSamples[i*8+6] = (drflac_int32)tempL3;
9623 pOutputSamples[i*8+7] = (drflac_int32)tempR3;
9624 }
9625
9626 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9627 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9628 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9629 }
9630}
9631
9632#if defined(DRFLAC_SUPPORT_SSE2)
9633static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9634{
9635 drflac_uint64 i;
9636 drflac_uint64 frameCount4 = frameCount >> 2;
9637 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9638 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9639 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9640 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9641
9642 for (i = 0; i < frameCount4; ++i) {
9643 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9644 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9645
9646 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9647 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9648 }
9649
9650 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9651 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9652 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9653 }
9654}
9655#endif
9656
9657#if defined(DRFLAC_SUPPORT_NEON)
9658static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9659{
9660 drflac_uint64 i;
9661 drflac_uint64 frameCount4 = frameCount >> 2;
9662 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9663 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9664 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9665 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9666
9667 int32x4_t shift4_0 = vdupq_n_s32(shift0);
9668 int32x4_t shift4_1 = vdupq_n_s32(shift1);
9669
9670 for (i = 0; i < frameCount4; ++i) {
9671 int32x4_t left;
9672 int32x4_t right;
9673
9674 left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
9675 right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));
9676
9677 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9678 }
9679
9680 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9681 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9682 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9683 }
9684}
9685#endif
9686
9687static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9688{
9689#if defined(DRFLAC_SUPPORT_SSE2)
9690 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9691 drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9692 } else
9693#elif defined(DRFLAC_SUPPORT_NEON)
9694 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9695 drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9696 } else
9697#endif
9698 {
9699 /* Scalar fallback. */
9e052883 9700#if 0
9701 drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9702#else
2ff0b512 9703 drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 9704#endif
2ff0b512 9705 }
9706}
9707
9708
9709DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
9710{
9711 drflac_uint64 framesRead;
9712 drflac_uint32 unusedBitsPerSample;
9713
9714 if (pFlac == NULL || framesToRead == 0) {
9715 return 0;
9716 }
9717
9718 if (pBufferOut == NULL) {
9719 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
9720 }
9721
9722 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
9723 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
9724
9725 framesRead = 0;
9726 while (framesToRead > 0) {
9727 /* If we've run out of samples in this frame, go to the next. */
9728 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
9729 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
9730 break; /* Couldn't read the next frame, so just break from the loop and return. */
9731 }
9732 } else {
9733 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
9734 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
9735 drflac_uint64 frameCountThisIteration = framesToRead;
9736
9737 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
9738 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
9739 }
9740
9741 if (channelCount == 2) {
9742 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
9743 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
9744
9745 switch (pFlac->currentFLACFrame.header.channelAssignment)
9746 {
9747 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
9748 {
9749 drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9750 } break;
9751
9752 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
9753 {
9754 drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9755 } break;
9756
9757 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
9758 {
9759 drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9760 } break;
9761
9762 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
9763 default:
9764 {
9765 drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9766 } break;
9767 }
9768 } else {
9769 /* Generic interleaving. */
9770 drflac_uint64 i;
9771 for (i = 0; i < frameCountThisIteration; ++i) {
9772 unsigned int j;
9773 for (j = 0; j < channelCount; ++j) {
9774 pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
9775 }
9776 }
9777 }
9778
9779 framesRead += frameCountThisIteration;
9780 pBufferOut += frameCountThisIteration * channelCount;
9781 framesToRead -= frameCountThisIteration;
9782 pFlac->currentPCMFrame += frameCountThisIteration;
9783 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
9784 }
9785 }
9786
9787 return framesRead;
9788}
9789
9e052883 9790
9791#if 0
9792static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9793{
9794 drflac_uint64 i;
9795 for (i = 0; i < frameCount; ++i) {
9796 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9797 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9798 drflac_uint32 right = left - side;
9799
9800 left >>= 16;
9801 right >>= 16;
9802
9803 pOutputSamples[i*2+0] = (drflac_int16)left;
9804 pOutputSamples[i*2+1] = (drflac_int16)right;
9805 }
9806}
9807#endif
9808
2ff0b512 9809static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9810{
9811 drflac_uint64 i;
9812 drflac_uint64 frameCount4 = frameCount >> 2;
9813 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9814 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9815 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9816 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9817
9818 for (i = 0; i < frameCount4; ++i) {
9819 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
9820 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
9821 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
9822 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
9823
9824 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
9825 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
9826 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
9827 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
9828
9829 drflac_uint32 right0 = left0 - side0;
9830 drflac_uint32 right1 = left1 - side1;
9831 drflac_uint32 right2 = left2 - side2;
9832 drflac_uint32 right3 = left3 - side3;
9833
9834 left0 >>= 16;
9835 left1 >>= 16;
9836 left2 >>= 16;
9837 left3 >>= 16;
9838
9839 right0 >>= 16;
9840 right1 >>= 16;
9841 right2 >>= 16;
9842 right3 >>= 16;
9843
9844 pOutputSamples[i*8+0] = (drflac_int16)left0;
9845 pOutputSamples[i*8+1] = (drflac_int16)right0;
9846 pOutputSamples[i*8+2] = (drflac_int16)left1;
9847 pOutputSamples[i*8+3] = (drflac_int16)right1;
9848 pOutputSamples[i*8+4] = (drflac_int16)left2;
9849 pOutputSamples[i*8+5] = (drflac_int16)right2;
9850 pOutputSamples[i*8+6] = (drflac_int16)left3;
9851 pOutputSamples[i*8+7] = (drflac_int16)right3;
9852 }
9853
9854 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9855 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9856 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9857 drflac_uint32 right = left - side;
9858
9859 left >>= 16;
9860 right >>= 16;
9861
9862 pOutputSamples[i*2+0] = (drflac_int16)left;
9863 pOutputSamples[i*2+1] = (drflac_int16)right;
9864 }
9865}
9866
9867#if defined(DRFLAC_SUPPORT_SSE2)
9868static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9869{
9870 drflac_uint64 i;
9871 drflac_uint64 frameCount4 = frameCount >> 2;
9872 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9873 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9874 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9875 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9876
9877 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9878
9879 for (i = 0; i < frameCount4; ++i) {
9880 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9881 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9882 __m128i right = _mm_sub_epi32(left, side);
9883
9884 left = _mm_srai_epi32(left, 16);
9885 right = _mm_srai_epi32(right, 16);
9886
9887 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
9888 }
9889
9890 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9891 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9892 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9893 drflac_uint32 right = left - side;
9894
9895 left >>= 16;
9896 right >>= 16;
9897
9898 pOutputSamples[i*2+0] = (drflac_int16)left;
9899 pOutputSamples[i*2+1] = (drflac_int16)right;
9900 }
9901}
9902#endif
9903
9904#if defined(DRFLAC_SUPPORT_NEON)
9905static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9906{
9907 drflac_uint64 i;
9908 drflac_uint64 frameCount4 = frameCount >> 2;
9909 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9910 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9911 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9912 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9913 int32x4_t shift0_4;
9914 int32x4_t shift1_4;
9915
9916 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9917
9918 shift0_4 = vdupq_n_s32(shift0);
9919 shift1_4 = vdupq_n_s32(shift1);
9920
9921 for (i = 0; i < frameCount4; ++i) {
9922 uint32x4_t left;
9923 uint32x4_t side;
9924 uint32x4_t right;
9925
9926 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9927 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9928 right = vsubq_u32(left, side);
9929
9930 left = vshrq_n_u32(left, 16);
9931 right = vshrq_n_u32(right, 16);
9932
9933 drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
9934 }
9935
9936 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9937 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9938 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9939 drflac_uint32 right = left - side;
9940
9941 left >>= 16;
9942 right >>= 16;
9943
9944 pOutputSamples[i*2+0] = (drflac_int16)left;
9945 pOutputSamples[i*2+1] = (drflac_int16)right;
9946 }
9947}
9948#endif
9949
9950static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9951{
9952#if defined(DRFLAC_SUPPORT_SSE2)
9953 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9954 drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9955 } else
9956#elif defined(DRFLAC_SUPPORT_NEON)
9957 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9958 drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9959 } else
9960#endif
9961 {
9962 /* Scalar fallback. */
9e052883 9963#if 0
9964 drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9965#else
2ff0b512 9966 drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 9967#endif
2ff0b512 9968 }
9969}
9970
9971
9e052883 9972#if 0
9973static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9974{
9975 drflac_uint64 i;
9976 for (i = 0; i < frameCount; ++i) {
9977 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9978 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9979 drflac_uint32 left = right + side;
9980
9981 left >>= 16;
9982 right >>= 16;
9983
9984 pOutputSamples[i*2+0] = (drflac_int16)left;
9985 pOutputSamples[i*2+1] = (drflac_int16)right;
9986 }
9987}
9988#endif
9989
2ff0b512 9990static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9991{
9992 drflac_uint64 i;
9993 drflac_uint64 frameCount4 = frameCount >> 2;
9994 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9995 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9996 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9997 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9998
9999 for (i = 0; i < frameCount4; ++i) {
10000 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
10001 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
10002 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
10003 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
10004
10005 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
10006 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
10007 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
10008 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
10009
10010 drflac_uint32 left0 = right0 + side0;
10011 drflac_uint32 left1 = right1 + side1;
10012 drflac_uint32 left2 = right2 + side2;
10013 drflac_uint32 left3 = right3 + side3;
10014
10015 left0 >>= 16;
10016 left1 >>= 16;
10017 left2 >>= 16;
10018 left3 >>= 16;
10019
10020 right0 >>= 16;
10021 right1 >>= 16;
10022 right2 >>= 16;
10023 right3 >>= 16;
10024
10025 pOutputSamples[i*8+0] = (drflac_int16)left0;
10026 pOutputSamples[i*8+1] = (drflac_int16)right0;
10027 pOutputSamples[i*8+2] = (drflac_int16)left1;
10028 pOutputSamples[i*8+3] = (drflac_int16)right1;
10029 pOutputSamples[i*8+4] = (drflac_int16)left2;
10030 pOutputSamples[i*8+5] = (drflac_int16)right2;
10031 pOutputSamples[i*8+6] = (drflac_int16)left3;
10032 pOutputSamples[i*8+7] = (drflac_int16)right3;
10033 }
10034
10035 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10036 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10037 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10038 drflac_uint32 left = right + side;
10039
10040 left >>= 16;
10041 right >>= 16;
10042
10043 pOutputSamples[i*2+0] = (drflac_int16)left;
10044 pOutputSamples[i*2+1] = (drflac_int16)right;
10045 }
10046}
10047
10048#if defined(DRFLAC_SUPPORT_SSE2)
10049static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10050{
10051 drflac_uint64 i;
10052 drflac_uint64 frameCount4 = frameCount >> 2;
10053 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10054 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10055 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10056 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10057
10058 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10059
10060 for (i = 0; i < frameCount4; ++i) {
10061 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10062 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10063 __m128i left = _mm_add_epi32(right, side);
10064
10065 left = _mm_srai_epi32(left, 16);
10066 right = _mm_srai_epi32(right, 16);
10067
10068 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10069 }
10070
10071 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10072 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10073 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10074 drflac_uint32 left = right + side;
10075
10076 left >>= 16;
10077 right >>= 16;
10078
10079 pOutputSamples[i*2+0] = (drflac_int16)left;
10080 pOutputSamples[i*2+1] = (drflac_int16)right;
10081 }
10082}
10083#endif
10084
10085#if defined(DRFLAC_SUPPORT_NEON)
10086static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10087{
10088 drflac_uint64 i;
10089 drflac_uint64 frameCount4 = frameCount >> 2;
10090 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10091 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10092 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10093 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10094 int32x4_t shift0_4;
10095 int32x4_t shift1_4;
10096
10097 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10098
10099 shift0_4 = vdupq_n_s32(shift0);
10100 shift1_4 = vdupq_n_s32(shift1);
10101
10102 for (i = 0; i < frameCount4; ++i) {
10103 uint32x4_t side;
10104 uint32x4_t right;
10105 uint32x4_t left;
10106
10107 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10108 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10109 left = vaddq_u32(right, side);
10110
10111 left = vshrq_n_u32(left, 16);
10112 right = vshrq_n_u32(right, 16);
10113
10114 drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
10115 }
10116
10117 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10118 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10119 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10120 drflac_uint32 left = right + side;
10121
10122 left >>= 16;
10123 right >>= 16;
10124
10125 pOutputSamples[i*2+0] = (drflac_int16)left;
10126 pOutputSamples[i*2+1] = (drflac_int16)right;
10127 }
10128}
10129#endif
10130
10131static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10132{
10133#if defined(DRFLAC_SUPPORT_SSE2)
10134 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10135 drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10136 } else
10137#elif defined(DRFLAC_SUPPORT_NEON)
10138 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10139 drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10140 } else
10141#endif
10142 {
10143 /* Scalar fallback. */
9e052883 10144#if 0
10145 drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10146#else
2ff0b512 10147 drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 10148#endif
2ff0b512 10149 }
10150}
10151
10152
9e052883 10153#if 0
10154static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10155{
10156 for (drflac_uint64 i = 0; i < frameCount; ++i) {
10157 drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10158 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10159
10160 mid = (mid << 1) | (side & 0x01);
10161
10162 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
10163 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
10164 }
10165}
10166#endif
2ff0b512 10167
10168static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10169{
10170 drflac_uint64 i;
10171 drflac_uint64 frameCount4 = frameCount >> 2;
10172 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10173 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10174 drflac_uint32 shift = unusedBitsPerSample;
10175
10176 if (shift > 0) {
10177 shift -= 1;
10178 for (i = 0; i < frameCount4; ++i) {
10179 drflac_uint32 temp0L;
10180 drflac_uint32 temp1L;
10181 drflac_uint32 temp2L;
10182 drflac_uint32 temp3L;
10183 drflac_uint32 temp0R;
10184 drflac_uint32 temp1R;
10185 drflac_uint32 temp2R;
10186 drflac_uint32 temp3R;
10187
10188 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10189 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10190 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10191 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10192
10193 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10194 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10195 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10196 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10197
10198 mid0 = (mid0 << 1) | (side0 & 0x01);
10199 mid1 = (mid1 << 1) | (side1 & 0x01);
10200 mid2 = (mid2 << 1) | (side2 & 0x01);
10201 mid3 = (mid3 << 1) | (side3 & 0x01);
10202
10203 temp0L = (mid0 + side0) << shift;
10204 temp1L = (mid1 + side1) << shift;
10205 temp2L = (mid2 + side2) << shift;
10206 temp3L = (mid3 + side3) << shift;
10207
10208 temp0R = (mid0 - side0) << shift;
10209 temp1R = (mid1 - side1) << shift;
10210 temp2R = (mid2 - side2) << shift;
10211 temp3R = (mid3 - side3) << shift;
10212
10213 temp0L >>= 16;
10214 temp1L >>= 16;
10215 temp2L >>= 16;
10216 temp3L >>= 16;
10217
10218 temp0R >>= 16;
10219 temp1R >>= 16;
10220 temp2R >>= 16;
10221 temp3R >>= 16;
10222
10223 pOutputSamples[i*8+0] = (drflac_int16)temp0L;
10224 pOutputSamples[i*8+1] = (drflac_int16)temp0R;
10225 pOutputSamples[i*8+2] = (drflac_int16)temp1L;
10226 pOutputSamples[i*8+3] = (drflac_int16)temp1R;
10227 pOutputSamples[i*8+4] = (drflac_int16)temp2L;
10228 pOutputSamples[i*8+5] = (drflac_int16)temp2R;
10229 pOutputSamples[i*8+6] = (drflac_int16)temp3L;
10230 pOutputSamples[i*8+7] = (drflac_int16)temp3R;
10231 }
10232 } else {
10233 for (i = 0; i < frameCount4; ++i) {
10234 drflac_uint32 temp0L;
10235 drflac_uint32 temp1L;
10236 drflac_uint32 temp2L;
10237 drflac_uint32 temp3L;
10238 drflac_uint32 temp0R;
10239 drflac_uint32 temp1R;
10240 drflac_uint32 temp2R;
10241 drflac_uint32 temp3R;
10242
10243 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10244 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10245 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10246 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10247
10248 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10249 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10250 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10251 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10252
10253 mid0 = (mid0 << 1) | (side0 & 0x01);
10254 mid1 = (mid1 << 1) | (side1 & 0x01);
10255 mid2 = (mid2 << 1) | (side2 & 0x01);
10256 mid3 = (mid3 << 1) | (side3 & 0x01);
10257
10258 temp0L = ((drflac_int32)(mid0 + side0) >> 1);
10259 temp1L = ((drflac_int32)(mid1 + side1) >> 1);
10260 temp2L = ((drflac_int32)(mid2 + side2) >> 1);
10261 temp3L = ((drflac_int32)(mid3 + side3) >> 1);
10262
10263 temp0R = ((drflac_int32)(mid0 - side0) >> 1);
10264 temp1R = ((drflac_int32)(mid1 - side1) >> 1);
10265 temp2R = ((drflac_int32)(mid2 - side2) >> 1);
10266 temp3R = ((drflac_int32)(mid3 - side3) >> 1);
10267
10268 temp0L >>= 16;
10269 temp1L >>= 16;
10270 temp2L >>= 16;
10271 temp3L >>= 16;
10272
10273 temp0R >>= 16;
10274 temp1R >>= 16;
10275 temp2R >>= 16;
10276 temp3R >>= 16;
10277
10278 pOutputSamples[i*8+0] = (drflac_int16)temp0L;
10279 pOutputSamples[i*8+1] = (drflac_int16)temp0R;
10280 pOutputSamples[i*8+2] = (drflac_int16)temp1L;
10281 pOutputSamples[i*8+3] = (drflac_int16)temp1R;
10282 pOutputSamples[i*8+4] = (drflac_int16)temp2L;
10283 pOutputSamples[i*8+5] = (drflac_int16)temp2R;
10284 pOutputSamples[i*8+6] = (drflac_int16)temp3L;
10285 pOutputSamples[i*8+7] = (drflac_int16)temp3R;
10286 }
10287 }
10288
10289 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10290 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10291 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10292
10293 mid = (mid << 1) | (side & 0x01);
10294
10295 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
10296 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
10297 }
10298}
10299
10300#if defined(DRFLAC_SUPPORT_SSE2)
10301static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10302{
10303 drflac_uint64 i;
10304 drflac_uint64 frameCount4 = frameCount >> 2;
10305 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10306 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10307 drflac_uint32 shift = unusedBitsPerSample;
10308
10309 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10310
10311 if (shift == 0) {
10312 for (i = 0; i < frameCount4; ++i) {
10313 __m128i mid;
10314 __m128i side;
10315 __m128i left;
10316 __m128i right;
10317
10318 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10319 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10320
10321 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10322
10323 left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
10324 right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
10325
10326 left = _mm_srai_epi32(left, 16);
10327 right = _mm_srai_epi32(right, 16);
10328
10329 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10330 }
10331
10332 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10333 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10334 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10335
10336 mid = (mid << 1) | (side & 0x01);
10337
10338 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
10339 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
10340 }
10341 } else {
10342 shift -= 1;
10343 for (i = 0; i < frameCount4; ++i) {
10344 __m128i mid;
10345 __m128i side;
10346 __m128i left;
10347 __m128i right;
10348
10349 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10350 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10351
10352 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10353
10354 left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
10355 right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
10356
10357 left = _mm_srai_epi32(left, 16);
10358 right = _mm_srai_epi32(right, 16);
10359
10360 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10361 }
10362
10363 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10364 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10365 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10366
10367 mid = (mid << 1) | (side & 0x01);
10368
10369 pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
10370 pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
10371 }
10372 }
10373}
10374#endif
10375
10376#if defined(DRFLAC_SUPPORT_NEON)
10377static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10378{
10379 drflac_uint64 i;
10380 drflac_uint64 frameCount4 = frameCount >> 2;
10381 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10382 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10383 drflac_uint32 shift = unusedBitsPerSample;
10384 int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
10385 int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
10386
10387 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10388
10389 wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10390 wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10391
10392 if (shift == 0) {
10393 for (i = 0; i < frameCount4; ++i) {
10394 uint32x4_t mid;
10395 uint32x4_t side;
10396 int32x4_t left;
10397 int32x4_t right;
10398
10399 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
10400 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
10401
10402 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10403
10404 left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
10405 right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
10406
10407 left = vshrq_n_s32(left, 16);
10408 right = vshrq_n_s32(right, 16);
10409
10410 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10411 }
10412
10413 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10414 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10415 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10416
10417 mid = (mid << 1) | (side & 0x01);
10418
10419 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
10420 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
10421 }
10422 } else {
10423 int32x4_t shift4;
10424
10425 shift -= 1;
10426 shift4 = vdupq_n_s32(shift);
10427
10428 for (i = 0; i < frameCount4; ++i) {
10429 uint32x4_t mid;
10430 uint32x4_t side;
10431 int32x4_t left;
10432 int32x4_t right;
10433
10434 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
10435 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
10436
10437 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10438
10439 left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
10440 right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
10441
10442 left = vshrq_n_s32(left, 16);
10443 right = vshrq_n_s32(right, 16);
10444
10445 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10446 }
10447
10448 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10449 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10450 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10451
10452 mid = (mid << 1) | (side & 0x01);
10453
10454 pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
10455 pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
10456 }
10457 }
10458}
10459#endif
10460
10461static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10462{
10463#if defined(DRFLAC_SUPPORT_SSE2)
10464 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10465 drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10466 } else
10467#elif defined(DRFLAC_SUPPORT_NEON)
10468 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10469 drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10470 } else
10471#endif
10472 {
10473 /* Scalar fallback. */
9e052883 10474#if 0
10475 drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10476#else
2ff0b512 10477 drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 10478#endif
10479 }
10480}
10481
10482
10483#if 0
10484static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10485{
10486 for (drflac_uint64 i = 0; i < frameCount; ++i) {
10487 pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
10488 pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
2ff0b512 10489 }
10490}
9e052883 10491#endif
2ff0b512 10492
10493static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10494{
10495 drflac_uint64 i;
10496 drflac_uint64 frameCount4 = frameCount >> 2;
10497 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10498 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10499 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10500 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10501
10502 for (i = 0; i < frameCount4; ++i) {
10503 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
10504 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
10505 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
10506 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
10507
10508 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
10509 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
10510 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
10511 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
10512
10513 tempL0 >>= 16;
10514 tempL1 >>= 16;
10515 tempL2 >>= 16;
10516 tempL3 >>= 16;
10517
10518 tempR0 >>= 16;
10519 tempR1 >>= 16;
10520 tempR2 >>= 16;
10521 tempR3 >>= 16;
10522
10523 pOutputSamples[i*8+0] = (drflac_int16)tempL0;
10524 pOutputSamples[i*8+1] = (drflac_int16)tempR0;
10525 pOutputSamples[i*8+2] = (drflac_int16)tempL1;
10526 pOutputSamples[i*8+3] = (drflac_int16)tempR1;
10527 pOutputSamples[i*8+4] = (drflac_int16)tempL2;
10528 pOutputSamples[i*8+5] = (drflac_int16)tempR2;
10529 pOutputSamples[i*8+6] = (drflac_int16)tempL3;
10530 pOutputSamples[i*8+7] = (drflac_int16)tempR3;
10531 }
10532
10533 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10534 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10535 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10536 }
10537}
10538
10539#if defined(DRFLAC_SUPPORT_SSE2)
10540static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10541{
10542 drflac_uint64 i;
10543 drflac_uint64 frameCount4 = frameCount >> 2;
10544 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10545 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10546 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10547 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10548
10549 for (i = 0; i < frameCount4; ++i) {
10550 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10551 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10552
10553 left = _mm_srai_epi32(left, 16);
10554 right = _mm_srai_epi32(right, 16);
10555
10556 /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store the in the output buffer. */
10557 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10558 }
10559
10560 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10561 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10562 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10563 }
10564}
10565#endif
10566
10567#if defined(DRFLAC_SUPPORT_NEON)
10568static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10569{
10570 drflac_uint64 i;
10571 drflac_uint64 frameCount4 = frameCount >> 2;
10572 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10573 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10574 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10575 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10576
10577 int32x4_t shift0_4 = vdupq_n_s32(shift0);
10578 int32x4_t shift1_4 = vdupq_n_s32(shift1);
10579
10580 for (i = 0; i < frameCount4; ++i) {
10581 int32x4_t left;
10582 int32x4_t right;
10583
10584 left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
10585 right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
10586
10587 left = vshrq_n_s32(left, 16);
10588 right = vshrq_n_s32(right, 16);
10589
10590 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10591 }
10592
10593 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10594 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10595 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10596 }
10597}
10598#endif
10599
10600static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10601{
10602#if defined(DRFLAC_SUPPORT_SSE2)
10603 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10604 drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10605 } else
10606#elif defined(DRFLAC_SUPPORT_NEON)
10607 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10608 drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10609 } else
10610#endif
10611 {
10612 /* Scalar fallback. */
9e052883 10613#if 0
10614 drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10615#else
2ff0b512 10616 drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 10617#endif
2ff0b512 10618 }
10619}
10620
10621DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
10622{
10623 drflac_uint64 framesRead;
10624 drflac_uint32 unusedBitsPerSample;
10625
10626 if (pFlac == NULL || framesToRead == 0) {
10627 return 0;
10628 }
10629
10630 if (pBufferOut == NULL) {
10631 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
10632 }
10633
10634 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
10635 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
10636
10637 framesRead = 0;
10638 while (framesToRead > 0) {
10639 /* If we've run out of samples in this frame, go to the next. */
10640 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
10641 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
10642 break; /* Couldn't read the next frame, so just break from the loop and return. */
10643 }
10644 } else {
10645 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
10646 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
10647 drflac_uint64 frameCountThisIteration = framesToRead;
10648
10649 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
10650 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
10651 }
10652
10653 if (channelCount == 2) {
10654 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
10655 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
10656
10657 switch (pFlac->currentFLACFrame.header.channelAssignment)
10658 {
10659 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
10660 {
10661 drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10662 } break;
10663
10664 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
10665 {
10666 drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10667 } break;
10668
10669 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
10670 {
10671 drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10672 } break;
10673
10674 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
10675 default:
10676 {
10677 drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10678 } break;
10679 }
10680 } else {
10681 /* Generic interleaving. */
10682 drflac_uint64 i;
10683 for (i = 0; i < frameCountThisIteration; ++i) {
10684 unsigned int j;
10685 for (j = 0; j < channelCount; ++j) {
10686 drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
10687 pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
10688 }
10689 }
10690 }
10691
10692 framesRead += frameCountThisIteration;
10693 pBufferOut += frameCountThisIteration * channelCount;
10694 framesToRead -= frameCountThisIteration;
10695 pFlac->currentPCMFrame += frameCountThisIteration;
10696 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
10697 }
10698 }
10699
10700 return framesRead;
10701}
10702
9e052883 10703
10704#if 0
10705static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10706{
10707 drflac_uint64 i;
10708 for (i = 0; i < frameCount; ++i) {
10709 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10710 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10711 drflac_uint32 right = left - side;
10712
10713 pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
10714 pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
10715 }
10716}
10717#endif
10718
2ff0b512 10719static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10720{
10721 drflac_uint64 i;
10722 drflac_uint64 frameCount4 = frameCount >> 2;
10723 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10724 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10725 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10726 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10727
10728 float factor = 1 / 2147483648.0;
10729
10730 for (i = 0; i < frameCount4; ++i) {
10731 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
10732 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
10733 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
10734 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
10735
10736 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
10737 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
10738 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
10739 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
10740
10741 drflac_uint32 right0 = left0 - side0;
10742 drflac_uint32 right1 = left1 - side1;
10743 drflac_uint32 right2 = left2 - side2;
10744 drflac_uint32 right3 = left3 - side3;
10745
10746 pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
10747 pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10748 pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
10749 pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10750 pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
10751 pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10752 pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
10753 pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10754 }
10755
10756 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10757 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10758 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10759 drflac_uint32 right = left - side;
10760
10761 pOutputSamples[i*2+0] = (drflac_int32)left * factor;
10762 pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10763 }
10764}
10765
10766#if defined(DRFLAC_SUPPORT_SSE2)
10767static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10768{
10769 drflac_uint64 i;
10770 drflac_uint64 frameCount4 = frameCount >> 2;
10771 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10772 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10773 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10774 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10775 __m128 factor;
10776
10777 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10778
10779 factor = _mm_set1_ps(1.0f / 8388608.0f);
10780
10781 for (i = 0; i < frameCount4; ++i) {
10782 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10783 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10784 __m128i right = _mm_sub_epi32(left, side);
10785 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10786 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10787
10788 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10789 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10790 }
10791
10792 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10793 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10794 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10795 drflac_uint32 right = left - side;
10796
10797 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10798 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10799 }
10800}
10801#endif
10802
10803#if defined(DRFLAC_SUPPORT_NEON)
10804static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10805{
10806 drflac_uint64 i;
10807 drflac_uint64 frameCount4 = frameCount >> 2;
10808 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10809 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10810 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10811 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10812 float32x4_t factor4;
10813 int32x4_t shift0_4;
10814 int32x4_t shift1_4;
10815
10816 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10817
10818 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10819 shift0_4 = vdupq_n_s32(shift0);
10820 shift1_4 = vdupq_n_s32(shift1);
10821
10822 for (i = 0; i < frameCount4; ++i) {
10823 uint32x4_t left;
10824 uint32x4_t side;
10825 uint32x4_t right;
10826 float32x4_t leftf;
10827 float32x4_t rightf;
10828
10829 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10830 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10831 right = vsubq_u32(left, side);
10832 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10833 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10834
10835 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10836 }
10837
10838 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10839 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10840 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10841 drflac_uint32 right = left - side;
10842
10843 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10844 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10845 }
10846}
10847#endif
10848
10849static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10850{
10851#if defined(DRFLAC_SUPPORT_SSE2)
10852 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10853 drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10854 } else
10855#elif defined(DRFLAC_SUPPORT_NEON)
10856 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10857 drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10858 } else
10859#endif
10860 {
10861 /* Scalar fallback. */
9e052883 10862#if 0
10863 drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10864#else
2ff0b512 10865 drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 10866#endif
2ff0b512 10867 }
10868}
10869
10870
9e052883 10871#if 0
10872static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10873{
10874 drflac_uint64 i;
10875 for (i = 0; i < frameCount; ++i) {
10876 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10877 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10878 drflac_uint32 left = right + side;
10879
10880 pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
10881 pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
10882 }
10883}
10884#endif
10885
2ff0b512 10886static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10887{
10888 drflac_uint64 i;
10889 drflac_uint64 frameCount4 = frameCount >> 2;
10890 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10891 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10892 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10893 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10894 float factor = 1 / 2147483648.0;
10895
10896 for (i = 0; i < frameCount4; ++i) {
10897 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
10898 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
10899 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
10900 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
10901
10902 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
10903 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
10904 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
10905 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
10906
10907 drflac_uint32 left0 = right0 + side0;
10908 drflac_uint32 left1 = right1 + side1;
10909 drflac_uint32 left2 = right2 + side2;
10910 drflac_uint32 left3 = right3 + side3;
10911
10912 pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
10913 pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10914 pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
10915 pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10916 pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
10917 pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10918 pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
10919 pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10920 }
10921
10922 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10923 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10924 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10925 drflac_uint32 left = right + side;
10926
10927 pOutputSamples[i*2+0] = (drflac_int32)left * factor;
10928 pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10929 }
10930}
10931
10932#if defined(DRFLAC_SUPPORT_SSE2)
10933static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10934{
10935 drflac_uint64 i;
10936 drflac_uint64 frameCount4 = frameCount >> 2;
10937 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10938 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10939 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10940 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10941 __m128 factor;
10942
10943 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10944
10945 factor = _mm_set1_ps(1.0f / 8388608.0f);
10946
10947 for (i = 0; i < frameCount4; ++i) {
10948 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10949 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10950 __m128i left = _mm_add_epi32(right, side);
10951 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10952 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10953
10954 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10955 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10956 }
10957
10958 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10959 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10960 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10961 drflac_uint32 left = right + side;
10962
10963 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10964 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10965 }
10966}
10967#endif
10968
10969#if defined(DRFLAC_SUPPORT_NEON)
10970static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10971{
10972 drflac_uint64 i;
10973 drflac_uint64 frameCount4 = frameCount >> 2;
10974 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10975 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10976 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10977 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10978 float32x4_t factor4;
10979 int32x4_t shift0_4;
10980 int32x4_t shift1_4;
10981
10982 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10983
10984 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10985 shift0_4 = vdupq_n_s32(shift0);
10986 shift1_4 = vdupq_n_s32(shift1);
10987
10988 for (i = 0; i < frameCount4; ++i) {
10989 uint32x4_t side;
10990 uint32x4_t right;
10991 uint32x4_t left;
10992 float32x4_t leftf;
10993 float32x4_t rightf;
10994
10995 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10996 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10997 left = vaddq_u32(right, side);
10998 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10999 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
11000
11001 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11002 }
11003
11004 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11005 drflac_uint32 side = pInputSamples0U32[i] << shift0;
11006 drflac_uint32 right = pInputSamples1U32[i] << shift1;
11007 drflac_uint32 left = right + side;
11008
11009 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
11010 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
11011 }
11012}
11013#endif
11014
11015static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11016{
11017#if defined(DRFLAC_SUPPORT_SSE2)
11018 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11019 drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11020 } else
11021#elif defined(DRFLAC_SUPPORT_NEON)
11022 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11023 drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11024 } else
11025#endif
11026 {
11027 /* Scalar fallback. */
9e052883 11028#if 0
11029 drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11030#else
2ff0b512 11031 drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 11032#endif
11033 }
11034}
11035
11036
11037#if 0
11038static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11039{
11040 for (drflac_uint64 i = 0; i < frameCount; ++i) {
11041 drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11042 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11043
11044 mid = (mid << 1) | (side & 0x01);
11045
11046 pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
11047 pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
2ff0b512 11048 }
11049}
9e052883 11050#endif
2ff0b512 11051
11052static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11053{
11054 drflac_uint64 i;
11055 drflac_uint64 frameCount4 = frameCount >> 2;
11056 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11057 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11058 drflac_uint32 shift = unusedBitsPerSample;
11059 float factor = 1 / 2147483648.0;
11060
11061 if (shift > 0) {
11062 shift -= 1;
11063 for (i = 0; i < frameCount4; ++i) {
11064 drflac_uint32 temp0L;
11065 drflac_uint32 temp1L;
11066 drflac_uint32 temp2L;
11067 drflac_uint32 temp3L;
11068 drflac_uint32 temp0R;
11069 drflac_uint32 temp1R;
11070 drflac_uint32 temp2R;
11071 drflac_uint32 temp3R;
11072
11073 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11074 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11075 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11076 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11077
11078 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11079 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11080 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11081 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11082
11083 mid0 = (mid0 << 1) | (side0 & 0x01);
11084 mid1 = (mid1 << 1) | (side1 & 0x01);
11085 mid2 = (mid2 << 1) | (side2 & 0x01);
11086 mid3 = (mid3 << 1) | (side3 & 0x01);
11087
11088 temp0L = (mid0 + side0) << shift;
11089 temp1L = (mid1 + side1) << shift;
11090 temp2L = (mid2 + side2) << shift;
11091 temp3L = (mid3 + side3) << shift;
11092
11093 temp0R = (mid0 - side0) << shift;
11094 temp1R = (mid1 - side1) << shift;
11095 temp2R = (mid2 - side2) << shift;
11096 temp3R = (mid3 - side3) << shift;
11097
11098 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
11099 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
11100 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
11101 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
11102 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
11103 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
11104 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
11105 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
11106 }
11107 } else {
11108 for (i = 0; i < frameCount4; ++i) {
11109 drflac_uint32 temp0L;
11110 drflac_uint32 temp1L;
11111 drflac_uint32 temp2L;
11112 drflac_uint32 temp3L;
11113 drflac_uint32 temp0R;
11114 drflac_uint32 temp1R;
11115 drflac_uint32 temp2R;
11116 drflac_uint32 temp3R;
11117
11118 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11119 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11120 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11121 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11122
11123 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11124 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11125 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11126 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11127
11128 mid0 = (mid0 << 1) | (side0 & 0x01);
11129 mid1 = (mid1 << 1) | (side1 & 0x01);
11130 mid2 = (mid2 << 1) | (side2 & 0x01);
11131 mid3 = (mid3 << 1) | (side3 & 0x01);
11132
11133 temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
11134 temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
11135 temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
11136 temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
11137
11138 temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
11139 temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
11140 temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
11141 temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
11142
11143 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
11144 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
11145 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
11146 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
11147 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
11148 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
11149 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
11150 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
11151 }
11152 }
11153
11154 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11155 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11156 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11157
11158 mid = (mid << 1) | (side & 0x01);
11159
11160 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
11161 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
11162 }
11163}
11164
11165#if defined(DRFLAC_SUPPORT_SSE2)
11166static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11167{
11168 drflac_uint64 i;
11169 drflac_uint64 frameCount4 = frameCount >> 2;
11170 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11171 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11172 drflac_uint32 shift = unusedBitsPerSample - 8;
11173 float factor;
11174 __m128 factor128;
11175
11176 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
11177
11178 factor = 1.0f / 8388608.0f;
11179 factor128 = _mm_set1_ps(factor);
11180
11181 if (shift == 0) {
11182 for (i = 0; i < frameCount4; ++i) {
11183 __m128i mid;
11184 __m128i side;
11185 __m128i tempL;
11186 __m128i tempR;
11187 __m128 leftf;
11188 __m128 rightf;
11189
11190 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11191 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11192
11193 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
11194
11195 tempL = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
11196 tempR = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
11197
11198 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
11199 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
11200
11201 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11202 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11203 }
11204
11205 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11206 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11207 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11208
11209 mid = (mid << 1) | (side & 0x01);
11210
11211 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
11212 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
11213 }
11214 } else {
11215 shift -= 1;
11216 for (i = 0; i < frameCount4; ++i) {
11217 __m128i mid;
11218 __m128i side;
11219 __m128i tempL;
11220 __m128i tempR;
11221 __m128 leftf;
11222 __m128 rightf;
11223
11224 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11225 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11226
11227 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
11228
11229 tempL = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
11230 tempR = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
11231
11232 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
11233 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
11234
11235 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11236 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11237 }
11238
11239 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11240 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11241 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11242
11243 mid = (mid << 1) | (side & 0x01);
11244
11245 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11246 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11247 }
11248 }
11249}
11250#endif
11251
11252#if defined(DRFLAC_SUPPORT_NEON)
11253static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11254{
11255 drflac_uint64 i;
11256 drflac_uint64 frameCount4 = frameCount >> 2;
11257 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11258 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11259 drflac_uint32 shift = unusedBitsPerSample - 8;
11260 float factor;
11261 float32x4_t factor4;
11262 int32x4_t shift4;
11263 int32x4_t wbps0_4; /* Wasted Bits Per Sample */
11264 int32x4_t wbps1_4; /* Wasted Bits Per Sample */
11265
11266 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
11267
11268 factor = 1.0f / 8388608.0f;
11269 factor4 = vdupq_n_f32(factor);
11270 wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11271 wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11272
11273 if (shift == 0) {
11274 for (i = 0; i < frameCount4; ++i) {
11275 int32x4_t lefti;
11276 int32x4_t righti;
11277 float32x4_t leftf;
11278 float32x4_t rightf;
11279
11280 uint32x4_t mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11281 uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11282
11283 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11284
11285 lefti = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
11286 righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
11287
11288 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11289 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11290
11291 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11292 }
11293
11294 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11295 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11296 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11297
11298 mid = (mid << 1) | (side & 0x01);
11299
11300 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
11301 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
11302 }
11303 } else {
11304 shift -= 1;
11305 shift4 = vdupq_n_s32(shift);
11306 for (i = 0; i < frameCount4; ++i) {
11307 uint32x4_t mid;
11308 uint32x4_t side;
11309 int32x4_t lefti;
11310 int32x4_t righti;
11311 float32x4_t leftf;
11312 float32x4_t rightf;
11313
11314 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11315 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11316
11317 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11318
11319 lefti = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
11320 righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
11321
11322 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11323 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11324
11325 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11326 }
11327
11328 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11329 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11330 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11331
11332 mid = (mid << 1) | (side & 0x01);
11333
11334 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11335 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11336 }
11337 }
11338}
11339#endif
11340
11341static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11342{
11343#if defined(DRFLAC_SUPPORT_SSE2)
11344 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11345 drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11346 } else
11347#elif defined(DRFLAC_SUPPORT_NEON)
11348 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11349 drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11350 } else
11351#endif
11352 {
11353 /* Scalar fallback. */
9e052883 11354#if 0
11355 drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11356#else
2ff0b512 11357 drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 11358#endif
11359 }
11360}
11361
11362#if 0
11363static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11364{
11365 for (drflac_uint64 i = 0; i < frameCount; ++i) {
11366 pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
11367 pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
2ff0b512 11368 }
11369}
9e052883 11370#endif
2ff0b512 11371
11372static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11373{
11374 drflac_uint64 i;
11375 drflac_uint64 frameCount4 = frameCount >> 2;
11376 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11377 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11378 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11379 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11380 float factor = 1 / 2147483648.0;
11381
11382 for (i = 0; i < frameCount4; ++i) {
11383 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
11384 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
11385 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
11386 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
11387
11388 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
11389 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
11390 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
11391 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
11392
11393 pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
11394 pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
11395 pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
11396 pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
11397 pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
11398 pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
11399 pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
11400 pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
11401 }
11402
11403 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11404 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11405 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11406 }
11407}
11408
11409#if defined(DRFLAC_SUPPORT_SSE2)
11410static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11411{
11412 drflac_uint64 i;
11413 drflac_uint64 frameCount4 = frameCount >> 2;
11414 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11415 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11416 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11417 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11418
11419 float factor = 1.0f / 8388608.0f;
11420 __m128 factor128 = _mm_set1_ps(factor);
11421
11422 for (i = 0; i < frameCount4; ++i) {
11423 __m128i lefti;
11424 __m128i righti;
11425 __m128 leftf;
11426 __m128 rightf;
11427
11428 lefti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
11429 righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
11430
11431 leftf = _mm_mul_ps(_mm_cvtepi32_ps(lefti), factor128);
11432 rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);
11433
11434 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11435 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11436 }
11437
11438 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11439 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11440 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11441 }
11442}
11443#endif
11444
11445#if defined(DRFLAC_SUPPORT_NEON)
11446static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11447{
11448 drflac_uint64 i;
11449 drflac_uint64 frameCount4 = frameCount >> 2;
11450 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11451 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11452 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11453 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11454
11455 float factor = 1.0f / 8388608.0f;
11456 float32x4_t factor4 = vdupq_n_f32(factor);
11457 int32x4_t shift0_4 = vdupq_n_s32(shift0);
11458 int32x4_t shift1_4 = vdupq_n_s32(shift1);
11459
11460 for (i = 0; i < frameCount4; ++i) {
11461 int32x4_t lefti;
11462 int32x4_t righti;
11463 float32x4_t leftf;
11464 float32x4_t rightf;
11465
11466 lefti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
11467 righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
11468
11469 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11470 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11471
11472 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11473 }
11474
11475 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11476 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11477 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11478 }
11479}
11480#endif
11481
11482static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11483{
11484#if defined(DRFLAC_SUPPORT_SSE2)
11485 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11486 drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11487 } else
11488#elif defined(DRFLAC_SUPPORT_NEON)
11489 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11490 drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11491 } else
11492#endif
11493 {
11494 /* Scalar fallback. */
9e052883 11495#if 0
11496 drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11497#else
2ff0b512 11498 drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 11499#endif
2ff0b512 11500 }
11501}
11502
11503DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
11504{
11505 drflac_uint64 framesRead;
11506 drflac_uint32 unusedBitsPerSample;
11507
11508 if (pFlac == NULL || framesToRead == 0) {
11509 return 0;
11510 }
11511
11512 if (pBufferOut == NULL) {
11513 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
11514 }
11515
11516 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
11517 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
11518
11519 framesRead = 0;
11520 while (framesToRead > 0) {
11521 /* If we've run out of samples in this frame, go to the next. */
11522 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
11523 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
11524 break; /* Couldn't read the next frame, so just break from the loop and return. */
11525 }
11526 } else {
11527 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
11528 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
11529 drflac_uint64 frameCountThisIteration = framesToRead;
11530
11531 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
11532 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
11533 }
11534
11535 if (channelCount == 2) {
11536 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
11537 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
11538
11539 switch (pFlac->currentFLACFrame.header.channelAssignment)
11540 {
11541 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
11542 {
11543 drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11544 } break;
11545
11546 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
11547 {
11548 drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11549 } break;
11550
11551 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
11552 {
11553 drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11554 } break;
11555
11556 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
11557 default:
11558 {
11559 drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11560 } break;
11561 }
11562 } else {
11563 /* Generic interleaving. */
11564 drflac_uint64 i;
11565 for (i = 0; i < frameCountThisIteration; ++i) {
11566 unsigned int j;
11567 for (j = 0; j < channelCount; ++j) {
11568 drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
11569 pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
11570 }
11571 }
11572 }
11573
11574 framesRead += frameCountThisIteration;
11575 pBufferOut += frameCountThisIteration * channelCount;
11576 framesToRead -= frameCountThisIteration;
11577 pFlac->currentPCMFrame += frameCountThisIteration;
11578 pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
11579 }
11580 }
11581
11582 return framesRead;
11583}
11584
11585
11586DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
11587{
11588 if (pFlac == NULL) {
11589 return DRFLAC_FALSE;
11590 }
11591
11592 /* Don't do anything if we're already on the seek point. */
11593 if (pFlac->currentPCMFrame == pcmFrameIndex) {
11594 return DRFLAC_TRUE;
11595 }
11596
11597 /*
11598 If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
11599 when the decoder was opened.
11600 */
11601 if (pFlac->firstFLACFramePosInBytes == 0) {
11602 return DRFLAC_FALSE;
11603 }
11604
11605 if (pcmFrameIndex == 0) {
11606 pFlac->currentPCMFrame = 0;
11607 return drflac__seek_to_first_frame(pFlac);
11608 } else {
11609 drflac_bool32 wasSuccessful = DRFLAC_FALSE;
9e052883 11610 drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;
2ff0b512 11611
11612 /* Clamp the sample to the end. */
11613 if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
11614 pcmFrameIndex = pFlac->totalPCMFrameCount;
11615 }
11616
11617 /* If the target sample and the current sample are in the same frame we just move the position forward. */
11618 if (pcmFrameIndex > pFlac->currentPCMFrame) {
11619 /* Forward. */
11620 drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
11621 if (pFlac->currentFLACFrame.pcmFramesRemaining > offset) {
11622 pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
11623 pFlac->currentPCMFrame = pcmFrameIndex;
11624 return DRFLAC_TRUE;
11625 }
11626 } else {
11627 /* Backward. */
11628 drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
11629 drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
11630 drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
11631 if (currentFLACFramePCMFramesConsumed > offsetAbs) {
11632 pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
11633 pFlac->currentPCMFrame = pcmFrameIndex;
11634 return DRFLAC_TRUE;
11635 }
11636 }
11637
11638 /*
11639 Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
11640 we'll instead use Ogg's natural seeking facility.
11641 */
11642#ifndef DR_FLAC_NO_OGG
11643 if (pFlac->container == drflac_container_ogg)
11644 {
11645 wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
11646 }
11647 else
11648#endif
11649 {
11650 /* First try seeking via the seek table. If this fails, fall back to a brute force seek which is much slower. */
11651 if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
11652 wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
11653 }
11654
11655#if !defined(DR_FLAC_NO_CRC)
11656 /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
11657 if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
11658 wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
11659 }
11660#endif
11661
11662 /* Fall back to brute force if all else fails. */
11663 if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
11664 wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
11665 }
11666 }
11667
9e052883 11668 if (wasSuccessful) {
11669 pFlac->currentPCMFrame = pcmFrameIndex;
11670 } else {
11671 /* Seek failed. Try putting the decoder back to it's original state. */
11672 if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
11673 /* Failed to seek back to the original PCM frame. Fall back to 0. */
11674 drflac_seek_to_pcm_frame(pFlac, 0);
11675 }
11676 }
11677
2ff0b512 11678 return wasSuccessful;
11679 }
11680}
11681
11682
11683
11684/* High Level APIs */
11685
648db22b 11686/* SIZE_MAX */
2ff0b512 11687#if defined(SIZE_MAX)
11688 #define DRFLAC_SIZE_MAX SIZE_MAX
11689#else
11690 #if defined(DRFLAC_64BIT)
11691 #define DRFLAC_SIZE_MAX ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
11692 #else
11693 #define DRFLAC_SIZE_MAX 0xFFFFFFFF
11694 #endif
11695#endif
648db22b 11696/* End SIZE_MAX */
2ff0b512 11697
11698
11699/* Using a macro as the definition of the drflac__full_decode_and_close_*() API family. Sue me. */
11700#define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
11701static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
11702{ \
11703 type* pSampleData = NULL; \
11704 drflac_uint64 totalPCMFrameCount; \
11705 \
11706 DRFLAC_ASSERT(pFlac != NULL); \
11707 \
11708 totalPCMFrameCount = pFlac->totalPCMFrameCount; \
11709 \
11710 if (totalPCMFrameCount == 0) { \
11711 type buffer[4096]; \
11712 drflac_uint64 pcmFramesRead; \
11713 size_t sampleDataBufferSize = sizeof(buffer); \
11714 \
11715 pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks); \
11716 if (pSampleData == NULL) { \
11717 goto on_error; \
11718 } \
11719 \
11720 while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) { \
11721 if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) { \
11722 type* pNewSampleData; \
11723 size_t newSampleDataBufferSize; \
11724 \
11725 newSampleDataBufferSize = sampleDataBufferSize * 2; \
11726 pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks); \
11727 if (pNewSampleData == NULL) { \
11728 drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks); \
11729 goto on_error; \
11730 } \
11731 \
11732 sampleDataBufferSize = newSampleDataBufferSize; \
11733 pSampleData = pNewSampleData; \
11734 } \
11735 \
11736 DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type))); \
11737 totalPCMFrameCount += pcmFramesRead; \
11738 } \
11739 \
11740 /* At this point everything should be decoded, but we just want to fill the unused part buffer with silence - need to \
11741 protect those ears from random noise! */ \
11742 DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type))); \
11743 } else { \
11744 drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type); \
9e052883 11745 if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) { \
2ff0b512 11746 goto on_error; /* The decoded data is too big. */ \
11747 } \
11748 \
11749 pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks); /* <-- Safe cast as per the check above. */ \
11750 if (pSampleData == NULL) { \
11751 goto on_error; \
11752 } \
11753 \
11754 totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData); \
11755 } \
11756 \
11757 if (sampleRateOut) *sampleRateOut = pFlac->sampleRate; \
11758 if (channelsOut) *channelsOut = pFlac->channels; \
11759 if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount; \
11760 \
11761 drflac_close(pFlac); \
11762 return pSampleData; \
11763 \
11764on_error: \
11765 drflac_close(pFlac); \
11766 return NULL; \
11767}
11768
11769DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
11770DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
11771DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
11772
11773DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11774{
11775 drflac* pFlac;
11776
11777 if (channelsOut) {
11778 *channelsOut = 0;
11779 }
11780 if (sampleRateOut) {
11781 *sampleRateOut = 0;
11782 }
11783 if (totalPCMFrameCountOut) {
11784 *totalPCMFrameCountOut = 0;
11785 }
11786
11787 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11788 if (pFlac == NULL) {
11789 return NULL;
11790 }
11791
11792 return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11793}
11794
11795DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11796{
11797 drflac* pFlac;
11798
11799 if (channelsOut) {
11800 *channelsOut = 0;
11801 }
11802 if (sampleRateOut) {
11803 *sampleRateOut = 0;
11804 }
11805 if (totalPCMFrameCountOut) {
11806 *totalPCMFrameCountOut = 0;
11807 }
11808
11809 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11810 if (pFlac == NULL) {
11811 return NULL;
11812 }
11813
11814 return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11815}
11816
11817DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11818{
11819 drflac* pFlac;
11820
11821 if (channelsOut) {
11822 *channelsOut = 0;
11823 }
11824 if (sampleRateOut) {
11825 *sampleRateOut = 0;
11826 }
11827 if (totalPCMFrameCountOut) {
11828 *totalPCMFrameCountOut = 0;
11829 }
11830
11831 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11832 if (pFlac == NULL) {
11833 return NULL;
11834 }
11835
11836 return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11837}
11838
9e052883 11839#ifndef DR_FLAC_NO_STDIO
11840DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11841{
11842 drflac* pFlac;
11843
11844 if (sampleRate) {
11845 *sampleRate = 0;
11846 }
11847 if (channels) {
11848 *channels = 0;
11849 }
11850 if (totalPCMFrameCount) {
11851 *totalPCMFrameCount = 0;
11852 }
11853
11854 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11855 if (pFlac == NULL) {
11856 return NULL;
11857 }
11858
11859 return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11860}
11861
11862DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11863{
11864 drflac* pFlac;
11865
11866 if (sampleRate) {
11867 *sampleRate = 0;
11868 }
11869 if (channels) {
11870 *channels = 0;
11871 }
11872 if (totalPCMFrameCount) {
11873 *totalPCMFrameCount = 0;
11874 }
11875
11876 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11877 if (pFlac == NULL) {
11878 return NULL;
11879 }
11880
11881 return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11882}
11883
11884DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11885{
11886 drflac* pFlac;
11887
11888 if (sampleRate) {
11889 *sampleRate = 0;
11890 }
11891 if (channels) {
11892 *channels = 0;
11893 }
11894 if (totalPCMFrameCount) {
11895 *totalPCMFrameCount = 0;
11896 }
11897
11898 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11899 if (pFlac == NULL) {
11900 return NULL;
11901 }
11902
11903 return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11904}
11905#endif
11906
2ff0b512 11907DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11908{
11909 drflac* pFlac;
11910
11911 if (sampleRate) {
11912 *sampleRate = 0;
11913 }
11914 if (channels) {
11915 *channels = 0;
11916 }
11917 if (totalPCMFrameCount) {
11918 *totalPCMFrameCount = 0;
11919 }
11920
11921 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11922 if (pFlac == NULL) {
11923 return NULL;
11924 }
11925
11926 return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11927}
11928
11929DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11930{
11931 drflac* pFlac;
11932
11933 if (sampleRate) {
11934 *sampleRate = 0;
11935 }
11936 if (channels) {
11937 *channels = 0;
11938 }
11939 if (totalPCMFrameCount) {
11940 *totalPCMFrameCount = 0;
11941 }
11942
11943 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11944 if (pFlac == NULL) {
11945 return NULL;
11946 }
11947
11948 return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11949}
11950
11951DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11952{
11953 drflac* pFlac;
11954
11955 if (sampleRate) {
11956 *sampleRate = 0;
11957 }
11958 if (channels) {
11959 *channels = 0;
11960 }
11961 if (totalPCMFrameCount) {
11962 *totalPCMFrameCount = 0;
11963 }
11964
11965 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11966 if (pFlac == NULL) {
11967 return NULL;
11968 }
11969
11970 return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11971}
11972
11973
11974DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
11975{
11976 if (pAllocationCallbacks != NULL) {
11977 drflac__free_from_callbacks(p, pAllocationCallbacks);
11978 } else {
11979 drflac__free_default(p, NULL);
11980 }
11981}
11982
11983
11984
11985
11986DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
11987{
11988 if (pIter == NULL) {
11989 return;
11990 }
11991
11992 pIter->countRemaining = commentCount;
11993 pIter->pRunningData = (const char*)pComments;
11994}
11995
11996DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
11997{
11998 drflac_int32 length;
11999 const char* pComment;
12000
12001 /* Safety. */
12002 if (pCommentLengthOut) {
12003 *pCommentLengthOut = 0;
12004 }
12005
12006 if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
12007 return NULL;
12008 }
12009
9e052883 12010 length = drflac__le2host_32_ptr_unaligned(pIter->pRunningData);
2ff0b512 12011 pIter->pRunningData += 4;
12012
12013 pComment = pIter->pRunningData;
12014 pIter->pRunningData += length;
12015 pIter->countRemaining -= 1;
12016
12017 if (pCommentLengthOut) {
12018 *pCommentLengthOut = length;
12019 }
12020
12021 return pComment;
12022}
12023
12024
12025
12026
12027DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
12028{
12029 if (pIter == NULL) {
12030 return;
12031 }
12032
12033 pIter->countRemaining = trackCount;
12034 pIter->pRunningData = (const char*)pTrackData;
12035}
12036
12037DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
12038{
12039 drflac_cuesheet_track cuesheetTrack;
12040 const char* pRunningData;
12041 drflac_uint64 offsetHi;
12042 drflac_uint64 offsetLo;
12043
12044 if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
12045 return DRFLAC_FALSE;
12046 }
12047
12048 pRunningData = pIter->pRunningData;
12049
12050 offsetHi = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
12051 offsetLo = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
12052 cuesheetTrack.offset = offsetLo | (offsetHi << 32);
12053 cuesheetTrack.trackNumber = pRunningData[0]; pRunningData += 1;
12054 DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC)); pRunningData += 12;
12055 cuesheetTrack.isAudio = (pRunningData[0] & 0x80) != 0;
12056 cuesheetTrack.preEmphasis = (pRunningData[0] & 0x40) != 0; pRunningData += 14;
12057 cuesheetTrack.indexCount = pRunningData[0]; pRunningData += 1;
12058 cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData; pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);
12059
12060 pIter->pRunningData = pRunningData;
12061 pIter->countRemaining -= 1;
12062
12063 if (pCuesheetTrack) {
12064 *pCuesheetTrack = cuesheetTrack;
12065 }
12066
12067 return DRFLAC_TRUE;
12068}
12069
12070#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
12071 #pragma GCC diagnostic pop
12072#endif
12073#endif /* dr_flac_c */
12074#endif /* DR_FLAC_IMPLEMENTATION */
12075
12076
12077/*
12078REVISION HISTORY
12079================
648db22b 12080v0.12.42 - 2023-11-02
12081 - Fix build for ARMv6-M.
12082 - Fix a compilation warning with GCC.
12083
12084v0.12.41 - 2023-06-17
12085 - Fix an incorrect date in revision history. No functional change.
12086
12087v0.12.40 - 2023-05-22
12088 - Minor code restructure. No functional change.
12089
9e052883 12090v0.12.39 - 2022-09-17
12091 - Fix compilation with DJGPP.
12092 - Fix compilation error with Visual Studio 2019 and the ARM build.
12093 - Fix an error with SSE 4.1 detection.
12094 - Add support for disabling wchar_t with DR_WAV_NO_WCHAR.
12095 - Improve compatibility with compilers which lack support for explicit struct packing.
12096 - Improve compatibility with low-end and embedded hardware by reducing the amount of stack
12097 allocation when loading an Ogg encapsulated file.
12098
12099v0.12.38 - 2022-04-10
12100 - Fix compilation error on older versions of GCC.
12101
12102v0.12.37 - 2022-02-12
12103 - Improve ARM detection.
12104
12105v0.12.36 - 2022-02-07
12106 - Fix a compilation error with the ARM build.
12107
12108v0.12.35 - 2022-02-06
12109 - Fix a bug due to underestimating the amount of precision required for the prediction stage.
12110 - Fix some bugs found from fuzz testing.
12111
12112v0.12.34 - 2022-01-07
12113 - Fix some misalignment bugs when reading metadata.
12114
12115v0.12.33 - 2021-12-22
12116 - Fix a bug with seeking when the seek table does not start at PCM frame 0.
12117
12118v0.12.32 - 2021-12-11
12119 - Fix a warning with Clang.
12120
12121v0.12.31 - 2021-08-16
12122 - Silence some warnings.
12123
12124v0.12.30 - 2021-07-31
12125 - Fix platform detection for ARM64.
12126
12127v0.12.29 - 2021-04-02
12128 - Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
12129 - Fix a decoding error due to an incorrect validation check.
12130
2ff0b512 12131v0.12.28 - 2021-02-21
12132 - Fix a warning due to referencing _MSC_VER when it is undefined.
12133
12134v0.12.27 - 2021-01-31
12135 - Fix a static analysis warning.
12136
12137v0.12.26 - 2021-01-17
12138 - Fix a compilation warning due to _BSD_SOURCE being deprecated.
12139
12140v0.12.25 - 2020-12-26
12141 - Update documentation.
12142
12143v0.12.24 - 2020-11-29
12144 - Fix ARM64/NEON detection when compiling with MSVC.
12145
12146v0.12.23 - 2020-11-21
12147 - Fix compilation with OpenWatcom.
12148
12149v0.12.22 - 2020-11-01
12150 - Fix an error with the previous release.
12151
12152v0.12.21 - 2020-11-01
12153 - Fix a possible deadlock when seeking.
12154 - Improve compiler support for older versions of GCC.
12155
12156v0.12.20 - 2020-09-08
12157 - Fix a compilation error on older compilers.
12158
12159v0.12.19 - 2020-08-30
12160 - Fix a bug due to an undefined 32-bit shift.
12161
12162v0.12.18 - 2020-08-14
12163 - Fix a crash when compiling with clang-cl.
12164
12165v0.12.17 - 2020-08-02
12166 - Simplify sized types.
12167
12168v0.12.16 - 2020-07-25
12169 - Fix a compilation warning.
12170
12171v0.12.15 - 2020-07-06
12172 - Check for negative LPC shifts and return an error.
12173
12174v0.12.14 - 2020-06-23
12175 - Add include guard for the implementation section.
12176
12177v0.12.13 - 2020-05-16
12178 - Add compile-time and run-time version querying.
12179 - DRFLAC_VERSION_MINOR
12180 - DRFLAC_VERSION_MAJOR
12181 - DRFLAC_VERSION_REVISION
12182 - DRFLAC_VERSION_STRING
12183 - drflac_version()
12184 - drflac_version_string()
12185
12186v0.12.12 - 2020-04-30
12187 - Fix compilation errors with VC6.
12188
12189v0.12.11 - 2020-04-19
12190 - Fix some pedantic warnings.
12191 - Fix some undefined behaviour warnings.
12192
12193v0.12.10 - 2020-04-10
12194 - Fix some bugs when trying to seek with an invalid seek table.
12195
12196v0.12.9 - 2020-04-05
12197 - Fix warnings.
12198
12199v0.12.8 - 2020-04-04
12200 - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
12201 - Fix some static analysis warnings.
12202 - Minor documentation updates.
12203
12204v0.12.7 - 2020-03-14
12205 - Fix compilation errors with VC6.
12206
12207v0.12.6 - 2020-03-07
12208 - Fix compilation error with Visual Studio .NET 2003.
12209
12210v0.12.5 - 2020-01-30
12211 - Silence some static analysis warnings.
12212
12213v0.12.4 - 2020-01-29
12214 - Silence some static analysis warnings.
12215
12216v0.12.3 - 2019-12-02
12217 - Fix some warnings when compiling with GCC and the -Og flag.
12218 - Fix a crash in out-of-memory situations.
12219 - Fix potential integer overflow bug.
12220 - Fix some static analysis warnings.
12221 - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
12222 - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.
12223
12224v0.12.2 - 2019-10-07
12225 - Internal code clean up.
12226
12227v0.12.1 - 2019-09-29
12228 - Fix some Clang Static Analyzer warnings.
12229 - Fix an unused variable warning.
12230
12231v0.12.0 - 2019-09-23
12232 - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify their own memory allocation
12233 routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
12234 - drflac_open()
12235 - drflac_open_relaxed()
12236 - drflac_open_with_metadata()
12237 - drflac_open_with_metadata_relaxed()
12238 - drflac_open_file()
12239 - drflac_open_file_with_metadata()
12240 - drflac_open_memory()
12241 - drflac_open_memory_with_metadata()
12242 - drflac_open_and_read_pcm_frames_s32()
12243 - drflac_open_and_read_pcm_frames_s16()
12244 - drflac_open_and_read_pcm_frames_f32()
12245 - drflac_open_file_and_read_pcm_frames_s32()
12246 - drflac_open_file_and_read_pcm_frames_s16()
12247 - drflac_open_file_and_read_pcm_frames_f32()
12248 - drflac_open_memory_and_read_pcm_frames_s32()
12249 - drflac_open_memory_and_read_pcm_frames_s16()
12250 - drflac_open_memory_and_read_pcm_frames_f32()
12251 Set this extra parameter to NULL to use defaults which is the same as the previous behaviour. Setting this NULL will use
12252 DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
12253 - Remove deprecated APIs:
12254 - drflac_read_s32()
12255 - drflac_read_s16()
12256 - drflac_read_f32()
12257 - drflac_seek_to_sample()
12258 - drflac_open_and_decode_s32()
12259 - drflac_open_and_decode_s16()
12260 - drflac_open_and_decode_f32()
12261 - drflac_open_and_decode_file_s32()
12262 - drflac_open_and_decode_file_s16()
12263 - drflac_open_and_decode_file_f32()
12264 - drflac_open_and_decode_memory_s32()
12265 - drflac_open_and_decode_memory_s16()
12266 - drflac_open_and_decode_memory_f32()
12267 - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
12268 by doing pFlac->totalPCMFrameCount*pFlac->channels.
12269 - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
12270 - Fix errors when seeking to the end of a stream.
12271 - Optimizations to seeking.
12272 - SSE improvements and optimizations.
12273 - ARM NEON optimizations.
12274 - Optimizations to drflac_read_pcm_frames_s16().
12275 - Optimizations to drflac_read_pcm_frames_s32().
12276
12277v0.11.10 - 2019-06-26
12278 - Fix a compiler error.
12279
12280v0.11.9 - 2019-06-16
12281 - Silence some ThreadSanitizer warnings.
12282
12283v0.11.8 - 2019-05-21
12284 - Fix warnings.
12285
12286v0.11.7 - 2019-05-06
12287 - C89 fixes.
12288
12289v0.11.6 - 2019-05-05
12290 - Add support for C89.
12291 - Fix a compiler warning when CRC is disabled.
12292 - Change license to choice of public domain or MIT-0.
12293
12294v0.11.5 - 2019-04-19
12295 - Fix a compiler error with GCC.
12296
12297v0.11.4 - 2019-04-17
12298 - Fix some warnings with GCC when compiling with -std=c99.
12299
12300v0.11.3 - 2019-04-07
12301 - Silence warnings with GCC.
12302
12303v0.11.2 - 2019-03-10
12304 - Fix a warning.
12305
12306v0.11.1 - 2019-02-17
12307 - Fix a potential bug with seeking.
12308
12309v0.11.0 - 2018-12-16
12310 - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
12311 drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
12312 and return PCM frame counts instead of sample counts. To upgrade you will need to change the input count by
12313 dividing it by the channel count, and then do the same with the return value.
12314 - API_CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
12315 the changes to drflac_read_*() apply.
12316 - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
12317 the changes to drflac_read_*() apply.
12318 - Optimizations.
12319
12320v0.10.0 - 2018-09-11
12321 - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
12322 need to do it yourself via the callback API.
12323 - Fix the clang build.
12324 - Fix undefined behavior.
12325 - Fix errors with CUESHEET metdata blocks.
12326 - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
12327 Vorbis comment API.
12328 - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
12329 - Minor optimizations.
12330
12331v0.9.11 - 2018-08-29
12332 - Fix a bug with sample reconstruction.
12333
12334v0.9.10 - 2018-08-07
12335 - Improve 64-bit detection.
12336
12337v0.9.9 - 2018-08-05
12338 - Fix C++ build on older versions of GCC.
12339
12340v0.9.8 - 2018-07-24
12341 - Fix compilation errors.
12342
12343v0.9.7 - 2018-07-05
12344 - Fix a warning.
12345
12346v0.9.6 - 2018-06-29
12347 - Fix some typos.
12348
12349v0.9.5 - 2018-06-23
12350 - Fix some warnings.
12351
12352v0.9.4 - 2018-06-14
12353 - Optimizations to seeking.
12354 - Clean up.
12355
12356v0.9.3 - 2018-05-22
12357 - Bug fix.
12358
12359v0.9.2 - 2018-05-12
12360 - Fix a compilation error due to a missing break statement.
12361
12362v0.9.1 - 2018-04-29
12363 - Fix compilation error with Clang.
12364
12365v0.9 - 2018-04-24
12366 - Fix Clang build.
12367 - Start using major.minor.revision versioning.
12368
12369v0.8g - 2018-04-19
12370 - Fix build on non-x86/x64 architectures.
12371
12372v0.8f - 2018-02-02
12373 - Stop pretending to support changing rate/channels mid stream.
12374
12375v0.8e - 2018-02-01
12376 - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
12377 - Fix a crash the the Rice partition order is invalid.
12378
12379v0.8d - 2017-09-22
12380 - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.
12381
12382v0.8c - 2017-09-07
12383 - Fix warning on non-x86/x64 architectures.
12384
12385v0.8b - 2017-08-19
12386 - Fix build on non-x86/x64 architectures.
12387
12388v0.8a - 2017-08-13
12389 - A small optimization for the Clang build.
12390
12391v0.8 - 2017-08-12
12392 - API CHANGE: Rename dr_* types to drflac_*.
12393 - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
12394 - Add support for custom implementations of malloc(), realloc(), etc.
12395 - Add CRC checking to Ogg encapsulated streams.
12396 - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
12397 - Bug fixes.
12398
12399v0.7 - 2017-07-23
12400 - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().
12401
12402v0.6 - 2017-07-22
12403 - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
12404 never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.
12405
12406v0.5 - 2017-07-16
12407 - Fix typos.
12408 - Change drflac_bool* types to unsigned.
12409 - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.
12410
12411v0.4f - 2017-03-10
12412 - Fix a couple of bugs with the bitstreaming code.
12413
12414v0.4e - 2017-02-17
12415 - Fix some warnings.
12416
12417v0.4d - 2016-12-26
12418 - Add support for 32-bit floating-point PCM decoding.
12419 - Use drflac_int* and drflac_uint* sized types to improve compiler support.
12420 - Minor improvements to documentation.
12421
12422v0.4c - 2016-12-26
12423 - Add support for signed 16-bit integer PCM decoding.
12424
12425v0.4b - 2016-10-23
12426 - A minor change to drflac_bool8 and drflac_bool32 types.
12427
12428v0.4a - 2016-10-11
12429 - Rename drBool32 to drflac_bool32 for styling consistency.
12430
12431v0.4 - 2016-09-29
12432 - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
12433 - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
12434 - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
12435 keep it consistent with drflac_audio.
12436
12437v0.3f - 2016-09-21
12438 - Fix a warning with GCC.
12439
12440v0.3e - 2016-09-18
12441 - Fixed a bug where GCC 4.3+ was not getting properly identified.
12442 - Fixed a few typos.
12443 - Changed date formats to ISO 8601 (YYYY-MM-DD).
12444
12445v0.3d - 2016-06-11
12446 - Minor clean up.
12447
12448v0.3c - 2016-05-28
12449 - Fixed compilation error.
12450
12451v0.3b - 2016-05-16
12452 - Fixed Linux/GCC build.
12453 - Updated documentation.
12454
12455v0.3a - 2016-05-15
12456 - Minor fixes to documentation.
12457
12458v0.3 - 2016-05-11
12459 - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
12460 - Lots of clean up.
12461
12462v0.2b - 2016-05-10
12463 - Bug fixes.
12464
12465v0.2a - 2016-05-10
12466 - Made drflac_open_and_decode() more robust.
12467 - Removed an unused debugging variable
12468
12469v0.2 - 2016-05-09
12470 - Added support for Ogg encapsulation.
12471 - API CHANGE. Have the onSeek callback take a third argument which specifies whether or not the seek
12472 should be relative to the start or the current position. Also changes the seeking rules such that
12473 seeking offsets will never be negative.
12474 - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.
12475
12476v0.1b - 2016-05-07
12477 - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
12478 - Removed a stale comment.
12479
12480v0.1a - 2016-05-05
12481 - Minor formatting changes.
12482 - Fixed a warning on the GCC build.
12483
12484v0.1 - 2016-05-03
12485 - Initial versioned release.
12486*/
12487
12488/*
12489This software is available as a choice of the following licenses. Choose
12490whichever you prefer.
12491
12492===============================================================================
12493ALTERNATIVE 1 - Public Domain (www.unlicense.org)
12494===============================================================================
12495This is free and unencumbered software released into the public domain.
12496
12497Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
12498software, either in source code form or as a compiled binary, for any purpose,
12499commercial or non-commercial, and by any means.
12500
12501In jurisdictions that recognize copyright laws, the author or authors of this
12502software dedicate any and all copyright interest in the software to the public
12503domain. We make this dedication for the benefit of the public at large and to
12504the detriment of our heirs and successors. We intend this dedication to be an
12505overt act of relinquishment in perpetuity of all present and future rights to
12506this software under copyright law.
12507
12508THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12509IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12510FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12511AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
12512ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
12513WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
12514
12515For more information, please refer to <http://unlicense.org/>
12516
12517===============================================================================
12518ALTERNATIVE 2 - MIT No Attribution
12519===============================================================================
648db22b 12520Copyright 2023 David Reid
2ff0b512 12521
12522Permission is hereby granted, free of charge, to any person obtaining a copy of
12523this software and associated documentation files (the "Software"), to deal in
12524the Software without restriction, including without limitation the rights to
12525use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
12526of the Software, and to permit persons to whom the Software is furnished to do
12527so.
12528
12529THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12530IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12531FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12532AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
12533LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
12534OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
12535SOFTWARE.
12536*/